Mendosus A SAN-Based Fault Injection Test-Bed for Construction of Highly Available Network Services Xiaoyan Li, Richard Martin, Kiran Nagaraja, Thu D. Nguyen and Bin Zhang Dept. of Computer Science, Rutgers University http://www.panic-lab.rutgers.edu



Mendosus: A SAN-Based Fault Injection Test-Bed for Construction of Highly Available Network Services

Xiaoyan Li, Richard Martin, Kiran Nagaraja,

Thu D. Nguyen and Bin Zhang

Dept. of Computer Science, Rutgers University

http://www.panic-lab.rutgers.edu


Talk Outline

  - Motivation
  - Design
  - Implementation
  - Benchmarks
  - Case Studies
  - Related Work
  - Future Work


Motivation

Ubiquitous network access is driving exponential growth in network services

Availability is one key challenge
  - Networked systems comprise large numbers of heterogeneous components
  - Faults are not uncommon
  - Complex interactions between components

Examples of costly failures: eBay, Britannica

Currently difficult to assess service availability
  - How to analyze the impact of failures?
  - How to set up an appropriate test-bed?


Mendosus

Goal: provide infrastructure for service designers to assess the availability of network services

Overview:
  - Provide flexible infrastructure to accurately model a variety of different networking systems from the application's point of view
  - Run the application in real time and inject faults to assess the application's behavior

Two key components:
  - Real-time emulation of a variety of interconnects
  - General fault injection infrastructure


Vision

Map available resources to emulated network


Design


Mendosus Architecture

[Figure: architecture diagram. Labels: Applications, User Level, Mendosus daemon, Kernel, Emulator Module (Routing, Fault Inclusion, Latency), Network State, Events, Central Controller, Fast & Reliable SAN]


Design Decisions

Central controller
  - Advantage: consistent network and fault information
  - Disadvantage: limits scalability
  - Not involved in network emulation, so should still scale well to targeted system sizes (thousands or tens of thousands of components)

Entire network state is maintained at each end node
  - Advantage: performance
  - Disadvantage: limits scalability
  - Only maintain state for LAN

Emulation module embedded within kernel
  - Advantage: no modifications to application code
  - Disadvantage: more difficult to modify and extend


Functional Components

Topology Maintenance

Fault Injection

Emulation


Topology Maintenance

Specification: simple ns-2-like topology scripts
  - Specify available resources

Central controller manages topology
  - Initializes original topology on each node
  - Consistent view

Real-time topology changes
  - Specified as scripted events

Controller monitors network connectivity
  - Detects partitions
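
The slides do not say how the controller detects partitions. One plausible reading is a connected-components pass over the topology with failed components removed. The sketch below assumes that approach; `find_partitions`, the node names, and the `failed` set format are all hypothetical and are not taken from Mendosus.

```python
from collections import deque

def find_partitions(nodes, links, failed=frozenset()):
    """Return connected components as a list of sorted node lists.

    nodes:  iterable of node names
    links:  iterable of (a, b) pairs, treated as undirected
    failed: names of nodes, or (a, b) link pairs, currently marked down
    """
    alive = [n for n in nodes if n not in failed]
    adjacency = {n: set() for n in alive}
    for a, b in links:
        link_down = (a, b) in failed or (b, a) in failed
        # A link only counts if it is up and both endpoints are alive.
        if not link_down and a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    partitions, seen = [], set()
    for start in sorted(alive):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search from an unvisited node
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        partitions.append(sorted(component))
    return partitions
```

More than one returned component means the emulated network is partitioned. For example, marking the link between two switches as failed splits the hosts behind each switch into separate components.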


Fault Injection

Every network component can have a fault profile
  - Switches, hubs, NICs, links, end nodes

Fault specification: trace files or theoretical distributions
  - Exponential, Weibull, constant

Simulate fail-stop components
  - MTTR: constant or following a distribution
  - E.g. unplugging, port shutdown
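
A fault profile of this kind can be turned into a timeline of down/up events by alternately drawing a time-to-failure and a time-to-repair. The sketch below uses Python's standard `random` module. The function names, the `(dist, params)` tuple format, and the parameter names are assumptions for illustration and are not the Mendosus specification format.

```python
import random

def draw_interval(rng, dist, **params):
    """Draw one time interval (seconds) from the named distribution."""
    if dist == "constant":
        return params["value"]
    if dist == "exponential":
        return rng.expovariate(1.0 / params["mean"])
    if dist == "weibull":
        return rng.weibullvariate(params["scale"], params["shape"])
    raise ValueError(f"unknown distribution: {dist}")

def fault_schedule(component, horizon, ttf, ttr, seed=0):
    """Return a list of (time, component, "down" or "up") events before horizon.

    ttf and ttr are (dist, params) pairs for time-to-failure and
    time-to-repair; the component is fail-stop between "down" and "up".
    """
    rng = random.Random(seed)
    events, now = [], 0.0
    while True:
        now += draw_interval(rng, ttf[0], **ttf[1])
        if now >= horizon:
            break
        events.append((now, component, "down"))
        now += draw_interval(rng, ttr[0], **ttr[1])
        if now >= horizon:
            break
        events.append((now, component, "up"))
    return events
```

A call such as `fault_schedule("sw1", 3600, ("exponential", {"mean": 600}), ("constant", {"value": 30}))` yields exponentially distributed failures with a constant 30-second repair time. The fixed seed makes a run repeatable.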


Emulation

Completely distributed
  - Every node has enough network state

Emulation messaging sequence
  - Application initiates communication
  - Routing: determine route
  - Fault Inclusion: effect of injected faults
  - Latency: corresponding to route taken

We do not implement the innards of network components
  - Switching
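
The messaging sequence above can be read as a three-stage pipeline applied to each outgoing message. The following is a minimal user-level sketch of that reading. The real emulator is a kernel module, and `emulate_send`, its data structures, and the latency values used with it are hypothetical.

```python
def emulate_send(route_table, failed, latency_us, src, dst):
    """Return (delivered, delay_us) for one message from src to dst.

    route_table: {(src, dst): [component, ...]} links and switches on the path
    failed:      set of component names currently down
    latency_us:  {component: one-way latency in microseconds}
    """
    # Routing: determine the route from locally held network state.
    route = route_table.get((src, dst))
    if route is None:
        return False, 0.0
    # Fault inclusion: a fail-stop component on the route drops the message.
    if any(component in failed for component in route):
        return False, 0.0
    # Latency: sum the per-component delays along the route taken.
    return True, sum(latency_us[component] for component in route)
```

Because each stage only consults state already held on the sending node, no component internals (such as switch queues) need to be modeled, which matches the "no innards" choice on the slide.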


Implementation


Ethernet LAN Emulation

Routing
  - Emulate computation of Ethernet spanning tree
  - Controller chooses root of tree
  - Emulator on each node computes identical spanning tree
  - Reconfiguration performed periodically (every 2 secs)

Broadcast & Multicast
  - Emulate using sequence of unicasts
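
For every node to arrive at the same tree without coordination, the computation has to be deterministic given the same topology and root. The sketch below illustrates that property with a breadth-first search that visits neighbors in sorted order. This is a stand-in chosen for brevity, not the IEEE 802.1D algorithm and not necessarily what Mendosus does. The `broadcast` helper and its `send_unicast` callback are likewise hypothetical.

```python
from collections import deque

def spanning_tree(adjacency, root):
    """Return {node: parent} for a BFS tree rooted at root.

    Neighbors are visited in sorted order, so every node that holds the
    same adjacency map and root derives exactly the same tree.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent

def broadcast(sender, hosts, send_unicast):
    """Emulate a broadcast as one unicast per destination host."""
    for host in sorted(hosts):
        if host != sender:
            send_unicast(sender, host)
```

In a triangle of three switches, the tree keeps two of the three links, which is how redundant links are left unused until a reconfiguration after a fault.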


Ethernet LAN Emulation - Faults

Network partitions
  - Controller monitors connectivity
  - Multiple roots, one for each partition

NIC fail-over
  - Multiple interfaces using IP aliasing support in Linux
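
When the network is partitioned, each partition needs its own spanning-tree root. The slides say only that the controller picks one root per partition. The sketch below assumes a simple rule (the smallest switch name in each partition becomes its root) and uses a union-find structure to group switches. `roots_per_partition` and the rule itself are illustrative assumptions.

```python
def roots_per_partition(switches, live_links):
    """Return {root: sorted members}, one entry per connected partition."""
    leader = {s: s for s in switches}

    def find(s):
        while leader[s] != s:
            leader[s] = leader[leader[s]]  # path halving
            s = leader[s]
        return s

    for a, b in live_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller name as leader so the root choice is deterministic.
            leader[max(ra, rb)] = min(ra, rb)

    groups = {}
    for s in switches:
        groups.setdefault(find(s), []).append(s)
    return {root: sorted(members) for root, members in groups.items()}
```

With a fully connected topology the result has a single root. Removing the links that join two halves yields two entries, one root for each side.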


Emulation completeness…

Feature                                  | Ethernet               | Emulated Ethernet
P-to-P                                   | Yes                    | Yes
Broadcast                                | Hardware               | Software (multiple unicast)
Multicast                                | Hardware               | Software (broadcast w/ filters)
Layer 3, 4 services (e.g. VLAN, IGMP)    | Some advanced switches | Not implemented


Micro-benchmarks


Emulation Limits

Network          | No. of Switches in Topology | Throughput (MB/sec) | RTT (usec)
Fast Ethernet    | ?                           | 11.81               | 88.9
Gigabit Ethernet | ?                           | 66.00               | 130.0
Emulator         | ?                           | 79.18               | 54.8
Emulator         | ?                           | 79.61               | 53.4


Software Broadcast Scaling


Fault View Convergence


Case Studies


Group Membership

Test protocol behavior under faults
  - Subtle interactions in distributed protocols

Three Round Membership algorithm
  - Robust against multiple node failures, packet drops and network partitions
  - Two modes of operation: normal and FCM


Membership Observations

[Figure: nodes A, B, C, D and link L, with a timeline of injected events 1 to 5]

1. NIC failure at B
2. Link L down
3. NIC at B recovers
4. Packet drops at A
5. Link L up


Multi-Level Switched Network

Large enterprise LANs have multiple layers of network components
  - Access, core and aggregation switches

How to evaluate availability vs. cost vs. complexity?

Study service availability with increased redundancy
  - Faults following exponential distributions
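
As a point of comparison for the emulated results, a back-of-the-envelope analytical estimate is easy to compute. The sketch below is not part of Mendosus, which measures availability by running the real service under injected faults. It assumes independent component failures, steady-state availability of MTTF / (MTTF + MTTR), and that any one replica in a redundant layer suffices. All function names are hypothetical.

```python
def availability(mttf, mttr):
    """Steady-state availability of one component: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

def redundant_availability(single, replicas):
    """Availability of `replicas` independent copies where any one suffices."""
    return 1.0 - (1.0 - single) ** replicas

def series_availability(layers):
    """Availability of a path that needs every layer to be up."""
    result = 1.0
    for layer in layers:
        result *= layer
    return result
```

For instance, `series_availability([redundant_availability(a, 2) for a in layer_values])` estimates a path through several layers with each layer duplicated. Such estimates ignore correlated faults and protocol reconvergence time, which are exactly the effects an emulation test-bed can expose.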


Enterprise LAN


Availability vs. Redundancy


Related Work

Network emulation
  - Distributed emulation: Emulab [Utah], DelayLine
  - Centralized emulation: NISTNET, Lancaster emulator

Fault injection
  - Script-based probing and fault injection: Orchestra, DOCTOR
  - Correlated faults: Loki [UIUC]

Simulation
  - NS-2, REAL [Cornell], SSFNet, x-sim [Arizona]


Future Work

Extend Mendosus to emulate other networks
  - WAN: build in performance dynamics model
  - Wireless LAN: realistic fault and performance models

Support pluggable modules within network components, which add functionality and additional failures
  - Intelligent routing protocols (e.g. HSRP)
  - Dynamic DNS, RR DNS


Summary

Test-bed for service designers to systematically analyze network and protocol design against failures

Results show that real-time emulation is feasible given the capabilities of current SANs

Demonstrated the flexibility and usefulness of Mendosus through two case studies

Another step towards building highly available services…