adaptive failover mechanism motivation end-to-end connectivity can suffer during net failures...

1
Adaptive Failover Mechanism Motivation End-to-end connectivity can suffer during net failures •Internet path outage detection and recovery is slow (shown to take as long as 3 to 30+ mins) •Path outages are common in mobile networks Network fault tolerance should cope with dynamic network conditions and varying applications needs Current SCTP failovers are not adaptive to application requirements and network conditions Current Research SCTP’s current failover mechanism uses the Path.Max.Retrans parameter for failover We propose a two-level (-) threshold failover mechanism - maintain primary, but failover temporarily - change the primary, making failover permanent Two-level threshold mechanism provides added control over failover actions Using ns-2, we have modeled SCTP data transfer latency to a dual homed destination with failovers incorporating the two-level threshold mechanism We are investigating the relationships between the thresholds and the network parameters to develop an adaptive failover mechanism Future Research • Develop an adaptive failover mechanism for SCTP • Incorporate application requirements in the mechanism Related Research Resilient Overlay Networks (RON) •Allows a small group of Internet applications to detect and recover from path outages within several seconds Rocks: Reliable Sockets •Protects apps from path failures common to mobile computing such as link failures and IP address changes • Migrate •End-to-end framework for Internet mobility that supports rebinding of endpoints for established TCP connections •Fine-grained server failover mechanism of long-running connections End-to-End Load Balancing Motivation • Exploit all network resources visible at transport layer • Perform fine-grain load balancing in transport layer • Avoid replication of work and coarse- grain load balancing in applications Issues Scheduling of traffic on multiple paths Reordering introduced by the sender Loss detection & recovery Congestion control: shared or separate? Future Research Investigate new loss detection and recovery techniques, which are robust to changeover Investigate dynamic shared bottleneck detection techniques for congestion control Investigate algorithms for Current Work: Reordering Issues Reordering due to changeover causes spurious fast retransmissions and congestion window overgrowth Occurrence of reordering increases due to sender introduced route changes Illustration of cwnd overgrowth problem Is this problem a “corner case”? NO! Cause of the problem: inadequacies and solutions •Inadequacy: Retransmission ambiguity Solution: Rhein Algorithm (variation of Eifel Algorithm) Distinguish between acks for transmissions and retransmissions •Inadequacy: Congestion control is unaware of changeover Solution: Changeover Aware Congestion Control (CACC) (C 2 =2)81.3 21 21.1 21.2 (C 2 =2)41.2 61.2 61.3 22 23 (C 2 =3)82 (C 2 =4)83 81 61 62 63 64 82 83 84 69 89 70 90 B 1 A 1 A 2 B 2 Sender A Receiver B Receiver B (C 2 =2)2 41 * TSNs 1-50 have been buffered at the sender’ s link layer corresponding to A 1 and are being sent. 101.3 101.4 102.1 104.1 65 85 66 86 103.1 1-50* 105.1 106.1 42 43 (C 2 =2)41.1 (C 2 =2)81.2 (C 2 =2)1 (C 2 =5)84 (C 2 =6)85 (C 2 =7)86 (C 2 =10)89 (C 2 =11)90 0 Overview History Originally intended for telephony signaling Overcomes several TCP & UDP limitations Designed as a general purpose transport protocol Became an IETF Proposed Standard in October 2000 (RFC2960) under the SIGTRAN working group Handed to Transport Area working group for continued work Features Reliable data transfer Ordered and unordered data delivery Multistreaming – no head-of-line blocking • Multihoming Changeover Sender decides primary destination address for traffic Sender can change primary destination address during an active association Utilities (pending further research): •End-to-end mobility (with ADD/DELETE IP extension) •End-to-end load balancing Using SCTP Multihoming for Fault Tolerance & Load Balancing Armando L. Caro Jr. and Janardhan R. Iyengar Paul D. Amer, Gerard J. Heinz, Randall R. Stewart http://pel.cis.udel.edu Protocol · Engineering · Laboratory University of Delaware new path avail bw (kbps) changeover time (ms) Using: Old path avail bw – 500 kbps Old path end-to-end delay for negligible sized packets – 50 m New path end-to-end delay for negligible sized packets – 50 m min packets outstanding on old path during changeover Multihoming 4 possible TCP connections: (A 1 ,B 1 ) or (A 1 ,B 2 ) or (A 2 ,B 1 ) or (A 2 ,B 2 ) 1 SCTP association: ({A 1 ,A 2 },{B 1 ,B 2 }) Primary destinations for A & B (e.g., A 1 & B 1 ) Heartbeats determine reachability of idle destinations Failover to an alternate destination if primary fails Network Host A Host B B 1 B 2 A 1 A 2 Failover What happens if B 1 fails? TCP connection: (A 1 ,B 1 ) Connection dies SCTP association: ({A 1 ,A 2 },{B 1 ,B 2 }) Failover to B 2 (ie, traffic temporarily migrates to B 2 ) Upon B 1 ’s restoration, traffic migrates back to B 1 Network Host A Host B B 1 B 2 A 1 A 2

Upload: buck-webb

Post on 28-Dec-2015

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Adaptive Failover Mechanism Motivation End-to-end connectivity can suffer during net failures Internet path outage detection and recovery is slow (shown

Adaptive Failover MechanismAdaptive Failover Mechanism

Motivation• End-to-end connectivity can suffer during net failures

•Internet path outage detection and recovery is slow (shown to take as long as 3 to 30+ mins)

•Path outages are common in mobile networks

• Network fault tolerance should cope with dynamic network conditions and varying applications needs

• Current SCTP failovers are not adaptive to application requirements and network conditions

Motivation• End-to-end connectivity can suffer during net failures

•Internet path outage detection and recovery is slow (shown to take as long as 3 to 30+ mins)

•Path outages are common in mobile networks

• Network fault tolerance should cope with dynamic network conditions and varying applications needs

• Current SCTP failovers are not adaptive to application requirements and network conditions

Current Research • SCTP’s current failover mechanism uses the

Path.Max.Retrans parameter for failover

• We propose a two-level (-) threshold failover mechanism

- maintain primary, but failover temporarily

- change the primary, making failover permanent

• Two-level threshold mechanism provides added control over failover actions

• Using ns-2, we have modeled SCTP data transfer latency to a dual homed destination with failovers incorporating the two-level threshold mechanism

• We are investigating the relationships between the thresholds and the network parameters to develop an adaptive failover mechanism

Current Research • SCTP’s current failover mechanism uses the

Path.Max.Retrans parameter for failover

• We propose a two-level (-) threshold failover mechanism

- maintain primary, but failover temporarily

- change the primary, making failover permanent

• Two-level threshold mechanism provides added control over failover actions

• Using ns-2, we have modeled SCTP data transfer latency to a dual homed destination with failovers incorporating the two-level threshold mechanism

• We are investigating the relationships between the thresholds and the network parameters to develop an adaptive failover mechanism

Future Research• Develop an adaptive failover mechanism for SCTP

• Incorporate application requirements in the mechanism

Future Research• Develop an adaptive failover mechanism for SCTP

• Incorporate application requirements in the mechanism

Related Research• Resilient Overlay Networks (RON)

•Allows a small group of Internet applications to detect and recover from path outages within several seconds

• Rocks: Reliable Sockets

•Protects apps from path failures common to mobile computing such as link failures and IP address changes

• Migrate

•End-to-end framework for Internet mobility that supports rebinding of endpoints for established TCP connections

•Fine-grained server failover mechanism of long-running connections

• Migratory TCP (M-TCP)

•Mechanism to migrate live TCP sessions to a redundant server upon server overload, network congestion, etc.

Related Research• Resilient Overlay Networks (RON)

•Allows a small group of Internet applications to detect and recover from path outages within several seconds

• Rocks: Reliable Sockets

•Protects apps from path failures common to mobile computing such as link failures and IP address changes

• Migrate

•End-to-end framework for Internet mobility that supports rebinding of endpoints for established TCP connections

•Fine-grained server failover mechanism of long-running connections

• Migratory TCP (M-TCP)

•Mechanism to migrate live TCP sessions to a redundant server upon server overload, network congestion, etc.

End-to-End Load BalancingEnd-to-End Load Balancing

Motivation• Exploit all network resources visible at transport layer

• Perform fine-grain load balancing in transport layer

• Avoid replication of work and coarse-grain load balancing in applications

Motivation• Exploit all network resources visible at transport layer

• Perform fine-grain load balancing in transport layer

• Avoid replication of work and coarse-grain load balancing in applications

Issues• Scheduling of traffic on multiple paths

• Reordering introduced by the sender

• Loss detection & recovery

• Congestion control: shared or separate?

Issues• Scheduling of traffic on multiple paths

• Reordering introduced by the sender

• Loss detection & recovery

• Congestion control: shared or separate?

Future Research• Investigate new loss detection and recovery

techniques, which are robust to changeover

• Investigate dynamic shared bottleneck detection techniques for congestion control

• Investigate algorithms for scheduling traffic on multiple paths

Future Research• Investigate new loss detection and recovery

techniques, which are robust to changeover

• Investigate dynamic shared bottleneck detection techniques for congestion control

• Investigate algorithms for scheduling traffic on multiple paths

Current Work: Reordering Issues• Reordering due to changeover causes spurious fast

retransmissions and congestion window overgrowth

• Occurrence of reordering increases due to sender introduced route changes

• Illustration of cwnd overgrowth problem

• Is this problem a “corner case”? …NO!

• Cause of the problem: inadequacies and solutions

•Inadequacy: Retransmission ambiguity

Solution: Rhein Algorithm (variation of Eifel Algorithm)Distinguish between acks for transmissions

and retransmissions

•Inadequacy: Congestion control is unaware of changeover

Solution: Changeover Aware Congestion Control (CACC)Algorithms - prevents spurious fast

retransmits

Current Work: Reordering Issues• Reordering due to changeover causes spurious fast

retransmissions and congestion window overgrowth

• Occurrence of reordering increases due to sender introduced route changes

• Illustration of cwnd overgrowth problem

• Is this problem a “corner case”? …NO!

• Cause of the problem: inadequacies and solutions

•Inadequacy: Retransmission ambiguity

Solution: Rhein Algorithm (variation of Eifel Algorithm)Distinguish between acks for transmissions

and retransmissions

•Inadequacy: Congestion control is unaware of changeover

Solution: Changeover Aware Congestion Control (CACC)Algorithms - prevents spurious fast

retransmits

(C2=2) 81.3

2121.121.2

(C2=2) 41.2

61.261.3

22

23

(C2=3) 82

(C2=4) 83

81

616263

64 828384

69

89

70

90

B1A1 A2 B2Sender

A Receiver

BReceiver

B

(C2=2) 2

41

* TSNs 1-50 have been buffered at the sender’s link layer corresponding to A1 and are being sent.

101.3101.4

102.1

104.1

65

85

66

86

103.1

1-50*

105.1

106.1

4243

(C2=2) 41.1

(C2=2) 81.2

(C2=2) 1

(C2=5) 84

(C2=6) 85

(C2=7) 86

(C2=10) 89

(C2=11) 90

0

OverviewOverview

History• Originally intended for telephony signaling

• Overcomes several TCP & UDP limitations

• Designed as a general purpose transport protocol

• Became an IETF Proposed Standard in October 2000 (RFC2960) under the SIGTRAN working group

• Handed to Transport Area working group for continued work

History• Originally intended for telephony signaling

• Overcomes several TCP & UDP limitations

• Designed as a general purpose transport protocol

• Became an IETF Proposed Standard in October 2000 (RFC2960) under the SIGTRAN working group

• Handed to Transport Area working group for continued work

Features• Reliable data transfer

• Ordered and unordered data delivery

• Multistreaming – no head-of-line blocking

• Multihoming

Features• Reliable data transfer

• Ordered and unordered data delivery

• Multistreaming – no head-of-line blocking

• Multihoming

Changeover• Sender decides primary destination address for traffic

• Sender can change primary destination address during an active association

• Utilities (pending further research):

•End-to-end mobility (with ADD/DELETE IP extension)

•End-to-end load balancing

Changeover• Sender decides primary destination address for traffic

• Sender can change primary destination address during an active association

• Utilities (pending further research):

•End-to-end mobility (with ADD/DELETE IP extension)

•End-to-end load balancing

Using SCTP Multihoming for Fault Tolerance & Load BalancingArmando L. Caro Jr. and Janardhan R. Iyengar

Paul D. Amer, Gerard J. Heinz, Randall R. Stewart

http://pel.cis.udel.edu

Using SCTP Multihoming for Fault Tolerance & Load BalancingArmando L. Caro Jr. and Janardhan R. Iyengar

Paul D. Amer, Gerard J. Heinz, Randall R. Stewart

http://pel.cis.udel.eduProtocol · Engineering · LaboratoryUniversity of Delaware

new path avail bw (kbps)

changeover time (ms)

Using:•Old path avail bw – 500 kbps•Old path end-to-end delay for negligible sized packets – 50 ms•New path end-to-end delay for negligible sized packets – 50 ms

min packets outstanding on old path during changeover

Multihoming

• 4 possible TCP connections:•(A1,B1) or (A1,B2) or (A2,B1) or (A2,B2)

• 1 SCTP association:•({A1,A2},{B1,B2})

•Primary destinations for A & B (e.g., A1 & B1)

•Heartbeats determine reachability of idle destinations•Failover to an alternate destination if primary fails

Multihoming

• 4 possible TCP connections:•(A1,B1) or (A1,B2) or (A2,B1) or (A2,B2)

• 1 SCTP association:•({A1,A2},{B1,B2})

•Primary destinations for A & B (e.g., A1 & B1)

•Heartbeats determine reachability of idle destinations•Failover to an alternate destination if primary fails

NetworkHost A Host BB1

B2

A1

A2

Failover• What happens if B1 fails?

• TCP connection: (A1,B1)•Connection dies

• SCTP association: ({A1,A2},{B1,B2})

•Failover to B2 (ie, traffic temporarily migrates to B2)

•Upon B1’s restoration, traffic migrates back to B1

Failover• What happens if B1 fails?

• TCP connection: (A1,B1)•Connection dies

• SCTP association: ({A1,A2},{B1,B2})

•Failover to B2 (ie, traffic temporarily migrates to B2)

•Upon B1’s restoration, traffic migrates back to B1

NetworkHost A Host BB1

B2

A1

A2