ETHERNET AUTOMATIC PROTECTION SWITCHING (EAPS)
A small comparison with Ethernet Ring Protection Switching (ERPS)
Introduction
• EAPS is a protocol invented to increase the availability of Ethernet rings
• Developed by Extreme Networks (RFC3619 – 2003)
• Objective:
  • Provide a resilience level comparable to SONET rings
• Current version (v1.3 - 2011) has some enhancements over version 1 (RFC3619 – 2003)
Motivation
• Ethernet is widely used in Local Area Networks (LANs) and Metropolitan Area Networks (MANs)
  • Typically present a ring topology
• MAN operators want to reduce recovery time
  • Spanning Tree Protocol (STP) could take 30 – 60 seconds to recover
  • Rapid Spanning Tree Protocol (RSTP) is faster...
    • Convergence time depends on the number of nodes
  • Both STP and RSTP limit the number of nodes
• EAPS recovers in less than 1 second (100 ms)
  • Does not limit the number of nodes!!!
Basic Considerations (I)
• A ring is made up of two or more switches
• Each switch has two ports connected to the ring
• An EAPS domain exists on a single Ethernet ring
  • A domain protects a group of VLANs
  • A domain has a unique control VLAN
• Multiple EAPS domains could coexist on the same ring
  • Multiple control VLANs
Basic Considerations (II)
• For each EAPS domain:
  • One of the nodes is the Master (S1)
    • One port is designated as the Primary port (P)
    • The other is the Secondary port (S)
  • All other nodes (S2-S6) are known as Transit nodes
Normal Operation
• The Master node blocks its secondary port -> avoid loops
  • Non-control traffic is blocked (Control VLAN is NOT blocked)
  • Master is in COMPLETE state
  • Transit nodes are in LINKS-UP state
• The Master sends health-check frames (HEALTH-CHECK-PDU) periodically (Hello timer)
  • From primary port to secondary port
  • Control frames consumed by the Master -> NOT forwarded
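The blocking rule on the Master's secondary port can be illustrated with a minimal sketch. The VLAN ID and the function name are hypothetical, chosen only for illustration; the rule itself (control VLAN always passes, other traffic passes only when the ring has failed) is the one stated on this slide.

```python
from enum import Enum


class MasterState(Enum):
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"


CONTROL_VLAN = 4000  # hypothetical control VLAN ID for this EAPS domain


def secondary_port_forwards(state, vlan):
    """Return True if the Master's secondary port passes a frame on `vlan`."""
    if vlan == CONTROL_VLAN:
        return True  # the control VLAN is never blocked
    # Non-control (protected) traffic is blocked while the ring is complete
    # and only passes once the Master has moved to FAILED.
    return state is MasterState.FAILED
```

For example, `secondary_port_forwards(MasterState.COMPLETE, 10)` is `False`, while the same call with `CONTROL_VLAN` is `True`, which is why health-check frames still reach the Master.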
Fault Operation
• When a fault is detected:
  • The Master changes to FAILED state
  • Unblocks secondary port
  • Flushes its bridging table
  • The Master orders the other nodes to flush their tables
    • Sends a RING-DOWN-FLUSH-FDB-PDU frame
  • Transit nodes learn the new topology
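Why the flush matters can be shown with a toy bridging table. This is only a sketch: the `Bridge` class, the MAC address and the port names are made up for illustration and do not come from the EAPS specification.

```python
class Bridge:
    """Toy bridging table (FDB): maps a MAC address to the port it was learned on."""

    def __init__(self):
        self.fdb = {}

    def learn(self, mac, port):
        self.fdb[mac] = port

    def lookup(self, mac):
        # None means "unknown destination": a real switch would flood the frame.
        return self.fdb.get(mac)

    def flush(self):
        self.fdb.clear()


HOST = "aa:aa:aa:aa:aa:01"  # hypothetical host MAC address

transit = Bridge()
transit.learn(HOST, "ring-port-1")

# The ring breaks beyond ring-port-1: the entry is now stale and still
# points at the broken path.
stale_port = transit.lookup(HOST)

# RING-DOWN-FLUSH-FDB-PDU received: flush, then flood until re-learned.
transit.flush()
port_after_flush = transit.lookup(HOST)

# Traffic from the host now arrives over the other side of the ring.
transit.learn(HOST, "ring-port-2")
relearned_port = transit.lookup(HOST)
```

Without the flush, frames for the host would keep being sent towards the failed link until the stale entry aged out.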
Fault Detection (I)
• 2 ways of detecting a failure
  • Link Down Alert
  • Ring Polling
• Link Down Alert
  • Transit nodes detect a link-down
  • The Transit node detecting the failure changes to LINKS-DOWN state
  • The Transit node sends a LINK-DOWN-PDU frame to the Master
  • Master changes to FAILED state
  • Master unblocks secondary port
  • ...
Fault Detection (II)
• Ring Polling (version 1 – RFC3619)
  • Master sends HEALTH-CHECK-PDU frames periodically
    • From primary to secondary port
  • Master has a Fail-period timer
    • If health check frame received before timer expires -> reset timer
    • If health check frame NOT received before timer expires
      • Master changes to FAILED state
      • Master unblocks secondary port
      • ...
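Ring polling can be sketched with a simulated clock. The class name, the 3-second fail period and the one-frame-per-second rhythm are assumptions for illustration; only the reset-or-fail rule is taken from the slide.

```python
class RingPoller:
    """Master-side ring polling with a Fail-period timer (simulated clock)."""

    def __init__(self, fail_period):
        self.fail_period = fail_period
        self.deadline = fail_period  # timer armed at t = 0
        self.state = "COMPLETE"
        self.secondary_blocked = True

    def on_health_frame(self, now):
        # Health frame came back on the secondary port: reset the timer.
        self.deadline = now + self.fail_period

    def tick(self, now):
        # Timer expired without a returning health frame: declare the ring failed.
        if self.state == "COMPLETE" and now >= self.deadline:
            self.state = "FAILED"
            self.secondary_blocked = False


poller = RingPoller(fail_period=3.0)  # hypothetical value, in seconds

for t in (1.0, 2.0, 3.0):  # ring healthy: a frame returns every second
    poller.on_health_frame(t)
    poller.tick(t)
state_while_healthy = poller.state

for t in (4.0, 5.0, 6.0):  # ring broken: no frame returns any more
    poller.tick(t)
state_after_break = poller.state
```

In this run the last frame returns at t = 3, so the timer expires at t = 6 and the Master only then unblocks its secondary port.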
Fault Detection (III)
• Ring Polling (version 1.3)
  • 2 options if the Fail-period timer expires (configurable)
    • «Open Secondary Port» -> as on the previous slide
    • «Send-Alert»
      • Master does NOT unblock its secondary port yet
      • Master sends a QUERY-LINK-STATUS-PDU frame out of both ports
      • Transit nodes with link failure reply with LINK-DOWN-PDU frame
      • Master changes to FAILED state
      • ...
• Why «Send-Alert»?
  • Prevents false failures
  • Health frames might not return to the Master even if the ring is complete
    • Control VLAN misconfigurations
    • Too much traffic
    • Master node’s CPU busy
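The two configurable reactions to an expired Fail-period timer can be compared in a small sketch. The function name and the mode strings are hypothetical labels. The behaviour when no Transit node replies (stay in COMPLETE with the port blocked) is an assumption that follows from the "prevents false failures" rationale, not something the slides spell out.

```python
def on_fail_timer_expired(mode, link_down_replies):
    """Return (master_state, secondary_blocked) after the Fail-period timer expires.

    `link_down_replies` lists the Transit nodes that answered the Master's
    QUERY-LINK-STATUS-PDU with a LINK-DOWN-PDU (only used in send-alert mode).
    """
    if mode == "open-secondary-port":
        # Version 1 behaviour: assume the ring is broken right away.
        return ("FAILED", False)
    if mode == "send-alert":
        if link_down_replies:
            # A Transit node confirmed a broken link: real failure.
            return ("FAILED", False)
        # Assumption: nobody reports a broken link, so treat this as a
        # false failure and keep the secondary port blocked.
        return ("COMPLETE", True)
    raise ValueError(f"unknown mode: {mode}")
```

With a complete ring and a busy Master CPU, `on_fail_timer_expired("open-secondary-port", [])` opens the port and creates a loop, whereas the `"send-alert"` mode keeps it blocked.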
Fault Restoration (I)
• Master in FAILED state -> continues sending HEALTH-CHECK-PDU frames
• Ring restored -> Master’s secondary port receives health frame
  • Master changes to COMPLETE state
  • Blocks non-control frames on secondary port
  • Flushes its bridge table
  • Orders the other nodes to flush their tables
    • Sends a RING-UP-FLUSH-FDB-PDU frame
  • Transit nodes re-learn the topology
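The Master's fault and restoration transitions can be put together in a minimal state machine. This is a sketch under stated assumptions: the class, its method names and the sample table entry are hypothetical, and control frames are only recorded in a list instead of being transmitted.

```python
class Master:
    """Minimal EAPS Master: COMPLETE <-> FAILED transitions only."""

    def __init__(self):
        self.state = "COMPLETE"
        self.secondary_blocked = True
        self.fdb = {"aa:bb:cc:00:00:01": "primary"}  # hypothetical learned entry
        self.sent = []  # control frames "sent", in order

    def on_fault(self):
        """Fault detected (LINK-DOWN-PDU received or Fail-period timer expired)."""
        if self.state != "COMPLETE":
            return
        self.state = "FAILED"
        self.secondary_blocked = False
        self.fdb.clear()
        self.sent.append("RING-DOWN-FLUSH-FDB-PDU")

    def on_health_frame_received(self):
        """Own health frame arrived on the secondary port: ring is whole again."""
        if self.state != "FAILED":
            return
        self.state = "COMPLETE"
        self.secondary_blocked = True
        self.fdb.clear()
        self.sent.append("RING-UP-FLUSH-FDB-PDU")


master = Master()
master.on_fault()
state_after_fault = (master.state, master.secondary_blocked)
master.on_health_frame_received()
state_after_restore = (master.state, master.secondary_blocked)
```

After the two events, `master.sent` holds the two flush orders in the sequence the Transit nodes would receive them.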
Fault Restoration (II) – PREFORWARDING State
• In the time between
  • The Transit node detecting its link is restored
  • The Master detecting the ring is restored
• Master’s secondary port is still unblocked
  • Possible temporary loop !!!!
• When Transit node detects its link is restored
  • Changes to PREFORWARDING state and starts Preforwarding timer
  • Protected VLANs on that port are temporarily blocked
  • Waits till a RING-UP-FLUSH-FDB-PDU is received
  • Changes to LINKS-UP state
  • Unblocks previously blocked VLANs
  • Flushes its bridge table and stops Preforwarding timer
  • Re-learns topology
Fault Restoration (III) – PREFORWARDING State
• Preforwarding timer deals with:
  • Lost RING-UP-FLUSH-FDB-PDU from the Master
  • Another break in the ring
  • If the Transit node remains in PREFORWARDING state indefinitely -> disconnected network
• Preforwarding timer is derived from the Hello-timer for HEALTH-CHECK-PDU frames
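The PREFORWARDING behaviour of a Transit node can be sketched as follows. The class name and the 6-second timeout are hypothetical. It is also assumed here that an expired Preforwarding timer makes the node start forwarding, which is the reading suggested by the "disconnected network" remark above.

```python
class TransitPort:
    """Ring port of a Transit node recovering from a link failure."""

    def __init__(self, preforwarding_timeout):
        self.timeout = preforwarding_timeout
        self.state = "LINKS-DOWN"
        self.protected_vlans_blocked = True
        self.deadline = None
        self.fdb_flushes = 0

    def on_link_up(self, now):
        # Link is back, but the Master may still have its secondary port open.
        self.state = "PREFORWARDING"
        self.protected_vlans_blocked = True
        self.deadline = now + self.timeout

    def on_ring_up_flush_fdb(self):
        if self.state == "PREFORWARDING":
            self._start_forwarding()

    def tick(self, now):
        # Assumption: on timer expiry the node forwards anyway, so a lost
        # RING-UP-FLUSH-FDB-PDU cannot leave the port blocked forever.
        if self.state == "PREFORWARDING" and now >= self.deadline:
            self._start_forwarding()

    def _start_forwarding(self):
        self.state = "LINKS-UP"
        self.protected_vlans_blocked = False
        self.deadline = None  # Preforwarding timer stopped
        self.fdb_flushes += 1


# Case 1: the Master's RING-UP-FLUSH-FDB-PDU arrives in time.
port_a = TransitPort(preforwarding_timeout=6.0)  # hypothetical value, in seconds
port_a.on_link_up(now=0.0)
blocked_while_waiting = port_a.protected_vlans_blocked
port_a.on_ring_up_flush_fdb()

# Case 2: the PDU is lost, the timer expires instead.
port_b = TransitPort(preforwarding_timeout=6.0)
port_b.on_link_up(now=0.0)
port_b.tick(now=5.0)
state_before_expiry = port_b.state
port_b.tick(now=6.0)
```

Both ports end up in LINKS-UP; the difference is only how long the protected VLANs stay blocked.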
Enhancements of version 1.3
• «Send-alert» configuration for Ring Polling fault detection method
• INIT state
  • Master comes up for first time and its ports are up
  • Master does not know if the ring is up
  • Master starts in INIT state -> blocks secondary port
  • When the first health frame is received -> changes to COMPLETE state
  • Helps spot misconfigurations in control VLAN
• LINK-UP-PDU
  • Transit node detects a link comes up -> sends LINK-UP-PDU to Master
  • Timestamp used for trouble-shooting
    • If the Master never changes to COMPLETE state
• Allows use of EAPS Shared-Ports
VLANs in Multiple EAPS domains (Multiple Rings) (I)
• EAPS could handle a simple configuration
  • Each ring has an EAPS domain, a Master node and a Control VLAN
  • A VLAN spanning both rings is added as protected by both EAPS domains
VLANs in Multiple EAPS domains (Multiple Rings) (II)
• Topologies with a common link could be problematic
  • If the common link fails
    • Both Masters open secondary ports
    • Protected VLANs spanning both rings will have a loop
      • S1-S2-S3-S4-S5-S6-S7-S8-S9-S10-S1
• EAPS Shared-Ports deals with it
  • Out of the scope of this presentation
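The super-loop can be checked with a small cycle test on the active links. The topology below is an assumption made for illustration: the outer path S1...S10 comes from the slide, but the position of the common link (S3-S8) and of the two blocked secondary ports is hypothetical, since the original figure is not available.

```python
def has_loop(links):
    """Return True if the undirected graph given by `links` contains a cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Outer path from the slide: S1-S2, S2-S3, ..., S9-S10, S10-S1.
outer = [(f"S{i}", f"S{i % 10 + 1}") for i in range(1, 11)]
common = ("S3", "S8")                    # hypothetical common link of both rings
blocked = {("S10", "S1"), ("S5", "S6")}  # hypothetical secondary ports, one per Master

# Normal operation: each Master blocks its secondary port.
normal = [link for link in outer + [common] if link not in blocked]

# Common link fails: both Masters unblock, only the outer path remains.
common_link_failed = list(outer)

loop_in_normal_operation = has_loop(normal)
loop_after_common_link_failure = has_loop(common_link_failed)
```

In normal operation the nine active links form a tree, while after the common link fails the ten outer links form exactly the loop listed on the slide.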
States and Control Frames
[Table: states and control frames in Version 1 (RFC3619) vs. Version 1.3]
Ethernet Ring Protection Switching (I)
• Ethernet Ring Protection Switching (ERPS) is defined by ITU-T G.8032 -> achieve sub-50 ms recovery times in rings
• Basic considerations:
  • One link is designated as the Ring Protection Link (RPL) -> blocked to prevent loops
  • The node setting the block is the RPL Owner (Master in EAPS)
  • Nodes monitor link failure using Ethernet Continuity Check (ETH-CC) messages
  • Four defined local events:
    • Local Signal Failure (local SF) -> detection of link failure
    • Local clear Signal Failure (local clear SF) -> detection of link restoration
    • Wait-To-Restore Expire (WTR-Expire) -> timer expiration
    • Wait-To-Restore Running (WTR-Running) -> timer running
Ethernet Ring Protection Switching (II)
• Basic considerations (cont.):
  • The protocol uses Ring Automatic Protection Switching (R-APS) messages:
    • R-APS(SF): sent by the node detecting link failure (gets local SF)
    • R-APS(NR): sent by the node detecting link restoration (gets local clear SF)
    • R-APS(NR,RB): sent by RPL Owner indicating the RPL is blocked
  • Two important timers
    • Wait-To-Restore (WTR) Timer: used by the RPL Owner to verify that the ring has stabilized before blocking the RPL after failure
    • Guard Timer: used by nodes detecting link restoration to avoid receiving outdated R-APS messages
  • Three states for nodes
    • Initialization: when the node is first defined
    • Idle: normal state, RPL blocked, all nodes/ports working
    • Protecting: protection switching is in effect
Ethernet Ring Protection Switching (III)
• Basic considerations (cont.):
  • An R-APS channel is configured using a VLAN -> transmitting R-APS messages
ERPS Principle of Operation (I)
• In normal operation (nodes in state Idle): RPL is blocked
• Link failure (local SF): nodes detecting it block failed port, send R-APS(SF) and flush filtering database (FDB)
  • Nodes receiving R-APS(SF) flush FDBs
  • RPL Owner receives R-APS(SF): flushes FDB, unblocks RPL
• Link Restoration (local clear SF): detecting nodes send R-APS(NR) periodically and start Guard Timer
  • RPL Owner receives R-APS(NR): starts WTR Timer
  • WTR Timer expires: RPL Owner blocks RPL, sends R-APS(NR,RB) and flushes FDB
  • Nodes receiving R-APS(NR,RB) flush FDBs
  • Nodes detecting link restoration unblock recovered ports, stop sending R-APS(NR) and flush FDBs
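The RPL Owner's side of this sequence can be sketched as a small event handler. The class and method names are hypothetical and the WTR timer is reduced to a flag. Cancelling a running WTR timer when a new R-APS(SF) arrives is an assumption added for robustness, not a step listed on the slide.

```python
class RplOwner:
    """Minimal ERPS RPL Owner reacting to R-APS messages and WTR expiry."""

    def __init__(self):
        self.state = "Idle"
        self.rpl_blocked = True
        self.wtr_running = False
        self.fdb_flushes = 0
        self.sent = []  # R-APS messages "sent", in order

    def on_raps_sf(self):
        # A node reported a link failure: flush and open the RPL.
        self.state = "Protecting"
        self.rpl_blocked = False
        self.wtr_running = False  # assumption: a new failure cancels WTR
        self.fdb_flushes += 1

    def on_raps_nr(self):
        # The failed link is back: wait for the ring to stabilise first.
        if self.state == "Protecting":
            self.wtr_running = True

    def on_wtr_expire(self):
        if not self.wtr_running:
            return
        self.wtr_running = False
        self.rpl_blocked = True
        self.sent.append("R-APS(NR,RB)")
        self.fdb_flushes += 1
        self.state = "Idle"


owner = RplOwner()
owner.on_raps_sf()
during_failure = (owner.state, owner.rpl_blocked)
owner.on_raps_nr()
during_wtr = (owner.state, owner.rpl_blocked, owner.wtr_running)
owner.on_wtr_expire()
after_wtr = (owner.state, owner.rpl_blocked)
```

Note that the RPL stays unblocked for the whole WTR period, so reverting to the normal topology is deliberately delayed, unlike the immediate revert of the EAPS Master.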
ERPS Principle of Operation (II)
EAPS vs. ERPS
• Same basic idea: break the loop in the ring by blocking one port
• In case of failure, unblock the blocked port and keep connectivity
• EAPS:
  • Both the Master and Transit nodes can detect a failure
  • Only the Master detects that the failed link is restored
• ERPS:
  • Only the nodes adjacent to a failed link detect failures and restoration
References
• S. Shah, M. Yip, «RFC3619: Extreme Networks’ Ethernet Automatic Protection Switching (EAPS), Version 1», Network Working Group, October 2003.
• A. Lim, S. Blake, S. Shah, «Extreme Networks’ Ethernet Automatic Protection Switching (EAPS), Version 1.3», Internet-Draft, July 2011.
• Extreme Networks Whitepaper «Ethernet Automatic Protection Switching (EAPS)», Extreme Networks, Inc., 2006.
• J. D. Ryoo, H. Long, Y. Yang, M. Holness, Z. Ahmad, J. K. Rhee, «Ethernet Ring Protection for Carrier Ethernet Networks», IEEE Comm. Magazine, September 2008.