automatic protection switching

Upload: m4prashanth

Post on 04-Jun-2018


  • 8/13/2019 Automatic Protection Switching


    Automatic Protection Switching

    Yaakov (J) Stein, CTO, RAD Data Communications

    Mar 2012


    Course Outline

    General protection switching principles

    Examples of protection mechanisms

    SONET/SDH

    Ethernet linear protection

    Ethernet ring protection

    MPLS fast reroute

    MPLS-TP APS


    General principles

    Definition

    References

    Traffic types

    Network topologies

    Triggers

    Protection classes

    Entities

    Protection types

    Signaling


    Definition

    Automatic Protection Switching (APS) is a functionality of carrier-grade transport networks

    is often called resilience

    since it enables service to quickly recover from failures

    is required to ensure high reliability and availability

    APS includes :

    detection of failures (signal fail or signal degrade) on a working channel

    switching traffic transmission to a protection channel

    selecting traffic reception from the protection channel

    (optionally) reverting back to the working channel once failure is repaired

    Automatic means it uses (at most) control plane protocols

    no management layer or manual operations needed


    Some useful references

    G.808.1: generic linear protection

    G.808.2: generic ring protection (not yet written)

    G.841 and G.842: SDH

    G.774.3/4/9/10: SDH protection management

    G.870 and G.873.1: OTN

    G.8031: Ethernet linear protection

    G.8032: Ethernet ring protection

    G.8131: T-MPLS APS

    Y.1720: MPLS

    I.630: ATM

    M.495: analog signal protection

    G.781: clock selection (can be used to protect synchronization)

    RFC 4090: MPLS Fast ReRoute

    RFC 6372: MPLS-TP Survivability Framework

    RFC 6378: MPLS-TP Linear Protection


    Traffic types

    In a network with APS capabilities, there are three types of traffic :

    protected traffic

    traffic that may be rapidly switched to protection channel

    at any time it may be on the working channel or protection channel

    Nonpreemptible Unprotected Traffic (NUT)

    noncritical traffic that does not require protection mechanism

    not affected by protection mechanism

    somewhat less expensive to customer

    extra (preemptible) traffic

    best effort background traffic that runs on protection channel

    preempted (blocked) when protection channel is needed

    very inexpensive to customer


    Network topologies

    APS can be defined for any topology with redundant links

    e.g., for tree topologies no protection is possible

    We will often discuss protection of individual links

    However, there are two topologies that are of particular interest :

    rings

    protection is natural for rings although there are other reasons for using rings as well

    rings are so important that protection for other topologies

    is often called linear protection

    dense meshes

    for this topology multiple local bypasses can be preconfigured

    protection switching is similar to routing change, but faster

    often called Fast ReRoute (FRR)


    Triggers

    Protection switching is usually triggered by a failure

    although the operator may manually force a protection switch

    A failure is declared when a fault condition

    persists long enough for the ability to perform the required function

    to be considered terminated

    Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types)

    and may be :

    detected by physical layer

    indicated by signaling (e.g. AIS)

    detected by OAM mechanisms

    When there is no SF or SD, the state is called No Request (NR)


    Switching time (1)

    SONET/SDH protection switching takes place in under 50 ms

    Regarding multiplex section shared protection rings, G.841 states :

    The following network objectives apply:

    1) Switch time In a ring with no extra traffic, all nodes in the idle state (no detected failures,

    no active automatic or external commands, and receiving only Idle K-bytes), and with less

    than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single

    span shall be less than 50 ms. On rings under all other conditions, the switch completion

    time can exceed 50 ms (the specific interval is under study) to allow time to remove extra

    traffic, or to negotiate and accommodate coexisting APS requests.

    while for linear VC trail protection, it says :

    The following network objectives apply:

    1) Switch time

    The APS algorithm for LO/HO VC trail protection shall operate as fast as

    possible. A value of 50 ms has been proposed as a target time. Concerns have been

    expressed over this proposed target time when many VCs are involved. This is for further

    study. Protection switch completion time excludes the detection time necessary to initiate the

    protection switch, and the hold-off time.

    There are similar statements in other clauses as well


    Switching time (2)

    This 50 ms time has become the gold standard

    and new protection schemes are expected to meet this objective

    However, studying the literature that led up to SONET/SDH standards

    shows that the objective was to attain the minimum possible time

    for the sum of

    persistent (i.e. non-transient) failure detection

    speed of light propagation

    signaling protocol time

    regaining sync alignment

    and 50 ms was the minimum that was considered practical !

    Many modern standards have built in 50 ms

    and much marketing literature boasts faster than 50 ms

    But there is really nothing special about 50 ms

    50 ms gaps in voiced speech are noticeable,

    but not fatal if infrequent

    50 ms of data at high rates cannot be stored and later forwarded

    timing circuits can withstand much more than 50 ms without clock
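As a rough check on the 50 ms figure, the sketch below works out two of the quantities mentioned on these slides: the one-way propagation delay over the 1200 km of fibre cited from G.841, and the amount of data in flight during a 50 ms gap. The propagation speed of about 200,000 km/s in fibre and the 10 Gb/s line rate are illustrative assumptions, not values taken from the slides.

```python
# Assumption: light travels through fibre at roughly 2/3 of its vacuum speed.
FIBRE_SPEED_KM_PER_S = 200_000.0


def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay over a fibre span, in milliseconds."""
    return distance_km / FIBRE_SPEED_KM_PER_S * 1000.0


def bytes_in_gap(rate_bps: float, gap_ms: float) -> float:
    """Number of bytes a link at rate_bps carries during a gap of gap_ms."""
    return rate_bps * (gap_ms / 1000.0) / 8.0


if __name__ == "__main__":
    # 1200 km is the ring size quoted from G.841; 10 Gb/s is an assumed rate.
    print(f"propagation over 1200 km: {propagation_delay_ms(1200):.1f} ms")
    print(f"data in a 50 ms gap at 10 Gb/s: {bytes_in_gap(10e9, 50) / 1e6:.1f} MB")
```

Under these assumptions propagation alone uses about 6 ms of the budget, and a 50 ms gap at 10 Gb/s corresponds to tens of megabytes, which illustrates why buffering through the outage is impractical at high rates.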


    Protection classes

    It is useful to distinguish two different protection classes

    path protection (AKA trail protection, end-to-end protection)

    when a failure is detected on the end-to-end path

    we switch to an alternative end-to-end path

    the failure is usually detected by end-to-end OAM

    local protection (AKA local restoration, SNC protection, bypass, detour)

    we protect individual network elements, links, or groups of same

    when such an entity fails

    only that local entity is bypassed

    the failure may be detected by link OAM or physical layer means


    APS entities (1)

    The following entities are important in APS

    working channel: channel used when no failure exists

    protection channel: channel used when a failure exists

    head-end: entity transmitting data to working/protection channel

    tail-end: entity receiving data from the working/protection channel

    Note: we will usually consider traffic to be bidirectional

    so that the head-end for one direction

    is the tail-end for the opposite direction

    [Figure: head-end and tail-end connected by a working channel and a protection channel]


    APS entities (2)

    Bridge: function at head-end that connects traffic (including extra traffic) to the

    working and protection channels

    Selector: function at tail-end that extracts traffic (perhaps extra traffic) from

    the working or protection channel

    APS signaling channel: channel used to communicate between head-end

    and tail-end for APS purposes

    Trail termination: function responsible for failure detection

    including injection and extraction of OAM

    [Figure: head-end (bridge) and tail-end (selector) connected by a working channel, a protection channel, and a signaling channel]


    Revertive operation

    Reversion means returning to use the working channel

    after the failure has been rectified

    Protection mechanisms can be revertive or nonrevertive

    Revertive mechanisms may be preferable

    when the working channel has better performance (free BW, BER, delay)

    when there are frequent switches (easier to manage)

    when there is extra traffic

    but nonrevertive also has advantages

    only one service disruption due to protection switching

    may be simpler to implement


    Uni/bi-directional

    We will usually consider bidirectional traffic

    but even then the failures can be uni- or bi- directional

    and for unidirectional failures there can be uni- or bi- directional switching

    [Figure: unidirectional and bidirectional failures, and unidirectional versus bidirectional protection switching, showing for each case which direction uses the working channel and which uses the protection channel]


    Uni- / bi- directional switching

    Unidirectional switching may be advantageous

    for 1+1 - faster and no signaling channel is needed

    no unnecessary service disruption for direction without failure

    higher chance of protection under multiple failures

    easier to implement for local protection

    maintains extra traffic in direction without failure

    But bidirectional may be preferable

    easier management since directions traverse same network elements

    does not disrupt delay balance between directions

    may simplify repair since failed spans are unused


    Protection types

    We distinguish several different protection types

    1+1

    1:1

    1:n

    m:n

    (1:1)^n

    Each type has its applicability, advantages, and disadvantages

    and there are trade-offs between

    simplicity

    BW consumption

    protection switch time

    signaling requirements


    1+1 protection

    Simplest and fastest form of protection

    but wasteful - only 50% of actual physical capacity is used

    Head-end bridge always sends data on both channels

    Tail-end selector chooses channel to use (based on BER, dLOS, etc.)

    For unidirectional 1+1 switching there is no need for APS signaling

    If non-revertive

    there is no distinction between working and protection channels

    [Figure: head-end and tail-end connected by channel A and channel B]


    1:1 protection

    Head-end bridge usually sends data on working channel

    When failure detected it starts sending data over protection channel

    and tail-end needs to select the protection channel

    When not in use, protection channel can be used for extra traffic

    However, since failure is detected by tail-end, APS signaling is needed

    Protection channel should have OAM running to ensure its functionality

    [Figure: working channel, protection channel carrying extra traffic, and APS signaling between tail-end and head-end]


    1:n protection

    One protection channel is allocated for n working channels

    Can protect only one working channel at a time

    but improbable that more than 1 working channel will simultaneously fail

    Only 1/(n+1) of total capacity is reserved for protection

    [Figure: n working channels sharing one protection channel]


    m:n protection

    To enable protection of more than 1 channel

    m protection channels are allocated for n working channels (m < n)

    m simultaneous failures can be protected

    Less protection capacity dedicated than for n times 1:1

    When failure detected,

    1 of the m protection channels needs to be assigned and signaled

    High complexity but conserves resources

    [Figure: n working channels sharing m protection channels]
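The bandwidth trade-off between the protection types can be made concrete with a small calculation. The sketch below generalizes the 1/(n+1) figure quoted for 1:n to m protection channels and n working channels, assuming all channels have equal capacity; the function name is illustrative only.

```python
from fractions import Fraction


def protection_fraction(m: int, n: int) -> Fraction:
    """Share of total capacity reserved for protection.

    m = number of protection channels, n = number of working channels.
    Assumes every channel has the same capacity.
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must both be at least 1")
    return Fraction(m, m + n)


if __name__ == "__main__":
    for label, m, n in [("1+1 or 1:1", 1, 1), ("1:4", 1, 4), ("2:8", 2, 8)]:
        print(f"{label}: {protection_fraction(m, n)} of capacity reserved")
```

Note that the number alone does not capture the difference between 1+1 and 1:1: in 1:1 the reserved channel may carry extra traffic while idle, whereas in 1+1 it always carries a copy of the protected traffic.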


    (1:1)^n protection

    This is like n times 1:1 but the n protection channels share bandwidth

    Only 1 failed working channel can be protected

    This is different from 1:n since

    n protection channels are preconfigured

    n working channels need not be of the same type

    Protection bandwidth must be at least that of the largest working channel


    APS algorithm

    We have seen that protection switching is a tricky business

    So it is not surprising that network elements that support APS

    run an APS algorithm

    This algorithm inputs :

    configuration (protection type, revertive?, available channels, ...)

    failure indications (NR, SF, SD)

    operator commands

    APS signaling (more on that soon)

    and makes switching decisions

    The algorithm maintains state information for head-end and tail-end

    APS algorithms are detailed in standards documents


    Priority

    Not every failure event / operator command results in a protection switch

    For example

    in 1:n protection the protection channel may already be in use !

    Conflicts are resolved by assigning priorities to events/commands

    When an event is detected or a command received

    the APS algorithm will not act

    if an event/command of equal or higher priority is already in effect

    True failure conditions usually have higher priority than manual commands
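The priority rule can be sketched in a few lines. The ordering below is a simplified, hypothetical one chosen only to match the statements on this slide (failures outrank a manual command); real standards define longer and more detailed priority lists.

```python
from enum import IntEnum


class Request(IntEnum):
    """Illustrative priority order (higher value = higher priority).

    This ordering is an assumption for the example, not taken from a standard.
    """
    NO_REQUEST = 0
    MANUAL_SWITCH = 1
    SIGNAL_DEGRADE = 2
    SIGNAL_FAIL = 3


def arbitrate(active: Request, new: Request) -> Request:
    """Return the request in effect after `new` arrives.

    The new request is acted upon only if its priority is strictly higher
    than that of the request already in effect.
    """
    return new if new > active else active


if __name__ == "__main__":
    # A manual command arriving while a signal fail is active is ignored.
    print(arbitrate(Request.SIGNAL_FAIL, Request.MANUAL_SWITCH).name)
    # A signal degrade arriving in the idle state is acted upon.
    print(arbitrate(Request.NO_REQUEST, Request.SIGNAL_DEGRADE).name)
```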


    Timers

    Even failure events with priority are not acted upon immediately

    to do so would cause unnecessary switches after transient defects

    The APS algorithm may maintain several timers, such as

    Holdoff timers

    the time between detection of a SF or SD event

    and the APS algorithm acting upon this event

    the algorithm usually used is called peek twice

    i.e., the condition is checked again after the timer expires

    Wait To Restore timer

    for revertive switching, the time between detection of the failure being

    cleared and the APS algorithm acting upon this event

    also used in SDH optimized bidirectional 1+1 (nonrevertive)

    Guard timer

    for rings

    blockout time during which APS messages are ignored (since they may be old and outdated)
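The peek twice behaviour of the holdoff timer could look roughly like the sketch below. The class and method names are hypothetical, and time is passed in explicitly (in milliseconds) so that the logic can be exercised without real delays.

```python
from typing import Optional


class HoldOffTimer:
    """Peek-twice hold-off: report a defect only if it is still present
    when the hold-off period expires."""

    def __init__(self, holdoff_ms: int) -> None:
        self.holdoff_ms = holdoff_ms
        self.started_at: Optional[int] = None

    def defect_detected(self, now_ms: int) -> None:
        """First peek: a defect was seen, so start the timer if it is idle."""
        if self.started_at is None:
            self.started_at = now_ms

    def poll(self, now_ms: int, defect_present: bool) -> bool:
        """Second peek: True only when the timer has expired and the defect
        is still present, i.e. the APS algorithm should now act."""
        if self.started_at is None:
            return False
        if now_ms - self.started_at < self.holdoff_ms:
            return False
        self.started_at = None  # timer expired, re-check the condition once
        return defect_present


if __name__ == "__main__":
    timer = HoldOffTimer(holdoff_ms=100)
    timer.defect_detected(now_ms=0)
    print(timer.poll(now_ms=50, defect_present=True))    # still holding off
    print(timer.poll(now_ms=100, defect_present=True))   # persistent defect
```

A transient defect that has disappeared by the time the timer expires produces no switch, which is the purpose of the holdoff.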


    APS signaling

    In all types except unidirectional 1+1, some APS signaling is needed

    APS signaling is used to synchronize between head-end and tail-end

    It is critical that head-end and tail-end always be in the same state

    Example messages include :

    No Request (NR)

    by tail-end to inform head-end of Signal Failure (SF)

    by head-end to confirm the event's priority

    by head-end to report the particular protection channel

    by head-end to inform tail-end of Reverse (bidirectional) Request (RR)

    by tail-end after failure cleared to Wait To Restore (WTR)

    by tail-end after failure cleared to Do Not Revert (DNR) for nonrevertive


    APS signaling phases

    When APS signaling is used, it needs to be as rapid as possible

    Depending on the scenario it may be

    1-phase: tail -> head (fastest)

    tail-end informs head-end of failure

    both ends uniquely know the protection channel to be used

    only for 1+1 and unidirectional-(1:1)^n (including 1:1)

    2-phase: 1) tail -> head 2) head -> tail

    tail-end informs head-end of failure

    head-end signals that it has switched to protection channel

    not for bidirectional-1:n or m:n

    3-phase: 1) tail -> head 2) head -> tail 3) tail -> head (slowest)

    works for all protection types (including m:n)


    Examples of 1-phase

    An example of when 1-phase signaling is possible is 1:1 or (1:1)^n

    1. upon detection of failure the tail-end sends SF to the head-end

    and immediately changes its selector (blind switch)

    2. upon receipt the head-end changes the bridge setting

    (no priority is checked)

    1-phase can also be used for bidirectional 1:1

    1. upon detection of failure the tail-end sends SF to the head-end

    and immediately changes both its selector and bridge

    2. upon receipt the head-end changes its bridge and selector
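The unidirectional 1:1 case above can be sketched as two tiny objects exchanging a single message. All names are illustrative; the message is modelled as a plain string rather than any real APS encoding.

```python
from dataclasses import dataclass

WORKING = "working"
PROTECTION = "protection"


@dataclass
class HeadEnd:
    bridge: str = WORKING

    def receive(self, message: str) -> None:
        # In 1-phase signaling the head-end acts at once; no priority check.
        if message == "SF":
            self.bridge = PROTECTION


@dataclass
class TailEnd:
    selector: str = WORKING

    def detect_failure(self) -> str:
        # Blind switch: move the selector immediately, then notify the head-end.
        self.selector = PROTECTION
        return "SF"


if __name__ == "__main__":
    head, tail = HeadEnd(), TailEnd()
    head.receive(tail.detect_failure())   # the single signaling phase
    print(tail.selector, head.bridge)
```

The tail-end does not wait for any reply, which is why this is the fastest scheme and also why it only works when both ends already know which protection channel will be used.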


    Example of 2-phase

    2-phase is useful for unidirectional 1:n with priority checking

    1. upon detection of failure the tail-end sends SF to the head-end

    but does not change its selector

    2. the head-end checks priority

    sends confirmation to tail-end (with identity of working channel)

    the bridge setting is changed

    3. the tail-end changes its selector
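A minimal sketch of this 2-phase exchange for unidirectional 1:n follows. The priority check is reduced to "is the protection channel already taken", which is a simplifying assumption, and the class and method names are hypothetical.

```python
from typing import Optional


class HeadEnd:
    def __init__(self) -> None:
        # Working channel currently bridged onto the protection channel.
        self.bridged_channel: Optional[int] = None

    def on_signal_fail(self, channel: int) -> Optional[int]:
        """Phase 2: check priority, set the bridge, and confirm.

        Returns the confirmed working channel, or None if the request
        is refused because the protection channel is already in use.
        """
        if self.bridged_channel is not None:
            return None
        self.bridged_channel = channel
        return channel


class TailEnd:
    def __init__(self) -> None:
        # Working channel currently selected from the protection channel.
        self.selected_channel: Optional[int] = None

    def on_confirmation(self, channel: Optional[int]) -> None:
        """Step 3: change the selector only after the head-end confirms."""
        if channel is not None:
            self.selected_channel = channel


if __name__ == "__main__":
    head, tail = HeadEnd(), TailEnd()
    # Phase 1: tail-end reports SF on working channel 3 (selector unchanged).
    confirmation = head.on_signal_fail(3)
    tail.on_confirmation(confirmation)
    print(head.bridged_channel, tail.selected_channel)
```

Unlike the blind switch of 1-phase signaling, the tail-end here never selects the protection channel unless the head-end has confirmed that it is bridging the right working channel.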


    Example of 3-phase

    3-phase signaling is imperative for bidirectional 1:n

    1. upon detection of failure the tail-end sends SF to the head-end

    but does not change its selector

    2. the head-end checks priority, and sends confirmation to tail-end

    head-end changes its bridge setting

    and also sends a reverse request

    3. the tail-end changes selector

    checks priority and sends confirmation to head-end

    tail-end changes its bridge setting (as head-end of opposite direction)

    head-end receives confirmation and changes its selector


    For G.805 buffs

    to add 1+1 trail protection to a trail - expand a trail termination function

    we use a special transport processing function - the protection switch

    the unprotected TTs report status

    to the protection switch

    [Figure: an unprotected trail, and the protected trail formed by adding the protection switch]


    SONET/SDH APS


    SONET protection ?

    SONET/SDH networks need to be highly reliable (five nines)

    Down-time should be minimal (less than 50 msec)

    So systems must repair themselves (no time for manual intervention)

    Upon detection of a failure (dLOS, dLOF, high BER)

    the network must reroute traffic (protection switching)

    from working channel to protection channel

    SDH APS is unidirectional

    SDH APS may be revertive

    [Figure: head-end NE and tail-end NE connected by a working channel and a protection channel]


    SONET/SDH layers

    Between regenerators there are sections (regenerator sections)

    Between ADMs there are lines (multiplex sections)

    Between path terminations there are paths

    Protection can be at OC-n level (different physical fibers)

    or at STM/VC level

    or end-to-end path (trail protection)

    [Figure: a path between two path terminations, made up of lines (multiplex sections) between line terminations at the ADMs, each made up of sections between section terminations at ADMs and regenerators]


    Line APS

    [Figure: SONET frame of 9 rows by 90 columns, showing the transport overhead (TOH, 3 rows above 6 rows) next to the Synchronous Payload Envelope]

    TOH consists of

    3 rows of section overhead - frame sync, trace, EOC,

    6 rows of line overhead - pointers, SSM, FEBE, and

    Line APS signaling uses bytes K1 and K2

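    Transport overhead bytes, row by row :

    A1   A2   J0     (section overhead)
    B1   E1   F1     (section overhead)
    D1   D2   D3     (section overhead)
    H1   H2   H3     (line overhead)
    B2   K1   K2     (line overhead)
    D4   D5   D6     (line overhead)
    D7   D8   D9     (line overhead)
    D10  D11  D12    (line overhead)
    S1   M0   E2     (line overhead)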


    HO Path APS

    POH is responsible for type, status, path performance monitoring, VCAT, trace

    HO Path APS signaling uses 4 MSBs of byte K3

    POH bytes, in order : J1, B3, C2, G1, F2, H4, F3, K3, N1


    LO Path APS

    VC OH is responsible for

    Timing, PM, REI,

    LO Path APS signaling is

    4 MSBs of byte K4

    [Figure: VC overhead (VC OH) bytes V5, J2, N2, K4, shown together with bytes V1, V2, V3, V4]


    How does it work?

    Head-end and tail-end NEs have bridges (muxes)

    Head-end and tail-end NEs maintain bidirectional signaling channel

    Signaling is contained in K-bytes of protection channel

    For line APS

    K1: tail-end status and requests

    K2: head-end status

    [Figure: head-end bridge and tail-end bridge connected by a working channel and a protection channel, with the signaling channel carried on the protection channel]


    Linear 1+1 protection

    Can be at OC-n level (different physical fibers)

    or at STM/VC level (SubNetwork Connection Protection)

    or end-to-end path (called trail protection)

    Head-end bridge always sends data on both channels

    Tail-end chooses channel to use based on BER, dLOS, etc.

    No need for signaling

    If non-revertive

    there is no distinction between working and protection channels

    [Figure: head-end NE and tail-end NE connected by a working channel and a protection channel]


    Linear 1:1 protection

    Head-end bridge usually sends data on working channel

    When tail-end detects failure it signals (using K1) to head-end

    Head-end then starts sending data over protection channel

    When not in use

    protection channel can be used for (discounted) extra traffic

    (pre-emptible unprotected traffic)

    May be at any layer (but only OC-n level protects against fiber cuts)

    [Figure: working channel, and protection channel carrying extra traffic]


    Linear 1:N protection

    In order to save BW

    we allocate 1 protection channel for every N working channels

    N limited to 14

    4 bits in K1 byte from tail-end to head-end

    0: protection channel

    1-14: working channels

    15: extra traffic channel

    [Figure: several working channels sharing one protection channel]
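The channel numbering above maps directly onto a 4-bit field. The sketch below decodes it, assuming the channel number sits in the four least significant bits of K1 (with the request code in the four most significant bits); that bit placement is not stated on the slide and should be checked against G.841.

```python
def k1_channel(k1: int) -> str:
    """Describe the channel number carried in a K1 byte.

    Assumption: the channel number is the low nibble of K1.
    """
    if not 0 <= k1 <= 0xFF:
        raise ValueError("K1 must be a single byte")
    channel = k1 & 0x0F
    if channel == 0:
        return "protection channel"
    if channel == 15:
        return "extra traffic channel"
    return f"working channel {channel}"


if __name__ == "__main__":
    for k1 in (0x00, 0xB3, 0x0F):
        print(f"K1=0x{k1:02X}: {k1_channel(k1)}")
```

With 4 bits and two reserved values (0 and 15), only 14 values remain for working channels, which is where the limit N <= 14 comes from.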


    Two fiber vs. Four-fiber rings

    Ring based protection is popular in North America (100K+ rings)

    Full protection against physical fiber cuts

    Simpler and less expensive than mesh topologies

    Protection at line (multiplexed section) or path layer

    Four-fiber rings

    fully redundant at OC level

    can support bidirectional routing at line layer

    Two-fiber rings

    support unidirectional routing at line layer

    2 fibers in opposite directions

    [Figure: unidirectional and bidirectional rings]


    Unidirectional vs. bidirectional

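    Unidirectional routing

    working channel B-A same direction (e.g. clockwise) as A-B

    management simplicity: A-B and B-A can occupy same timeslots

    inefficient: waste in ring BW and excessive delay in one direction

    Bidirectional routing

    A-B and B-A are opposite in direction

    both using shortest route

    spatial reuse: timeslots can be reused in other sections

    [Figure: two rings; with unidirectional routing A-B and B-A travel the same way around the ring, with bidirectional routing A-B and B-A take the shortest route in opposite directions, leaving other spans free for B-C and C-B]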
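The difference in ring bandwidth use can be illustrated by counting the spans a bidirectional connection occupies. The sketch below numbers the nodes 0..n-1 clockwise, which is purely an illustrative convention, and assumes the two end nodes are distinct.

```python
def clockwise_hops(src: int, dst: int, n: int) -> int:
    """Number of spans crossed going clockwise from src to dst on an n-node ring."""
    return (dst - src) % n


def unidirectional_spans(a: int, b: int, n: int) -> int:
    """Spans used when A-B and B-A both travel clockwise."""
    return clockwise_hops(a, b, n) + clockwise_hops(b, a, n)


def bidirectional_spans(a: int, b: int, n: int) -> int:
    """Spans used when A-B and B-A both take the shortest route."""
    d = clockwise_hops(a, b, n)
    return 2 * min(d, n - d)


if __name__ == "__main__":
    n = 8
    print("unidirectional:", unidirectional_spans(0, 1, n))
    print("bidirectional: ", bidirectional_spans(0, 1, n))
```

For any two distinct nodes the unidirectional total equals n, so every connection consumes a timeslot all the way around the ring, while adjacent nodes with bidirectional routing use a single span in each direction and leave the rest available for reuse.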


    UPSR vs. BLSR (MS-SPRing)

    Of all the possible combinations, only a few are in use

    Unidirectional (routing) Path Switched Rings

    protects tributaries

    extension of 1+1 to ring topology

    Bidirectional (routing) Line Switched Rings (two-fiber and four-fiber versions)

    called Multiplex Section Shared Protection Ring in SDH

    simultaneously protects all tributaries in STM

    extension of 1:1 to ring topology

    [Figure: classification by switching (path or line), fiber count (two or four), and routing (unidirectional or bidirectional), locating UPSR and BLSR]


    UPSR

    Working channel is in one direction

    protection channel in the opposite direction

    All path traffic is added in both directions (1+1)

    decision as to which to use is made at drop point (no signaling)

    Normally non-revertive, so effectively two diversity paths

    Good match for access networks

    1 access resilient ring

    less expensive than fiber pair per customer

    Inefficient for core networks

    no spatial reuse

    every signal in every span in both directions

    node needs to continuously monitor every tributary to be dropped

    [Figure: SONET ADMs connected by 2 rings]


    BLSR

    Switch at line level

    less monitoring

    When failure detected tail-end NE signals head-end NE

    Works for unidirectional/bidirectional fiber cuts, and NE failures

    Two-fiber version

    half of OC-N capacity devoted to protection

    only half capacity available for traffic

    Four-fiber version

    full redundant OC-N devoted to protection

    twice as many NEs as compared to two-fiber

    [Figure: example of recovery from a unidirectional fiber cut by wrap-around on 2 rings]


    Ethernet linear APS

    STP

    LAG

    G.8031


    STP

    The original Spanning Tree Protocol automatically removed loops

    from arbitrary networks (with loops)

    However, its convergence was very slow (about a minute)

    STP cannot be used as a protection mechanism

    since its reconvergence time is very long

    due to a cumbersome protocol and long holdoff timer settings

    An evolutionary update called Rapid STP (802.1w)

    was incorporated into 802.1D-2004 clause 17

    that converges in about the same time as STP

    but can reconverge after a topology change in less than 1 second

    RSTP can be used to detect failures and reconverge

    and thus can be used as a primitive protection mechanism

    However, the switching time will be many tens of ms to 100s of ms


    Use of LAG

    Ethernet link aggregation (AKA bonding, Ethernet trunk, inverse mux, NIC teaming)

    enables bonding several ports together as single uplink

    Defined by 802.3ad task force and folded into 802.3-2000 as clause 43

    Binding of ports to Link Aggregation Groups (LAGs) distributed via

    Link Aggregation Control Protocol (LACP)

    LACP uses slow protocol frames (up to 5 per second)

    Links may be dynamically added/removed from LAG

    and LACP continuously monitors to detect if changes needed

    Upon link failure LAG delivers traffic at a reduced rate

    Thus LAG can be used as a primitive protection mechanism

    When used this way it is called worker/standby or N+N mode

    The restoration time will be on the order of 1 second


    G.8031

    Q9 of SG15 in the ITU-T is responsible for protection switching

    In 2006 it produced G.8031 Linear Ethernet Protection Switching

    G.8031 uses standard Ethernet formats, but is incompatible with STP

    The standard addresses

    point-to-point VLAN connections

    SNC (local) protection class

    1+1 and 1:1 protection types

    unidirectional and bidirectional switching for 1+1

    bidirectional switching for 1:1

    revertive and nonrevertive modes

    1-phase signaling protocol

    G.8031 uses Y.1731 OAM CCM messages in order to detect failures

    G.8031 defines a new OAM opcode (39) for APS signaling messages

    Switching times should be under 50 ms (only holdoff timers when groups)


    G.8031 signaling

    The APS signaling message looks like this :

    regular APS messages are sent 1 per 5 seconds

    after change 3 messages are sent at max rate (300 per sec)

    where

    req/state identifies the message (NR, SF, WTR, SD, forced switch, etc)

    prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)

    requested and bridged signal identify incoming / outgoing traffic

    since only 1+1 and 1:1 they are either null or traffic (all other values reserved)

    APS message fields, in order (b = bits, B = bytes) :

    MEL (3b) | VER=0 (5b) | OPCODE=39 (1B) | FLAGS=0 (1B) | OFFSET=4 (1B)
    req/state (4b) | prot. type (4b) | requested sig (1B) | bridged sig (1B) | reserved (1B)
    END=0 (1B)
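The field layout above can be turned into bytes with Python's struct module. This is a sketch based only on the field sizes listed on the slide: it assumes MEL occupies the three most significant bits of the first byte and req/state the upper nibble of its byte, and it treats the request and protection-type codes as opaque numbers (their actual values are defined in G.8031 and are not reproduced here).

```python
import struct

APS_OPCODE = 39
APS_FORMAT = "!9B"  # nine single-byte fields, network byte order


def pack_aps(mel: int, request_state: int, prot_type: int,
             requested: int, bridged: int) -> bytes:
    """Build the 9-byte APS PDU body described on the slide."""
    first = (mel & 0x07) << 5                       # MEL (3b) + VER=0 (5b)
    req_prot = ((request_state & 0x0F) << 4) | (prot_type & 0x0F)
    return struct.pack(APS_FORMAT, first, APS_OPCODE, 0, 4,
                       req_prot, requested, bridged, 0, 0)


def unpack_aps(pdu: bytes) -> dict:
    """Parse a PDU produced by pack_aps back into its fields."""
    (first, opcode, _flags, _offset, req_prot,
     requested, bridged, _reserved, _end) = struct.unpack(APS_FORMAT, pdu)
    if opcode != APS_OPCODE:
        raise ValueError("not an APS PDU")
    return {
        "mel": first >> 5,
        "version": first & 0x1F,
        "request_state": req_prot >> 4,
        "prot_type": req_prot & 0x0F,
        "requested": requested,
        "bridged": bridged,
    }


if __name__ == "__main__":
    # The code values below are placeholders, not values from the standard.
    pdu = pack_aps(mel=7, request_state=0xB, prot_type=0xF, requested=1, bridged=1)
    print(pdu.hex(), unpack_aps(pdu))
```

The OFFSET value of 4 matches the four bytes (req/state with prot. type, requested signal, bridged signal, reserved) that sit between the common header and the END marker.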


    G.8031 1:1 revertive operation

    In the normal (NR) state :

    head-end and tail-end exchange CCM (at 300 per second rate)

    on both working and protection channels

    head-end and tail-end exchange NR APS messages

    on the protection channel (every 5 seconds)

    When a failure appears in the working channel

    tail-end fails to receive 3 consecutive CCM messages on working channel

    tail-end enters SF state

    tail-end sends 3 SF messages at 300 per second on the APS channel

    tail-end switches selector (bi-d and bridge) to the protection channel

    head-end (receiving SF) switches bridge (bi-d and selector) to protection channel

    tail-end continues sending SF messages every 5 seconds

    head-end sends NR messages but with bridged=normal

    When the failure is cleared

    tail-end leaves SF state and enters WTR state (typically 5 minutes, 5..12 min)

    tail-end sends WTR message to head-end (in nonrevertive - DNR message)

    tail-end sends WTR every 5 seconds

    when WTR expires both sides enter NR state
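The tail-end behaviour in this revertive sequence can be sketched as a three-state machine (NR, SF, WTR). The class below is a simplification with hypothetical method names; it ignores message transmission and operator commands, and takes time as an explicit argument in seconds.

```python
from typing import Optional

CCM_RATE_PER_S = 300   # CCM rate quoted on the slide
MISSED_CCMS = 3        # consecutive losses before declaring SF


def detection_time_ms() -> float:
    """Time to miss MISSED_CCMS consecutive CCMs at the configured rate."""
    return MISSED_CCMS * 1000.0 / CCM_RATE_PER_S


class TailEnd:
    def __init__(self, wtr_s: float = 300.0) -> None:
        self.state = "NR"
        self.wtr_s = wtr_s
        self.wtr_started: Optional[float] = None

    def on_ccm_loss(self) -> None:
        """Three CCMs were missed on the working channel."""
        self.state = "SF"
        self.wtr_started = None

    def on_failure_cleared(self, now_s: float) -> None:
        """CCMs are received again: start Wait To Restore."""
        if self.state == "SF":
            self.state = "WTR"
            self.wtr_started = now_s

    def tick(self, now_s: float) -> None:
        """Revert to the working channel once WTR has expired."""
        if self.state == "WTR" and now_s - self.wtr_started >= self.wtr_s:
            self.state = "NR"
            self.wtr_started = None

    def selector(self) -> str:
        return "working" if self.state == "NR" else "protection"


if __name__ == "__main__":
    print(f"CCM-based detection time: {detection_time_ms():.0f} ms")
    tail = TailEnd(wtr_s=300)
    tail.on_ccm_loss()
    tail.on_failure_cleared(now_s=10)
    tail.tick(now_s=310)
    print(tail.state, tail.selector())
```

At 300 CCMs per second, three missed messages take about 10 ms to detect, which leaves most of a 50 ms budget for signaling and switching.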


    Ethernet ring APS

    G.8032

    RPR

    CLEER


    Ethernet rings ?

    Ethernet has become carrier grade :

    deterministic connection-oriented forwarding

    OAM

    synchronization

    The only thing missing to completely replace SDH is ring protection

    However, Ethernet and ring architectures don't go together

    Ethernet has no TTL, so looped traffic will loop forever

    STP builds trees out of any architecture - no loops allowed

    There are two ways to make an Ethernet ring

    open loop

    cut the ring by blocking some link

    when protection is required - block the failed link

    closed loop

    disable STP (but avoid infinite loops in some way !)

    when protection is required - steer and/or wrap traffic


    Ethernet ring protocols

    Open loop methods

    G.8032 (ERPS)

    RSTP (ex 802.1w)

    RFER (RAD)

    ERP (NSN)

    RRST (based on RSTP)

    REP (Cisco)

    RRSTP (Alcatel)

    RRPP (Huawei)

    EAPS (Extreme, RFC 3619)

    EPSR (Allied Telesis)

    PSR (Overture)

    Closed loop methods

    RPR (IEEE 802.17)

    CLEER and NERT (RAD)

    Y(J)S APS Slide 56

    G.8032

    Q9 of SG15 produced G.8032 between 2006 and 2008

    G.8032 is similar to G.8031

    strives for 50 ms protection (< 1200 km, < 16 nodes)

    but here this number is deceiving as MAC table is flushed

    standard Ethernet format but incompatible with STP

    uses Y.1731 CCM for failure detection

    employs Y.1731 extension for R-APS signaling (opcode=40)

    R-APS message format similar to APS of G.8031

    (but between every 2 nodes and to MAC address 01-19-A7-00-00-01)

    revertive and nonrevertive operation defined

    However, G.8032 is more complex due to

    requirement to avoid loop creation under any circumstances

    need to localize failures

    need to maintain consistency between all nodes on ring

    existence of a special node (RPL owner)

    Y(J)S APS Slide 57

    RPL

    G.8032v1 defines the Ring Protection Link (RPL) as the link to be blocked (to avoid closing the loop) in NR state

    One of the 2 nodes connected to the RPL

    is designated the RPL owner

    Unlike RFER

    there is only one RPL owner

    the RPL and owner are designated before setup

    operation is usually revertive

    All ring nodes are simultaneously in 1 of 2 modes - idle or protecting

    in idle mode the RPL is blocked

    in protecting mode the failed link is blocked and RPL is unblocked

    in revertive operation

    once the failure is cleared the blocked link is unblocked

    and the RPL is blocked again
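
    The idle / protecting rule can be checked on a toy ring. The sketch below assumes a ring of n nodes where link i joins node i and node (i + 1) mod n; the function names are hypothetical. It shows that exactly one link is blocked in either mode, so the ring stays loop-free and fully connected.

```python
def active_links(num_nodes, rpl, failed=None):
    """Links carrying traffic; link i joins node i and node (i + 1) % num_nodes."""
    links = set(range(num_nodes))
    if failed is None:
        return links - {rpl}   # idle: RPL blocked
    return links - {failed}    # protecting: failed link blocked, RPL unblocked


def reachable(num_nodes, links, start=0):
    """Nodes reachable from start using only the given links."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        clockwise = (node, (node + 1) % num_nodes)  # (link id, neighbour)
        counter = ((node - 1) % num_nodes, (node - 1) % num_nodes)
        for link, neighbour in (clockwise, counter):
            if link in links and neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return seen


idle = active_links(6, rpl=5)
protecting = active_links(6, rpl=5, failed=2)
```

    With 6 nodes, both idle and protecting leave 5 active links (a tree) and all 6 nodes reachable. Leaving the RPL blocked while link 2 is down would split the ring into two segments.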

    Y(J)S APS Slide 58

    G.8032 revertive operation

    In the idle state :

    adjacent nodes exchange CCM at 300 per second rate (including over RPL)

    exchange NR RB (RPL Blocked) messages in dedicated VLAN every 5 seconds (but not over RPL)

    R-APS messages are never forwarded

    When a failure appears between 2 nodes :

    node(s) missing CCM messages peek twice with holdoff time

    node(s) block failed link and flush MAC table

    node(s) send SF message (3 times @ max rate, then every 5 sec)

    node receiving SF message will check priority and unblock any blocked link

    node receiving SF message will send SF message to its other neighbor

    in stable protecting state SF messages are sent over every unblocked link

    When the failure is cleared :

    node(s) detect CCM and start guard timer (blocks acting on R-APS messages)

    node(s) send NR messages to neighbors (3 times @ max rate, then every 5 sec)

    RPL owner receiving NR starts WTR timer

    when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB

    node receiving NR RB flushes table, unblocks any blocked ports, sends NR RB
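
    The "3 times @ max rate, then every 5 sec" transmission pattern can be written out as a list of send times. This sketch assumes "max rate" means the 300 per second rate quoted for CCM on this slide; the function and parameter names are hypothetical.

```python
def raps_send_times(duration_s, burst=3, burst_interval_s=1 / 300, refresh_s=5.0):
    """Send times in seconds: an initial burst, then a periodic refresh."""
    times = [i * burst_interval_s for i in range(burst)]
    t = times[-1] + refresh_s
    while t <= duration_s:
        times.append(t)
        t += refresh_s
    return times


times = raps_send_times(12.0)
```

    Over 12 seconds this gives three messages within the first 7 ms, followed by refreshes roughly 5 and 10 seconds later.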

    Y(J)S APS Slide 59

    G.8032-2010

    After coming out with G.8032 in 2008 (G.8032v1)

    the ITU came out with G.8032-2010 (G.8032v2) in 2010

    This new version is not backwards-compatible with v1

    but a v2 node must support v1 as well (but then operation is according to v1)

    Major differences :

    2 designated nodes - RPL owner node and RPL neighbor node

    and for optional flush-optimization - next neighbor node

    significant changes to

    state machine

    priority logic

    commands (forced/manual/clear) and protocol

    new Wait To Block timer

    supports more general topologies (sub-rings)

    ladders (For Further Study in v1)

    multi-ring

    ring topology discovery

    virtual channel based on VLAN or MAC address

    [Figures: ring with two subrings; ladder; ring showing the RPL, RPL owner, RPL neighbor and RPL next neighbor]

    Y(J)S APS Slide 60

    RPR 802.17

    Resilient Packet Rings

    are compatible with standard Ethernet, but different frame format

    are robust (lossless,

    Y(J)S APS Slide 61

    Basic RPR queuing

    traffic from local source

    first sent to ringlet selection

    sent according to fairness

    traffic going around ring

    placed into internal buffer

    in dual-transit queue mode placed into 1 of 2 buffers (PTQ or STQ)

    according to service class

    sent according to fairness

    traffic for local sink

    placed in output buffer

    according to service class

    PTQ / STQ = Primary / Secondary Transit Queue

    [Figure: RPR node queuing, showing PTQ and STQ transit queues, class A, B, C buffers and fairness blocks]

    Y(J)S APS Slide 62

    RPR service classes

    class   use       info rate                D/FDV       FE
    A0      RT        reserved                 low         No
    A1      RT        allocated, reclaimable   low         No
    B-CIR   near RT   allocated, reclaimable   bounded     No
    B-EIR   near RT   opportunistic            unbounded   Yes
    C       BE        opportunistic            unbounded   Yes

    (D/FDV = delay / frame delay variation, FE = fairness eligible)

    RPR defines 3 main classes

    class A : real time (low latency/FDV)

    class B : near real time (bounded predictable latency/FDV)

    class C : best effort

    Y(J)S APS Slide 63

    RPR Class use

    A0 ring BW is reserved - not reclaimed even if no traffic

    in dual-transit queue mode:

    class A frames from the ring are queued in PTQ

    class B, C in STQ

    priority for egress

    frames in PTQ

    local class A frames

    local class B (when no frames in PTQ)

    frames in STQ

    local class C (when no PTQ, STQ, local A or B)

    Notes:

    class A have minimal delay

    class B have higher priority than STQ transit frames, so bounded delay/FDV

    classes B and C share STQ, so once in ring have similar delay
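
    The egress priority order above amounts to strict-priority selection across five queues. The sketch below is a simplification with hypothetical names; it ignores fairness and shaping, which a real RPR node also applies.

```python
from collections import deque


def next_frame(ptq, local_a, local_b, stq, local_c):
    """Pop the next frame to transmit, in the slide's priority order."""
    for queue in (ptq, local_a, local_b, stq, local_c):
        if queue:
            return queue.popleft()
    return None  # nothing to send


queues = (
    deque(["transit-A"]),  # PTQ
    deque(["local-A"]),
    deque(["local-B"]),
    deque(["transit-B"]),  # STQ
    deque(["local-C"]),
)
order = [next_frame(*queues) for _ in range(6)]
```

    Local class B is served ahead of STQ transit frames, which is why its delay is bounded, while local class C waits for everything else.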

    Y(J)S APS Slide 64

    RPR - protection

    rings give inherent protection against single point of failure

    RPR specifies 2 mechanisms

    steering

    wrapping (optional)

    (implementations may also do wrapping then steering)
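
    The difference between the two mechanisms shows up in the path the traffic takes. This is a simplified sketch, assuming a ring of n nodes numbered clockwise, a failed link between node f and node f + 1, and traffic that normally travels clockwise across that link; all function names are hypothetical.

```python
def clockwise(n, a, b):
    """Nodes visited going clockwise from a to b, inclusive."""
    path = [a]
    while path[-1] != b:
        path.append((path[-1] + 1) % n)
    return path


def counter_clockwise(n, a, b):
    """Nodes visited going counter-clockwise from a to b, inclusive."""
    path = [a]
    while path[-1] != b:
        path.append((path[-1] - 1) % n)
    return path


def steer(n, src, dst):
    """The source learns of the failure and sends on the other ringlet."""
    return counter_clockwise(n, src, dst)


def wrap(n, src, dst, f):
    """Link f -> f+1 failed: the two nodes next to the failure loop traffic back."""
    g = (f + 1) % n
    return (clockwise(n, src, f)
            + counter_clockwise(n, f, g)[1:]
            + clockwise(n, g, dst)[1:])


steered = steer(6, 0, 3)
wrapped = wrap(6, 0, 3, f=1)
```

    On a 6-node ring with link 1-2 down, steering from node 0 to node 3 takes 3 hops, while wrapping takes 7 hops but needs no action at the source.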

    [Figure: ring illustrating wrap and steering info]

    Y(J)S APS Slide 65

    NERT and CLEER

    New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring

    Similar to RPR but uses real Ethernet format

    NERT and CLEER distinguish between

    ring nodes

    switches connected to ring nodes

    Traffic in ring is MAC-in-MAC encapsulated

    External MACs are of ring node

    Internal MACs are original

    Unexpected external MACs discarded

    External MACs learned as in 1ah

    Ring nodes forward according to table

    NERT floods, CLEER never floods

    Protection switch only involves changing table

    so service restoration is fast

    [Figure: ring nodes, with switches connected to them]

    Y(J)S APS Slide 66

    MPLS fast reroute

    IP FRR

    RFC 4090

    Y(J)S APS Slide 67

    IP FRR

    True protection mechanisms do not exist for connectionless IP

    In practice, routing protocols discover breaks and recalculate routes

    but this usually takes a long time

    Link-state IGPs detect link-down state using hellos

    for OSPF - typically every 10 sec, and detection after 40 sec

    and then Dijkstra algorithm avoids the failed link

    BFD can be used to speed up the detection

    However,

    the information still has to be propagated further (seconds?)

    and FIBs updated (100s of ms)

    Various IP Fast ReRoute (IP FRR) mechanisms have been proposed

    but true protection is best done at the MPLS level
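
    The timing argument can be put in numbers. In this sketch the OSPF figures come from the slide (10 sec hellos, detection after 40 sec, i.e. 4 missed hellos); the BFD interval and multiplier, the propagation time and the FIB update time are purely illustrative assumptions.

```python
def detection_time(interval_s, missed):
    """Failure is declared after `missed` consecutive intervals with no hello."""
    return interval_s * missed


def restoration_time(detect_s, propagate_s, fib_update_s):
    """Detection, then flooding the change, then updating the FIBs."""
    return detect_s + propagate_s + fib_update_s


ospf_detect = detection_time(10.0, 4)   # figures from the slide
bfd_detect = detection_time(0.010, 3)   # assumed: 10 ms interval, multiplier 3
total_with_bfd = restoration_time(bfd_detect, propagate_s=1.0, fib_update_s=0.3)
```

    Even with detection cut from 40 sec to about 30 ms, the assumed propagation and FIB update times leave restoration above one second, far from the 50 ms usually targeted by protection switching.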

    Y(J)S APS Slide 68

    MPLS fast reroute

    RSVP-TE enables MPLS traffic engineering by fine control over placement

    specifies explicit path using information gathered from IGP

    resources may be reserved at LSRs along the way

    RFC 4090 defines extensions to RSVP-TE - Fast ReRoute (FRR)

    LSRs along the path preconfigure local bypasses (detours)

    Upon detection of failure by

    BFD (specified in microseconds, typically 10s of ms) or

    RSVP hellos (RFC default is 5 ms) or

    RESV / PATH messages (driven by IGP)

    upstream LSR simply enables the detour

    Since this is a local action, it should be fast

    RFC 4090 only discusses adding FRR to RSVP-TE network

    but its use with LDP is possible if there is a single label generator

    [Callout: not discussed in RFC 4090]

    Y(J)S APS Slide 69

    PLRs and MPs

    The fundamental entities in MPLS FRR are

    Point of Local Repair (PLR)

    Merge Point (MP)

    A PLR is the LSR before the failed element (link or node)

    All LSRs except the egress LER can be PLRs

    The PLR is solely responsible for the FRR (no explicit APS signaling)

    During path setup, potential PLRs create detours towards the egress LER

    An MP is the LSR where the detour rejoins the LSP

    All LSRs except the ingress LER can be MPs

    [Figure: LSP from ingress LER to egress LER, with a PLR and an MP along the path]

    Y(J)S APS Slide 70

    Methods

    RFC 4090 defines two different protection methods

    Usually one or the other is employed in a given network

    One-to-one backup

    each LSP protected separately

    detour LSP created for each LSP at each potential PLR

    no labels pushed

    Facility backup

    backup tunnel for multiple LSPs

    bypass tunnel created at each potential PLR

    uses label stacking
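
    The two methods differ in what the PLR does to the label stack. Here is a minimal sketch, with the label stack as a Python list (top of stack first) and hypothetical function names and label values.

```python
def one_to_one(stack, detour_label):
    """PLR swaps the top label for the detour LSP's label; depth unchanged."""
    return [detour_label] + stack[1:]


def facility(stack, mp_label, bypass_label):
    """PLR swaps to the label the MP expects, then pushes the bypass label."""
    return [bypass_label, mp_label] + stack[1:]


lsp1_detour = one_to_one([100], detour_label=300)
lsp1_bypass = facility([100], mp_label=200, bypass_label=900)
lsp2_bypass = facility([101], mp_label=201, bypass_label=900)
```

    Both protected LSPs carry the same outer label (900) through the bypass tunnel, which is what lets one tunnel protect many LSPs; one-to-one backup needs a separate detour label per LSP.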

    [Figures: one-to-one backup and facility backup, each showing PLR and MP]

    Y(J)S APS Slide 71

    NHOP and NNHOP

    MPLS FRR can bypass a failed link or a failed node

    In order to bypass a single failed link

    we need an alternative path to the next hop (NHOP)

    In order to bypass a single failed node, we need an alternative path to the

    next next hop (NNHOP)
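
    Finding the two kinds of bypass is a shortest-path search with one element excluded. The sketch below uses a plain breadth-first search on a small hypothetical topology; it is not how RSVP-TE computes detours (that uses constraint-based routing), only an illustration of NHOP versus NNHOP.

```python
from collections import deque


def shortest_path(graph, src, dst, avoid_nodes=(), avoid_links=()):
    """BFS shortest path that skips the given nodes and undirected links."""
    banned = {frozenset(link) for link in avoid_links}
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in graph[node]:
            if (nbr in prev or nbr in avoid_nodes
                    or frozenset((node, nbr)) in banned):
                continue
            prev[nbr] = node
            queue.append(nbr)
    return None  # no bypass exists


def link_bypass(graph, lsp, plr):
    """Alternative path from the PLR to the next hop, avoiding the link."""
    nhop = lsp[lsp.index(plr) + 1]
    return shortest_path(graph, plr, nhop, avoid_links=[(plr, nhop)])


def node_bypass(graph, lsp, plr):
    """Alternative path from the PLR to the next next hop, avoiding the next hop."""
    i = lsp.index(plr)
    return shortest_path(graph, plr, lsp[i + 2], avoid_nodes={lsp[i + 1]})


# Hypothetical topology: A-B-C-D on top, E-F-G below, with A-E, B-F, C-G rungs
graph = {
    "A": ["B", "E"], "B": ["A", "C", "F"], "C": ["B", "D", "G"], "D": ["C"],
    "E": ["A", "F"], "F": ["E", "B", "G"], "G": ["F", "C"],
}
lsp = ["A", "B", "C", "D"]
```

    For the LSP A-B-C-D with A as the PLR, the link bypass reaches the NHOP B via E and F, and the node bypass reaches the NNHOP C via E, F and G without touching B.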

    [Figures: link bypass to the NHOP and node bypass to the NNHOP, each showing PLR and MP]

    Y(J)S APS Slide 72

    MPLS TP APS

    RFC 6372 (MPLS-TP Survivability Framework)

    RFC 6378 (MPLS-TP Linear Protection)

    draft-ietf-mpls-tp-ring-protection

    Y(J)S APS Slide 73

    MPLS-TP resilience

    Since it strives to be a carrier-grade transport network

    TP has strong protection switching requirements

    APS has been almost as contentious an issue as OAM

    and indeed the arguments are inter-related

    RFC 6372 gives a general framework

    and differentiates between

    linear

    shared-mesh and ring protection

    Y(J)S APS Slide 74

    Linear protection

    from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)

    1+1, 1:1, 1:n and uni/bidi are supported

    APS signaling protocol (for all modes except 1+1 uni)

    is single-phase

    and called the Protection State Coordination protocol

    PSC messages are sent over the protection channel

    APS messages are sent over the GACh with a single channel type

    message functions identified by a request field

    6 states: normal, protecting due to failure, admin protecting,

    WTR, protection path unavailable, DNR

    when revertive, a WTR timer is used

    Y(J)S APS Slide 75

    PSC message format

    Request : NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR

    PT = Protection Type : uni 1+1, bidi 1+1, bidi 1:1/1:n

    R = Revertive

    FPath = which path has fault

    Path = which data path is on protection channel

    GAL  : Label (13) | TC | S=1 | TTL

    GACh : 0001 | VER | 00000000 | PSC channel type

    PSC  : Ver | Request | PT | R | Res | FPath | Path

           TLV Length | Res

           Optional TLVs
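
    The PSC-specific part of the message can be packed with a few shifts. The field order follows the slide; the bit widths (Ver 2, Request 4, PT 2, R 1, Res 7, FPath 8, Path 8, then a 16-bit TLV Length and 16 reserved bits) are an assumption that should be checked against RFC 6378, and the function names and numeric field values are hypothetical.

```python
import struct


def pack_psc(request, pt, revertive, fpath, path, ver=1, tlvs=b""):
    """Pack the PSC fields; widths are assumed, see the note above."""
    word = ((ver & 0x3) << 30
            | (request & 0xF) << 26
            | (pt & 0x3) << 24
            | (1 if revertive else 0) << 23  # R bit, followed by 7 reserved bits
            | (fpath & 0xFF) << 8
            | (path & 0xFF))
    return struct.pack("!IHH", word, len(tlvs), 0) + tlvs


def unpack_psc(data):
    """Inverse of pack_psc."""
    word, tlv_len, _reserved = struct.unpack("!IHH", data[:8])
    return {
        "ver": word >> 30,
        "request": (word >> 26) & 0xF,
        "pt": (word >> 24) & 0x3,
        "revertive": bool((word >> 23) & 1),
        "fpath": (word >> 8) & 0xFF,
        "path": word & 0xFF,
        "tlvs": data[8:8 + tlv_len],
    }


# Arbitrary example values, not taken from the RFC
msg = pack_psc(request=10, pt=3, revertive=True, fpath=1, path=1)
fields = unpack_psc(msg)
```

    Packing and then unpacking returns the original field values, which is a quick sanity check on the shifts and masks.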

    Y(J)S APS Slide 76

    PSC control logic states

    Normal state - no trigger events reported

    Unavailable state - protection path is unavailable

    Protecting failure state - traffic is being transported on the protection path

    Protecting administrative state - operator issued command switching traffic to protection path

    Wait-to-Restore state - recovering from working path SF/SD, WTR timer has not yet expired

    Do-not-Revert state - recovered from a protecting state, but operator has configured DNR

    Y(J)S APS Slide 77

    PSC local requests

    In order from highest to lowest priority :

    1. Clear (operator command)

    2. Lockout of protection (operator command)

    3. Forced Switch (operator command)

    4. Signal Fail on protection (OAM / control-plane / server indication)

    5. Signal Fail on working (OAM / control-plane / server indication)

    6. Signal Degrade on working (OAM / control-plane / server indication)

    7. Clear Signal Fail/Degrade (OAM / control-plane / server indication)

    8. Manual Switch (operator command)

    9. WTR Expires (WTR timer)

    10. No Request (default)
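
    When several local requests are active at once, the priority list decides which one drives the PSC logic. The selection can be sketched as follows; the list order is taken from the slide, and the names LOCAL_REQUEST_PRIORITY, RANK and top_request are hypothetical.

```python
LOCAL_REQUEST_PRIORITY = [
    "Clear",
    "Lockout of protection",
    "Forced Switch",
    "Signal Fail on protection",
    "Signal Fail on working",
    "Signal Degrade on working",
    "Clear Signal Fail/Degrade",
    "Manual Switch",
    "WTR Expires",
    "No Request",
]
RANK = {name: i for i, name in enumerate(LOCAL_REQUEST_PRIORITY)}


def top_request(active):
    """Return the highest-priority active request, or No Request if none."""
    return min(active, key=RANK.__getitem__, default="No Request")
```

    For example, a Signal Fail on working overrides a pending Manual Switch, while a Forced Switch overrides a Signal Fail on protection.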

    Y(J)S APS Slide 78

    Linear protection ITU style

    from draft-zulr-mpls-tp-linear-protection-switching

    Similar to previous, but uses Y.1731/G.8031 format (no surprise!)

    GAL    : Label (13) | TC | S=1 | TTL

    GACh   : 0001 | VER | 00000000 | allocated channel type

    G.8031 : MEL | VER | OPCODE=39 | FLAGS=0 | OFFSET=4

             req/state | prot type | requested sig | bridged sig | reserved

             END=0

    Ring protection

    once again there were two drafts, both supporting

    p2p and p2mp, wrapping and steering, link/node failures

    draft-ietf-mpls-tp-ring-protection (not yet RFC)

    Between any 2 LSRs we can define a Sub-Path Maintenance Entity (SPME)

    So between 2 LSRs on a ring there are 2 SPMEs

    we define 1 as the working channel and 1 as the protection channel

    Now we re-use the linear protection mechanisms, including the PSC protocol

    draft-helvoort-mpls-tp-ring-protection-switching

    Both counter-rotating rings carry working and protection traffic

    The bandwidth on each ring is divided

    X BW is dedicated to working traffic and Y dedicated to protection traffic

    The protection bandwidth of one ring is used to protect the other ring

    Each node should have information about the sequence of ring nodes

    MPLS-TP Ring Protection Switching is G.8032-like, but forwards non-NR msgs