topology-focused availability analysis of basic protection schemes in optical transport networks

Vol. 7, No. 4 / April 2008 / JOURNAL OF OPTICAL NETWORKING 351

Topology-focused availabilityanalysis of basic protection schemes

in optical transport networks

Juan Segovia,1,* Eusebi Calle,1 Pere Vila,1 Jose Marzo,1 and Janos Tapolcai2

1Institute of Informatics and Applications (IIiA), University of Girona,Girona 17071, Spain

2Budapest University of Technology and Economics, H-1117 Budapest, Hungary*Corresponding author: [email protected]

Received November 2, 2007; revised February 4, 2008;accepted February 12, 2008; published March 26, 2008 �Doc. ID 89280�

We present a novel study of the availability in optical networks based on net-work topology. Connection availability is studied under two basic path protec-tion schemes. Different topologies are selected in order to have heterogeneityin geographic coverage, network diameter, link lengths, and average node de-gree. Connection availability is also computed considering the reliability dataof physical components and a well-known network availability model. Resultsreport some useful information to select the suitable protection algorithms ac-cording to the network topology features and the required levels ofavailability. © 2008 Optical Society of America

OCIS codes: 060.4257, 060.4261, 060.4510.

1. IntroductionThe aggregate bit rate that an optical medium can carry continues growing steadily.Supported by advances in key technologies such as wavelength division multiplexing(WDM), current networks can carry data at hundreds of gigabits per second over asingle fiber. Optical transport networks (OTNs), built upon these technologies, areseen as one of the most suitable technologies for core networks in order to support theexpected increase of traffic demand, driven by content-rich applications.

The increase in traffic volume, however, puts even more pressure on the ability ofnetworks to withstand adverse conditions caused by events such as component fail-ure, natural disasters, and fiber cuts. In an OTN a failure can disrupt service forthousands of users and severely affect operators’ revenue, even if the resultant inter-ruption lasts for just a few seconds. In this context, the quality of a service can becharacterized and assessed in terms of its reliability and availability. Several tech-niques have been proposed and deployed for dealing with network failures in order torestore the service as soon as possible. They differ in scope, approach to resource allo-cation and provisioning, routing strategy and goal, recovery model, etc. [1]. This richset of recovery options gave birth to several resilience-oriented routing schemes andquality of service (QoS) architectures, such as those in [2–5]. However, network pro-tection techniques improve availability at the cost of increasing resource usage due tothe underlying redundancy applied.

Two basic protection schemes are dedicated path protection (DPP) and shared pathprotection (SPP). DPP offers a high level of availability and fast recovery times butrequires more than 100% extra network capacity [6]. On the other hand, SPP is ableto share resources allocated for recovery purposes and thus reduce the capacityrequirements, although the offered availability can be lower. Both DPP and SPP areimportant protection schemes; the former is currently widely deployed, and the latteris appealing for its resource efficiency.

A network-wide assessment of availability can be useful to network operators. How-ever, the unit of service in a transport network is a connection, i.e., the virtual com-munication channel created between two designated nodes. A connection is unavail-able as soon as one of its components becomes unavailable. Therefore, performanceparameters defined in service level agreements between operator and customer usu-ally refer to that unit of service. The relevance of connection availability is reflected inthe growing number of papers on the topic. Their focus, however, is either on optimal

1536-5379/08/040351-14/$15.00 © 2008 Optical Society of America


design for availability or efficient connection provisioning, where the demand is fixedand known beforehand.

Our goal in this paper is to study the effects that topology properties of a given net-work have on connection availability when different path protection schemes areapplied. Given that availability is defined as the probability that a system (e.g., a con-nection) is found in the operating state sometime in the future, it is clear that thecomponents forming a connection should be analyzed. Furthermore, the combinationof topology, protection scheme, and routing strategies can generate different connec-tion availability patterns. Therefore, the challenge is to identify such patterns associ-ated with topology properties (e.g., node degree, link length, and network diameter)that can be used for assessing and comparing network performance from the point ofview of connection availability.

A secondary goal of this paper is to measure the uncertainty propagated in connec-tion availability values. This uncertainty exists because the manufacturer-providednetwork components’ reliability data have in turn uncertainties. The importance ofthis aspect is recognized in the literature [7] but it has been essentially overlooked inprevious studies about connection availability. By measuring this uncertainty it ispossible to evaluate the impact it has on connection downtime.

The rest of this paper is organized as follows. Section 2 gives an overview of therelated work. Section 3 presents the network availability model, as well as the connec-tion availability equations for each protection scheme. Section 4 presents the selectedtopologies and their main properties, together with the figures of merit. Section 5 dis-cusses the results. Finally, Section 6 concludes the paper.

2. Related WorkThe analysis of connection and network availability in OTNs is a topic that is attract-ing growing interest. In [8] a strategy for provisioning connections with guaranteedavailability in a dynamic traffic scenario is presented. The strategy attempts to mini-mize the allocation of spare capacity. It considers the coexistence of different protec-tion schemes (unprotected, dedicated, and shared protection). The protection schemechosen depends on the availability requirement. Results are given for two referencenetworks. The network availability model assumes that nodes never fail and no spe-cific observation is made about topology properties. It is concluded that SPP can beused in national networks, but more than one protection path would be needed if highavailability is required in bigger networks.

The focus in [9] is on design for availability. A network design technique that maxi-mizes connection availability for a given set of static demands is presented. It alsominimizes the number of fibers required. An expression to approximate the value ofconnection availability under SPP is given; however, it is different from the one cho-sen for our study. Externally generated failures such as cable cuts are not included, asthere is no broad agreement among authors about the effective statistical relevance ofthese events.

The availability of connections in a large network under three protection schemes(path protection, span protection, and p cycles) is analyzed in [7]. The first part of thestudy uses paths consisting of fixed-length links, while the second part focuses on aspecific path of a given large network. By means of a sensitivity analysis it is shownthat path-based protection schemes are noticeably dependent on both the link repairtimes and the fiber failure rates. Therefore, uncertain reliability data can affect theestimated availability of connections.

A performance comparison of several pan-European networks with hypotheticaltopologies is presented in [10]. The authors analyze their cost and capacity usage, aswell as the availability of connections. Starting from a base topology, three additionaltopologies with varying average node degrees and numbers of nodes are derived andstudied. Traffic is considered heterogeneous in type; demands for voice, transactiondata, and Internet traffic are serviced concurrently.

Connection availability from the perspective of different optical node architectures(e.g., opaque and transparent nodes) is analyzed in [11]. The study provides availabil-ity maps (contour plots) that relate availability to the length of paths of a connection(in number of hops and distance). Connection availability is estimated with a particu-lar expression that depends on path length, number of hops traversed, and the need


for and the impact of regenerators, among other factors. The context is assumed to bea dedicated protection scheme in which working and backup paths share exactly thesame characteristics.

A multiobjective optimization approach to incorporating uncertainty in the networkdesign phase is presented in [12]. The design premise is that components used tobuild the network have uncertain reliability and so the objective is to maximize reli-ability with minimum variance.

3. Connection Availability and Protection Schemes3.A. Availability EvaluationThe term availability refers to the ability of an item or system to be in the operatingstate at any time in the future. This ability is conveniently expressed as a probability.Likewise, reliability is the probability that a system or item will operate without anyfailure performing its function for a specific interval of time. In the context of thisstudy, a connection is a system whose components are elements such as fiber opticcables, amplifiers, and switches. Equation (1) uses two common reliability measuresto compute availability: the mean time to repair (MTTR), and the mean time betweenfailures (MTBF), measures obtained over long periods of time [13]:

A = �MTBF − MTTR�/MTBF. �1�

Reliability data can be obtained from many sources: Vendors can provide informa-tion about their equipment, and it can be forecast by prediction models or even gath-ered from field studies by network operators. Nonetheless, data are not always readilyavailable to researchers and information is often incomplete or suffers from excessivedispersion [7]. As part of their simulations and studies, authors provide the reliabilitydata that they have considered, for example [7,9,14]; unsurprisingly, they usually dif-fer in scope and values. The reference values presented in [13] are used in this paper.That collection of reliability numbers include a variety of equipment types for OTNs,without being tied to vendor-specific products.

Independently of the protection scheme being applied, computing the availability ofa connection means obtaining the availability of one or more paths. Assuming statis-tically independent component availability, path availability is the product of theavailability of each node and link from an origin to a destination, as shown in Eq. (2).A path is available only if all the components along its route are available:

Apath = AN1AL1

AN2AL2

. . . ANkALk

ANk+1, �2�

where ANiand ALi

denote the availability of the nodes and links forming a given pathof k hops. The availability of a link results from the availability of a series systembased on its components. On the other hand, nodes are usually treated as indivisibledue to the fact that they have much higher availability compared to the other ele-ments. In this paper, the availability of nodes is represented by a single reliabilitymeasure.

3.B. Connection Availability and Basic Protection SchemesThere are many protection schemes proposed in the literature; they differ in their pro-visioning method, their number of working paths per backup path, their recoveryscopes, and so on. This paper studies availability under the following schemes: (a) atype of DPP known as 1+1 protection, (b) SPP, and (c) no protection, for comparisonpurposes.

For simplicity, connections are assumed to be bidirectional and symmetric: Capacityis allocated so that traffic can flow in both directions simultaneously following exactlythe same route, although the resources assigned to each direction need not be thesame. If failure arises in components serving one direction, the whole connection isconsidered unavailable. Under this assumption, the path availability as given byEq. (2) needs to be squared to obtain the availability of an unprotected connection [seeEq. (3)]. In fact, this behavior is assumed for protected connections as well:

A = Apath2 . �3�

With DPP, one dedicated backup path protects exactly one working path, and undernormal operation (i.e., nonfailing state) traffic is duplicated at the ingress node so that


both working and protection paths carry the same data toward the destination node.The receiving end chooses the better one. Working and protection paths are chosen sothat no node or link is shared between them. Then, a connection is available if at leastone of its paths is available. This arrangement corresponds to a parallel system. Itsavailability expression is given as

A = Aw + �1 − Aw�Ab, �4�

where Aw and Ab denote the availability of the working and backup paths, respec-tively. The expression is based on the observation that either the working path hasnot failed or it has failed but the backup path has not. Aw and Ab are computed as perEq. (3). As the path protection mechanism cannot cover the case in which the endnodes themselves fail, complementary protection is needed for them in real-life net-works, for example, internal redundancy of components. Such an extramechanism isnot included in the network availability model selected for this paper. Instead, theavailability of connection end nodes was assumed to be 1, a simplification also appliedin other studies of this kind [13].

When connections are protected with SPP, working and protection paths are alsoselected to be node disjoint. However, traffic is sent and received exclusively throughthe working path. Only when a failure is first detected does the protection path carrythe traffic. This scheme is called shared because a single backup path can be assignedto many working paths, leading to the 1:N notation (with N�1). Under the assump-tion that there can only be one outstanding failure at any given time, the total capac-ity required for protection in a given link s can be computed as follows: Let C be theset of connections that share a given link m on their protection path; i is an activeconnection, i∋C; Di is the capacity demanded by connection i; and E is the set of con-nections whose working paths share at least one link with each other. The requiredcapacity for sharing is max�max�Di� ,�k�EDk�, that is, the accumulated demand of con-nections that can be affected simultaneously or the largest demand of connectionsthat can only fail alone, whichever is larger. Therefore, under the single-failureassumption, the total capacity assigned for protection in m is such that, when a fail-ure occurs in the network, all the affected connections are guaranteed to have backupresources on that link. Clearly, this scheme can help reduce the capacity needednetwork-wide to support a given traffic matrix when compared to DPP.

Taking into account the arrangement of paths, and assuming that only one backuppath is provisioned, connection availability under DPP and SPP is computed in thesame way—by applying Eq. (4). However, because of the sharing nature of SPP, if twoor more simultaneous or near-simultaneous failures affect unrelated (disjoint) work-ing paths, there is no guarantee that all of them will be recovered because they mightbe sharing backup resources. Specifically, a successful corrective action for one of thefailing connections could leave others without the resources they need for their ownrecovery if the capacity for protection is already used up. A penalty for this potentialaccess conflict should then be considered when using SPP, lowering the initial avail-ability computed by Eq. (4). Equation (5) is used in this paper for approximating thispenalty P, as suggested in [15] (other approaches exist, such as the one based on Mar-kov chains given in [14]):

P = �1 − Aw�Ab�1 − Awn−1��1 −

1

2 + �n − 2��1 − Aw�� , �5�

where n is the size of the backup-path-sharing group. If b is the backup path selectedfor a new connection, and bi is every link in path b, then n is the maximum of thenumber of working paths that have bi as part of their own backup path (the backup-path-sharing group size). The expression ties the penalty to the probability that allthe working paths in the connection’s backup-path-sharing group fail, although thebackup path continues working, which is clearly a worst-case scenario. The last factorin the equation adjusts that probability upwards or downwards according to the valueof n: the larger its value, the higher the probability of not obtaining the requiredresource. Then, a connection is unavailable under SPP when working and protectionpaths are simultaneously unavailable or when the backup resources cannot beobtained after a failure: USPP= �1−Aw��1−Ab�+P, which can be transformed intoU =U+P for clarity. Hence, A =1− �U+P�= �1−U�−P, leading to its final form:
SPP SPP


ASPP = A − P. �6�

3.C. Network Availability ModelThe network availability model considered in this paper consists of the following com-ponents [13]:

• Optical cross connects (OXCs): one at each end of a link.• Transponders: one set at each end of a link.• A pair of multiplexers–demultiplexers: at each end of a link.• Fiber optic: its length depends on the distance between the two nodes it connects.• Amplifiers: assumed to be deployed every 100 km. There can be zero or more

depending on link length.

Figure 1 shows how these components are arranged to form a link. For simplicity,this model only includes components of the physical layer, although there is no restric-tion to adding others from higher-level layers.

Assumptions about nodes are that they consist only of an OXC and that this is anindivisible element whose availability is computed by directly applying Eq. (1).Another assumption is that links are capable of transporting all the offered trafficthrough a single optical channel; thus a link needs only one pair of transponders, asingle fiber optic cable, and one set of amplifiers. Two values are needed to computethe MTBF of the optical fiber cable: the average number of cable cuts per year for areference cable length (usually 1 km) and the actual length of the link or span [13].The relationship is given as

MTBFcable =CCkm � 365 � 24

cablelength, �7�

where CCkm is the average cable length (in kilometers) that has one cut per year. Forexample, CCkm=300 means that for every 300 km, an average of one cable cut peryear is expected. The total cable length is cablelength measured in kilometers. The timeto repair a cable cut is independent of the cable length; thus the MTTR does notrequire any special treatment.

Table 1 lists the MTTR and MTBF associated with each component. These valuesare from [13], in which three sets of reliability data are provided for each component:optimistic, nominal, and conservative. The conservative set gives a pessimistic estima-tion of reliability. The nominal set is an average of values taken from different infor-mation sources consulted, and the optimistic set represents a best-case scenario. Itcan be seen that there is a large difference between the conservative and optimisticvalues. In some cases (e.g., multiplexer–demultiplexer) the optimistic MTBF value is10 times better than its conservative counterpart.

This triplet representation (optimistic, nominal, and conservative) has the advan-tage that one set of reliability values can be chosen in a specific network scenario.However, the need for such a representation (instead of a single value), as well as thelarge differences observed between the values, highlight the difficulty in agreeing ontheir true value, which in turn stresses the necessity to consider the underlyinguncertainty.

3.D. Accounting for UncertaintyThe uncertainty associated with value x can be expressed as a delta, so that the valuex and its uncertainty is x±u. Whenever possible, u is the standard deviation of x. The

Amplifier(s)

Fiberoptic

OXC

Multiplexer/Demultiplexer

Transponderset

OXC

Fig. 1. A link in the network availability model.


law of propagation of uncertainty [16] states that the uncertainty �f of a function fthat depends on variables x1 ,x2 , . . . ,xn with known uncertainties is

�f =��i=1

n �f

�xi�xi2

. �8�

That is, the partial derivative of f with respect to each one of its variables is needed.Applied to the fundamental expression of availability [Eq. (1)] A= �x−y� /x, wherex=MTBF and y=MTTR, the uncertainty �A in the computed availability of a singlenetwork component is given as

�A =� y

x2�x2

+ − 1

x�y2

. �9�

Hence, computing path availability with propagated uncertainty means applyingthis approach to the expression given by Eq. (2), where Apath takes the place of func-tion f. Conceptually, the same process can be repeated until the main expression ofconnection availability corresponding to the desired protection scheme is reached. Onepractical difficulty faced when applying this approach is that the deltas that corre-spond to the MTBF and MTTR values in Table 1 are not available. Two possibilitieswere conceived and implemented in our simulation software, based on the followingalternate assumptions: (a) that the uncertainty of MTBF and MTTR is within a givenpercentage, for example, 20% �x±10% �, and (b) that the uncertainty is the differencebetween the nominal and conservative values. The result reported in this paper corre-sponds to assumption (b) as it produces values with higher variability and so presentsa more challenging scenario.

4. Topology-Based Availability Analysis4.A. Comparative Study of TopologiesA total of six topologies were selected for this study (see Table 2). Four of them arefrom the Survivable Network Design Library (SNDlib), a repository of topologies,models, and solutions for the design of survivable fixed telecommunication networks[17]. KL-A and NSFNET-A are variations of two well-known topologies that have beenadapted for this study.

These topologies were selected in order to obtain heterogeneity in average nodedegree, number of nodes and links, network diameter, geographical coverage, anddiversity in link lengths. The following observations can be made about the propertiesof these networks (see Table 3):

Table 1. MTBF and MTTR Values of Network Components

Component TypeMTBF

(kh)MTTR

(h)

Optical–amplifier Optimistic 500 2Nominal 250 6

Conservative 100 9Multiplexer–demultiplexer Optimistic 1000 2

Nominal 167 6Conservative 100 9

Transponder Optimistic 500 2Nominal 400 6

Conservative 294 9Optical fiber (buried; failure causedby cable cuts. Values are for 1 km.)

Optimistic 5500 9

Nominal 2630 12Conservative 2410 24

Optical cross connect (OXC) Nominal 100 6


• GERMANY50 is the smallest network in terms of geographic dispersion; its linksare short and their lengths do not differ much. It also has the smallest average mini-mum distance between node pairs (1.15 hops; 343 km).

• The node degree of these topologies varies in general between 3.00 and 3.73.DFN-GWIN, however, has a node degree of 8.5.

• If the number of hops along the shortest paths (in kilometers) is used to measurethe network diameter, there are two groups of topologies: those with 6 or less(DFN-GWIN, KL-A, and NSFNET-A) and those with 11 or more (GERMANY50,COST266, JANOS-US-CA). However, if the diameter is measured only in kilometers,the topologies are quite dissimilar.

Table 2. Topologies Selected for This Study

Topology Description

DFN-GWIN Corresponds to Germany’s National Research andEducation Network, operated by DFN Verein. Thedemands are based on traffic measurements.

GERMANY50 A reference network originating from theEuropean project NOBEL. Its traffic matrixincludes demand for 30% of the nodes. Althoughboth DFN-GWIN and GERMANY50 are networksdesigned for Germany, they have differentproperties.

COST266 An abstract reference network defined in thecontext of the European project COST266 thatinterconnects 37 major cities across Europe. Itwas designed to carry international traffic only.Demand between cities is a function ofpopulation, number of hosts, distance, and othervariables. Details can be found in [10].

JANOS-US-CA An abstract reference network that interconnectsmajor cities in the U.S. and Canada. It has bothshort and very long links. Each pair of nodes hasa demand in the traffic matrix.

KL-A A variation of the well-known KL topology,adapted specifically for this study. Every node ofthe original topology was mapped to a Europeancity included in COST266. The traffic matrix is asubset of COST266 and covers all the city pairs.

NSFNET-A An adaptation of the well-known NSFNETtopology of 14 nodes and 21 links. A subset of thetraffic matrix of JANOS-US-CA was used.

Table 3. Properties of the Six Topologies Studied

y DFN-GWIN GERMANY50 COST266 KL-A JANOS-US-CA NS

ber of nodes 17 50 37 15 39ber of links 47 88 57 28 61ge node degree 8.5 3.52 3.08 3.73 3.13ork diameter (ind kilometers)

2 hops 13 hops 11 hops 5 hops 11 hops 6

698 km 950 km 4052 km 2814 km 5035 km 45ge minimum(in hops and

ers)

1.15 hops 4.05 hops 3.74 hops 2.13 hops 4.21 hops 2.1

343 km 383 km 1479 km 1257 km 2198 km 21link length (km) 80 27 147 147 134link length (km) 603 254 1585 1064 1205

age link length ±d deviation

317±123 102±45 440±247 634±256 525±269 10

Propert FNET-A

A. Num 14B. Num 21C. Avera 3.00D. Netwhops an

hops

58 kmE. Averadistancekilomet

4 hops

55 kmF. Min. 383G. Max. 2748H. Averstandar

83±558


• The link length distributions are such that links in NSFNET-A are generally long(85% is at least 500 km). They are also long in JANOS-US-CA ��50% � and KL-A��70% �. COST266 has much shorter links (�70% with less than 500 km).

In these topologies, the average minimum distance between node pairs, measuredin hops and given in row (E), shows a correspondence with the network diameter inhops: Similar minimum distances correspond to similar diameters, although the cor-respondence is only approximately proportional.

4.B. Figures of MeritThe following figures of merit are used for evaluating the behavior of connections onthe selected topologies:

• Average availability. The average connection availability of all accepted requests.Connection availability is computed according to the protection scheme applied,namely, unprotected, DPP, and SPP. This value is made more readable by showing itsopposite (i.e., the average unavailability) as downtime in hours per year.

• Worst availability. The lowest connection availability obtained for the giventopology and protection scheme.

• Average uncertainty. The arithmetic mean of the uncertainties embodied in theavailability of all the accepted connections: U= 1 / n�i=1

n Ui. In this expression, Ui iscomputed as described in Subsection 3.D and represents the uncertainty propagatedin the computation of the availability of the ith accepted connection.

• Average hop count for working paths. The average length of the working pathsassigned to accepted connections. The same measure is computed for backup paths.

• Restoration overbuild. Measures how much extra capacity is devoted to protec-tion, compared to the capacity assigned to working paths alone. It is an average takenover all accepted connections n:

C =1

n�i=1

n Cwi + Cpi

Cwi, �10�

where Cwi and Cpi identify the capacity assigned for working and protection paths forconnection i, measured at the moment its connection setup request is received. Cw iscomputed as numhops�demand, but the evaluation of Cp depends on the protectionscheme applied. For DPP, it is also numhops�demand, but for SPP Eq. (11) is used inorder to take into account the sharing that occurs among the backup paths [18]:

CpiSPP = �

m=1

k CTmd��j=1

p

Cm,j , �11�

where k is the number of links in a given protection path of connection i; p is the num-ber of backup paths that exist when the new connection is being set up; CTm is thecapacity effectively allocated for sharing purposes in link m based on the demands ofall the connections whose protection path traverses link m (see Subsection 3.B); d isthe demand of the arriving connection; and Cm,j is the capacity required by all thebackup paths that are sharing link m, irrespective of the capacity effectively allo-cated. Then, by this equation, the effective shared capacity attributed to a given con-nection is in proportion to its participation in the total capacity requested for protec-tion. The computational cost of this expression can be reduced by keeping �j=1

p Cm,jupdated for every link m, instead of evaluating the summation for every connectionrequest.

5. Simulation Results and DiscussionConnection availability was evaluated with an event-based simulator. The events con-sidered were the arrival of connection requests and their corresponding release andarrivals following a Poisson process, with exponentially distributed connection holdingtimes. The capacity demanded was chosen randomly following a uniform distribution.The results reported correspond to an average of ten runs, and every run processed


80,000 connections. Link capacity was assumed to be uniform in the network; itsvalue was chosen so that connection blocking would be �30% and network utilization�70% in every case.

Routing was carried out along the shortest path, using distance (in kilometers) as ametric. Candidate shortest paths that did not have enough capacity for the arrivingdemand were discarded, i.e., available capacity in every link was updated dynamicallyat connection arrival time as well as at release time. For protected connections (DPPand SPP), node-disjoint working and protection paths were chosen. Blocking resultedwhen no path could be found to satisfy a demand with a given protection scheme.Dijkstra’s algorithm was used to find the shortest paths, and a two step approach wasapplied to select protection paths.

Several routing algorithms exist to choose from in the literature. Some of them aregeneral with respect to protection schemes, while others are tailored to one of them.Considering that our goal is to study the effects that topology properties of a givennetwork have on connection availability when different path protection schemes areapplied, a general albeit simplistic routing algorithm was chosen, whose main ele-ments were described in the preceding paragraph. It is simplistic in the sense that itmakes no effort to achieve any optimization, such as, for example, minimizing thetotal path length of connections, or blocking probability, or maximizing availability,which is not the objective of this study. However, it is understood that the choice ofrouting algorithm influences some performance metrics, both positively and nega-tively, as discussed in Subsection 5.C.

5.A. Case 1: Unprotected ConnectionsAs can be seen in Table 4, the average availability of unprotected connections is lowerthan two 9’s in four cases out of six. These connection availabilities represent an aver-age downtime of 30 h/yr in the best cases (DFN-GWIN and GERMANY50) and up to6 days in the worst case (148 h/yr, NSFNET-A). The individual availabilities are quitedisperse with respect to their average (see row B); the lowest connection availability isgiven in row D (worst availability). Compared to the average availability, the down-times corresponding to these worst values are from five to six times higher.

The worst average connection availability is obtained with NSFNET-A, a topologythat has much longer link lengths than the others. This is consistent with the factthat fiber links, because of the frequency of cable cuts and the relative length of repairtimes, are dominant with respect to unavailability. It was observed during the simu-lations that �97% of the unavailability computed for a given unprotected connectioncan be explained by the intrinsic reliability values of the fiber optic; the contributionof optical amplifiers was minimal. This is similar to what was reported in the sensi-tivity analysis performed in [7], which highlights the importance of MTTR and failurerates of links in path protection. Figure 2 shows how the mean downtime of connec-tions in two topologies (GERMANY50 and COST266) increases when the link lengths

Table 4. Unprotected Connections

DFN-GWIN GERMANY50 COST266 KL-A JANOS-US-CA NS

ge connectionlity

0.996097 0.996797 0.987971 0.988759 0.987545 0.

ability standardn (± h/yr)

0.002236�±10 h�

0.002669�±12 h�

0.008814�±39 h�

0.007177�±31 h�

0.011242�±49 h�

0.�

time for givenconnectionlity (h/yr)

34 28 105 98 109

t availability 0.979198 0.978522 0.929671 0.940580 0.924385 0.time for worst

lity (h/yr)182 188 616 521 662

ge uncertainty 3.78 1.83 7.62 9.23 8.09

ase in downtimebyinty (%)

10 6 6 9 7

age hop count 1.76 3.93 4.17 2.57 3.62

Aspect FNET-A

A. Averaavailabi

983112

B. Availdeviatio

011861±52 h�

C. Downaverageavailabi

148

D. Wors 913965E. Downavailabi

754

F. Avera��10−4�

14.2

G. Increcauseduncerta

8

H. Aver 2.44


are multiplied by a given factor. The figure shows that the average downtimeincreases linearly when connections are unprotected, with different slopes for differ-ent topologies. When protection is applied, however, the degradation rate is lower.

Row F in Table 4 gives an average of the uncertainty embodied in the computedavailability of accepted connections. Taking a conservative (i.e., pessimistic) point ofview, uncertainty can be assimilated to unavailability. That way, the computed con-nection availability can be adjusted downwards with its corresponding uncertainty.The resulting extended downtime can then be compared with the base downtime tomeasure the contribution of the uncertainty-related downtime, as shown in row G. Iftaken into account, uncertainty would cause an increase of 7.6% in connection down-time in the scenario of unprotected connections, averaged over the six studied topolo-gies.

5.B. Case 2: Dedicated Path ProtectionThe average availability is improved substantially with DPP compared to the unpro-tected case (see Table 5); the geographically bigger topologies (COST266, KL-A,JANOS-US-CA, and NSFNET-A) have average availabilities that are 50 times better.However, none of the studied topologies reach an average downtime of 5 min or lessper year, a usual requirement of highly available services. With respect to the worstavailability, it has consistently one 9 less than the average availability, giving down-times per year of �20 h/yr in half of the topologies. This is more than 200 times worsethan the goal of 5 min/yr.

It can be seen in Table 5 row J that the topologies with lower average minimum dis-tance in hops (DFN-GWIN, KL-A, and NSFNET-A) are the ones in which connectionshave lower total hop count. This relationship holds also for the other topologies. How-ever, no discernible relationship exists between total hop count and connection avail-ability, at least not without also considering link lengths. For example, NSFNET-Ahas the second lowest total hop count, but also the worst availability by a large mar-gin.

Backup paths in the studied topologies tend to be between 50% and 70% longerthan their corresponding working paths when measured in hop counts. This gives anaverage restoration overbuild of 2.7 for DPP. A higher node degree helps in findingshorter paths in general and in particular can decrease restoration overbuild in DPPbecause it makes it easier to find disjoint paths for backup. For example, disregardingdistances in kilometers, KL-A and NSFNET-A have similar properties, except thatKL-A’s node degree is 25% higher. This can explain why KL-A’s backup paths are 60%longer than their working paths, while those of NSFNET-A’s are 90% longer; bothcases on average.

Finally, DFN-GWIN and GERMANY50 are very different with respect to averagenode degree, network diameter, and average minimum distance. Despite this, theiraverage connection availabilities under DPP are very close (downtimes of 0.15 and0.13 per year, respectively). This emphasizes that the effects of topology properties onconnection availability in such geographically small networks are negligible.

Fig. 2. Average downtime as a function of scaled link length. Dotted curves are plottedagainst the secondary y axis.


With respect to uncertainty, it produces a 10% increase in extended downtime forDPP, compared to the 7.6% of the unprotected case. It is relatively high in DFN-GWINand NSFNET-A, although for opposing reasons. In the first case, the average connec-tion availability is very good; hence its corresponding downtime is comparativelysmall. In the case of NSFNET-A, however, the average uncertainty itself is highbecause its paths are long; hence connection availability involves more operations,causing more uncertainty to be propagated. This behavior applies to the other casesas well; it is not specific to DPP.

5.C. Case 3: Shared Path ProtectionAn improvement in average availability can also be observed under SPP (see Table 6).Compared to DPP, however, the improvement is more irregular from one topology tothe other. Figure 3 plots the average downtime per year of DPP and SPP. The geo-graphically bigger topologies (COST266, KL-A, JANOS-US-CA, and NSFNET-A)share similar downtimes under DPP (all of them lower than 5 h/yr), but are quite dif-ferent under SPP. This variability can also be observed when performance is assessedin terms of the number of 9’s achieved (see Table 7). The three worst-performing

Table 5. Dedicated Path Protection (DPP)


ge connectionlity

0.999983 0.999985 0.999790 0.999838 0.999723 0.


0.15 0.13 1.84 1.42 2.43


lity (h/yr)1.36 2.3 21.02 10.21 21.56

ge uncertainty 0.022 0.011 0.174 0.180 0.226

ase in downtimeby uncertainty (%)

13 8 8 11 8

oration overbuild 2.45 2.78 2.70 2.65 2.73age hop counting paths

1.49 3.26 3.57 2.20 3.15

ge hop countup paths

2.19 5.52 5.93 3.52 5.36

hop count �J+K� 3.68 8.78 9.50 5.72 8.51

Table 6. Shared Path Protection (SPP)


ge connectionlity

0.999932 0.999884 0.998272 0.999483 0.996525 0.


0.60 1.02 15.14 4.53 30.44


lity (h/yr)12.12 22.27 313.09 107.99 416.35 1

ge uncertainty 0.109 0.103 1.79 0.66 3.46

ase in downtimeby uncertainty (%)

17 9 10 13 10

oration overbuild 1.69 1.85 2.02 2.24 2.06age hop counting paths

1.68 3.82 3.92 2.40 3.29

ge hop countup paths

2.54 6.43 6.92 4.13 7.08

hop count �J+K� 4.22 10.25 10.84 6.53 10.37

Aspect FNET-A

A. Averaavailabi

999620

B. Downaverageavailabi

3.33

C. Wors 997408D. Downavailabi

22.71

E. Avera��10−4�

0.441

F. Increcaused

12

G. Rest 2.96H. Averfor work

2.00

I. Averafor back

3.79

J. Total 5.79

Aspect FNET-A

A. Averaavailabi

998842

B. Downaverageavailabi

10.14

C. Wors 982196D. Downavailabi

55.96

E. Avera��10−4�

1.53

F. Increcaused

13

G. Rest 2.30H. Averfor work

2.30

I. Averafor back

4.13

J. Total 6.43


topologies (COST266, JANOS-US-CA, and NSFNET-A) have a large portion of theirconnections with no better availability than two 9’s (48%, 50.6%, and 38.5%, respec-tively). In the JANOS-US-CA topology, 11.2% of connections have only one 9 of avail-ability, meaning that the worst availability is not an isolated case but is part of alarger group of connections with low availability. KL-A also has a significant fre-quency of lower values (13.6%), although these correspond to availabilities with atleast two 9’s instead of just one. The remaining topologies, however, have only �1% oftheir connections in the lower-performing group.

Availability in SPP is adversely affected by sharing because of the so-called penaltyfor potential access conflict [see Eq. (5)]. Sharing has a complex relationship with con-nection availability. On the one hand, it can improve it due to the fact that shorterbackup paths can be found. On the other hand, it degrades connection availabilitybecause penalty grows with the size of the sharing group. This points out one trade-offthat has to be faced when applying protection with SPP. The routing algorithm in oursimulations does not restrict the size of the sharing group. With respect to uncer-tainty, its average reaches 12% in SPP, 2% higher than in DPP.

One aspect that differentiates SPP from DPP with respect to connection availabilityis that, once calculated, it can be treated as a constant in DPP because it remains thesame until the connection is terminated. In SPP, however, due to the influence of thepenalty for potential access conflict, connection availability can change throughout itslifetime, improving and degrading relative to the initial value as its backup path shar-ing group size varies. The study of this aspect is left to a future work.

As expected, restoration overbuild is considerably lower in SPP than in DPP, theaverage being 2.03 for the six topologies. Unlike DPP’s case, no clear correspondencecan be observed in our results between any topology property and restoration over-build. Average node degree shows a partial correlation, but fails when KL-A and NSF-Net are compared together. Due to the simple routing strategy applied, the existenceof preferred shared paths (because they are short) can significantly alter the results.Moreover, other access conflict methods of assessment need to be studied. In addition,more refined routing strategies need to be incorporated to keep the sharing groupbounded.

le 7. Frequency Distribution of the Number of Nines in the Average Availabiper Topology

DFN-GWIN GERMANY50 COST266 KL-A JANOS-US-CA NSF

of 9’s DPP SPP DPP SPP DPP SPP DPP SPP DPP SPP DPP

1.4 11.20.1 0.9 0.8 46.6 13.6 6.8 39.4 3.9

0.1 18.3 0.5 31.9 60.0 40.8 60.9 69.3 44.4 34.3 84.663.7 73.7 43.8 43.2 39.0 11.2 37.9 17.0 48.8 15.1 11.536.2 7.9 55.7 24.0 0.2 1.2 0.1

Fig. 3. Average downtime under DPP and SPP. Uncertainty-related average downtimeis plotted in black at the top of the stack.

Tab lity

NET-A

Number SPP

1 0.12 38.43 58.54 3.05


6. ConclusionIn this paper, a novel study considering the influence of topology properties on connec-tion availability in the context of optical transport networks (OTNs) has beenreviewed. Results have reported some interesting conclusions. In the case ofcontinental-size network topologies the average availability values are very differentcompared with small- or medium-size networks, where no differences can be observed,independently of the application of DPP or SPP. In the case of SPP, the worst-performing one has more than six times less availability than the best. Nonetheless,these larger topologies can only achieve a maximum of 9’s of availability, even whenDPP is applied. On the other hand, results have shown that the dominant componentin the network availability model is the fiber optic cable, due to the frequency of cablecuts and the relatively long duration of repair times. Therefore, connection availabil-ity is dependent on the length of links, or in general terms, on the link length distri-bution of its network topology.

Another interesting result concerns the average minimum distance, evaluated inhops between node pairs. For topologies with a small average minimum distance, ashort total hop count can be expected in both DPP and SPP, and consequently a suit-able availability value. However, other factors such as link length distribution andsharing rules can modify the expected values. It has also been observed in SPP thatthe sharing group size can change the expected availability values if only topology fea-tures are considered. This aspect is to be further studied in our future work. Withrespect to average node degree, it can be noted that under DPP it improves restora-tion overbuild because it helps in finding disjoint paths for backup.

Finally, the impact of uncertainty on connection availability has also been consid-ered. With the given reliability data of network components and their intrinsic uncer-tainty, it was found that connection downtime increases 10% under DPP and 12%under SPP due to uncertainty.

ACKNOWDLEGMENTSThis work was partially supported by the Spanish Science and Technology Ministry(MCYT) research project TIC2003-05567 and by the Generalitat of Catalonia’sresearch support program (SGR 00296).

References1. J. L. Marzo, E. Calle, C. Scoglio, and T. Anjah, “QoS on-line routing and MPLS multilevel

protection: a survey,” IEEE Commun. Mag. 41(10), 126–132 (2003).2. A. Urra, E. Calle, and J. L. Marzo, “Reliable services with fast protection in IP/MPLS over

optical networks,” J. Opt. Netw. 5, 870–880 (2006).3. A. Autenrieth and A. Kirstadter, “Engineering end-to-end IP resilience using resilience-

differentiated QoS,” IEEE Commun. Mag. 40(1), 50–57 (2002).4. J. Tapolcai, P. Cholda, T. Cinkler, K. Wajda, A. Jajszczyk, A. Autenrieth, S. Bodamer, D.

Colle, G. Ferraris, H. Lonsethagen, I.-E. Svinnset, and D. Verchere, “Quality of resilience(QoR): NOBEL approach to the multi-service resilience characterization,” in Proceedings ofthe Workshop on Guaranteed Optical Service Provisioning (GOSP) Co-located withBroadNets 2005 (2005), pp. 405–414.

5. P. Cholda, A. Jajszczyk, and K. Wajda, “A unified framework for the assessment of recoveryprocedures,” in Proceedings of the IEEE 2005 Workshop on High Performance Switchingand Routing (IEEE, 2005) pp. 269–273.

6. M. Jaeger, R. Huelsermann, D. A. Schupke, and R. Sedlak, “Evaluation of novel resilienceschemes in dynamic optical transport networks,” in Proceedings of the Asia-Pacific OpticalCommunication Conference (APOC) (2003), pp. 82–93.

7. M. Held, L. Wosinska, P. M. Nellen, and C. Mauz, “Consideration of connection availabilityoptimization in optical networks,” in Proceedings of the 4th International Workshop onDesign of Reliable Communication Networks (DRCN) (2003), pp. 173–180.

8. D. Mello, J. Pelegrini, R. Ribeiro, D. Schupke, and H. Waldman, “Dynamic provisioning ofshared-backup path protected connections with guaranteed availability requirements,” inProceedings of the IEEE/CreateNet International Workshop on Guaranteed Optical ServiceProvisioning (GOSP) (IEEE, 2005), pp. 1320–1327.

9. M. Tornatore, G. Maier, and A. Pattavina, “Availability design of optical transportnetworks,” IEEE J. Sel. Areas Commun. 23, 1520–1532 (2005).

10. S. De Maesschalck, D. Colle, I. Lievens, M. Pickavet, P. Demeester, C. Mauz, M. Jaeger, R.Inkret, B. Mikac, and J. Derkacz, “Pan-European optical transport networks: anavailability-based comparison,” Photonic Network Commun. 5, 203–225 (2003).

11. D. Mello, D. Schupke, M. Scheffel, and H. Waldman, “Availability maps for connections inWDM optical networks,” in Proceedings of the 5th International Workshop on Design of


Reliable Communication Networks (DRCN) (2005), 77–84.12. M. Marseguerra, E. Zio, L. Podofillini, and D. Coit, “Optimal design of reliable network

systems in presence of uncertainty,” IEEE Trans. Reliab. 54, 179–188 (2005).13. S. Verbrugge, D. Colle, P. Demeester, R. Huelsermann, and M. Jaeger, “General availability

model for multilayer transport networks,” in Proceedings of the 5th IEEE Workshop onDesign of Reliable Communication Networks (DRCN) (2005), pp. 85–92.

14. J. Zhang and B. Mukherjee, “A review of fault management in WDM mesh networks: basicconcepts and research challenges,” IEEE Network 18, 41–48 (2004).

15. R. Huelsermann, M. Jaeger, A. Koster, S. Orlowski, R. Wessaely, and A. Zymolka,“Availability and cost based evaluation of demand-wise shared protection,” in Proceedings ofthe 7th ITG-Workshop on Photonic Networks (VDE Verlag GmbH, 2006), pp. 161–168.

16. International Organization for Standardization, “Guide to the expression of uncertainty inmeasurement,” ISO/IEC Guide 98 (International Organization for Standardization and theInternational Committee on Weights and Measures, 1995).

17. S. Orlowski, M. Pióro, A. Tomaszewski, and R. Wessäly, “SNDlib 1.0—survivable networkdesign library,” in Proceedings of the 3rd International Network Optimization Conference(INOC 2007) (2007), pp. 1–6.

18. P. Cholda, “The reliability analysis of recovery procedures in GMPL-based optical IPnetworks,” Ph.D. dissertation (AGH University of Science and Technology, 2005).

topology-focused availability analysis of basic protection schemes in optical transport networks

Documents