

Enabling Fast Failure Recovery in OpenFlow networks using RouteFlow

Sachin Sharma
Department of Computing, National College of Ireland, Dublin, Ireland
Email: [email protected]

Didier Colle
Department of Information Technology, Ghent University - IMEC, Gent, Belgium
Email: [email protected]

Mario Pickavet
Department of Information Technology, Ghent University - IMEC, Gent, Belgium
Email: [email protected]

Abstract—OpenFlow provides a protocol to control a network from an external server called a controller. Moreover, RouteFlow presents a framework to run Internet routing protocols in OpenFlow networks by running them in virtual machines or containers. The problem is that OpenFlow networks running RouteFlow do not recover fast from a port failure (e.g., a port-down event). The failure recovery time depends on user-configurable parameters and is in the order of seconds. To overcome this problem, we implement a solution in which a port failure of a physical OpenFlow node is detected immediately in its corresponding virtual machine and an immediate action is taken by the routing protocol. Once a routing protocol running on the corresponding virtual machine detects this failure, it broadcasts the failure in the network and a new failure-free path is immediately configured over the OpenFlow network. We implement the proposed solution in an OpenFlow controller and test it over single autonomous system and multiple autonomous system scenarios (including OpenFlow and non-OpenFlow scenarios) of the Internet emulated on the virtual wall testbed of the Fed4Fire facility in Europe. The results show that an OpenFlow network can recover from a failure in a short time interval using the proposed solution.

Index Terms—SDN, OpenFlow, OSPF, BGP, RouteFlow

I. INTRODUCTION

Software Defined Networking (SDN) refers to an approach that aims to facilitate simplified network control by enabling programmatic, efficient network configuration. It simplifies networks by removing the complex control plane (or a part thereof) from the devices (e.g., switches or routers) and deploying it in external servers called controllers. OpenFlow is a de facto SDN protocol to communicate between the control and data plane of network devices [1]. Using OpenFlow, network devices have become much simpler, as they do not have to deal with complex decision-making software (the control plane). Today, SDN/OpenFlow has been deployed in several networks, e.g., in local-area networks [1], in access networks (such as in access points), in content delivery networks, as well as in wide-area networks (such as in Google B4 [2]).

In order to run traditional IP routing protocols in OpenFlow networks, RouteFlow provides a framework with which a physical OpenFlow network (see the network of OpenFlow switches OF-A, OF-B, OF-C and OF-D in Fig. 1) is replicated on the controller using virtual machines (VMs) or containers (see VM-A, VM-B, VM-C and VM-D in Fig. 1), and traditional routing protocols are run on the corresponding virtual machines (i.e., the routing protocol of OF-A is run on VM-A, the routing protocol of OF-B is run on VM-B, and so on) [3]. Moreover, the decisions made by the traditional IP routing protocols running on the VMs are converted to forwarding entries by the controller and installed into the switch forwarding tables.

978-1-7281-8154-7/20/$31.00 ©2020 IEEE

Fig. 1. RouteFlow Framework: the physical topology of OpenFlow switches or routers (OF-A, OF-B, OF-C and OF-D) is connected via the OpenFlow protocol to the controller, which hosts a virtual environment with the corresponding virtual machines or containers (VM-A, VM-B, VM-C and VM-D), each running a routing protocol; the link between OF-A and OF-B is replicated between VM-A and VM-B.

The motivation for using RouteFlow is that traditional routing protocol software (such as Quagga) can be used in OpenFlow networks without any modifications [4]. Using RouteFlow, we just need to run the same traditional software (e.g., Quagga) in virtual machines instead of running it in an OpenFlow switch or router. Moreover, RouteFlow addresses the challenge of vendor lock-in, as it relies on Linux VMs (or containers) and switches/routers implementing the standard OpenFlow protocol; therefore, it does not rely on a specific vendor [4]. Furthermore, RouteFlow is also important for performing inter-domain routing, where each domain is controlled by a separate controller. RouteFlow has been successfully deployed in several network scenarios [5], including multiple autonomous system network scenarios implemented in the European-funded FP7-CityFlow project, emulating a city of 1 million users [6]. It is currently useful for inter-operating OpenFlow networks with traditional IP networks.

The problem is that OpenFlow networks running RouteFlow do not implement a fast failure recovery solution. For failure recovery, OpenFlow networks depend on routing protocols running on a VM to detect a failure and recover from it. Currently, the failure recovery process of a routing protocol is delayed, as a failure in a link of a physical OpenFlow switch is not immediately detected by the corresponding link of the virtual machines. In the solution proposed in this paper, when an OpenFlow switch detects a link failure (e.g., a port-down failure) on one of its OpenFlow ports, it sends a port-failure message to the controller. The controller then disables the corresponding port of a VM. Therefore, a routing protocol running on the VM detects the failure immediately and broadcasts the failure in the network. A failure-free path is then installed in the network. Hence, failure recovery is possible in a short time.

We implement the proposed solution in the controller running RouteFlow and test it in single Autonomous System (AS) and multiple AS scenarios emulated on the virtual wall testbed of the Fed4Fire facility1. The single AS scenario is emulated using Mininet (an emulator for SDN experimentation) [7], and the multiple AS scenarios are emulated using different nodes of the Fed4Fire testbed, by deploying a topology developed in the EU-funded CityFlow project to emulate a city of 1 million users. In the multiple AS experiments, there is a mix of OpenFlow and non-OpenFlow networks, and an OpenFlow network communicates with a non-OpenFlow network using traditional standard routing protocols. These multiple AS experiments emulate inter-operation scenarios of OpenFlow and non-OpenFlow networks. A failure is emulated by disabling a port of an OpenFlow switch/router or a non-OpenFlow switch/router in the considered topology. The failure recovery time is compared for the cases when RouteFlow is deployed with and without the proposed solution. The results show that RouteFlow can recover fast when our solution is implemented in it.

The contributions of this paper are the following: (1) a fast failure recovery solution for OpenFlow, required to inter-operate with non-OpenFlow networks, and (2) large-scale experimentation of inter-operation of OpenFlow and non-OpenFlow networks on a real testbed, the Fed4Fire testbed. The rest of the paper is organized as follows: Section II presents the related work, Section III presents the problem and our proposed solution, Section IV describes the emulation environment and results, and finally Section V concludes.

II. RELATED WORK

Many works exist that perform fast failure recovery in OpenFlow. In [10], restoration and protection techniques are proposed for OpenFlow. That work concludes that OpenFlow can meet carrier-grade fast failure recovery requirements in a large network serving many flows when protection is implemented in the network.

In [12], segment protection is implemented in an OpenFlow-based Ethernet network. In this protection scheme, the working and redundant paths (with different priorities) are pre-configured for a segment of the network and, when a failure is detected in the working path, an auto-reject mechanism (proposed for protection) removes the working path, thereby enabling traffic to be forwarded through the redundant path's forwarding entries.

1https://www.fed4fire.eu/testbeds/

In [13], protection schemes are implemented for each link in a network (instead of for each path). In these schemes, a protection path (and a BFD session) is established for each link and, when a failure occurs in a link, traffic is redirected to the corresponding protection path. In addition, control traffic protection schemes are researched in [14]. In these schemes, multiple controllers are used to recover from controller failure scenarios. In [15], a mechanism is proposed to recover from a failure when the fast-failover group type in an OpenFlow switch is not available. For this case, another group type (such as SELECT2) is used to implement protection.

In [16], a fast failure recovery solution was implemented for SDN with zero packet loss, even though OpenFlow's fast-failover group feature was not used. This solution is based on OpenState, an extension to OpenFlow that allows a switch to autonomously adapt in a stateful fashion, which reduces the need to depend on remote controllers to take a recovery action. In [17], a multicast-tree-based failure recovery solution is proposed for OpenFlow, while simultaneously reducing the Ternary Content Addressable Memory (TCAM) consumption.

None of the above works designed a fast failure recovery mechanism with inter-operation with traditional networks in mind. Therefore, it is difficult to run the above mechanisms without modification in inter-operation situations where there are OpenFlow and non-OpenFlow nodes in networks. In [6], quality of service is achieved under failure conditions in an OpenFlow-enabled city network where RouteFlow was used to run traditional routing protocols in OpenFlow. However, the focus of that work was not fast failure recovery; the focus was that high-priority flows should receive high precedence even after a failure occurs in the network.

In this paper, we perform fast failure recovery in OpenFlow networks when traditional Internet routing protocols are run in the network. Therefore, it is possible to run our mechanism in inter-operation situations where traditional non-OpenFlow network domains coexist with OpenFlow network domains.

III. FAST FAILURE RECOVERY FOR OPENFLOW RUNNING ROUTEFLOW

This section describes the considered problem and presents our proposed solution.

A. Problem Description

Fig. 2 shows a failure scenario in an OpenFlow network running RouteFlow. The physical OpenFlow network topology and the virtual environment are the same in Fig. 1 and Fig. 2. The only difference between Fig. 1 and Fig. 2 is that the link between OpenFlow switches OF-A and OF-B has failed in Fig. 2 and OpenFlow switches OF-A and OF-B send port-failure messages (PORT_STATUS) to the controller. In this case, we consider a port failure in which the port goes down.

2https://www.opennetworking.org/software-defined-standards/specifications/

Fig. 2. Failure Scenario in RouteFlow: the topology of Fig. 1 with the link between OF-A and OF-B failed; both switches send PORT_STATUS messages to the controller over the OpenFlow protocol.

The problem is that RouteFlow does not take any actionwhen the controller receives the PORT STATUS message.Both virtual machines VM-A and VM-B continue to considerthe link between them alive until the routing protocols runningon the virtual machines (VM-A and VM-B) detect the failureonce they stop receiving alive messages from each other. In theOSPF (Open Shortest Path First) routing protocol, this failuredetection interval is the router dead interval which is mostly4 times the hello send interval [8]. Generally, the hello sendand router dead intervals are in seconds. Hence, an OpenFlownetwork takes a very long time to detect the failure and thento recover from it.

B. Our Proposed Fast Failure Recovery Solution

Fig. 3. Our Failure Recovery Solution: the topology of Fig. 2, where, on receiving a PORT_STATUS message, the controller takes the fast failure recovery action: get the VM ID and disable the faulty port in the VM.

Fig. 3 shows our solution in the same topology as shown in Fig. 2. In our solution, when a link between OpenFlow switches (the link between OF-A and OF-B in Fig. 3) fails, both OpenFlow switches (OF-A and OF-B) send port-failure messages (PORT_STATUS) to the controller. When the controller receives any of the port-failure messages, it gets the corresponding VM ID (identification number) from RouteFlow and passes this information to the VM by disabling the particular port of the VM. The routing protocol, such as OSPF or BGP (Border Gateway Protocol), running on the VM then immediately detects the failure. The failure is then broadcast over the network according to the routing protocol's standard, and a failure-free path is established immediately in the virtual and physical networks. Like RouteFlow, our solution prevents vendor lock-in, as we do not change anything in the functionality of the routing protocol. The routing protocols just take the actions defined in their standards when a port of a VM is disabled.
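The controller-side logic of this mechanism can be sketched as follows. This is a minimal illustration, not RouteFlow's actual code: the `vm_port_map` table, the function names, and the `ip netns` command are assumptions of this sketch (a real deployment would disable the replicated interface inside the VM or container by whatever means the virtualization layer provides).

```python
# Hedged sketch of the proposed fast-failure-recovery hook.
# The mapping table and command below are illustrative assumptions,
# not RouteFlow's real data structures.
import subprocess

# Maps (switch datapath id, physical port) -> (VM id, VM interface name).
vm_port_map = {
    (0xA, 1): ("VM-A", "eth1"),
    (0xB, 1): ("VM-B", "eth1"),
}

def build_disable_command(vm_id, iface):
    """Command that disables the replicated port inside the VM's namespace."""
    return ["ip", "netns", "exec", vm_id, "ip", "link", "set", iface, "down"]

def on_port_status(dpid, port_no, link_down, dry_run=True):
    """Called when the controller receives a PORT_STATUS message."""
    if not link_down:
        return None
    entry = vm_port_map.get((dpid, port_no))
    if entry is None:
        return None  # port not replicated in the virtual environment
    vm_id, iface = entry
    cmd = build_disable_command(vm_id, iface)
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires root and an existing netns
    return cmd

# Example: OF-A reports port 1 down -> VM-A's eth1 is brought down, so the
# routing protocol on VM-A reacts immediately instead of waiting for its
# dead timer to expire.
print(on_port_status(0xA, 1, link_down=True))
```

The key design point is that the routing protocol itself is untouched: the controller only translates a physical PORT_STATUS event into a local interface-down event in the virtual environment, and the protocol's standard reaction does the rest.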

In the proposed mechanism, the failure detection time of Internet routing protocols is reduced in the virtual environment created by RouteFlow, and thereby the failure recovery time is reduced.

IV. EXPERIMENTATION AND EMULATION RESULTS

Experimentation presented in this paper was performed on the virtual wall testbed of the Fed4Fire facility3. Fed4Fire is a project under the European Union's Horizon 2020 programme and offers a large worldwide federation of next-generation Internet testbeds, which provide open, accessible and reliable facilities for experiments.

We performed single AS experiments and multiple AS experiments on the Fed4Fire testbed. The single AS experiments were done on a node of the virtual wall testbed facility at IMEC, Gent, Belgium, provided by the Fed4Fire testbeds3, while the multiple AS experiments were done on multiple nodes of the testbed, where a separate node was used to emulate an OpenFlow router, a non-OpenFlow router or a controller. The following paragraphs explain the single and multiple AS emulation scenarios and results.

A. Single AS Emulation Scenario

A pcgen04 node of the virtual wall testbed3 was chosen for the single AS emulation. This node has 8 CPU cores of an Intel E5-2650v2 processor with 2.6 GHz speed, 48 GB RAM (Random-Access Memory) and one 250 GB hard disk. A single physical CPU core with hyper-threading enabled in this node appears as four logical CPUs to an operating system. Therefore, this node has 32 logical CPUs.

Fig. 4. Emulated Single AS Pan-European Core Topology: 15 city nodes (Amsterdam, London, Brussels, Paris, Lyon, Strasbourg, Zurich, Milan, Rome, Zagreb, Vienna, Munich, Berlin, Hamburg and Frankfurt) plus a controller.

3https://www.fed4fire.eu/


We emulated a pan-European core topology (see Fig. 4) using Mininet [7] on the above node of the virtual wall testbed. There are 15 city nodes (see London, Paris, etc. in Fig. 4) in the topology. A separate CPU was assigned to each city node as well as to the controller and its virtual machines. The controller has an out-of-band connection with each OpenFlow city node in the network (not shown in Fig. 4). Open vSwitch version 2.3.0 is used for the emulations. We used the method given in [9] to automatically configure RouteFlow in the above topology4.

We run the OSPF routing protocol in the considered OpenFlow pan-European network topology using RouteFlow. The OSPF hello interval is kept at 1 second and the router dead interval is kept at 4 seconds. This means that OSPF running on a VM sends hello messages to its neighboring VMs at an interval of 1 second. If the OSPF routing protocol running on a VM does not receive a hello message from a neighboring VM within 4 seconds, the link between the VMs is considered to be broken. In the current RouteFlow, OSPF depends on the router dead interval to declare the failure. However, using our solution, RouteFlow also depends on a PORT_STATUS message to declare a failure.
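For reference, timers like these are per-interface OSPF settings in Quagga. A minimal sketch of an ospfd configuration with the values used above (the interface name and network prefix are hypothetical, for illustration only):

```
! Illustrative Quagga ospfd.conf fragment (interface name and prefix are
! hypothetical; only the two timer commands reflect the values in the text)
interface eth1
 ip ospf hello-interval 1
 ip ospf dead-interval 4
!
router ospf
 network 10.0.0.0/8 area 0.0.0.0
```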

Fig. 5. Controller Traffic Capture - Experimental Scenario: number of packets versus experiment time in seconds, with annotated spikes for ARP, OSPF and Flow-Mod messages (warm-up), periodic ECHO, OSPF and ARP messages, the PORT DOWN event at second 0, and OSPF and Flow-Mod reconfiguration messages after the failure.

We create the failure scenario of our emulation by disabling the link between the emulated Milan and Rome nodes (see Fig. 4). Fig. 5 shows the considered emulation scenario using the controller traffic (the traffic coming to or going from the controller node). The traffic is captured using the tcpdump utility available in Linux. At the start of the experiment (from -99 seconds to -70 seconds), there are large spikes in Fig. 5. These spikes are due to messages transmitted in the warm-up period to run the OSPF protocol and configure forwarding entries in the network. These messages are Address Resolution Protocol (ARP), OSPF and Flow-Mod messages. ARP messages are transmitted to learn MAC addresses, OSPF messages are transmitted to learn a path to each node in the network, and Flow-Mod messages are transmitted to insert forwarding entries in the network. There are also periodic spikes in the controller traffic. These spikes are due to ECHO, OSPF and ARP messages, which are transmitted to check the aliveness of the network.

4The source code is available at https://github.com/routeflow/AutomaticConfigurationRouteFlow

At second 0, we fail the link between the emulated Milan and Rome nodes, and we see a large spike in the controller traffic a few seconds later (approximately after 4 seconds). This large spike is due to the traffic sent to recover from the failure and broadcast the failure information in the network. Fig. 5 only shows the control traffic when RouteFlow is used without our proposed solution. However, the controller traffic with our solution is similar to that shown in Fig. 5. The only difference is that, using our solution, the large spike after second 0 appears more quickly than shown in Fig. 5, as our solution depends on the PORT_STATUS message to detect the failure. We compare this in the next section.

Each OpenFlow node in our experiment sends data traffic to all other OpenFlow nodes in the network. The data packet transmit interval to each OpenFlow node is 6 ms and the packet size is 1000 bytes. We used an open-source traffic generator known as D-ITG (Distributed Internet Traffic Generator) [11] to transmit data packets in the network.

B. Single AS Results

Fig. 6. Traffic on the restoration link (Rome-Zagreb): traffic to Rome (number of packets per 10 ms) versus experiment time (s) for RouteFlow with and without our solution; the traffic increase after the failure at second 0 occurs after 0.040 s with our solution and after 4.160 s without it.

Fig. 6 shows traffic destined to Rome on link Rome-Zagreb (see Fig. 4). After link Milan-Rome is taken down, this is the only link connecting Rome. Therefore, after failure detection, all traffic to Rome must follow link Rome-Zagreb. In Fig. 6, we see that traffic to Rome over link Rome-Zagreb increases from 10 packets per 10 ms (before the failure) to 25 packets per 10 ms (after the failure) and then remains almost constant. When our solution is not applied in RouteFlow, the increase in traffic after the failure takes around 4.160 seconds. However, when our solution is applied, the increase in traffic after the failure takes around 0.040 seconds. This is the failure recovery time, as all traffic to Rome should go through link Rome-Zagreb after the link between Rome and Milan is broken. The results show that RouteFlow with our solution recovers faster than RouteFlow without it.
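As a sanity check on these packet counts, the stated D-ITG settings imply the aggregate rate towards Rome. The small calculation below assumes each node sends an independent 6 ms-interval flow to each of the other 14 nodes (our reading of the per-destination transmit interval):

```python
# Sanity check of the Fig. 6 packet rates from the stated D-ITG settings.
# Assumption: each of the 15 nodes sends one packet every 6 ms to every
# other node, so 14 independent flows are destined to Rome.
nodes = 15
interval_ms = 6.0

sources_to_rome = nodes - 1                      # 14 flows destined to Rome
pkts_per_10ms = sources_to_rome * 10.0 / interval_ms

# After the Milan-Rome failure, all of this traffic must use link
# Rome-Zagreb, which is of the same order as the ~25 packets per 10 ms
# plateau observed in Fig. 6.
print(round(pkts_per_10ms, 1))  # ≈ 23.3
```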

Fig. 7 shows the average failure recovery time in our experiments when different links of the considered topology are broken, as a function of the number of affected flows.

Fig. 7. Average Failure Recovery Time in Seconds. Measured values:
Number of affected flows: 30, 40, 45, 60, 80, 84, 96, 108, 122, 132
RouteFlow without our solution (s): 4.14, 4.15, 4.157, 4.161, 4.173, 4.174, 4.181, 4.188, 4.196, 4.203
RouteFlow with our solution (s): 0.041, 0.05, 0.057, 0.061, 0.073, 0.074, 0.08, 0.088, 0.098, 0.103
(The figure also plots the recovery time of traditional networks without OpenFlow.)

As the router dead interval in OSPF is kept at 4 seconds, OSPF in RouteFlow detects the failure after 4 seconds. Once the failure is detected, the failure recovery action is taken. Therefore, the failure recovery time in RouteFlow (when our proposed solution is not applied) is more than 4 seconds. In our solution, once the failure is detected by the controller, the controller disables the corresponding port immediately and the failure recovery action is taken immediately. Therefore, the failure recovery time is short using our solution.

As shown in Fig. 7, the failure recovery time in RouteFlow (without our solution) depends on the router dead interval when OSPF is used as the routing protocol. We have also done experiments with a large value of the router dead interval (i.e., 10 seconds). In these experiments, the failure recovery time was more than 10 seconds. However, RouteFlow using our solution does not depend on the router dead interval to declare the port failure and hence always recovers fast. Fig. 7 also shows that the failure recovery time increases when the number of affected flows in the network increases.
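The trend in Fig. 7 can be summarized with a rough, illustrative decomposition of our own (not a model from the measurements themselves): recovery time ≈ detection time + a per-flow reconfiguration cost. Using the plotted endpoints:

```python
# Rough decomposition of Fig. 7's recovery times, assuming
# recovery_time ~= detection_time + per_flow_cost * affected_flows.
flows = [30, 132]
with_sol = [0.041, 0.103]       # seconds, endpoints from Fig. 7
without_sol = [4.140, 4.203]

# Slope of the "with our solution" curve: reconfiguration cost per flow.
per_flow_ms = (with_sol[1] - with_sol[0]) / (flows[1] - flows[0]) * 1000

# Offset between the two curves: the detection-time component removed
# by acting on PORT_STATUS instead of waiting for the dead interval.
detection_gap = without_sol[0] - with_sol[0]

print(round(per_flow_ms, 2))    # ≈ 0.61 ms of reconfiguration per affected flow
print(round(detection_gap, 2))  # ≈ 4.1 s, roughly the OSPF router dead interval
```

This is consistent with both curves in Fig. 7 rising with the same slope while differing by approximately the 4-second router dead interval.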

Fig. 7 also shows the failure recovery time in the non-OpenFlow network emulated using the considered pan-European topology. It shows that the failure recovery time of the traditional non-OpenFlow network is shorter than that of the OpenFlow network. This is because the controller is located external to the network, and the virtual environment added by RouteFlow introduces some delay in recovery.

C. Multiple AS Emulation Scenario

The topology of the multiple AS experiments is shown in Fig. 8. This topology was developed as part of the EU-funded CityFlow project to emulate a city of 1 million users, taking into account xDSL (Digital Subscriber Line), LTE (Long-Term Evolution) and fibre technology network scenarios. The figure shows the Internet infrastructure from access networks to aggregation networks to the core network to a CDN (Content Delivery Network) server. A description of this topology is given in [6]. Using this topology, we emulate a fast failure recovery scenario for inter-operation situations between OpenFlow and non-OpenFlow networks. AS1 and AS3 in Fig. 8 emulate OpenFlow networks and AS2 emulates a non-OpenFlow network.

Fig. 8. Multiple AS experiments topology: users (each node emulating several users) connect through an access network to three aggregation networks (AS1, AS2 and AS3), which connect through the core to a CDN server. AS1 and AS3 are OpenFlow networks and AS2 is a non-OpenFlow network; O denotes an OpenFlow router and R a non-OpenFlow router.

OSPF is used as the routing protocol within an AS and BGP is used as the routing protocol to communicate between different ASes. RouteFlow is used in the OpenFlow networks to run the aforementioned routing protocols. The failure recovery scenario is emulated by forwarding traffic from an OpenFlow network to a non-OpenFlow network (as shown in Fig. 8) and breaking an OpenFlow or non-OpenFlow link. The rest of the scenario is the same as for the single AS experiments in the previous subsection.
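In a Quagga-based deployment such as this, the inter-AS part of the setup would be expressed as a BGP peering on each border router. A minimal sketch for a border router of AS1 peering with AS2 (the router ID, network prefix and neighbor address are hypothetical; only the AS numbering follows Fig. 8):

```
! Illustrative Quagga bgpd.conf fragment for an AS1 border router
! (addresses and prefix are hypothetical assumptions of this sketch)
router bgp 1
 bgp router-id 10.1.0.1
 network 10.1.0.0/16
 neighbor 10.0.12.2 remote-as 2
```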

D. Multiple AS Results

Fig. 9. Average Failure Recovery Time in Seconds when an OpenFlow or non-OpenFlow link is broken in the Multiple AS topology: bars compare RouteFlow without our solution and RouteFlow with our solution for an OpenFlow link failure (about 4.41 s without our solution versus about 0.04 s with it) and for a non-OpenFlow link failure (short in both cases).

Fig. 9 shows the average failure recovery time of the experiment performed on the multiple AS topology shown in Fig. 8. In this experiment, the failure recovery time is calculated when an OpenFlow link or a non-OpenFlow link is broken (see Fig. 8). In the case where an OpenFlow link is broken, RouteFlow without our solution takes a very long time to recover from the failure because it depends on the router dead interval to declare the failure. However, RouteFlow with the proposed solution recovers fast, as the failure is detected immediately in the virtual environment created by RouteFlow.


Traditional routing protocols such as OSPF and BGP running on a non-OpenFlow router detect a failure immediately if their interface goes down (e.g., due to a port failure). Therefore, an immediate failure recovery action is taken by a non-OpenFlow router on a port-down event. Hence, we see a short failure recovery time in Fig. 9 when a non-OpenFlow link is broken. In this case, the failure recovery time is approximately the same irrespective of whether our solution is applied in RouteFlow.

V. CONCLUSIONS

In this paper, we implemented a fast failure recovery solution for OpenFlow networks in which RouteFlow is used to run traditional routing protocols such as OSPF and BGP. In our solution, when the controller receives a port-failure message from an OpenFlow switch, it immediately disables the corresponding port of a virtual machine in the virtual environment. This starts the recovery process earlier than the time given by the user-configurable parameters of the routing protocols (such as the router dead interval of OSPF) and therefore recovery happens faster. Like RouteFlow, our solution also prevents vendor lock-in, as we do not change anything in the functionality of the routing protocols.

In addition, we performed two types of extensive experiments on the large-scale European Fed4Fire testbed facility: (1) single AS experiments and (2) multiple AS experiments. The results of the single AS experiments show that an OpenFlow network running RouteFlow together with the proposed solution recovers faster than when the proposed solution is not integrated with RouteFlow. In addition, the results show that traditional non-OpenFlow networks recover faster than OpenFlow networks when the same routing protocols are run in both networks. Furthermore, the inter-operation results of OpenFlow with non-OpenFlow networks in the multiple AS experiments show that inter-operation works fine when RouteFlow is applied in OpenFlow networks, and that the inter-operated networks recover faster when our solution is applied in RouteFlow.

ACKNOWLEDGEMENT

The work on this project has been supported by the Horizon 2020 Fed4FIREplus project, Grant Agreement No. 723638.

REFERENCES

[1] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker and J. Turner, "OpenFlow: Enabling innovation in campus networks," SIGCOMM Comput. Commun. Rev., Vol. 38(2), pp. 69-74, 2008

[2] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Holzle, S. Stuart and A. Vahdat, "B4: Experience with a globally-deployed software defined WAN," ACM SIGCOMM, 2013

[3] RouteFlow GitHub link and architecture: https://routeflow.github.io/RouteFlow/

[4] S. N. Rizvi, D. Raumer, F. Wohlfart and G. Carle, "Towards carrier grade SDNs," Computer Networks, Vol. 92(2), pp. 218-226, 2015

[5] Z. Latif, K. Sharif, F. Li, M. Karim, S. Biswas and Y. Wang, "A comprehensive survey of interface protocols for software defined networks," Journal of Network and Computer Applications, Vol. 156, 2020

[6] S. Sharma, D. Palma, J. Goncalves, D. Staessens, N. Johnson, C. Palansuriya, R. Figueiredo, L. Cordeiro, D. Morris, A. Carter, R. Baxter, D. Colle and M. Pickavet, "CityFlow, enabling quality of service in the Internet: Opportunities, challenges, and experimentation," 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 272-280, 2017

[7] Mininet Emulator Web Link: http://mininet.org/

[8] V. Manral, R. White and A. Shaikh, "OSPF Benchmarking Terminology and Concepts," RFC 4062, 2005

[9] S. Sharma, D. Staessens, D. Colle, M. Pickavet and P. Demeester, "Automatic configuration of routing control platforms in OpenFlow networks," ACM SIGCOMM Computer Communication Review, Vol. 43(4), pp. 491-492, 2013

[10] S. Sharma, D. Staessens, D. Colle, M. Pickavet and P. Demeester, "OpenFlow: Meeting carrier-grade recovery requirements," Computer Communications, Vol. 36(6), pp. 656-665, 2013

[11] D-ITG, Distributed Internet Traffic Generator Open-Source Software: http://traffic.comics.unina.it/software/ITG/

[12] A. Sgambelluri, A. Giorgetti, F. Cugini, F. Paolucci and P. Castoldi, "OpenFlow-based segment protection in Ethernet networks," Journal of Optical Communications and Networking, Vol. 5(9), 2013

[13] N. L. M. van Adrichem, B. J. van Asten and F. A. Kuipers, "Fast recovery in software-defined networks," 2014 Third European Workshop on Software Defined Networks, pp. 61-66, 2014

[14] Y. Hu, W. Wendong, G. Xiangyang, C. H. Liu, X. Que and S. Cheng, "Control traffic protection in software-defined networks," 2014 IEEE Global Communications Conference, pp. 1878-1883, 2014

[15] K. Nguyen, Q. Tran Minh and S. Yamada, "Novel fast switchover on OpenFlow switch," 2014 IEEE 11th Consumer Communications and Networking Conference (CCNC), Las Vegas, pp. 543-544, 2014

[16] A. Capone, C. Cascone, A. Q. T. Nguyen and B. Sanso, "Detour planning for fast and reliable failure recovery in SDN with OpenState," IEEE DRCN 2015, Kansas City, USA, March 24-27, 2015

[17] J. Chen, F. Yan, D. Li, S. Chen and X. Qiu, "Recovery and reconstruction of multicast tree in software-defined network: High speed and low cost," IEEE Access, Vol. 8, pp. 27188-27201, 2020