Uploaded by sachin-sreelal, 08-Mar-2016

EE 533 Final Project: Policy Based Routing

Team Name: HAL 9000
Team Members: Sachin Sreelal, Sonam Waghray, Heril Chheda

Table of Contents

List of Figures
1. Introduction
   1.1 Abstract
   1.2 Motivation
   1.3 Overview
       1.3.1 Quality Factor
       1.3.2 Operation
   1.4 Benefits
   1.5 Product Specifications
2. Design Features
   2.1 Design Outline
   2.2 NetFPGA
   2.3 Reference Router
   2.4 Custom Switch
       2.4.1 Hardware
       2.4.2 Software
       2.4.3 Compiler Interface
   2.5 Scheduling Logic
3. Design Implementation
   3.1 Experimental Setup
   3.2 Working
   3.3 Evaluation and Results
   3.4 Future Work
References
Appendix
   A. ns File
   B. Instruction Set Architecture
   C. Important Hardware/Software Registers


List of Figures

1. Block Diagram of the NetFPGA 1G board
2. User Data Path of the Reference Router on the NetFPGA 1G
3. User Data Path of the HAL 9000 Custom Router
4. Screenshot of the RSA Key Exchange
5. Screenshot of the Data Transmission


Chapter 1

Introduction

1.1 Abstract

In a conventional router or switch, all incoming packets are treated alike. In real network scenarios, however, packets differ in kind: some cannot tolerate delay as much as others, and these must be routed with priority. Packets arriving at a router are generally routed out on a FIFO basis. A priority-based router instead checks the priority of each arriving packet. Regular packets line up in the queue and go out in order of arrival, but when a priority packet is detected, it cuts into the existing queue and is routed out first, to a high priority node, before the queue resumes normal operation. This is the basic idea of our priority-based routing.

1.2 Motivation

Not all packets entering a network are of the same type. Hence, important packets should be given preferential treatment over normal ones. This simple idea is of utmost importance in real-time network scenarios.

In this project, we have designed a custom router which routes packets depending on their priority. This priority is determined by a cumulative Quality Factor calculated from user-defined content.

1.3 Overview

The hardware of the custom router is a dual-core processor built on a NetFPGA reference router; it processes the content of the packets entering the network and determines the priority for custom routing. The software is written to manage the flow of packets in and out of the router.

The processor has an inter-convertible FIFO. Additionally, two more queues are implemented: a High Priority Queue (HPQ) buffers all the incoming high priority packets, and a Low Priority Queue (LPQ) buffers the other packets. The scheduling of packets out of these queues depends on a Quality Factor.

1.3.1 Quality Factor

The content of each incoming packet is checked against priority content that has been predetermined. In the event of a match, the Quality Factor QF1 is assigned to the packet. There can be multiple content patterns that deserve priority, and the QF1 value assigned is directly proportional to the priority of the packet, i.e. packets of higher importance are assigned a higher QF1 value and vice versa. For non-priority packets, QF1 is set to zero.

The MAC address of each node in the network is initially assigned a nominal Quality Factor value (QF2). When the network is set up, all nodes therefore have an equal QF2; thereafter, depending on the traffic each node handles, its QF2 is increased or decreased. If there is more traffic on a particular node, an incoming packet whose destination MAC ID is that of this high-traffic node is assigned a low QF2; when the destination MAC ID is that of a node with less traffic, a high QF2 is assigned.
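As a rough illustration of how QF1 and QF2 combine, the following Python sketch models both factors. The pattern set, weights, nominal QF2, and traffic-decrement rule are all assumed placeholder values, not the ones used in the hardware.

```python
# Illustrative model of the cumulative Quality Factor (QF = QF1 + QF2).
# All numeric values here are assumptions for the sketch.

PRIORITY_PATTERNS = {b"URGENT": 3, b"ALERT": 2}   # content pattern -> QF1 weight

def content_qf1(payload: bytes) -> int:
    """QF1: highest weight among matched priority patterns, else 0."""
    return max((w for p, w in PRIORITY_PATTERNS.items() if p in payload),
               default=0)

class NodeTable:
    """QF2: per-destination-MAC factor, lowered as a node sees more traffic."""
    def __init__(self, nominal: int = 4):
        self.nominal = nominal   # every node starts with an equal QF2
        self.qf2 = {}

    def qf2_for(self, mac: str) -> int:
        return self.qf2.get(mac, self.nominal)

    def record_traffic(self, mac: str):
        # more traffic on a node -> lower QF2 for packets destined to it
        self.qf2[mac] = max(0, self.qf2_for(mac) - 1)

def cumulative_qf(payload: bytes, dst_mac: str, nodes: NodeTable) -> int:
    return content_qf1(payload) + nodes.qf2_for(dst_mac)
```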

1.3.2 Operation

The core of the router has two queues before the main output queue. For every incoming packet, a cumulative Quality Factor is calculated. This cumulative Quality Factor depends on the user-defined content, and its value determines which of the two implemented queues the incoming packet enters.

A threshold value for the QF is fixed. When the QF of a packet exceeds the threshold, the packet is marked as a priority packet and buffered in the High Priority Queue. If the QF of the packet is less than the defined threshold, the packet is marked as low priority and buffered in the Low Priority Queue.

An output arbiter is used to schedule the high priority packets before the low ones. It makes sure that the packets from the low priority queue are not routed out until the high priority queue is empty.
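The classification and strict-priority scheduling described above can be sketched as follows. This is an illustrative Python model, not the hardware; the threshold value is an assumed placeholder.

```python
from collections import deque

THRESHOLD = 5  # hypothetical QF threshold

class PrioritySwitch:
    def __init__(self):
        self.hpq = deque()  # High Priority Queue
        self.lpq = deque()  # Low Priority Queue

    def enqueue(self, packet, qf: int):
        # packets whose QF exceeds the threshold enter the HPQ
        (self.hpq if qf > THRESHOLD else self.lpq).append(packet)

    def dequeue(self):
        # output arbiter: the LPQ is served only when the HPQ is empty
        if self.hpq:
            return self.hpq.popleft()
        if self.lpq:
            return self.lpq.popleft()
        return None
```

Note that a high priority packet always leaves before any buffered low priority packet, matching the output arbiter's rule.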


1.4 Benefits

The benefits of this implementation are:

1. Lesser Resource Allocation: The priority is calculated as the packet arrives, in real time, using deep packet inspection of the content; this saves the resource space of an additional priority field.
2. Secure Routing with RSA Encryption: RSA encryption and decryption are implemented in software, which ensures secure routing and protects against intrusions.
3. Use of Hardware Accelerators: Priority determination, scheduling, and routing are faster and more efficient because the implementation is done in hardware.
4. Better Throughput: The router routes all priority packets on a high cost line, which increases the throughput of the low cost line and in turn increases the average throughput of the network.
5. Importance of Application: The switch makes sure that important packets are given priority in terms of scheduling and bandwidth, which is important in real-time networks.

1.5 Product Specifications

The router behaves as a typical layer 2 switch with added features:

1. It processes packets at a line rate of 1 Gbps
2. The module is clocked at 125 MHz
3. Deep packet inspection (DPI) is performed on the packets
4. Scheduling of the queues is done depending on the priority
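As a quick sanity check on these numbers (not part of the original specification): a 64-bit datapath (see Section 2.4.1) clocked at 125 MHz moves 8 Gb/s internally, comfortably above the 1 Gbps line rate.

```python
# Sanity check: internal datapath bandwidth vs. Ethernet line rate.
DATAPATH_BITS = 64            # datapath width, from Section 2.4.1
CLOCK_HZ = 125_000_000        # 125 MHz module clock
LINE_RATE = 1_000_000_000     # 1 Gbps Ethernet line rate

internal_bw = DATAPATH_BITS * CLOCK_HZ   # bits per second through the core
headroom = internal_bw / LINE_RATE       # how many line rates the core can absorb
```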

Chapter 2

Design Features

2.1 Design Outline

We used a NetFPGA board as hardware and developed a multithreaded processor to support our custom ISA. The reference router configuration that is already available is used as the base upon which modifications are made to implement the desired application.

This NetFPGA reference router module has been extended to create the custom router design. In the user data path of the reference router, our module is implemented after the input arbiter logic and before the output queues. The module is implemented in hardware in the Verilog HDL.

For the priority based routing, a priority decider module inspects the header and decides if the packet deserves a priority. Then once priority is established, the header is modified for routing out on the high cost line. Next, two queues are created in hardware. One is used to buffer the low priority type of packets and the other is used to buffer the high priority type of packets. The flow of the packets into these different queues respectively is controlled by the input arbiter and the flow of packets out of these queues is controlled by the output arbiter which again is implemented in hardware.

To interface with the processor inside the router, perl, shell and python scripts have been written to perform various software functions and access the values stored in the hardware registers. (Refer Appendix C).

Security for the system is implemented using RSA encryption in software. At first, the symmetric key is encrypted at the node with its own private key and sent to the control node of the NetFPGA, which decrypts it with the public key and stores the result (the original symmetric key) in a register. Now both the node and the router hold the same symmetric key. When the node sends data, it encrypts the data using a simple XOR operation with the symmetric key; at the router, the received encrypted data is decrypted with the same XOR operation and processed. Once done, the router encrypts the data in the same way, and it is decrypted at the receiving node.
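The key exchange and XOR data scrambling described above can be sketched as follows. The RSA parameters here are tiny textbook values chosen only to make the sketch self-contained; they are not the project's keys, and real deployments use much larger moduli.

```python
# Toy sketch of the key exchange: the node applies its private exponent to
# the symmetric key, the control node recovers it with the public exponent,
# and data is then scrambled with XOR (its own inverse).
# n = 3233 = 61 * 53, e = 17, d = 2753 is a standard toy RSA keypair.
N, E, D = 3233, 17, 2753

def rsa_private(m: int) -> int:
    """Node side: transform the symmetric key with the private exponent."""
    return pow(m, D, N)

def rsa_public(c: int) -> int:
    """Router side: recover the symmetric key with the public exponent."""
    return pow(c, E, N)

def xor_bytes(data: bytes, key: int) -> bytes:
    """Scramble/unscramble data by XORing each byte with the key's low byte."""
    return bytes(b ^ (key & 0xFF) for b in data)

sym_key = 42
wire = rsa_private(sym_key)           # value sent to the control node
assert rsa_public(wire) == sym_key    # router now holds the symmetric key

ct = xor_bytes(b"payload", sym_key)   # node encrypts data
assert xor_bytes(ct, sym_key) == b"payload"  # router decrypts it
```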

Each component of the design is explained in detail in the following section.

2.2 NetFPGA

The NetFPGA is a low-cost open platform for research and experimentation. It has primarily been designed as a tool for teaching networking hardware and router design. It has also proved to be a useful tool for networking researchers. The following table gives the specifications of the NetFPGA 1G board:

Board:                    NetFPGA 1G
Network Interface:        4 x 1 Gbps Ethernet ports
Host Interface:           PCI
FPGA:                     Virtex-II Pro 50
Logic Cells:              53,136
Block RAMs:               4,176 kbits
External Memory (SRAM):   4.5 MB ZBT SRAM (72-bit width, 36 x 2)
External Memory (DRAM):   64 MB DDR2 SDRAM (36-bit width, 36 x 1)

The NetFPGA includes logic resources, memory, and Gigabit Ethernet interfaces necessary to build a complete switch, router, and/or security device. Because the entire datapath is implemented in hardware, the system can support back-to-back packets at full Gigabit line rates and has a processing latency measured in only a few clock cycles.

Figure 2.1: Block Diagram of the NetFPGA Board [2]

2.3 Reference Router

We have used the NetFPGA reference router module and extended it to the custom switch design. The user data path of the reference router design consists of the input arbiter, the output port lookup, and the output queues. The first stage in the pipeline consists of several queues which we call the Rx queues. These queues receive packets from the IO physical ports (i.e., the MAC) and provide a unified interface (AXI) to the rest of the system. In the main datapath, the first module a packet passes through is the input arbiter. The input arbiter decides which Rx queue to service next, pulls the packet from that Rx queue, and hands it to the next module in the pipeline: the output port lookup module. The output port lookup module is responsible for deciding which port a packet goes out of; here the router forwarding logic is actually implemented. The 32-entry TCAM where the LPM table is stored is consulted to find the next-hop IP value. That value is then used in the 32-entry CAM (ARP table) to find the right destination MAC address. After that decision is made, the packet is handed to the output queues module, which stores the packet in the output queue corresponding to the output port until the Tx queue is ready to accept the packet for transmission.
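The two-step lookup (LPM table, then ARP table) can be sketched in Python as follows. The table entries are hypothetical examples, and the hardware performs these matches in a TCAM/CAM in a few clock cycles rather than in software.

```python
import ipaddress

# Hypothetical miniature routing tables mirroring the lookup sequence above.
LPM_TABLE = [  # (prefix, next-hop IP), as stored in the 32-entry TCAM
    (ipaddress.ip_network("10.1.1.0/24"), "10.1.1.3"),
    (ipaddress.ip_network("10.1.0.0/16"), "10.1.0.3"),
]
ARP_TABLE = {  # next-hop IP -> destination MAC, as stored in the ARP CAM
    "10.1.1.3": "00:11:22:33:44:01",
    "10.1.0.3": "00:11:22:33:44:00",
}

def output_port_lookup(dst_ip: str) -> str:
    ip = ipaddress.ip_address(dst_ip)
    # longest-prefix match over the LPM table yields the next-hop IP
    matches = [(net, nh) for net, nh in LPM_TABLE if ip in net]
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    # the next-hop IP indexes the ARP table to find the destination MAC
    return ARP_TABLE[next_hop]
```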

Figure 2: User Data Path of the NetFPGA Reference Router [2]

2.4 Custom Switch

The custom switch has the reference router at its base. A module implementing the main logical units in hardware is inserted into the user data path of the reference router, as shown in Figure 3. The software component of the switch is limited to key exchange and RSA encryption and decryption. Software is also used to create a wrapper interface around the generated hardware logic. This wrapper interfaces with the hardware registers of the design to capture the internal results of the hardware module. Additionally, a compiler interface runs parallel to this software wrapper. All three components of the design are elaborated below.

Figure 3: User Data Path of the HAL 9000 Custom Router

2.4.1 Hardware

ids

This is the top-level module for our design and implements the custom switch design. It has the following sub-modules:

fallthrough_small_fifo

This module buffers all of the incoming packets before they are processed. The FIFO has 16 locations, each of which is 72 bits in size.

cpu

This is the central processing unit of the network processor. It is a dual-threaded processor, with each thread implemented as a 5-stage pipeline with a forwarding unit to resolve data dependencies. It implements the late-branch design and supports a custom ISA which is a subset of the MIPS ISA (refer to Appendix B). Some specifications of the datapath unit:

1. It has an inter-convertible FIFO which acts as the data memory
2. Datapath width is 64 bits
3. Instruction width is 32 bits
4. Instruction memory is 1024 locations deep
5. Data memory is 1024 locations deep
6. Register file is 64 bits x 32 locations
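Fine-grained interleaving of two hardware threads over one pipeline can be sketched as below. The report does not detail the fetch policy, so the round-robin thread selection here is an assumption, shown only to illustrate the dual-threaded fetch idea.

```python
# Sketch of dual-threaded instruction fetch: two hardware threads share one
# pipeline, fetching on alternate cycles (assumed round-robin policy).

def fetch_schedule(pcs, cycles):
    """Return a (thread_id, pc) trace for the given number of cycles.

    pcs is a 2-element list of per-thread program counters, mutated in place.
    """
    trace = []
    for cycle in range(cycles):
        tid = cycle % 2            # round-robin thread select
        trace.append((tid, pcs[tid]))
        pcs[tid] += 4              # 32-bit instructions, byte-addressed
    return trace
```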

priority_decider

Inside the DPU, we have a hardware-accelerated Priority Decider, a deep packet inspection unit that inspects the content of the packets and matches it against pre-defined patterns known to deserve priority. This unit holds multiple such flag patterns and compares the content of each incoming packet with all of them to assign the Quality Factor and, in turn, decide the priority.

Once the priority is decided, the packet should be buffered into the High Priority Queue. The priority decider module not only decides the priority but also works on the packet once it is assessed as high priority: it performs header modification so that the high priority packet is routed out on the high cost line after being buffered in the High Priority Queue.

input_arbiter

This module is once again an extension of the dpu. It receives the packets from the output port of the priority decider; hence the incoming packets to this module have a priority that has been determined. The input arbiter then routes the packet either into the high priority queue or into the low priority queue depending on the priority Quality Factor they hold.

high_priority_queue

It is a FIFO module which buffers the high priority packets.

low_priority_queue

It is a FIFO module which buffers the low priority packets.

output_arbiter

The last module of the design is the output arbiter, which implements the scheduling logic. It makes sure that the high priority packets are routed out first; only when the High Priority Queue is empty are the low priority packets routed out into the output queues of the user data path.

2.4.2 Software

The following scripts are implemented to create a software platform to interface with the hardware of the network.

Initialize.sh: This script is run on the control node at the very beginning of the test. In this script, several commands are given to the control node to perform various functions:

1. Assign MAC IDs and IPs to each node on the network and to the control node
2. Download the generated bitfile to perform the priority routing
3. Run the router kit daemon (rkd)
4. Receive the symmetric key
5. Decrypt the RSA symmetric key with the public key
6. Assign a particular node as a high priority node
7. Assign the respective link as the high cost link
8. Define the content to be matched to evaluate priority
9. Set up to get ready to receive data

Start.sh: This script is run on all the other nodes once the initialization script has been run on the control node. In this script, the following commands are given to all the nodes:

1. Encrypt the symmetric key with the private key
2. Send the encrypted key
3. Send/receive data

(For sending and receiving data, one of the nodes of the network is programmed to send data to the control node once the key exchange has occurred, and the other nodes are programmed to receive the data from the control node after the packets are routed out.)

2.4.3 Compiler Interface

The compiler consists of three phases:

GCC compilation: C code is converted into MIPS-32 assembly code. The output is in .s file format.

Assembly translation: The MIPS assembly code is then translated to an assembly code supported by our custom ISA (refer to Appendix B). The output is an assembly file containing only instructions supported by our custom processor.

Instruction generation: The translated assembly file is further converted to hexadecimal format, which is then split into 8-digit blocks to give the opcodes.
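The final phase can be illustrated with a toy encoder for one R-type instruction. The field layout (6-bit opcode, then 5-bit Rs, Rt, Rd fields) is assumed to be MIPS-like and is not confirmed by the report; the opcode values are taken from Appendix B.

```python
# Toy instruction generator: pack one translated R-type instruction into a
# 32-bit word and emit it as an 8-digit hex opcode block.
# Field layout (opcode<<26 | rs<<21 | rt<<16 | rd<<11) is an assumed,
# MIPS-like convention; opcodes follow Appendix B.

OPCODES = {"ADD": 0b000101, "SUB": 0b000001, "AND": 0b100010}

def encode_rtype(op: str, rd: int, rs: int, rt: int) -> str:
    word = (OPCODES[op] << 26) | (rs << 21) | (rt << 16) | (rd << 11)
    return f"{word:08x}"  # one 8-hex-digit block per 32-bit instruction
```

For example, `ADD R3, R1, R2` would encode to the block `14221800` under this layout.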

2.5 Scheduling Logic

A high cost line is reserved only for priority packets and cannot be accessed by any other packet. The router uses deep packet inspection to check the content of each incoming packet and determine its priority. Using this assigned priority, all priority packets are buffered into the High Priority Queue and low priority packets are buffered into the Low Priority Queue. Packets from the HPQ are routed out of the router onto the high cost line; packets in the LPQ are not allowed to access this line. This ensures that the throughput of the high cost line remains high and is a privilege only of the high priority packets. The low priority packets in the LPQ are routed out to the low cost line.

Due to the segregation of packets depending on their priority and the alternative high cost path taken by the high priority packets, the average throughput of the network also increases when compared to a network without any high cost line access. The scheduling logic is defined in the output_arbiter module and the header modification for re-routing on the high cost line is performed in the priority decider module.

Chapter 3

Design Implementation

3.1 Experimental Setup

1. Start an experiment on DeterLab with the NS file in Appendix A.

2. ssh into the NetFPGA host: [email protected]

3. From the control node, set up the router using the following command line:

source initialize.sh

4. For each node, ssh into the node and run the start script:

ssh ..USCEE533.isi.deterlab.net
source start.sh

5. Ping and test if the network is up: ping -c 5

6. To change the content which determines priority, from the control node:

hal9000.pl content_write

7. Send and receive multiple data packets with and without priority content to make the comparison.

3.2 Working

The hardware logic explained in Chapter 2 was implemented in Verilog HDL, simulated with appropriate test benches and synthesized to completion. The bitfile is generated and is used to emulate the design onto the Virtex FPGA on the NetFPGA board.

To initiate the process, we create an ns file (Appendix A) and swap the experiment in on DeterLab. Once swapped in, we ssh into the control node of the network and run the initialization script, which assigns MAC and IP addresses to all the nodes of the network. Then, on running the setup script, the hal9000 bitfile is downloaded onto the NetFPGA. Next, we run the rkd (router kit daemon), which helps build the routing table. To check whether the network is set up and running, we ping and test it.

Next, we run the start script on each of the nodes which causes the encryption of the symmetric key with the respective private key and sends it to the control node of the router where the initialization script causes this encrypted key to be decrypted using the public key and stores them in corresponding registers.

Once the RSA symmetric key is exchanged, the data can be sent in the encrypted form to the router. At the router, decryption happens and then the packet enters the module written for priority assignment and routing out on the high cost link. Finally the data is routed out in an encrypted format, to be decrypted at the respective node.

Ping is used to test the connection, Iperf is used for bandwidth measurements and tcpdump is used to analyze the data transmitted and received.

3.3 Evaluation and Results

For comparison purposes, we have two systems: 1. A regular switching router 2. A priority switching router (with an additional high cost line). We send the same packets through both the designs and observe the difference.

RTT Comparison:

The table below gives the average round-trip-time delays from pinging each node of the network from every other node, for two cases: one with the reference router bitfile loaded onto the NetFPGA, and the other with the HAL9000 bitfile loaded, wherein priority is assigned to a node (in this case n1) and high priority packets are routed through high cost links in the network.

The average RTT was calculated from every ping, and then the cumulative average RTT was calculated. It can be seen that when node n1 comes into the route, the use of the high cost link reduces the delay; this in turn decreases the average RTT and increases the average throughput of the network.

From    To      Avg RTT (ms), Reference Router    Avg RTT (ms), HAL9000
n0      n1      37.670                            33.030
n0      n2      34.108                            34.614
n0      n3      30.957                            31.004
n1      n0      34.405                            34.114
n1      n2      31.097                            31.341
n1      n3      30.921                            31.325
n2      n0      33.031                            31.295
n2      n1      31.160                            27.197
n2      n3      29.946                            31.097
n3      n0      34.101                            31.258
n3      n1      31.044                            27.018
n3      n2      32.208                            31.214

Cumulative Average RTT:         32.554            30.959

The following screenshots show the results obtained for the RSA key exchange and the data transmission.

RSA Key Exchange:

Data Transmission:

Inferences: In the conventional router, we observe that the throughput is lower because all packets are routed through the same lines; there is no concept of priority. In the HAL9000 router, we observe that the throughput is better on the low cost line, and the average throughput of the network is also comparatively better. This is because the high priority packets are routed to the high cost line.

3.4 Future Work

The HAL 9000 router so far performs deep packet inspection, assigns priority, and routes the priority packets through a high priority queue onto a high cost line. In the future we plan to implement dynamic routing to make it adaptive to the traffic on each link. For example, if a particular low cost link carries more traffic, a second Quality Factor threshold will push low priority packets from the congested low cost link to the freer high cost link. This is done dynamically until the Quality Factor crosses back over the threshold, implying that the traffic on the node has cleared up.

Chapter 4

References


NetFPGA website: https://github.com/NetFPGA/netfpga/wiki

http://yuba.stanford.edu/~jnaous/papers/ancs-openflow-08.pdf

G.W. Wong and R.W. Donaldson, "Improving the QoS performance of EDCF in IEEE 802.11e wireless LANs," IEEE PACRIM, vol. I, pp. 392-396, Aug. 2003.

E. Crawley, R. Nair, B. Rajagopalan, H. Sandick, "A Framework for QoS-based Routing in the Internet", RFC 2386, Aug. 1998.

P. Nanda and A. J. Simmonds, "Policy based QoS support using BGP routing," 2006 International Conference on Communications in Computing (CIC 2006), Las Vegas, Nevada, USA, CSREA Press, ISBN 1-60132-012-4, pp. 63-69, June 26-29, 2006.

S. Chen and K. Nahrstedt, "An Overview of Quality of Service Routing for Next-Generation High-Speed Networks: Problems and Solutions," IEEE Network, pp. 64-79, Nov. 1998.

https://github.com/NetFPGA/netfpga/wiki/OpenFlowNetFPGA100

Appendix

A. ns File

# NS file to create a network with 4 nodes and a NetFPGA host
source tb_compat.tcl

set ns [new Simulator]
set nfrouter [$ns node]
tb-set-hardware $nfrouter netfpga2
set control [$ns node]
tb-bind-parent $nfrouter $control

# Create end nodes
set n0 [$ns node]
set n1 [$ns node]
set n2 [$ns node]
set n3 [$ns node]

# Put all the nodes in a LAN
set lan0 [$ns make-lan "$nfrouter $n0" 1000Mb 0ms]
tb-set-ip-lan $n0 $lan0 10.1.0.3
set lan1 [$ns make-lan "$nfrouter $n1" 1000Mb 0ms]
tb-set-ip-lan $n1 $lan1 10.1.1.3
set lan2 [$ns make-lan "$nfrouter $n2" 1000Mb 0ms]
tb-set-ip-lan $n2 $lan2 10.1.2.3
set lan3 [$ns make-lan "$nfrouter $n3" 1000Mb 0ms]
tb-set-ip-lan $n3 $lan3 10.1.3.3

$ns rtproto Static
$ns run

B. Instruction Set Architecture

There are 24 instructions included in the Instruction Set Architecture (ISA) of the processor. These instructions are a subset of the MIPS32 ISA. Each instruction is encoded as a 32-bit machine word, and the data operand words are 64-bit values. Rs, Rt, and Rd are fields which point to the 32 General Purpose Registers (GPR) of the Register File.

OPCODE  Operation: Brief Description

000000  NOP: No Operation.

010101  LW Rt, offset(Rs): Load Word. The 16-bit signed offset is added to the contents of Rs to form the effective address. The 64-bit value at the memory location specified by the aligned effective address is fetched from data memory and placed in Rt.

010100  SW Rt, offset(Rs): Store Word. The 16-bit signed offset is added to the contents of Rs to form the effective address. The 64-bit value in Rt is stored at the location specified by the aligned effective address.

100001  BEQ Rs, Rt, offset: Branch on Equal. The 64-bit value in Rs is compared with the 64-bit value in Rt; if equal, PC gets PC + offset. If [Rs] = [Rt], PC <- PC + offset.

111111  JUMP offset: Jump to the location given as offset. PC gets PC + offset.

010101  ADDI Rd, Rs, immediate: Add Immediate Word. The 16-bit signed immediate value is added to the 64-bit value in Rs to produce a 64-bit result in Rd. [Rd] <- [Rs] + immediate.

010001  SUBI Rd, Rs, immediate: Subtract Immediate Word. The 16-bit signed immediate value is subtracted from the 64-bit value in Rs to produce a 64-bit result in Rd. [Rd] <- [Rs] - immediate.

000101  ADD Rd, Rs, Rt: Add Word. The 64-bit value in Rs is added to the 64-bit value in Rt to produce a 64-bit result in Rd. [Rd] <- [Rs] + [Rt].

000001  SUB Rd, Rs, Rt: Subtract Word. The 64-bit value in Rt is subtracted from the 64-bit value in Rs to produce a 64-bit result in Rd. [Rd] <- [Rs] - [Rt].

100010  AND Rd, Rs, Rt: Bitwise And Word. The 64-bit value in Rs is bitwise ANDed with the 64-bit value in Rt to produce a 64-bit result in Rd. [Rd] <- [Rs] & [Rt].

001001  OR Rd, Rs, Rt: Bitwise Or Word. The 64-bit value in Rs is bitwise ORed with the 64-bit value in Rt to produce a 64-bit result in Rd. [Rd] <- [Rs] | [Rt].

001010  XOR Rd, Rs, Rt: Bitwise Xor Word. The 64-bit value in Rs is bitwise XORed with the 64-bit value in Rt to produce a 64-bit result in Rd. [Rd] <- [Rs] ^ [Rt].

001100  NOR Rd, Rs, Rt: Bitwise Nor Word. The 64-bit value in Rs is bitwise NORed with the 64-bit value in Rt to produce a 64-bit result in Rd. [Rd] <- ~([Rs] | [Rt]).

001101  NAND Rd, Rs, Rt: Bitwise Nand Word. The 64-bit value in Rs is bitwise NANDed with the 64-bit value in Rt to produce a 64-bit result in Rd. [Rd] <- ~([Rs] & [Rt]).

000111  SLT Rd, Rs, Rt: Set on Less Than. The 64-bit value in Rs is compared with the 64-bit value in Rt. Rd is set if Rs is less than Rt, else it is reset. [Rd] <- ([Rs] < [Rt]).

000010  SGT Rd, Rs, Rt: Set on Greater Than. The 64-bit value in Rs is compared with the 64-bit value in Rt. Rd is set if Rs is greater than Rt, else it is reset. [Rd] <- ([Rs] > [Rt]).

000100  INC Rs, Rt: Increment. [Rt] <- [Rs] + 1.

001011  MULT Rs, Rt, Rd: Multiplication. [Rd] <- [Rs] * [Rt].

100001  BNE Rs, Rt, offset: Branch on Not Equal. The 64-bit value in Rs is compared with the 64-bit value in Rt; if not equal, PC gets PC + offset. If [Rs] != [Rt], PC <- PC + offset.

110101  LWI Rd, immediate: Load Immediate Word. The 16-bit immediate data is placed in Rd.

110100  STOREI Rd, immediate: Store Immediate Word. The 16-bit immediate data is stored at the location specified by the address stored in the destination register.

001111  DEC Rs, Rt: Decrement. [Rt] <- [Rs] - 1.

C. Important Hardware/Software Registers

IDS_CONTROL_REG           0x2000300
IDS_INSTRUCTION_IN_REG    0x2000304
IDS_DATA_IN_HIGH_REG      0x2000308
IDS_DATA_IN_LOW_REG       0x200030c
IDS_DATA_ADDR_REG         0x2000310
IDS_INSTRUCTION_ADDR_REG  0x2000314
IDS_PORT_DEST_REG         0x2000318
IDS_CONTENT_REG_REG       0x200031c
IDS_CONTENT0_LOW_REG      0x2000320
IDS_CONTENT0_HIGH_REG     0x2000324
IDS_CONTENT1_LOW_REG      0x2000328
IDS_CONTENT1_HIGH_REG     0x200032c
IDS_CONTENT2_LOW_REG      0x2000330
IDS_CONTENT2_HIGH_REG     0x2000334
IDS_CONTENT3_LOW_REG      0x2000338
IDS_CONTENT3_HIGH_REG     0x200033c
IDS_DES_KEY_LOW_REG       0x2000340
IDS_DES_KEY_HIGH_REG      0x2000344
IDS_DATA_OUT_HIGH_REG     0x2000348
IDS_DATA_OUT_LOW_REG      0x200034c
IDS_INSTRUCTION_OUT_REG   0x2000350
IDS_HIGH_COUNT_REG        0x2000354
IDS_LOW_COUNT_REG         0x2000358
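The registers above occupy consecutive 4-byte word addresses starting at 0x2000300, so each address can be derived from a register's position in the list. The helper below is purely illustrative, not project code; actual accesses go through the NetFPGA register tools.

```python
# Illustrative helper: derive an IDS register address from its index in the
# list above (index 0 = IDS_CONTROL_REG, index 22 = IDS_LOW_COUNT_REG).
IDS_BASE = 0x2000300

def ids_reg_addr(index: int) -> int:
    """Address of the index-th IDS register (4-byte word stride)."""
    return IDS_BASE + 4 * index
```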