advanced topics in computer networks

52
Univ. of Tehran Adv. topics in Computer N etwork 1 Advanced topics Advanced topics in in Computer Networks Computer Networks University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani Lecture 9: Tree-based lookup

Upload: alanna

Post on 08-Feb-2016

29 views

Category:

Documents


0 download

DESCRIPTION

Advanced topics in Computer Networks. Lecture 9: Tree-based lookup. University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani. Outline. Issues Multiway and Multicolumn search DMP-Tree Some implementation issues. Issues. How to sort prefixes Prefixes as ranges - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

1

Advanced topicsAdvanced topics inin

Computer Computer NetworksNetworks

University of TehranDept. of EE and Computer Engineering

By:Dr. Nasser Yazdani

Lecture 9: Tree-based lookup

Page 2: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

2

OutlineOutline Issues Multiway and Multicolumn search DMP-Tree Some implementation issues

Page 3: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

3

IssuesIssues How to sort prefixes

Prefixes as ranges Comparing prefixes

Based on length Add extra bits at the end. New definition (DMP-tree)

How to apply tree structures like binary tree or m_way tree to prefixes

Page 4: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

4

Multiway tree lookup. Proposed by G. Varghese and his

students. Consider prefixes as range. First try: Pad 0’s to prefixes in order to

apply binary search tree. consider {1*, 101* and 10101* }prefixes

100000101000101010 101011

101110111110

Binary search endshere.

Should match here

Binary search fail for all of them!.

Page 5: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

5

Multiway tree lookup(cont) Two problem in the previous example

Being Far away from matching prefix Multiple addresses matching different prefixes end up

in the same region. Solution: Prefixes as ranges, Put the end of

range in the table.100000101000101010101011101111111111

We have the explicit ranges.

Search maps to one range only.

101011101110111110

L

L

L

HHH

Page 6: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

6

Multiway tree lookup(cont)100000101000101010101011101111111111

For 101011, we try to find first L which is not followed by H.

For the rest, we can have a stack operation to find the first L.

Problem: Linear search to find L

101011101110111110

L

L

L

HHH

Page 7: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

7

Multiway tree lookup(cont)Solution: Precompute prefixes corresponding to

ranges.

100000101000101010101011101111111111

101011101110111110

L

L

L

HHH 1* matching

prefix.

> =P1)100000 P1 P1P2)101000 P2 P2P3)101010 P3 P3 101011 P2 P3 101111 P1 P2 111111 - P1

Page 8: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

8

DMP-TreeDMP-Tree

Comparing prefixes. Sorting prefixes Binary prefix Tree. M_way prefix tree.

Page 9: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

9

Trie structure Trie or radix tree

Page 10: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

10

Sorting prefixes Question? Why well-known tree structures

cannot be applied to the longest prefix matching problem?

Answer- No a well-known method for sorting. Definition: Assume Aa1a2…an and B=b1b2…

bm to be prefixes of and there a character 1.  If n=m, the numerical values of A and B are

compared. 2.  If n m (assume n<m), the two substrings a1a2…

an and b1b2…bn are compared. If a1a2…an and b1b2…bn are equal, then, the (n+1)th character of string B is checked. It is considered B>A if bn+1 is before and B A otherwise.

Page 11: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

11

Sorting prefixes (cont) Example- Assume M is Then, BOAT is

smaller than GOAT and SAD is bigger than BALLOON. CAT is considered bigger than CATEGORY since the fourth character in CATEGORY, E, is smaller than M.

Sorting is a function to determine the position of each prefix.

Prefixes of table is sorted as:00010*,0001*,001100*,01001100*,0100110*,010

11,001*,01011*,01*,10*,10110001*,1011001*,10110011*,1011010*,1011*,110*

Page 12: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

12

Binary prefix tree

•Unfortunately, it fails for 101100001000 Why? •Prefixes are ranges and not just a data point in the search space.

Page 13: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

13

Binary prefix tree (cont) Definition: prefixes A and B are disjoint

if none of them is a prefix of other. Definition : prefix A is called enclosure

if there exists at least one element set such that A is a prefix of that element.

We modify the sort structure;1. Each enclosure has a bag to put its data element on

it.2. Sort remaining elements.3. Distribute the bag elements to the right and left

according the sort definition.

4. Apply algorithm recursively.

Page 14: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

14

Binary prefix tree (cont) Example- Prefixes in table 1. First

step.

The second step,

Note- enclosures are in the higher level than the contained elements. (important!)

Page 15: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

15

Binary prefix tree (cont) The final tree structure

Page 16: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

16

Sorting prefixes (cont) Sorting algorithms

Based on bubble sort Based on Radix sort.

Tmp= MinLength(list)for all i in list except tmp do

compare i with tmpif i matches tmp then;

put i in tmp’s bagif i<tmp then

put i in leftList:if i>tmp then

put i in rightist:endforlist = Sort(leftList) Sort(rightList)

Page 17: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

17

M_way prefix tree Problems with the binary prefix tree.

Two way branching. The structure is not dynamic and insertion

may cause problems!.

1. Divide by m after sorting the strings Static m_way tree.

2. Build a dynamic data structure like B-tree.

How to guarantee enclosure to be in the higher level than its contained elements.

Define node splitting and insertion.

Page 18: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

18

M_way prefix tree (Cont) Node splitting: Finding the split point.

1. Take the median if the data elements are disjoint.

2. If there is an enclosure containing other elements, take it as split point.

3. Otherwise, take an element which gives the best splitting result.

Note, this does not guarantee the final tree will be balanced.

Page 19: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

19

M_way prefix tree (Cont)

Insertion:1. If the new element is not an enclosure of

others, find its place and insert in the corresponding leaf, like B-tree.

2. Otherwise, replace the closet element with element and reinsert the replace elements.

3. Resort the resulted subtree, (space division) if necessary.

Building tree is similar to building B-tree.

Page 20: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

20

M-way prefix tree (cont) Example Prefix Abbrv. Prefix Abbrv.

10 - 1101110010 K01 - 10001101 L110 - 11101101 M1011 - 01010110 N0001 - 00100101 O01011 - 100110100 P00010 - 101011011 Q001100 A 11101110 R1011001 B 10110111 S1011010 C 011010 T0100110 D 011011 U01001100 E 011101 V10110011 F 0110010 W10110001 G 101101000 X01011001 H 101101110 Y001011 I 00011101 Z00111010 J 011110110 II

Page 21: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

21

M-way prefix tree (cont) We insert prefixes randomly. The tree uses 5 branching factor (at

most 4 prefixes in each node) Insert 01011, 1011010, 10110001 and

0100110. Then, adding 110 cause overflow. Split node

10110001 (0100110,01011) (1011010, 110)

(all element are disjoint)

Page 22: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

22

M-way prefix tree (cont) Insert 10110011, 1101110010, 00010.

Adding 1011001 causes overflow. 10110001 1011010

(00010,0100110,01011) (1011001,10110011) (110,1101110010)

(case 3 of splitting) Latter adding 1011 cause problem. It is

the case of adding an enclosure. We will have space division.

Page 23: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

23

M-way prefix tree (cont) The final tree

• The tree supersede B-tree or B-tree is a special case of this tree. Then, when data element are relatively disjoint, the height of tree is logMN.

Page 24: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

24

DMP-Tree

0

5

10

15

20

2550

60

70

80

90

100

• BF is Branching factor in the internal nodes.• No. of Data is in1000s.

Max. height No. of Data

Page 25: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

25

DMP-Tree

• Number of prefixes in the right.

Min. memory utilization

0.62

0.64

0.66

0.68

3 4 5 6 7 8 9 10

branching

50K

60K

70K

80K

90K

100K

No. of Data

Page 26: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

26

DMP-Tree

• Height of tree for 100K data prefixes.

0

5

10

15

20

25

3 4 5 6 7 8 9 10

Min

Ave.

Max

Branching

Height

Page 27: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

27

DMP-Tree

Analyzing of results. With increasing BF, Branching Factor,

the height decreases. The result are for the worst case, Max

height, and the ave. case is much less. After BF=9, increasing Branching

Factor does not decrease the max. height.

The results are for the set of prefixes of 50,000-100,000 with lengths from 8 t0 31. The size of actual prefixes in use is around 50,000 and the length is 8-31.

Page 28: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

28

DMP-Tree

Memory utilization:, Mem. Utilization is 0.64%-0.67%

without considering the tree branching overhead.

Mem. Utilization is 0.53%-0.62% with tree branching overhead (pointers).

Without considering branching pointers, the mem. Utilization decreases with increasing the branching factor.

Total mem. Utilization increases with increasing the branching factor.

Page 29: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

29

DMP-Tree

Therefore, The longest matching prefix of a

network can be determined in 5 steps with 9 or more branching factor.

In the worst case, we need at most 2 times of total prefix data size of memory to implement the scheme. For instance, for 50,000 prefixes of 32bit, we need at most 3.2 Mbit of memory.

Page 30: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

30

Overall Design All operations need search first in the

Tree structure. Two search procedures, one for the

longest matching prefix and another for update.

The prefix tree data structure is on the chip.

The Policy table is on the off chip memory.

There is a port to data link layer mapping module.

Page 31: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

31

Tree Nodes Internal nodes.

Each prefix has a left and right pointer which are pointing to left and right subtrees respectively.

We can have N prefixes in each internal node. Then, N+1 is the branching factor.

The bigger N, the faster search time, but the more logic is needed.

Port is the address of the port in the switch to which the packet will be sent.

Internal nodes

Leaf nodes

Addr 1[?]

Prefix 1[33]

Port[?]

Addr 2[?]

Prefix 2[33]

port[?] … Addr

[?]

Branching factor

Page 32: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

32

Tree Nodes Leaf nodes.

There is no left and right subtree pointers. The number of prefixes in the leaf node is

M. The leaf nodes are stored in a off chip

memory to make the scheme scalable to the large number of prefixes.

Prefix 1[33]

port[?]

Prefix 2[33]

port[?] …

Page 33: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

33

Branching Factor What is the best number for N?

(Branching factor) The bigger N, the faster search process. (Fact

1) The bigger N, the more memory pins are and

usually the more mem. Bandwidth is needed (Fact 2).

The bigger N, the more logic we need to process the node (Fact 3).

Simulation result shows The bigger N, the better memory utilization in

the memory. For N 8, the max. height of the tree does

not decrease considerably.

Page 34: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

34

Simulation result Total memory: assuming one memory

block and OC-192.# of Prefixes

required Mem.

Branching Factor

Mem. Pins

Mem BW (G/s) max

Max memAccess

Mem. Size (on chip)mm2

Max heights

64K 5.4 Mbits

15 897 89.7 4 81 5

64K 5.5 Mbits

11 655 65.5 4 82 5

64K 5.4Mbit 9 527 52.7 4 81 5

64K 6 Mbit 6 335 46.9 6 96 7

64K 6.6 Mbit 5 275 44 7 110 8

64K 6.5 Mbit 4 207 62.1 14 112 15

100K 8.3 Mbit 15 897 89.7 4 122 5

100K 8.5 Mbit 11 655 65.5 4 125 5

100K 8.3 9 527 63.24 5 122 6

100K 9 Mbit 6 335 53.6 7 135 8

100K 9.1Mbit 5 275 49.5 8 140 9

100K 9.5 4 207 62.1 14 150 15

30K 2.6 9 527 52.7 4 expected 40 5 expected

Page 35: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

35

Branching Factor It seems any number between 8-16 is

reasonable. But, N=9 gives a better search time, memory size.

Assuming 9 branching factors in the internal node, %50 node utilization and 128K prefixes, we need max. 128K/4.5= 28.5K address. Then, 15 bit address for left and right pointers are more than enough. But, we need more for off chip addressing

The number of switch port are usually limited, around 64, We can assume 256, then 8 bit is enough to address them.

Page 36: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

36

Branching Factor In order to make the internal node

branching and leaf node branching even, M=10.

If we want to read a node at once, we will need 41x10=410 pins which is difficult to support in one chip.

We can divide a node in two and read/write in two clock cycles. This reduce the memory pins to 205 which is affordable.

Page 37: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

37

Memory requirement Prefix tree: Assuming 128K prefixes.

N = 9 (BF) and M=10 (BF in leaves), the majority of prefixes, %80 will be in leaves, assume %65 node utilization,

# of ave prefixes in a leaf node node = 10*0.6 5= 6.5 # of leaf nodes 128Kx%80x2/6.5 = 31.5K and %10

overhead 35 K Total off chip memory = 35K x 205(Mem BW) = 7.2 MbitsThen, we need 16 bits for addressing. 1 bit for

internal/external.# of internal nodes= 128Kx%20/5.8=4.41K and %10

overhead 4.9 K Total on chip memory=4.9Kx529K 2.6Mbits

Port to link address mapping table.For each port corresponding link addressMax. 256 ports, on chip, some mem for indexing

Link Addr

48 bit addr.

Page 38: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

38

Memory requirement In summary:# of Prefixes

on chip memory Mbits

Branching Factor

On chip Mem. Pins

Off chip memoryMbits

memAccess(search)

Mem. Size (on chip) mm2

Off chip mem pins

128K 2.6 10 529 7.2 5 40 250Note: Branching factor is the # of branching in internal nodes. The size of the memory scales with the size of data or

# of prefixes. Power dissip. depends on the r/w freq, current & core voltage Considering Faraday Mem. Modules

A 10Kx32 bits single port mem size is 36x1.45 mm2.

Page 39: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

39

Overall Design

Search

Memory

Insertion

Mem. Ctrl

update delete

Update Searchroot addr

root content

MEMCtrl

Port2Addr

Output mem Ctrl

To/From out Mem

CPUInterface

To/From CPU

To/From NP

Page 40: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

40

Search Path

Input

Data[32]

SOA[1]

Cashing

Mem Ctrl

Piping

Addr[32]

Addr[32]

SOA[1]Fo

und[

1]GetLen

RdA

dd[1

9]

Nod

e

Addr[32]

First[1]Compare Next

Roo

t

Addr[32]

Node

Len[Nx6]

Node

Addr[32]

Match[1]

CResult[1]

MemAddr[14]

IpAddr[32]

Out

Mem

Add

r

Dispatch

Addr[32]

Port[8]

LinkAddr[48]

InClk[1]

Prty[1]Pa

ckA

dd[2

9]

Dat

aOut

[32]

To Scheduler

There are data assertion signals between blocks which has not beenshown every where because of space limitation.

To/From Off mem

Page 41: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

41

Search Path Input Module:

Get the packet destination addresses from the parser.

Do parity checking. It has the following input signals

Input data which 32 bits. Start of Address, 1 bit, (SOA) Parity, 1 bit, (prty) Input clock, (InClk)

It gets Data in two clock cycles, first the IP address and then, the packet address in the memory or packet id (cid)

Page 42: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

42

Search Path

Input Module: 29 bits is used for the packet address and

the last 3 bit for the policy, Then, 512 Mbytes can be supported to store the packet before sending them out.

The 2nd clock cycle data format:

The timingInClk

SOA

IpAddr PackAddrOr cid

Packet address Policy31 2 0

Data

Page 43: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

43

Search Path

Piping Module: Pipelines the search process. For new elements from input block does.

For each new IP address doIf found in the hash table

send the packet memory address to dispatcherElse

Enter IP address and the policy into the pipe FIFO

End do,

Page 44: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

44

Search Path

Piping Module: For elements in the FIFO

For the first IP address in FIFO doIf IP address is new then,

assert first signal and send IP address and policy out.Else if next addr is on chip send the next node address to Mem.

Ctrl. Else send the next node address to OffMemCtrl.send to the pipe the IP address and policy.

For the recirculated addressIf the node was leaf then,

Send the longest matching address to OutMemCtrl.Send Policy to Extract port and the packet address to dispatch.

ElsePut the IP address into the FIFOReplace the longest matching prefix address if a new one

found.

Page 45: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

45

Search Path

Piping Module: FIFO . Keep the current information

of IPs.

LMPA = Longest Matching Prefix AddressNew = 1 new , 0 oldIf the packet is new the next address will

be zero and we can read root cash content instead of reading from memory.

The address is off chip if the first, most significant bit is 1, otherwise it on chip.

IP Addr [32]

Port [8] Next Node [19]

LMPA[] New [1]

Page 46: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

46

Search Path

GetLen Module: This module get the length of prefixes. We add 1 to the end of a prefix and then padded with ‘0’s to make it 33 bits.Ex. 11011010 110110101000…0 (33 bits).Then, we should start from right and the first

‘1’ we meet, the rest is the prefix length.GetLen can be implemented as a multiplexer

with case statement (32 case statement) and it can be done in one clock cycle.

Page 47: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

47

Search Path

Compare Module: compare two prefix A and B with lengths L1 and L2.Assume L1>L2 and A[1:L2] is the first L2

bit from A, Then, 1. If A[1:L2] = B A and B match. If A[L2+1]

= 0 A B. Otherwise, A> B.2. If A[1:L2] > B A >B, otherwise A<B.

One of the prefixes here is IP address with length 32.

We assume there are no two identical elements in the tree.

Page 48: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

48

Search Path

Next Module: Get the next node address to read and also the matching prefix and its corresponding port number. It gets two signals for each prefix, Match

and ComResult (compare),Match =1 the prefix match, ComResult = 1 Prefix is bigger.It gets the left address of the first prefix, from

the left, such that its ComResult signal is 1.It compares the matching prefix lengths and

the get the one with the largest length.

Page 49: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

49

Search Path

Dispatch Module: forms the Routing Group Address, RGA, from the port number and send it with packet stored memory address (PSMA) or CID.

RGA is a 64 bit size bit map. The bit correspond to port number is set to 1.

PSMA is dispatched first and Port and DLL address follows.

Cashing Module: keep a cash of IP address and corresponding port.

IP address [32]

Port[8]

Page 50: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

50

Search Path

Cashing Module:The cash is kept as a FIFO and its depth

depends on the technology.Check IP address in FIFO.If the address found, then,

assert found signal.write IP address on top of FIFO if it is not there already.

Else write IP address on top of FIFO

Cashing system always removes the last reference IP address from the cash.

Page 51: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

51

Search ProcessSearch Process

operations: This is for large prefixes (50K &up)cashingcashing

PipingPiping

Off Mem Off Mem

Time

DispatchDispatch

PipingPipingPipingPiping PipingPiping

20 5 8 11 14 16 19

• This operation is for an IP address lookup• Piping is the bottleneck in the system and in ave. take 5 cycles.• Assuming 100 MHZ operation

# of packets = 109/50 = 20 MillionLine speed = 512x20 = 10.24 G for 64 byte packets.

= 256x8x20= 41 G for 256 byte packets.• It is possible to support higher speeds with duplicating pipe.

PipingPiping

rootroot On chip memory nodesOn chip memory nodes Leaf nodeLeaf node

Page 52: Advanced topics in Computer Networks

Univ. of Tehran Adv. topics in Computer Network

52

Output pins

Pin Name Type Number Comment

DataIn (IP Addr) Input 32 IP address, from parser

DataOut(Port&cid)

Output 32 Port and cid, to schedular

DataBus(cpu) In/out 32 CPU data bus

CtrlBus(cpu) In/out 12 CPU control bus

MemData2(Tree) In/out 205 If off chip is used

MemAddr2(Tree) In/out 18 If off chip is used

MemCtrl1(Tree) In/out 8? If off chip is used

Total 340 This value can change around 10 percent