advanced topics in computer networks
DESCRIPTION
Advanced topics in Computer Networks. Lecture 9: Tree-based lookup. University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani. Outline. Issues Multiway and Multicolumn search DMP-Tree Some implementation issues. Issues. How to sort prefixes Prefixes as ranges - PowerPoint PPT PresentationTRANSCRIPT
Univ. of Tehran Adv. topics in Computer Network
1
Advanced topicsAdvanced topics inin
Computer Computer NetworksNetworks
University of TehranDept. of EE and Computer Engineering
By:Dr. Nasser Yazdani
Lecture 9: Tree-based lookup
Univ. of Tehran Adv. topics in Computer Network
2
OutlineOutline Issues Multiway and Multicolumn search DMP-Tree Some implementation issues
Univ. of Tehran Adv. topics in Computer Network
3
IssuesIssues How to sort prefixes
Prefixes as ranges Comparing prefixes
Based on length Add extra bits at the end. New definition (DMP-tree)
How to apply tree structures like binary tree or m_way tree to prefixes
Univ. of Tehran Adv. topics in Computer Network
4
Multiway tree lookup. Proposed by G. Varghese and his
students. Consider prefixes as range. First try: Pad 0’s to prefixes in order to
apply binary search tree. consider {1*, 101* and 10101* }prefixes
100000101000101010 101011
101110111110
Binary search endshere.
Should match here
Binary search fail for all of them!.
Univ. of Tehran Adv. topics in Computer Network
5
Multiway tree lookup(cont) Two problem in the previous example
Being Far away from matching prefix Multiple addresses matching different prefixes end up
in the same region. Solution: Prefixes as ranges, Put the end of
range in the table.100000101000101010101011101111111111
We have the explicit ranges.
Search maps to one range only.
101011101110111110
L
L
L
HHH
Univ. of Tehran Adv. topics in Computer Network
6
Multiway tree lookup(cont)100000101000101010101011101111111111
For 101011, we try to find first L which is not followed by H.
For the rest, we can have a stack operation to find the first L.
Problem: Linear search to find L
101011101110111110
L
L
L
HHH
Univ. of Tehran Adv. topics in Computer Network
7
Multiway tree lookup(cont)Solution: Precompute prefixes corresponding to
ranges.
100000101000101010101011101111111111
101011101110111110
L
L
L
HHH 1* matching
prefix.
> =P1)100000 P1 P1P2)101000 P2 P2P3)101010 P3 P3 101011 P2 P3 101111 P1 P2 111111 - P1
Univ. of Tehran Adv. topics in Computer Network
8
DMP-TreeDMP-Tree
Comparing prefixes. Sorting prefixes Binary prefix Tree. M_way prefix tree.
Univ. of Tehran Adv. topics in Computer Network
9
Trie structure Trie or radix tree
Univ. of Tehran Adv. topics in Computer Network
10
Sorting prefixes Question? Why well-known tree structures
cannot be applied to the longest prefix matching problem?
Answer- No a well-known method for sorting. Definition: Assume Aa1a2…an and B=b1b2…
bm to be prefixes of and there a character 1. If n=m, the numerical values of A and B are
compared. 2. If n m (assume n<m), the two substrings a1a2…
an and b1b2…bn are compared. If a1a2…an and b1b2…bn are equal, then, the (n+1)th character of string B is checked. It is considered B>A if bn+1 is before and B A otherwise.
Univ. of Tehran Adv. topics in Computer Network
11
Sorting prefixes (cont) Example- Assume M is Then, BOAT is
smaller than GOAT and SAD is bigger than BALLOON. CAT is considered bigger than CATEGORY since the fourth character in CATEGORY, E, is smaller than M.
Sorting is a function to determine the position of each prefix.
Prefixes of table is sorted as:00010*,0001*,001100*,01001100*,0100110*,010
11,001*,01011*,01*,10*,10110001*,1011001*,10110011*,1011010*,1011*,110*
Univ. of Tehran Adv. topics in Computer Network
12
Binary prefix tree
•Unfortunately, it fails for 101100001000 Why? •Prefixes are ranges and not just a data point in the search space.
Univ. of Tehran Adv. topics in Computer Network
13
Binary prefix tree (cont) Definition: prefixes A and B are disjoint
if none of them is a prefix of other. Definition : prefix A is called enclosure
if there exists at least one element set such that A is a prefix of that element.
We modify the sort structure;1. Each enclosure has a bag to put its data element on
it.2. Sort remaining elements.3. Distribute the bag elements to the right and left
according the sort definition.
4. Apply algorithm recursively.
Univ. of Tehran Adv. topics in Computer Network
14
Binary prefix tree (cont) Example- Prefixes in table 1. First
step.
The second step,
Note- enclosures are in the higher level than the contained elements. (important!)
Univ. of Tehran Adv. topics in Computer Network
15
Binary prefix tree (cont) The final tree structure
Univ. of Tehran Adv. topics in Computer Network
16
Sorting prefixes (cont) Sorting algorithms
Based on bubble sort Based on Radix sort.
Tmp= MinLength(list)for all i in list except tmp do
compare i with tmpif i matches tmp then;
put i in tmp’s bagif i<tmp then
put i in leftList:if i>tmp then
put i in rightist:endforlist = Sort(leftList) Sort(rightList)
Univ. of Tehran Adv. topics in Computer Network
17
M_way prefix tree Problems with the binary prefix tree.
Two way branching. The structure is not dynamic and insertion
may cause problems!.
1. Divide by m after sorting the strings Static m_way tree.
2. Build a dynamic data structure like B-tree.
How to guarantee enclosure to be in the higher level than its contained elements.
Define node splitting and insertion.
Univ. of Tehran Adv. topics in Computer Network
18
M_way prefix tree (Cont) Node splitting: Finding the split point.
1. Take the median if the data elements are disjoint.
2. If there is an enclosure containing other elements, take it as split point.
3. Otherwise, take an element which gives the best splitting result.
Note, this does not guarantee the final tree will be balanced.
Univ. of Tehran Adv. topics in Computer Network
19
M_way prefix tree (Cont)
Insertion:1. If the new element is not an enclosure of
others, find its place and insert in the corresponding leaf, like B-tree.
2. Otherwise, replace the closet element with element and reinsert the replace elements.
3. Resort the resulted subtree, (space division) if necessary.
Building tree is similar to building B-tree.
Univ. of Tehran Adv. topics in Computer Network
20
M-way prefix tree (cont) Example Prefix Abbrv. Prefix Abbrv.
10 - 1101110010 K01 - 10001101 L110 - 11101101 M1011 - 01010110 N0001 - 00100101 O01011 - 100110100 P00010 - 101011011 Q001100 A 11101110 R1011001 B 10110111 S1011010 C 011010 T0100110 D 011011 U01001100 E 011101 V10110011 F 0110010 W10110001 G 101101000 X01011001 H 101101110 Y001011 I 00011101 Z00111010 J 011110110 II
Univ. of Tehran Adv. topics in Computer Network
21
M-way prefix tree (cont) We insert prefixes randomly. The tree uses 5 branching factor (at
most 4 prefixes in each node) Insert 01011, 1011010, 10110001 and
0100110. Then, adding 110 cause overflow. Split node
10110001 (0100110,01011) (1011010, 110)
(all element are disjoint)
Univ. of Tehran Adv. topics in Computer Network
22
M-way prefix tree (cont) Insert 10110011, 1101110010, 00010.
Adding 1011001 causes overflow. 10110001 1011010
(00010,0100110,01011) (1011001,10110011) (110,1101110010)
(case 3 of splitting) Latter adding 1011 cause problem. It is
the case of adding an enclosure. We will have space division.
Univ. of Tehran Adv. topics in Computer Network
23
M-way prefix tree (cont) The final tree
• The tree supersede B-tree or B-tree is a special case of this tree. Then, when data element are relatively disjoint, the height of tree is logMN.
Univ. of Tehran Adv. topics in Computer Network
24
DMP-Tree
0
5
10
15
20
2550
60
70
80
90
100
• BF is Branching factor in the internal nodes.• No. of Data is in1000s.
Max. height No. of Data
Univ. of Tehran Adv. topics in Computer Network
25
DMP-Tree
• Number of prefixes in the right.
Min. memory utilization
0.62
0.64
0.66
0.68
3 4 5 6 7 8 9 10
branching
50K
60K
70K
80K
90K
100K
No. of Data
Univ. of Tehran Adv. topics in Computer Network
26
DMP-Tree
• Height of tree for 100K data prefixes.
0
5
10
15
20
25
3 4 5 6 7 8 9 10
Min
Ave.
Max
Branching
Height
Univ. of Tehran Adv. topics in Computer Network
27
DMP-Tree
Analyzing of results. With increasing BF, Branching Factor,
the height decreases. The result are for the worst case, Max
height, and the ave. case is much less. After BF=9, increasing Branching
Factor does not decrease the max. height.
The results are for the set of prefixes of 50,000-100,000 with lengths from 8 t0 31. The size of actual prefixes in use is around 50,000 and the length is 8-31.
Univ. of Tehran Adv. topics in Computer Network
28
DMP-Tree
Memory utilization:, Mem. Utilization is 0.64%-0.67%
without considering the tree branching overhead.
Mem. Utilization is 0.53%-0.62% with tree branching overhead (pointers).
Without considering branching pointers, the mem. Utilization decreases with increasing the branching factor.
Total mem. Utilization increases with increasing the branching factor.
Univ. of Tehran Adv. topics in Computer Network
29
DMP-Tree
Therefore, The longest matching prefix of a
network can be determined in 5 steps with 9 or more branching factor.
In the worst case, we need at most 2 times of total prefix data size of memory to implement the scheme. For instance, for 50,000 prefixes of 32bit, we need at most 3.2 Mbit of memory.
Univ. of Tehran Adv. topics in Computer Network
30
Overall Design All operations need search first in the
Tree structure. Two search procedures, one for the
longest matching prefix and another for update.
The prefix tree data structure is on the chip.
The Policy table is on the off chip memory.
There is a port to data link layer mapping module.
Univ. of Tehran Adv. topics in Computer Network
31
Tree Nodes Internal nodes.
Each prefix has a left and right pointer which are pointing to left and right subtrees respectively.
We can have N prefixes in each internal node. Then, N+1 is the branching factor.
The bigger N, the faster search time, but the more logic is needed.
Port is the address of the port in the switch to which the packet will be sent.
Internal nodes
Leaf nodes
Addr 1[?]
Prefix 1[33]
Port[?]
Addr 2[?]
Prefix 2[33]
port[?] … Addr
[?]
Branching factor
Univ. of Tehran Adv. topics in Computer Network
32
Tree Nodes Leaf nodes.
There is no left and right subtree pointers. The number of prefixes in the leaf node is
M. The leaf nodes are stored in a off chip
memory to make the scheme scalable to the large number of prefixes.
Prefix 1[33]
port[?]
Prefix 2[33]
port[?] …
Univ. of Tehran Adv. topics in Computer Network
33
Branching Factor What is the best number for N?
(Branching factor) The bigger N, the faster search process. (Fact
1) The bigger N, the more memory pins are and
usually the more mem. Bandwidth is needed (Fact 2).
The bigger N, the more logic we need to process the node (Fact 3).
Simulation result shows The bigger N, the better memory utilization in
the memory. For N 8, the max. height of the tree does
not decrease considerably.
Univ. of Tehran Adv. topics in Computer Network
34
Simulation result Total memory: assuming one memory
block and OC-192.# of Prefixes
required Mem.
Branching Factor
Mem. Pins
Mem BW (G/s) max
Max memAccess
Mem. Size (on chip)mm2
Max heights
64K 5.4 Mbits
15 897 89.7 4 81 5
64K 5.5 Mbits
11 655 65.5 4 82 5
64K 5.4Mbit 9 527 52.7 4 81 5
64K 6 Mbit 6 335 46.9 6 96 7
64K 6.6 Mbit 5 275 44 7 110 8
64K 6.5 Mbit 4 207 62.1 14 112 15
100K 8.3 Mbit 15 897 89.7 4 122 5
100K 8.5 Mbit 11 655 65.5 4 125 5
100K 8.3 9 527 63.24 5 122 6
100K 9 Mbit 6 335 53.6 7 135 8
100K 9.1Mbit 5 275 49.5 8 140 9
100K 9.5 4 207 62.1 14 150 15
30K 2.6 9 527 52.7 4 expected 40 5 expected
Univ. of Tehran Adv. topics in Computer Network
35
Branching Factor It seems any number between 8-16 is
reasonable. But, N=9 gives a better search time, memory size.
Assuming 9 branching factors in the internal node, %50 node utilization and 128K prefixes, we need max. 128K/4.5= 28.5K address. Then, 15 bit address for left and right pointers are more than enough. But, we need more for off chip addressing
The number of switch port are usually limited, around 64, We can assume 256, then 8 bit is enough to address them.
Univ. of Tehran Adv. topics in Computer Network
36
Branching Factor In order to make the internal node
branching and leaf node branching even, M=10.
If we want to read a node at once, we will need 41x10=410 pins which is difficult to support in one chip.
We can divide a node in two and read/write in two clock cycles. This reduce the memory pins to 205 which is affordable.
Univ. of Tehran Adv. topics in Computer Network
37
Memory requirement Prefix tree: Assuming 128K prefixes.
N = 9 (BF) and M=10 (BF in leaves), the majority of prefixes, %80 will be in leaves, assume %65 node utilization,
# of ave prefixes in a leaf node node = 10*0.6 5= 6.5 # of leaf nodes 128Kx%80x2/6.5 = 31.5K and %10
overhead 35 K Total off chip memory = 35K x 205(Mem BW) = 7.2 MbitsThen, we need 16 bits for addressing. 1 bit for
internal/external.# of internal nodes= 128Kx%20/5.8=4.41K and %10
overhead 4.9 K Total on chip memory=4.9Kx529K 2.6Mbits
Port to link address mapping table.For each port corresponding link addressMax. 256 ports, on chip, some mem for indexing
Link Addr
48 bit addr.
Univ. of Tehran Adv. topics in Computer Network
38
Memory requirement In summary:# of Prefixes
on chip memory Mbits
Branching Factor
On chip Mem. Pins
Off chip memoryMbits
memAccess(search)
Mem. Size (on chip) mm2
Off chip mem pins
128K 2.6 10 529 7.2 5 40 250Note: Branching factor is the # of branching in internal nodes. The size of the memory scales with the size of data or
# of prefixes. Power dissip. depends on the r/w freq, current & core voltage Considering Faraday Mem. Modules
A 10Kx32 bits single port mem size is 36x1.45 mm2.
Univ. of Tehran Adv. topics in Computer Network
39
Overall Design
Search
Memory
Insertion
Mem. Ctrl
update delete
Update Searchroot addr
root content
MEMCtrl
Port2Addr
Output mem Ctrl
To/From out Mem
CPUInterface
To/From CPU
To/From NP
Univ. of Tehran Adv. topics in Computer Network
40
Search Path
Input
Data[32]
SOA[1]
Cashing
Mem Ctrl
Piping
Addr[32]
Addr[32]
SOA[1]Fo
und[
1]GetLen
RdA
dd[1
9]
Nod
e
Addr[32]
First[1]Compare Next
Roo
t
Addr[32]
Node
Len[Nx6]
Node
Addr[32]
Match[1]
CResult[1]
MemAddr[14]
IpAddr[32]
Out
Mem
Add
r
Dispatch
Addr[32]
Port[8]
LinkAddr[48]
InClk[1]
Prty[1]Pa
ckA
dd[2
9]
Dat
aOut
[32]
To Scheduler
There are data assertion signals between blocks which has not beenshown every where because of space limitation.
To/From Off mem
Univ. of Tehran Adv. topics in Computer Network
41
Search Path Input Module:
Get the packet destination addresses from the parser.
Do parity checking. It has the following input signals
Input data which 32 bits. Start of Address, 1 bit, (SOA) Parity, 1 bit, (prty) Input clock, (InClk)
It gets Data in two clock cycles, first the IP address and then, the packet address in the memory or packet id (cid)
Univ. of Tehran Adv. topics in Computer Network
42
Search Path
Input Module: 29 bits is used for the packet address and
the last 3 bit for the policy, Then, 512 Mbytes can be supported to store the packet before sending them out.
The 2nd clock cycle data format:
The timingInClk
SOA
IpAddr PackAddrOr cid
Packet address Policy31 2 0
Data
Univ. of Tehran Adv. topics in Computer Network
43
Search Path
Piping Module: Pipelines the search process. For new elements from input block does.
For each new IP address doIf found in the hash table
send the packet memory address to dispatcherElse
Enter IP address and the policy into the pipe FIFO
End do,
Univ. of Tehran Adv. topics in Computer Network
44
Search Path
Piping Module: For elements in the FIFO
For the first IP address in FIFO doIf IP address is new then,
assert first signal and send IP address and policy out.Else if next addr is on chip send the next node address to Mem.
Ctrl. Else send the next node address to OffMemCtrl.send to the pipe the IP address and policy.
For the recirculated addressIf the node was leaf then,
Send the longest matching address to OutMemCtrl.Send Policy to Extract port and the packet address to dispatch.
ElsePut the IP address into the FIFOReplace the longest matching prefix address if a new one
found.
Univ. of Tehran Adv. topics in Computer Network
45
Search Path
Piping Module: FIFO . Keep the current information
of IPs.
LMPA = Longest Matching Prefix AddressNew = 1 new , 0 oldIf the packet is new the next address will
be zero and we can read root cash content instead of reading from memory.
The address is off chip if the first, most significant bit is 1, otherwise it on chip.
IP Addr [32]
Port [8] Next Node [19]
LMPA[] New [1]
Univ. of Tehran Adv. topics in Computer Network
46
Search Path
GetLen Module: This module get the length of prefixes. We add 1 to the end of a prefix and then padded with ‘0’s to make it 33 bits.Ex. 11011010 110110101000…0 (33 bits).Then, we should start from right and the first
‘1’ we meet, the rest is the prefix length.GetLen can be implemented as a multiplexer
with case statement (32 case statement) and it can be done in one clock cycle.
Univ. of Tehran Adv. topics in Computer Network
47
Search Path
Compare Module: compare two prefix A and B with lengths L1 and L2.Assume L1>L2 and A[1:L2] is the first L2
bit from A, Then, 1. If A[1:L2] = B A and B match. If A[L2+1]
= 0 A B. Otherwise, A> B.2. If A[1:L2] > B A >B, otherwise A<B.
One of the prefixes here is IP address with length 32.
We assume there are no two identical elements in the tree.
Univ. of Tehran Adv. topics in Computer Network
48
Search Path
Next Module: Get the next node address to read and also the matching prefix and its corresponding port number. It gets two signals for each prefix, Match
and ComResult (compare),Match =1 the prefix match, ComResult = 1 Prefix is bigger.It gets the left address of the first prefix, from
the left, such that its ComResult signal is 1.It compares the matching prefix lengths and
the get the one with the largest length.
Univ. of Tehran Adv. topics in Computer Network
49
Search Path
Dispatch Module: forms the Routing Group Address, RGA, from the port number and send it with packet stored memory address (PSMA) or CID.
RGA is a 64 bit size bit map. The bit correspond to port number is set to 1.
PSMA is dispatched first and Port and DLL address follows.
Cashing Module: keep a cash of IP address and corresponding port.
IP address [32]
Port[8]
Univ. of Tehran Adv. topics in Computer Network
50
Search Path
Cashing Module:The cash is kept as a FIFO and its depth
depends on the technology.Check IP address in FIFO.If the address found, then,
assert found signal.write IP address on top of FIFO if it is not there already.
Else write IP address on top of FIFO
Cashing system always removes the last reference IP address from the cash.
Univ. of Tehran Adv. topics in Computer Network
51
Search ProcessSearch Process
operations: This is for large prefixes (50K &up)cashingcashing
PipingPiping
Off Mem Off Mem
Time
DispatchDispatch
PipingPipingPipingPiping PipingPiping
20 5 8 11 14 16 19
• This operation is for an IP address lookup• Piping is the bottleneck in the system and in ave. take 5 cycles.• Assuming 100 MHZ operation
# of packets = 109/50 = 20 MillionLine speed = 512x20 = 10.24 G for 64 byte packets.
= 256x8x20= 41 G for 256 byte packets.• It is possible to support higher speeds with duplicating pipe.
PipingPiping
rootroot On chip memory nodesOn chip memory nodes Leaf nodeLeaf node
Univ. of Tehran Adv. topics in Computer Network
52
Output pins
Pin Name Type Number Comment
DataIn (IP Addr) Input 32 IP address, from parser
DataOut(Port&cid)
Output 32 Port and cid, to schedular
DataBus(cpu) In/out 32 CPU data bus
CtrlBus(cpu) In/out 12 CPU control bus
MemData2(Tree) In/out 205 If off chip is used
MemAddr2(Tree) In/out 18 If off chip is used
MemCtrl1(Tree) In/out 8? If off chip is used
Total 340 This value can change around 10 percent