ENTS689L: Packet Processing and Switching
Classification Engines
Vahid Tabatabaee
Fall 2007
References
Pankaj Gupta, “Lookups and Classification” presentation, lecture notes of EE384Y: Packet Switch Architecture, course of Prof. Nick McKeown, Stanford University, available online at http://www.stanford.edu/class/ee384y/
Pankaj Gupta and Nick McKeown, “Algorithms for Packet Classification,” IEEE Network, March 2001.
Panos C. Lekkas, Network Processors: Architectures, Protocols, and Platforms, McGraw-Hill.
Two General Classification Problems
Lookup and Classification: mainly used in the simple packet routing/switching context. It identifies the correct output port, channel, or interface to which the packet should be forwarded. This decision is based on the destination address.
Deep Packet Classification: a packet must be distinguished among several others. The decision is based on several internal bit fields of variable length or format.
Deep Packet Classification
Distinguished: different processing awaits each packet after it is singled out. These different types of processing correspond to flows.
Several: multiple rules are applied simultaneously.
Internal: the bits may be buried deeper inside the packet rather than conveniently located at a fixed position in the header.
Variable length or format: the fields are not as straightforward as 32-bit addresses. They can represent ranges of values and can be of variable length, such as Uniform Resource Locators (URLs).
Algorithms and Data Structures to Support Lookup and Forwarding
Binary Search Trees
In computer science, a binary search tree (BST) is a binary tree with the following properties:
Each node has a value, and a total order is defined on these values.
The left subtree of a node contains only values less than the node's value.
The right subtree of a node contains only values greater than or equal to the node's value.
Lookup time is O(log N) for a balanced tree of N entries, independent of the address length. (Definition from Wikipedia, the free encyclopedia.)
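The following is a minimal Python sketch of exact-match lookup in a BST that follows the ordering rule above: smaller keys go left, and greater-or-equal keys go right. The keys are assumed to be 48-bit MAC addresses held as integers, and the port names are hypothetical. The tree does not rebalance itself, so the O(log N) bound only holds while it stays roughly balanced. A sorted insertion order would degrade lookup to O(N).

```python
from typing import Optional


class Node:
    """One BST node: a key, its associated value, and two subtrees."""

    def __init__(self, key: int, value: str) -> None:
        self.key = key
        self.value = value
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Insert key: smaller keys go left, greater-or-equal keys go right."""
    new = Node(key, value)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right


def search(root: Optional[Node], key: int) -> Optional[str]:
    """Exact-match search: one key comparison per tree level."""
    node = root
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None
```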
Binary Search Tries
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are strings.
Lookup is fast: looking up a key of length m takes O(m) time in the worst case, independent of the table size.
Tries can require less space when they contain a large number of short strings, because the keys are not stored explicitly and nodes are shared between keys. Storage is O(NW).
Tries help with longest-prefix matching, where we wish to find the key sharing the longest possible prefix with a given key efficiently.
From Wikipedia, the free encyclopedia
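Below is a small character trie in Python that shows the two properties listed above. Keys sharing a prefix share nodes, and a lookup costs one step per key character. It also supports longest-prefix search. The stored words are hypothetical examples, and the dictionary-of-children layout is one common choice among several.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_key = False  # True if a stored key ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str) -> None:
        node = self.root
        for ch in key:
            # Keys with a common prefix reuse the same nodes.
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def contains(self, key: str) -> bool:
        """Exact lookup in O(m) steps for a key of length m."""
        node = self.root
        for ch in key:
            nxt = node.children.get(ch)
            if nxt is None:
                return False
            node = nxt
        return node.is_key

    def longest_prefix_of(self, query: str) -> Optional[str]:
        """Longest stored key that is a prefix of query, or None."""
        node = self.root
        best = "" if node.is_key else None
        for i, ch in enumerate(query):
            nxt = node.children.get(ch)
            if nxt is None:
                break
            node = nxt
            if node.is_key:
                best = query[: i + 1]
        return best
```

For example, after inserting "to", "tea", "ten" and "tent", the call `longest_prefix_of("tennis")` stops after "ten" because no stored key continues with a second "n".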
Tries for Exact Matches in Ethernet Switches
We do not need to examine one bit at a time; we can trade memory for search time. A pointer value of 0 means no child. Storage is O(NW), where N is the number of entries and W is their width in bits.
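16-ary search trie
[Figure: each node holds 16 (nibble, pointer) entries, from (0000, ptr) to (1111, ptr). An entry such as (0000, 0) marks a missing child. Example keys: 000011110000 and 111111111111.]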
Source: http://www.stanford.edu/class/ee384x/
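The following Python sketch shows a 16-ary exact-match trie for 48-bit Ethernet addresses. Each node is an array of 16 child pointers indexed by one 4-bit nibble of the key, and `None` stands in for the slide's 0 pointer. Every lookup therefore takes 48/4 = 12 node visits. The port labels are hypothetical.

```python
from typing import List, Optional

W, K = 48, 4           # key width and stride in bits
FANOUT = 1 << K        # 16 children per node


class Node:
    __slots__ = ("child", "value")

    def __init__(self) -> None:
        self.child: List[Optional["Node"]] = [None] * FANOUT  # None = "0" pointer
        self.value: Optional[str] = None                      # set on the last level


def digits(key: int) -> List[int]:
    """Split a W-bit key into W/K nibbles, most significant first."""
    return [(key >> s) & (FANOUT - 1) for s in range(W - K, -1, -K)]


def insert(root: Node, key: int, port: str) -> None:
    node = root
    for d in digits(key):
        if node.child[d] is None:
            node.child[d] = Node()
        node = node.child[d]
    node.value = port


def lookup(root: Node, key: int) -> Optional[str]:
    node: Optional[Node] = root
    for d in digits(key):          # exactly W/K = 12 memory references
        node = node.child[d]
        if node is None:
            return None
    return node.value
```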
Trade-off Between Speed and Memory Size
As the degree increases, more and more pointers are 0 (wasted).
Degree of tree   # Memory references   # Nodes (x10^6)   Total memory (Mbytes)   Fraction wasted (%)
2                48                    1.09              4.3                     49
4                24                    0.53              4.3                     73
8                16                    0.35              5.6                     86
16               12                    0.25              8.3                     93
64               8                     0.17              21                      98
256              6                     0.12              64                      99.5
Table produced from 2^15 randomly generated 48-bit addresses.
Source: http://www.stanford.edu/class/ee384x/
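The trend in the table can be explored with a short Python calculation that needs no actual pointer arrays. At depth d, a 2^k-ary trie has one node per distinct (d·k)-bit prefix of the stored keys. Each node allocates 2^k pointers, and each child (or leaf entry) uses exactly one of them. This sketch counts nodes and wasted pointers only. The table's byte totals also depend on the assumed pointer size, which is not modelled here, and the random sample (seed and generator) is an arbitrary choice.

```python
import random
from typing import Iterable, Tuple

W = 48  # key width in bits (48-bit Ethernet addresses, as in the table)


def trie_stats(keys: Iterable[int], k: int) -> Tuple[int, int, float]:
    """Return (memory references, internal nodes, fraction of null pointers)
    for a 2**k-ary exact-match trie holding the given W-bit keys."""
    assert W % k == 0, "stride must divide the key width"
    keys = list(keys)
    levels = W // k
    # distinct[d] = number of distinct (d*k)-bit prefixes = nodes at depth d
    distinct = [len({key >> (W - d * k) for key in keys}) for d in range(levels + 1)]
    nodes = sum(distinct[:levels])    # depths 0 .. levels-1 hold pointer arrays
    pointers = nodes * (1 << k)       # every node allocates 2**k pointers
    used = sum(distinct[1:])          # each child or leaf entry uses one pointer
    return levels, nodes, 1 - used / pointers


if __name__ == "__main__":
    random.seed(1)
    sample = [random.getrandbits(W) for _ in range(2 ** 15)]
    for k in (1, 2, 3, 4, 6, 8):
        refs, nodes, wasted = trie_stats(sample, k)
        print(f"degree {2 ** k:3d}: {refs:2d} refs, "
              f"{nodes / 1e6:.2f}M nodes, {100 * wasted:.1f}% wasted")
```

With a single key the numbers are easy to check by hand. A binary trie then needs 48 nodes and 96 pointers, of which 48 are used, so half are wasted.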
Tries for Longest Prefix Match
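Prefix   Value    Next hop
P1       111*     H1
P2       10*      H2
P3       1010*    H3
P4       10101    H4
[Figure: binary trie (nodes A to H) built from P1 to P4. Each prefix is stored at the node where its bit string ends. Trie node fields: next-hop-ptr (if prefix), left-ptr, right-ptr.]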
Tries for Longest Prefix Match
Example: lookup of 10111 in the same trie. The search follows bits 1, 0, 1 and stops because the node for 101 has no child for the fourth bit (1). The last prefix passed on the way down is P2 (10*), so the result is next hop H2.
Tries for Longest Prefix Match
Example: adding P5 = 1110*. Starting from the node for 111 (P1), the insertion follows bit 0 and creates one new node, I, which stores P5.
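A minimal Python sketch of the binary trie from these slides follows. It uses the example prefixes P1 to P4, the lookup of 10111, and the insertion of P5 = 1110*, with the next hop for P5 assumed to be a hypothetical "H5". Prefixes and addresses are written as bit strings for readability, and each node holds two child pointers and an optional next hop, matching the trie-node layout above.

```python
from typing import List, Optional


class TrieNode:
    __slots__ = ("child", "next_hop")

    def __init__(self) -> None:
        self.child: List[Optional["TrieNode"]] = [None, None]  # left-ptr, right-ptr
        self.next_hop: Optional[str] = None                    # set only if a prefix ends here


def insert(root: TrieNode, prefix: str, hop: str) -> None:
    """O(W) update: walk or create one node per prefix bit."""
    node = root
    for bit in prefix:
        i = int(bit)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.next_hop = hop


def lookup(root: TrieNode, addr: str) -> Optional[str]:
    """Longest-prefix match: remember the last next hop seen on the way down."""
    node: Optional[TrieNode] = root
    best = root.next_hop
    for bit in addr:
        node = node.child[int(bit)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best
```

Usage with the slide's table: after inserting P1 to P4, `lookup(root, "10111")` returns "H2". `lookup(root, "11101")` returns "H1" until P5 is inserted, and "H5" afterwards.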
Radix Trie
For W-bit prefixes and N routes:
Lookup complexity: O(W)
Storage complexity: O(NW)
Update complexity: O(W)
Advantages: simplicity; extensible to wider fields and larger tables.
Disadvantages: waste of memory; worst-case lookup is slow.
Leaf Pushing Technique
Leaf pushing reduces the amount of information stored in each table entry.
The best match information is pushed to the leaf nodes. Each table entry contains either a pointer or next hop information.
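[Figure: leaf-pushed version of the example trie for P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4). Each trie node has two fields, "left-ptr or next-hop" and "right-ptr or next-hop". P2 is pushed down into the empty slots below 10*, and P1, P3 and P4 end up in leaf slots.]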
[Figure: the leaf-pushed trie shown beside the original binary trie (nodes A to H) for P1 to P4.]
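The leaf-pushing step can be sketched in Python as a recursive pass over an ordinary binary trie. Each node passes its best-matching next hop down. Every slot of the pushed trie then holds either a child node or a next hop (possibly `None` for "no route"), and a leaf collapses into its parent's slot. This sketch rebuilds the whole pushed trie from scratch. The names `build`, `leaf_push` and `PushedNode` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple, Union


class TrieNode:
    """Ordinary binary trie node (before leaf pushing)."""

    def __init__(self) -> None:
        self.child: List[Optional["TrieNode"]] = [None, None]
        self.hop: Optional[str] = None


class PushedNode:
    """Leaf-pushed node: each slot holds a child PushedNode OR a next hop."""

    def __init__(self) -> None:
        self.slot: List[Union["PushedNode", str, None]] = [None, None]


def build(prefixes: Iterable[Tuple[str, str]]) -> TrieNode:
    root = TrieNode()
    for prefix, hop in prefixes:
        node = root
        for bit in prefix:
            i = int(bit)
            if node.child[i] is None:
                node.child[i] = TrieNode()
            node = node.child[i]
        node.hop = hop
    return root


def leaf_push(node: TrieNode, inherited: Optional[str] = None):
    best = node.hop if node.hop is not None else inherited
    if node.child[0] is None and node.child[1] is None:
        return best                       # a leaf collapses into its parent's slot
    out = PushedNode()
    for i in (0, 1):
        c = node.child[i]
        # Empty slots receive the best match known so far (the pushed leaf).
        out.slot[i] = leaf_push(c, best) if c is not None else best
    return out


def lookup(root, addr: str) -> Optional[str]:
    entry = root
    for bit in addr:
        if not isinstance(entry, PushedNode):
            break                         # reached a next-hop slot
        entry = entry.slot[int(bit)]
    return None if isinstance(entry, PushedNode) else entry
```

Because the best match now sits in the slot where the search ends, lookup no longer tracks the last prefix seen on the way down. The cost is that an update near the root may rewrite many slots.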
Incremental Rebuilding with Leaf Pushing
Information changes at a node close to the root can potentially change a large number of leaves.
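Example: add P5 = 1*.
[Figure: the leaf-pushed trie before and after the update. Because 1* sits just below the root, P5 must be pushed into every slot below node 1 that previously held no next hop, for example the slot next to P1.]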
Multi-bit Tries
Faster search, larger memory.
Binary trie: depth = W, degree = 2, stride = 1 bit.
Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits.
Prefix Expansion with Multi-bit Tries
If the stride is k bits, prefixes whose lengths are not a multiple of k need to be expanded.
Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k-1).
E.g., k = 2:
Prefix   Expanded prefixes
0*       00*, 01*
11*      11*
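Prefix expansion amounts to padding each prefix with every possible bit pattern up to the next multiple of the stride. The Python sketch below does this for bit-string prefixes. The padding length is at most k - 1 bits, which is where the 2^(k-1) bound comes from. The function name `expand` is hypothetical.

```python
from typing import List


def expand(prefix: str, k: int) -> List[str]:
    """Expand a bit-string prefix to the next multiple of the stride k.

    A prefix of length L becomes 2**pad prefixes of length L + pad, where
    pad = (-L) mod k, so pad <= k - 1 and at most 2**(k-1) prefixes result.
    """
    pad = -len(prefix) % k
    if pad == 0:
        return [prefix]
    return [prefix + format(i, f"0{pad}b") for i in range(1 << pad)]
```

With k = 2, `expand("0", 2)` gives `["00", "01"]` and `expand("11", 2)` gives `["11"]`, as in the table.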
Example 4-ary Trie
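[Figure: 4-ary trie for the example prefixes after expansion to even lengths. P1 = 111* becomes P11 and P12, and P4 = 10101 becomes P41 and P42, while P2 = 10* and P3 = 1010* keep their lengths. A four-ary trie node holds next-hop-ptr (if prefix) and four pointers: ptr00, ptr01, ptr10, ptr11.]
Lookup 10111
Prefixes: P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4).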
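The 4-ary trie can be sketched in Python by combining prefix expansion with a node that has four entries. Each entry holds a child pointer and an optional next hop. The sketch makes several assumptions:

- Addresses are a multiple of K = 2 bits (6 bits here), so any trailing partial chunk is ignored.
- Prefixes are non-empty.
- When two expansions land in the same entry, the one from the longer original prefix wins.

```python
from typing import List, Optional, Tuple

K = 2                # stride in bits (4-ary trie)
FANOUT = 1 << K


def expand(prefix: str, k: int = K) -> List[str]:
    """Pad a prefix with all bit patterns up to the next multiple of k."""
    pad = -len(prefix) % k
    if pad == 0:
        return [prefix]
    return [prefix + format(i, f"0{pad}b") for i in range(1 << pad)]


class Node:
    def __init__(self) -> None:
        # Per entry: (original prefix length, next hop), or None.
        self.hop: List[Optional[Tuple[int, str]]] = [None] * FANOUT
        self.ptr: List[Optional["Node"]] = [None] * FANOUT   # ptr00 .. ptr11


def insert(root: Node, prefix: str, hop: str) -> None:
    """Insert a non-empty prefix after expanding it to a multiple of K bits."""
    for p in expand(prefix):
        chunks = [int(p[i:i + K], 2) for i in range(0, len(p), K)]
        node = root
        for c in chunks[:-1]:
            if node.ptr[c] is None:
                node.ptr[c] = Node()
            node = node.ptr[c]
        last = chunks[-1]
        current = node.hop[last]
        # An expanded entry must not overwrite one from a longer original prefix.
        if current is None or current[0] <= len(prefix):
            node.hop[last] = (len(prefix), hop)


def lookup(root: Node, addr: str) -> Optional[str]:
    """Longest-prefix match, consuming K bits per memory reference."""
    node: Optional[Node] = root
    best = None
    for i in range(0, len(addr) - K + 1, K):
        c = int(addr[i:i + K], 2)
        if node.hop[c] is not None:
            best = node.hop[c][1]
        node = node.ptr[c]
        if node is None:
            break
    return best
```

With P1 to P4 inserted, `lookup(root, "101011")` reaches the expanded entry for P4 after three memory references and returns "H4". The binary trie needs six references for the same lookup.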
Memory expansion in Multi-bit Tries
Replication of next-hop pointers (more leaf nodes).
Greater number of unused (null) pointers in a node: 2^k children, not only 2.
Time ~ W/k; Storage ~ (NW/k) × 2^(k-1).
Generalization: Different Strides at different levels.
Examples for 32-bit addresses: 16-8-8 split, 4-10-10-8 split, 24-8 split, 21-3-8 split.
Deep Packet Classification
Checking Multiple Fields
Motivation: Desire for Additional Services
[Figure: ISP1, ISP2 and ISP3 connected through a NAP and router E1, which has interfaces X, Y and Z.]
Service                  Example
Differentiated Service   Ensure that traffic from ISP2 is given higher priority than traffic from ISP3.
Packet Filtering         Deny all web traffic from ISP3 at interface X.
Policy-based routing     Ensure that all web traffic from ISP2 is sent via interface Z.
Other examples: accounting and billing, rate-limiting, etc.
Special Processing Requires Identification of Flows
All packets of a flow obey a pre-defined rule and are processed similarly by the router
E.g. a flow = (src-IP-address, dst-IP-address), or a flow = (dst-IP-prefix, protocol) etc.
Router needs to identify the flow of every incoming packet and then perform appropriate special processing based on negotiated service agreements
Flow-aware Router: Basic Architectural Components
Control: routing, resource reservation, admission control, SLAs
Datapath (per-packet processing): routing lookup, packet classification, special processing, switching, scheduling
Multi-field Packet Classification
Packet Classification: Find the action associated with the highest priority rule matching an incoming packet header.
Rule     Field 1 (L3-DA)   Field 2 (L3-SA)   …   Field k (L4-PROT)   Action
Rule 1   5.3.40.0/21       2.13.8.11/32      …   UDP                 A1
Rule 2   5.168.3.0/24      152.133.0.0/16    …   TCP                 A2
…        …                 …                 …   …                   …
Rule N   5.168.0.0/16      152.0.0.0/8       …   ANY                 AN
Example: packet (5.168.3.32, 152.133.171.71, …, TCP)
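The definition can be made concrete with a short Python sketch that uses the standard `ipaddress` module and the three rules shown. Two assumptions apply. Rules are listed in priority order, so the first match is the highest-priority match. The fields hidden behind "…" are left out.

```python
import ipaddress
from typing import Optional, Tuple

# (name, L3-DA prefix, L3-SA prefix, L4 protocol, action), highest priority first.
RULES = [
    ("Rule 1", "5.3.40.0/21", "2.13.8.11/32", "UDP", "A1"),
    ("Rule 2", "5.168.3.0/24", "152.133.0.0/16", "TCP", "A2"),
    ("Rule N", "5.168.0.0/16", "152.0.0.0/8", "ANY", "AN"),
]


def classify(da: str, sa: str, proto: str) -> Optional[Tuple[str, str]]:
    """Return (rule, action) of the highest-priority matching rule."""
    da_addr, sa_addr = ipaddress.ip_address(da), ipaddress.ip_address(sa)
    for name, r_da, r_sa, r_proto, action in RULES:
        if (da_addr in ipaddress.ip_network(r_da)
                and sa_addr in ipaddress.ip_network(r_sa)
                and r_proto in ("ANY", proto)):
            return name, action
    return None
```

For the example packet, `classify("5.168.3.32", "152.133.171.71", "TCP")` returns `("Rule 2", "A2")`. Rule N also matches, but it has lower priority.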
Example 4D Classifier
Rule   L3-DA (address/mask)             L3-SA (address/mask)              L4-Destination   L4-PROT   Action
R1     152.163.190.69/255.255.255.255   152.163.80.11/255.255.255.255     *                *         Deny
R2     152.168.3/255.255.255            152.163.200.157/255.255.255.255   eq www           udp       Deny
R3     152.168.3/255.255.255            152.163.200.157/255.255.255.255   range 20-21      udp       Permit
R4     152.168.3/255.255.255            152.163.200.157/255.255.255.255   eq www           tcp       Deny
R5     *                                *                                 *                *         Deny
Example Classification Results
Packet   L3-DA            L3-SA             L4-DP   L4-PROT   Rule, Action
P1       152.163.190.69   152.163.80.11     www     tcp       R1, Deny
P2       152.168.3.21     152.163.200.157   www     udp       R2, Deny
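These results can be reproduced with a linear-search sketch of the 4D classifier in Python. Several interpretations are assumptions rather than part of the slide:

- "www" means port 80.
- "152.168.3/255.255.255" means 152.168.3.0/24.
- A "*" in an address column means 0.0.0.0/0.
- Rules are checked in the order R1 to R5, so the first match wins.

```python
import ipaddress
from typing import Optional, Tuple

WWW = 80     # assumption: "www" = port 80
ANY = None   # wildcard for the port and protocol columns

# (rule, L3-DA, L3-SA, L4 destination port range, L4 protocol, action)
RULES = [
    ("R1", "152.163.190.69/32", "152.163.80.11/32", ANY, ANY, "Deny"),
    ("R2", "152.168.3.0/24", "152.163.200.157/32", (WWW, WWW), "udp", "Deny"),
    ("R3", "152.168.3.0/24", "152.163.200.157/32", (20, 21), "udp", "Permit"),
    ("R4", "152.168.3.0/24", "152.163.200.157/32", (WWW, WWW), "tcp", "Deny"),
    ("R5", "0.0.0.0/0", "0.0.0.0/0", ANY, ANY, "Deny"),
]


def matches(rule, da: str, sa: str, dport: int, proto: str) -> bool:
    _, r_da, r_sa, r_port, r_proto, _ = rule
    return (ipaddress.ip_address(da) in ipaddress.ip_network(r_da)
            and ipaddress.ip_address(sa) in ipaddress.ip_network(r_sa)
            and (r_port is ANY or r_port[0] <= dport <= r_port[1])  # "eq" or "range"
            and (r_proto is ANY or r_proto == proto))


def classify(da: str, sa: str, dport: int, proto: str) -> Optional[Tuple[str, str]]:
    for rule in RULES:            # linear search in priority order
        if matches(rule, da, sa, dport, proto):
            return rule[0], rule[-1]
    return None
```

P1 matches R1 and P2 matches R2, as in the table. A UDP packet to port 21 on the same hosts as P2 falls through to R3 and is permitted.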
Geometric Interpretation
[Figure: rules R1 to R7 drawn as rectangles in the plane spanned by Dimension 1 and Dimension 2; packets P1 and P2 are points.]
Packet classification problem: Find the highest priority rectangle containing an incoming point
Metrics for Classification Algorithms
Speed
Storage requirements
Ability to handle large classifiers
Low preprocessing time
Update time
Scalability in the number of header fields
Flexibility in rule specification
Linear Search
Keep the rules in a linked list: O(N) storage, O(N) lookup time, O(1) update complexity.
TCAMs (Recap)
Advantages
Extensible to multiple fields
Fast: 6-8 ns today (133-150 million searches per second), going to 250 Msps
Simple to understand and use
Disadvantages
Inflexible: range-to-prefix blowup
Power: ~15-20 W @ 100 Msps
Cost: $200-$250 for ~2 MByte
Density: largest available in 2006 is ~2 MB, i.e., 128K x 128 (can be cascaded)
Tough memory soft-error problem
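The range-to-prefix blowup can be seen in a short Python sketch. The sketch assumes a TCAM entry is a (value, care-mask) pair and that the first matching entry wins. A port range is split greedily into aligned power-of-two blocks, and each block becomes one prefix and so one TCAM entry. The helper names are hypothetical. The worst case for a W-bit field is the range 1 to 2^W - 2, which needs 2W - 2 prefixes, or 30 entries for a 16-bit port.

```python
from typing import List, Optional, Sequence, Tuple


def range_to_prefixes(lo: int, hi: int, width: int) -> List[Tuple[int, int]]:
    """Split [lo, hi] into aligned power-of-two blocks,
    returned as (value, prefix_length) pairs."""
    out = []
    while lo <= hi:
        size = lo & -lo if lo else 1 << width   # largest block aligned at lo
        while size > hi - lo + 1:               # shrink until it fits in the range
            size >>= 1
        out.append((lo, width - (size.bit_length() - 1)))
        lo += size
    return out


def to_ternary(value: int, plen: int, width: int) -> Tuple[int, int]:
    """TCAM entry as (value, care_mask): the top plen bits are 'care' bits."""
    mask = ((1 << width) - 1) ^ ((1 << (width - plen)) - 1)
    return value & mask, mask


def tcam_match(entries: Sequence[Tuple[int, int]], key: int) -> Optional[int]:
    """Index of the first (highest-priority) matching entry, or None."""
    for i, (value, mask) in enumerate(entries):
        if (key & mask) == value:
            return i
    return None
```

The range 20-21 from the 4D classifier example fits in a single entry (`[(20, 15)]`), while `range_to_prefixes(1, 65534, 16)` produces 30 entries.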
Example Classifier
Rule Destination Address
Source Address
R1 0* 10*
R2 0* 01*
R3 0* 1*
R4 00* 1*
R5 00* 11*
R6 10* 1*
R7 * 00*
Hierarchical Tries
O(NW) memory, O(W^2) lookup.
[Figure: hierarchical trie for the example classifier (R1 to R7). A trie on dimension DA has prefix nodes that point to tries on dimension SA, which hold the rules.]
Search (000, 010)
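Below is a Python sketch of a two-dimensional hierarchical trie built from the example classifier. Each DA-trie node where some rule's DA prefix ends points to an SA trie holding those rules. A search must visit every DA prefix on the address path and backtrack into each of their SA tries, which is the source of the O(W^2) lookup. Two assumptions apply: list order gives rule priority (R1 highest), and fields are 3 bits wide.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# (name, DA prefix, SA prefix); list order = priority, R1 highest.
RULES = [("R1", "0", "10"), ("R2", "0", "01"), ("R3", "0", "1"), ("R4", "00", "1"),
         ("R5", "00", "11"), ("R6", "10", "1"), ("R7", "", "00")]


class Node:
    def __init__(self) -> None:
        self.child: Dict[str, "Node"] = {}
        self.rules: List[Tuple[int, str]] = []    # SA-trie nodes: (priority, name)
        self.next_dim: Optional["Node"] = None    # DA-trie nodes: root of an SA trie


def walk_create(root: Node, prefix: str) -> Node:
    node = root
    for bit in prefix:
        node = node.child.setdefault(bit, Node())
    return node


def path(root: Node, bits: str) -> Iterator[Node]:
    """Yield every trie node on the path spelled by bits, root first."""
    node = root
    yield node
    for bit in bits:
        nxt = node.child.get(bit)
        if nxt is None:
            return
        node = nxt
        yield node


def build(rules) -> Node:
    da_root = Node()
    for prio, (name, da, sa) in enumerate(rules):
        da_node = walk_create(da_root, da)
        if da_node.next_dim is None:
            da_node.next_dim = Node()
        walk_create(da_node.next_dim, sa).rules.append((prio, name))
    return da_root


def classify(da_root: Node, da: str, sa: str) -> Optional[str]:
    best = None
    for da_node in path(da_root, da):                 # every matching DA prefix...
        if da_node.next_dim is None:
            continue
        for sa_node in path(da_node.next_dim, sa):    # ...backtrack into its SA trie
            for rule in sa_node.rules:
                if best is None or rule < best:
                    best = rule
    return None if best is None else best[1]
```

For the slide's search, `classify(build(RULES), "000", "010")` returns "R2". It inspects the SA tries hanging off the DA prefixes *, 0* and 00*.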
Set-pruning Tries [Tsuchiya, Sri98]
Reduced query time obtained by replicating rules to eliminate traversals
[Figure: set-pruning trie for the example classifier. A trie on dimension DA has SA tries that also contain the rules of ancestor DA prefixes, so rules such as R7 are replicated in several SA tries.]
O(WN^2) memory, O(2W) lookup.
Search (000,010)
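Set-pruning can be sketched in Python by changing how the SA tries are filled. Each DA prefix's SA trie receives copies of every rule whose DA prefix is an ancestor of (or equal to) it. A search then walks the DA trie to the deepest matching prefix that carries rules and makes a single SA-trie walk with no backtracking. The rule list, its priority order (R1 highest) and the 3-bit fields are the same assumptions as in the hierarchical-trie example.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# (name, DA prefix, SA prefix); list order = priority, R1 highest.
RULES = [("R1", "0", "10"), ("R2", "0", "01"), ("R3", "0", "1"), ("R4", "00", "1"),
         ("R5", "00", "11"), ("R6", "10", "1"), ("R7", "", "00")]


class Node:
    def __init__(self) -> None:
        self.child: Dict[str, "Node"] = {}
        self.rules: List[Tuple[int, str]] = []    # SA-trie nodes: (priority, name)
        self.next_dim: Optional["Node"] = None    # DA-trie nodes: root of an SA trie


def walk_create(root: Node, prefix: str) -> Node:
    node = root
    for bit in prefix:
        node = node.child.setdefault(bit, Node())
    return node


def path(root: Node, bits: str) -> Iterator[Node]:
    """Yield every trie node on the path spelled by bits, root first."""
    node = root
    yield node
    for bit in bits:
        nxt = node.child.get(bit)
        if nxt is None:
            return
        node = nxt
        yield node


def build_set_pruning(rules) -> Node:
    da_root = Node()
    for da_prefix in {da for _, da, _ in rules}:
        da_node = walk_create(da_root, da_prefix)
        da_node.next_dim = Node()
        # Replicate every rule whose DA prefix is a prefix of this DA prefix.
        for prio, (name, da, sa) in enumerate(rules):
            if da_prefix.startswith(da):
                walk_create(da_node.next_dim, sa).rules.append((prio, name))
    return da_root


def classify(da_root: Node, da: str, sa: str) -> Optional[str]:
    deepest = None
    for da_node in path(da_root, da):
        if da_node.next_dim is not None:
            deepest = da_node                 # longest matching DA prefix with rules
    if deepest is None:
        return None
    best = None
    for sa_node in path(deepest.next_dim, sa):    # one SA-trie walk, no backtracking
        for rule in sa_node.rules:
            if best is None or rule < best:
                best = rule
    return None if best is None else best[1]
```

The replicated copies are what turn O(W^2) backtracking into a single O(2W) walk, at the price of the extra memory.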
Recursive Flow Classification
It views classification as mapping S bits onto T bits:
S bits are the concatenation of all header fields.
T bits represent the classification outcome.
It breaks the mapping task into multiple stages; at each stage, one set of values is mapped to a smaller set.
RFC Algorithm
1. In the first phase, fields of the packet header are split up into multiple chunks that are used to index into multiple memories in parallel. The contents of each memory are chosen so that the result of the lookup is narrower than the index.
2. In subsequent phases, memories are indexed using the results from earlier phases.
3. In the final phase, the memory yields the action.
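The following is a toy two-phase version of this idea in Python, applied to the 7-rule DA/SA classifier used for the trie examples. In phase 0, each field is one chunk. Each value of the chunk is mapped to an equivalence-class ID, and two values share an ID when exactly the same rules match them, so the result is narrower than the index. Phase 1 indexes a single table with the pair of class IDs to obtain the best rule. Real RFC splits the header into more chunks and uses more phases. The 3-bit fields, the rule priority order (R1 highest) and the function names are assumptions.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Tuple

# (name, DA prefix, SA prefix); list order = priority, R1 highest.
RULES = [("R1", "0", "10"), ("R2", "0", "01"), ("R3", "0", "1"), ("R4", "00", "1"),
         ("R5", "00", "11"), ("R6", "10", "1"), ("R7", "", "00")]
W = 3  # assumption: each field is 3 bits wide


def phase0(field: int) -> Tuple[List[int], List[FrozenSet[int]]]:
    """Chunk table for one field: value -> equivalence-class ID."""
    ids: Dict[FrozenSet[int], int] = {}
    table = []
    for v in range(1 << W):
        bits = format(v, f"0{W}b")
        matching = frozenset(i for i, rule in enumerate(RULES)
                             if bits.startswith(rule[1 + field]))
        table.append(ids.setdefault(matching, len(ids)))
    classes: List[FrozenSet[int]] = [frozenset()] * len(ids)
    for matching, eq_id in ids.items():
        classes[eq_id] = matching
    return table, classes


def build():
    da_table, da_classes = phase0(0)
    sa_table, sa_classes = phase0(1)
    # Phase 1 table, indexed by (DA class, SA class): name of the best common rule.
    final: List[Optional[str]] = []
    for da_cls, sa_cls in product(da_classes, sa_classes):
        common = da_cls & sa_cls
        final.append(RULES[min(common)][0] if common else None)
    return da_table, sa_table, len(sa_classes), final


def classify(tables, da: str, sa: str) -> Optional[str]:
    da_table, sa_table, n_sa, final = tables
    i = da_table[int(da, 2)]     # phase 0: independent (parallel) lookups
    j = sa_table[int(sa, 2)]
    return final[i * n_sa + j]   # phase 1: indexed by the phase-0 results
```

Here each 8-entry phase-0 table collapses to 4 classes, so the phase-1 table has only 16 entries, and every lookup costs exactly three memory accesses.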
RFC Performance
RFC has been shown to perform classification at 31.25 Mpps using a three-stage pipeline.
It requires two 4 Mb SRAMs and four banks of 64 Mb SDRAM at a 125 MHz clock rate. It is estimated to handle 15,000 rules at 10 Gbps.
Classification: What’s Used Out There?
Majority of hardware platforms: TCAMs
High performance, cost, power; deterministic worst-case
Some others: modifications of RFC
Low speed, low cost DRAM-based, heuristic
Works well in software platforms
Some others: HyperCuts/HiCuts
Others: nothing/linear search/simulated-parallel-search, etc.
Lookup: What’s Used Out There?
Overwhelming majority of routers: modifications of multi-bit tries (h/w-optimized trie algorithms)
DRAM (sometimes SRAM) based, large number of routes (>0.25M)
Parallelism required for speed/storage becomes an issue
Others: mostly TCAM-based
Allows sharing the same TCAM for both lookup and classification
Packet Classification: References
F. Baboescu and G. Varghese, “Scalable packet classification,” Proc. ACM Sigcomm 2001.
[Lak98] T.V. Lakshman and D. Stiliadis, “High speed policy based packet forwarding using efficient multi-dimensional range matching,” Sigcomm 1998, pp. 191-202.
K. Lakshminarayanan, A. Rangarajan and S. Venkatachary, “Algorithms for advanced packet classification with Ternary CAMs,” Sigcomm 2005.
[Sri98] V. Srinivasan, S. Suri, G. Varghese and M. Waldvogel, “Fast and scalable layer 4 switching,” Sigcomm 1998, pp. 203-214. [Grid-of-tries, crossproducting]
V. Srinivasan, G. Varghese and S. Suri, “Fast packet classification using tuple space search,” Sigcomm 1999, pp. 135-146.
P. Gupta and N. McKeown, “Packet classification using hierarchical intelligent cuttings,” Hot Interconnects VII, 1999.
[Gupta99] P. Gupta and N. McKeown, “Packet classification on multiple fields,” Sigcomm 1999, pp. 147-160. [RFC]
P. Gupta, “Algorithms for routing lookups and packet classification,” PhD Thesis, Ch. 1 and 4, Dec 2000, available at http://yuba.stanford.edu/~pankaj/phd.html [Background and introduction to Classification]
P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Network, March/April 2001, vol. 15, no. 2, pp. 24-32.
S. Singh, F. Baboescu, G. Varghese and J. Wang, “Packet classification using multidimensional cutting,” Proc. ACM Sigcomm 2003. [HyperCuts]
S. Iyer, R.R. Kompella and A. Shelat, “ClassiPI: An architecture for fast and flexible packet classification,” IEEE Network, March/April 2001, vol. 15, no. 2, pp. 33-41.
TCAM vendors: netlogicmicro.com, idt.com