1 Fabio Vitucci - VIII Workshop PisaTel - December, 6th 2005 - SSSUP
Telecommunication Networks Group
Department of Information Engineering - University of Pisa
Ing. Fabio Vitucci
DESIGN AND IMPLEMENTATION OF A MULTI-DIMENSIONAL
PACKET CLASSIFIER FOR NETWORK PROCESSORS
Outline
• Summary of previous activities
• Implementation of classification module
• Programming problems
• Measurements
• Future work
• Conclusions
Summary of previous activities/1
• Detailed analysis of the Intel® IXP2400 Network Processor and the available board (Radisys ENP-2611)
• Choice of a suitable application to implement on NPs: packet classification
• Comparative analysis of many research algorithms
Example rule table:
Source Address   Layer 4 Destination   Layer 4 Protocol   ...   Rule
11.14.2.21       www                   TCP                ...   R1
13.11.23.*       gt 1023               TCP                ...   R2
112.*.*.*        www                   UDP                ...   R3
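The rule table above can be read as a linear search, the simplest algorithm considered in the comparison. The following Python sketch is only an illustration, not the code presented in the talk: it assumes that "www" means destination port 80, that "gt 1023" means a destination port greater than 1023, and that the first matching rule wins. The names `RULES` and `classify` are hypothetical.

```python
import ipaddress

# Rules R1..R3 from the example table.
# Assumptions: "www" = port 80, "gt 1023" = ports 1024..65535.
RULES = [
    ("R1", "11.14.2.21/32", (80, 80), "TCP"),
    ("R2", "13.11.23.0/24", (1024, 65535), "TCP"),
    ("R3", "112.0.0.0/8", (80, 80), "UDP"),
]

def classify(src, dst_port, proto):
    """Return the first matching rule name, or None (linear search, O(N))."""
    addr = ipaddress.ip_address(src)
    for name, prefix, (low, high), rule_proto in RULES:
        if (addr in ipaddress.ip_network(prefix)
                and low <= dst_port <= high
                and proto == rule_proto):
            return name
    return None
```

For instance, `classify("13.11.23.7", 8080, "TCP")` matches R2, while the same source with destination port 80 matches no rule.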
Summary of previous activities/2
Comparative analysis of many research algorithms
Algorithm             Worst-case time   Worst-case storage
Linear Search         O(N)              O(N)
Hierarchical tries    O(W^D)            O(NDW)
Set-pruning tries     O(WD)             O(N^D)
Grid-of-tries         O(W^(D-1))        O(NDW)
Cross-producting      O(DW)             O(N^D)
Area-Based Quadtree   O(NW)             O(W)
FIS-tree              O((L+1)W)         O(L×N^(1+1/L))
RFC                   O(D)              O(N^D)
Bitmap-intersection   O(DW+N/W)         O(DN^2)
HiCuts                O(D)              O(N^D)
Ternary CAMs          O(1)              O(N)
N = number of entries; W = maximum number of bits per level
D = number of fields to be processed; L = number of levels of the data structure
Summary of previous activities/3
Multidimensional Multibit Trie
• Fields:
  – IP Source Address and IP Destination Address
  – Layer 4 Source Port and Destination Port
  – Layer 4 Protocol Type
• Hierarchical trie: one trie per dimension
  – Several levels per dimension
  – A fixed number of bits per level
• Performance parameters:
  – Search speed: 5×O(W/K)
  – Memory accesses: 12
  – Storage complexity: 5×O(2^(K-1)×N×W/K)
[Figure: the five tries: SA Trie, DA Trie, SP Trie, DP Trie, PR Trie]
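To make the per-dimension bounds more concrete, here is a minimal Python sketch of a single multibit trie with a fixed number of bits per level. It is an illustration under stated assumptions, not the data structure of the talk: K is read as the number of bits per level, W as the key width, and the values K = 4 and W = 16 are chosen only to keep the example small. A prefix that ends inside a level is expanded over all the slots it covers, so a prefix ending one bit into a level fills 2^(K-1) slots, and a lookup reads at most W/K nodes.

```python
K = 4    # bits consumed per trie level (assumed value)
W = 16   # key width in bits (assumed value; an IPv4 address would use 32)
MASK = (1 << K) - 1

def new_node():
    # One slot per K-bit value: [rule, bits of the stored prefix, child node]
    return [[None, -1, None] for _ in range(1 << K)]

def chunk(key, level):
    """Return the K bits of key that are examined at the given level."""
    return (key >> (W - (level + 1) * K)) & MASK

def insert(root, value, length, rule):
    """Insert the prefix value/length (value is a W-bit, left-aligned integer)."""
    node, level = root, 0
    while length - level * K > K:          # descend through fully used levels
        slot = node[chunk(value, level)]
        if slot[2] is None:
            slot[2] = new_node()
        node, level = slot[2], level + 1
    rem = length - level * K               # prefix bits left in the last level
    span = 1 << (K - rem)                  # slots covered after expansion
    base = chunk(value, level) & ~(span - 1)
    for i in range(base, base + span):
        if node[i][1] <= rem:              # never overwrite a longer prefix
            node[i][0], node[i][1] = rule, rem

def lookup(root, key):
    """Return (best matching rule, number of nodes read)."""
    node, best, reads = root, None, 0
    for level in range(W // K):
        rule, _, child = node[chunk(key, level)]
        reads += 1
        if rule is not None:
            best = rule
        if child is None:
            break
        node = child
    return best, reads
```

With the prefixes `0xAB00/8` and `0xA000/6` inserted, a lookup of `0xAB12` returns the first and a lookup of `0xA1FF` the second, each after two node reads.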
Summary of previous activities/4
• Main limitation:
  – Memory consumption
  – Rules with unspecified fields (e.g. 131.114.*.*) must be expanded into all possible rules
• Modifications:
  – A level transition in case of wildcards
    • Fewer nodes
    • Sometimes more memory accesses
    • More complexity
• Validation tests with a C simulator
  – Large saving in memory consumption (table in SRAM)
  – Small increase in instruction store size
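One way to picture the wildcard modification is a trie in which every node may carry a "*" branch next to its exact-value branches, so that a rule such as 131.114.*.* needs one node per level instead of one entry per covered address. The Python sketch below is an assumption-laden illustration, not the author's algorithm: it uses one byte per level, nested dictionaries as nodes, and a search that prefers exact branches and falls back to "*" on failure. That fallback is one possible reason for the occasional extra memory accesses mentioned above.

```python
WILDCARD = "*"

def insert(root, pattern, rule):
    """pattern holds one entry per trie level: a value or WILDCARD."""
    node = root
    for part in pattern:
        node = node.setdefault(part, {})
    node["rule"] = rule

def lookup(root, key):
    """Return (matching rule or None, number of nodes read)."""
    reads = 0

    def walk(node, depth):
        nonlocal reads
        reads += 1
        if depth == len(key):
            return node.get("rule")
        # Try the exact branch first, then the wildcard branch.
        for branch in (key[depth], WILDCARD):
            child = node.get(branch)
            if child is not None:
                found = walk(child, depth + 1)
                if found is not None:
                    return found
        return None

    return walk(root, 0), reads
```

With the rules `(131, 114, "*", "*")` and `(131, 114, 5, 7)` stored, the key `(131, 114, 5, 7)` is resolved in 5 node reads, while `(131, 114, 5, 9)` first follows the exact branch for 5, fails, and needs 6 reads to reach the wildcard rule.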
Implementation of module/1
[Figure: block diagram of the Intel IPv4 Forwarder. Labels recoverable from the slide:]
• Blocks: MSF, Packet_RX, Eth_Decap_Classify / IPv4 Fwd / L2_Validate (three instances), Packet_QM, Scheduler, Sphy_mphy4_tx, MSF
• Microengine labels: uE 0:0, uE 0:1, uE 0:2, uE 0:3, uE 1:0, uE 1:1, uE 1:3
• Rings:
  – ETH_RX_TO_IPV4_SRC_RING: ID = 4 (x 4), BASE_ADDRESS = 0, SIZE = 1024
  – IPV4_TO_QM_SCR_RING (alias QM_RING_IN, QM_RING_IN_0, ENQ_RING_NUMBER): ID = 5 (x 4), BASE_ADDRESS = 4096, SIZE = 512
  – SCHEDULER_TO_QM_SCR_RING (alias QM_RING_IN_1, DEQ_RING_NUMBER): ID = 6 (x 4), BASE_ADDRESS = 6144, SIZE = 512
  – QM_TO_PACKET_TX_SCR_RING_0 (alias PACKET_TX_IN_0): ID = 7 (x 4), BASE_ADDRESS = 8192, SIZE = 128
• Other links: NN_RING, Reflector Bus
Implementation of module/1
[Figure: the Intel IPv4 Forwarder block diagram of the previous slide, with a Classify block added to each of the three Eth_Decap_Classify / IPv4 Fwd / L2_Validate instances.]
Implementation of module/2
• Functions of XScale (implemented in C language):
  – Receiving classification rules
  – Building the multidimensional trie from the received rules to calculate the number of nodes per level and the SRAM addresses
  – Rebuilding the multidimensional trie to place the data in SRAM at the precalculated addresses
• Functions of Microengines:
  – Receiving packets
  – Extracting the relevant fields from packet headers
  – Finding matching rules using the data structure in SRAM
  – Modifying the TOS field
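The two-pass construction on the XScale side can be sketched as follows. Two passes are useful because a node's pointer to its child can only be written once the child's address is known, and that address depends on the size of every node placed before it. The Python code below is a hypothetical illustration, not the author's C code: the node layout (one header word followed by value/pointer pairs), the level-by-level placement and all function names are assumptions.

```python
def build_trie(rules):
    """Build a nested-dict trie; the last level maps a value to a rule number."""
    root = {}
    for pattern, rule_id in rules:
        node = root
        for value in pattern[:-1]:
            node = node.setdefault(value, {})
        node[pattern[-1]] = rule_id
    return root

def layout(root, depth):
    """Pass 1: collect the nodes of each level and assign word addresses."""
    levels = [[root]]
    for _ in range(depth - 1):
        levels.append([c for node in levels[-1] for c in node.values()])
    address, next_free = {}, 0
    for nodes in levels:
        for node in nodes:
            address[id(node)] = next_free
            next_free += 1 + 2 * len(node)   # header + (value, pointer) pairs
    return levels, address, next_free

def write_table(levels, address, size):
    """Pass 2: write every node at its precalculated address."""
    sram = [0] * size
    for depth, nodes in enumerate(levels):
        last = depth == len(levels) - 1
        for node in nodes:
            base = address[id(node)]
            sram[base] = len(node)
            pairs = sorted(node.items(), key=lambda kv: kv[0])
            for i, (value, target) in enumerate(pairs):
                sram[base + 1 + 2 * i] = value
                sram[base + 2 + 2 * i] = target if last else address[id(target)]
    return sram

def build_table(rules, depth):
    """Return (flat word table, number of nodes per level)."""
    levels, address, size = layout(build_trie(rules), depth)
    return write_table(levels, address, size), [len(n) for n in levels]
```

For the three two-level rules `((1, 2), 10)`, `((1, 3), 11)` and `((4, 2), 12)`, the first pass finds one node on the first level and two on the second, and the second pass produces a 13-word table in which the root points to word addresses 5 and 10.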
Implementation of module/3
[Figure: SRAM Data Table, organized in long words. Node layouts recoverable from the slide:]
• index of node, *, followed by indexes of 2nd-level nodes
• index of node, *, followed by (value of field, index of next node) pairs
• index of node, *, index of next node, minimum value, maximum value
• index of node, *, followed by (value of field, number of rule) pairs
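The node layouts in the SRAM table suggest two kinds of comparison: an exact match against stored field values, and a range match against a minimum and a maximum (which fits port conditions such as "gt 1023"). The following Python sketch shows one possible reading. The function names are hypothetical, and treating the "*" entry as a fallback branch is an assumption rather than something the slide states.

```python
def match_value_node(pairs, wildcard_next, field):
    """Node holding (value of field, index of next node) pairs plus a '*' entry."""
    for value, next_index in pairs:
        if value == field:
            return next_index
    return wildcard_next          # None when the node has no '*' branch

def match_range_node(minimum, maximum, next_index, wildcard_next, field):
    """Node holding a [minimum, maximum] interval plus a '*' entry."""
    if minimum <= field <= maximum:
        return next_index
    return wildcard_next
```

Under this reading, "gt 1023" would be stored as minimum 1024 and maximum 65535, so `match_range_node(1024, 65535, 7, None, 8080)` continues to node 7, while port 80 falls through to the wildcard entry.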
Implementation of module/4
• Functions of µ-engines (implemented in µ-code assembler):
  – Receiving packets
  – Extracting the relevant fields from packet headers
  – Finding matching rules using the data structure in SRAM
  – Modifying the TOS field
• Number of added cycles: 1600
  – 50 = memory register initialization
  – 180 = reading the first node
  – 150 × 2 = reading the port nodes
  – 145 × 7 = reading the other nodes
  – 15 = final matching
  – 40 = writing the TOS field
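The cycle breakdown can be checked with a few lines of Python. The figures are the ones listed on the slide; the dictionary keys and variable names are only labels chosen for this sketch.

```python
# Added cycles per packet, as listed on the slide.
CYCLES = {
    "memory register initialization": 50,
    "reading the first node": 180,
    "reading the port nodes": 150 * 2,
    "reading the other nodes": 145 * 7,
    "final matching": 15,
    "writing the TOS field": 40,
}

total = sum(CYCLES.values())
node_reads = (CYCLES["reading the first node"]
              + CYCLES["reading the port nodes"]
              + CYCLES["reading the other nodes"])

print(total)                         # total added cycles
print(round(node_reads / total, 3))  # share spent reading trie nodes
```

The terms add up to the stated 1600 cycles, and the ten node reads account for 1495 of them, which is why the SRAM accesses dominate the cost of the classifier.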
Programming problems/1
• Main problems:
  – Number of SRAM accesses
  – Rate of SRAM accesses
[Figure: the Intel IPv4 Forwarder block diagram with the added Classify blocks, as in Implementation of module/1.]
Programming problems/2: Multithreaded Programming
We want to reduce the idle time
[Figure: execution timeline of threads 0 to 7 on one µ-engine. Legend: running thread, context swap, idle thread, idle µe, µe control, memory access latency.]
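A simple textbook model shows why multithreading reduces the idle time. Suppose every thread alternates a burst of computation with one memory access and yields the µ-engine while it waits; with enough threads, the memory latency of one thread is covered by the computation of the others. The Python sketch below ignores the cost of context swaps and any contention on the memory controller, and the example values (20 cycles of computation, 140 cycles of latency) are illustrative assumptions, not measurements from the talk.

```python
import math

def utilization(threads, compute, latency):
    """Fraction of time the µ-engine is busy, ignoring context-swap cost."""
    return min(1.0, threads * compute / (compute + latency))

def threads_to_hide(compute, latency):
    """Smallest number of threads that keeps the µ-engine always busy."""
    return math.ceil((compute + latency) / compute)
```

With these example values a single thread keeps the µ-engine busy 12.5% of the time, four threads 50%, and eight threads are enough to hide the latency completely.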
Programming problems/3: Stalling
[Figure: two execution timelines of threads 0 to 3. Legend: running thread, context swap, idle thread, idle µe, µe control, memory access latency.]
• Decrease the number of active threads per µ-engine
Programming problems/4
• Filling
[Figure: execution timeline of threads 0 to 3. Legend: running thread, context swap, idle thread, idle µe, µe control, memory access latency.]
• Consolidate adjacent memory accesses
[Figure: second execution timeline of threads 0 to 3.]
Measurements/1
[Figure: test setup. Labels: AdTech AX4000; Cross-Compiler (XScale programming); Serial Cable; Developers' Workbench (Microengines programming).]
Measurements/2
[Figure: AdTech AX4000]
Measurements/3
• Max packet rate: 2033000 pkt/s (0 lost packets)
• Number of supported rules: 10000
• Performance independent of the number of rules
• A fundamental feature: robustness
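The measured packet rate can be related to the 1600 added cycles with a short back-of-the-envelope calculation. The Python sketch below rests on two assumptions that the slides do not state: a µ-engine clock of 600 MHz (the nominal IXP2400 figure) and the reading of the 1600 cycles as elapsed time per packet, memory waits included.

```python
MAX_RATE_PPS = 2_033_000   # measured maximum packet rate (from the slide)
ADDED_CYCLES = 1600        # added cycles per packet (from the slides)
CLOCK_HZ = 600e6           # assumption: nominal IXP2400 µ-engine clock

interarrival_ns = 1e9 / MAX_RATE_PPS
cycles_between_packets = CLOCK_HZ / MAX_RATE_PPS
packets_in_flight = ADDED_CYCLES / cycles_between_packets

print(round(interarrival_ns, 1))         # time between two packets, in ns
print(round(cycles_between_packets, 1))  # clock cycles between two packets
print(round(packets_in_flight, 1))       # packets being classified at once
```

Under these assumptions a packet arrives roughly every 492 ns, or every 295 cycles, so more than five packets have to be in classification at the same time. This is consistent with spreading the work over several threads and several µ-engines.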
Measurements/4
• Packet delay
[Figure: packet delay measurements. Values shown: 35 μsec, 100 μsec, 1130 μsec.]
Future Work: Resources/Link Scheduler
[Figure: planned block diagram. Labels recoverable from the slide: MSF, Packet_RX, Classifier, ResourceScheduler, LinkScheduler, Packet_TX, Scratchpad Memory; microengines uE 0:0, uE 0:1, uE 0:2, uE 0:3, uE 1:0, uE 1:1, uE 1:2, uE 1:3; repeated processing blocks Eth_Decap / Classify / IPv4 Fwd / L2_Validate / FB.]
Conclusions
• Analysed the Intel® IXP2400 hardware architecture
• Selected a suitable packet classification algorithm for the IXP2400
• Modified the algorithm to exploit the properties of our hardware
• Built a C simulator to test the new version
• Implemented the XScale functions in C (building the rule table)
• Implemented the µ-engine functions in µ-code (finding the matching rule)
• Analysed multithreaded programming
• Studied stalling, filling, and other phenomena
• Tested the functionality and performance of the classifier
• Characteristics: 1600 added cycles, 2 Mpkt/s, 10000 supported rules, scalability, robustness in case of congestion