enabling high throughput and virtualization for traffic
TRANSCRIPT
![Page 1: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/1.jpg)
Enabling High Throughput and Virtualization for Traffic Classification
on FPGA
Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering
University of Southern California Los Angeles, CA 90089
![Page 2: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/2.jpg)
Outline
2
• Introduction • Background • Algorithms & Architecture • Evaluation • Conclusion
![Page 3: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/3.jpg)
Outline
3
• Introduction – Routing devices, virtualization
• Background • Algorithms & Architecture • Evaluation • Conclusion
![Page 4: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/4.jpg)
Internet & Routing Devices
4 [1] http://www.kpcb.com/internet-trends
[1] • Internet – Internet users ↑ – network traffic ↑
• Gateway/router/switch – Packet forwarding – Access control – Packet/traffic classification
![Page 5: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/5.jpg)
Trend: Virtual Machines
5
• Hardware Virtualization – Sharing expensive hardware devices – e.g., a virtualized classification engine – Require efficient dynamic updates !
![Page 6: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/6.jpg)
Outline
6
• Introduction • Background
– Traffic Classification – Related works – Motivations
• Algorithms & Architecture • Evaluation • Conclusion
![Page 7: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/7.jpg)
Internet Traffic Classification
7
• Definition – Classifying traffic into application classes – Based on various features extracted from traffic
• e.g., port numbers, packet length, etc. – Extract features from
• Packet headers → accuracy issue • Deep packet inspection → privacy issue
– Major challenges • Performance (e.g., throughput) • Hardware virtualization
![Page 8: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/8.jpg)
Existing Approaches (1)
8
• Port-number-based – Transport-layer port number
• e.g., HTTP: 80, FTP control: 21 – Dynamic port → no longer reliable
• Deep Packet Inspection (DPI) – Slow, privacy issue, etc.
• Heuristic-based
– Large training data, low accuracy
![Page 9: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/9.jpg)
Existing Approaches (2)
9
• Machine Learning (ML)
– Supervised / unsupervised – Extract features – Discretization
• Into unique numbers – Decision-making
• Determine traffic class • e.g., C4.5 decision-tree
Our focus
![Page 10: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/10.jpg)
Existing Approaches (3)
10
• C4.5 decision-tree – Range-trees (discretization) + decision-tree (decision-making) – Not easy for dynamic updates
![Page 11: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/11.jpg)
Motivation
11
• Facilitate Dynamic Updates – Combine range-trees and decision-tree – Architectural support
• Explore Parallel Searches – Search multiple root-to-leaf paths in parallel – Search multiple features in parallel
![Page 12: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/12.jpg)
Outline
12
• Introduction • Background • Algorithms & Architecture
– Converting Tree to Rule Set Table (RST) – Modular Processing Element (PE) – 2-dimensional Pipelined Architecture – Efficient Dynamic Updates
• Evaluation • Conclusion
![Page 13: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/13.jpg)
Converting Decision-tree
13
• Conversion techniques – Each root-to-leaf path ↔ a set of criteria – All the criteria of a path → an entry of RST
𝑁 leaf nodes
Check feature 0 Check feature 1
Check feature 2
Feature 0 Feature 1 Feature 2
Rule
Set
Tab
le
(RST
)
![Page 14: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/14.jpg)
Architecture (1)
14
• Modular Processing Element (PE) – Compare a stride of input against a stride of a range boundary
• 2-dimensional Pipelined Architecture – Propagate packet headers vertically (f_in / f_out) – Propagate partial / final results horizontally
• Snake-like Pipeline – Scan chain for writing data into data memory (wr_in / wr_out)
![Page 15: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/15.jpg)
Architecture (2)
15
Empty PE used for updates
![Page 16: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/16.jpg)
Virtualization
16
• Dynamic Updates – Tree update ↔ deletion, insertion, modification of rules on RST – e.g., a top-down ping-pong update on the first column
![Page 17: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/17.jpg)
Outline
17
• Introduction • Background • Algorithms & Architecture • Evaluation
– Throughput, resource consumption – Comparison with prior works
• Conclusion
![Page 18: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/18.jpg)
Experimental Setup
18
• FPGA – XC7VX1140t FLG1930 -2L – 1100 I/O, 218800 logic slices – 67 Mb BRAM, up to 17 Mb distributed RAM (dual-ported) – Verilog on Vivado Design Suite 2013.4
• Data Sets – Real decision-tree: 42-levels, 96 leaves – Similar synthetic trees – 6 features, 8 application classes
![Page 19: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/19.jpg)
Throughput
19
• Parameterized Design – Best: each PE compares 3-bit strides against 2 boundaries
• Overall Throughput – Sustain 2× throughput of state-of-the-art
Parameters State-of-the-art This work
Feature With 80 160 240 320 80 160 240 320
Max
. no.
of
leav
es 32 512 493 458 421 759 715 701 679
64 437 425 401 374 705 682 653 621
96 399 382 354 305 645 620 609 588
![Page 20: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/20.jpg)
Resource Consumption
20
• No. of used logic slices, I/O pins, etc. – Set max. no. of leaves : 64 – Baseline: two copies of the same architecture
![Page 21: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/21.jpg)
Comparison
21
Approach Algorithm Platform Throughput
w/o
dyn
amic
upd
ates
Este’s SVM Dual Xeon, 2.6 GHz, 24 cores, 48 GB RAM 2 MCPS
Gringoli’s SVM Dual Xeon, 2.6 GHz, 24 cores, 48 GB RAM 7 MCPS
Groleat’s SVM Synthesized on Virtex-5 290 MCPS
Tong’s C4.5 Place-and-routed on Virtex-7 399 MCPS
w/
dyna
mic
up
date
s Jiang’s 𝑘-NN Place-and-routed on Virtex-5 125 MCPS
This work C4.5 Place-and-routed on Virtex-7 645 MCPS
![Page 22: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/22.jpg)
Outline
22
• Introduction • Background • Algorithms & Architecture • Evaluation • Conclusion
![Page 23: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/23.jpg)
Conclusion
23
• Conclusion – Based on a conversion from decision-tree to RST – Parallel searching on FPGA – Achieve high performance & scalability
• Future work – Power, latency, etc. – Self-learning mechanism
![Page 24: Enabling High Throughput and Virtualization for Traffic](https://reader033.vdocuments.us/reader033/viewer/2022051204/6277e85d85ac2066c875adc6/html5/thumbnails/24.jpg)
Questions?
Thanks
http://ganges.usc.edu