FPGA-based 10G performance tester for HW OpenFlow switches
Why do HW OpenFlow switches need a (data plane) performance test?
• There are some “Conformance Test” activities
• RYU Certification
• ONF PlugFest
• How about a “Performance Test”?
• Without one, you may fall into a pitfall:
• “It works, but it is too slow”
Typical story: here is a flow entry on the OpenFlow HW switch…
• There are 2 ways to handle it: in hardware (ASIC) or in software (CPU).
• Functionally they are the same, but latency differs by a factor of 1000 (μsec vs. msec).
• This is not always documented (vendors basically have no reason to confess it).
• The Features Reply is not enough.
• It may depend on the firmware version and the NOS of the switch.
• There is no easy, straightforward way to find out.
• Imagine what happens when you update your firmware, NOS or OF App…
A real example? Here it is.
[Diagram: OpenFlow Controller, Pica8 3290 and Spirent; labels: port#1, port#2, port#3, Dev. 2, Dev. 3]
1. Spirent sends 64-byte packets.
2. Pica8 has a flow entry to forward them from #2 to #3.
3. Spirent checks the latency.
Pica8 + Spirent experiment
In a simple, basic configuration
• Just forwarding from here to there (see the flow entry below)
• Forwarding succeeded at wire speed (1 Gbps).
• Latency: Avg. 4.26, Min 4.13, Max 4.28 (μsec)
Example of the flow entry:
cookie=0x0, duration=1379.649s, table=0, n_packets=0, n_bytes=0, idle_age=1379, in_port=1,dl_src=00:10:94:00:00:05 actions=output:2
Looks fine!
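To read such a flow entry at a glance, it helps to separate the counters from the match fields and the actions. The following is a minimal sketch, assuming the single-line `key=value` layout shown above; `parse_flow_entry` and `STAT_KEYS` are hypothetical names introduced here, and actions that contain commas inside parentheses are not handled.

```python
# Keys that are bookkeeping/counters rather than match fields (assumed set,
# taken from the keys visible in the entry quoted on the slide).
STAT_KEYS = {"cookie", "duration", "table", "n_packets", "n_bytes", "idle_age"}

def parse_flow_entry(line):
    """Split one flow entry line into (stats, match, actions)."""
    head, _, action_text = line.strip().partition(" actions=")
    stats, match = {}, {}
    for token in head.split(","):
        token = token.strip()
        if not token:
            continue
        key, sep, value = token.partition("=")
        target = stats if key in STAT_KEYS else match
        # Bare keywords such as "ip" carry no value; store them as True.
        target[key] = value if sep else True
    actions = [a for a in action_text.split(",") if a]
    return stats, match, actions

ENTRY = ("cookie=0x0, duration=1379.649s, table=0, n_packets=0, n_bytes=0, "
         "idle_age=1379, in_port=1,dl_src=00:10:94:00:00:05 actions=output:2")

stats, match, actions = parse_flow_entry(ENTRY)
print("match  :", match)
print("actions:", actions)
```

For this entry the match part is just the in-port and the source MAC, and the only action is a plain output.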
Good! and Boom! results
• Good results
• MAC rewrite: no additional latency, no throughput degradation.
• ToS rewrite: same as above
• Bad and unexpected result
• IP rewrite: deadly slow. Avg. 140 ms, Min 0.8 ms, Max 350 ms (boom!)
• Throughput more than 1000 times lower
Example of the flow entry:
cookie=0x0, duration=3.402s, table=0, n_packets=0, n_bytes=0, idle_age=3, ip,in_port=1,nw_src=192.85.1.5 actions=mod_nw_dst:192.85.1.16,output:2
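To put the Pica8 numbers side by side, the latency of the IP rewrite case can be divided by the latency of the simple forwarding case. This is only a small arithmetic sketch; the values are the ones quoted on the slides, and the names `HW_FORWARD_NS`, `IP_REWRITE_NS` and `slowdown` are made up for illustration.

```python
# Pica8 latency figures quoted on the slides, converted to nanoseconds.
HW_FORWARD_NS = {"avg": 4_260, "min": 4_130, "max": 4_280}  # simple forwarding
IP_REWRITE_NS = {"avg": 140_000_000, "min": 800_000, "max": 350_000_000}  # mod_nw_dst

def slowdown(slow_ns, fast_ns):
    """Return how many times slower each statistic is (slow / fast)."""
    return {key: slow_ns[key] / fast_ns[key] for key in fast_ns}

ratios = slowdown(IP_REWRITE_NS, HW_FORWARD_NS)
for key in ("avg", "min", "max"):
    print(f"{key}: about {ratios[key]:,.0f}x slower")
```

The average and maximum ratios come out in the tens of thousands, while the minimum ratio is a few hundred, so even the best-case packet is far slower than hardware forwarding.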
Features Reply?
• It looks as if only VLAN and MAC handling are available.
• In fact…
• ToS modification runs in hardware.
• IP modification falls back to software.
• You never know unless you try.
root@PicOS-OVS# ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000000000000111
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS STP ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP SET_DL_SRC SET_DL_DST ENQUEUE ………
You can test it yourself: several options
• Buy Ixia or Spirent: very accurate but super expensive, simply overkill
• PC + 10G NIC + software: cheap but inaccurate
• Not easy to tune and calibrate well enough. It can be done, but not by everyone.
• FPGA + 10G I/F: not super cheap, but accuracy is guaranteed
• Time-stamped by hardware, at clock-cycle resolution (currently 8 ns).
• All time-sensitive components run independently, with the PC as the mothership.
• Easy setup: just install the board and run the controller app.
My project: FPGA-based solution
• Xilinx Kintex-7, 125 MHz
• 10G (SFP+) x4
• Hardware TCP/UDP implementation
• PCIe gen2 x1 (just for control)
• Enough external memory
• 4x10G ports
• No need to use SAS this time
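The 8 ns timestamp resolution follows directly from the 125 MHz clock: one cycle lasts 1 / 125 MHz = 8 ns. The sketch below converts a pair of cycle-counter timestamps into a latency in nanoseconds. It is an illustration only: the 32-bit free-running counter and the name `latency_ns` are assumptions, not details taken from the board.

```python
CLOCK_HZ = 125_000_000                      # FPGA clock from the slide
NS_PER_CYCLE = 1_000_000_000 // CLOCK_HZ    # 8 ns per cycle

def latency_ns(tx_cycle, rx_cycle, counter_bits=32):
    """Latency between two cycle-counter timestamps, in nanoseconds.

    counter_bits is a hypothetical counter width; the modulo handles one
    wrap-around of the counter between transmit and receive.
    """
    elapsed_cycles = (rx_cycle - tx_cycle) % (1 << counter_bits)
    return elapsed_cycles * NS_PER_CYCLE

print(NS_PER_CYCLE)                 # resolution of one timestamp tick
print(latency_ns(1000, 1341))       # 341 cycles between send and receive
```

Every reported latency is therefore a multiple of 8 ns; 341 cycles, for example, correspond to 2728 ns.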
System Structure
• Packet generation, sending, receiving and counting are all done on the FPGA board.
[Diagram: System Structure. Components: Operator's Browser running the System Console (JavaScript App); Host PC with the monitor controller and the OF Controller (RYU + custom App); FPGA + 10G I/Fs; Target Switch. Link labels: test scenario sent by HTTP POST (REST API), which includes the packet generation pattern and the flow entries configuration; load; set packet pattern to FPGA; OpenFlow 1.x protocol; 10G Ethernet (send packets & observe latency); detail data; result output.]
Experiment #1 : 10G/1G stable forwarding measurement
Match pattern: In-port X
Action: IP DST mod
Figures 1 and 2 show "ASIC"-powered results. Every switch has a different distribution, but all forwarding is done within microseconds. Switch A took around 2.7 μs, with a very steep distribution. Switch C took around 9 μs because it is a 1G switch.
[Histogram: packets (y-axis, 0 to 200) vs. latency in ns (x-axis, 2728 to 2880 in 8 ns steps)]
Figure 1. Switch A (10G) latency distribution
[Histogram: packets (y-axis, 0 to 120) vs. latency in ns (x-axis, 8448 to 9984 in 128 ns steps)]
Figure 2. Switch B (1G) latency distribution.
(as a proof of the accuracy)
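Distributions like Figures 1 and 2 can be produced by counting packets per fixed-width latency bin. The following is a minimal sketch of that binning; the bin widths of 8 ns and 128 ns are read off the axis ticks of the two figures, the sample values are made-up inputs rather than measurements, and `latency_histogram` is a hypothetical helper name.

```python
from collections import Counter

def latency_histogram(latencies_ns, bin_width_ns):
    """Count packets per latency bin; each key is the lower edge of a bin."""
    return Counter((value // bin_width_ns) * bin_width_ns for value in latencies_ns)

# Made-up sample latencies in ns, for illustration only.
samples = [2728, 2730, 2736, 2743, 2744]

fine = latency_histogram(samples, 8)      # one bin per 8 ns clock cycle
coarse = latency_histogram(samples, 128)  # wider bins for a wider spread

for edge in sorted(fine):
    print(f"{edge} ns: {fine[edge]} packets")
```

With hardware timestamps, the narrowest meaningful bin is one clock cycle, which is why an 8 ns bin width is usable at all.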
Experiment #2 : Unexpected slow forwarding (software fallback)
Match pattern: IP SRC
Action: IP DST mod
Just adding an IP SRC match made the switch do a "software fallback" (Fig. 3). Latency is around 350-500 μs, and 2.7% of the packets still lie outside the graph, far to the right. The slowest one took over 10 ms. In this case, forwarding is 1000 times slower.
[Histogram: packets (y-axis, 0 to 120) vs. latency in ns (x-axis, 362496 to 792576 in 10240 ns steps)]
Figure 3. Switch B (1G) latency distribution in the software fallback situation
(the distribution continues further to the right...)
In this case, the maximum throughput is only 16 Kpps. With 100-byte packets, that means 12.8 Mbps.
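The 12.8 Mbps figure is simply packets per second times packet size in bits. A short sketch of the arithmetic, assuming 1 Kpps = 1000 packets per second and ignoring Ethernet preamble and inter-frame gap; `throughput_mbps` is a name introduced here for illustration.

```python
def throughput_mbps(packets_per_second, packet_bytes):
    """Payload throughput in Mbit/s (preamble and inter-frame gap ignored)."""
    return packets_per_second * packet_bytes * 8 / 1_000_000

MAX_PPS = 16_000      # 16 Kpps, from the slide
PACKET_BYTES = 100    # packet length, from the slide

mbps = throughput_mbps(MAX_PPS, PACKET_BYTES)
print(f"{mbps} Mbps, {mbps / 1000:.2%} of a 1 Gbps link")
```

On a 1G switch this is only a little over one percent of the link capacity.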
Experiment #3 : When does it become slow?
In the case of Switch B, IP matching and IP modification can each be handled by the ASIC separately. But if you specify both at once, forwarding becomes slow. BUT IP matching and ToS modification can be specified both at once!
Totally unexpected.... (sigh)
Use Case #1: Hunt the “killer entry”, an unexpectedly slow flow entry you may have
• OF Apps set flow entries according to their needs, but they don’t care about performance.
• When your service suffers performance degradation, you need to make sure that no “killer entry” exists.
[Diagram: flow entries are collected (with counter info) from the OF switches in your OpenFlow network and set on a testbed switch. The Performance Tester sets a packet pattern on its packet generator, sends packets, observes the latency and visualizes the result.]
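Once each flow entry has a measured latency, hunting for a killer entry can be as simple as comparing it with a hardware-forwarding baseline. The sketch below shows one possible check; `find_killer_entries` and the factor of 100 are hypothetical choices. The inputs reuse the Pica8 averages quoted earlier, and the MAC and ToS rewrite values are assumed equal to the baseline because those cases showed no additional latency.

```python
def find_killer_entries(avg_latency_ns, baseline_ns, factor=100):
    """Return the names of entries slower than factor x the baseline."""
    limit = baseline_ns * factor
    return sorted(name for name, latency in avg_latency_ns.items() if latency > limit)

BASELINE_NS = 4_260  # average latency of plain forwarding on the Pica8

measured = {
    "mac-rewrite": 4_260,        # assumed: "no additional latency"
    "tos-rewrite": 4_260,        # assumed: "same as above"
    "ip-rewrite": 140_000_000,   # 140 ms average from the slide
}

print(find_killer_entries(measured, BASELINE_NS))
```

A generous factor keeps normal variation between ASIC-handled entries from raising false alarms, since a software fallback is slower by orders of magnitude.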
Use Case #2: “Before & after” comparison for an update of the SW driver or NOS
• You need to check for performance degradation BEFORE you apply the update to the REAL network.
• For the future, you need to see what happens if the flow entries and the traffic double.
[Diagram: flow entries are collected from the OF switches in your OpenFlow network and set on a testbed switch. The Performance Tester (packet pattern, packet generator, send packets, observe latency) tests and records result X with flow entries X before the update and result Y with flow entries Y after the update; the two results are then compared.]
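The "compare" step can be sketched as a per-entry ratio between the recorded results of the two runs. This is only an illustration: `compare_runs`, the 10% tolerance and the entry names are hypothetical, and the numbers are made-up placeholders, not measurements.

```python
def compare_runs(before_ns, after_ns, tolerance=1.10):
    """Return {entry: after/before ratio} for entries that got slower.

    Only entries present in both runs are compared; tolerance is the
    largest ratio still treated as "unchanged" (an assumed value).
    """
    regressions = {}
    for name in sorted(before_ns.keys() & after_ns.keys()):
        ratio = after_ns[name] / before_ns[name]
        if ratio > tolerance:
            regressions[name] = round(ratio, 2)
    return regressions

# Made-up average latencies in ns, for illustration only.
result_x = {"entry-1": 4_000, "entry-2": 4_000, "entry-3": 5_000}    # before the update
result_y = {"entry-1": 4_000, "entry-2": 400_000, "entry-3": 5_200}  # after the update

print(compare_runs(result_x, result_y))
```

Running the same packet pattern and flow entries for both runs is what makes the ratio meaningful.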