Robert Olsson: experiments & experiences with fib lookup...
TRANSCRIPT
What we hear/got
dst cache overflow reports RCU related
mistuned, misunderstood etc.
fib_lookup complaints what to expect
BSD comparisons. Radix-tree ToS/semantic questionable
fib_hash considered bad
Getting forward :)
“Infrastructure” for test & development
stats to understand what happens
tools and setups to study
Preroute patches w. Jamal 2004, pktgen DoS, scripts w. routing table
steady Linux API work to prepare to plugin new algos. Most from DaveM.
So much research. Still so little usable for Linux.
FIB overview
FIB vs. dst hash performance
fib_hash
fib_hlist
fib_hash2
fib_trie
classifier lookup?
unified lookup?
fib_hlist performance
[Chart: fib_hlist vs. fib_hash; value axis 0-750; categories: dst cache, /24, rDoS 6 r, rDoS 123kr]
Note! Zero for fib_hlist :) Still decent for many apps.
fib_hash2
Varghese-inspired, use what we've got
2^24 hash lookup w. sorted hlist. Makes /24 entries of plens 1-23.
/0 special case. Huge... TABLE_LOCAL with a few entries
Idea was to test performance with the fastest algo we could think of.
Not for embedded system etc? :-)
Reduced, it became fib_hlist
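The layout described on this slide can be sketched roughly as follows. This is a toy model, not the fib_hash2 code: the class name `FibHash2Sketch` and its methods are hypothetical, and a dict stands in for the 2^24-slot table so the example stays small. Prefixes shorter than /24 are expanded into every /24 slot they cover, each slot keeps its entries sorted longest prefix first, and /0 is kept aside as a special case.

```python
import ipaddress


class FibHash2Sketch:
    """Toy model: one bucket per /24, entries sorted longest prefix first."""

    def __init__(self):
        self.buckets = {}    # top 24 address bits -> [(plen, network_int, nexthop)]
        self.default = None  # /0 is handled as a special case

    def insert(self, cidr, nexthop):
        net = ipaddress.ip_network(cidr)
        plen, base = net.prefixlen, int(net.network_address)
        if plen == 0:
            self.default = nexthop
            return
        first = base >> 8
        # A prefix shorter than /24 is expanded into every /24 slot it covers.
        count = 1 << (24 - plen) if plen < 24 else 1
        for slot in range(first, first + count):
            chain = self.buckets.setdefault(slot, [])
            chain.append((plen, base, nexthop))
            chain.sort(key=lambda entry: entry[0], reverse=True)

    def lookup(self, addr):
        a = int(ipaddress.ip_address(addr))
        # One indexed access, then a short walk over the sorted chain.
        for plen, base, nexthop in self.buckets.get(a >> 8, ()):
            if (a >> (32 - plen)) == (base >> (32 - plen)):
                return nexthop
        return self.default
```

The expansion is what makes short prefixes expensive to insert and the table large, which fits the "not for embedded systems" remark.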
fib_hash2/route cache compare
[Chart: route cache vs. FIB lookup vs. FIB lookup DoS; value axis 0-750]
fib_trie
First trie. In theory variable key length, 32, 128 bits etc
Algo for dynamic trie written in Java. Memory leaks and stack handling were problems.
Also prefix matching based on fib_semantic_match
Cisco CEF has fixed 256 children: 8-8-8-8 or 16-8-8 (GSR)
LC-trie child size is dynamic; 2-12 bits seen
Need to be verified. New netlink call to do fib_lookup
Can be improved...
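For contrast with the dynamic child size of the LC-trie, a fixed 8-8-8-8 stride trie can be sketched as below. This only illustrates the fixed-stride idea with 256 children per node; it is neither Cisco's implementation nor fib_trie, and the names `Node`, `insert` and `lookup` are made up for the example. A prefix that ends inside a node is expanded over all the slots it covers.

```python
import ipaddress


class Node:
    """One trie level: 256 slots, each with an optional route and child."""

    def __init__(self):
        self.child = [None] * 256  # next-level nodes
        self.hop = [None] * 256    # (plen, nexthop) for prefixes ending here


def insert(root, cidr, nexthop):
    net = ipaddress.ip_network(cidr)
    plen = net.prefixlen
    octets = net.network_address.packed
    node = root
    for level in range(4):
        b = octets[level]
        end = 8 * (level + 1)
        if plen <= end:
            # Prefix ends in this node: expand it over the slots it covers,
            # never overwriting a longer prefix that is already there.
            for slot in range(b, b + (1 << (end - plen))):
                old = node.hop[slot]
                if old is None or old[0] <= plen:
                    node.hop[slot] = (plen, nexthop)
            return
        if node.child[b] is None:
            node.child[b] = Node()
        node = node.child[b]


def lookup(root, addr):
    best = None
    node = root
    for b in ipaddress.ip_address(addr).packed:
        if node.hop[b] is not None:
            best = node.hop[b][1]  # deeper levels only hold longer prefixes
        node = node.child[b]
        if node is None:
            break
    return best
```

A lookup touches at most four nodes regardless of table size. The cost is 256 slots per node even when a node is nearly empty, which is the trade-off a dynamic child size tries to avoid.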
fib_trie performance comparison
[Chart: fib_hash vs. fib_trie, forwarding kpps; axis 0-700; categories: dst hash, 5 r single flow, 5 r rDoS, 123kr rDoS]
Linux 2.6.16 1 CPU used (SMP) Opteron 1.6 GHz e1000
Preroute patches to disable route hash
LOCAL/MAIN tables
fib_lookup() in ip_fib.h
Always looks up LOCAL table before MAIN
Extra lookup costs performance when not to localhost.
We discussed this with Alexey...
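The cost mentioned above can be made concrete with a small model. It is a sketch only: `table_lookup` and `fib_lookup_sketch` are hypothetical names, and a linear scan stands in for the real per-table lookup. The point is the ordering: a forwarded packet always pays for a LOCAL probe that misses before MAIN is consulted.

```python
import ipaddress


def table_lookup(table, addr):
    """Linear longest-prefix match over a list of (cidr, result) pairs."""
    a = ipaddress.ip_address(addr)
    best = None
    for cidr, result in table:
        net = ipaddress.ip_network(cidr)
        if a in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, result)
    return None if best is None else best[1]


def fib_lookup_sketch(local, main, addr):
    """Return (result, tables_probed), trying LOCAL first and then MAIN."""
    probes = 0
    for table in (local, main):
        probes += 1
        result = table_lookup(table, addr)
        if result is not None:
            return result, probes
    return None, probes


local = [("192.0.2.1/32", "local")]
main = [("0.0.0.0/0", "gw")]
print(fib_lookup_sketch(local, main, "192.0.2.1"))     # one probe
print(fib_lookup_sketch(local, main, "198.51.100.7"))  # two probes
```

On a router most packets take the two-probe path, so the LOCAL miss is paid on nearly every lookup.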
LOCAL/MAIN tables
Aver depth: 4.48, Max depth: 6, Leaves: 25, Internal nodes: 18
Aver depth: 3.22, Max depth: 7, Leaves: 158936, Internal nodes: 39440
Route hash/GC
Strategies for GC run. Better work!!
Timer based vs on demand
/proc/sys/net/ipv4/route/gc_interval
/proc/sys/net/ipv4/route/gc_min_interval_ms
GC without GC run. Very robust...
rt_intern_hash() cand rt_free() when chain length too long.
ip_rt_gc_elasticity can be dynamic.... ????
total flush for fib insert/delete....
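"GC without GC run" can be pictured as trimming a hash chain at insert time instead of waiting for a collector. The sketch below is a simplified model under stated assumptions: `RouteCacheSketch` and `intern` are hypothetical names, the eviction candidate is simply the entry at the tail of the chain, and `elasticity` plays the role of ip_rt_gc_elasticity as a per-chain length bound.

```python
class RouteCacheSketch:
    """Hash chains whose length is bounded at insert time."""

    def __init__(self, buckets=4, elasticity=2):
        self.elasticity = elasticity
        self.chains = [[] for _ in range(buckets)]
        self.evicted = 0

    def intern(self, key, route):
        chain = self.chains[hash(key) % len(self.chains)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.insert(0, chain.pop(i))  # move the hit to the front
                return chain[0][1]
        chain.insert(0, (key, route))
        if len(chain) > self.elasticity:
            chain.pop()  # stands in for freeing the candidate entry
            self.evicted += 1
        return route
```

Because every insert bounds its own chain, lookups stay short even when no periodic GC pass has run. Making the bound dynamic, as the slide asks, would only change how `elasticity` is chosen.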
32/64 bit || sizeof(sk_buff)
[Chart: sizeof(struct sk_buff), 32 vs. 64 bit; axis "size", 0-275]
[Chart: relative forwarding T-put, 64 bit vs. 32 bit; axis 0-0.55]
Gcc 3.4 x86_64 vs i686 on same HW
Per device hash
Per device input route hash
isolate devs
less locking
same performance
output used shared hash
given up for the moment
Preroute patches
Started hacking with Jamal a year ago
Do full fib_lookup() for every packet
Lots of interest from Paul's and people doing “hi-risk” hosting.
Very useful for FIB testing.
Works only with gatewayed hosts.
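Why a full lookup per packet can be attractive under DoS is easy to see in a toy model of a destination cache. The sketch assumes that rDoS means traffic with random destination addresses, which the slides do not spell out; `cache_hit_rate` is a made-up helper, and the cache simply drops its oldest entry when full.

```python
import random


def cache_hit_rate(destinations, cache_size):
    """Fraction of packets whose destination is already cached."""
    cache, hits = {}, 0
    for dst in destinations:
        if dst in cache:
            hits += 1
        else:
            if len(cache) >= cache_size:
                cache.pop(next(iter(cache)))  # drop the oldest inserted entry
            cache[dst] = True
    return hits / len(destinations)


rng = random.Random(1)
single_flow = [0x0A000001] * 10000
random_dst = [rng.getrandbits(32) for _ in range(10000)]

print(cache_hit_rate(single_flow, 1024))  # close to 1: only the first packet misses
print(cache_hit_rate(random_dst, 1024))   # close to 0: almost every packet misses
```

With random destinations nearly every packet misses, so the cache adds insert and eviction work on top of the FIB lookup that has to happen anyway.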
TCP performance
[Chart: TCP performance, NAPI vs. Non_NAPI; x values 4, 512, 1024, 2048, 4096, 8192, 16384, 32768; value axis 0-1000]
2.6.11.7 SMP kernel using one CPU driver e1000 NAPI - no-NAPI. Opteron 1.6 GHz e1000 w 82546GB.
TCP performance when receiving DoS on other NIC
[Chart: NAPI vs. Non_NAPI; x values 4, 512, 1024, 2048, 4096, 8192, 16384, 32768; value axis 0-850]
2.6.11.7 SMP kernel using one CPU driver e1000 NAPI - no-NAPI. Opteron 1.6 GHz e1000 w 82546GB.
ipv6 performance
[Chart: T-put, forwarding kpps, 76 byte pkt.; axis 0-650; series: Single flow small, Single flow 543 r, rDoS 543 r]
Linux 2.5.12 1 CPU (SMP) Opteron 1.6 GHz e1000
How does rDoS work on a sparse routing table?
10 GbE early days
[Chart: TX performance IXGB, in pps; x values 64, 128, 256, 512, 1024, 1500; value axis 0-1000000; series: Op 1.6 NAPI, Op 1.6 noNAPI, XEON]