robert olsson experiments & experiences with fib lookup...

Netconf 2005

Robert Olsson

Experiments & Experiences with FIB lookup and route cache

What we hear/got

dst cache overflow reports (RCU related)

mistuned, misunderstood, etc.

fib_lookup complaints: what to expect?

BSD comparisons. Radix-tree ToS/semantics questionable

fib_hash considered bad

Getting forward :)

“Infrastructure” for test & development

stats to understand what happens

tools and setups to study

Preroute patches w. Jamal 2004

pktgen DoS, scripts w. routing table

Steady Linux API work to prepare for plugging in new algorithms. Most from DaveM.

So much research, still so little usable for Linux

FIB overview

FIB vs. dst hash performance

fib_hash

fib_hlist

fib_hash2

fib_trie

classifier lookup?

unified lookup?

fib_hash (current)

Fast - Yes

General purpose

Very integrated

fib_hlist

Tutorial

KISS

hlist with semantic_match

Very fast with small tables

For embedded systems etc.?
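Here is a minimal sketch of the fib_hlist idea. It assumes a single list kept sorted by descending prefix length, so the first match found is the longest one. The ToS check is a hypothetical and much simplified stand-in for fib_semantic_match(). The names hl_route, hl_insert and hl_lookup are illustrative only and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical route entry; tos == 0 means "any ToS". */
struct hl_route {
    uint32_t prefix;
    int plen;
    uint8_t tos;
    int nexthop;
    struct hl_route *next;
};

static struct hl_route *hl_head;

static uint32_t plen_mask(int plen)
{
    return plen ? ~0u << (32 - plen) : 0;
}

/* Keep the list sorted by descending prefix length: first match = longest match. */
static void hl_insert(uint32_t prefix, int plen, uint8_t tos, int nexthop)
{
    assert(plen >= 0 && plen <= 32);
    struct hl_route *r = malloc(sizeof(*r));
    if (!r)
        abort();
    r->prefix = prefix & plen_mask(plen);
    r->plen = plen;
    r->tos = tos;
    r->nexthop = nexthop;

    struct hl_route **pp = &hl_head;
    while (*pp && (*pp)->plen >= plen)
        pp = &(*pp)->next;
    r->next = *pp;
    *pp = r;
}

/* Simplified stand-in for fib_semantic_match(): only ToS is checked here. */
static int hl_semantic_match(const struct hl_route *r, uint8_t tos)
{
    return r->tos == 0 || r->tos == tos;
}

/* Linear scan; returns the next hop, or -1 if nothing matches. */
static int hl_lookup(uint32_t dst, uint8_t tos)
{
    for (const struct hl_route *r = hl_head; r; r = r->next)
        if ((dst & plen_mask(r->plen)) == r->prefix && hl_semantic_match(r, tos))
            return r->nexthop;
    return -1;
}
```

Lookup cost grows linearly with the number of routes. That fits the points above: very fast with small tables, and unsuitable for large ones.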

fib_hlist performance

(Chart: fib_hlist vs. fib_hash for dst cache, /24, rDoS with 6 routes, and rDoS with 123k routes; y axis 0 to 750.)

Note! Zero for fib_hlist :) Still decent for many apps.

fib_hash2

Varghese inspired, use what we've got

2^24 hash lookup w. sorted hlist. Makes /24 entries of plens 1-23

/0 special case. Huge... TABLE_LOCAL with a few entries

Idea was to test performance with the fastest algo we could think of.

Not for embedded systems etc. :-)

Reduced, it became fib_hlist
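Here is a minimal sketch of the fib_hash2 layout described above. It assumes a direct-indexed table keyed on the top STRIDE bits of the destination, where each slot holds a list sorted by descending prefix length. Prefixes shorter than the stride are expanded into every slot they cover (controlled prefix expansion), and /0 is kept outside the table. STRIDE is 16 here only to keep the demo small; the slides use 24. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

#define STRIDE 16                  /* slides: 24 bits; 16 keeps this demo small */
#define NSLOTS (1u << STRIDE)

struct h2_route { uint32_t prefix; int plen; int nexthop; };

/* Non-intrusive list node, so one expanded route can sit in many slots. */
struct h2_node { const struct h2_route *rt; struct h2_node *next; };

static struct h2_node *slots[NSLOTS];
static int default_nh = -1;        /* the /0 special case */

static uint32_t plen_mask(int plen)
{
    return plen ? ~0u << (32 - plen) : 0;
}

/* Add to one slot, keeping its list sorted by descending prefix length. */
static void slot_add(uint32_t idx, const struct h2_route *rt)
{
    struct h2_node **pp = &slots[idx];
    while (*pp && (*pp)->rt->plen >= rt->plen)
        pp = &(*pp)->next;
    struct h2_node *n = malloc(sizeof(*n));
    if (!n)
        abort();
    n->rt = rt;
    n->next = *pp;
    *pp = n;
}

static void h2_insert(uint32_t prefix, int plen, int nexthop)
{
    assert(plen >= 0 && plen <= 32);
    if (plen == 0) {
        default_nh = nexthop;
        return;
    }
    struct h2_route *rt = malloc(sizeof(*rt));
    if (!rt)
        abort();
    rt->prefix = prefix & plen_mask(plen);
    rt->plen = plen;
    rt->nexthop = nexthop;

    /* A prefix shorter than STRIDE covers 2^(STRIDE - plen) consecutive slots. */
    uint32_t first = rt->prefix >> (32 - STRIDE);
    uint32_t count = plen < STRIDE ? 1u << (STRIDE - plen) : 1u;
    for (uint32_t i = 0; i < count; i++)
        slot_add(first + i, rt);
}

/* One index computation, then a short walk; the first match is the longest. */
static int h2_lookup(uint32_t dst)
{
    for (const struct h2_node *n = slots[dst >> (32 - STRIDE)]; n; n = n->next)
        if ((dst & plen_mask(n->rt->plen)) == n->rt->prefix)
            return n->rt->nexthop;
    return default_nh;
}
```

The trade-off is memory for speed. A single /8 occupies 2^(STRIDE-8) slots: 256 here, and 65536 with a 24-bit stride.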

fib_hash2/route cache compare

(Chart: route cache vs. FIB lookup vs. FIB lookup under DoS; y axis 0 to 750.)

fib_trie

First trie. In theory variable key length: 32, 128 bits etc.

Algo for dynamic trie written in Java. Memory leaks and stack handling were problems.

Also, prefix matching based on fib_semantic_match

Cisco CEF has fixed 256 children: 8-8-8-8, or 16-8-8 (GSR)

LC-trie child size is dynamic; 2-12 bits seen

Needs to be verified. New netlink call to do fib_lookup

Can be improved...
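fib_trie itself is an LC-trie whose node sizes vary. For contrast, here is a minimal sketch of the fixed-stride 8-8-8-8 layout mentioned above for CEF. It assumes prefix expansion within each 256-way node and a lookup that keeps the longest match seen so far. This is not the fib_trie algorithm, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

#define STRIDE 8
#define FANOUT (1 << STRIDE)
#define LEVELS (32 / STRIDE)

struct tnode {
    int nh[FANOUT];               /* next hop expanded into this slot, -1 = none */
    int nh_plen[FANOUT];          /* prefix length that produced nh[i] */
    struct tnode *child[FANOUT];  /* next level, or NULL */
};

static struct tnode *tnode_new(void)
{
    struct tnode *n = calloc(1, sizeof(*n));
    if (!n)
        abort();
    for (int i = 0; i < FANOUT; i++)
        n->nh[i] = -1;
    return n;
}

/* Byte of the address used at a given level (level 0 = most significant). */
static unsigned chunk(uint32_t addr, int level)
{
    return (addr >> (32 - STRIDE * (level + 1))) & (FANOUT - 1);
}

/* plen 1..32, nexthop >= 0; a /0 default route is kept outside the trie. */
static void trie_insert(struct tnode *root, uint32_t prefix, int plen, int nexthop)
{
    assert(plen >= 1 && plen <= 32 && nexthop >= 0);
    int level = (plen - 1) / STRIDE;          /* level holding the last prefix bit */
    struct tnode *n = root;
    for (int l = 0; l < level; l++) {
        unsigned idx = chunk(prefix, l);
        if (!n->child[idx])
            n->child[idx] = tnode_new();
        n = n->child[idx];
    }
    int bits = plen - level * STRIDE;         /* 1..STRIDE significant bits here */
    unsigned first = chunk(prefix, level) & ((FANOUT - 1) << (STRIDE - bits)) & (FANOUT - 1);
    unsigned count = 1u << (STRIDE - bits);
    for (unsigned i = first; i < first + count; i++) {
        /* Never let a shorter prefix overwrite a longer one. */
        if (n->nh[i] == -1 || n->nh_plen[i] <= plen) {
            n->nh[i] = nexthop;
            n->nh_plen[i] = plen;
        }
    }
}

/* At most LEVELS steps; a match at a deeper level is always longer. */
static int trie_lookup(const struct tnode *root, uint32_t dst, int default_nh)
{
    int best = default_nh;
    const struct tnode *n = root;
    for (int l = 0; n && l < LEVELS; l++) {
        unsigned idx = chunk(dst, l);
        if (n->nh[idx] != -1)
            best = n->nh[idx];
        n = n->child[idx];
    }
    return best;
}
```

Lookup cost here is bounded by the number of levels (four), whatever the table size. The price is the 256-entry nodes. A dynamic child size, as in LC-trie, is meant to balance that memory/speed trade-off per node.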

fib_trie performance comparison

(Chart: forwarding kpps, fib_hash vs. fib_trie, for dst hash, 5 routes single flow, 5 routes rDoS, and 123k routes rDoS; y axis 0 to 700.)

Linux 2.6.16, 1 CPU used (SMP), Opteron 1.6 GHz, e1000

Preroute patches used to disable the route hash

LOCAL/MAIN tables

fib_lookup() in ip_fib.h

Always looks up LOCAL table before MAIN

The extra lookup costs performance when traffic is not destined to localhost.

We discussed this with Alexey...
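Here is a toy model of the lookup order described above. It assumes two abstract tables that are probed in sequence. toy_table, toy_lookup and two_table_lookup are hypothetical names, not the kernel's fib_lookup(). Counting probes makes the cost visible: traffic to a local address is resolved after one table lookup, while forwarded traffic always pays for two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy table: exact-match host addresses, or a catch-all when addrs == NULL. */
struct toy_table {
    const uint32_t *addrs;
    size_t n;
    int result;                 /* value returned on a hit */
};

static int toy_lookup(const struct toy_table *t, uint32_t dst, int *result)
{
    if (!t->addrs) {
        *result = t->result;
        return 1;
    }
    for (size_t i = 0; i < t->n; i++)
        if (t->addrs[i] == dst) {
            *result = t->result;
            return 1;
        }
    return 0;
}

/* LOCAL is always probed first; *nprobes reports how many tables were consulted. */
static int two_table_lookup(const struct toy_table *local, const struct toy_table *main_tbl,
                            uint32_t dst, int *nprobes)
{
    int res = -1;

    *nprobes = 1;
    if (toy_lookup(local, dst, &res))
        return res;
    *nprobes = 2;               /* the extra cost paid by forwarded traffic */
    if (toy_lookup(main_tbl, dst, &res))
        return res;
    return -1;
}
```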

LOCAL/MAIN tables

LOCAL: Aver depth: 4.48, Max depth: 6, Leaves: 25, Internal nodes: 18

MAIN: Aver depth: 3.22, Max depth: 7, Leaves: 158936, Internal nodes: 39440

Route hash/GC

Strategies for the GC run. Needs better work!!

Timer based vs. on demand

/proc/sys/net/ipv4/route/gc_interval

/proc/sys/net/ipv4/route/gc_min_interval_ms

GC without a GC run. Very robust...

rt_intern_hash(): cand is freed with rt_free() when the chain length is too long.

ip_rt_gc_elasticity can be dynamic.... ????

Total flush on FIB insert/delete...
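Here is a toy model of "GC without a GC run". It assumes eviction happens on insert whenever a hash chain is already at its limit, in the spirit of rt_intern_hash() freeing a candidate entry with rt_free(). ELASTICITY stands in for ip_rt_gc_elasticity. Choosing the least recently used entry as the candidate is a simplification of the kernel's scoring. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 8
#define ELASTICITY 4              /* hypothetical stand-in for ip_rt_gc_elasticity */

struct rc_entry {
    uint32_t dst;
    unsigned long lastuse;        /* logical clock, bumped on insert and on hit */
    struct rc_entry *next;
};

static struct rc_entry *buckets[NBUCKETS];
static unsigned long clock_tick;

static unsigned rc_hash(uint32_t dst) { return dst % NBUCKETS; }

static unsigned rc_chain_len(uint32_t dst)
{
    unsigned len = 0;
    for (const struct rc_entry *e = buckets[rc_hash(dst)]; e; e = e->next)
        len++;
    return len;
}

static struct rc_entry *rc_lookup(uint32_t dst)
{
    for (struct rc_entry *e = buckets[rc_hash(dst)]; e; e = e->next)
        if (e->dst == dst) {
            e->lastuse = ++clock_tick;
            return e;
        }
    return NULL;
}

/* Assumes dst is not cached yet. If the chain is full, free the LRU entry first. */
static void rc_insert(uint32_t dst)
{
    struct rc_entry **pp, **cand = NULL;
    unsigned len = 0;

    for (pp = &buckets[rc_hash(dst)]; *pp; pp = &(*pp)->next) {
        len++;
        if (!cand || (*pp)->lastuse < (*cand)->lastuse)
            cand = pp;
    }
    if (len >= ELASTICITY) {
        struct rc_entry *victim = *cand;
        *cand = victim->next;     /* unlink, then free: no separate GC pass */
        free(victim);
    }
    struct rc_entry *e = malloc(sizeof(*e));
    if (!e)
        abort();
    e->dst = dst;
    e->lastuse = ++clock_tick;
    e->next = buckets[rc_hash(dst)];
    buckets[rc_hash(dst)] = e;
}
```

Eviction only touches the chain being inserted into, so no timer-driven pass is needed to keep chain lengths bounded. Making ELASTICITY a runtime variable is one way to read "ip_rt_gc_elasticity can be dynamic".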

32/64 bit || sizeof(sk_buff)

(Chart: sizeof(struct sk_buff) in bytes, 32 bit vs. 64 bit; y axis 0 to 275.)

(Chart: relative forwarding throughput, 64 bit vs. 32 bit; y axis 0 to 0.55.)

GCC 3.4, x86_64 vs. i686 on the same HW

Per device hash

Per device input route hash

isolate devs

less locking

same performance

output used the shared hash

given up for the moment

Preroute patches

Started hacking with Jamal a year ago

Do full fib_lookup() for every packet

Lots of interest from Paul's and people doing “hi-risk” hosting.

Very useful for FIB testing.

Works only with gatewayed hosts.
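Here is a toy model of what the preroute patches are used for. It assumes a small direct-mapped route cache in front of a slow FIB lookup, plus a hypothetical bypass_cache switch. With the switch on, every packet takes the full FIB path, which is what makes this setup useful for FIB testing. slow_fib_lookup() is a placeholder, not fib_lookup().

```c
#include <assert.h>
#include <stdint.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

#define CACHE_SIZE 256                /* power of two, direct-mapped */

struct cache_slot { uint32_t dst; int nh; int valid; };

static struct cache_slot cache[CACHE_SIZE];
static int bypass_cache;              /* hypothetical knob: 1 = full FIB lookup per packet */
static unsigned long fib_lookups;     /* counts slow-path lookups */

/* Placeholder FIB: the next hop is just the first octet. */
static int slow_fib_lookup(uint32_t dst)
{
    fib_lookups++;
    return (int)(dst >> 24);
}

static int route_input(uint32_t dst)
{
    if (bypass_cache)
        return slow_fib_lookup(dst);

    struct cache_slot *s = &cache[(dst ^ (dst >> 16)) & (CACHE_SIZE - 1)];
    if (s->valid && s->dst == dst)
        return s->nh;                 /* cache hit: FIB not consulted */

    s->dst = dst;                     /* miss: fill the slot from the FIB */
    s->nh = slow_fib_lookup(dst);
    s->valid = 1;
    return s->nh;
}
```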

Skb recycling/reuse

TCP performance

(Chart: NAPI vs. non-NAPI; x axis 4 to 32768, y axis 0 to 1000.)

2.6.11.7 SMP kernel using one CPU, e1000 driver, NAPI vs. non-NAPI. Opteron 1.6 GHz, e1000 w. 82546GB.

TCP performance when receiving DoS on the other NIC

(Chart: NAPI vs. non-NAPI; x axis 4 to 32768, y axis 0 to 850.)

2.6.11.7 SMP kernel using one CPU, e1000 driver, NAPI vs. non-NAPI. Opteron 1.6 GHz, e1000 w. 82546GB.

IPv6 performance

(Chart: throughput in forwarding kpps, 76 byte packets, for single flow (small), single flow with 543 routes, and rDoS with 543 routes; y axis 0 to 650.)

Linux 2.5.12, 1 CPU (SMP), Opteron 1.6 GHz, e1000

How does rDoS work on a sparse routing table?

Goodbye to old friends?

FASTROUTE

HW-FLOWCONTROL

10 GbE early days

(Chart: TX performance, IXGB, in pps, for Opteron 1.6 NAPI, Opteron 1.6 non-NAPI, and Xeon; x axis 64 to 1500, y axis 0 to 1000000.)

Hi-perf filtering

Need for hi-perf stateless filtering

netfilter API

hi-pac?

tc-stuff?

netfilter API

share fib_semantic_match()
