•packet classification -...

66
From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham 1 ATM and MPLS switches Direct Lookup Bridges and Ethernet switches Associative Lookup – Hashing Trees and tries IP Routers CIDR Patricia trees/tries Other methods Caching Packet Classification

Upload: doannhi

Post on 24-May-2018

219 views

Category:

Documents


2 download

TRANSCRIPT

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

1

• ATM and MPLS switches– Direct Lookup

• Bridges and Ethernet switches– Associative Lookup– Hashing– Trees and tries

• IP Routers– CIDR– Patricia trees/tries– Other methods– Caching

• Packet Classification

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

2

Direct Lookup

VCIMemory

(Port, VCI)

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

3

• ATM and MPLS switches– Direct Lookup

• Bridges and Ethernet switches– Associative Lookup– Hashing– Trees and tries

• IP Routers– CIDR– Patricia trees/tries– Other methods– Caching

• Packet Classification

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

4

Associative Lookups

NetworkAddress

AssociatedData

AssociativeMemory or CAM

Search Data

48

log2N

AssociatedData

Hit?

Address{Advantages:• Simple

Disadvantages• Slow

• High Power

• Small

• Expensive

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

5

Hashing

HashingFunction

Memory

Add

ress

Dat

a

Search Data

48

log2N

AssociatedData

Hit?

Address{16

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

6

An example

Hashing Function

CRC-1616

#1 #2 #3 #4

#1 #2

#1 #2 #3Linked lists

Memory

Search Data

48

log2N

AssociatedData

Hit?

Address{M entries

N lists

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

7

Performance of simple example

Where:

ER Expected number of memory references=

M Number of memory addresses in table=

N Number of linked lists=

α M N⁄=

ER 12--- 1 α

1 1 1N----–

M–

--------------------------------+

=

Most addresses in their own list

Most addresses in one list

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

8

Advantages:

• Simple

• Expected lookup time can be small

Disadvantages

• Non-deterministic lookup time

• Inefficient use of memory

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

9

Binary Search Tree

< >

< > < >

2

N entries

Binary Search Trie

0 1

0 1 0 1

111010

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

10

• An entry is:– a pointer to another array,– a special symbol indicating no

better match– a null pointer indicating that

the longst match is the parent node

• Two ways to improve performance– cache recently used addresses

in a CAM– move common entries up to a

higher level (match longer strings)

128.32.1.2 ?

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

11

Multiway tries

16-ary Search Trie

0000, ptr 1111, ptr

0000, 0 1111, ptr

000011110000

0000, 0 1111, ptr

111111111111

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

12

Multiway tries

Degree ofTree

# MemReferences

# Nodes(x106)

Total Memory(Mbytes)

FractionWasted (%)

2481664256

Ew DL 1– 1 1 N

DL-------–

D–

Di 1 Di 1––() N 1 D1 i––() N–()i 1=

L 1–

∑+=

En 1 DL 1 N

DL-------–

DDi Di 1– 1 Di 1––() N–

i 1=

L 1–

∑+ +=

Where:

D Degree of tree=

L Number of layers/references=

N Number of entries in table =

En Expected number of nodes=

Ew Expected amount of wasted memory=

Table produced from 215 randomly generated 48-bit addresses

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

13

• ATM and MPLS switches– Direct Lookup

• Bridges and Ethernet switches– Associative Lookup– Hashing– Trees and tries

• IP Routers– CIDR– Patricia trees/tries– Other methods– Caching

• Packet Classification

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

14

Class-based addresses

Class A Class B Class C D

212.17.9.4

Class A

Class B

Class C212.17.9.0 Port 4

ExactMatch

Routing Table:

IP Address Space

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

15

CIDR

A B C D0 232-1

0 232-1

128.9/16

128.9.0.0

216

142.12/19

65/24

Classless:

Class-based:

128.9.16.14

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

16

CIDR

0 232-1

128.9/16

128.9.16.14

128.9.16/20 128.9.176/20

128.9.19/24

128.9.25/24

Most specific route = “longest matching prefix”

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

17

Metrics for Lookups

128.9/16128.9.16/20

128.9.176/20

128.9.19/24128.9.25/24

142.12/19

65/24

Prefix Port35271013

128.9.16.14

• Lookup time• Storage space• Update time• Preprocessing time

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

18

Lookup

IPv4 unicast destination address based lookup

Dstn Addr Next Hop

--------

---- ----

--------

Destination Next HopForwarding Table

Next Hop Computation

Forwarding Engine

Incoming Packet

HEADER

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

20

Gigabit Ethernet (84B packets): 1.49 Mpps

Line Line Rate Pkt­size=40B Pkt­size=240B

T1 1.5Mbps 4.68 Kpps 0.78 Kpps

OC3 155Mbps 480 Kpps 80 Kpps

OC12 622Mbps 1.94 Mpps 323 Kpps

OC48 2.5Gbps 7.81 Mpps 1.3 Mpps

OC192 10 Gbps 31.25 Mpps 5.21 Mpps

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

21

Source: http://www.telstra.net/ops/bgptable.html

Exponentialgrowth before

CIDR

About10k newprefixes per year

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

22

0

10000

20000

30000

40000

50000

60000

70000

80000

90000

100000

SizeSource: http://www.telstra.net/ops/bgptable.html

95 96 97 98 99 00Year

Num

ber

of

Pref

ixes

10,000/year

Renewed ExponentialGrowth

Renewed growth due to multi-homing of enterprise networks!

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

31

Prefix length

Most prefixes are 24-bits or shorter

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

32

Prefixes up to 24-bits

1 Next Hop

24

Next Hop

142.19.6

224 = 16M entries

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

33

Prefixes up to 24-bits

1 Next Hop

128.3.72

24 0 Pointer

8

Prefixes above 24-bits

Next Hop

Next Hop

Next Hop

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

36

CPU BufferMemory

LineCard

DMA

MAC

LocalBuffer

Memory

LineCard

DMA

MAC

LocalBuffer

Memory

LineCard

DMA

MAC

LocalBuffer

Memory

Fast Path

Slow Path

Advantages

Increased average lookup performance

Disadvantages

Decreased locality in backbone traffic

Cache size

Cache management overhead

Hardware implementation difficult

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

37

LAN:Average flow < 40 packets

WAN: Huge Number of flows

0%10%

20%30%

40%50%

60%70%

80%90%

100%

Cache = 10% of Full Table

CacheHit Rate

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

38

References

• A. Brodnik, S. Carlsson, M. Degermark, S. Pink. “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp 3-14.

• B. Lampson, V. Srinivasan, G. Varghese. “ IP lookups using multiwayand multicolumn search”, Infocom 1998, pp 1248-56, vol. 3.

• M. Waldvogel, G. Varghese, J. Turner, B. Plattner. “Scalable high speed IP routing lookups”, Sigcomm 1997, pp 25-36.

• P. Gupta, S. Lin, N.McKeown. “Routing lookups in hardware at memory access speeds”, Infocom 1998, pp 1241-1248, vol. 3.

• S. Nilsson, G. Karlsson. “Fast address lookup for Internet routers”, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.

• V. Srinivasan, G.Varghese. “Fast IP lookups using controlled prefix expansion”, Sigmetrics, June 1998.

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

45

• Packet Lookup and Classification:Where does a packet go next?

• Switching Fabrics:How does the packet get there?

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

46

• Overview• Output and Input Queueing • Output Queueing• Input Queueing

– Scheduling algorithms– Combining input and output queues– Multicast traffic– Other non-blocking fabrics

• Multistage Switches

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

47

Datapath: per-packet processing

ForwardingDecision

ForwardingDecision

ForwardingDecision

ForwardingTable

ForwardingTable

ForwardingTable

Interconnect

OutputScheduling

1.

2.

3.

Transfers data from an input to an output

many ports (density), high speeds

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

48

• A switch that can handle N calls has N logical inputs and N logical outputs– N up to 200,000

• Moves 8-bit samples from an input to an output port– Recall that samples have no headers– Destination of sample depends on time at which it arrives at the

switch

• In practice, input trunks are multiplexed– Multiplexed trunks carry frames = set of samples

• Goal: extract samples from frame, and depending on position in frame, switch to output– each incoming sample has to get to the right output line and the

right slot in the output frame

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

49

• Can’t find a path from input to output

• Internal blocking– slot in output frame exists, but no path

• Output blocking– no slot in output frame is available

• Output blocking is reduced in transit switches– need to put a sample in one of several slots

going to the desired next hop

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

50

• Most trunks time division multiplex voice samples

• At a central office, trunk is demultiplexed and distributed to active circuits

• Synchronous multiplexor– N input lines

– Output runs N times as fast as input

123

N

MUX…

123

N

De-MUX1 2 3 … N

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

51

• Key idea: when de-multiplexing, position in frame determines output trunk

• Time division switching interchanges sample position within a frame: time slot interchange (TSI)

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

52

• To build a 120,000 circuit switch– read and write samples 120,000 every 125us, a

R&W operation in 0.5 ns!

– Today DRAM has access time from 80 to 40 ns

– If we use 40 ns DRAM, it's 80 times more than what we need

– Maximum #circuit= 120,000/80=1500!

– Too small!!

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

53

• Each sample takes a different path through the switch, depending on its destination

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

54

• Simplest possible space-division switch

• Crosspoints can be turned on or off, long enough to transfer a packet from an input to an output

• Expensive• Internally nonblocking

– but need N2 crosspoints– time to set each crosspoint

grows quadratically

configuration Data Out

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

55

• In a crossbar during each switching time only one cross-point per row or column is active

• Can save crosspoints if a cross-point can attach to more than one input line

• This is done in a multistage crossbar

N/narraysn x k

karraysN/n x N/n

N/narraywk x n

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

56

• Can suffer internal blocking– unless sufficient number of second-level stages,

k ≥ n

• Number of crosspoints < N2

• Finding a path from input to output requires a depth-first-search

• Scales better than crossbar, but still not too well– 120,000 call switch needs ~250 million

crosspoints

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

57

• In a central switching system, the high cost is the line card.

• Now the true cost is the copper wire to the customer premises!!

• In long-distance, the high cost is in laying lines, acquiring rights of way and switch-control software!

• So, saving a few thousand crosspoints is not going to make phone call cheaper!

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

58

• In a circuit switch, path of a sample is determined at time of connection establishment

• No need for a sample header--position in frame used

• In a packet switch, packets carry a destination field or label– Need to look up destination port on-the-fly

• Datagram switches– lookup based on entire destination address (longest-

prefix match)

• Cell or Label-switches– lookup based on VCI or Labels

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

59

• Can have both internal and output blocking• Internal

– no path to output

• Output– trunk unavailable

• Unlike a circuit switch, cannot predict if packets will block (why?)

• If packet is blocked, must either buffer or drop

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

60

• Over-provisioning– internal links much faster than inputs

• Buffers– at input or output

• Backpressure– if switch fabric doesn’t have buffers, prevent

packet from entering until path is available

• Parallel switch fabrics– increases effective switching capacity

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

61

• What happens if packets at two inputs both want to go to same output?

• Can defer one at an input buffer

• Or, buffer cross-points: complex arbiter

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

62

• Goal: towards building “self-routing” fabrics• Can build complicated fabrics from a simple

element

• Routing rule: if 0, send packet to upper output, else to lower output– If both packets to same output, buffer or drop

0

1

data 10

data 00

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

63

• Simplest self-routing recursive fabric, 2n output need n stages with 2n-1 components in each stage

• What if two packets both want to go to the same output→output blocking

000

001

010

011

100

101

110

111

000

001

010

011

100

101

110

111

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

64

• Can avoid blocking by choosing order in which packets appear at input ports

• If we can – present packets at inputs sorted by output– remove duplicates – remove gaps– precede banyan with a perfect shuffle stage– then no internal blocking

• For example: [X, 011, 010, X, 011, X, X, X]:• Sort => [010, 011, 011, X, X, X, X, X]• Remove dups => [010, 011, X, X, X, X, X, X]• Shuffle => [010, X, 011, X, X, X, X, X]• Need sort, trap and shuffle networks.

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

65

• Build sorters from merge networks

• Assume we can merge two sorted lists

• Sort pairwise, merge, recurse

Sort {5,7,2,3,6,2,4,5}

1/ sort 2 by 22/ merge adjacent lists

to get two 4-el lists3/ merge de two lists

with a merge network

23

47

2

5

4

6

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

66

• What about trapped duplicates?– recirculate to beginning

– or run output of trap to multiple banyans (dilation)

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

67

3

7

5

2

6

0

1

4

7

2

3

5

6

1

0

4

7

5

2

3

1

0

6

4

7

0

5

1

3

4

2

6

7

4

5

6

0

3

1

2

7

6

4

5

3

2

0

2

7

6

5

4

3

2

1

0

000001

010011

100101

110111

Batcher Sorter Self-Routing Network

• Fabric can be used as scheduler. •Batcher-Banyan network is blocking for multicast.

a dans le sens de la flèche si a > b,a dans le sens opposé si a est tout seul

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

68

Two basic queueing techniques

Input Queueing Output Queueing

Usually a non-blockingswitch fabric (e.g. crossbar)

Usually a fast bus

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

69

Output Queueing

Individual Output Queues Centralized Shared Memory

Memory b/w = (N+1).R

1

2

N

Memory b/w = 2N.R

1

2

N

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

70

The “ideal”

1

1

1

1

1

1

1

1

1

11

1

2

2

2

2

2

2

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

71

How fast can we make centralized shared memory?

SharedMemory

200 byte bus

5ns SRAM

1

2

N

• 5ns per memory operation• Two memory operations per packet• Therefore, up to 160Gb/s• In practice, closer to 80Gb/s

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

72

• Output and Input Queueing

• Output Queueing

• Input Queueing– Scheduling algorithms

– Combining input and output queues

– Multicast traffic

– Other non-blocking fabrics

• Multistage Switches

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

73

Input Queueing with Crossbar

configuration Data Out

Scheduler

Memory b/w = 2R

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

74

Head of Line Blocking

Del

ay

Load58.6% 100%

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

75

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

76

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

77

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

78

Virtual output queues

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

79

Virtual Output Queues

Del

ay

Load100%

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

80

Virtual Output Queues

Scheduler

Memory b/w = 2R

Can be quitecomplex!

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

85

Why is serving long/old queues better than serving maximum number of queues?

• When traffic is uniformly distributed, servicing themaximum number of queues leads to 100% throughput.• When traffic is non-uniform, some queues become longer than others.• A good algorithm keeps the queue lengths matched, and services a large number of queues.

VOQ #

Avg

Occ

upan

cy Uniform traffic

VOQ #

Avg

Occ

upan

cy

Non-uniform traffic

From a Nick McKeown's tutorial, 1999 and slides from Kalyanaraman (with figure from Keshav) Some slides modified by C. Pham

103

Shared Memory

InputQueued

Combined Input and

Output QueuedParallelPacket

Switches37526014

72356104

75231064

70513426

74560312

76453202

76543210

000001

010011

100101

110111

Batcher Sorter Self-Routing Network

Multistage

This document was created with Win2PDF available at http://www.daneprairie.com.The unregistered version of Win2PDF is for evaluation or non-commercial use only.