Review: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN

Reading and presenting: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN. Bruno Castelucci, 2013.10.31

Upload: bruno-castelucci

Post on 22-Nov-2014


DESCRIPTION

Reading and presenting the paper: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN. http://conferences.sigcomm.org/sigcomm/2013/program.php http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p99.pdf

TRANSCRIPT

Page 1: Review: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN

Reading and presenting:

Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN

Bruno Castelucci, 2013.10.31

Page 2

References:

Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN (Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, Mark Horowitz) SIGCOMM - 2013.08.13

Reading: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN (Ji Yang) 2013.09.23

Page 3

Objectives

• The paper describes the implementation of a 64-port, 10 Gbps switch using the RMT (Reconfigurable Match Tables) model.

• Design an architecture for RMT.

• Show some use cases.

Page 4

Example: Standard switch

[Figure: fixed-function switch pipeline. In, Parser, Stage 1 (L2 table), Stage 2 (L3 table), Stage 3 (ACL table), Queues, Deparser, Out]

• Stage 1, L2 table: 128k x 48, exact match. Action: set L2D
• Stage 2, L3 table: 16k x 32, longest prefix match. Action: set L2D, dec TTL
• Stage 3, ACL table: 4k, ternary match. Action: permit/deny

ACL: Access Control List. TTL: Time to live.
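The fixed pipeline on this slide can be sketched in a few lines of Python. This is only an illustration of the three match kinds (exact, longest prefix, ternary) applied in a hard-wired order; the table contents, field names (`dst_mac`, `dst_ip`, `dst_port`, `l2d`) and the `fixed_pipeline` helper are assumptions made for the example, not something the slide specifies.

```python
from ipaddress import ip_address, ip_network

# All table contents are made-up illustrations; the slide gives only sizes and match kinds.
L2_TABLE = {"aa:bb:cc:00:00:01": "port1"}                       # exact match on destination MAC
L3_TABLE = [("10.0.0.0/8", "port2"), ("10.1.0.0/16", "port3")]  # prefixes for longest prefix match
ACL_TABLE = [(22, 0xFFFF, False), (0, 0x0000, True)]            # ternary (value, mask, permit)


def l2_stage(pkt):
    """Stage 1: exact match; action: set L2D."""
    hit = L2_TABLE.get(pkt["dst_mac"])
    if hit is not None:
        pkt["l2d"] = hit


def l3_stage(pkt):
    """Stage 2: longest prefix match; action: set L2D, dec TTL."""
    addr = ip_address(pkt["dst_ip"])
    matches = [(ip_network(p), out) for p, out in L3_TABLE if addr in ip_network(p)]
    if matches:
        _, pkt["l2d"] = max(matches, key=lambda m: m[0].prefixlen)
        pkt["ttl"] -= 1


def acl_stage(pkt):
    """Stage 3: ternary match, first matching rule wins; action: permit/deny."""
    for value, mask, permit in ACL_TABLE:
        if pkt["dst_port"] & mask == value & mask:
            pkt["permit"] = permit
            return


def fixed_pipeline(pkt):
    """Run the stages in their hard-wired order: L2, then L3, then ACL."""
    pkt = dict(pkt)  # work on a copy so the caller's packet is untouched
    for stage in (l2_stage, l3_stage, acl_stage):
        stage(pkt)
    return pkt


packet = {"dst_mac": "aa:bb:cc:00:00:01", "dst_ip": "10.1.2.3", "ttl": 64, "dst_port": 22}
result = fixed_pipeline(packet)
```

The point of the sketch is the rigidity: the stage order, the match kind of each stage and the fields each stage may touch are baked into the code, which is the limitation the following slides address.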

Page 5

Forwarding Plane is Challenged

• Current hardware switches are rigid
▫ Match-action can process only a limited set of fields

• The OpenFlow/SDN protocol defines a limited repertoire of packet processing actions
▫ Updated frequently

• Hardware still has its advantages
▫ Speed: faster than a CPU
▫ Pipelining, parallelism

Page 6

What if you need flexibility?

• Flexibility to:
▫ Trade one memory size for another
▫ Add a new table
▫ Add a new header field
▫ Add a different action

• SDN accentuates the need for flexibility
▫ Gives programmatic control to the control plane, which expects to be able to use that flexibility: multiple stages of match-action, flexible actions, flexible header fields

Page 7

What about Alternatives?

Aren’t there other ways to get flexibility?

• Software? 100x too slow, expensive

• NPUs? 10x too slow, expensive

• FPGAs? 10x too slow, expensive

Page 8

What’s Hard about a Flexible Switch Chip?

• Big chip

• Wiring intensive

• Many crossbars

• Lots of TCAM

• Operates at high frequency, with throughput on the order of 1 Tb/s

Page 9

The RMT Model

Page 10

Hardware Architectures for SDN

• Single Match Table
▫ Easier to implement.
▫ Needs to store every combination of headers: too big.

• Multiple Match Tables
▫ Allows multiple smaller match tables, each matched against a subset of packet fields.
▫ The number and size of the tables are limited: hard to optimize and to add new behaviors.
▫ Very few actions are implemented on current chips.

• Reconfigurable Match Tables
▫ Operate as a pipeline
▫ New fields added easily
▫ Number, topology, widths, and depths of match tables can be configured
▫ New actions defined easily
▫ Packets can be placed in specified queues (queuing discipline specified)

• RMT allows the forwarding plane to be changed on the fly, without modifying the hardware.

As in OpenFlow with MMT, the programmer can specify match tables, limited only by the size of the hardware; RMT, however, allows all of the packet's header fields to be modified, with more flexibility.
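To see why a single match table is "too big", a rough worst-case count helps. The sketch below assumes every combination of entries can occur (a cross product) and borrows the table sizes from the standard-switch example earlier in the deck (128k L2, 16k L3, 4k ACL); the function names are hypothetical.

```python
def single_table_entries(*table_sizes):
    """Worst case for one combined table: every combination of entries (cross product)."""
    total = 1
    for size in table_sizes:
        total *= size
    return total


def multi_table_entries(*table_sizes):
    """Separate tables only need the sum of their individual sizes."""
    return sum(table_sizes)


L2, L3, ACL = 128 * 1024, 16 * 1024, 4 * 1024
single = single_table_entries(L2, L3, ACL)   # 2**43 entries in the worst case
multi = multi_table_entries(L2, L3, ACL)     # 151552 entries
```

Real rule sets rarely hit the full cross product, so this is an upper bound, but it shows why splitting the lookup across several tables matters.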

Page 11

RMT Consists of:

• Reconfigurable parser
• Support for field definition
• Resizable match
• VLIW: very long instruction word
• Configurable output queue

Page 12

How is it configured?

RMT Parse Graph and Table Graph


Compiler and control protocol not available yet.

Page 13

But the Parse Graph and Table Graph don’t show you how to build a switch

Page 14

Performance vs Flexibility

• Multiprocessor: memory bottleneck
• Change to pipeline
• Fixed-function chips specialize processors
• Flexible switch needs general-purpose CPUs

[Figure: three pipeline stages (L2, L3, ACL), each pairing a memory with a CPU]

ACL: Access Control List

Page 15

How We Did It

• Memory to CPU bottleneck
• Replicate CPUs
• More stages for finer granularity
• Higher CPU cost ok

[Figure: a memory surrounded by many replicated CPUs]

Page 16

First, some memory definitions

• RAM
▫ binary (0, 1); lookup by looping over addresses or through a hash table; slow search.

• CAM (Content Addressable Memory)
▫ binary, associative memory; the content is searched for in all entries at the same time; fast search.

• TCAM (Ternary CAM)
▫ ternary (0, 1, X); wildcard match; fast search. Used for prefix matching.
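A ternary match can be modelled in software as a (value, mask) pair per entry, where X bits are cleared in the mask. The sketch below is a minimal software model, not how the hardware works internally: a real TCAM compares the key against all entries in parallel, while this loop checks them in priority order. The helper names `parse_ternary` and `tcam_lookup` are made up for the example.

```python
def parse_ternary(pattern):
    """Turn a pattern such as '10XX' into (value, mask); X bits are wildcards."""
    value = mask = 0
    for ch in pattern:
        value <<= 1
        mask <<= 1
        if ch in "01":
            mask |= 1          # this bit must match
            value |= int(ch)
        elif ch != "X":
            raise ValueError(f"invalid ternary digit: {ch!r}")
    return value, mask


def tcam_lookup(entries, key):
    """Return the index of the first (highest-priority) matching entry, or None."""
    for index, (value, mask) in enumerate(entries):
        if key & mask == value:
            return index
    return None


# Entries ordered from most to least specific, which yields longest prefix match.
entries = [parse_ternary(p) for p in ("1010", "10XX", "XXXX")]
exact_hit = tcam_lookup(entries, 0b1010)     # matches entry 0
prefix_hit = tcam_lookup(entries, 0b1001)    # matches entry 1 (prefix 10)
default_hit = tcam_lookup(entries, 0b0111)   # matches only the catch-all entry 2
```

A binary CAM is the special case where every mask is all ones, and a hash table in RAM gives the same exact-match result at the cost of a hash computation and possible collisions.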

Page 17

RMT Consists of:

Page 18

Parser

• Receives the packet and produces the 4 Kb header vector.

• Parsing is driven by a user-supplied parse graph, which is converted into entries in a 256 x 40b TCAM.

• Each iteration handles a header field of a specific size and writes one field into the output vector; the parser then loops back to handle the next field.

• Parsing proceeds sequentially until the whole header has been processed and the header vector is complete.
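The loop described above can be sketched as a walk over a parse graph. In this illustration the graph is a Python dict rather than TCAM entries, the header vector is a dict rather than a fixed 4 Kb vector, IPv4 is assumed to have no options, and only three header types are modelled; `PARSE_GRAPH` and `parse` are hypothetical names.

```python
# Each node: header length in bytes, where the "next header" selector sits, and the transitions.
PARSE_GRAPH = {
    "ethernet": {"length": 14, "select": (12, 2), "next": {0x8100: "vlan", 0x0800: "ipv4"}},
    "vlan":     {"length": 4,  "select": (2, 2),  "next": {0x0800: "ipv4"}},
    "ipv4":     {"length": 20, "select": None,    "next": {}},  # assumes no IPv4 options
}


def parse(packet, graph=PARSE_GRAPH, start="ethernet"):
    """Walk the parse graph sequentially, copying each header into the header vector."""
    header_vector = {}
    offset, state = 0, start
    while state is not None:
        node = graph[state]
        header_vector[state] = packet[offset:offset + node["length"]]
        if node["select"] is None:
            break  # leaf of the parse graph
        sel_offset, sel_length = node["select"]
        start_byte = offset + sel_offset
        key = int.from_bytes(packet[start_byte:start_byte + sel_length], "big")
        offset += node["length"]
        state = node["next"].get(key)  # unknown selector value: stop parsing
    return header_vector


# 12 bytes of MAC addresses, VLAN TPID 0x8100, TCI 0x000a, EtherType 0x0800, 20-byte IPv4 header.
vlan_packet = bytes(12) + b"\x81\x00" + b"\x00\x0a" + b"\x08\x00" + bytes(20)
plain_packet = bytes(12) + b"\x08\x00" + bytes(20)
vlan_headers = parse(vlan_packet)
plain_headers = parse(plain_packet)
```

Adding a new protocol in this model means adding a node and a transition to the graph, which mirrors the idea that the parser is reconfigured rather than redesigned.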

Page 19

32x Match Stages

• Each physical stage has its own independent SRAM and TCAM memories and associated action CPUs.
▫ Each physical match stage has SRAM subunits (8 x 80b = 640b) and TCAM subunits (16 x 40b = 640b).

• Each subunit can work:
▫ independently,
▫ in groups, for wider fields,
▫ or in sets, for deeper tables.

• Each match stage additionally has 106 RAM blocks of 1K x 112b and 16 TCAM blocks of 2K x 40b.
▫ The fraction assigned to match, action, and statistics memory is configurable.
▫ All reads are performed in parallel.

• Each match table entry in RAM has an associated pointer to action memory, an action size, a pointer to instruction memory, and a next-table address.
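Combining blocks "in groups" for wider fields and "in sets" for deeper tables comes down to simple arithmetic. The sketch below assumes a plain tiling of 2K x 40b TCAM blocks with no per-entry overhead, which is a simplification; `blocks_needed` is a hypothetical helper and the 120-bit ACL key width is an invented example value.

```python
import math

TCAM_BLOCK_ENTRIES = 2048   # from the slide: TCAM blocks of 2K x 40b
TCAM_BLOCK_WIDTH = 40
TCAM_BLOCKS_PER_STAGE = 16


def blocks_needed(entries, key_width_bits):
    """Blocks side by side for wider keys, times blocks stacked for more entries."""
    wide = math.ceil(key_width_bits / TCAM_BLOCK_WIDTH)
    deep = math.ceil(entries / TCAM_BLOCK_ENTRIES)
    return wide * deep


lpm_blocks = blocks_needed(16 * 1024, 32)    # 16k x 32 LPM table: 1 wide, 8 deep
acl_blocks = blocks_needed(4 * 1024, 120)    # 4k-entry ACL, assumed 120-bit key: 3 wide, 2 deep
fits_in_one_stage = lpm_blocks + acl_blocks <= TCAM_BLOCKS_PER_STAGE
```

Under these assumptions the two tables together use 14 of the 16 TCAM blocks of a single stage; a larger table would have to spill into further stages.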

Page 20

RMT Logical to Physical Table Mapping

[Figure: a table graph (ETH, VLAN, IPV4, IPV6, L2S, L2D, TCP, UDP, ACL) mapped onto physical stages 1, 2, ..., n. Each physical stage has a match table and an action unit, with a 640b-wide TCAM and a 640b-wide SRAM hash memory.]

Logical tables: 1 Ethertype, 2 VLAN, 3 IPV4, 4 L2S, 5 IPV6, 6 L2D, 7 TCP, 8 UDP, 9 ACL

ACL: Access Control List

Page 21

Action Processing Model

[Figure: an ALU between Header In and Header Out. Labels: Field, Field, Data, Instruction, Match result.]

Page 22

Modeled as Multiple VLIW CPUs per Stage

[Figure: a row of ALUs working in parallel, driven by VLIW instructions and the match result]
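One way to picture a VLIW action is as a bundle of per-field operations that all read the same incoming header and write the outgoing one. The sketch below assumes that every ALU sees the old header values, so the operations behave as if executed simultaneously; the field names, the `apply_vliw` helper and the use of Python lambdas as "ALU operations" are all illustrative choices.

```python
def apply_vliw(header, instruction, action_data):
    """Apply one operation per field; every operation reads the unmodified input header."""
    result = dict(header)
    for field, operation in instruction.items():
        result[field] = operation(header, action_data)
    return result


# One "very long instruction": three field operations issued together.
route_instruction = {
    "ttl": lambda hdr, data: hdr["ttl"] - 1,
    "dst_mac": lambda hdr, data: data["next_hop_mac"],
    "src_mac": lambda hdr, data: hdr["dst_mac"],   # reads the old dst_mac, not the new one
}

header_in = {"ttl": 64, "dst_mac": "A", "src_mac": "B"}
header_out = apply_vliw(header_in, route_instruction, {"next_hop_mac": "C"})
```

Because `src_mac` is computed from the input header, it ends up as "A" even though `dst_mac` is rewritten to "C" in the same instruction, which is the behaviour expected from ALUs running side by side.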

Page 23

VLIW Instructions

Page 24

Reducing Latency

• Dependency Analysis

Page 25

Hardware Implementation

• 64 x 10Gb ports
▫ 960M packets/second
▫ 1GHz pipeline

• Packet header parser
▫ 4Kb vector
▫ 256 x 40b TCAM

• Physical stages: N = 32
▫ 106 SRAM blocks of 1K x 112 bits, for 80-bit-wide hash match
▫ 16 TCAM blocks of 2K x 40 bits
▫ Total: 370Mb SRAM, 40Mb TCAM; blocks can be combined in depth or width

• Action units
▫ 200+ per stage
▫ One per field
▫ 7k+ units in total

• Queues
▫ 2K queues per port
▫ 4 levels of hierarchy
▫ Allows various combinations of deficit round robin, hierarchical fair queuing, token buckets, and priorities
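The totals on this slide can be cross-checked with a little arithmetic. The sketch assumes "Mb" means 2**20 bits, takes 224 action units per stage from the conclusion slide, and assumes minimum-size Ethernet frames of 64 bytes plus 20 bytes of preamble and inter-frame gap for the packet rate; those last figures are standard Ethernet values, not something the slide states.

```python
STAGES = 32
MBIT = 2 ** 20  # assumption: "Mb" on the slide means 2**20 bits

sram_mbit = STAGES * 106 * 1024 * 112 / MBIT   # 106 blocks of 1K x 112b per stage
tcam_mbit = STAGES * 16 * 2048 * 40 / MBIT     # 16 blocks of 2K x 40b per stage
action_units = STAGES * 224                    # 224 per stage, per the conclusion slide

# Packet rate at line rate with minimum-size frames (assumed 64 B frame + 20 B overhead).
bits_per_min_packet = (64 + 20) * 8
packets_per_second = 64 * 10e9 / bits_per_min_packet
```

With these assumptions the SRAM comes to 371 Mb and the TCAM to 40 Mb, in line with the slide's "370Mb" and "40Mb"; 7168 action units matches "7k+", and the packet rate lands at roughly 952M packets per second, close to the quoted 960M.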

Page 26

Cost of Configurability: Comparison with Conventional Switch

• Many functions identical: I/O, data buffer, queueing…

• Make extra functions optional: statistics

• Memory dominates area
▫ Compare memory area/bit and bit count

• RMT must use memory bits efficiently to compete on cost

• Techniques for flexibility

Page 27

Chip Comparison with Fixed Function Switches

Area

Section                        Area % of chip   Extra cost
IO, buffer, queue, CPU, etc.   37%              0.0%
Match memory & logic           54.3%            8.0%
VLIW action engine             7.4%             5.5%
Parser + deparser              1.3%             0.7%
Total extra area cost                           14.2%

Power

Section                        Power % of chip  Extra cost
I/O                            26.0%            0.0%
Memory leakage                 43.7%            4.0%
Logic leakage                  7.3%             2.5%
RAM active                     2.7%             0.4%
TCAM active                    3.5%             0.0%
Logic active                   16.8%            5.5%
Total extra power cost                          12.4%
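As a quick sanity check, the per-section figures in the two tables can be summed to confirm the quoted totals. The numbers below are copied from the slide; the dictionary layout and variable names are just one convenient way to hold them.

```python
# (share of chip %, extra cost %) per section, copied from the slide.
area = {
    "IO, buffer, queue, CPU, etc.": (37.0, 0.0),
    "Match memory & logic": (54.3, 8.0),
    "VLIW action engine": (7.4, 5.5),
    "Parser + deparser": (1.3, 0.7),
}
power = {
    "I/O": (26.0, 0.0),
    "Memory leakage": (43.7, 4.0),
    "Logic leakage": (7.3, 2.5),
    "RAM active": (2.7, 0.4),
    "TCAM active": (3.5, 0.0),
    "Logic active": (16.8, 5.5),
}

area_share = round(sum(share for share, _ in area.values()), 1)
area_extra = round(sum(extra for _, extra in area.values()), 1)
power_share = round(sum(share for share, _ in power.values()), 1)
power_extra = round(sum(extra for _, extra in power.values()), 1)
```

The shares add up to 100% in both tables and the extra costs to 14.2% (area) and 12.4% (power), so the figures are internally consistent with the roughly 15% overhead quoted in the conclusion.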

Page 28

Conclusion

• How do we design a flexible chip?
▫ The RMT switch model
▫ Bring processing close to the memories: pipeline of many stages
▫ Bring the processing to the wires: 224 action CPUs per stage

• How much does it cost?
▫ 15% more than existing switches
