1
High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997
OR Project Group II: Packet Buffer Proposal
Da Chuang, Isaac Keslassy, Sundar Iyer, Greg Watson, Nick McKeown, Mark Horowitz
E-mail: [email protected]
OR Project: http://klamath.stanford.edu/or/
2
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
3
100Tb/s router (100Tb/s = 625 * 160Gb/s)
[Figure: electronic linecards #1 through #625, each running at 160Gb/s and performing line termination, IP packet processing, and packet buffering, connect to a central switch fabric through Request/Grant arbitration.]
4
Load-Balanced Switch
[Figure: N external inputs connect through a load-balancing cyclic shift to N internal inputs, which connect through a switching cyclic shift to N external outputs.]
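The cyclic-shift stages can be sketched with a toy model. This is an assumed simplification (a fixed rotation per time slot), not the fabric's actual schedule: at slot t, port i connects to port (i + t) mod N, so over N slots every input meets every middle linecard exactly once.

```python
# Minimal sketch (assumed model) of one cyclic-shift stage of a
# load-balanced switch: each time slot uses the next rotation, so
# every input visits every middle port once per N slots.

N = 4  # small example; the talk's router uses N = 625 linecards

def cyclic_shift(t, n=N):
    """Permutation at time slot t: port i connects to port (i + t) % n."""
    return [(i + t) % n for i in range(n)]

# Every slot's connection pattern is a permutation (a bijection).
assert all(sorted(cyclic_shift(t)) == list(range(N)) for t in range(N))

# Over N consecutive slots, input 0 is connected to every middle port once.
middles_seen = {cyclic_shift(t)[0] for t in range(N)}
assert middles_seen == set(range(N))
```

The same model describes both stages: the first spreads traffic evenly over the middle linecards regardless of destination, the second delivers it to the external outputs.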
5
160 Gbps Linecard
[Figure: linecard datapath. The Input Block performs lookup/processing and segmentation into fixed-size packets, queued in VOQs; the Intermediate Input Block sits between the load-balancing and switching stages; the Output Block performs reassembly. All internal links run at rate R.]
6
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
7
Problem: Unbounded Mis-sequencing
[Figure: as in the load-balanced switch, N external inputs connect through a spanning set of permutations to N internal inputs, and through a second spanning set of permutations to N external outputs.]
8
Preventing Mis-sequencing
Uniform Frame Spreading:
- Group cells into frames of N cells each (frame building)
- Spread each frame across all middle linecards
- Each middle stage receives the same type of packets => has the same queue occupancy state
[Figure: each frame of N cells is spread one cell per middle-stage linecard, across all N middle stages.]
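Uniform Frame Spreading can be sketched as follows. This is a simplified toy (it ignores the per-destination grouping of cells into frames, which the real scheme requires), meant only to show why spreading whole frames keeps the middle-stage queues equally occupied:

```python
# Hypothetical sketch of Uniform Frame Spreading (UFS): cells are held
# until a full frame of N cells exists, then the frame is spread with
# one cell per middle-stage linecard, so all middle queues grow in
# lockstep. (Per-destination frame grouping is omitted for brevity.)

N = 3  # middle-stage linecards; the talk uses N = 625

pending = []                      # cells waiting for a full frame
middle_queues = [[] for _ in range(N)]

def arrive(cell):
    pending.append(cell)
    if len(pending) == N:         # frame complete: spread it
        for k, c in enumerate(pending):
            middle_queues[k].append(c)
        pending.clear()

for cell_id in range(6):          # two full frames of N = 3 cells
    arrive(cell_id)

# Every middle linecard holds the same number of cells.
assert len({len(q) for q in middle_queues}) == 1
```

Because every middle stage sees one cell of every frame, no middle queue can run ahead of the others, which is what bounds mis-sequencing.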
9
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
10
Three stages on a linecard
[Figure: 1st stage: Segmentation/Frame Building; 2nd stage: Main Buffering across N queues draining at R/N each; 3rd stage: Reassembly. The stages are connected by links at rate R.]
11
Technology Assumptions in 2005
- DRAM technology: access time ~40 ns; size ~1 Gbit; memory bandwidth ~16 Gb/s (16 data pins)
- On-chip SRAM technology: access time ~2.5 ns; size ~64 Mbits
- Serial link technology: bandwidth ~10 Gb/s; >100 serial links per chip
12
First Stage
[Figure: variable-size packets arriving at rate R are segmented into 128-byte cells; each cell is sliced into eight 16-byte chunks (bytes 0-15, 16-31, ..., 112-127), which are sent over eight R/8 links to the frame-building blocks.]
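The segmentation and slicing step can be illustrated with a short sketch. This is not the chip's actual logic, just the arithmetic of the slide: 128-byte cells, eight 16-byte slices per cell, one slice per R/8 link:

```python
# Illustrative sketch (assumed, not the chip's implementation) of the
# first stage: segment a variable-size packet into 128-byte cells and
# slice each cell into eight 16-byte chunks, one per R/8 serial link.

CELL = 128   # cell size in bytes
SLICE = 16   # chunk size in bytes (CELL / 8 links)

def segment(packet: bytes):
    """Split a packet into 128-byte cells, zero-padding the last cell."""
    cells = []
    for off in range(0, len(packet), CELL):
        cells.append(packet[off:off + CELL].ljust(CELL, b'\x00'))
    return cells

def slice_cell(cell: bytes):
    """Slice one cell into chunks covering bytes 0-15, 16-31, ..., 112-127."""
    return [cell[i:i + SLICE] for i in range(0, CELL, SLICE)]

cells = segment(b'x' * 300)                 # a 300-byte packet
assert len(cells) == 3                      # ceil(300 / 128) cells
assert len(slice_cell(cells[0])) == 8       # eight chunks per cell
assert all(len(s) == SLICE for s in slice_cell(cells[0]))
```

Reassembly in the third stage inverts these two steps: merge the eight chunk streams back into cells, then strip the padding and rebuild variable-size packets.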
13
Segmentation Chip (1st stage)
- Incoming: 16 x 10 Gb/s
- Outgoing: 8 x 2 x 10 Gb/s
- On-chip memory: N x 1500 bytes = 7.2 Mbits of 3.2ns SRAM
[Figure: variable-size packets at rate R are segmented into 128-byte cells and output as 16-byte chunks over eight R/8 links.]
14
Frame Building Chip (1st stage)
- Incoming: 2 x 10 Gb/s
- Outgoing: 2 x 10 Gb/s
- On-chip memory: N^2 x 16 bytes = 48 Mbits of 3.2ns SRAM
[Figure: 16-byte chunks arrive on an R/8 link, are assembled into frames across N queues, and leave on an R/8 link.]
15
Three stages on a linecard
[Figure: 1st stage: Segmentation/Frame Building; 2nd stage: Main Buffering across N queues at R/N each; 3rd stage: Reassembly.]
16
Packet Buffering Problem
Packet buffers for a 160Gb/s router linecard:
- Buffer memory: 40 Gbits
- Write rate R: one 128B packet every 6.4ns
- Read rate R: one 128B packet every 6.4ns
[Figure: a Buffer Manager mediates writes to and reads from the buffer memory.]
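The slide's numbers check out with a little arithmetic (a 128-byte cell is 1024 bits, and 40 Gbits at 160 Gb/s is a quarter second of line-rate traffic):

```python
# Back-of-the-envelope check of the packet-buffering slide:
# at R = 160 Gb/s, one 128-byte cell arrives (and one departs)
# every 6.4 ns, and 40 Gbits buffers 0.25 s of traffic.

R = 160e9                    # line rate, bits per second
cell_bits = 128 * 8          # one 128-byte cell = 1024 bits

t_ns = cell_bits / R * 1e9   # time per cell in nanoseconds
assert abs(t_ns - 6.4) < 1e-9

buffer_seconds = 40e9 / R    # buffer depth in seconds of traffic
assert buffer_seconds == 0.25
```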
17
Memory Technology
Use SRAM?
+ Fast enough random access time, but
- Too low density to store 40 Gbits of data.

Use DRAM?
+ High density means we can store the data, but
- Can't meet the random access time.
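The tradeoff is quantitative, using the part sizes from the earlier "Technology Assumptions in 2005" slide (these figures are the slide's assumptions, not datasheet values):

```python
# Rough arithmetic behind the SRAM-vs-DRAM slide, using the 2005
# technology assumptions stated earlier in the deck.

buffer_bits = 40e9        # required buffering: 40 Gbits
sram_chip_bits = 64e6     # ~64 Mbit of on-chip SRAM per chip
dram_trc_ns = 40.0        # DRAM random (row cycle) access time
cell_time_ns = 6.4        # one cell written and one read every 6.4 ns

# SRAM is fast enough but would take hundreds of chips:
assert round(buffer_bits / sram_chip_bits) == 625
# DRAM stores the data easily, but a 40 ns access misses the 6.4 ns budget:
assert dram_trc_ns > cell_time_ns
```

Neither technology works alone, which motivates the hybrid SRAM/DRAM hierarchy on the next slide.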
21
Hybrid Memory Hierarchy
- Large DRAM memory holds the bodies of the Q FIFOs
- A small head SRAM caches the FIFO heads; a small tail SRAM caches the FIFO tails
[Figure: arriving packets at rate R enter the tail SRAM cache of queues 1..Q; b bytes at a time are written to and read from the DRAM; departing packets at rate R are served from the head SRAM cache, driven by requests from an arbiter or scheduler.]
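A toy model of one such hybrid FIFO may help. This is an assumed sketch, not the buffer manager's real design: arrivals land in a tail cache, full b-cell blocks move to DRAM in single transfers, and reads drain a head cache refilled one block at a time.

```python
from collections import deque

# Toy model (assumed, not the actual buffer manager) of the hybrid
# SRAM/DRAM hierarchy: a small tail SRAM absorbs arrivals, the DRAM
# holds the FIFO body in b-cell blocks, and a small head SRAM serves
# departures. B is the DRAM transfer size in cells.

B = 4  # cells per DRAM transfer (the slides use b = 12.5)

class HybridFifo:
    def __init__(self):
        self.tail = deque()   # tail SRAM cache
        self.body = deque()   # large DRAM, stored as B-cell blocks
        self.head = deque()   # head SRAM cache

    def write(self, cell):
        self.tail.append(cell)
        if len(self.tail) == B:                 # full block: one DRAM write
            self.body.append([self.tail.popleft() for _ in range(B)])

    def read(self):
        if not self.head and self.body:         # refill head: one DRAM read
            self.head.extend(self.body.popleft())
        if self.head:
            return self.head.popleft()
        return self.tail.popleft() if self.tail else None

q = HybridFifo()
for i in range(10):
    q.write(i)
out = [q.read() for _ in range(10)]
assert out == list(range(10))                   # FIFO order preserved
```

The point of the block transfers is that DRAM is touched only once per b cells, so its 40 ns access time is amortized below the 6.4 ns per-cell budget.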
22
SRAM/DRAM results
How much SRAM buffering is needed, given:
- DRAM Trc = 40ns
- Write and read a 128-byte cell every 6.4ns
- Q = 625, b = 2 x 40ns / 6.4ns = 12.5

Two options [Iyer]:
- Zero latency: Qb[2 + lnQ] = 61k cells = 66 Mbits
- Some latency: Q(b-1) = 7.5k cells = 7.5 Mbits
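The two sizing formulas above can be evaluated directly (the exact constants and rounding in [Iyer] may differ slightly from this straightforward reading of the slide):

```python
import math

# Evaluating the slide's SRAM sizing formulas from [Iyer]:
# zero-latency head SRAM needs Qb(2 + ln Q) cells; allowing some
# pipeline latency reduces it to Q(b - 1) cells.

Q = 625                  # number of FIFOs (one per linecard)
b = 2 * 40 / 6.4         # = 12.5 cells per DRAM row cycle

zero_latency_cells = Q * b * (2 + math.log(Q))
some_latency_cells = Q * (b - 1)

assert 60_000 < zero_latency_cells < 70_000      # ~61k cells on the slide
assert abs(some_latency_cells - 7187.5) < 1e-6   # ~7.5k cells on the slide
```

The hundredfold gap between the two options is why the queue manager chip on a later slide budgets only a few Mbits of cache SRAM: it accepts a small, fixed lookahead latency.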
23
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
24
Problem Statement
[Figure: a Queue Manager fronts 40 Gb of DRAM; the write rate R and read rate R are each 160 Gb/s, i.e. one 128B cell every 6.4ns in each direction.]
25
Second Stage
[Figure: 16-byte chunks arrive on eight R/8 links, are held in the main buffering stage across N queues draining at R/N each, and leave as 16-byte chunks on eight R/8 links.]
26
Queue Manager Chip (2nd stage)
- Incoming: 2 x 10 Gb/s
- Outgoing: 2 x 10 Gb/s
- Pins: 35 pins/DRAM x 5 DRAMs = 175 pins
- SRAM/DRAM cache: Q(b-1) = 2.8 Mbits of 3.2ns SRAM
- SRAM linked list: 1 Mbit of 3.2ns SRAM
[Figure: 16-byte chunks at R/8 in and out; main buffering across N queues at R/N each, backed by 5 x 1Gb DRAMs accessed at R/4.]
27
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
28
Three stages on a linecard
[Figure: 1st stage: Segmentation/Frame Building; 2nd stage: Main Buffering across N queues at R/N each; 3rd stage: Reassembly.]
29
Third stage
Reassembly
- Incoming: 8 x 2 x 10 Gb/s
- Outgoing: 16 x 10 Gb/s
- On-chip memory: N x 1500 bytes = 7.2 Mbits of 3.2ns SRAM
[Figure: 16-byte chunks arrive on eight R/8 links and are reassembled into variable-size packets leaving at rate R.]
30
Linecard Datapath Requirements
- 1st stage: 1 segmentation chip, 8 frame-building chips
- 2nd stage: 8 queue-manager chips, 40 x 1 Gb DRAMs
- 3rd stage: 1 reassembly chip
- Total chip count: 18 ASIC chips, 40 x 1 Gb DRAMs