1
High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997
OR Project Group II: Packet Buffer Proposal
Da Chuang, Isaac Keslassy, Sundar Iyer, Greg Watson, Nick McKeown, Mark Horowitz
E-mail: [email protected]
OR Project: http://klamath.stanford.edu/or/
2
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
3
100Tb/s router (100Tb/s = 625 * 160Gb/s)
[Figure: electronic linecards #1 through #625, each running at 160Gb/s and performing line termination, IP packet processing, and packet buffering, connect to a central switch fabric through Request/Grant arbitration.]
4
Load-Balanced Switch
[Figure: N external inputs connect through a load-balancing cyclic shift to N internal inputs, which connect through a switching cyclic shift to N external outputs.]
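The cyclic-shift stages can be sketched with a toy model. This is an assumed simplification (a fixed rotation per time slot), not the fabric's actual schedule: at slot t, port i connects to port (i + t) mod N, so over N slots every input meets every middle linecard exactly once.

```python
# Minimal sketch (assumed model) of one cyclic-shift stage of a
# load-balanced switch: each time slot uses the next rotation, so
# every input visits every middle port once per N slots.

N = 4  # small example; the talk's router uses N = 625 linecards

def cyclic_shift(t, n=N):
    """Permutation at time slot t: port i connects to port (i + t) % n."""
    return [(i + t) % n for i in range(n)]

# Every slot's connection pattern is a permutation (a bijection).
assert all(sorted(cyclic_shift(t)) == list(range(N)) for t in range(N))

# Over N consecutive slots, input 0 is connected to every middle port once.
middles_seen = {cyclic_shift(t)[0] for t in range(N)}
assert middles_seen == set(range(N))
```

The same model describes both stages: the first spreads traffic evenly over the middle linecards regardless of destination, the second delivers it to the external outputs.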
5
160 Gbps Linecard
[Figure: linecard datapath. The Input Block performs lookup/processing and segmentation into fixed-size packets, queued in VOQs; the Intermediate Input Block sits between the load-balancing and switching stages; the Output Block performs reassembly. All internal links run at rate R.]
6
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
7
Problem: Unbounded Mis-sequencing
[Figure: as in the load-balanced switch, N external inputs connect through a spanning set of permutations to N internal inputs, and through a second spanning set of permutations to N external outputs.]
8
Preventing Mis-sequencing
Uniform Frame Spreading:
- Group cells into frames of N cells each (frame building)
- Spread each frame across all middle linecards
- Each middle stage receives the same type of packets => has the same queue occupancy state
[Figure: each frame of N cells is spread one cell per middle-stage linecard, across all N middle stages.]
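Uniform Frame Spreading can be sketched as follows. This is a simplified toy (it ignores the per-destination grouping of cells into frames, which the real scheme requires), meant only to show why spreading whole frames keeps the middle-stage queues equally occupied:

```python
# Hypothetical sketch of Uniform Frame Spreading (UFS): cells are held
# until a full frame of N cells exists, then the frame is spread with
# one cell per middle-stage linecard, so all middle queues grow in
# lockstep. (Per-destination frame grouping is omitted for brevity.)

N = 3  # middle-stage linecards; the talk uses N = 625

pending = []                      # cells waiting for a full frame
middle_queues = [[] for _ in range(N)]

def arrive(cell):
    pending.append(cell)
    if len(pending) == N:         # frame complete: spread it
        for k, c in enumerate(pending):
            middle_queues[k].append(c)
        pending.clear()

for cell_id in range(6):          # two full frames of N = 3 cells
    arrive(cell_id)

# Every middle linecard holds the same number of cells.
assert len({len(q) for q in middle_queues}) == 1
```

Because every middle stage sees one cell of every frame, no middle queue can run ahead of the others, which is what bounds mis-sequencing.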
9
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
10
Three stages on a linecard
[Figure: 1st stage: Segmentation/Frame Building; 2nd stage: Main Buffering across N queues draining at R/N each; 3rd stage: Reassembly. The stages are connected by links at rate R.]
11
Technology Assumptions in 2005
- DRAM technology: access time ~40 ns; size ~1 Gbit; memory bandwidth ~16 Gb/s (16 data pins)
- On-chip SRAM technology: access time ~2.5 ns; size ~64 Mbits
- Serial link technology: bandwidth ~10 Gb/s; >100 serial links per chip
12
First Stage
[Figure: variable-size packets arriving at rate R are segmented into 128-byte cells; each cell is sliced into eight 16-byte chunks (bytes 0-15, 16-31, ..., 112-127), which are sent over eight R/8 links to the frame-building blocks.]
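The segmentation and slicing step can be illustrated with a short sketch. This is not the chip's actual logic, just the arithmetic of the slide: 128-byte cells, eight 16-byte slices per cell, one slice per R/8 link:

```python
# Illustrative sketch (assumed, not the chip's implementation) of the
# first stage: segment a variable-size packet into 128-byte cells and
# slice each cell into eight 16-byte chunks, one per R/8 serial link.

CELL = 128   # cell size in bytes
SLICE = 16   # chunk size in bytes (CELL / 8 links)

def segment(packet: bytes):
    """Split a packet into 128-byte cells, zero-padding the last cell."""
    cells = []
    for off in range(0, len(packet), CELL):
        cells.append(packet[off:off + CELL].ljust(CELL, b'\x00'))
    return cells

def slice_cell(cell: bytes):
    """Slice one cell into chunks covering bytes 0-15, 16-31, ..., 112-127."""
    return [cell[i:i + SLICE] for i in range(0, CELL, SLICE)]

cells = segment(b'x' * 300)                 # a 300-byte packet
assert len(cells) == 3                      # ceil(300 / 128) cells
assert len(slice_cell(cells[0])) == 8       # eight chunks per cell
assert all(len(s) == SLICE for s in slice_cell(cells[0]))
```

Reassembly in the third stage inverts these two steps: merge the eight chunk streams back into cells, then strip the padding and rebuild variable-size packets.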
13
Segmentation Chip (1st stage)
- Incoming: 16 x 10 Gb/s
- Outgoing: 8 x 2 x 10 Gb/s
- On-chip memory: N x 1500 bytes = 7.2 Mbits of 3.2ns SRAM
[Figure: variable-size packets at rate R are segmented into 128-byte cells and output as 16-byte chunks over eight R/8 links.]
14
Frame Building Chip (1st stage)
- Incoming: 2 x 10 Gb/s
- Outgoing: 2 x 10 Gb/s
- On-chip memory: N^2 x 16 bytes = 48 Mbits of 3.2ns SRAM
[Figure: 16-byte chunks arrive on an R/8 link, are assembled into frames across N queues, and leave on an R/8 link.]
15
Three stages on a linecard
[Figure: 1st stage: Segmentation/Frame Building; 2nd stage: Main Buffering across N queues at R/N each; 3rd stage: Reassembly.]
16
Packet Buffering Problem
Packet buffers for a 160Gb/s router linecard:
- Buffer memory: 40 Gbits
- Write rate R: one 128B packet every 6.4ns
- Read rate R: one 128B packet every 6.4ns
[Figure: a Buffer Manager mediates writes to and reads from the buffer memory.]
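The slide's numbers check out with a little arithmetic (a 128-byte cell is 1024 bits, and 40 Gbits at 160 Gb/s is a quarter second of line-rate traffic):

```python
# Back-of-the-envelope check of the packet-buffering slide:
# at R = 160 Gb/s, one 128-byte cell arrives (and one departs)
# every 6.4 ns, and 40 Gbits buffers 0.25 s of traffic.

R = 160e9                    # line rate, bits per second
cell_bits = 128 * 8          # one 128-byte cell = 1024 bits

t_ns = cell_bits / R * 1e9   # time per cell in nanoseconds
assert abs(t_ns - 6.4) < 1e-9

buffer_seconds = 40e9 / R    # buffer depth in seconds of traffic
assert buffer_seconds == 0.25
```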
17
Memory Technology
Use SRAM?
+ Fast enough random access time, but
- Too low density to store 40 Gbits of data.

Use DRAM?
+ High density means we can store the data, but
- Can't meet the random access time.
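The tradeoff is quantitative, using the part sizes from the earlier "Technology Assumptions in 2005" slide (these figures are the slide's assumptions, not datasheet values):

```python
# Rough arithmetic behind the SRAM-vs-DRAM slide, using the 2005
# technology assumptions stated earlier in the deck.

buffer_bits = 40e9        # required buffering: 40 Gbits
sram_chip_bits = 64e6     # ~64 Mbit of on-chip SRAM per chip
dram_trc_ns = 40.0        # DRAM random (row cycle) access time
cell_time_ns = 6.4        # one cell written and one read every 6.4 ns

# SRAM is fast enough but would take hundreds of chips:
assert round(buffer_bits / sram_chip_bits) == 625
# DRAM stores the data easily, but a 40 ns access misses the 6.4 ns budget:
assert dram_trc_ns > cell_time_ns
```

Neither technology works alone, which motivates the hybrid SRAM/DRAM hierarchy on the next slide.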
21
Hybrid Memory Hierarchy
- Large DRAM memory holds the bodies of the Q FIFOs
- A small head SRAM caches the FIFO heads; a small tail SRAM caches the FIFO tails
[Figure: arriving packets at rate R enter the tail SRAM cache of queues 1..Q; b bytes at a time are written to and read from the DRAM; departing packets at rate R are served from the head SRAM cache, driven by requests from an arbiter or scheduler.]
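A toy model of one such hybrid FIFO may help. This is an assumed sketch, not the buffer manager's real design: arrivals land in a tail cache, full b-cell blocks move to DRAM in single transfers, and reads drain a head cache refilled one block at a time.

```python
from collections import deque

# Toy model (assumed, not the actual buffer manager) of the hybrid
# SRAM/DRAM hierarchy: a small tail SRAM absorbs arrivals, the DRAM
# holds the FIFO body in b-cell blocks, and a small head SRAM serves
# departures. B is the DRAM transfer size in cells.

B = 4  # cells per DRAM transfer (the slides use b = 12.5)

class HybridFifo:
    def __init__(self):
        self.tail = deque()   # tail SRAM cache
        self.body = deque()   # large DRAM, stored as B-cell blocks
        self.head = deque()   # head SRAM cache

    def write(self, cell):
        self.tail.append(cell)
        if len(self.tail) == B:                 # full block: one DRAM write
            self.body.append([self.tail.popleft() for _ in range(B)])

    def read(self):
        if not self.head and self.body:         # refill head: one DRAM read
            self.head.extend(self.body.popleft())
        if self.head:
            return self.head.popleft()
        return self.tail.popleft() if self.tail else None

q = HybridFifo()
for i in range(10):
    q.write(i)
out = [q.read() for _ in range(10)]
assert out == list(range(10))                   # FIFO order preserved
```

The point of the block transfers is that DRAM is touched only once per b cells, so its 40 ns access time is amortized below the 6.4 ns per-cell budget.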
22
SRAM/DRAM results
How much SRAM buffering is needed, given:
- DRAM Trc = 40ns
- Write and read a 128-byte cell every 6.4ns
- Q = 625, b = 2 x 40ns / 6.4ns = 12.5

Two options [Iyer]:
- Zero latency: Qb[2 + lnQ] = 61k cells = 66 Mbits
- Some latency: Q(b-1) = 7.5k cells = 7.5 Mbits
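The two sizing formulas above can be evaluated directly (the exact constants and rounding in [Iyer] may differ slightly from this straightforward reading of the slide):

```python
import math

# Evaluating the slide's SRAM sizing formulas from [Iyer]:
# zero-latency head SRAM needs Qb(2 + ln Q) cells; allowing some
# pipeline latency reduces it to Q(b - 1) cells.

Q = 625                  # number of FIFOs (one per linecard)
b = 2 * 40 / 6.4         # = 12.5 cells per DRAM row cycle

zero_latency_cells = Q * b * (2 + math.log(Q))
some_latency_cells = Q * (b - 1)

assert 60_000 < zero_latency_cells < 70_000      # ~61k cells on the slide
assert abs(some_latency_cells - 7187.5) < 1e-6   # ~7.5k cells on the slide
```

The hundredfold gap between the two options is why the queue manager chip on a later slide budgets only a few Mbits of cache SRAM: it accepts a small, fixed lookahead latency.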
23
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
24
Problem Statement
[Figure: a Queue Manager fronts 40 Gb of DRAM; the write rate R and read rate R are each 160 Gb/s, i.e. one 128B cell every 6.4ns in each direction.]
25
Second Stage
[Figure: 16-byte chunks arrive on eight R/8 links, are held in the main buffering stage across N queues draining at R/N each, and leave as 16-byte chunks on eight R/8 links.]
26
Queue Manager Chip (2nd stage)
- Incoming: 2 x 10 Gb/s
- Outgoing: 2 x 10 Gb/s
- Pins: 35 pins/DRAM x 5 DRAMs = 175 pins
- SRAM/DRAM cache: Q(b-1) = 2.8 Mbits of 3.2ns SRAM
- SRAM linked list: 1 Mbit of 3.2ns SRAM
[Figure: 16-byte chunks at R/8 in and out; main buffering across N queues at R/N each, backed by 5 x 1Gb DRAMs accessed at R/4.]
27
Outline
- Load-Balancing Background
- Mis-sequencing Problem
- Datapath Architecture
  - First stage: Segmentation
  - Second stage: Main Buffering
  - Third stage: Reassembly
28
Three stages on a linecard
[Figure: 1st stage: Segmentation/Frame Building; 2nd stage: Main Buffering across N queues at R/N each; 3rd stage: Reassembly.]
29
Third stage
Reassembly
- Incoming: 8 x 2 x 10 Gb/s
- Outgoing: 16 x 10 Gb/s
- On-chip memory: N x 1500 bytes = 7.2 Mbits of 3.2ns SRAM
[Figure: 16-byte chunks arrive on eight R/8 links and are reassembled into variable-size packets leaving at rate R.]
30
Linecard Datapath Requirements
- 1st stage: 1 segmentation chip, 8 frame-building chips
- 2nd stage: 8 queue-manager chips, 40 x 1 Gb DRAMs
- 3rd stage: 1 reassembly chip
- Total chip count: 18 ASIC chips, 40 x 1 Gb DRAMs