EECS 262a Advanced Topics in Computer Systems, Lecture 18: Software Routers/RouteBricks, October 29th, 2012. John Kubiatowicz and Anthony D. Joseph, Electrical Engineering and Computer Sciences, University of California, Berkeley. Slides Courtesy: Sylvia Ratnasamy.
![Page 1: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/1.jpg)
EECS 262a Advanced Topics in Computer Systems
Lecture 18
Software Routers/RouteBricks
October 29th, 2012
John Kubiatowicz and Anthony D. Joseph
Electrical Engineering and Computer Sciences
University of California, Berkeley
Slides Courtesy: Sylvia Ratnasamy
http://www.eecs.berkeley.edu/~kubitron/cs262
![Page 2: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/2.jpg)
10/29/2012 2cs262a-S12 Lecture-18
Today’s Paper
• RouteBricks: Exploiting Parallelism To Scale Software Routers
Mihai Dobrescu and Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, Sylvia Ratnasamy. Appears in Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), October 2009
• Thoughts?
• Paper divided into two pieces:
– Single-Server Router
– Cluster-Based Routing
![Page 3: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/3.jpg)
Networks and routers
[Figure: network connecting AT&T, MIT, NYU, UCB, and HP]
![Page 4: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/4.jpg)
Routers forward packets
[Figure: Router 1 connected to Routers 2 to 5, with links labeled to MIT, to HP, UCB, and to NYU; example packet with header MIT and payload 111010010]
Route Table
| Destination Address | Next Hop Router |
| --- | --- |
| UCB | 4 |
| HP | 5 |
| MIT | 2 |
| NYU | 3 |
![Page 5: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/5.jpg)
Router definitions
[Figure: router with external ports numbered 1 through N, each carrying R bits per second (bps)]
• N = number of external router "ports"
• R = line rate of a port
• Router capacity = N x R
![Page 6: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/6.jpg)
Networks and routers
[Figure: network connecting AT&T, MIT, NYU, UCB, and HP, with routers labeled by tier: core, edge (ISP), edge (enterprise), and home, small business]
![Page 7: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/7.jpg)
Examples of routers (core)
Cisco CRS-1 (72 racks, 1MW)
• R = 10/40 Gbps
• NR = 46 Tbps
Juniper T640
• R = 2.5/10 Gbps
• NR = 320 Gbps
![Page 8: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/8.jpg)
Examples of routers (edge)
Cisco ASR 1006
• R = 1/10 Gbps
• NR = 40 Gbps
Juniper M120
• R = 2.5/10 Gbps
• NR = 120 Gbps
![Page 9: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/9.jpg)
Examples of routers (small business)
Cisco 3945E
• R = 10/100/1000 Mbps
• NR < 10 Gbps
![Page 10: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/10.jpg)
Building routers
• edge, core
– ASICs
– network processors
– commodity servers (RouteBricks)
• home, small business
– ASICs
– network, embedded processors
– commodity PCs, servers
![Page 11: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/11.jpg)
Why programmable routers
• New ISP services
– intrusion detection, application acceleration
• Simpler network monitoring
– measure link latency, track down traffic
• New protocols
– IP traceback, Trajectory Sampling, …
Enable flexible, extensible networks
![Page 12: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/12.jpg)
Challenge: performance
• deployed edge/core routers
– port speed (R): 1/10/40 Gbps
– capacity (NxR): 40Gbps to 40Tbps
• PC-based software routers
– capacity (NxR), 2007: 1-2 Gbps [Click]
– capacity (NxR), 2009: 4 Gbps [Vyatta]
• subsequent challenges: power, form-factor, …
![Page 13: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/13.jpg)
A single-server router
[Figure: a server with two sockets with cores, each with an integrated memory controller and its own memory (mem), joined by point-to-point links (e.g., QPI) to an I/O hub; Network Interface Cards (NICs) on the I/O hub provide the ports for the N router links]
![Page 14: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/14.jpg)
Packet processing in a server
[Figure: server with two sockets of cores, each with memory, and an I/O hub]
Per packet,
1. core polls input port
2. NIC writes packet to memory
3. core reads packet
4. core processes packet (address lookup, checksum, etc.)
5. core writes packet to port
![Page 15: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/15.jpg)
Packet processing in a server
[Figure: server annotated with "Today, 144Gbps I/O", "Today, 200Gbps memory", and "8x 2.8GHz" cores]
Teaser: 10Gbps?
Assuming 10Gbps with all 64B packets:
• 19.5 million packets per second
• one packet every 0.05 µsecs
• ~1000 cycles to process a packet
Suggests efficient use of CPU cycles is key!
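The teaser arithmetic can be reproduced in a few lines. This is a minimal sketch that uses only the figures on the slide; it assumes the "~1000 cycles" budget counts cycles across all eight 2.8GHz cores, which the slide does not state explicitly.

```python
# Back-of-the-envelope packet budget, using only the figures on the slide.
LINE_RATE_BPS = 10e9   # one 10 Gbps port
PACKET_BYTES = 64      # minimum-size packet
CORES = 8              # "8x 2.8GHz"
CLOCK_HZ = 2.8e9

packets_per_sec = LINE_RATE_BPS / (PACKET_BYTES * 8)
usec_per_packet = 1e6 / packets_per_sec
# Assumption: the cycle budget is summed over all cores.
cycles_per_packet = CORES * CLOCK_HZ / packets_per_sec

print(f"{packets_per_sec / 1e6:.1f} Mpps")
print(f"{usec_per_packet:.3f} usec per packet")
print(f"{cycles_per_packet:.0f} cycles per packet across all cores")
```

The result lands a little above 1100 cycles, which is consistent with the slide's rounded "~1000".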
![Page 16: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/16.jpg)
Lesson#1: multi-core alone isn’t enough
[Figure: `older’ (2008) server, with the memory controller in the `chipset’ and a shared front-side bus as the bottleneck, next to a current (2009) server with memory attached to each socket and an I/O hub]
Hardware need: avoid shared-bus servers
![Page 17: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/17.jpg)
Lesson#2: on cores and ports
[Figure: input ports, cores, and output ports; cores poll the input ports and transmit on the output ports]
How do we assign cores to input and output ports?
![Page 18: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/18.jpg)
Lesson#2: on cores and ports
Problem: locking
Hence, rule: one core per port
![Page 19: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/19.jpg)
Lesson#2: on cores and ports
Problem: cache misses, inter-core communication
[Figure: pipelined assignment (some cores poll, others do lookup+tx) next to parallel assignment (every core does poll+lookup+tx); cores are grouped under L3 caches]
• pipelined: packet transferred between cores; packet (may be) transferred across caches
• parallel: packet stays at one core; packet always in one cache
Hence, rule: one core per packet
![Page 20: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/20.jpg)
Lesson#2: on cores and ports
• two rules:
– one core per port
– one core per packet
• problem: often, can’t simultaneously satisfy both
[Figure: example when #cores > #ports, showing one core per port next to one core per packet]
• solution: use multi-Q NICs
![Page 21: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/21.jpg)
Multi-Q NICs
• feature on modern NICs (for virtualization)
– port associated with multiple queues on NIC
– NIC demuxes (muxes) incoming (outgoing) traffic
– demux based on hashing packet fields (e.g., source+destination address)
[Figure: multi-Q NIC, incoming traffic and outgoing traffic]
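The hash-based demux can be sketched in software as follows. This is only an illustration: `select_queue` is a hypothetical helper, and CRC32 stands in for whatever hash a real NIC computes in hardware.

```python
import zlib

def select_queue(src_addr: str, dst_addr: str, num_queues: int) -> int:
    """Map a packet to a receive queue by hashing its address pair.

    CRC32 is a stand-in chosen for determinism, not the NIC's actual hash.
    """
    key = f"{src_addr}->{dst_addr}".encode()
    return zlib.crc32(key) % num_queues

packets = [
    ("10.0.0.1", "10.0.1.9"),
    ("10.0.0.2", "10.0.1.9"),
    ("10.0.0.1", "10.0.1.9"),  # same address pair as the first packet
]
queues = [select_queue(src, dst, num_queues=8) for src, dst in packets]
print(queues)
```

Packets with the same source and destination always land in the same queue, so whichever core serves that queue sees the whole stream between that pair.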
![Page 22: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/22.jpg)
Multi-Q NICs
• feature on modern NICs (for virtualization)
• repurposed for routing
– rule: one core per port
– rule: one core per packet
• if #queues per port == #cores, can always enforce both rules
[Figure label: queue]
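One way to see why #queues per port == #cores satisfies both rules: give core c queue c on every port. The sketch below is an assumed layout for illustration (the names `owner` and `forward` are hypothetical), not the driver's actual data structures.

```python
NUM_CORES = 4
NUM_PORTS = 2
QUEUES_PER_PORT = NUM_CORES  # the condition from the slide

# owner[(port, queue)] = core. Core c owns queue c on every port, so no
# queue is ever touched by two cores and no locking is needed.
owner = {(port, queue): queue
         for port in range(NUM_PORTS)
         for queue in range(QUEUES_PER_PORT)}

def forward(core: int, in_port: int, out_port: int):
    """Return the (rx_queue, tx_queue) pair one core uses for one packet."""
    rx = (in_port, core)
    tx = (out_port, core)
    # The same core polls, processes, and transmits: one core per packet.
    assert owner[rx] == core and owner[tx] == core
    return rx, tx

print(forward(core=3, in_port=0, out_port=1))
```

Because every core has its own queue on every port, a packet never has to change cores to reach its output port.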
![Page 23: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/23.jpg)
Lesson#2: on cores and ports
recap:
• use multi-Q NICs
– with modified NIC driver for lock-free polling of queues
• with
– one core per queue (avoid locking)
– one core per packet (avoid cache misses, inter-core communication)
![Page 24: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/24.jpg)
Lesson#3: book-keeping
[Figure: server with cores, memory, I/O hub, and ports]
1. core polls input port
2. NIC writes packet to memory
3. core reads packet
4. core processes packet
5. core writes packet to out port
(these steps apply to packets and packet descriptors)
• problem: excessive per packet book-keeping overhead
• solution: batch packet operations
– NIC transfers packets in batches of `k’
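The effect of batching can be sketched with a simple amortization model: a fixed book-keeping cost is paid once per transfer and shared by the k packets in the batch. The cycle counts below are hypothetical, chosen only to show the shape of the curve; they are not measurements from the paper.

```python
def cycles_per_packet(k: int, per_transfer: float, per_packet: float) -> float:
    """Amortized cost per packet when one transfer moves a batch of k packets."""
    return per_transfer / k + per_packet

# Hypothetical costs (assumptions, not measured values).
PER_TRANSFER = 800.0  # book-keeping cycles paid once per batch
PER_PACKET = 200.0    # cycles paid for every packet regardless of batching

for k in (1, 2, 8, 16):
    print(k, cycles_per_packet(k, PER_TRANSFER, PER_PACKET))
```

The per-packet cost falls quickly for small k and then flattens toward the unavoidable per-packet work, which is why modest batch sizes already recover most of the benefit in this model.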
![Page 25: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/25.jpg)
Recap: routing on a server
Design lessons:
1. parallel hardware
» at cores and memory and NICs
2. careful queue-to-core allocation
» one core per queue, per packet
3. reduced book-keeping per packet
» modified NIC driver w/ batching
(see paper for “non needs” – careful memory placement, etc.)
![Page 26: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/26.jpg)
Single-Server Measurements: Experimental setup
• test server: Intel Nehalem (X5560)
– dual socket, 8x 2.80GHz cores
– 2x NICs; 2x 10Gbps ports/NIC
[Figure: test server with 10Gbps ports (max 40Gbps); additional servers generate/sink test traffic]
![Page 27: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/27.jpg)
Experimental setup
• test server: Intel Nehalem (X5560)
• software: kernel-mode Click [TOCS’00]
– with modified NIC driver (batching, multi-Q)
[Figure: test server running the Click runtime, modified NIC driver, and packet processing, with 10Gbps ports; additional servers generate/sink test traffic]
![Page 28: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/28.jpg)
Experimental setup
• test server: Intel Nehalem (X5560)
• software: kernel-mode Click [TOCS’00]
– with modified NIC driver
• packet processing
– static forwarding (no header processing)
– IP routing
» trie-based longest-prefix address lookup
» ~300,000 table entries [RouteViews]
» checksum calculation, header updates, etc.
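Trie-based longest-prefix lookup can be illustrated with a plain one-bit-per-level binary trie over IPv4 prefixes. This is a generic textbook sketch, not the lookup structure Click uses; `RouteTrie` and the example routes are made up for illustration.

```python
import ipaddress

class TrieNode:
    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.next_hop = None          # set if a prefix ends at this node

class RouteTrie:
    """Uncompressed binary trie for IPv4 longest-prefix match."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str):
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node.next_hop  # default route, if one was inserted
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # longest match seen so far
        return best

table = RouteTrie()
table.insert("10.0.0.0/8", "router 2")
table.insert("10.1.0.0/16", "router 3")
print(table.lookup("10.1.2.3"))
print(table.lookup("10.9.9.9"))
print(table.lookup("192.168.0.1"))
```

The lookup walks at most 32 levels and keeps the most specific next hop it has passed, so an address covered by both a /8 and a /16 resolves to the /16 entry.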
![Page 29: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/29.jpg)
Experimental setup
• test server: Intel Nehalem (X5560)
• software: kernel-mode Click [TOCS’00]
– with modified NIC driver
• packet processing
– static forwarding (no header processing)
– IP routing
• input traffic
– all min-size (64B) packets (maximizes packet rate given port speed R)
– realistic mix of packet sizes [Abilene]
![Page 30: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/30.jpg)
Factor analysis: design lessons
Test scenario: static forwarding of min-sized packets
| Configuration | pkts/sec (M) |
| --- | --- |
| older shared-bus server | 1.2 |
| current Nehalem server | 2.8 |
| Nehalem + `batching’ NIC driver | 5.9 |
| Nehalem w/ multi-Q + `batching’ driver | 19 |
![Page 31: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/31.jpg)
Single-server performance
[Bar chart, axis up to 40Gbps]
| | static forwarding (Gbps) | IP routing (Gbps) | Bottleneck |
| --- | --- | --- | --- |
| min-size packets | 9.7 | 6.35 | ? |
| realistic pkt sizes | 36.5 | 36.5 | traffic generation |
![Page 32: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/32.jpg)
Bottleneck analysis (64B pkts)
Test scenario: IP routing of min-sized packets
| Component | Per-packet load due to routing | Maximum component capacity, nominal (empirical) | Max. packet rate as per component capacity, nominal (empirical) |
| --- | --- | --- | --- |
| memory | 725 bytes/pkt | 51 (33) Gbytes/sec | 70 (46) Mpkts/sec |
| I/O | 191 bytes/pkt | 16 (11) Gbytes/sec | 84 (58) Mpkts/sec |
| Inter-socket link | 231 bytes/pkt | 25 (18) Gbytes/sec | 108 (78) Mpkts/sec |
| CPUs | 1693 cycles/pkt | 22.4 Gcycles/sec | 13 Mpkts/sec |
Recall: max IP routing = 6.35Gbps, i.e., 12.4 M pkts/sec
CPUs are the bottleneck
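The last column of the table is each component's capacity divided by its per-packet load. The sketch below recomputes it from the empirical capacities on the slide and picks the smallest rate as the bottleneck; the dictionary layout is just a convenient way to hold the slide's numbers.

```python
# (per-packet load, empirical capacity), copied from the slide's table.
components = {
    "memory":            (725, 33e9),     # bytes/pkt, bytes/sec
    "I/O":               (191, 11e9),     # bytes/pkt, bytes/sec
    "inter-socket link": (231, 18e9),     # bytes/pkt, bytes/sec
    "CPUs":              (1693, 22.4e9),  # cycles/pkt, cycles/sec
}

max_rate = {name: capacity / load for name, (load, capacity) in components.items()}
bottleneck = min(max_rate, key=max_rate.get)

for name, rate in max_rate.items():
    print(f"{name}: {rate / 1e6:.0f} Mpkts/sec")
print("bottleneck:", bottleneck)

# Cross-check against the measured 6.35 Gbps of 64-byte packets.
measured = 6.35e9 / (64 * 8)
print(f"measured: {measured / 1e6:.1f} Mpkts/sec")
```

The measured rate sits just under the CPU limit and far below every other component's limit, which is the basis for the slide's conclusion.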
![Page 33: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/33.jpg)
Recap: single-server performance
| | R | NR |
| --- | --- | --- |
| current servers (realistic packet sizes) | 1/10 Gbps | 36.5 Gbps |
| current servers (min-sized packets) | 1 Gbps | 6.35 Gbps (CPUs bottleneck) |
![Page 34: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/34.jpg)
Recap: single-server performance
With upcoming servers? (2010)
4x cores, 2x memory, 2x I/O
![Page 35: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/35.jpg)
Recap: single-server performance
| | R | NR |
| --- | --- | --- |
| current servers (realistic packet sizes) | 1/10 Gbps | 36.5 Gbps |
| current servers (min-sized packets) | 1 Gbps | 6.35 Gbps (CPUs bottleneck) |
| upcoming servers, estimated (realistic packet sizes) | 1/10/40 Gbps | 146 Gbps |
| upcoming servers, estimated (min-sized packets) | 1/10 Gbps | 25.4 Gbps |
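The estimates for upcoming servers match a simple 4x scaling of the measured numbers, in line with "4x cores". The sketch below reproduces that arithmetic under the assumption that throughput grows linearly with core count; the slides do not spell out how the estimates were derived, so treat this as one plausible reading.

```python
# Measured on the current server (from the recap table), in Gbps.
current = {"realistic packet sizes": 36.5, "min-sized packets": 6.35}

CORE_SCALING = 4  # "4x cores" on the upcoming (2010) servers

# Assumption: throughput scales linearly with the number of cores.
estimated = {workload: gbps * CORE_SCALING for workload, gbps in current.items()}
print(estimated)

# Sanity check for min-sized packets: do memory or I/O become the limit first?
cpu_mpps = 12.4 * CORE_SCALING  # measured IP-routing rate, scaled by cores
memory_mpps = 46 * 2            # empirical memory limit with "2x memory"
io_mpps = 58 * 2                # empirical I/O limit with "2x I/O"
cpu_still_bottleneck = cpu_mpps < min(memory_mpps, io_mpps)
print(cpu_still_bottleneck)
```

Under these assumptions the CPUs remain the limiting component for min-sized packets even after memory and I/O double.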
![Page 36: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/36.jpg)
Project Feedback from Meetings
• Update your project descriptions and plan
– Turn your description/plan into a living document in Google Docs
– Share Google Docs link with us
– Update plan/progress throughout the semester
• Later this week: register your project and proposal on class Website (through project link)
• Questions to address:
– What is your evaluation methodology?
– What will you compare/evaluate against? Strawman?
– What are your evaluation metrics?
– What is your typical workload? Trace-based, analytical, …
– Create a concrete staged project execution plan:
» Set reasonable initial goals with incremental milestones – always have something to show/results for project
![Page 37: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/37.jpg)
Practical Architecture: Goal
• scale software routers to multiple 10Gbps ports
• example: 320Gbps (32x 10Gbps ports)
– higher-end of edge routers; lower-end core routers
![Page 38: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/38.jpg)
A cluster-based router today
[Figure: cluster of servers with 10Gbps external ports; interconnect?]
![Page 39: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/39.jpg)
Interconnecting servers
Challenges
– any input can send up to R bps to any output
![Page 40: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/40.jpg)
A naïve solution
[Figure: servers with 10Gbps external ports connected in a full mesh; each internal link labeled R]
N² internal links of capacity R
problem: commodity servers cannot accommodate NxR traffic
![Page 41: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/41.jpg)
Interconnecting servers
Challenges
– any input can send up to R bps to any output
» but need a low-capacity interconnect (~NR)
» i.e., fewer (<N), lower-capacity (<R) links per server
– must cope with overload
![Page 42: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/42.jpg)
Overload
[Figure: several 10Gbps inputs sending to a single 10Gbps output]
need to drop 20Gbps (fairly across input ports)
• drop at output server? problem: output might receive up to NxR traffic
• drop at input servers? problem: requires global state
![Page 43: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/43.jpg)
Interconnecting servers
Challenges
– any input can send up to R bps to any output
» but need a lower-capacity interconnect
» i.e., fewer (<N), lower-capacity (<R) links per server
– must cope with overload
» need distributed dropping without global scheduling
» processing at servers should scale as R, not NxR
![Page 44: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/44.jpg)
Interconnecting servers
Challenges
– any input can send up to R bps to any output
– must cope with overload
With constraints (due to commodity servers and NICs)
– internal link rates ≤ R
– per-node processing: cxR (small c)
– limited per-node fanout
Solution: Use Valiant Load Balancing (VLB)
![Page 45: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/45.jpg)
Valiant Load Balancing (VLB)
• Valiant et al. [STOC’81], communication in multi-processors
• applied to data centers [Greenberg’09], all-optical routers [Keslassy’03], traffic engineering [Zhang-Shen’04], etc.
• idea: random load-balancing across a low-capacity interconnect
![Page 46: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/46.jpg)
VLB: operation
Packets forwarded in two phases
[Figure: full mesh of servers; each external port carries R and each internal link carries R/N]
phase 1: Packets arriving at external port are uniformly load balanced
• N² internal links of capacity R/N
• each server receives up to R bps
phase 2: Each server sends up to R/N (of traffic received in phase-1) to output server; drops excess fairly. Output server transmits received traffic on external port
• N² internal links of capacity R/N
• each server receives up to R bps
![Page 47: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/47.jpg)
VLB: operation
phase 1+2
• N² internal links of capacity 2R/N
• each server receives up to 2R bps
• plus R bps from external port
• hence, each server processes up to 3R
• or up to 2R, when traffic is uniform [directVLB, Liu’05]
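The 2R/N and 3R bounds can be checked with a small fluid model of the two phases. This is a simplified sketch: rates are treated as continuous flows rather than individual packets, the 1/N share a server "sends to itself" is counted like any other link, and Direct VLB is not modeled. The function name `vlb_loads` is made up for illustration.

```python
def vlb_loads(traffic, n):
    """Fluid model of two-phase VLB on a full mesh of n servers.

    traffic[i][j] is the rate entering at server i and leaving at server j.
    Returns (max load on any internal link, max traffic processed by a server).
    """
    row = [sum(traffic[i]) for i in range(n)]                       # enters at i
    col = [sum(traffic[i][j] for i in range(n)) for j in range(n)]  # leaves at j

    # Phase 1 spreads each input evenly: link a->b carries row[a] / n.
    # Phase 2 forwards to the output server: link a->b carries col[b] / n.
    link = [[row[a] / n + col[b] / n for b in range(n)] for a in range(n)]
    max_link = max(max(loads) for loads in link)

    # A server handles its external input, its phase-1 arrivals, and the
    # phase-2 arrivals headed for its own external port.
    phase1_in = sum(row) / n
    per_server = [row[s] + phase1_in + col[s] for s in range(n)]
    return max_link, max(per_server)

N, R = 4, 10.0
# Admissible traffic: every input and every output carries exactly R.
permutation = [[R if j == (i + 1) % N else 0.0 for j in range(N)] for i in range(N)]
print(vlb_loads(permutation, N))
```

With N = 4 and R = 10, the model gives a per-link load of 2R/N = 5 and a per-server load of 3R = 30, matching the bullets above.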
![Page 48: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/48.jpg)
VLB: fanout? (1)
Multiple external ports per server (if server constraints permit)
fewer but faster links
fewer but faster servers
![Page 49: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/49.jpg)
VLB: fanout? (2)
Use extra servers to form a constant-degree multi-stage interconnect (e.g., butterfly)
![Page 50: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/50.jpg)
Authors’ solution:
• assign maximum external ports per server
• servers interconnected with commodity NIC links
• servers interconnected in a full mesh if possible
• else, introduce extra servers in a k-degree butterfly
• servers run flowlet-based VLB
![Page 51: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/51.jpg)
Outline
• introduction
• routing on a single server
– design
– evaluation
• routing on a cluster
– design
– evaluation
• next steps
• conclusion
![Page 52: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/52.jpg)
Scalability
• question: how well does clustering scale for realistic server fanout and processing capacity?
• metric: number of servers required to achieve a target router speed
![Page 53: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/53.jpg)
Scalability
Assumptions
• 7 NICs per server
• each NIC has 6 x 10Gbps ports or 8 x 1Gbps ports
• current servers
– one external 10Gbps port per server (i.e., requires that a server process 20-30Gbps)
• upcoming servers
– two external 10Gbps ports per server (i.e., requires that a server process 40-60Gbps)
![Page 54: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/54.jpg)
Scalability (computed)
Number of servers required per target router speed:
| | 160Gbps | 320Gbps | 640Gbps | 1.28Tbps | 2.56Tbps |
| --- | --- | --- | --- | --- | --- |
| current servers | 16 | 32 | 128 | 256 | 512 |
| upcoming servers | 8 | 16 | 32 | 128 | 256 |
Example: can build a 320Gbps router using 32 `current’ servers
[Chart annotation: transition from mesh to butterfly]
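Where the table jumps from one server per external port to many more servers, a full mesh no longer fits the per-server fanout. The sketch below checks mesh feasibility from the stated assumptions (7 NICs, 6 x 10Gbps or 8 x 1Gbps ports per NIC, internal links of capacity 2R/N). It is a rough model with its own simplifications: it ignores the NIC ports consumed by external links, does not mix port speeds on one server, and does not compute butterfly server counts. `mesh_feasible` is a hypothetical helper, not the authors' method.

```python
NICS_PER_SERVER = 7
PORT_OPTIONS = [(1.0, 8), (10.0, 6)]  # (port speed in Gbps, ports per NIC)

def mesh_feasible(target_gbps: float, external_gbps_per_server: float) -> bool:
    """Can a full mesh reach the target speed within the per-server fanout?"""
    servers = int(target_gbps / external_gbps_per_server)
    links_needed = servers - 1
    # VLB needs internal links of 2R/N, with R the external rate per server.
    link_gbps = 2 * external_gbps_per_server / servers
    for port_gbps, ports_per_nic in PORT_OPTIONS:
        fanout = NICS_PER_SERVER * ports_per_nic
        if port_gbps >= link_gbps and fanout >= links_needed:
            return True
    return False

targets = (160, 320, 640, 1280, 2560)
current = [mesh_feasible(t, 10.0) for t in targets]   # one 10Gbps port/server
upcoming = [mesh_feasible(t, 20.0) for t in targets]  # two 10Gbps ports/server
print(current)
print(upcoming)
```

Under these assumptions the mesh stops fitting at 640Gbps for current servers and at 1.28Tbps for upcoming servers, which lines up with the points in the table where the server count grows faster than the port count.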
![Page 55: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/55.jpg)
Implementation: the RB8/4
[Figure: 4 x Nehalem servers, each with 2 x 10Gbps external ports (Intel Niantic NIC)]
Specs.
• 8x 10Gbps external ports
• form-factor: 4U
• power: 1.2KW
• cost: ~$10k
Key results (realistic traffic)
• 72 Gbps routing
• reordering: 0-0.15%
• validated VLB bounds
![Page 56: EECS 262a Advanced Topics in Computer Systems Lecture 18 Software Routers/ RouteBricks October 29 th , 2012](https://reader036.vdocuments.us/reader036/viewer/2022062521/568168b5550346895ddf8e6d/html5/thumbnails/56.jpg)
Is this a good paper?
• What were the authors’ goals?
• What about the evaluation/metrics?
• Did they convince you that this was a good system/approach?
• Were there any red-flags?
• What mistakes did they make?
• Does the system/approach meet the “Test of Time” challenge?
• How would you review this paper today?