looking beyond 400g - ieee-sa€¦ · rakesh chopra. cisco fellow. february 8, 2020. a system...

23
Rakesh Chopra Cisco Fellow February 8, 2020 A System Vendor Perspective Beyond 400 Gb/s Ethernet Study Group Looking Beyond 400G … Many thanks to Cisco Engineers and Insightful Customers … www.linkedin.com/in/rakesh-chopra/ @Rakesh_Chopra1 [email protected]

Upload: others

Post on 28-Apr-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Rakesh ChopraCisco FellowFebruary 8, 2020

A System Vendor PerspectiveBeyond 400 Gb/s Ethernet Study Group

Looking Beyond 400G

… Many thanks to Cisco Engineers and Insightful Customers …

www.linkedin.com/in/rakesh-chopra/

@Rakesh_Chopra1

[email protected]

Page 2: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

System Architectures

RetimerRetimer Port

Port

LCSwitch

FESwitch

Retimer Port

Port

StandaloneSwitch

Mux Port

PortMux

Fixe

d*C

entra

lized

Dis

tribu

ted

LR VSR

* - Can be interconnected to create a disaggregated chassis

2

StandaloneSwitch

Optional Redundancy

Page 3: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

640G 1.28T 3.2T 6.4T12.8T

25.6T

51.2T

x64

102.4T?

12 Years7 Switch Generations (80x)4 SerDes Speeds (10x)4 Switch Radix Increases (8x)

10Gx128

28Gx256x128

10G 28Gx256

56G 56Gx256

QSFPDD56QSFP-DD56

2010 2012 2014 2016 2018 2020 2022 202?

SerDes Speed# SerDes

Optics

Switch Silicon BW

Relentless Advancement – Switch Silicon BandwidthRepresents a combination of multiple chip families and architectures to provide historical context and future projections

QSFP+QSFP+ QSFP28QSFP28112G

x512

QSFPDD800

112Gx512

QSFP-DD800?

3

Page 4: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

640G 1.28T 3.2T 6.4T 12.8T 25.6T

51.2T

2010 2012 2014 2016 2018 2020 2022

Relentless Advancement – 80x BW over 12 YearsRepresents a combination of multiple chip families and architectures to provide historical context and future projectionsFixed Box Power BreakdownRetimer Power and other system components not included

25xASIC SerDes

Power

8xASIC Core

Power

11xSystem Fan

Power

51.2T

26xOptics Power

Wat

ts

Increase vs. 2010

22xTotal Power

4

Page 5: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

The Multiplication Effect of a Watt

Point of Load(POL)

Conn PSU Conn PDUAutomatic Transfer

Switch (ATS)

Distribution Panel

Facility

1W

1.15W

1.3W 1.26W

2.73W

PUE=2.1

PUE=1.187%

77%

FansConn

Heat

Heat

Heat

Heat

Heat

Heat

Heat

Heat

Heat

System

Switch

Airflow

Switch

Cooling Systems

Page 6: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

“Power is Everything”*John Aaron- Apollo 13 Flight Controller

Apollo 13 – Universal Pictures

* - Thanks to Kraig Owen for the reference

Power is THE Problem to Solve

Limits what we can build

Limits what can be deployed

Limits what our planet can sustain

6

Adopt a power first design and deployment methodology

Page 7: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Co-packaged Optics Is InevitablePower savings drives requirement

Architectural Approach to Power Optimization

Must minimize SerDes power

SerDes power increases with distance

7

Retimers maybe needed adding additional power

No Retimers needed

Page 8: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Cop

per

Opt

ics

Opt

ics

Future innovations will only be possible with

silicon and opticalintegration

SiSilicon

OpOptics

Data Rate

Long Haul (1000km+)

Metro (80km)

Within Building (2km)

Within Big Floor (500m)

With Small Floor (100m)

Within Rack (2m-3m)

Within System (1m)

From Package (<1m)

TodayHigher data rates and distance drive the move from copper to optics

Co-packaged Optics Is Inevitableand viable in the 51.2T generation

8 51.2T

Page 9: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Wider Radix

More LayersGraph concept leveraged from R. Nagarajan, Ilya Lyubomirsky, “Next-Gen Data Center Interconnects: The Race to 800G”

Adjusted to hold servers per rack constant

Doubling Radix adds 2x-16x more servers

Adding a layer adds 2x-256x more servers

High

er R

even

ue P

er F

acilit

yBuilding Your Data CenterImpact of Switch Radix

9

Scale Out

Scale Up

Page 10: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Doubling Radix adds 2x-16x more servers depending on the layers in the network

Impact of Switch RadixCase against increasing Switch Radix

Increasing radix adds cabling complexity, cost and weight

Increasing radix decreases link (mac) speed for the same switch bandwidth As the flow speed approaches the link speed link utilization decreases

Complicated Cabling

#

50G

#

Poor link utilizationDoes the arrival of nx400G AI endpoints require 800GE/1.6TE network for high performance?

Depends on the applications!?

12.8T

x32 400GE

x64 200GE

x128 100GE

x256 50GEECMP Hash{source_ip, source_port, dest_ip, dest_port, protocol}

Page 11: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

LAG vs. ECMPThe Basic Topology

#

Lag Bundle

Create a “fat pipe” between two boxes

Service Provider

#

ECMP

Create multipe “equal” pipes between many boxes

Data Center

Both use the same hash function and expose link utilization issues

Note : Service Providers use ECMP as well but not in an equivalent fundamental way

Page 12: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

LAG vs. ECMPThe advantages of higher speed MACs aren’t as clear as they used to be

Decreasing Revenue

No downside to replacing a LAG bundle with a higher speed Ethernet MAC

Downside for higher speed MAC with ECMP

For same speed silicon:• As you increase MAC speed• Decrease your radix• Decreases your switches per DC• Lower revenue potential

Decrease Radix

#

Increase MAC speed

Create a “fat pipe” between two boxeswith no hash inefficiencies

Service Provider

# #

Data Center

#

Increase MAC speed

Shrink the number of boxes we can connect to

Page 13: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Adding a layer adds 2x-256x more servers depending on the switch radix

Impact of Switch RadixCase against adding Network Layers

Increasing layers adds network cost and powermore switches and optics per server

Assuming no extra components needed to scale out (reverse gearboxes, etc….)Ignoring ECMP hash efficiency impact for “goodput” of the network

x16

x3212.8T12.8T

12.8T 12.8T

768 Servers48 Switches

x16

x32

2-Tier 32x400G Network

ServersSwitch16

x128

x25612.8T12.8T

12.8T 12.8T

6,144 Servers384 Switches

x128

x256

2-Tier 256x50G Network

ServersSwitch16

x16

x3212.8T12.8T

12.8T 12.8T

x16

x32

x16

x3212.8T12.8T

12.8T 12.8T

x16

x32x16

x51212.8T12.8T

x16

x512

3-Tier 32x400G Network

12,288 Servers1,280 Switches

ServersSwitch9.6

Wider Radix

More

Layers

Scale Out

Scale Up

Page 14: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Wider Radix

More LayersGraph concept leveraged from R. Nagarajan, Ilya Lyubomirsky, “Next-Gen Data Center Interconnects: The Race to 800G”

Adjusted to hold servers per rack constant

Doubling Radix adds 2x-16x more servers

Adding a layer adds 2x-256x more servers

High

er R

even

ue P

er F

acilit

yBuilding Your Data CenterImpact of Switch Radix

14

Power Efficiency

Link Efficiency

Scale Out

Scale Up

There is no free lunch, every engineering choice has trade-offs

Balancing act between radix, MAC speed, and layers in the network…

Page 15: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Graph concept leveraged from R. Nagarajan, Ilya Lyubomirsky, “Next-Gen Data Center Interconnects: The Race to 800G”Adjusted to hold servers per rack constant

High

er R

even

ue P

er F

acilit

yBuilding Your Data CenterScale-Out vs. Scale-Up– A Balancing Act

15

Switch BW SerDes Radix x32

12.8T 56G 400GE

25.6T 112G 800GE

51.2T 112G 1.6TE

102.4T? 212G? 3.2TE x16

x16

x8

x8

• x32 and x128 radix are prominent today• Ethernet rates are lagging for x32 radix• Will x32 networks migrate to x64?

Page 16: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Graph concept leveraged from R. Nagarajan, Ilya Lyubomirsky, “Next-Gen Data Center Interconnects: The Race to 800G”Adjusted to hold servers per rack constant

High

er R

even

ue P

er F

acilit

yBuilding Your Data CenterScale-Out vs. Scale-Up– A Balancing Act

16

Switch BW SerDes Radix x32 Radix x64

12.8T 56G 400GE 200GE

25.6T 112G 800GE 400GE

51.2T 112G 1.6TE 800GE

102.4T? 212G? 3.2TE 1.6TE

Improved Power EfficiencyImproved Link Utilization

Wider Radix - Scale Out More Layers – Scale Up

x16

x8

x4

x16

x8

x8

x4

x8

• x32 and x128 radix are prominent today• Ethernet rates are lagging for x32 radix• Will x32 networks migrate to x64?

• Potential need for 800GE with 8x112G Lanes• 51.2T• 64 x QSFP-DD800 (carrying 1x800GE) – 2RU

• Potential need for 1.6TE with 8x224G Lanes• 102.4T• 64 x QSFP-DD1600 (Carrying 1x1.6TE) – 2RU

Radi

x 64

Page 17: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Graph concept leveraged from R. Nagarajan, Ilya Lyubomirsky, “Next-Gen Data Center Interconnects: The Race to 800G”Adjusted to hold servers per rack constant

High

er R

even

ue P

er F

acilit

yBuilding Your Data CenterScale-Out vs. Scale-Up– A Balancing Act

Switch BW SerDes Radix x32 Radix x64 Radix x128

12.8T 56G 400GE 200GE 100GE

25.6T 112G 800GE 400GE 200GE

51.2T 112G 1.6TE 800GE 400GE

102.4T? 212G? 3.2TE 1.6TE 800GE

Improved Power EfficiencyImproved Link Utilization

Wider Radix - Scale Out More Layers – Scale Up

• x32 and x128 radix are prominent today• Ethernet rates are lagging for x32 radix• Will x32 networks migrate to x64?

• Potential need for 800GE with 8x112G Lanes• 51.2T• 64 x QSFP-DD800 (carrying 1x800GE) – 2RU

• Potential need for 1.6TE with 8x224G Lanes• 102.4T• 64 x QSFP-DD1600 (Carrying 1x1.6TE) – 2RU

• Clear need for 800GE with 4x224G Lanes• 102.4T with 128-Radix• 128 x QSFP-800 (carrying 1x800GE) – 4RU

or

• 64 x QSFP-DD1600 (carrying 2x800GE)-2RU

x16

x8

x4

x16

x8

x8

x4

x8 x4

x4

x2

x2

Radi

x 64

Radi

x 12

8

17

Page 18: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

RetimerRetimer Optics

Optics

LCSwitch

FESwitch

Retimer Optics

Optics

StandaloneSwitch

Mux Optics

OpticsMux

Fixe

dC

entra

lized

Dis

tribu

ted

8x212G VSR - No Re-timers

X

X

VSR - Optimize for Optics112G last major passive copper generation Active Copper

212G LR Required

AEC

2x800GE1x1.6TE

2x800GE1x1.6TE

2x800GE1x1.6TE

2x800GE1x1.6TE

2x800GE1x1.6TE

2x800GE1x1.6TE

212G Generation Traditional System ArchitecturesViable with Traditional System Designs

18

Optional Redundancy

StandaloneSwitch

212G MR-LR Required

Page 19: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

212G Generation CPO System ArchitecturesPower Optimized ; Introduced first on Client-Side Optics

RetimerLC

SwitchFE

Switch

StandaloneSwitch

Mux

Mux

Fixe

dC

entra

lized

Dis

tribu

ted

212G LR Required

212G MR-LR Required

Optics

Optics

Optics

Optics

Optics

Optics

212G/2x112G

19

Optional Redundancy

StandaloneSwitch

Page 20: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

LCSwitch

FESwitch

StandaloneSwitch

MuxStandalone

Switch

Mux

Fixe

dC

entra

lized

Dis

tribu

ted

Optics

Optics

Optics

Optics

Optics

Optics

Optics

Optics

Optics

Optics

RetimerXOptics

Optics

Optics

Optics

Future CPO ArchitecturesEventually Optics replace high speed data interconnect

20

Optional Redundancy

Page 21: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Moral Imperative

Business Imperative

Technology Imperative

Perfect Catalyst to Innovate

Call to ActionPower Driven Architecture

3 Main System ArchitecturesFixed, Centralized, Modular

BW Doubling every 2 YearsNot Slowing Down, Power Too High

Co-package Optics are Coming51.2T Generation

21

Page 22: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Define 212G Electrical • XSR, VSR as first priority to optimize power efficiency

• Define VSR standard to ensure retimer-less designs• Define MR, LR as second priority• Focus on 212G* instead of 224G to optimize for Ethernet rates

Define 800GE MAC• Over 212G to enable 102.4T with radix 128 (128x800GE)• Over 112G to enable 51.2T with radix 64 (64x800GE)

Define 1.6TE MAC• Only if there is a cost effective PMD solution• Over 212G to enable 102.4T with radix 64 (64x1.6TE)

Next Steps

22

1

2

3

Incr

easi

ng P

riorit

yLooking for study group to define a cost and power effective solution to these problems

* - Final rate depends on future work

Page 23: Looking Beyond 400G - IEEE-SA€¦ · Rakesh Chopra. Cisco Fellow. February 8, 2020. A System Vendor Perspective. Beyond 400 Gb/s Ethernet Study Group. Looking Beyond 400G … Many

Thank You!