
Energy Efficient Multi-Gb/s I/O: Circuit and System Design Techniques

April 22, 2011

WMED-2011

Bryan Casper, Intel Circuit Research Labs

Page 2

Agenda

• Introduction

• Impact of process scaling

• Active power optimization

– System

– Circuit

• Power management

• Low power silver bullets

• Putting it all together

Page 3

High-End Server

• Length < 20"

• Assumptions: CPU TDP = 135W, I/O eff = 10pJ/bit 1-side

[Figure: Fraction of CPU Power (%), 0 to 100, and BW Need* (GB/s), 0 to 1000, for 2010, 2015* and 2020*]

Future* BW Breakdown

• Memory 35%

• Peripheral 25%

• Coherent 40%

2010 estimates based on Intel® Xeon® Processor X7560

*2015-2020 BW need estimates are solely the opinion of the author and do not necessarily represent the position of Intel Corp.

Page 4

Trends in I/O Power vs. Year*

[Figure: Power Eff. (pJ/bit or mW/Gb/s), log scale from 1 to 100, vs. year (2002 to 2008); trend line: -20%/year]

Power efficiency improving

• Driven by circuit, channel and process improvements

• …but not keeping pace with aggregate BW needs

– e.g. 1TB/s x 10pJ/bit = 80W!

*Adapted from Poulton, et al., JSSC Dec. 2007
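The 80W figure follows from multiplying aggregate bit rate by energy per bit. A minimal sketch of that arithmetic (the function name is illustrative, not from the slides):

```python
def io_power_watts(bandwidth_bytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Link power in watts for an aggregate bandwidth and an energy per bit."""
    bits_per_s = bandwidth_bytes_per_s * 8           # bytes/s -> bits/s
    return bits_per_s * energy_pj_per_bit * 1e-12    # pJ -> J


if __name__ == "__main__":
    # 1 TB/s at 10 pJ/bit, the case quoted on the slide
    print(f"{io_power_watts(1e12, 10.0):.0f} W")
```

Because power is linear in both terms, halving the energy per bit or halving the bandwidth each halves the I/O power.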

Page 5

Impact of 1TB/s CPU*

• ~½ CPU Power Budget

• $800 Electricity

• For the environmentally minded: 8000kg of CO2

*Assuming: 1TB/s @ 10pJ/bit, 5 year lifetime, 10¢/kWh, 50% conversion loss, fossil fuel generated electricity
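A rough sketch of how the lifetime electricity estimate can be reproduced from the stated assumptions. It assumes that "50% conversion loss" means the wall power is twice the 80W delivered to the I/O, and that the link runs continuously; both are interpretations, not statements from the slide. Under those assumptions the result lands near $700, the same range as the slide's rounded $800.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_energy_kwh(io_power_w: float, years: float, conversion_loss: float) -> float:
    """Wall-plug energy over the lifetime, assuming continuous operation."""
    wall_power_w = io_power_w / (1.0 - conversion_loss)   # assumed meaning of "conversion loss"
    return wall_power_w * HOURS_PER_YEAR * years / 1000.0


def electricity_cost_usd(energy_kwh: float, usd_per_kwh: float) -> float:
    return energy_kwh * usd_per_kwh


energy = lifetime_energy_kwh(io_power_w=80.0, years=5, conversion_loss=0.5)
cost = electricity_cost_usd(energy, usd_per_kwh=0.10)
# Emission factor implied by the slide's 8000kg figure (not an independent value)
implied_kg_co2_per_kwh = 8000.0 / energy

if __name__ == "__main__":
    print(f"{energy:.0f} kWh, ${cost:.0f}, {implied_kg_co2_per_kwh:.2f} kg CO2/kWh implied")
```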

Page 6

I/O Energy Efficiency Definition

• mW/Gb/s = pJ/bit

• Total physical layer energy required to move data

– Includes amortized global power as well

• Usually 2-sided metric (TX + RX)

[Figure: link block diagram with a callout marking the blocks included in the PHY energy efficiency calculation. Blocks shown: Tx, Rx, PLL, DLL, Bias, BGR, EQ, CDR, SerDes, Synchronizers, Training logic, PHY layer clock distribution, CRC, Framing, Transport]
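As an illustration of the definition above, the sketch below computes a per-lane energy efficiency that includes an amortized share of global (shared) power. The function name and the example numbers are hypothetical; only the identity mW/(Gb/s) = pJ/bit comes from the slide.

```python
def energy_per_bit_pj(lane_power_mw: float, shared_power_mw: float,
                      lanes: int, rate_gbps: float) -> float:
    """Energy efficiency in pJ/bit, which is numerically equal to mW/(Gb/s)."""
    amortized_mw = shared_power_mw / lanes        # global power spread over all lanes
    return (lane_power_mw + amortized_mw) / rate_gbps


if __name__ == "__main__":
    # Hypothetical link: 12 mW per lane, 30 mW of shared PLL/bias, 10 lanes at 10 Gb/s
    print(energy_per_bit_pj(12.0, 30.0, lanes=10, rate_gbps=10.0), "pJ/bit")
```

For a 2-sided metric the per-lane power would be the sum of the TX and RX sides.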


Page 8

Past technology trends scaled efficiency proportional to feature size

Hatamkhani, et al., DAC 2006

Page 9

Process vs. Logic Scaling Scenarios*

[Figure: Scaling Factor Per Generation, 0 to 1, for Past, Present and Future; series: Logic Energy, Supply Voltage, Capacitance]

*ITRS-like trends assuming bulk planar CMOS. Conceptual scenarios with large error bar. Detail
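The link between the three plotted series can be sketched with the standard switching-energy relation E = C·V², so the per-generation energy factor is the capacitance factor times the square of the supply factor. The numeric factors below are hypothetical placeholders, not values read from the figure.

```python
def logic_energy_scale(cap_scale: float, vdd_scale: float) -> float:
    """Per-generation switching-energy factor, assuming E is proportional to C * V^2."""
    return cap_scale * vdd_scale ** 2


if __name__ == "__main__":
    # Hypothetical scenarios: supply voltage scaling with capacitance vs. a flat supply
    print("C and V both scale by 0.7:", logic_energy_scale(0.7, 0.7))
    print("C scales by 0.7, V flat:  ", logic_energy_scale(0.7, 1.0))
```

When supply voltage stops scaling, the energy factor falls back to the capacitance factor alone, which is the qualitative trend the scenarios describe.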

Page 10

Example Research I/O Energy Scaling

Process scaling estimates vs. circuit type (scaling factor per generation, Research I/O* power breakdown value)

• Logic: 0.75x, 37

• Noise limited: 0.95x, 37.3

• Sense amp: 0.85x, 19.2

• Swing limited: 1x, 6.5

Aggregate I/O scaling factor per generation: ~0.9x

Variation compensation overhead could cause factor to be >0.9x

[Figure: Research I/O* Power Breakdown by circuit type: Sense amp, Swing limited, Logic, Noise limited]

*Unpublished Research I/O technology based on 32nm
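One way to arrive at an aggregate factor near 0.9x is a power-weighted average of the per-circuit scaling factors. The sketch below assumes the second number listed for each circuit type is its percentage share of the power breakdown (the four values sum to 100); that reading is an inference, not stated on the slide.

```python
# circuit type -> (scaling factor per generation, assumed share of I/O power in %)
CIRCUITS = {
    "Logic": (0.75, 37.0),
    "Noise limited": (0.95, 37.3),
    "Sense amp": (0.85, 19.2),
    "Swing limited": (1.00, 6.5),
}


def aggregate_scaling(circuits: dict) -> float:
    """Power-weighted average of the per-circuit scaling factors."""
    total_share = sum(share for _, share in circuits.values())
    weighted = sum(factor * share for factor, share in circuits.values())
    return weighted / total_share


if __name__ == "__main__":
    print(f"aggregate scaling factor: {aggregate_scaling(CIRCUITS):.2f}x")
```

Under this reading the weighted average comes out at about 0.86x, in line with the ~0.9x quoted before any variation compensation overhead.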

Page 11

Trends in I/O Power vs. Year*

[Figure: Energy Efficiency (pJ/bit), log scale from 1 to 100, vs. year (2002 to 2008); trend line: -20%/year]

Trend may change if we depend solely on process scaling

Architectural enhancements are much more effective than process scaling


Page 13

Optimal Energy-Performance Design Space

[Figure: Power vs. Performance, power axis from 1x to 1000x, with a 10x annotation]

Adapted from Mark Horowitz IEDM 2005

Page 14

Adder Design Space

[Figure: Power vs. Performance, power axis from 1x to 1000x, with a 10x annotation; adder designs plotted: Stat. Carr. Chain, Stat. Carr. Sel, Stat. KS, Stat. LF, Stat. BK, Stat. 84421, Dom. BK, Dom. KS, Dom. LF]

Adapted from Mark Horowitz IEDM 2005

Page 15

Adder Design Space

[Figure: Power vs. Performance, power axis from 1x to 1000x, with a 10x annotation; adder designs plotted: Stat. Carr. Chain, Stat. Carr. Sel, Stat. KS, Stat. LF, Stat. BK, Stat. 84421, Dom. BK, Dom. KS, Dom. LF]

Adapted from Mark Horowitz IEDM 2005

I/O tradeoff is even more nonlinear due to channel rolloff

Page 16

I/O Design Space Tradeoffs

Steep tradeoff caused by:

1. Channel BW limit

2. Process BW limit

3. Circuit architecture complexity

[Figure: Power vs. Performance curve with a callout: Key to low power links is operating on this portion of design space]

Page 17

Power’s Deadly Combination

Stingy System Architect

• Not willing to limit legacy channel length or topologies

• Doesn’t want to erode profit margins by adopting higher cost interconnect

• Perceives alternate topologies as unproven & risky

• Annoyed that Moore’s law doesn’t apply to channels

Macho Link Designer

• Knows Shannon’s Capacity

• Takes on challenge to apply advanced communication techniques to high-speed links

– e.g. DSL, Ethernet

• Thinks Moore’s law will eventually resolve power & complexity issues

Page 18

Energy Efficiency Correlation to Loss

[Figure: Energy Efficiency (pJ/bit), log scale from 1 to 100, vs. Channel Loss @ Symbol rate (dB), 0 to 40]

Page 19

Legacy Backplane Channel

[Figure: Loss, 0dB to -60dB, vs. frequency, 0GHz to 15GHz]

Page 20

Backplane Data Rates

[Figure: Data Rate (gridlines at 6Gb/s, 12Gb/s and 18Gb/s) vs. Increasing Equalizer Complexity (nonlinear scale); each x-axis point is labeled with its number of TX FIR taps and DFE taps, with DFE tap counts up to 128]

Page 21

Channel Power Wall

[Figure: Energy Efficiency (pJ/bit), 0 to 30, vs. Max Rate (Gb/s), 5 to 15; arrow: increasing equalization complexity]

Overextending channel leads to nonlinear EQ power vs. performance tradeoff

Assumptions: 5pJ/bit baseline, ½pJ/bit/DFE tap, ¼pJ/bit/TX tap
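The energy assumptions quoted for this plot form a simple linear model in the tap counts. A minimal sketch, with an illustrative function name and an illustrative sweep of tap counts:

```python
def link_energy_pj_per_bit(dfe_taps: int, tx_taps: int,
                           baseline: float = 5.0,
                           per_dfe_tap: float = 0.5,
                           per_tx_tap: float = 0.25) -> float:
    """Energy model from the slide: baseline plus a fixed cost per equalizer tap."""
    return baseline + per_dfe_tap * dfe_taps + per_tx_tap * tx_taps


if __name__ == "__main__":
    # Illustrative sweep: 4 TX FIR taps with a doubling number of DFE taps
    for dfe in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"{dfe:3d} DFE taps: {link_energy_pj_per_bit(dfe, 4):5.1f} pJ/bit")
```

Energy grows linearly with tap count while the achievable data rate flattens, which is what produces the wall in the plot.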

Page 22

I/O Challenges: Power

• Legacy backplane w/ 2 connectors & sockets, ½m FR4

[Figure: Insertion Loss, 0dB to -40dB, vs. frequency, 0GHz to 15GHz, for the legacy backplane]

[Figure: Equalizer Power (pJ/bit or mW/Gb/s), log scale, vs. Performance (Gb/s), log scale from 1 to 100; curve: Legacy BP]

Power assumptions in slide backup

Page 23

I/O Challenges: Power

• Legacy backplane w/ 2 connectors & sockets, ½m FR4

• ½m cabled topology with top-pkg connectors

[Figure: Insertion Loss, 0dB to -40dB, vs. frequency, 0GHz to 15GHz, for the cable and the legacy backplane]

[Figure: Equalizer Power (pJ/bit or mW/Gb/s), log scale, vs. Performance (Gb/s), log scale from 1 to 100; curves: Cable, Legacy BP]

[Figure: cabled topology showing Cable, CPU Socket and LGA Connector]

Power assumptions in slide backup

Page 24

Loss/rate/power estimates

[Figure: |S21|, 0dB to -20dB, vs. frequency, 0GHz to X GHz]

Assuming:

• Xtalk not primary limiter

• TX+RX jitter ~1/2UI

• RX noise 1mVrms

• TX swing ~1Vdiffp-p

• Cabled link w/ connectors

– Channel "well behaved"

Equalization Complexity | Est. data rate | Normalized power (rough guess)
None | ~0.8*X Gb/s | 1
Low power | ~2.0*X Gb/s | ~1
Moderate (3 tap LE, 4 tap DFE) | ~2.4*X Gb/s | ~2
Complex (>50 tap LE+DFE) | ~3.6*X Gb/s | ~10-100
Complex EQ+PAM+FEC/coding | ~4.4*X Gb/s | ~100-1000
Shannon’s capacity | ~8-10*X Gb/s | n/a
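The estimates above scale linearly with X, the frequency marked on the |S21| plot. The sketch below tabulates them for a hypothetical X of 5GHz; the multipliers and the power ranges are copied from the table, while the function name and the choice of X are illustrative.

```python
# equalization complexity -> (data-rate multiplier of X, normalized power range)
ESTIMATES = {
    "None": (0.8, (1, 1)),
    "Low power": (2.0, (1, 1)),
    "Moderate (3 tap LE, 4 tap DFE)": (2.4, (2, 2)),
    "Complex (>50 tap LE+DFE)": (3.6, (10, 100)),
    "Complex EQ+PAM+FEC/coding": (4.4, (100, 1000)),
}


def estimated_rate_gbps(x_ghz: float, equalization: str) -> float:
    """Rough data-rate estimate for a channel characterized by X GHz."""
    multiplier, _ = ESTIMATES[equalization]
    return multiplier * x_ghz


if __name__ == "__main__":
    x_ghz = 5.0  # hypothetical value of X
    for name, (_, (p_lo, p_hi)) in ESTIMATES.items():
        rate = estimated_rate_gbps(x_ghz, name)
        print(f"{name:32s} ~{rate:4.1f} Gb/s, power ~{p_lo}-{p_hi}x")
```

Going from low power to complex equalization raises the rate by less than 2x while the normalized power guess rises by one to two orders of magnitude.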

Page 25

Common Traits of Low Power Links

[Figure: Energy Efficiency (pJ/bit), log scale from 1 to 100, vs. Channel Loss @ Symbol rate (dB), 0 to 40, with the power optimized links highlighted]

Power Optimized Links

• Simple equalization

• Low TX swing

• Sensitive RX sampler

• Low-power clocking

Page 26

How to scale rate or distance and maintain energy efficiency

• Path to scaling performance: Refined channels

– e.g. Top-package connector based cabled links

½ Meter Channel Examples (based on Intel Labs Measurements)

• PCIe (2 connectors): 20dB @ 4GHz

• LCP Flex*: 20dB @ 15GHz

• Twinax 36 AWG*: 20dB @ 30GHz

*No connector, pkg or pad cap

[Figure: cabled topology showing Cable, CPU Socket and LGA Connector]


Page 28

Low-power Link Circuits Top Ten

1. Modest data rates

2. Forwarded clocking

3. Global circuit sharing

4. Low power clock distribution

5. Resonantly tuned clocking

6. Low swing TX

7. Sensitive RX

8. Simple equalization

9. Calibration and tuning

10. System modeling

• Not a comprehensive list

– More like a sampling of known power reduction methods

• Few low power links incorporate all of these techniques

– Most incorporate at least some

• Not intended to be a detailed overview of each method

Page 29

Top Ten #1: Modest data rates

Steep tradeoff caused by:

1. Channel BW limit

2. Process BW limit

3. Circuit architecture complexity

[Figure: Power vs. Performance curve with a callout: Key to low power links is operating on this portion of design space]

Page 30

Clock Buffer Power/Performance Example

[Figure: Normalized Energy Efficiency (energy/bit), 0 to 16, vs. Operating Frequency (GHz), 0 to 10; points along the curve are annotated with Fanout = 16, 8, 4 and 2]

Fanout set to meet per stage BW requirement

Stay off process BW cliff

Page 31

Performance Impact on Circuit Architecture: Loop-unrolled DFE

Conventional DFE

• Speedpath limits frequency

Loop-unrolled DFE

• Redundancy to alleviate speedpath

• Increases power and complexity

– Proportional to C1*2^N+C2

• C1=comparator + mux

• C2=baseline power

• N=number of taps unrolled

[Figure: conventional DFE and loop-unrolled DFE block diagrams with tap coefficients c1 to c4]
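A small sketch of the loop-unrolling cost relation, reading the exponent as 2 to the power N (one comparator and mux path per combination of the N unrolled decisions). The values of C1 and C2 are hypothetical units chosen only to show the growth.

```python
def unrolled_dfe_power(n_unrolled: int, c1: float, c2: float) -> float:
    """Relative power of a loop-unrolled DFE: C1 * 2^N + C2."""
    return c1 * 2 ** n_unrolled + c2


if __name__ == "__main__":
    # Hypothetical units: comparator + mux costs 1, baseline power is 5
    for n in range(0, 5):
        print(f"N={n}: relative power {unrolled_dfe_power(n, c1=1.0, c2=5.0):.0f}")
```

Each additional unrolled tap doubles the comparator and mux term, so unrolling more than the first one or two taps quickly dominates the baseline.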

Page 32

Performance Impact on Circuit Architecture: Multi-phase Clocking

• Interleaving of receiver alleviates need for high-frequency latches and clocks

• Requires greater clock complexity and calibration

– Multiphase clock generators

– Sophisticated phase training

[Figure: half rate clock and quarter rate clock receiver arrangements, with DCC, multi-phase clk generator, phase calibration and interleaved RX blocks]

Page 33

Top Ten #2: Forwarded clocking

Forwarded clock power benefits

• No need for high clock recovery BW

• Edge/test samplers optional

– Clock recovery/test can be time multiplexed with data samplers

• Fewer, simpler phase rotators

– Greater tolerance for INL & jitter

– 1 rotator can cover data, edge & test in a time-multiplexed fashion

[Figure: embedded clocking vs. forwarded clocking block diagrams. Blocks shown: Data TX, Fwd CK TX, Clock Generator, Clock Recovery, Eye Diagram Generator, dynamic phase rotators for the Data, Edge and Test samplers, and a static phase rotator]

Page 34

Top Ten #3: Global Circuit Sharing

• Parallel link implementations have ample opportunity to share common functionality

[Figure: parallel link block diagram (Data TX, Data, Clock Generator, Clock Recovery, BGR & Bias, Adaptation Logic, Test Logic) with a callout: Potential to be shared across parallel link]

Page 35

Co-optimization of Channel and Circuits Enables Widespread Power Amortization

• Matched interconnect enables clock recovery sharing

– Common deskew across 10 bits

• Test, bias, etc. are shared as well

[Figure: dieA and dieB on packages with sockets on a PCB, connected by flex/cable; 10 bit I/O circuitry with tx_lane[0] to tx_lane[9], tx_fclk and C4 bumps, sharing the Cal. FSM, Bias, Scan, Test, Term and common deskew blocks]

F. O'Mahony, et al., "A 47×10Gb/s 1.4mW/(Gb/s) Parallel Interface in 45nm CMOS," ISSCC 2010

Page 36

Top Ten #4: Low power clock distribution

• Reduce distribution distance*

– Compact parallel link floorplan

• Repeaterless distribution**

**B. Casper, F. O'Mahony, "Clocking Analysis, Implementation and Measurement Techniques for High-Speed Data Links—A Tutorial," TCAS1, Jan. 2009

*F. O'Mahony, et al., "A 47×10Gb/s 1.4mW/(Gb/s) Parallel Interface in 45nm CMOS," ISSCC 2010

[Figure: compact parallel link floorplan, 1302µm by 2864µm, showing Clk Gen and active I/O circuitry]

Page 37

Forwarded Clock Repeater-less Distribution

[Figure: forwarded clock TX driving off-chip interconnect, then on-chip T-lines biased by cmbias]

Repeater-less distribution + forwarded clock combination has potential to eliminate buffers and save power

B. Casper, et al., "A 20Gb/s forwarded clock transceiver in 90nm CMOS," ISSCC 2006

Page 38

Top Ten #5: Resonantly tuned clocking

• Resonant clocking suppresses jitter outside the fundamental clock frequency

• Lower power for a given jitter budget

• Limits clock frequency operating points

• Frequently used for resonators

– LC-VCO

• Also used for distribution

Page 39

Resonant Clocking Example

Enabled 3x-5x lower clocking power than conventional distribution

J. Poulton, et al., "A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS," JSSC, Dec. 2007.

Page 40

Top Ten #6,7: Co-designed TX & RX

• TX output stage & RX input dissipate a large portion of link power

• Co-optimize to minimize power and meet BER requirements

Page 41

Swing vs. RX Sensitivity

[Figure: Required RX Input Noise (mVrms), 0 to 3, vs. Output Swing, 0 to 0.4, for a TX and RX connected through the channel]

Assumptions:

• RX noise variance proportional to RX power

– 5mW → 1mVrms

• Normally distributed ISI, sigma=0.001*swing

• 1E-12 BER target

• Voltage-mode TX w/ linear reg.

• Channel loss = -20dB
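A rough sketch of how a required RX noise level can be derived from the listed assumptions. It treats the output swing as a peak-to-peak value at the TX, attenuates it by 20dB, and requires the received amplitude to sit Q standard deviations above the combined Gaussian noise and ISI, where Q is the normal quantile for the BER target. The function name and these modeling choices are assumptions made for illustration; the slide does not spell out its exact model.

```python
from math import sqrt
from statistics import NormalDist


def required_rx_noise_vrms(tx_swing_vpp: float, loss_db: float = 20.0,
                           ber: float = 1e-12,
                           isi_sigma_ratio: float = 0.001) -> float:
    """Largest RX input-referred noise (V rms) that still meets the BER target."""
    q = -NormalDist().inv_cdf(ber)                 # about 7 for BER = 1e-12
    rx_amplitude = 0.5 * tx_swing_vpp * 10 ** (-loss_db / 20.0)
    total_sigma = rx_amplitude / q                 # allowed noise + ISI, rms
    isi_sigma = isi_sigma_ratio * tx_swing_vpp     # ISI modeled as Gaussian
    noise_sq = total_sigma ** 2 - isi_sigma ** 2
    if noise_sq <= 0.0:
        raise ValueError("ISI alone exceeds the allowed total sigma")
    return sqrt(noise_sq)


if __name__ == "__main__":
    for swing in (0.1, 0.16, 0.2, 0.4):
        print(f"{swing:.2f} Vpp -> {required_rx_noise_vrms(swing) * 1e3:.2f} mVrms")
```

Under these assumptions a 160mVpp swing needs roughly 1mVrms of input-referred noise, consistent with the optimum quoted on the following slide.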

Page 42

Simplistic Example: Swing vs. Efficiency

• Optimal energy at ~160mVpp

– Requires ~1mVrms input referred RX noise

[Figure: Energy Efficiency (pJ/bit), log scale from 0.1 to 10, vs. Output Swing, log scale from 0.01 to 1; curves: Total, TX, RX; the optimal energy point is marked on the Total curve]

Page 43

Low Swing Tradeoffs: 19" Cabled Link Maximum Rates

• Lowest power equalization points hardly suffer due to low swings.

[Figure: Max Data Rate (BER=10^-12), gridlines at 15Gb/s, 30Gb/s and 45Gb/s, vs. equalizer configuration labeled by number of TX FIR taps and DFE taps (DFE tap counts up to 128); curves: 19" Flex @ ±500mV swing and 19" Flex @ ±100mV swing]

Page 44

Top Ten #6: Low-Swing TX

Page 45

Low-Power TX Drivers: CM vs. VM

Current Mode (CM), Single-ended Term (R=Zo at each output, line impedance Zo)

• Vd,1 = (I/2)R

• Vd,0 = -(I/2)R

• Vd,pp = IR

• I = (Vd,pp / R)

Source: Ganesh Balamurugan

Page 46

Low-Power TX Drivers: CM vs. VM

Current Mode (CM), Single-ended Term (R=Zo at each output, line impedance Zo)

• Vd,1 = (I/2)R

• Vd,0 = -(I/2)R

• Vd,pp = IR

• I = (Vd,pp / R)

Voltage Mode (VM), Single-ended Term (supply Vs, R=Zo at each output, line impedance Zo)

• Vd,1 = (Vs / 2)

• Vd,0 = -(Vs / 2)

• Vd,pp = Vs

• I = (Vd,pp / 2R) = (Vs / 2R)

2X power reduction

Source: Ganesh Balamurugan

Page 47

Low-Power TX Drivers: CM vs. VM

Current Mode (CM), Single-ended Term (R=Zo at each output, line impedance Zo)

• Vd,1 = (I/2)R

• Vd,0 = -(I/2)R

• Vd,pp = IR

• I = (Vd,pp / R)

Voltage Mode (VM), Differential Term (supply Vs, source R=Zo, line impedance Zo, differential termination 2R=2Zo)

• Vd,1 = (Vs / 2)

• Vd,0 = -(Vs / 2)

• Vd,pp = Vs

• I = (Vd,pp / 4R) = (Vs / 4R)

4X power reduction

Source: Ganesh Balamurugan
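The three current expressions can be compared directly for a fixed differential swing. The sketch below uses a hypothetical 200mVpp swing and 50 ohm termination; the function name is illustrative, and equal supply voltage is assumed so that the current ratio equals the power ratio.

```python
def driver_currents_ma(vd_pp: float, r_ohm: float = 50.0) -> dict:
    """Supply current (mA) needed for a differential swing Vd,pp, per driver style."""
    return {
        "CM, single-ended term": 1e3 * vd_pp / r_ohm,
        "VM, single-ended term": 1e3 * vd_pp / (2 * r_ohm),
        "VM, differential term": 1e3 * vd_pp / (4 * r_ohm),
    }


if __name__ == "__main__":
    currents = driver_currents_ma(vd_pp=0.2)   # hypothetical 200 mVpp swing
    cm = currents["CM, single-ended term"]
    for name, i_ma in currents.items():
        print(f"{name}: {i_ma:.1f} mA ({cm / i_ma:.0f}x less than CM)")
```

The comparison covers only the output stage current; bias and regulator overhead, which the next slide's measured examples include, are outside this model.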

Page 48

Low-Swing TX Drivers: CM vs. VM

Parameter | VM (Palmer, JSSC 12/2007) | CM (O’Mahony, JSSC 12/2010)
Vswing | 210mVpp-diff | 150mVpp-diff
Proc. / Vcc | 90nm / 1.0Vcc | 45nm / 0.8Vcc
Eq. | No | Yes (2-tap)
Datarate | 6.25Gb/s | 10Gb/s
Bias cap | 36pF | <1pF
TX drv power | 1.10mW | 2.12mW
TX bias power | 0.76mW | 0.34mW
Total TX drv. power | 1.86mW | 2.46mW

VM power savings shrink for low-swing TX
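To compare the two drivers on the deck's own metric, the totals in the table can be divided by the data rates to get pJ/bit. This is a sketch of that arithmetic only; the two designs differ in process, supply, swing and equalization, so the numbers are not a like-for-like ranking.

```python
# name -> (TX driver power in mW, TX bias power in mW, data rate in Gb/s), from the table
DRIVERS = {
    "VM (Palmer)": (1.10, 0.76, 6.25),
    "CM (O'Mahony)": (2.12, 0.34, 10.0),
}


def tx_energy_pj_per_bit(drv_mw: float, bias_mw: float, rate_gbps: float) -> float:
    """Total TX driver energy per bit; mW/(Gb/s) equals pJ/bit."""
    return (drv_mw + bias_mw) / rate_gbps


if __name__ == "__main__":
    for name, (drv, bias, rate) in DRIVERS.items():
        print(f"{name}: {tx_energy_pj_per_bit(drv, bias, rate):.3f} pJ/bit")
```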

Page 49

Top Ten #7: Sensitive RX

Page 50

Low-Power RX samplers

Good receiver sensitivity allows low TX swing

• Residual input-referred offset: <2mV

• Input-referred noise: 1mV-rms

• Hysteresis + metastability: <2mV

[Figure: sampler block diagram: din, VOA with IDAC offset[5:0], two latches, RSFF, dout]

O’Mahony, JSSC Dec. 2010

Page 51

Sensitive RX samplers

[Figure: sampler block diagram (din, VOA with IDAC offset[5:0], two latches, RSFF, dout) with a circuit detail showing clk, in and out terminals, the offset[5:0] IDAC with iref, and 10:3 and 3:10 ratio labels]

Page 52

Sensitive RX samplers

[Figure: sampler block diagram (din, VOA with IDAC offset[5:0], two latches, RSFF, dout) with a circuit detail showing clk, in and out terminals]

Page 53

Top Ten #8: Simple Equalization

• Linear equalizers: big bang for the buck

– If channel is "well behaved" and ISI dominated

• DFE is complex, especially if speedpaths are an issue

– 1-tap DFE only cancels 1 postcursor point

[Figure: unequalized pulse response, with the portions addressed by the LE and by the DFE marked]

Page 54

Examples: Low Power Linear Equalizers

• Continuous time linear equalizers

– Passive using HP filters or inductive peaking

– Source degeneration

• Pre-emphasis

– Limit magnitude & sign of taps

– Current summing in analog domain

[Figure: pre-emphasis driver with Data and Clock inputs and tap weights α and 1-α]

Page 55

Top Ten #9: Calibration and tuning

• Process scaling may reduce power

– By scaling both C and V

– Increases variation due to smaller device area

• Increased logic resources enable sophisticated calibration logic to compensate for variation

σ(Vt) = c2 ÷ √(Leff·Weff)

K. Kuhn, IEDM 2007

Page 56

Extrapolated Process Variation

σ(Vt) = C2 ÷ √Agate

Node (year) | C2_hi | C2_lo | std(Vt_hi) | std(Vt_lo)
130nm (2002) | 5.9E-09 | 5.9E-09 | 1.4E-02 | 1.4E-02
65nm (2006) | 5.6E-09 | 4.9E-09 | 2.6E-02 | 2.3E-02
32nm (2010) | 5.4E-09 | 4.1E-09 | 5.1E-02 | 3.9E-02
16nm (2014) | 5.1E-09 | 3.4E-09 | 9.7E-02 | 6.5E-02
8nm (2020) | 4.9E-09 | 2.8E-09 | 1.8E-01 | 1.1E-01
4nm (2026) | 4.6E-09 | 2.4E-09 | 3.5E-01 | 1.8E-01

[Figure: Vt variation (min size), log scale from 0.01 to 1, and C2, log scale from 1E-09 to 1E-08, vs. process node]

Area & Energy scaling limited by variation
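The relation σ(Vt) = C2 ÷ √Agate can be inverted to see what gate size the tabulated values imply. The sketch below does this for the C2_hi and std(Vt_hi) rows, assuming C2 is in V·m and Agate in m² (the slide does not state units). Under that assumption the implied √Agate stays at roughly 3.3 times the node feature size, so the growth in std(Vt) comes almost entirely from the shrinking minimum-size device.

```python
from math import sqrt

NODES_NM = [130, 65, 32, 16, 8, 4]
C2_HI = [5.9e-9, 5.6e-9, 5.4e-9, 5.1e-9, 4.9e-9, 4.6e-9]        # table row C2_hi
STD_VT_HI = [1.4e-2, 2.6e-2, 5.1e-2, 9.7e-2, 1.8e-1, 3.5e-1]    # table row std(Vt_hi)


def sigma_vt(c2: float, gate_area_m2: float) -> float:
    """Threshold-voltage standard deviation: sigma(Vt) = C2 / sqrt(Agate)."""
    return c2 / sqrt(gate_area_m2)


def implied_sqrt_area_m(c2: float, std_vt: float) -> float:
    """sqrt(Agate) obtained by inverting sigma(Vt) = C2 / sqrt(Agate)."""
    return c2 / std_vt


ratios = [implied_sqrt_area_m(c2, s) / (node * 1e-9)
          for node, c2, s in zip(NODES_NM, C2_HI, STD_VT_HI)]

if __name__ == "__main__":
    for node, r in zip(NODES_NM, ratios):
        print(f"{node:4d}nm: sqrt(Agate) is about {r:.2f} x feature size")
```

Keeping σ(Vt) constant across nodes would require holding Agate roughly fixed, which is the "no area scaling, no power scaling" alternative that calibration aims to avoid.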

Page 57

Example: Phase Rotator

Most calibration and adaptation used today is fairly basic

• e.g. duty cycle correction

[Figure: phase rotator block diagram: fwdclk, MUX, Mixer, DCC; legend marks the adapted/calibrated blocks]

Page 58

Example: Programmable Phase Rotator

• Power can scale as process variation increases

• Alternative is to not scale device area and hence no power scaling

[Figure: programmable phase rotator block diagram: fwdclk, MUX, slew rate control, Mixer, DCC, independent delay tuning, DAC redundancy, a buffer tunable for delay and process skew, and monitoring/BIST blocks (digital phase meas., biasing monitor ADC); legend marks the adapted/calibrated blocks]

F. O’Mahony, et al., A Programmable Phase Rotator based on Time-Modulated Injection-Locking

Page 59

Top Ten #10: System Modeling

• Key to low power is balanced implementation

– Achieved through comprehensive understanding of power/performance tradeoffs

• Focus design effort and power on highest impact components

• System-level optimization most impactful

– Most will not have this opportunity due to standardization specs.

– Sub-system optimization still useful


Methodology example: Sensitivity Calculation to Optimize Power

1. Calculate 1st order power sensitivity of each design option

2. Calculate 1st order margin sensitivity of each design option

3. Form mathematical relationship between power and performance

4. Minimize power for performance target using optimization algorithm

5. Repeat steps 1-4 to further refine design point
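The steps above can be sketched as a small search over design options. This is a minimal illustration, not the author's tool: it assumes first-order sensitivities add linearly, uses brute force in place of a real optimization algorithm, and the option names and numbers are illustrative.

```python
from itertools import product

# Illustrative options: (name, eye-width change [ps], power change [mW]).
OPTIONS = [
    ("tx_swing_up", +3.0, +4.0),
    ("relax_pll_jitter", -12.0, -3.0),
    ("relax_pi_accuracy", -1.0, -3.0),
    ("relax_cdr_latency", -2.0, -1.0),
]

def optimize(baseline_eye, baseline_power, target_eye, options):
    """Return (power, eye, chosen option names) with minimum power meeting target_eye."""
    best = None
    for choice in product((0, 1), repeat=len(options)):
        # Step 3: first-order (linear) model of margin and power.
        eye = baseline_eye + sum(c * o[1] for c, o in zip(choice, options))
        power = baseline_power + sum(c * o[2] for c, o in zip(choice, options))
        # Step 4: keep the lowest-power choice that still meets the margin target.
        if eye >= target_eye and (best is None or power < best[0]):
            best = (power, eye, [o[0] for c, o in zip(choice, options) if c])
    return best

best = optimize(baseline_eye=18.0, baseline_power=100.0, target_eye=15.0, options=OPTIONS)
print(best)
```

Because the sensitivities are only valid near the baseline, step 5 would re-measure them at the new design point and run the search again.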


Methodology example: Sensitivity Calculation to Optimize Power

Columns: parameter; baseline → change; eye width change estimate vs. baseline (units = 1ps or 0.01UI); power delta estimate vs. baseline (mW).

Parameter | Baseline → change | Eye width change | Power delta (mW)
Baseline eye width | | 18 | 100
TX ref. ck. jitter (pp) | 50ps → 60ps | −4 | +0
TX PLL 1-UI jitter, rms (Gaussian jitter, accumulated) | 0.5ps → 0.75ps | −12 | −3
TX equalizer | 2 taps → 3 taps | −2 | +3
TX swing | 100mV → 200mV | +3 | +4
TX buffer sinusoidal jitter @ 200MHz | ±15ps → ±18ps | −10 | −1
TX buffer duty cycle error | 1% → 2% | −1 | −0.1
RX PLL 1-UI jitter, rms (Gaussian jitter, accumulated) | 0.5ps → 0.75ps | +0 | −3
RX PLL bandwidth | 4MHz → 6MHz | −7 | +0
CDR loop latency | 2UI → 4UI | −2 | −1
RX input noise | 1mVrms → 2mVrms | −2 | +2
PI phase accuracy | 0.015UI → 0.03UI | −1 | −3

[Figure: link block diagram. Blocks: data, TX, RX, TX buffer, TX ref. ck. generator, TX PLL, DLL, PI, CDR, RX ref. ck. generator, RX PLL.]

Knowledge of system performance and power sensitivities enables global power optimization

B. Casper, F. O'Mahony, "Clocking Analysis, Implementation and Measurement Techniques for High-Speed Data Links—A Tutorial," TCAS1, Jan. 2009


Agenda

• Introduction

• Impact of process scaling

• Active power optimization

– System

– Circuit

• Power management

• Low power silver bullets

• Putting it all together


Server Utilization

Barroso & Holzle, IEEE Computer, Dec 2007


Energy-Disproportionate Link

Barroso & Holzle, IEEE Computer, Dec 2007


[Figure: normalized bandwidth demand (0.0 to 1.0) vs. time. Labels: conventional (fixed bandwidth), energy-proportional I/O, wasted energy.]
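The wasted-energy idea in this chart can be illustrated numerically. The sketch below assumes a conventional link burns full power regardless of demand, that an ideal energy-proportional link burns power proportional to demand, and that the demand trace is hypothetical.

```python
def wasted_fraction(demand):
    """Fraction of a fixed-bandwidth link's energy spent on unused capacity.

    demand: normalized bandwidth demand samples (0.0 to 1.0) at equal time steps.
    """
    fixed_energy = float(len(demand))   # full power at every time step
    proportional_energy = sum(demand)   # power tracks demand exactly
    return (fixed_energy - proportional_energy) / fixed_energy

# Hypothetical demand trace, not taken from the slide.
demand_trace = [0.2, 0.4, 1.0, 0.6, 0.2, 0.0]
wasted = wasted_fraction(demand_trace)
print(f"wasted fraction: {wasted:.2f}")
```

A real link falls between the two extremes, since idle power and wake-up cost are never exactly zero.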


Power Management: Scalable supplies

Power efficiency improves with adaptive supply/biasing

Data rate | Supply | Energy eff. (pJ/bit)
5Gb/s | 0.68V | 2.7
10Gb/s | 0.85V | 3.6
15Gb/s | 1.05V | 5.0
20Gb/s | 1.2V | 11.2

[Figure: stacked bars of energy efficiency (pJ/bit, 0 to 12) at each operating point, broken down into TX Driver, TX Ser/Pre, TX Clk, RXFE, RX Clk.]

Refs: B. Casper, ISSCC ’06 & G. Balamurugan, JSSC 4/08
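Energy per bit converts to link power by multiplying by the data rate (pJ/bit × Gb/s = mW). The sketch below applies that to the four operating points on this slide; pairing each pJ/bit value with a data rate is a reading of the chart, and the function name is made up.

```python
# (data rate in Gb/s, supply in V, energy efficiency in pJ/bit), as read from the chart.
OPERATING_POINTS = [
    (5, 0.68, 2.7),
    (10, 0.85, 3.6),
    (15, 1.05, 5.0),
    (20, 1.2, 11.2),
]

def link_power_mw(rate_gbps, pj_per_bit):
    """Link power in mW: 1e-12 J/bit * 1e9 bit/s = 1e-3 W."""
    return rate_gbps * pj_per_bit

powers = [link_power_mw(rate, eff) for rate, _, eff in OPERATING_POINTS]
for (rate, supply, eff), power in zip(OPERATING_POINTS, powers):
    print(f"{rate}Gb/s @ {supply}V: {eff} pJ/bit -> {power:.1f} mW")
```

Power grows much faster than data rate across these points, which is why running slower at a lower supply improves efficiency.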


65nm Low Power Link Operating Points

[Figure: normalized scaling factor (1 to 3) vs. clock buffer frequency (axis labeled Gb/s; ticks 5, 10, 15). Curves: current bias, supply voltage, output swing.]

Balamurugan JSSC, April 2008


Benefit of Eliminating Excess BW: Non-linear Efficiency/Performance

[Figure: I/O energy efficiency (pJ/bit, log axis, 1 to 10) vs. data rate (Gb/s; ticks 5, 10, 15). Data labels: 8" FR4: 2.7, 3.6, 5.0; 18" FR4 backplane: 3.6, 5.0, 6.5.]

Balamurugan JSSC, April 2008


Fast Wake-Up Clocking


Agenda

• Introduction

• Impact of process scaling

• Active power optimization

– System

– Circuit

• Power management

• Low power silver bullets (?)

• Putting it all together


Low Power Link “Silver Bullets”?

Optical

Modulation (PAM)

3D Stacking


Silver Bullet?: Optical

• Claim:

– Optical power is inherently better because no equalization needed

• Reality:

– Most optical power claims only include optical components

• Disregards electrical driver, clocking, recovery, synch., serdes, etc.

– Optical link = Electrical link + optical components in the middle

– Optical/electrical power crossover is likely 1m-5m, depending on rate

[Figure: power vs. distance for electrical and optical links.]

Stay on this portion of the curve and equalization power is likely to be a small subset of overall link power (e.g. <5% [O’Mahony, ISSCC 2010])


Apples-Apples Electrical/Optical Channel Comparison

[Figure: two CPU-to-CPU channel diagrams, one electrical and one optical. Labels: CPU, package, socket, mother board, cable, connector, fiber, optical module.]


Complete Optical CPU Link

[Figure: complete optical CPU link. Labels: CPU w/ electrical TX, modulator driver, modulator/VCSEL, package/module, connectors, jumpers, optical cable, photodetector (PD), transimpedance amp/limiting amp (TIA/LA), CPU w/ electrical RX.]


Example Optical Loss Profile (consumer electronics)

*2018+ assumptions

• Optical channel loss is frequency independent

– But the aggregate loss is 100x-1000x!

• VCSEL or MZI based links require large swings (500mV-1V)

• Worst-case received signal can be as low as ~10µA

– Requires extremely sensitive receiver (costs power)

• Full optical link >2x power of electrical at ≤3m distance

[Figure: optical power (W, log axis, 1E-5 to 1E-1) along the link.]
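For comparison with electrical channel loss, which is usually quoted in dB, the 100x-1000x aggregate loss can be converted as below. This assumes the ratio refers to optical power (as the chart axis suggests), so the 10·log10 form applies; the function name is made up.

```python
import math

def loss_db(power_ratio):
    """Convert a linear power loss ratio (e.g. 100 for 100x) to decibels."""
    return 10.0 * math.log10(power_ratio)

for ratio in (100, 1000):
    print(f"{ratio}x loss = {loss_db(ratio):.0f} dB")
```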


Silver Bullet?: PAM

• Claim:

– PAM uses less BW resulting in less equalization and lower power

• Reality:

– No inherent performance/power advantage over binary

• Using practical channels and 1E-12 BER

– Equalization and clock recovery more difficult

– PAM receiver more complex

• 4 PAM requires 1.5 samples/bit + decoding

• Binary requires 1 sample/bit

– PAM may have advantages when

• Symbol rate limited due to circuits

• Channel has excess BW
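The samples-per-bit figures above follow from a simple count. The sketch assumes a receiver that resolves an M-level symbol with M−1 threshold comparisons per symbol and that M is a power of two; that receiver model is an assumption for illustration, not something the slide specifies.

```python
import math

def bits_per_symbol(levels):
    """Bits carried by one M-PAM symbol (M assumed to be a power of two)."""
    return math.log2(levels)

def samples_per_bit(levels):
    """Threshold comparisons per bit, assuming M-1 comparisons per symbol."""
    return (levels - 1) / bits_per_symbol(levels)

for levels in (2, 4):
    print(f"{levels}-PAM: {samples_per_bit(levels)} samples/bit")
```

The same count gives 7/3 samples per bit for 8-PAM, so receiver complexity per bit keeps growing with the number of levels.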


Max data rate with 1e-12 BER (LE & DFE 4-tap)

[Figure: maximum data rate (Gb/s, 0 to 20) vs. raw BER (1e-4, 1e-6, 1e-8, 1e-12). Series: 2PAM w/o ECC, 4PAM w/o ECC.]

RS(64,48,8) coding overhead estimated at 100pJ/bit in 65nm


Max data rate with small block coding to achieve 1e-12 BER (LE & DFE 4-tap)

[Figure: maximum data rate (Gb/s, 0 to 20) vs. raw BER (1e-4, 1e-6, 1e-8, 1e-12). Series: 2PAM w/o ECC, 4PAM w/o ECC, 2PAM w/ ECC, 4PAM w/ ECC.]

RS(64,48,8) coding overhead estimated at 100pJ/bit in 65nm
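Besides the energy overhead, a block code also costs bandwidth. The sketch below reads RS(64,48,8) as (n, k, t), i.e. 64-symbol codewords carrying 48 data symbols and correcting up to 8 symbol errors; that reading is an assumption based on the usual Reed-Solomon notation, and the 10Gb/s payload rate is hypothetical.

```python
def rs_parameters(n, k):
    """Code rate and correctable symbol errors for a Reed-Solomon (n, k) code."""
    code_rate = k / n
    correctable = (n - k) // 2
    return code_rate, correctable

def line_rate_gbps(payload_gbps, n, k):
    """Line rate needed to carry payload_gbps of user data after coding."""
    return payload_gbps * n / k

rate, t = rs_parameters(64, 48)
print(f"code rate = {rate}, correctable symbols = {t}")
print(f"line rate for 10Gb/s payload = {line_rate_gbps(10.0, 64, 48):.2f} Gb/s")
```

So the coded link has to run about a third faster than the payload rate, on top of the encoder and decoder energy.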


45nm PAM Measurements (Within-package channel)

• Modulation benefits links that have channel BW much greater than circuit BW

• For this example, 2PAM power expected to be higher than 4PAM (at same data rate)

– Due to circuit limitations

Channel | Signaling mode | Efficiency | Data rate | TX swing
MCP | 2-PAM | 2.3pJ/bit | 12.5G | 120mV
MCP | 3-PAM | 2.6pJ/bit | 18.75G | 260mV
MCP | 4-PAM | 2.6pJ/bit | 25G | 360mV

[Balamurugan, et al., ISSCC 2010]


Silver Bullet?: 3D Stacking

• Claim:

– Stacking minimizes need for high-speed I/O

• Reality:

– Potential to reduce I/O power (within stack) by 10x-100x

• Reduce C & V (CV²)

– Components within stack must be tightly integrated (architecture, process, mechanicals)

– Thermal and power delivery limits applicability

• Primarily applicable for low power stacks

Micron Hybrid Memory Cube
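The CV² argument can be made concrete with a rough calculation. All capacitance and swing values below are hypothetical, chosen only to show how a reduction in both C and V compounds; they are not measurements from the talk.

```python
def switching_energy_pj(cap_pf, swing_v):
    """Energy to charge a capacitance through a voltage swing: E = C * V^2 (pF * V^2 = pJ)."""
    return cap_pf * swing_v ** 2

# Hypothetical off-package link: 2pF load, 1.0V swing.
off_package = switching_energy_pj(2.0, 1.0)
# Hypothetical in-stack link: 0.2pF load, 0.5V swing.
in_stack = switching_energy_pj(0.2, 0.5)

reduction = off_package / in_stack
print(f"off-package {off_package} pJ, in-stack {in_stack} pJ, reduction {reduction:.0f}x")
```

With these example numbers a 10x cut in capacitance and a 2x cut in swing give about 40x, inside the 10x-100x range quoted above.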


Example: Stacking DRAM on CPU

• Multiple DRAM stacks on CPU constrain power due to thermals

– DRAM temp limit <100°C

– Assumes standard CPU cooling solution

[Figure: CPU power limit (W, 0 to 100) vs. number of DRAM stacks on CPU (0, 4, 8).]

• Primarily applicable for low power CPUs

• Micro-channel cooling could change tradeoffs


Low Power Link “Silver Bullets”?

Optical

Modulation (PAM)

3D Stacking

Each technology may have power advantages for a limited set of applications. However, none is a general solution to the link power problem.


Agenda

• Introduction

• Impact of process scaling

• Active power optimization

– System

– Circuit

• Power management

• Low power silver bullets

• Putting it all together


Example: 47×10Gb/s Interface

[Figure: interface cross-section. Labels: dieA, dieB, package, socket, PCB, 500µm LGA connector, HDI/flex/cable bridge.]


Solutions: Interconnect

[Figure: |S21| (loss, 0dB to −40dB) vs. frequency (0GHz to 40GHz). Curves: 1m twinax w/o connector; 0.5m legacy backplane. Twinax cable construction: 32AWG stranded conductor, copper foil shield, FEP dielectric, 635µm jacket.]


Solutions: Connectors

Top-pkg connector (4 signals/mm²)


1. Modest data rates

2. Forwarded clocking

3. Global circuit sharing

4. Low power clock distribution

5. Resonantly tuned clocking

6. Low swing TX

7. Sensitive RX

8. Simple equalization

9. Calibration and tuning

10. System modeling

Solution: Circuits

• Utilized most suggested low power optimizations


Low Power Prototype Results

[Figure: 0.5m flex interconnect; 3m twinax cable.]

45nm CMOS prototype demonstrates industry-leading I/O power efficiency with non-traditional interconnects


Low Power Prototype Results

[Figure: energy efficiency (pJ/bit, 0.00 to 3.00) vs. channel length (cm, 0 to 300). Series: HDI, LCP flex, 32AWG twinax. Link data rate = 10Gb/s.]

Aggressive power management:

• Idle mode is 93% less power than active state

• Wake-up from idle <5ns
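The effect of the idle mode on average power can be estimated with a simple duty-cycle model. The sketch assumes idle power is 7% of active power (from the 93% figure), ignores wake-up energy, and uses a hypothetical 14mW active lane power and 30% utilization.

```python
IDLE_FRACTION = 0.07  # idle mode is 93% less power than the active state

def average_power_mw(active_mw, utilization, idle_fraction=IDLE_FRACTION):
    """Time-averaged power for a link that is active for `utilization` of the time."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return active_mw * (utilization + (1.0 - utilization) * idle_fraction)

# Hypothetical lane: 14mW when active, busy 30% of the time.
avg = average_power_mw(14.0, 0.30)
print(f"average power: {avg:.2f} mW")
```

A wake-up time of a few nanoseconds matters here because it determines how short an idle gap can be and still be worth entering the idle state.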


Research Roadmap

Target: 1-4pJ/bit (vs. current product = 20-40pJ/bit)

[Figure: data rate per pair (Gb/s; axis ticks 8, 16, 32, 64) for four interconnect approaches.]

• Legacy FR4 interconnect (multiple connectors, vias and sockets)

• Short traditional topologies with evolutionary optimizations. >1/4m use cable.

• Short flex or micro-twinax cables with top-pkg connector

• Active electrical cables w/ 1/2m micro-twinax

Legend: Demonstrated. Known solution, demonstration targeting 2012. Solution being researched, demonstration anticipated by 2015.


Link Active Power Optimization Key Points

• 1TB/s socket BW needed by 2020

– Power optimize or I/O will require majority of power budget

• Don’t depend solely on process scaling to lower power

– Architecture and circuit will drive energy scaling

• Stay away from bleeding edge

– Channel, process and architecture

• Balanced link design is key to low power

• Optical & stacking promising but limited

• Electrical innovation in circuits and channel fruitful
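The first key point can be checked with the assumptions stated earlier in the talk (CPU TDP = 135W, I/O efficiency = 10pJ/bit). The sketch below is a back-of-the-envelope calculation; the function name is made up, and 1TB/s is taken as 8×10^12 bit/s.

```python
CPU_TDP_W = 135.0  # high-end server assumption from the introduction

def io_power_w(bw_gbytes_per_s, pj_per_bit):
    """I/O power in watts for a given socket bandwidth and energy per bit."""
    bits_per_s = bw_gbytes_per_s * 1e9 * 8
    return bits_per_s * pj_per_bit * 1e-12

today = io_power_w(1000.0, 10.0)      # 1TB/s at 10pJ/bit
target_lo = io_power_w(1000.0, 1.0)   # 1TB/s at the 1pJ/bit end of the target
target_hi = io_power_w(1000.0, 4.0)   # 1TB/s at the 4pJ/bit end of the target

print(f"10pJ/bit: {today:.0f} W ({today / CPU_TDP_W:.0%} of TDP)")
print(f"1-4pJ/bit: {target_lo:.0f}-{target_hi:.0f} W")
```

At 10pJ/bit the I/O alone would take more than half of the 135W budget, while the 1-4pJ/bit research target brings it down to a small fraction.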

Acknowledgement: Frank O’Mahony, James Jaussi, Ganesh Balamurugan, Mozhgan Mansuri, Sudip Shekhar


Related Publications 1

F. O’Mahony, et al., “A 47×10Gb/s 1.4mW/(Gb/s) parallel interface in 45nm CMOS,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2010, pp. 156–157.

F. O’Mahony, et al., “The future of electrical I/O for microprocessors,” International Symposium on VLSI DAT, Apr. 2009, pp. 31-34.

G. Balamurugan, et al., “A scalable 5–15Gbps, 14–75mW low-power I/O transceiver in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 43, pp. 1010-1019, Apr. 2008.

H. Braunisch, et al., “High-speed flex circuit chip-to-chip interconnects,” IEEE Trans. on Advanced Packaging, vol. 31, no. 1, 2008, pp. 82-90.

B. Casper, et al., “Future microprocessor interfaces: analysis, design and optimization,” IEEE Custom Integrated Circuits Conference, Sept. 2007, pp. 479–486.


Related Publications 2

• J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally, and M. Horowitz, “A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 42, pp. 2745-2757, Dec. 2007.

• H. Hatamkhani, F. Lambrecht, V. Stojanovic, and C.-K. K. Yang, “Power-centric design of high-speed I/Os,” in Proc. Design Automation Conf., 2006, pp. 867-872.

• K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang, “A 27-mW 3.6-Gb/s I/O transceiver,” IEEE J. Solid-State Circuits, vol. 39, pp. 602-612, Dec. 2004.

• G. Balamurugan, J. Kennedy, G. Banerjee, J. E. Jaussi, M. Mansuri, F. O’Mahony, B. Casper, and R. Mooney, “A scalable 5–15 Gbps, 14–75 mW low-power I/O transceiver in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 43, pp. 1010-1019, Apr. 2008.

• S. Joshi, J. T.-S. Liao, Y. Fan, S. Hyvonen, M. Nagarajan, J. Rizk, H.-J. Lee, and I. Young, “A 12-Gb/s transceiver in 32-nm bulk CMOS,” in 2009 Symp. VLSI Circuits Dig. Tech. Papers, pp. 52-53.

• B. Leibowitz, R. Palmer, J. Poulton, Y. Frans, S. Li, J. Wilson, M. Bucher, A. M. Fuller, J. Eyles, M. Aleksić, T. Greer, and N. M. Nguyen, “A 4.3 GB/s mobile memory interface with power-efficient bandwidth scaling,” IEEE J. Solid-State Circuits, vol. 45, pp. 889-898, Apr. 2010.

• F. O’Mahony, M. Mansuri, B. Casper, J. E. Jaussi, and R. Mooney, “A low-jitter PLL and repeaterless clock distribution network for a 20Gb/s link,” in 2006 Symp. VLSI Circuits Dig. Tech. Papers, pp. 36-37.

• B. Casper, J. E. Jaussi, F. O'Mahony, M. Mansuri, K. Canagasaby, J. Kennedy, E. Yeung, and R. Mooney, “A 20Gb/s forwarded clock transceiver in 90nm CMOS,” in 2006 IEEE ISSCC Dig. Tech. Papers, pp. 90-91.

• J. Montanaro, R. T. Witek, K. Anne, A. J. Black, E. M. Cooper, D. W. Dobberpuhl, P. H. Donahue, J. Eno, G. W. Hoeppner, D. Kruckemyer, T. H. Lee, P. C. M. Lin, L. Madden, D. Murray, M. H. Pearce, S. Santhanam, K. J. Snyder, R. Stephany, and S. C. Thierauf, “A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor,” IEEE J. Solid-State Circuits, vol. 31, pp. 1703–1714, Nov. 1996.


Related Publications 3

• Intel® Core™ i5-670 Processor http://ark.intel.com/Product.aspx?id=43556

• Intel® Xeon® Processor X5670 http://ark.intel.com/Product.aspx?id=47920

• Intel® Xeon® Processor X7560 http://ark.intel.com/Product.aspx?id=46499

• O'Mahony, F.; Kennedy, J.; Jaussi, J.E.; Balamurugan, G.; Mansuri, M.; Roberts, C.; Shekhar, S.; Mooney, R.; Casper, B., "A 47×10Gb/s 1.4mW/(Gb/s) parallel interface in 45nm CMOS," Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, pp. 156-157, 7-11 Feb. 2010

• Horowitz, M.; Alon, E.; Patil, D.; Naffziger, S.; Rajesh Kumar; Bernstein, K., "Scaling, power, and the future of CMOS," Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, pp.7 pp.-15, 5-5 Dec. 2005

• Casper, B.; Balamurugan, G.; Jaussi, J.E.; Kennedy, J.; Mansuri, M., "Future Microprocessor Interfaces: Analysis, Design and Optimization," Custom Integrated Circuits Conference, 2007. CICC '07. IEEE, pp. 479-486, 16-19 Sept. 2007

• B. Casper, F. O'Mahony, "Clocking Analysis, Implementation and Measurement Techniques for High-Speed Data Links—A Tutorial," TCAS1, Jan. 2009

• Casper, B.; Jaussi, J.; O'Mahony, F.; Mansuri, M.; Canagasaby, K.; Kennedy, J.; Yeung, E.; Mooney, R., "A 20Gb/s Forwarded Clock Transceiver in 90nm CMOS," Solid-State Circuits Conference, 2006. ISSCC 2006. Digest of Technical Papers. IEEE International, pp. 263-272, Feb. 6-9, 2006

• Kuhn, K.J., "Reducing Variation in Advanced Logic Technologies: Approaches to Process and Design for Manufacturability of Nanoscale CMOS," Electron Devices Meeting, 2007. IEDM 2007. IEEE International, pp. 471-474, 10-12 Dec. 2007

• F. O'Mahony, B. Casper, M. Mansuri, M. Hossain, “A Programmable Phase Rotator Based on Time-Modulated Injection-Locking,” VLSI Symposium 2010