many voltage domains enabling an energy efficient graphics...

37
Many voltage domains enabling an energy efficient graphics processor in 14nm Rin kle Jain, Pascal Meinerzhagen, Vivek De October 19 2018 Radio Circuits Integration Lab, Intel Corporation Hillsboro, OR Power Supply on Chip Workshop 2018

Upload: others

Post on 12-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Many voltage domains enabling an energyefficient graphics processor in 14nm

Rinkle Jain, Pascal Meinerzhagen, Vivek De

October 19 2018

Radio Circuits Integration Lab, Intel Corporation

Hillsboro, OR

Power Supply on Chip Workshop 2018

Page 2: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Outline

State of the Art

Fine grained DVFS: Costs and benefits

Co-design with load

Summary and Conclusion

2/ 24

Page 3: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Fully integrated VR Technologies

The SoC Side of the Story - the FIVR

[Burton et.al APEC ’14] [Kurd et.al ISSCC ’14]

Faster state transitions by 25%, higher performance per watt

Overall idle power slashed by 20x, battery life improvement by > 50%

BOM savings helped all segments

PSoC 2018 Rinkle Jain 3 / 24

Page 4: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Fully integrated VR Technologies

The SoC Side of the Story - the FIVR

Segment Key Challenge What helped the mostServers Bump Imax Iin reductionLaptops Board Area, Iccmax High bandwidth/frequency

Desktops Performance High bandwidth

PSoC 2018 Rinkle Jain 3 / 24

Page 5: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Fully integrated VR Technologies

The Platform Perspective

PSoC 2018 Rinkle Jain 4 / 24

Page 6: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Fully integrated VR Technologies

The Platform Perspective

SoCs inherently require several voltage rails

Input rail consolidation simplifies power delivery significantlyPSoC 2018 Rinkle Jain 4 / 24

Page 7: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Fully integrated VR Technologies

Loadline improvements

Translates to battery life, area and/or performance benefits

PSoC 2018 Rinkle Jain 5 / 24

Page 8: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Fully integrated VR Technologies

MIM-based Switched Capacitor VR

[R.Jain et al. JSSC ’14]

[R.Jain et al. JSSC ’15]

Regulation, few ns response, low area overhead, 880mA/mm2

PSoC 2018 Rinkle Jain 6 / 24

Page 9: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Fully integrated VR Technologies

Proposed Control Law for Adaptive Widths

fsw < FthW = bW ′

a implies higher efficiency at W’ (W’=W/n)

fsw is a good indicator of low voltage and light load conditions

PSoC 2018 Rinkle Jain 7 / 24

Page 10: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Enabling DVFS

Current density & domain area-power trends

0 0.5 1 1.5 2 2.5 3 3.50

1

2

3

4

5

Load area (mm2)

Max

load

cur

rent

(A)

4A/

mm

2

2A/m

m2

1A/mm2

0.6A/mm2

SCVR 22nm

PSoC 2018 Rinkle Jain 8 / 24

Page 11: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Enabling DVFS

Current density & domain area-power trends

0 5 10 150

5

10

15

20

25

30

Load area (mm2)

Max

load

cur

rent

(A)

HSW

4A/m

m2

1A/mm2

2A/mm2

0.6A/mm2

1A/mm2

2A/mm2

0.6A/mm2

FIVR 22nm

PSoC 2018 Rinkle Jain 8 / 24

Page 12: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Enabling DVFS

Current density & domain area-power trends

0 5 10 150

5

10

15

20

25

30

Load area (mm2)

Max

load

cur

rent

(A)

HSW

4A/m

m2

1A/mm2

2A/mm2

0.6A/mm2

HSW

4A/m

m2

1A/mm2

2A/mm2

FIVR 22nm

0.6A/mm2

SCVR 22nm

PSoC 2018 Rinkle Jain 8 / 24

Page 13: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Enabling DVFS

Current density & domain area-power trends

22nm

22nm

o

o

PSoC 2018 Rinkle Jain 8 / 24

Page 14: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Enabling DVFS

Buck converter with thin-film magnetics

H. Krishnamurthy et al. JSSC 2017

PSoC 2018 Rinkle Jain 9 / 24

Page 15: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Enabling DVFS

Current density & domain area-power trends

22nm

22nm

o

o

o

PSoC 2018 Rinkle Jain 10 / 24

Page 16: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

14nm Graphics ProcessorI/O

I/O

GPU

Paging Cache

SS2SS1

SS0

EUs

PLL

EUs

8.0 x 8.0 mm2

Sampler

L2$Sam

plerL2$

Sampler

L2$

GPU

Interface

FFs, Cmd Stream

er, L3 Data $

Paging $ Ctrl

Paging $

TLB

Pwr Ctrl

Config

2 MB

Clk Sync

PLLPLL

CLKREF

System Agent

Gen9LP GPU

Bus Interface

& Ctrl

PCIe

FPGA

14nm Prototype

IVR D

ebug & I/O

SS0SS1

SS2÷ JTAG Ctrl JTAG

CLKGPUCKLSAJTAG Config & Debug

2 MB

PCI Bridge

Host PC

GPU data

Pwr

Cfg

New EUs

New EUs

EU EU EU

EU EU EU

New execution unit (EU) has a dedicated integrated voltage regulator

PSoC 2018 Rinkle Jain 11 / 24

Page 17: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Enabling DVFS

Switched Capacitor Voltage RegulatorSwitched Capacitor VR (SCVR)

APR friendly design, 6 distributed tiles with area overhead of 1.1% (680umx520um)

PSoC 2018 Rinkle Jain 12 / 24

Page 18: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Basic Power Gate (PG)

RF

RF

RF

RF

PG

VIN

VOUT

PG

Modified EU: Digital LDO (DLDO)

RF

RF

RF

RF

PGPG Driver

Ctrl Controller

DLDO PG

Driver

PPGa

PPGd

SPG

VIN

VOUT

PG

VUDDAC

Digital PGDriver

PPG

{ {Area overhead: 0.5%

Digital LDO demonstrated in S.Kim et al. JSSC ’15

PSoC 2018 Rinkle Jain 13 / 24

Page 19: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Distributed Implementation

Adaptive Width Simulations

Fast and stable adaptation

PSoC 2018 Rinkle Jain 14 / 24

Page 20: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Distributed Implementation

Adaptive Width Simulations

Fast and stable adaptation

PSoC 2018 Rinkle Jain 14 / 24

Page 21: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Distributed Implementation

Adaptive Width Simulations

Fast and stable adaptation

PSoC 2018 Rinkle Jain 14 / 24

Page 22: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

EU Other ...

VGPU

GPU

IVR

VIN

EU Other ...

VEU

GPU

IVRVOther

IVR

VIN

LOWLOW

HIGHEUOther

DEMAND

FGPU

VGPU

PLL PLL

LOW HIGH LOW

VLOW

VHIGH

Slow Fast

Shared voltage railAll blocks at high V/F PLL re-lock time

LOWLOW

HIGHEUOther

DEMAND

FEUFOther

LOW HIGH LOW

Slow Fast

VOther

VEU

VLOW

Dedicated railsFast EU clock No PLL re-lock

Fine grained DVFS

PSoC 2018 Rinkle Jain 15 / 24

Page 23: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

EUCLK2X/CLK1X

OtherCLK1X

BYPASS/IVR IVR

VIN

GPU

...

SYNC

10

CLK2X

CLKEU

EU Turbo

÷2

Level Shifter

...

CLK1X

÷2

EU Turbo

EUs at 2X clock (CLK2X) as needed

Sync logic at 1X/2X clock boundaries

IVR for fast turbo/un-turbo transitions

PSoC 2018 Rinkle Jain 16 / 24

Page 24: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

LOWLOW

STALLEUOther

DEMAND

FEUFOther

LOW

VLOW

Slow Gated

VGPU

LOWLOW

STALLEUOther

DEMANDLOW

Slow Gated

VRET

FEUFOther

VLOW

VOther

VEU

VLOW

EU Other

CLAMP IVR

VIN

GPU

...

VEU VOther

Retentive Sleep

EU Other

VIN

GPU

...

VGPU IVR

PSoC 2018 Rinkle Jain 17 / 24

Page 25: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Independent V/F Domains at SoC Level

SCVR

GPULow

Demand

CPUHigh

Demand

VIN

SCVR BYPASS

Energy Savings at Iso-Performance: Max 77%, avg 62%

05

1015202530354045

0 1 2 3 4 5 6 7 8G

PU E

nerg

y/O

pera

tion

[µJ]

Performance: Ops/Sec [Normalized]

SCVR 2:1

25℃

Max 58%

Avg 33%

PSoC 2018 Rinkle Jain 18 / 24

Page 26: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

EU Turbo Performance Gain

EUCLK1X/CLK2X

OtherCLK1X

DLDO/BYPASS DLDO

VIN

GPU

SYNC

EU Turbo

EUCLK1X

OtherCLK1X

VIN

GPU

Baseline: Shared V/F

Ideal IVR

5

10

15

20

25

30

0 1 2 3 4 5 6 7 8

Ener

gy/O

pera

tion

[µJ]

Performance: Ops/Sec [Normalized]

100% EU Utilization, 12 EUs, 70℃

Energy Savings at Iso-Performance: Max 32%, avg 29%

PSoC 2018 Rinkle Jain 19 / 24

Page 27: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Retentive Sleep: Leakage Savings

02468

1012141618

0 0.25 0.5 0.75 1

I SS(6

EU

s) [m

A]

VOUT [V]

25℃ VRET≈0.5V

VIN=0.8V

VIN=1.15V

0.0%

0.5%

1.0%

1.5%

2.0%

2.5%

3.0%

EU2 EU3 EU4 EU5 EU6

EU V

MIN

Redu

ctio

n [%

]

25MHz 100MHz

PSoC 2018 Rinkle Jain 20 / 24

Page 28: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

ConclusionsComplete 14nm GPU features energy efficiency techniques

Hybrid IVR (DLDO + SCVR) for independent V/F domains

• Within GPU: EU turbo provides up to 68% performance gain, 50% EU area reduction, or 32% energy savings

• SoC Level: SCVR enables up to 77% GPU energy savings

Additional DLDO usages

• Retentive sleep: Reduces EU leakage by 63%

• VMIN per block: Up to 2.4% EU VMIN reduction

PSoC 2018 Rinkle Jain 21 / 24

Page 29: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Summary and Conclusion

Summary

Power delivery: VRs inside SoC is fully integrated(>100MHz, >2:1)

Buck for mainstream SoC, SC for relatively small rails

Reduced area => less L, C; worse interconnects with every node; efficiencyreduces

Need: Better passives utilization, current density, die stacking

Power management: Inductor less approach for down-scalability

Distributed, APR-friendly designs desirable for SoC integration and wideadoption

Need down-scalable high current density/die stacking for point-of-loadVRs!

Need battery/source-to-SoC VR integration(3-5 MHz bandwidth,discrete passives)

PSoC 2018 Rinkle Jain 22 / 24

Page 30: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Summary and Conclusion

Summary

Power delivery: VRs inside SoC is fully integrated(>100MHz, >2:1)

Buck for mainstream SoC, SC for relatively small rails

Reduced area => less L, C; worse interconnects with every node; efficiencyreduces

Need: Better passives utilization, current density, die stacking

Power management: Inductor less approach for down-scalability

Distributed, APR-friendly designs desirable for SoC integration and wideadoption

Need down-scalable high current density/die stacking for point-of-loadVRs!

Need battery/source-to-SoC VR integration(3-5 MHz bandwidth,discrete passives)

PSoC 2018 Rinkle Jain 22 / 24

Page 31: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Summary and Conclusion

Summary

Power delivery: VRs inside SoC is fully integrated(>100MHz, >2:1)

Buck for mainstream SoC, SC for relatively small rails

Reduced area => less L, C; worse interconnects with every node; efficiencyreduces

Need: Better passives utilization, current density, die stacking

Power management: Inductor less approach for down-scalability

Distributed, APR-friendly designs desirable for SoC integration and wideadoption

Need down-scalable high current density/die stacking for point-of-loadVRs!

Need battery/source-to-SoC VR integration(3-5 MHz bandwidth,discrete passives)

PSoC 2018 Rinkle Jain 22 / 24

Page 32: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Summary and Conclusion

Summary

Power delivery: VRs inside SoC is fully integrated(>100MHz, >2:1)

Buck for mainstream SoC, SC for relatively small rails

Reduced area => less L, C; worse interconnects with every node; efficiencyreduces

Need: Better passives utilization, current density, die stacking

Power management: Inductor less approach for down-scalability

Distributed, APR-friendly designs desirable for SoC integration and wideadoption

Need down-scalable high current density/die stacking for point-of-loadVRs!

Need battery/source-to-SoC VR integration(3-5 MHz bandwidth,discrete passives)

PSoC 2018 Rinkle Jain 22 / 24

Page 33: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Summary and Conclusion

Summary

Power delivery: VRs inside SoC is fully integrated(>100MHz, >2:1)

Buck for mainstream SoC, SC for relatively small rails

Reduced area => less L, C; worse interconnects with every node; efficiencyreduces

Need: Better passives utilization, current density, die stacking

Power management: Inductor less approach for down-scalability

Distributed, APR-friendly designs desirable for SoC integration and wideadoption

Need down-scalable high current density/die stacking for point-of-loadVRs!

Need battery/source-to-SoC VR integration(3-5 MHz bandwidth,discrete passives)

PSoC 2018 Rinkle Jain 22 / 24

Page 34: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Summary and Conclusion

Summary

Power delivery: VRs inside SoC is fully integrated(>100MHz, >2:1)

Buck for mainstream SoC, SC for relatively small rails

Reduced area => less L, C; worse interconnects with every node; efficiencyreduces

Need: Better passives utilization, current density, die stacking

Power management: Inductor less approach for down-scalability

Distributed, APR-friendly designs desirable for SoC integration and wideadoption

Need down-scalable high current density/die stacking for point-of-loadVRs!

Need battery/source-to-SoC VR integration(3-5 MHz bandwidth,discrete passives)

PSoC 2018 Rinkle Jain 22 / 24

Page 35: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Summary and Conclusion

Summary

Power delivery: VRs inside SoC is fully integrated(>100MHz, >2:1)

Buck for mainstream SoC, SC for relatively small rails

Reduced area => less L, C; worse interconnects with every node; efficiencyreduces

Need: Better passives utilization, current density, die stacking

Power management: Inductor less approach for down-scalability

Distributed, APR-friendly designs desirable for SoC integration and wideadoption

Need down-scalable high current density/die stacking for point-of-loadVRs!

Need battery/source-to-SoC VR integration(3-5 MHz bandwidth,discrete passives)

PSoC 2018 Rinkle Jain 22 / 24

Page 36: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Thank you for your attention!

PSoC 2018 Rinkle Jain 23 / 24

Page 37: Many voltage domains enabling an energy efficient graphics ...pwrsocevents.com/wp-content/uploads/2018... · 1 SS 0 EUs PLL EUs 8.0 x 8.0 mm 2 Sampler L 2 $ Sampler L 2 $ Sampler

Acknowledgement

Vaibhav Vaidya

Tri Hyunh

Chung-Ching Peng

George Matthew

Carlos tokunaga

Don Gardener

Kaladhar Radhakrishnan

Paul Fischer

Kevin O’brien

Muhammad Khellah

Jim Tschanz

Ravi Mahajan

Jonathan Douglas

Takao Oshita

This research was, in part, funded by the U.S. Government (DARPA). The views and conclusions

contained in this document are those of the authors and should not be interpreted as representing

the official policies, either expressed or implied, of the U.S. Government.

PSoC 2018 Rinkle Jain 24 / 24