Post on 01-Apr-2015

Explaining The Gap Between ASIC and Custom Power: A Custom Perspective

Andrew Chang Cadence Design Systems*

William J. Dally Computer Systems Laboratory

Stanford University

* Work done while Author was at Stanford

Design Tradeoffs: Power vs. Performance

1. Move to More Energy Efficient Operating Point: More Energy Efficient w/ Custom

2. Trade Performance for Power: Larger Range w/ Custom

3. Move to Different Power vs. Performance Curve: More Architectural Choice with Custom

[Figure: Power (vertical axis) vs. Performance (horizontal axis), with moves 1, 2, and 3 marked]

Dynamic Power Dissipation

$P_{dyn} = \alpha C V_{dd}^2 f = \alpha E_{circuit} f$

Reduce $V_{dd}$: static, dynamic, voltage islands, power gating

Reduce $\alpha$ and/or $f$: clock gating, block enables, bus encoding, glitch identification and elimination

Reduce $E_{circuit}$: engineer interconnects, increase circuit efficiency, subthreshold circuit techniques
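As a rough illustration of why the supply voltage is the strongest lever on this slide, the sketch below evaluates the dynamic power relation for a made-up operating point. The activity factor `alpha` is assumed to be the symbol that goes with "and/or f" on the slide, and the capacitance, voltage, and frequency values are hypothetical, not taken from the talk.

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """Dynamic power in watts: activity factor * switched capacitance * Vdd^2 * frequency."""
    return alpha * c_farads * vdd ** 2 * f_hz


# Hypothetical operating point (illustrative values only).
base = dynamic_power(alpha=0.2, c_farads=1e-9, vdd=1.2, f_hz=500e6)

# Halving Vdd reduces power quadratically; halving f reduces it linearly.
half_vdd = dynamic_power(alpha=0.2, c_farads=1e-9, vdd=0.6, f_hz=500e6)
half_f = dynamic_power(alpha=0.2, c_farads=1e-9, vdd=1.2, f_hz=250e6)

print(f"baseline power: {base:.3f} W")
print(f"saving from halving Vdd: {base / half_vdd:.1f}x")
print(f"saving from halving f:   {base / half_f:.1f}x")
```

Halving the frequency alone lowers power but not the energy per operation, which is why the slide lists voltage and circuit-energy reductions separately.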

Static Power Dissipation

$P_{static} = V_{dd} (I_{sub} + I_{ox})$

$I_{sub} = K_1 W e^{-V_t / (n V_\theta)} (1 - e^{-V_{gs} / V_\theta})$

$I_{ox} = K_2 W (V_{gs} / t_{ox})^2 e^{-\alpha t_{ox} / V_{gs}}$

With $K_1$, $K_2$, $n$, and $\alpha$ experimentally determined; $V_\theta$ is the thermal voltage

Reduce $V_{dd}$: static, dynamic, voltage islands, power gating

Increase effective $V_t$: substituting high-threshold devices, transistor stacking, static and active body bias

Reduce effective $W$: reduce number and size of devices in design
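The leakage model above can be sketched in a few lines to show how strongly subthreshold current depends on the threshold voltage. This is only an illustration: the symbols `v_theta` (thermal voltage) and `alpha` (oxide fitting constant) are assumed to be the ones garbled in the transcript, and every numeric constant below is a hypothetical placeholder rather than a fitted value.

```python
import math

V_THETA = 0.026  # approximate thermal voltage near room temperature, in volts


def i_sub(k1, w, vt, n, vgs, v_theta=V_THETA):
    """Subthreshold leakage: K1 * W * exp(-Vt / (n * Vtheta)) * (1 - exp(-Vgs / Vtheta))."""
    return k1 * w * math.exp(-vt / (n * v_theta)) * (1.0 - math.exp(-vgs / v_theta))


def i_ox(k2, w, vgs, tox, alpha):
    """Gate-oxide leakage: K2 * W * (Vgs / tox)^2 * exp(-alpha * tox / Vgs)."""
    return k2 * w * (vgs / tox) ** 2 * math.exp(-alpha * tox / vgs)


def p_static(vdd, isub, iox):
    """Static power: Vdd * (Isub + Iox)."""
    return vdd * (isub + iox)


# Hypothetical constants, chosen only to show the trend.
params = dict(k1=1e-6, w=1.0, n=1.5, vgs=1.0)
low_vt = i_sub(vt=0.3, **params)
high_vt = i_sub(vt=0.4, **params)
reduction = low_vt / high_vt

print(f"Raising Vt by 100 mV cuts Isub by about {reduction:.1f}x in this sketch")
```

The exponential dependence on $V_t$ is what makes high-threshold device substitution and body bias effective in the list above.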

Which Design Is More Efficient?

0.7um CMOS 173MHz chip w/ 460K T’s: Vdd (typ) = 3.3V, Vdd (min) = 1.1V, Power = 845mW

0.18um CMOS 10kHz chip w/ 640K T’s: Vdd (max) = 1.8V, Vdd (min) = 0.18V, Power = 1.6mW

Talk Outline

Normalized Metric: Ebit

Effect of Architecture

ASIC vs. Custom Building Blocks

Achievable Energy Efficiency

16b 1024 FFT Example

Answer to “Which Design is More Efficient”

Defining Ebit

$E_{bit} = C_{bit} \cdot V_{dd}^2$

$C_{bit} = 4 \cdot 2\,\mathrm{fF/\mu m} \cdot W_{min}$

Energy needed to write a 1-bit SRAM cell

Approximates minimum useful capacitance

The ratio of Ebit to the energy for a range of circuits remains largely constant with technology scaling
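A minimal sketch of the Ebit definition follows. The function names and the example width and voltage are hypothetical; the slides do not give the $W_{min}$ values used for each process, so the numbers here are not meant to reproduce the talk's table.

```python
def c_bit_ff(w_min_um, cap_per_um_ff=2.0, widths=4):
    """Cbit in fF: four minimum-width devices at 2 fF per um of width."""
    return widths * cap_per_um_ff * w_min_um


def e_bit_fj(w_min_um, vdd):
    """Ebit in fJ: Cbit (fF) times Vdd^2 (V^2)."""
    return c_bit_ff(w_min_um) * vdd ** 2


# Hypothetical process: 0.25 um minimum width at a 1.0 V supply.
example_cbit = c_bit_ff(0.25)
example_ebit = e_bit_fj(0.25, 1.0)
print(f"Cbit = {example_cbit:.2f} fF, Ebit = {example_ebit:.2f} fJ")
```

Because both the capacitance and the supply voltage shrink with each process generation, expressing other circuit energies as multiples of Ebit factors most of the technology dependence out.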

Technology Scaling for Ebit

The normalized unit of distance is equal to the M1 pitch

Technology   Area (um2)   Area (normalized units squared)
0.5um        58           18
0.18um       5.7          18

Technology Scaling for Nand2

The normalized unit of distance is equal to the M1 pitch

NAND2 cell: 4 units = 2.24um; 8 units = 4.48um

[Figure: NAND2 gate with inputs A and B and output YN]

Applying Ebit

Energy               180nm            130nm           90nm            65nm
Ebit (fJ)            3.3              1.4             0.5             0.36

Relative             180nm            130nm           90nm            65nm
Ebit                 1                1               1               1
1b FO4               ~10              ~10             ~10             ~10
1b SP-SRAM           0.3-7            0.3-7           0.3-7           0.3-7
1b RF                4-20+            4-20+           4-20+           4-20+
1b DFF               20-30+           15-30+          10-30+          10-30+
1b Nand2             11-30 (typ 19)   5-30 (typ 14)   5-30 (typ 14)   5-30 (typ 14)
Move 1b 1000 units   ~100             ~100            ~100            ~100
Move 1b 1.5mm        268              367             467             714
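To make the last row of the table concrete, the sketch below converts the relative cost of moving one bit 1.5mm back into absolute energy by multiplying by the Ebit value for each node. The dictionary names are illustrative; the numbers are the ones tabulated above.

```python
# Ebit per technology node, in fJ (from the table above).
E_BIT_FJ = {"180nm": 3.3, "130nm": 1.4, "90nm": 0.5, "65nm": 0.36}

# Cost of moving 1 bit over 1.5 mm, in multiples of Ebit (from the table above).
MOVE_1B_1P5MM_REL = {"180nm": 268, "130nm": 367, "90nm": 467, "65nm": 714}

# Absolute energy in fJ = relative cost * Ebit.
move_fj = {node: MOVE_1B_1P5MM_REL[node] * E_BIT_FJ[node] for node in E_BIT_FJ}

for node, energy in move_fj.items():
    print(f"{node}: {energy:.1f} fJ to move 1 bit 1.5 mm")
```

The relative cost rises with scaling even where the absolute energy falls, which is the slide's point: a fixed physical distance becomes more expensive relative to local computation.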

Effect of Architecture

ASIC Architecture: 6x Efficiency

NVIDIA GeForceFX (Design Style: ASIC): 400MHz, 125M Transistors, ~20 Watts: 10 GFlops & 13 GB/s

Intel Pentium-4 (Design Style: Custom): 2600MHz, 55M Transistors, ~60 Watts: 5 GFlops & 5 GB/s

Custom Circuits: 9x (7x) Efficiency

NVIDIA GeForceFX (Design Style: Custom): 400MHz, 125M Transistors, ~3 Watts: 10 GFlops & 13 GB/s, Vdd = 0.65V

Intel Pentium-4 (Design Style: Custom): 2600MHz, 55M Transistors, ~60 Watts: 5 GFlops & 5 GB/s, Vdd = 1.3V

Combined Architecture and Circuits: 40x+ Improvement, but 1.5 Years vs. 3+ Years

NVIDIA GeForceFX as custom (~3 Watts, 10 GFlops, Vdd = 0.65V) vs. Intel Pentium-4 (~60 Watts, 5 GFlops, Vdd = 1.3V)
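The 6x, roughly 7x, and 40x figures on these slides can be checked with simple GFlops-per-watt arithmetic. The sketch below uses only the power and throughput numbers quoted on the slides; the helper name is hypothetical, and treating GFlops per watt as the efficiency measure is an assumption about how the headline ratios were formed.

```python
def gflops_per_watt(gflops, watts):
    """Throughput efficiency in GFlops per watt."""
    return gflops / watts


pentium4 = gflops_per_watt(5, 60)           # custom circuits, general-purpose architecture
geforce_asic = gflops_per_watt(10, 20)      # ASIC circuits, specialized architecture
geforce_custom = gflops_per_watt(10, 3)     # the slides' ~3 W custom-circuit estimate

architecture_gain = geforce_asic / pentium4      # about 6x
circuit_gain = geforce_custom / geforce_asic     # about 6.7x, i.e. the "(7x)" figure
combined_gain = geforce_custom / pentium4        # about 40x

# Halving Vdd from 1.3 V to 0.65 V alone accounts for a 4x cut in switching energy.
vdd_scaling_gain = (1.3 / 0.65) ** 2

print(f"architecture: {architecture_gain:.1f}x, circuits: {circuit_gain:.1f}x, "
      f"combined: {combined_gain:.0f}x, Vdd^2 part: {vdd_scaling_gain:.0f}x")
```

The architecture and circuit gains multiply, which is how a 6x and a roughly 7x improvement combine into the 40x headline.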

ASIC vs. Custom

ASIC Methods: provide only coarse-grain control (100K+ gates), but require much less effort and historically scale with complexity

Custom Methods: offer fine-grain control (individual transistors & gates), but require large effort and scale poorly with complexity

Exploits Design Structure

Exploits Circuit Techniques

Custom Methods Emphasize Fine-Grain Manual Control + Custom Library

Design Style   Gate Library   Floorplanning/Partitioning   Coarse Placement   Detailed Placement   Coarse Routing   Detailed Routing
Custom         Complex        Manual                       Manual             Manual               Manual           Manual
               Specific
ASIC           Simple         Manual/Automated             Automated          Automated            Automated        Automated
               Generic        Automated w/ Hints

Operation and Performance Characterized for the Specific Case

ASIC Methods Substitute Coarse-Grain Control Automation + Generic Library

Operation and Performance Characterized for the Typical/Generic Case

ASIC Focus on 100K+ Gates: Lost Opportunities to Exploit Structure

Designs reuse similar basic building blocks

Building blocks: 1-10K gates, not 100K+ gates

64-bit adder: 1K gates
64x64 RF: 2K gates
64x64 multiplier: 20K gates

Opportunities to exploit these structures are lost when the design is viewed in large chunks

Different Architectures, Similar Building Blocks

[Figure: floorplans of the 1998 “MAP” 64b Microprocessor, 5M T’s (MIT/Stanford) and the 2002 “Imagine” 32b Stream Processor, 22M T’s (Stanford). Legend: EX, RF, SRAM, XCVRS, Bus, LC. MAP labels: Bank 0, Bank 1, CLST 0, CLST 1, CLST 2, NIF/ROUTER, MEMORY SWITCH, CLUSTER SWITCH, EMI, LTLB. Imagine labels: Cluster0 through Cluster7, Microcontroller.]

Significant Structure Exists Within 100K-gates

[Figure: floorplans of the 1998 “MAP” 64b Microprocessor, 5M T’s (MIT/Stanford) and the 2002 “Imagine” 32b Stream Processor, 22M T’s (Stanford), labeled with EX, RF, SRAM, XCVRS, Bus, and LC blocks]

Energy of 100K-gate Equivalent

ASIC (N2) = 1400K Ebits (typ)

Custom Logic = 424K Ebits*

SRAM (small) = 1085K Ebits

SRAM (med) = 155K Ebits

SRAM (large) = 50K Ebits

*Based on data extracted from Intel McKinley

Exploiting Circuit Techniques

Custom circuits more efficient: reduced parasitics; 1.7x from circuit techniques and flops; 1.4x from libraries; 1.4x due to engineering interconnects

Subthreshold Circuits: low performance but ultra-low power; requires Architecture, Gates, Memories, CAD Tools
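One way to read the three factors above is that they compound. The sketch below multiplies them and compares the result with the ASIC-to-custom ratio implied by the "Energy of 100K-gate Equivalent" slide (1400K vs. 424K Ebits). Treating the factors as independent and multiplicative is an assumption made here for illustration, not something the slides state.

```python
import math

# Custom-vs-ASIC factors quoted on the slide.
CUSTOM_FACTORS = {
    "circuit techniques and flops": 1.7,
    "libraries": 1.4,
    "engineered interconnects": 1.4,
}

# Assumption: the factors are independent and multiply.
compounded = math.prod(CUSTOM_FACTORS.values())

# Ratio implied by the 100K-gate energy figures: ASIC (N2) vs. custom logic.
measured_ratio = 1400 / 424

print(f"compounded factors: {compounded:.2f}x")
print(f"ASIC / custom logic energy: {measured_ratio:.2f}x")
```

Under that assumption the two numbers land close to each other, near 3.3x, consistent with the roughly 3x custom advantage the talk reports.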

Relating Power to Performance: CV/I, Idsat, tFO4

$I_{dsat} = K_3 L_{eff}^{-0.5} t_{ox}^{-0.8} (V_{gs} - V_t)^{1.25}$

$t_{FO4} = K_4 [C_{eff} V_{dd} / I_{dsat}]$ ($K_4 \approx 13.5$)

Relating Power to Performance: Relating Vdd and Vt to tFO4
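Combining the two relations, $t_{FO4} = K_4 C_{eff} V_{dd} / I_{dsat}$ with $I_{dsat} \propto (V_{gs} - V_t)^{1.25}$, gives a delay that scales as $V_{dd} / (V_{dd} - V_t)^{1.25}$ when everything else is held fixed. The sketch below evaluates that relative trend. It assumes $V_{gs} = V_{dd}$ and constant $K_3$, $K_4$, $C_{eff}$, $L_{eff}$, and $t_{ox}$; the threshold voltage used is a hypothetical value.

```python
def relative_tfo4(vdd, vt, exponent=1.25):
    """Relative FO4 delay, proportional to Vdd / (Vdd - Vt)^exponent.

    Assumes Vgs = Vdd and all process constants held fixed, so only
    ratios between two calls are meaningful.
    """
    if vdd <= vt:
        raise ValueError("model only applies above threshold (Vdd > Vt)")
    return vdd / (vdd - vt) ** exponent


VT = 0.3  # hypothetical threshold voltage, in volts

nominal = relative_tfo4(1.3, VT)
scaled = relative_tfo4(0.65, VT)
slowdown = scaled / nominal

print(f"Halving Vdd from 1.3 V to 0.65 V slows an FO4 by about {slowdown:.2f}x in this model")
```

The same halving of $V_{dd}$ cuts switching energy by 4x, so the model captures the trade between performance and power that the talk describes.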

Relating Power to Performance: Correlation to Reported Foundry Data

Technology   Node     CV/I est (ps)   CV/I reported (ps)   tFO4 est (ps)
Foundry A    180-nm   3.94            3.70                 53
Foundry A    130-nm   2.55            2.17                 34
Foundry A    90-nm    1.85            2.04                 25
Foundry A    65-nm    1.45            1.00                 20
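The tFO4 estimates in the table follow from the estimated CV/I values and the constant $K_4 \approx 13.5$. A short check, using only the tabulated numbers (the variable names are illustrative):

```python
K4 = 13.5  # FO4 delay is roughly 13.5 times CV/I, per the slides

# Estimated CV/I in ps for Foundry A, by node (from the table above).
CV_OVER_I_EST_PS = {"180-nm": 3.94, "130-nm": 2.55, "90-nm": 1.85, "65-nm": 1.45}

tfo4_est_ps = {node: K4 * cvi for node, cvi in CV_OVER_I_EST_PS.items()}

for node, tfo4 in tfo4_est_ps.items():
    print(f"{node}: tFO4 is about {tfo4:.1f} ps")
```

Rounded to the nearest picosecond, these reproduce the 53, 34, 25, and 20 ps entries in the last column.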

Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

Technique                    Type      Custom vs. ASIC   Energy   Type
Circuit Styles and Flops     Dynamic   1.7               0.815    Logic
Libraries + Vdd Scaling      Dynamic   1.4               0.855    Logic
SRAM Circuits                Dynamic   2                 0.95     SRAM
Interconnect + Vdd Scaling   Dynamic   1.4               0.855    Interconnect

Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

Technique               Type      Custom vs. ASIC   Energy   Type
Bit Encoding            Dynamic   1                 0.84     Interconnect
Clock Gating            Dynamic   1                 0.84     Chip
Frequency Scaling       Dynamic   1                 0.5      Chip
Subthreshold Circuits   Dynamic   N/A               0.062    Chip

Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

Technique                         Type     Custom vs. ASIC   Energy   Type
Vdd Scaling                       Static   1                 0.79     Chip
MT-CMOS                           Static   1                 0.5      Chip
Stacking and input state vector   Static   1.4               0.7      Chip
Body Bias                         Static   2                 0.5      Chip
Supply Gating                     Static   10                0.1      Chip

Typically only one of the last three (stacking and input state vector, body bias, supply gating) is applied

Achievable Power Improvement (Assuming 50/50 Split of Logic and Memory)

Type          130-nm ASIC (Custom)   90-nm ASIC (Custom)
Net Dynamic   45% (32%)              28% (20%)
Net Static    8% (4%)                20% (10%)
Total         53% (36%)              48% (30%)

130nm uP assumes 80% Dynamic and 20% Static

90nm uP assumes 50% Dynamic and 50% Static
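The totals in the table are simply the sum of the net dynamic and net static rows. The sketch below recomputes them and derives the ASIC-to-custom ratio for each node. The percentages are taken as tabulated; the ratio is an added illustration and is not a figure from the slides.

```python
# (net dynamic %, net static %) per node, as tabulated above.
ASIC = {"130-nm": (45, 8), "90-nm": (28, 20)}
CUSTOM = {"130-nm": (32, 4), "90-nm": (20, 10)}

asic_total = {node: sum(parts) for node, parts in ASIC.items()}
custom_total = {node: sum(parts) for node, parts in CUSTOM.items()}

# Illustrative ratio of the two totals at each node.
asic_over_custom = {node: asic_total[node] / custom_total[node] for node in ASIC}

for node in ASIC:
    print(f"{node}: ASIC {asic_total[node]}%, custom {custom_total[node]}%, "
          f"ratio {asic_over_custom[node]:.2f}")
```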

16b 1024 point FFT

Generally, k N log N operations (complex multiplies) with pre-computation

Radix-2, Radix-4 etc… implementations

Decimation in time and/or decimation in Frequency
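For the 1024-point case, the k N log N estimate can be made concrete. The sketch below evaluates N log2 N and the butterfly count of a plain radix-2 FFT, which performs N/2 butterflies per stage over log2 N stages with one twiddle-factor complex multiply each. The function names are hypothetical, and the choice of k depends on the radix and on how trivial multiplies are counted.

```python
import math


def fft_op_estimate(n, k=1.0):
    """Operation estimate k * N * log2(N)."""
    return k * n * math.log2(n)


def radix2_butterflies(n):
    """Butterflies in a radix-2 FFT: (N / 2) per stage over log2(N) stages."""
    if n < 2 or n & (n - 1):
        raise ValueError("radix-2 FFT length must be a power of two")
    stages = n.bit_length() - 1
    return (n // 2) * stages


N = 1024
n_log_n = fft_op_estimate(N)
butterflies = radix2_butterflies(N)

print(f"N log2 N = {n_log_n:.0f}")
print(f"radix-2 butterflies (one complex multiply each) = {butterflies}")
```

For radix-2 this corresponds to k = 1/2 when counting complex multiplies; higher radices reduce the count further.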

Range of Implementations

MIT FFT (2005): 0.18um CMOS, 628K T’s, 10kHz: Architecture and subthreshold circuits, 180mV operation

Spiffee (1999): 0.7um CMOS, 460K T’s, 173MHz: Cached FFT Architecture and algorithm, 1.1V operation

SA-1100 (1999): 0.35um CMOS, 2.6M T’s, 74MHz: Commercial embedded processor, Custom Circuits, 1.5V operation

Imagine (2003): 0.15um CMOS, 22M T’s, 232MHz: Streaming Media Processor, tiled standard cells, 1.2V operation

Stratix IS25F627C8 (2005): 0.13um CMOS, 3.9K logic elements, 123K memory bits, 24 DSP blocks, 272MHz: Commercial FPGA Co-processor

Intel P4 (2003): 0.13um CMOS, 3GHz, SSE: Commercial General Purpose Processor, Custom Circuits, 1.5V operation

TI ‘C6416 (2003): 0.13um CMOS, 720MHz: Commercial Digital Signal Processor

Ebit Energy 16b 1024 point FFT

Design      Fab (nm)   Vdd (V)   MHz    mW      Cycles
MIT FFT     180        1.8       0.01   1.6     95
Spiffee     700        3.3       173    845     5190
SA-1100     350        2         74     39      31500
Imagine     150        1.5       232    4000    3708
Stratix     130        1.3       275    884     1291
Intel P4    130        1.2       3000   51200   71680
TI 'C6416   130        1.2       720    1200    6526
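The energy per FFT can be estimated from this table as power times execution time, that is, mW times cycles divided by MHz, which comes out directly in nJ. The sketch below applies that to the rows where the calculation matches the Efft values reported in the talk. The MIT FFT row is left out because its reported 154 nJ does not follow from the tabulated power, clock, and cycle count, presumably because it was measured at a different operating point (an assumption).

```python
def fft_energy_nj(power_mw, freq_mhz, cycles):
    """Energy per FFT in nJ: mW * (cycles / MHz) = mW * us = nJ."""
    return power_mw * cycles / freq_mhz


# (power in mW, clock in MHz, cycles per FFT), from the table above.
DESIGNS = {
    "Spiffee": (845, 173, 5190),
    "SA-1100": (39, 74, 31500),
    "Imagine": (4000, 232, 3708),
    "Stratix": (884, 275, 1291),
    "Intel P4": (51200, 3000, 71680),
    "TI 'C6416": (1200, 720, 6526),
}

efft_nj = {name: fft_energy_nj(*row) for name, row in DESIGNS.items()}

for name, energy in efft_nj.items():
    print(f"{name}: about {energy:,.0f} nJ per FFT")
```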

Ebit Energy 16b 1024 point FFT

Design      EDP (rel norm)   Ebit (fJ)   Efft (nJ)   Normalized to Ebit (1e6)   Energy Ratio
MIT FFT     143              3.3         154         47                         1
Spiffee     1                91          25350       277                        6
SA-1100     283              4.2         16601       3953                       85
Imagine     148              2.2         63931       29726                      637
Stratix     24               1.4         4149        2964                       64
Intel P4    12548            1.4         1E+06       873813                     18591
TI 'C6416   27               1.4         10877       7769                       166
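The "Normalized to Ebit" column is the energy per FFT divided by the Ebit of the design's process, and the energy ratio compares each design against the MIT FFT. Since nJ divided by fJ is 1e6, the quotient is already in millions of Ebit. The sketch below reproduces a few rows from the tabulated values. Rows such as Spiffee and Imagine differ slightly when recomputed this way, presumably because the table's Ebit column is rounded (an assumption).

```python
def normalized_to_ebit_millions(efft_nj, ebit_fj):
    """Energy per FFT in millions of Ebit: nJ / fJ = 1e6."""
    return efft_nj / ebit_fj


# (Efft in nJ, Ebit in fJ), from the table above.
ROWS = {
    "MIT FFT": (154, 3.3),
    "SA-1100": (16601, 4.2),
    "Stratix": (4149, 1.4),
    "TI 'C6416": (10877, 1.4),
}

normalized = {name: normalized_to_ebit_millions(*row) for name, row in ROWS.items()}
energy_ratio = {name: value / normalized["MIT FFT"] for name, value in normalized.items()}

for name in ROWS:
    print(f"{name}: {normalized[name]:,.0f} M Ebit, {energy_ratio[name]:.0f}x the MIT FFT")
```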

Which Design Is More Efficient? Depends on the Metric!

0.7um CMOS 173MHz chip w/ 460K T’s: Vdd (typ) = 3.3V, Vdd (min) = 1.1V, Power = 845mW, EDP 143x better

0.18um CMOS 10kHz chip w/ 640K T’s: Vdd (max) = 1.8V, Vdd (min) = 0.18V, Power = 1.6mW, Absolute energy 6x better

Summary

A normalized metric, Ebit, enables meaningful comparisons across designs and technologies

Custom designers can exploit a wide range of optimizations: enabling architecture with circuits and circuits with architecture

Custom designs can readily achieve a 3x advantage in energy, with the potential for over 10x

Selective application of custom techniques, and automated support for performance characterization at specific instead of generic operating points, can enable ASIC designers to begin to bridge this power gap

Back-Up Slides

ASICs Rely on General Optimization Techniques

Focus: Improve the Average Case

Partitioning: hypergraph, min-cut, ratio cut

Solutions: move-based, geometric & combinatorial forms, clustering

Hypergraph H(V,E), E = { e1, e2, ... } nets

Vertex & Edge weights used to encode costs

[Figure: a circuit with nets e1 through e8 and its hypergraph over vertices V1 through V5]
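The min-cut objective mentioned above counts the nets that span more than one partition, optionally weighted. The sketch below evaluates that cost for two candidate bipartitions. The netlist is a hypothetical example, since the connectivity of the slide's figure is not recoverable from the transcript, and the function only scores a given partition; it does not search for the best one.

```python
def cut_size(nets, part_of, weights=None):
    """Weighted number of nets (hyperedges) whose pins fall in more than one partition."""
    weights = weights or {}
    total = 0
    for name, pins in nets.items():
        partitions_touched = {part_of[v] for v in pins}
        if len(partitions_touched) > 1:
            total += weights.get(name, 1)
    return total


# Hypothetical netlist: each net connects two or more cells.
nets = {
    "e1": ("V1", "V2"),
    "e2": ("V1", "V3", "V4"),
    "e3": ("V2", "V5"),
    "e4": ("V3", "V4"),
    "e5": ("V4", "V5"),
}

# Two candidate bipartitions (cell -> partition index).
partition_a = {"V1": 0, "V2": 0, "V3": 1, "V4": 1, "V5": 1}
partition_b = {"V1": 0, "V2": 1, "V3": 0, "V4": 1, "V5": 1}

cut_a = cut_size(nets, partition_a)
cut_b = cut_size(nets, partition_b)
cut_a_weighted = cut_size(nets, partition_a, weights={"e2": 3})

print(f"cut A = {cut_a}, cut B = {cut_b}, cut A with e2 weighted 3 = {cut_a_weighted}")
```

A general-purpose partitioner minimizes this kind of cost without knowing that, for example, a datapath has a regular bit-slice structure, which is the talk's argument for why such tools optimize the average case.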

Designs with Structure Do Not Exhibit Average Characteristics

64b Multiplier (half-array): Clear Disparity in Resource Usage

[Figure: routing density of the 64b multiplier (half-array)]