computer architecture: the road ahead clocks are not primarily the result of technology scaling, ......

45
Computer Architecture & Arithmetic Group 1 Stanford University Architecture & Technology: some thoughts on the road ahead M. J. Flynn University of Texas Austin March 17, 2003

Upload: vuongcong

Post on 11-Apr-2018

222 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 1 Stanford University

Architecture & Technology: some thoughts on the road ahead

M. J. Flynn

University of TexasAustin

March 17, 2003

Page 2: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 2 Stanford University

Outline• The road ahead • System, not processor, emphasis• Time: Performance and its limits• Power• Area• Beyond Time x Power x Area: computational integrity, design time and wireless interconnections• The shape of the “new” system

Page 3: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 3 Stanford University

The present and the future of the technology

Projections from the Semiconductor Industry Association

Page 4: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 4 Stanford University

SIA Roadmap - 1Total Transistors per Chip

010002000300040005000

130nm 90nm 65nm 32nm

2002 2004 2007 2013

Technology / Year of First Shipment

No.

of T

rans

isto

rs

(in m

illio

n)

Page 5: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 5 Stanford University

SIA Roadmap - 2On-Chip Clocks, global and local clock rates

0

5000

10000

15000

20000

25000

130nm 90nm 65nm 32nm

2003 2004 2007 2013

Technology / Year of First Shipment

Freq

uenc

y (in

MHz

)

Page 6: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 6 Stanford University

Semiconductor Industry Roadmap

Year 2001 2004 2007 2013Technology generation (nm) 130 90 65 32Wafer size (mm) 300 300 300 450Defect density (per m2) 1356 1356 1356 1116µP die size (mm2) 310 310 310 310Chip Frequency (MHz) 1767 4000 6700 19350MTx per Chip (Microprocessor) 276 552 1204 4424MaxPwr(W) High Performance 130 160 190 251

Semiconductor Technology Roadmap (2001)

Page 7: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 7 Stanford University

Limits on Scaling(at 10-30 nm)

Any effect such as:• Thermal noise effects at low voltages (VDD)• Dielectric material breakdown• Quantum effects

… may limit scaling Beyond scalable planar technology lie 3D technologies.

Page 8: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 8 Stanford University

What this means to Computer Architecture

Tradeoffs in time (performance), power and area (cost). (TPA product)

Page 9: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 9 Stanford University

Message: System not Processor Architecture

• Power not clock rate• System not processor integration• Integration of what?

– Wireless and optical interconnections– Voice and sensor I/O– Crypto, Video, Audio, etc.– Multi user displays, keyboards, sensors, etc.

Page 10: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 10 Stanford University

Performance: Time, Area and Power Tradeoffs

Page 11: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 11 Stanford University

Performance lesson: wires not gates determine delay.

Del

ay (p

s)

0

100

200

300

400

0.7 0.5 0.35 0.25 0.18 0.13

Process Generation (um)

Gate Delay

Interconnect delay forstandard functional unit

Page 12: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 12 Stanford University

Wires and coupling capacitance

Page 13: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 13 Stanford University

High Speed Clocking

Fast clocks are not primarily the result of technology scaling, but rather of circuit/logic techniques:

– Dynamic logic, Wave pipelining type

• Modern microprocessors are increasing clock speed more rapidly than anyone predicted…local clocks (SIA) or hyperclocking.

• But fast clocks do not necessarily increase system performance

Page 14: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 14 Stanford University

Page 15: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 15 Stanford University

CMOS

1

10

100

1000

10000

67 73 83 93 '03

year

freq

uenc

y

CMOS

Page 16: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 16 Stanford University

Bipolar and CMOS

1

10

100

1000

10000

67 73 83 93 '03

year

freq

uenc

y

BipolarCMOSLimit

Bipolar power limit

Page 17: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 17 Stanford University

SIA global clock rates

100

1000

10000

100000

'98 '01 '04 '07 '13

year

freq

uenc

y

global'01SIA '97

Page 18: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 18 Stanford University

SIA local & global clock rates

100

1000

10000

100000

'98 '01 '04 '07 '13

year

freq

uenc

y

globallocal

hyper frequency

Page 19: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 19 Stanford University

Lower clock overhead enables more pipe segments & higher freq.

• Let the total instruction execution without pipelining and associated clock overhead be T

• Let S be the number of segments and then S - k is number of cycles lost due to a pipeline break (S >k)

• Let b = prob of break/instr., C = clock o’head incl skewT

T/S C CT+CS

S-kcycles

b

Page 20: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 20 Stanford University

Performance, issue rate and frequencyperformance is largely determined by m and bf(s-k)

1+ mbf(s-k)1

T/s+C+mo1Perf= m *

d Perf d Perf= 0 = 0d s d m Optimum S (and max freq.)

is largely determined by C

Sopt= f(k) T Similarly optimum mis largely determined by

obC

Page 21: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 21 Stanford University

Architecturally, what’s happened to T, b and C?

• T has increased, probably 25-50%, due to added processor complexity

• b has decreased significantly, probably from .25 to closer to .05, due to better compilers, prediction and cache management.

• C has at least halved due to better clocking.• So Sopt has increased by more than 2x

decreasing the number of F04s/cycle by ½

Page 22: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 22 Stanford University

Optimum Pipelining and ILP

Optimum pipe segment size is determined by clock overhead, C

Optimum issue rate, m, is determined by issue overhead, o

Page 23: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 23 Stanford University

Performance vs ILP

Page 24: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 24 Stanford University

Performance and the memory “wall”

• BUT regardless of clock rate, performance is limited by the predictability & supply of instructions and data operands from memory and cache.

• The access time to memory depends on wire delay which remains constant with scaling.

• Even with significant hardware support (area) unpredicted events (missed branches, etc) force the processor to access memory. At some point limiting the performance.

Page 25: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 25 Stanford University

Power: and the curse of frequency

While Vdd and C decrease, frequency increases at an acceleratingrate thereby increasing power density

As Vdd decreases so does Vth;this increases I leakage and static power

Net: while increasing frequency may not increaseperformance by much it certainly does increase power

Page 26: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 26 Stanford University

Power: the new frontier

• Cooled high power: >100w/ die• High power: 30-100w/ die … plug in supply• Low power: 0.1-2w / die.. rechargeable

battery• Very low power: 1-100mw /die ..AA size

batteries• Extremely low power: 1-100 microwatt/die

and below (nano watts) .. button batteries

Page 27: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 27 Stanford University

Battery Energy & use

1uw5 years (always on)

40mAhbutton

1-10 mw½ year (10-20% duty)

4000 mAh2xAA

400mw-4w

50 hours (10-20% duty)

10,000 mAh

recharageable

powertimeenergy capacity

type

Page 28: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 28 Stanford University

Power is important!freq2

freq1

=P2

P1

3By Vdd and device scaling

• By scaling alone a 4x slower implementation may need only 1/64 as much power.

• Gating power to functional units and other techniques should enable 100MHz processors to operate at O(10-3) watts.

• Goal: O(10-6) watts.

Page 29: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 29 Stanford University

Extremely Low PowerSystem: suppose low power was the performance

metric, rather than cycle time.Very Low power (µW) can be achieved:

– Perhaps sub thresh hold circuits– Clock rate may be limited to 10 MHz (?)– Achieve 1 to 5 year battery life– Lots of area, BUT how to get performance

• Very efficient architecture and segmented powering.• Very efficient software & operating system• Very efficient arithmetic & signal processing• Very effective parallel processing

Page 30: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 30 Stanford University

Area: scaling silicon

Wafer size + die size = costFeature size + die size = performance

Page 31: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 31 Stanford University

The basics of wafer and die size

• Wafers, 30 cm or so in diameter produce (πd2/4)/A dice. Die area, A, is 0.5- 2cm2

about 700 1cm2 dice per wafer.

• Yield (% of good die) is e -ρ A where ρ is the defect density, @ ρA=1; Yield is 37%ρA =.2; Yield is 80%

Page 32: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 32 Stanford University

The basics of wafer fab

• A 30 cm, state of the art (ρ = 0.2-.5) wafer fab facility might cost $3B and require $5B pa to be profitable…that’s at least 5M wafer starts and almost 5B 1cm die /per year. At O($1000) per wafer that’s $1/die with, say a die (f=0.1u) has 100-500 M tx /cm2.

• So how to use them?

Page 33: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 33 Stanford University

Area and Cost

Is efficient use of die area important? Is processor cost important?NO − server processors (cost dominatedby memory, power/cooling, etc.)YES – “client”, everything in the middle.NO – small, embedded die which are

package limited. (10-100Mtx/die)

Page 34: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 34 Stanford University

Area

Indeed, looking ahead how to use 1-5 billion transistors/cm2 die?

– With performance limited by power, memory & program behavior servers use multiple processors per die, limited by memory.

– For client systems either “system on a chip” or hybrid system are attractive.

– Low cost embedded systems need imaginative packaging and enhanced function.

Page 35: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 35 Stanford University

Hybrid System

• Core, very low power units comprise limited storage, limited processing, cryptoand wireless: “watch” type package

• Multi user displays, keyboards, voice entry and network access can be implemented with higher power.

• Wireless, secure interconnects complete the system, but with major power management implications.

Page 36: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 36 Stanford University

Beyond T x A x P …other dimensions

• Computational integrity and RAS• Design time and FPL• Configure ability / system connections and

wireless technology.

Page 37: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 37 Stanford University

Computational Integrity

Design for– Reliability– Testability– Serviceability– Process recoverability– Fail-safe computation

Page 38: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 38 Stanford University

CAD productivity limitations favor FPL

1

Logi

c Tr

ansi

stor

s pe

r Chi

p(K

)

Prod

uctiv

ityTr

ans.

/Sta

ff - M

onth

10

100

1,000

10,000

100,000

1,000,000

10,000,000

10,000

100,000

1,000,000

Logic Transistors/ChipTransistor/Staff Month

58%/Yr. compoundComplexity growth rate

21%/Yr. compoundProductivity growth rate

Source: S. Malik, orig. SEMATECH

1981

1983

1985

1987

1989

1991

1993

1995

1997

1999

2003

2001

2005

2007

2009

xx xx x

x

x

2.5µ

.10µ

.35µ

100,000,000

10,000,000

1,000

100

10

Page 39: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 39 Stanford University

Wireless technology

• Essential to realizing efficient client and package limited systems

• Can be power consumptive, requires very efficient channel management.– Focused, adaptive radiation patterns– Digital management of RF– Probably better suited to MHz rather than GHz

RF… in conflict with antenna goals– Many issues: security, multi signal formats, etc.

Page 40: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 40 Stanford University

The “New” Server SystemServer Processor

– Cost insensitive – High power, fast cycle time– Multiple (order of 10’s to 100) processors on

single die, together with portion of memory space– MIMD extensions via network– Base processor using ILP with SIMD extensions– Very high integrity

Page 41: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 41 Stanford University

Client System• Invisible core... integrated into

– Appliances– Communications devices– Notebooks – “Watches”, wearable cores

• Always secure and available• Very low power• Integrated wireless/networked

Page 42: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 42 Stanford University

The “New” Client SystemClient

– “System on a chip,” integrated, hybrid or embedded

– Many frequency- power design points from 100 µw with cycle time 2 ns to 1 µw with cycle of 200 ns

– Small die low cost– Several base processors

• Core and signal processors– High integrity– Signal processors VLIW/SIMD

Page 43: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 43 Stanford University

System Design: The CAD Challenge

• IP’s – not IC’s. Amalgamate existing designs from multiple sources– Validate, test, scale, implement

• Buses and Wires– Placement and coupling, wire length,

delay modules… some role for optics?

• Effective floorplanning

Page 44: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 44 Stanford University

System Design: Other Challenges

• Really good power management• Low power wireless interconnections• Integrity – Security• Scene/ pattern recognition• Software efficiency: Op Systems, compilers• Software productivity: rules based, analysis,

heuristic, etc.

Page 45: Computer Architecture: the road ahead clocks are not primarily the result of technology scaling, ... the new frontier • Cooled high ... Computer Architecture & Arithmetic Group 35

Computer Architecture & Arithmetic Group 45 Stanford University

Summary

• Processor design with deep sub-micron (f < 90nm) technology offers major advantages: 10x speed or 10-6 power and 100x circuit density. Indeed, SIA projections have consistently underestimated the future.

• But the real challenge is in system not processor design: new system products and concepts, ip management and design tools.