Page 1: Computing Without Processors Thesis Proposal

Computing Without Processors

Thesis Proposal

Mihai Budiu

July 30, 2001

This presentation uses TeXPoint by George Necula

Thesis Committee:

Seth Goldstein, chair

Todd Mowry

Peter Lee

Babak Falsafi, ECE

Nevin Heintze, Agere Systems

Page 2: Computing Without Processors Thesis Proposal

2

Four Types of Research

• Solve nonexistent problems

• Solve past problems

• Solve current problems

• Solve future problems

Page 3: Computing Without Processors Thesis Proposal

3

The Law

(source: Intel)

Page 4: Computing Without Processors Thesis Proposal

4

The Crossover Phenomenon

[Figure: plot against time; curves labeled "technology".]

Page 5: Computing Without Processors Thesis Proposal

5

Example Crossover

[Figure: access speed (ns) over time for DRAM and CPU. Labels: 1980, 200, "no caches", "caches".]

Page 6: Computing Without Processors Thesis Proposal

Trouble Ahead for Microarchitecture

Page 7: Computing Without Processors Thesis Proposal

7

Signal Propagation

[Figure: die size and the distance a signal covers in 1 clock, in mm, over time. Labels: "now", 20.]

Page 8: Computing Without Processors Thesis Proposal

8

Reliability & Yield

[Figure: defects/chip over time, with curves for tolerable and occurring defects. Labels: "new process", "now".]

Page 9: Computing Without Processors Thesis Proposal

9

Energy

[Figure: power over time, with curves for CPU consumption and thermal dissipation. Labels: 100 W, "now".]

Page 10: Computing Without Processors Thesis Proposal

10

Instruction-Level Parallelism (ILP)

[Figure: instructions over time, with curves for fetch and commit. Label: "now".]

Page 11: Computing Without Processors Thesis Proposal

11

Premises of this Research

• We will have lots of gates
  – Moore’s law continues
  – Nanotechnology

• Contemporary architectures do not scale

Page 12: Computing Without Processors Thesis Proposal

12

Outline

• Motivation

• ASH: Application-Specific Hardware

• The spatial model of computation

• CASH: Compiling for ASH

• Evolutionary path

• Conclusions

• Future work

Page 13: Computing Without Processors Thesis Proposal

13

ASH Application-Specific Hardware

Reconfigurable hardware

HLL program

Compiler

Circuit

Page 14: Computing Without Processors Thesis Proposal

14

ASH: A Scalable Architecture

-- Thesis Statement --

Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture.

We can provide scalable compilers for translating high-level languages into hardware.

Page 15: Computing Without Processors Thesis Proposal

15

Example

int f(void)
{
    int i = 0, j = 0;
    for (; i < 10; i++)
        j += i;
    return j;
}

Page 16: Computing Without Processors Thesis Proposal

16

Outline

• Motivation

• ASH: Application-Specific Hardware

• The spatial model of computation

• CASH: Compiling for ASH

• Evolutionary path

• Conclusions

• Future work

Page 17: Computing Without Processors Thesis Proposal

17

ASH and Nanotechnology

• Build reconfigurable hardware using nanotechnology

• Low power: 10^10 gates use less than 2 W
• Low cost: nanocents/gate
• High density: 10^5x over CMOS

Huge structures

[Figure: Nano-RAM cell. In yellow: a CMOS RAM cell.]

Page 18: Computing Without Processors Thesis Proposal

18

A Limit Study of Performance

A graph of the whole program execution:

[Figure legend: memory word, basic block, memory write, memory read, control-flow transfer.]

Page 19: Computing Without Processors Thesis Proposal

19

Typical Program Graph (g721_e)

Control flow transfer

100% memory cluster

Memory reads

100% code cluster

memcpy

Page 20: Computing Without Processors Thesis Proposal

20

Program Graph After Inlining memcpy

memcpy

Page 21: Computing Without Processors Thesis Proposal

21

Application Slowdown

[Figure: bar chart of application slowdown, in times slower than native, on a scale from -1 to 11. Series: 1 clock/square and 5 clocks/square.]

Page 22: Computing Without Processors Thesis Proposal

22

How Time Is Spent

[Figure: bar chart of percent of time (0% to 100%) per benchmark. Benchmarks: 099.go, 129.compress, 130.li, 132.ijpeg, adpcm_d, adpcm_e, epic_e, g721_Q_d, g721_Q_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mpeg2_d. Categories: idle, execution, control flow, register traffic.]

No caches: reads expensive

No speculation

Page 23: Computing Without Processors Thesis Proposal

23

Lesson

The spatial model of computation has different properties.

Page 24: Computing Without Processors Thesis Proposal

24

Outline

• Motivation

• ASH: Application-Specific Hardware

• The spatial model of computation

• CASH: Compiling for ASH

• Evolutionary path

• Future work

Page 25: Computing Without Processors Thesis Proposal

25

CASH: Compiling for ASH

Memory partitioning

Interconnection net

Program to circuits

Page 26: Computing Without Processors Thesis Proposal

26

Compilation

1. Program

unsigned reverse(unsigned x)
{
    unsigned k, r = 0;
    for (k = 0; k < 32; k++) {
        r = (r << 1) | (x & 1);   /* append the low bit of x to r */
        x = x >> 1;
    }
    return r;
}

2. Split-phase Abstract Machines

3. Configurations placed independently

4. Placement on chip

Unknown latency ops.

Computations & local storage

Reliability

Page 27: Computing Without Processors Thesis Proposal

27

Split-phase Abstract Machines

SAM 1

SAM 2

SAM 3

CFG

Power

Page 28: Computing Without Processors Thesis Proposal

28

Hyperblock => SAM

• Single-entry, multiple exit

• May contain loops

Page 29: Computing Without Processors Thesis Proposal

29

SAM => FSM

Start Loop

Exit

Exit

Remote memory

Local memory

Page 30: Computing Without Processors Thesis Proposal

30

Implementing SAMs

- interesting details -

Page 31: Computing Without Processors Thesis Proposal

31

The SAM FSM

Computation

Predicates (control)

Combinational logic

start exit

Register

args results

Page 32: Computing Without Processors Thesis Proposal

32

Computation = Dataflow

• Variables => wires + tokens
• No token store; no token matching
• Local communication only

Programs

x = a & 7;
...
y = x >> 2;

Circuits

[Figure: dataflow circuit. An & node with inputs a and 7 produces x; x and the constant 2 feed a >> node.]

Signals
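The mapping from variables to wires carrying tokens can be sketched as a small software model. This is only an illustration of the idea on this slide, not the circuit CASH generates; the `wire` type and the `constant`, `fire_and`, `fire_shr` and `circuit` helpers are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A wire carries a value plus a token saying the value has been produced. */
typedef struct {
    uint32_t data;
    bool     valid;   /* the token: true once the producer has fired */
} wire;

/* A constant is a wire whose token is always present. */
static wire constant(uint32_t v)
{
    wire w = { v, true };
    return w;
}

/* An operator fires only when all of its input tokens are present.
 * Each wire has exactly one producer, so no token store or matching
 * is needed: the consumer just looks at its own input wires. */
static wire fire_and(wire a, wire b)
{
    wire out = { 0, false };
    if (a.valid && b.valid) {
        out.data = a.data & b.data;
        out.valid = true;
    }
    return out;
}

static wire fire_shr(wire a, wire b)
{
    wire out = { 0, false };
    if (a.valid && b.valid) {
        assert(b.data < 32);          /* shift amount must fit the word */
        out.data = a.data >> b.data;
        out.valid = true;
    }
    return out;
}

/* x = a & 7; y = x >> 2; as two operators joined by the wire x. */
static wire circuit(wire a)
{
    wire x = fire_and(a, constant(7));
    return fire_shr(x, constant(2));
}
```

In this model, feeding `circuit` a wire without a token leaves the output without a token as well, which is how a consumer knows it must wait.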

Page 33: Computing Without Processors Thesis Proposal

33

Tokens & Synchronization

• Tokens signal operation completion
• Possible implementations:
  – Local: data, valid, ack
  – Global: data, valid, reset
  – Static: data, valid
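A minimal sketch of the "Local" variant (data, valid, ack), written as a software model. The `channel` type and the `send` and `receive` functions are hypothetical, and the model is simplified: the acknowledgement releases the wire in the same step, whereas a real handshake would take several signal transitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  data;
    bool valid;  /* producer to consumer: data holds a fresh token */
    bool ack;    /* consumer to producer: the last token was consumed */
} channel;

/* Producer side: a new token may be driven only when the previous
 * one is no longer pending. Returns true if the token was sent. */
static bool send(channel *c, int value)
{
    assert(c != NULL);
    if (c->valid)
        return false;        /* previous token not yet acknowledged */
    c->data = value;
    c->valid = true;
    c->ack = false;
    return true;
}

/* Consumer side: take the token if one is present and acknowledge it. */
static bool receive(channel *c, int *out)
{
    assert(c != NULL && out != NULL);
    if (!c->valid)
        return false;        /* nothing to consume yet */
    *out = c->data;
    c->ack = true;
    c->valid = false;        /* simplified: ack frees the wire at once */
    return true;
}
```

The Global and Static variants listed above are not modeled here.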

Page 34: Computing Without Processors Thesis Proposal

34

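Speculation and Eager Muxes

if (x > 0)
    y = -x;
else
    y = b*x;

[Figure: dataflow circuit with regions labeled Computation and Predicates. Nodes: -, *, >, !; inputs x, b, 0; output y. The multiply is marked "slow".]

Static-Single Assignment implemented in hardware

ILP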
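The benefit of an eager mux can be sketched with a toy timing model: both arms of the `if` are computed speculatively, and the mux produces its output as soon as the predicate and the selected input have arrived, without waiting for the slow arm. The `timed` type, the function names and the latencies (1 unit for compare and negate, 10 for multiply) are assumptions made for illustration only.

```c
#include <assert.h>

/* A value together with the time at which it becomes available. */
typedef struct {
    int value;
    int ready_at;
} timed;

static int max2(int a, int b) { return a > b ? a : b; }

/* Conventional mux: waits for the predicate and for both data inputs. */
static timed lazy_mux(timed pred, timed if_true, timed if_false)
{
    timed out;
    out.value = pred.value ? if_true.value : if_false.value;
    out.ready_at = max2(pred.ready_at,
                        max2(if_true.ready_at, if_false.ready_at));
    return out;
}

/* Eager mux: waits only for the predicate and the selected input. */
static timed eager_mux(timed pred, timed if_true, timed if_false)
{
    timed chosen = pred.value ? if_true : if_false;
    timed out;
    assert(pred.ready_at >= 0 && chosen.ready_at >= 0);
    out.value = chosen.value;
    out.ready_at = max2(pred.ready_at, chosen.ready_at);
    return out;
}

/* if (x > 0) y = -x; else y = b*x; with both arms evaluated in parallel.
 * The latencies below are assumed, not measured. */
static timed speculate(int x, int b, int eager)
{
    timed pred = { x > 0, 1 };
    timed neg  = { -x, 1 };
    timed mul  = { b * x, 10 };
    return eager ? eager_mux(pred, neg, mul) : lazy_mux(pred, neg, mul);
}
```

Under these assumed latencies, `speculate(3, 5, 1)` is ready at time 1 while `speculate(3, 5, 0)` is ready only at time 10; when the slow arm is the one selected, both variants take 10.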

Page 35: Computing Without Processors Thesis Proposal

35

Predicates

*q = 2;

• Guard side-effects
  – Memory access
  – Procedure calls

• Control looping

• Decide exit branch

• Select variable definition

x=... x=...

...=x
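A minimal sketch of the first role, guarding a side effect: pure computation may run speculatively, but a store such as `*q = 2` must only happen when its predicate is true. The functions `guarded_store` and `example` are hypothetical names; the guarding condition `q != NULL` is an assumption chosen to make the point visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Performs the store only when the predicate holds.
 * Returns whether the side effect actually happened. */
static bool guarded_store(bool pred, int *q, int value)
{
    if (!pred)
        return false;      /* squashed: q is never dereferenced */
    assert(q != NULL);     /* a true predicate implies a usable address */
    *q = value;
    return true;
}

/* if (q != NULL) *q = 2; with the predicate computed apart from the store. */
static bool example(int *q)
{
    bool pred = (q != NULL);
    return guarded_store(pred, q, 2);
}
```

Calling `example(NULL)` is safe in this model because the false predicate prevents the store from touching the address at all.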

Page 36: Computing Without Processors Thesis Proposal

36

Computing Predicates

• Correct for irreducible graphs
• Correct even when speculatively computed
• Can be eagerly computed

s t

b

Page 37: Computing Without Processors Thesis Proposal

37

Loops + Dataflow = Pipelining

for (i=0; i < 10; i++)
    a[i] += i;

[Figure: dataflow circuit for the loop. Nodes: +, load, +, store, +. Labels: &a[0], i, 1, 0, and the values a[0], a[1], a[2], a[3].]
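Why a loop laid out as dataflow behaves like a pipeline can be sketched with a cycle-by-cycle software model of the loop above. The three stages (load, add, store), the one-cycle latency per stage and the names `slot` and `run_pipelined` are assumptions for illustration; the real circuit would have its own stage boundaries.

```c
#include <assert.h>
#include <stdbool.h>

#define N 10

/* One loop iteration in flight between two stages. */
typedef struct {
    bool valid;
    int  i;   /* iteration index */
    int  v;   /* value travelling with the iteration */
} slot;

/* Runs: for (i = 0; i < N; i++) a[i] += i;
 * as a 3-stage pipeline with one new iteration entering per cycle.
 * Returns the number of cycles until the last store completes. */
static int run_pipelined(int a[N])
{
    slot loaded = { false, 0, 0 };
    slot added  = { false, 0, 0 };
    int next_i = 0, stored = 0, cycles = 0;

    while (stored < N) {
        /* Stages are evaluated back to front so that each stage consumes
         * what its predecessor produced in the previous cycle. */
        if (added.valid) {                       /* store stage */
            assert(added.i >= 0 && added.i < N);
            a[added.i] = added.v;
            stored++;
        }
        added = loaded;                          /* add stage */
        if (added.valid)
            added.v += added.i;
        loaded.valid = next_i < N;               /* load stage */
        if (loaded.valid) {
            loaded.i = next_i;
            loaded.v = a[next_i];
            next_i++;
        }
        cycles++;
    }
    return cycles;
}
```

With these assumptions the ten iterations finish in N + 2 = 12 cycles, compared with 3 * N = 30 if each iteration had to complete before the next one started. The iterations touch distinct array elements, so overlapping them does not change the result.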

Page 38: Computing Without Processors Thesis Proposal

38

Outline

• Motivation

• ASH: Application-Specific Hardware

• The spatial model of computation

• CASH: Compiling for ASH

• Evolutionary path

• Conclusions

• Future work

Page 39: Computing Without Processors Thesis Proposal

39

Evolutionary Path

Microprocessors ASH

The problem with ASH: Resources

Page 40: Computing Without Processors Thesis Proposal

40

Virtualization

Page 41: Computing Without Processors Thesis Proposal

41

CPU+ASH

core computation

support computation + OS + VM

CPU ASH

Memory

Page 42: Computing Without Processors Thesis Proposal

42

Outline

• Motivation

• ASH: Application-Specific Hardware

• The spatial model of computation

• CASH: Compiling for ASH

• Evolutionary path

• Conclusions

• Future work

Page 43: Computing Without Processors Thesis Proposal

43

ASH Benefits

Problem       Solution

Reliability   Configuration around defects

Power         Only “useful” gates switching

Signals       Localized computation

ILP           Statically extracted

Page 44: Computing Without Processors Thesis Proposal

44

Scalable Performance

[Figure: performance over time for CPU and ASH. Label: "now".]

Page 45: Computing Without Processors Thesis Proposal

45

Summary

• Contemporary CPU architecture faces lots of problems

• Application-Specific Hardware (ASH) provides a scalable technology

• Compiling HLL into hardware dataflow machines is an effective solution

Page 46: Computing Without Processors Thesis Proposal

46

Timeline

[Figure: timeline from 06/01 to 12/02 with marks at 09/01, 12/01, 04/02, 06/02 and 09/02, and a "now" marker. Tasks shown: CASH core; write thesis; hw/sw partitioning (ASH + CPU); cost models; ASH simulation; loop parallelization; explore architectural/compiler trade-offs; memory partitioning.]

Page 47: Computing Without Processors Thesis Proposal

47

Extras

• Related work

• Reconfigurable hardware

• Other cross-over phenomena

• A CPU + ASH study

• More about predicates

Page 48: Computing Without Processors Thesis Proposal

48

Related Work

• Hardware synthesis from HLL

• Reconfigurable hardware

• Predicated execution

• Dataflow machines

• Speculative execution

• Predicated SSA


Page 49: Computing Without Processors Thesis Proposal

49

Reconfigurable Hardware

Universal gates and/or storage elements

Interconnection network

Programmable switches

Page 50: Computing Without Processors Thesis Proposal

50

Main RH Ingredient: RAM Cell

Switch controlled by a 1-bit RAM cell

Universal gate = RAM

[Figure labels: RAM contents 0001; address inputs a0, a1; data output a1 & a2; switch with data in, control, 0.]
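The two uses of a RAM cell on this slide can be sketched in software. A 2-input universal gate is a 4 x 1-bit RAM addressed by its inputs, so the contents 0001 give an AND gate; a programmable switch is a pass element controlled by one stored bit. The names `lut2`, `pass_switch` and the `LUT_*` constants are hypothetical, and returning 0 for an open switch is a simplification (a real open switch leaves the wire undriven).

```c
#include <assert.h>
#include <stdint.h>

/* A 2-input universal gate modeled as a 4 x 1-bit RAM: the inputs form
 * the address and the stored bit is the output. Bit k of `table` holds
 * the output for address k = (a1 << 1) | a0. */
static unsigned lut2(uint8_t table, unsigned a0, unsigned a1)
{
    assert(a0 <= 1 && a1 <= 1);
    unsigned address = (a1 << 1) | a0;
    return (table >> address) & 1u;
}

/* Reprogramming the RAM contents changes the gate. */
enum {
    LUT_AND = 0x8,  /* addresses 3..0 hold 1,0,0,0 */
    LUT_OR  = 0xE,  /* addresses 3..0 hold 1,1,1,0 */
    LUT_XOR = 0x6   /* addresses 3..0 hold 0,1,1,0 */
};

/* A programmable switch: one stored control bit decides whether the
 * input is passed through. */
static unsigned pass_switch(unsigned control_bit, unsigned data_in)
{
    return control_bit ? data_in : 0u;
}
```

For instance, `lut2(LUT_AND, 1, 1)` yields 1, and the same function with `LUT_XOR` yields 0 for the same inputs: only the stored bits differ.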

Page 51: Computing Without Processors Thesis Proposal

51

Reconfigurable Computing

• Back to ENIAC-style computation

• Synthesize one machine to solve one problem


Page 52: Computing Without Processors Thesis Proposal

52

Efficiency

[Figure: hardware resources over time, split into used and idle. Label: "now".]

Page 53: Computing Without Processors Thesis Proposal

53

Manufacturing Cost

[Figure: manufacturing cost over time against affordable cost. Labels: 3x10^9 $, "now".]

Page 54: Computing Without Processors Thesis Proposal

54

Complexity

[Figure: transistors over time, manageable versus available. Labels: 10^8, 10^9, 10^10, "now".]

Page 55: Computing Without Processors Thesis Proposal

55

CAD Tools

[Figure: manual interventions over time, feasible versus necessary. Label: "now".]

Page 56: Computing Without Processors Thesis Proposal

56

ASH Benefits

Problem       Solution

Reliability   Configuration around defects

Power         Only “useful” gates switching

Signals       Localized computation

ILP           Statically extracted

Complexity    Hierarchy of abstractions

CAD           Compiler + local place & route

Efficiency    Circuit customized to application

Cost          No masks, no physics, same substrate

Performance   Scalable

Page 57: Computing Without Processors Thesis Proposal

57

CPU+ASH Study

• Reconfigurable functional unit on processor pipeline

• Adapted SimpleScalar 3.0

• ASH & CPU use the same memory hierarchy (incl. L1)

• ASH can access CPU registers

• CPU pipeline interlocked with ASH

• Results pending

Page 58: Computing Without Processors Thesis Proposal

58

Simplifying Predicates

• Shared implementations

• Control equivalence

a

b

c

Page 59: Computing Without Processors Thesis Proposal

59

Deep Speculation

if (p)
    if (q) x = a;
    else x = b;
else x = c;

[Figure: a single mux producing x from inputs a, b, c, selected by the predicates p&q, p&!q and !p respectively.]
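The flattening on this slide can be checked with a small software model: the nested `if` becomes one three-way mux whose select lines are the decoded predicates p&q, p&!q and !p. The function names `nested` and `flat_mux` are hypothetical, and the AND-OR form of the mux is one possible realization, chosen here because it mirrors a one-hot hardware mux.

```c
#include <assert.h>
#include <stdbool.h>

/* The source program, written as nested control flow. */
static unsigned nested(bool p, bool q, unsigned a, unsigned b, unsigned c)
{
    unsigned x;
    if (p) {
        if (q) x = a;
        else   x = b;
    } else {
        x = c;
    }
    return x;
}

/* The same computation as a single mux with decoded predicates.
 * All three inputs are available speculatively; exactly one select
 * line is true, so the AND-OR network passes exactly one of them. */
static unsigned flat_mux(bool p, bool q, unsigned a, unsigned b, unsigned c)
{
    bool sel_a = p && q;
    bool sel_b = p && !q;
    bool sel_c = !p;
    assert(sel_a + sel_b + sel_c == 1);   /* predicates are one-hot */

    unsigned mask_a = sel_a ? ~0u : 0u;
    unsigned mask_b = sel_b ? ~0u : 0u;
    unsigned mask_c = sel_c ? ~0u : 0u;
    return (mask_a & a) | (mask_b & b) | (mask_c & c);
}
```

The two functions agree for every combination of p and q, which is the property that lets the nested control flow be replaced by a single mux.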

Page 60: Computing Without Processors Thesis Proposal

60

Predicates & Tokens

[Figure: circuits for the guarded store *q = 2, with inputs q, x and ~x. Signal labels: ready, safe, ready & safe, 1.]

Predicated tokens

Eliminate speculation

Eliminate wires

P, P_ready

P & P_ready