Page 1:

VLSI-SoC 2001 IFIP - LIRMM

Stream-based Arrays: Converging Design Flows for both, Reconfigurable and Hardwired ....

Reiner Hartenstein
University of Kaiserslautern

December 2-4, 2001, Montpellier, France

Page 2: >> Stream-based Computing

© 2001, [email protected] http://www.fpl.uni-kl.de
University of Kaiserslautern, Xputer Lab

• Stream-based Computing
• Stream-based Compilation Techniques
• Use in Co-Design
• Now it's up to You !

http://www.uni-kl.de

Page 3: commercial rDPAs

rDPA (coarse grain) becoming important

• XPU family (IP cores): PACT Corp., Munich (http://pactcorp.com)
• XPU128
• flexible array: MorphICs
• CALISTO: Silicon Spice
• CS2000 family: Chameleon Systems
• MECA family: Malleable
• FIPSOC: SIDSA
• ACM: Quicksilver Tech
• CHESS array: Elixent
• MorphoSys: Morpho Tech

**) bought

Page 4: SNN filter Example: KressArray Family

array size: 10 x 16 = 160 rDPUs

[Figure: SNN filter mapped onto the array by KressArray Xplorer]

Legend:
• rDPU not used
• used for routing only (rout thru only)
• operator and routing
• port location marker
• backbus connect

KressArray Xplorer: http://kressarray.de (you may use it on your Netscape)

Page 5: Rapidly toward the Break-through

• replace Concurrent Processes by more efficient parallelism: stream-based DPAs 1) and stream-based rDPAs 2)

1) systolic array *) [1980]; chip-on-a-day *) [2000] (Bee Project [Brodersen])
2) KressArray **) [1995] and others [later]

*) hardwired
**) reconfigurable

Generalization of the Systolic Array
• Kress: a generalization of systolic array synthesis: super systolic synthesis

terms:
• DPU: data path unit
• DPA: data path array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA

Page 6: compare Concurrent Computing

[Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or a switch box]

extremely inefficient: massive bottleneck phenomena at run time
• control flow overhead
• instruction fetch / interpretation overhead
• address computation overhead - may be massive

Page 7: ... with Stream-based Computing:

[Figure: (r)DPA, a 3 x 3 array of DPUs]

for both,
• reconfigurable, and
• hardwired [Brodersen]

• transport-triggered execution: driven by data stream from / to memory or from / to peripheral interface
• no instruction sequencer inside !
• avoids run time overhead and bottleneck phenomena
• rDPA: drastically reduced reconfigurability overhead
• "instruction fetch": at compile time
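
The bullets above can be illustrated with a minimal sketch, which is not the Xputer Lab's tooling. It assumes each DPU holds one fixed operator chosen before run time, and that the arrival of a data item is the only thing that triggers work. The helper `configure_pipeline`, the three operators, and the input stream are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Operator = Callable[[int], int]


def configure_pipeline(operators: List[Operator]) -> Callable[[Iterable[int]], Iterator[int]]:
    """Fix one operator per DPU before run time (hypothetical helper)."""
    fixed = tuple(operators)  # configuration is frozen: nothing is fetched while running

    def run(stream: Iterable[int]) -> Iterator[int]:
        for item in stream:      # the arrival of a data item triggers execution
            for op in fixed:     # the item is handed from DPU to DPU
                item = op(item)
            yield item

    return run


# Assumed example configuration: three DPUs computing ((v * 3) + 1) >> 1.
pipeline = configure_pipeline([lambda v: v * 3, lambda v: v + 1, lambda v: v >> 1])
result = list(pipeline([1, 2, 3, 4]))
print(result)
```

The software loop handles one item at a time; in an actual array the DPUs would work on successive items concurrently, which this sketch does not model.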

Page 8: Soft rDPA ?

[Figure: HLL Compiler targeting a chip with Memory, soft CPU, miscellaneous, and soft DPU arrays]

• 50 million system gates soon
• even large rDPAs as soft IPs become feasible
• by >2005: don't care about area efficiency ?

Page 9: >> Stream-based Compilation Techniques

• Stream-based Computing
• Stream-based Compilation Techniques
• Use in Co-Design
• Now it's up to You !

http://www.uni-kl.de

Page 10: Systolic Stream-based Computing System

Systolic Array [H. T. Kung, 1980]: a DPA (Data Path Array)

The Mathematician's Synthesis Method:
• equations
• linear projection or algebraic mapping
• placement (no routing !)
• linear pipelines and uniform arrays only

DPU architecture: inputs y, x, a; operators + and *

[Figure: coefficient streams a11 ... a33, input data streams x1, x2, x3 and y10, y20, y30, output data streams y1, y2, y3]

• computing in space (placement): systolic arrays etc.
• computing in time
• migration by re-timing and other transformations
• this dichotomy is completely ignored by our CS curricula
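
As a hedged illustration of the multiply-accumulate DPU (y + a * x) and the skewed data streams on this slide, the sketch below models a linear array computing y = y0 + A x cycle by cycle. Reading y10, y20, y30 as initial accumulator values is an assumption, as are the function name `systolic_matvec`, the choice of one DPU per output, and the sample numbers.

```python
from typing import List, Optional


def systolic_matvec(a: List[List[int]], x: List[int], y0: List[int]) -> List[int]:
    """Cycle-by-cycle model of a linear array of multiply-accumulate DPUs."""
    m, n = len(a), len(x)
    y = list(y0)                                  # DPU i keeps the accumulator for y[i]
    x_regs: List[Optional[int]] = [None] * m      # x register of each DPU (None = bubble)
    for t in range(n + m - 1):                    # cycles until the last x item leaves
        # Shift the x stream one DPU to the right and feed the next item.
        x_regs = [x[t] if t < n else None] + x_regs[:-1]
        for i in range(m):
            if x_regs[i] is not None:
                j = t - i                         # DPU i now holds x[j]
                y[i] += a[i][j] * x_regs[i]       # a[i][j] arrives on the skewed a stream
    return y


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = systolic_matvec(a, x=[1, 0, -1], y0=[10, 20, 30])
print(y)
```

Each DPU only ever talks to its neighbour and to its own coefficient stream, which is why placement alone suffices and no routing step is needed in this regular case.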

Page 11: General Stream-based Computing System

heterogeneous DPA or rDPA

[Figure: design flow with numbered steps 1 to 4; a free form pipe network of DPUs with operators +, *, sh, xf, -]

• expression tree
• DPU architectures (inputs y, x, a; operators + and *)
• Mapper: simultaneous placement & routing by simulated annealing
• result: free form pipe network
• Scheduler: data streams
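
To make the mapper step concrete, here is a minimal simulated annealing sketch. It is not the KressArray mapper: it only places operators on a grid and uses total Manhattan wire length as a stand-in for routing cost, whereas the slide describes simultaneous placement and routing. The operator names, the nets, the 3 x 3 grid, and the cooling schedule are all illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]


def wire_length(placement: Dict[str, Pos], nets: List[Tuple[str, str]]) -> int:
    """Total Manhattan distance over all two-point connections."""
    return sum(abs(placement[s][0] - placement[d][0]) + abs(placement[s][1] - placement[d][1])
               for s, d in nets)


def anneal_placement(ops: List[str], nets: List[Tuple[str, str]], rows: int, cols: int,
                     steps: int = 3000, t_start: float = 5.0, t_end: float = 0.01,
                     seed: int = 0) -> Tuple[Dict[str, Pos], int]:
    if len(ops) > rows * cols:
        raise ValueError("array too small for the given operators")
    rng = random.Random(seed)
    cells: List[Optional[str]] = list(ops) + [None] * (rows * cols - len(ops))
    rng.shuffle(cells)                            # random initial placement

    def positions() -> Dict[str, Pos]:
        return {op: divmod(i, cols) for i, op in enumerate(cells) if op is not None}

    cost = wire_length(positions(), nets)
    best, best_cost = positions(), cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling
        i, j = rng.randrange(len(cells)), rng.randrange(len(cells))
        cells[i], cells[j] = cells[j], cells[i]                # propose: swap two cells
        new_cost = wire_length(positions(), nets)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                                    # accept the move
            if cost < best_cost:
                best, best_cost = positions(), cost
        else:
            cells[i], cells[j] = cells[j], cells[i]            # reject: undo the swap
    return best, best_cost


# Assumed expression tree for y + a * x: inputs feed "*", which feeds "+".
ops = ["x", "a", "*", "y", "+"]
nets = [("x", "*"), ("a", "*"), ("*", "+"), ("y", "+")]
best, best_cost = anneal_placement(ops, nets, rows=3, cols=3)
print(best, best_cost)
```

Swapping with empty cells lets operators move freely over the array, so the same move type covers both relocation and exchange.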

Page 12: Memory Communication Architecture …

• hot research topic in embedded systems
• storage context transformations [Catthoor, Herz, Kougia, Soudris]
• Synthesizable Memory Communication Architecture
• startups provide memory IPs or generators
• GAG generic sequencer methodology available [Herz]
• Optimized Parallel Memory Controller
• an example by Nageldinger's KressArray Xplorer

Legend: application; not used; sequencers; memory ports

Page 13: >> Use in Co-Design

• Stream-based Computing
• Stream-based Compilation Techniques
• Use in Co-Design
• Now it's up to You !

http://www.uni-kl.de

Page 14:

Computer: the wrong Machine Paradigm ("von Neumann")
• Compiler, Memory, Sequencer, Datapath (hardwired)
• program counter: state register
• tightly coupled by compact instruction code
• does not support soft data paths

Xputer: The Soft Machine Paradigm
• Scheduler, Compiler, Memory, (multiple) sequencer, Datapath Array (reconfigurable)
• data counter(s)
• loosely coupled by decision data bits only
• reconfigurable; also for hardwired [Brodersen]

• enabling technology published 10 years ago
• now a hot topic area
• full day course last week at Tampere, Finland

Page 15: Co-Compilation

Hardware / Software Co-Design turns to Configware / Software Co-Design
Jürgen Becker's Co-DE-X Co-Compiler [ASP-DAC'95]

[Figure: co-compilation flow]
• high level programming language source (X-C)
• partitioning compiler: Analyzer / Profiler and Partitioner, with Resource Parameters supporting different platforms
• GNU C compiler: Software running on Processor (Computer Machine Paradigm)
• X-C compiler and DPSS: Configware running on Reconfigurable Accelerators (KressArray; Xputer "Soft" Machine Paradigm)
• interface between Processor and Reconfigurable Accelerators

Page 16: Loop Transformation Examples

resource parameter driven Co-Compilation

sequential processes:
    loop 1-16
        body
    endloop

strip mining (fork / join):
    loop 1-8            loop 9-16
        body                body
    endloop             endloop

loop unrolling:
    loop 1-8
        body
        body
    endloop

[Figure columns "host:" and "reconf. array:"; trigger loops shown: loop 1-8 trigger endloop, loop 1-4 trigger endloop, loop 1-2 trigger endloop]
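
The two transformations named on this slide can be sketched as follows. This is only an illustration of strip mining and loop unrolling on a 16-iteration loop, not the Co-DE-X implementation; the stand-in `body` function and the use of a thread pool for fork / join are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def body(i: int) -> int:
    """Stand-in loop body (assumption): any side-effect-free computation works."""
    return i * i


def original() -> List[int]:
    return [body(i) for i in range(1, 17)]             # loop 1-16


def strip(lo: int, hi: int) -> List[int]:
    return [body(i) for i in range(lo, hi + 1)]


def strip_mined() -> List[int]:
    # fork: two strips of 8 iterations each; join: concatenate results in order
    with ThreadPoolExecutor(max_workers=2) as pool:
        first, second = pool.map(strip, [1, 9], [8, 16])
    return first + second


def unrolled() -> List[int]:
    out: List[int] = []
    for k in range(1, 9):                              # loop 1-8, two bodies per pass
        out.append(body(2 * k - 1))
        out.append(body(2 * k))
    return out


print(original() == strip_mined() == unrolled())
```

Both transformations are legal here because the iterations are independent; the threads only illustrate the fork / join structure and are not meant as a performance claim.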

Page 17: >> Now it's up to You !

• Stream-based Computing
• Stream-based Compilation Techniques
• Use in Co-Design
• Now it's up to You !

http://www.uni-kl.de

Page 18: However, current CS Education … is based on the Submarine Model

[Figure: Software layers (Algorithm, procedural high level Programming Language, Assembly Language) above the surface; Hardware invisible: under the surface]

• Brain usage: procedural-only
• Software Faculty Colleagues shy away from the Paradigm Shift: their Brain hurts? - can't be: this Half has been amputated

This model disables ...

Page 19: ... this model disables Hardware and Software as Alternatives

[Figure: Algorithm, then partitioning into three alternatives]
• Software only
• Software & Hardw/Configw
• Hardw/Configw only

• Software: procedural
• Hardware, Configware: structural
• Brain Usage: both Hemispheres

Page 20: The Dominance of the Submarine Model ...

... indicates that our CS education system produces zillions of mentally disabled Persons
• (procedural) structurally disabled
• … completely disabled to cope with solutions other than software only

It's time to attack the software faculty dictatorship. Get involved!

Page 21: >>> thank you

thank you for listening
It's up to You !

Page 22: >>> END

Page 23: The Impact of Reconfigurable Logic

• Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design.
• A rapidly growing large user base of HDL-savvy designers with FPGA experience.
• Flexibility promises spin-around times down to minutes instead of months for real time in-system debugging, profiling, verification, tuning, field-maintenance, and field upgrades
• A New Business Model (in-field debugging and upgrading ... )
• A Fundamental Paradigm Shift in Silicon Application

[Figure [T. Kean]: Revenue / month over Time / months (1, 10, 20, 30); curves: ASIC Product; reconfigurable Product with download (Product, Update 1, Update 2)]

Page 24: The History of Paradigm Shifts

"Mainstream Silicon Application is switching every 10 Years"

[Figure: Makimoto's Wave, alternating between standard and custom, 1957, 1967, 1977, 1987, 1997, 2007]
• standard: TTL; µproc., memory
• custom: LSI, MSI; ASICs, accel's
• next: reconfigurable; ??

"The Programmable System-on-a-Chip is the next wave"

Published in 1989

Page 25: How's next Wave ?

Tredennick's Paradigm Shifts:
• hardwired: algorithm: fixed; resources: fixed
• procedural programming: algorithm: variable; resources: fixed
• structural programming: algorithm: variable; resources: variable

[Figure: Hartenstein's Curve on the standard / custom wave, 1957, 1967, 1977, 1987, 1997, 2007; FPGAs; Coarse grain RAs; no further wave !]

Page 26: The Impact of Makimoto's Paradigm Shifts

Dr. Makimoto: FPL 2000 keynote

[Figure: standard / custom wave, 1957, 1967, 1977, 1987, 1997, 2007: TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable]

• Procedural personalization via RAM-based Machine Paradigm: Software Industry's Secret of Success
• structural personalization: RAM-based, before run time: Configware Success Story by new Machine Paradigm

Page 27: The History of Paradigm Shifts

"Mainstream Silicon Application is switching every 10 Years"

[Figure: Makimoto's Wave, alternating between standard and custom, 1957, 1967, 1977, 1987, 1997, 2007]
• standard: TTL; µproc., memory; FPGAs
• custom: LSI, MSI; ASICs, accel's
• coarse grain

Page 28: KressArray Family generic Fabrics: a few examples

• Select Function Repertory
• select Nearest Neighbour (NN) Interconnect: an example
• Select mode, number, width of NN ports
• rout-through and function
• rout-through only
• more NN ports: rich Rout Resources
• Wired by Abutment
• Examples of 2nd Level Interconnect: layouted over rDPU cell - no separate routing areas !

[Figure: rDPU with NN port labels 16, 32, 8, 24, 4, 2]