vlsi-soc 2001 ifip - lirmm stream-based arrays: converging design flows for both, reiner hartenstein...

VLSI-SoC 2001 IFIP - LIRMM

Stream-based Arrays: Converging Design Flows for both, Reconfigurable and Hardwired ....

Reiner Hartenstein

University of Kaiserslautern

December 2-4, 2001, Montpellier, France

© 2001, [email protected] http://www.fpl.uni-kl.de


Xputer Lab

>> Stream-based Computing

• Stream-based Computing
• Stream-based Compilation Techniques
• Use in Co-Design
• Now it’s up to You !

http://www.uni-kl.de



commercial rDPAs: rDPA (coarse grain) becoming important

XPU family (IP cores): PACT Corp., Munich (http://pactcorp.com)
flexible array: MorphICs
CALISTO: Silicon Spice
CS2000 family: Chameleon Systems
MECA family: Malleable
FIPSOC: SIDSA
ACM: Quicksilver Tech
CHESS array: Elixent
MorphoSys: Morpho Tech

**) bought

[Figure: XPU128]



SNN filter Example: KressArray Family

array size: 10 x 16 = 160 rDPUs

Legend: rDPU not used; used for routing only (rout-through only); operator and routing; port location marker; backbus connect

KressArray Xplorer: http://kressarray.de
You may use it on your Netscape



Rapidly toward the Break-through

• replace Concurrent Processes by more efficient parallelism: stream-based DPAs 1)

Generalization of the Systolic Array: super systolic synthesis

Kress: a generalization of systolic array synthesis: stream-based rDPAs 2)

terms:
DPU: datapath unit
DPA: data path array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA

____

1) systolic array* [1980]; chip-on-a-day* [2000] (Bee Project) [Brodersen]
2) KressArray** [1995] and others [later]

*) hardwired
**) reconfigurable



compare Concurrent Computing

[Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or switch box]

extremely inefficient:
• massive bottleneck phenomena at run time
• control flow overhead
• instruction fetch / interpretation overhead
• address computation overhead - may be massive



... with Stream-based Computing:

(r)DPA for both,
• reconfigurable, and
• hardwired [Brodersen]

[Figure: 3 x 3 array of DPUs]

• transport-triggered execution, driven by data stream from / to memory or from / to peripheral interface
• no instruction sequencer inside !
• „instruction fetch“: at compile time

avoids run time overhead and bottleneck phenomena

rDPA: drastically reduced reconfigurability overhead
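The idea of a DPU array without any instruction sequencer can be illustrated with a small software analogy. This is only a minimal sketch under stated assumptions: the names `dpu` and `configure_pipeline` are hypothetical, Python generators stand in for hardware data streams, and a linear chain stands in for a general array. Each DPU gets its operator fixed once, before any data flows ("instruction fetch" at compile time), and then simply fires on every data item that arrives.

```python
from typing import Callable, Iterable, Iterator, List


def dpu(op: Callable[[float], float], stream: Iterable[float]) -> Iterator[float]:
    """One DPU: its operator is fixed; it fires whenever a data item arrives."""
    for item in stream:
        yield op(item)


def configure_pipeline(ops: List[Callable[[float], float]]):
    """Fix every DPU's operator up front; no program counter exists at run time."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        for op in ops:
            stream = dpu(op, stream)  # chain DPU outputs to DPU inputs
        return stream
    return run


# Hypothetical configuration: multiply by 3, add 1, take the absolute value.
pipeline = configure_pipeline([lambda v: v * 3, lambda v: v + 1, abs])
print(list(pipeline([-2, 0, 5])))  # [5, 1, 16]
```

At run time the only activity is data moving through the chain; the choice of operators never has to be fetched or decoded again.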


Soft rDPA ?

[Figure: memory, soft CPU, miscellaneous, and four soft DPU arrays; HLL Compiler]

• 50 million system gates soon
• even large rDPAs as soft IPs become feasible
• by >2005: don’t care about area efficiency ?



>> Stream-based Compilation Techniques

• Stream-based Compilation Techniques





Systolic Array [H. T. Kung, 1980]: a DPA (Data Path Array)

Systolic Stream-based Computing System

The Mathematician’s Synthesis Method: linear pipelines and uniform arrays only

equations -> linear projection or algebraic mapping -> placement (no routing!)

[Figure: DPU architecture with inputs x, a, y and operators * and +; placed coefficients a11 ... a33; data streams x1, x2, x3 and y10, y20, y30 entering, y1, y2, y3 leaving]

computing in time <-> computing in space: migration by re-timing and other transformations (systolic arrays etc.)

this dichotomy is completely ignored by our CS curricula
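The systolic idea above, with initial values y10, y20, y30 entering and results y1, y2, y3 leaving, can be sketched as a cycle-by-cycle simulation. This is a simplified variant chosen for readability, not the exact arrangement on the slide: it assumes each DPU j holds x[j] in place and computes y_out = y_in + a * x, while the partial sums move one DPU to the right per cycle. The function name `systolic_matvec` is hypothetical.

```python
def systolic_matvec(a, x, y0):
    """Simulate a linear array of len(x) DPUs computing y = y0 + A x.

    DPU j holds x[j]; row i's partial sum enters DPU 0 at cycle i and
    reaches DPU j at cycle i + j, where a[i][j] * x[j] is accumulated.
    """
    n_rows, n_cols = len(a), len(x)
    pipe = [None] * n_cols          # output register of each DPU: (row, partial sum)
    results = [None] * n_rows
    for t in range(n_rows + n_cols):
        # Update right-to-left so every DPU sees its neighbour's previous value.
        for j in reversed(range(n_cols)):
            if j > 0:
                incoming = pipe[j - 1]
            else:
                incoming = (t, y0[t]) if t < n_rows else None
            if incoming is None:
                pipe[j] = None
            else:
                i, acc = incoming
                pipe[j] = (i, acc + a[i][j] * x[j])
        if pipe[-1] is not None:    # a finished sum leaves the last DPU
            row, value = pipe[-1]
            results[row] = value
    return results


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(systolic_matvec(a, [1, 0, -1], [10, 20, 30]))  # [8, 18, 28]
```

No DPU needs routing beyond its nearest neighbour, which is why placement alone suffices for this class of arrays.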



General Stream-based Computing System: heterogeneous DPA or rDPA

expression tree and DPU architectures -> Mapper (simulated annealing): simultaneous placement & routing -> free form pipe network

Scheduler -> data streams

[Figure: expression tree over x, a, y with operators * and +; mapped free form pipe network of +, *, sh, xf and - operators; numbered steps 1 to 4]
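A mapper based on simulated annealing can be sketched as follows. This is a minimal sketch under loud assumptions: it only places operators on a grid and uses Manhattan wire length as a stand-in for routing cost, whereas the mapper on the slide performs placement and routing simultaneously. All names (`wire_length`, `anneal_placement`) and the cooling schedule are hypothetical choices, not the tool's actual parameters.

```python
import math
import random


def wire_length(placement, edges):
    """Total Manhattan distance over all producer/consumer pairs."""
    return sum(abs(placement[u][0] - placement[v][0]) +
               abs(placement[u][1] - placement[v][1]) for u, v in edges)


def anneal_placement(nodes, edges, rows, cols, steps=5000,
                     t_start=2.0, t_end=0.01, seed=0):
    """Place each node on its own grid cell, minimising wire length."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    placement = dict(zip(nodes, cells))
    free = cells[len(nodes):]                      # unoccupied cells
    cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost
    for k in range(steps):
        temp = t_start * (t_end / t_start) ** (k / steps)   # geometric cooling
        a = rng.choice(nodes)
        if free and rng.random() < 0.5:            # move a to a free cell
            idx, b = rng.randrange(len(free)), None
            placement[a], free[idx] = free[idx], placement[a]
        else:                                      # swap two nodes
            idx, b = None, rng.choice(nodes)
            placement[a], placement[b] = placement[b], placement[a]
        new_cost = wire_length(placement, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        elif b is None:                            # rejected: undo the move
            placement[a], free[idx] = free[idx], placement[a]
        else:                                      # rejected: undo the swap
            placement[a], placement[b] = placement[b], placement[a]
    return best, best_cost


# Hypothetical expression graph for y + a * x on a 2 x 3 grid.
nodes = ["a", "x", "mul", "y", "add"]
edges = [("a", "mul"), ("x", "mul"), ("mul", "add"), ("y", "add")]
best, best_cost = anneal_placement(nodes, edges, rows=2, cols=3)
print(best_cost, best)
```

Worse placements are accepted with a probability that shrinks as the temperature falls, which lets the search leave local minima early on.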



Memory Communication Architecture …

• hot research topic in embedded systems
• storage context transformations [Catthoor, Herz, Kougia, Soudris]
• Synthesizable Memory Communication Architecture
• startups provide memory IPs or generators
• an example by Nageldinger’s KressArray Xplorer

Optimized Parallel Memory Controller [Herz]

GAG generic sequencer methodology available

Legend: application; not used; sequencers; memory ports
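What a data sequencer contributes can be illustrated by generating a memory address stream from a few parameters, so that no address arithmetic is executed as instructions at run time. This is only a sketch: the function `scan_addresses` and its parameters are hypothetical and do not reproduce the actual GAG design; it shows a plain row-by-row scan over a 2-D window.

```python
def scan_addresses(base, row_stride, x_range, y_range):
    """Yield linear addresses for a 2-D scan window.

    x_range and y_range are (start, stop, step) tuples, as for range().
    """
    x0, x1, dx = x_range
    y0, y1, dy = y_range
    for y in range(y0, y1, dy):
        for x in range(x0, x1, dx):
            yield base + y * row_stride + x


# A 3-wide, 2-high window in a memory with 8 words per row, starting at 100.
print(list(scan_addresses(100, 8, (0, 3, 1), (0, 2, 1))))
# [100, 101, 102, 108, 109, 110]
```

Several such sequencers running in parallel, one per memory port, would feed the data streams that drive the array.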



>> Use in Co-Design

• Stream-based Compilation Techniques





Computer (“von Neumann”): tightly coupled by compact instruction code
• Compiler
• Memory
• Sequencer, with program counter: state register
• Datapath: hardwired
• “von Neumann” does not support soft data paths

Xputer:
• Scheduler, Compiler
• Memory
• (multiple) sequencer, with data counter(s)
• Datapath Array: reconfigurable


Xputer: The Soft Machine Paradigm
• reconfigurable
• also for hardwired [Brodersen]
• loosely coupled by decision data bits only
• enabling technology published 10 years ago, now a hot topic area
• full day course last week at Tampere, Finland

Computer: the wrong Machine Paradigm (“von Neumann”)



Hardware / Software Co-Design turns to Configware / Software Co-Design

Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]

Co-Compilation, supporting different platforms:

high level programming language source -> partitioning compiler (X-C compiler: Analyzer / Profiler, Partitioner; Resource Parameters)

• GNU C compiler -> Software running on Processor (Computer Machine Paradigm)
• X-C, DPSS -> Configware running on Reconfigurable Accelerators, KressArray (Xputer “Soft” Machine Paradigm)
• interface between Processor and Reconfigurable Accelerators



Loop Transformation Examples

resource parameter driven Co-Compilation

sequential processes:
    loop 1-16 body endloop

strip mining (fork / join):
    fork
        loop 1-8 body endloop
        loop 9-16 body endloop
    join

host: / reconf. array:
    loop 1-4 trigger endloop

loop unrolling:
    loop 1-8 body body endloop
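The loop transformations named above can be checked against each other in a few lines. This is a minimal sketch: `body` is a hypothetical placeholder for the loop body, the strips run sequentially here although they could be forked onto separate resources, and the unrolled version assumes an even iteration count.

```python
def body(i):
    """Hypothetical loop body: any side-effect-free function of the index."""
    return i * i


def original(n=16):
    # loop 1-16 body endloop
    return [body(i) for i in range(1, n + 1)]


def strip_mined(n=16, strip=8):
    # loop 1-8 body endloop; loop 9-16 body endloop
    out = []
    for start in range(1, n + 1, strip):
        for i in range(start, min(start + strip, n + 1)):
            out.append(body(i))
    return out


def unrolled(n=16):
    # loop 1-8 body body endloop (assumes n is even)
    out = []
    for i in range(1, n + 1, 2):
        out.append(body(i))
        out.append(body(i + 1))
    return out


print(original() == strip_mined() == unrolled())  # True
```

Which strip length or unrolling factor to pick is what the resource parameters decide: the factor should match the number of body instances the array can hold.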



>> Now it’s up to You !

• Stream-based Compilation Techniques
• Now it’s up to You !

http://www.uni-kl.de



However, current CS Education … is based on the Submarine Model

[Figure: layers Algorithm, procedural high level Programming Language, Assembly Language (Software) above the surface; Hardware invisible: under the surface]

Brain usage: procedural-only

Software Faculty Colleagues shy away from the Paradigm Shift: their Brain hurts? - can’t be: this Half has been amputated

This model disables ...



... this model disables Hardware and Software as Alternatives

[Figure: Algorithm -> partitioning -> Software only; Software & Hardw/Configw; Hardw/Configw only. Software: procedural; Hardware, Configware: structural]

Brain Usage: both Hemispheres



The Dominance of the Submarine Model ...

... indicates, that our CS education system produces zillions of mentally disabled Persons: (procedural) structurally disabled

… completely disabled to cope with solutions other than software only

It’s time to attack the software faculty dictatorship. Get involved!



>>> thank you

thank you for listening

It’s up to You !

>>> END



The Impact of Reconfigurable Logic

• Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design.
• A rapidly growing large user base of HDL-savvy designers with FPGA experience.
• Flexibility promises turn-around times down to minutes instead of months for real time in-system debugging, profiling, verification, tuning, field-maintenance, and field upgrades
• A New Business Model (in-field debugging and upgrading ... )
• A Fundamental Paradigm Shift in Silicon Application

[Figure: Revenue / month versus Time / months (1, 10, 20, 30); ASIC Product compared with reconfigurable Product with download: Product, Update 1, Update 2 [T. Kean]]



The History of Paradigm Shifts

“Mainstream Silicon Application is switching every 10 Years”

[Figure: Makimoto’s Wave (published in 1989), 1957 to 2007 in 10-year steps, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable; ??]

“The Programmable System-on-a-Chip is the next wave“



How’s next Wave ?

Tredennick’s Paradigm Shifts:
• hardwired: algorithm: fixed, resources: fixed
• procedural programming: algorithm: variable, resources: fixed
• structural programming: algorithm: variable, resources: variable

Hartenstein’s Curve: FPGAs, then Coarse grain RAs by 2007; no further wave !

[Figure: time axis 1957 to 2007 in 10-year steps, alternating between standard and custom]



The Impact of Makimoto’s Paradigm Shifts

[Figure: Makimoto’s Wave, 1957 to 2007 in 10-year steps, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable]

Software Industry’s Secret of Success: procedural personalization via RAM-based Machine Paradigm

Configware Success Story by new Machine Paradigm: structural personalization, RAM-based, before run time

Dr. Makimoto: FPL 2000 keynote



The History of Paradigm Shifts

“Mainstream Silicon Application is switching every 10 Years”

[Figure: Makimoto’s Wave, 1957 to 2007 in 10-year steps, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; FPGAs; coarse grain]



KressArray Family generic Fabrics: a few examples

• Select Function Repertory
• Select Nearest Neighbour (NN) Interconnect: select mode, number, width of NN ports
• more NN ports: rich Rout Resources
• rDPU modes: rout-through and function, or rout-through only
• Examples of 2nd Level Interconnect: laid out over the rDPU cell - no separate routing areas !
• Wired by Abutment

[Figure: an example rDPU with NN port labels 16, 32, 8, 24, 4, 2]
