ipdps 2004 software or configware? about the digital divide of parallel computing reiner hartenstein...

55
IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

Post on 19-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

IPDPS 2004

Software or Configware? About the

Digital Divide of Parallel Computing

Reiner Hartenstein

TU Kaiserslautern

Santa Fe, NM, April 26 - 30, 2004

Page 2: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de2

TU Kaiserslautern

Preface

The White House, Sept 2000:Bill Clinton condemns the Digital Divide in America:

World Economic Forum 2002: The Global Digital Divide,disparity between the "haves" and "have nots“

The Digital Divide of Parallel Computing:The Digital Divide of Parallel Computing:Access to Configware (CW) SolutionsAccess to Configware (CW) Solutions

access to the internet

Page 3: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de3

TU KaiserslauternThe „havenots“

Configware methodology to move data around more efficiently:

Configware engineering as a qualification for programming embedded systems*:

„havenots“ are found in the HPC community

The „havenots“ are our typical CS graduatesReconfigurable HPC is

torpedoed by deficits in education:

curricular revisions are overdue*) also HPC !

Page 4: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de4

TU KaiserslauternSoftware to Configware

Migration

this talk will illustrate the performance benfitwhich may be obtained from Reconfigurable Computing stressing coarse grain Reconfigurable Computing (RC),point of view, this talk hardly mentions FPGAs(But coarse grain may be always mapped onto FPGAs)

Software to Configware Migration is the most important source of speed-upHardware is just frozen Configware

Page 5: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de5

TU Kaiserslautern>> HPC <<

•HPC

•Embedded Computing

•The wrong Roadmap

•Configware Engineering

•Dual Machine Paradigms

•Speed-up Examples

•Final Remarkshttp://www.uni-kl.de

Page 6: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de6

TU Kaiserslautern

Earth Simulator

5120 Processors, 5000 pins eachES 20: TFLOPS

Crossbar weight: 220 t, 3000 km of cable,moving data around

inside the

Page 7: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de7

TU Kaiserslautern

data are moved around by software

(slower than CPU clock by 2 orders of magnitude)

i.e. by memory-cycle-hungry instruction streams which fully hit the memory wall

extremely unbalanced

stolen from Bob Colwell

CPU

Page 8: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de8

TU Kaiserslautern

path of least resistance*:avoiding a paradigm shift

Many researchers seem never to stop working on sophisticated solutions for marginal improvements ...... continously ignoring methodologies promising speed-ups by orders of magnitude ....

blinders to ignore the impact of morphware

... continue to bang their heads against the memory

wallinstead

of

*) [Michel Dubois]

Page 9: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de9

TU Kaiserslautern

the data-stream-based approach

has no von Neumann bottle-neck

has no von Neumann bottle-neck

… understand only this parallelism solution:

the instruction-stream-based approach

von Neuman

n bottle-necks

von Neuman

n bottle-necks

... c

annot

cope w

ith

this

one

Page 10: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de10

TU Kaiserslautern>> Embedded Computing

<<

•HPC

•Embedded Computing

•The wrong Roadmap

•Configware Engineering

•Dual Machine Paradigms

•Speed-up Examples

•Final Remarkshttp://www.uni-kl.de

Page 11: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de11

TU Kaiserslautern

History of Machine Models

1957

1967

1977

1987

1997

2007

mainframe age

mainframe.

compile

procedural mind set: instruction-stream-based

procedural mind set: instruction-stream-based

(coordinates by Makimtos wave)

computer age (PC age)

accel.µProc.

compile

users: RIKEN institute, ARI, Heidelberg, etc.

MD-GRAPE-2 PCI board [1997]4 chips for N-body simulation

converts a PC to 64 GFlops

scientific computing example:molecular dynamics, astrophysics ,

plasma physics, hydrodynamics:

Page 12: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de12

TU Kaiserslautern

History of Machine Models

1957

1967

1977

1987

1997

2007

mainframe age

mainframe.

compile

procedural mind set: instruction-stream-based

procedural mind set: instruction-stream-based

(coordinates by Makimtos wave)

computer age (PC age)

accel.µProc.

compile

structural mind set: data-stream-basedstructural mind set: data-stream-based

by hardware guysby hardware guysdesign

Page 13: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de13

TU Kaiserslautern

the hardware / Software Chasm:

typical programmers don‘t understand function evaluation without machine mechanisms (counters, state registers)

It‘s the gap between procedural (instruction-stream-based) and structural (datastream-based) mind set

acceleratorsacceleratorsµprocessorµprocessor

Page 14: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de14

TU Kaiserslautern

Growth Rate of Embedded Software

1

2

0 10 12 18 months

factor

*) Department of Trade and Industry, London

(1.4/year)

[Moore

’s law]

>10 times more programmers will write embedded applications

than computer software by 2010

Embe

dded

sof

twar

e [D

TI*]

(~2.

5/yr

)

already to-day, more than 98% of all microprocessors are

used within embedded systems

Page 15: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de15

TU Kaiserslauterntypical CS graduates: the

„havenots“

To-day, „typical“ CS graduates are unqualified for this labor market

… cannot cope with Hardware / Configware / Software partitioning issues

… cannot implement Configware

Page 16: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de16

TU Kaiserslautern

the current CS mind set is based on the Submarine

Model

Hardware invisible:under the surface

Hardware

Algorithm

Assembly Language

procedural high level Programming

Language

SoftwareThis model does not suport Hardware / Configware / Software partitioning

Page 17: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de17

TU Kaiserslautern

Hardware / Configware / Software Partitioning skills urgently needed

Algorithm

partitioning

HWHW

CWCW

SWSW

to cope with each of it:SW, CW, HW

.

SW/HWSW/HW

SW/CW/HWSW/CW/HWSW/CWSW/CW

CW/H

WCW

/HW

or: to cope with anycombination of co-design

.

Software to Configware Migration is the most important source of speed-up

Hardware is just frozen Configware

Page 18: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de18

TU Kaiserslautern

By the way ...

http://fpl.org

International Conference on Field-Programmable Logic

and Applications (FPL)

Aug. 20 – Sept 1, 2004, Antwerp, Belgium

288 submissions !

... the oldest and largest conference in the field:

accel.µProc.... going into every type of application

they all work on high performance

Page 19: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de19

TU Kaiserslautern

Dominance of the Submarine Model ...

Hardware

... indicates, that our CS education system produces

zillions of mentally disabled CS graduates

(procedural) structurallydisabled

… disabled to cope with solutions other than

instruction-stream-based

Page 20: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de20

TU Kaiserslautern

CS Education

proceduralhave not

You cannot *teach Hardware to a Programmer*) efficiently

But to a Hardware Guy you always can teach

Programming

structuralhave natural

Page 21: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de21

TU Kaiserslautern>> the wrong Roadmap <<

•HPC

•Embedded Computing

• the wrong Roadmap

•Configware Engineering

•Dual Machine Paradigms

•Speed-up Examples

•Final Remarkshttp://www.uni-kl.de

Page 22: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de22

TU Kaiserslautern

Completely wrong roadmap

beef up old architectural principles by new technology?

growth factor

µm

0.1

performance

area efficiency

„Pollack‘s Law“ (simplified)

[intel]

... the CPU is a methusela, the steam

engine of the silicon age

Page 23: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de23

TU Kaiserslautern

Completely wrong mind set

The key problem, the memory wall, cannot be solved by new CPU technology

We need a 2nd machine paradigm (a 2nd mind set ...)

The vN paradigm is not a communication paradigmIts monopoly creates a completely wrong mind set

We need an architectural communication paradigmBut we need both paradigms: a dichotomy

Page 24: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de24

TU Kaiserslautern3rd machine model became

mainstream

1957

1967

1977

1987

1997

2007

computer age (PC age)

accel.

design

µProc.

compile

(Makimtos wave)

mainframe age

mainframe

compile

instruction-stream-based DPArrµProc.

programma

bleprogramma

ble

most CS curricula &

HPC are still here

most CS curricula &

HPC are still here

morphware age

Page 25: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de25

TU Kaiserslautern>> Configware Engineering <<

•Supercomputing (HPC)

•Embedded Computing

•The wrong Roadmap

•Configware Engineering

•Dual Machine Paradigms

•Speed-up Examples

•Final Remarkshttp://www.uni-kl.de

Page 26: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de26

TU Kaiserslautern

de facto Duality of RAM-based platforms

traditional new

RAM-based platform CPU morphware (FPGA, rDPA ..)

„running“ on it: software configware

machine paradigm von Neumann etc.:instruction-stream-based

anti machine: data-stream-based

2nd paradigm

We now have 2 types of programmable platforms

soft hardwaremorphware[DARPA]

hardware viewed as frozen configware: Just earlier binding

Page 27: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de27

TU Kaiserslautern

[Gordon Bell]

... going into every type of application

[Gordon Bell]

.... the brain hurts

CW has become mainstream ...

Others experienced, that the brain hurts, when trying the paradigm shift

The HPC scene believed to be smart, when smiling about us CW guys

morphware: fastest growing sector of the IC market

Page 28: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de28

TU Kaiserslautern

DPA

morphware age

rr

From Software to Configware Industry

structuralpersonalization:

RAM-based

Repeat Success Story bya 2nd Machine Paradigm !

Growing Configware IndustryGrowing Configware Industry

1957

1967

1977

1987

1997

2007

computer age (PC age)

µProc.

compileProcedural

personalization via RAM-based .

Machine Paradigm

Software IndustrySoftware Industry

1)2)

Software Industry’sSecret of Success anti machine

Page 29: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de29

TU Kaiserslautern

benefit from RAM-based & 2nd paradigm

RAM-based platform needed for:• flexibility, programmability

• avoiding the need of specific silicon

mask cost: currently 2 mio $ - rapidly growing

1)

simple 2nd machine paradigm needed as a common model:• to avoid the need of circuit expertize

• needed to to educate zillions of programmers

2)

(vN relocatability is based on the von Neumann bottleneck) high price

By the way: relocatability is more difficult, but not hopeless

Page 30: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de30

TU Kaiserslautern

configware resources: variable

Nick Tredennick’s Paradigm Shifts explain the differences

2 programming sources needed

flowware algorithm: variable

Configware EngineeringConfigware Engineering

Software EngineeringSoftware Engineering

1 programming source needed

algorithm: variable

resources: fixedsoftware

CPU

Page 31: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de31

TU Kaiserslautern

Compilation: Software vs. Configware

source program

softwarecompiler

software code

Software Engineeri

ng

Software Engineeri

ng

configware code

mapper

configwarecompiler

scheduler

flowware code

source „program“

Configware

Engineering

Configware

Engineering

placement &

routing

data

Page 32: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de32

TU Kaiserslautern

DPA

xxx

xxx

xxx

|

||

x x

x

x

x

x

x x

x

- -

-

input data streams

xx

x

x

x

x

xx

x

--

-

-

-

-

-

-

-

-

-

-

xxx

xxx

xxx

|

|

|

|

|

|

|

|

|

|

|

|

|

|output data streams

„data

streams“ time

port #

time

time

port #time

port #

Flowware defines: ... which data item at which time at which port

Flowware programs data streams

Page 33: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de33

TU Kaiserslautern

Flowware: not new

1957

1967

1977

1987

1997

2007

computer age (PC age)

accel.

design

µProc.

compile

(Makimtos wave)

mainframe age

mainframe

compile

DPArrµProc.

morphware age

*) no confusion, please: no „dataflow

machine“ !!!

*) no confusion, please: no „dataflow

machine“ !!!data stream* ...Flowware:

around 1980

Page 34: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de34

TU Kaiserslauterndata streams*: not new

1980: data streams (Kung, Leiserson: systolic arrays)

1989: data-stream-based Xputer architecture

1990: rDPU (Rabaey)

1994: Flowware Language MoPL (Becker et al.)

1995: super systolic array (rDPA) + DPSS tool (Kress)

1996+: Stream-C language, SCCC (Los Alamos), SCORE, ASPRC, Bee (UC Berkeley), ...

1996+: streaming languages (Stanford et al.)

1996: configware / software partitioning compiler (Becker)

*) please, don‘t

confuse with „data

flow“

Page 35: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de35

TU Kaiserslautern>> Dual Machine Paradigms <<

•HPC

•Embedded Computing

•The wrong Roadmap

•Configware Engineering

•Dual Machine Paradigms

•Speed-up Examples

•Final Remarkshttp://www.uni-kl.de

Page 36: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de36

TU Kaiserslautern

Why a new machine

paradigm ???The anti machine as the 2nd paradigm is the key to curricular innovation

rDPArDPAµprocessorµprocessor

... a Troyan horse to introduce the structural domain to the procedural-only mind set of programmers

Programming by flowware instead of software is very easy to learn

Flowware education: no fully fledged hardware expert needed to program embedded systems

(... same language primitives)

Page 37: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de37

TU Kaiserslautern

von Neumann vs. anti machine

programcounter

DPUCPU

RAMmemory

von Neumann bottleneck

(r)DPA

withoutsequencerno

CP

U !

asMA: auto-sequencing Memory Array

........

asM

asM

asM

asM

asM

(r)DPA

....

....

data stream machine

(anti machine)

datacounter

memory

bank

asM

asM: auto-sequencing Memory

instruction stream machine(von Neumann etc.)

Page 38: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de38

TU Kaiserslautern

Behavior of the Counter

datacounter

memory

bank

asM

asM

asM

asM

asM

asM

........

programcounter

DPUCPU

programme

d

by flowwaredata

streams

programm

e

d by software

(r)DPA

programm

e

d by flowware

Page 39: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de39

TU Kaiserslautern

Counters: the same micro architecture ?

data stream machine (anti

machine)

datacounter

memory

bank

asM

programcounter

DPUCPU

instruction stream machine:(von Neumann etc.)

yes, is possible, but for data counters ...

*) for history of AGUs see Herz et al.: Proc. ICECS 2002, Dubrovnik, Croatia

... a much better AGU methodology is available*

AGU: address generator unit

Page 40: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de40

TU Kaiserslauterncommercial rDPA example:

PACT XPP - XPU128

XPP128 rDPA

• Evaluation Board available, and • XDS Development Tool with Simulator

buses not

shown

rDPU

CF

G

PAE

core

ALU CtrlALU

CF

GC

FG

PAE

core

CF

GC

FG

PAE

core

PAE

core

ALU CtrlALUALU CtrlALU

CF

GC

FG

CF

GC

FG

• Full 32 or 24 Bit Design working silicon • 2 Configuration Hierarchies

© PACT AG, Munich http://pactcorp.com

(r)DPA

Page 41: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de41

TU Kaiserslautern

rDPU not used used for routing only operator and routing port location markerLegend: backbus connect

array size: 10 x 16 = 160 rDPUs

mapping algorithms efficently onto rDPA

rout thru only

not usedbackbus connect

SNN filter on KressArray

[Ulrich Nageldinger]

by DPSS: based on simulated annealing

Page 42: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de42

TU Kaiserslautern

symbiosis of machine models

1957

1967

1977

1987

1997

2007

computer age (PC age)

accel.

design

µProc.

compile

(Makimtos wave)

mainframe age

mainframe

compile

morphware age

DPArrµProc.replace PC by PS

co-compiler

symbiosis

Page 43: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de43

TU Kaiserslautern

Software / Configware Co-Compilation

Analyzer/ Profiler

SW code

SWcompiler

paradigm“vN" machine

CW Code

CWcompiler

anti machineparadigm

Partitioner

Resource Parameters

supportingdifferentplatforms

Jürgen Becker’s CoDe-X, 1996

High level PL source

Page 44: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de44

TU Kaiserslautern>> Speed-up Examples <<

•HPC

•Embedded Computing

•The wrong Roadmap

•Configware Engineering

•Dual Machine Paradigms

•Speed-up Examples

•Final Remarkshttp://www.uni-kl.de

Page 45: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de45

TU Kaiserslautern

Better solutions by Configware

Memory cycles minimizede.g.: no instruction fetch at run time & other

effects

Memory access for data: caches do not help anyhowLoop xforms: no intra-stream data memory cyclesComplex address computation: no memory cycles

No cache misses!

instead of software

methodologies not new: high level synthesis (1980+)loop transformations (1970+)

many other areas

Page 46: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de46

TU Kaiserslautern

speed-up examples

platform application example speed-up factor method

PACT Xtreme4-by-4 array[2003]

16 tap FIR filter x16 MOPS/mWstraight forward

MoM anti machine with DPLA* [1983]

grid-based DRC** 1-metal 1-poly nMOS*** 256 reference patterns

> x1000(computation

time)

multipleaspects

*) DPLA: MPC fabr. via E.I.S. multi univ. project

key issue: algorithmic cleverness

**) Design Rule Check

CPU 2 FPGA [FPGA 2004]

migrate several simple application exampes

x7 – x46(compute

time)

hi levelsynthesis

***) for 10-metal 3-poly cMOS expected: >> x10,000

DSP 2 FPGA [Xilinx 20042]

from fastest DSP: 10 gMACs to 1 teraMAC

X 100(compute

time)not spec.

2) Wim Roelandts

Page 47: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de47

TU KaiserslauternSoftware to Configware Migration:

(RAW’99 at Orlando)

question by a highly respected industrial senior researcher:

Ulrich Nageldinger‘s talk about KressArray Xplorer:

„But you can‘t implement decisions!“

(symptom of the configware/software chasm)

Page 48: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de48

TU Kaiserslautern

hypothetical branching example to illustrate time-to-space

migration

*) if no intermed. storage in register file

C = 1simple conservative CPU example

memory cycles

nanoseconds

if C then read A

read instruction 1 100instruction decoding

read operand* 1 100operate & reg. transfers

if not C then read B

read instruction 1 100instruction decoding

add & store

read instruction 1 100instruction decoding

operate & reg. transfers

store result 1 100

total 5 500

S = R + (if C then A else B endif);

S

+

ABR C

clock200 MHz(5 nanosec)

=1

sect

ion

of a

maj

or p

ipe

netw

ork

on rD

PA

no m

emor

y cy

cles

:

no m

emor

y cy

cles

:

spee

d-up

fac

tor

= 1

00

spee

d-up

fac

tor

= 1

00

Page 49: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de49

TU Kaiserslautern

rDPA (coarse grain) vs. FPGA (fine grain)

commoditycommodity

roughly:area

efficiency(trans/chip,

o‘o‘ magnitude)

hardwired 4

FPGA 2

µProc 0

rDPA 4

roughly:performance

(MOPS/mW, o‘o‘

magnitude)

hardwired 3

FPGA 2

µProc 0

rDPA 3

DSP 1

Status: ~1998

Page 50: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de50

TU Kaiserslautern

Why the speed-up ...

... although FPGA is clock slower by x 3 or even more(most know-how from „high level synthesis“ discipline)

moving operator to the data stream (before run time)

support operations: no clock nor memory cycle

decisions without memory cycles nor clock cycles

most „data fetch“ without memory cycle

Page 51: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de51

TU Kaiserslautern>> Final Remarks <<

http://www.uni-kl.de

•HPC

•Embedded Computing

•The wrong Roadmap

•Configware Engineering

•Dual Machine Paradigms

•Speed-up Examples

•Final Remarks

Page 52: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de52

TU Kaiserslautern

First Indications of Change

10th RAW at IPDPS, Nice, France, April 2003: after a decade of non-overlap: first IPDPS people coming

HPC Asia 2004 - 7th Int‘l Conference on High Performance Computing, July 20-22, 2004 Omiya Sonic City, Tokyo Area, Japan: Workshop on Reconfigurable Systems f. HPC (RHPC) + keynote address*

HPCA-11, 11th International Symposium on High-Performance Computer Architecture, San Francisco, Febr. 12-16, 2005: topic area explicitely: Embedded and reconfigurable architectures

SBAC-PAD 2004 - 16th Symposium on Computer Architecture and High Performance Computing, Foz do Iguacu, PR, Brazil, October 27-29, 2004: topic area explicitely: Reconfigurable Systems

*) keynote speaker:

PARS & Speed-up, Basel, Switzerland, March 2003: keynote address*

IPDPS, Santa Fe, NM, USA, April 2004: keynote address*

PDP’04, La Coruna, Spain, Febr. 2004: keynote address*

Page 53: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de53

TU Kaiserslautern

HPC experts coming ...

ARI, Astrononisches Rechen-Institut, founded 1700 in Berlin, moved 1945 to Heidelberg by

August Kopff

Gottfried Kirch

Simulation of Star Clusters: x10 speed-upby supercomputer-to-morphware migration

(also molecular biology et al.)

Rainer Spurzem, University of Heidelberg

Reinhard Männer, University of MannheimHPC pioneer since 1976 (Physics Dept Heidelberg)

Configware by

Astrophysics by

example: N-body problem went configwarepaper already

at FPL 1999http://fpl.org

Page 54: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de54

TU KaiserslauternConclusions

We need an academic grass roots movement, for ....

RC has become mainstream in all kinds of applications

... by a merger with the embedded systems mind set

CS education deficits: a curricular revision is overdue

...free material & tools for undergraduate lab courses to program and emulate small SW/CW/HW examples

all know-how needed readily available:

get involved !

Page 55: IPDPS 2004 Software or Configware? About the Digital Divide of Parallel Computing Reiner Hartenstein TU Kaiserslautern Santa Fe, NM, April 26 - 30, 2004

© 2004, [email protected] http://hartenstein.de55

TU Kaiserslautern

END