![Page 1: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/1.jpg)
© 2006, reiner@hartenstein.de
http://hartenstein.de
Reconfigurable Computing
Reiner Hartenstein
Computing Meeting, EU, ESU, Brussels, May 18, 2006
![Page 2: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/2.jpg)
The Pervasiveness of RC
[Figure: number of Google hits for "FPGA and …" queries, shown in two groups; the per-area labels are not recoverable]
# of hits by Google: 162,000; 127,000; 158,000; 113,000; 171,000; 194,000
# of hits by Google: 1,620,000; 915,000; 398,000; 272,000; 647,000; 1,490,000
ECE-savvy scene (mainstream for many years)
Math/SW-savvy scene (more recently: 2-3 years)
and many more areas
![Page 3: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/3.jpg)
The dominance of Configware
Most compute power is coming from Configware
More MIPS migrated to Configware than running as Software
![Page 4: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/4.jpg)
Reconfigurable Supercomputing (VHPC) going commercial
Cray XD1
silicon graphics RASC
… and other vendors
![Page 5: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/5.jpg)
>> Outline <<
•Reconfigurable Computing Paradox
•The Supercomputing Paradox
•We are using the wrong model
•Coarse-grained Reconfigurable Devices
•Super Pentium for Desktop Supercomputer
http://www.uni-kl.de
![Page 6: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/6.jpg)
The Reconfigurable Computing Paradox
poor FPGA technology: area-inefficient, slow, power-hungry, expensive
poor tools: tools and languages unacceptable to most users; even most hardware experts (86%**) hate their tools
poor education: RC education extremely poor, if offered at all; ignored by CS curricula; CS taught as if for a 50-year-old mainframe …
**) DeHon '98
![Page 7: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/7.jpg)
FPGA integration density
the effective integration density of plain FPGAs is behind Moore's law by more than 4 orders of magnitude
However, brilliant results everywhere: what paradox?
![Page 8: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/8.jpg)
speed-up factors published
[Figure: log-scale plot of relative performance (10^0, 10^3, 10^6, 10^9) versus year (1980 to 2010). Curve and annotation labels: Microprocessor (8080 to Pentium 4), Memory, 7%/yr, 50%/yr, x1.25/yr (Moore), FPGA x2/yr, pre-FPGA era, DPLA, MoM Xputer architecture. Speed-up bands: >1 OoM, >2 OoM, >3 OoM, <4 OoM.]
DSP and wireless: Reed-Solomon decoding 2400; Viterbi decoding 400; FFT 100; MAC 1000
Image processing, pattern matching, multimedia: real-time face detection 6000; video-rate stereo vision 900; pattern recognition 730; SPIHT wavelet-based image compression 457
Bioinformatics: Smith-Waterman pattern matching 288; BLAST 52; protein identification 40; molecular dynamics simulation 88
Astrophysics: GRAPE 20
crypto 1000
Los Alamos traffic simulation 47
Grid-based DRC (no FPGA: DPLA on MoM, by TU-KL) 2000; grid-based DRC ("fair comparison") 15000
Lee routing (by TU-KL) 160
2-D FIR filter [TU-KL] 39.4
http://xputers.informatik.uni-kl.de/faq-pages/fqa.html
![Page 9: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/9.jpg)
platform FPGAs: better area efficiency
DSP platform FPGA [courtesy Xilinx Corp.]:
500 MHz Flexible Soft Logic Architecture
200K Logic Cells
500 MHz Programmable DSP Execution Units
0.6-11.1 Gbps Serial Transceivers
500 MHz PowerPC™ Processors (680 DMIPS) with Auxiliary Processor Unit
1 Gbps Differential I/O
500 MHz multi-port Distributed 10 Mb SRAM
500 MHz DCM Digital Clock Management
DeHon's 1st Law (1996) was for plain FPGAs
![Page 10: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/10.jpg)
pre-FPGA era: Why DPLA* was so good
1. Large arrays of canonical Boolean expressions: classical PLA layout, highly area-efficient, close to Moore's law
Mid-1980s: at first only very tiny FPGAs were available: 1 DPLA replaced 256 of them
2. ASM: Auto-Sequencing Memory, with GAG (Generic Address Generator)**, a generalization of the DMA**, to avoid address computation overhead
Speed-up factor of 20 by reducing memory cycles, which is the key issue
*) fabricated 1984 by the E.I.S. multi-university project
**) for a survey by IMEC & TU-KL see: [M. Herz et al.: ICECS 2003, Dubrovnik]
![Page 11: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/11.jpg)
taxonomy of algorithms, better tools and better education
[Figure: the chart of published speed-up factors from page 8, repeated with two added annotations: "even higher speed-up?" and "consolidation?"]
![Page 12: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/12.jpg)
New dimensions of low power
Application migration [from supercomputer] resulting not only in massive speed-ups:
Electricity bills reduced by an order of magnitude or even more you may get for free … up to millions of dollars per year (also a matter of national energy policy)
Google, Amsterdam, NY
"Saves more than $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack" [Herb Riley, R. Associates]
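As a rough cross-check of the quoted figure, the yearly saving can be converted back into energy and average power. This is only a sketch: it assumes a flat tariff of 7¢/kWh, continuous operation for 8,760 hours per year, and an even split over the 64 processors. The function name `implied_savings` is illustrative and not from the talk.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored (assumption)

def implied_savings(dollars_per_year: float, dollars_per_kwh: float, processors: int):
    """Convert a yearly electricity-bill saving into energy and average power."""
    kwh_per_year = dollars_per_year / dollars_per_kwh
    avg_kw_per_rack = kwh_per_year / HOURS_PER_YEAR
    # Even split over all processors in the rack (assumption).
    avg_w_per_processor = avg_kw_per_rack * 1000 / processors
    return kwh_per_year, avg_kw_per_rack, avg_w_per_processor

kwh, kw_rack, w_proc = implied_savings(10_000, 0.07, 64)
print(f"{kwh:,.0f} kWh/yr, {kw_rack:.1f} kW per rack, {w_proc:.0f} W per processor")
```

Under these assumptions, a $10,000 saving corresponds to roughly 143,000 kWh per year, or about 16 kW less continuous draw per rack.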
![Page 13: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/13.jpg)
>> Outline <<
•Reconfigurable Computing Paradox
•The Supercomputing Paradox
•We are using the wrong model
•Coarse-grained Reconfigurable Devices
•Super Pentium for Desktop Supercomputer
http://www.uni-kl.de
![Page 14: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/14.jpg)
The Supercomputing Paradox
Growing listed Teraflops
Increasing number of processors running in parallel
COTS processor decreasing cost
promising technology
![Page 15: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/15.jpg)
HPC by classic supercomputing methodology: poor results
Extreme shortage of affordable capacity
Lack of scalability: progress only by innovation
More parallelism absorbs programmer productivity
Program ready: hardware obsolete
The law of More
Not for high performance embedded computing
![Page 16: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/16.jpg)
>> Outline <<
•Reconfigurable Computing Paradox
•The Supercomputing Paradox
•We are using the wrong model
•Coarse-grained Reconfigurable Devices
•Super Pentium for Desktop Supercomputer
http://www.uni-kl.de
![Page 17: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/17.jpg)
Why traditional supercomputing / HPC failed
instruction-stream-based: memory-cycle-hungry
the wrong way of moving the data around
because of the wrong multi-core interconnect architecture
extremely unbalanced
[Figure: CPU illustration, "stolen from Bob Colwell"]
![Page 18: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/18.jpg)
moving data around inside the Earth Simulator
Crossbar weight: 220 t, 3000 km of thick cable
![Page 19: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/19.jpg)
discarding the wrong road map
with a paradigm shift the same performance is feasible
on a single 19” rack
![Page 20: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/20.jpg)
Bringing together data and processor
Moving data to the processor by Software: moving the grand piano
![Page 21: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/21.jpg)
Key issues in very High Performance Computing (vHPC)
reducing memory cycles is the key issue
this needs a paradigm shift away from the dominance of instruction streams
![Page 22: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/22.jpg)
Here is the common model
[Diagram labels: CPU, software code, instruction-stream-based; accelerator (reconfigurable), accelerator (hardwired), configware code, data-stream-based; symbiotic]
it's not von Neumann
Von Neumann: the tail is wagging the dog
the vN monopoly in our curricula is severely harmful
we need dual paradigm education
very high performance & electricity bill issues
legacy issues
![Page 23: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/23.jpg)
The wrong basic mind set
our IT expert labor force lacks the right basic mind set
we need a dual paradigm approach
this is a severe educational challenge
![Page 24: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/24.jpg)
For high school and undergraduate education
we need a simple archetypal common model
instead of a wide variety of sophisticated architectures
this is a severe educational challenge
![Page 25: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/25.jpg)
>> Outline <<
•Reconfigurable Computing Paradox
•The Supercomputing Paradox
•We are using the wrong model
•Coarse-grained Reconfigurable Devices
•Super Pentium for Desktop Supercomputer
http://www.uni-kl.de
![Page 26: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/26.jpg)
integration density
the effective integration density of plain FPGAs is behind Moore's law by more than 4 orders of magnitude
the effective integration density of rDPAs* may come close to Moore's law
*) reconfigurable DataPath Arrays (coarse-grained reconfigurability)
![Page 27: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/27.jpg)
Coarse grain is about computing, not logic
SNN filter on KressArray (mainly a pipe network) [Ulrich Nageldinger]
array size: 10 x 16 = 160 rDPUs
rDPU: reconfigurable Data Path Unit, e.g. 32 bits wide
no CPU
Legend: rDPU not used; used for routing only (route-through); operator and routing; port location marker; backbus connect
![Page 28: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/28.jpg)
SW 2 coarse-grained CW migration example
[Figure: array of rDPUs, with an adder (+) producing S]
![Page 29: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/29.jpg)
Compare it to software solution on CPU
S = R + (if C then A else B endif);
simple conservative CPU example (C = 1):

| step | memory cycles | nanoseconds |
|---|---|---|
| if C then read A: read instruction | 1 | 100 |
| instruction decoding | | |
| read operand* | 1 | 100 |
| operate & reg. transfers | | |
| if not C then read B: read instruction | 1 | 100 |
| instruction decoding | | |
| add & store: read instruction | 1 | 100 |
| instruction decoding | | |
| operate & reg. transfers | | |
| store result | 1 | 100 |
| total | 5 | 500 |

[Figure: array of rDPUs, with an adder (+) producing S; clock 200]
![Page 30: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/30.jpg)
hypothetical branching example to illustrate software-to-configware migration
S = R + (if C then A else B endif);
simple conservative CPU example (C = 1): the step sequence from page 29, total 5 memory cycles, 500 nanoseconds
*) read operand: if no intermediate storage in register file
[Figure: configware datapath with inputs A, B, R, C, a "=1" element and an adder (+) producing S; clock 200 MHz (5 nanosec)]
no memory cycles: speed-up factor = 100
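The arithmetic behind the speed-up factor can be spelled out as a small worked example. It is a sketch under the slide's own assumptions: each memory cycle costs 100 ns, and the configware datapath delivers its result within one 200 MHz clock period with no memory cycles. The step names and helper functions are illustrative.

```python
MEMORY_CYCLE_NS = 100  # per the slide: 100 ns per memory cycle

# Memory cycles of the conservative CPU sequence for C = 1 (from the slide's table).
cpu_memory_cycles = {
    "read instruction (if C then read A)": 1,
    "read operand A": 1,
    "read instruction (if not C then read B)": 1,
    "read instruction (add & store)": 1,
    "store result": 1,
}

def cpu_time_ns(cycles: dict, cycle_ns: int = MEMORY_CYCLE_NS) -> int:
    """Time spent in memory cycles; decoding and register transfers are not counted."""
    return sum(cycles.values()) * cycle_ns

def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

cpu_ns = cpu_time_ns(cpu_memory_cycles)  # 5 cycles * 100 ns
cw_ns = clock_period_ns(200)             # one clock period at 200 MHz
speedup = cpu_ns / cw_ns
print(cpu_ns, cw_ns, speedup)  # 500 5.0 100.0
```

The factor of 100 comes entirely from eliminating memory cycles, since the example counts neither instruction decoding nor register transfers on the CPU side.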
![Page 31: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/31.jpg)
Why the speed-up? What's the difference?
moving the locality of operation into the route of the data stream by P&R
instead of moving data by instruction streams
![Page 32: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/32.jpg)
Bringing together data and processor
Move the stool by Configware
Place the location of execution into the data pipe
![Page 33: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/33.jpg)
Data-stream-based
execution should be transport-triggered instead of instruction-triggered
transport should be done within compiled pipelines, not by move engines*
*) which are instruction-stream-based!
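A loose software analogy may help to picture a compiled pipeline. In this sketch the stages are wired together once ("configuration"), and each stage then fires whenever a data item arrives, with no per-item instruction fetch deciding what to do next. Python generators are pull-based and run on a CPU, so this only illustrates the structure, not the hardware behaviour. The stage names and the token layout `(R, A, B, C)` are assumptions that reuse the expression S = R + (if C then A else B) from the migration example.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Token = Tuple[int, int, int, int]  # (R, A, B, C), an illustrative layout

def mux_stage(tokens: Iterable[Token]) -> Iterator[Tuple[int, int]]:
    """Select A or B according to C for every token that arrives."""
    for r, a, b, c in tokens:
        yield r, (a if c else b)

def add_stage(pairs: Iterable[Tuple[int, int]]) -> Iterator[int]:
    """Add R to the selected operand for every pair that arrives."""
    for r, x in pairs:
        yield r + x

def configure_pipeline() -> Callable[[Iterable[Token]], Iterator[int]]:
    """One-time 'configuration': fix the wiring mux -> add."""
    def run(tokens: Iterable[Token]) -> Iterator[int]:
        return add_stage(mux_stage(tokens))
    return run

pipeline = configure_pipeline()
stream: List[Token] = [(1, 10, 20, 1), (2, 10, 20, 0), (3, 5, 7, 1)]
results = list(pipeline(stream))
print(results)  # [11, 22, 8]
```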
![Page 34: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/34.jpg)
For high school and undergraduate education
we should send CTOs and professors back to school
this is a severe educational challenge
![Page 35: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/35.jpg)
The wrong model
[Figure: the SNN filter on KressArray schematic from page 27 (array size 10 x 16 = 160 rDPUs, no CPU) [Ulrich Nageldinger]]
upon this schematic … a question by a Japanese corporate vVIP
![Page 36: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/36.jpg)
The wrong mind set ....
"but you can't implement decisions!" (Question by a Japanese Corporate vVIP: [RAW'99])
[Figure: configware datapath with inputs A, B, R, C, a "=1" element and an adder (+) producing S; clock 200 MHz (5 nanosec)]
not knowing this solution: symptom of the hardware / software chasm and the configware / software chasm
We need Reconfigurable Computing Education
![Page 37: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/37.jpg)
>> Outline <<
• Reconfigurable Computing Paradox
• The Supercomputing Paradox
• We are using the wrong model
• Coarse-grained Reconfigurable Devices
• Super Pentium for Desktop Supercomputer
http://www.uni-kl.de
![Page 38: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/38.jpg)
some Goals
Universal HPC co-architecture for:
embedded vHPC (nomadic, automotive, ...)
desktop vHPC (scientific computing ...)
Application co-development environment for:
Hardware non-experts, ....
Acceptability by software-type users, ...
Meet product lifetime >> embedded syst. life:
FPGA emulation logistics from development down to maintenance and repair stations
examples: automotive, aerospace, industrial, ..
![Page 39: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/39.jpg)
Architecture: A potential Pentium successor
The Desk-top Supercomputer!
Discard most caches
have 64* cores, 0.5 - 1 GHz
with clever interconnect for:
▪ concurrent processes
▪ and for multithreading,
▪ and, for Kung-Kress pipe network
*) CPU mode / DPU mode capability
[Figure labels: CPU mode, DPU mode]
![Page 40: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/40.jpg)
"Super Pentium" configuration example
twin paradigm machine
[Figure: array of cores, most of them labelled rDPU and several labelled CPU]
![Page 41: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/41.jpg)
World TV & game console & multi media center
e.g.: ~ 8 x 8 rDPA: all feasible under 500 MHz
[Figure: block diagram with labels: Games, Music, Videos, SMeXPP, Camera, Baseband Processor, Radio Interface, Audio Interface, SD/MMC Cards, LCD Display, rDPA]
• Variable resolutions and refresh rates
• Variable scan mode characteristics
• Noise Reduction and Artifact Removal
• High performance requirements
• Variable file encoding formats
• Variable content security formats
• Variable Displays
• Luminance processing
• Detail enhancement
• Color processing
• Sharpness Enhancement
• Shadow Enhancement
• Differentiation
• Programmable de-interlacing heuristics
• Frame rate detection and conversion
• Motion detection & estimation & compensation
• Different standards (MPEG2/4, H.264)
• A single device handles all modes
http://pactcorp.com
![Page 42: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/42.jpg)
feasible under 500 MHz
means low electricity cost and allows very high integration density
![Page 44: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/44.jpg)
Dual Paradigm Application Development Support
[Diagram labels: high level language; software/configware co-compiler; software code, instruction-stream-based, CPU; configware code, data-stream-based, accelerator (reconfigurable), accelerator (hardwired)]
placement & routing in the compiler optimizes interconnect bandwidth by preferring nearest neighbor connect
![Page 45: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/45.jpg)
Software / Configware Co-Compilation
Juergen Becker's CoDe-X, 1996
[Diagram labels: C language source; Partitioner; SW compiler; CPU; CW compiler; Placement & Routing (Move the Locality of Operation); rDPU array; Resource Parameters, supporting different platforms]
![Page 46: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/46.jpg)
Software / Configware very high level Synthesis
[Diagram labels: Math formula ....; term-rewriting-based vhl synthesis system [Arvind, or Mauricio Ayala]; software code, instruction-stream-based, CPU; configware code, data-stream-based, accelerator (reconfigurable), accelerator (hardwired)]
![Page 47: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/47.jpg)
>> Conclusions <<
•Reconfigurable Computing Paradox
•The Supercomputing Paradox
•We are using the wrong model
•Coarse-grained Reconfigurable Devices
•Super Pentium for Desktop Supercomputer
•Conclusions http://www.uni-kl.de
![Page 48: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/48.jpg)
Objectives
for every area which needs:
cheap, compact vHPC
flexibility (for accelerators)
avoiding specific silicon
rapid prototyping, field-patching, emulation
![Page 49: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/49.jpg)
Conclusion (1)
Reconfigurable Computing opens many spectacular new horizons:
Cheap vHPC without needing specific silicon, no mask ....
Cheap embedded vHPC
Cheap desktop supercomputer (a new market)
Massive reduction of the electricity bill: locally and nationally
Fast and cheap prototyping
Replacing expensive hardwired accelerators
Supporting fault tolerance, self-repair and self-organization
Flexibility for systems with unstable multiple standards by dynamic reconfigurability
Emulation logistics for very long term spare part provision and part type count reduction (automotive, aerospace …)
![Page 50: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/50.jpg)
Conclusion (2)
Needed:
Universal vHPC co-architecture demonstrator
select killer applications for demo
The compilation tool problem to be solved
Language selection problem to be solved
Education backlog problems to be solved
For widely spreading its use successfully:
Use this to develop a very good high school and undergraduate lab course
A motivator: preparing for the top 500 contest
![Page 54: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/54.jpg)
Compilation: Software vs. Configware
Software Engineering: source program (C, FORTRAN, MATLAB) → software compiler → software code
Configware Engineering: source "program" → configware compiler, consisting of a mapper (placement & routing), which yields the configware code, and a scheduler, which yields the flowware code (data)
![Page 55: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/55.jpg)
Nick Tredennick's Paradigm Shifts explain the differences
Software Engineering (CPU): resources: fixed; algorithm: variable; 1 programming source needed (software)
Configware Engineering: resources: variable; algorithm: variable; 2 programming sources needed (configware, flowware)
![Page 56: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/56.jpg)
Co-Compilation
[Figure: Software / Configware Co-Compiler]
Source: C, FORTRAN, MATLAB
automatic SW / CW partitioner (simulated annealing)
software compiler -> software code
configware compiler (mapper, data scheduler) -> configware code and flowware code
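The automatic SW / CW partitioner with simulated annealing can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the talk: the task table `TASKS`, the area budget, the penalty-based cost model, the single-flip move, and the geometric cooling schedule.

```python
import math
import random

# Hypothetical task table: name -> (software time, configware time, configware area).
TASKS = {
    "fir":    (90.0, 6.0, 40),
    "fft":    (120.0, 10.0, 60),
    "parse":  (15.0, 14.0, 30),
    "ctrl":   (8.0, 9.0, 25),
    "matmul": (200.0, 12.0, 70),
}
AREA_BUDGET = 180        # assumed amount of reconfigurable area
AREA_PENALTY = 1000.0    # assumed cost per area unit above the budget


def cost(assign):
    """Total run time plus a penalty for exceeding the area budget."""
    time = sum(TASKS[t][1] if on_cw else TASKS[t][0] for t, on_cw in assign.items())
    area = sum(TASKS[t][2] for t, on_cw in assign.items() if on_cw)
    return time + AREA_PENALTY * max(0, area - AREA_BUDGET)


def partition(seed=0, steps=5000, t_start=50.0, t_end=0.01):
    """Return (assignment, cost); assignment maps task -> True if moved to configware."""
    rng = random.Random(seed)
    names = sorted(TASKS)
    current = {t: False for t in names}          # start with everything in software
    cur_cost = cost(current)
    best, best_cost = dict(current), cur_cost
    for i in range(steps):
        # geometric cooling from t_start down to t_end
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        t = rng.choice(names)
        current[t] = not current[t]              # move: flip one task SW <-> CW
        new_cost = cost(current)
        accept = new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / temp)
        if accept:
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = dict(current), cur_cost
        else:
            current[t] = not current[t]          # reject: undo the flip
    return best, best_cost
```

Calling `partition()` returns the best assignment seen during the run together with its cost. A real partitioner would also have to model communication between the software and configware sides, which this sketch leaves out.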
![Page 57: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/57.jpg)
Co-Compiler for Hardwired Kress/Kung Machine [e.g. Brodersen]
[Figure: Software / Flowware Co-Compiler]
source -> automatic SW / CW partitioner
software compiler -> software code
flowware compiler (data scheduler) -> flowware code
![Page 58: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/58.jpg)
The first archetype machine model
"von Neumann": mainframe, CPU
compile or assemble: procedural personalization
personalization: RAM-based
instruction-stream-based mind set
Software Industry's Secret of Success: simple basic Machine Paradigm
![Page 59: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/59.jpg)
The 2nd archetype machine model
"Kress-Kung": reconfigurable accelerator
compile: structural personalization
personalization: RAM-based
data-stream-based mind set
Configware Industry's Secret of Success: simple basic Machine Paradigm
![Page 60: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/60.jpg)
Co-Compiler Enabling Technology
is available from academia
only a small team needed for commercial re-implementation
on the road map to the Personal Supercomputer
![Page 61: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/61.jpg)
Data streams (pipe network)
H. T. Kung paradigm (systolic array)
[Figure: DPA with input data streams entering and output data streams leaving; each stream is plotted over the axes time and port #]
"data streams" define: ... which data item at which time at which port
implemented by distributed memory
ASM: Auto-Sequencing Memory (data counter, GAG, RAM)
50 & more on-chip ASM are feasible
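The idea that a data stream defines which data item arrives at which time at which port can be sketched with a small systolic matrix-vector product. This is only an illustration under assumptions: the function names `input_streams` and `systolic_matvec`, the linear array with one accumulator per processing element, and the skewed feeding order are all chosen here and do not come from the slides.

```python
def input_streams(A):
    """Schedule as a mapping (time, port) -> data item.

    Row i of A is fed to port i, delayed by i time steps (skewed streams).
    """
    n_rows, n_cols = len(A), len(A[0])
    return {(j + i, i): A[i][j] for i in range(n_rows) for j in range(n_cols)}


def systolic_matvec(A, x):
    """Compute A @ x on a simulated linear array, one PE per matrix row."""
    n_rows, n_cols = len(A), len(x)
    schedule = input_streams(A)
    pipe = [None] * n_rows      # x items travelling from PE to PE
    acc = [0] * n_rows          # one accumulator per processing element
    for t in range(n_cols + n_rows - 1):
        # shift the x pipeline by one PE; a new x item enters at PE 0
        pipe = [x[t] if t < n_cols else None] + pipe[:-1]
        for port in range(n_rows):
            a = schedule.get((t, port))
            if a is not None and pipe[port] is not None:
                acc[port] += a * pipe[port]
    return acc
```

The schedule dictionary is the point of the sketch: no instruction stream is involved, and the result follows purely from the arrival times of the data items at the ports.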
![Page 62: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/62.jpg)
The Generalization of the Systolic Array
Systolic array: only for applications with regular data dependencies
remedy?
discard algebraic synthesis methods
[R. Kress]: use optimization algorithms, e.g. simulated annealing
Achievement: also non-linear and non-uniform pipes, and even wilder pipe structures, are possible
reconfigurability makes sense
Kress-Kung paradigm: super systolic array
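How an optimization algorithm can replace algebraic synthesis may be easier to see in a small placement sketch: an irregular pipe network is mapped onto a grid of processing elements by simulated annealing. The graph `EDGES`, the 3 x 2 grid, the Manhattan wire-length cost, and the swap move are assumptions made for illustration, not the method of the cited work.

```python
import math
import random

# Hypothetical dataflow graph: an irregular (non-linear, non-uniform) pipe network.
EDGES = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"),
         ("c", "e"), ("e", "f"), ("b", "f")]
GRID_W, GRID_H = 3, 2    # assumed array of 3 x 2 processing elements


def wire_length(place):
    """Sum of Manhattan distances over all edges (assumed cost function)."""
    return sum(abs(place[u][0] - place[v][0]) + abs(place[u][1] - place[v][1])
               for u, v in EDGES)


def anneal_placement(seed=1, steps=4000, t_start=5.0, t_end=0.01):
    """Return (placement, cost); placement maps node -> (x, y) grid cell."""
    rng = random.Random(seed)
    nodes = sorted({n for e in EDGES for n in e})
    cells = [(x, y) for y in range(GRID_H) for x in range(GRID_W)]
    rng.shuffle(cells)
    place = dict(zip(nodes, cells))              # random initial placement
    cur = wire_length(place)
    best, best_cost = dict(place), cur
    for i in range(steps):
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        u, v = rng.sample(nodes, 2)
        place[u], place[v] = place[v], place[u]  # move: swap two nodes
        new = wire_length(place)
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best_cost:
                best, best_cost = dict(place), cur
        else:
            place[u], place[v] = place[v], place[u]  # reject: undo the swap
    return best, best_cost
```

Because the cost function is arbitrary, nothing in the loop requires the graph to be regular, which is why this style of mapping is not limited to applications with regular data dependencies.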
![Page 63: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/63.jpg)
ASM: Auto-Sequencing Memory (Kress-Kung machine paradigm)
drastically reducing memory cycles
Data Counter instead of Program Counter
Generalization of the DMA
[Figure: ASM built from data counter, GAG and RAM]
GAG & enabling technology: multiple publications 1989 …
Survey paper: [M. Herz et al.*: IEEE ICECS 2003, Dubrovnik]
Storage Scheme optimization methodology, etc.*
*) IMEC, Leuven & TU-KL
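A minimal sketch may help to picture a data counter driving a memory instead of a program counter. It assumes a simple 2-D window scan as the address pattern; the names `address_stream` and `AutoSequencingMemory` and all parameters are hypothetical and only stand in for the address generator (GAG) and the RAM named on the slide.

```python
def address_stream(base, row_stride, width, height, x_step=1, y_step=1):
    """Yield RAM addresses of a 2-D window scan, row by row.

    The loop state (x, y) plays the role of the data counter: addresses are
    generated by counting, not fetched or computed by an instruction stream.
    """
    for y in range(0, height, y_step):
        for x in range(0, width, x_step):
            yield base + y * row_stride + x


class AutoSequencingMemory:
    """RAM block paired with its own address sequence (hypothetical model)."""

    def __init__(self, ram, addresses):
        self.ram = ram
        self.addresses = iter(addresses)

    def read_stream(self):
        """Emit the data stream defined by the address sequence."""
        for addr in self.addresses:
            yield self.ram[addr]
```

For example, with a 4 x 4 image stored row by row in `ram`, `AutoSequencingMemory(ram, address_stream(base=5, row_stride=4, width=2, height=2)).read_stream()` delivers the inner 2 x 2 window as a data stream without any address arithmetic on the processing side.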
![Page 64: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/64.jpg)
fine-grained RC: DeHon's 1st Law [1996: Ph.D., MIT]
[Figure: density in transistors / microchip (log scale, 10^0 to 10^12) over the years 1980 to 2010. Curves: Gordon Moore curve (microprocessor), FPGA physical, FPGA logical, FPGA routed]
overhead: >> 10 000
reconfigurability overhead, wiring overhead, routing congestion
immense area inefficiency
![Page 65: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/65.jpg)
coarse-grained RC: Hartenstein's amendment of DeHon's 1st Law [1996: ISIS, Austin, TX]
[Figure: transistors / microchip (log scale, 10^0 to 10^12) over the years 1980 to 2010. Curves: Gordon Moore curve, rDPA physical, rDPA logical, FPGA routed (overhead >> 10 000)]
rDPA, e.g. KressArray family
area efficiency very close to Moore's law
![Page 66: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/66.jpg)
More compute power by Configware than Software
Conclusion: most compute power from Configware
75% of all (micro)processors are embedded 4 : 1
average acceleration factor > 2 -> rMIPS* : MIPS > 2
*) rMIPS: MIPS replaced by FPGA compute power
25% embedded µProc. accelerated by FPGA(s)
1 : 4
(a very cautious estimation**)
**) Dataquest interaction pending
-> 1 : 1 -> every 2nd µProc accelerated by FPGA(s)
(difference probably an order of magnitude)
![Page 67: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/67.jpg)
Conclusion (3)
Self-Repair and Self-Organization methodology
Embedded r-emulation logistics methodology
Universal vHPC co-architecture demonstrator
For widely spreading its use successfully:
select a killer application for demo
![Page 68: © 2006, reiner@hartenstein.de Reconfigurable Computing Reiner Hartenstein Computing Meeting EU, ESU, Brussells, May 18, 2006](https://reader038.vdocuments.us/reader038/viewer/2022102907/56649d295503460f949fe185/html5/thumbnails/68.jpg)
Dual Paradigm Application Development Support
[Figure: software/configware co-compiler]
Input labels: high level language, MATLAB, adapter, other example
software code (instruction-stream-based) -> CPU
configware code (data-stream-based) -> accelerator (reconfigurable or hardwired)