Reiner Hartenstein (keynote): Directions of Programming Research: Seeking a Needle in the Haystack? The 1st Brazilian-German Workshop on Micro and Nano Electronics (BGME’2010), Oct 6-8, 2010, Porto Alegre, RS, Brazil
Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de
Directions of Programming Research:
Seeking a Needle in the Haystack?
Reiner Hartenstein
IEEE fellow
German-Brazilian Year of Science - Technology and Innovation 2010/2011,
Brasil-Alemanha 2010/11: Ano da Ciência, Tecnologia e Inovação,
Deutsch-Brasilianisches Jahr der Wissenschaft, Technologie und Innovation 2010/2011
Karlsruhe Institute of Technology
member,
The 1st Brazilian-German Workshop on Micro and Nano Electronics (BGME’2010), Oct 6-8, 2010, Porto Alegre, RS, Brazil
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Abstract (Preface)
The energy consumption of all computers worldwide will become unaffordable.
We need to reinvent computing.
An alternative programmable technology with massive potential for speed-up and energy savings, Reconfigurable Computing (RC), was developed decades ago.
Progress in HPC (high-performance computing) is stalled by the parallelism wall and the power wall.
However, all of this is handicapped by programmer productivity problems.
The Twin Wall Crisis:
Power & Performance
Worldwide two drastically disruptive developments:
µP industry changed strategy over to „manycore“
(away from faster clock speed)
Energy consumption of computing becoming unaffordable
The Programming Wall
The Power Wall
Outline
• The coming Shortage of Energy
• Energy Consumption of Computing
• The Programmability Crisis
• Rescue by Reconfigurable Computing ?
• The Reconfigurability Paradox
• We need to Reinvent Computing
• Reinventing Programmer Education
• Conclusions
No more cheap oil
Currently: >80 $
Tendency: growing
Oil crises: weekend driving ban (Germany)
1973, 1979/1980
(dependence on Middle East oil countries)
Beyond Peak Oil
J. S. Gabrielli de Azevedo: Petrobras e o Novo Marco Regulatório; São Paulo, December 1, 2009
Cheap Oil Era reached its End
Rapidly growing energy prices predicted (IEA: factor of 3).
50% of reserves are under water. Off-shore projects re-calculated.
IEA: “> six more Saudi Arabias for the demand predicted for 2030”
80% of crude oil is coming from declining fields.
Higher standards of living: China, India, Brazil, Mexico, newly industrialized countries.
Growing electricity consumption of computers: 10 more Saudi Arabias!
IEA estimates: demand will double until the year 2030.
China passes the U.S. in energy use [IEA]
Beyond Oil: Literature
US: ~3 $
… post petroleum …
… hundreds of books
Beyond oil:
Literature (2)
Computers everywhere
... Ecosystem: just one example
... Supercomputers ...
more ...
Business Information Systems without Computers
Lufthansa Reservation, anno 1960
http://wiki.answers.com/Q/Why_are_computers_important_in_the_world
Banking without Computers
COMMputation
Communication and computing infrastructures everywhere
Innovation-driven computing
[Andy Hopper]
• Simulation and modelling are important tools which will help predict global warming and its effects.
• Computing will play a key part in optimizing use of resources in the physical world.
• The amount of infrastructure making up the digital world is continuing to grow rapidly and starting to consume significant energy resources.
• To help generate momentum and achieve these goals, it is important that a coordinated set of challenging international projects are investigated.
• We are experiencing a shift to the digital world in our daily lives as witnessed by the wide scale adoption of the world wide web.
Green IT:
• Smart energy meters: housing, buildings, facilities
• Carpooling and public transport by info web sites
• Road traffic and transport logistics optimization
• Reduce travelling by telecommuting.
Some grand challenge examples for CPS [Ed. Lee]
• Blackout-free electricity generation and distribution,
• Extreme-yield agriculture,
• Safe, rapid evacuation in response to natural or man-made disasters,
• Perpetual life assistants for busy, senior/disabled people,
• Location-independent access to world-class medicine,
• Near-zero automotive traffic fatalities, minimal injuries, and significantly reduced traffic congestion and delays,
• Reduce testing and integration time and costs of complex CPS systems (e.g. avionics) by 1 to 2 orders of magnitude,
• Energy-aware buildings and cities,
• Physical critical infrastructure that calls for preventive maintenance,
• Self-correcting cyber-physical systems for “one-off” applications,
• Disaster Response: Large-Scale Emergency Evacuation,
• Assistive Devices.
The World Economic Forum’s “Global Redesign Initiative”
Organizations like UN, UNESCO, GATT, G8, G20 are increasingly inept at fixing what ails the world:
• economic growth
• climate protection
• poverty eradication
• conflict avoidance
• human security
• global vaccine protocol
• global risk management
• promotion of shared values
• intelligent water management
• smart energy production/distribution …
“Existing global institutions require extensive rewiring to confront contemporary challenges.”
Wikinomics approach for agile world-wide mass collaboration without bureaucracy.
New paradigm to involve world citizens by global IT networks with graphic user interfaces for citizen juries, polling, digital brainstorms, policy wikis, town hall meetings …
Growth of the Internet
The trends are illustrated by:
• expanding wireless internet,
• growing number of users,
• shipping electronic books,
• more cloud computing?
• larger e-mails,
• services integrating video and software,
• increasing popularity of games,
• massive use of video on demand,
• high-definition video and pay-TV,
• services by mobile phone companies,
• and many other services.
Internet service providers need to assess how much more bandwidth will be required.
2007: a factor of 30 predicted by the year 2030, if current trends continue.
Broadband connections in North America, Mexico and Western Europe: 155 million by the end of 2007; 228 million predicted for after 2011.
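As a rough plausibility check on the factor of 30, the implied yearly growth rate can be worked out. The sketch below assumes that 2007 is the baseline year and that growth compounds at a constant rate; both are assumptions, since the slide only states the factor and the target year. The function name is illustrative.

```python
def implied_annual_growth(factor, years):
    """Constant yearly growth rate that compounds to `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


# Assumption: baseline year 2007, target year 2030 (23 years).
rate = implied_annual_growth(30, 2030 - 2007)
print(f"implied growth: {rate:.1%} per year")
```

Under these assumptions the prediction corresponds to sustained growth of roughly one sixth per year, which is why small changes in the trend matter so much over two decades.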
Power Consumption of Computers
[Albert Zomaya 2008]
Power consumption by the internet: x30 until 2030 if trends continue [G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8-11 Sep 2008]
Energy cost may overtake IT equipment cost in the near future
„Google causes 2% of the world's electricity consumption“ (Google denied)
[Photos, 2009: data centers at Dallas, at Quincey, at Boardman; Randy Katz: IEEE Spectrum, Febr. 2009]
Electricity Bill: a Key Issue
„The possibility of computer equipment power consumption spiraling out of control could have serious consequences for the overall affordability of computing.” [L. A. Barroso, Google]
• Already in 2005, Google’s electricity bill was higher than the value of its equipment.
• Cost of a Google data center is determined by its monthly power bill
• Patent for water-based data centers
• Google going to sell electricity
How Societies Choose to Fail or Succeed
Collapse of our computing ecosystem?
Unaffordability of von-Neumann-centric computing could jeopardize all facets of our global economy.
Manycore: failure could jeopardize both the IT industry and most sectors of the economy that depend on rapid improvement of IT. [Dave Patterson]
Several recent outages of cloud computing services.
Stuxnet worm: only a propaganda trick?
Without Cyber Infrastructure?
[Figure (copyrighted image omitted): homo computensis, homo Neanderthalensis?]
The Trouble with Manycore
The growing core counts are racing ahead of programming paradigms and programmer productivity: in supercomputing, and to a major extent also in mass markets - a challenge to CS education.
Going to FPGA: a paradigm shift for programmers.
Chipmakers are busy designing microprocessors that most programmers can’t program [David Patterson, IEEE Spectrum, July 2010],
doing so without any clear notion of how such devices would in general be programmed.
They hope someone will be able to figure out how to do that.
Can we get it right this time?
The “parallel programming problem” has been addressed for at least 25 years in HPC.
Only a small number of specialized developers write parallel code.
Multicore is becoming ubiquitous: some hope that “if you build it, they will come”
[T. Mattson, M. Wrinn: Parallel Programming: Can we PLEASE get it right this time? DAC 2008, Anaheim, CA, June 8-13, 2008]
A massive worldwide effort is required, taking many years, creating masses of jobs.
We need to reinvent programmer education.
We need to reinvent computing.
„The proud era of von Neumann architecture passes into history.“
„Foundational change will disrupt traditional habits throughout the discipline ....“
Michael Wrinn, (keynote at SIGCSE2010): Suddenly, All Computing Is Parallel: Seizing Opportunity Amid the Clamor http://www.sigcse.org/sigcse2010/attendees/keynotes.php
Multicore is not new
Dead (Super)Computer Society [Gordon Bell, keynote, ISCA 2000]:
• ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent
• DAPP • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1 • Goodyear Aerospace MPP • Gould NPL • Guiltech • ICL • Intel Scientific Computers • International Parallel Machines
• Kendall Square Research • Key Computer Laboratories • MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics
the single core sequential mind set was the winner
only 2 or 3 successes
most in 1985-1995, mainly research
Amid the Clamor ?
Bring parallel computing into the mainstream of undergraduate education [Michael Wrinn]
Current discussions: desperately seeking a needle in a haystack.
We need a new Textbook
having an impact like Mead & Conway
"The book that changed everything“; Electronic Design News, Feb. 11, 2009
The Tail wagging the Dog
CPU: „Central Processing Unit“
„Central“: it controls (almost) everything.
However, it needs accelerators.
[Figure: CPU with attached accelerators]
Twin paradigm systems
CPU: „Central Processing Unit“
„Central“: it controls (almost) everything.
However, it needs accelerators:
• hardwired accelerators
• reconfigurable accelerators: the second paradigm
Design start ratio: ASIC 3%, FPGA 97% [Dataquest, 2009]
A Clean Terminology, please
program source | result
Software | instruction streams
Flowware | data streams
Configware | datapath structures configured
Speed-up factors obtained by Software to Configware migration
[Chart: speed-up factor per application, logarithmic axis from 1 to 1,000,000]
DSP and wireless: FFT 100; Reed-Solomon decoding 2400; Viterbi decoding 400; MAC 1000
Bioinformatics: molecular dynamics simulation 88; BLAST 52; protein identification 40; Smith-Waterman pattern matching 288; DNA seq. 8723
Astrophysics: GRAPE 20
Image processing, pattern matching, multimedia: SPIHT wavelet-based image compression 457; real-time face detection 6000; video-rate stereo vision 900; pattern recognition 730; CT imaging 3000
Crypto: 1000; DES breaking 28500
Energy saving factors: ~10% of speedup
[Chart: the speed-up factors of the previous slide (logarithmic axis from 1 to 1,000,000; DSP and wireless, bioinformatics, astrophysics, image processing / pattern matching / multimedia, crypto), annotated with the power save factors obtained (FPGAs)]
x86 Clock Frequency
[Chart: x86 clock frequency, 1995 to 2005, logarithmic axis from 100 MHz to 10 GHz; data points: Celeron, Pentium III, Pentium IV]
Celeron to Pentium IV: x20
Pentium I (1993) to Pentium IV: x60
growing clock speed, growing power consumption
(migration papers: power save not reported before 2005)
1995 - 2005: speed-ups obsolete?
Hitting 28nm, and beyond
Both de facto FPGA giants (Xilinx and Altera) are hitting 28nm at the end of 2010.
FPGAs are now capable of implementing entire SoCs.
They have turned into a complex heterogeneous mix of coarse-grain elements and classical fine-grained LUTs.
2009: Intel ships 32nm,
2010: foundries to ship 28nm
Intel will ship 22 nm in 2011,
16 nm in 2013
Xilinx partner TSMC, the world’s largest standalone fab, is almost the de facto fab for all FPGAs in the world.
Altera is also well known for its long partnership with TSMC since the early 90s.
RC*: Demonstrating the intensive Impact
SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster [Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]
Application | Speed-up factor | Savings: Power | Savings: Cost | Savings: Size
DNA and Protein sequencing | 8723 | 779 | 22 | 253
DES breaking | 28514 | 3439 | 96 | 1116
massively saving energy
much less equipment needed
*) RC = Reconfigurable Computing
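The rule of thumb "energy saving factors: ~10% of speedup" can be checked against the two rows reported by El-Ghazawi et al. The following is a minimal sketch that only divides the listed numbers; the dictionary layout and function name are illustrative, not taken from the cited paper.

```python
# Figures as listed on the slide (El-Ghazawi et al., IEEE Computer, Feb. 2008).
RESULTS = {
    "DNA and protein sequencing": {"speedup": 8723, "power": 779, "cost": 22, "size": 253},
    "DES breaking": {"speedup": 28514, "power": 3439, "cost": 96, "size": 1116},
}


def power_to_speedup_ratio(entry):
    """Power-saving factor expressed as a fraction of the speed-up factor."""
    return entry["power"] / entry["speedup"]


for name, entry in RESULTS.items():
    ratio = power_to_speedup_ratio(entry)
    print(f"{name}: power saving is {ratio:.1%} of the speed-up")
```

Both ratios land close to one tenth, which is consistent with the slide's estimate for these two applications; it says nothing about the other applications in the chart.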
Drastically less Equipment needed
For instance: a hangar full of racks replaced by a single rack (or ½ rack) without air conditioning
Hetero HPC
SGI® RASC™ Module (Version 1)
• Xilinx Virtex II-6000 FPGA
• 16MB QDR SRAM
• Rack-mountable
• Dual NUMAlink™ 4 ports
• Seamless direct attach to server's shared memory fabric
SGI® RASC™ RC100 Blade
• Dual Virtex 4 LX200 FPGAs
• 80MB QDR SRAM or 20GB DDR2 SDRAM
• Blade or rack-mountable form factor
• Dual NUMAlink™ 4 ports
• Seamless direct attach to server's shared memory fabric
Cray-XD1 Architecture
The Cray-XD1 allows the Opteron µP to access the FPGA's internal registers and its internal and external memory.
It provides several transfer modes between the µP and the FPGA (depending on the initiator).
The µP can read from / write to the FPGA local memory space (i.e. internal registers, internal BRAMs, and external memory).
The FPGA can read from / write to the µP local memory space.
However, the use of an HLL can disable some of these features.
The most bandwidth-efficient transfer mode is write-only mode (the producer initiates the transfer), either burst (for large amounts of data) or non-burst.
The silver bullet
Reconfigurable Computing is really the silver bullet for massively saving energy.
Predicted energy saving factor: about 3 orders of magnitude.
We have to develop a good rescue strategy.
scene „Green Computing“, Reinvent Computing
Bizarre FPGA Synthesis Market
Paradox of Pursuit: Synplify Saves Synthesis - Again, by Kevin Morris
Start analyzing the perplexing paradox of the FPGA synthesis market and each link of the chain reveals a bizarre force vector that eventually doubles back onto itself into an unlikely equilibrium that miraculously has held stable for a full decade despite disruptive forces of epic proportions.
Rube Goldberg couldn’t have designed a more elegant confluence of convoluted causal relationships.
The History of Computing (1)
The 1st electrical computer, ready prototyped for mass production?
Guess: which year, which company?
The History of Computing (2)
Prototype 1884: Herman Hollerith
• datastream-based
• the first reconfigurable computer (DPU)
• size: about that of 3 refrigerators
• used for the 1890 US population census
The first Xilinx FPGA came 100 years later.
Early LUT
60 years later: RAM available, e.g. ferrite core
non-volatile configuration memory
field-programmable: manually swapping plug boards
(motivation for the von Neumann machine paradigm)
80 years later
much larger than 3 refrigerators, just for a few ballistic tables: the „von Neumann“ paradigm
von Neumann syndrome
The tremendous inefficiency of computers causes immense electricity consumption because of the von Neumann Syndrome.
All but ALU is overhead: x20 efficiency
[Figure label: (data cache)]
x20 inefficiency: just one of several overhead layers
[R. Hameed et al.: Understanding Sources of Inefficiency in General-Purpose Chips; 37th ISCA, June 19-23, 2010, St. Malo, France]
massive overhead phenomena
[Legend: overhead proportionate to the number of processors; overhead overproportionate to the number of processors]
overhead | von Neumann machine
instruction fetch | instruction stream
state address computation | instruction stream
data address computation | instruction stream
data meet PU + other overhead | instruction stream
I/O to / from off-chip RAM | instruction stream
inter-PU communication | instruction stream
message passing overhead | instruction stream
transactional memory overhead | instruction stream
multithreading overhead etc. | instruction stream
von Neumann overhead vs. Reconfigurable Computing
overhead | von Neumann machine | datastream machine
instruction fetch | instruction stream | none*
state address computation | instruction stream | none*
data address computation | instruction stream | none*
data meet PU + other overhead | instruction stream | none*
I/O to / from off-chip RAM | instruction stream | none*
inter-PU communication | instruction stream | none*
message passing overhead | instruction stream | none*
transactional memory overhead | instruction stream | none*
multithreading overhead etc. | instruction stream | none*
Critique of the von Neumann Model
Critique of von Neumann is not new:
• Dijkstra 1968: The Goto considered harmful
• R. Hartenstein, G. Koch 1975: The universal Bus considered harmful
• Backus 1978: Can programming be liberated from the von Neumann style
• Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style
• Peter G. Neumann 1985-2003: 216x “Inside Risks”, 18 years on the inside back cover of Comm. ACM
• Brad Cox 1990: Planning the Software Industrial Revolution
• L. Savain 2006: Why Software is bad
“von Neumann Syndrome”: C.V. Ramamoorthy, UC Berkeley
Overhead piles up to code sizes of astronomic dimensions.
Nathan’s Law: Software is a gas. It expands to fill all its containers ... [Nathan Myhrvold]
Wirth‘s Law: “software is slowing faster than hardware is accelerating”
The wrong Direction: by Herd Instinct?
The transition from machine level to higher level languages led to the biggest productivity gain ever made [Fred Brooks]
It‘s alarming that today‘s megabytes of code are compiled from languages at low abstraction levels (C, C++, Java)
Java is a religion, not a language [Yale Patt]
Bud Lawson‘s Dilbert
Widening the Semantic Gap [Harold „Bud“ Lawson]
Burroughs B5000/5500: language-friendly stack machine
IBM 360/370 & Intel x86: highly complex instruction set
unnecessary complexity inside
MULTICS (GE, Honeywell): well manageable (impl. in PL/1)
UNIX: complexity problems, compatibility problems
Pascal killed by C („portable assembler language“), coming as an infection, along with UNIX
Scientific Revolutions
Newton's 1st Law (inertia): „people do not change direction“
scientific scenes follow the herd instinct
Apropos Herd Instinct
Some Programming Languages
40 years Software Crisis
• F. L. Bauer 1968: coined „Software Crisis“
• N. N. 1995: The Standish Group Report
• Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005
• Anthony Berglas 2008: Why it is Important that Software Projects Fail
Parkinson's Law (The Economist, Nov 19th 1955; Oct 1957): The size of bureaucracy is independent of the amount of real work to be done.
In 1955, Parkinson could not have foreseen the impact of software.
term | controlled by | execution triggered by | paradigm
CPU | program counter (at ALU) | instruction fetch | instruction stream
DPU**, rDPU** | data counter(s) (at memory) | data arrival* | data-stream-based
*) “transport-triggered”
**) does not have a program counter - no instruction fetch
The single paradigm (from the mainframe age) is obsolete.
+ New Machine Model for FPGAs: twin paradigm
term | controlled by | execution triggered by | paradigm
CPU | program counter (at ALU) | instruction fetch | instruction stream
DPU**, rDPU** | data counter(s) (at memory) | data arrival* | data-stream
*) “transport-triggered”
**) does not have a program counter - no instruction fetch
CPU-centric flat world model (Aristotelian model)
This Software-only world model is obsolete (CS: introduced in the 40ies)
CPU-centric sequential-only mind set, but no hardware know-how
[Figure: CPU at the center; scale 10, 100, 1000, 10,000, 100,000, 1,000,000; label: not visible from SE]
The Machine Model Dichotomy (1): von Neumann versus data stream machine
The Generalization of Software Engineering, 1st Step:
PE (Program Engineering)
• SE (Software Engineering): CPU
• FE (Flowware Engineering): asM (auto-sequencing Memory)
*) do not confuse with „dataflow“!
The Machine Model Dichotomy (2): von Neumann versus data stream machine
The Generalization of Software Engineering, 2nd Step:
PE (Program Engineering)
• SE (Software Engineering): CPU
• FE (Flowware Engineering): asM (auto-sequencing Memory)
• CE (Configware Engineering): structures, pipe network model etc.; DPU (Data-Path-Unit), DPA (Data-Path-Array)
The Machine Model Dichotomy (3): von Neumann versus data stream machine
The Generalization of Software Engineering, 2nd Step:
PE (Program Engineering)
• SE (Software Engineering): CPU
• FE (Flowware Engineering): asM (auto-sequencing Memory)
• CE (Configware Engineering): structures, pipe network model etc.; DPU (Data-Path-Unit), DPA (Data-Path-Array)
mapping issue: time to time, time to space
Programming Model: Flowware
[Figure: stream graph with nodes FMDemod, Split, LPF1, LPF2, LPF3, HPF1, HPF2, HPF3, Gather, Adder, Speaker; source: MIT StreamIt]
• Pros for streaming [Pierre Paulin]
  - Streamlined, low-overhead communication
  - (More) deterministic behaviour
  - Good match for many simple media-rich applications
• Cons
  - control-dominated applications
  - shunt yard problem
We have to find out which application types and programming models students should exercise for the flowware approach.
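The streaming style in the figure can be imitated in ordinary software to get a feel for it. The sketch below uses Python generators and is not StreamIt code; the filter (a moving average), the gains and all function names are illustrative assumptions. It only shows the shape of a split/gather/adder pipeline in which each stage consumes and produces a data stream.

```python
from itertools import tee


def moving_average(stream, width):
    """Stand-in for a filter stage: average over the last `width` items."""
    window = []
    for x in stream:
        window.append(x)
        if len(window) > width:
            window.pop(0)
        yield sum(window) / len(window)


def split(stream, n):
    """Duplicate one stream into n identical streams."""
    return tee(stream, n)


def gather_add(streams):
    """Gather one item from each branch and add them (the adder stage)."""
    for items in zip(*streams):
        yield sum(items)


def pipeline(samples):
    smoothed = moving_average(samples, 2)
    branch_a, branch_b = split(smoothed, 2)
    scaled_a = (0.5 * x for x in branch_a)  # hypothetical per-branch gains
    scaled_b = (2.0 * x for x in branch_b)
    return gather_add([scaled_a, scaled_b])


print(list(pipeline([2, 4, 6])))
```

No stage holds a program-wide state or addresses memory explicitly; data items simply arrive, which is the property the pros list refers to. A control-dominated application would not fit this shape nearly as well.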
A Clean Terminology, please
program source | result
Software | instruction streams
Flowware | data streams
Configware | datapath structures configured
Programming Language Paradigms (1): imperative language twins
language category | Software Languages (Computer Languages) | Flowware Languages (Languages f. Anti Machine)
both | deterministic procedural sequencing: traceable, checkpointable | deterministic procedural sequencing: traceable, checkpointable
operation sequence driven by | read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting, no parallel loops, escapes, instruction stream branching | read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
state register | program counter | data counter(s)
address computation | massive memory cycle overhead | overhead avoided
instruction fetch | memory cycle overhead | overhead avoided
parallel memory bank access | interleaving only | no restrictions
language features | control flow + data manipulation | data streams only (no data manipulation)
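The difference between a program counter and a data counter can be illustrated with a toy model. This is a minimal sketch under strong simplifying assumptions: the five-instruction loop body, the function names and the fetch counting are all hypothetical, and the "datapath" is just a fixed Python expression standing in for a structure configured before run time.

```python
def instruction_stream_sum_of_squares(memory, n):
    """Instruction-stream machine: a program counter walks the loop body,
    and every step costs one instruction fetch."""
    program = ["LOAD", "SQUARE", "ADD", "INC", "BRANCH"]  # hypothetical loop body
    instruction_fetches = 0
    acc = 0
    i = 0
    x = 0
    while i < n:
        for pc in range(len(program)):  # program counter
            op = program[pc]            # instruction fetch
            instruction_fetches += 1
            if op == "LOAD":
                x = memory[i]
            elif op == "SQUARE":
                x = x * x
            elif op == "ADD":
                acc += x
            elif op == "INC":
                i += 1
            # "BRANCH" returns to the loop test above
    return acc, instruction_fetches


def data_counter(start, count, stride=1):
    """Data counter at the memory: generates the address sequence."""
    for k in range(count):
        yield start + k * stride


def data_stream_sum_of_squares(memory, n):
    """Data-stream machine: the datapath (square, accumulate) is fixed before
    run time; only data items are streamed, so no instructions are fetched."""
    acc = 0
    for address in data_counter(0, n):
        item = memory[address]
        acc += item * item
    return acc, 0


memory = [1, 2, 3, 4]
print(instruction_stream_sum_of_squares(memory, 4))
print(data_stream_sum_of_squares(memory, 4))
```

Both functions compute the same result; the model only makes visible that the instruction-stream version spends several fetches per data item, while the data-stream version spends none at run time.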
Procedural Languages Twins
imperative Software Languages | systolic Flowware Languages
read next instruction | read next data item
goto (instruction address) | goto (data address)
jump to (instruction address) | jump to (data address)
instruction loop | data loop
instruction loop nesting | data loop nesting
instruction loop escape | data loop escape
instruction stream branching | data stream branching
no: no internally parallel loops | yes: internally parallel loops
But there is the Asymmetry
program counter data counter(s)
for data parallelism
super
© 2008, [email protected] http://hartenstein.de 2010, 2010,
The FPGA Programming Crisis
75 © 2008, [email protected] http://hartenstein.de 2010, 2010,
We need a good Textbook
N. Conner et al.: FPGAs for Dummies; Wiley, 2008
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Acceleration Mechanisms
• parallelism by multi bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• optimization by storage scheme transformations
• optimization by memory architecture transformations
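Two of the mechanisms above, address calculation before run time and parallelism by a multi-bank memory, can be illustrated by a small sketch. It is not taken from the talk: the function names, the low-order interleaving rule (bank = address mod number of banks) and the padded row stride are assumptions chosen for illustration only.

```python
def column_scan(rows, col, row_stride):
    """Address sequence for walking down one column of a row-major 2-D array.

    The whole sequence is computed before run time, so no memory
    cycles are spent on address computation while the data stream runs.
    """
    return [r * row_stride + col for r in range(rows)]


def bank_of(address, num_banks):
    """Assumed low-order interleaving: bank = address mod num_banks."""
    return address % num_banks


def is_conflict_free(addresses, num_banks):
    """True if all accesses hit pairwise different banks."""
    banks = [bank_of(a, num_banks) for a in addresses]
    return len(set(banks)) == len(banks)


# 4x4 array stored with row stride 4: a column scan hits bank 0 four times.
plain = column_scan(rows=4, col=0, row_stride=4)
# Storage scheme transformation (hypothetical): pad each row to stride 5.
padded = column_scan(rows=4, col=0, row_stride=5)

print(plain, is_conflict_free(plain, num_banks=4))    # [0, 4, 8, 12] False
print(padded, is_conflict_free(padded, num_banks=4))  # [0, 5, 10, 15] True
```

With the padded layout the four column elements fall into four different banks, so in this toy model they could be fetched in parallel; this is one simple instance of an optimization by storage scheme transformation.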
© 2008, [email protected] http://hartenstein.de 2010, 2010,
The language and tool disaster
Software people do not speak VHDL
Hardware people do not speak MPI nor OpenMP
Bad quality of the application development tools
Poll at FCCM’98: 86% of designers hate their tools
progress stalled by qualification problems in industry and academia
Not only in embedded systems: comprehensibility barrier between procedural and structural mind set
Software people urgently need locality awareness
© 2008, [email protected] http://hartenstein.de 2010, 2010,
New boundary constraints are the limiting factor
Legacy scientific applications: predominantly sequential
The entire software ecosystem will need to evolve (including curricula): O/S, libraries, software development environments, compilers and languages
additional levels of parallelism: chaining, pipelining, systolic, super-systolic, wavefront arrays
additional data structures and storage organization: the new distributed memory discipline
New boundary constraints
© 2008, [email protected] http://hartenstein.de 2010, 2010,
HLL programming models
80
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Taxonomy of Twin Paradigm Programming Flows (HPRC)
E. El-Araby et al.: Comparative Analysis of High Level Programming for Reconfigurable Computers: Methodology And Empirical Study; Proc. SPL2007 Symp., Mar del Plata, Argentina, Febr. 2007
© 2008, [email protected] http://hartenstein.de 2010, 82
Dual paradigm mind set: an old hat
Software mind set: instruction-stream-based: flow chart -> control instructions
Mapped into a Hardware mind set: action box = Flipflop, decision box = (de)multiplexer
(mapping from procedural to structural domain)
1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc.
1972: C. G. Bell et al: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans-C21/5, May 1972
[Figure: chain of flipflops (FF) passing a token bit; evoke signal]
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Multicore Programming Requirements
Efficient distribution of tasks
Layers of abstraction hide critical sources of, and limits to, efficient parallel execution
Being memory limited
Internode communication (data assembly & dispatch) reduces computational efficiency: speedup/nodes
Result: scaled up cost, power, cooling and reliability concerns
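The efficiency measure named above (speedup divided by the number of nodes) can be made concrete with a short sketch. The communication model used here (a cost that grows linearly with the node count) and all numbers are assumptions for illustration, not measurements from the talk.

```python
def parallel_efficiency(t_serial, t_parallel, nodes):
    """Computational efficiency = speedup / nodes."""
    speedup = t_serial / t_parallel
    return speedup / nodes


def modelled_parallel_time(t_serial, nodes, comm_per_node):
    """Assumed toy model: perfectly divisible work plus a communication
    cost (data assembly and dispatch) that grows with the node count."""
    return t_serial / nodes + comm_per_node * nodes


for nodes in (1, 4, 16, 64):
    t_par = modelled_parallel_time(100.0, nodes, comm_per_node=0.1)
    print(nodes, round(parallel_efficiency(100.0, t_par, nodes), 3))
```

In this model the efficiency falls as nodes are added, because the communication term eventually dominates the shrinking compute term.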
© 2008, [email protected] http://hartenstein.de 2010, 2010,
how Programmers think
“Parallel programming: informal approaches are not working” [Mattson]
“We must adopt a systematic approach by insight into how programmers think” [Mattson]
We must adopt a systematic approach by changing how programmers think [R. H.]
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Newer Developments in Semiconductor Technology
Limits by increasing power density
the golden CMOS era is gone
Technology scaling does not deliver significant performance speedup
transistors less reliable: additional sources of errors*
defective at manufacture time
degrade and fail over the expected lifetime (EM, HCD, TDDB)
process variations
increasing number of soft errors
Fault-Tolerance Techniques needed
significant problems in performance, power consumption and reliability: great challenges for Reconfigurable Computing.
December 28, 2009
*) J. M. P. Cardoso, M. Hübner (editors): Reconfigurable Computing, 2011, Springer
*) S. Borkar: Designing reliable systems from unreliable components; 2005.
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Fault Tolerance
[Figure: Fault Tolerance Implementation across CPU, hardwired accelerators and reconfigurable accelerators]
© 2008, [email protected] http://hartenstein.de 2010, 2010,
third paradigm: Neurocomputing
The Memristor
hp: discovered 2008, prod. announced for 2013
resistor w. memory: doping moved by electric field
direct synapse emulation replacing massively inefficient digital simulation
fewer transistors for logic circuits
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Mem(r)istor History
1960: missing device postulated by Prof. Karl Steinbuch
1963: Memistor Corp. founded by Prof. Bernie Widrow, Stanford U.
1971: Leon Chua, UCB, specifies Memristor
2007: Stan Williams, hp, finds Memristor
2013 ? agreement Hewlett Packard / Hynix Semiconductor
Photo: Storz 1975
[Picture: Leon Chua]
© 2008, [email protected] http://hartenstein.de 2010,
normal Hype Curve
[Olivier Temam: The Rebirth of Neural Networks; 37th ISCA, June 19-23, 2010, Saint-Malo, France]
© 2008, [email protected] http://hartenstein.de 2010,
Neural Network Hype Curve
[Olivier Temam: The Rebirth of Neural Networks; 37th ISCA, June 19-23, 2010, Saint-Malo, France]
(Olivier Temam never mentions Karl Steinbuch)
© 2008, [email protected] http://hartenstein.de 2010,
Funding stopped for 15 years (world-wide)
Marvin Minsky, Seymour Papert: Perceptrons; 1969.
[Olivier Temam, 2010]
SVM: Support vector machines: set of supervised learning methods to analyze data and recognize patterns
© 2008, [email protected] http://hartenstein.de 2010,
Marvin Minsky's false alarm
20 years earlier!
1960: Steinbuch's Lernmatrix
1962: Karl Steinbuch (was ignored by Marvin Minsky's book)
W. Hilberg: Karl Steinbuch, ein zu Unrecht vergessener Pionier der Künstlichen Neuronalen Systeme; FREQUENZ 1995, 49#(1-2):28-35.
[Olivier Temam, 2010]
© 2008, [email protected] http://hartenstein.de 2010, 93
What ANNs can do
[Olivier Temam: The Rebirth of Neural Networks; 37th ISCA, June 19-23, 2010, Saint-Malo, France]
© 2008, [email protected] http://hartenstein.de 2010,
Defect-Tolerant Accelerators?
[Olivier Temam, 2010]
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Triple paradigm systems?
[Figure: CPU with hardwired accelerators, reconfigurable accelerators and (self-reconfigurable) neurocomputing accelerators; Self-organizing Fault Tolerance]
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Development with VHDL is expensive
“FPGAs' Achilles’ heel is their long development time
–Relatively low level HDLs (VHDL/Verilog) are still dominant
–A large part of FPGA solution development is spent on learning specific FPGA board APIs and debugging in hardware (70% in our experiments!)
–Unlike software, FPGAs do not currently offer forward/backward compatibility, not even within the same family!
–FPGAs have a relatively low technology maturity and small user base compared to software”
Courtesy of Dr Khaled Benkrid, University of Edinburgh
[1] Grant Martin, Gary Smith. “High-Level Synthesis: Past, Present, and Future,”
IEEE Design and Test of Computers, July/August 2009.
In 2009, Berkeley Design Technology Inc. (BDTI), an
independent benchmarking and analysis firm, launched
the BDTI High-Level Synthesis Tool Certification Program™
to evaluate high-level synthesis tools for FPGAs.
© 2008, [email protected] http://hartenstein.de 2010, 2010,
The Role of Reconfigurable Computing
• Reconfigurable Computing
• Using the power of FPGAs they hope to solve the multi-core crisis
• Or, in this case, confusing computing with processor cores
• For many years FPGAs were just prototyping vehicles for ASICs
• Now they are replacing many ASICs & ASSPs
• Watch for the same Trojan effect with FPGAs in HPC
• Reconfigurable computing is a key part of the solution for concurrent programming
97 © 2008, [email protected] http://hartenstein.de 2010, 2010,
Architectural Impact [Patrick Lysaght]
• Architectural impact
–Only very high volume architectures transition to leading processes
–Programmability and concurrency are the new architectural imperatives
–MPSoCs evolve into heterogeneous, multi-core architectures
–Power dissipation is a dominant concern
–Design productivity lags silicon progress
© 2008, [email protected] http://hartenstein.de 2010, 2010,
EPP
• Xilinx EPP Solution
• Processor-centric approach
• Software-centric approach
• ARM® processing engine
99 © 2008, [email protected] http://hartenstein.de 2010, 2010,
We urgently need to reinvent computing
Conclusions
We should begin while we can still afford retrofitting.
But this will require a major effort for many years.
This will create many, many new jobs.
We need „une levée en masse“
© 2008, [email protected] http://hartenstein.de 2010, 2010, 102
Conclusions
To avoid future unaffordability of our cyber infrastructure we need a massive software to configware migration
The migration of the huge supply of legacy software creates masses of jobs for decades …
… and saves much more energy than most proposals from the climate protection scene
… impossible without reinventing programmer education
RC is the silver bullet
We have to hurry to activate the public and the media, which currently fully ignore this worldwide vital issue
We must hurry to start the required time-consuming massive campaign while we can still afford it
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Programming Datastream
• Accelerate tasks by streaming
• MISD structured computation: streaming computations across a long array before storing results in memory.
• Can achieve a 100x improvement in the use of memory.
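The idea of streaming data through a chain of operations before anything is stored can be sketched in software with generators. This only mimics the dataflow; it is not a model of a reconfigurable array, and the stage names and the two operations are hypothetical.

```python
def source(n):
    """Emit the input data stream item by item."""
    for i in range(n):
        yield i


def scale(stream, factor):
    """First pipeline stage: multiply each item as it passes through."""
    for x in stream:
        yield x * factor


def offset(stream, delta):
    """Second pipeline stage: add a constant to each item."""
    for x in stream:
        yield x + delta


def streamed(n):
    """Items flow through both stages; only the final sum is kept."""
    return sum(offset(scale(source(n), 3), 1))


def staged(n):
    """Same computation, but every intermediate array is stored in memory."""
    a = list(range(n))
    b = [x * 3 for x in a]
    c = [x + 1 for x in b]
    return sum(c)


print(streamed(4), staged(4))  # 22 22
```

Both versions compute the same result; the streamed one never materializes the intermediate arrays, which is the memory-traffic saving the slide refers to.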
© 2008, [email protected] http://hartenstein.de 2010, 2010, 104
Reinvent? (final remark)
avoid traditional tunnel views
to obtain new perspectives
rediscovery and revival of old ideas
rearrange and teach them properly
to reach promising new horizons
© 2008, [email protected] http://hartenstein.de 2010, 2010, 105
http://hartenstein.de
Obrigado! (Thank you!)
http://hartenstein.de/reinvent-m.pdf
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Debunking the GPU Myth
CPUs and GPUs much closer in performance (2.5X) than the reported orders of magnitude
V. W. Lee et al.: Debunking the 100X GPU vs. CPU myth; 37th ISCA, June 19-23, 2010, Saint-Malo, France
R. Vuduc et al.: On the Limits of GPU Acceleration; USENIX Workshop HotPar’2010, June 14 - 15, 2010, Berkeley, CA, USA
R. Bordawekar, U. Bondhugula, R. Rao: Believe it or Not! Multicore CPUs can Match GPUs for FLOP-intensive Applications! IBM Research Report, April 23, 2010, Yorktown Heights, NY, USA
CUDA: a programming language
code easier to maintain, the maturity of its compilers, elegance or simplicity [Natoli]
V. Natoli: Kudos for CUDA, HPCwire, July 06, 2010, http://www.hpcwire.com/features/Kudos-for-CUDA-97889444.html
© 2008, [email protected] http://hartenstein.de 2010, 2010, 107
END
© 2008, [email protected] http://hartenstein.de 2010, 2010, 108
Locality awareness is essential for flowware
How data are moved:
Software: by addresses, read from instruction; here locality is less relevant
Flowware: by wire (configured before run time); the relation to configware calls for locality awareness
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Education Revolution
by twin paradigm co-education:
traditional qualification in the time domain
+ lean qualification in the space domain
= lean hardware modeling qualification at a higher level of abstraction
© 2008, [email protected] http://hartenstein.de 2010, 2010,
New Programmer Education
New mix of skills needed, currently not available
essential: awareness of locality, focusing on memory mapping issues and transfer modes to detect overhead and bottlenecks
understanding streams through complex fabrics
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Two classes of solutions
Migration of a particular algorithm to RC
Understanding a complex modern hetero system to detect overhead and bottlenecks
© 2008, [email protected] http://hartenstein.de 2010, 2010,
understanding architecture
[Figure: heterogeneous system on a NoC connecting memory blocks, ASICs (hardwired accelerators), ASIPs, FPGAs (reconfigurable accelerators), µPs (many-core) and I/O; off-chip memory and streams behind the memory wall; several transfer modes. Inset: 3% ASIC, 97% FPGA [Dataquest March 25, 2009]]
© 2008, [email protected] http://hartenstein.de 2010, 2010,
New Book on NoC
Jih-Sheng Shen, Pao-Ann Hsiung (editors): Dynamic Reconfigurable Network-on-Chip Designs: Innovations for Computational Processing and Communication; Information Science Reference, Hershey, USA, April 1, 2010
113 © 2008, [email protected] http://hartenstein.de 2010, 2010,
Visible architecture
The programming model: the hardware view presented to the programmer, i.e. which hardware architecture parts are visible and under the programmer’s direct control.
RC programming model: whether (and how) the programmer can control data transfers between
- FPGA and onboard memory,
- FPGA and microprocessor memory, as well as
- FPGA and microprocessor.
© 2008, [email protected] http://hartenstein.de 2010,
the program counter is the problem
the program counter indicates the problem
using data counters is much more efficient
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Rewriting needed anyway
• Rewriting of software needed anyway: for the survival of the µP industry (to cope with the transition to manycore)
• Extended scope of Software Rewriting: to save energy by orders of magnitude
• different from „classical“ green computing
strong synergy
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Green computing vs. Reinvent Computing
scene | „classical“ Green Computing (GC) | Reinvent Computing (RC)
predicted energy saving | factor of about 3 | orders of magnitude
status | already on track | reinvent programmer education needed for 2 reasons –> also for µP industry survival
to do | funding continued | years of massive world-wide action required; support by media needed
© 2008, [email protected] http://hartenstein.de 2010, 2010, 118
The Anti Machine
Generalization of the DMA
Uses data counters instead of a Program Counter
Does not need memory-cycle-hungry instruction streams
[M. Herz et al.: IEEE ICECS 2003, Dubrovnik]
[Figure: ASM (Auto-Sequencing Memory): RAM with GAG and data counter, delivering a data stream]
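How a data counter can replace the program counter for sequencing may be easier to see in a toy model. The sketch below is an illustration only: the class and parameter names are hypothetical, GAG is read here as an address generator with a fixed base and stride (the slide does not spell out the acronym), and a real ASM is hardware, not Python.

```python
class AutoSequencingMemory:
    """Toy model of an ASM: a RAM plus an address generator.

    The address generator steps a data counter through a pattern
    fixed before run time; no instruction is fetched per access.
    """

    def __init__(self, ram, base, stride, count):
        self.ram = ram        # the memory contents
        self.base = base      # first address of the scan pattern
        self.stride = stride  # address increment per step
        self.count = count    # number of items in the stream

    def addresses(self):
        """Address sequence produced by the data counter."""
        data_counter = self.base
        for _ in range(self.count):
            yield data_counter
            data_counter += self.stride

    def stream(self):
        """Data stream delivered to the datapath."""
        for address in self.addresses():
            yield self.ram[address]


ram = list(range(100, 116))  # 16 words holding the values 100..115
asm = AutoSequencingMemory(ram, base=1, stride=4, count=4)
print(list(asm.addresses()))  # [1, 5, 9, 13]
print(list(asm.stream()))     # [101, 105, 109, 113]
```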
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Bio of Reiner Hartenstein
119
http://hartenstein.de/Hartenstein-bio.pdf
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Absence of the Need to Think
• Too much effort?
• “The parallel approach to computing does require that some original thinking be done about numerical analysis and data management in order to secure efficient use.
• In an environment which has represented the absence of the need to think as the highest virtue this is a decided disadvantage.” -Daniel Slotnick, 1967
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Need a new world model
reconfigurability is the silver bullet to obtain massively better energy efficiency as well as much better performance by HPRC, the upcoming heterogeneous methodology.
The impact is a fascinating challenge to reach new horizons of research in computer science.
We need a new generation of talented innovative scientists and engineers to start the second history of computing.
This chapter discusses its new world model.
The need for a massive campaign for migration of software over to configware. Because of the multicore parallelism dilemma, we need to reinvent programmer education anyway
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Platform Collision
Industry faces 'platform collision'
Which platform technology will win in the long run? And will it be the ASIC, ASSP, FPGA, MCU or IP core? And which company will be left standing?
"It's not clear, and all may coexist“ [Brad Howe, VP IC, Altera]
Far Future is Cloudy!
Battles will get even more interesting if/when the parallel programming crisis is over
NoC research: world-wide >60 projects
© 2008, [email protected] http://hartenstein.de 2010, 2010,
versatility and heterogeneity
The semiconductor industry in all its history has not seen anything that can match a microprocessor or FPGA in terms of versatility and heterogeneity of potentials.
Not long ago, at the beginning of the last decade, the reconfigurable computing research community developed a serious crush on coarse-grain reconfigurable hardware and FPGAs.
Computation in time vs computation in space was a major focus.
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Von Neumann coarse-grained
Reconfigurable Architectures ?
Von Neumann once again came back as a "hero" to the community, telling us that, as a team in the form of multi/many cores, he can compete with FPGAs and exploit features of non-Von-Neumann coarse-grained reconfigurable architectures. That opened a new portal of research and products for academia and industry, including progress in Networks on Chip (NoCs).
125 © 2008, [email protected] http://hartenstein.de 2010, 2010,
Market failure reasons
The reasons for failure are more commercial than technical.
The market dominance of well-established players has kept the competition stakes quite high for new entrants;
companies with low differentiation failed badly;
in comparison, innovative startups with strong differentiation succeeded:
they either found a market niche
or were acquired by a bigger company seeking to strengthen its product portfolio or existing technology.
© 2008, [email protected] http://hartenstein.de 2010, 2010,
RTL vs Software
programming battle
At this point we can also see the most challenging present-day battle between FPGAs and MPSoC-like platforms.
It is RTL vs Software programming.
Programming of multicore, and the efforts under way to address it with open solutions like OpenMP (www.openmp.org) and several tools from Intel to help programmers exploit its multicore processors, is discussed
127 © 2008, [email protected] http://hartenstein.de 2010, 2010,
Processor inside FPGA vs
FPGA inside Processor: EPP
The concept totally changed for these new devices: they are software-centric, not hardware-centric
This makes the devices more like heterogeneous SoCs, as discussed in the last section (fig. 4), and gives them significant benefits for high-performance applications:
Automotive Driver Assistance,
Intelligent Video Surveillance,
Wireless Communications, and
Industrial, etc.
FPGAs became software-centric: EDUCATION !!!!
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Power consumption
Power consumption is becoming a severe problem for future integrated circuits, and therefore power-efficient solutions will be very important.
Reconfigurable computing: opportunity to provide such solutions for future systems.
Consequently, the challenge for reconfigurable computing is to show that customization and massive parallelism of reconfigurable hardware can overcome its power consumption overhead over ASICs, providing power-efficient solutions.
© 2008, [email protected] http://hartenstein.de 2010,
Language-of the Year Phenomenon
[courtesy Richard Newton]
KARL
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Some special Languages
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Some Programming Languages
133 © 2008, [email protected] http://hartenstein.de 2010, 2010,
Some Parallel Languages
134
© 2008, [email protected] http://hartenstein.de 2010,
ANN 135 © 2008, [email protected] http://hartenstein.de 2010, 2010,
understanding architecture
[Figure: heterogeneous system on a NoC connecting memory blocks, ASICs (hardwired accelerators), ASIPs, FPGAs (reconfigurable accelerators), ANNs (self-reconfigurable accelerators), µPs (many-core) and I/O; off-chip memory and streams behind the memory wall; several transfer modes]
© 2008, [email protected] http://hartenstein.de 2010, 2010, 137
threshold logic:
Neuron Model
x1 + x2 + x3 ≥ 1
x1 + x2 + x3 ≥ 3
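The two inequalities above describe one threshold element with two different thresholds: with binary inputs, a threshold of 1 yields a 3-input OR and a threshold of 3 yields a 3-input AND. A minimal sketch, assuming unit weights and inputs restricted to 0 and 1 (the function name and the optional weights parameter are introduced here for illustration):

```python
from itertools import product


def threshold_neuron(inputs, threshold, weights=None):
    """Fire (return 1) if the weighted input sum reaches the threshold."""
    if weights is None:
        weights = [1] * len(inputs)  # unit weights, as in the slide's sums
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return int(weighted_sum >= threshold)


# Truth table: the same element acts as OR (threshold 1) or AND (threshold 3).
for x in product((0, 1), repeat=3):
    print(x, "OR:", threshold_neuron(x, 1), "AND:", threshold_neuron(x, 3))
```

Changing only the threshold (or, more generally, the weights) changes the logic function, without rewiring the element.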
© 2008, [email protected] http://hartenstein.de 2010, 2010, 138
Memristor
resistor with memory
technology detected at hp 2008
TiO2 semiconductor: high resistance, conductive by doping
resistance manipulated by moving the doping via electrical field
Postulated: KIT 1960
Widrow’s Memistor 1963-65
“predicted” by UCB 1971
© 2008, [email protected] http://hartenstein.de 2010, 139
FPNA: Field-Programmable Neuron Array
logic function depends on resistor dimensioning
Memristor LUT
© 2008, [email protected] http://hartenstein.de 2010, 2010, 140
Teachable Neuron
generalization of the LUT
from Boolean algebra to Steinbuch algebra
from Reconfigurable Computing to Reconfigurable Neuro Computing
© 2008, [email protected] http://hartenstein.de 2010,
Miscellaneous
141 © 2008, [email protected] http://hartenstein.de 2010, 2010,
FPGA to ASIC design start ratio
3% ASIC, 97% FPGA [Dataquest March 25, 2009]
© 2008, [email protected] http://hartenstein.de 2010, 143
Speed-up by MoM-1 compared to 68020
PISA project
© 2008, [email protected] http://hartenstein.de 2010, 2010,
No more cheap oil
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Fat vs Slim processor cores
It is possible that in a few years a standard will emerge for it, which could benefit emerging technologies like MPPAs, which use smaller RISC machines than high-end processors do in order to exploit more parallelism (Thin vs Thick or Fat vs Slim processor cores).
Currently the major focus of industry is to get tools for the high-end multicore processor market (Intel/AMD, ARM etc.).
These companies are slowly and carefully increasing their core counts, taking their legacy software and tool maturity into consideration.
145 © 2008, [email protected] http://hartenstein.de 2010, 2010,
New roadmaps of FPGA giants
We covered the fundamental strengths and weaknesses of FPGAs.
We saw how new technologies evolved and tried to address specific market segments where they can provide a better solution than FPGAs by turning the weaknesses of FPGAs into their own strength.
However, FPGAs have also changed dramatically with time, and the FPGA vendors are well aware of the pros and cons of their technology.
The most recent announcements by the FPGA giants of integrating hard processor blocks are a milestone for FPGAs and a response to the competing technologies.
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Tilera
Figure 3 shows the Tile64 device of Tilera Corporation (www.tilera.com).
It is a nice example of massively parallel processor arrays.
The architecture style is again regular like FPGAs.
In case of Tilera each tile is a processor core which can run a full operating system, or multiple tiles together can run a multi-processing operating system like SMP Linux.
The processor cores are connected by their iMesh on-chip network.
Their programming tools suite MDE (Multicore Development Environment) provides ease of programming w. ANSI C/C++.
147 © 2008, [email protected] http://hartenstein.de 2010, 2010,
Run time support of RC
Challenges to runtime support of a reconfigurable system:
Online monitoring;
Load balancing;
HW dependable SW;
Visualization;
Runtime resource management and scheduling;
very fast re-layout for dynamic reconfiguration;
Managing adaptive dynamic routing.
A reconfigurable system with the above characteristics is far from trivial to develop
Challenging issues in ES: developing generic embedded platforms to improve productivity and reusability.
ES domain application requirement examples: being energy efficient and/or safety critical (even more challenging).
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Structured ASICs
Structured ASICs are a class of devices based mostly on an FPGA-like architecture, with a special configuration mechanism to program the device at mask level.
This greatly reduces the cost and provides enhanced performance; however, once created, the device is not re-programmable.
eASIC is a prominent example in this regard.
Xilinx and Altera also propose similar solutions for mass production: EasyPath and HardCopy, respectively.
© 2008, [email protected] http://hartenstein.de 2010, 2010,
The new developments in
semiconductor technology
difference to ASICs
make Reconfigurable Computing a
widely used solution for future systems.
Reconfigurable Computing can achieve such a goal;
however, several improvements are required:
three orders of magnitude higher Area*Time*Power product than ASICs.
an order of magnitude more resources
an order of magnitude higher delay
an order of magnitude higher power consumption
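The three single-order factors listed above multiply to the three orders of magnitude quoted for the combined figure. As a rough worked step, reading "an order of magnitude" as a factor of about 10 (an approximation) and writing A, T and P for area (resources), time (delay) and power, symbols introduced here for illustration:

```latex
\frac{A_{\mathrm{RC}} \, T_{\mathrm{RC}} \, P_{\mathrm{RC}}}
     {A_{\mathrm{ASIC}} \, T_{\mathrm{ASIC}} \, P_{\mathrm{ASIC}}}
\approx 10 \cdot 10 \cdot 10 = 10^{3}
```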
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Universal nature due to prototype capability
The greatest strength of FPGAs is their universal capability, due to their prototyping ability & HDL programming model.
This is not true for a microprocessor.
This power helps FPGAs absorb complex functionality in the form of hard macro blocks.
Such a block can be a processor, an IP or anything else.
Since the programming model is HDL, the component is instantly usable without any burden of new standards or languages.
Highly mature in-house or 3rd party synthesis tools are available thanks to the standard RTL flow.
151 © 2008, [email protected] http://hartenstein.de 2010, 2010,
fine grain vs coarse grain
Ensuring highly flexible interconnection of the LUTs requires a huge amount of routing, composed of programmable switches and the configuration for them, which takes up a significant area of the device.
This gave rise to new architectural concepts that reduce the fine-grained flexibility of FPGAs to a coarser-grained and, furthermore, application-specific one. This is inherent: when we change the level of flexibility, the application domains narrow.
However, the resulting solutions are orders of magnitude better in performance, power and cost than general-purpose FPGAs.
editor in chief
rebooting ?
Rebooting after each crash ?
… prevented rebooting the ACM/IEEE task force on curriculum recommendations
Multimedia in the Multicore Era
Multimedia Performance Needs
application performance needs up to:
Audio 800 MIPS; Graphics 11 GOPS; Video 160 GOPS; Digital TV 900 GOPS
[Pierre Paulin, MPSoC’09]
[Figure: relative performance vs. year, 1994 to 2030, marking the begin of the multicore era]
[Figure: needed performance (MIPS) growing faster than Moore‘s law across GSM, GPRS, EDGE, UMTS and the next standard; courtesy E. Sanchez]
FPGA tools
IP eco-system is RTL dominated
The RTL flow of FPGAs provides an added benefit to the IP eco-system of the industry.
It is easier to port IPs between ASICs and FPGAs, as both use RTL.
This also holds for FPGAs from different vendors: because they all use an RTL flow, porting a design to another FPGA is not as complicated as it is with microprocessors, where legacy code plays a major role in success and market dominance.
Furthermore, as RTL is inherently parallel, the mapped application is automatically parallelized by the CAD tools, utilizing the best of the target hardware resources (this is still one of the major difficulties for multicore and multicore-like solutions).
RTL Programming
RTL programming has become the programmable platform language of silicon:
- Highly mature tools
- Path to ASICs/ASSPs, across FPGAs
- No programming crisis (the rising issue is compile time, not programming!)
- In theory can implement anything
- Relative ease of absorbing functionalities into hard blocks and going heterogeneous
- IP ecosystem is RTL dominated
- Attractive target for IP providers
Programming successful
With FPGAs and with successful multicore-like solutions, programming is evidently always done in HDL or ANSI C/C++; at the industrial level, ESL (Electronic System Level) design is now bridging HDL and ANSI C/C++.
Multicore and Massively Parallel Processor Arrays (MPPAs)
Menta Startup: early research state
Founded in 2007 in Montpellier, France; focused on eFPGAs*.
The technology is highly scalable and target independent: customers immediately benefit from having an eFPGA in their system and, depending on the constraints of their target market, can go for a custom solution for a specific node.
Menta creates highly customized, domain-specific eFPGAs for the market segment of ASICs and ASSPs, so its target market segment differs from that of FPGAs.
*) embedded FPGAs
1444 LUT, 120 nm; press release 4Q 2010. Laurent Rougé, Menta founder and CEO. Menta provides embedded-FPGA (eFPGA) technology for SoC (System on Chip); eFPGA Programmer® tool suite.
Menta tool suite
The company's eFPGA Creator tool suite allows the creation of a customized eFPGA core in a user-friendly GUI environment.
Built-in analysis tools and close coupling with back-end silicon tools help to build, analyze and validate the architecture and to fine-tune it to target needs.
Close collaboration with LIRMM (University of Montpellier), working on the use of MRAM (Magnetoresistive Random Access Memory):
• for non-volatile configuration,
• for superior architectural benefits for partial/dynamic and multi-context reconfiguration compared to SRAM and FLASH,
• with ease of fabrication in a standard CMOS process compared to FLASH.
Cylindrical Model:
Accelerated System
Operations within the
Cylindrical Model
• Cylinder contains the data flow graph of the kernel / application
• Diameter (and base) is determined by the I/O bandwidth from data memory. This is usually a function of the technology and type of operation.
• Height is determined by data flow size (number of operations) and effective base size.
• Parts of the data flow graph must be folded / expanded to maintain constant operational diameter
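A minimal sketch of how the cylinder's height might be estimated from these bullets. The slide does not give a formula; the rule used here (operation count divided by effective base size, rounded up) is an assumption, and the function name and numbers are hypothetical.

```python
import math

def cylinder_height(num_operations: int, base_size: int) -> int:
    """Levels needed when `base_size` operations fit per level (assumed model)."""
    if base_size <= 0:
        raise ValueError("base_size must be positive")
    # A partially filled last level still costs a full level.
    return math.ceil(num_operations / base_size)

# Hypothetical kernel: 1000 operations, I/O bandwidth sustains 64 operations per level.
print(cylinder_height(1000, 64))  # 16
```

Folding the graph corresponds to a smaller base and therefore a taller cylinder; expanding it does the opposite.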
Parallelism within the
Cylindrical Model
• The cylinder model supports all forms of parallelism (MIMD / SIMD), but most naturally supports streaming (MISD).
• The model assumes a defined, static data flow graph, which is then realized by streaming or pipelining operations.
• It also requires a synchronizing technology (e.g. FIFOs) to assure data coordination at an operational node.
• As each node is activated each cycle, the entire data flow graph is executed each cycle.
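The streaming idea can be sketched in software as follows. This is only an illustration under stated assumptions: a hypothetical two-node static graph computing (a + b) * c, with Python deques standing in for the hardware FIFOs; it is not taken from the talk.

```python
from collections import deque

def run_pipeline(a_stream, b_stream, c_stream):
    """Stream three equal-length inputs through a fixed graph: (a + b) * c."""
    if not (len(a_stream) == len(b_stream) == len(c_stream)):
        raise ValueError("input streams must have equal length")
    a_in, b_in, c_in = deque(a_stream), deque(b_stream), deque(c_stream)
    sum_fifo = deque()  # FIFO between the adder node and the multiplier node
    c_delay = deque()   # FIFO delaying c so that it meets the matching sum
    out = []
    # One loop iteration models one cycle; every node may fire in each cycle.
    while a_in or sum_fifo:
        # Multiplier node: consumes what the adder produced in the previous cycle.
        if sum_fifo:
            out.append(sum_fifo.popleft() * c_delay.popleft())
        # Adder node: fires while operands are still arriving.
        if a_in:
            sum_fifo.append(a_in.popleft() + b_in.popleft())
            c_delay.append(c_in.popleft())
    return out

print(run_pipeline([1, 2, 3], [10, 20, 30], [2, 2, 2]))  # [22, 44, 66]
```

Once the pipeline is filled, both nodes are active in the same cycle, which is the point of the last bullet.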
Design methods and tools
Tools have an impact on designer productivity.
Synthesis tools focus on:
• automatically mapping high-level descriptions into efficient hardware implementations according to performance metrics such as speed, size, and power consumption.
They can support verification of functionality, timing and testability.
They cover all the main steps of synthesis and analysis, including:
• capturing domain-specific knowledge,
• profiling,
• design space exploration,
• multi-core partitioning,
• (H/S) system partitioning,
• data representation optimization,
• static and dynamic reconfiguration,
• optimal custom instruction set generation,
• functional simulation,
• programming mixed SW/RH systems,
• ensuring effective cross-boundary communication (another challenging issue).
miscellaneous
ICT market at an inflection point
Prosperity depends on network capacity, ..., efficient pricing, flexible platforms, & ...
Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.
Broadband is significant at the inflection point, prompting major market governance changes & massive funding needed
Cowhey‘s & Aronson‘s Law
The battle for the living room & mobile is more important than the PC market.
... Cheap Revolution: •affordable broadband •software
performance
• low power
"Imagination is intelligence with an erection" — Victor Hugo
how programmers think
We must imagine how programmers think
Only a fraction of the chip used
In current general-purpose architectures, only a small fraction of the chip is dedicated to carrying out useful computations;
the remaining resources go into the memory hierarchy and into modules that serve performance only indirectly (branch predictor, pipeline control).
Exploiting RP to speed up computationally intensive tasks:
to deploy this at larger scale, we need to address several challenging issues.
- Supply voltage is reduced by more than 15% per technology generation, in order to keep the power consumption low
- Operating frequency is increasing by 20-30% per annum.
Too many HDLs
Area cost not a limiting factor
Rapid increase of on-chip devices (currently billions of transistors) and a large number of metal layers: resources get “cheaper”.
Reconfigurable hardware area cost is no longer a limiting factor.
Reconfigurable computing can fill, at least partially, the gap in the missing performance speedup.
Due to power limitations, not all resources can be active at the same time;
such resources can then be used to offer reconfigurability and flexibility on a chip,
targeting fault tolerance, better performance, or certain lower-power computations.
New trends in industry
Field-Programmable Gate Array (FPGA), Xilinx 1984
CLB: Configurable Logic Box
[Figure: array of CLBs; a connect box connects to a CLB, a switch box forms a wire; each CLB has inputs A and B and a function select]
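The "function select" on inputs A and B can be illustrated by a two-input look-up table. This is a simplified sketch, not the actual Xilinx CLB (real CLBs have more inputs, flip-flops and carry logic); the name make_clb and the 4-bit encoding are assumptions made for illustration.

```python
def make_clb(function_select: int):
    """Return a 2-input logic function chosen by a 4-bit configuration word."""
    if not 0 <= function_select <= 0b1111:
        raise ValueError("function_select must fit in 4 bits")

    def clb(a: int, b: int) -> int:
        # Inputs A and B address one bit of the configuration word.
        index = ((a & 1) << 1) | (b & 1)
        return (function_select >> index) & 1

    return clb

and_gate = make_clb(0b1000)  # output is 1 only for A=1, B=1
xor_gate = make_clb(0b0110)  # output is 1 when A and B differ
print([and_gate(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([xor_gate(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Changing the configuration word changes the function without changing the circuit, which is what distinguishes configware from an instruction stream.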
Data meet the Processing Unit (PU)
We have 2 choices:
by Software: routing the data by memory-cycle-hungry instruction streams thru shared memory
by Configware: placement of the execution locality ...
The Tail wagging the Dog
CPU: „Central Processing Unit“
„Central“: it controls (almost) everything.
However, it needs accelerators.
[Figure: CPU and accelerators]
von Neumann dominance
Even hardware design went von Neumann about 1969
instruction streams + microinstruction streams
Microprogramming: nested von Neumann machines
[G. Koch et al.: The Universal Bus considered harmful; 1st EUROMICRO Symp., June 1975, Nice, France]
nested von Neumann bottlenecks:
multiple multiplexing overhead:
Massive Overhead Phenomena
overhead                        | von Neumann machine | RC
instruction fetch               | instruction stream  | ./.
state address computation       | instruction stream  | ./.
data address computation        | instruction stream  | ./.
data meet PU                    | instruction stream  | ./.
i/o - to / from off-chip RAM    | instruction stream  | ./.
multi-threading overhead        | instruction stream  | ./.
… other overhead                | instruction stream  | ./.
Acceleration by FPGA
• Only one tenth the clock frequency.
• The magnitude of parallelism overcomes the frequency limitation.
• Stream data across a large cell array, minimizing memory bandwidth.
• Customized data structures, e.g. 17-bit floating point; always just enough precision.
• A software-(re)configurable technology.
• Needs an in-depth application study to realize acceleration.
• Acceleration requires more programming effort (acceleration is not automatic).
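A back-of-the-envelope sketch of the first two bullets. It assumes, purely for illustration, that the processor completes one operation per cycle and that the FPGA keeps all of its parallel units busy; the function name and the figure of 500 parallel operations are hypothetical.

```python
def relative_throughput(clock_divisor: int, parallel_ops: int) -> float:
    """FPGA throughput relative to a processor doing one operation per cycle.

    clock_divisor: how many times slower the FPGA clock is (10 for one tenth).
    parallel_ops:  operations the FPGA completes per cycle (assumed fully used).
    """
    if clock_divisor <= 0:
        raise ValueError("clock_divisor must be positive")
    return parallel_ops / clock_divisor

print(relative_throughput(10, 10))   # 1.0  -> break-even at 10-fold parallelism
print(relative_throughput(10, 500))  # 50.0 -> 50x despite one tenth the clock
```

The slower clock is only a loss until parallelism exceeds the clock ratio; beyond that point every additional parallel unit is net speed-up.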
Strength and Weaknesses
Section 2 presents the strongest potentials of FPGAs along with their weaknesses, which created opportunities for other solutions.
To complete the scenario, Section 3 discusses structured ASICs, which are similar to FPGAs in architecture and offer an interesting tradeoff between FPGAs and ASICs for low to average volumes.
Section 4 discusses the emerging technologies that have tried to take a niche from the FPGA market share. We show how these technologies have been inspired by FPGAs and have used the multicore concept to compete with FPGAs.
Section 5 provides a brief overview of the new FPGA startup companies that have emerged in the past few years.
Section 6 gives a glimpse of the high-end heterogeneous MPSoC platforms.
Section 7 presents what FPGA vendors have learned over all these years from their challenges and emerging competition, and how they are adapting their future devices.
FPGA to ASIC Gap
• Measuring the Gap between FPGAs and ASICs
• 30-40X Area,
• 12-14X Power,
• 3-5X Speed
• Like µProcessor:
• high price of flexibility
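Multiplying the three gap factors gives a rough combined Area*Time*Power gap. The sketch below treats the 3-5X speed gap as a delay factor, which is an assumption; the function name is hypothetical.

```python
def gap_product(area, power, delay):
    """Combine per-metric (low, high) gap factors into one (low, high) range."""
    low = area[0] * power[0] * delay[0]
    high = area[1] * power[1] * delay[1]
    return low, high

# Ranges from the slide: 30-40X area, 12-14X power, 3-5X speed.
print(gap_product((30, 40), (12, 14), (3, 5)))  # (1080, 2800)
```

The result of roughly 1,000 to 3,000 is on the order of three orders of magnitude.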
Productivity vs. Efficiency
Evaluation Metrics
The different HLL paradigms/approaches:
- imperative programming (Impulse-C)
- functional programming (Mitrion-C)
- schematic/graphical programming (DSPLogic)
Tarek’s evaluation metrics:
- the explicitness of the programming model,
- ease-of-use
- efficiency of generating hardware
Taxonomy of Twin Paradigm Programming Flows (HPRC)
E. El-Araby et al.: Comparative Analysis of High Level Programming for Reconfigurable Computers: Methodology And Empirical Study; Proc. SPL2007, Mar del Plata, Argentina, Febr. 2007
[courtesy Richard Newton]
„The nroff of EDA“ [R. N.]
Growth of the Internet
The future of our world-wide total computing ecosystem is facing a mind-blowing and growing electricity consumption, together with a trend toward growing cost and shrinking availability of energy sources.
The carbon footprint of the internet is higher than that of world-wide air traffic.
Will The Internet Break?
Consumer broadband connections in NA, Mex, WE by the end of 2007: 155 million; predicted for after 2011: 228 million.
Accelerated trend: new technologies, larger e-mails, and an explosion in services integrating video and software will be intensified by the increasing popularity of games, massive use of video on demand, high-definition video and pay-TV to the living room, by newer services from mobile phone companies, and by multiple connected PCs and other devices using the connection.
World Economic Forum’s "Global Redesign Initiative"
Organizations like UN, GATT, G8, G20 are becoming increasingly inept at fixing what ails the world. Goals:
• economic growth,
• climate protection,
• poverty eradication,
• conflict avoidance,
• human security and
• promotion of shared values.
Klaus Schwab: "Our existing global institutions require extensive rewiring to confront contemporary challenges in an effective, inclusive and sustainable way."
IT cross-border integration enabling virtual interaction created a world:
• much more complex and more bottom-up than top-down,
• economically, politically and environmentally more interdependent,
• without a new set of int’l bureaucracies piled on existing ones.
Important: Reinvent Computing
The growth of IT and internet for:
• broad engineering issues
• ensuring sustainability of the world, such as
• smart energy production and distribution,
• dealing with ageing and young populations,
• intelligent water management,
• strengthening welfare,
• mitigating risks.
Help existing institutions by IT networking, enabling them to:
• unleash public value,
• catalyze initiatives,
• unleash human capital in the world
The Wikinomics Approach
a global system with graphic visualization to measure success, for
-- more agile structures enabled by global networks for new kinds of collaboration without bureaucracy
• a complete redesign of the global legal system
• a global vaccine protocol
• a global intellectual property system
• global risk management, etc.
launch a new paradigm to involve world citizens through mass collaboration by a new communication medium, including tools like
• digital brainstorms and town hall meetings
• decision-making initiatives like citizen juries and deliberative polling
• execution tools like policy wikis
• social networks with government and evaluation programs.
mass collaboration of citizens worldwide
Further Progress stalled
Disruptive architectural developments in industry are not the only thing stalling further progress of IT with respect to energy efficiency and performance improvements.
Because of the inevitable manycore architecture, contemporary computer systems are in an all-dominant programmability crisis.
The progress of performance is massively stalled by this „programming wall“, caused by lacking scalability of parallelism and a ubiquitous programmer productivity gap.
Unaffordable operating cost caused by excessive power consumption is a massive future survival problem for existing cyber infrastructures, which we must not surrender.
The growing core count
The growing core counts are racing ahead of programming paradigms and programmer productivity, not only in supercomputing: everywhere!
Almost all supercomputing applications had originally been written for a single processor, and now more than 50% of the applications do not scale beyond eight cores, although the newest petascale machines employ up to 100,000 processor cores each.
What about future exascale giants expected to come up with up to a million cores?
Crashing into the
Programming Wall
• The list (not even complete) demonstrates that most of the much earlier supercomputing projects and start-ups failed by crashing into the parallel programming wall.
• Even today, the vast majority of HPC or supercomputing applications were originally written for a single processor with direct access to main memory.
• But the first petascale supercomputers employ more than 100,000 processor cores each, and distributed memory.
• Three real-world applications have broken the petaflop barrier (10^15 calculations/second). A slightly larger number have surpassed 100 teraflops (100 x 10^12 calculations/second), mostly on IBM and Cray.
• The scene hopes that dozens of applications are inherently parallel enough to be laboriously decomposed, sliced and diced, for mapping onto such highly parallel computers.
• But a large number of applications is only modestly scalable. More than 50% of the codes do not scale beyond eight cores, and only about 6% can exploit more than 128 PEs, still a tiny fraction of the 100,000 or more available cores.
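To put "a tiny fraction" into numbers, a small sketch using the core counts from the bullets above; the 100,000-core machine size is the slide's lower bound, and the function name is hypothetical.

```python
TOTAL_CORES = 100_000  # lower bound for a petascale machine, per the slide

def used_fraction(cores: int, total: int = TOTAL_CORES) -> float:
    """Share of the machine that an application scaling to `cores` can use."""
    return cores / total

for cores in (8, 128):
    print(f"{cores} cores = {used_fraction(cores):.3%} of the machine")
# 8 cores = 0.008% of the machine
# 128 cores = 0.128% of the machine
```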
Amid the Clamor
Michael Wrinn, (keynote at SIGCSE2010): Suddenly, All Computing Is Parallel: Seizing Opportunity Amid the Clamor http://www.sigcse.org/sigcse2010/attendees/keynotes.php
„Foundational change will disrupt traditional habits throughout the discipline ....“
„The proud era of von Neumann architecture passes into history.“
a senior course architect in the Intel Software College
bring parallel computing into mainstream of undergraduate education
Programming Research stalled
The programming wall forces us to reshape the fundamental nature of system design and programming methods
The scientific community, with its current discussions, seems to be desperately seeking a needle in a haystack.
The still unanswered question is, what will it really take to build affordable and successfully programmable high performance platforms.
Will we be successful in addressing scalability challenges and in finding new programming models to support finding novel environments and algorithms which improve performance, resilience and power efficiency, and can exploit extreme concurrency?
However, the evolutionary path is not addressing the key issues.
Extrapolating from petascale to future exascale machines yields a power consumption of about 120 MW or more: the power wall.
Max Planck: Replacement of false doctrines by new insights needs 50 years waiting for not only old professors but also their scholars to die off.
50 years Software Crisis
Criticism of software engineering is not new:
F. L. Bauer 1968, coined the term „Software Crisis“
N. N. 1995: THE STANDISH GROUP REPORT
Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005
Anthony Berglas 2008: Why it is Important that Software Projects Fail
Software Crisis: term by F. L. Bauer [1968]
Will we be successful ?
Will we be successful in addressing scalability and in finding new programming models to support finding novel environments and algorithms which improve performance, resilience and power efficiency and can exploit extreme concurrency?
A Rescue Campaign
is urgently needed
Software must be rewritten not only for manycore
but also in general for energy-efficient computing
However, a qualified programmer population does not exist (we do not yet know how to rewrite software)
We need to reinvent computing (and its education)
Delaying such actions will
cause a world-wide disaster
Must be done as long as we can afford a rescue campaign
Will be costly and take many, many years
Creates thousands of new jobs
To convince politicians we need presence in the media
All but ALU is overhead: x20 efficiency
[R. Hameed et al.: Understanding Sources of Inefficiency in General-Purpose Chips; 37th ISCA, June 19-23, 2010, St. Malo, France]
… quantifying the overheads of a 720p HD H.264
explores methods to eliminate overheads by transformations (data cache)
Just one of several overhead abstraction layers
Programmable SOC in the media
no. of design starts: +13.4% in 2006 [Dataquest]
until 2010: from 80,000 to 110,000 [Dataquest]
June 2005
alumnus
(CV) at Karlsruhe: first graduate student and Ph. D. student of Karl Steinbuch
Reiner Hartenstein giving a keynote address at ITIV 25th anniversary (in 1983)
Karl Steinbuch, founder of ITIV
director Utz Baitinger
director Klaus Müller-Glaser
Jürgen Becker, vice president, Univ. Karlsruhe
widening the
semantic gap
ISC2006 BoF Session Title and Abstract
Is Reconfigurable Computing the Next Generation Supercomputing?
Advances in reconfigurable computing, particularly FPGA (field-programmable gate array) technology, have reached a performance level where they rival and exceed the performance of general purpose processors for the right applications. FPGAs have gotten cheaper thanks to smaller geometries, multimillion gate counts and volume market leverage from ASIC preproduction and other conventional uses. The potential benefit from the widespread incorporation of FPGA technology into high-performance applications is high, provided present day barriers to their incorporation can be overcome. This session will focus on defining the anticipated market changes, anticipated roles of FPGA technology in high-performance computing (from accelerators to hybrid architectures), characterizing present day barriers to the incorporation of FPGA technology (such as identifying the right applications), and partnering efforts required (tools, benchmarks, standards, etc.)to speed the adoption of reconfigurable technology in high-performance supercomputing.
Keywords: Reconfigurable computing, FPGA Accelerators, Supercomputing
Date and Time
This BoF session is part of the conference program and will take place within a 45 minute-slot on
Wednesday 28. June 2006 from 18:00 - 19:30.
BoF Organizers
John Abott
Chief Analyst, The 451 Group, USA
Dr. Joshua Harr
CTO, Linux Networx, USA
As CTO for Linux Networx, Dr. Joshua Harr has the responsibility of laying the technical roadmap for the company and is leading the team developing cluster management tools. Josh's experience with parallel processing, distributed computing, large server farms, and Linux clustering began when he built an eight-node cluster system out of used components while in college. An industry expert, Josh has been called upon to consult with businesses and lecture in college classrooms. He earned a Ph.D. in computational chemistry and a bachelor's degree in molecular biology from BYU.
Dr. Eric Stahlberg
Organizing founder OpenFPGA, Ohio Supercomputer Center (OSC), USA
more offending statements to come
speaker
audience
How to achieve acceptance
[Courtesy Richard Newton]
Your name here: your proposals
how to hide the ugliness from the user [Herman Schmit]
Lean hdw Qualification: not this way!
[Richard Newton]
We want a WYSIWYG design entry [Richard Newton]
Richard Newton: The Next EDA Revolution (Japan, Sept 1996)
(nroff: from Unix in the 60ies)
The „von Neumann“ mainframe
introduced in the early 1940s
The contemporary basic mindset of programmers is still tape-oriented
Time domain: instruction streams, controlled by program counter
notorious headache with parallelism
Languages turned into Religions
Teaching students the tunnel view of language designers:
falling in love with the subtleties of formalisms
instead of meeting the needs of the user
Java is a religion – not a language [Yale Patt]
The spirit of the Mainframe Age
For decades, we’ve trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps …
Even in “hardware” courses (unloved child of CS scenes) we often teach von Neumann machine design – deepening this tunnel view
… finally tending to code sizes of astronomic dimensions
1951: Hardware Design going von Neumann (Microprogramming)
27 October 2008, Software 2008, Zurich
End of Moore’s Dividend?
Can Multicore supplant Moore‘s dividend? Not without major innovation:
• Difficult programming: synchronization, races, non-determinism, missing language and tool support
• Lack of parallel algorithms
• Few parallel abstractions: low level machine-specific models (shared memory, message passing), assembly level constructs (thread, semaphore, lock), machine-specific performance models; parallel programs are low level and machine-specific (hard to port, reuse investments, develop market, gain economies of scale)
• Sequential code
• Multiple programming models: long-standing consensus on von Neumann, no consensus on parallelism model (data parallelism, thread parallelism, message passing); a single application may use all of them; language and tools needed to support and integrate models (education and training, when and how to use)
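The "assembly level constructs (thread, semaphore, lock)" can be illustrated with a minimal shared-counter sketch. It uses Python's standard threading module and is not from the talk; the function name and thread counts are hypothetical.

```python
import threading

def count_with_lock(num_threads: int, increments: int) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Without the lock, the read-modify-write below may interleave
            # between threads and lose updates (a race).
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(count_with_lock(4, 10_000))  # 40000
```

Even this trivial program requires the programmer to find the critical section by hand, which is the kind of low-level burden the slide criticizes.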
What Language ?
Computer scientists haven’t been interested in programming clusters. If putting the cluster on a chip is what excites them, fine.
Gordon Bell: It will still have to run Fortran!
*) like CoDe-X
Based on classical programming language principles, a dual paradigm dichotomy approach (instruction-procedural interlaced with data-procedural) is a good candidate to support parallel programming.
Tools for Team Design
• At 28-nm, FPGAs deliver the equivalent of a 20- to 30-million gate application-specific integrated circuit (ASIC).
• At this size, FPGA design tools begin to break down. Design and verification in a reasonable amount of time becomes impossible.
• Tools for team-design are coming up, where we should discuss:
• Distributed and parallel development
• Design flows
• Tracking and reporting
• An important consideration is the management and integration of sub-projects into the top level, including
• source code version control
• Design / IP re-use
• Time budgeting
• In-context synthesis
43% growth
• In 2010 iSuppli sees 43 percent growth for the PLD market (including FPGAs), and 30% just for FPGAs.
• The market for core silicon (PLDs, ASIC, and standard products (ASSPs)) is predicted to grow 21.2 percent in 2010, where PLDs will grow fastest, by 43% up to $4.7 billion.
Dave Patterson‘s Law
[Figure: the memory wall (Patterson): memory, streams, off-chip I/O]
trends: revival of graphic HDLs ?
Results from CVT project funded by the EU within the ESPRIT program
abutment expressions describe compound cells, .........
© 2008, [email protected] http://hartenstein.de 2010, 2010,
Only a fraction of the chip used
in current general purpose architectures only
a small fraction of the chip is dedicated to carry useful computations,
the remaining resources in memory hierarchy and
modules indirectly for performance (branch predictor, pipeline control)
Exploiting RP to speedup computationally intensive tasks.
to deploy this larger scale, need to address several challenging issues.
- Supply voltage reduced more than 15% per technology generation, in order to keep the power consumption low
- operating frequency increasing by 20-30% per annum.
A Rescue Campaign
is urgently needed
Software must be rewritten, not only for manycore
but also, in general, for energy-efficient computing
However, a qualified programmer population does not exist (we do not yet know how to rewrite software)
We need to reinvent computing (and its education)
On Bottlenecks
R. Hartenstein, G. Koch: The Universal Bus considered harmful; Symposium on the Microarchitecture of Computing Systems; June 1975, Nice, France [North Holland/American Elsevier].
Von Neumann Coarse-Grained
Reconfigurable Architectures ?
Von Neumann has once again come back to the community as a "hero", telling us that as a team, in the form of multi/many cores, he can compete with FPGAs and exploit features of non-von-Neumann coarse-grained reconfigurable architectures. This opened a new portal of research and products for academia and industry, including progress in Networks-on-Chip (NoCs).
Architectural Impact [Patrick Lysaght]
• Architectural impact
– Only very high volume architectures transition to leading processes
– Programmability and concurrency are the new architectural imperatives
– MPSoCs evolve into heterogeneous, multi-core architectures
– Power dissipation is a dominant concern
– Design productivity lags silicon progress
Innovation-driven computing
[Andy Hopper]
• Simulation and modelling are important tools which will help predict global warming and its effects.
• Computing will play a key part in optimizing use of resources in the physical world.
• The amount of infrastructure making up the digital world is continuing to grow rapidly and starting to consume significant energy resources.
• To help generate momentum and achieve these goals, it is important that a coordinated set of challenging international projects is investigated.
• We are experiencing a shift to the digital world in our daily lives, as witnessed by the wide-scale adoption of the World Wide Web.
Green IT:
• Smart energy meters: housing, buildings, facilities
• Carpooling and public transport by info web sites
• Road traffic and transport logistics optimization
• Reduce travelling by telecommuting.
Hitting 28nm, and beyond
Both de facto FPGA giants (Xilinx and Altera) are hitting 28nm at end of 2010.
FPGAs are now capable of implementing entire SoCs.
They have turned into a complex heterogeneous mix of coarse-grained elements and classical fine-grained LUTs.
2009: Intel ships 32nm,
2010: foundries to ship 28nm
Intel will ship 22 nm in 2011,
16 nm in 2013
Xilinx partner TSMC, the world’s largest standalone
fab, is almost the de facto fab for all FPGAs in the world.
Altera, too, is well known for its long partnership
with TSMC, dating back to the early 1990s.
Cray-XD1 Architecture
The Cray-XD1 allows the Opteron µP to access the FPGA's internal registers and its internal and external memory.
It provides several transfer modes between the µP and the FPGA (depending on the initiator).
The µP can read from / write to the FPGA local memory space (i.e. internal registers, internal BRAMs, and external memory).
The FPGA can read from / write to the µP local memory space.
However, the use of an HLL can disable some of these features.
The most bandwidth-efficient transfer mode is
write-only mode (the producer initiates the transfer),
either burst (for large amounts of data) or non-burst.
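The rule of thumb on this slide (the producer of the data initiates a write, with burst mode for large amounts of data) can be sketched as a small decision function. This is a toy model, not the Cray XD1 programming interface: `TransferPlan`, `plan_transfer` and the burst threshold are hypothetical names and values chosen only for illustration.

```python
# Toy model of the transfer-mode rule on the slide: the producer of the
# data initiates a write; burst is preferred for large amounts of data.
# All names and the burst threshold are hypothetical, not Cray XD1 API.
from dataclasses import dataclass

BURST_THRESHOLD_BYTES = 4096  # assumed cut-off for "large amount of data"

@dataclass(frozen=True)
class TransferPlan:
    initiator: str   # "uP" or "FPGA"
    operation: str   # always "write" in write-only mode
    burst: bool

def plan_transfer(producer: str, size_bytes: int) -> TransferPlan:
    """Pick the write-only mode: the side that holds the data pushes it."""
    if producer not in ("uP", "FPGA"):
        raise ValueError(f"unknown producer: {producer!r}")
    return TransferPlan(
        initiator=producer,
        operation="write",
        burst=size_bytes >= BURST_THRESHOLD_BYTES,
    )

print(plan_transfer("uP", 1 << 20))   # large input block pushed to the FPGA
print(plan_transfer("FPGA", 64))      # small result pushed back to the uP
```

In this model neither side ever issues a read: input data is written by the µP into the FPGA memory space, and results are written by the FPGA into the µP memory space.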
The new developments in
semiconductor technology
make Reconfigurable Computing a
widely used solution for future systems.
Reconfigurable Computing can achieve such a goal;
however, several improvements are required.
Difference to ASICs:
an order of magnitude more resources
an order of magnitude higher delay
an order of magnitude higher power consumption
i.e. three orders of magnitude higher Area*Time*Power product than ASICs.
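The three-orders-of-magnitude figure follows from multiplying the three individual overheads. A minimal worked check, assuming each "order of magnitude" is taken as exactly a factor of 10 (a simplification; the slide gives no exact values):

```python
# Worked arithmetic for the Area*Time*Power comparison on the slide.
# Each factor is "an order of magnitude" (x10) relative to an ASIC;
# treating each as exactly 10 is a simplifying assumption.
import math

overhead_vs_asic = {
    "area (resources)": 10.0,
    "time (delay)": 10.0,
    "power": 10.0,
}

atp_ratio = math.prod(overhead_vs_asic.values())
orders_of_magnitude = math.log10(atp_ratio)

print(f"A*T*P ratio vs. ASIC: {atp_ratio:.0f}x "
      f"({orders_of_magnitude:.0f} orders of magnitude)")
```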
Run time support of RC
Challenges to runtime support of a reconfigurable system:
Online monitoring;
Load balancing;
HW dependable SW;
Visualization;
Runtime resource management and scheduling;
very fast re-layout for dynamic reconfiguration;
Managing adaptive dynamic routing.
Challenging issue in embedded systems (ES): developing generic embedded platforms to improve productivity and reusability.
A reconfigurable system with the above characteristics
is far from trivial to develop.
Examples of requirements on ES-domain applications:
being energy efficient and/or
safety critical (even more challenging).
fine grain vs coarse grain
Ensuring high flexibility in interconnecting the LUTs of an FPGA requires a huge amount of routing, composed of
programmable switches and their configuration memory, which takes up a significant area of the device.
This gave rise to new architectural concepts that reduce the fine-grained flexibility of FPGAs to a coarser-grained one. The result is also more application-specific, which is inherent: changing the level of flexibility narrows the application domain.
However, the resulting solutions are orders of magnitude better in performance, power and cost when compared to general-purpose FPGAs.
Some special Languages
Some Parallel Languages
Area cost not a limiting factor
Rapid increase in the number of on-chip devices (currently billions of transistors)
and a large number of metal layers:
resources get “cheaper”, so
reconfigurable hardware area cost
is no longer a limiting factor.
Due to power limitations, not all resources can be active at the same time;
such resources can then be used to offer reconfigurability and flexibility on a chip,
targeting fault tolerance, better performance, or certain lower-power computations.
Reconfigurable computing can fill, at least partially,
the above gap in the missing performance speedup.
Comment
avoid traditional tunnel views
to obtain new perspectives
rediscovery and revival of old ideas
rearrange and teach them properly
to reach promising new horizons