V. Milutinovic, G. Rakocevic, S. Stojanovic, and Z. Sustran, University of Belgrade
Oskar Mencer, Imperial College, London
Oliver Pell, Maxeler Technologies, London and Palo Alto
Michael Flynn, Stanford University, Palo Alto
For Big Data algorithms, at the same hardware price as before, the goal is to achieve:
a) a speed-up of 20 to 200 times
b) monthly electricity bills reduced 20 times
c) size reduced 20 times
The major issues of engineering are: design cost and design complexity.
Remember, economy has its own rules: production count and market demand!
1. Big Data
2. WORM
3. Tolerance to latency
4. Over 95% of run time in loops
5. Reusability of data (e.g., x + x^2 + x^3 + x^4 + …)
6. Skills
Use a tractor, not a Ferrari, to drive over a plowed field
Absolutely all results achieved with:
a) All hardware produced in Europe, specifically the UK
b) All software written by programmers from the EU and WB
ControlFlow (MultiFlow and ManyFlow): Top500 ranks using Linpack (Japanese K, IBM Sequoia, Cray Titan, …)
DataFlow: Coarse Grain (HEP) vs. Fine Grain (Maxeler)
Compiling below the machine code level brings speedups, as well as lower power, size, and cost.
The price to pay: the machine is more difficult to program.
Consequently: ideal for WORM applications :)
Examples using Maxeler: GeoPhysics (20-40), Banking (200-1000, with JP Morgan 20%), M&C (New York City), Datamining (Google), …
Why Java? Minimal Kolmogorov Complexity, etc…
Assumptions:
1. Software includes enough parallelism to keep all cores busy.
2. The only limiting factor is the number of cores.

$$t_{GPU} = N \cdot N_{OPS} \cdot C_{GPU} \cdot T_{clkGPU} / N_{coresGPU}$$
$$t_{CPU} = N \cdot N_{OPS} \cdot C_{CPU} \cdot T_{clkCPU} / N_{coresCPU}$$
$$t_{DF} = N_{OPS} \cdot C_{DF} \cdot T_{clkDF} + (N - 1) \cdot T_{clkDF} / N_{DF}$$
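The three expressions above can be compared numerically. What follows is a minimal sketch, assuming N is the number of data items, NOPS the number of operations applied to each item, C the clock cycles per operation, Tclk the clock period, Ncores the core count, and NDF the number of parallel dataflow pipelines. These readings of the symbols, and every numeric value below, are illustrative assumptions rather than figures from the slides.

```python
def t_control_flow(n, n_ops, cycles_per_op, t_clk, n_cores):
    """Run time of a MultiCore or ManyCore machine: N * NOPS * C * Tclk / Ncores."""
    return n * n_ops * cycles_per_op * t_clk / n_cores


def t_data_flow(n, n_ops, cycles_per_op, t_clk, n_df):
    """Run time of a dataflow machine: NOPS * C * Tclk + (N - 1) * Tclk / NDF."""
    fill = n_ops * cycles_per_op * t_clk      # time for the first item to pass through
    stream = (n - 1) * t_clk / n_df           # remaining items, NDF results per clock
    return fill + stream


# Hypothetical workload and machines (assumed values, for illustration only).
N = 10**9          # data items
N_OPS = 10_000     # operations applied to each item

t_cpu = t_control_flow(N, N_OPS, 1, 1 / 3.0e9, 8)       # 8 cores at 3 GHz
t_gpu = t_control_flow(N, N_OPS, 1, 1 / 1.0e9, 2000)    # 2000 cores at 1 GHz
t_df = t_data_flow(N, N_OPS, 1, 1 / 200e6, 8)           # 8 pipelines at 200 MHz

print(f"tCPU = {t_cpu:.3f} s, tGPU = {t_gpu:.3f} s, tDF = {t_df:.3f} s")
```

In this model the control-flow times grow with N * NOPS, while the dataflow time grows with N alone once the pipeline is filled, so the gap widens as more operations are applied to each data item.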
DualCore?
Which way are the horses going?
Is it possible to use 2000 chickens instead of two horses?
What is better, real and anecdotal?
2 x 1000 chickens (CUDA and rCUDA)
How about 2 000 000 ants?
Data
Marmalade
Big Data Input Results
Factor: 20 to 200
MultiCore/ManyCore
Dataflow
Machine Level Code
Gate Transfer Level
Factor: 20
MultiCore/ManyCore
Dataflow
Factor: 20
Data Processing
Process Control
Data Processing
Process Control
MultiCore/ManyCore
DataFlow
MultiCore: Explain what to do, to the driver. Caches, instruction buffers, and predictors needed.
ManyCore: Explain what to do, to many sub-drivers. Reduced caches and instruction buffers needed.
DataFlow: Make a field of processing gates: 1C+2nJava+3Java. No caches, etc. (300 students/year: BGD, BCN, LjU, ICL, …)
MultiCore: Business as usual
ManyCore: More difficult
DataFlow: Much more difficult. Debugging both application and configuration code.
MultiCore/ManyCore: Several minutes
DataFlow: Several hours for the real hardware. Fortunately, only several minutes for the simulator, and several seconds for reload (90% due to DRAM inertia).
The simulator supports both the large JPMorgan machine and the smallest “University Support” machine.
Good news: Tabula@2GHz
MultiCore: Horse stable
ManyCore: Chicken house
DataFlow: Ant hole
MultiCore: Haystack
ManyCore: Cornbits
DataFlow: Crumbs
Small Data: Toy Benchmarks (e.g., Linpack)
Medium Data (benchmarks favoring NVidia, compared to Intel, …)
Big Data
Maxeler Hardware
CPUs plus DFEs: Intel Xeon CPU cores and up to 4 DFEs with 192GB of RAM
DFEs shared over Infiniband: Up to 8 DFEs with 384GB of RAM and dynamic allocation of DFEs to CPU servers
Low latency connectivity: Intel Xeon CPUs and 1-2 DFEs with up to six 10Gbit Ethernet connections
MaxWorkstation: Desktop development system
MaxCloud: On-demand scalable accelerated compute resource, hosted in London
Major Classes of Algorithms, from the Computational Perspective
1. Coarse grained, stateful: Business
- CPU requires DFE for minutes or hours
2. Fine grained, transactional with shared database: DM
- CPU utilizes DFE for ms to s
- Many short computations, accessing common database data
3. Fine grained, stateless transactional: Science (Phy, ...)
- CPU requires DFE for ms to s
- Many short computations
Coarse Grained: Modeling
• Long runtime, but:
• Memory requirements change dramatically based on modelled frequency
• Number of DFEs allocated to a CPU process can be easily varied to increase available memory
• Streaming compression
• Boundary data exchanged over chassis MaxRing
[Figure: equivalent CPU cores (0 to 2,000) vs. number of MAX2 cards (1, 4, 8), for 15Hz, 30Hz, 45Hz, and 70Hz peak frequency]
[Figure: timesteps (thousand), domain points (billion), and total computed points (trillion) vs. peak frequency (Hz)]
Fine Grained, Shared Data: Monitoring
• DFE DRAM contains the database to be searched
• CPUs issue transactions find(x, db)
• Complex search function
  - Text search against documents
  - Shortest distance to coordinate (multi-dimensional)
  - Smith Waterman sequence alignment for genomes
• Any CPU runs on any DFE that has been loaded with the database
  - MaxelerOS may add or remove DFEs from the processing group to balance system demands
  - New DFEs must be loaded with the search DB before use
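The shared-database pattern on this slide can be sketched in ordinary software. The example below is a simulation only: `SimulatedDfe` and `ProcessingGroup` are hypothetical names, not part of MaxelerOS, the round-robin choice of DFE is an assumption, and the search function is the slide's "shortest distance to coordinate" case.

```python
import itertools
import math


class SimulatedDfe:
    """Stand-in for one DFE; the database lives in its (simulated) DRAM."""

    def __init__(self, name):
        self.name = name
        self.db = None  # nothing loaded yet

    def load(self, db):
        self.db = list(db)

    def find(self, x):
        # Search function: shortest Euclidean distance to coordinate x.
        return min(self.db, key=lambda point: math.dist(point, x))


class ProcessingGroup:
    """Hypothetical group manager; not the MaxelerOS API."""

    def __init__(self, db):
        self.db = db
        self.members = []
        self._next = itertools.count()

    def add(self, dfe):
        dfe.load(self.db)          # a new DFE is loaded with the DB before use
        self.members.append(dfe)

    def remove(self, dfe):
        self.members.remove(dfe)

    def find(self, x):
        # Any loaded DFE can serve any transaction; pick one round-robin.
        dfe = self.members[next(self._next) % len(self.members)]
        return dfe.name, dfe.find(x)


db = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
group = ProcessingGroup(db)
group.add(SimulatedDfe("dfe0"))
group.add(SimulatedDfe("dfe1"))
print(group.find((2.5, 3.5)))
print(group.find((9.0, 9.0)))
```

Because every member holds the same database, the caller never needs to know which DFE served a given `find`, which is what lets the group grow or shrink under load.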
Fine Grained, Stateless: The BSOP Control
• Analyse > 1,000,000 scenarios
• Many CPU processes run on many DFEs
• ≈50x MPC-X vs. multi-core x86 node
• Each transaction executes atomically on any DFE in the assigned group
[Figure: CPU and DFE pipeline. The CPU supplies market and instruments data; the DFE loops over instruments, runs the random number generator and sampling of underliers, and prices instruments using Black Scholes; the instrument values return to the CPU for tail analysis.]
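The stages named in this figure (random number generation, sampling of underliers, Black Scholes pricing, tail analysis) can be sketched as plain Python. This is a sketch under stated assumptions, not the production code behind the slide: the instruments are European calls, the scenarios are lognormal shocks to a single underlier, the tail measure is a simple quantile, and all parameter values are hypothetical.

```python
import math
import random


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price of a European call option."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


def scenario_values(instruments, spot, n_scenarios, shock_vol, seed=0):
    """The part the figure assigns to the DFE: sample underliers, price every instrument."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_scenarios):
        shocked = spot * math.exp(rng.gauss(0.0, shock_vol))   # sampled underlier
        values.append(sum(black_scholes_call(shocked, k, r, v, t)
                          for k, r, v, t in instruments))
    return values


def tail_value(values, quantile=0.01):
    """The part the figure leaves on the CPU: tail analysis of the instrument values."""
    ordered = sorted(values)
    return ordered[int(quantile * (len(ordered) - 1))]


# Hypothetical instruments: (strike, rate, volatility, maturity in years).
instruments = [(90.0, 0.02, 0.2, 1.0), (100.0, 0.02, 0.2, 1.0), (110.0, 0.02, 0.25, 0.5)]
values = scenario_values(instruments, spot=100.0, n_scenarios=10_000, shock_vol=0.05)
print("1% tail portfolio value:", round(tail_value(values), 2))
```

Each scenario is priced independently of the others, which is the stateless property the slide relies on when it lets any transaction run on any DFE in the group.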
Selected Examples: Business, Mathematics, GeoPhysics, etc.
An MIS Example: Credit Derivatives
Climber
Tether
Orbital station
HW
Seismic Imaging
• Running on MaxNode servers
- 8 parallel compute pipelines per chip
- 150MHz => low power consumption!
- 30x faster than microprocessors
An Implementation of the Acoustic Wave Equation on FPGAs, T. Nemeth†, J. Stefani†, W. Liu†, R. Dimond‡, O. Pell‡, R. Ergas§
†Chevron, ‡Maxeler, §Formerly Chevron, SEG 2008
The CRS Results
Performance of one MAX2 card vs. 1 CPU core
Land case (8 params), speedup of 230x
Marine case (6 params), speedup of 190x
[Figure panels: CPU Coherency, MAX2 Coherency]
Trace Stacking: Speed-up 217 (P. Marchetti et al, 2010)
• DM for Monitoring and Control in Seismic processing
• Velocity independent / data driven method to obtain a stack of traces, based on 8 parameters
• Search for every sample of each output trace
Parameters (emergence angle & azimuth)
Normal Wave front parameters (KN,11; KN,12; KN,22)
NIP Wave front parameters (KNip,11; KNip,12; KNip,22)

$$t_{hyp}^2 = \left(t_0 + \frac{2}{v_0}\,\mathbf{w}^T\mathbf{m}\right)^2 + \frac{2t_0}{v_0}\left(\mathbf{m}^T H_{zy} K_N H_{zy}^T \mathbf{m} + \mathbf{h}^T H_{zy} K_{NIP} H_{zy}^T \mathbf{h}\right)$$
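The traveltime on this slide can be evaluated directly once the parameters are fixed. The sketch below assumes the hyperbolic form t_hyp^2 = (t0 + (2/v0) w^T m)^2 + (2 t0/v0) (m^T H_zy K_N H_zy^T m + h^T H_zy K_NIP H_zy^T h), with m the midpoint displacement, h the half-offset, w a 2-vector derived from the emergence angle and azimuth, and H_zy, K_N, K_NIP treated as 2x2 matrices. These readings of the symbols and all numeric values are assumptions made for illustration.

```python
import math


def quad_form(vec, h_zy, k):
    """Return vec^T * H_zy * K * H_zy^T * vec for a 2-vector and 2x2 matrices."""
    # u = H_zy^T * vec
    u = (h_zy[0][0] * vec[0] + h_zy[1][0] * vec[1],
         h_zy[0][1] * vec[0] + h_zy[1][1] * vec[1])
    # u^T * K * u
    return (u[0] * (k[0][0] * u[0] + k[0][1] * u[1])
            + u[1] * (k[1][0] * u[0] + k[1][1] * u[1]))


def t_hyp(t0, v0, w, m, h, h_zy, k_n, k_nip):
    """Hyperbolic traveltime for midpoint displacement m and half-offset h."""
    linear = t0 + (2.0 / v0) * (w[0] * m[0] + w[1] * m[1])
    curved = (2.0 * t0 / v0) * (quad_form(m, h_zy, k_n) + quad_form(h, h_zy, k_nip))
    # Guard against a negative argument for unphysical parameter combinations.
    return math.sqrt(max(linear**2 + curved, 0.0))


# Hypothetical values: identity transformation, isotropic curvatures.
identity = ((1.0, 0.0), (0.0, 1.0))
k_n = ((0.001, 0.0), (0.0, 0.001))
k_nip = ((0.002, 0.0), (0.0, 0.002))
t = t_hyp(t0=1.0, v0=2000.0, w=(0.1, 0.0), m=(50.0, 0.0), h=(100.0, 0.0),
          h_zy=identity, k_n=k_n, k_nip=k_nip)
print(f"t_hyp = {t:.6f} s")
```

In a stacking search, a function like this would be evaluated for every candidate parameter set at every output sample, which is the regular, loop-dominated work the slides argue suits a dataflow engine.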
Conclusion: Nota Bene
This is about algorithmic changes, to maximize the algorithm to architecture match: data choreography, process modifications, pipeline utilization, and decision precision.
The winning paradigm of Big Data ExaScale?
Revisiting the Top 500 SuperComputers benchmarks: Our paper in Communications of the ACM
Revisiting all major Big Data DM algorithms: Massive static parallelism at low clock frequencies
Concurrency and communication: Concurrency between millions of tiny cores is difficult; “jitter” between cores will harm performance at synchronization points
Reliability and fault tolerance: 10-100x fewer nodes, failures much less often
Memory bandwidth and FLOP/byte ratio: Optimize data choreography, data movement, and the algorithmic computation
New architecture of n-Programming paradigms
FP7: RoMoL@BCN
The SAB goal: Out-of-the-box thinking!
FP7: BalCon@SRB
The SAB goal: Seed for new proposals!
The vision of Alkis Konstantellos
DAFNE: Leader MISANU
DAFNE = South (MaxCode) + North (BigData)
MISANU, IMP, KG, NS, BSC, UPV, U of Siena, U of Roma, IJS, FRI, IRB, QPLAN, Bogazici, U of Istanbul, U of Bucharest, U of Arad, U of Tuzla, Technion, Maxeler Israel, IPSI
UK, Sweden, Norway, Denmark, Germany, France, Austria, Switzerland, Poland, Hungary
The DAFNE Map
The TriPeak @ DATAMAN
Siena + BSC + Imperial College + Maxeler + Belgrade
The TriPeak: Essence
MontBlanc = A ManyCore (NVidia) + a MultiCore (ARM)
Maxeler = A FineGrain DataFlow (FPGA)
How about a happy marriage? MontBlanc (OmpSs) and Maxeler (an accelerator)
In each happy marriage, it is known who does what :)
The Big Data DM algorithms: What part goes to MontBlanc and what to Maxeler?
TriPeak: Core of the Symbiotic Success
An intelligent DM algorithmic scheduler, partially implemented for compile time, and partially for run time.
At compile time: Checking what part of code fits where (MontBlanc or Maxeler): LoC 1M vs 2K vs 20K
At run time: Rechecking the compile time decision, based on the current data values.
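The two-phase decision described above can be illustrated with a toy scheduler. The slides do not specify the criteria, so everything here is an assumption: the `Kernel` fields, the 95% loop threshold (borrowed from the list of dataflow preconditions earlier in the talk), and the minimum data size used for the run-time recheck are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    loop_fraction: float      # share of run time spent in loops, estimated statically
    tolerates_latency: bool


def compile_time_target(kernel, loop_threshold=0.95):
    """Static decision: which platform a part of the code fits."""
    if kernel.loop_fraction >= loop_threshold and kernel.tolerates_latency:
        return "Maxeler"
    return "MontBlanc"


def run_time_target(static_target, n_items, min_items=1_000_000):
    """Recheck the static decision against the current data values."""
    if static_target == "Maxeler" and n_items < min_items:
        return "MontBlanc"    # assumed rule: too little data to keep the pipeline full
    return static_target


stencil = Kernel("stencil", loop_fraction=0.98, tolerates_latency=True)
parser = Kernel("parser", loop_fraction=0.40, tolerates_latency=False)

for kernel, n_items in [(stencil, 10**9), (stencil, 10**3), (parser, 10**9)]:
    static = compile_time_target(kernel)
    final = run_time_target(static, n_items)
    print(f"{kernel.name}: compile time -> {static}, run time ({n_items} items) -> {final}")
```

Splitting the decision this way keeps the expensive analysis at compile time and leaves only a cheap, data-dependent check on the run-time path.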
Maxeler: Research (Google: good method)
Structure of a Typical Research Paper: Scenario #1 [Comparison of Platforms for One Algorithm]
Curve A: MultiCore of approximately the same PurchasePrice
Curve B: ManyCore of approximately the same PurchasePrice
Curve C: Maxeler after a direct algorithm migration
Curve D: Maxeler after algorithmic improvements
Curve E: Maxeler after data choreography
Curve F: Maxeler after precision modifications
Structure of a Typical Research Paper: Scenario #2 [Ranking of Algorithms for One Application]
CurveSet A: Comparison of Algorithms on a MultiCore
CurveSet B: Comparison of Algorithms on a ManyCore
CurveSet C: Comparison on Maxeler, after a direct algorithm migration
CurveSet D: Comparison on Maxeler, after algorithmic improvements
CurveSet E: Comparison on Maxeler, after data choreography
CurveSet F: Comparison on Maxeler, after precision modifications
Maxeler Research in Serbia: Special Issue of IPSI Transactions Journal
KG: Blood Flow, Tijana Djukic and Prof. Filipovic
NS: Combinatorial Math, Prof. Senk and Ivan Stanojevic
MISANU: The SAT Math, Zivojin Sustran and Prof. Ognjanovic
ETF: Meteorology, Radomir Radojicic and Marko Stankovic
ETF: Physics (Gross Pitaevskii 3D real), Sasa Stojanovic
ETF: Physics (Gross Pitaevskii 3D imaginary), Lena Parezanovic
Maxeler Research WorldWide: Special Issue of Advances in Computers @ SCI
Stanford, Texas, Imperial, Maxeler, ETF, MF, MISANU, IMP, KG, NS, BSC, UPV, U of Siena, U of Roma, IJS, FRI, …
© H. Maurer
Maxeler: Teaching (Google: prof vm)
TEACHING, VLSI, PowerPoints, Maxeler:
Maxeler Veljko Explanations, August 2012
Maxeler Veljko Anegdotic, Maxeler Oskar Talk, August 2012
Maxeler Forbes Article
Flyer by JP Morgan
Flyer by Maxeler HPC
Tutorial Slides by Sasha and Veljko: Practice (Current Update)
Paper, unconditionally accepted for Advances in Computers by Elsevier
Paper, unconditionally accepted for Communications of the ACM
Tutorial Slides by Oskar: Theory (7 parts)
Slides by Jacob, New York
Slides by Jacob, Alabama
Slides by Sasha: Practice (Current Update)
Maxeler in Meteorology
Maxeler in Mathematics
Examples generated in Belgrade and Worldwide
THE COURSE ALSO INCLUDES DARPA METHODOLOGY FOR MICROPROCESSOR DESIGN, with an example
Maxeler PreConference Tutorials (2013)
Google:
IEEE HiPEAC, Berlin, Germany, January 2013
ACM iSAC, Coimbra, Portugal, March 2013
IEEE MECO, Budva, Montenegro, June 2013
ACM ISCA, Tel Aviv, Israel, June 2013
Maxeler InHouse Tutorials (2013)
Maxeler University Program Members
How to Become a Family Member?
Options to consider:
a. MAX-UP free of charge
b. Purchasing a university-level machine (min about $10K)
c. Purchasing a JPM-level machine (slowly approaching $100M), or at least a Schlumberger-level machine (slowly moving above $10M)
Good to Know!
Maxeler employs close to 100 people, GBR and USA:
a. Maxeler cash burn per year = about $10M
b. If a university-level machine is sold at a 100% profit margin, the company life of Maxeler is extended for about 2 hours.
c. If a university-level machine is sold at a 1% profit margin, the company life of Maxeler is extended for 1 minute.
Our past or ongoing FP7 projects requiring Maxeler speeds:
a. ProSense
b. ARTreat
c. HiPEAC
The Educational Mission
The reality:
a. University-level machines are sold at ZERO profit margin!
b. Only the Xilinx costs, handling, and shipping.
c. Email support for students doing a thesis is practically unlimited!
Important note:
a. Total number of accredited universities in the whole world?
b. As per WeboMetrics, about 20000.
c. Consequently, all universities of the world together bring only 20000 minutes of extra life, or about two weeks.
Conclusion: This is a chance for those who jump in first :)
Our Work Impacting Maxeler
Milutinovic, V., Knezevic, P., Radunovic, B., Casselman, S., Schewel, J., Obelix Searches Internet Using Customer Data, IEEE COMPUTER, July 2000 (impact factor 2.205/2010).
Milutinovic, V., Cvetkovic, D., Mirkovic, J., Genetic Search Based on Multiple Mutation Approaches, IEEE COMPUTER, November 2000 (impact factor 2.205/2010).
Milutinovic, V., Ngom, A., Stojmenovic, I., STRIP --- A Strip Based Neural Network Growth Algorithm for Learning Multiple-Valued Functions, IEEE TRANSACTIONS ON NEURAL NETWORKS, March 2001, Vol.12, No.2, pp. 212-227.
Jovanov, E., Milutinovic, V., Hurson, A., Acceleration of Nonnumeric Operations Using Hardware Support for the Ordered Table Hashing Algorithms, IEEE TRANSACTIONS ON COMPUTERS, September 2002, Vol.51, No.9, pp. 1026-1040 (impact factor 1.822/2010).
Maxeler Impacting Our Work
Tafa, Z., Rakocevic, G., Mihailovic, Dj., Milutinovic, V., Effects of Interdisciplinary Education On Technology-driven Application Design, IEEE Transactions on Education, August 2011, pp. 462-470 (impact factor 1.328/2010).
Tomazic, S., Pavlovic, V., Milovanovic, J., Sodnik, J., Kos, A., Stancin, S., Milutinovic, V., Fast File Existence Checking in Archiving Systems, ACM Transactions on Storage (TOS), Volume 7, Issue 1, June 2011, ACM New York, NY, USA.
Jovanovic, Z., Milutinovic, V., FPGA Accelerator for Floating-Point Matrix Multiplication, IET Computers & Digital Techniques, 2012, 6, (4), pp. 249-256.
Flynn, M., Mencer, O., Milutinovic, V., Rakocevic, G., Stenstrom, P., Trobec, R., and Valero, M., Moving from Petaflops (on Simple Benchmarks) to Petadata per Unit of Time and Power (on Sophisticated Benchmarks), Communications of the ACM, May 2013 (impact factor 1.919/2010).