Parallella: A Love Story
![Page 1: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/1.jpg)
Parallella: A Love Story
Heterogeneous.. Parallel.. Efficient.. Open..
Andreas Olofsson
MIT, Jan 7, 2013
![Page 2](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/2.jpg)
Adapteva Achieves 3 “World Firsts”
1. First processor company to reach 50 GFLOPS/W
2. First open source OpenCL™ SDK in the mobile market
3. First semiconductor company to successfully crowd‐source a project
“OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.”
![Page 3](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/3.jpg)
Prologue
![Page 4](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/4.jpg)
Why we need heterogeneous and parallel platforms
[Figure: chart on a logarithmic axis (1 to 100,000,000) over the years 1990 to 2030. Curves: System Processing Needs; Legacy Processing Efficiency. Annotations: “The Efficiency Gap”; Von Neumann Saturation.]
![Page 5](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/5.jpg)
ASIC, FPGA, DSP, CPU?
| | ASIC | FPGA | DSP | CPU |
|---|---|---|---|---|
| Flexibility | Poor | Great | Good | Good |
| Efficiency | Great | Good | Good | Fair |
| Development Cost/Risk | High | Medium | Medium | Low |
| Leverage | Minimal | Modest | High | Huge |
![Page 6](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/6.jpg)
A Practical Radar System Example
[Diagram blocks: ADC/DAC, FPGA, Epiphany, DDR, uP, Storage, Display]
FPGAs are great for front‐end DSP and connectivity.
The missing piece: a math engine that is high performance, low-power and C-programmable.
Microprocessors are great for user interfacing, knowledge extraction, and system management.
![Page 7](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/7.jpg)
Why SOC integration is so disruptive
Scale comparison (diagram labels):
• iPhone4s: ~58mm
• A5X chip: ~16mm
• A5X die: ~13mm
• 2X CPU
• ARM A9: ~2mm
• FPU: ~0.15mm
62 cm³ vs. 0.00003 cm³: >1M X volume reduction
What if your smartphone disappears?
![Page 8](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/8.jpg)
The Problem: SOCs are complex!
[Figure: Per Product SOC R&D Costs, on a logarithmic axis from $10,000 to $1,000,000,000]
What if you could do a 28nm chip for $100k?
![Page 9](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/9.jpg)
Our Vision: True Heterogeneous Computing
[Diagram: a SYSTEM‐ON‐CHIP containing four BIG CPU blocks, an FPGA, a GPU, an Analog block, and 1000 small Epiphany RISC CPUs/DSPs]
![Page 10](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/10.jpg)
Epiphany: Massive Task‐Parallelism
• Coprocessor to ARM/Intel CPU
• 25mW per core
• C/C++ programmable
![Page 11](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/11.jpg)
Programming Models
MODEL #1: TASK QUEUE MODEL
• Up to 2 GFLOPS/core
• Supports standard C/C++
• “Cloud on a chip”
[Diagram: X86/ARM/FPGA host and a 16-core array of MINI CPUs, with Task1, Task2, Task3 and Task4 placed on the array]
MODEL #2: DATA PARALLEL MODEL
• OpenCL programmable
• Easy integration of C/C++
• OpenMP/MPI roadmap
[Diagram: X86/ARM/FPGA host and a 16-core array of MINI CPUs, with a single Task1]
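The difference between the two models can be illustrated with a minimal host-side sketch. It is plain Python in which `concurrent.futures` threads stand in for the 16 mini CPUs; it is not the Epiphany SDK or OpenCL, and the names `run_task_queue`, `run_data_parallel` and `NUM_CORES` are illustrative assumptions. In the task queue model the host hands out independent, possibly different tasks; in the data parallel model one kernel is applied to slices of a single data set.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 16  # assumption: stand-in for the 16 mini CPUs in the diagrams


def run_task_queue(tasks, num_cores=NUM_CORES):
    """Task queue model: the host submits independent (function, argument) tasks.

    Idle workers pick up the next queued task; results are returned in
    submission order.
    """
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [f.result() for f in futures]


def run_data_parallel(kernel, data, num_cores=NUM_CORES):
    """Data parallel model: the same kernel runs on every worker over a slice of the data."""
    chunk = max(1, -(-len(data) // num_cores))  # ceiling division, at least 1
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        partial = list(pool.map(lambda s: [kernel(x) for x in s], slices))
    return [y for part in partial for y in part]


if __name__ == "__main__":
    # Four different tasks, as in the first diagram.
    print(run_task_queue([(sum, [1, 2, 3]), (max, [4, 9, 2]), (len, "abcd"), (abs, -7)]))
    # One task (squaring) spread over the data, as in the second diagram.
    print(run_data_parallel(lambda x: x * x, list(range(8))))
```

The thread pool only mimics the dispatch pattern; it says nothing about how work is actually scheduled on Epiphany cores.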
![Page 12](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/12.jpg)
Epiphany Silicon Devices
16-core device
Features:
• 16 RISC CPU cores
• 512KB distributed memory
• IEEE Floating Point
• 32 distributed DMA engines
• 4 off-chip serial links
• 65nm
Specifications:
• 1 GHz
• 32 GFLOPS
• 2 Watt Max Chip Power
• 512 GB/sec memory bandwidth
• 8GB/sec off chip BW
[Diagram labels: four eLink I/O blocks]
64-core device
Features:
• 64 RISC CPU cores
• 2MB distributed memory
• IEEE Floating Point
• 128 distributed DMA engines
• 4 off-chip serial links
• 28nm
Specifications:
• 800 MHz
• 100 GFLOPS
• 2 Watt Max Chip Power
• 1.6 TB/sec memory bandwidth
• 8GB/sec off chip BW
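The headline GFLOPS figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes 2 floating-point operations per core per cycle; that value is inferred from the “Up to 2 GFLOPS/core” bullet on the Programming Models slide at a 1 GHz clock and is not stated on this slide. The function names are illustrative.

```python
FLOPS_PER_CYCLE = 2  # assumption: inferred from "up to 2 GFLOPS/core" at 1 GHz


def peak_gflops(cores, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per cycle."""
    return cores * clock_ghz * flops_per_cycle


def gflops_per_watt(gflops, watts):
    """Energy efficiency in GFLOPS/W."""
    return gflops / watts


if __name__ == "__main__":
    # (label, cores, clock in GHz, GFLOPS quoted on the slide, max chip power in W)
    for name, cores, ghz, quoted, max_watts in [
        ("16-core, 65nm", 16, 1.0, 32, 2),
        ("64-core, 28nm", 64, 0.8, 100, 2),
    ]:
        print(f"{name}: computed peak {peak_gflops(cores, ghz):.1f} GFLOPS, "
              f"quoted {quoted} GFLOPS, "
              f"{gflops_per_watt(quoted, max_watts):.0f} GFLOPS/W at max chip power")
```

Under this assumption the 16-core part comes to exactly 32 GFLOPS and the 64-core part to 102.4 GFLOPS, which the slide rounds to 100. Dividing 100 GFLOPS by the 2 W maximum chip power gives the 50 GFLOPS/W figure quoted in the “World Firsts” list.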
![Page 13](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/13.jpg)
Parallella
![Page 14](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/14.jpg)
Parallella Open Computing
[Board diagram labels: ZYNQ (ARM) CPU, E64, 1GB SDRAM, RJ45, USB (x2), GPIO (x2), uSD, HDMI, IO (x2)]
• Open (and “free”):
  • Documentation
  • Board design files
  • Drivers
  • Software Tools
• Accessible (NO NDAs!)
• $100 entry point
• ~4000 devs signed up in 4 weeks
![Page 15](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/15.jpg)
How cool is this?
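| | Connection Machine 5 (1992) | Parallella Board (2012/2013) |
|---|---|---|
| Performance | 100 GFLOPS | 100 GFLOPS |
| Power | 100 kW | 5 W (20k X) |
| Cost | $10M | $200 (50k X) |

[Board diagram labels: ZYNQ (ARM) CPU, E64, 1GB SDRAM, RJ45, USB (x2), GPIO (x2), uSD, HDMI]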
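The “20k X” and “50k X” factors follow directly from the figures on the slide. A quick check, with illustrative names (`improvement`, `cm5`, `parallella`):

```python
def improvement(old, new):
    """Return how many times smaller `new` is than `old`."""
    return old / new


# Figures as quoted on the slide.
cm5 = {"gflops": 100, "watts": 100_000, "dollars": 10_000_000}
parallella = {"gflops": 100, "watts": 5, "dollars": 200}

power_factor = improvement(cm5["watts"], parallella["watts"])
cost_factor = improvement(cm5["dollars"], parallella["dollars"])

print(f"power: {power_factor:,.0f}x lower, cost: {cost_factor:,.0f}x lower")
# prints "power: 20,000x lower, cost: 50,000x lower"
```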
![Page 16](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/16.jpg)
Parallella Architecture
[Block diagram. Regions: Zynq “Hard”, Zynq FPGA, Off‐Chip. Blocks: Dual Core ARM A9; AXI BUS; MIO; MEM‐CTRL; SHARED DRAM; “O/S” DRAM; USB OTG; USB 2.0; UART; Ethernet; SD‐CARD; I2C; DAC/ADC IF; HDMI Controller; AXI‐MASTER (x2); AXI‐SLAVE; “Glue‐Logic”; “Sandbox”; Daughter Card; Epiphany (x2); eLink]
![Page 17](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/17.jpg)
Parallella Coprocessor Approach
ARM runs Linux
Epiphany accelerates key tasks
Programmable logic “makes anything possible”
Program Flow
1. ARM boots Linux. First stage boot loader from Flash, everything else from SD card.
2. “Main” application executes on ARM.
3. Application sends critical tasks to Epiphany using OpenCL or simple threads.
4. ARM/Epiphany communication through shared DRAM buffer outside virtual memory of O/S.
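Steps 2 to 4 of the program flow can be mimicked with a small sketch. Everything here is a stand-in and an assumption: a Python thread plays the Epiphany, a `bytearray` plays the shared DRAM buffer, two `threading.Event` objects play the handshake, and the buffer layout is hypothetical rather than the real Parallella memory map. No Epiphany SDK or OpenCL calls are used.

```python
import struct
import threading

# Hypothetical buffer layout (an assumption, not the real memory map):
#   bytes [0:4]  uint32 count of input floats
#   bytes [4:]   float32 inputs, followed by one float32 result slot
HEADER = struct.Struct("<I")
MAX_FLOATS = 256


def coprocessor(buf, ready, done):
    """Stand-in for the Epiphany side: wait for work, compute, write the result back."""
    ready.wait()
    (n,) = HEADER.unpack_from(buf, 0)
    values = struct.unpack_from(f"<{n}f", buf, HEADER.size)
    result = sum(v * v for v in values)  # the "critical task": sum of squares
    struct.pack_into("<f", buf, HEADER.size + 4 * n, result)
    done.set()


def host_main(values):
    """Stand-in for the "main" application running on the ARM host."""
    if len(values) > MAX_FLOATS:
        raise ValueError("too many values for the shared buffer")
    shared_dram = bytearray(HEADER.size + 4 * (MAX_FLOATS + 1))
    ready, done = threading.Event(), threading.Event()
    worker = threading.Thread(target=coprocessor, args=(shared_dram, ready, done))
    worker.start()

    # Step 3: hand the task over by writing its inputs into the shared buffer.
    HEADER.pack_into(shared_dram, 0, len(values))
    struct.pack_into(f"<{len(values)}f", shared_dram, HEADER.size, *values)
    ready.set()

    # Step 4: wait for completion, then read the result out of the same buffer.
    done.wait()
    worker.join()
    (result,) = struct.unpack_from("<f", shared_dram, HEADER.size + 4 * len(values))
    return result


if __name__ == "__main__":
    print(host_main([1.0, 2.0, 3.0]))
```

The point of the sketch is the pattern: the host and the coprocessor exchange data only through one agreed-upon buffer plus a ready/done signal, rather than through function calls.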
![Page 18](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/18.jpg)
Zedboard Introduction
![Page 19](https://reader034.vdocuments.us/reader034/viewer/2022051803/5afe7c987f8b9a8b4d8f02d4/html5/thumbnails/19.jpg)
The Future is…
Open
Heterogeneous
Massively Task-Parallel
Efficient
Grand Challenges Ahead…
• Rebuild the computer ecosystem
• Rewrite billions of lines of code
• Retrain millions of programmers
• Rewrite the education curriculum