pulp: an open hardware platform the story so far · 2018-11-09 · 1 february 2016 first release of...
TRANSCRIPT
2Integrated Systems Laboratory
1Department of Electrical, Electronicand Information Engineering
PULP: an Open Hardware PlatformThe story so far
NIPS summer school 2018 - Perugia 19.07.2018
Frank K. Gürkaynak
Davide Rossi
http://pulp-platform.org
§ 14:30 - 16:00 / Part I / Frank K. GürkaynakWhat is PULP and what are its building blocks§ Brief history of the project and its goals§ Open source HW, why and why not§ Components of a PULP based System§ RISC-V cores of PULP project§ PULP systems and implementations
§ 16:00 - 16:30 / Coffee Break§ 16:30 - 18:00 / Part II / Davide Rossi
Secrets of PULP based systems
§ Tomorrow / Hands on sessions with PULP systems
What will we cover today
§ Project started in 2013 by Luca Benini§ A collaboration between University of Bologna and ETH Zürich§ Key goal is
§ We were able to start with a clean slate, no need to remain compatible to legacy systems.
Parallel Ultra Low Power (PULP)
How to get the most BANGfor the ENERGY consumed in a computing system
Energy efficiency is the key driver in the PULP project
Maximum Energy Efficiency @ 0.5V + 0.5V FBBà 60GOPS/W
Wide operating range
Low leakage(< 2%)
1.8GOPS
~10mW @ 100 MHz, 0.75V
Mea
sure
men
t res
ults
from
PU
LPv1
a fo
ur c
ore
clus
ter i
n ST
FD
SOI
28nm
tech
nolo
gy ru
nnin
g a
para
llel m
atrix
mul
tiplic
atio
ns
§ We concentrate on programmable systems§ Can not have custom hardware, need to be flexible§ We need to make the system accessible to application developers
§ Scalable over a wide operating range§ Work just as well when processing 0.001 GOPS as 1000 GOPS
§ Don’t waste idle energy§ Eliminate sources where cores and systems are idly wasting energy
§ Make good use of options provided by the technology§ Body biasing, power gating, library, memory selection etc..
§ Take advantage of heterogeneous acceleration§ Allow an architecture where accelerators can be added efficienty
To reach our goal we work on the following
§ Luca Benini holds a dual appointment in ETH Zürich and Bologna§ Total team about 50-60 members (60% in Zürich, 40% in Bologna)
§ 1 Professor§ 3 Assistant Professors and 2 Senior Scientists§ 8 Post Doctoral researchers§ 30+ Ph.D. students§ 6 Technical staff and 2 embedded staff in industrial partners
§ Most of this team works on projects that are related to PULP§ Core group of 15 designers that concentrate on PULP development§ Many others contribute to application development, software support
§ In ETH Zürich, the Microelectronics Design Center supports the project§ Permanent staff of 4§ Maintains design flows for ASIC and FPGA
PULP is maintained by a large group
We have designed over 20 PULP based ASICs already
www.pulp-platform.org
28nm 28nm 28nm 65nm 65nm 65nm 65nm 65nm 65nm65nm
130nm 180nm28nm65nm180nm40nm 65nm130nm130nm
65nm
180nm
http://asic.ethz.ch
22nm65nm65nm130nm
We firmly believe in Open Source movement
http://pulp-platform.org§ First launched in February 2016 (github)
§ HDL code, testbenches, testcases, SW, scripts, debug support.§ We use solderpad (v0.51) license (Apache like license adapted for HW)
http://www.solderpad.org/licenses/
§ The way we design ICs has changed, big part is now infrastructure§ Processors, peripherals, memory subsystems are now considered infrastructure§ Very few (if any) groups design complete IC from scratch§ High quality building blocks (IP) needed
§ We need an easy and fast way to collaborate with people§ Currently complicated agreements have to be made between all partners§ In many cases, too difficult for academia and SMEs
§ Hardware is a critical for security, we need to ensure it is secure§ Being able to see what is really inside will improve security§ Having a way to design open HW, will not prevent people from keeping secrets.
Open Hardware is a necessity, not an ideological crusade
§ The differences complicate things§ Not a uniform way to discuss the issue, depends on the design flow§ This talk about ASIC flow (the most complex one).
§ Different actors, different revenue streams§ Not every actor in the current design flow, earns money the same way§ EDA companies are long complaining that they are out of the loop
Their income is not based on the amount of ICs produced.§ Not everybody will be happy if open hardware will be more common§ Important to understand the relationships
§ Interestingly most intermediate file formats are open, readable§ Verilog, Liberty, SPICE, EDIF, CIF, LEF, DEF, OA (open access)
Frank. K. Gürkaynak - Open Source Hardware 10
Hardware design flows for PCB/FPGA/ASIC are different
TechnologyProvider
IP Providers
EDA Companies
ASIC Design Flow and Main Actors
Integrated Systems Laboratory 11
HDLNetlist GDSII ChipLogic
Synth. P&R Foundry
PDK
OtherIP
Simu-lator
Sim.Models
High-Level Code
HLS
Std.Cells
§ Only beginning of the pipeline can be open HW§ Later stages contain business interest of various actors
FPGAVendor
FPGA Design Flow and Main Actors
Integrated Systems Laboratory 12
HDLNetlist BitfileLogic
Synth. P&R FPGASimu-lator
Sim.Models
High-Level Code
HLS
FPGAResour.
FPGA VENDOR
§ FPGA vendors want to sell the FPGAs§ Little commercial interested in intermediate files
§ Most vendors allow bitfiles to be published -> will sell more FPGAs
§ Similar to Apache/BSD, adapted specifically for Hardware§ Allows you to:
§ Use § Modify§ Make products and sell them without restrictions.
§ Note the difference to GPL§ Systems that include PULP do not have to be open source (Copyright not Copyleft)§ They can be released commercially§ LGPL may not work as you think for HW
We provide PULP with SOLDER Pad License
http://www.solderpad.org/licenses/
§ The following are ok:§ RTL code written in HDL, or a high-level language for HLS flow§ Testbenches in HDL and associated makefiles, golden models
§ How about support scripts for different tools?§ Synthesis scripts, tool startup files, configurations
§ And these are currently no go :§ Netlists mapped to standard cell libraries§ Placement information (DEF) § Actual Physical Layout (GDSII)
At the moment, open HW can (mostly/only) be HDL code
OPEN
NOT YET
SWHW
Where are we now?
Frank. K. Gürkaynak - Open Source Hardware
15
Application
Operating System
Instruction Set Arch
Microarchitecture
Components (RAM, ALU)
Gates
Transistors
Emacs
Linux
RISC-V
PULP
PULP Open-Source Releases and External Contributions
1 February 2016First release of PULPino, our single-core microcontroller
2
3
May 2016Toolchain and compiler for our RISC-V implementation (RI5CY), DSP extensions
August 2017PULPino updates, new cores Zero-riscyand Micro-riscy, FPU, toolchain updates
February 2018PULPissimo, ARIANE, PULP
1
2
4
September 2017Porting of ARM CMSIS to PULPinohttps://github.com/misaleh/CMSIS-DSP-PULPino
June 2017Porting of Verilator and BEEBS benchmarks to PULPinohttps://github.com/embecosm/ri5cy
December 2017STING: Open-Source Verification Environment for PULPinohttp://valtrix.in/programming/running-sting-on-pulpino
Releases Community Contributions
November 2017Numerous Bug fixes to RI5CY in PULPinohttps://github.com/pulp-platform/riscv
3
March 2018Open PULP
4
5
We try to leverage open source as much as possible
Compiler Infrastructure
Processor & Hardware IPs
Virtualization Layer
ProgrammingModel
Low-Power Silicon Technology
§ Many companies (we know of) are actively using PULP§ They value that it is silicon proven§ They like that it uses a permissive open source license
Silicon and Open Hardware fuel PULP success
§ GreenWaves Technologies§ Dolphin§ IQ Analog (14nm chips)§ Embecosm§ lowRISC§ Mentor Graphics§ Cadence Design Systems§ ST Microelectronics (IT,F)§ Micron§ Google§ IBM§ NXP
§ Stanford§ Cambridge§ UCLA§ CEA/LETI§ EPFL§ National Chia Tung
University§ Politecnico di Milano§ Politecnico di Torino§ Universita Roma I§ Instituto Superior Tecnico –
U. de Lisboa§ Fondazione Bruno Kessler
§ Zagreb FER§ Universita di Genova§ Istanbul Technical U. § RWTH Aachen§ Lund§ USI – Lugano§ Bar-Ilan§ TU-Kaiserslautern§ TU-Graz§ UC San Diego§ CSEM§ IBM Research
§ SIAE Microelectronica§ Advanced Circuit
Pursuit§ Shanghai Xidian
Technology§ SCS Zurich§ IMT technologies§ Microsemi§ Arduino§ RacyICs
Companies that are using/evaluating PULP Research Centers/Universities using PULP
Com
pani
es w
ith a
nnou
nced
pro
duct
s, b
usin
ess
Com
pani
es th
at u
se P
ULP
inte
rnal
ly o
r for
trai
ning
Com
pani
es e
xplo
ring
oppo
rtun
ities
Platforms
InterconnectPeripheralsRISC-V Cores
The PULP family explained
Accelerators
RISC-V Cores
We have developed several optimized RISC-V cores
RI5CY
32b
Microriscy32b
Zeroriscy32b
Ariane
64b
§ Our research was not developing processors…§ … but we needed good processors for systems we build for research§ Initially (2013) our options were
§ Build our own (support for SW and tools)§ Use a commercial processor (licensing, collaboration issues)§ Use what is openly available (OpenRISC,.. )
§ We started with OpenRISC§ First chips until mid-2016 were all using OpenRISC cores§ We spent time improving the microarchitecture
§ Moved to RISC-V later§ Larger community, more momentum§ Transition was relatively simple (new decoder)
How we started with open source processors
§ Started by UC-Berkeley in 2010§ Open Standard governed by RISC-V
foundation§ ETHZ is a founding member of the
foundation§ Necessary for the continuity§ Extensions are still being developed
§ Defines 32, 64 and 128 bit ISA§ No implementation, just the ISA§ Different RISC-V implementations (both
open and close source) are available
§ At IIS we specialize in efficient implementations of RISC-V cores
Spec separated into “extensions”
RISC-V Instruction Set Architecture
I Integer instructions
E Reduced number of registers
M Multiplication and Division
A Atomic instructions
F Single-Precision Floating-Point
D Double-Precision Floating-Point
C Compressed Instructions
X Non Standard Extensions
Our RISC-V cores
§ Zero-riscy§ RV32-ICM
§ Micro-riscy§ RV32-CE
§ Ariane§ RV64-ICM
§ RI5CY§ RV32-ICMX§ SIMD§ HW loops§ Bit
manipulation§ Fixed point
§ RI5CY+FPU§ RV32-ICMFX
Low Cost Core
Linux capable Core
Core with DSP enhancements
Floating-point capable Core
32 bit 64 bit
ARM Cortex-M0+ ARM Cortex-M4 ARM Cortex-A55ARM Cortex-M4F
§ 4-stage pipeline, optimized for energy efficiency§ 40 kGE, 30 logic levels, Coremark/MHZ 3.19§ Includes various extensions (X) to RISC-V for DSP applications
RI5CY – Our workhorse 32-bit core
RI5CY – ISA Extensions improve performance
for (i = 0; i < 100; i++) d[i] = a[i] + b[i];
mv x5, 0mv x4, 100Lstart:
lb x2, 0(x10)lb x3, 0(x11)addi x10,x10, 1addi x11,x11, 1add x2, x3, x2sb x2, 0(x12)addi x4, x4, -1addi x12,x12, 1
bne x4, x5, Lstart
Baseline
11 cycles/output
mv x5, 0mv x4, 100Lstart:
lb x2, 0(x10!)lb x3, 0(x11!)addi x4, x4, -1add x2, x3, x2sb x2, 0(x12!)
bne x4, x5, Lstart
8 cycles/output
Auto-incr load/store
lp.setupi 100, Lendlb x2, 0(x10!)lb x3, 0(x11!)add x2, x3, x2
Lend: sb x2, 0(x12!)
HW Loop
5 cycles/output
lp.setupi 25, Lendlw x2, 0(x10!)lw x3, 0(x11!)pv.add.b x2, x3, x2
Lend: sw x2, 0(x12!)
Packed-SIMD
1,25 cycles/output
§ RI5CY was built for energy efficiency for DSP applications§ Ideally all parts of the core are running all the time doing something useful§ This does not always mean it is low-power§ The core is rather large (> 40 kGE without FPU)
§ People asked us about a simple and small core§ Not all processor cores are used for DSP applications§ The DSP extensions are mostly idle for control applications§ Zero-Riscy was designed to as a simple and efficient core.
§ Some people wanted the smallest possible RISC-V core§ It is possible to further reduce area by using 16 registers instead of 32 (E)§ Also the multiplier can be removed saving a bit more§ Micro-Riscy is a parametrized variation of Zero-Riscy with minimal area
Why we designed newer 32-bit cores
§ Only 2-stage pipeline, simplified register file§ Zero-Riscy (RV32-ICM), 19kGE, 2.44 Coremark/MHz§ Micro-Riscy (RV32-EC), 12kGE, 0.91 Coremark/MHz§ Used as SoC level controller in newer PULP systems
Zero/Micro-riscy, small area core for control applications
Different 32-bit cores with different area requirements
RI5CY Zero-riscy Micro-riscy
Different cores for different types of workload
RI5CY fo
r DSP
Zero-ris
cygood
trade-o
ff
Micro-ris
cyfor
contro
l
Normalized Static Power Consumption Dynamic Power Consumption
§ For the first 4 years of the PULP project we used only 32bit cores§ Luca once famously said “We will never build a 64bit core”.§ Most IoT applications work well with 32bit cores.§ A typical 64bit core is much more than 2x the size of a 32bit core.
§ But times change:§ Using a 64bit Linux capable core allows you to share the same address space as
main stream processors.§ We are involved in several projects where we (are planning to) use this capability
§ There is a lot of interest in the security community for working on a contemporary open source 64bit core.
§ Open research questions on how to build systems with multiple cores.
Finally the step into 64-bit cores
ARIANE: Our Linux Capable 64-bit core
§ Tuned for high frequency, 6 stage pipeline, integrated cache§ In order issue, out-of-order write-back, in-order-commit§ Supports privilege spec 1.11, M, S and U modes§ Hardware Page Table Walker
§ Implemented in GF22nm (Poseidon), and UMC65 (Scarabaeus)§ In 22nm: 910 MHz worst case conditions
(SSG, 125/-40C, 0.72V)§ 8-way 32kByte Data cache and
4-way 32kByte Instruction Cache § Core area: 175 kGE
Main properties of Ariane
7%8% 3%
21%
44%
9%
8%AreaPC GenIFIDIssueExReg FileCSR
§ Full FPGA implementation§ Xilinx Vertex 7 – VC707§ Core: 50 – 100 MHz§ Core: 15 kLUTs§ 1 GB DDR3
§ Mapping to simpler boards underway
Ariane mapped to FPGA boots Linux
Accelerators
InterconnectPeripheralsRISC-V Cores
Only processing cores are not enough, we need more
RI5CY
32b
Microriscy32b
Zeroriscy32b
Ariane
64bAXI4 – InterconnectDMA GPIO
APB – Peripheral BusI2SUART
Logarithmic interconnectSPIJTAG
Neurostream(ML)
HWCrypt(crypto)
PULPO(1st order opt)
HWCE(convolution)
Platforms
Accelerators
InterconnectPeripheralsRISC-V Cores
The pulp-platforms put everything together
RI5CY
32b
Microriscy32b
Zeroriscy32b
Ariane
64bAXI4 – InterconnectDMA GPIO
APB – Peripheral BusI2SUART
Logarithmic interconnectSPIJTAG
Neurostream(ML)
HWCrypt(crypto)
PULPO(1st order opt)
HWCE(convolution)
R5
MI
O
interconnect
A
Single Core• PULPino• PULPissimo
§ Simple design§ Meant as a quick release
§ Separate Data and Instruction memory§ Makes it easy in HW§ Not meant as a Harvard arch.
§ Can be configured to work with all our 32bit cores§ RI5CY, Zero/Micro-Riscy
§ Peripherals copied from its larger brothers§ Any AXI and APB peripherals
could be used
PULPino our first single core platform
PULPinoDataMem
RISC-Vcore
InstMem
I$
APB
-inte
rcon
nect
GPIO
AXI
-in
terc
onne
ct
BusAdapt
SPI M
UART
I2C
UART
SPI S
BootROM
§ Shared memory§ Unified Data/Instruction Memory§ Uses the multi-core infrastructure
§ Support for Accelerators § Direct shared memory access§ Programmed through APB bus§ Number of TCDM access ports
determines max. throughput § uDMA for I/O subsystem
§ Can copy data directly from I/O to memory without involving the core
§ Used as a fabric controller in larger PULP systems
PULPissimo the improved single core platform
RI5CYIbuf/ I$
instr data
Event Unit
Tightly Coupled Data Memory Interconnect
MemBank
MemBank
MemBank
MemBank
MemBank
MemBank
uDMA
APB / Peripheral Interconnect
Clock / ResetGenerator
DebugUnit
FLLs
I/Ointfs
UARTSPII2SI2C
SDIOCPI
JTAG
HardwareAccelerator Ext
Coreplex
§ Still work in progress§ Current version is very
simple§ Useful for in-house testing§ A more advanced version
will likely be developed soon.
Kerbin the single core support structure for Ariane
Kerbin
APB
-inte
rcon
nect
GPIO
AXI
-in
terc
onne
ct
BusAdapt
SPI M
UART
I2C
UART
SPI S
BootROM
Aria
neI$
D$
Timer
AXI2Per
Debug
FLL
Platforms
Accelerators
InterconnectPeripheralsRISC-V Cores
Finally for HPC applications we have multi-cluster systems
RI5CY
32b
Microriscy32b
Zeroriscy32b
Ariane
64bAXI4 – InterconnectDMA GPIO
APB – Peripheral BusI2SUART
Logarithmic interconnectSPIJTAG
M
I
Ocluster
interconnect
A R5R5R5
M MMMin
terc
onne
ct
Neurostream(ML)
HWCrypt(crypto)
PULPO(1st order opt)
HWCE(convolution)
R5
MI
O
inte
rcon
nect
A
Single Core• PULPino• PULPissimo
Multi-core• Fulmine• Mr. Wolf
R5
§ Multiple RISC-V cores§ Individual cores can be started/stopped with little overhead§ DSP extensions in cores
§ Multi-banked scratchpad memory (TCDM)§ Not a cache, there is no L1 data cache in our systems
§ Logarithmic Interconnect allowing all cores to access all banks§ Cores will be stalled during contention, includes arbitration
§ DMA engine to copy data to and from TCDM§ Data in TCDM managed by software§ Multiple channels, allows pipelined operation
§ Hardware accelerators with direct access to TCDM§ No data copies necessary between cores and accelerators.
The main components of a PULP cluster
CLUSTER
PULP cluster contains multiple RISC-V cores
RISC-Vcore
RISC-Vcore
RISC-Vcore
RISC-Vcore
CLUSTER
Tightly Coupled Data Memory
All cores can access all memory banks in the cluster
interconnect
RISC-Vcore
Mem Mem MemMem
RISC-Vcore
RISC-Vcore
RISC-Vcore
Mem Mem MemMem
Mem
Mem
CLUSTER
Tightly Coupled Data Memory
Data is copied from a higher level through DMA
interconnect
RISC-Vcore
MemDMA Mem MemMem
RISC-Vcore
RISC-Vcore
RISC-Vcore
Mem Mem MemMem
Mem
Memin
terc
onne
ct
L2Mem
CLUSTER
Tightly Coupled Data Memory
There is a (shared) instruction cache that fetches from L2
interconnect
RISC-Vcore
MemDMA Mem MemMem
RISC-Vcore
RISC-Vcore
RISC-Vcore
Mem Mem MemMem
I$
Mem
Memin
terc
onne
ct
L2Mem
I$ I$ I$
CLUSTER
Tightly Coupled Data Memory
Hardware Accelerators can be added to the cluster
interconnect
RISC-Vcore
MemDMA Mem MemMem
RISC-Vcore
RISC-Vcore
RISC-Vcore
Mem Mem MemMem
I$
HWACCEL
Mem
Memin
terc
onne
ct
L2Mem
I$ I$ I$
CLUSTER
Tightly Coupled Data Memory
Event unit to manage resources (fast sleep/wakeup)
interconnect
RISC-Vcore
MemDMA Mem MemMem
RISC-Vcore
RISC-Vcore
RISC-Vcore
Mem Mem MemMem
I$
HWACCEL
Mem
Memin
terc
onne
ct
L2Mem
I$ I$ I$
EventUnit
PULPissimo CLUSTER
Tightly Coupled Data Memory
An additional microcontroller system (PULPissimo) for I/O
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
PULPissimo CLUSTER
Tightly Coupled Data Memory
How do we work: Initiate a DMA transfer
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
PULPissimo CLUSTER
Tightly Coupled Data Memory
Data copied from L2 into TCDM
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
PULPissimo CLUSTER
Tightly Coupled Data Memory
Once data is transferred, event unit notifies cores/accel
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
PULPissimo CLUSTER
Tightly Coupled Data Memory
Cores can work on the data transferred
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
PULPissimo CLUSTER
Tightly Coupled Data Memory
Accelerators can work on the same data
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
PULPissimo CLUSTER
Tightly Coupled Data Memory
Once our work is done, DMA copies data back
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
PULPissimo CLUSTER
Tightly Coupled Data Memory
During normal operation all of these occur concurrently
interconnect
RISC-V
core
MemDMA Mem MemMem
RISC-V
core
RISC-V
core
RISC-V
core
Mem Mem MemMem
I$
HW
ACCEL
Mem
Memin
terc
on
ne
ct
L2
Mem
Mem
Cont
I/O
RISC-V
core
I$ I$ I$
Ext.
Mem
Event
Unit
Platforms
Accelerators
InterconnectPeripheralsRISC-V Cores
Finally for HPC applications we have multi-cluster systems
RI5CY
32b
Microriscy32b
Zeroriscy32b
Ariane
64bAXI4 – InterconnectDMA GPIO
APB – Peripheral BusI2SUART
Logarithmic interconnectSPIJTAG
M
I
Ocluster
interconnect
A R5R5R5
M MMMin
terc
onne
ct
cluster
interconnect
R5 R5R5R5
M MMM
cluster
interconnect
R5 R5R5R5
M MMM
cluster
interconnect
A R5R5R5
M MMMM
I
O inte
rcon
nect
Neurostream(ML)
HWCrypt(crypto)
PULPO(1st order opt)
HWCE(convolution)
R5
MI
O
inte
rcon
nect
A
Single Core• PULPino• PULPissimo
Multi-core• Fulmine• Mr. Wolf
Multi-cluster• Hero
IOT HPC
R5R5
We will release HERO soon
§ Multi-cluster PULP system on FPGA connected to ARM cores§ Will be made available in 3Q2018 (soon).
HERO is modifiable and expandable
§ Up to 8 clusters, each with 8 cores§ All components are open source and written in System Verilog§ Standard interfaces (mostly AXI)§ New components can easily be added to the memory map
TLX-4 0 0
So
C B
usMa ilb ox
L2Me m
Clu s t e r 0L1 Me m
Clu s t e r 1L1 Me m
Clu s t e r L-1L1 Me m
RAB
X-Ba r In t e rcon n e ct
Clu
ster
Bu
s
DMA
L1 S
PMB
ank M
-1
RISC-VPE N-1
Sh a re d L1 I$
L1 S
PMB
ank 0
L1 S
PMB
ank 1
L1 S
PMB
ank 2
RISC-VPE 1
RISC-VPE 0Pe
rip
he
ral B
us
TRYXPe r2 AXI
AXI2 Pe r
Tim e r
Eve n t Un it
TRYX TRYX
Sh a re d APU
DEMUX DEMUX DEMUX
# of clusters� {1, 2, 4, 8}
# of PEs� {2, 4, 8}
FPU� { private, shared (APU), of f}
integer DSP unit� { private, shared (APU)}
L1 SPM size and # of banks
I$ design, size, # of banks
L2 SPM size
system-levelinterconnecttopology
RAB L1 TLB sizeand L2 TLB size,
associativity,and # of banks
§ All are 28 FDSOI technology, RVT, LVT and RVT flavor§ Uses OpenRISC cores§ Chips designed in collaboration with STM, EPFL, CEA/LETI§ PULPv3 has ABB control
Brief illustrated history of selected ASICs
PULP PULPv2 PULPv3
§ First multi-core systems that were designed to work on development boards. Each have several peripherals (SPI, I2C, GPIO)
§ Mia Wallace and Fulmine (UMC65) use OpenRISC cores§ Honey Bunny (GF28 SLP) uses RISC-V cores§ All chips also have our own FLL designs.
The first system chips, meant for designing boards
Mia Wallace
HoneyBunny
Fulmine
§ Designed in collaboration with the Analog group of Prof. Huang§ All chips with SMIC130 (because of analog IPs)§ First three with OpenRISC, VivoSoC3 with RISC-V
Combining PULP with analog front-end for Biomedical apps
VivoSoC VivoSoCv2.0
VivoSoCv2.001
VivoSoCv3
Mr. Wolf our latest system chip will solve problems
§ TSMC 40LP - 9mm2
§ 64 pin QFN package§ Cluster with
§ 8x RI5CY§ 2x shared FPUs § 64 kByte TCDM
§ SoC domain with§ micro-riscy core to control
power modes§ DC-DC converter (Dolphin)
to regulate power§ 512 kBytes L2
Mr. Wolf will (hopefully) solve problems
§ TSMC 40LP - 9mm2
§ 64 pin QFN package§ Cluster with
§ 8x RI5CY§ 2x shared FPUs § 64 kByte TCDM
§ SoC domain with§ micro-riscy core to control
power modes§ DC-DC converter (Dolphin)
to regulate power§ 512 kBytes L2
PULP in the IoT domain: Latest version Mr. Wolf
GAP8: commercial big brother of PULP from Greenwaves
L2
SPIM x 2
SoCTCDMBANK
#0
TCDMBANK
#1
TCDMBANK #M-1
SHARED I$
RISCV#0
RISCV#1
RISCV#7
BRIDGE
CLUSTER
TCDM INTERCONNECT
...
...CL
UST
ER B
US
PERI
PHER
AL
INTE
RCO
NN
ECT
DC FIFO
DC FIFOID R.
AXI2MEM
SOC
BUS
ROM
SoC Ctrl
JTAG2AXI
JTAG
FLL Ctr
UART
GPIO
Top
DU DU DU
PAD
MU
X
JTAGTAP
I2C x 2
I2S x 2
SOC
APB
TMC
uDM
A
SPI S
FLL x 2
DMA
CLUSTERPERIPHS
LVDS I/Q
AXI TO MEM+ MPU FILTER
FC SUBSYSTEM
L2 REFILLMASTER
UDMASLAVE
ROM REFILL MASTER
APB SLAVE
AXIMASTER
PWM x 4
10b // (CPI)
PP Im
PP Au
DC/DCRTC
PMU
BOR BOR TRC
Q-SPIM
AXI TO APB + MPU FILTER
Hyper-Bus
Serial I/Q
AQFN84 package
HWCE
§ Taped out Jan 2018§ Arrived back June 2018§ Quentin
§ PULPissimo implementation§ RI5CY+ FPU + HWCE§ 512 kByte RAM
§ Kerbin§ Ariane core + Caches§ Uses the memory infrastructure
of Quentin§ Hyperdrive
§ Binary CNN Accelerator
Poseidon: Our latest chip in GF 22FDX3 mm
QuentinPULPissimo
Hyperdrive
KerbinAriane
§ For the workshop:§ Davide will give details on how we do thing in PULP next§ Tomorrow there will be hands on sessions for GAP8 and Multicluster
PULP§ For us:
§ Plenty of chips in pipeline (Kosmodrom, Vega, Villalobos)§ Multi-core Ariane§ Atomic instructions, 64bit FPU, Vector Unit
§ For you:§ There is plenty to be done§ Let us know if you want to/can contribute
What is next
The image part with relationship ID rId9 was not found in the file.
PULP @ ETH ZürichQUESTIONS?
@pulp_platformhttp://pulp-platform.org