raven: a 28nm risc-v vector processor with integrated...
TRANSCRIPT
![Page 1: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/1.jpg)
1
Raven: A 28nm RISC-V Vector Processor with
Integrated Switched-Capacitor DC-DC
Converters and Adaptive Clocking
Yunsup Lee, Brian Zimmer, Andrew Waterman,
Alberto Puggelli, Jaehwa Kwak, Ruzica Jevtic, Ben Keller,
Stevo Bailey, Milovan Blagojevic, Pi-Feng Chiu,
Henry Cook, Rimas Avizienis, Brian Richards,
Elad Alon, Borivoje Nikolic, Krste Asanovic
University of California, Berkeley
![Page 2: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/2.jpg)
Motivation
Energy efficiency constrains everything - SoCs are designed with an increasing number of
voltage domains for better power management
- Dynamic voltage and
frequency scaling (DVFS)
maximizes energy
efficiency while
meeting performance
constraints
2
• Off-chip conversion
Few voltage domains
Costly off-chip components
Slow mode transitions
✖ ✖ ✖
• On-chip conversion
Many domains
No off-chip components
Fast transitions
✔ ✔ ✔
![Page 3: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/3.jpg)
3
iPhone 6
Front Side Back Side
Credits: iFixit
Voltage
Regulators
Primary Power
Management IC
Secondary Power
Management IC
![Page 4: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/4.jpg)
Raven Project Goals
Build a microprocessor that is:
4
• Fine-grained DVFS
• High conversion efficiency Energy-efficient
• Entirely on-chip converter
• Low area overhead Low-cost
![Page 5: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/5.jpg)
Talk Outline
Motivation/Raven Project Goals
On-Chip Switched-Capacitor DC-DC Converters
Raven3 Chip Architecture
Raven3 Implementation
Raven3 Evaluation
RISC-V Chip Building at UC Berkeley
Summary
For more details on the Switched-Capacitor DC-DC Converters, please take a
look at HC23 tutorial “Fully Integrated Switched-Capacitor DC-DC Conversion”
by Elad Alon
5
![Page 6: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/6.jpg)
Switched-Capacitor (SC) DC-DC Converters
6
![Page 7: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/7.jpg)
SC DC-DC Converters Cont’d
7
Partition capacitor
and switches into
many “unit cells” for
better modularity
Keeps the custom
design modular
Makes it easier to
floorplan SC DC-DC
converter
![Page 8: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/8.jpg)
Traditional Approach: Interleaved Switching
8
Switch one unit cell at a time to smooth out
voltage ripple Pros
- Voltage ripple at the output is
suppressed
- Great for digital designs with
fixed frequency clocks
Cons - Each unit cell charge shares
with other unit cells
- Causes an efficiency loss
beyond typical switching
losses
![Page 9: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/9.jpg)
Raven’s Approach: Simultaneous Switching
Switch all unit cells simultaneously when Vout
reaches a lower bound Vref
Add an adaptive clock generator so that clock
frequency tracks the voltage ripple 9
Pros - Simplifies the design
- No charge sharing losses
- Better energy efficiency
Cons - Need to deal with big ripple
on voltage output
![Page 10: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/10.jpg)
Self-Adjusting Clock Generator
Replica tracks critical path with voltage ripple
Controller quantizes clock edge
10
Block Diagram (Tunable Replica Path)
DLL2Ghz input,
16 phases
ControllerSelect closest edge
Tunable
Replica
Path
Vout .........
...
...
......
Adaptive system clock
Vout
![Page 11: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/11.jpg)
Reconfigurable SC Converters for DVFS
11
Simple lower bound control
1V Mode 1.8V 1/2 Mode 1V 2/3 Mode 1V 1/2 Mode(~1V) (~0.9V) (~0.67V) (~0.5V)
Unit Cell Topologies
Lower-bound Controller
Operational Waveform
1V
1V
1V
1.8V
1V
1.8V
1V
1V
1V
1V
1V
1V
VoutVout
VoutVout
+FSM
2GHz comparator
Vref
Vout toggle
toggle
Vref
Vout
Φ1Φ1 Φ1 Φ1
Φ2Φ2Φ2
Φ2
Φ2
Φ2
Φ2 Φ1
Φ1
Φ1
Φ1
off
off
Φ2
Φ2 Φ1
Φ1
Φ1
Φ1
off
Φ2
off
off
Φ1
Φ1
Φ1
Φ1
off
off
Φ2
Φ2
Φ2
Φ2
off
off
off
off
Efficiency Improvements
Interleaving Rippling (this approach)
Videal VoutVlow
VhighVout
VidealVlow
Vhigh
clk
Adaptive clock negates losses by increasing the average clock frequency
caused by charge sharing with interleaved converters
clk
No losses∆ V loss
∆ V
Vlow
Vin
Vlow
VinVhigh
Vlow
VinVin
Vhigh
toggletoggle
Operational Waveform
z
bypass
serial
![Page 12: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/12.jpg)
Raven3 Chip Architecture
12 12
![Page 13: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/13.jpg)
RISC-V is a new, open, and completely free
general-purpose instruction set architecture (ISA)
developed at UC Berkeley starting in 2010
RISC-V is simple and a clean-slate design - The base (enough to boot Linux and run modern
software stack) has less than 50 instructions
RISC-V is modular and has been designed to be
flexible and extensible - Better integrate accelerators with host cores
RISC-V software stack - GNU tools (GCC/Binutils/glibc/newlib/GDB),
LLVM/Clang, Linux, Yocto (OpenJDK, Python, Scala)
Checkout http://riscv.org for more details 13
![Page 14: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/14.jpg)
Rocket Scalar Core
14
PC IF ID EX MEM
To Vector
Accelerator
64-bit 5-stage single-issue in-order pipeline
Design minimizes impact of long clock-to-output delays of
SRAMs
64-entry BTB, 256-entry BHT, 2-entry RAS
MMU supports page-based virtual memory
IEEE 754-2008-compliant FPU
Supports SP, DP Fused-Multiply-Add (FMA) with HW
support for subnormals
WB
![Page 15: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/15.jpg)
ARM Cortex-A5 vs. RISC-V Rocket
Category ARM Cortex-A5 RISC-V Rocket
ISA 32-bit ARM v7 64-bit RISC-V v2
Architecture Single-Issue In-Order Single-Issue In-Order 5-stage
Performance 1.57 DMIPS/MHz 1.72 DMIPS/MHz
Process TSMC 40GPLUS TSMC 40GPLUS
Area w/o Caches 0.27 mm2 0.14 mm2
Area with 16K Caches 0.53 mm2 0.39 mm2
Area Efficiency 2.96 DMIPS/MHz/mm2 4.41 DMIPS/MHz/mm2
Frequency >1GHz >1GHz
Dynamic Power <0.08 mW/MHz 0.034 mW/MHz
- PPA reporting conditions - 85% utilization, use Dhrystone for benchmark, frequency/power at TT
0.9V 25C, all regular Vt transistors
- 10% higher in DMIPS/MHz, 49% more area-efficient 15
![Page 16: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/16.jpg)
Hwacha Vector Accelerator
16
![Page 17: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/17.jpg)
17
1.3mm X 1.8mm Raven3 Chip
Fabricated in ST 28nm FDSOI
(Fully Depleted Silicon-on-Insulator)
Chip area: 2.34 mm2
Single-Core RISC-V Processor with a Vector Accelerator
Core area: 1.19mm2
Integrated Switched-Capacitor DC-DC Converter
Converter area: 0.19mm2
Area overhead: 16%
Adaptive Clock Generator
971 MHz
34 GFLOPS/W w/o Converter
26 GFLOPS/W w/ Converter
122 pins total
60 for signals
62 for power/ground
![Page 18: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/18.jpg)
Raven3 Test setup
Conf. 1
Conf. 2
Conf. 3
18
Daughterboard: Chip-on-board
Motherboard: Programmable
supplies
Host: Zedboard (ZYNQ FPGA,
network accessible)
![Page 19: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/19.jpg)
Raven3 Voltage Measurements
19
![Page 20: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/20.jpg)
Measuring Efficiency
20
Traditional efficiency measurement is not
appropriate for our design 1. Difficult to measure on-chip current and voltage
2. Cannot measure adaptive clock overhead
![Page 21: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/21.jpg)
Raven3 Achieves >80% System Efficiency
System efficiency = Ratio of energy required to
finish same workload in same time with the
converter versus without the converter
21
![Page 22: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/22.jpg)
Raven3 Achieves 34 Double-Precision GFLOPS/W
Double-precision matrix-multiplication running on
vector unit used as energy-efficiency metric - 34 GFLOPS/W without DC-DC converter
- 26 GFLOPS/W with DC-DC converter
- Note, GFLOPS/W is inverse of nJ/FLOP
22
![Page 23: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/23.jpg)
Five 28nm & Six 45nm RISC-V Chips Taped Out So Far
23
Raven-1 Raven-2
Raven-3
Raven-3.5
EOS14
EOS16
EOS18
EOS20
EOS22 EOS24
2011 2012 2013 2014 2015
May Apr Aug Feb Jul Sep Mar Nov Mar
Raven: ST 28nm FDSOI
SWERVE: TSMC 28nm
EOS: IBM 45nm SOI
SWERVE
Apr
![Page 24: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/24.jpg)
Agile Hardware Development Methodology
24
C++
FPGA
ASIC Flow
Tape-in
Tape-out
Big Chip
Tape-out
“Tape-in”: Designs that
could be taped out - LVS clean & DRC sane
- Pass RTL/gate-level
simulation and timing
Fully scripted ASIC flow - RTL change to chip <1d
- Get early feedback
- Automatic nightly
regressions
- Identify source of subtle
bugs
Check longer programs on
FPGA
Iterate quickly on RTL with
using the
C++ emulator (Checkout
chisel.eecs.berkeley.edu
for more details)
![Page 25: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/25.jpg)
Summary
28nm Raven3 processor features: - Fine-grained, wide-range DVFS (20ns, 0.45-1V)
- Entirely on-chip voltage conversion
- High system efficiency (>80%)
- Extreme energy efficiency (34/26 GFLOPS/W)
Key enablers - RISC-V, a simple, yet powerful ISA (free and open!)
- Agile development, build hardware like software
- Chisel, lets designers do more things with less effort
RISC-V core generators and software tools open-
sourced at http://riscv.org
Energy-efficient, low-cost on-chip DC/DC
converters are buildable!
25
![Page 26: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/26.jpg)
Acknowledgements
Funding: BWRC members, ASPIRE members,
DARPA PERFECT Award Number HR0011-12-2-
0016, Intel ARO, AMD, GRC, Marie Curie FP7,
NSF GRFP, NVIDIA Fellowship
Fabrication donation by STMicroelectronics
Tom Burd, James Dunn, Olivier Thomas, and
Andrei Vladimirescu
26
![Page 27: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/27.jpg)
27
SUPPLEMENTAL
SLIDES
![Page 28: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/28.jpg)
Test Chip Setup
28
![Page 29: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/29.jpg)
FPGA Setup
29
![Page 30: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/30.jpg)
“Rocket Chip” SoC Generator
30
![Page 31: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/31.jpg)
Who should use the Rocket Chip Generator?
31
1. Change Parameters 2. Develop New Accelerators 3. Develop Own RISC-V Core 4. Develop Own Device
![Page 32: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/32.jpg)
Chisel is a new HDL embedded in Scala - Rely on good software engineering practices such as
abstraction, reuse,
object oriented
programming,
functional programming
- Build hardware
like software
Chisel is not a
high-level
synthesis tool - It is a program
that writes RTL
Chisel Program
C++
code FPGA
Verilog ASIC
Verilog
Software
Simulator
C++ Compiler
Scala/JVM
FPGA
Emulation
FPGA Tools
GDS
Layout
ASIC Tools
32
![Page 33: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/33.jpg)
Functional Programming 101
(1, 2, 3) map { n => n+1 } Result: (2, 3, 4)
(1, 2, 3) zip (a, b, c) Result: ((1, a), (2, b), (3, c))
((1, a), (2, b), (3, c)) map { case (left, right) =>
left } Result: (1, 2, 3)
(1, 2, 3) foreach { n => print(n) } Result: “123”
for (x <- 1 to 3; y <- 1 to 3) yield (x, y) Result: ((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2),
(3,3))
33
![Page 34: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/34.jpg)
Functional Programming Example Used in AHB-Lite Crossbar
class AHBXbar(nMasters: Int, nSlaves: Int) extends Module { val io = new Bundle { val masters = Vec.fill(nMasters){new AHBMasterIO}.flip val slaves = Vec.fill(nSlaves){new AHBSlaveIO}.flip } val buses = List.fill(nMasters){ Module(new AHBBus(nSlaves)) } val muxes = List.fill(nSlaves){ Module(new AHBSlaveMux(nMasters)) } (buses.map(b => b.io.master) zip io.masters) foreach { case (b, m) => b <> m } (muxes.map(m => m.io.out) zip io.slaves ) foreach { case (x, s) => x <> s } for (m <- 0 until nMasters; s <- 0 until nSlaves) yield { buses(m).io.slaves(s) <> muxes(s).io.ins(m) } } 34
![Page 35: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/35.jpg)
TileLinkIO
35
- TileLinkIO consists of Acquire, Probe, Release, Grant, Finish
Client
Manager
Cache
Acqu
ire
Fin
ish
Gra
nt
Pro
be
Rele
ase
Client
Acqu
ire
Fin
ish
Gra
nt
Cache
Pro
be
Rele
ase
![Page 36: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/36.jpg)
UncachedTileLinkIO
36
Client
Manager
Cache
Acqu
ire
Fin
ish
Gra
nt
Pro
be
Rele
ase
Client
Acqu
ire
Fin
ish
Gra
nt
- UncachedTileLinkIO consists of Acquire, Grant, Finish - Convertors for TileLinkIO/UncachedTileLinkIO in uncore
library
![Page 37: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/37.jpg)
ROCCIO
Rocket sends
coprocessor instruction
via the Cmd interface
Accelerator responds
through Resp interface
Accelerator sends
memory requests to
L1D$ via CacheIO
busy bit for fences
IRQ, S, exception bit
used for virtualization
UncachedTileLinkIO for
instruction cache on
accelerator
PTWIO for page-table
walker ports on
accelerator
37
![Page 38: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/38.jpg)
Vector Bank Execution Diagram
R R
R R
R
W
R
R
W
R
R
W
R
R
38
After a 2-cycle initial startup latency, the banked RF is
effectively able to read out 2 operands/cycle.
![Page 39: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/39.jpg)
Our Physical Design Flow
39
RTL Code (Verilog)
Synthesis
Place-and-Route
Gate-level Netlist
Signed-Off Design
Formality
Formal Verification
PrimeTime/StarRC
Static Timing Analysis
VCS Post-PNR
Gate-level Simulation
Hierarchical SYN & PNR
UPF-based MV SYN & PNR
Chisel Source Code
Chisel
![Page 40: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/40.jpg)
What’s Different about RISC-V?
Simple - Far smaller than other commercial ISAs
Clean-slate design - Clear separation between user and privileged ISA - Avoids µarchitecture or technology-dependent
features
A modular ISA - Small standard base ISA - Multiple standard extensions
Designed for extensibility/specialization - Variable-length instruction encoding - Vast opcode space available for instruction-set
extensions
Stable - Base and standard extensions are frozen - Additions via optional extensions, not new versions
40
![Page 41: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/41.jpg)
SPECint2006 Code Size
RISC-V is smallest ISA for 32- and 64-bit addresses
- Average 34% smaller for RV32C, 42% smaller for RV64C
41
142%
100%
141% 131% 129%
169%
80%
100%
120%
140%
160%
180%
RV64C RV64 X86-64 ARMv8 MIPS64
64-bit Address
134%
100%
140%
126% 136%
101%
173%
126%
80%
100%
120%
140%
160%
180%
32-bit Address
![Page 42: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/42.jpg)
RISC-V Documentation, Software Ecosystem
Documentation - RISC-V User-Level ISA Specification V2
- RISC-V Compressed ISA Specification V1.7
- RISC-V Privileged-Level ISA Specification V1.7
Modern Software Stack - GCC/Binutils/glibc/newlib/GDB
- LLVM/Clang
- Linux
- Yocto (OpenJDK, Python, Scala, …)
RISC-V Software Implementation - ANGEL JS ISA Simulator
- Spike ISA Simulator
- QEMU
Checkout http://riscv.org for more details 42
![Page 43: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/43.jpg)
RISC-V Core Generators from UC Berkeley
Z-scale: Family of Tiny Cores - Similar class: ARM Cortex M0/M0+/M3/M4
- Integrates with AHB-Lite interconnect
- Open-Sourced
Rocket: Family of In-order Cores - Currently 64-bit single-issue only
- Plans to work on dual-issue, 32-bit options
- Similar class: ARM Cortex A5/A7/A53
- Will integrate with AXI4 interconnect
- Open-Sourced
BOOM: Family of Out-of-Order Cores - Supports 64-bit single-, dual-, quad-issue
- Similar class: ARM Cortex A9/A15/A57
- Will integrate with AXI4 interconnect
43
![Page 44: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/44.jpg)
CoreMark Scores
44
0
1.00
2.00
3.00
4.00
5.00
6.00
CoreMark/MHzC
ore
Mark
/MH
z
UC BerkeleyIndustry*Comparisons
2
Checkout Chris Celio’s “BOOM: Berkeley Out-of-Order
Machine” talk from 2nd RISC-V Workshop for more details
![Page 45: Raven: A 28nm RISC-V Vector Processor with Integrated ...hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday... · 1 Raven: A 28nm RISC-V Vector Processor with Integrated](https://reader034.vdocuments.us/reader034/viewer/2022042520/5a7981957f8b9a6c158c8a47/html5/thumbnails/45.jpg)
Glossary
DVFS: Dynamic Voltage and Frequency Scaling
SC: Switched Capacitor
DLL: Delay-Locked Loop
LDO: Low-Dropout Regulator
FDSOI: Fully Depleted Silicon-on-Insulator
FMA: Fused-Multiply-Add
BTB: Branch Target Buffer
BHT: Branch History Table
RAS: Return Address Stack
45