The Berkeley Out-of-Order Machine (BOOM!): Computer Architecture Research Using an Industry-Competitive, Synthesizable, Parameterized RISC-V Processor

Christopher Celio, Krste Asanovic, David Patterson

UC Berkeley

June 2015, [email protected]

Tuesday, June 30, 2015

What is BOOM?

§ A superscalar, out-of-order processor written in Berkeley’s Chisel RTL
§ It is synthesizable
§ It is parameterizable
§ We hope to use it as a platform for architecture research

BOOM is a work in progress. Results shown in the talk are preliminary and subject to change!

Other Berkeley RISC-V Processors

§ Sodor Collection
- RV32I - tiny, educational, not synthesizable
§ Z-scale
- RV32IM - micro-controller
§ Rocket
- RV64G - in-order, single-issue application core
§ BOOM
- RV64G - out-of-order, superscalar application core

Why OoO?

§ Great for ...
- tolerating variable latencies
- finding ILP in code (instruction-level parallelism)
- complex method for fine-grain data prefetching
- plays nicely with poor compilers and lazily written code

Performance!

OoO widely used in industry

§ Intel Xeon/i-series (10-100W)
§ ARM Cortex mobile chips (1W)
§ Intel Atom
§ Sun/Oracle Niagara UltraSPARC
§ PlayStation

Academic OoO Research

§ general lack of effort in academia to build, evaluate OoO designs
§ most research uses software simulators
- cannot produce area, power numbers
- hard to trust, verify results
- McPAT is calibrated against 90nm Niagara, 65nm Niagara 2, 65nm Xeon, and 180nm Alpha 21364
- very slow
§ Other academic OoO RTL efforts...
- Illinois Verilog Model, Princeton Sharing Architecture, NCSU FabScalar (Alpha, PISA)
- other ISAs can be very challenging to implement fully
- rely on SW simulators for performance numbers
- hopefully RISC-V can make everybody’s lives easier!

Design-space exploration

§ Very preliminary
§ Parameters
- fetch width
- issue width
- ROB size
- IW size
- LSU size
- Regfile size
- # of branch tags
§ 3x range in area
§ 2x range in performance

[Figure: Perf (CoreMark/s) vs. Area (um2) for 1-wide, 2-wide, and 4-wide designs, with the Pareto curve marked. Data collected by Orianna DeMasi.]
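A Pareto curve over (area, performance) pairs can be extracted with a single sorted pass. The sketch below is only an illustration: the design points are made-up placeholders rather than the data collected for the talk, and `pareto_front` is a hypothetical helper name.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (area_um2, coremark_per_s)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    A point dominates another if its area is <= and its performance
    is >=, with at least one of the two strictly better.
    """
    # Sort by ascending area; break ties by descending performance.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    front: List[Point] = []
    best_perf = float("-inf")
    for area, perf in ordered:
        # Keep a point only if it beats every design that is no larger.
        if perf > best_perf:
            front.append((area, perf))
            best_perf = perf
    return front


# Hypothetical design points (placeholders only, not measured data).
designs = [(300e3, 2.0), (350e3, 1.8), (500e3, 3.1), (520e3, 3.0), (900e3, 4.0)]
print(pareto_front(designs))
```

Sorting first keeps the pass linear after the sort; with only a few dozen configurations a quadratic pairwise comparison would work just as well.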

Research Methodology

§ Which benchmarks?
§ How many cycles do we need to run?
§ State of the art
- “SimPoints”
- run 4-10 snapshots per SPEC2000/2006 benchmark
- each snapshot runs for ~10M instructions
§ What other people do (ISCA 2014 results)
- ~50M instructions / workload
- ~200B instructions / paper
§ What we can do
- map design to an FPGA
- run at 50 MHz (~1T cycles/6hrs)
- run full reference benchmark (~2 trillion instructions avg)
- run on FPGA cluster (~1-2 weeks simulation in one day, or ~30-60T instructions/day)
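The FPGA throughput figures follow from simple arithmetic. The sketch below reproduces them under two assumptions that the slide does not state: an IPC of roughly 1, and a cluster of 7 to 14 boards (which is what "1-2 weeks of simulation in one day" would imply).

```python
FPGA_CLOCK_HZ = 50e6       # 50 MHz, from the slide
SECONDS_PER_HOUR = 3600


def cycles(hours: float, clock_hz: float = FPGA_CLOCK_HZ) -> float:
    """Cycles simulated by one FPGA in the given wall-clock time."""
    return clock_hz * hours * SECONDS_PER_HOUR


six_hours = cycles(6)      # roughly 1T cycles per 6 hours
one_day = cycles(24)       # cycles per FPGA per day

# Assumptions (not from the slide): IPC of about 1, 7 to 14 boards.
ASSUMED_IPC = 1.0
low = one_day * ASSUMED_IPC * 7
high = one_day * ASSUMED_IPC * 14

print(f"6 hours: {six_hours:.2e} cycles")
print(f"cluster: {low:.2e} to {high:.2e} instructions/day")
```

A lower IPC or a different board count shifts the range proportionally, so the slide's "~30-60T" is best read as an order-of-magnitude estimate.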

Berkeley Architecture Research Infrastructure

§ RISC-V ISA
§ Chisel HCL (hardware construction language)
§ Rocket-chip SoC generator

The RISC-V ISA is easy to implement!

§ relaxed memory model
§ accrued FP exception flags
§ no integer side-effects (e.g., condition codes)
§ no cmov or predication
§ no implicit register specifiers
- JAL requires explicit rd
§ rs1, rs2, rs3, rd always in same space
- allows decode, rename to proceed in parallel
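The last point can be made concrete: in the 32-bit RISC-V encodings the register specifiers occupy fixed bit positions, so they can be sliced out of the instruction word without first decoding the opcode. A minimal sketch (`reg_fields` is a hypothetical helper name, and the example is a software model rather than anything from BOOM's RTL):

```python
def reg_fields(inst: int) -> dict:
    """Extract register specifiers from a 32-bit RISC-V instruction.

    The fields sit at the same bit positions in every format that uses
    them, so they can be read before the opcode has been decoded.
    """
    return {
        "rd":  (inst >> 7) & 0x1F,   # bits 11:7
        "rs1": (inst >> 15) & 0x1F,  # bits 19:15
        "rs2": (inst >> 20) & 0x1F,  # bits 24:20
        "rs3": (inst >> 27) & 0x1F,  # bits 31:27 (fused multiply-add formats)
    }


# add x3, x1, x2 encodes as 0x002081B3
print(reg_fields(0x002081B3))
```

Whether a given field is meaningful still depends on the instruction format; the point is only that the rename stage can start looking up map-table entries in parallel with decode and discard the unused ones afterwards.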

The RISC-V ISA

§ BOOM supports “M” (mul/div/rem)
- imul can be either pipelined or unpipelined
§ BOOM supports “A”
- AMOs+LR/SC
§ BOOM supports “FD”
- single, double-precision floating point
- IEEE 754-2008 compliant FPU
- SP, DP FMA with hw support for subnormals
§ RV64G

Rocket-Chip SoC Generator

§ open-source
§ taped out 10 times by Berkeley
§ runs at 1.6 GHz in IBM 45nm
§ makes for a great library of processor components!

Supports Privileged ISA (“S”), Virtual Memory

§ boots Linux!
§ just released Privileged ISA v1.7
§ instant to update
- Privileged ISA nearly entirely isolated to Control/Status Register (CSR) File, TLBs
- updated git submodule pointers
- changed “tohost” to “mtohost” in one line

Chisel

§ Hardware Construction Language embedded in Scala
§ not a high-level synthesis language
§ hardware module is a data structure in Scala
§ Full power of Scala for writing generators
- object-oriented programming
  - factory objects, traits, overloading
- functional programming
  - higher-order functions, anonymous functions, currying
§ generated C++ simulator is 1:1 copy of Verilog designs

Chisel Hardware Construction Language

§ object-oriented, functional programming
§ powerful for writing hw generators
§ 12 days (+1092 loc) to add SP, DP floating point
§ 9 days (+900 loc) to go from no VM to booting Linux

BOOM

[Figure: pipeline overview. In-order front-half: Fetch, Decode & Rename. Out-of-order back-half: Issue Window, Unified Physical Register File, Functional Unit.]

BOOM

§ PRF
- explicit renaming
- holds speculative and committed data
- holds both x-regs, f-regs
§ Unified Issue Window
- holds all instructions
§ split ROB/issue window design

[Figure: pipeline diagram with Fetch, Decode & Rename, Rename Map Tables & Freelist, Issue Window, Unified Physical Register File (PRF), ALU, FPU, ROB, Commit.]
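Explicit renaming with a map table and a free list can be illustrated with a small software model. This is a generic textbook-style sketch, not BOOM's implementation: the class and method names are hypothetical, x0 is not treated specially, and branch snapshots and f-regs are left out.

```python
from collections import deque


class Renamer:
    """Toy explicit-renaming model: map table plus free list."""

    def __init__(self, num_arch: int, num_phys: int):
        # Initially architectural register i lives in physical register i.
        self.map_table = list(range(num_arch))
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, rd: int, rs1: int, rs2: int):
        """Return (pdst, prs1, prs2, stale_pdst) for one instruction."""
        # Sources read the current mappings before rd is remapped.
        prs1, prs2 = self.map_table[rs1], self.map_table[rs2]
        # The destination gets a fresh physical register. An empty free
        # list raises IndexError here; real hardware would stall instead.
        stale = self.map_table[rd]
        pdst = self.free_list.popleft()
        self.map_table[rd] = pdst
        return pdst, prs1, prs2, stale

    def commit(self, stale_pdst: int):
        """At commit, the previous mapping of rd can be recycled."""
        self.free_list.append(stale_pdst)


r = Renamer(num_arch=4, num_phys=8)
print(r.rename(rd=1, rs1=2, rs2=3))  # first write to register 1
print(r.rename(rd=1, rs1=1, rs2=1))  # reads the new copy, allocates another
```

Because the physical register file holds both speculative and committed values, the only state that has to be repaired on a misprediction is the map table and free list, which is one reason this organization pairs naturally with a split ROB and issue window.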

Parameterized Superscalar

dual-issue (5r,3w):

    val exe_units = ArrayBuffer[ExecutionUnit]()
    // unit 0: ALU + branch unit + FPU + integer multiplier
    exe_units += Module(new ALUExeUnit(is_branch_unit = true
                                      , has_fpu        = true
                                      , has_mul        = true
                                      ))
    // unit 1: ALU + memory pipe + divider
    exe_units += Module(new ALUMemExeUnit(fp_mem_support = true
                                         , has_div        = true
                                         ))

[Figure: dual-issue (5r,3w) datapath with Issue Select, Regfile Read, bypass network, ALU, imul, FPU, ALU, div, LSU, Agen, D$, bypassing, Regfile Writeback.]

OR

quad-issue (9r,4w):

    // same exe_units buffer as above, populated with four narrower units
    exe_units += Module(new ALUExeUnit(is_branch_unit = true))
    exe_units += Module(new ALUExeUnit(has_fpu = true
                                      , has_mul = true
                                      ))
    exe_units += Module(new ALUExeUnit(has_div = true))
    exe_units += Module(new MemExeUnit())

[Figure: quad-issue (9r,4w) datapath with Issue Select, Regfile Read, bypass network, ALU, div, ALU, imul, FPU, ALU, LSU, Agen, D$, bypassing, Regfile Writeback.]
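The register-file port counts quoted for the two configurations, (5r,3w) and (9r,4w), can be reproduced with a simple per-unit accounting. The rule used below is a guess that happens to match the slide, not something the talk states: a unit with an FPU needs three read ports (for fused multiply-add), every other unit needs two, and a combined ALU+memory unit writes back on two ports. The `ExeUnit` dataclass is a hypothetical Python stand-in for the Chisel classes.

```python
from dataclasses import dataclass


@dataclass
class ExeUnit:
    name: str
    has_fpu: bool = False   # FMA needs three source operands
    has_mem: bool = False   # unit contains the memory pipe
    has_alu: bool = True

    def read_ports(self) -> int:
        # Assumption: 3 reads if the unit hosts an FPU, otherwise 2.
        return 3 if self.has_fpu else 2

    def write_ports(self) -> int:
        # Assumption: a combined ALU+memory unit writes back on two
        # ports (ALU result and load data); every other unit on one.
        return 2 if (self.has_mem and self.has_alu) else 1


def ports(units):
    """Total (read, write) register-file ports for a list of units."""
    return (sum(u.read_ports() for u in units),
            sum(u.write_ports() for u in units))


dual = [ExeUnit("ALUExeUnit", has_fpu=True),
        ExeUnit("ALUMemExeUnit", has_mem=True)]
quad = [ExeUnit("ALUExeUnit"),
        ExeUnit("ALUExeUnit", has_fpu=True),
        ExeUnit("ALUExeUnit"),
        ExeUnit("MemExeUnit", has_mem=True, has_alu=False)]

print("dual-issue:", ports(dual))
print("quad-issue:", ports(quad))
```

If the real generator derives port counts differently, only `read_ports` and `write_ports` would need to change; the point is that the regfile shape falls out of the list of execution units rather than being specified separately.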

Full Branch Speculation Support

§ next-line predictor (NLP)
- BTB, BHT, RAS
- combinational
§ backing predictor (BPD)
- global history predictor
- SRAM (1 r/w port)

[Figure: front-end diagram with Fetch1, Fetch2, I$, Fetch Buffer, μDec, NLP (BHT, Target), BPD, Branch Prediction, NPC, PC1, PC2, TakePC, ExeBrTarget.]

Load/Store Unit

§ load/store queue with store ordering
- loads execute fully out-of-order wrt stores, other loads
- store-data forwarded to loads as required
§ non-blocking data cache

Synthesizable

§ Runs on FPGA
- (Zynq zedboard and Zynq zc706)
§ 2GHz (30 FO4) in TSMC 45nm
- speed of logic (SRAM is slower)

[Figure: 2-wide BOOM layout, 1.7mm2 @ 45nm. Labeled blocks: Regfile, ROB, Issue, Rename, Exe, bpd, I$ (32k), D$ (32k), LLC Data (256k), Uncore.]

preliminary results

Benefits of using Chisel

§ ~9,000 loc in BOOM github repo
§ additional ~11,500 loc instantiated from other libraries
- ~5,000 loc from Rocket core repository
  - functional units, caches, PTWs, etc.
- ~4,500 loc from uncore
  - coherence hubs, L2 caches, networks, host/target interfaces
- ~2,000 loc from hardfloat
  - floating point hard units

Feature Summary

Feature          BOOM
ISA              RISC-V (RV64G)
Synthesizable    √
FPGA             √
Parameterized    √
floating point   √
AMOs+LR/SC       √
caches           √
VM               √
Boots Linux      √
Multi-core       √
lines of code    9k + 11k

That’s BOOM!

[Figure: quad-issue (9r,4w) datapath with Issue Select, Regfile Read, bypass network, ALU, div, ALU, imul, FPU, ALU, LSU, Agen, D$, bypassing, Regfile Writeback.]

Comparison against ARM

Category               ARM Cortex-A9            RISC-V BOOM-2w
ISA                    32-bit ARM v7            64-bit RISC-V v2 (RV64G)
Architecture           2 wide, 3+1 issue,       2 wide, 3 issue,
                       Out-of-Order, 8-stage    Out-of-Order, 6-stage
Performance            3.59 CoreMarks/MHz       3.91 CoreMarks/MHz (+9%!)
Process                TSMC 40GPLUS             TSMC 40GPLUS
Area with 32K caches   ~2.5 mm2                 ~1.00 mm2
Area efficiency        1.4 CoreMarks/MHz/mm2    3.9 CoreMarks/MHz/mm2
Frequency              1.4 GHz                  1.5 GHz

[Figure: 2-wide BOOM layout (note: not to scale). Labeled blocks: Regfile, ROB, Issue, Rename, Exe, bpd, I$ (32k), D$ (32k), LLC Data (256k), Uncore.]

preliminary results
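The derived rows of the comparison can be checked directly from the performance and area rows. A short sketch using only the numbers in the table (the dictionary layout and variable names are illustrative choices):

```python
cores = {
    # name: (CoreMarks/MHz, area in mm2 with 32K caches), from the table
    "ARM Cortex-A9": (3.59, 2.5),
    "RISC-V BOOM-2w": (3.91, 1.00),
}

# Area efficiency is performance per MHz divided by area.
for name, (cm_per_mhz, area_mm2) in cores.items():
    print(f"{name}: {cm_per_mhz / area_mm2:.1f} CoreMarks/MHz/mm2")

# Relative CoreMarks/MHz advantage of BOOM-2w over the Cortex-A9.
a9_perf = cores["ARM Cortex-A9"][0]
boom_perf = cores["RISC-V BOOM-2w"][0]
speedup_pct = (boom_perf / a9_perf - 1) * 100
print(f"BOOM-2w vs. Cortex-A9: +{speedup_pct:.0f}% CoreMarks/MHz")
```

Both areas are marked approximate on the slide, so the efficiency ratio carries the same uncertainty.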

Industry Comparisons

[Figure: bar chart of CoreMark/MHz (axis 0 to 6.00), with bars grouped into in-order processors and out-of-order processors. Processors shown: BOOM-4w, BOOM-2w, Rocket, Ivy Bridge, Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A5, MIPS 74K.]

preliminary results

Industry Comparisons

Processor                  Core Area         CoreMark/MHz   Freq (MHz)   IPC
Intel Xeon E5 2668 (Ivy)   ~12 mm2 @ 22nm    5.60           3,300        1.96
ARM Cortex-A15             2.8 mm2 @ 28nm    4.72           2,116        1.50
BOOM-4wide                 1.1 mm2 @ 45nm    4.70           1,000        1.50
BOOM-2wide                 0.8 mm2 @ 45nm    3.91           1,500        1.26
ARM Cortex-A9              2.5 mm2 @ 40nm    3.59           1,400        1.27
MIPS 74K                   2.5 mm2 @ 65nm    2.50           1,600        -
Rocket (RV64G)             0.5 mm2 @ 45nm    2.32           1,500        0.76
ARM Cortex-A5              0.5 mm2 @ 40nm    2.13           -            -

Annotation on slide: 48x

preliminary results

Ivy Bridge Tile Comparison

Ivy Bridge-EP Tile (32kB/32kB + 256kB caches): ~12 mm2 @ 22nm

BOOM-2w Chip (32kB/32kB + 256kB caches): 1.7mm2 @ 45nm

BOOM-2w Chip scaled to 0.4mm2 @ 22nm

[Figure: the 2-wide BOOM layout shown at both sizes next to the Ivy Bridge-EP tile. Labeled blocks: Regfile, ROB, Issue, Rename, Exe, bpd, I$ (32k), D$ (32k), LLC Data (256k), Uncore.]

preliminary results
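The scaled figure is consistent with ideal quadratic area scaling between process nodes. The slide does not say which scaling rule was used, so the model below is an assumption that happens to reproduce the 0.4mm2 number; `scale_area` is a hypothetical helper.

```python
def scale_area(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Scale area assuming it shrinks with the square of feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2


# BOOM-2w chip: 1.7 mm2 at 45nm, projected to 22nm.
boom_22nm = scale_area(1.7, from_nm=45, to_nm=22)
print(f"BOOM-2w chip at 22nm: ~{boom_22nm:.1f} mm2")
```

Real designs rarely scale this cleanly (SRAM and wiring shrink at different rates from logic), so the projection is a rough visual aid rather than a predicted die size.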

Synthesis Results

[Figure: stacked bar chart of Core Area (um^2), axis 0 to 700000, for Rocket (No FPU, FPU) and BOOM (BOOM-1w, BOOM-2w, BOOM-4w). Legend: Issue Unit, Rename Stage (maptables), RRd Stage (bypasses), Register File, ROB, LSU, Br Predictor, Freelist, BusyTable, FetchBuffer, FPU, Imul, Other.]

preliminary results

Synthesis Results

[Figure: stacked bar chart of Core Area (um^2), axis 0 to 700000, with visible labels No FPU, BOOM-1w, BOOM-4w. Legend: Issue Unit, Rename Stage (maptables), RRd Stage (bypasses), Register File, ROB, LSU, Br Predictor, Freelist, BusyTable, FetchBuffer, FPU, Imul, Other.]

[Figure: stacked bar chart of Tile Area (um^2), axis 0 to 1200000, for Rck-I, Rck-G, BOOM-1w, BOOM-2w, BOOM-4w. Legend: Core, I$ (16 KB), D$ (16 KB).]

preliminary results

Lessons

§ RISC-V is a great ISA
- it gets out of your way
- the instruction count difference is greater between gcc versions than between ISAs
§ code-reuse is great
- leveraging existing Rocket-chip infrastructure
§ Way too much of my time is wasted on corralling benchmarks
- we should share our efforts
- https://github.com/ccelio/Speckle/
- make generating portable SPEC CPU2006 easy
§ Debugging is hard
- good verification tests are more valuable than good RTL
- use asserts EVERYWHERE
- use an ISA simulator in parallel with RTL simulation
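Running an ISA simulator alongside RTL simulation usually comes down to comparing the two commit streams and stopping at the first mismatch. The sketch below shows that idea on hard-coded traces; the `(pc, rd, value)` tuple format and the `first_divergence` helper are hypothetical, and a real flow would pull entries from the two simulators instead.

```python
from typing import Iterable, Optional, Tuple

Commit = Tuple[int, int, int]  # (pc, rd, value written); hypothetical format


def first_divergence(rtl: Iterable[Commit],
                     golden: Iterable[Commit]) -> Optional[int]:
    """Return the index of the first mismatching commit, or None.

    zip() stops at the shorter trace, so a length mismatch alone is
    not reported by this simple version.
    """
    for i, (got, want) in enumerate(zip(rtl, golden)):
        if got != want:
            return i
    return None


# Toy traces: the RTL writes a wrong value on its third commit.
golden_trace = [(0x80000000, 1, 5), (0x80000004, 2, 7), (0x80000008, 3, 12)]
rtl_trace = [(0x80000000, 1, 5), (0x80000004, 2, 7), (0x80000008, 3, 13)]

idx = first_divergence(rtl_trace, golden_trace)
assert idx == 2, "the toy traces diverge at the third commit"
print(f"mismatch at commit {idx}: rtl={rtl_trace[idx]} golden={golden_trace[idx]}")
```

Catching the divergence at the first bad commit, rather than at a failed benchmark checksum millions of cycles later, is what makes this style of checking worth the setup cost.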

“Speckle” - a wrapper for SPEC CPU2006

§ SPEC is designed to be run natively
- a pain for cross-compiling, running on a simulator or FPGA
§ If you have a copy of CPU2006...
- modify the provided cfg file
- Speckle will compile and generate a portable directory of binaries, input files, and input arguments, and a run script
§ https://github.com/ccelio/Speckle/

Conclusion

§ BOOM supports full RV64G + privileged ISA (VM support)
§ Able to boot Linux and run CoreMark, SPECINT, and Dhrystone benchmarks
§ BOOM is 9,000 loc and 3 person-years of work
§ Future Work
- bring up more interesting applications
- add ROCC interface
- explore new µarch designs
- tape-out this fall
- open-source by winter workshop

Questions?

Funding Acknowledgements

§ Research partially funded by DARPA Award Number HR0011-12-2-0016, the Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and ASPIRE Lab industrial sponsors and affiliates Intel, Google, Huawei, Nokia, NVIDIA, Oracle, and Samsung.

§ Approved for public release; distribution is unlimited. The content of this presentation does not necessarily reflect the position or the policy of the US government and no official endorsement should be inferred.

§ Any opinions, findings, conclusions, or recommendations in this paper are solely those of the authors and do not necessarily reflect the position or the policy of the sponsors.