hasim: fpga-based micro-architecture simulatorhasim is a timing model – not rtl! • performance...
TRANSCRIPT
![Page 1: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/1.jpg)
HAsim: FPGA-Based Micro-Architecture Simulator
Michael Adler Michael Pellauer
Kermin E. Fleming* Angshuman Parashar
Joel Emer
*MIT
![Page 2: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/2.jpg)
HAsim Is a Timing Model – Not RTL!
• Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable • Full of communication channels
• Programmed like a software timing model • FPGA is just a highly parallel execution engine • FPGA cycle != Model cycle • FPGA simulation will be faster than software if: • Parallelism can overcome the ~40x clock difference • I/O bandwidth is sufficient
1
![Page 3: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/3.jpg)
Fast, Accurate or Now?
2
Accuracy
Development Time
Model Speed
![Page 4: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/4.jpg)
FPGA Picture is Different
3
Accuracy
Development Time
Model Speed
![Page 5: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/5.jpg)
Reducing Development Time: Managing Complexity
• Programming Language (Bluespec) • Timing model infrastructure
– Reusable functional model – Inter-module communication – Tracking simulated time
• Hybrid hardware / software models – GEM5 for:
• Checkpoints • Loading • Functional memory management • Emulating difficult instructions
4
Development Time
![Page 6: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/6.jpg)
STDIO on General Purpose Machines
FILE *f = fopen(path, “w”); const char *name = “Kenneth”; fprintf(f, “%s, what is the frequency?\n”, name);
5
![Page 7: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/7.jpg)
I/O In Hardware Description Languages (System Verilog)
Integer f = fopen(path, “w”); string name = “Kenneth”; fwrite(f, “%s, what is the frequency?\n”, name);
6
![Page 8: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/8.jpg)
Nothing Comes from Nothing
FPGAs have: • No standard physical device • No standard device model • No standard system interface • No standard API
7
![Page 9: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/9.jpg)
What Makes Hardware General Purpose?
• The software – Compilers and library APIs make code “universal” – Hardware standards (ACPI, PCIe) mostly make OS
development and compiler writing easier. Little impact on user programs.
– ISA matters if you want to avoid recompiling. ISA is part of the software API, along with standard libraries.
8
![Page 10: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/10.jpg)
LEAP Platform
RRR
Pla&orm Interface
STDIO Scratchpad Memory
Control
Timing Par<<on
Func<onal Par<<on
Remote Memory Channel
FPGA Physical Pla&orm
Exe Decode Fetch
RRR
Channel
So'ware Physical Pla&orm
Virtual Pla2orm
Control
SoBware Services
Streams Memory State Emulate
Virtual Pla2orm
FPGA So'ware
![Page 11: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/11.jpg)
Reducing Model Complexity: Shared Functional Model
10
ITranslate
Fetch
DTranslate
Memory
Local Commit
Global Commit
Decode
Execute
Functional Pipeline
Functional State
• Similar philosophy to GEM5 or Asim: – Single ISA functional model
implementation – Functional machine state is
completely managed – Timing models can be ISA-
independent
• Each functional pipeline stage behaves like a request/response FIFO
ISPASS 2008 Paper: Quick Performance Models Quickly: Timing-Directed Simulation on FPGAs
![Page 12: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/12.jpg)
Timing Model
11
ITranslate
Fetch
DTranslate
Memory
Local Commit
Global Commit
Decode
Execute
Functional Pipeline
Functional State
IP
Next IP
• Timing & functional models communicate state using tokens
• Minimal timing model: – Only state is IP – Drives a single
token at a time
Timing Pipeline
![Page 13: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/13.jpg)
Pipeline Parallelism
12
ITranslate
Fetch
DTranslate
Memory
Local Commit
Global Commit
Decode
Execute
Functional Pipeline
Functional State
IPs
Next IPs
• Model of a pipelined design naturally runs pipelined on an FPGA
• Detailed model of a pipelined design runs faster than a trivial, unpipelined model!
![Page 14: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/14.jpg)
13
Managing Time: A-Ports and Soft Connections
FPGA cycles != simulated cycles:
– We are building a timing model, NOT a prototype – 1:n cycle mapping would force us to slow the
timing clock to the longest operation, even if it is infrequent
– 1:n would force us either to fit an entire design on the FPGA or synchronize clock domains
![Page 15: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/15.jpg)
14
Option #1: Global Controller [rejected]
Central controller advances cycle when all modules are ready • Improvement: slowest possible cycle no longer dictates
throughput • However:
– Place & route becomes difficult – Long signal to global controller is on the critical path
FET DEC EXE MEM WB
Controller
curCC
![Page 16: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/16.jpg)
15
Option #2: A-Ports
• Extension of Asim ports • FIFO with user-specified latency and capacity • Manage model time by guaranteeing exactly one
message per cycle through every port
FET DEC EXE MEM WB 1 1
1 1 0
2
• Beginning of model cycle: read all input ports • End of model cycle: write all output ports
ISFPGA 2008 Paper: A-Ports: An Efficient Abstraction for Cycle-Accurate Performance Models on FPGAs
![Page 17: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/17.jpg)
Hybrid Modeling: Software Instruction Emulation
16
FPG
A Sof
twar
e
Time
Execute
Emulation Server
GEM5 Functional Instruction Simulator
Memory Server
Functional Cache
Execute
Emulation Server
Sync Registers Sync
Reg
iste
rs
RRR Layer
Emulate Instruction Em
ulat
ion
Don
e
……
Ack
![Page 18: HAsim: FPGA-Based Micro-Architecture SimulatorHAsim Is a Timing Model – Not RTL! • Performance models are: • Highly parallel, but not easily vectorizable • Pipelineable •](https://reader034.vdocuments.us/reader034/viewer/2022052000/60129359bdecac1776319f6d/html5/thumbnails/18.jpg)
HAsim / LEAP Open Source
Redmine site with source and papers:
http://asim.csail.mit.edu/
17