Computer Architecture Lab at Carnegie Mellon
Combining Simulators and FPGAs “An Out-of-Body Experience”
Eric S. Chung, Brian Gold, James C. Hoe, Babak Falsafi
{echung, bgold, jhoe, babak}@ece.cmu.edu
SIMFLEX/PROTOFLEX
June 22, 2006 Eric S. Chung / RAMP 2006 Summer Retreat 2
The RAMP full-system challenge
• RAMP vision for studying systems w/ FPGAs
– functional & cycle-accurate simulation
– scalability, speed, & flexibility on FPGAs
– full-system (run unmodified binaries & OS)
[Figure: full-system target with two CPUs, memory, PCI bus, Ethernet controller, graphics card, I/O MMU controller, SCSI controller and disks, DMA controller, IRQ controller, and terminal]
• A ‘full-sys’ RAMP will incur large effort, yet not all behaviors are frequently used (e.g., I/O)
Combining simulators & FPGAs
• Simulators already provide full-system support; why not simulate infrequent behaviors (e.g., I/O devices)?
• Advantages
– avoiding implementation of infrequent behaviors lowers full-system FPGA development effort
– low impact on scalability & performance on the FPGA
[Figure: target objects split between FPGA and simulator hosts: CPUs, memory, SCSI disk, Ethernet]
Outline
• Motivation
• Migration
• Implementation status
• Conclusion
Migration
• 3 ways to map a target object to a host:
– FPGA-only
– Simulation-only
– Migratable
• Migratable objects
– switch modes between FPGA & simulator hosts
– target behavior need not be 100% implemented in FPGA mode
e.g., implement 80% of target behavior in the FPGA and 100% in the simulator
[Figure: target design objects 1, 2, 3 mapped to the FPGA and simulator hosts]
“Target objects”: e.g., a functional or timing CPU model
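To make the three mappings concrete, here is a minimal sketch of a host-selection rule. It assumes a migratable object runs on the FPGA whenever the requested behavior is in its hardware-implemented subset and falls back to the simulator (which implements 100% of target behavior) otherwise. The names `Mapping`, `Host`, and `host_for` are hypothetical, not part of ProtoFlex.

```python
from enum import Enum


class Mapping(Enum):
    FPGA_ONLY = "fpga-only"
    SIM_ONLY = "simulation-only"
    MIGRATABLE = "migratable"


class Host(Enum):
    FPGA = "fpga"
    SIMULATOR = "simulator"


def host_for(mapping: Mapping, behavior: str, fpga_impl: frozenset) -> Host:
    """Pick the host that executes `behavior` for one target object.

    Hypothetical policy: an FPGA-only object must implement every behavior it
    sees; a simulation-only object always runs in software; a migratable object
    stays on the FPGA only while the behavior is in its implemented subset.
    """
    if mapping is Mapping.FPGA_ONLY:
        if behavior not in fpga_impl:
            raise ValueError(f"FPGA-only object cannot execute {behavior!r}")
        return Host.FPGA
    if mapping is Mapping.SIM_ONLY:
        return Host.SIMULATOR
    # Migratable: the simulator covers 100% of behavior, so it is the fallback.
    return Host.FPGA if behavior in fpga_impl else Host.SIMULATOR


# Example: a CPU whose FPGA model implements only common integer behaviors.
cpu_fpga_subset = frozenset({"load", "store", "add", "sub", "multiply"})
print(host_for(Mapping.MIGRATABLE, "add", cpu_fpga_subset).value)       # fpga
print(host_for(Mapping.MIGRATABLE, "scsi_cmd", cpu_fpga_subset).value)  # simulator
```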
Migration example
Target-to-host mappings:
• CPU = migratable
• Memory = FPGA-only
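• Devices = SW-only
[Figure: example CPU instruction stream over time (load, add, multiply, SCSI cmd, add, sub, …, SCSI cmd, load); the CPU moves between the FPGA and the simulator, with a CPU state transfer at each migration. Memory resides on the FPGA; the SCSI disk resides in the simulator]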
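The instruction-stream example can be sketched as a small loop that migrates the CPU whenever the current host cannot execute the next instruction, copying the CPU state on each switch. This is an illustration under assumptions: the FPGA op set, the `CpuState` fields, and the policy of migrating back to the FPGA as soon as possible are all hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CpuState:
    """Hypothetical architectural state that must move on each migration."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    tlb: dict = field(default_factory=dict)


# Assumption: the FPGA CPU implements these ops; anything else (e.g., the
# SCSI command) is executed by the simulator's full CPU model.
FPGA_OPS = {"load", "add", "sub", "multiply"}


def run_stream(stream, state):
    """Run `stream`, migrating the CPU whenever the current host lacks an op.

    Returns (per-instruction host log, number of state transfers, final state).
    """
    host, log, transfers = "fpga", [], 0
    for op in stream:
        wanted = "fpga" if op in FPGA_OPS else "simulator"
        if wanted != host:
            # Stands in for copying registers/TLBs across the FPGA/simulator link.
            state = copy.deepcopy(state)
            transfers += 1
            host = wanted
        log.append((op, host))
        state.pc += 4  # placeholder for actually executing `op`
    return log, transfers, state


stream = ["load", "add", "multiply", "scsi_cmd", "add", "sub", "scsi_cmd", "load"]
log, transfers, final = run_stream(stream, CpuState())
print(transfers)  # 4
```

Migrating back eagerly keeps the common case on the FPGA but pays two state transfers per SCSI command; a real design might instead linger in the simulator for a while if I/O instructions tend to cluster.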
Advantages
• Lowers development effort
– avoid bring-up of infrequent behaviors
– migrate & validate ref. models from simulator
– tailor impl. to workload (skip rarely used instrs; well suited to CISC ISAs such as x86)
• Fast & scalable
– perf-critical objects on FPGA (e.g., CPU, memory)
– scalable for MPs: add more migratable CPUs
[Figure: multiprocessor target split between FPGA and simulator hosts: several CPUs, memory, SCSI disk]
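One way to "tailor the implementation to the workload" is to rank instruction behaviors by dynamic frequency and implement the most common ones in the FPGA until a coverage goal is met, leaving the rest to the simulator. The greedy rule, the function name `choose_fpga_subset`, and the instruction counts below are illustrative assumptions, not measured data.

```python
from collections import Counter


def choose_fpga_subset(histogram: Counter, target: float):
    """Greedily pick the most frequent behaviors until their combined dynamic
    frequency reaches `target` (e.g., 0.99999).

    Returns (chosen behaviors, coverage actually achieved).
    """
    total = sum(histogram.values())
    if total == 0:
        return [], 1.0
    chosen, covered = [], 0
    for behavior, count in histogram.most_common():
        if covered / total >= target:
            break
        chosen.append(behavior)
        covered += count
    return chosen, covered / total


# Illustrative (made-up) dynamic counts for 1,000,000 instructions.
hist = Counter({"add": 500_000, "load": 300_000, "branch": 199_995,
                "wrpr": 4, "scsi_io": 1})
subset, coverage = choose_fpga_subset(hist, 0.99999)
print(subset, coverage)
```

Every behavior left out of the subset costs a migration each time it executes, so the coverage target directly bounds the migration rate.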
Subtleties
• Objects split across simulator and FPGA still interact
– examples: interrupts, DMA
– handle by forwarding messages between FPGA/simulator
– FPGA-only & SW-only mapped objects easy to locate
– migrated objects require tracking
[Figure: DMA from the simulated SCSI disk forwarded to memory on the FPGA]
[Figure: interrupt crossing between the simulator and FPGA hosts. Option 1: forwarded interrupt; Option 2: forced migration]
• Cross-host interactions are rare, so they have low impact on FPGA performance
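The forwarding scheme can be sketched as a router that knows the fixed host of FPGA-only and SW-only objects and tracks the current host of migratable ones. It shows both interrupt options: forwarding the message to wherever the CPU is, or forcing the CPU to migrate to the sender's host. `Router`, `Message`, and the message kinds are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str              # e.g., "dma_write" or "interrupt"
    dest: str              # name of the destination target object
    payload: object = None


class Router:
    """Hypothetical cross-host router.

    Statically mapped objects have a fixed host; migratable objects are looked
    up in a location table that must be updated on every migration.
    """

    def __init__(self, static: dict, migratable: dict):
        self.static = dict(static)        # name -> "fpga" | "simulator"
        self.location = dict(migratable)  # name -> current host
        self.forwarded = 0

    def migrate(self, name: str, host: str) -> None:
        self.location[name] = host

    def host_of(self, name: str) -> str:
        return self.static[name] if name in self.static else self.location[name]

    def deliver(self, src_host: str, msg: Message, force_migration: bool = False) -> str:
        """Return the host where `msg` is handled.

        Option 1 (default): forward the message to the destination's host.
        Option 2: for interrupts, migrate the destination CPU to the sender's host.
        """
        dest_host = self.host_of(msg.dest)
        if dest_host == src_host:
            return dest_host  # local delivery, nothing crosses the link
        if force_migration and msg.kind == "interrupt" and msg.dest in self.location:
            self.migrate(msg.dest, src_host)
            return src_host
        self.forwarded += 1
        return dest_host


router = Router(static={"memory": "fpga", "scsi_disk": "simulator"},
                migratable={"cpu0": "fpga"})
router.deliver("simulator", Message("dma_write", "memory", b"\x00" * 64))
router.deliver("simulator", Message("interrupt", "cpu0"))
router.deliver("simulator", Message("interrupt", "cpu0"), force_migration=True)
```

Forwarding keeps the CPU on the fast host at the cost of one message; forced migration is only attractive if the interrupt handler itself would need simulator-only behavior.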
Subtleties cont.
• Migration cost
– migrating an object requires a state copy
e.g., a migratable CPU must copy its registers & TLBs
– FPGA-to-simulator latency & simulation time limit the tolerable number of migrations per instruction
• FPGA & simulator asynchrony
– simulated time “ticks” at different rates in FPGA & simulator
– must synchronize for deterministic replay & accurate device timing
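The limit on migrations per instruction can be seen with a back-of-the-envelope throughput model. It assumes the per-instruction costs simply add: FPGA execution time, simulator execution time for the fraction of instructions run in software, and a fixed round-trip cost per migration (state copy plus link latency). The parameter values in the usage note are illustrative assumptions, and the model ignores the FPGA/simulator time synchronization overhead.

```python
def effective_mips(fpga_mips: float, sim_mips: float, sim_fraction: float,
                   migrations_per_instr: float, migration_us: float) -> float:
    """Average throughput (MIPS) of a hybrid FPGA + simulator run.

    1 / MIPS is microseconds per instruction, so all terms below are in
    microseconds per instruction and can be summed directly.
    """
    us_per_instr = ((1.0 - sim_fraction) / fpga_mips   # time on the FPGA
                    + sim_fraction / sim_mips          # time in the simulator
                    + migrations_per_instr * migration_us)  # migration overhead
    return 1.0 / us_per_instr
```

For instance, with a 10-MIPS FPGA model, ignoring simulator execution time (`sim_fraction = 0`), and a hypothetical 1 ms migration round trip, one migration every 10,000 instructions adds 0.1 µs per instruction and halves throughput to 5 MIPS.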
Outline
• Motivation
• Migration
• Implementation in progress
• Conclusion
Implementation status
• Target system
– Sun Fire™ 3800 Server (up to 24-way)
– UltraSPARC III ISA
– Solaris 8
• Proof-of-concept software-to-software migration
– run 2 instances of Virtutech Simics
– migration designed & tested in 2 weeks
– can migrate on arbitrary behavior (e.g., ADD instruction)
BlueSPARC core (in progress)
• In-order SPARC V9 core
– supports 144 out of 170 integer instr behaviors
– supports partial MMU w/ I- & D-TLBs
– goal: 99.999% of instrs & behaviors in target workloads
target workloads: SPEC (mostly user-level), OLTP/DB2 (high TLB miss rates, 40% of time in privileged mode)
– CPI ranges from 5 to 7 cycles
– synthesis: 15k LUTs on Virtex-II Pro 30 at 85 MHz; 12 MIPS worst case (85 MHz ÷ CPI of 7 ≈ 12 MIPS)
– developed in Bluespec HDL: 6000 lines in 6 weeks
• Core validation
– run RTL in lockstep w/ Simics’s UltraSPARC simulation model
– workload validation w/ SPEC, OLTP/DB2, OpenSPARC verif. suite
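Lockstep validation amounts to stepping the RTL and the reference model one instruction at a time and comparing architectural state after each step, stopping at the first divergence. The sketch below assumes two hypothetical callbacks that each advance one model by one instruction and return its state; how BlueSPARC and Simics actually expose their state is not described here, and `ArchState` holds only a toy subset of the real state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ArchState:
    """Toy architectural state; a real comparison would include more (e.g., TLBs)."""
    pc: int
    regs: tuple


def lockstep(steps: int, dut_step: Callable[[], ArchState],
             ref_step: Callable[[], ArchState]) -> Optional[int]:
    """Step the design under test and the reference model together.

    Returns the index of the first instruction whose architectural state
    diverges, or None if all `steps` instructions match.
    """
    for i in range(steps):
        if dut_step() != ref_step():
            return i
    return None


# Toy traces: the "DUT" writes a wrong register value on the fourth instruction.
ref_trace = iter([ArchState(pc=4 * k, regs=(k,)) for k in range(1, 6)])
dut_trace = iter([ArchState(pc=4 * k, regs=(k if k != 4 else 0,)) for k in range(1, 6)])
print(lockstep(5, lambda: next(dut_trace), lambda: next(ref_trace)))  # 3
```

Reporting the first divergent instruction, rather than only a final pass/fail, is what makes the reference model useful for debugging the RTL incrementally.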
Migration on FPGA (in progress)
[Figure: Xilinx XUP Virtex-II Pro 30 board (BlueSPARC, PowerPC, DDR memory) linked over Ethernet, through a migration & message interface, to Virtutech Simics (Simics UltraSPARC model and simulated target devices)]
• PowerPC functions
– core & memory initialization from Simics checkpoints
– facilitates migration for BlueSPARC
– connects simulated devices to memory (e.g., SCSI DMA)
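The slides do not specify the wire format of the migration & message interface. As one hypothetical possibility, each message crossing the Ethernet link could carry a fixed header (type, object id, address, payload length) followed by a payload such as a CPU state image or a DMA buffer. The `MsgType` values and header layout below are assumptions for illustration only.

```python
import struct
from enum import IntEnum


class MsgType(IntEnum):
    # Hypothetical message types for the FPGA <-> Simics link.
    MIGRATE_TO_SIM = 1
    MIGRATE_TO_FPGA = 2
    DMA_WRITE = 3
    INTERRUPT = 4


# Network byte order, no padding: type (1 B), object id (4 B),
# address (8 B), payload length (4 B) = 17-byte header.
HEADER = struct.Struct("!BIQI")


def encode(msg_type: MsgType, obj_id: int, addr: int, payload: bytes = b"") -> bytes:
    """Frame one message as header + payload."""
    return HEADER.pack(msg_type, obj_id, addr, len(payload)) + payload


def decode(frame: bytes):
    """Parse a frame produced by `encode`; reject truncated payloads."""
    msg_type, obj_id, addr, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return MsgType(msg_type), obj_id, addr, payload


# e.g., a simulated SCSI controller DMA-writing two bytes into FPGA-side DDR memory
frame = encode(MsgType.DMA_WRITE, 0, 0x1000, b"\xde\xad")
print(decode(frame))
```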
Conclusion
• Contributions
– virtualizes infrequent behaviors using simulation
– simplifies full-system FPGA emulator, still fast/scalable
– incremental validation from reference system
• Future work
– support migration in RDL?
– adding cores + scaling across multiple FPGAs
• We are ready for BEE2
• Thanks! Questions?
• PROTOFLEX/SIMFLEX (http://www.ece.cmu.edu/~simflex)