SPREE Tutorial
Peter Yiannacouras
April 13, 2006
Processors on FPGAs
You have all used FPGAs (ECE241)
  Adders
  7-segment decoders
  Etc.
We are putting whole microprocessors on them
  We call these soft processors
Hard Versus Soft Processors
Soft Processor
  Written in HDL (Verilog)
  Programmed onto chip
Hard Processor
  Made of transistors
  Costs millions to make
  Faster, smaller, less power
Processors and FPGA Systems
FPGAs are a common platform for digital systems
  [Diagram: an FPGA containing a Soft Processor, Memory Interface, UART, Custom Logic, and Ethernet]
Soft Processor
  Performs coordination and even computation
  Better processors => less hardware to design
We aim to improve soft processors by customizing them
Our Research Problem
Soft processors have worse
  Area
  Speed
  Power
But are flexible
  Use this flexibility to counteract the above. HOW???
Customize the processor’s architecture
  e.g. Intel vs AMD
  e.g. Motorola 68360 vs 68010
HOW????
Research Goals
1. Understand tradeoffs in soft processors
   E.g. a hardware multiplier is big but can perform multiplies fast
2. Customize the processor to the application
   E.g. bubble sort doesn’t use multiplies, so remove the hardware multiplier and save on area
We developed SPREE, software to help us do both
SPREE System (Soft Processor Rapid Exploration Environment)
■ Input: Processor description (ISA + Datapath)
■ SPREE System
  1. Verify ISA against datapath
  2. Datapath Instantiation
  3. Control Generation
■ Output: Synthesizable Verilog
Input: Instruction Set Architecture (ISA) Description
■ Graph of Generic Operations (GENOPs)
■ Edges indicate flow of data
■ ISA currently fixed (subset of MIPS I)
Example: MIPS ADD – add rd, rs, rt
  [Diagram: GENOP graph with nodes FETCH, RFREAD, RFREAD, ADD, RFWRITE]
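A GENOP graph like the one for add can be pictured as a small directed graph. The following is a minimal C++ sketch, assuming a hypothetical GenopGraph type (it is not SPREE's actual ISA description code). The node names come from the slide; the edges are the presumed flow of data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ISA description; not SPREE's real classes.
struct GenopGraph {
    std::vector<std::string> nodes;                          // one GENOP per node
    std::vector<std::pair<std::size_t, std::size_t>> edges;  // data flows first -> second

    std::size_t addNode(const std::string& genop) {
        nodes.push_back(genop);
        return nodes.size() - 1;
    }
    void addEdge(std::size_t from, std::size_t to) {
        assert(from < nodes.size() && to < nodes.size());
        edges.emplace_back(from, to);
    }
};

// Builds the graph for "add rd, rs, rt": fetch, read two registers, add, write back.
GenopGraph makeMipsAdd() {
    GenopGraph g;
    std::size_t fetch = g.addNode("FETCH");
    std::size_t readA = g.addNode("RFREAD");   // assumed to read rs
    std::size_t readB = g.addNode("RFREAD");   // assumed to read rt
    std::size_t add   = g.addNode("ADD");
    std::size_t write = g.addNode("RFWRITE");  // assumed to write rd
    g.addEdge(fetch, readA);
    g.addEdge(fetch, readB);
    g.addEdge(readA, add);
    g.addEdge(readB, add);
    g.addEdge(add, write);
    return g;
}
```

Calling makeMipsAdd() yields five nodes and five edges; a generator could walk such a graph to check that the datapath offers a route for every edge.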
Input: Datapath Description
■ Interconnection of hand-coded components
■ Allows efficient synthesis
■ Described using C++
[Diagram: example datapaths built from the SPREE Component Library (Ifetch, Reg File, ALU, Shifter, Mul, Data Mem, WriteBack)]
Component Selection
Select by name
  Names looked up in library
  Stored in cpugen/rtl_lib
RTLComponent *ifetch=new RTLComponent("ifetch");
RTLComponent *reg_file=new RTLComponent("reg_file");
Datapath Wiring Example
[Diagram: Ifetch (ports rd, rs, rt, offset), Regfile (ports dst, a_reg, a_data, b_reg, b_data, writedata), ALU (ports opA, opB, result)]
proc.addConnection(ifetch,"rs",reg_file,"a_reg");
proc.addConnection(ifetch,"rt",reg_file,"b_reg");
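To show what such wiring calls might record, here is a minimal self-contained C++ sketch. RTLComponent, Processor, and Connection below are hypothetical stand-ins that only mimic the calls on the slides; the real SPREE classes differ. The "alu" component name and the two Regfile-to-ALU connections are assumptions based on the port names in the diagram.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins that mimic the calls on the slides; not the real SPREE classes.
struct RTLComponent {
    std::string name;
    explicit RTLComponent(const std::string& n) : name(n) {}
};

struct Connection {
    std::string srcComp, srcPort, dstComp, dstPort;
};

struct Processor {
    std::vector<Connection> wires;
    // Records a wire from one component's output port to another's input port.
    void addConnection(const RTLComponent* src, const std::string& srcPort,
                       const RTLComponent* dst, const std::string& dstPort) {
        assert(src != nullptr && dst != nullptr);
        wires.push_back({src->name, srcPort, dst->name, dstPort});
    }
};

Processor buildExample() {
    RTLComponent ifetch("ifetch");
    RTLComponent reg_file("reg_file");
    RTLComponent alu("alu");  // component name "alu" is assumed
    Processor proc;
    proc.addConnection(&ifetch, "rs", &reg_file, "a_reg");
    proc.addConnection(&ifetch, "rt", &reg_file, "b_reg");
    // Assumed continuation: register file outputs feed the ALU operands.
    proc.addConnection(&reg_file, "a_data", &alu, "opA");
    proc.addConnection(&reg_file, "b_data", &alu, "opB");
    return proc;
}
```

Keeping connections as plain (component, port) pairs makes it easy to imagine how a generator could later emit the hookup Verilog from the list.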
SPREE System + Backend (Soft Processor Rapid Exploration Environment)
Processor Description -> SPREE generator (spegen) -> Verilog
Verilog -> Quartus II CAD Software (specadflow)
  1. Area
  2. Clock Frequency
  3. Power
Verilog + Benchmarks -> Modelsim Verilog Simulator (spebenchmark)
  4. Cycle Count
Benchmarks -> Mint MIPS Simulator (simulator/run)
Compare traces (Modelsim vs Mint)
Walking through an Example (see README.txt)
Choose a pre-built processor
  cpugen/src/arch lists all the processors
Let’s choose pipe3_serialshift
  3-stage pipeline with serial shifter
Using SPREE on a Processor
Generate, benchmark, synthesize
% spegen pipe3_serialshift         ← Generates Verilog
% spebenchmark pipe3_serialshift   ← Runs benchmarks
% specadflow pipe3_serialshift     ← Synthesizes processor
% specompare pipe3_serialshift     ← Display results
spegen – Generating Processors
Input: Processor description
Syntax: spegen <processor name>
Output: A folder named after the processor
  Hand-coded Verilog modules
  system.v
    Generated hookup and control
  OUT.cpugen
    Stages per instruction
    Hazard window/branch penalty
  test_bench.v
    Test bench for Modelsim simulation
Benchmarking
Run programs on the processor
  Measure time taken till completion
  Verify functionality
Can do this without knowing anything about the benchmarks themselves
spebenchmark – Benchmarking
Input: Processor implementation
Syntax: spebenchmark <processor>
Output: (ideally)
  Cycle counts of all benchmarks
  Traces: /tmp/modelsim_trace.txt
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Success! Cycle count=2994
Simulating crc ... Success! Cycle count=112750
Simulating des ... Success! Cycle count=5129
Simulating fft ... Success! Cycle count=5077
Simulating fir ... Success! Cycle count=1214
...
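If you want to post-process this output yourself, the cycle counts can be pulled out of each line. A minimal C++ sketch, assuming the line format shown above; parseCycleCount is a hypothetical helper, not part of SPREE.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: extracts the number after "Cycle count=" from one
// line of spebenchmark output. Returns -1 if the line has no cycle count.
long parseCycleCount(const std::string& line) {
    const std::string key = "Cycle count=";
    std::size_t pos = line.find(key);
    if (pos == std::string::npos) return -1;
    return std::stol(line.substr(pos + key.size()));
}

// Usage example on a line copied from the output above.
void parseExample() {
    long cycles = parseCycleCount("Simulating bubble_sort ... Success! Cycle count=2994");
    assert(cycles == 2994);
}
```

The sketch reads the count on both "Success!" and "Error:" lines, since both carry a "Cycle count=" field.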
Benchmarking – under the hood
C source benchmarks -> Compiler (gcc - MIPS) -> Binary Executable
Binary Executable -> Mint MIPS Simulator (simulator/run) -> Trace
  applications/<benchmark name>/mint
Binary Executable + Verilog -> Modelsim Verilog Simulator (spebenchmark) -> Trace, Cycle Count
  /tmp/modelsim_trace.txt
  /tmp/modelsim_store_trace.txt
Compare traces (spebenchmark)
specompiler - Setup compiler
Choose the path to your compiler (prebuilt)
  Default: /jayar/b/b0/yiannac/spe/compiler
    GCC 3.3.3, software division
  Another: /jayar/b/b0/yiannac/spe/compiler-softmul
    GCC 3.3.3, software division and software multiplication
specompiler will:
  1. Compile all benchmarks (and store binaries)
  2. Simulate all benchmarks (and store traces)
% specompiler /jayar/b/b0/yiannac/spe/compiler-softmul
After this point, you can just run spebenchmark
spebenchmark - failure
Shows discrepancy between MINT and Modelsim
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Error: Trace does not match, Cycle count=381
Discrepancy found at 6800000 ps
Modelsim: PC=04000064 | IR=24090001 | 05: 00000000
Mint: PC=040000b8 | IR=8c47004c | 07: 00000064
Last field: destination register, then the value being written
These are clues to where the error occurred
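The idea behind the trace check can be sketched in a few lines of C++. This is only an illustration, assuming each trace is a list of text entries compared one by one; firstDiscrepancy is a hypothetical helper, and the real spebenchmark comparison may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first entry where the two traces differ,
// or -1 if they match entry for entry and have the same length.
long firstDiscrepancy(const std::vector<std::string>& modelsim,
                      const std::vector<std::string>& mint) {
    std::size_t n = modelsim.size() < mint.size() ? modelsim.size() : mint.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (modelsim[i] != mint[i]) return static_cast<long>(i);
    }
    // One trace is a strict prefix of the other: report where the shorter one ends.
    if (modelsim.size() != mint.size()) return static_cast<long>(n);
    return -1;
}

// Usage example built from the two trace entries shown on the slide.
void traceExample() {
    std::vector<std::string> modelsim = {"PC=04000064 | IR=24090001 | 05: 00000000"};
    std::vector<std::string> mint     = {"PC=040000b8 | IR=8c47004c | 07: 00000064"};
    assert(firstDiscrepancy(modelsim, mint) == 0);
}
```

Stopping at the first mismatch matters: once the processor state diverges, every later entry is likely to differ too, so only the first one points at the bug.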
spebenchmark - waveforms
Can see any signal within the processor
% sim_gui bubble_sort pipe3_serialshift
Modelsim
LEARN IT!!!
Quartus Simulator is vastly inferior, and even unusable for our purposes
The Testbench (test_bench.v)
What is it?
  The stimulus and monitor for your circuit
SPREE generates it automatically
  And hence it works right away
Hand-coding your own processor means
  You have to interface with the test bench
  Once you have the testbench you can use spebenchmark
Manual Interfacing with the Testbench
Need only 6 wires between your soft processor and test_bench.v
  To track writes to register file and data mem
  regfile_we, regfile_dst, regfile_data
  datamem_we, datamem_addr, datamem_data
specadflow – Synthesis
Input: Processor implementation
Syntax: specadflow <processor name>
Performs a “seed sweep”
  Average several runs since results are noisy
  Run several instances of quartus
  Across several machines in parallel
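The averaging step of a seed sweep is simple to picture. A minimal C++ sketch, assuming one measured value per seed; sweepAverage is a hypothetical helper, not the specadflow code.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Averages one metric (e.g. clock frequency in MHz) over all seeds of a sweep.
double sweepAverage(const std::vector<double>& perSeed) {
    assert(!perSeed.empty());
    double sum = std::accumulate(perSeed.begin(), perSeed.end(), 0.0);
    return sum / static_cast<double>(perSeed.size());
}
```

For example, made-up frequencies of 74, 76, and 78 MHz from three seeds would average to 76 MHz. Reporting the mean rather than a single run is what smooths out the seed-to-seed noise mentioned above.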
specadflow Output
Output:
  Synthesis results (hidden)
  Summary output
Started Tue 6:27PM, Waiting for processes: 10.0.0.61 10.0.0.57 10.0.0.56 10.0.0.55 10.0.0.54 10.0.0.51
Finished Tue 6:33PM
1081      ← Area (LEs or ALUTs)
75.7812   ← Clock Frequency (MHz)
0.99822   ← Estimated Energy/cycle dissipated (nJ/cycle)
... Waiting on eda writer
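One common way to combine these synthesis numbers with a benchmark's cycle count is to derive wall-clock time and total energy. A minimal C++ sketch; the helper names are hypothetical and SPREE itself may report things differently.

```cpp
#include <cassert>

// Wall-clock time in microseconds. A clock of f MHz runs f cycles per
// microsecond, so time = cycles / f.
double runtimeMicros(double cycles, double clockMHz) {
    assert(clockMHz > 0.0);
    return cycles / clockMHz;
}

// Total energy in nanojoules = cycles * energy per cycle (nJ/cycle).
double energyNanojoules(double cycles, double nJPerCycle) {
    assert(cycles >= 0.0 && nJPerCycle >= 0.0);
    return cycles * nJPerCycle;
}
```

With illustrative round numbers, 3000 cycles at 75 MHz take 40 microseconds, and at 1 nJ/cycle they dissipate 3000 nJ. This is why cycle count and clock frequency have to be read together: a change that raises the clock frequency but adds cycles can still make a benchmark slower.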
Any Questions?
For technical support, ask me
EXTRAS
Setup/Install
Copy and unpack the SPREE tarball: /jayar/b/b0/yiannac/spree.tar.gz
Build all the SPREE software
Follow instructions in INSTALL.txt
If there are any errors, email me
% cd spree
% make
SPREE Directory Structure
spree
  applications   Benchmarks C source
  compiler       binutils, gcc, newlib
  cpugen         the cpu generator + processor descriptions
  modelsim       Verilog simulator
  quartus        synthesis
  simulator      MIPS simulator
Setup cluster
Choose the cluster you’re using
  aenao – high performance, limited access
  eecg – any eecg-connected machine
    Edit quartus/machines.txt
    Put a list of 11 or so good eecg machines
% specluster eecg
OR
% specluster aenao