scoore: santa cruz out-of-order risc engine fpga design issues · 2021. 4. 27. · scoore: fpga...
TRANSCRIPT
Francisco J. Mesa-Martinez, Abhishek Sharma, Andrew W. Hill, Carlos A. Cabrera, Cyrus Bazeghi, Hari Kolakaleti, Joseph Nayfach, Keertika Singh, Kevin S. Halle, Matthew D. Fischler, Melisa Nuñez, Sangeetha Nair, Suraj Narender Kurapati,
Wael Ali Ashmawi, and Jose Renau
SCOORE: Santa Cruz Out-of-Order RISC Engine
FPGA Design Issues
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Problem
• Simulation speed• Most paper evaluations are limited to short execution times• This gets worse with multiprocessor and multicore simulations
• Disconnect between theory and implementation • System simulators are an abstraction• Novel architectural ideas may have hidden implementation
complexity
2
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Are FPGAs a Way Out ?
• FPGAs are getting faster, larger, cheaper...
• Can we use these new FPGAs to: • Speed up simulation time?• Bridge the divide between theory and implementation?
3
0
80
160
240
320
400
2000
2002
2004
2006
2008
*
Number of Simple In-Order Cores
per FPGA
(source: Altera Corp.)
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Not so Fast!
• Existing Out-of-Order designs do not map well to FPGAs• Most are designed with a focus on ASIC implementation
• Further, these designs require multiple FPGAs!!!
4
0
4
8
12
16
20
Architectural Simulation vs. PPC & AXP Out-of-Order Cores in FPGA
18.016.0
2.0
SimulationSpeed (MHz)
Architectural SimulatorIVM/AXP (Single FPGA)Puma/PPC (Single FPGA)
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Why Is Out-of-Order So Bad On FPGAs?
• FPGA and ASIC design techniques may be tangential to each other!
• Out-of-Order aggravates these issues• Strike 1: Complex structures • Strike 2: Multiported & fully associative structures• Strike 3: Wiring & sizing overhead
5
Aggressive ASIC perf. target
Aggressive FPGA perf. target
ASIC FPGA
FPGA ASIC
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
FPGAs Can Still Be Part Of The Solution
• Problem: Simulation speed• Solution:
• Keep the simulator for quick evaluation• Take advantage of an FPGA-aware flow to speed up simulation• Use an FPGA-friendly OoO core (SCOORE)
• Problem: Disconnect between theory and implementation • Solution: • Couple FPGA flow with ASIC targeting to increase accuracy
6
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Proposal: Hybrid ASIC/FPGA Flow
• Early ideas and behavior explored
• Pros:• Fast development
• Cons:• Slow execution!• Reduced accuracy
7
Simulation FPGAASIC System
Start: Refinement: End:
• Theory / Practice “Link”• Targets are set• Speed/Area/Power• Pros:• Realistic Feedback• Extensive instruction
count/space executed • Cons:• More “effort”• We avoid full custom
• Better Evaluation• New ideas that can
be implemented• Real hardware• Advantages• Performance/Power/
Area sans surprises• Robust system
verification
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
SCOORE: An FPGA-friendly Design
• Design parameters:• Design follows hybrid FPGA/ASIC design principles• Based on SPARC • 12-stage pipeline• 4-way superscalar • Clustered architecture• Out-of-Order (up to 192 in-flight instructions)• Memory speculation, load-hit speculation• Performance targets: • 125MHz FPGA • 900MHz ASIC @ 90nm
• We have used SCOORE as the development platform to define a set of guidelines for FPGA-friendly OoO designs
8
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Lesson I: Don’t Cascade SRAM Structures
• Problem:
• Cascading has small impact in ASICs, however FPGA implementations suffer greatly • Over 50% performance hit!!!
• Cascading between Branch Target Buffer (BTB) & Return Address Stack (RAS) predictor in IF• Critical path which limits speed of IF stage
• Solution:• Avoid cascading SRAM structures in a single cycle• FPGA implementation reduces size and clock cycle significantly• FPGA and ASIC both benefit from shortened critical path
9
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Lesson II: Avoid Heavily Ported Structures
• Problem:
• Heavily multiported structures are not “FPGA-friendly”• Limits on the number of Write ports in FPGA SRAM banks• The instruction window is a large heavily ported structure
• CAM structures are bad for both ASIC and FPGA• ASIC: Large CAM structures are larger and power hungry• FPGA: CAM’s are implemented using logic cells!!!
• Solution:
• Break monolithic structure into multiple banks• Ex: SEED (Scalable Efficient Enforcement of Dependencies)1
• Shows dramatic speedup in FPGA• Reduced area/power requirements in ASIC due to reduced port count• Allows the implementation of very large instruction windows that were
not practical before
10
1 “SEED: Scalable Efficient Enforcement of Dependencies,” F. Mesa-Martinez, M. Huang, and J. Renau, to Appear in PACT06
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
• FPGA Frequency:
• ASIC Frequency:
• ASIC Area:
• ASIC Dynamic Power:
SEED: Results
11
0.0
20.0
40.0
60.0
80.0
100.0
120.0
16 32 64128
256
Fre
qu
en
cy (
MH
z) BASE
SEED
Instruction WindowEntries
0
100
200
300
400
500
600
16 32 64128
256
Fre
qu
en
cy (
MH
z)
BASESEED
Instruction WindowEntries
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
16 32 64128
256
Are
a (
mm
^2)
BASESEED
Instruction WindowEntries
0
50
100
150
200
250
300
350
400
450
16 32 64128
256
Dynam
ic P
ow
er
(mW
)BASESEED
Instruction WindowEntries
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Lessons Learned: Design Guidelines
12
• Wiring and Sizing:• FPGA and ASICs require different resource sizing• FPGA: Fixed SRAM/MAC/etc structures• ASIC: More sensitive to SRAM sizing
• Wire delay is major limiter in ASIC and even bigger in FPGA
• SRAM Structures:• SRAM-like structures in processor should be mapped into the
built-in SRAM structures in an FPGA• Avoid multiple Write ports: Most FPGAs have a single W port• Avoid fully associative structures: FPGAs map such structure into logic• Avoid single-cycle cascade of SRAM structures
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Acknowledgements
•Puma: Jayakumaran Sivagnaname (University of Michigan)
• IVM: Nicholas Wang (University of Illinois)
13
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
Contact Information
Web: http://masc.soe.ucsc.edu/
Your’s truly: Francisco J. Mesa-Martineze-mail: [email protected]
Advisor: Jose Renau e-mail: [email protected]
14
SCOORE: FPGA Design Issues Francisco Mesa-Martinez
• Integration of SEED into an out-of-order pipeline:
SEED: Theory of Operation
15
Outstanding loads
PredictorL1 Load
IF ID WakeUp RF EXIssue
ScoreboardDispatch
n dependents
RenameDecode
BufferIssue
Ready instructions sent to issue queue directly
Issue
dependentsused to wake upDequeued tokens
Redispatch
Woken instructions themselves
queued in the token queueTokens of woken instructions
Dispatch
Read
sent to issue queue
Fetch Register
Scorebrd
Toke
n Q
ueue
depTable