scoore: santa cruz out-of-order risc engine fpga design issues · 2021. 4. 27. · scoore: fpga...

15
Francisco J. Mesa-Martinez, Abhishek Sharma, Andrew W. Hill, Carlos A. Cabrera, Cyrus Bazeghi, Hari Kolakaleti, Joseph Nayfach, Keertika Singh, Kevin S. Halle, Matthew D. Fischler, Melisa Nuñez, Sangeetha Nair, Suraj Narender Kurapati, Wael Ali Ashmawi, and Jose Renau SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues

Upload: others

Post on 11-Aug-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

Francisco J. Mesa-Martinez, Abhishek Sharma, Andrew W. Hill, Carlos A. Cabrera, Cyrus Bazeghi, Hari Kolakaleti, Joseph Nayfach, Keertika Singh, Kevin S. Halle, Matthew D. Fischler, Melisa Nuñez, Sangeetha Nair, Suraj Narender Kurapati,

Wael Ali Ashmawi, and Jose Renau

SCOORE: Santa Cruz Out-of-Order RISC Engine

FPGA Design Issues

Page 2: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Problem

• Simulation speed• Most paper evaluations are limited to short execution times• This gets worse with multiprocessor and multicore simulations

• Disconnect between theory and implementation • System simulators are an abstraction• Novel architectural ideas may have hidden implementation

complexity

2

Page 3: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Are FPGAs a Way Out ?

• FPGAs are getting faster, larger, cheaper...

• Can we use these new FPGAs to: • Speed up simulation time?• Bridge the divide between theory and implementation?

3

0

80

160

240

320

400

2000

2002

2004

2006

2008

*

Number of Simple In-Order Cores

per FPGA

(source: Altera Corp.)

Page 4: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Not so Fast!

• Existing Out-of-Order designs do not map well to FPGAs• Most are designed with a focus on ASIC implementation

• Further, these designs require multiple FPGAs!!!

4

0

4

8

12

16

20

Architectural Simulation vs. PPC & AXP Out-of-Order Cores in FPGA

18.016.0

2.0

SimulationSpeed (MHz)

Architectural SimulatorIVM/AXP (Single FPGA)Puma/PPC (Single FPGA)

Page 5: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Why Is Out-of-Order So Bad On FPGAs?

• FPGA and ASIC design techniques may be tangential to each other!

• Out-of-Order aggravates these issues• Strike 1: Complex structures • Strike 2: Multiported & fully associative structures• Strike 3: Wiring & sizing overhead

5

Aggressive ASIC perf. target

Aggressive FPGA perf. target

ASIC FPGA

FPGA ASIC

Page 6: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

FPGAs Can Still Be Part Of The Solution

• Problem: Simulation speed• Solution:

• Keep the simulator for quick evaluation• Take advantage of an FPGA-aware flow to speed up simulation• Use an FPGA-friendly OoO core (SCOORE)

• Problem: Disconnect between theory and implementation • Solution: • Couple FPGA flow with ASIC targeting to increase accuracy

6

Page 7: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Proposal: Hybrid ASIC/FPGA Flow

• Early ideas and behavior explored

• Pros:• Fast development

• Cons:• Slow execution!• Reduced accuracy

7

Simulation FPGAASIC System

Start: Refinement: End:

• Theory / Practice “Link”• Targets are set• Speed/Area/Power• Pros:• Realistic Feedback• Extensive instruction

count/space executed • Cons:• More “effort”• We avoid full custom

• Better Evaluation• New ideas that can

be implemented• Real hardware• Advantages• Performance/Power/

Area sans surprises• Robust system

verification

Page 8: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

SCOORE: An FPGA-friendly Design

• Design parameters:• Design follows hybrid FPGA/ASIC design principles• Based on SPARC • 12-stage pipeline• 4-way superscalar • Clustered architecture• Out-of-Order (up to 192 in-flight instructions)• Memory speculation, load-hit speculation• Performance targets: • 125MHz FPGA • 900MHz ASIC @ 90nm

• We have used SCOORE as the development platform to define a set of guidelines for FPGA-friendly OoO designs

8

Page 9: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Lesson I: Don’t Cascade SRAM Structures

• Problem:

• Cascading has small impact in ASICs, however FPGA implementations suffer greatly • Over 50% performance hit!!!

• Cascading between Branch Target Buffer (BTB) & Return Address Stack (RAS) predictor in IF• Critical path which limits speed of IF stage

• Solution:• Avoid cascading SRAM structures in a single cycle• FPGA implementation reduces size and clock cycle significantly• FPGA and ASIC both benefit from shortened critical path

9

Page 10: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Lesson II: Avoid Heavily Ported Structures

• Problem:

• Heavily multiported structures are not “FPGA-friendly”• Limits on the number of Write ports in FPGA SRAM banks• The instruction window is a large heavily ported structure

• CAM structures are bad for both ASIC and FPGA• ASIC: Large CAM structures are larger and power hungry• FPGA: CAM’s are implemented using logic cells!!!

• Solution:

• Break monolithic structure into multiple banks• Ex: SEED (Scalable Efficient Enforcement of Dependencies)1

• Shows dramatic speedup in FPGA• Reduced area/power requirements in ASIC due to reduced port count• Allows the implementation of very large instruction windows that were

not practical before

10

1 “SEED: Scalable Efficient Enforcement of Dependencies,” F. Mesa-Martinez, M. Huang, and J. Renau, to Appear in PACT06

Page 11: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

• FPGA Frequency:

• ASIC Frequency:

• ASIC Area:

• ASIC Dynamic Power:

SEED: Results

11

0.0

20.0

40.0

60.0

80.0

100.0

120.0

16 32 64128

256

Fre

qu

en

cy (

MH

z) BASE

SEED

Instruction WindowEntries

0

100

200

300

400

500

600

16 32 64128

256

Fre

qu

en

cy (

MH

z)

BASESEED

Instruction WindowEntries

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

16 32 64128

256

Are

a (

mm

^2)

BASESEED

Instruction WindowEntries

0

50

100

150

200

250

300

350

400

450

16 32 64128

256

Dynam

ic P

ow

er

(mW

)BASESEED

Instruction WindowEntries

Page 12: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Lessons Learned: Design Guidelines

12

• Wiring and Sizing:• FPGA and ASICs require different resource sizing• FPGA: Fixed SRAM/MAC/etc structures• ASIC: More sensitive to SRAM sizing

• Wire delay is major limiter in ASIC and even bigger in FPGA

• SRAM Structures:• SRAM-like structures in processor should be mapped into the

built-in SRAM structures in an FPGA• Avoid multiple Write ports: Most FPGAs have a single W port• Avoid fully associative structures: FPGAs map such structure into logic• Avoid single-cycle cascade of SRAM structures

Page 13: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Acknowledgements

•Puma: Jayakumaran Sivagnaname (University of Michigan)

• IVM: Nicholas Wang (University of Illinois)

13

Page 14: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

Contact Information

Web: http://masc.soe.ucsc.edu/

Your’s truly: Francisco J. Mesa-Martineze-mail: [email protected]

Advisor: Jose Renau e-mail: [email protected]

14

Page 15: SCOORE: Santa Cruz Out-of-Order RISC Engine FPGA Design Issues · 2021. 4. 27. · SCOORE: FPGA Design Issues Francisco Mesa-Martinez Not so Fast! •Existing Out-of-Order designs

SCOORE: FPGA Design Issues Francisco Mesa-Martinez

• Integration of SEED into an out-of-order pipeline:

SEED: Theory of Operation

15

Outstanding loads

PredictorL1 Load

IF ID WakeUp RF EXIssue

ScoreboardDispatch

n dependents

RenameDecode

BufferIssue

Ready instructions sent to issue queue directly

Issue

dependentsused to wake upDequeued tokens

Redispatch

Woken instructions themselves

queued in the token queueTokens of woken instructions

Dispatch

Read

sent to issue queue

Fetch Register

Scorebrd

Toke

n Q

ueue

depTable