Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing

Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg

Center for Embedded Systems Research (CESR), Department of Electrical & Computer Eng’g, North Carolina State University


Page 1: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

NC STATE UNIVERSITY

Center for Embedded Systems Research (CESR)

Department of Electrical & Computer Eng’g

North Carolina State University

Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg

Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing

Page 2: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

El-Haj-Mahmoud © 2005

NC STATE UNIVERSITY

CASES 2005

Embedded Processor Trends

Inheriting desktop high-performance features

Examples
• ARM11: 8-stage pipeline, caches, dynamic branch prediction
• Ubicom IP3023: 8 hardware threads
• PowerPC 750: 2-way superscalar, out-of-order (OOO) execution

Page 3: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Real-Time Systems and Analyzability

Schedulability of task-set determined a priori
• Requires worst-case execution times (WCET) → statically analyzable microarchitecture

Dynamic microarchitecture features complicate real-time design

A trade-off between performance and analyzability

[Figure: tasks A and B]
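To illustrate why WCETs must be known a priori, here is a minimal sketch of one classic schedulability test. The slide does not name a scheduling policy. EDF on a single processor with deadlines equal to periods is an assumption made for illustration, as are the task values and function names.

```python
from fractions import Fraction

def utilization(tasks):
    """Sum of WCET/period over (wcet, period) pairs, as an exact fraction."""
    return sum(Fraction(wcet, period) for wcet, period in tasks)

def edf_schedulable(tasks):
    """EDF utilization test for independent periodic tasks on one processor
    with deadlines equal to periods: schedulable iff utilization <= 1."""
    return utilization(tasks) <= 1

# Hypothetical task-set: (WCET, period) in cycles for tasks A and B.
tasks_ab = [(30, 100), (90, 200)]
print(utilization(tasks_ab), edf_schedulable(tasks_ab))
```

The test only means something if each WCET is a safe upper bound, which is what a statically analyzable microarchitecture is meant to provide.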

Page 4: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Multiple Simple Processors

+ Analyzable
+ Natural fit with real-time systems
− Rigid resource partitioning
− Higher cost/performance metric

[Figure: tasks A, B, and C divided between two simple processors, proc 1 and proc 2]
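The "rigid resource partitioning" drawback can be sketched with a simple first-fit assignment of tasks to processors. This is an illustration only: the first-fit heuristic, the per-processor utilization limit of 1.0, and the task utilizations are assumptions, not something the slides specify.

```python
def first_fit(utilizations, num_procs):
    """Assign each task to the first processor whose load stays <= 1.0.
    Returns per-processor lists of task indices, or None if a task fits nowhere."""
    load = [0.0] * num_procs
    assignment = [[] for _ in range(num_procs)]
    for i, u in enumerate(utilizations):
        for p in range(num_procs):
            if load[p] + u <= 1.0 + 1e-9:  # small tolerance for float rounding
                load[p] += u
                assignment[p].append(i)
                break
        else:
            return None  # no single processor has room for this task
    return assignment

# Hypothetical utilizations for tasks A, B, C on two processors.
print(first_fit([0.6, 0.5, 0.4], 2))
print(first_fit([0.6, 0.6, 0.6], 2))
```

In the second call the total demand is 1.8 on a capacity of 2.0, yet the assignment fails because spare capacity on one processor cannot be lent to a task on the other.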

Page 5: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Simultaneous Multithreading (SMT)

+ Flexible resource sharing
+ Better cost/performance metric
− Unanalyzable

[Figure: tasks A, B, and C sharing a single SMT processor]

Page 6: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Unanalyzability of SMT

− Violates single-task WCET assumption (tasks analyzed separately)
− Arbitrary periods → arbitrary overlap of tasks
− Dynamic interference

Cannot derive WCETs → cannot perform schedulability analysis

Page 7: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Real-Time Virtual Multiprocessor (RVMP)

Combine MP analyzability and SMT flexibility

Key idea: Interference-free multithreading
• SMT performance
• WCET of each task independent of task-set

RVMP substrate: two parts
• Highly reconfigurable multithreaded superscalar
  − Space: multiple arbitrary interference-free partitions
  − Time: rapidly reconfigure partitions
• Static schedule orchestrates partitioning
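One way to picture a static schedule that partitions the processor in space and time is a list of time slices, each mapping tasks to a number of ways. The sketch below is a hypothetical data layout, not the hardware's actual schedule format. The 4-way width comes from the slides; the slice lengths, task names, and function names are assumptions.

```python
TOTAL_WAYS = 4  # 4-way superscalar, per the slides

def validate_round(slices, round_cycles):
    """Check a static schedule for one round.
    slices: list of (cycles, {task_name: ways}) entries, executed in order."""
    if sum(cycles for cycles, _ in slices) != round_cycles:
        return False  # slices must tile the round exactly
    for cycles, partition in slices:
        if cycles <= 0 or any(ways < 1 for ways in partition.values()):
            return False
        if sum(partition.values()) > TOTAL_WAYS:
            return False  # partitions would overlap in space
    return True

def grants_per_task(slices):
    """Collect the (ways, cycles) grants each task receives in the round."""
    grants = {}
    for cycles, partition in slices:
        for task, ways in partition.items():
            grants.setdefault(task, []).append((ways, cycles))
    return grants

# Hypothetical 100-cycle round: two partitions for 60 cycles, then two others.
round_schedule = [
    (60, {"A": 2, "B": 2}),
    (40, {"C": 1, "D": 3}),
]
print(validate_round(round_schedule, 100), grants_per_task(round_schedule))
```

Because every grant is fixed ahead of time and no two partitions share ways, each task's timing depends only on its own grants, which is the property the slide calls interference-free.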

Page 8: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Big Picture

Co-design processor and real-time scheduling for analyzable high-performance

Page 9: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

RVMP Architecture

Superscalar “ways” are the natural partitioning granularity

Different-sized virtual processors carved out of a single superscalar

Page 10: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Processor Architecture

Starting point
• Alpha 21164: 4-way in-order superscalar
• Ubicom IP3023: 8 hardware threads (4 in RVMP → 4 virtual processors, VPs)

Simplifications for analyzability
• In-order issue within VPs
• Software-managed scratchpads
• Static branch prediction

Not limitations of RVMP!

Page 11: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Processor Architecture

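[Figure: processor block diagram. Labeled blocks: Fetch Unit with PC, Interleaved Instruction Scratchpad, Fetch Buffer, Decode, Slotter and Scoreboard (Issue Logic), Int RF and FP RF, each with Shadow Buffers, Data Scratchpad (RD and RD/WR ports), HRT, and five function units: FU0: INT, FU1: INT/MUL/DIV, FU2: INT/AGEN, FU3: INT/AGEN, FU4: FPU. Width labels: 4, 4, 1.]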

Page 12: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Instruction Fetch

[Figure: instruction fetch path. Labeled blocks: Instruction Scratchpad, Fetch Unit, Fetch Buffer, Decode, Issue, Backend, HRT, two sets of Shadow Buffers, and FV and PV fields.]

Page 13: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Real-Time Scheduling

Too complicated
• Must schedule entire hyper-period
• Overwhelming # of possible space/time schedules
• High dedicated-storage cost for schedule

[Figure: tasks A, B, C, and D]
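To give a feel for why scheduling the entire hyper-period becomes expensive, the sketch below computes the hyper-period as the least common multiple of the task periods and counts the jobs it contains. The periods are hypothetical values chosen for illustration.

```python
from functools import reduce
from math import gcd

def hyperperiod(periods):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)

def jobs_in_hyperperiod(periods):
    """Number of jobs each task releases within one hyper-period."""
    h = hyperperiod(periods)
    return [h // p for p in periods]

# Hypothetical periods (in cycles) for tasks A, B, C, D.
periods = [100, 150, 240, 700]
print(hyperperiod(periods), jobs_in_hyperperiod(periods))
```

Even with four tasks, a static schedule over the hyper-period must place every one of these jobs in both space and time, and the resulting table has to be stored somewhere.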

Page 14: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Task A

[Figure: Task A within one period. Vertical axis: # of ways (1 to 4). Execution shown on 1 way, 2 ways, 3 ways, and 4 ways, labeled WCET1 to WCET4. Also labeled: “… Round”.]

Page 15: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Task B

[Figure: Task B within one period. Vertical axis: # of ways (1 to 4). Execution shown on 1 way, 2 ways, 3 ways, and 4 ways.]

Page 16: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Task C

[Figure: Task C within one period. Vertical axis: # of ways (1 to 4). Execution shown on 1 way, 2 ways, 3 ways, and 4 ways.]

Page 17: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Task D

[Figure: Task D within one period. Vertical axis: # of ways (1 to 4). Execution shown on 1 way, 2 ways, 3 ways, and 4 ways.]

Page 18: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Cyclic schedule

[Figure: cyclic schedule, with scheduling attempts labeled FAILED and SUCCEEDED]
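One possible reading of the per-task figures and the cyclic schedule is that each task has one WCET per number of ways, and that a fixed-length round is repeated with every task receiving the same grant in each round. The sketch below follows that reading and should not be taken as the authors' algorithm. It assumes every period is a whole number of rounds, and all task values and function names are hypothetical. It applies only two necessary checks and does not attempt the actual space/time packing.

```python
from itertools import product

TOTAL_WAYS = 4  # width of the superscalar, per the slides

def round_budget(wcet, period, round_cycles):
    """Cycles needed in every round so that `wcet` cycles accumulate within
    one period. Assumes the period is a whole number of rounds."""
    return -(-wcet * round_cycles // period)  # integer ceiling

def feasible_way_choices(tasks, round_cycles):
    """tasks: {name: (period, {ways: wcet_on_that_many_ways})}.
    Returns the way assignments that pass two necessary checks: each
    per-round budget fits in the round, and the total area (ways x cycles)
    fits in TOTAL_WAYS x round_cycles."""
    names = list(tasks)
    feasible = []
    for choice in product(*(sorted(tasks[n][1]) for n in names)):
        budgets = [round_budget(tasks[n][1][w], tasks[n][0], round_cycles)
                   for n, w in zip(names, choice)]
        area = sum(w * b for w, b in zip(choice, budgets))
        fits_in_time = all(b <= round_cycles for b in budgets)
        if fits_in_time and area <= TOTAL_WAYS * round_cycles:
            feasible.append(dict(zip(names, choice)))
    return feasible

# Hypothetical tasks: A cannot meet its period on 1 way, but can on 2.
example_tasks = {
    "A": (1000, {1: 1200, 2: 500}),
    "B": (500, {1: 200, 2: 150}),
}
print(feasible_way_choices(example_tasks, 100))
```

Under these assumptions the schedule only needs to describe one round, which avoids storing a table for the whole hyper-period.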

Page 19: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Interaction: Scheduling and Architecture

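[Figure: interaction between the schedule and the hardware. Labels recoverable from the figure: HRT, LTC, function units FU0 to FU4 (shown twice), a 100-cycle interval split into 60 and 40 cycles, FV, PV, and CVs fields with values 0 and 1, EOT, and two INVALID entries.]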

Page 20: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Experiments

Tasks from C-lab and MiBench benchmarks

100 task-sets
• 4 tasks per task-set (also 8 tasks in paper)
• Grouped according to scalar utilization (U_scalar)

Two experiments
• Worst-case schedulability analysis
• Run-time experiments
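The grouping by U_scalar can be sketched as follows. The slides do not define U_scalar. The sketch assumes it is the usual utilization sum, computed with each task's WCET on a scalar (single-way) processor. The bin boundaries match the labels on the result charts, while the task values and function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def u_scalar(tasks):
    """Assumed definition: sum of scalar WCET / period over (wcet, period) pairs."""
    return sum(Fraction(wcet, period) for wcet, period in tasks)

def bin_label(u):
    """Map a utilization in (0, 4] to its task-set bin label."""
    if not 0 < u <= 4:
        raise ValueError("U_scalar outside the range covered by the bins")
    upper = ceil(u)
    return f"{upper - 1} < U_scalar <= {upper}"

# Hypothetical 4-task set: (scalar WCET, period) in cycles.
task_set = [(50, 100), (90, 100), (30, 100), (20, 100)]
print(u_scalar(task_set), bin_label(u_scalar(task_set)))
```

A task-set with U_scalar above 1 cannot be scheduled on a single scalar processor under this definition, so the higher bins are where multiple ways or processors become necessary.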

Page 21: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Worst-Case Schedulability Tests

[Figure: stacked bar chart of Success vs. Failure. Vertical axis: success rate (%), 0% to 100%. Horizontal axis: task-set bins 0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4. Bars in each bin: Scalar, RVMP, 4x1, 2x2, 1x4. Data labels as extracted (assignment to bars not recoverable): 25 25 25 25; 16 14 15; 5; 16; 9 7; 2; 7; 1 1; 25; 9 11 10; 20; 25; 9; 16 18; 23 25; 18; 24 24 25 25.]

Page 22: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

RVMP Configurations

[Figure: bar chart. Vertical axis: # of task-sets (0 to 7). Horizontal axis: RVMP configurations 1-1-1-1, 2-2, 1-3/2-2, 1-3/2-1-1, 4/1-1-2, 4/4/2-2, 4/4/1-3, and 4/4/4/4, repeated for task-set bins 1 < U_scalar <= 2, 2 < U_scalar <= 3, and 3 < U_scalar <= 4. Bar heights are not recoverable from the transcript.]

Page 23: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Run-Time Experiments

[Figure: stacked bar chart of Success vs. Failure. Vertical axis: success rate (%), up to 100%. Horizontal axis: task-set bins 0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4. Bars in each bin: RVMP, 4x1, 2x2, 1x4, SMT-EDF, SMT-ICNT. Data labels as extracted (assignment to bars not recoverable): 25 25 25 25 25; 17 15 15; 5; 18 17 16; 9 7; 3; 16; 7; 1; 9 6; 8 10 10; 20; 7 8 9; 16 18; 22; 9 11; 18; 24 24 25; 16 19; 14; 25.]

Page 24: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Unsafe Behavior of SMT

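[Figure: pie chart for task-set bin 1 < U_scalar <= 2. Legend: Only SMT-EDF success, Only SMT-ICNT success, Both success, Both failure. Labels: Success 60%, Failure 40%. Slice values as extracted (assignment to legend entries not recoverable): 16, 12, 6.]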

Page 25: Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g

Summary

Novel contributions
• Virtualize a single processor
  − Space: variable-size interference-free partitions
  − Time: rapid reconfiguration
• Simple real-time scheduling approach

Analyzability of MP with flexibility of SMT

Co-design processor and real-time scheduling for analyzable high-performance