computational sprinting on a real system: preliminary results arun raghavan *, marios papaefthymiou...

21
Computational Sprinting on a Real System: Preliminary Results Arun Raghavan * , Marios Papaefthymiou + , Kevin P. Pipe +# , Thomas F. Wenisch + , Milo M. K. Martin * University of Pennsylvania, Computer and Information Science * University of Michigan, Electrical Eng. and Computer Science + University of Michigan, Mechanical

Upload: violet-fields

Post on 04-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

Computational Sprinting on a Real System: Preliminary Results

Arun Raghavan*, Marios Papaefthymiou+, Kevin P. Pipe+#,

Thomas F. Wenisch+, Milo M. K. Martin*

University of Pennsylvania, Computer and Information

Science*

University of Michigan, Electrical Eng. and Computer

Science+

University of Michigan, Mechanical Engineering#

Page 2: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

2

Overview

Computational Sprinting

• What is Computational Sprinting?• Summary of feasibility study [HPCA’12]• Simulation results:

• 10x responsiveness improvements in short bursts• Same dynamic energy consumption

• Preliminary results with sprinting on prototype-proxy• Characterize real thermal/energy/performance behavior• Sprinting can improve energy efficiency due to race to halt

Page 3: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

3

Computational Sprinting and Dark Silicon

• Problem: Dark Silicon/Utilization Wall • Cannot use all transistors on chip all the time• Cooling constraints limit mobile systems

• One approach: Use few transistors for long durations• Specialized functional units [Accelerators, GreenDroid]• Targeted towards sustained compute, e.g. media playback

• Our approach: Use many transistors for short durations• Computational Sprinting by activating many “dark cores” • Unsustainable power for short, intense bursts of computation• Responsiveness for bursty/interactive applications

Computational Sprinting

Page 4: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

4 Computational Sprinting

0123456789

10

Sprinting v/s Sustainable Operation

0

20

40

60

80

100

120

po

wer

time

Tmaxte

mp

erat

ure

time

Page 5: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

Why Not Use All Cores?

Tmaxte

mp

erat

ure

po

wer

time

time

Page 6: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

6 Computational Sprinting

Why Not Use All Cores?

tem

per

atu

reTmax

po

wer

N cores

time

time

Effect of thermal capacitance

Page 7: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

7 Computational Sprinting

Parallel Computational Sprinting

Tmaxte

mp

erat

ure

po

wer

time

time

Page 8: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

8 Computational Sprinting

Why Not Use All Cores?

tem

per

atu

reTmax

po

wer

N cores

time

time

Page 9: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

9 Computational Sprinting

Parallel Computational Sprinting

tem

per

atu

reTmax

po

wer

N cores

1 core

time

time

Page 10: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

10 Computational Sprinting

Parallel Computational Sprinting

tem

per

atu

reTmax

po

wer

N cores

1 core

time

time

Page 11: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

11 Computational Sprinting

Parallel Computational Sprinting

po

wer

tem

per

atu

reTmax

N cores

1 core

time

time

State of the art:Turbo Boost 2.0

exceeds sustainable power

with DVFS (~25% for 25s)

Our goal: 10x, ~1s

Page 12: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

12

Sprinting Challenges and Opportunities

• Thermal challenges• How to extend sprint duration and intensity? Latent heat from phase change material close to the die

• Electrical challenges• How to supply peak currents? Ultracapacitor/battery hybrid• How to ensure power stability? Ramped activation (~100μs)

• Architectural challenges• How to control sprints? Thermal resource management• How do applications benefit from sprinting? 10x responsiveness improvements; small energy overhead

Computational Sprinting

DiePCM

Page 13: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

13

Moving Beyond Simulation: Can we make a real system sprint?

Page 14: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

14

Characterize real system energy/performance

How might sprinting behave on real system?

Page 15: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

15 Computational Sprinting

Multiple Cores: Energy and Performance

1 core 2 cores 4 cores0

0.4

0.8

1.2

featuredisparitysegment

norm

aliz

ed e

nerg

y

0

1

2

3

4

featuredisparitysegment

1 core 2 cores 4 coresnorm

aliz

ed s

peed

up

0

5

10

15

20

featuredisparitysegment

1 core 2 cores 4 cores

pow

er

Power increases with core count

But < 2x for 4 coresWhy? background power

Energy consumption improves due to early completionRace to halt

4x

2x

Page 16: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

Computational Sprinting1.6 2 2.4 2.8 3.20.8

0.9

1

1.1

1.2

featuredisparitysegment

frequency

DVFS: Energy and Performance

16

1.6 2 2.4 2.8 3.20

0.51

1.52

2.5

featuredisparitysegment

norm

aliz

ed s

peed

up

1.6 2 2.4 2.8 3.20

5

10

15

20

featuredisparitysegment

pow

er

Power increases by ~3x for 2x speedup

frequency

1.6 2 2.4 2.8 3.20

0.4

0.8

1.2

norm

aliz

ed e

nerg

y

frequency

Frequency Voltage

1.6 GHz 0.95V

3.2 GHz 1.25V

Min frequency is not always min energy

2x

3x

Page 17: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

Computational Sprinting

Energy/Performance/Power Tradeoffs

17

0 0.5 1 1.5012345678

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5012345678

norm

aliz

ed s

peed

up

normalized energy

norm

aliz

ed s

peed

up

power

4 cores, 1.6 GHz

4 cores, 3.2 GHz

1 core, 1.6 GHz

1 core, 3.2 GHz

1 core, 1.6 GHz

1 core, 3.2 GHz

4 cores, 1.6 GHz

4 cores, 3.2 GHz

What if this is the maximum sustainable power?

Sprinting enables higher responsiveness and greater

energy efficiency

Max speedup

Min energy

Min power

Page 18: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

18 Computational Sprinting

Emulating Sprinting with Limited Energy Budget

• Emulate effect of being TDP constrained • Not currently physically limiting cooling constraints• Estimate thermal capacity for sprinting based on energy

• Sprint operation:• Execute with all cores at maximum frequency• Monitor energy consumption via MSR• Terminate sprinting when energy capacity is exceeded

• Migrate threads to single core• Shutdown additional cores • Lower frequency to minimum

Page 19: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

Computational Sprinting

0 10 20 30 40 50 60 70 80 90 1000

2

4

6

8

Effect of Thermal Capacity

190 10 20 30 40 50 60 70 80 90 100

0

0.5

1

1.5

norm

aliz

ed e

nerg

yno

rmal

ized

spe

edup

Thermal Capacity (J)

Thermal Capacity (J)

Speedup sensitive to amount of computation

within sprint

Longer sprints enable greater energy saving

Energy overhead from extremely short sprints

Caveat: Mobile system background power

characteristics likely differ

Page 20: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

20 Computational Sprinting

Conclusions and Work in Progress

• Sprinting utilizes dark silicon for energy improvement• Race to halt • Lower system idle power more energy saved

• Work in Progress: constrain cooling system• Approximate thermal characteristics of mobile systems• Correlate temperature with measured energy• Use presented results to guide sprint implementation

Page 21: Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K

21 Computational Sprinting