Etienne Le Sueur and Gernot Heiser

Dynamic Voltage and Frequency Scaling: The Laws of Diminishing Returns

HotPower'10, Vancouver, Canada, 2010

© NICTA 2010 www.ertos.nicta.com.au/research/power/ /35

What is DVFS?

Dynamic Voltage and Frequency Scaling

$P = C f V^{2}$

• $P$: dynamic power consumption
• $C$: constant
• $f$: frequency
• $V^{2}$: voltage squared
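As a rough illustration of the relation P = C f V^2, the sketch below compares dynamic power at two operating points. The constant `C_EFF` is a made-up value (it cancels in the ratio), and pairing the lowest frequency with the lowest voltage of the Sledgehammer ranges quoted later in this talk is an assumption made only for the example.

```python
def dynamic_power(c, freq_hz, volts):
    """Dynamic power P = C * f * V^2."""
    return c * freq_hz * volts ** 2

# Hypothetical constant; any positive value gives the same ratio.
C_EFF = 1e-9

# Assumed operating points: (2.0 GHz, 1.5 V) and (0.8 GHz, 0.9 V).
p_high = dynamic_power(C_EFF, 2.0e9, 1.5)
p_low = dynamic_power(C_EFF, 0.8e9, 0.9)

# Fraction of the high-frequency dynamic power still drawn at the low point.
ratio = p_low / p_high
print(f"dynamic power ratio low/high: {ratio:.3f}")
```

This covers dynamic power only; it says nothing yet about how long the work takes or about any power that does not scale with frequency and voltage.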

In the past...

“Under some conditions, we observe energy savings of 30% for a 4% performance loss.”
Snowdon et al. [2009]

“... in most of the traces the potential for energy savings is good. The savings range from about 5% to about 75%, with most data points falling between 25% to 65% savings.”
Weiser et al. [1994]

“Energy savings of 22% ... to complete the same task are possible without a substantial reduction in application performance...”
Weissel et al. [2002]

“... Hence, on this system, by lowering the frequency to a point where the workloads can be adequately served without sacrificing latency, energy is saved.”
Miyoshi et al. [2002]

The whole story

$P = C f V^{2} + P_{static}$

• $P$: total power consumption
• $C f V^{2}$: dynamic power consumption
• $P_{static}$: static power consumption
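One way to see why the static term matters is to turn power into energy for a fixed piece of work. The short derivation below assumes a purely CPU-bound task of $N$ cycles, so that its runtime is $t = N/f$; both $N$ and this runtime model are assumptions for illustration, not something stated on the slides.

```latex
\begin{align*}
E &= P \cdot t = \left(C f V^{2} + P_{static}\right)\frac{N}{f} \\
  &= C V^{2} N + \frac{P_{static}\, N}{f}
\end{align*}
```

Under these assumptions the dynamic part of the energy depends only on the voltage, while the static part grows as the frequency drops, because the task takes longer to finish.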

Static power consumption

• CPU leakage
• Hard drives
• Memory
• Power supply losses
• etc.

How can DVFS save energy?

400c30: 48 8b 11       mov (%rcx),%rdx
400c33: 48 8b 42 48    mov 0x48(%rdx),%rax
400c37: 48 89 41 10    mov %rax,0x10(%rcx)
400c3b: 48 89 4a 48    mov %rcx,0x48(%rdx)
400c3f: 48 8b 51 08    mov 0x8(%rcx),%rdx
400c43: 48 8b 42 50    mov 0x50(%rdx),%rax
400c47: 48 89 41 18    mov %rax,0x18(%rcx)
400c4b: 48 89 4a 50    mov %rcx,0x50(%rdx)
400c4f: 48 83 c1 40    add $0x40,%rcx

[SPEC CPU2000, 181.mcf, gcc 4.2, high optimisation]

... such workloads can be memory-bound

[Figure: Normalised CPU cycles for 164.gzip and 181.mcf. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised TSC, 0.6 to 1.1; series: 164.gzip (cpu-bound), 181.mcf (memory-bound).]

[Figure: Runtime of 164.gzip (cpu-bound) vs. 181.mcf (memory-bound). x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised runtime, 1.0 to 3.5; series: Runtime 181.mcf (memory-bound), Runtime 164.gzip (cpu-bound).]
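The difference between the two benchmarks can be sketched with a simple first-order model in which runtime is the sum of a part that scales with CPU frequency and a memory-stall part that does not. This model, the function names and all the numbers below are assumptions for illustration; they are not the measurements behind the plots.

```python
def runtime(cpu_cycles, mem_seconds, freq_hz):
    """Assumed model: CPU work scales with 1/f, memory stall time is fixed."""
    return cpu_cycles / freq_hz + mem_seconds

def slowdown(cpu_cycles, mem_seconds, f_low, f_high):
    """Runtime at f_low relative to runtime at f_high."""
    return (runtime(cpu_cycles, mem_seconds, f_low)
            / runtime(cpu_cycles, mem_seconds, f_high))

F_LOW, F_HIGH = 0.8e9, 2.7e9  # frequency endpoints from the plots

# Hypothetical CPU-bound task: no memory stall time at all.
cpu_bound = slowdown(2.7e9, 0.0, F_LOW, F_HIGH)

# Hypothetical memory-bound task: half of its runtime at F_HIGH is stalls.
mem_bound = slowdown(1.35e9, 0.5, F_LOW, F_HIGH)

print(f"cpu-bound slowdown:    {cpu_bound:.3f}")
print(f"memory-bound slowdown: {mem_bound:.3f}")
```

In this toy model the memory-bound task slows down less when the frequency is lowered, which is the property that gives DVFS its opportunity to save energy.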

Our analysis

3 generations of AMD Opteron CPUs in server-class systems

Die codename       Sledgehammer
Year               2003
Core count         1
Frequency range    0.8 - 2.0 GHz
Voltage range      0.9 - 1.5 V
Process            130 nm
TDP                89 W
Die area           193 mm²
Transistor count   106 M

Sledgehammer

[Figure: Energy and runtime of 181.mcf on Sledgehammer. x-axis: Frequency (GHz), 0.8 to 2.0; y-axis: Normalised energy/runtime, 0.75 to 1.50; series: Energy, Runtime.]

Our analysis

Die codename       Sledgehammer     Santa Rosa
Year               2003             2006
Core count         1                2
Frequency range    0.8 - 2.0 GHz    1.0 - 2.4 GHz
Voltage range      0.9 - 1.5 V      0.9 - 1.35 V
Process            130 nm           90 nm
TDP                89 W             95 W
Die area           193 mm²          230 mm²
Transistor count   106 M            243 M

Santa Rosa

[Figure: Energy and runtime of 181.mcf on Santa Rosa. x-axis: Frequency (GHz), 1.0 to 2.4; y-axis: Normalised energy/runtime, 0.50 to 1.75; series: Energy 1 instance, Runtime 1 instance, Energy 2 instances, Runtime 2 instances.]

Our analysis

Die codename       Sledgehammer     Santa Rosa       Shanghai
Year               2003             2006             2009
Core count         1                2                4
Frequency range    0.8 - 2.0 GHz    1.0 - 2.4 GHz    0.8 - 2.7 GHz
Voltage range      0.9 - 1.5 V      0.9 - 1.35 V     1.0 - 1.35 V
Process            130 nm           90 nm            45 nm
TDP                89 W             95 W             75 W
Die area           193 mm²          230 mm²          285 mm²
Transistor count   106 M            243 M            463 M

Shanghai

[Figure: Energy and runtime of 181.mcf on Shanghai. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised energy/runtime, 1.00 to 2.75; series: Energy and Runtime for 1, 2 and 4 instances.]

Results

DVFS on Shanghai is ineffective for saving energy!

... but why?

Static power

DVFS can only change dynamic power consumption and static power is increasing!

What are the trends?

• scaling of transistor technology;
• increasing memory performance;
• improved idle/sleep modes; and
• multi-core processors.

Scaling of transistor technology

In ~7 years:

• 130 nm to 45 nm
• 106 M to 463 M transistors
• 193 mm² to 285 mm²
• 1 MiB to 8 MiB total SRAM cache
• Single to quad-core (more later...)

Static power is increasing significantly, dynamic power is decreasing

Increasing memory performance

In ~7 years:

• Much larger CPU SRAM caches (fewer cache-misses)
• DRAM throughput: 3.2 GB/s to 5.33 GB/s
• Larger DRAM prefetch distance: 2 to 3 cache-lines
• Dual to triple channel DDR memory-controllers

Fewer pipeline stalls for memory references means code is less memory-bound

Improved idle/sleep modes

In ~7 years:

• From a single 'halt' mode (ACPI C1)
• To multiple stepped low-power sleep modes (C1 - 4)

Push towards race-and-halt

Multi-core processors

Today:

• Single off-chip voltage regulator module (VRM)
• One or more on-chip PLL clock generator modules

Per-core DVFS adds complexity to hardware and DVFS algorithms

How much energy can we really save?

Up to now, we've ignored the performance loss...

We can:

• use a different benchmarking methodology;
• use a different metric;
• or both.

Padding methodology

• Extend shorter executions;
• Add idle energy

[Figure: Padded vs. non-padded benchmarks. x-axis: Time, with marks t_high and t_low; y-axis: Power consumption, 0 to 1.00; regions: High frequency, Low frequency, Static power, Idle energy.]
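The padding idea can be sketched as follows: every run is extended to the length of the longest run, and the extra time is charged at idle power. The function name, the idle power and the per-run numbers are all hypothetical, and the exact accounting used for the results in this talk may differ.

```python
def padded_energy(run_energy_j, run_time_s, pad_to_s, idle_power_w):
    """Energy of a run after padding it with idle time up to pad_to_s."""
    if run_time_s > pad_to_s:
        raise ValueError("run is longer than the padding target")
    idle_time_s = pad_to_s - run_time_s
    return run_energy_j + idle_power_w * idle_time_s

# Hypothetical measurements: name -> (energy in joules, runtime in seconds).
runs = {"high": (900.0, 10.0), "low": (800.0, 20.0)}
IDLE_POWER_W = 15.0  # hypothetical idle power of the whole system

# Pad every run to the duration of the slowest one (t_low in the figure).
t_max = max(t for _, t in runs.values())
padded = {name: padded_energy(e, t, t_max, IDLE_POWER_W)
          for name, (e, t) in runs.items()}

for name, energy in padded.items():
    print(f"{name}: {energy:.1f} J over {t_max:.1f} s")
```

With padding, the comparison is over the same wall-clock interval, so a fast run is no longer credited for simply finishing early; whether it still wins depends on how low the idle power is.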

[Figure: Padded energy for 181.mcf on Shanghai. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised energy, 0.4 to 1.0; series: Padded energy for 1, 2 and 4 instances.]

What about performance?

Energy-delay product

[Figure: Padded energy-delay product for 181.mcf on Shanghai. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised padded EDP, 0.5 to 2.5; series: EDP for 1, 2 and 4 instances.]
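The energy-delay product weights energy by how long the work took, so a setting that saves a little energy but runs much longer scores worse. A minimal sketch, using made-up sample points rather than the data plotted in this talk:

```python
def energy_delay_product(energy_j, delay_s):
    """EDP = energy * delay; lower is better."""
    return energy_j * delay_s

# Hypothetical samples: (frequency in GHz, energy in joules, runtime in seconds).
samples = [(0.8, 800.0, 20.0), (1.5, 820.0, 13.0), (2.7, 900.0, 10.0)]

edp = {freq: energy_delay_product(e, t) for freq, e, t in samples}

# Frequency with the lowest EDP among the hypothetical samples.
best = min(edp, key=edp.get)

for freq in sorted(edp):
    print(f"{freq} GHz: EDP = {edp[freq]:.0f} J*s")
print(f"lowest EDP at {best} GHz")
```

In these invented numbers the lowest frequency uses the least energy but has the worst EDP, which is the kind of trade-off the metric is meant to expose.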

The future...

• 32 nm process already in production;
• 22 nm and smaller are on the horizon;
• caches will get bigger;
• leakage power will rise;
• memory performance will continue to improve; and
• entry/exit costs to/from sleep modes will improve.

What about DVFS?

A glimpse

[Figure: Energy, runtime and EDP for Westmere (Core i5-540M). x-axis: Frequency (GHz), 1.199 to 3.066; y-axis: Normalised energy, runtime and EDP, 0.75 to 2.25; series: Runtime, Energy, Energy-delay product; annotation: Turbo Boost.]

Concluding remarks

• Transistor scaling is causing higher proportions of static leakage power;
• smaller core voltages mean DVFS has less range to play with;
• improving memory performance means fewer opportunities to reduce CPU frequency without significant loss of performance;
• sleep/idle modes are becoming much more efficient;
• DVFS implementations on multi-core processors are more complex and the cost-benefit is small;
• Optimal energy-efficiency is achieved by running fast.

Future direction

• SIMD workloads
  – SSE/3DNow! workloads require many 64-bit operands, which could increase memory-boundedness
• Periodic workloads
  – MPEG video/audio playback, analyse race-to-halt
  – usefulness of sleep modes
• Analyse and compare the trends for embedded devices
  – Mobile phones and netbooks

Questions?

Intel Atom N270

[Figure: Energy, runtime and EDP for Atom N270. x-axis: Frequency (GHz), 0.8 to 1.6; y-axis: Normalised energy, runtime and EDP, 0.8 to 1.8; series: Runtime, Energy, EDP.]

Intel Core 2 Duo (Ultra-low Voltage)

[Figure: Energy, runtime and EDP for MacBook Air (Core2 Duo, ULV). x-axis: Frequency (GHz), 0.8 to 1.6; y-axis: Normalised energy, runtime and EDP, 0.8 to 2.0; series: Runtime, Energy, EDP.]

From imagination to impact
