Reducing Peak Power with a Table-Driven Adaptive Processor Core

Vasileios Kontorinis (UCSD), Amirali Shayan (UCSD), Rakesh Kumar (UIUC), Dean Tullsen (UCSD)


Page 1: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Reducing Peak Power with a Table-Driven Adaptive Processor Core

Vasileios Kontorinis (UCSD), Amirali Shayan (UCSD), Rakesh Kumar (UIUC), Dean Tullsen (UCSD)

Page 2: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

The Power Problem

Micro'09: Kontorinis, Shayan, Kumar, Tullsen 2

Power-related issues:
- Wall power costs
- Processor design constraints
- Power delivery network
- Thermals
- Packaging
- Reliability

Page 3: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

The Power Problem (same list of power-related issues; the slide adds the label "Average Power")

Page 4: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

The Power Problem (same list of power-related issues; the slide adds the label "Peak Power")

Page 5: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Theoretical Peak vs Execution Peak

[Figure, built up over slides 5-8: power plotted against time, annotated with the Average, Execution Peak, and Theoretical Peak power levels.]

Page 9: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Our Approach

Motivation:
- Most applications have few resource bottlenecks.
- Ample opportunity to disable core components without hurting performance.

Goal:
- Partially disable core components to limit peak power.

Method:
- Each resource can be maximally configured.
- Not all resources are maximized at the same time (centralized control mechanism).

Page 10: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Motivating Experiment:

We reduce 10 core resources.

Resource          MIN    MAX
INT inst. queue   16     32
FP queue          16     32
INT regs          64     128
FP regs           64     128
INT ALUs          2      4
FP ALUs           1      3
LdSt units        1      2
ROB               128    256
Icache            4K     32K
Dcache            4K     32K

[Figure: speedup over min config (y-axis, 1.00 to 1.80) for Media, Olden, Spec-int, Spec-fp, nas, and Average; the Min config and Max config are marked.]

Page 11: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Motivating Experiment:

- We reduce 10 core resources.
- We selectively maximize resources: 1, 2, and then 3 out of the 10 parameters are set to their maximum (slides 11, 12, and 13).

[Figure, built up over slides 11-14: speedup over min config (y-axis, 1.00 to 1.80) for Media, Olden, Spec-int, Spec-fp, nas, and Average; extracted legend entries: All_param_max, 1_param_max, 2_param_max; the Min config and the 10-params-max config are marked.]

We can aggressively reduce core components and give up little performance.
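The "k out of 10 parameters max" experiment can be sketched as an enumeration: start from the MIN configuration and raise k of the 10 resources to their MAX values. The snippet below is only an illustration of that search space. The dictionary keys are made-up names, the sizes come from the MIN/MAX table, and the slides do not say how the best subset is selected per benchmark, so no selection step is shown.

```python
from itertools import combinations

# MIN and MAX sizes for the 10 resources, copied from the MIN/MAX table.
# The dictionary keys are illustrative names, not identifiers from the slides.
MIN_CFG = {"int_iq": 16, "fp_iq": 16, "int_regs": 64, "fp_regs": 64,
           "int_alus": 2, "fp_alus": 1, "ldst_units": 1, "rob": 128,
           "icache_kb": 4, "dcache_kb": 4}
MAX_CFG = {"int_iq": 32, "fp_iq": 32, "int_regs": 128, "fp_regs": 128,
           "int_alus": 4, "fp_alus": 3, "ldst_units": 2, "rob": 256,
           "icache_kb": 32, "dcache_kb": 32}


def k_param_max_configs(k):
    """Yield every configuration with exactly k resources at MAX, the rest at MIN."""
    for chosen in combinations(MIN_CFG, k):
        cfg = dict(MIN_CFG)
        cfg.update({name: MAX_CFG[name] for name in chosen})
        yield cfg


for k in (1, 2, 3):
    # Number of candidate configurations is "10 choose k".
    print(k, sum(1 for _ in k_param_max_configs(k)))
```

For k = 1, 2, and 3 this yields 10, 45, and 120 candidate configurations respectively.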

Page 15: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Outline

- Introduction
- Architecture
- Results
- Conclusions

Page 17: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Baseline Architecture

Page 18: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Baseline Architecture with Average Power Management

Page 19: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Proposed Architecture with Peak Power Management

Page 20: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Proposed Architecture with Peak Power Management

Figure annotations:
- Holds possible core configurations
- Does bookkeeping and enforces configurations

Page 21: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Two Critical Issues

- Which configurations to make available? (contents of Config ROM)
- How to transition among the available configurations? (Adaptation manager policies)

Page 23: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Finding Appropriate Configurations

Config ROM - 70% of core peak power

Slides 23-26:

Iq   Fq   ialu   falu   ldstu   rob   Iregs   fregs   icache   dcache   Peak power
0    0    1      1      0       0     0       0       2        1        69%
0    0    1      2      0       0     0       0       2        1        71%
…    …    …      …      …       …     …       …       …        …

Slides 27-28:

Iq   Fq   ialu   falu   ldstu   rob   Iregs   fregs   icache   dcache   Peak power
0    0    1      1      0       0     0       0       2        1        69%
0    0    0      1      0       0     0       0       2        1        68%
…    …    …      …      …       …     …       …       …        …

Steps:
1. Consider all possible configurations.
2. Remove configs exceeding the targeted peak power threshold.
3. Remove redundant configs.

Page 29: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Contents of the Config ROM

- Manageable number of configurations
- We find the best configuration faster

Relative power threshold   # of possible configurations   # of non-redundant configurations
70%                        493                            132
75%                        1658                           279
80%                        3418                           360
85%                        4987                           285
100%                       6144                           1
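The three-step construction of the Config ROM (enumerate, drop configurations over the power budget, drop redundant ones) can be sketched as follows. This is a toy model under stated assumptions, not the authors' tool: the per-resource power numbers are invented, only three resources are modelled, and "redundant" is interpreted here as "another in-budget configuration offers at least as much of every resource", which the slides do not define explicitly.

```python
from itertools import product

# HYPOTHETICAL power model: percent of maximum core peak power drawn by each
# resource at each size level (level 0 = smallest). Values are invented so
# that the largest level of every resource sums to 100.
POWER = {
    "ialu":   (20, 30),
    "falu":   (15, 20, 25),
    "dcache": (25, 30, 35, 45),
}


def peak_power(config):
    """Peak power (percent of max) of a tuple of per-resource levels."""
    return sum(POWER[name][level] for name, level in zip(POWER, config))


def dominates(a, b):
    """True if a differs from b and offers at least as much of every resource."""
    return a != b and all(x >= y for x, y in zip(a, b))


def build_config_rom(threshold):
    """Enumerate all configs, keep those within budget, drop dominated ones."""
    candidates = product(*(range(len(levels)) for levels in POWER.values()))
    within_budget = [c for c in candidates if peak_power(c) <= threshold]
    return [c for c in within_budget
            if not any(dominates(other, c) for other in within_budget)]


print(build_config_rom(80))
print(build_config_rom(100))
```

With this toy model, a 100% budget leaves a single entry (the fully maximized configuration), which mirrors the 100% row of the table: every other configuration is dominated by the maximum one.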

Page 30: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Implementation Overhead

- Area: <1.25% increase (~0.5KB for Config ROM)
- Peak power: <1.1% overhead
- Average power: negligible (infrequent epoch-based adaptation)
- Power-gating delays of up to 650 cycles
- Verification cost: higher than a non-adaptive core, lower than a fully adaptive core

Page 31: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Outline

- Introduction
- Architecture
- Results
  - Dynamic Adaptation vs Static Tuning
  - Realistic Adaptive Techniques
  - Voltage Variation and Decoupling Capacitance Benefits
- Conclusions

Page 32: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Dynamic Adaptation vs Static Tuning

[Figure: speedup over BEST_STATIC (y-axis, 0 to 1.8) for Media, Olden, SpecINT, SpecFP, NAS, and average; series: BEST_STATIC, IDEAL_ADAPT, MAX_CONF. Peak power budget: 70% of core peak.]

Best Static Configuration: iqs:32 fqs:32 ialu:2 falu:1 ldst:1 ics:16KB dcs:16KB ipr:64 fpr:64 rob:256

Page 33: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Dynamic Adaptation vs Static Tuning

[Figure: speedup over BEST_STATIC (y-axis, 0 to 1.8) in two panels, one for Media, Olden, SpecINT, SpecFP, NAS, and average, and one for the individual benchmarks parser, g721d, and mg.big; series: BEST_STATIC, IDEAL_ADAPT, MAX_CONF. Peak power budget: 70% of core peak.]

Annotations on the per-benchmark panel: INT ALUs needed; FP REGs needed; Nothing needed.

Best Static Configuration: iqs:32 fqs:32 ialu:2 falu:1 ldst:1 ics:16KB dcs:16KB ipr:64 fpr:64 rob:256

Page 34: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Two Critical Issues

- Which configurations to make available? (contents of Config ROM)
- How to transition among the available configurations? (Adaptation manager policies)

Page 35: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Realistic Adaptive Techniques

When to adapt?
- Interval (INTV): every fixed interval of cycles (2M cycles).
- EventDriven (EVDRIV): capture phase changes by adapting when IPC or cache misses per instruction change by more than 30%.
- AdaptiveInterval (INTVAD): mitigate adaptation costs by extending the interval when no better configuration can be found, and shrinking it otherwise (0.5M to 8M cycles).

Which configuration to choose?
- RANDOM: randomly pick the next configuration.
- SCORE: evaluate configurations based on which provides more of the bottleneck resource; choose the highest score.

How to evaluate a configuration?
- NONE: pick the chosen configuration, do not evaluate.
- SAMPLE: sample different configurations and pick the one with the highest instructions per cycle (IPC).
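The three policy questions can be sketched in code. This is a minimal illustration, assuming details the slides do not give: the interval is doubled or halved (the slides only say "extend" and "shrink"), a configuration's score is simply its amount of the bottleneck resource, and `measure_ipc` is a hypothetical callback standing in for a short sampling run.

```python
MIN_INTERVAL = 500_000     # 0.5M cycles, lower bound from the slide
MAX_INTERVAL = 8_000_000   # 8M cycles, upper bound from the slide
PHASE_CHANGE = 0.30        # EVDRIV threshold: change of more than 30%


def phase_changed(prev, curr, threshold=PHASE_CHANGE):
    """EVDRIV trigger: relative change of a metric (IPC or misses/instr.)."""
    if prev == 0:
        return curr != 0
    return abs(curr - prev) / abs(prev) > threshold


def next_interval(interval, found_better):
    """INTVAD: shrink the interval after a useful adaptation, extend it otherwise.

    Halving and doubling are assumptions; only the bounds come from the slide.
    """
    if found_better:
        return max(MIN_INTERVAL, interval // 2)
    return min(MAX_INTERVAL, interval * 2)


def score(config, bottleneck):
    """SCORE (assumed form): how much of the bottleneck resource a config offers."""
    return config[bottleneck]


def choose_config(rom, bottleneck, measure_ipc, n_samples=3):
    """SCORE + SAMPLE: shortlist the top-scoring configs, keep the best measured IPC."""
    shortlist = sorted(rom, key=lambda c: score(c, bottleneck), reverse=True)
    return max(shortlist[:n_samples], key=measure_ipc)
```

A call such as `choose_config(rom, "ialu", measure_ipc)` would first rank the Config ROM entries by integer ALU count and then sample only the top few, which limits how many poorly performing configurations are ever executed.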


Page 39: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Realistic Adaptive Techniques

A policy is named by combining one choice for each of the three questions (when to adapt, which configuration to choose, how to evaluate it), e.g. INTVAD_SCORE_SAMPLE.

Page 40: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Realistic Adaptive Techniques

[Figure, built up over slides 40-43: speedup over BEST_STATIC (y-axis, 0.8 to 1.2) for Media, Olden, Spec-int, Spec-fp, NAS, and average; series: INTV_RANDOM, INTV_SCORE_NONE, INTV_SCORE_SAMPLE, EVDRIV_SCORE_SAMPLE, INTVAD_SCORE_SAMPLE.]

- Most configs in Config ROM perform poorly.
- SCORE is marginally better than BEST_STATIC.
- SAMPLING is a big win!

Page 44: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Results Across Peak Power Budgets vs Maximized Core

[Figure: speedup over BEST_STATIC (y-axis, 0.8 to 1.3) at peak power constraints of 70%, 75%, and 80%; series: BEST_STATIC, INTVAD_SCORE_SAMPLE, INTVAD_SCORE_SAMPLE_red, IDEAL_ADAPT, MAX_CONFIG.]

- Reducing the configurations in Config ROM further improves performance.
- At 75%: within 5% of maximized core.
- At 80%: within 2.5% of maximized core.

Page 45: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

So what have we gained?

Metrics:
- Power efficiency: AP_ratio = Average Power / Peak Power
- Decoupling capacitance (% of total core area)
- Voltage variation (% of Vdd)

Page 46: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Power Efficiency

[Figure, slides 46-47: average and peak power in W (y-axis, 0 to 30) at peak power constraints of 100%, 85%, 80%, 75%, and 70%.]

- Both average and peak power decrease.
- AP_ratio improves as we constrain the peak power.

Peak power constraint   100%   85%   80%   75%   70%
AP_ratio                56%    61%   63%   64%   67%

Page 48: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Voltage variation and Decoupling Capacitance benefits

Relative power threshold   On-chip decap (% of total core area),   Max. voltage variation (% of VDD),
                           at constant voltage variation           at constant decoupling cap.
70%                        9%                                      4.48%
75%                        9.7%                                    4.80%
80%                        10.5%                                   5.12%
85%                        11.5%                                   5.44%
100%                       15%                                     6.48%


Page 51: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Voltage variation and Decoupling Capacitance benefits

Reduced peak power leads to:
- Less required on-chip decap
- Smaller voltage variation
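To put the decap and voltage-variation table in relative terms, the short calculation below compares each constrained row against the unconstrained 100% row. The numbers are copied from the table; the helper name `relative_reduction` is just for illustration.

```python
# Rows copied from the table: (relative power threshold in %, on-chip decap as
# % of total core area, max voltage variation as % of VDD).
ROWS = [
    (70, 9.0, 4.48),
    (75, 9.7, 4.80),
    (80, 10.5, 5.12),
    (85, 11.5, 5.44),
    (100, 15.0, 6.48),
]


def relative_reduction(value, baseline):
    """Percent reduction of value with respect to baseline."""
    return 100.0 * (baseline - value) / baseline


# The 100% row is the unconstrained baseline.
_, base_decap, base_vvar = ROWS[-1]
for threshold, decap, vvar in ROWS[:-1]:
    print(f"{threshold}%: decap -{relative_reduction(decap, base_decap):.1f}%, "
          f"voltage variation -{relative_reduction(vvar, base_vvar):.1f}%")
```

At the 70% threshold, for example, the required decap area drops from 15% to 9% of core area (a 40% reduction), and the maximum voltage variation drops from 6.48% to 4.48% of VDD (roughly a 31% reduction).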

Page 52: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Conclusions

- Peak power is a first-class design constraint.
  - Impacts the efficiency and cost of power delivery.
  - Affects on-chip decoupling capacitance and voltage variation.
- Table-driven adaptation can be employed to limit peak power while giving up little performance.
  - Reduces peak power by 25% while giving up less than 5% performance.

Page 53: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Reducing Peak Power with a Table-Driven Adaptive Processor Core

Vasileios Kontorinis (UCSD), Amirali Shayan (UCSD), Rakesh Kumar (UIUC), Dean Tullsen (UCSD)

Page 54: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Backup slides


Page 55: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Design Space

Resource                Options
INT instruction queue   16, 32 entries
FP instruction queue    16, 32 entries
INT registers           64, 128
FP registers            64, 128
INT ALUs                2, 4
FP ALUs                 1, 2, 3
Load/Store units        1, 2
Reorder Buffer          128, 256 entries
Icache                  1, 2, 4, 8 ways of 4K each
Dcache                  1, 2, 4, 8 ways of 4K each
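As a quick check on the size of this design space, the options can be enumerated directly. The option lists below are copied from the Design Space slide; the dictionary keys and function names are illustrative only.

```python
from itertools import product
from math import prod

# Option lists copied from the Design Space slide (cache entries are way counts).
DESIGN_SPACE = {
    "int_iq": (16, 32),
    "fp_iq": (16, 32),
    "int_regs": (64, 128),
    "fp_regs": (64, 128),
    "int_alus": (2, 4),
    "fp_alus": (1, 2, 3),
    "ldst_units": (1, 2),
    "rob": (128, 256),
    "icache_ways": (1, 2, 4, 8),
    "dcache_ways": (1, 2, 4, 8),
}


def count_configurations(space):
    """Size of the design space: product of the number of options per resource."""
    return prod(len(options) for options in space.values())


def all_configurations(space):
    """Yield every configuration as a dict mapping resource name to chosen option."""
    names = list(space)
    for values in product(*space.values()):
        yield dict(zip(names, values))


print(count_configurations(DESIGN_SPACE))  # 6144
```

The result, 2^7 x 3 x 4 x 4 = 6144, matches the number of possible configurations listed for the 100% power threshold in the Config ROM contents table.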

Page 56: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Multiple Config ROMs and their potential applications

- Dynamic thermal management
- Hot spot avoidance
- Combat process variation
- Budget peak power across multiple cores to maximize throughput

Page 57: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Benchmarks performing better with fewer resources

Explanation: Going further down the wrong path puts extra pressure on the memory subsystem, which may negatively affect performance.

[Figure: speedup over BEST_STATIC (y-axis, 0 to 1.8) for vpr-route and crafty; series: BEST_STATIC, IDEAL_ADAPT, MAX_CONF.]

Page 58: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Decoupling capacitance on die

[Figure labels: Decoupling Capacitors, Core, Decoupling Ring.]

Page 59: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Delays


Page 60: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Adaptation transition

           Iq   Fq   ialu   falu   ldstu   rob   Iregs   fregs   Icache   dcache
Config 1   32   16   2      1      1       256   128     0       32k      16k
Config 2   16   16   2      2      1       256   128     0       32k      16k

[Timeline figures, time in cycles (axis marked 0, 1000, 2000, 3000):
- Instructions in Iqueue (32 to 16): adaptation triggered and register renaming throttled; Iqueue power gating begins; Iqueue power gating ends and register renaming restarts.
- Active falus (1 to 2): falu power-up begins; falu power-up ends.]

Page 61: Reducing Peak Power with a  Table-Driven Adaptive Processor Core

Reducing Peak Power with a Table-Driven Adaptive Processor Core

Vasileios Kontorinis (UCSD), Amirali Shayan (UCSD), Rakesh Kumar (UIUC), Dean Tullsen (UCSD)