leadout: composing low-overhead techniques...

Post on 18-Mar-2018

222 Views

Category:

Documents

3 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Universityof Illinois

http://iacoma.cs.uiuc.edu/

LeadOut: Composing Low-Overhead Techniques for Single-Thread Performance

Brian Greskamp, Ulya Karpuzcu, Josep Torrellas

100

1000

10000

100000

19

93

19

94

19

95

19

96

19

97

19

98

19

99

20

00

20

01

20

02

20

03

20

04

20

05

20

06

20

07

20

08

20

09

20

10

20

11

20

12

Year

100

1K

10K

100KTo

p SP

ECin

t92

Rat

e

Ulya Karpuzcu LeadOut

Per-Thread Performance Trend

2

100

1000

10000

100000

19

93

19

94

19

95

19

96

19

97

19

98

19

99

20

00

20

01

20

02

20

03

20

04

20

05

20

06

20

07

20

08

20

09

20

10

20

11

20

12

Year

100

1K

10K

100KTo

p SP

ECin

t92

Rat

e

Ulya Karpuzcu LeadOut

Per-Thread Performance Trend

2

Historical

100

1000

10000

100000

19

93

19

94

19

95

19

96

19

97

19

98

19

99

20

00

20

01

20

02

20

03

20

04

20

05

20

06

20

07

20

08

20

09

20

10

20

11

20

12

Year

100

1K

10K

100KTo

p SP

ECin

t92

Rat

e

Ulya Karpuzcu LeadOut

Per-Thread Performance Trend

2

Historical

Contemporary

Ulya Karpuzcu LeadOut

Near-Future CMP Environment

3

Ulya Karpuzcu LeadOut

Near-Future CMP Environment

3

• Sequential applications need fast cores

Ulya Karpuzcu LeadOut

Near-Future CMP Environment

3

• Sequential applications need fast cores

• Throughput applications demand more cores

Ulya Karpuzcu LeadOut

Near-Future CMP Environment

3

• Sequential applications need fast cores

• Throughput applications demand more cores

• Amdahl’s Law: Most applications need some fast cores

Ulya Karpuzcu LeadOut

Near-Future CMP Environment

3

• Sequential applications need fast cores

• Throughput applications demand more cores

• Amdahl’s Law: Most applications need some fast cores

• Faster cores without compromising core count?

Ulya Karpuzcu LeadOut

Near-Future CMP Environment

3

• Sequential applications need fast cores

• Throughput applications demand more cores

• Amdahl’s Law: Most applications need some fast cores

• Faster cores without compromising core count?

• Configurable Timing Speculation

Ulya Karpuzcu LeadOut

Near-Future CMP Environment

3

• Sequential applications need fast cores

• Throughput applications demand more cores

• Amdahl’s Law: Most applications need some fast cores

• Faster cores without compromising core count?

• Configurable Timing Speculation

• V/f Boosting

Ulya Karpuzcu LeadOut

Timing Speculation (TS)

4

Ulya Karpuzcu LeadOut

Timing Speculation (TS)

4

Boost core frequency beyond nominal at constant supply voltage

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

Boost core frequency beyond nominal at constant supply voltage

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

Rat

ed

Rat

ed

Boost core frequency beyond nominal at constant supply voltage

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

TSTS

• Increase f at constant V

Rat

ed

Rat

ed

Boost core frequency beyond nominal at constant supply voltage

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

TSTS

• Increase f at constant V

Rat

ed

Rat

edTiming errors

Boost core frequency beyond nominal at constant supply voltage

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

TSTS

• Increase f at constant V• Support for error detection and correction

Rat

ed

Rat

edTiming errors

Boost core frequency beyond nominal at constant supply voltage

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

Gain

TSTS

• Increase f at constant V• Support for error detection and correction

Rat

ed

Rat

edTiming errors

Boost core frequency beyond nominal at constant supply voltage

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

Gain

TSTS

• Increase f at constant V• Support for error detection and correction

Rat

ed

Rat

edTiming errors

Boost core frequency beyond nominal at constant supply voltage

Assuming a high P/T headroom, PE limits performance gain

Ulya Karpuzcu LeadOut

Err

or

Rate

(P

E)

FreqP

erf

orm

an

ce

Freq

Timing Speculation (TS)

4

Gain

TSTS

• Increase f at constant V• Support for error detection and correction

Rat

ed

Rat

edTiming errors

Boost core frequency beyond nominal at constant supply voltage

Assuming a high P/T headroom, PE limits performance gain

Ulya Karpuzcu LeadOut 5

Voltage-Frequency Boosting

Ulya Karpuzcu LeadOut 5

Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V

Ulya Karpuzcu LeadOut 5

Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)

Ulya Karpuzcu LeadOut 5

Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)

FreqPe

rfor

man

ceP E

Freq

Ulya Karpuzcu LeadOut 5

Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)

FreqPe

rfor

man

ceP E

Freq

Rated

Ulya Karpuzcu LeadOut 5

Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)

FreqPe

rfor

man

ceP E

Freq

Rated V/f Boosting

Ulya Karpuzcu LeadOut 5

Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)

FreqPe

rfor

man

ceP E

Freq

Rated V/f Boosting

Gai

n

Ulya Karpuzcu LeadOut 5

Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)

FreqPe

rfor

man

ceP E

Freq

Rated V/f Boosting

Gai

n

Assuming a high P/T headroom, V limits performance gain

Ulya Karpuzcu LeadOut 6

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

• Unable to bring the multicore up to its P/T envelope

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

• Unable to bring the multicore up to its P/T envelope

• Available P/T headroom remains untapped

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

• Unable to bring the multicore up to its P/T envelope

• Available P/T headroom remains untapped

• TS and V/f boosting are complementary

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

• Unable to bring the multicore up to its P/T envelope

• Available P/T headroom remains untapped

• TS and V/f boosting are complementary

• Bounded by different constraints

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

• Unable to bring the multicore up to its P/T envelope

• Available P/T headroom remains untapped

• TS and V/f boosting are complementary

• Bounded by different constraints

• LeadOut

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

• Unable to bring the multicore up to its P/T envelope

• Available P/T headroom remains untapped

• TS and V/f boosting are complementary

• Bounded by different constraints

• LeadOut

• Synergistically combine TS and V/f Boosting

Contribution: LeadOut

Ulya Karpuzcu LeadOut 6

• Individual application of TS or V/f Boosting is suboptimal

• Unable to bring the multicore up to its P/T envelope

• Available P/T headroom remains untapped

• TS and V/f boosting are complementary

• Bounded by different constraints

• LeadOut

• Synergistically combine TS and V/f Boosting

• Speed-ups for single thread performance multiply

Contribution: LeadOut

Ulya Karpuzcu LeadOut

Example TS Architecture [PACT07]: Paceline

7

Many-Core CMP

Ulya Karpuzcu LeadOut

OS sees: One core

8

CheckerLeader

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut

Critical ThreadOS sees: One core

8

CheckerLeader

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut

Critical ThreadOS sees: One core

8

CheckerLeader

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut 9

Leader Checker

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut 9

Leader CheckerSpeculative Clock Rated ClockExample TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut 9

Leader CheckerSpeculative Clock Rated Clock

=?

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut 9

Leader CheckerSpeculative Clock Rated Clock

=?=?

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut 9

Leader CheckerSpeculative Clock Rated Clock

=?=?

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut 9

Leader CheckerSpeculative Clock Rated Clock

=?=?=?

Example TS Architecture [PACT07]: Paceline

Rated Clock Rated Clock

=?

Thread 0 Thread 1

OS sees: Two cores

Ulya Karpuzcu LeadOut 10

Example TS Architecture [PACT07]: Paceline

Ulya Karpuzcu LeadOut 11

Cor

e 0

Example V/f Boosting: Intel Turbo Boost

Ulya Karpuzcu LeadOut 11

Nominal

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3

Cor

e 0

Example V/f Boosting: Intel Turbo Boost

Ulya Karpuzcu LeadOut 11

Nominal

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3

Lightly Threaded Workload

Turbo

Cor

e 0

Example V/f Boosting: Intel Turbo Boost

Ulya Karpuzcu LeadOut 11

Nominal

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3

Lightly Threaded Workload

Turbo

Cor

e 0

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3

Example V/f Boosting: Intel Turbo Boost

Ulya Karpuzcu LeadOut 11

Nominal

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3

Lightly Threaded Workload

Turbo

Cor

e 0

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3Inactive cores

power-gated

Example V/f Boosting: Intel Turbo Boost

Ulya Karpuzcu LeadOut 11

Nominal

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3

Lightly Threaded Workload

Turbo

Cor

e 0

Freq

uenc

y

Cor

e 0

Cor

e 1

Cor

e 2

Cor

e 3Inactive cores

power-gated

Example V/f Boosting: Intel Turbo Boost

Ulya Karpuzcu LeadOut

Composing the Techniques

12

Ulya Karpuzcu LeadOut

Composing the Techniques

12

The techniques are orthogonal

Ulya Karpuzcu LeadOut

Composing the Techniques

• Suppose as much P/T headroom as necessary

12

The techniques are orthogonal

Ulya Karpuzcu LeadOut

Composing the Techniques

• Suppose as much P/T headroom as necessary

• Paceline: Performance gain constrained by PEMAX

12

The techniques are orthogonal

Ulya Karpuzcu LeadOut

Composing the Techniques

• Suppose as much P/T headroom as necessary

• Paceline: Performance gain constrained by PEMAX

• V/f Boosting: Performance gain constrained by VMAX

12

The techniques are orthogonal

Ulya Karpuzcu LeadOut 13

V/f Boosting + Paceline

Ulya Karpuzcu LeadOut 13

Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors

V/f Boosting + Paceline

FreqPe

rfor

man

ceP E

Freq

RatedV/f Boosting

Ulya Karpuzcu LeadOut 13

Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors

V/f Boosting + Paceline

FreqPe

rfor

man

ceP E

Freq

RatedV/f Boosting

Ulya Karpuzcu LeadOut 13

Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors

V/f Boosting + Paceline

V/f Boosting + Paceline

FreqPe

rfor

man

ceP E

Freq

RatedV/f Boosting

Ulya Karpuzcu LeadOut 13

Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors

V/f Boosting + Paceline

V/f Boosting + Paceline

Gai

n

Ulya Karpuzcu LeadOut 14

Composing the Techniques

Ulya Karpuzcu LeadOut 14

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 15

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 15

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 15

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 15

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 15

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 16

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 16

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 16

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 16

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 16

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 17

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 17

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 17

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 17

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 17

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Ulya Karpuzcu LeadOut 17

Multicore Loading

Condition

Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining

?

Multicore Loading

ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.

Gain from combining

?

Multicore Loading

Condition T/P V T/P PE T/P V/PE

Gain from combining

?

Very High ✓ ✓ ✓ Unlikely

High to Moderate ✓ ✓ ✓ Likely

Low ✓ ✓ ✓ Definitely

Composing the Techniques

Additional techniques can be applied

Ulya Karpuzcu LeadOut

LeadOut Operation

18

Ulya Karpuzcu LeadOut

LeadOut Operation

18

• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads

Ulya Karpuzcu LeadOut

LeadOut Operation

18

• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads

• Run throughput threads unoptimized

Ulya Karpuzcu LeadOut

LeadOut Operation

18

• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads

• Run throughput threads unoptimized• Optimization Problem

Ulya Karpuzcu LeadOut

LeadOut Operation

18

• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads

• Run throughput threads unoptimized• Optimization Problem

1. Which technique to use for a speed-critical thread?

Ulya Karpuzcu LeadOut

LeadOut Operation

18

• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads

• Run throughput threads unoptimized• Optimization Problem

1. Which technique to use for a speed-critical thread?2. How to optimally set V/f for chosen technique?

Ulya Karpuzcu LeadOut

LeadOut Evaluation

19

Ulya Karpuzcu LeadOut

LeadOut Evaluation

19

• Simulated 32nm CMP with 16 cores

Ulya Karpuzcu LeadOut

LeadOut Evaluation

19

• Simulated 32nm CMP with 16 cores • Detailed modeling of leakage, temperature, variation

Ulya Karpuzcu LeadOut

LeadOut Evaluation

19

• Simulated 32nm CMP with 16 cores • Detailed modeling of leakage, temperature, variation • Applications: SPECint2000 benchmarks

Ulya Karpuzcu LeadOut

LeadOut Evaluation

19

• Simulated 32nm CMP with 16 cores • Detailed modeling of leakage, temperature, variation • Applications: SPECint2000 benchmarks• 50 Monte Carlo die samples

Ulya Karpuzcu LeadOut 20

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

Ulya Karpuzcu LeadOut 21

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

Ulya Karpuzcu LeadOut 22

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

Ulya Karpuzcu LeadOut 23

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

Ulya Karpuzcu LeadOut 23

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

~20% speed-up at most at 2x power cost

Ulya Karpuzcu LeadOut 24

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

Ulya Karpuzcu LeadOut 24

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

20% speed-up at most at 2x power cost

38% speed-up at most at 3x power cost

Ulya Karpuzcu LeadOut 25

Power-Performance TradeoffPe

r Cor

e Po

wer

(W)

Performance

Increasing Load4

6

8

10

12

11

13

14

1 1.1 1.2 1.3 1.4

Unoptimized

Paceline

V/f-Boost

V/f-Boost + Paceline

The techniques are orthogonal

Ulya Karpuzcu LeadOut 26

Also in the Paper

Ulya Karpuzcu LeadOut

• Detailed analysis for different load conditions

26

Also in the Paper

Ulya Karpuzcu LeadOut

• Detailed analysis for different load conditions

• Including V/f Boosting with activity migration

26

Also in the Paper

Ulya Karpuzcu LeadOut

• Detailed analysis for different load conditions

• Including V/f Boosting with activity migration

• Sensitivity analysis

26

Also in the Paper

Ulya Karpuzcu LeadOut

• Detailed analysis for different load conditions

• Including V/f Boosting with activity migration

• Sensitivity analysis

• Thermal design points, power grid designs, guardband

26

Also in the Paper

Ulya Karpuzcu LeadOut

• Detailed analysis for different load conditions

• Including V/f Boosting with activity migration

• Sensitivity analysis

• Thermal design points, power grid designs, guardband

• A hierarchical controller design to dynamically set

26

Also in the Paper

Ulya Karpuzcu LeadOut

• Detailed analysis for different load conditions

• Including V/f Boosting with activity migration

• Sensitivity analysis

• Thermal design points, power grid designs, guardband

• A hierarchical controller design to dynamically set

• Technique to apply

26

Also in the Paper

Ulya Karpuzcu LeadOut

• Detailed analysis for different load conditions

• Including V/f Boosting with activity migration

• Sensitivity analysis

• Thermal design points, power grid designs, guardband

• A hierarchical controller design to dynamically set

• Technique to apply

• Per core V/f assignment for the chosen technique

26

Also in the Paper

Ulya Karpuzcu LeadOut 27

Conclusion

Ulya Karpuzcu LeadOut

• Two low overhead techniques for sequential acceleration

27

Conclusion

Ulya Karpuzcu LeadOut

• Two low overhead techniques for sequential acceleration

• V/f Boosting and Timing Speculation

27

Conclusion

Ulya Karpuzcu LeadOut

• Two low overhead techniques for sequential acceleration

• V/f Boosting and Timing Speculation

• Individual application of any technique suboptimal

27

Conclusion

Ulya Karpuzcu LeadOut

• Two low overhead techniques for sequential acceleration

• V/f Boosting and Timing Speculation

• Individual application of any technique suboptimal

• Techniques are complementary

27

Conclusion

Ulya Karpuzcu LeadOut

• Two low overhead techniques for sequential acceleration

• V/f Boosting and Timing Speculation

• Individual application of any technique suboptimal

• Techniques are complementary

• LeadOut: A highly-configurable CMP

27

Conclusion

Ulya Karpuzcu LeadOut

• Two low overhead techniques for sequential acceleration

• V/f Boosting and Timing Speculation

• Individual application of any technique suboptimal

• Techniques are complementary

• LeadOut: A highly-configurable CMP

• Combining V/f Boosting and TS synergistically

27

Conclusion

Ulya Karpuzcu LeadOut

• Two low overhead techniques for sequential acceleration

• V/f Boosting and Timing Speculation

• Individual application of any technique suboptimal

• Techniques are complementary

• LeadOut: A highly-configurable CMP

• Combining V/f Boosting and TS synergistically

• 34% thread speedup at 220% power increase

27

Conclusion

Universityof Illinois

http://iacoma.cs.uiuc.edu/

LeadOut: Composing Low-Overhead Techniques for Single-Thread Performance

Brian Greskamp, Ulya Karpuzcu, Josep Torrellas

top related