leadout: composing low-overhead techniques...
Post on 18-Mar-2018
222 Views
Preview:
TRANSCRIPT
Universityof Illinois
http://iacoma.cs.uiuc.edu/
LeadOut: Composing Low-Overhead Techniques for Single-Thread Performance
Brian Greskamp, Ulya Karpuzcu, Josep Torrellas
100
1000
10000
100000
19
93
19
94
19
95
19
96
19
97
19
98
19
99
20
00
20
01
20
02
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
20
12
Year
100
1K
10K
100KTo
p SP
ECin
t92
Rat
e
Ulya Karpuzcu LeadOut
Per-Thread Performance Trend
2
100
1000
10000
100000
19
93
19
94
19
95
19
96
19
97
19
98
19
99
20
00
20
01
20
02
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
20
12
Year
100
1K
10K
100KTo
p SP
ECin
t92
Rat
e
Ulya Karpuzcu LeadOut
Per-Thread Performance Trend
2
Historical
100
1000
10000
100000
19
93
19
94
19
95
19
96
19
97
19
98
19
99
20
00
20
01
20
02
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
20
12
Year
100
1K
10K
100KTo
p SP
ECin
t92
Rat
e
Ulya Karpuzcu LeadOut
Per-Thread Performance Trend
2
Historical
Contemporary
Ulya Karpuzcu LeadOut
Near-Future CMP Environment
3
Ulya Karpuzcu LeadOut
Near-Future CMP Environment
3
• Sequential applications need fast cores
Ulya Karpuzcu LeadOut
Near-Future CMP Environment
3
• Sequential applications need fast cores
• Throughput applications demand more cores
Ulya Karpuzcu LeadOut
Near-Future CMP Environment
3
• Sequential applications need fast cores
• Throughput applications demand more cores
• Amdahl’s Law: Most applications need some fast cores
Ulya Karpuzcu LeadOut
Near-Future CMP Environment
3
• Sequential applications need fast cores
• Throughput applications demand more cores
• Amdahl’s Law: Most applications need some fast cores
• Faster cores without compromising core count?
Ulya Karpuzcu LeadOut
Near-Future CMP Environment
3
• Sequential applications need fast cores
• Throughput applications demand more cores
• Amdahl’s Law: Most applications need some fast cores
• Faster cores without compromising core count?
• Configurable Timing Speculation
Ulya Karpuzcu LeadOut
Near-Future CMP Environment
3
• Sequential applications need fast cores
• Throughput applications demand more cores
• Amdahl’s Law: Most applications need some fast cores
• Faster cores without compromising core count?
• Configurable Timing Speculation
• V/f Boosting
Ulya Karpuzcu LeadOut
Timing Speculation (TS)
4
Ulya Karpuzcu LeadOut
Timing Speculation (TS)
4
Boost core frequency beyond nominal at constant supply voltage
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
Boost core frequency beyond nominal at constant supply voltage
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
Rat
ed
Rat
ed
Boost core frequency beyond nominal at constant supply voltage
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
TSTS
• Increase f at constant V
Rat
ed
Rat
ed
Boost core frequency beyond nominal at constant supply voltage
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
TSTS
• Increase f at constant V
Rat
ed
Rat
edTiming errors
Boost core frequency beyond nominal at constant supply voltage
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
TSTS
• Increase f at constant V• Support for error detection and correction
Rat
ed
Rat
edTiming errors
Boost core frequency beyond nominal at constant supply voltage
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
Gain
TSTS
• Increase f at constant V• Support for error detection and correction
Rat
ed
Rat
edTiming errors
Boost core frequency beyond nominal at constant supply voltage
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
Gain
TSTS
• Increase f at constant V• Support for error detection and correction
Rat
ed
Rat
edTiming errors
Boost core frequency beyond nominal at constant supply voltage
Assuming a high P/T headroom, PE limits performance gain
Ulya Karpuzcu LeadOut
Err
or
Rate
(P
E)
FreqP
erf
orm
an
ce
Freq
Timing Speculation (TS)
4
Gain
TSTS
• Increase f at constant V• Support for error detection and correction
Rat
ed
Rat
edTiming errors
Boost core frequency beyond nominal at constant supply voltage
Assuming a high P/T headroom, PE limits performance gain
Ulya Karpuzcu LeadOut 5
Voltage-Frequency Boosting
Ulya Karpuzcu LeadOut 5
Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V
Ulya Karpuzcu LeadOut 5
Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)
Ulya Karpuzcu LeadOut 5
Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)
FreqPe
rfor
man
ceP E
Freq
Ulya Karpuzcu LeadOut 5
Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)
FreqPe
rfor
man
ceP E
Freq
Rated
Ulya Karpuzcu LeadOut 5
Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)
FreqPe
rfor
man
ceP E
Freq
Rated V/f Boosting
Ulya Karpuzcu LeadOut 5
Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)
FreqPe
rfor
man
ceP E
Freq
Rated V/f Boosting
Gai
n
Ulya Karpuzcu LeadOut 5
Voltage-Frequency BoostingBoost core frequency beyond nominal by increasing V No timing errors (PE = 0)
FreqPe
rfor
man
ceP E
Freq
Rated V/f Boosting
Gai
n
Assuming a high P/T headroom, V limits performance gain
Ulya Karpuzcu LeadOut 6
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
• Unable to bring the multicore up to its P/T envelope
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
• Unable to bring the multicore up to its P/T envelope
• Available P/T headroom remains untapped
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
• Unable to bring the multicore up to its P/T envelope
• Available P/T headroom remains untapped
• TS and V/f boosting are complementary
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
• Unable to bring the multicore up to its P/T envelope
• Available P/T headroom remains untapped
• TS and V/f boosting are complementary
• Bounded by different constraints
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
• Unable to bring the multicore up to its P/T envelope
• Available P/T headroom remains untapped
• TS and V/f boosting are complementary
• Bounded by different constraints
• LeadOut
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
• Unable to bring the multicore up to its P/T envelope
• Available P/T headroom remains untapped
• TS and V/f boosting are complementary
• Bounded by different constraints
• LeadOut
• Synergistically combine TS and V/f Boosting
Contribution: LeadOut
Ulya Karpuzcu LeadOut 6
• Individual application of TS or V/f Boosting is suboptimal
• Unable to bring the multicore up to its P/T envelope
• Available P/T headroom remains untapped
• TS and V/f boosting are complementary
• Bounded by different constraints
• LeadOut
• Synergistically combine TS and V/f Boosting
• Speed-ups for single thread performance multiply
Contribution: LeadOut
Ulya Karpuzcu LeadOut
Example TS Architecture [PACT07]: Paceline
7
Many-Core CMP
Ulya Karpuzcu LeadOut
OS sees: One core
8
CheckerLeader
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut
Critical ThreadOS sees: One core
8
CheckerLeader
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut
Critical ThreadOS sees: One core
8
CheckerLeader
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut 9
Leader Checker
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut 9
Leader CheckerSpeculative Clock Rated ClockExample TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut 9
Leader CheckerSpeculative Clock Rated Clock
=?
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut 9
Leader CheckerSpeculative Clock Rated Clock
=?=?
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut 9
Leader CheckerSpeculative Clock Rated Clock
=?=?
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut 9
Leader CheckerSpeculative Clock Rated Clock
=?=?=?
Example TS Architecture [PACT07]: Paceline
Rated Clock Rated Clock
=?
Thread 0 Thread 1
OS sees: Two cores
Ulya Karpuzcu LeadOut 10
Example TS Architecture [PACT07]: Paceline
Ulya Karpuzcu LeadOut 11
Cor
e 0
Example V/f Boosting: Intel Turbo Boost
Ulya Karpuzcu LeadOut 11
Nominal
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3
Cor
e 0
Example V/f Boosting: Intel Turbo Boost
Ulya Karpuzcu LeadOut 11
Nominal
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3
Lightly Threaded Workload
Turbo
Cor
e 0
Example V/f Boosting: Intel Turbo Boost
Ulya Karpuzcu LeadOut 11
Nominal
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3
Lightly Threaded Workload
Turbo
Cor
e 0
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3
Example V/f Boosting: Intel Turbo Boost
Ulya Karpuzcu LeadOut 11
Nominal
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3
Lightly Threaded Workload
Turbo
Cor
e 0
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3Inactive cores
power-gated
Example V/f Boosting: Intel Turbo Boost
Ulya Karpuzcu LeadOut 11
Nominal
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3
Lightly Threaded Workload
Turbo
Cor
e 0
Freq
uenc
y
Cor
e 0
Cor
e 1
Cor
e 2
Cor
e 3Inactive cores
power-gated
Example V/f Boosting: Intel Turbo Boost
Ulya Karpuzcu LeadOut
Composing the Techniques
12
Ulya Karpuzcu LeadOut
Composing the Techniques
12
The techniques are orthogonal
Ulya Karpuzcu LeadOut
Composing the Techniques
• Suppose as much P/T headroom as necessary
12
The techniques are orthogonal
Ulya Karpuzcu LeadOut
Composing the Techniques
• Suppose as much P/T headroom as necessary
• Paceline: Performance gain constrained by PEMAX
12
The techniques are orthogonal
Ulya Karpuzcu LeadOut
Composing the Techniques
• Suppose as much P/T headroom as necessary
• Paceline: Performance gain constrained by PEMAX
• V/f Boosting: Performance gain constrained by VMAX
12
The techniques are orthogonal
Ulya Karpuzcu LeadOut 13
V/f Boosting + Paceline
Ulya Karpuzcu LeadOut 13
Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors
V/f Boosting + Paceline
FreqPe
rfor
man
ceP E
Freq
RatedV/f Boosting
Ulya Karpuzcu LeadOut 13
Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors
V/f Boosting + Paceline
FreqPe
rfor
man
ceP E
Freq
RatedV/f Boosting
Ulya Karpuzcu LeadOut 13
Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors
V/f Boosting + Paceline
V/f Boosting + Paceline
FreqPe
rfor
man
ceP E
Freq
RatedV/f Boosting
Ulya Karpuzcu LeadOut 13
Boost core frequency beyond nominal by increasing V; tolerating occasional timing errors
V/f Boosting + Paceline
V/f Boosting + Paceline
Gai
n
Ulya Karpuzcu LeadOut 14
Composing the Techniques
Ulya Karpuzcu LeadOut 14
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 15
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 15
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 15
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 15
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 15
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 16
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 16
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 16
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 16
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 16
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 17
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 17
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 17
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 17
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 17
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Ulya Karpuzcu LeadOut 17
Multicore Loading
Condition
Bounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding ConstraintsBounding Constraints Gain from combining
?
Multicore Loading
ConditionV/f BoostV/f Boost PacelinePaceline Compos.Compos.
Gain from combining
?
Multicore Loading
Condition T/P V T/P PE T/P V/PE
Gain from combining
?
Very High ✓ ✓ ✓ Unlikely
High to Moderate ✓ ✓ ✓ Likely
Low ✓ ✓ ✓ Definitely
Composing the Techniques
Additional techniques can be applied
Ulya Karpuzcu LeadOut
LeadOut Operation
18
Ulya Karpuzcu LeadOut
LeadOut Operation
18
• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads
Ulya Karpuzcu LeadOut
LeadOut Operation
18
• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads
• Run throughput threads unoptimized
Ulya Karpuzcu LeadOut
LeadOut Operation
18
• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads
• Run throughput threads unoptimized• Optimization Problem
Ulya Karpuzcu LeadOut
LeadOut Operation
18
• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads
• Run throughput threads unoptimized• Optimization Problem
1. Which technique to use for a speed-critical thread?
Ulya Karpuzcu LeadOut
LeadOut Operation
18
• At any given time, CMP executes a mix of speed-critical and throughput-oriented threads
• Run throughput threads unoptimized• Optimization Problem
1. Which technique to use for a speed-critical thread?2. How to optimally set V/f for chosen technique?
Ulya Karpuzcu LeadOut
LeadOut Evaluation
19
Ulya Karpuzcu LeadOut
LeadOut Evaluation
19
• Simulated 32nm CMP with 16 cores
Ulya Karpuzcu LeadOut
LeadOut Evaluation
19
• Simulated 32nm CMP with 16 cores • Detailed modeling of leakage, temperature, variation
Ulya Karpuzcu LeadOut
LeadOut Evaluation
19
• Simulated 32nm CMP with 16 cores • Detailed modeling of leakage, temperature, variation • Applications: SPECint2000 benchmarks
Ulya Karpuzcu LeadOut
LeadOut Evaluation
19
• Simulated 32nm CMP with 16 cores • Detailed modeling of leakage, temperature, variation • Applications: SPECint2000 benchmarks• 50 Monte Carlo die samples
Ulya Karpuzcu LeadOut 20
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
Ulya Karpuzcu LeadOut 21
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
Ulya Karpuzcu LeadOut 22
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
Ulya Karpuzcu LeadOut 23
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
Ulya Karpuzcu LeadOut 23
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
~20% speed-up at most at 2x power cost
Ulya Karpuzcu LeadOut 24
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
Ulya Karpuzcu LeadOut 24
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
20% speed-up at most at 2x power cost
38% speed-up at most at 3x power cost
Ulya Karpuzcu LeadOut 25
Power-Performance TradeoffPe
r Cor
e Po
wer
(W)
Performance
Increasing Load4
6
8
10
12
11
13
14
1 1.1 1.2 1.3 1.4
Unoptimized
Paceline
V/f-Boost
V/f-Boost + Paceline
The techniques are orthogonal
Ulya Karpuzcu LeadOut 26
Also in the Paper
Ulya Karpuzcu LeadOut
• Detailed analysis for different load conditions
26
Also in the Paper
Ulya Karpuzcu LeadOut
• Detailed analysis for different load conditions
• Including V/f Boosting with activity migration
26
Also in the Paper
Ulya Karpuzcu LeadOut
• Detailed analysis for different load conditions
• Including V/f Boosting with activity migration
• Sensitivity analysis
26
Also in the Paper
Ulya Karpuzcu LeadOut
• Detailed analysis for different load conditions
• Including V/f Boosting with activity migration
• Sensitivity analysis
• Thermal design points, power grid designs, guardband
26
Also in the Paper
Ulya Karpuzcu LeadOut
• Detailed analysis for different load conditions
• Including V/f Boosting with activity migration
• Sensitivity analysis
• Thermal design points, power grid designs, guardband
• A hierarchical controller design to dynamically set
26
Also in the Paper
Ulya Karpuzcu LeadOut
• Detailed analysis for different load conditions
• Including V/f Boosting with activity migration
• Sensitivity analysis
• Thermal design points, power grid designs, guardband
• A hierarchical controller design to dynamically set
• Technique to apply
26
Also in the Paper
Ulya Karpuzcu LeadOut
• Detailed analysis for different load conditions
• Including V/f Boosting with activity migration
• Sensitivity analysis
• Thermal design points, power grid designs, guardband
• A hierarchical controller design to dynamically set
• Technique to apply
• Per core V/f assignment for the chosen technique
26
Also in the Paper
Ulya Karpuzcu LeadOut 27
Conclusion
Ulya Karpuzcu LeadOut
• Two low overhead techniques for sequential acceleration
27
Conclusion
Ulya Karpuzcu LeadOut
• Two low overhead techniques for sequential acceleration
• V/f Boosting and Timing Speculation
27
Conclusion
Ulya Karpuzcu LeadOut
• Two low overhead techniques for sequential acceleration
• V/f Boosting and Timing Speculation
• Individual application of any technique suboptimal
27
Conclusion
Ulya Karpuzcu LeadOut
• Two low overhead techniques for sequential acceleration
• V/f Boosting and Timing Speculation
• Individual application of any technique suboptimal
• Techniques are complementary
27
Conclusion
Ulya Karpuzcu LeadOut
• Two low overhead techniques for sequential acceleration
• V/f Boosting and Timing Speculation
• Individual application of any technique suboptimal
• Techniques are complementary
• LeadOut: A highly-configurable CMP
27
Conclusion
Ulya Karpuzcu LeadOut
• Two low overhead techniques for sequential acceleration
• V/f Boosting and Timing Speculation
• Individual application of any technique suboptimal
• Techniques are complementary
• LeadOut: A highly-configurable CMP
• Combining V/f Boosting and TS synergistically
27
Conclusion
Ulya Karpuzcu LeadOut
• Two low overhead techniques for sequential acceleration
• V/f Boosting and Timing Speculation
• Individual application of any technique suboptimal
• Techniques are complementary
• LeadOut: A highly-configurable CMP
• Combining V/f Boosting and TS synergistically
• 34% thread speedup at 220% power increase
27
Conclusion
Universityof Illinois
http://iacoma.cs.uiuc.edu/
LeadOut: Composing Low-Overhead Techniques for Single-Thread Performance
Brian Greskamp, Ulya Karpuzcu, Josep Torrellas
top related