University of Michigan, Electrical Engineering and Computer Science
Embracing Heterogeneity with Dynamic Core Boosting
Hyoun Kyu Cho and Scott Mahlke
University of Michigan
May 20, 2014
Parallel Programming
(Figure: a workload divided among Core1, Core2, Core3, and Core4.)
Workload Imbalance Among Threads
• Asymmetric S/W
  – Control flow divergence
  – Non-deterministic memory latencies
  – Synchronization operations
• Asymmetric H/W
  – Heterogeneous multicores
  – Core-to-core process variation
Performance Impact of Asymmetric H/W
• Symmetric 8 Cores vs. 8 Cores w/ variations
CPU Time Wasted for Synchronization
(Figure: CPU time wasted for synchronization, homogeneous vs. heterogeneous configurations.)
Thread Criticality due to Workload Imbalance
(Figure: two timeline panels of threads T1 to T5 running up to a barrier, with idle time at the barrier marked; time on the horizontal axis.)
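The barrier behavior in this figure can be made concrete with a small calculation: every thread waits until the slowest one arrives, so the barrier releases at the latest arrival time, and each faster thread idles for the difference. The sketch below assumes per-thread arrival times are known in advance; the function names are hypothetical and not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the barrier releases when the slowest
 * (critical) thread arrives, i.e. at the maximum arrival time. */
double barrier_release_time(const double *arrival, size_t n)
{
    assert(n > 0); /* a barrier needs at least one participant */
    double latest = arrival[0];
    for (size_t i = 1; i < n; i++)
        if (arrival[i] > latest)
            latest = arrival[i];
    return latest;
}

/* Hypothetical helper: total CPU time all threads spend idle at the
 * barrier; each thread waits from its own arrival until the release. */
double barrier_idle_time(const double *arrival, size_t n)
{
    double release = barrier_release_time(arrival, n);
    double idle = 0.0;
    for (size_t i = 0; i < n; i++)
        idle += release - arrival[i];
    return idle;
}
```

For hypothetical arrival times {4, 6, 5, 10, 7} for T1 to T5, the barrier releases at 10 and the threads idle for 6 + 4 + 5 + 0 + 3 = 18 of the 50 CPU time units, so 36% of the CPU time is wasted waiting.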
Accelerating Critical Path w/ Core Boosting
(Figure: three timeline panels of threads T1 to T5 running up to a barrier, with idle time at the barrier marked; time on the horizontal axis.)
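Boosting the critical thread shortens the barrier interval only until another thread becomes critical. The following is a minimal sketch under two assumptions not stated on the slide: a thread's execution time is measured on an unboosted core, and boosting simply divides that time by a constant rate (such as the 1.5x rate used in the evaluation). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the thread that arrives last at the
 * barrier, i.e. the critical thread. */
size_t critical_thread(const double *time, size_t n)
{
    assert(n > 0);
    size_t crit = 0;
    for (size_t i = 1; i < n; i++)
        if (time[i] > time[crit])
            crit = i;
    return crit;
}

/* Barrier release time when the critical thread runs on a boosted core
 * for its whole interval. Assumption: boosting by `rate` divides that
 * thread's execution time by `rate`; other threads are unaffected. */
double release_with_static_boost(const double *time, size_t n, double rate)
{
    assert(rate >= 1.0);
    size_t crit = critical_thread(time, n);
    double release = time[crit] / rate;
    for (size_t i = 0; i < n; i++)
        if (i != crit && time[i] > release)
            release = time[i];
    return release;
}
```

With the hypothetical times {4, 6, 5, 10, 7} and a 1.5x boost, the critical thread (T4) would finish at about 6.67, so T5 becomes the new critical thread and the barrier releases at 7 instead of 10. This is why the boost has to move between threads as the critical path shifts.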
Modeling Workload Imbalance & Boosting
Boosting Assignment
• Data parallel programs
• Pipeline parallel programs
(Figure: data-parallel worker threads, and a pipeline with stages Stage1 to Stage4.)
Boosting Data Parallel Programs
• Greedy scheduling
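One way to read "greedy scheduling" for a data-parallel phase is: at every scheduling step, hand the boosted core to the thread with the most remaining work. The sketch below is a time-stepped simulation of that policy, not the authors' scheduler; it assumes a single boosted core, remaining work expressed in unboosted-core time units, and a fixed step `dt`. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical greedy policy: with one boosted core, each step boosts the
 * thread with the most remaining work. `remaining` is updated in place.
 * Returns the time at which every thread has reached the barrier; the
 * result is quantized to multiples of `dt`. */
double greedy_boost_schedule(double *remaining, size_t n, double rate, double dt)
{
    assert(n > 0 && dt > 0.0 && rate >= 1.0);
    double now = 0.0;
    for (;;) {
        /* Greedy choice: the thread furthest from the barrier. */
        size_t crit = 0;
        for (size_t i = 1; i < n; i++)
            if (remaining[i] > remaining[crit])
                crit = i;
        if (remaining[crit] <= 0.0)
            return now; /* all threads have arrived */

        /* Advance one step; the boosted thread progresses `rate` times faster. */
        for (size_t i = 0; i < n; i++) {
            remaining[i] -= (i == crit) ? dt * rate : dt;
            if (remaining[i] < 0.0)
                remaining[i] = 0.0;
        }
        now += dt;
    }
}
```

For hypothetical remaining work {4, 6, 5, 10, 7}, a boost rate of 2.0, and dt = 1, the simulated barrier releases at 6 instead of 10, because the boost migrates from T4 to T5 and then T2 as each becomes critical.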
Boosting Pipeline Parallel Programs
• Epoch-based scheduling
  – Monitors CPU utilization with H/W performance counters
  – Assigns the boosting budget at the end of each epoch
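A possible end-of-epoch allocation step is sketched below. The slide states only that utilization is measured with performance counters and that the budget is assigned at the end of each epoch. Several choices here are assumptions: the budget is treated as a count of boosted cores, and it is given to the most-utilized stages because a pipeline's bottleneck stage stays busy while the others wait on it. The function name and utilization values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical end-of-epoch policy for an n-stage pipeline. util[i] is the
 * CPU utilization of stage i over the last epoch, in [0, 1] (for example,
 * derived from hardware performance counters). The `budget` boosted cores
 * go to the most-utilized stages, the likely bottlenecks. */
void assign_epoch_boost(const double *util, bool *boost, size_t n, size_t budget)
{
    assert(budget <= n);
    for (size_t i = 0; i < n; i++)
        boost[i] = false;

    /* Repeated selection of the maximum; adequate for a few stages. */
    for (size_t k = 0; k < budget; k++) {
        size_t best = n; /* n means "nothing chosen yet" */
        for (size_t i = 0; i < n; i++)
            if (!boost[i] && (best == n || util[i] > util[best]))
                best = i;
        boost[best] = true;
    }
}
```

With hypothetical utilizations {0.55, 0.98, 0.40, 0.70} for Stage1 to Stage4 and a budget of one core, only Stage2 is boosted in the next epoch. A budget of two adds Stage4.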
Dynamic Core Boosting
Progress Monitoring Example
    …
    pthread_barrier_wait(barrier);
    period = calc_period_LID_007(start, end);
    for (i = start; i < end; i++) {
        …
        compute(…);
        if (side_exit) {
            SET_PROGRESS_TO(MAX_PROGRESS_007);
            break;
        }
        if (((end - i) % period) == 0)
            PROGRESS_STEP_FORWARD;
    }
    pthread_barrier_wait(barrier);
    …
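The instrumented loop above can be approximated by a self-contained sketch. Every name here is hypothetical: MAX_PROGRESS stands in for MAX_PROGRESS_007, calc_period for calc_period_LID_007, and a progress counter replaces the SET_PROGRESS_TO and PROGRESS_STEP_FORWARD macros, whose real definitions are not shown. The least_progress helper assumes the runtime boosts the thread that has made the least progress toward the barrier; that is an illustration, not the authors' runtime code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of progress steps per loop (stand-in for MAX_PROGRESS_007). */
#define MAX_PROGRESS 8

/* Iterations between progress updates; at least 1 so short loops still work. */
long calc_period(long start, long end)
{
    long period = (end - start) / MAX_PROGRESS;
    return period > 0 ? period : 1;
}

/* Runs one thread's loop and records its progress in *progress: step forward
 * every `period` iterations (capped at MAX_PROGRESS), or jump to MAX_PROGRESS
 * on a side exit. Pass side_exit_at = -1 (with start >= 0) for no side exit. */
void run_instrumented_loop(long start, long end, long side_exit_at, int *progress)
{
    long period = calc_period(start, end);
    *progress = 0;
    for (long i = start; i < end; i++) {
        /* ... compute(...) would go here ... */
        if (i == side_exit_at) {
            *progress = MAX_PROGRESS;
            break;
        }
        if ((end - i) % period == 0 && *progress < MAX_PROGRESS)
            *progress += 1;
    }
}

/* Hypothetical runtime helper: the thread with the least progress is the
 * candidate for the boosted core. */
size_t least_progress(const int *progress, size_t n)
{
    assert(n > 0);
    size_t slowest = 0;
    for (size_t i = 1; i < n; i++)
        if (progress[i] < progress[slowest])
            slowest = i;
    return slowest;
}
```

Counting progress in coarse steps rather than per iteration keeps the instrumentation cheap: the counter is touched only MAX_PROGRESS times per loop, regardless of the trip count.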
Evaluation Methodology
• Asymmetry emulation with Dynamic Binary Translation
  – Slow down proportionally instead of accelerating
• 8 cores with frequency variation
• 1 core boosted, boosting rate = 1.5x
• Compares
  – Heterogeneous
  – Reactive
  – DCB
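"Slow down proportionally instead of accelerating" can be illustrated with a small calculation. Binary translation can insert delay into code but cannot make a core faster. So the fastest effective core, which is the boosted one when variation is small, runs unmodified, and every other core is slowed by the ratio of that speed to its own. This preserves the relative speeds. The sketch below assumes speeds are given relative to nominal frequency. The function name and the conversion back to intended time are illustrative assumptions, not the talk's tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation helper. speed[i] is core i's speed relative to
 * nominal frequency; a boosted core's speed is multiplied by `rate`. The
 * fastest effective core gets slowdown 1.0 and every other core is slowed
 * by fastest / its_speed. Writes the factors (>= 1.0) to `slowdown` and
 * returns the fastest effective speed, which converts measurements back:
 * intended_time = emulated_time / fastest. */
double emulation_slowdowns(const double *speed, const int *boosted, size_t n,
                           double rate, double *slowdown)
{
    assert(n > 0 && rate >= 1.0);
    double fastest = 0.0;
    for (size_t i = 0; i < n; i++) {
        double eff = boosted[i] ? speed[i] * rate : speed[i];
        if (eff > fastest)
            fastest = eff;
    }
    for (size_t i = 0; i < n; i++) {
        double eff = boosted[i] ? speed[i] * rate : speed[i];
        slowdown[i] = fastest / eff;
    }
    return fastest;
}
```

For example, with four nominal-speed cores and one boosted at 1.5x, the boosted core runs unmodified and the other three are slowed by 1.5x, so the emulated run is 1.5 times longer than the intended one.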
Performance Improvement
(Figure: normalized execution time, axis 0.5 to 1.0, of Heterogeneous, Reactive, and DCB for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the geometric mean.)
Synchronization Overheads
(Figure: relative CPU time spent on synchronization, axis 0% to 80%, of Heterogeneous, Reactive, and DCB for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the geometric mean.)
Thread Arrival Time
Conclusion
• DCB mitigates workload imbalance in performance-asymmetric CMPs
  – Accelerating critical threads
  – Coordinating compiler, runtime, and architecture for near-optimal assignment
• Overall, DCB improves performance by 33%, outperforming a reactive boosting scheme by 10%
Thank you!
Core Boosting with Frequency Scaling
Transition time < 10 ns [Dreslinski'12]
Asymmetry Emulation with DBT
Evaluation Platform Accuracy
(Figure: relative error of the evaluation platform, axis 0% to 12%, for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the mean.)