Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Electrical and Computer Engineering, North Carolina State University
HPCA-2005
Chandra, Guo, Kim, Solihin - Contention Model
Cache Sharing in CMP
[Diagram: Processor Core 1 and Processor Core 2 (and possibly more cores), each with a private L1 cache, all sharing a single L2 cache.]
Impact of Cache Space Contention
[Charts: L2 cache misses (0% to 400%) and mcf's normalized IPC (0% to 100%) for mcf running alone vs. co-scheduled as mcf+art, mcf+swim, mcf+mst, and mcf+gzip.]
- Application-specific (what is co-scheduled); coschedule-specific (when accesses interleave)
- Significant: up to 4X cache misses, 65% IPC reduction
- Need a model to understand the cache sharing impact
Related Work
Uniprocessor miss estimation:
- Cascaval et al., LCPC 1999; Chatterjee et al., PLDI 2001; Fraguela et al., PACT 1999; Ghosh et al., TOPLAS 1999; J. Lee et al., HPCA 2001; Vera and Xue, HPCA 2002; Wassermann et al., SC 1997
Context switch impact on a time-shared processor:
- Agarwal, ACM Trans. on Computer Systems, 1989; Suh et al., ICS 2001
No model for the cache sharing impact:
- Relatively new phenomenon: SMT, CMP
- Many possible access interleaving scenarios
Contributions
Inter-thread cache contention models:
- 2 heuristic models (refer to the paper)
- 1 analytical model
- Input: circular sequence profiling for each thread
- Output: predicted number of cache misses per thread in a co-schedule
Validation:
- Against a detailed CMP simulator
- 3.9% average error for the analytical model
Insight:
- Temporal reuse patterns determine the impact of cache sharing
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Assumptions
- One circular sequence profile per thread: an average profile yields high prediction accuracy; a phase-specific profile may improve accuracy further
- LRU replacement algorithm: other policies are usually LRU approximations
- Threads do not share data: mostly true for serial apps; in parallel apps, threads are likely to be impacted uniformly
Definitions
- seqX(dX, nX) = a sequence of nX accesses to dX distinct addresses by a thread X to the same cache set
- cseqX(dX, nX) (circular sequence) = a sequence in which the first and the last accesses are to the same address

Example: in the access stream A B C D A E E B
- A B C D A is cseq(4,5)
- E E is cseq(1,2)
- B C D A E E B is cseq(5,7)
- the whole stream is seq(5,8)
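The circular-sequence definition above can be sketched in code. A minimal Python sketch (an illustration, not the paper's profiling tool) that extracts every cseq(d, n) from an access stream to one cache set:

```python
# For each re-access of an address, the circular sequence runs from
# that address's previous access to the current one; d counts the
# distinct addresses in that window and n its length.

def circular_sequences(trace):
    """Yield (d, n) for every circular sequence in `trace`."""
    last_pos = {}  # address -> index of its most recent access
    for i, addr in enumerate(trace):
        if addr in last_pos:
            window = trace[last_pos[addr]:i + 1]
            yield (len(set(window)), len(window))
        last_pos[addr] = i

# The slide's example stream:
print(sorted(circular_sequences(list("ABCDAEEB"))))
# -> [(1, 2), (4, 5), (5, 7)], matching cseq(1,2), cseq(4,5), cseq(5,7)
```

The profile F(cseq) used later is simply a histogram of these (d, n) pairs.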
Circular Sequence Properties
- Thread X runs alone in the system: given a circular sequence cseqX(dX, nX), the last access is a cache miss iff dX > Assoc
- Thread X shares the cache with thread Y: if a sequence of intervening accesses seqY(dY, nY) occurs during cseqX(dX, nX)'s lifetime, the last access of thread X is a miss iff dX + dY > Assoc
Example
Assume a 4-way associative cache:
- X's circular sequence: A B A, i.e. cseqX(2,3)
- Y's intervening access sequence during its lifetime: U V V W
With no cache sharing, the last A is a cache hit. With cache sharing, is A a hit or a miss?
Example
Assume a 4-way associative cache, with X's circular sequence cseqX(2,3) = A B A and Y's intervening accesses U V V W:
- Interleaving A U B V V A W: seqY(2,3) intervenes in cseqX's lifetime, so dX + dY = 2 + 2 <= Assoc and the last A is a cache hit
- Interleaving A U B V V W A: seqY(3,4) intervenes in cseqX's lifetime, so dX + dY = 2 + 3 > Assoc and the last A is a cache miss
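The two interleavings can be checked by replaying them on a small LRU set model. A sketch (an illustration, not the paper's simulator), assuming all addresses map to one 4-way set:

```python
# Replay a trace against a single assoc-way LRU cache set and
# record which accesses hit.

def lru_hits(trace, assoc=4):
    """Return the set of indices in `trace` that hit in the set."""
    lru = []  # least recently used at the front, most recent at the end
    hits = set()
    for i, addr in enumerate(trace):
        if addr in lru:
            hits.add(i)
            lru.remove(addr)      # will be re-appended as most recent
        elif len(lru) == assoc:
            lru.pop(0)            # evict the least recently used line
        lru.append(addr)
    return hits

hit_case  = list("AUBVVAW")  # seqY(2,3) intervenes: last A at index 5
miss_case = list("AUBVVWA")  # seqY(3,4) intervenes: last A at index 6
print(5 in lru_hits(hit_case))       # -> True  (cache hit)
print(6 not in lru_hits(miss_case))  # -> True  (cache miss)
```

This agrees with the dX + dY > Assoc rule from the previous slide.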
Outline Model Assumptions Definitions Inductive Probability Model Validation Case Study Conclusions
15Chandra, Guo, Kim, Solihin - Contention Model
Inductive Probability Model
For each cseqX(dX, nX) of thread X, compute Pmiss(cseqX): the probability that its last access is a miss.
Steps:
1. Compute E(nY): the expected number of intervening accesses from thread Y during cseqX's lifetime
2. For each possible dY, compute P(seq(dY, E(nY))): the probability of occurrence of seq(dY, E(nY))
3. If dY + dX > Assoc, add P(seq(dY, E(nY))) to Pmiss(cseqX)
Predicted misses = old_misses + Σ Pmiss(cseqX) × F(cseqX)
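The final aggregation step can be sketched as follows. The profile values and the Pmiss placeholder below are toy numbers, not from the paper; only the summation Σ Pmiss(cseqX) × F(cseqX) mirrors the model:

```python
# Predicted additional misses for thread X under cache sharing.

def extra_misses(cseq_profile, pmiss):
    """cseq_profile: {(dX, nX): F(cseqX)} frequency histogram.
    pmiss: function (dX, nX) -> probability the closing access of
    that circular sequence misses under sharing."""
    return sum(pmiss(d, n) * f for (d, n), f in cseq_profile.items())

# Toy profile and a placeholder Pmiss: a certain miss if dX alone
# already exceeds the associativity, else a made-up sharing probability.
profile = {(2, 3): 100, (5, 7): 40, (9, 10): 10}
assoc = 8
pm = lambda d, n: 1.0 if d > assoc else 0.2
print(extra_misses(profile, pm))  # 100*0.2 + 40*0.2 + 10*1.0 = 38.0
```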
Computing P(seq(dY, E(nY)))
Basic idea: a recurrence over sequence length,
P(seq(d,n)) = A × P(seq(d-1,n-1)) + B × P(seq(d,n-1))
where A and B are transition probabilities:
- seq(d-1,n-1) + 1 access to a distinct address → seq(d,n)
- seq(d,n-1) + 1 access to a non-distinct address → seq(d,n)
Detailed steps are in the paper.
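The shape of the recurrence can be sketched under a simplifying assumption: each intervening access touches a new distinct address with a fixed probability p_distinct. The paper derives the actual transition probabilities A and B from thread Y's profile; this sketch only illustrates the inductive structure:

```python
# P(seq(d, n)): probability that n intervening accesses touch exactly
# d distinct addresses, with a fixed per-access probability of hitting
# a new distinct address (a simplification of the paper's derivation).

def p_seq(d, n, p_distinct):
    if d == 0 and n == 0:
        return 1.0
    if d < 0 or d > n:
        return 0.0
    # + 1 access to a distinct address      (from seq(d-1, n-1))
    # + 1 access to a non-distinct address  (from seq(d, n-1))
    return (p_distinct * p_seq(d - 1, n - 1, p_distinct)
            + (1 - p_distinct) * p_seq(d, n - 1, p_distinct))

# Sanity check: for fixed n, the probabilities over d sum to 1.
print(round(sum(p_seq(d, 5, 0.3) for d in range(6)), 6))  # -> 1.0
```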
Outline Model Assumptions Definitions Inductive Probability Model Validation Case Study Conclusions
18Chandra, Guo, Kim, Solihin - Contention Model
Validation
- SESC simulator: detailed CMP + memory hierarchy
- 14 co-schedules of benchmarks (Spec2K and Olden)
- A co-schedule is terminated when an app completes

Configuration:
- CMP cores: 2 cores, each 4-issue dynamic, 3.2 GHz
- L1 I/D (private): each WB, 32 KB, 4-way, 64 B line
- L2 unified (shared): WB, 512 KB, 8-way, 64 B line, LRU replacement
Validation (Error = (PM - AM) / AM, predicted vs. actual misses; one value pair per thread in each co-schedule):

Co-schedule     Actual Miss Increase    Prediction Error
gzip + applu    243%, 11%               -25%, 2%
gzip + apsi     180%, 0%                -9%, 0%
mcf + art       296%, 0%                7%, 0%
mcf + gzip      18%, 102%               7%, 22%
mcf + swim      59%, 0%                 -7%, 0%

- Larger error happens when the miss increase is very large
- Overall, the model is accurate
Other Observations
Based on how vulnerable applications are to the cache sharing impact:
- Highly vulnerable: mcf, gzip
- Not vulnerable: art, apsi, swim
- Somewhat / sometimes vulnerable: applu, equake, perlbmk, mst
Prediction error:
- Very small, except for highly vulnerable apps
- 3.9% average, 25% maximum
- Also small for different cache associativities and sizes
Case Study
The profile is approximated by a geometric progression:
F(cseq(1,*)), F(cseq(2,*)), F(cseq(3,*)), ..., F(cseq(A,*)), ... = Z, Zr, Zr², ..., Zr^(A-1), ...
- Z = amplitude; 0 < r < 1 = common ratio
- Larger r implies a larger working set
What is the impact of an interfering thread on the base thread?
- Fix the base thread; vary the interfering thread
- Miss frequency = # misses / time; reuse frequency = # hits / time
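The geometric profile can be sketched as follows. Z, the depth cutoff, and the "deep reuse" threshold are illustrative choices, not from the paper; the point is that larger r puts more reuses at deep recurrence distances, i.e. a larger working set:

```python
# F(cseq(d, *)) = Z * r**(d - 1), the case study's profile shape.

def geometric_profile(Z, r, max_d):
    """Return [F(cseq(1,*)), ..., F(cseq(max_d,*))]."""
    return [Z * r ** (d - 1) for d in range(1, max_d + 1)]

small_ws = geometric_profile(100, 0.5, 8)  # r = 0.5: mass at small d
large_ws = geometric_profile(100, 0.9, 8)  # r = 0.9: heavier tail

# Fraction of reuses at distance d > 4 (most sensitive to sharing
# in a cache of modest associativity):
deep = lambda p: sum(p[4:]) / sum(p)
print(deep(small_ws) < deep(large_ws))  # -> True
```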
Base Thread: r = 0.5 (Small Working Set)
[Chart: base thread's L2 cache misses vs. multiplying factor (1 to 4) applied to the interfering thread's miss frequency and reuse frequency.]
The base thread is not vulnerable to the interfering thread's miss frequency, but is vulnerable to its reuse frequency.
Base Thread: r = 0.9 (Large Working Set)
[Chart: base thread's L2 cache misses vs. multiplying factor (1 to 4) applied to the interfering thread's miss frequency and reuse frequency.]
The base thread is vulnerable to both the interfering thread's miss frequency and its reuse frequency.
Conclusions
New inter-thread cache contention models, simple to use:
- Input: circular sequence profiling per thread
- Output: number of misses per thread in co-schedules
Accurate:
- 3.9% average error
Useful insight:
- Temporal reuse patterns determine the cache sharing impact
Future work:
- Predict and avoid problematic co-schedules
- Release the tool at http://www.cesr.ncsu.edu/solihin