asr: adaptive selective replication for cmp caches
ASR: Adaptive Selective Replication for CMP Caches. Brad Beckmann†, Mike Marty, and David Wood, Multifacet Project, University of Wisconsin-Madison, 12/13/06. († currently at Microsoft.)
![Page 1: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/1.jpg)
ASR: Adaptive Selective Replication for CMP Caches
Brad Beckmann†, Mike Marty, and David Wood
Multifacet Project, University of Wisconsin-Madison
12/13/06
† currently at Microsoft
![Page 2: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/2.jpg)
Introduction: Shared Cache
[Figure: 8-CPU CMP (CPU 0-7), each CPU with an L1 I$ and L1 D$, sharing eight L2 banks; block A is held in one bank. Annotations: maximize cache capacity; slow access latency (40+ cycles).]
![Page 3: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/3.jpg)
Introduction: Private Caches
[Figure: 8-CPU CMP (CPU 0-7), each CPU with an L1 I$, L1 D$, and a private L2; block A appears in several private L2s. Annotations: fast access latency; lower effective capacity.]
Desire both fast access & high capacity
![Page 4: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/4.jpg)
Beckmann, Marty, & Wood ASR: Adaptive Selective Replication for CMP Caches 4
Introduction
• Previous hybrid proposals
  – Victim Replication, CMP-NuRapid, Cooperative Caching
  – Achieve fast access and high capacity
    • Under certain workloads & system configurations
    • Utilize static rules
      – Non-adaptive
• Adaptive Selective Replication: ASR
  – Dynamically monitor workload behavior
  – Adapt the L2 cache to workload demand
  – Up to 12% improvement vs. previous proposals
![Page 5: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/5.jpg)
Outline
• Introduction
• Understanding L2 Replication
  – Benefit
  – Cost
  – Key Observation
  – Solution
• ASR: Adaptive Selective Replication
• Evaluation
![Page 6: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/6.jpg)
Understanding L2 Replication
• Three L2 block sharing types
  1. Single requestor
     – All requests by a single processor
  2. Shared read only
     – Read only requests by multiple processors
  3. Shared read-write
     – Read and write requests by multiple processors
• Profile L2 blocks during their on-chip lifetime
  – 8 processor CMP
  – 16 MB shared L2 cache
  – 64-byte block size
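The three sharing types can be expressed as a small classifier over the requests a block receives during one on-chip lifetime. This is a minimal sketch, not the authors' profiling tool: the `Sharing` enum, `classify_block`, and the `(cpu_id, is_write)` access format are hypothetical names chosen for illustration.

```python
from enum import Enum


class Sharing(Enum):
    SINGLE_REQUESTOR = "single requestor"
    SHARED_READ_ONLY = "shared read-only"
    SHARED_READ_WRITE = "shared read-write"


def classify_block(accesses):
    """Classify one L2 block from its (cpu_id, is_write) accesses.

    `accesses` covers a single on-chip lifetime of the block
    (hypothetical trace format, assumed for this sketch).
    """
    cpus = {cpu for cpu, _ in accesses}
    if len(cpus) <= 1:
        # All requests came from one processor (reads or writes).
        return Sharing.SINGLE_REQUESTOR
    if any(is_write for _, is_write in accesses):
        # Multiple processors and at least one write.
        return Sharing.SHARED_READ_WRITE
    # Multiple processors, reads only.
    return Sharing.SHARED_READ_ONLY
```

Only the shared read-only class is a candidate for replication in the policies compared later in the talk.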
![Page 7: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/7.jpg)
Understanding L2 Replication
[Figure: L2 block profile for Apache, Jbb, Oltp, and Zeus. Series: Shared Read-only, Shared Read-write, Single Requestor. Annotations: High Locality, Mid Locality, Low Locality.]
![Page 8: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/8.jpg)
Understanding L2 Replication: Benefit
[Figure: y-axis: L2 Hit Cycles; x-axis: Replication Capacity.]
![Page 9: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/9.jpg)
Understanding L2 Replication: Cost
[Figure: y-axis: L2 Miss Cycles; x-axis: Replication Capacity.]
![Page 10: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/10.jpg)
Understanding L2 Replication: Key Observation
[Figure: y-axis: L2 Hit Cycles; x-axis: Replication Capacity.]
Top 3% of Shared Read-only blocks satisfy 70% of Shared Read-only requests
Replicate frequently requested blocks first
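A coverage figure like "top 3% of blocks satisfy 70% of requests" can be computed from per-block request counts as sketched below. The function name `request_coverage` and the sample counts are hypothetical; they are not data from the talk.

```python
def request_coverage(request_counts, top_fraction):
    """Fraction of all requests satisfied by the most-requested
    `top_fraction` of blocks (at least one block is always counted)."""
    total = sum(request_counts)
    if total == 0:
        return 0.0
    ranked = sorted(request_counts, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Hypothetical per-block request counts for five blocks.
counts = [50, 30, 10, 5, 5]
print(request_coverage(counts, 0.2))  # top 1 of 5 blocks
print(request_coverage(counts, 0.4))  # top 2 of 5 blocks
```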
![Page 11: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/11.jpg)
Understanding L2 Replication: Solution
[Figure: Total Cycle Curve. Y-axis: Total Cycles; x-axis: Replication Capacity; the optimal point is marked.]
Optimal: property of workload-cache interaction
Not fixed: must adapt
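The total cycle curve is the sum of the benefit curve (L2 hit cycles) and the cost curve (L2 miss cycles), so the optimal replication point is wherever that sum is smallest. The sketch below uses made-up curve values purely to show that the minimum moves when the cost curve changes; `best_replication_level` and all numbers are hypothetical.

```python
def best_replication_level(hit_cycles, miss_cycles):
    """Return (level, total_cycles) minimizing hit + miss cycles."""
    totals = [h + m for h, m in zip(hit_cycles, miss_cycles)]
    level = min(range(len(totals)), key=totals.__getitem__)
    return level, totals[level]


# Hypothetical curves sampled at six replication levels (not measured data).
hit = [100, 70, 55, 48, 45, 44]          # falls as replication grows
miss_tight = [20, 22, 27, 40, 60, 90]    # capacity-sensitive workload
miss_loose = [20, 21, 22, 24, 26, 31]    # capacity-insensitive workload

print(best_replication_level(hit, miss_tight))
print(best_replication_level(hit, miss_loose))
```

With these inputs the minimum sits at a lower level for the capacity-sensitive case and a higher level for the insensitive one, which is the reason a fixed policy cannot be optimal for both.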
![Page 12: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/12.jpg)
Outline
• Wires and CMP caches
• Understanding L2 Replication
• ASR: Adaptive Selective Replication
  – SPR: Selective Probabilistic Replication
  – Monitoring and adapting to workload behavior
• Evaluation
![Page 13: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/13.jpg)
SPR: Selective Probabilistic Replication
• Mechanism for Selective Replication
  – Relax L2 inclusion property
    • L2 evictions do not force L1 evictions
    • Non-exclusive cache hierarchy
  – Ring Writebacks
    • L1 Writebacks passed clockwise between private L2 caches
    • Merge with other existing L2 copies
• Probabilistically choose between
  – Local writeback: allow replication
  – Ring writeback: disallow replication
• Replicates frequently requested blocks
![Page 14: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/14.jpg)
SPR: Selective Probabilistic Replication
[Figure: 8-CPU CMP (CPU 0-7), each CPU with an L1 I$, L1 D$, and a private L2.]
![Page 15: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/15.jpg)
SPR: Selective Probabilistic Replication
[Figure: y-axis: Replication Capacity; x-axis: Replication Levels 0-5; the current level is marked.]

| Replication Level | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Prob. of Replication | 0 | 1/64 | 1/16 | 1/4 | 1/2 | 1 |
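The level-to-probability table drives the choice between a local writeback (which allows a replica) and a ring writeback (which does not). A minimal sketch of that choice follows; `REPLICATION_PROB` copies the table from the slide, while `choose_writeback` and its string return values are hypothetical names for illustration.

```python
import random

# Probability of replication for replication levels 0..5 (from the slide).
REPLICATION_PROB = [0.0, 1 / 64, 1 / 16, 1 / 4, 1 / 2, 1.0]


def choose_writeback(level, rng=random):
    """Return 'local' (allow replication) or 'ring' (disallow replication).

    rng.random() is uniform in [0, 1), so level 0 never replicates
    and level 5 always does.
    """
    if rng.random() < REPLICATION_PROB[level]:
        return "local"
    return "ring"
```

Because each writeback is an independent trial, a block that is requested (and therefore written back) more often has more chances to be replicated, which is how the policy favors frequently requested blocks.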
![Page 16: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/16.jpg)
Monitoring and Adapting to Workload Behavior
1. Decrease in Replication Benefit
   – Bit marks replicas of the current, but not lower level
2. Increase in Replication Benefit
   – Store 8-bit partial tags of next higher level replications
[Figure: Replication Benefit Curve. Y-axis: L2 Hit Cycles; x-axis: Replication Capacity, with the lower, current, and higher levels marked.]
![Page 17: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/17.jpg)
Monitoring and Adapting to Workload Behavior
3. Decrease in Replication Cost
   – Stores 16-bit partial tags of recently evicted blocks
4. Increase in Replication Cost
   – Way and Set counters track soon-to-be-evicted blocks
[Figure: Replication Cost Curve. Y-axis: L2 Miss Cycles; x-axis: Replication Capacity, with the lower, current, and higher levels marked.]
![Page 18: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/18.jpg)
Outline
• Wires and CMP caches
• Understanding L2 Replication
• ASR: Adaptive Selective Replication
• Evaluation
![Page 19: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/19.jpg)
Methodology
• Full system simulation
  – Simics
  – Wisconsin’s GEMS Timing Simulator
    • Out-of-order processor
    • Memory system
• Workloads
  – Commercial
    • apache, jbb, oltp, zeus
  – Scientific (see paper)
    • SpecOMP: apsi & art
    • Splash: barnes & ocean
![Page 20: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/20.jpg)
System Parameters

| Memory System | | Dynamically Scheduled Processor | |
|---|---|---|---|
| L1 I & D caches | 64 KB, 4-way, 3 cycles | Clock frequency | 5.0 GHz |
| Unified L2 cache | 16 MB, 16-way | Reorder buffer / scheduler | 128 / 64 entries |
| L1 / L2 prefetching | Unit & non-unit strided prefetcher (similar to Power4) | Pipeline width | 4-wide fetch & issue |
| Memory latency | 500 cycles | Pipeline stages | 30 |
| Memory bandwidth | 50 GB/s | Direct branch predictor | 3.5 KB YAGS |
| Memory size | 4 GB of DRAM | Return address stack | 64 entries |
| Outstanding memory requests / CPU | 16 | Indirect branch predictor | 256 entries (cascaded) |

[ 8 core CMP, 45 nm technology ]
![Page 21: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/21.jpg)
Replication Benefit, Cost, & Effectiveness Curves
[Figure panels: Benefit, Cost.]
![Page 22: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/22.jpg)
Replication Benefit, Cost, & Effectiveness Curves
[Figure panel: Effectiveness.]
![Page 23: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/23.jpg)
Comparison of Replication Policies
• SPR: multiple possible policies
• Evaluated 4 shared read-only replication policies
  1. VR: Victim Replication
     – Previously proposed [Zhang ISCA 05]
     – Disallow replicas to evict shared owner blocks
  2. NR: CMP-NuRapid
     – Previously proposed [Chishti ISCA 05]
     – Replicate upon the second request
  3. CC: Cooperative Caching
     – Previously proposed [Chang ISCA 06]
     – Replace replicas first
     – Spill singlets to remote caches
     – Tunable parameter: 100%, 70%, 30%, 0%
  4. ASR: Adaptive Selective Replication
     – Our proposal
     – Monitor and adjust to workload demand
[Callout on the previously proposed policies: lack dynamic adaptation.]
![Page 24: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/24.jpg)
ASR: Performance
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
![Page 25: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/25.jpg)
Conclusions
• CMP Cache Replication
  – No replication conserves capacity
  – All replication reduces on-chip latency
  – Previous hybrid proposals
    • Work well for certain criteria
    • Non-adaptive
• Adaptive Selective Replication
  – Probabilistic policy favors frequently requested blocks
  – Dynamically monitor replication benefit & cost
  – Replicate when benefit > cost
  – Improves performance up to 12% vs. previous schemes
![Page 26: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/26.jpg)
Backup Slides
![Page 27: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/27.jpg)
ASR: Memory Cycles
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
![Page 28: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/28.jpg)
L2 Cache Requests Breakdown
![Page 29: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/29.jpg)
L2 Cache Requests Breakdown: User & OS
![Page 30: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/30.jpg)
Shared Read-write Requests Breakdown
![Page 31: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/31.jpg)
Shared Read-write Block Breakdown
![Page 32: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/32.jpg)
ASR: Decrease-in-replication Benefit
[Figure: y-axis: L2 Hit Cycles; x-axis: Replication Capacity, with the lower and current levels marked.]
![Page 33: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/33.jpg)
ASR: Decrease-in-replication Benefit
• Goal
  – Determine replication benefit decrease of the next lower level
• Mechanism
  – Current Replica Bit
    • Per L2 cache block
    • Set for replications of the current level
    • Not set for replications of lower level
  – Current replica hits would be remote hits with next lower level
• Overhead
  – 1 bit x 256 K L2 blocks = 32 KB
![Page 34: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/34.jpg)
ASR: Increase-in-replication Benefit
[Figure: y-axis: L2 Hit Cycles; x-axis: Replication Capacity, with the current and higher levels marked.]
![Page 35: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/35.jpg)
ASR: Increase-in-replication Benefit
• Goal
  – Determine replication benefit increase of the next higher level
• Mechanism
  – Next Level Hit Buffers (NLHBs)
    • 8-bit partial tag buffer
    • Store replicas of the next higher level
  – NLHB hits would be local L2 hits with next higher level
• Overhead
  – 8 bits x 16 K entries x 8 processors = 128 KB
![Page 36: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/36.jpg)
ASR: Decrease-in-replication Cost
[Figure: y-axis: L2 Miss Cycles; x-axis: Replication Capacity, with the lower and current levels marked.]
![Page 37: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/37.jpg)
ASR: Decrease-in-replication Cost
• Goal
  – Determine replication cost decrease of the next lower level
• Mechanism
  – Victim Tag Buffers (VTBs)
    • 16-bit partial tags
    • Store recently evicted blocks of current replication level
  – VTB hits would be on-chip hits with next lower level
• Overhead
  – 16 bits x 1 K entries x 8 processors = 16 KB
![Page 38: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/38.jpg)
ASR: Increase-in-replication Cost
[Figure: y-axis: L2 Miss Cycles; x-axis: Replication Capacity, with the current and higher levels marked.]
![Page 39: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/39.jpg)
ASR: Increase-in-replication Cost
• Goal
  – Determine replication cost increase of the next higher level
• Mechanism
  – Way and Set counters [Suh et al. HPCA 2002]
    • Identify soon-to-be-evicted blocks
    • 16-way pseudo LRU
    • 256 set groups
  – On-chip hits that would be off-chip with next higher level
• Overhead
  – 255-bit pseudo LRU tree x 8 processors = 255 B
Overall storage overhead: 212 KB or 1.2% of total storage
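The per-structure overheads quoted on the four monitoring slides can be checked with simple arithmetic. The sketch below reproduces only the itemized figures (variable names are hypothetical). Note that these four items sum to roughly 176 KB, so the 212 KB overall figure presumably includes storage that the slides do not itemize.

```python
BITS_PER_KB = 8 * 1024

# 16 MB L2 with 64-byte blocks gives 256 K blocks.
l2_blocks = (16 * 1024 * 1024) // 64

overhead_bits = {
    "current replica bits": 1 * l2_blocks,   # 1 bit per L2 block
    "NLHBs": 8 * (16 * 1024) * 8,            # 8-bit tags x 16 K entries x 8 CPUs
    "VTBs": 16 * (1 * 1024) * 8,             # 16-bit tags x 1 K entries x 8 CPUs
    "way/set counters": 255 * 8,             # 255-bit pseudo-LRU tree x 8 CPUs
}

overhead_kb = {name: bits / BITS_PER_KB for name, bits in overhead_bits.items()}
itemized_total_kb = sum(overhead_kb.values())

for name, kb in overhead_kb.items():
    print(f"{name}: {kb:g} KB")
print(f"itemized total: {itemized_total_kb:.1f} KB")
```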
![Page 40: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/40.jpg)
ASR: Triggering a Cost-Benefit Analysis
• Goal
  – Dynamically adapt to workload behavior
  – Avoid unnecessary replication level changes
• Mechanism
  – Evaluation trigger
    • Local replications or NLHB allocations exceed 1K
  – Replication change
    • Four consecutive evaluations in the same direction
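The trigger and the four-in-a-row rule amount to a small hysteresis controller. The sketch below is one possible reading, under assumptions the slide does not state: both counters reset after every evaluation, a "do nothing" result or a change of direction restarts the streak, and the level is clamped to 0..5. The class and attribute names are hypothetical.

```python
class AdaptationTrigger:
    EVAL_THRESHOLD = 1024   # "exceed 1K" from the slide
    STREAK_NEEDED = 4       # four consecutive evaluations, same direction

    def __init__(self, level=0, max_level=5):
        self.level = level
        self.max_level = max_level
        self.local_replications = 0
        self.nlhb_allocations = 0
        self._streak_dir = 0
        self._streak_len = 0

    def should_evaluate(self):
        """True once either event counter exceeds the threshold."""
        return (self.local_replications > self.EVAL_THRESHOLD
                or self.nlhb_allocations > self.EVAL_THRESHOLD)

    def evaluate(self, direction):
        """Record one evaluation result: +1 increase, -1 decrease, 0 nothing.

        Returns the (possibly updated) replication level.
        """
        # Assumption: counters restart after each evaluation.
        self.local_replications = 0
        self.nlhb_allocations = 0
        if direction == 0:
            self._streak_dir, self._streak_len = 0, 0
        elif direction == self._streak_dir:
            self._streak_len += 1
        else:
            self._streak_dir, self._streak_len = direction, 1
        if self._streak_len >= self.STREAK_NEEDED:
            new_level = self.level + self._streak_dir
            self.level = min(self.max_level, max(0, new_level))
            self._streak_dir, self._streak_len = 0, 0
        return self.level
```

Requiring a streak before changing level filters out short-lived fluctuations, which matches the stated goal of avoiding unnecessary replication level changes.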
![Page 41: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/41.jpg)
ASR: Adaptive Algorithm

| | Decrease in Replication Cost > Increase in Replication Benefit | Decrease in Replication Cost < Increase in Replication Benefit |
|---|---|---|
| Decrease in Replication Benefit > Increase in Replication Cost | Go in direction with greater value | Increase Replication |
| Decrease in Replication Benefit < Increase in Replication Cost | Decrease Replication | Do Nothing |
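The decision matrix on this slide can be written as a short function over the four monitored quantities. This is a sketch under two assumptions that the slide leaves open: ties are treated as "not greater", and in the "go in direction with greater value" cell the two directions are compared by net gain (benefit increase minus cost increase, versus cost decrease minus benefit decrease). The function name and return encoding are hypothetical.

```python
def asr_decision(benefit_dec, benefit_inc, cost_dec, cost_inc):
    """Return +1 (increase replication), -1 (decrease), or 0 (do nothing).

    benefit_dec / cost_dec: estimated change when moving to the next lower level.
    benefit_inc / cost_inc: estimated change when moving to the next higher level.
    """
    top_row = benefit_dec > cost_inc       # first row of the matrix
    left_col = cost_dec > benefit_inc      # first column of the matrix
    if top_row and left_col:
        # Assumed interpretation of "greater value": larger net gain wins.
        gain_up = benefit_inc - cost_inc
        gain_down = cost_dec - benefit_dec
        if gain_up == gain_down:
            return 0
        return 1 if gain_up > gain_down else -1
    if top_row:
        return 1
    if left_col:
        return -1
    return 0
```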
![Page 42: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/42.jpg)
ASR: Adapting to Workload Behavior
[Figure: Oltp, all CPUs.]
![Page 43: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/43.jpg)
ASR: Adapting to Workload Behavior
[Figure: Apache, all CPUs.]
![Page 44: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/44.jpg)
ASR: Adapting to Workload Behavior
[Figure: Apache, CPU 0.]
![Page 45: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/45.jpg)
ASR: Adapting to Workload Behavior
[Figure: Apache, CPUs 1-7.]
![Page 46: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/46.jpg)
Replication Capacity
![Page 47: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/47.jpg)
Replication Capacity
[Configuration: 4 MB, 150-cycle memory latency, in-order processors]
![Page 48: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/48.jpg)
Replication Benefit, Cost, & Effectiveness Curves
[Figure panels: Benefit, Cost. Configuration: 4 MB, 150-cycle memory latency, in-order processors]
![Page 49: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/49.jpg)
Replication Benefit, Cost, & Effectiveness Curves
[Figure panel: Effectiveness. Configuration: 4 MB, 150-cycle memory latency, in-order processors]
![Page 50: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/50.jpg)
Replication Benefit, Cost, & Effectiveness Curves
[Figure panels: Benefit, Cost. Configuration: 16 MB, 500-cycle memory latency, in-order processors]
![Page 51: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/51.jpg)
Replication Benefit, Cost, & Effectiveness Curves
[Figure panel: Effectiveness. Configuration: 16 MB, 500-cycle memory latency, in-order processors]
![Page 52: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/52.jpg)
Replication Analytic Model
• Utilize workload characterization data
• Goal: intuition, not accuracy
• Optimal point of replication
  – Sensitive to cache size
  – Sensitive to memory latency
![Page 53: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/53.jpg)
Replication Model: Selective Replication
![Page 54: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/54.jpg)
ASR: Memory Cycles
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
[Configuration: 4 MB, 150-cycle memory latency, in-order processors]
![Page 55: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/55.jpg)
ASR: Performance
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
[Configuration: 4 MB, 150-cycle memory latency, in-order processors]
![Page 56: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/56.jpg)
ASR: Memory Cycles
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
[Configuration: 16 MB, 250-cycle memory latency, out-of-order processors]
![Page 57: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/57.jpg)
ASR: Performance
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
[Configuration: 16 MB, 250-cycle memory latency, out-of-order processors]
![Page 58: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/58.jpg)
ASR: Memory Cycles
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
[Configuration: 16 MB, 500-cycle memory latency, out-of-order processors]
![Page 59: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/59.jpg)
ASR: Performance
[Figure legend: S: CMP-Shared, P: CMP-Private, V: SPR-VR, N: SPR-NR, C: SPR-CC, A: SPR-ASR]
[Configuration: 16 MB, 500-cycle memory latency, out-of-order processors]
![Page 60: ASR: Adaptive Selective Replication for CMP Caches](https://reader036.vdocuments.us/reader036/viewer/2022062423/5681452d550346895db1f1b7/html5/thumbnails/60.jpg)
Token Coherence
• Proposed for SMPs [Martin 03], CMPs [Marty 05]
• Provides a simple correctness substrate
  – One token to read
  – All tokens to write
• Advantages
  – Permits a broadcast protocol on unordered network without acknowledgement messages
  – Supports multiple allocation policies
• Disadvantages
  – All blocks must be written back (cannot destroy tokens)
  – Token counts at memory
  – Persistent request can be a performance bottleneck
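The "one token to read, all tokens to write" rule can be illustrated with a toy token counter for a single block. This sketch covers only the counting invariant named on the slide; it leaves out data validity, message handling, and persistent requests, and the `TokenBlock` class and holder names are hypothetical.

```python
class TokenBlock:
    """Token bookkeeping for one cache block (counting rule only)."""

    def __init__(self, total_tokens, holders):
        if sum(holders.values()) != total_tokens:
            raise ValueError("tokens must be conserved")
        self.total = total_tokens
        self.tokens = dict(holders)   # holder name -> token count

    def can_read(self, holder):
        # One token to read.
        return self.tokens.get(holder, 0) >= 1

    def can_write(self, holder):
        # All tokens to write.
        return self.tokens.get(holder, 0) == self.total

    def transfer(self, src, dst, n):
        """Move n tokens; tokens are never created or destroyed."""
        if n < 0 or self.tokens.get(src, 0) < n:
            raise ValueError("source does not hold enough tokens")
        self.tokens[src] -= n
        self.tokens[dst] = self.tokens.get(dst, 0) + n


block = TokenBlock(8, {"memory": 8})
block.transfer("memory", "cpu0", 1)   # cpu0 may now read
block.transfer("memory", "cpu0", 7)   # cpu0 may now write
```

Because `transfer` only moves tokens, an evicting cache has to hand its tokens to some other holder, which is the reason the slide lists "all blocks must be written back" as a disadvantage.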