
Page 1: Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers

Norman P. Jouppi

Presenter: Shrinivas Narayani

Page 2:

Contents

• Cache Basics
• Types of Cache Misses
• Cost of Cache Misses
• How to Remove Cache Misses
  – Larger Block Size
  – Adding Associativity (Reducing Conflict Misses)
    • Miss Cache
    • Victim Cache: an improvement over the miss cache
  – Removing Capacity and Compulsory Misses
    • Prefetch Techniques
    • Stream Buffers
• Conclusion

Page 3:

• Mapping: cache index = (block address) modulo (number of cache blocks in the cache)

• The cache is indexed using the low-order bits of the block address.

  e.g., memory addresses 00001 and 11101 map to cache locations 001 and 101.

• The stored block is identified by its tag (the high-order bits of the address).
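As a quick illustration, the index/tag split above can be sketched in Python (the function name and block count are my own, chosen to match the slide's 3-bit index example):

```python
NUM_BLOCKS = 8  # 2^3 cache blocks -> 3 index bits

def index_and_tag(block_address):
    index = block_address % NUM_BLOCKS   # low-order bits select the cache location
    tag = block_address // NUM_BLOCKS    # high-order bits identify the block
    return index, tag

# Addresses 00001 and 11101 map to cache locations 001 and 101:
print(index_and_tag(0b00001))  # (1, 0)
print(index_and_tag(0b11101))  # (5, 3)
```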

Page 4:

Direct-Mapped Cache (figure): cache locations 000–111; of the example memory addresses 00001, 00101, 01001, 01101, and 10001, those ending in 001 map to location 001 and those ending in 101 map to location 101.

Page 5:

Cache Terminology

• Cache Hit: the requested block is found in the cache.

• Cache Miss: the requested block is not in the cache.

• Miss Penalty: the time to replace a block in the upper level with the corresponding block from the lower level.

Page 6:

In a direct-mapped cache there is only one place to put a newly requested item, and hence only one choice of what to replace.

Page 7:

Types of Misses

– Compulsory: the first access to a block can never hit, so the block must be brought into the cache. Also called cold-start misses or first-reference misses. (These occur even in an infinite cache.)

– Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur as blocks are discarded and later retrieved. (Misses due to cache size.)

– Conflict: if the block-placement strategy is set-associative or direct-mapped, conflict misses (in addition to compulsory and capacity misses) occur because a block can be discarded and later retrieved when too many blocks map to its set. Also called collision misses or interference misses. (Misses beyond those of a fully-associative cache of the same size.)

– Coherence: the result of invalidations performed to preserve multiprocessor cache consistency.

Page 8:

Conflict misses account for between 20% and 40% of all direct-mapped cache misses.

Page 9:

Cost of Cache Misses

• Cycle time has been decreasing much faster than memory access time.

• The average number of machine cycles per instruction has also been decreasing dramatically. Together, these two effects multiply the relative cost of a cache miss.

• E.g., a cache miss on the VAX 11/780 cost only 60% of the average instruction execution time, so even if every instruction took a cache miss, machine performance would degrade by only about 60%.
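A back-of-the-envelope check of the VAX 11/780 figure (illustrative, normalized numbers of my own):

```python
avg_instr_time = 1.0                      # normalized average instruction execution time
miss_cost = 0.6 * avg_instr_time          # a miss costs ~60% of an average instruction

# Worst case: every instruction misses, so each instruction also pays the miss cost.
slowdown = (avg_instr_time + miss_cost) / avg_instr_time
print(slowdown)  # 1.6, i.e. at most a 60% slowdown
```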

Page 10:

How to Reduce Cache Misses

• Increase block size
• Increase associativity
• Use a victim cache
• Use a pseudo-associative cache
• Hardware prefetching
• Compiler-controlled prefetching
• Compiler optimizations

Page 11:

Increasing Block Size

• One way to reduce the miss rate is to increase the block size.

  – Reduces compulsory misses. Why? Larger blocks take advantage of spatial locality.

• However, larger blocks have disadvantages:

  – May increase the miss penalty (more data to fetch on each miss).

  – May increase hit time (more data to read from the cache, and a larger mux).

  – May increase conflict and capacity misses (fewer blocks fit in a cache of the same size).

Page 12:

Adding Associativity

(Figure: a direct-mapped cache backed by a small fully-associative miss cache. Each miss-cache entry holds a tag and comparator plus one cache line of data; entries are ordered from the MRU entry to the LRU entry. Data paths run from the processor, to the processor, and from the next lower cache.)

• When a miss occurs, the data is returned to both the direct-mapped cache and the miss cache.

• On each access, the upper (direct-mapped) cache and the miss cache are probed in parallel.
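The miss-cache behavior described on this slide might be sketched as follows (a simplified software model; class and method names are my own, and real hardware probes both caches in the same cycle):

```python
from collections import OrderedDict

class MissCache:
    """Tiny LRU model of a small fully-associative miss cache (illustrative)."""

    def __init__(self, entries=4):
        self.entries = entries
        self.lines = OrderedDict()  # tag -> data, ordered LRU -> MRU

    def probe(self, tag):
        # Probed alongside the direct-mapped cache on every access.
        if tag in self.lines:
            self.lines.move_to_end(tag)  # promote to MRU
            return self.lines[tag]
        return None  # miss in the miss cache too

    def fill(self, tag, data):
        # On a miss, the incoming line goes to BOTH the direct-mapped
        # cache (not modeled here) and the miss cache.
        if tag in self.lines:
            self.lines.move_to_end(tag)
            self.lines[tag] = data
            return
        if len(self.lines) >= self.entries:
            self.lines.popitem(last=False)  # evict the LRU entry
        self.lines[tag] = data
```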

Page 13:

Performance of Miss cache

• Replaces a long off-chip miss penalty with a short one-cycle on-chip miss.

• Data conflict misses are removed more effectively than instruction conflict misses.

Page 14:

Disadvantage of Miss Cache

Storage space in the miss cache is wasted because its lines duplicate data that is also present in the direct-mapped cache.

Page 15:

Victim Cache

• An improvement over the miss cache.

• Loads the victim line (the line just evicted from the direct-mapped cache) instead of the requested line.

• On a direct-mapped miss that hits in the victim cache, the contents of the direct-mapped cache line and the victim cache line are swapped.
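A minimal software sketch of the victim-cache swap (illustrative names of my own; the real mechanism is hardware and probes both structures in parallel):

```python
from collections import OrderedDict

class VictimCacheSim:
    """Direct-mapped cache plus a small fully-associative victim cache (illustrative)."""

    def __init__(self, dm_blocks=8, victim_entries=4):
        self.dm_blocks = dm_blocks
        self.victim_entries = victim_entries
        self.dm = {}                 # index -> (tag, data)
        self.victim = OrderedDict()  # (tag, index) -> data, ordered LRU -> MRU

    def access(self, block_address):
        index = block_address % self.dm_blocks
        tag = block_address // self.dm_blocks
        line = self.dm.get(index)
        if line and line[0] == tag:
            return "dm_hit"
        if (tag, index) in self.victim:
            # Victim-cache hit: swap the victim line with the DM line.
            data = self.victim.pop((tag, index))
            if line:
                self.victim[(line[0], index)] = line[1]  # displaced DM line
            self.dm[index] = (tag, data)
            return "victim_hit"
        # Miss everywhere: the line EVICTED from the DM cache (the victim),
        # not the requested line, is what enters the victim cache.
        if line:
            if len(self.victim) >= self.victim_entries:
                self.victim.popitem(last=False)  # evict LRU victim entry
            self.victim[(line[0], index)] = line[1]
        self.dm[index] = (tag, "data@%d" % block_address)
        return "miss"
```

For example, two addresses that conflict on the same direct-mapped index can then ping-pong between the DM cache and the victim cache instead of missing all the way to memory.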

Page 16:

The Effect of DM Cache Size on Victim Cache Performance

• As the direct-mapped cache size increases, the likelihood that a conflict miss can be removed by the victim cache decreases.

Page 17:

Reducing Capacity and Compulsory Misses

Use prefetch techniques:

1. Prefetch always

2. Prefetch on miss

3. Tagged prefetch

Page 18:

• Prefetch always prefetches the next line after every reference.

• Prefetch on miss fetches the next line only after a miss.

• In tagged prefetch, each block has a tag bit associated with it.

• When a block is prefetched its tag bit is set to zero; the bit is set to one when the block is first used.

• This zero-to-one transition triggers a prefetch of the next line.
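The tagged-prefetch policy can be modeled roughly as below (a software sketch with hypothetical names; the real tag bit is one bit of hardware state per cache block):

```python
class TaggedPrefetchCache:
    """Illustrative model of tagged prefetch: one tag bit per block."""

    def __init__(self):
        self.blocks = {}   # address -> tag bit (0 = prefetched, unused; 1 = used)
        self.fetches = []  # log of lines brought in from below

    def _fetch(self, addr, tag_bit):
        self.blocks[addr] = tag_bit
        self.fetches.append(addr)

    def reference(self, addr):
        if addr not in self.blocks:
            self._fetch(addr, 1)      # demand miss: fetched and immediately used
            self._fetch(addr + 1, 0)  # prefetch the next line with tag bit 0
        elif self.blocks[addr] == 0:
            self.blocks[addr] = 1     # first use: tag bit 0 -> 1 transition...
            self._fetch(addr + 1, 0)  # ...triggers a prefetch of the next line
        # tag bit already 1: an ordinary hit, no prefetch
```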

Page 19:

Stream Buffers

• Start the prefetch before a tag transition can take place.

Page 20:

• A stream buffer consists of a series of entries, each holding a tag, an available bit, and a data line.

• On a miss, it fetches successive lines starting at the miss target.

• Lines after the requested line are held in the buffer, which avoids polluting the cache with data that may never be needed.
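A simplified model of a single stream buffer (illustrative; the per-entry tag and available-bit bookkeeping is collapsed into plain line addresses):

```python
from collections import deque

class StreamBuffer:
    """Illustrative FIFO stream buffer of prefetched line addresses."""

    def __init__(self, depth=4):
        self.depth = depth
        self.buf = deque()     # prefetched line addresses, head = next expected
        self.next_addr = None  # next line to prefetch

    def _refill(self):
        # Keep prefetching successive lines until the buffer is full.
        while len(self.buf) < self.depth:
            self.buf.append(self.next_addr)
            self.next_addr += 1

    def on_cache_miss(self, addr):
        # Head hit: move the line into the cache and prefetch one more.
        if self.buf and self.buf[0] == addr:
            self.buf.popleft()
            self._refill()
            return "stream_hit"
        # Non-sequential miss: flush and restart at the miss target + 1.
        self.buf.clear()
        self.next_addr = addr + 1
        self._refill()
        return "stream_miss"
```

On a sequential access pattern this services every miss after the first from the buffer, which is the case stream buffers are built for.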

Page 21:

Multi-Way Stream Buffers

▪ A single stream buffer removes only about 25% of data cache misses, because data accesses interleave streams from different sources.

▪ Four stream buffers are run in parallel.

▪ Instruction stream performance is unchanged.

▪ About twice the performance of a single stream buffer on data.

Page 22:

Stream Buffers vs. Prefetch

• Feasible to implement.

• Lower latency.

• The extra hardware required by stream buffers is comparable to the additional tag storage required by tagged prefetch.

Page 23:

Stream Buffer Performance vs. Cache Size

• Only the data stream buffer's performance improves as cache size increases.

• A larger cache can hold data for reference patterns that access several sets of data.

Page 24:

Page 25:

Conclusion

• The miss cache is beneficial in removing data cache misses, particularly conflict misses.

• The victim cache improves on the miss cache by saving the victim of a cache miss instead of the target.

• Stream buffers reduce capacity and compulsory misses.

• Multi-way stream buffers are a set of stream buffers that can prefetch down several streams concurrently.

Page 26:

References

• Jouppi, Norman P. "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers."

• Patterson, D. and Hennessy, J. Computer Organization and Design.