

Page 1: The Goal: illusion of large, fast, cheap memory


• Fact: Large memories are slow; fast memories are small

• How do we create a memory that is large, cheap and fast (most of the time)?

– Hierarchy

– Parallelism
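Why can a hierarchy be fast "most of the time"? A standard way to quantify it (not stated on this slide, but a well-known formula) is the average memory access time, $\text{AMAT} = t_{\text{hit}} + r_{\text{miss}} \times t_{\text{penalty}}$. For example, with a 1 ns upper level, a 5% miss rate, and a 100 ns lower level, $\text{AMAT} = 1 + 0.05 \times 100 = 6$ ns: nearly the speed of the small memory, with the size of the large one.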

Page 2: An Expanded View of the Memory System

[Figure: the processor (control and datapath) backed by a chain of memories, one behind another. Moving away from the processor — Speed: fastest to slowest; Size: smallest to biggest; Cost: highest to lowest.]

Page 3: Memory Hierarchy: How Does it Work?

• Temporal Locality (Locality in Time):
=> Keep most recently accessed data items closer to the processor

• Spatial Locality (Locality in Space):
=> Move blocks consisting of contiguous words to the upper levels

[Figure: an upper-level memory close to the processor and a lower-level memory behind it. Block Blk X sits in the upper level; block Blk Y is brought up from the lower level on demand, and data flows to and from the processor through the upper level.]
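As an illustration (this code is not from the slides), a simple C loop exhibits both kinds of locality: the accumulator is reused on every iteration (temporal locality), and the array is walked word by word (spatial locality), so a cache that moves whole blocks of contiguous words serves most of these accesses from the upper level.

#include <stdio.h>

int main(void) {
    static int a[1024];   /* contiguous words: spatial locality        */
    long sum = 0;         /* reused every iteration: temporal locality */
    for (int i = 0; i < 1024; i++)
        a[i] = i;
    for (int i = 0; i < 1024; i++)
        sum += a[i];      /* sequential walk hits in the fetched block */
    printf("sum = %ld\n", sum);
    return 0;
}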

Page 4: Memory Hierarchy of a Modern Computer System

• By taking advantage of the principle of locality:

– Present the user with as much memory as is available in the cheapest technology.

– Provide access at the speed offered by the fastest technology.

[Figure: the processor (control, datapath, registers) backed by an on-chip cache, a second-level cache (SRAM), main memory (DRAM), secondary storage (disk), and tertiary storage (disk). Speed (ns) runs from 1s at the registers, through 10s and 100s for the caches and DRAM, to 10,000,000s (10s of ms) for disk and 10,000,000,000s (10s of seconds) for tertiary storage; Size (bytes) grows from 100s through Ks, Ms, and Gs to Ts.]

Page 5: Exploiting Memory Hierarchy

• Users want large and fast memories!

– SRAM access times are 2-25 ns, at a cost of $100 to $250 per MByte.
– DRAM access times are 60-120 ns, at a cost of $5 to $10 per MByte.
– Disk access times are 10 to 20 million ns, at a cost of $0.10 to $0.20 per MByte.
(1997 figures)

• Try and give it to them anyway

– build a memory hierarchy

[Figure: the memory hierarchy drawn as a pyramid. The CPU sits at the top beside Level 1; Level 2 down through Level n lie below, with distance from the CPU in access time increasing and the size of the memory at each level growing toward the bottom.]

Page 6: How is the hierarchy managed?

• Registers <-> Memory
– by compiler (programmer?)

• cache <-> memory
– by the hardware

• memory <-> disks
– by the hardware and operating system (virtual memory)
– by the programmer (files)

Page 7: Hits vs. Misses

• Read hits
– this is what we want!

• Read misses
– stall the CPU, fetch the block from memory, deliver it to the cache, restart

• Write hits:
– can replace the data in cache and memory (write-through)
– write the data only into the cache, and write it back to memory later (write-back); both policies are sketched after this list

• Write misses:
– read the entire block into the cache, then write the word
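A minimal sketch of the two write-hit policies (not from the slides; the cache_line structure and mem_write stub are made up for illustration):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical one-line cache model, for illustration only. */
struct cache_line {
    uint32_t tag;
    uint32_t data;
    bool     valid;
    bool     dirty;            /* used only by the write-back policy */
};

/* Stub standing in for the slower backing store. */
static void mem_write(uint32_t addr, uint32_t val) {
    printf("memory[0x%08X] = %u\n", addr, val);
}

/* Write-through: a write hit updates the cache AND memory. */
static void write_through_hit(struct cache_line *line, uint32_t addr, uint32_t val) {
    line->data = val;
    mem_write(addr, val);      /* memory always stays up to date */
}

/* Write-back: a write hit updates only the cache; the dirty line is
   written back to memory later, when it is evicted. */
static void write_back_hit(struct cache_line *line, uint32_t val) {
    line->data = val;
    line->dirty = true;        /* remember to write back on eviction */
}

int main(void) {
    struct cache_line line = { .tag = 0x1, .valid = true };
    write_through_hit(&line, 0x1000, 42);
    write_back_hit(&line, 43); /* no memory traffic until eviction */
    return 0;
}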

Page 8: Cache

• Two issues:
– How do we know if a data item is in the cache?
– If it is, how do we find it?

• Our first example:
– block size is one word of data
– "direct mapped"

For each item of data at the lower level, there is exactly one location in the cache where it might be; i.e., lots of items at the lower level share locations in the upper level.

Page 9: Direct Mapped Cache

• Mapping: address is modulo the number of blocks in the cache

[Figure: an eight-block direct-mapped cache (indices 000 through 111) in front of a larger memory. The memory addresses shown (00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101) map to cache blocks 001 and 101: each address goes to block (address modulo 8), i.e., its low three bits.]
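A sketch of the mapping rule, using the eight-block cache and the addresses from the figure (the code itself is illustrative):

#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8           /* cache size in blocks (a power of two) */

/* Direct-mapped placement: block address modulo the number of blocks,
   which for a power of two is just the low bits of the address. */
static uint32_t cache_index(uint32_t addr) {
    return addr % NUM_BLOCKS;  /* same as addr & (NUM_BLOCKS - 1) */
}

int main(void) {
    /* The addresses from the figure: 00001, 00101, ..., 11101 in binary. */
    uint32_t addrs[] = {0x01, 0x05, 0x09, 0x0D, 0x11, 0x15, 0x19, 0x1D};
    for (int i = 0; i < 8; i++)
        printf("address %2u -> cache block %u\n", addrs[i], cache_index(addrs[i]));
    return 0;                  /* ...001 goes to block 1, ...101 to block 5 */
}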

Page 10: Direct Mapped Cache

• For MIPS

What kind of locality are we taking advantage of?

[Figure: a direct-mapped cache for 32-bit MIPS addresses (showing bit positions). Bits 31-12 (20 bits) are the tag, bits 11-2 (10 bits) index one of 1024 entries (0 through 1023), and bits 1-0 are the byte offset. Each entry holds a valid bit, a 20-bit tag, and 32 bits of data; Hit is asserted when the indexed entry is valid and its stored tag matches the address tag, and Data returns the 32-bit word.]
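A sketch of that address breakdown in C (the field widths come from the figure; everything else is illustrative):

#include <stdint.h>
#include <stdio.h>

/* 32-bit address, one-word blocks, 1024 entries:
   | tag: bits 31-12 | index: bits 11-2 | byte offset: bits 1-0 | */
int main(void) {
    uint32_t addr   = 0x1234ABCCu;          /* arbitrary example address    */
    uint32_t offset =  addr       & 0x3;    /* 2 bits: byte within the word */
    uint32_t index  = (addr >> 2) & 0x3FF;  /* 10 bits: one of 1024 entries */
    uint32_t tag    =  addr >> 12;          /* 20 bits: compared against the
                                               stored tag to detect a hit   */
    printf("tag=0x%05X index=%u offset=%u\n", tag, index, offset);
    return 0;
}

As for the slide's question: with one-word blocks, this cache exploits temporal locality; spatial locality enters with the multiword blocks on the next page.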

Page 11: Direct Mapped Cache

• Taking advantage of spatial locality:

[Figure: a direct-mapped cache with four-word (128-bit) blocks and 4K entries (showing bit positions). Bits 31-16 (16 bits) are the tag, bits 15-4 (12 bits) are the index, bits 3-2 are the block offset, and bits 1-0 are the byte offset; on a hit, a multiplexor uses the block offset to select one of the four 32-bit words in the block.]
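The same sketch extended with a block offset (field widths from this figure; the block contents are made up):

#include <stdint.h>
#include <stdio.h>

/* | tag: bits 31-16 | index: bits 15-4 | block offset: bits 3-2 | byte: bits 1-0 | */
int main(void) {
    uint32_t addr  = 0xBEEF1238u;           /* arbitrary example address       */
    uint32_t word  = (addr >> 2) & 0x3;     /* 2 bits: which word in the block */
    uint32_t index = (addr >> 4) & 0xFFF;   /* 12 bits: one of 4K entries      */
    uint32_t tag   =  addr >> 16;           /* 16 bits                         */

    uint32_t block[4] = {11, 22, 33, 44};   /* one cached 128-bit (4-word) block */
    printf("tag=0x%04X index=%u -> word %u = %u\n",
           tag, index, word, block[word]);  /* the mux in the figure           */
    return 0;
}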

Page 12: Choosing a block size

• Large block sizes help with spatial locality, but...
– It takes time to read the memory in, so larger block sizes increase the time for misses
– It reduces the number of blocks in the cache (number of blocks = cache size / block size; e.g., a 16 KB cache holds 1024 sixteen-byte blocks but only 256 sixty-four-byte blocks)

• Need to find a middle ground
– 16-64 bytes works nicely

• Use split caches because there is more spatial locality in code

Page 13: Performance

• Simplified model (sketched in code below):

execution time = (execution cycles + stall cycles) × cycle time

stall cycles = # of instructions × miss ratio × miss penalty

• Two ways of improving performance:
– decreasing the miss ratio
– decreasing the miss penalty

What happens if we increase block size?
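A minimal sketch of the simplified model in C (all inputs are made-up numbers):

#include <stdio.h>

int main(void) {
    /* Made-up example inputs for the simplified model. */
    double exec_cycles  = 1e9;     /* base execution cycles           */
    double instructions = 5e8;     /* instruction count               */
    double miss_ratio   = 0.05;    /* fraction of accesses that miss  */
    double miss_penalty = 40.0;    /* stall cycles per miss           */
    double cycle_time   = 1e-9;    /* seconds per cycle (1 GHz)       */

    /* stall cycles = # of instructions x miss ratio x miss penalty   */
    double stall_cycles = instructions * miss_ratio * miss_penalty;

    /* execution time = (execution cycles + stall cycles) x cycle time */
    double exec_time = (exec_cycles + stall_cycles) * cycle_time;

    printf("stall cycles = %.0f, execution time = %.3f s\n",
           stall_cycles, exec_time);   /* 1e9 stalls -> 2.000 s total */
    return 0;
}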

Page 14: Decreasing miss ratio with associativity

[Figure: four organizations of an eight-block cache. One-way set associative (direct mapped): eight blocks (0-7), each a single tag/data entry. Two-way set associative: four sets (0-3) with two tag/data entries each. Four-way set associative: two sets (0-1) with four entries each. Eight-way set associative (fully associative): one set of eight tag/data entries.]
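A sketch of an N-way lookup (the structure names are illustrative, matching the two-way organization in the figure): the index selects a set, and every way in the set is tag-checked — in parallel in hardware, sequentially here.

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 4     /* two-way organization: 8 blocks / 2 ways */
#define NUM_WAYS 2

struct way { uint32_t tag; uint32_t data; bool valid; };
static struct way cache[NUM_SETS][NUM_WAYS];

/* Look up a block address: the index picks the set, then each way's
   tag is compared against the address tag to detect a hit. */
bool lookup(uint32_t block_addr, uint32_t *data_out) {
    uint32_t set = block_addr % NUM_SETS;
    uint32_t tag = block_addr / NUM_SETS;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            *data_out = cache[set][w].data;   /* hit */
            return true;
        }
    }
    return false;                             /* miss */
}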

Page 15: Fully Associative vs. Direct Mapped

• Fully associative caches provide much greater flexibility
– Nothing gets “thrown out” of the cache until it is completely full

• Direct-mapped caches are more rigid
– Any cached data goes directly where the index says to, even if the rest of the cache is empty

• A problem, though...
– Fully associative caches require a complete search through all the tags to see if there’s a hit
– Direct-mapped caches only need to look in one place

Page 16: A Compromise

2-way set associative
– Address = Tag | Index | Block offset
– Each address has two possible locations with the same index
– One fewer index bit: 1/2 the indexes
[Figure: eight indexes (0: through 7:), each holding two V/Tag/Data entries.]

4-way set associative
– Address = Tag | Index | Block offset
– Each address has four possible locations with the same index
– Two fewer index bits: 1/4 the indexes
[Figure: four indexes (0: through 3:), each holding four V/Tag/Data entries.]
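A worked sketch of how associativity trades index bits for tag bits, assuming the eight-block cache from the earlier figures (the code is illustrative):

#include <stdio.h>

/* Integer log2 for powers of two. */
static int log2i(int x) { int b = 0; while (x >>= 1) b++; return b; }

int main(void) {
    int blocks = 8;                     /* total cache blocks (assumed) */
    for (int ways = 1; ways <= 8; ways *= 2) {
        int sets = blocks / ways;       /* number of sets = blocks / ways */
        int index_bits = log2i(sets);   /* each doubling of ways drops one index bit */
        printf("%d-way: %d sets, %d index bits\n", ways, sets, index_bits);
    }
    return 0;  /* 1-way: 8 sets, 3 bits ... 8-way: 1 set, 0 bits (fully associative) */
}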