Memory Hierarchy
How to improve memory access
Outline
• Locality
• Structure of memory hierarchy
• Cache
• Virtual memory
Locality
• Principle of locality
– Programs access a relatively small portion of their address space at any instant of time.
• Temporal locality
– If an item is referenced, it tends to be referenced again soon.
• Spatial locality
– If an item is referenced, items nearby tend to be referenced soon.
Memory Hierarchy
• Multiple levels of memory with different speeds and sizes.
• Give users the perception that the memory is as large as the largest level and as fast as the fastest level.
• The unit of memory considered in a memory hierarchy is a block.
[Figure: CPU registers, SRAM, DRAM, and magnetic disk arranged as a hierarchy]
Structure of memory hierarchy
[Figure: CPU registers → SRAM → DRAM → magnetic disk; going down the hierarchy, size increases while speed and cost per bit decrease]
Structure of memory hierarchy

Memory type          Access time    Cost per bit
Registers            ~0.2 ns
SRAM: Static RAM     0.5 – 5 ns     $4,000 – $10,000
DRAM: Dynamic RAM    50 – 70 ns     $100 – $200
Magnetic Disk        5 – 20 ms      $0.5 – $2
Cache
• A level of the memory hierarchy between the CPU and main memory.
[Figure: registers → cache → memory → disk; the illusion presented at each level: "Everything you need is in a register", "Everything you need is in cache", "Everything you need is in memory"]
How to improve memory access time
[Figure: blocks A–D held in the CPU registers and cache; blocks a–h spread across memory and disk; blocks move up the hierarchy as the CPU requests them]
Address Space
Suppose
• 1 block = 256 bytes = 2^8 bytes
• the cache has 8 blocks
• the memory has 32 blocks
• the disk has 64 blocks.
Then,
• the cache has 8 × 2^8 = 2^11 bytes
• the memory has 32 × 2^8 = 2^13 bytes
• the disk has 64 × 2^8 = 2^14 bytes.
• For the cache, a block number has 3 bits, and an address has 11 bits.
• For the memory, a block number has 5 bits, and an address has 13 bits.
• For the disk, a block number has 6 bits, and an address has 14 bits.
[Figure: an 11-, 13-, or 14-bit address selects one byte (8 bits) of data in the cache, memory, or disk, respectively]
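The arithmetic above can be checked with a short sketch (the sizes are the example's: 256-byte blocks; 8, 32, and 64 blocks):

```python
BLOCK_BYTES = 2 ** 8  # 256 bytes per block, as in the example

for name, blocks in [("cache", 8), ("memory", 32), ("disk", 64)]:
    total = blocks * BLOCK_BYTES
    block_bits = blocks.bit_length() - 1  # log2, exact for powers of two
    addr_bits = total.bit_length() - 1
    print(f"{name}: {total} bytes = 2^{addr_bits}; "
          f"{block_bits}-bit block number, {addr_bits}-bit address")
```

Running it reproduces the bit widths on the slide: 3/11 bits for the cache, 5/13 for memory, and 6/14 for the disk.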
Address Space
[Figure: cache block numbers 000–111 (8 blocks), memory block numbers 00000–11111 (32 blocks), disk block numbers 000000–111111 (64 blocks)]
Address: block number || offset in block
• Address in cache:  xxx || xxxxxxxx
• Address in memory: xxxxx || xxxxxxxx
• Address in disk:   xxxxxx || xxxxxxxx
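This split can be sketched with bit operations (a minimal sketch, using the example's 8-bit block offset):

```python
OFFSET_BITS = 8  # 256-byte blocks, as in the example

def split_address(addr):
    """Split an address into (block number, offset in block)."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)

# A 13-bit memory address: 5-bit block number 10110, 8-bit offset 01110001
block, offset = split_address(0b10110_01110001)
```

The high-order bits name the block; the low-order bits locate the byte within it.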
Hit / Miss
Hit
• The requested data is found in the upper level of the hierarchy.
Hit rate (or hit ratio)
• The fraction of memory accesses found in the upper level.
Hit time
• The time to access data when it hits (= time to check whether the data is in the upper level + access time).
Miss
• The requested data is not found in the upper level, but is in the lower level, of the hierarchy.
Miss rate (or miss ratio)
• 1 – hit rate
Miss penalty
• The time to get a block of data into the upper level, and then into the CPU.
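These quantities combine into the standard average memory access time formula (not stated on the slide, but it follows directly from the definitions): every access pays the hit time, and the fraction of accesses that miss additionally pays the miss penalty.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers (assumptions): 1 ns hit time, 5% miss rate,
# 100 ns miss penalty -> about 6 ns on average.
avg = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0)
```

The formula makes explicit why reducing the miss rate or the miss penalty improves overall memory performance.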
Cache
• A level of the memory hierarchy between the CPU and main memory.
• To access data in the memory hierarchy
– CPU requests data from the cache.
– Check if the data is in the cache.
• Cache hit
– Transfer the requested data from cache to CPU.
• Cache miss
– Transfer a block containing the requested data from memory to cache.
– Transfer the requested data from cache to CPU.
How cache works
[Figure: the CPU requests A, B, C, D, E, F in turn; each first request misses and its block is brought from memory (A B C D E F) into the cache; a repeated request hits; once the cache is full, a block must be replaced]
Where to place a block in cache
Direct-mapped cache
• Each memory location is mapped to exactly one location in the cache.
(But one cache location can be mapped to different memory locations at different times.)
• Other mappings can be used.
[Figure: cache-memory mapping; memory blocks b0, b1, b2, … map onto cache blocks c0–c3]
Direct-mapped cache
[Figure: memory addresses 000000–111111 in 4-byte blocks; each memory block maps to exactly one of the four cache slots 00–11, selected by the low two bits of its block number]
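The direct mapping in the figure reduces to a modulo: a block's only possible slot is its block number modulo the number of cache blocks (a sketch, using the figure's four slots):

```python
NUM_CACHE_BLOCKS = 4  # cache slots 00..11, as in the figure

def cache_slot(block_number):
    """Direct-mapped placement: block number mod number of cache blocks,
    i.e. the low-order bits of the block number."""
    return block_number % NUM_CACHE_BLOCKS

# Memory blocks 0, 4, 8, ... all compete for slot 0; 1, 5, 9, ... for slot 1.
slots = [cache_slot(b) for b in range(8)]
```

Because many memory blocks share one slot, the tag (below) is needed to tell which one is currently resident.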
Fully-associative cache
[Figure: the same memory (4-byte blocks); any memory block can be placed in any of the four cache slots 00–11]
Set-associative cache
[Figure: cache slots 000–111 grouped into sets; each memory block maps to exactly one set but may occupy any slot within that set]
Determine if a block is in the cache
• For each block in the cache
– Valid bit
• Indicates that the block contains valid data.
– Tag
• Contains the information identifying the associated block in memory.
• Example:
– If the valid bit is false, no block from memory is stored in that block of the cache.
– If the valid bit is true, the address of the data stored in the block is stored in the tag.
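A lookup using the valid bit and tag can be sketched as follows (the 2-bit index and 2-bit offset widths are illustrative assumptions, not the slide's parameters):

```python
# Address layout (assumed): tag | 2-bit cache index | 2-bit offset in block
INDEX_BITS, OFFSET_BITS = 2, 2

class Line:
    def __init__(self):
        self.valid = False  # no memory block stored here yet
        self.tag = 0

cache = [Line() for _ in range(1 << INDEX_BITS)]

def access(addr):
    """Return True on a hit; on a miss, install the block's tag and set
    the valid bit (the data transfer itself is not modeled)."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    line = cache[index]
    if line.valid and line.tag == tag:
        return True   # valid AND tags match
    line.valid, line.tag = True, tag
    return False

access(0b110100)  # miss: the valid bit was still 0
access(0b110101)  # hit: same block, different offset within it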
Example: direct-mapped
[Figure: memory blocks 000000–111111; each of the four cache slots has a valid bit and a tag holding the high bits of the resident memory block number; valid bits 1 1 0 1, tags 01 11 11 00]
Example: Fully-associative
[Figure: all valid bits are 1; each tag holds a full memory block number: 1101, 0101, 1110, 0110]
Example: set-associative cache
[Figure: eight cache slots 000–111 grouped into sets; valid bits 1 1 1 0 0 0 1 0, tags 11 01 11 00 01 11 00 00]
Access a direct-mapped cache

Cache index   Valid bit   Tag
000           1           00111
001           1           10011
010           0           11000
…             …           …
111           1           01101

[Figure: the memory address 1001100100110 splits into tag 10011, cache index 001, and offset 00110; the index selects one cache entry, the stored tag is compared (=) with the address tag, and hit = (tags equal) AND (valid bit = 1); the cache address is the index concatenated with the offset]
Access a fully-associative cache

Cache index   Valid bit   Tag
000           1           00111000
001           1           10011001
010           0           11000111
…             …           …
111           1           01101100

[Figure: the tag of the memory address 1001100100110 is compared in parallel (AND-ed with each valid bit) against every entry's tag; entry 001 matches, giving cache address 00100110]
Access a set-associative cache

Cache index   Valid bit   Tag     Valid bit   Tag
000           0           10001   1           11000
001           1           11001   1           00001
010           1           11010   0           11000
…             …           …       …           …
111           1           01111   1           01111

[Figure: the index selects one set; within the set, the address tag is compared in parallel with each way's tag, each comparison AND-ed with that way's valid bit, to form the cache address on a hit]
Access a set-associative cache

Cache index   Valid bit   Tag     Valid bit   Tag
000           1           00000   1           01000
001           1           11001   1           00001
010           1           11010   0           10010
…             …           …       …           …
111           0           01101   0           01101

[Figure: the memory address is split and checked against both ways of the selected set; each way produces a hit signal (hit0, hit1) = (tag match) AND (valid bit)]
Block size vs. Miss rate
[Figure: miss rate as a function of block size]
Handling Cache Misses
• If an instruction is not in the cache, we have to wait for the memory to respond and write the data into the cache (multiple cycles).
• This causes a processor stall.
• Steps to handle a miss
– Send PC − 4 to memory (the PC has already been incremented, so the missing instruction's address is PC − 4).
– Read from memory to cache and wait for the result.
– Update the cache information (tag + valid bit).
– Restart the instruction execution.
Handling Writes
Write-through
• When data is written, both the cache and the memory are updated.
• Keeps the copies in cache and memory consistent.
• Slow, because writing to memory is slower.
• Improve by using a write buffer that stores data waiting to be written to memory; the processor can then continue execution.

Write-back
• When data is written, only the cache is updated.
• Memory is inconsistent with the cache.
• Faster.
• But once a block is removed from the cache, it must be written back to memory.
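The write-back behavior can be sketched with a dirty bit (a hypothetical model; real caches track this per block in hardware):

```python
# Write-back sketch: writes go to the cache only and set a dirty bit;
# memory sees the new value when the block is removed from the cache.
memory = {7: "old value"}  # block number -> contents
cache = {}                 # block number -> (contents, dirty bit)

def write(block, data):
    cache[block] = (data, True)  # update cache only; mark the block dirty

def evict(block):
    data, dirty = cache.pop(block)
    if dirty:
        memory[block] = data     # write the dirty block back on removal

write(7, "new value")
# memory[7] is still "old value" here: cache and memory are inconsistent
evict(7)
# only now does memory[7] hold "new value"
```

Under write-through, the `write` above would update `memory` immediately, trading speed for consistency.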
Performance Improvement
• Increase hit rate / reduce miss rate
– Increase the cache size
– Choose a good block size
– Good cache associativity
– Good replacement policy
• Reduce cache access time
– Multilevel cache
Multilevel Cache
[Figure: CPU → L1 cache → L2 cache → Memory]

Processor                 L1 cache   L2 cache
Pentium                   16 KB
Pentium Pro               16 KB      256/512 KB
Pentium MMX               32 KB
Pentium II and III        32 KB
Celeron                   32 KB      128 KB
Pentium III Cumine        32 KB      256 KB
AMD K6 and K6-2           64 KB
AMD K6-3                  64 KB      256 KB
AMD K7 Athlon             128 KB
AMD Duron                 128 KB     64 KB
AMD Athlon Thunderbird    128 KB     256 KB
Virtual Memory
• Similar to cache
– Based on the principle of locality.
– Memory is divided into equal blocks called pages.
– If a requested page is not found in memory, a page fault occurs.
• Allows efficient and safe sharing of memory among multiple programs
– Each program has its own address space.
• Virtually extends the memory size
– A program can be larger than the memory.
Virtual Memory
[Figure: programs A, B, and C each have their own virtual address space; address translation maps virtual addresses to physical addresses in main memory, with swap space on disk]
Virtual Memory
[Figure: program A's virtual address space mapped onto main memory]
• Virtual address space can be larger than physical address space.
Address Calculation
• A virtual address = virtual page number || page offset.
• A physical address = physical page number || page offset.
• Address translation (via the page table) maps the virtual page number to a physical page number; the page offset is unchanged.
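The translation step can be sketched as follows (the 4 KB page size and the mapping values are illustrative assumptions; the slide leaves them open):

```python
PAGE_OFFSET_BITS = 12  # assume 4 KB pages for illustration
page_table = {0x0: 0xA, 0x1: 0x7}  # virtual page number -> physical page number

def translate(vaddr):
    """Replace the virtual page number with the physical page number;
    the page offset passes through unchanged."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    ppn = page_table[vpn]  # a missing entry would mean a page fault
    return (ppn << PAGE_OFFSET_BITS) | offset

paddr = translate(0x1ABC)  # vpn 0x1 maps to ppn 0x7; offset 0xABC unchanged
```

Only the page-number bits change; the low-order offset bits are identical in the virtual and physical addresses.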
Page Table
[Figure: the page table register points to the page table; the virtual page number indexes the table, whose entry holds a valid bit and a physical page number; the physical page number is concatenated with the page offset to form the physical address]

Virtual page number   Valid bit   Physical page number
0000…000              1
0000…001              1
…                     …
0011…110              0
1111…111              1
Page fault
• When the valid bit of the requested page = 0, a page fault occurs.
• Handling a page fault
– Get the requested page from disk (using information in the page table).
– Find an available page frame in memory.
• If there is one, put the requested page in and update its entry in the page table.
• If there is none, find a page to be replaced (according to the page replacement policy), replace it, and update both entries in the page table.
Page Replacement
• Page replacement policy
– Least recently used (LRU): replace the page that has not been used for the longest time.
• Updating data in the virtual memory
– If the replaced page was changed (written to), the page must be updated in the virtual memory.
– Write-back is more efficient here than write-through.
– If the replaced page was not changed, no virtual memory update is necessary.
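LRU replacement can be sketched with an ordered mapping (a minimal model; the three-frame memory and the reference sequence are illustrative assumptions):

```python
from collections import OrderedDict

NUM_FRAMES = 3             # assume memory holds three pages
frames = OrderedDict()     # page -> contents, least recently used first

def reference(page):
    """Reference a page, evicting the LRU page on a fault when full."""
    if page in frames:
        frames.move_to_end(page)       # this page is now most recently used
        return "hit"
    if len(frames) >= NUM_FRAMES:
        frames.popitem(last=False)     # evict the least recently used page
    frames[page] = None                # load the page (contents not modeled)
    return "fault"

results = [reference(p) for p in (1, 2, 3, 1, 4)]
# Referencing 4 evicts page 2, not page 1: 1 was used more recently.
```

The ordering of the dictionary plays the role of the use/reference information a real system keeps per page.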
Other information in page tables
• Use/reference bit
– Used for the LRU policy.
• Dirty bit
– Used for updating the virtual memory.
Translation-lookaside buffer (TLB)
• A cache that stores recently used page table entries for efficiency.
• When the operating system switches from process A to process B (called a context switch), A's page table entries must be replaced by B's page table entries in the TLB.
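The TLB's role can be sketched as a small table in front of the page table (the sizes, mapping, and FIFO eviction are illustrative assumptions; real TLBs are hardware structures):

```python
page_table = {vpn: vpn + 100 for vpn in range(64)}  # made-up translations
tlb = {}
TLB_SIZE = 4  # a TLB holds only a handful of recent entries

def lookup(vpn):
    """Return the physical page number, trying the TLB before the table."""
    if vpn in tlb:
        return tlb[vpn]              # TLB hit: no page-table access needed
    ppn = page_table[vpn]            # TLB miss: read the page table
    if len(tlb) >= TLB_SIZE:
        tlb.pop(next(iter(tlb)))     # make room (simple FIFO eviction)
    tlb[vpn] = ppn
    return ppn

def context_switch():
    tlb.clear()  # the old process's translations are no longer valid

lookup(5)         # miss: fills the TLB from the page table
lookup(5)         # hit: answered from the TLB
context_switch()  # the TLB must be refilled for the new process
```

This is why frequent context switches hurt performance: each one discards the accumulated translations.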
[Figure: processes A, B, and C each have a swap space on disk and parts resident in memory; each has its own page table; the TLB holds the currently used part of the page table, and the cache holds the currently used data and program]
Three C’s
Effects of the three C’sCompulsory misses are too small to be seen in this graph.
One-way set associativity
two-way set
associativity
Four and eight-way set associativity
Design factors