cs 152 computer architecture and engineering recap: snoopy cache protocols cache...

13
CS 152 Computer Architecture and Engineering Lecture 21: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.cs.berkeley.edu/~cs152 4/29/2008 2 CS152-Spring!08 Recap: Snoopy Cache Protocols Use snoopy mechanism to keep all processors’ view of memory coherent M 1 M 2 M 3 Snoopy Cache DMA Physical Memory Memory Bus Snoopy Cache Snoopy Cache DISKS

Upload: others

Post on 19-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

CS 152 Computer Architecture

and Engineering

Lecture 21: Directory-Based

Cache Protocols

Krste AsanovicElectrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~krste

http://inst.cs.berkeley.edu/~cs152

4/29/2008 2CS152-Spring!08

Recap: Snoopy Cache Protocols

Use snoopy mechanism to keep all processors’view of memory coherent

M1

M2

M3

Snoopy Cache

DMA

Physical Memory

Memory Bus

Snoopy Cache

Snoopy Cache

DISKS

Page 2: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 3CS152-Spring!08

Recap: MESI: An Enhanced MSI protocol increased performance for private data

M E

S I

M: Modified ExclusiveE: Exclusive, unmodifiedS: Shared I: Invalid

Each cache line has a tag

Address tag

state bits

Write miss

Other processorintent to write

Read miss,shared

Other processorintent to write

P1 write

Read by any processor

Other processor reads

P1 writes back

P1 readP1 writeor read

Cache state inprocessor P1

P 1 in

tent

to w

rite

Read miss,not shared

4/29/2008 4CS152-Spring!08

Performance of Symmetric Shared-MemoryMultiprocessors

Cache performance is combination of:

1. Uniprocessor cache miss traffic

2. Traffic caused by communication– Results in invalidations and subsequent cache misses

• Adds 4th C: coherence miss

– Joins Compulsory, Capacity, Conflict

– (Sometimes called a Communication miss)

Page 3: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 5CS152-Spring!08

Coherency Misses

1. True sharing misses arise from the communicationof data through the cache coherence mechanism• Invalidates due to 1st write to shared block

• Reads by another CPU of modified block in different cache

• Miss would still occur if block size were 1 word

2. False sharing misses when a block is invalidatedbecause some word in the block, other than the onebeing read, is written into• Invalidation does not cause a new value to be communicated, but

only causes an extra cache miss

• Block is shared, but no word in block is actually shared ! miss would not occur if block size were 1 word

4/29/2008 6CS152-Spring!08

Example: True v. False Sharing v.Hit?

Read x25

Write x24

Write x13

Read x22

Write x11

True, False, Hit? Why?P2P1Time

• Assume x1 and x2 in same cache block.

P1 and P2 both read x1 and x2 before.

True miss; invalidate x1 in P2

False miss; x1 irrelevant to P2

False miss; x1 irrelevant to P2

False miss; x1 irrelevant to P2

True miss; invalidate x2 in P1

Page 4: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 7CS152-Spring!08

MP Performance 4 ProcessorCommercial Workload: OLTP, DecisionSupport (Database), Search Engine

0

0.25

0.5

0.75

1

1.25

1.5

1.75

2

2.25

2.5

2.75

3

3.25

1 MB 2 MB 4 MB 8 MB

Cache size

Me

mo

ry c

yc

les

pe

r in

str

uc

tio

n

Instruction

Capacity/Conflict

Cold

False Sharing

True Sharing

• True sharing and

false sharing

unchanged going

from 1 MB to 8

MB (L3 cache)

• Uniprocessor

cache misses

improve with

cache size

increase(Instruction,

Capacity/Conflict,

Compulsory)

4/29/2008 8CS152-Spring!08

MP Performance 2MB CacheCommercial Workload: OLTP, DecisionSupport (Database), Search Engine

• True

sharing,

false sharing

increase

going from 1

to 8 CPUs

0

0.5

1

1.5

2

2.5

3

1 2 4 6 8

Processor count

Me

mo

ry c

yc

les

pe

r in

str

uc

tio

n

Instruction

Conflict/Capacity

Cold

False Sharing

True Sharing

Page 5: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 9CS152-Spring!08

A Cache Coherent System Must:

• Provide set of states, state transition diagram, andactions

• Manage coherence protocol– (0) Determine when to invoke coherence protocol

– (a) Find info about state of block in other caches to determineaction

» whether need to communicate with other cached copies

– (b) Locate the other copies

– (c) Communicate with those copies (invalidate/update)

• (0) is done the same way on all systems– state of the line is maintained in the cache

– protocol is invoked if an “access fault” occurs on the line

• Different approaches distinguished by (a) to (c)

4/29/2008 10CS152-Spring!08

Bus-based Coherence

• All of (a), (b), (c) done through broadcast on bus– faulting processor sends out a “search”

– others respond to the search probe and take necessary action

• Could do it in scalable network too– broadcast to all processors, and let them respond

• Conceptually simple, but broadcast doesn’t scale withnumber of processors, P

– on bus, bus bandwidth doesn’t scale

– on scalable network, every fault leads to at least P networktransactions

• Scalable coherence:– can have same cache states and state transition diagram

– different mechanisms to manage protocol

Page 6: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 11CS152-Spring!08

Scalable Approach: Directories

• Every memory block has associated directoryinformation

– keeps track of copies of cached blocks and their states

– on a miss, find directory entry, look it up, and communicate onlywith the nodes that have copies if necessary

– in scalable networks, communication with directory and copies isthrough network transactions

• Many alternatives for organizing directory information

4/29/2008 12CS152-Spring!08

Basic Operation of Directory

• k processors.

• With each cache-block in memory:k presence-bits, 1 dirty-bit

• With each cache-block in cache:1 valid bit, and 1 dirty (owner) bit• ••

P P

Cache Cache

Memory Directory

presence bits dirty bit

Interconnection Network

• Read from main memory by processor i:

• If dirty-bit OFF then { read from main memory; turn p[i] ON; }

• if dirty-bit ON then { recall line from dirty proc (cache state toshared); update memory; turn dirty-bit OFF; turn p[i] ON; supplyrecalled data to i;}

• Write to main memory by processor i:

• If dirty-bit OFF then {send invalidations to all caches that have theblock; turn dirty-bit ON; supply data to i; turn p[i] ON; ... }

Page 7: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 13CS152-Spring!08

CS152 Administrivia

• No lecture, Thursday May 1– (Faculty retreat)

• Last lecture, Tuesday May 6

• Final quiz, Thursday May 8

• Informal course feedback– Want to hear your opinion of new format

– What worked, and what didn’t work, especially in labs

4/29/2008 14CS152-Spring!08

Directory Cache Protocol(Handout 6)

• Assumptions: Reliable network, FIFO messagedelivery between any given source-destination pair

CPU

Cache

Interconnection Network

Directory

Controller

DRAM Bank

Directory

Controller

DRAM Bank

CPU

Cache

CPU

Cache

CPU

Cache

CPU

Cache

CPU

Cache

Directory

Controller

DRAM Bank

Directory

Controller

DRAM Bank

Page 8: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 15CS152-Spring!08

Cache States

For each cache line, there are 4 possible states:

– C-invalid (= Nothing): The accessed data is not resident in the

cache.

– C-shared (= Sh): The accessed data is resident in the cache,

and possibly also cached at other sites. The data in memory

is valid.

– C-modified (= Ex): The accessed data is exclusively resident

in this cache, and has been modified. Memory does not have

the most up-to-date data.

– C-transient (= Pending): The accessed data is in a transient

state (for example, the site has just issued a protocol request,

but has not received the corresponding protocol reply).

4/29/2008 16CS152-Spring!08

Home directory states

• For each memory block, there are 4 possiblestates:

– R(dir): The memory block is shared by the sites specified indir (dir is a set of sites). The data in memory is valid in thisstate. If dir is empty (i.e., dir = !), the memory block is notcached by any site.

– W(id): The memory block is exclusively cached at site id,and has been modified at that site. Memory does not havethe most up-to-date data.

– TR(dir): The memory block is in a transient state waiting forthe acknowledgements to the invalidation requests that thehome site has issued.

– TW(id): The memory block is in a transient state waiting fora block exclusively cached at site id (i.e., in C-modifiedstate) to make the memory block at the home site up-to-date.

Page 9: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 17CS152-Spring!08

ShRep(v), ExRep(v)Memory to CacheResponses

WbRep(v), InvRep, FlushRep(v)Cache to MemoryResponses

WbReq, InvReq, FlushReqMemory to CacheRequests

ShReq, ExReqCache to MemoryRequests

MessagesCategory

Protocol Messages

There are 10 different protocol messages:

4/29/2008 18CS152-Spring!08

Cache State Transitions(from invalid state)

Page 10: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 19CS152-Spring!08

Cache State Transitions(from shared state)

4/29/2008 20CS152-Spring!08

Cache State Transitions(from exclusive state)

Page 11: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 21CS152-Spring!08

Cache Transitions(from pending)

4/29/2008 22CS152-Spring!08

Home Directory State Transitions

Messages sent from site id

Page 12: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 23CS152-Spring!08

Home Directory State Transitions

Messages sent from site id

4/29/2008 24CS152-Spring!08

Home Directory State Transitions

Messages sent from site id

Page 13: CS 152 Computer Architecture and Engineering Recap: Snoopy Cache Protocols Cache …cs152/sp08/lectures/L21... · 2008-04-29 · 4/29/2008 3 CS152-Spring!08 Recap: MESI: An Enhanced

4/29/2008 25CS152-Spring!08

Home Directory State Transitions

Messages sent from site id

4/29/2008 26CS152-Spring!08

Acknowledgements

• These slides contain material developed andcopyright by:

– Arvind (MIT)

– Krste Asanovic (MIT/UCB)

– Joel Emer (Intel/MIT)

– James Hoe (CMU)

– John Kubiatowicz (UCB)

– David Patterson (UCB)

• MIT material derived from course 6.823

• UCB material derived from course CS252