Lecture 9 ECE/CSC 506 - Spring 2007 - E. F. Gehringer, based on slides by Yan Solihin

Lecture 9 Outline

• MESI protocol
• Dragon update-based protocol
• Impact of protocol optimizations

Lower-Level Protocol Choices

BusRd observed in M state: what transition to make?
• Change to S: assume I’ll read again soon
  – Good for mostly-read data
  – But what about “migratory” data? I read and write, then you read and write, then X reads and writes... Thus:
• Change to I: assume the other processor will write to it (Synapse)
• Sequent Symmetry and MIT Alewife use adaptive protocols
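Here is a minimal sketch of this trade-off for a single block under MESI. It assumes the shared line is asserted only if some other cache still holds a valid copy after snooping. Under that assumption, a reader can load the block in E when the old owner drops to I. The function `count_bus_ops` and both access traces are hypothetical illustrations. They are not taken from Synapse, Sequent Symmetry, or Alewife.

```python
def count_bus_ops(trace, n, m_on_busrd):
    """Count bus transactions for accesses to one block under MESI.

    trace: list of (proc, op) pairs, op is "R" or "W".
    m_on_busrd: state a Modified holder drops to when it snoops a BusRd,
    "S" (keep a copy) or "I" (give it up, Synapse-style).
    Assumption: the shared line is asserted only if another cache still
    holds a valid copy after the snoop, so the reader may load in E.
    """
    st = ["I"] * n
    bus = 0
    for p, op in trace:
        if op == "R":
            if st[p] == "I":
                bus += 1  # BusRd
                for q in range(n):
                    if q == p:
                        continue
                    if st[q] == "M":
                        st[q] = m_on_busrd  # Flush, then drop to S or I
                    elif st[q] == "E":
                        st[q] = "S"
                shared = any(st[q] != "I" for q in range(n) if q != p)
                st[p] = "S" if shared else "E"
        else:
            if st[p] in ("I", "S"):
                bus += 1  # BusRdX from I, BusUpgr from S
                for q in range(n):
                    if q != p:
                        st[q] = "I"
            st[p] = "M"  # E -> M and M -> M need no bus transaction
    return bus


# Hypothetical traces: migratory (each processor reads then writes in turn)
MIGRATORY = [(0, "R"), (0, "W"), (1, "R"), (1, "W"),
             (2, "R"), (2, "W"), (0, "R"), (0, "W")]
# and mostly-read (one write, then only reads by two processors)
READ_MOSTLY = [(0, "R"), (0, "W"), (1, "R"), (0, "R"), (1, "R"), (0, "R")]

for policy in ("S", "I"):
    print(policy, count_bus_ops(MIGRATORY, 3, policy),
          count_bus_ops(READ_MOSTLY, 2, policy))
```

On the migratory trace, dropping to I saves the BusUpgr on every hand-off: 4 transactions instead of 7. On the mostly-read trace it costs an extra BusRd: 3 transactions instead of 2. An adaptive protocol tries to pick the better policy at run time.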

MESI (4-state) Invalidation Protocol

Problem with the MSI protocol: a Rd, Wr sequence incurs 2 bus transactions even when no one is sharing (e.g., a serial program!)
• BusRd (I → S) followed by BusRdX or BusUpgr (S → M)
• In general, coherence traffic from serial programs is unacceptable

Add an exclusive state:
• Invalid
• Modified (dirty)
• Shared (two or more caches may have copies)
• Exclusive (only this cache has a clean copy, with the same value as in memory)

How to decide I → E or I → S? We need to check whether someone else has a copy.
• “Shared” signal on the bus: a wired-OR line asserted in response to BusRd
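Here is a minimal sketch of the saving for a serial program. It assumes one processor running alone, so the wired-OR shared line is never asserted, and it ignores evictions. The helper `bus_transactions` is hypothetical.

```python
def bus_transactions(trace, protocol):
    """Count bus transactions for one processor running alone.

    trace: list of (op, block) pairs, op is "R" or "W".
    protocol: "MSI" or "MESI". No other cache holds any block, so the
    shared line is never asserted on a BusRd.
    """
    state = {}  # block -> state; a missing entry means I
    count = 0
    for op, blk in trace:
        s = state.get(blk, "I")
        if op == "R":
            if s == "I":
                count += 1  # BusRd
                # Shared line not asserted: MESI loads E; MSI only has S
                state[blk] = "E" if protocol == "MESI" else "S"
        else:
            if s in ("I", "S"):
                count += 1  # BusRdX (from I) or BusUpgr (from S)
            state[blk] = "M"  # E -> M is silent in MESI


    return count


# A serial program: read then write each of four blocks
SERIAL = [(op, blk) for blk in range(4) for op in ("R", "W")]
print(bus_transactions(SERIAL, "MSI"), bus_transactions(SERIAL, "MESI"))
```

MSI needs two transactions per block (BusRd, then BusUpgr). MESI needs one, because the write hits in E.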

MESI: Processor-Initiated Transactions

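• M: PrRd/–, PrWr/– (stay in M)
• E: PrRd/– (stay in E); PrWr/– (E → M)
• S: PrRd/– (stay in S); PrWr/BusRdX (S → M)
• I: PrRd/BusRd(~S) (I → E); PrRd/BusRd(S) (I → S); PrWr/BusRdX (I → M)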

MESI: Bus-Initiated Transactions

• M: BusRd/Flush (M → S); BusRdX/Flush (M → I)
• E: BusRd/Flush' (E → S); BusRdX/Flush' (E → I)
• S: BusRd/Flush' (stay in S); BusRdX/Flush' (S → I)
• I: BusRd/–, BusRdX/– (stay in I)

MESI State Transition Diagram

BusRd(S) means the shared line is asserted on the BusRd transaction; BusRd(~S) means it is not.

[Figure: MESI state transition diagram combining the processor-initiated and bus-initiated arcs]

Flush vs. Flush'

• Flush: mandatory
• Flush' happens only when:
  – cache-to-cache sharing is used, and
  – only one cache flushes the data

MESI Visualization

[Animation: processors P1, P2, P3, each with a snooper, on a shared bus connected to main memory through the memory controller (Mem Ctrl)]

• P1 reads X: loaded in E (no other copy)
• P1 writes X=2: E → M with no bus request. This is one less bus request due to the Exclusive state, especially for serial programs.
• P3 reads X: P1 flushes X=2 (M → S); P3 loads it in S
• P3 writes X=3: BusUpgr instead of BusRdX; P1 → I, P3 → M
• P1 reads X: P3 flushes X=3 (M → S); P1 loads it in S
• P2 reads X: supplied by a cache via Flush'. This is referred to as cache-to-cache transfer in the Illinois MESI protocol.

MESI Example (Cache-to-Cache Transfer)

Proc Action  State P1  State P2  State P3  Bus Action    Data From
R1           E         –         –         BusRd         Mem
W1           M         –         –         –             Own cache
R3           S         –         S         BusRd/Flush   P1 cache
W3           I         –         M         BusRdX        Mem
R1           S         –         S         BusRd/Flush   P3 cache
R3           S         –         S         –             Own cache
R2           S         S         S         BusRd/Flush'  P1/P3 cache*

* Data from memory if no cache-to-cache transfer (BusRd/–)

MESI Example (Cache-to-Cache Transfer + BusUpgr)

Proc Action  State P1  State P2  State P3  Bus Action    Data From
R1           E         –         –         BusRd         Mem
W1           M         –         –         –             Own cache
R3           S         –         S         BusRd/Flush   P1 cache
W3           I         –         M         BusUpgr       Own cache
R1           S         –         S         BusRd/Flush   P3 cache
R3           S         –         S         –             Own cache
R2           S         S         S         BusRd/Flush'  P1/P3 cache*

* Data from memory if no cache-to-cache transfer (BusRd/–)
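Both tables can be reproduced by replaying the same access sequence. This sketch makes several assumptions:
• one block and three caches
• clean copies supply data on a BusRd via Flush'
• as in the tables, a BusRdX with no dirty owner takes its data from memory

The names `mesi_replay` and `use_upgr` are hypothetical.

```python
def mesi_replay(trace, n=3, use_upgr=False):
    """Replay (proc, op) accesses to one block under MESI.

    Returns one row per access: (states, bus action, data source).
    Clean holders answer a BusRd with Flush'; all candidates are listed.
    """
    st = ["I"] * n
    rows = []
    for p, op in trace:
        others = [q for q in range(n) if q != p]
        owner = [q for q in others if st[q] == "M"]
        clean = [q for q in others if st[q] in ("E", "S")]
        if st[p] != "I" and (op == "R" or st[p] in ("E", "M")):
            # Read hit in any valid state, or write hit in E/M: no bus traffic
            bus, src = "-", "own"
            if op == "W":
                st[p] = "M"
        elif op == "R":  # read miss
            if owner:
                bus, src = "BusRd/Flush", f"P{owner[0] + 1}"
            elif clean:
                bus, src = "BusRd/Flush'", "/".join(f"P{q + 1}" for q in clean)
            else:
                bus, src = "BusRd", "mem"
            for q in owner + clean:
                st[q] = "S"
            st[p] = "S" if owner or clean else "E"  # shared line decides
        else:  # write in S or I: every other copy is invalidated
            if st[p] == "S" and use_upgr:
                bus, src = "BusUpgr", "own"
            elif owner:
                bus, src = "BusRdX/Flush", f"P{owner[0] + 1}"
            else:
                bus, src = "BusRdX", "mem"
            for q in others:
                st[q] = "I"
            st[p] = "M"
        rows.append((tuple(st), bus, src))
    return rows


# R1, W1, R3, W3, R1, R3, R2 (processor indices are 0-based here)
TRACE = [(0, "R"), (0, "W"), (2, "R"), (2, "W"), (0, "R"), (2, "R"), (1, "R")]
for upgr in (False, True):
    for (p, op), row in zip(TRACE, mesi_replay(TRACE, use_upgr=upgr)):
        print(f"{op}{p + 1}", *row)
    print()
```

The only row that changes between the two runs is W3: BusRdX from memory becomes BusUpgr, with the data already in P3's own cache.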

Who supplies data on a miss when the block is not in M state: memory or cache?
• Original, Illinois MESI: cache
  – Assumes a cache is faster than memory (cache-to-cache transfer); not necessarily true
• Adds complexity
  – How does memory know it should supply data? (It must wait for the caches.)
  – A selection algorithm is needed if multiple caches have valid data
• Valuable for distributed memory
  – It may be cheaper to obtain the data from a nearby cache than from distant memory
  – Especially when the machine is constructed out of SMP nodes (Stanford DASH)
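The selection problem can be sketched as a small function. It assumes a hypothetical fixed-priority policy: the lowest-numbered clean holder responds, so exactly one cache performs Flush'. The sketch ignores the timing question raised above, namely that memory must wait for the snoop results before deciding whether to respond. The name `pick_supplier` is illustrative only.

```python
def pick_supplier(states, requester, cache_to_cache=True):
    """Decide who supplies a block on a BusRd: a cache index or "mem".

    states: MESI state of the block in each cache ("M", "E", "S", "I").
    Assumption (hypothetical policy): if several caches hold clean copies,
    the lowest-numbered one responds, so only one cache does Flush'.
    """
    holders = [q for q, s in enumerate(states) if q != requester and s != "I"]
    dirty = [q for q in holders if states[q] == "M"]
    if dirty:
        return dirty[0]      # Flush: mandatory, the memory copy is stale
    if cache_to_cache and holders:
        return min(holders)  # Flush': a single clean holder supplies
    return "mem"             # memory supplies (BusRd/-)


print(pick_supplier(["S", "I", "S"], 1))                        # clean copies
print(pick_supplier(["S", "I", "S"], 1, cache_to_cache=False))  # no c2c
print(pick_supplier(["I", "I", "M"], 0))                        # dirty owner
```

A real design might prefer the nearest holder rather than the lowest-numbered one. That matters in the distributed-memory case the slide mentions.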


Dragon Writeback Update Protocol

Four states:
• Exclusive-clean (E): I and memory have it
• Shared clean (Sc): I, others, and maybe memory, but I’m not the owner
• Shared modified (Sm): I and others but not memory, and I’m the owner
  – Sm and Sc can coexist in different caches, with at most one Sm
• Modified or dirty (M): I and no one else

On replacement: Sc can silently drop; Sm has to flush.

No invalid state:
• If in cache, cannot be invalid
• If not present in cache, can view as being in a not-present or invalid state

New processor events: PrRdMiss, PrWrMiss
• Introduced to specify actions when the block is not present in the cache

New bus transaction: BusUpd
• Broadcasts the single word written on the bus; updates other relevant caches
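Here is a sketch of the word-granular BusUpd on a write hit. It assumes each cache holds the block as a small dict with a state and a list of words. This representation is hypothetical and does not reflect Dragon's hardware organization. Only the written word travels on the bus, and the other copies are patched in place rather than invalidated.

```python
def write_word(caches, writer, word, value):
    """Dragon-style write hit: update the writer's copy and issue BusUpd.

    caches: entry q is None (block not present in cache q) or a dict
    {"state": ..., "words": [...]} describing cache q's copy of the block.
    Returns the number of bus transactions issued (0 or 1).
    """
    me = caches[writer]
    sharers = [c for q, c in enumerate(caches) if q != writer and c is not None]
    if me["state"] in ("E", "M"):
        # No other copies exist: write locally, no bus transaction
        me["words"][word] = value
        me["state"] = "M"
        return 0
    # Sc or Sm: BusUpd carries only (word, value); sharers update, not invalidate
    for c in sharers:
        c["words"][word] = value
        c["state"] = "Sc"
    me["words"][word] = value
    # Shared line asserted -> Sm (owner); not asserted -> M
    me["state"] = "Sm" if sharers else "M"
    return 1


caches = [{"state": "Sm", "words": [1, 2, 3, 4]},
          None,
          {"state": "Sc", "words": [1, 2, 3, 4]}]
write_word(caches, 2, 1, 9)  # P3 writes word 1
print(caches[0], caches[2])
```

After the write, P3 owns the block in Sm, and P1's copy holds the new word in Sc.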

Dragon: Processor-Initiated Transactions

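• Not present: PrRdMiss/BusRd(~S) (→ E); PrRdMiss/BusRd(S) (→ Sc); PrWrMiss/BusRd(~S) (→ M); PrWrMiss/(BusRd(S); BusUpd) (→ Sm)
• E: PrRd/– (stay in E); PrWr/– (E → M)
• Sc: PrRd/– (stay in Sc); PrWr/BusUpd(S) (Sc → Sm); PrWr/BusUpd(~S) (Sc → M)
• Sm: PrRd/– (stay in Sm); PrWr/BusUpd(S) (stay in Sm); PrWr/BusUpd(~S) (Sm → M)
• M: PrRd/–, PrWr/– (stay in M)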

Dragon: Bus-Initiated Transactions

• E: BusRd/– (E → Sc)
• Sc: BusRd/–, BusUpd/Update (stay in Sc)
• Sm: BusRd/Flush (stay in Sm); BusUpd/Update (Sm → Sc)
• M: BusRd/Flush (M → Sm)

Dragon State Transition Diagram

[Figure: Dragon state transition diagram combining the processor-initiated and bus-initiated arcs]

Dragon Visualization

[Animation: processors P1, P2, P3, each with a snooper, on a shared bus connected to main memory through the memory controller (Mem Ctrl)]

• P1 reads X, then writes X=2: the write needs no bus request. This is one less bus request due to the Exclusive state, especially for serial programs.
• P3 reads X: P1 flushes X=2 and goes M → Sm; P3 loads it in Sc
• P3 writes X=3: BusUpd instead of BusUpgr (no invalidation is performed). P3 becomes Sm and P1 becomes Sc, and both hold X=3.
• P1 reads X: a hit (this is a miss in the MESI and MSI protocols)
• P2 reads X: only the cache in state Sm (P3) is responsible for the cache-to-cache transfer; P2 loads X=3 in Sc
• P1 replaces X: its Sc copy is dropped silently
• P3 replaces X: the owner (Sm) is responsible for writing X=3 back to memory. In MSI or MESI, by contrast, a write-back happens only when the line is in M state.

Dragon Example

Proc Action  State P1  State P2  State P3  Bus Action    Data From
R1           E         –         –         BusRd         Mem
W1           M         –         –         –             Own cache
R3           Sm        –         Sc        BusRd/Flush   P1 cache
W3           Sc        –         Sm        BusUpd/Upd    Own cache
R1           Sc        –         Sm        –             Own cache
R3           Sc        –         Sm        –             Own cache
R2           Sc        Sc        Sm        BusRd/Flush   P3 cache
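The table can be checked with a small replay, assuming one block, three caches, and the Dragon transitions for this protocol. The name `dragon_replay` is hypothetical. The "evict" operation is also a hypothetical addition, used to show the replacement rules (Sc drops silently, Sm flushes).

```python
def dragon_replay(trace, n=3):
    """Replay (proc, op) accesses to one block under the Dragon protocol.

    op is "R", "W", or "evict". A cache without the block shows "-".
    Returns one row per step: (states, bus action, data source).
    """
    st = ["-"] * n
    rows = []
    for p, op in trace:
        holders = [q for q in range(n) if q != p and st[q] != "-"]
        shared = bool(holders)  # wired-OR shared line
        src = "own"
        if op == "evict":
            # E and Sc drop silently; the owner (Sm or M) must write back
            bus = "Flush" if st[p] in ("Sm", "M") else "-"
            src = "-"
            st[p] = "-"
        elif st[p] == "-":  # PrRdMiss or PrWrMiss
            owner = [q for q in holders if st[q] in ("Sm", "M")]
            bus = "BusRd/Flush" if owner else "BusRd"
            src = f"P{owner[0] + 1}" if owner else "mem"
            for q in holders:  # snooped BusRd: E -> Sc, M -> Sm, others unchanged
                st[q] = {"E": "Sc", "M": "Sm"}.get(st[q], st[q])
            if op == "R":
                st[p] = "Sc" if shared else "E"
            elif shared:  # PrWrMiss/(BusRd(S); BusUpd)
                bus += "; BusUpd"
                for q in holders:
                    st[q] = "Sc"
                st[p] = "Sm"
            else:  # PrWrMiss/BusRd(~S)
                st[p] = "M"
        elif op == "R":  # read hit
            bus = "-"
        elif st[p] in ("E", "M"):  # write hit with no other copies
            bus = "-"
            st[p] = "M"
        else:  # write hit in Sc or Sm: broadcast the written word
            bus = "BusUpd/Update" if shared else "BusUpd"
            for q in holders:
                st[q] = "Sc"
            st[p] = "Sm" if shared else "M"
        rows.append((tuple(st), bus, src))
    return rows


# R1, W1, R3, W3, R1, R3, R2 (processor indices are 0-based here)
TRACE = [(0, "R"), (0, "W"), (2, "R"), (2, "W"), (0, "R"), (2, "R"), (1, "R")]
for (p, op), row in zip(TRACE, dragon_replay(TRACE)):
    print(f"{op}{p + 1}", *row)
```

Appending `(0, "evict")` and then `(2, "evict")` to the trace reproduces the end of the visualization. P1's Sc copy disappears without bus traffic, and P3, as owner, issues a Flush to write X back.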

Can the shared-modified state be eliminated?
• It can, if memory is also updated on BusUpd transactions (DEC Firefly)
• The Dragon protocol doesn’t (it assumes DRAM memory is slow to update)

Should replacement of an Sc block be broadcast?
• Would allow the last copy to go to Exclusive state and not generate updates
• The replacement bus transaction is not in the critical path; a later update may be

Shouldn’t update the local copy on a write hit before the controller gets the bus
• Can mess up serialization

Coherence and consistency considerations are much like the write-through case

In general, there are many subtle race conditions in protocols
• But first, let’s illustrate quantitative assessment at the logical level


Assessing Protocol Tradeoffs

Methodology: use a simulator; choose parameters per the earlier methodology
• Default: 1 MB, 4-way cache, 64-byte block, 16 processors; 64 KB cache for some runs

Focus on frequencies, not end performance, for now
• Transcends architectural details, but is not what we’re really after

Use an idealized memory performance model to avoid changes in reference interleaving across processors as machine parameters change
• Cheap simulation: no need to model contention

Impact of Protocol Optimizations

• MSI = MESI: roughly the same traffic
• Using upgrades instead of read-exclusive helps
• Same story when working sets don’t fit, for Ocean, Radix, Raytrace

[Figure: MESI vs. MSI (w/ BusUpgr) vs. MSI (w/ BusRdX), traffic (MB/s) per application, split into data bus and address bus; two panels]

Impact of Cache-Block Size

Multiprocessors add a new kind of miss to cold, capacity, and conflict misses:
• Coherence misses: due to invalidations
  – True sharing: writes to the same word
  – False sharing: writes to different words in the same block

Reducing misses architecturally in an invalidation protocol:
• Capacity: enlarge the cache; increase block size (if there is spatial locality)
• Conflict: increase associativity
• Cold and coherence: only block size

Increasing block size has advantages and disadvantages:
• Can reduce misses if spatial locality is good
• Can hurt too:
  – increases misses due to false sharing if spatial locality is not good
  – increases misses due to conflicts in a fixed-size cache
  – increases traffic due to fetching unnecessary data and due to false sharing
  – can increase miss penalty and perhaps hit cost
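The next sketch shows how a larger block can turn unrelated writes into false sharing. It classifies a pair of writes by different processors. It assumes byte addresses, 4-byte words, and naturally aligned blocks. The function `sharing_kind` and its `word_size` default are hypothetical.

```python
def sharing_kind(addr_a, addr_b, block_size, word_size=4):
    """Classify writes to two byte addresses by different processors.

    Assumes naturally aligned words and blocks (hypothetical parameters).
    """
    if addr_a // word_size == addr_b // word_size:
        return "true"   # same word: real communication
    if addr_a // block_size == addr_b // block_size:
        return "false"  # different words that happen to share a block
    return "none"       # different blocks: no coherence interaction


# Two processors write bytes 0 and 8; only the block size changes
for block in (8, 16, 64):
    print(block, sharing_kind(0, 8, block))
```

With 8-byte blocks the two writes never interact. At 16 and 64 bytes they land in one block and ping-pong it, even though no data is actually shared.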

Impact of Block Size on Miss Rate

For the default problem size, vary the block/line size from 8 to 256 bytes.
• Decreases with larger lines: cold, capacity (due to spatial locality), true sharing (due to spatial locality)
• Increases with larger lines: false sharing
• Working set doesn’t fit: the impact of capacity misses is large (Ocean, Radix)

[Figure: miss rate vs. block size (8 to 256 bytes) per application, broken down into capacity, true sharing, false sharing, and upgrade components; two panels]

Impact of Block Size on Traffic

Results differ from those for miss rate: traffic almost always increases with block size.
• When working sets fit, overall traffic is still small, except for Radix
• Fixed overhead is a significant component
  – So total traffic is often minimized at a 16-32 byte block, not smaller
• Working set doesn’t fit: even a 128-byte block is good for Ocean, due to capacity
• Address bus traffic behaves in the opposite way from data bus traffic
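A simplified model helps explain both observations. The symbols are introduced here and are not from the slides, and upgrades (address-only transactions) are ignored. Let m(B) be the number of data-carrying bus transactions per instruction at block size B. Let o_d and o_a be the fixed per-transaction bytes on the data bus and address bus.

```latex
T_{\text{data}}(B) = m(B)\,\bigl(B + o_d\bigr), \qquad
T_{\text{addr}}(B) = m(B)\,o_a
```

As B shrinks, m(B) rises, so the overhead term m(B)·o_d grows. This is why data traffic is not minimized at the smallest block. Address traffic depends only on m(B), so it falls as B grows, while data traffic tends to rise.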

Traffic (bytes/inst) affects performance indirectly through contention.

[Figure: traffic (bytes/instruction) vs. block size per application, split into data bus and address bus; three panels]
