Post on 05-Jan-2016

Distributed Shared Memory (part 1)

Distributed Shared Memory (DSM)

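[Figure: nodes 0..N, each a processor (proc0, proc1, proc2, ..., procN) with its own local memory (mem0, mem1, mem2, ..., memN), connected by a network and presented as one shared memory.]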

Shared memory programming

• Standard
– pthread
• Synchronizations
– Barriers
– Locks
– Semaphores

Sequential SOR

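for some number of timesteps/iterations {
    /* new value of each interior point = average of its four neighbours */
    for (i=1; i<n; i++)
        for (j=1; j<n; j++)
            temp[i][j] = 0.25 *
                (grid[i-1][j] + grid[i+1][j] +
                 grid[i][j-1] + grid[i][j+1]);
    /* copy the new values back into the grid */
    for (i=1; i<n; i++)
        for (j=1; j<n; j++)
            grid[i][j] = temp[i][j];
}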

Parallel SOR with Barriers (1 of 2)

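void* sor (void* arg)
{
    /* each of the p threads handles one band of rows [from, to) */
    int slice = (int)(intptr_t)arg;   /* needs <stdint.h> */
    int from = (slice * (n-1))/p + 1;
    int to = ((slice+1) * (n-1))/p + 1;

    for some number of iterations { … }
}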

Parallel SOR with Barriers (2 of 2)

for (i=from; i<to; i++)
    for (j=1; j<n; j++)
        temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] + grid[i][j-1] + grid[i][j+1]);
barrier();
for (i=from; i<to; i++)
    for (j=1; j<n; j++)
        grid[i][j]=temp[i][j];
barrier();
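A minimal runnable sketch of the two slides above, assuming POSIX threads and using pthread_barrier_wait in place of the slides' barrier(). The grid size N, thread count P, the boundary condition (top row fixed at 1.0), and the names run_parallel_sor and iterations are illustrative assumptions, not part of the original slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N 8   /* assumed: grid is (N+1) x (N+1), interior points are 1..N-1 */
#define P 3   /* assumed: number of worker threads */

double grid[N + 1][N + 1], temp[N + 1][N + 1];
static pthread_barrier_t bar;
static int iterations;

static void *sor(void *arg)
{
    int slice = (int)(intptr_t)arg;
    int from = (slice * (N - 1)) / P + 1;
    int to = ((slice + 1) * (N - 1)) / P + 1;

    for (int t = 0; t < iterations; t++) {
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        pthread_barrier_wait(&bar);   /* everyone has finished reading the old grid */
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                grid[i][j] = temp[i][j];
        pthread_barrier_wait(&bar);   /* everyone has finished writing the new grid */
    }
    return NULL;
}

/* Runs iters sweeps with P threads; returns 0 on success, -1 on setup failure. */
int run_parallel_sor(int iters)
{
    pthread_t tid[P];

    assert(iters >= 0);
    iterations = iters;
    for (int i = 0; i <= N; i++)          /* assumed boundary: top row = 1.0 */
        for (int j = 0; j <= N; j++)
            grid[i][j] = (i == 0) ? 1.0 : 0.0;

    if (pthread_barrier_init(&bar, NULL, P) != 0)
        return -1;
    for (int s = 0; s < P; s++) {
        int rc = pthread_create(&tid[s], NULL, sor, (void *)(intptr_t)s);
        assert(rc == 0);
    }
    for (int s = 0; s < P; s++)
        pthread_join(tid[s], NULL);
    pthread_barrier_destroy(&bar);
    return 0;
}
```

The first barrier keeps any thread from overwriting grid rows that a neighbouring thread is still reading; the second keeps any thread from starting the next sweep on a partly updated grid.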

Differences between SMP and Software DSM

• Delay: tradeoffs, such as block size
• Software => traps: cost of read/write misses
• Goals of caches: multiprocessor = performance, dist. system = transparency
• Bus vs. long networks: reliance on serialization and broadcast.

Consequent differences in protocols and applications

• Bigger block size
– Cost amortization, higher hit ratio for larger blocks?
– Reduced overhead
• But therefore...
– Migration vs. Replication
– False sharing increases
• DSM protocol more complex: Must handle lost, corrupted, and out-of-order packets
• Above, coupled with cost of traps, => SDSM consistency cost much higher!

Results of high consistency costs

• Manage sharing more carefully
• Align data to page boundaries
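One way to align data to page boundaries, sketched under the assumption of a POSIX system: give each independently written buffer its own whole pages with posix_memalign. The helper names alloc_page_aligned and on_same_page are hypothetical, introduced only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate at least `bytes`, starting on a page boundary and rounded up to
 * whole pages, so no other allocation shares a page with this buffer.
 * Returns NULL on failure. Release with free(). */
void *alloc_page_aligned(size_t bytes)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (ps <= 0 || bytes == 0)
        return NULL;
    size_t page = (size_t)ps;
    size_t rounded = (bytes + page - 1) / page * page;
    if (posix_memalign(&p, page, rounded) != 0)
        return NULL;
    return p;
}

/* True if both addresses fall in the same page, i.e. the same consistency unit. */
int on_same_page(const void *a, const void *b)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    return (uintptr_t)a / page == (uintptr_t)b / page;
}
```

Rounding the size up to whole pages wastes memory for small objects, so this layout is a tradeoff rather than a default.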

Consistency Models

• Sequential Consistency
– All processors observe the same order of reads/writes.
– That order must correspond to some serial order.
– The only ordering constraint is that the reads/writes of each processor (e.g. P1) appear in that processor's program order; there are no restrictions on the relative ordering between processors.
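To make the definition concrete, the sketch below enumerates every serial order of a small two-processor program that preserves each processor's program order, and records which results are possible. The test program (P1: x = 1; r1 = y and P2: y = 1; r2 = x, with x and y initially 0) is an assumed illustration, not taken from the slides.

```c
#include <assert.h>
#include <string.h>

enum { X, Y };
typedef struct { int is_write; int var; } Op;

/* Assumed example program: each processor writes one variable, then reads the other. */
static const Op prog[2][2] = {
    { {1, X}, {0, Y} },   /* P1: x = 1; r1 = y */
    { {1, Y}, {0, X} },   /* P2: y = 1; r2 = x */
};

int seen[2][2];   /* seen[r1][r2] == 1 if some serial order yields that result */

/* Try every interleaving; pc0/pc1 count how many ops each processor has done. */
static void explore(int pc0, int pc1, const int mem[2], const int reg[2])
{
    int pc[2] = { pc0, pc1 };

    if (pc0 == 2 && pc1 == 2) {
        seen[reg[0]][reg[1]] = 1;
        return;
    }
    for (int p = 0; p < 2; p++) {
        if (pc[p] == 2)
            continue;                       /* processor p has finished */
        int m[2] = { mem[0], mem[1] };
        int r[2] = { reg[0], reg[1] };
        Op op = prog[p][pc[p]];
        if (op.is_write)
            m[op.var] = 1;
        else
            r[p] = m[op.var];
        explore(pc0 + (p == 0), pc1 + (p == 1), m, r);
    }
}

void enumerate_sc_outcomes(void)
{
    int mem[2] = { 0, 0 }, reg[2] = { 0, 0 };
    memset(seen, 0, sizeof seen);
    explore(0, 0, mem, reg);
}
```

After enumerate_sc_outcomes(), the results (r1, r2) = (0, 1), (1, 0), and (1, 1) are marked as reachable, while (0, 0) is not: no serial order lets both reads run before both writes.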

Common consistency protocols

• Write update
– Multicast update to all replicas
• Write invalidate
– Invalidate cached copies in p2, p3
– Cache miss if p2/p3 access X: valid data comes from another cache

Conventional Implementation

• As proposed by Li & Hudak, TOCS '86.
• Use virtual memory to implement sharing.
• Shared memory divided up by virtual memory pages.
• Use single-writer, multiple-reader write-invalidate coherence protocol.
• Keep pages in one of three states:
– invalid, read-only, read-write
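A rough single-node sketch of how the three page states can be mapped onto virtual memory protection, assuming Linux with mmap, mprotect, and a SIGSEGV handler. The names dsm_page_demo, dsm_state, and dsm_faults are hypothetical, and the points where a real DSM would talk to other nodes are only marked by comments. Calling mprotect from a signal handler is common practice in page-based DSM systems, although POSIX does not list it as async-signal-safe.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { INVALID, READ_ONLY, READ_WRITE };

static volatile char *page;
static size_t page_size;
volatile sig_atomic_t dsm_state = INVALID;
volatile sig_atomic_t dsm_faults = 0;

/* Each fault upgrades the page by one step: invalid -> read-only -> read-write. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr, base = (uintptr_t)page;
    (void)sig; (void)ctx;

    if (addr < base || addr >= base + page_size)
        _exit(1);                               /* a genuine crash, not a DSM page */
    dsm_faults++;
    if (dsm_state == INVALID) {
        /* a real DSM would fetch a copy of the page from its owner here */
        mprotect((void *)page, page_size, PROT_READ);
        dsm_state = READ_ONLY;
    } else {
        /* a real DSM would invalidate all other copies here */
        mprotect((void *)page, page_size, PROT_READ | PROT_WRITE);
        dsm_state = READ_WRITE;
    }
}

/* Returns the number of faults taken, or -1 on setup failure. */
int dsm_page_demo(void)
{
    struct sigaction sa, old;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    page = mmap(NULL, page_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if ((void *)page == MAP_FAILED)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    char c = page[0];        /* read miss: read fault, page becomes read-only */
    page[0] = (char)(c + 1); /* write to read-only page: write fault, becomes read-write */
    page[1] = 7;             /* write hit: no fault */

    sigaction(SIGSEGV, &old, NULL);
    munmap((void *)page, page_size);
    return (int)dsm_faults;
}
```

The handler cannot portably tell a read from a write, so a write to an invalid page in this sketch would simply fault twice, once per upgrade step.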

Example

[Figure sequence: proc0, proc1, proc2, ..., procN over the shared memory. Only the slide titles and event labels are recoverable:]

• Read Access Hit (read)
• Write Access Hit (write)
• Read Access Miss (read) → Read Fault (readfault) → Replication on Read (read)
• Write Access Miss (write) → Write Fault (writefault) → Write Invalidation (write)
• Write Access to Read-Only (write) → Write Fault (writefault) → Write Invalidation (write)
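The hit, miss, fault, and invalidation cases named in the example slides can be traced with a small single-page simulation of the single-writer, multiple-reader write-invalidate protocol. This is a sketch under simplifying assumptions: one page, no real messages, and hypothetical names (dsm_init, dsm_read, dsm_write, and the counters).

```c
#include <assert.h>

#define NODES 4
typedef enum { INVALID, READ_ONLY, READ_WRITE } PageState;

PageState st[NODES];                       /* state of the page on each node */
int read_faults, write_faults, invalidations;

void dsm_init(int first_writer)
{
    assert(first_writer >= 0 && first_writer < NODES);
    for (int p = 0; p < NODES; p++)
        st[p] = INVALID;
    st[first_writer] = READ_WRITE;
    read_faults = write_faults = invalidations = 0;
}

void dsm_read(int p)
{
    if (st[p] != INVALID)
        return;                            /* read access hit */
    read_faults++;                         /* read access miss -> read fault */
    for (int q = 0; q < NODES; q++)
        if (st[q] == READ_WRITE)
            st[q] = READ_ONLY;             /* the single writer drops to reader */
    st[p] = READ_ONLY;                     /* replication on read */
}

void dsm_write(int p)
{
    if (st[p] == READ_WRITE)
        return;                            /* write access hit */
    write_faults++;                        /* miss, or write to a read-only copy */
    for (int q = 0; q < NODES; q++)
        if (q != p && st[q] != INVALID) {
            st[q] = INVALID;               /* write invalidation */
            invalidations++;
        }
    st[p] = READ_WRITE;
}
```

For example, after dsm_init(0), the sequence dsm_read(1), dsm_write(2) produces one read fault, one write fault, and two invalidations (the copies on nodes 0 and 1).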

How to Remember Locations?

• Broadcast on miss (as in SMP).
• Static home.
• Dynamic home or owner.

Ownership and Owner Location

• Owner is the last writer.
• Owner maintains copyset.
• Every processor maintains probable owner (not always the real owner).

Ownership Location

• Every read or write miss is sent to (local) probable owner.

• If owner, handle appropriately, else forward to probable owner.

Ownership Modification

• If write miss, new writer becomes owner, and all forwarders set probable owner to requester.

• If read miss, set probable owner to responding processor.

Example

• Initially, owner(page0) = p0, and probable owner(page0) = p0 everywhere.

• Write miss by p1: p1 sends a message to its probable owner (p0), where it is handled; new owner = p1, and probable owner(page0) on p0 = p1.

• Read miss by p2: p2 sends a message to its probable owner (p0), which forwards it to its own probable owner (p1), where it is handled; probable owner(page0) on p2 becomes p1.
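The example above can be replayed with a small simulation of probable-owner forwarding for a single page. This is a sketch: messages are modeled as loop iterations, and the names po_init, po_write_miss, and po_read_miss are hypothetical. Each function returns the number of messages (hops) needed to reach the real owner.

```c
#include <assert.h>

#define NODES 3

int owner;                /* real owner of page0 */
int prob_owner[NODES];    /* each node's hint for page0 */

void po_init(void)
{
    owner = 0;
    for (int p = 0; p < NODES; p++)
        prob_owner[p] = 0;
}

/* Write miss: follow the hints to the owner; the requester becomes the new
 * owner, and every node on the path points its hint at the requester. */
int po_write_miss(int req)
{
    int hops = 1, cur = prob_owner[req];

    assert(req != owner);                 /* the owner itself never misses */
    while (cur != owner) {
        int next = prob_owner[cur];
        prob_owner[cur] = req;            /* forwarder updates its hint */
        cur = next;
        hops++;
    }
    prob_owner[cur] = req;                /* old owner now points at the new one */
    owner = req;
    prob_owner[req] = req;
    return hops;
}

/* Read miss: follow the hints to the owner; only the requester's hint changes. */
int po_read_miss(int req)
{
    int hops = 1, cur = prob_owner[req];

    assert(req != owner);
    while (cur != owner) {
        cur = prob_owner[cur];
        hops++;
    }
    prob_owner[req] = cur;                /* hint = responding processor */
    return hops;
}
```

With the slide's scenario, po_write_miss(1) takes one message and po_read_miss(2) takes two (p2 to p0, then p0 to p1), after which p2's hint points directly at p1.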

Implement synchronizations

• Use messages to implement synchronizations

Barriers

• Designate one processor as barrier manager.

• When a process waits at a barrier, it sends an arrival message to the barrier manager and waits.

• When barrier manager has received all messages, it sends a departure message to all processes.
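The manager's side of this barrier can be sketched as a small message handler. Message delivery is abstracted away here: one call stands for one arrival message, and the return value is the number of departure messages the manager sends. NPROCS and all names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROCS 4   /* assumed number of participating processes */

typedef struct {
    int arrived;              /* arrival messages received in this episode */
    bool waiting[NPROCS];     /* processes currently blocked at the barrier */
} BarrierManager;

void barrier_manager_init(BarrierManager *m)
{
    m->arrived = 0;
    for (int p = 0; p < NPROCS; p++)
        m->waiting[p] = false;
}

/* Handle one arrival message from process pid.
 * Returns the number of departure messages sent: 0 until the last process
 * arrives, then NPROCS. */
int barrier_on_arrival(BarrierManager *m, int pid)
{
    assert(pid >= 0 && pid < NPROCS && !m->waiting[pid]);
    m->waiting[pid] = true;
    m->arrived++;
    if (m->arrived < NPROCS)
        return 0;                         /* keep everyone waiting */
    for (int p = 0; p < NPROCS; p++)
        m->waiting[p] = false;            /* departure message to each process */
    m->arrived = 0;                       /* ready for the next barrier episode */
    return NPROCS;
}
```

Each barrier episode therefore costs 2 × NPROCS messages in this centralized scheme, all of them passing through the manager.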

Locks

• Designate one process as the lock manager for a particular lock.

• To acquire a lock, a process sends an acquire message to the manager and waits.

• Manager forwards message to last acquirer.

• If lock free, send lock grant message.
• If lock held, hold on to request until free, and then send lock grant message.
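The forwarding scheme for one lock can be sketched as follows. Messages are again modeled as function calls, and the structure DsmLock with its fields is hypothetical. One assumption goes slightly beyond the slide: if the last acquirer is itself still waiting for the lock, it parks the forwarded request just as it would while holding the lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROCS 3
#define NONE (-1)

typedef struct {
    int last_acquirer;       /* manager state: most recent requester */
    bool has_lock[NPROCS];   /* the lock currently resides at this process */
    bool in_cs[NPROCS];      /* process is inside its critical section */
    int pending[NPROCS];     /* request parked at this process, or NONE */
} DsmLock;

void lock_init(DsmLock *l, int initial)
{
    for (int p = 0; p < NPROCS; p++) {
        l->has_lock[p] = false;
        l->in_cs[p] = false;
        l->pending[p] = NONE;
    }
    l->has_lock[initial] = true;          /* lock starts free at `initial` */
    l->last_acquirer = initial;
}

static void grant(DsmLock *l, int from, int to)   /* lock grant message */
{
    l->has_lock[from] = false;
    l->has_lock[to] = true;
    l->in_cs[to] = true;
}

/* Acquire message from p. Returns true if granted at once, false if p waits. */
bool lock_acquire(DsmLock *l, int p)
{
    int prev = l->last_acquirer;          /* manager forwards to last acquirer */

    assert(!l->in_cs[p]);
    l->last_acquirer = p;
    if (prev == p) {                      /* p released it and nobody asked since */
        assert(l->has_lock[p]);
        l->in_cs[p] = true;
        return true;
    }
    if (l->has_lock[prev] && !l->in_cs[prev]) {
        grant(l, prev, p);                /* lock free: grant immediately */
        return true;
    }
    l->pending[prev] = p;                 /* lock not free: hold on to the request */
    return false;
}

/* Release by p. Returns the process that receives the grant, or NONE. */
int lock_release(DsmLock *l, int p)
{
    int next = l->pending[p];

    assert(l->in_cs[p]);
    l->in_cs[p] = false;
    if (next == NONE)
        return NONE;
    l->pending[p] = NONE;
    grant(l, p, next);
    return next;
}
```

Because every new request is forwarded to the previous requester, each process parks at most one request, and the waiters form a distributed queue without the manager storing it.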

Problem: False Sharing

• Concurrent access to different data within the same consistency unit.

• With page as consistency unit, lots of opportunity for false sharing.

• Two flavors:
– read-write
– write-write

Read-Write False Sharing

[Figure: x and y, two variables within the same consistency unit.]

Read-Write False Sharing (Cont.)

[Figure: timeline with the events w(x), w(x), w(x), r(y), r(y), synch, r(x).]

Write-Write False Sharing

[Figure: timeline with the events w(x), w(x), w(x), w(y), w(y), synch, r(x).]
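The cost of false sharing can be estimated with a small simulation: two nodes, a page-granularity write-invalidate protocol, one node repeatedly writing x and the other repeatedly accessing y. The page size, the access pattern, and the name count_faults are assumptions chosen for illustration.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096UL   /* assumed consistency unit */
#define NPAGES 4
#define NODES 2

typedef enum { INVALID, READ_ONLY, READ_WRITE } PageState;

static PageState st[NODES][NPAGES];
static int faults;

static void access_mem(int node, unsigned long addr, int is_write)
{
    unsigned long pg = addr / PAGE_SIZE;
    int other = 1 - node;

    assert(pg < NPAGES);
    if (is_write) {
        if (st[node][pg] == READ_WRITE)
            return;                           /* write hit */
        faults++;
        st[other][pg] = INVALID;              /* invalidate the other copy */
        st[node][pg] = READ_WRITE;
    } else {
        if (st[node][pg] != INVALID)
            return;                           /* read hit */
        faults++;
        if (st[other][pg] == READ_WRITE)
            st[other][pg] = READ_ONLY;        /* writer drops to reader */
        st[node][pg] = READ_ONLY;
    }
}

/* Node 0 writes x and node 1 reads (or writes) y, alternating `rounds` times.
 * Returns the total number of page faults. */
int count_faults(unsigned long addr_x, unsigned long addr_y,
                 int y_is_write, int rounds)
{
    memset(st, 0, sizeof st);                 /* all pages start INVALID (0) */
    faults = 0;
    for (int r = 0; r < rounds; r++) {
        access_mem(0, addr_x, 1);
        access_mem(1, addr_y, y_is_write);
    }
    return faults;
}
```

When x and y share a page, every access faults (2 per round); when they sit on different pages, only the first access by each node faults, even though the program touches exactly the same data.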

Summary

• Software shared memory on distributed memory hardware.
– Uses virtual memory.
• Home migration to improve locality
– important because of high latencies.
• Sequential consistency suffers from false sharing.
