
  • LECTURE 6

    DR. SAMMAN H. AMEEN

    Shared-memory systems

    1

  • Last week

We discussed static interconnection networks.

    This week we explain:

    1. Shared-memory systems

    PAGE 2

  • PAGE 3

In today’s competitive technology market, where speed and cost are so important, efficiency is crucial to success. For most human tasks, teamwork is the way to go. Technology follows the same principle: multiprocessors allow speedup through parallel programming, and, as with teamwork, communication is essential for maximum efficiency.

    One popular way to allow multiple processors to communicate is to use a shared-memory architecture. This lecture describes the characteristics of shared-memory systems, the surrounding software programming paradigm, the hardware requirements for implementation, and a description of Altera’s solution. Emphasis is placed on cache coherence and synchronization.

  • PAGE 4

Memory is a very important component of a computer, and even more important is the type of memory in a computer’s architecture. Having the wrong type of memory for the system can be costly.

    There are many types of memory, all of which must work together in a system to perform tasks efficiently. System efficiency is gained by having different levels of memory units with different data transfer speeds: small, fast memories near the processor, and at the bottom of the hierarchy a layer of slow but large storage (such as disks or magnetic tape).

  • PAGE 5

The use of a shared-memory system for parallel processing has several advantages. For programmers, the coding is very similar to a multi-threaded application running on a uniprocessor system, because the resources are shared the same way among the tasks and the same synchronization techniques are used, which makes the adaptation easy. Global variables can be loaded into local memory caches, which increases the performance of applications that use shared memory extensively. Another advantage is that the most popular platforms now offer hardware extensions to take care of memory and cache coherence. This makes the incremental cost of adding processors to a multiprocessor design very low.

    Two main problems need to be addressed when designing a shared-memory system: performance degradation due to contention, and coherence problems. Performance degradation can happen when multiple processors try to access the shared memory simultaneously. A typical design uses caches to solve the contention problem. However, having multiple copies of data spread throughout the caches can lead to a coherence problem. The copies in the caches are coherent if they all hold the same value. However, if one of the processors writes over the value of one of the copies, that copy becomes inconsistent because it no longer equals the value of the other copies. In this lecture we study a variety of shared-memory systems and their solutions to the cache coherence problem.
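The similarity to ordinary multi-threaded programming can be illustrated with a short sketch (our own example, not from the lecture): several threads update one shared variable, with a lock providing the synchronization technique the paragraph mentions.

```python
import threading

counter = 0                  # the shared variable, visible to all threads
lock = threading.Lock()      # synchronization primitive guarding it
N_THREADS, N_INCR = 4, 10_000

def worker():
    global counter
    for _ in range(N_INCR):
        with lock:           # without the lock, increments could be lost
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)               # 40000: every increment survived
```

The same pattern carries over to a shared-memory multiprocessor: the threads become processors, and the lock becomes a hardware-supported synchronization primitive.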

  • PAGE 6

Using the shared-memory model for a multiprocessor can introduce a bottleneck into the architecture. At any instant, more than one processor may be accessing the same memory location, which can greatly reduce computational throughput.

    Giving each processing element a local memory and using the message-passing model alleviates this issue.

  • PAGE 7

    • Uniform Memory Access (UMA)

    • Non-Uniform Memory Access (NUMA)

    • Cache-only Memory Architecture (COMA)

  • PAGE 8

    • All processors have equal access time to any memory location.

• The interconnection network used in the UMA can be a single bus, multiple buses, a crossbar switch, or a multistage switching network.

    • Two or more CPUs and one or more memory modules all use the same bus for communication.

    • Tightly-coupled systems (high degree of resource sharing)

  • PAGE 9

• Symmetric:
      - All processors have equal access to all peripheral devices.
      - All processors are identical.

    • Asymmetric:
      - One processor (the master) executes the operating system.
      - Other processors may be of different types and may be dedicated to special tasks.

  • PAGE 10

• Each processor has part of the shared memory attached.

    • There is a single address space visible to all CPUs. All local memories form a global address space accessible by all processors

    • Access to remote memory is slower than access to local memory.

  • PAGE 11

  • PAGE 12

NUMA machines that do no caching do not scale well: having to go to the remote memory every time a nonlocal memory word is accessed is a major performance hit. However, if caching is added, then cache coherence must also be maintained, and the model is called ccNUMA.

  • PAGE 13

• Similar to the NUMA, each processor has part of the shared memory in the COMA. However, in this case the shared memory consists of cache memory.

    • A COMA system requires that data be migrated to the processor requesting it.

    • There is a cache directory (D) that helps in remote cache access.

  • PAGE 14

  • PAGE 15

Write-Through vs. Write-Back

    • In a single-cache system, coherence between memory and the cache is maintained using one of two policies: (1) write-through, and (2) write-back.

    1. In write-through, the memory is updated every time the cache is updated.

    2. In write-back, the memory is updated only when the block in the cache is being replaced.
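The two policies can be contrasted with a toy single-cache model (class and variable names are ours, not from the lecture): write-through pushes every write to memory immediately, while write-back defers it until the block is replaced.

```python
class Cache:
    """Minimal single-cache sketch of the two update policies."""
    def __init__(self, policy):
        self.policy = policy          # "write-through" or "write-back"
        self.memory = {"X": 5}        # backing main memory
        self.cache = {}               # cached copies
        self.dirty = set()            # blocks modified but not yet flushed

    def write(self, addr, value):
        self.cache[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value # memory updated on every write
        else:
            self.dirty.add(addr)      # write-back: defer until replacement

    def evict(self, addr):
        if addr in self.dirty:        # write-back: flush to memory now
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)
        del self.cache[addr]

wt, wb = Cache("write-through"), Cache("write-back")
for c in (wt, wb):
    c.write("X", 10)
print(wt.memory["X"], wb.memory["X"])  # 10 5 -- write-back memory is stale
wb.evict("X")
print(wb.memory["X"])                  # 10 -- updated only at replacement
```

The stale value visible in the write-back case is exactly the window during which another processor reading memory would see old data.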

  • PAGE 16

  • PAGE 17

    -Multiple copies of x

    -What if P1 updates x?

  • PAGE 18

    • Caches play key role in all cases

    • Reduce average data access time

    • Reduce bandwidth demands placed on shared interconnect

    • But private processor caches create a problem

    • Copies of a variable can be present in multiple caches

• A write by one processor may not become visible to others

    • They’ll keep accessing stale values in their caches

    • Cache coherence problem

    • Need to take actions to ensure visibility

  • PAGE 19

Write-Update vs. Write-Invalidate

    • There are two fundamental cache coherence policies: (1) write-invalidate, and (2) write-update.

    Write-invalidate maintains consistency by reading from local caches until a write occurs. When any processor updates the value of X through a write, posting a dirty bit for X invalidates all other copies.

    Write-update maintains consistency by immediately updating all copies in all caches. All dirty bits are set during each write operation.
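A minimal sketch (our own naming, not from the lecture) contrasting the two policies when three caches hold copies of X:

```python
# Three caches, each holding a copy of X, plus a validity flag per cache.
caches = [{"X": 5}, {"X": 5}, {"X": 5}]
valid = [True, True, True]

def write_update(writer, addr, value):
    for c in caches:                  # broadcast: every copy gets the new value
        c[addr] = value

def write_invalidate(writer, addr, value):
    caches[writer][addr] = value
    for i in range(len(caches)):      # all other copies are marked invalid
        if i != writer:
            valid[i] = False

write_update(0, "X", 10)
print([c["X"] for c in caches])           # [10, 10, 10]
write_invalidate(0, "X", 20)
print([c["X"] for c in caches], valid)    # [20, 10, 10] [True, False, False]
```

After the invalidate, caches 1 and 2 still hold the old value 10, but their invalid flags force them to re-fetch X on the next access, which is where the protocols on the following slides come in.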

  • PAGE 20

    • Writing to Cache in n processor case

    • Write Update - Write Through

    • Write Update - Write Back

    • Write Invalidate - Write Through

    • Write Invalidate - Write Back

  • PAGE 21

  • PAGE 22

  • PAGE 23

    Snooping protocols are based on watching bus activities and carry out the appropriate coherency commands when necessary. Global memory is moved in blocks, and each block has a state associated with it, which determines what happens to the entire contents of the block. The state of a block might change as a result of the operations Read-Miss, Read-Hit, Write-Miss, and Write-Hit.

  • PAGE 24

Multiple processors can read block copies from main memory safely until one processor updates its copy. At this time, all cache copies are invalidated and the memory is updated to remain consistent.

    State            Description
    Valid [VALID]    The copy is consistent with global memory
    Invalid [INV]    The copy is inconsistent

  • PAGE 25

    X = 5

    1. P reads X

    2. Q reads X

    3. Q updates X, X=10

    4. Q reads X

    5. Q updates X, X=15

    6. P updates X, X=20

    7. Q reads X

  • PAGE 26

Event            Memory X   P's cache (state)   Q's cache (state)   Comment
    Original value   5
    P reads X        5          5 (VALID)                               Read-Miss
    Q reads X        5          5 (VALID)           5 (VALID)           Read-Miss
    Q updates X      10         5 (INV)             10 (VALID)          Write-Hit
    Q reads X        10         5 (INV)             10 (VALID)          Read-Hit
    Q updates X      15         5 (INV)             15 (VALID)          Write-Hit
    P updates X      20         20 (VALID)          15 (INV)            Write-Miss
    Q reads X        20         20 (VALID)          20 (VALID)          Read-Miss
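The event trace above can be replayed with a toy model of write-invalidate write-through (class and method names are ours): every write goes straight to memory, and a write by one processor invalidates all other cached copies.

```python
class WriteThroughInvalidate:
    """Toy model: one shared location X, one block per cache."""
    def __init__(self, x=5):
        self.mem = x
        self.caches = {}                       # processor -> [value, state]

    def read(self, p):
        c = self.caches.get(p)
        if c is not None and c[1] == "VALID":
            return "Read-Hit"
        self.caches[p] = [self.mem, "VALID"]   # miss: fetch from memory
        return "Read-Miss"

    def write(self, p, v):
        c = self.caches.get(p)
        hit = c is not None and c[1] == "VALID"
        for q, other in self.caches.items():   # invalidate all other copies
            if q != p:
                other[1] = "INV"
        self.caches[p] = [v, "VALID"]
        self.mem = v                           # write-through: memory updated too
        return "Write-Hit" if hit else "Write-Miss"

s = WriteThroughInvalidate()
events = [("r", "P"), ("r", "Q"), ("w", "Q", 10), ("r", "Q"),
          ("w", "Q", 15), ("w", "P", 20), ("r", "Q")]
log = [s.read(e[1]) if e[0] == "r" else s.write(e[1], e[2]) for e in events]
print(log)    # matches the Comment column of the table above
print(s.mem)  # 20
```

Running the seven events produces exactly the hit/miss sequence in the Comment column, and memory tracks every write because of the write-through policy.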

  • PAGE 27


  • PAGE 28

A valid block can be owned by memory and shared in multiple caches that can contain only the shared copies of the block. Multiple processors can safely read these blocks from their caches until one processor updates its copy. At this time, the writer becomes the only owner of the valid block and all other copies are invalidated.

    State                        Description
    Shared (Read-Only) [RO]      Data is valid and can be read safely. Multiple copies can be in this state
    Exclusive (Read-Write) [RW]  Only one valid cache copy exists and can be read from and written to safely. Copies in other caches are invalid
    Invalid [INV]                The copy is inconsistent

  • PAGE 29

Event            Memory X   P's cache (state)   Q's cache (state)   Comment
    Original value   5
    P reads X        5          5 (RO)                                  Read-Miss
    Q reads X        5          5 (RO)              5 (RO)              Read-Miss
    Q updates X      5          5 (INV)             10 (RW)             Write-Hit
    Q reads X        5          5 (INV)             10 (RW)             Read-Hit
    Q updates X      5          5 (INV)             15 (RW)             Write-Hit
    P updates X      5          20 (RW)             15 (INV)            Write-Miss
    Q reads X        20         20 (RO)             20 (RO)             Read-Miss
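The write-back variant can be replayed the same way with a simplified model (ours; it ignores block transfer to the writer on a write-miss): memory is updated only when a read-miss forces the exclusive owner to flush its block.

```python
class WriteBackInvalidate:
    """Toy model: states RO (shared), RW (exclusive owner), INV."""
    def __init__(self, x=5):
        self.mem = x
        self.caches = {}                       # processor -> [value, state]

    def _owner(self):
        for p, c in self.caches.items():
            if c[1] == "RW":
                return p
        return None

    def read(self, p):
        c = self.caches.get(p)
        if c is not None and c[1] in ("RO", "RW"):
            return "Read-Hit"
        o = self._owner()
        if o is not None:                      # owner flushes block to memory
            self.mem = self.caches[o][0]
            self.caches[o][1] = "RO"
        self.caches[p] = [self.mem, "RO"]
        return "Read-Miss"

    def write(self, p, v):
        c = self.caches.get(p)
        hit = c is not None and c[1] in ("RO", "RW")
        for q, other in self.caches.items():   # invalidate all other copies
            if q != p:
                other[1] = "INV"
        self.caches[p] = [v, "RW"]             # writer is now exclusive owner
        return "Write-Hit" if hit else "Write-Miss"

s = WriteBackInvalidate()
events = [("r", "P"), ("r", "Q"), ("w", "Q", 10), ("r", "Q"),
          ("w", "Q", 15), ("w", "P", 20), ("r", "Q")]
log = [s.read(e[1]) if e[0] == "r" else s.write(e[1], e[2]) for e in events]
print(log)    # same hit/miss sequence as the table above
print(s.mem)  # 20 -- memory stayed at 5 until the final read forced a flush
```

Note the contrast with the write-through trace: here the Memory X column stays at 5 through all three writes, and only the last read-miss brings memory up to date.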

  • PAGE 30


  • PAGE 31

  • PAGE 32

• Explain both the Von Neumann architecture and the Harvard architecture, showing the advantages of each.

    • What are the fundamental design decisions in selecting an appropriate architecture for an interconnection network (IN) for parallel machines? Then explain the synchronous and asynchronous modes of operation.

    • Discuss the SIMD architecture in detail, with its variant configurations.

  • PAGE 33

• What are the characteristics of CISC and RISC architectures?

    • Explain Flynn's classification of computer architectures using a neat block diagram.

    • Discuss the different types of dynamic interconnection networks.