introduction multiprocessing architecture
Post on 05-Jun-2022
13 Views
Preview:
TRANSCRIPT
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
SMD149 - Operating Systems - Multiprocessing
Roland Parviainen
December 1, 2005
1 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Overview
Multiprocessor systems
Multiprocessor, operating system and memory organizations
2 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Introduction
Multiprocessor system
Computer that contains more than one processor
Benefits
Increased processing powerScale resource use to application requirements
Additional operating system responsibilities
All processors remain busyEven distribution of processes throughout the systemAll processors work on consistent copies of shared dataExecution of related processes synchronizedMutual exclusion enforced
3 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Multiprocessing architecture
Examples of multiprocessors
Dual-processor personal computerPowerful server containing many processorsCluster of workstations
Classifications of multiprocessor architecture
Nature of datapathInterconnection schemeHow processors share resources
4 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Flynn’s taxonomy
Based upon the number of concurrent instruction (or control) anddata streams available
SISD: Single instruction, single data streamMISD: Multiple instruction, single data streamSIMD: Single instruction, multiple data streamsMIMD: Multiple instruction, multiple data streams
5 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
SISD
Single instruction, single data stream
Sequential computer
Exploit minor parallelism:
PipeliningSuperscalar processorsHyper threading
6 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
MISD
Multiple instruction, single data stream
Not commonly used
Redundancy
Three processors do the same computation, results are compared
7 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
SIMD
Single instruction, multiple data streams
One or more processing units
Same instruction on a block of data
Array or vector processors
Most processors have SIMD instructions
PowerPC’s AltiVec, Intel’s MMX, SSE, SSE2 and SSE3, AMD’s3DNow!, SPARCs VIS, PA-RISCs MAX and the MIPS MDMX andMIPS-3D.Multimedia
8 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
MIMD
Multiple instruction, multiple data streams
Multiple autonomous processors simultaneously executing differentinstructions on different data
Multiprocessor systems
Network of workstations
Distributed systems
9 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Processor interconnection schemes
Describes how the system’s components, such as processors andmemory modules, are connected
Consists of nodes (components or switches) and links (connections)
Parameters used to evaluate interconnection schemes
Node degreeBisection widthNetwork diameterCost of the interconnection scheme
10 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Shared bus
Single communication path between all nodes
Contention can build up for shared bus
Lacks scalability
Useful for interconnecting a small number of processors
For systems with many processors, a bus could be used tointerconnect nodes within a supernode
Dynamic network
Example: dual-processor Intel Pentium
11 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Shared bus
12 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Crossbar-switch matrix
Separate path from every processor to every memory module (or ingeneral, from every node to every other node)
For n processors and m memory modules, there will be n x m totalswitches
High fault tolerance:
Bisection width = (n x m)/2
High performance:
Simultaneous transmissions can be supported
Network diameter = 1 yields low latency
High cost: proportional to n x m
Example: Sun UltraSPARC-III
13 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Crossbar-switch matrix
14 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
2-D mesh network
n rows and m columns, in which a node is connected to nodesdirectly north, south, east and west of it
Maximum node degree = 4
Moderate performance, fault tolerance and cost
Not very scalable given that network diameter may be substantial
Example: Intel Paragon (large-scale system, but communication iskept between neighboring nodes)
15 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
2-D mesh network
16 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Hypercube
n-dimensional hypercube has 2n nodes in which each node isconnected to n neighbor nodes
Better scalability and performance
4-D hypercube (16 nodes) results in network diameter of 4, while a2-D mesh network with 16 nodes results in network diameter of 6
More fault tolerant, but more expensive than crossbar switch or 2-Dmesh networks
Example: nCUBE (up to 8192 processors)
17 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Hypercube
18 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Multistage networks
Switch nodes act as hubs routing messages between nodes
Cheaper (use simple hardware for switches that can be packedtightly)
Less fault tolerant and worse performance compared to acrossbar-switch matrix
Contention can occur at switches and network diameter is typicallylarger
Example: IBM POWER4
19 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Multistage networks
20 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Loosely coupled vs. tightly coupled systems
Tightly coupled systems
Processors share most resources including memoryCommunicate over shared busesLess burden for operating system programmersLess flexible, poor scalability and fault tolerance
Loosely coupled systems
Processors do not share most resourcesDedicated memoryExplicit messages using communication link or shared virtual memoryMore flexible, fault tolerant, scalable
21 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Loosely coupled vs. tightly coupled systems
22 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Multiprocessor operating system organizations
Significantly different from uniprocessor systems
Systems classified into three types (based on how processors shareoperating system responsibilities)
Master/slaveSeparate kernelsSymmetrical organization
23 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Master/slave
Master processor executes operating system code
Computation and I/O jobs
Slaves execute only user programs
Processor-bound jobs
Hardware asymmetry
Slave processors have to wait for the master processor to handle theinterrupt for OS operations
Low fault tolerance
Failure of the master processor results in system failure
Good for computationally intensive jobs
Example: nCUBE system
24 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Separate kernels
Each processor executes its own operating system and responds onlyto interrupts generated from user processes running on thatprocessor
Loosely coupled
Little contention over resourcesSome globally shared operating system data
System failure unlikely, but failure of one processor results intermination of processes on that processor
Useful in environments where processes do not interact much andhigh availability is important
25 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Symmetrical
Operating system manages a pool of identical processors
High amount of resource sharingNeed for mutual exclusionAllows processes to truly share the processors and exploit parallelism
Highest degree of fault tolerance of any organization
Facilitates graceful degradation
Some contention for resources can limit performance improvementgained by adding more processors
26 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Memory access architectures
Classification based on how memory is shared
Trade offs between performance, scalability and cost
27 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
UMA
Uniform memory access (UMA) multiprocessor system
All processors share main memory
Access to any memory page is nearly the same for all processors andall memory modules
Typically uses shared bus or crossbar-switch matrix
Used in small multiprocessor systems (typically two to eightprocessors)
28 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
UMA
29 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
NUMA
Nonuniform memory access (NUMA) multiprocessor
Global memory divided into memory partitions or modules
Each node contains a few processors and a portion of systemmemory, which is local to that node
Access to local memory faster than access to global memory (rest ofthe memory modules)
More scalable than UMA with fewer bus collisions
Most of processor’s memory request served by local memory
30 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
NUMA
31 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
COMA
Basic physical interconnection similar to NUMA Main memory isviewed as a cache and called an attraction memory (AM)
Allows system to migrate data to node that most often accesses it atgranularity of a memory line (more efficient than a memory page)Reduces the number of cache misses serviced remotelyOverhead: duplicated data items and complex protocol to ensure allupdates are received at all processors
32 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
COMA
33 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
NORMA
No-Remote Memory Access
Loosely coupled
Simple to build: no complex interconnection
No shared global memory
Each node maintains its own local memory
Communication through explicit messages
Some implement the illusion of shared global memory known asshared virtual memory (SVM)
Distributed systems
Involve message passing via remote procedure calls (RPCs)
Example: Google server farms
34 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
NORMA
35 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Memory sharing
Memory coherence
Ensure that all the processors have access to the most up- to-datecopy of dataCache coherency
Strategies to increase percentage of local memory hits
Page migrationPage replication
36 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Cache coherence
UMA cache coherenceThree different ways to implement
Bus snooping (a.k.a. cache snooping)Maintain a centralized directory to indicate when to remove stale dataOnly one processor can cache a data item
Cache coherent NUMA
Each address is associated with a home nodeContact home node on reads and writes to receive most updatedversion of dataThe protocol requires a maximum of three network traversals percoherency operation
37 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Page replication and migration
Page replication
System places a copy of a memory page at a new nodeGood for pages read by processes at different nodes, but not written
Page migration
System moves a memory page from one node to anotherGood for pages written to by a single remote process
Increases percentage of local memory hits
Drawbacks
Memory overheadReplicating and migrating more expensive than remotely referencingso only useful if referenced again
38 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Scheduling
Determines the order and to which processors processes aredispatched
Two goals
Parallelism - timesharing schedulingProcessor affinity - space-partitioning scheduling
Soft affinityHard affinity
Types of scheduling algorithms
Job-blindJob-aware
39 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Job-blind multiprocessor scheduling
Straightforward extension of uniprocessor algorithms
Examples
First-in-first out (FIFO) multiprocessor schedulingRound-robin (RR) multiprocessor schedulingShortest process first (SPF) multiprocessor scheduling
40 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Job-aware multiprocessor scheduling
Algorithms that maximize parallelism
Shortest-number-of-processes-first (SNPF) schedulingRound-robin job (RRjob) schedulingCoscheduling
Processes from the same job placed in adjacent spots in the globalrun queueSliding window moves down run queue running adjacent processesthat fit into window (size = number of processors)
Algorithms that maximize processor affinity
Dynamic partitioning
41 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Coscheduling example
42 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Dynamic partitioning
43 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Process migration
Transferring a process from one node to another node
Benefits of process migration
Increases fault toleranceLoad balancingReduces communication costsResource sharing
44 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Flow of process migration
Either sender or receiver initiates request
Sender suspends migrating process
Sender creates message queue for migrating process’s messages
Sender transmits state to a “dummy” process at the receiver
Sender and receiver notify other nodes of process’s new location
Sender deletes its instance of the process
45 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Flow of process migration
46 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Process migration concepts
Trade-off between residual dependency and initial migration latency
Characteristics of a successful migration strategy
Minimal residual dependencyTransparentScalableFault tolerantHeterogeneous
47 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Process migration strategies
Eager migration
Transfer entire state during initial migration
Dirty eager migration
Only transfer dirty memory pagesClean pages brought in from secondary storage
Copy-on reference migration
Similar to dirty eager except clean pages can also be acquired fromsending node on demand
Lazy copying
No initial transfer of pages
48 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Process migration strategies
Flushing migration
Flush all dirty pages to disk before migrationTransfer no memory pages during initial migration
Precopy migration
Begin transferring memory pages before suspending processMigrate once dirty pages reach a lower threshold
49 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Load balacing
System attempts to distribute the processing load evenly amongprocessors
Reduces variance in response timesEnsures no processors remain idle while other processors areoverloadedTwo types
Static load balancingDynamic load balancing
50 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Static load balacing
Useful in environments in which jobs exhibit predictable patternsGoals
Distribute the loadReduce communication costs
Solve using a graphMust use approximations for large systems to balance loads withreasonable overhead
51 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Static load balacing using graphs
52 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Dynamic load balancing
More useful than static load balancing in environments wherecommunication patterns change and processes are created orterminated unpredictably
Types of policies
Sender-initiatedReceiver-initiatedSymmetricRandom
Algorithms
Bidding algorithmDrafting algorithm
53 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Dynamic load balancing
Communication issues
Communication with remote nodes can have long delays; load on aprocessor might change in this timeCoordination between processors to balance load creates manymessagesCan overload the systemSolution
Only communicate with neighbor nodesLoad diffuses throughout the system
54 / 55
IntroductionMultiprocessing architecture
Operating system organizationMemory access architectures
Scheduling and process migration
Summary
Next time: networks and distributed systems
55 / 55
top related