Networks of Workstations
Prabhaker Mateti, Wright State University
Overview
• Parallel computers
• Concurrent computation
• Parallel Methods
• Message Passing
• Distributed Shared Memory
• Programming Tools
• Cluster configurations
Granularity of Parallelism
• Fine-Grained Parallelism
• Medium-Grained Parallelism
• Coarse-Grained Parallelism
• NOWs (Networks of Workstations)
Fine-Grained Machines
• Tens of thousands of processors
• Processors
  – Slow (bit serial)
  – Small (K bits of RAM)
  – Distributed memory
• Interconnection networks
  – Message passing
• Single Instruction Multiple Data (SIMD)
Sample Meshes
• Massively Parallel Processor (MPP)
• TMC CM-2 (Connection Machine)
• MasPar MP-1/2
Medium-Grained Machines
• Typical configurations
  – Thousands of processors
  – Processors have power between coarse- and fine-grained
• Either shared or distributed memory
• Traditionally: research machines
• Single Code Multiple Data (SCMD)
Medium-Grained Machines
• Ex: Cray T3E
• Processors
  – DEC Alpha EV5 (600 MFLOPS peak)
  – Max of 2048
• Peak performance: 1.2 TFLOPS
• 3-D torus interconnect
• Memory: 64 MB to 2 GB per CPU
Coarse-Grained Machines
• Typical configurations
  – Hundreds of processors
• Processors
  – Powerful (fast CPUs)
  – Large (cache, vectors, multiple fast buses)
• Memory: shared or distributed-shared
• Multiple Instruction Multiple Data (MIMD)
Coarse-Grained Machines
• SGI Origin 2000:
  – PEs (MIPS R10000): max of 128
  – Peak performance: 49 GFLOPS
  – Memory: 256 GBytes
  – Crossbar switches for interconnect
• HP/Convex Exemplar:
  – PEs (HP PA-RISC 8000): max of 64
  – Peak performance: 46 GFLOPS
  – Memory: max of 64 GBytes
  – Distributed crossbar switches for interconnect
Networks of Workstations
• Exploit inexpensive workstations/PCs
• Commodity network
• The NOW becomes a “distributed memory multiprocessor”
• Workstations send and receive messages
• C and Fortran programs with PVM, MPI, etc. libraries
• Programs developed on NOWs are portable to supercomputers for production runs
“Parallel” Computing
• Concurrent Computing
• Distributed Computing
• Networked Computing
• Parallel Computing
Definition of “Parallel”
• S1 begins at time b1, ends at e1
• S2 begins at time b2, ends at e2
• S1 || S2
  – Begins at min(b1, b2)
  – Ends at max(e1, e2)
  – Equivalent to S2 || S1
Data Dependency
• x := a + b; y := c + d;
• x := a + b || y := c + d;
• y := c + d; x := a + b;
• x depends on a and b; y depends on c and d
• Assumes a, b, c, d are independent (see the sketch below)
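Because the two assignments touch disjoint variables, they can run in either order or in parallel. A minimal sketch using OpenMP sections (OpenMP is introduced later in these slides; the initial values are arbitrary, and the code runs serially if compiled without -fopenmp):

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, d = 4, x, y;
    /* The two sections write disjoint variables, so they may run
       concurrently; compile with -fopenmp (gcc) to enable. */
    #pragma omp parallel sections
    {
        #pragma omp section
        x = a + b;      /* reads only a and b */
        #pragma omp section
        y = c + d;      /* reads only c and d */
    }
    printf("x = %d, y = %d\n", x, y);
    return 0;
}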
Types of Parallelism
• Result
• Specialist
• Agenda
Perfect Parallelism
• Also called
  – Embarrassingly parallel
  – Result parallel
• Computations that can be subdivided into sets of independent tasks that require little or no communication
  – Monte Carlo simulations (sketched below)
  – F(x, y, z)
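A minimal sketch of why a Monte Carlo computation is perfectly parallel: each worker runs the same kernel with its own seed and needs no communication; only the final results are combined. The function name estimate_pi and the constants are illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Estimate pi by sampling points in the unit square; a worker needs
   only its seed and trial count -- no communication with other workers. */
double estimate_pi(unsigned seed, long trials) {
    long hits = 0;
    srand(seed);
    for (long i = 0; i < trials; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0) hits++;
    }
    return 4.0 * (double)hits / (double)trials;
}

int main(void) {
    /* On a NOW, each workstation would run one call with a distinct
       seed; the sub-results would then be averaged. */
    printf("pi is approximately %f\n", estimate_pi(42u, 1000000L));
    return 0;
}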
MW Model
• Manager
  – Initiates computation
  – Tracks progress
  – Handles workers’ requests
  – Interfaces with user
• Workers
  – Spawned and terminated by manager
  – Make requests to manager
  – Send results to manager (see the MPI sketch below)
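A hedged manager/worker sketch in MPI (the slides name PVM and MPI as message-passing libraries). Rank 0 plays the manager; the task, squaring an integer, and the tag values are stand-ins for illustration.

#include <mpi.h>
#include <stdio.h>

#define TAG_TASK 1
#define TAG_DONE 2

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {                          /* manager */
        int w, result;
        for (w = 1; w < size; w++)            /* initiate: one task each */
            MPI_Send(&w, 1, MPI_INT, w, TAG_TASK, MPI_COMM_WORLD);
        for (w = 1; w < size; w++) {          /* track progress */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("manager got result %d\n", result);
        }
    } else {                                  /* worker */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, TAG_TASK, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = task * task;                 /* the stand-in work */
        MPI_Send(&result, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}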
Reduction
• Combine several sub-results into one
• Reduce r1, r2, …, rn with op
• Becomes r1 op r2 op … op rn (see the MPI sketch below)
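In MPI this pattern is the MPI_Reduce call; a minimal sketch with op = MPI_SUM, where each rank contributes one integer sub-result:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, part, total;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    part = rank + 1;    /* this rank's sub-result r_i */
    /* total = r_1 + r_2 + ... + r_n, delivered to rank 0 */
    MPI_Reduce(&part, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("reduced total = %d\n", total);
    MPI_Finalize();
    return 0;
}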
Data Parallelism
• Also called
  – Domain Decomposition
  – Specialist
• Same operations performed on many data elements simultaneously
  – Matrix operations
  – Compiling several files
Control Parallelism
• Different operations performed simultaneously on different processors
• E.g., Simulating a chemical plant; one processor simulates the preprocessing of chemicals, one simulates reactions in first batch, another simulates refining the products, etc.
Process communication
• Shared Memory
• Message Passing
Shared Memory
• Process A writes to a memory location
• Process B reads from that memory location
• Synchronization is crucial
• Excellent speed
Shared Memory
• Needs hardware support:
  – Multi-ported memory
• Atomic operations:
  – Test-and-Set
  – Semaphores
Shared Memory Semantics: Assumptions
• Global time is available, in discrete increments.
• Shared variable s = vi at time ti, i = 0, …
• Process A: s := v1 at time t1.
• Assume no other assignment occurs after t1.
• Process B reads s at time t and gets value v.
Shared Memory: Semantics
• Value of the shared variable:
  – v = v1, if t > t1
  – v = v0, if t < t1
  – v = ??, if t = t1
• t = t1 ± a discrete quantum
• Next update of the shared variable:
  – Occurs at t2
  – t2 = t1 + ?
Condition Variables and Semaphores
• Semaphores (POSIX sketch below)
  – V(s) ::= < s := s + 1 >
  – P(s) ::= < when s > 0 do s := s - 1 >
• Condition variables
  – C.wait()
  – C.signal()
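A sketch of P and V using POSIX semaphores, where sem_wait plays P(s) and sem_post plays V(s); the shared counter and the two threads are illustrative. Compile with -pthread.

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t s;                  /* the semaphore s */
int shared_counter = 0;   /* shared variable protected by s */

void *worker(void *arg) {
    sem_wait(&s);         /* P(s): <when s > 0 do s := s - 1> */
    shared_counter++;     /* critical section */
    sem_post(&s);         /* V(s): <s := s + 1> */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&s, 0, 1);   /* initial value 1: a binary semaphore */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);
    sem_destroy(&s);
    return 0;
}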
Distributed Shared Memory
• A common address space that all the computers in the cluster share.
• Difficult to describe semantics.
Distributed Shared Memory: Issues
• Distributed
  – Spatially
  – LAN
  – WAN
• No global time available
Messages
• Messages are sequences of bytes moving between processes
• The sender and receiver must agree on the type structure of values in the message
• “Marshalling” of data
Message Passing
• Process A sends a data buffer as a message to process B.
• Process B waits for a message from A, and when it arrives copies it into its own local memory.
• No memory shared between A and B.
Message Passing
• Obviously,
  – Messages cannot be received before they are sent.
  – A receiver waits until there is a message.
• Asynchronous
  – Sender never blocks, even if infinitely many messages are waiting to be received.
  – Semi-asynchronous is a practical version of the above, with a large but finite amount of buffering.
Message Passing: Point to Point
• Q: send(m, P)
  – Send message m to process P
• P: recv(x, Q)
  – Receive a message from process Q, and place it in variable x
• The message data
  – Type of x must match that of m
  – As if x := m (see the MPI sketch below)
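A minimal MPI rendering of this point-to-point exchange, with rank 0 playing Q and rank 1 playing P; the buffer length and message contents are arbitrary.

#include <mpi.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    char m[64] = "hello", x[64];    /* types of m and x match */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {                /* Q: send(m, P) */
        MPI_Send(m, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {         /* P: recv(x, Q) */
        MPI_Recv(x, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("as if x := m: %s\n", x);
    }
    MPI_Finalize();
    return 0;
}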
Broadcast
• One sender, multiple receivers
• Not all receivers may receive at the same time (see the MPI_Bcast sketch below)
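A minimal broadcast sketch in MPI, where rank 0 is the one sender and every rank in the communicator receives; the value 42 is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) value = 42;      /* only the sender has the value */
    /* After the call, every rank's copy of value is 42. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d has value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}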
Types of Sends
• Synchronous
• Asynchronous
Synchronous Message Passing
• Sender blocks until receiver is ready to receive (see the MPI_Ssend sketch below).
• Cannot send messages to self.
• No buffering.
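MPI's synchronous-mode send, MPI_Ssend, has this blocking behavior: it completes only after the matching receive has started. A minimal sketch; the payload is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, n = 7;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Blocks until rank 1 has begun the matching receive. */
        MPI_Ssend(&n, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", n);
    }
    MPI_Finalize();
    return 0;
}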
Message Passing: Speed
• Speed is not so good
  – Sender copies message into system buffers.
  – Message travels the network.
  – Receiver copies message from system buffers into local memory.
  – Special virtual memory techniques help.
Message Passing: Programming
• Less error-prone compared to shared memory
Message Passing: Synchronization
• Synchronous MP:
  – Sender waits until receiver is ready.
  – No intermediary buffering.
Barrier Synchronization
• Processes wait until “all” arrive (MPI_Barrier sketch below)
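In MPI, barrier synchronization is the MPI_Barrier call: no rank returns from it until every rank in the communicator has entered it. A minimal sketch:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: before the barrier\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);    /* wait until all ranks arrive */
    printf("rank %d: after the barrier\n", rank);
    MPI_Finalize();
    return 0;
}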
Parallel Software Development
• Algorithmic conversion by compilers
Development of Distributed+Parallel Programs
• New code + algorithms
• Old programs rewritten
  – In new languages that have distributed and parallel primitives
  – With new libraries
• Parallelize legacy code
Conversion of Legacy Software
• Mechanical conversion by software tools
• Reverse engineer its design, and re-code
Automatically parallelizing compilers
• Compilers analyze programs and parallelize them (usually loops).
• Easy to use, but with limited success.
OpenMP on Networks of Workstations
• OpenMP is an API for shared-memory architectures.
• The user gives hints as directives to the compiler (example below).
• http://www.openmp.org
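A minimal sketch of such a directive: the pragma is the user's hint that the loop iterations are independent, and the compiler generates the threading. The array size and contents are arbitrary; compile with -fopenmp (gcc).

#include <stdio.h>

int main(void) {
    double a[100], b[100], c[100];
    for (int i = 0; i < 100; i++) { b[i] = i; c[i] = 2.0 * i; }

    /* Hint: iterations are independent, so run them in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        a[i] = b[i] + c[i];

    printf("a[99] = %f\n", a[99]);
    return 0;
}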
Message Passing Libraries
• Programmer is responsible for data distribution, synchronizations, and sending and receiving information
• Parallel Virtual Machine (PVM)
• Message Passing Interface (MPI)
• BSP
BSP: Bulk Synchronous Parallel model
• Divides computation into supersteps.
• In each superstep a processor can work on local data and send messages.
• At the end of the superstep, a barrier synchronization takes place and all processors receive the messages that were sent in the previous superstep.
• http://www.bsp-worldwide.org/
BSP Library
• Small number of subroutines to implement
  – process creation,
  – remote data access, and
  – bulk synchronization.
• Linked to C, Fortran, … programs (a sketch follows)
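A superstep sketch against the BSPlib interface; the subroutine names (bsp_begin, bsp_push_reg, bsp_put, bsp_sync) are from the Oxford BSP toolset, so treat the exact signatures as an assumption. Each processor sends its id to its right neighbor; the put is delivered at the barrier.

#include <bsp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int mine, from_left = -1;
    bsp_begin(bsp_nprocs());                /* process creation */
    bsp_push_reg(&from_left, sizeof(int));  /* register for remote access */
    bsp_sync();

    mine = bsp_pid();                       /* local work in this superstep */
    bsp_put((bsp_pid() + 1) % bsp_nprocs(), /* message to the right neighbor */
            &mine, &from_left, 0, sizeof(int));
    bsp_sync();                             /* barrier: puts delivered here */

    printf("proc %d received %d\n", bsp_pid(), from_left);
    bsp_end();
    return 0;
}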
Parallel Languages
• Shared-memory languages
• Parallel object-oriented languages
• Parallel functional languages
• Concurrent logic languages
Tuple Space: Linda
• <v1, v2, …, vk>
• Atomic primitives
  – in(t)
  – read(t)
  – out(t)
  – eval(t)
• Host language: e.g., JavaSpaces
Data Parallel Languages
• Data is distributed over the processors as arrays
• Entire arrays are manipulated:
  – A(1:100) = B(1:100) + C(1:100)
• Compiler generates parallel code
  – Fortran 90
  – High Performance Fortran (HPF)
Parallel Functional Languages
• Erlang: http://www.erlang.org/
• SISAL: http://www.llnl.gov/sisal/
• PCN (Argonne)
Clusters
Buildings-Full of Workstations
1. Distributed OSs have not taken a foothold.
2. Powerful personal computers are ubiquitous.
3. Mostly idle: more than 90% of the uptime?
4. 100 Mb/s LANs are common.
5. Windows and Linux are the top two OSs in terms of installed base.
Cluster Configurations
• NOW -- Networks of Workstations
• COW -- Clusters of Dedicated Nodes
• Clusters of Come-and-Go Nodes
• Beowulf clusters
Beowulf
• Collection of compute nodes
• Full trust in each other
  – Login from one node into another without authentication
  – Shared file system subtree
• Dedicated
Close Cluster Configuration
[Figure: compute nodes plus a front end on a high-speed network, with a separate service network for administration; a gateway node connects the cluster to the external network. A second variant adds a file server node.]
Open Cluster Configuration
[Figure: compute nodes plus a front end on a high-speed network, with a file server node; the nodes are also connected to the external network.]
Interconnection Network
• Most popular: Fast Ethernet
• Network topologies
  – Mesh
  – Torus
• Switch vs. hub
Software Components
• Operating System– Linux, FreeBSD, …
• Parallel programming– PVM, MPI
• Utilities, …
• Open source
Software Structure of PC Cluster
[Figure: each node’s hardware layer and OS layer, tied together by the high-speed network; a parallel virtual machine layer spans all nodes, and parallel programs run on top of it.]
Single System View
• Single system view
  – Common filesystem structure viewed from any node
  – Common accounts on all nodes
  – Single software installation point
• Benefits
  – Easy to install and maintain the system
  – Easy for users to use
Installation Steps
• Install the operating system
• Set up a single system view
  – Shared filesystem
  – Common accounts
  – Single software installation point
• Install parallel programming packages such as MPI, PVM, BSP
• Install utilities, libraries, and applications
Linux Installation
• Linux has many distributions: Red Hat, Caldera, SuSE, Debian, …
• Caldera is easy to install
• All of the above upgrade with RPM package management
• Mandrake and SuSE come with a very complete set of software
Clusters with Part Time Nodes
• Cycle Stealing: running jobs on a workstation that do not belong to the workstation’s owner.
• Definition of Idleness: E.g., No keyboard and no mouse activity
• Tools/Libraries
  – Condor
  – PVM
  – MPI
Migration of Jobs
• Policies
  – Immediate-Eviction
  – Pause-and-Migrate
• Technical issues
  – Checkpointing: preserving the state of the process so it can be resumed.
  – Migrating from one architecture to another
Summary
• Parallel
  – computers
  – computation
• Parallel Methods
• Communication primitives
  – Message Passing
  – Distributed Shared Memory
• Programming Tools
• Cluster configurations