Networks of Workstations
Prabhaker Mateti, Wright State University
Overview
• Parallel computers
• Concurrent computation
• Parallel Methods
• Message Passing
• Distributed Shared Memory
• Programming Tools
• Cluster configurations
Granularity of Parallelism
• Fine-Grained Parallelism
• Medium-Grained Parallelism
• Coarse-Grained Parallelism
• NOWs (Networks of Workstations)
Fine-Grained Machines
• Tens of thousands of processors
• Processors
  – Slow (bit serial)
  – Small (K bits of RAM)
  – Distributed memory
• Interconnection networks
  – Message passing
• Single Instruction Multiple Data (SIMD)
Sample Meshes
• Massively Parallel Processor (MPP)
• TMC CM-2 (Connection Machine)
• MasPar MP-1/2
Medium-Grained Machines
• Typical configurations
  – Thousands of processors
  – Processors have power between coarse- and fine-grained
• Either shared or distributed memory
• Traditionally: research machines
• Single Code Multiple Data (SCMD)
Medium-Grained Machines
• Ex: Cray T3E
• Processors
  – DEC Alpha EV5 (600 MFLOPS peak)
  – Max of 2048
• Peak performance: 1.2 TFLOPS
• 3-D torus interconnect
• Memory: 64 MB to 2 GB per CPU
Coarse-Grained Machines
• Typical configurations
  – Hundreds of processors
• Processors
  – Powerful (fast CPUs)
  – Large (cache, vectors, multiple fast buses)
• Memory: shared or distributed-shared
• Multiple Instruction Multiple Data (MIMD)
Coarse-Grained Machines
• SGI Origin 2000:
  – PEs (MIPS R10000): max of 128
  – Peak performance: 49 GFLOPS
  – Memory: 256 GBytes
  – Crossbar switches for interconnect
• HP/Convex Exemplar:
  – PEs (HP PA-RISC 8000): max of 64
  – Peak performance: 46 GFLOPS
  – Memory: max of 64 GBytes
  – Distributed crossbar switches for interconnect
Networks of Workstations
• Exploit inexpensive workstations/PCs
• Commodity network
• The NOW becomes a “distributed memory multiprocessor”
• Workstations send and receive messages
• C and Fortran programs with PVM, MPI, etc. libraries
• Programs developed on NOWs are portable to supercomputers for production runs
“Parallel” Computing
• Concurrent Computing
• Distributed Computing
• Networked Computing
• Parallel Computing
Definition of “Parallel”
• S1 begins at time b1, ends at e1
• S2 begins at time b2, ends at e2
• S1 || S2
  – Begins at min(b1, b2)
  – Ends at max(e1, e2)
  – Equivalent to S2 || S1
Data Dependency
• x := a + b; y := c + d;
• x := a + b || y := c + d;
• y := c + d; x := a + b;
• x depends on a and b; y depends on c and d
• Assumes a, b, c, d are independent (see the sketch below)
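Because the two assignments touch disjoint variables, they can run in either order or in parallel. A minimal sketch using OpenMP sections (OpenMP is introduced later in these slides; the initial values are arbitrary, and the code runs serially if compiled without -fopenmp):

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, d = 4, x, y;
    /* The two sections write disjoint variables, so they may run
       concurrently; compile with -fopenmp (gcc) to enable. */
    #pragma omp parallel sections
    {
        #pragma omp section
        x = a + b;      /* reads only a and b */
        #pragma omp section
        y = c + d;      /* reads only c and d */
    }
    printf("x = %d, y = %d\n", x, y);
    return 0;
}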
Types of Parallelism
• Result
• Specialist
• Agenda
Perfect Parallelism
• Also called
  – Embarrassingly parallel
  – Result parallel
• Computations that can be subdivided into sets of independent tasks that require little or no communication
  – Monte Carlo simulations (sketched below)
  – F(x, y, z)
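A minimal sketch of why a Monte Carlo computation is perfectly parallel: each worker runs the same kernel with its own seed and needs no communication; only the final results are combined. The function name estimate_pi and the constants are illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Estimate pi by sampling points in the unit square; a worker needs
   only its seed and trial count -- no communication with other workers. */
double estimate_pi(unsigned seed, long trials) {
    long hits = 0;
    srand(seed);
    for (long i = 0; i < trials; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0) hits++;
    }
    return 4.0 * (double)hits / (double)trials;
}

int main(void) {
    /* On a NOW, each workstation would run one call with a distinct
       seed; the sub-results would then be averaged. */
    printf("pi is approximately %f\n", estimate_pi(42u, 1000000L));
    return 0;
}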
MW Model
• Manager
  – Initiates computation
  – Tracks progress
  – Handles workers’ requests
  – Interfaces with user
• Workers
  – Spawned and terminated by manager
  – Make requests to manager
  – Send results to manager (see the MPI sketch below)
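A hedged manager/worker sketch in MPI (the slides name PVM and MPI as message-passing libraries). Rank 0 plays the manager; the task, squaring an integer, and the tag values are stand-ins for illustration.

#include <mpi.h>
#include <stdio.h>

#define TAG_TASK 1
#define TAG_DONE 2

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {                          /* manager */
        int w, result;
        for (w = 1; w < size; w++)            /* initiate: one task each */
            MPI_Send(&w, 1, MPI_INT, w, TAG_TASK, MPI_COMM_WORLD);
        for (w = 1; w < size; w++) {          /* track progress */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("manager got result %d\n", result);
        }
    } else {                                  /* worker */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, TAG_TASK, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = task * task;                 /* the stand-in work */
        MPI_Send(&result, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}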
Reduction
• Combine several sub-results into one
• Reduce r1, r2, …, rn with op
• Becomes r1 op r2 op … op rn (see the MPI sketch below)
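In MPI this pattern is the MPI_Reduce call; a minimal sketch with op = MPI_SUM, where each rank contributes one integer sub-result:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, part, total;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    part = rank + 1;    /* this rank's sub-result r_i */
    /* total = r_1 + r_2 + ... + r_n, delivered to rank 0 */
    MPI_Reduce(&part, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("reduced total = %d\n", total);
    MPI_Finalize();
    return 0;
}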
Data Parallelism
• Also called
  – Domain Decomposition
  – Specialist
• Same operations performed on many data elements simultaneously
  – Matrix operations
  – Compiling several files
Control Parallelism
• Different operations performed simultaneously on different processors
• E.g., Simulating a chemical plant; one processor simulates the preprocessing of chemicals, one simulates reactions in first batch, another simulates refining the products, etc.
Process communication
• Shared Memory
• Message Passing
Shared Memory
• Process A writes to a memory location
• Process B reads from that memory location
• Synchronization is crucial
• Excellent speed
Shared Memory
• Needs hardware support:
  – Multi-ported memory
• Atomic operations:
  – Test-and-Set
  – Semaphores
Shared Memory Semantics: Assumptions
• Global time is available, in discrete increments.
• Shared variable s = vi at time ti, i = 0, …
• Process A: s := v1 at time t1.
• Assume no other assignment occurs after t1.
• Process B reads s at time t and gets value v.
Shared Memory: Semantics
• Value of the shared variable:
  – v = v1, if t > t1
  – v = v0, if t < t1
  – v = ??, if t = t1
• t = t1 ± a discrete quantum
• Next update of the shared variable:
  – Occurs at t2
  – t2 = t1 + ?
Condition Variables and Semaphores
• Semaphores (POSIX sketch below)
  – V(s) ::= < s := s + 1 >
  – P(s) ::= < when s > 0 do s := s - 1 >
• Condition variables
  – C.wait()
  – C.signal()
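A sketch of P and V using POSIX semaphores, where sem_wait plays P(s) and sem_post plays V(s); the shared counter and the two threads are illustrative. Compile with -pthread.

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t s;                  /* the semaphore s */
int shared_counter = 0;   /* shared variable protected by s */

void *worker(void *arg) {
    sem_wait(&s);         /* P(s): <when s > 0 do s := s - 1> */
    shared_counter++;     /* critical section */
    sem_post(&s);         /* V(s): <s := s + 1> */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&s, 0, 1);   /* initial value 1: a binary semaphore */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);
    sem_destroy(&s);
    return 0;
}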
Distributed Shared Memory
• A common address space that all the computers in the cluster share.
• Difficult to describe semantics.
Distributed Shared Memory: Issues
• Distributed
  – Spatially
  – LAN
  – WAN
• No global time available
Messages
• Messages are sequences of bytes moving between processes
• The sender and receiver must agree on the type structure of values in the message
• “Marshalling” of data
Message Passing
• Process A sends a data buffer as a message to process B.
• Process B waits for a message from A, and when it arrives copies it into its own local memory.
• No memory shared between A and B.
Message Passing
• Obviously,
  – Messages cannot be received before they are sent.
  – A receiver waits until there is a message.
• Asynchronous
  – Sender never blocks, even if infinitely many messages are waiting to be received.
  – Semi-asynchronous is a practical version of the above, with a large but finite amount of buffering.
Message Passing: Point to Point
• Q: send(m, P)
  – Send message m to process P
• P: recv(x, Q)
  – Receive a message from process Q, and place it in variable x
• The message data
  – Type of x must match that of m
  – As if x := m (see the MPI sketch below)
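A minimal MPI rendering of this point-to-point exchange, with rank 0 playing Q and rank 1 playing P; the buffer length and message contents are arbitrary.

#include <mpi.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    char m[64] = "hello", x[64];    /* types of m and x match */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {                /* Q: send(m, P) */
        MPI_Send(m, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {         /* P: recv(x, Q) */
        MPI_Recv(x, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("as if x := m: %s\n", x);
    }
    MPI_Finalize();
    return 0;
}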
Broadcast
• One sender, multiple receivers
• Not all receivers may receive at the same time (see the MPI_Bcast sketch below)
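A minimal broadcast sketch in MPI, where rank 0 is the one sender and every rank in the communicator receives; the value 42 is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) value = 42;      /* only the sender has the value */
    /* After the call, every rank's copy of value is 42. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d has value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}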
Types of Sends
• Synchronous
• Asynchronous
Synchronous Message Passing
• Sender blocks until receiver is ready to receive (see the MPI_Ssend sketch below).
• Cannot send messages to self.
• No buffering.
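MPI's synchronous-mode send, MPI_Ssend, has this blocking behavior: it completes only after the matching receive has started. A minimal sketch; the payload is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, n = 7;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Blocks until rank 1 has begun the matching receive. */
        MPI_Ssend(&n, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", n);
    }
    MPI_Finalize();
    return 0;
}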
Message Passing: Speed
• Speed is not so good
  – Sender copies message into system buffers.
  – Message travels the network.
  – Receiver copies message from system buffers into local memory.
  – Special virtual memory techniques help.
Message Passing: Programming
• Less error-prone compared to shared memory
Message Passing: Synchronization
• Synchronous MP:
  – Sender waits until receiver is ready.
  – No intermediary buffering.
Barrier Synchronization
• Processes wait until “all” arrive (MPI_Barrier sketch below)
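In MPI, barrier synchronization is the MPI_Barrier call: no rank returns from it until every rank in the communicator has entered it. A minimal sketch:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: before the barrier\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);    /* wait until all ranks arrive */
    printf("rank %d: after the barrier\n", rank);
    MPI_Finalize();
    return 0;
}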
Parallel Software Development
• Algorithmic conversion by compilers
Development of Distributed+Parallel Programs
• New code + algorithms
• Old programs rewritten
  – In new languages that have distributed and parallel primitives
  – With new libraries
• Parallelize legacy code
Conversion of Legacy Software
• Mechanical conversion by software tools
• Reverse engineer its design, and re-code
Automatically parallelizing compilers
• Compilers analyze programs and parallelize them (usually loops).
• Easy to use, but with limited success.
OpenMP on Networks of Workstations
• OpenMP is an API for shared-memory architectures.
• The user gives hints as directives to the compiler (example below).
• http://www.openmp.org
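A minimal sketch of such a directive: the pragma is the user's hint that the loop iterations are independent, and the compiler generates the threading. The array size and contents are arbitrary; compile with -fopenmp (gcc).

#include <stdio.h>

int main(void) {
    double a[100], b[100], c[100];
    for (int i = 0; i < 100; i++) { b[i] = i; c[i] = 2.0 * i; }

    /* Hint: iterations are independent, so run them in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        a[i] = b[i] + c[i];

    printf("a[99] = %f\n", a[99]);
    return 0;
}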
Message Passing Libraries
• Programmer is responsible for data distribution, synchronizations, and sending and receiving information
• Parallel Virtual Machine (PVM)
• Message Passing Interface (MPI)
• BSP
BSP: Bulk Synchronous Parallel model
• Divides computation into supersteps.
• In each superstep a processor can work on local data and send messages.
• At the end of the superstep, a barrier synchronization takes place and all processors receive the messages that were sent in the previous superstep.
• http://www.bsp-worldwide.org/
BSP Library
• Small number of subroutines to implement
  – process creation,
  – remote data access, and
  – bulk synchronization.
• Linked to C, Fortran, … programs (a sketch follows)
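A superstep sketch against the BSPlib interface; the subroutine names (bsp_begin, bsp_push_reg, bsp_put, bsp_sync) are from the Oxford BSP toolset, so treat the exact signatures as an assumption. Each processor sends its id to its right neighbor; the put is delivered at the barrier.

#include <bsp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int mine, from_left = -1;
    bsp_begin(bsp_nprocs());                /* process creation */
    bsp_push_reg(&from_left, sizeof(int));  /* register for remote access */
    bsp_sync();

    mine = bsp_pid();                       /* local work in this superstep */
    bsp_put((bsp_pid() + 1) % bsp_nprocs(), /* message to the right neighbor */
            &mine, &from_left, 0, sizeof(int));
    bsp_sync();                             /* barrier: puts delivered here */

    printf("proc %d received %d\n", bsp_pid(), from_left);
    bsp_end();
    return 0;
}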
Parallel Languages
• Shared-memory languages
• Parallel object-oriented languages
• Parallel functional languages
• Concurrent logic languages
Tuple Space: Linda
• <v1, v2, …, vk>
• Atomic primitives
  – in(t)
  – read(t)
  – out(t)
  – eval(t)
• Host language: e.g., JavaSpaces
Data Parallel Languages
• Data is distributed over the processors as arrays
• Entire arrays are manipulated:
  – A(1:100) = B(1:100) + C(1:100)
• Compiler generates parallel code
  – Fortran 90
  – High Performance Fortran (HPF)
Parallel Functional Languages
• Erlang: http://www.erlang.org/
• SISAL: http://www.llnl.gov/sisal/
• PCN (Argonne)
Clusters
Buildings-Full of Workstations
1. Distributed OSs have not taken a foothold.
2. Powerful personal computers are ubiquitous.
3. Mostly idle: more than 90% of the uptime?
4. 100 Mb/s LANs are common.
5. Windows and Linux are the top two OSs in terms of installed base.
Cluster Configurations
• NOW -- Networks of Workstations
• COW -- Clusters of Dedicated Nodes
• Clusters of Come-and-Go Nodes
• Beowulf clusters
Beowulf
• Collection of compute nodes
• Full trust in each other
  – Login from one node into another without authentication
  – Shared file system subtree
• Dedicated
Close Cluster Configuration
[Figure: compute nodes plus a front end on a high-speed network, with a separate service network for administration; a gateway node connects the cluster to the external network. A second variant adds a file server node.]
Open Cluster Configuration
[Figure: compute nodes plus a front end on a high-speed network, with a file server node; the nodes are also connected to the external network.]
Interconnection Network
• Most popular: Fast Ethernet
• Network topologies
  – Mesh
  – Torus
• Switch vs. hub
Software Components
• Operating System– Linux, FreeBSD, …
• Parallel programming– PVM, MPI
• Utilities, …
• Open source
Software Structure of PC Cluster
[Figure: each node’s hardware layer and OS layer, tied together by the high-speed network; a parallel virtual machine layer spans all nodes, and parallel programs run on top of it.]
Single System View
• Single system view
  – Common filesystem structure viewed from any node
  – Common accounts on all nodes
  – Single software installation point
• Benefits
  – Easy to install and maintain the system
  – Easy for users to use
Installation Steps
• Install the operating system
• Set up a single system view
  – Shared filesystem
  – Common accounts
  – Single software installation point
• Install parallel programming packages such as MPI, PVM, BSP
• Install utilities, libraries, and applications
Linux Installation
• Linux has many distributions: Red Hat, Caldera, SuSE, Debian, …
• Caldera is easy to install
• All of the above upgrade with RPM package management
• Mandrake and SuSE come with a very complete set of software
Clusters with Part Time Nodes
• Cycle Stealing: running jobs on a workstation that do not belong to the workstation’s owner.
• Definition of Idleness: E.g., No keyboard and no mouse activity
• Tools/Libraries
  – Condor
  – PVM
  – MPI
Migration of Jobs
• Policies
  – Immediate-Eviction
  – Pause-and-Migrate
• Technical issues
  – Checkpointing: preserving the state of the process so it can be resumed.
  – Migrating from one architecture to another
Summary
• Parallel
  – computers
  – computation
• Parallel Methods
• Communication primitives
  – Message Passing
  – Distributed Shared Memory
• Programming Tools
• Cluster configurations