Grid Computing, B. Wilkinson, 2004
7a.1 Computational Grids

Page 1:

Computational Grids

Page 2:

Computational Problems

• Problems that have lots of computations and usually lots of data.

Page 3:

Demand for Computational Speed

• Continual demand for greater computational speed from a computer system than is currently possible

• Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems.

• Computations must be completed within a “reasonable” time period.

Page 4:

Grand Challenge Problems

One that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable.

Examples
• Modeling large DNA structures
• Global weather forecasting
• Modeling motion of astronomical bodies

Page 5:

Weather Forecasting

• Atmosphere modeled by dividing it into 3-dimensional cells.

• Calculations of each cell repeated many times to model passage of time.

Page 6:

Global Weather Forecasting Example

• Suppose global atmosphere divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles - about 5 × 10⁸ cells.

• Suppose each calculation requires 200 floating point operations. In one time step, 10¹¹ floating point operations necessary.

• To forecast weather over a 7-day period using 1-minute intervals, a computer operating at 1 Gflops (10⁹ floating point operations/s) takes 10⁶ seconds or over 10 days.
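As a side note, a minimal C sketch of the arithmetic behind this estimate (cell count, operations per cell, number of time steps, and machine rate are the figures quoted above; the program simply multiplies them out):

#include <stdio.h>

int main(void) {
    double cells        = 5e8;                /* ~5 x 10^8 cells                    */
    double ops_per_cell = 200.0;              /* floating point operations per cell */
    double steps        = 7.0 * 24.0 * 60.0;  /* 7 days at 1-minute intervals       */
    double rate         = 1e9;                /* 1 Gflops machine                   */

    double total_ops = cells * ops_per_cell * steps;   /* ~10^15 operations */
    double seconds   = total_ops / rate;                /* ~10^6 seconds     */

    printf("total operations: %.2e\n", total_ops);
    printf("time at 1 Gflops: %.2e s (about %.1f days)\n", seconds, seconds / 86400.0);
    return 0;
}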

Page 7:

Modeling Motion of Astronomical Bodies

• Each body attracted to each other body by gravitational forces. Movement of each body predicted by calculating total force on each body.

• With N bodies, N − 1 forces to calculate for each body, or ≈ N² calculations.

(N log₂ N for an efficient approximate algorithm.)

• After determining new positions of bodies, calculations repeated.

Page 8:

• A galaxy might have, say, 10¹¹ stars.

• Even if each calculation done in 1 ms (extremely optimistic figure), it takes almost a year for one iteration using the N log₂ N algorithm.

• 100 years for 100 iterations. Typically require millions of iterations.

Page 9:

Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).

Page 10:

High Performance Computing (HPC)

• Traditionally, achieved by using multiple computers together - parallel computing.

• Simple idea! -- Using multiple computers (or processors) simultaneously should be able to solve the problem faster than a single computer.

Page 11:

Using multiple computers or processors

• Key concept - dividing problem into parts that can be computed simultaneously.

• Parallel programming - programming a computing platform consisting of more than one processor or computer.

• Concept very old (50 years).

Page 12:

High Performance Computing

• Long history:
  – Multiprocessor systems of various types (1950s onwards)
  – Supercomputers (1960s-80s)
  – Cluster computing (1990s)
  – Grid computing (2000s)??

Maybe, but let's first look at how to achieve HPC.

Page 13:

Speedup Factor

ts is execution time on a single processor

tp is execution time on a multiprocessor.

S(p) gives increase in speed by using multiprocessor.

Best sequential algorithm for single processor. Parallel algorithm usually different.

S(p) = (Execution time using one processor, best sequential algorithm) / (Execution time using a multiprocessor with p processors) = ts / tp

Page 14:

Maximum Speedup

Maximum speedup is usually p with p processors (linear speedup).

Possible to get superlinear speedup (greater than p) but usually a specific reason such as:

– Extra memory in multiprocessor system
– Non-deterministic algorithm

Page 15:

Maximum Speedup: Amdahl's Law

[Figure: (a) With one processor, the execution time ts consists of a serial section f·ts and parallelizable sections (1 − f)·ts. (b) With p processors, the parallelizable sections take (1 − f)·ts/p, giving tp = f·ts + (1 − f)·ts/p.]

Page 16:

Speedup factor is given by:

S(p) = ts / (f·ts + (1 − f)·ts/p) = p / (1 + (p − 1)f)

This equation is known as Amdahl's law.

Page 17:

Speedup against number of processors

Even with infinite number of processors, max. speedup limited to 1/f.

Example: With only 5% of computation being serial, max. speedup 20, irrespective of number of processors.

[Figure: Speedup S(p) plotted against number of processors p (up to 20) for serial fractions f = 0%, 5%, 10%, and 20%.]
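A minimal C sketch of Amdahl's law as given above (amdahl() is just an illustrative helper name); it reproduces the behaviour in the plot, with the speedup approaching 1/f as p grows:

#include <stdio.h>

/* Amdahl's law: S(p) = p / (1 + (p - 1)f), f = serial fraction */
static double amdahl(int p, double f) {
    return p / (1.0 + (p - 1) * f);
}

int main(void) {
    double f = 0.05;                      /* 5% of the computation is serial */
    int procs[] = { 4, 8, 16, 64, 1024 };

    for (int i = 0; i < 5; i++)
        printf("p = %4d   S(p) = %5.2f\n", procs[i], amdahl(procs[i], f));

    printf("limit as p -> infinity: 1/f = %.0f\n", 1.0 / f);   /* 20 for f = 0.05 */
    return 0;
}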

Page 18:

Superlinear Speedup Example: Searching

(a) Searching each sub-space sequentially

[Figure: The search space is divided into p sub-spaces, each taking ts/p to search sequentially. The solution is found after x·ts/p + Δt, where x (which sub-space holds the solution) is indeterminate.]

Page 19:

(b) Searching each sub-space in parallel

[Figure: All p sub-spaces searched simultaneously; the solution is found after time Δt.]

Page 20:

Question

What is the speed-up now?

Page 21:

Worst case for sequential search when solution found in last sub-space search. Then parallel version offers greatest benefit, i.e.

S(p) = ( ((p − 1)/p) × ts + Δt ) / Δt → ∞

as Δt tends to zero.

Page 22:

Least advantage for parallel version when solution found in first sub-space search of the sequential search, i.e.

S(p) = Δt / Δt = 1

Actual speed-up depends upon which sub-space holds solution but could be extremely large.

Page 23:

Types of Parallel Computers

Two principal types:

1. Single computer containing multiple processors - main memory is shared, hence called “Shared memory multiprocessor”

2. Multiple computer system

Page 24:

Conventional Computer

Consists of a processor executing a program stored in a (main) memory:

Each main memory location located by its address within a single memory space.

[Figure: Processor connected to main memory; instructions flow to the processor, data flows to or from the processor.]

Page 25:

Shared Memory Multiprocessor

• Extend single processor model - multiple processors connected to multiple memory modules
• Each processor can access any memory module:

[Figure: Processors connected to memory modules through an interconnection network; one address space.]

Page 26:

Examples

• Dual Pentiums

• Quad Pentiums

Page 27:

Programming Shared Memory Multiprocessors

• Threads - programmer decomposes program into parallel sequences (threads), each being able to access variables declared outside threads. Example: Pthreads

• Use sequential programming language with preprocessor compiler directives, constructs, or syntax to declare shared variables and specify parallelism. Examples: OpenMP (an industry standard), UPC (Unified Parallel C) -- needs compilers.
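As an illustration of the second approach, a minimal OpenMP sketch in C (the array names and size are arbitrary); it would be compiled with an OpenMP-aware compiler, e.g. gcc -fopenmp:

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N], b[N], c[N];

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* The directive asks the compiler to divide the loop iterations among
       threads; a, b, c are shared variables, i is private to each thread. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f, max threads = %d\n", c[N - 1], omp_get_max_threads());
    return 0;
}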

Page 28:

• Parallel programming language with syntax to express parallelism. Compiler creates executable code -- not now common.

• Use parallelizing compiler to convert regular sequential language programs into parallel executable code - also not now common.

Page 29:

Multiple Computers
Message-passing multicomputer

Complete computers connected through an interconnection network:

[Figure: Computers, each with a processor and local memory, connected through an interconnection network and exchanging messages.]

Page 30:

Networked Computers as a Computing Platform

• Became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in 1990’s.

• Several early projects. Notable:
  – Berkeley NOW (network of workstations) project.
  – NASA Beowulf project.

Page 31:

Key Hardware Advantages:

• Very high performance workstations and PCs readily available at low cost.

• Latest processors can easily be incorporated into the system as they become available.

Page 32:

Programming Clusters

• Usually based upon explicit message-passing.

• Common approach -- a set of user-level libraries for message passing. Examples:
  – Parallel Virtual Machine (PVM) - late 1980s. Became very popular in mid 1990s.
  – Message-Passing Interface (MPI) - standard defined in 1990s and now dominant.

Page 33:

Beowulf Clusters

• Name given to a group of interconnected “commodity” computers designed to achieve high performance with low cost.

• Typically using commodity interconnects (high-speed Ethernet).

• Typically Linux OS.

Beowulf comes from name given by NASA Goddard Space Flight Center cluster project.

Page 34:

Cluster Interconnects

• Originally fast Ethernet on low cost clusters
• Gigabit Ethernet - easy upgrade path

More Specialized/Higher Performance
• Myrinet - 2.4 Gbits/sec - disadvantage: single vendor
• Infiniband - may be important as Infiniband interfaces may be integrated on next generation PCs

Page 35:

Dedicated cluster with a master node

[Figure: Dedicated cluster. The user reaches the master node over the external network; the master node's second Ethernet interface provides an up link to a switch connecting the compute nodes.]

Page 36:

WCU Department of Mathematics and CS leo I cluster (now dismantled)

[Figure: LEO I cluster. Front-end: Pentium III, 500 MHz, 384 Mbyte main memory, connected to the Internet. 8 compute nodes: Pentium IIIs, 500 MHz, 128 Mbyte main memory, connected by a 16-port 100 Mb/s Ethernet switch (with ports free for expansion). Software: ROCKS, Red Hat Linux, MPICH, Globus 2.4.]

Being replaced with Pentium IVs and Gigabit Ethernet.

Page 37:

Message-Passing Programming using User-level Message Passing Libraries

Two primary mechanisms needed:

1. A method of creating separate processes for execution on different computers

2. A method of sending and receiving messages
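As a concrete MPI illustration of these two mechanisms, a minimal sketch: process creation is handled outside the program (typically by a launcher such as mpirun), each copy learns its identity through the library, and message sending and receiving appear on later slides:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);                  /* join the set of MPI processes   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's identity, 0..p-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes, p    */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

Typically launched with something like mpirun -np 4 ./hello (the exact command depends on the MPI implementation).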

Page 38:

Multiple Program, Multiple Data Model (MPMD)

[Figure: A separate source file for each processor, compiled to suit that processor, giving a different executable on processor 0 through processor p − 1.]

Page 39:

Single Program Multiple Data Model (SPMD)

• Different processes merged into one program.

• Control statements select different parts for each processor to execute.

• All executables started together - static process creation
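A minimal SPMD sketch in MPI-style C: one program runs everywhere, and control statements select the part each process executes (master_part() and worker_part() are hypothetical placeholders):

#include <stdio.h>
#include <mpi.h>

static void master_part(void)     { printf("master: coordinating work\n"); }
static void worker_part(int rank) { printf("worker %d: computing\n", rank); }

int main(int argc, char *argv[]) {
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)          /* same executable on every processor;         */
        master_part();      /* the control statement picks the part to run */
    else
        worker_part(rank);

    MPI_Finalize();
    return 0;
}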

Page 40:

Single Program Multiple Data Model (SPMD)

[Figure: One source file compiled to suit each processor, producing an executable for processor 0 through processor p − 1.]

Basic MPI way.

Page 41:

Multiple Program Multiple Data Model (MPMD)

• Separate programs for each processor.

• One processor executes master process.

• Other processes started from within master process - dynamic process creation.

Page 42:

Multiple Program Multiple Data Model (MPMD)

[Figure: Process 1 calls spawn() to start execution of process 2; time runs downwards.]
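The generic spawn() here corresponds to dynamic process creation in, for example, PVM (pvm_spawn()) or MPI-2 (MPI_Comm_spawn()). A minimal MPI-2 sketch, where "worker" is a hypothetical separately compiled executable:

#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);

    /* Master process starts 4 copies of the program "worker";
       communication with them then goes through intercomm.    */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Finalize();
    return 0;
}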

Page 43:

“Point-to-point” send and receive routines

Passing a message between processes using send() and recv() library calls:

Process 1:  send(&x, 2);
Process 2:  recv(&y, 1);

[Figure: Data moves from x in process 1 to y in process 2.]

Generic syntax (actual formats later).
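In MPI the generic send() and recv() above become MPI_Send() and MPI_Recv(); a minimal sketch using ranks 0 and 1 (MPI numbers processes from 0) and tag 0:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, x = 42, y = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);          /* send x to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); /* receive into y   */
        printf("rank 1 received y = %d\n", y);
    }

    MPI_Finalize();
    return 0;
}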

Page 44:

Synchronous Message Passing

Routines that return when message transfer completed.

Synchronous send routine
• Waits until complete message can be accepted by the receiving process before sending the message.

Synchronous receive routine
• Waits until the message it is expecting arrives.

Page 45:

Synchronous send() and recv() using 3-way protocol

[Figure (a): When send() occurs before recv(), process 1 issues a request to send and suspends; when process 2 reaches recv() it returns an acknowledgment, the message is transferred, and both processes continue.]

[Figure (b): When recv() occurs before send(), process 2 suspends in recv(); when process 1 reaches send() it issues a request to send, process 2 acknowledges, the message is transferred, and both processes continue.]

Page 46:

• Synchronous routines intrinsically perform two actions:
  – They transfer data, and
  – They synchronize processes.

Page 47:

Asynchronous Message Passing

• Do not wait for actions to complete before returning.

• More than one version depending upon semantics for returning.

• Usually require local storage for messages.

• They do not synchronize processes and allow processes to move forward sooner. Must be used with care.

Page 48:

MPI Definitions of Blocking and Non-Blocking

• Blocking - return after their local actions complete, though message transfer may not have been completed.

• Non-blocking - return immediately.

Assumes data storage not modified by subsequent statements prior to being used for transfer, and it is left to the programmer to ensure this.
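A minimal sketch of the non-blocking case in MPI: MPI_Isend() returns immediately, and the send buffer must not be modified until MPI_Wait() reports completion, which is exactly the programmer's responsibility described above:

#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, x = 42, y = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);  /* returns at once     */
        /* ... other work may go here, but x must NOT be modified yet ...              */
        MPI_Wait(&req, MPI_STATUS_IGNORE);                       /* now x may be reused */
    } else if (rank == 1) {
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}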

Page 49:

How message-passing routines return before transfer completed

Message buffer needed between source and destination to hold message:

[Figure: Process 1's send() copies the message into a message buffer and the process continues; process 2's recv() later reads the message from the buffer.]

Page 50:

Asynchronous (blocking) routines changing to synchronous routines

• Buffers are only of finite length, so a point could be reached when the send routine is held up because all available buffer space is exhausted.

• Then, the send routine will wait until storage becomes available again - i.e. the routine then behaves as a synchronous routine.

Page 51:

Message Tag

• Used to differentiate between different types of messages being sent.

• Message tag is carried within message.

Page 52:

Message Tag Example

To send data x with message tag 5 from source process 1 to destination process 2 and assign it to y:

Process 1:  send(&x, 2, 5);
Process 2:  recv(&y, 1, 5);    (waits for a message from process 1 with a tag of 5)

[Figure: Data moves from x in process 1 to y in process 2.]

Page 53:

Wild Card

• If message tag matching not required, wild card message tag used.

• Then, recv() will match with any send().
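In MPI the tag is an explicit argument of MPI_Send()/MPI_Recv(), and the wild cards are the constants MPI_ANY_TAG and MPI_ANY_SOURCE; a minimal receive-side sketch (assumed to run in the destination process):

#include <mpi.h>

void receive_examples(void) {
    int y;
    MPI_Status status;

    /* Matches only a message from rank 1 carrying tag 5 */
    MPI_Recv(&y, 1, MPI_INT, 1, 5, MPI_COMM_WORLD, &status);

    /* Matches a message from any source with any tag; the actual
       sender and tag can be read from status afterwards.         */
    MPI_Recv(&y, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    /* status.MPI_SOURCE and status.MPI_TAG hold the sender and tag */
}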

Page 54:

Collective Message Passing Routines

Have routines that send message(s) to a group of processes or receive message(s) from a group of processes

Higher efficiency than separate point-to-point routines although not absolutely necessary.

Page 55:

Broadcast

Sending same message to all processes concerned with problem.

[Figure: Every process (process 0 to process p − 1) calls bcast(); the data in the root's buffer buf is copied to all processes. The slide shows the action (data movement) and the code (the bcast() calls); the MPI form is given later.]
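The MPI form is MPI_Bcast(), which every process in the group calls with the same arguments; a minimal sketch with process 0 as root:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, buf = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) buf = 99;                          /* root fills the buffer */

    MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* called by everyone    */

    printf("process %d has buf = %d\n", rank, buf);   /* 99 on every process   */
    MPI_Finalize();
    return 0;
}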

Page 56:

Scatter

Sending each element of an array in root process to a separate process. Contents of ith location of array sent to ith process.

[Figure: Every process calls scatter(); element i of the root's buffer buf is delivered to process i. The MPI form is given later.]

Page 57:

Gather

Having one process collect individual values from set of processes.

[Figure: Every process calls gather(); the value from process i is placed in element i of the root's buffer buf. The MPI form is given later.]

Page 58:

Reduce

Gather operation combined with arithmetic/logical operation. Example: Values gathered and added together:

[Figure: Every process calls reduce(); the values from all processes are combined with the + operation and the result is placed in the root's buffer buf. The MPI form is given later.]
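The MPI forms of the last three operations are MPI_Scatter(), MPI_Gather() and MPI_Reduce(); a minimal sketch combining a scatter with a sum reduction (the array size of 64 is just an illustrative upper bound on the number of processes):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, mine = 0, total = 0;
    int data[64];                         /* root's array, one element per process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* assumed to be <= 64 here */

    if (rank == 0)
        for (int i = 0; i < size; i++) data[i] = i + 1;

    /* element i of data goes to process i */
    MPI_Scatter(data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* values from all processes added together; result left at the root */
    MPI_Reduce(&mine, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("sum = %d\n", total);   /* 1 + 2 + ... + size */
    MPI_Finalize();
    return 0;
}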

Page 59:

Grid Computing

• A grid is a form of multiple computer system.

• For solving computational problems, it could be viewed as the next step after cluster computing, and the same programming techniques used.

Why is this not necessarily true?

Page 60:

• VERY expensive - sending data across a network costs millions of cycles

• Bandwidth shared with other users

• Links unreliable

Page 61:

Computational Strategies

• As a computing platform, a grid favors situations with absolute minimum communication between computers.

• Next class will look at these strategies and details of MPI programming.