Parallel Computing and Parallel Computers

Upload: donna-henderson

Post on 12-Jan-2016

Page 1: Parallel Computing and Parallel Computers. Home work assignment 1. Write few paragraphs (max two page) about yourself. Currently what going on in your

Parallel Computing and Parallel Computers

Homework assignment 1

• Write a few paragraphs (two pages maximum) about yourself: what is currently going on in your life, what you like and dislike, your family, etc. Include one paragraph about your expectations for this course.

• Due date: 09/04/09 (Friday in class).

Parallel Computing

• Using more than one computer, or a computer with more than one processor, to solve a problem.

Motives

• Usually faster computation. The very simple idea is that n computers operating simultaneously can achieve the result n times faster. In practice it will not be n times faster, for various reasons.

• Other motives include: fault tolerance, larger amount of memory available, ...

Demand for Computational Speed

• Continual demand for greater computational speed from a computer system than is currently possible

• Areas requiring great computational speed include:
– Numerical modeling
– Simulation of scientific and engineering problems.

• Computations need to be completed within a “reasonable” time period.

Grand Challenge Problems

Ones that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable.

Examples

• Modeling large DNA structures
• Global weather forecasting
• Modeling motion of astronomical bodies.

Weather Forecasting

• Atmosphere modeled by dividing it into 3-dimensional cells.

• Calculations of each cell repeated many times to model passage of time.

Global Weather Forecasting Example

• Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high): about $5 \times 10^8$ cells.

• Suppose each calculation requires 200 floating point operations. In one time step, $10^{11}$ floating point operations are necessary.

• To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops ($10^9$ floating point operations/s) takes $10^6$ seconds, or over 10 days.

• To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops ($3.4 \times 10^{12}$ floating point operations/s).
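
The arithmetic behind this estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the cell count and the operations per cell are the figures quoted above, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the global weather forecasting estimate.
cells = 5e8                  # number of 1-mile cells (figure quoted above)
flops_per_cell = 200         # floating point operations per cell per time step
steps = 7 * 24 * 60          # 7 days of 1-minute time steps

flops_per_step = cells * flops_per_cell        # operations in one time step
total_flops = flops_per_step * steps           # operations for the whole forecast

seconds_at_1gflops = total_flops / 1e9         # run time at 1 Gflops
days_at_1gflops = seconds_at_1gflops / 86400   # the same run time in days
rate_for_5_minutes = total_flops / (5 * 60)    # rate needed to finish in 5 minutes

print(f"operations per time step: {flops_per_step:.2e}")
print(f"run time at 1 Gflops:     {seconds_at_1gflops:.2e} s ({days_at_1gflops:.1f} days)")
print(f"rate for a 5-minute run:  {rate_for_5_minutes:.2e} flops/s")
```

The required rate comes out slightly below 3.4 Tflops because the script uses the exact number of time steps (10,080) rather than a rounded value.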

Modeling Motion of Astronomical Bodies

• Each body attracted to each other body by gravitational forces. Movement of each body predicted by calculating total force on each body.

• With N bodies, there are N - 1 forces to calculate for each body, or approximately $N^2$ calculations in total. ($N \log_2 N$ for an efficient approximate algorithm.)

• After determining new positions of bodies, calculations repeated.

N-body

• A galaxy might have, say, $10^{11}$ stars.

• Even if each calculation is done in 1 microsecond (an optimistic figure), one iteration takes about $10^{16}$ seconds, on the order of $10^8$ years, using the $N^2$ algorithm, and still on the order of a month or more using an efficient $N \log_2 N$ approximate algorithm.
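
A rough sketch of this arithmetic follows, assuming exactly $N^2$ and $N \log_2 N$ calculations per iteration with no constant factors. Real algorithms do have constant factors, so only the orders of magnitude are meaningful. The variable names are illustrative.

```python
import math

N = 1e11                                 # stars in the galaxy (figure quoted above)
t_calc = 1e-6                            # seconds per calculation (optimistic figure)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time for one iteration with the direct N^2 algorithm.
direct_seconds = N * N * t_calc
direct_years = direct_seconds / SECONDS_PER_YEAR

# Time for one iteration with an N log2 N approximate algorithm
# (assumption: constant factor of 1).
approx_seconds = N * math.log2(N) * t_calc
approx_days = approx_seconds / 86400

print(f"N^2 algorithm:      {direct_years:.1e} years per iteration")
print(f"N log2 N algorithm: {approx_days:.0f} days per iteration")
```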

• Parallel programming – programming parallel computers

– Has been around for more than 40 years.

Types of Parallel Computers

Two principal types:

• Shared memory multiprocessor

• Distributed memory multicomputer

Shared Memory Multiprocessor

Conventional Computer

Consists of a processor executing a program stored in a (main) memory:

[Figure: main memory connected to a processor; instructions flow to the processor, data flows to or from the processor.]

Each main memory location is located by its address. Addresses start at 0 and extend to $2^b - 1$ when there are b bits (binary digits) in the address.

Shared Memory Multiprocessor System

Natural way to extend the single processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module:

[Figure: processors connected through processor-memory interconnections to memory modules that form one address space.]

Simplistic view of a small shared memory multiprocessor

[Figure: processors connected to a shared memory by a bus.]

Examples:
• Dual Pentiums
• Quad Pentiums

Real computer systems have cache memory between the main memory and the processors: Level 1 (L1) cache and Level 2 (L2) cache.

Example Quad Shared Memory Multiprocessor

[Figure: four processors, each with its own L1 cache, L2 cache, and bus interface, connected by a processor/memory bus to a memory controller and the shared memory; an I/O interface connects to the I/O bus.]

“Recent” innovation

• Dual-core and multi-core processors
– Two or more independent processors in one package

• Actually an old idea but not put into wide practice until recently.

• Since L1 cache is usually inside package and L2 cache outside package, dual-/multi-core processors usually share L2 cache.

Single quad core shared memory multiprocessor

[Figure: one chip containing four processors, each with its own L1 cache, sharing an L2 cache; a memory controller connects the chip to the shared memory.]

Examples

• Intel:
– Core Duo processors: two processors in one package sharing a common L2 cache (2005-2006).
– Intel Core 2 family: dual cores, with quad core from Nov 2006 onwards.
– Core i7 processors replacing the Core 2 family: quad core, Nov 2008.
– Intel Teraflops Research Chip (Polaris): a 3.16 GHz, 80-core processor prototype.

• Xbox 360 game console: triple core PowerPC microprocessor.

• PlayStation 3 Cell processor: 9 core design.

References and more information:
http://www.intel.com/products/processor/core2duo/index.htm
http://en.wikipedia.org/wiki/Dual_core

Multiple quad-core multiprocessors

[Figure: eight processors, each with its own L1 cache, grouped into quad-core chips with L2 cache and a possible L3 cache, connected through a memory controller to the shared memory.]

Programming Shared Memory Multiprocessors

Several possible ways

1. Use threads: the programmer decomposes the program into individual parallel sequences (threads), each able to access shared variables declared outside the threads.

Example: Pthreads
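
The thread model can be sketched in a few lines. The example below uses Python's standard `threading` module rather than Pthreads, purely to illustrate the idea of several threads updating a shared variable declared outside them; the names `counter` and `worker` are made up for the illustration. In the standard CPython interpreter the global interpreter lock prevents these threads from running Python code truly in parallel, so this shows the programming model, not a speedup.

```python
import threading

counter = 0                      # shared variable, declared outside the threads
lock = threading.Lock()          # serializes updates to the shared variable

def worker(increments):
    """Add to the shared counter; the lock prevents lost updates."""
    global counter
    for _ in range(increments):
        with lock:
            counter += 1

# Create four threads that all operate on the same shared variable.
threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait for every thread to finish

print(counter)                   # 4 threads x 1000 increments each
```

Without the lock, two threads could read the same old value of `counter` and one update would be lost; protecting shared data is the programmer's responsibility in this model.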

2. Use library functions and preprocessor compiler directives with a sequential programming language to declare shared variables and specify parallelism.

Example: OpenMP, an industry standard consisting of library functions, compiler directives, and environment variables. Needs an OpenMP compiler.

3. Use a modified sequential programming language -- added syntax to declare shared variables and specify parallelism.

Example: UPC (Unified Parallel C). Needs a UPC compiler.

4. Use a specially designed parallel programming language -- with syntax to express parallelism. Compiler automatically creates executable code for each processor (not common now).

5. Use a regular sequential programming language such as C and ask parallelizing compiler to convert it into parallel executable code. Also not common now.

Message-Passing Multicomputer

Complete computers connected through an interconnection network:

[Figure: computers, each consisting of a processor and local memory, exchanging messages through an interconnection network.]

Interconnection Networks

• Limited and exhaustive interconnections
• 2- and 3-dimensional meshes
• Hypercube (not common)
• Using switches:
– Crossbar
– Trees
– Multistage interconnection networks

Two-dimensional array (mesh)

[Figure: computers/processors arranged in a two-dimensional grid and joined by links.]

Also three-dimensional: used in some large high performance systems.

Three-dimensional hypercube

[Figure: eight nodes labeled 000 to 111; each link joins two nodes whose labels differ in exactly one bit.]

Four-dimensional hypercube

[Figure: sixteen nodes labeled 0000 to 1111; each link joins two nodes whose labels differ in exactly one bit.]

Hypercube routing

To go from node $x_1x_2x_3$ to node $y_1y_2y_3$:

• Go to $y_1x_2x_3$ if $y_1$ and $x_1$ are different
• Then go to $y_1y_2x_3$ if $y_2$ and $x_2$ are different
• Then go to $y_1y_2y_3$ if $y_3$ and $x_3$ are different

Example: 101 to 010

101 -> 001 -> 011 -> 010
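
The routing rule above can be sketched as a short Python function. This is a minimal illustration, not a library routine: node addresses are given as bit strings, and the function name `hypercube_route` is made up for the example.

```python
def hypercube_route(src: str, dst: str) -> list[str]:
    """Return the nodes visited going from src to dst in a hypercube.

    Addresses are bit strings of equal length. Differing bits are
    corrected one at a time, from the leftmost bit to the rightmost,
    and each correction is one hop to a neighboring node.
    """
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of bits")
    current = list(src)
    path = [src]
    for i, (x_bit, y_bit) in enumerate(zip(src, dst)):
        if x_bit != y_bit:
            current[i] = y_bit              # move along dimension i
            path.append("".join(current))
    return path

print(" -> ".join(hypercube_route("101", "010")))
# prints: 101 -> 001 -> 011 -> 010
```

The same function works unchanged for the four-dimensional hypercube, for example `hypercube_route("0000", "1010")`.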

Question

• What is the maximum number of steps to route a message in an n-dimensional hypercube?

Crossbar switch

[Figure: processors and memories connected through a grid of switches.]

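Tree

[Figure: switch elements arranged as a tree below the root and joined by links, with the processors attached at the bottom of the tree.]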

Multistage Interconnection Network

Example: Omega network

[Figure: eight inputs (000 to 111) connected to eight outputs (000 to 111) through stages of 2 x 2 switch elements, each set to a straight-through or crossover connection.]

Routing

• Destination address gives which way to go, starting with (say) the most significant bit:

– If the bit in the destination address is a 1, take the top switch output

– If the bit in the destination address is a 0, take the bottom switch output
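
Destination-based routing through an Omega network can be traced with a short sketch. It rests on two assumptions not spelled out on the slide: the lines between stages are wired as a perfect shuffle (the usual definition of an Omega network), and the two outputs of each switch are simply numbered 0 and 1, without committing to which one is drawn at the top. The function name `omega_route` is made up for the example.

```python
def omega_route(src: int, dst: int, n_bits: int) -> list[int]:
    """Trace a message through an Omega network with 2**n_bits inputs.

    Returns the line number occupied after each stage. At every stage the
    lines are perfectly shuffled (rotate the line number left by one bit),
    then the switch sends the message to the output selected by the next
    destination bit, taken most significant bit first.
    """
    mask = (1 << n_bits) - 1
    pos = src
    trace = [pos]
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit line number left by one position.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # Destination bit that controls the switch at this stage.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        # The chosen switch output replaces the least significant bit.
        pos = (pos & ~1) | bit
        trace.append(pos)
    return trace

# Route from input 101 to output 010 in an 8-input network.
print([format(line, "03b") for line in omega_route(0b101, 0b010, 3)])
```

After as many stages as there are address bits, every source bit has been shifted out and replaced by a destination bit, so the message arrives at the destination whatever the source.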

Networked Computers as a Computing Platform

• A network of computers became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s.

• Several early projects. Notable:

– Berkeley NOW (network of workstations) project.

– NASA Beowulf project.

Key advantages:

• Very high performance workstations and PCs readily available at low cost.

• The latest processors can easily be incorporated into the system as they become available.

• Existing software can be used or modified.

Beowulf Clusters

• A group of interconnected “commodity” computers achieving high performance with low cost.

• Typically using commodity interconnects (high speed Ethernet) and the Linux OS.

* The name Beowulf comes from the NASA Goddard Space Flight Center cluster project.

Cluster Interconnects

• Originally Fast Ethernet on low cost clusters
• Gigabit Ethernet: easy upgrade path

More Specialized/Higher Performance
• Myrinet: 2.4 Gbits/sec; disadvantage: single vendor
• cLan
• SCI (Scalable Coherent Interface)
• QNet
• Infiniband: may be important, as Infiniband interfaces may be integrated on next generation PCs

Dedicated cluster with a master node and compute nodes

[Figure: a user on an external network reaches the master node; the master node's Ethernet interface connects through a switch and local network to the compute nodes (computers) of the dedicated cluster.]

Software Tools for Clusters

• Based upon Message Passing Parallel Programming:

• Parallel Virtual Machine (PVM) - developed in late 1980’s. Became very popular.

• Message-Passing Interface (MPI) - standard defined in 1990s.

• Both provide a set of user-level libraries for message passing. Use with regular programming languages (C, C++, ...).

Flynn Classifications

• Flynn (1966) classification of computer architectures
– Based upon the number of concurrent instruction and data streams available in the architecture:

• Single Instruction, Single Data stream (SISD)
• Single Instruction, Multiple Data streams (SIMD)
• Multiple Instruction, Single Data stream (MISD)
• Multiple Instruction, Multiple Data streams (MIMD)

Performance of Parallel Algorithm

Performance Metrics

• Execution time

• Speed up

• Efficiency

• Cost

• Parallel Overhead

• Scalability

Speedup Factor

$$S(p,n) = \frac{\text{Execution time using one processor (best sequential algorithm)}}{\text{Execution time using a multiprocessor with } p \text{ processors}} = \frac{t_s(\text{best})}{t_p}$$

where $t_s$ is the execution time on a single processor and $t_p$ is the execution time on a multiprocessor.

S(p,n) gives the increase in speed from using a multiprocessor.

The speedup is defined with respect to the best sequential algorithm! If it were defined via some arbitrary $t_s$, any speedup could be achieved by comparing against a slow sequential algorithm. The underlying algorithm for the parallel implementation might be (and usually is) different.

Speedup factor can also be cast in terms of computational steps:

$$S(p,n) = \frac{\text{Number of computational steps using one processor}}{\text{Number of parallel computational steps with } p \text{ processors}}$$

Can also extend time complexity to parallel computations.

Speedup Bound

• The speedup S(p,n) fulfils the inequality

$$S(p,n) \le p$$

• Proof
– Suppose we have $S(p,n) > p$. Then simulating the parallel algorithm on one processor would need time $p\,t_p(p,n)$, and $p\,t_p(p,n) = p\,t_s(\text{best})/S(p,n) < t_s(\text{best})$, which is a contradiction to the definition of $t_s(\text{best})$.

Maximum Speedup

Maximum speedup is usually p with p processors (linear speedup).

Possible to get superlinear speedup (greater than p), but usually there is a specific reason, such as:

• Extra memory in multiprocessor system
• Nondeterministic algorithm

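Superlinear Speedup example - Searching

(a) Searching each sub-space sequentially

[Figure: timeline from the start of a sequential search of total time $t_s$ over p sub-spaces, each taking $t_s/p$ to search. The solution is found a time $\Delta t$ into a sub-space search, after $x$ complete sub-space searches, i.e. at time $x \times t_s/p + \Delta t$; $x$ is indeterminate.]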

(b) Searching each sub-space in parallel

[Figure: all sub-spaces searched simultaneously; the solution is found after time $\Delta t$.]

Speed-up then given by

$$S(p) = \frac{x \times \dfrac{t_s}{p} + \Delta t}{\Delta t}$$

Worst case for sequential search when solution found in last sub-space search. Then parallel version offers greatest benefit, i.e.

$$S(p) = \frac{\dfrac{p-1}{p} \times t_s + \Delta t}{\Delta t} \to \infty \quad \text{as } \Delta t \text{ tends to zero}$$

Least advantage for parallel version when solution found in first sub-space search of the sequential search, i.e.

$$S(p) = \frac{\Delta t}{\Delta t} = 1$$

Actual speed-up depends upon which sub-space holds the solution, but could be extremely large.
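
The dependence of the search speed-up on where the solution lies can be tabulated with a small sketch. It evaluates $S(p) = (x \times t_s/p + \Delta t)/\Delta t$, where $x$ is the number of sub-spaces the sequential search exhausts before reaching the one holding the solution and $\Delta t$ (written `dt`) is the time into that sub-space at which the solution is found. The timings are hypothetical, chosen only for illustration.

```python
def search_speedup(x: int, p: int, t_s: float, dt: float) -> float:
    """Speed-up of a parallel search over p sub-spaces.

    The sequential search takes x * t_s / p + dt; the parallel search
    finds the solution after dt.
    """
    return (x * t_s / p + dt) / dt

p, t_s, dt = 4, 100.0, 1.0       # hypothetical values for illustration
for x in range(p):
    print(f"solution in sub-space {x + 1}: S = {search_speedup(x, p, t_s, dt):.0f}")
```

With these values the speed-up ranges from 1 (solution in the first sub-space) to 76 (solution in the last), far above p = 4, and it grows without bound as `dt` shrinks.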

Reasons for non-optimal speedup

• The following are the main reasons for non-optimal speedup:
– Load imbalance: Not every processor has the same amount of work to do.
– Sequential part: Not all operations of the sequential algorithm can be processed in parallel.
– Communication overhead: Depends on the relative cost of computation and communication.
– Additional operations: The optimal sequential algorithm cannot be parallelized directly and must be replaced by a slower but parallelizable algorithm.

Cost

• The cost of a parallel algorithm is defined as $C(p,n) = p\,t_p(p,n)$. In contrast to the previous numbers, it is not dimensionless.

• An algorithm is called cost optimal if its cost is proportional to $t_s(\text{best})$.

– Then its efficiency $E(p,n) = t_s(\text{best})/C(p,n)$ is constant.

Efficiency

• The efficiency of a parallel algorithm is defined as $E = t_s/(p\,t_p)$

• Efficiency is bounded by the following inequality
– $E \le 1$

• Interpretation: $E \cdot p$ processors are effectively working on the solution of the problem; $(1 - E) \cdot p$ are overhead.

Overhead

• Total parallel overhead
– Total time collectively spent by all the processing elements over and above that required by the fastest known sequential algorithm for solving the same problem.

• $t_o = p\,t_p - t_s$

– Overhead is an increasing function of p because of the serial component of the problem.
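
The metrics defined so far (speedup, cost, efficiency, and total overhead) all follow from three numbers: the sequential time $t_s$, the parallel time $t_p$, and the processor count p. The sketch below collects them in one helper; the function name and the sample timings are hypothetical.

```python
def parallel_metrics(t_s: float, t_p: float, p: int) -> dict[str, float]:
    """Compute the basic performance metrics of a parallel run.

    t_s: execution time of the best sequential algorithm
    t_p: execution time on p processors
    """
    cost = p * t_p                       # C = p * t_p
    return {
        "speedup": t_s / t_p,            # S = t_s / t_p
        "cost": cost,
        "efficiency": t_s / cost,        # E = t_s / (p * t_p)
        "overhead": cost - t_s,          # t_o = p * t_p - t_s
    }

# Hypothetical timings: 100 s sequentially, 30 s on 4 processors.
metrics = parallel_metrics(t_s=100.0, t_p=30.0, p=4)
for name, value in metrics.items():
    print(f"{name:10s} {value:.3f}")
```

For these timings the cost is 120 processor-seconds, so 20 of them are overhead and the efficiency is about 0.83.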

Scalability

• Scalability is the ability of a parallel algorithm to use an increasing number of processors: How does S(p, n) behave with p?

• S(p,n) has two arguments. What about n?

• We will consider several different choices.
– Fixed size scaling
– Gustafson scaling
– Iso-efficient scaling

Fixed Size Scaling

• Very simple: Choose problem size (n) to be fixed. Equivalently, we can fix the sequential execution time.

• Amdahl's law models the expected speedup of a parallelized implementation of an algorithm relative to the serial algorithm, under the assumption that the problem size remains the same when parallelized.

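Maximum Speedup: Amdahl’s law

[Figure: (a) on one processor, the total time $t_s$ consists of a serial section $f t_s$ and parallelizable sections $(1 - f) t_s$; (b) on multiple processors (p processors), the parallelizable sections take $(1 - f) t_s/p$, giving total time $t_p$.]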

Speedup factor is given by:

$$S(p) = \frac{t_s}{f t_s + (1 - f) t_s / p} = \frac{p}{1 + (p - 1) f}$$

This equation is known as Amdahl’s law.

Speedup against number of processors

[Figure: speedup S(p) (axis 4 to 20) plotted against number of processors, p (axis 4 to 20), with one curve each for f = 20%, 10%, 5%, and 0%.]

Even with an infinite number of processors, maximum speedup is limited to 1/f.

Example

With only 5% of computation being serial, maximum speedup is 20, irrespective of number of processors.
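
Amdahl's law, $S(p) = p/(1 + (p - 1)f)$, is easy to evaluate numerically. The sketch below tabulates it for the serial fractions 20%, 10%, 5%, and 0% and then shows the 1/f limit for f = 5%. The function name `amdahl_speedup` is made up for the example.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Speedup on p processors when a fraction f of the work is serial."""
    return p / (1 + (p - 1) * f)

processors = (4, 8, 12, 16, 20)
for f in (0.20, 0.10, 0.05, 0.0):
    row = ", ".join(f"{amdahl_speedup(f, p):5.2f}" for p in processors)
    print(f"f = {f:4.0%}: {row}")

# With f = 5% the speedup approaches, but never exceeds, 1/f = 20.
for p in (100, 10_000, 1_000_000):
    print(f"p = {p:>9,}: S = {amdahl_speedup(0.05, p):.3f}")
```

Only the f = 0% row grows linearly with p; every other row flattens out toward 1/f.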

Gustafson Scaling

• Amdahl's law is based on a fixed problem size.

• Gustafson assumed that the parallel execution time is fixed: as the system size is increased, the problem size is also increased.

• The resulting speedup factor is numerically different from Amdahl’s speedup factor and is called scaled speedup.
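
The slides do not give a formula for the scaled speedup. One common statement of Gustafson's law, assumed here, is $S_s(p) = f + (1 - f)\,p$, where f is the fraction of the parallel execution time spent in serial code. This is a different reference point from Amdahl's f, which is a fraction of the sequential time. The sketch compares the two under that assumption; both function names are hypothetical.

```python
def gustafson_scaled_speedup(f: float, p: int) -> float:
    """Scaled speedup, with f the serial fraction of the PARALLEL run time
    (assumed form of Gustafson's law: S_s = f + (1 - f) * p)."""
    return f + (1 - f) * p

def amdahl_speedup(f: float, p: int) -> float:
    """Fixed-size speedup, with f the serial fraction of the SEQUENTIAL run time."""
    return p / (1 + (p - 1) * f)

for p in (4, 8, 16, 20):
    print(f"p = {p:2d}: Amdahl {amdahl_speedup(0.05, p):5.2f}, "
          f"Gustafson {gustafson_scaled_speedup(0.05, p):5.2f}")
```

With f = 5% and p = 20, the fixed-size speedup is about 10.3 while the scaled speedup is 19.05, because the parallel part of the work grows with the machine.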
