SCALABLE PARALLEL COMPUTING CENG 546 Dr. Esma Yıldırım


Page 1: Scalable Parallel Computing

SCALABLE PARALLEL COMPUTING

CENG 546, Dr. Esma Yıldırım

Page 2: Scalable Parallel Computing

Copyright © 2012, Elsevier Inc. All rights reserved.

What is a computing cluster?

A computing cluster consists of a collection of interconnected stand-alone/complete computers that cooperate as a single, integrated computing resource. A cluster exploits parallelism at the job level and supports distributed computing with higher availability.

A typical cluster:
Merges multiple system images into an SSI (single-system image) at certain functional levels
Applies low-latency communication protocols
Is more loosely coupled than an SMP with an SSI

Page 3: Scalable Parallel Computing


What is a Commodity Cluster?

It is a distributed/parallel computing system constructed entirely from commodity subsystems:
All subcomponents can be acquired commercially and separately
Computing elements (nodes) are employed as fully operational standalone mainstream systems
Two major subsystems: compute nodes and the system area network (SAN)
Employs industry-standard interfaces for integration
Uses industry-standard software for the majority of services
Incorporates additional middleware for interoperability among elements
Uses software for coordinated programming of elements in parallel

Page 4: Scalable Parallel Computing


Multicomputer Clusters

Cluster: a network of computers supported by middleware and interacting by message passing

PC Cluster (Most Linux clusters)

Workstation Cluster (NOW, COW)

Server cluster or Server Farm

Cluster of SMPs or ccNUMA systems

Cluster-structured massively parallel processors (MPP) – about 85% of the top-500 systems
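The defining trait of these systems, nodes interacting only by message passing, can be sketched in miniature with Python's `multiprocessing` module: each process has its own address space and stands in for a cluster node. This is a toy sketch, not a real cluster framework; the chunk size and worker count are illustrative.

```python
from multiprocessing import Process, Queue

def worker(rank, tasks, results):
    """A simulated cluster node: no shared state, only messages."""
    while True:
        chunk = tasks.get()
        if chunk is None:                  # sentinel: no more work
            break
        results.put((rank, sum(x * x for x in chunk)))

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    nodes = [Process(target=worker, args=(r, tasks, results))
             for r in range(4)]
    for p in nodes:
        p.start()
    data = list(range(1000))
    for i in range(0, len(data), 250):     # scatter work as messages
        tasks.put(data[i:i + 250])
    for _ in nodes:                        # one stop sentinel per node
        tasks.put(None)
    total = sum(results.get()[1] for _ in range(4))   # gather partial sums
    for p in nodes:
        p.join()
    print(total == sum(x * x for x in data))  # True
```

A real cluster would use MPI over the system area network instead of in-host queues, but the scatter/compute/gather pattern is the same.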

Page 5: Scalable Parallel Computing


Page 6: Scalable Parallel Computing


Operational Benefits of Clustering

System availability (HA): a cluster offers inherent high system availability through the redundancy of hardware, operating systems, and applications.

Hardware fault tolerance: a cluster has some degree of redundancy in most system components, including both hardware and software modules.

OS and application reliability: running multiple copies of the OS and applications provides reliability through this redundancy.

Scalability: servers can be added to a cluster, or more clusters to a network, as application needs arise.

High performance: running cluster-enabled programs yields higher throughput.

Page 7: Scalable Parallel Computing


Scalability: the ability to deliver proportionally greater sustained performance through increased system resources

Strong scaling
Fixed-size application problem
Application size remains constant as system size increases

Weak scaling
Variable-size application problem
Application size scales proportionally with system size

Capability computing
In its most pure form: strong scaling
Marketing claims tend toward this class

Capacity computing
Throughput computing; includes job-stream workloads
In its most simple form: weak scaling

Cooperative computing
Interacting and coordinating concurrent processes
Not a widely used term; also called "coordinated computing"

Page 8: Scalable Parallel Computing


Performance Metrics

Peak floating-point operations per second (flops)
Peak instructions per second (ips)
Sustained throughput: average performance over a period of time
flops, Mflops, Gflops, Tflops, Pflops (megaflops, gigaflops, teraflops, petaflops)
ips, Mips; ops, Mops, ...

Cycles per instruction (cpi); alternatively, instructions per cycle (ipc)

Memory access latency: measured in cycles or seconds

Memory access bandwidth: bytes per second (Bps) or bits per second (bps); e.g., gigabytes per second (GBps, GB/s)
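A small sketch tying these metrics together. The numbers are made up for illustration (a node retiring 4 double-precision ops per cycle at 2.5 GHz is an assumption, not a figure from the slides):

```python
# SI prefixes used by the flops family of units.
PREFIX = {"M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15}

def sustained_flops(total_ops, seconds):
    """Sustained throughput: average operations per second over a run."""
    return total_ops / seconds

peak = 4 * 2.5 * PREFIX["G"]          # peak rate: 10 Gflops
achieved = sustained_flops(6e9, 1.0)  # 6e9 ops in 1 s -> 6 Gflops sustained
print(achieved / peak)                # 0.6: sustained is 60% of peak

# cpi and ipc are reciprocals of one another:
ipc = 1.25                            # instructions per cycle
cpi = 1 / ipc                         # 0.8 cycles per instruction
```

Sustained throughput is the metric that matters for real workloads; peak flops is an upper bound the hardware can rarely hold.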

Page 9: Scalable Parallel Computing

Basic Uniprocessor Architecture Elements

I/O interface
Memory interface
Cache hierarchy
Register sets
Control
Execution pipeline
Arithmetic logic units

Page 10: Scalable Parallel Computing


Multiprocessor

A general class of system that integrates multiple processors into an interconnected ensemble
MIMD: Multiple Instruction stream, Multiple Data stream

Different memory models:

Distributed memory: nodes support separate address spaces

Shared memory: symmetric multiprocessor; UMA (uniform memory access); cache coherent

Distributed shared memory: NUMA (non-uniform memory access); cache coherent

PGAS: partitioned global address space; NUMA; not cache coherent

Hybrid: ensemble of distributed shared memory nodes; Massively Parallel Processor (MPP)

Page 11: Scalable Parallel Computing


Massively Parallel Processor (MPP)

General class of large-scale multiprocessor; represents the largest systems (e.g., IBM BG/L, Cray XT3)

Distinguished by memory strategy:
Distributed memory
Distributed shared memory (cache coherent)
Partitioned global address space

Custom interconnect network
Potentially heterogeneous: may incorporate accelerators to boost peak performance

Page 12: Scalable Parallel Computing

DM - MPP


Page 13: Scalable Parallel Computing


IBM Blue Gene/L

Page 14: Scalable Parallel Computing


IBM BlueGene/L Supercomputer: the world's fastest message-passing MPP when built in 2005

Built jointly by IBM and LLNL teams and funded by the US DoE ASCI Research Program

Page 15: Scalable Parallel Computing


Symmetric Multiprocessor (SMP)

Building block for large MPPs
Multiple processors: 2 to 32 processors, now multicore
Uniform memory access (UMA) shared memory: every processor has equal access, in equal time, to all banks of the main memory
Cache coherent: multiple copies of a variable are kept consistent by hardware
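In contrast with the message-passing sketch earlier, threads on an SMP share a single address space. A toy Python sketch of that model (the lock makes the read-modify-write atomic; on real hardware, cache coherence is what keeps each core's cached copy of `counter` consistent):

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    """Every thread updates the same variable: one shared address space.
    The lock serializes the read-modify-write; hardware cache coherence
    keeps each core's cached copy of `counter` consistent on a real SMP."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Without the lock, concurrent updates could interleave and lose increments, which is exactly the consistency problem coherent shared-memory hardware and synchronization primitives exist to solve.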

Page 16: Scalable Parallel Computing

SMP - UMA


Page 17: Scalable Parallel Computing


SMP Node Diagram

[Diagram: an SMP node. Microprocessors (MP), each with private L1 and L2 caches, share L3 caches; a controller connects the processors to memory banks (M1 .. Mn-1), storage (S), and network interface cards (NICs); peripherals attach via USB, JTAG, Ethernet, and PCI-e.]

Legend: MP: microprocessor; L1, L2, L3: caches; M1..: memory banks; S: storage; NIC: network interface card

Page 18: Scalable Parallel Computing

DSM - NUMA


Distributed Shared Memory- Non-uniform memory access

Page 19: Scalable Parallel Computing


Commodity Clusters vs “Constellations”

[Diagram: a 64-processor constellation (4 nodes of 16 processors each) versus a 64-processor commodity cluster (16 nodes of 4 processors each), both joined by a system area network.]

• An ensemble of N nodes, each comprising p computing elements
• The p elements are tightly coupled via shared memory (e.g., SMP, DSM)
• The N nodes are loosely coupled, i.e., distributed memory
• In a constellation, p is greater than N
• The distinction is which layer gives us the most power through parallelism
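The p-versus-N rule can be stated as a one-line classifier. A sketch, with the function name and signature chosen for illustration:

```python
def classify(nodes, elems_per_node):
    """Commodity cluster vs. constellation, per the p-vs-N rule:
    a constellation has more computing elements per node (p) than
    nodes (N); otherwise it is a commodity cluster."""
    return "constellation" if elems_per_node > nodes else "commodity cluster"

# Both machines total 64 processors, split differently:
print(classify(nodes=4, elems_per_node=16))   # constellation  (p=16 > N=4)
print(classify(nodes=16, elems_per_node=4))   # commodity cluster (p=4 < N=16)
```

The constellation draws most of its parallelism from shared memory inside each node; the commodity cluster draws it from the distributed-memory layer across nodes.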

Page 20: Scalable Parallel Computing

System Stack


Science Problems : Environmental Modeling, Physics, Computational Chemistry, etc.

Application : Coastal Modeling, Black hole simulations, etc.

Algorithms : PDE, Gaussian Elimination, 12 Dwarves, etc.

Program Source Code

Programming Languages: Fortran, C, C++ , UPC, Fortress, X10, etc.

Compilers : Intel C/C++/Fortran Compilers, PGI C/C++/Fortran, IBM XLC, XLC++, XLF, etc.

Runtime Systems : Java Runtime, MPI etc.

Operating Systems : Linux, Unix, AIX etc.

Systems Architecture : Vector, SIMD array, MPP, Commodity Cluster

Firmware : Motherboard chipset, BIOS, NIC drivers, etc.

Microarchitectures : Intel/AMD x86, SUN SPARC, IBM Power 5/6

Logic Design : RTL

Circuit Design : ASIC, FPGA, Custom VLSI

Device Technology : NMOS, CMOS, TTL, Optical

[Side label spanning the stack: Model of Computation]

Page 21: Scalable Parallel Computing

Historical Top-500 List


Page 22: Scalable Parallel Computing


Clusters Dominate Top-500