Architecture of Parallel Computers, CSC / ECE 506, Summer 2006: Introduction / Overview

Page 1

Architecture of Parallel Computers

CSC / ECE 506

Summer 2006

Introduction / Overview

5/22/2006

Dr Steve Hunter

Page 2

CSC / ECE 506 2Arch of Parallel Computers

Architecture of Parallel Computers

• Taught jointly by Dr Ed Gehringer and Dr Steve Hunter

• Course days
  – Monday 4:00 – 6:45
  – Wednesday 4:00 – 5:15

• Goal: Understand the interaction of hardware and software with respect to parallel systems design and implementation.

• Textbook: “Parallel Computer Architecture” by Culler and Singh

• Selected papers may also be assigned

Page 3

Architecture of Parallel Computers

• Steve’s info:
  – NCSU Adjunct Professor
  – IBM Corporation
  – Website: http://www.ee.duke.edu/~shunter/

• Academic
  – Auburn University BSEE
  – NC State University MSEE
  – Duke University PhD

• IBM Corporation
  – IBM Networking Division (14 years)
  – Systems and Technology Group (8 years)

• Areas of interest
  – Systems and network architecture and technology
  – Computer and network performance and dependability
  – Server clustering and software dependability

Page 4

Course Outline (Tentative)

| Week | Mon. 4:00 | Mon. 5:30 | Wed. 4:00 |
|------|-----------|-----------|-----------|
| 5/22-5/24 | 1. Overview of parallel computation (1) | 2. Message-passing and data-parallel models (3-5) | 3. Steps in parallelization (6) |
| 5/29-5/31 | Memorial Day | | 4. Parallelizing the Ocean application (7) |
| 6/5-6/7 | 5. Data-parallel algorithms (8) | 6. Cache organization (10) | 7. The cache-coherence problem/interleaved memory (12) |
| 6/12-6/14 | 8. Invalidation-based cache-coherence protocols (13) | 9. Update-based coherence protocols and performance (14) | Test 1 |
| 6/19-6/21 | 10. Scalable multiprocessors (16) | 11. Realizing programming models in scalable systems (17) | 12. Design space for communication architectures |
| 6/26-6/28 | 13. Scalable cache coherence (19) | 14. Directory-protocol correctness and performance (20) | |
| 7/3-7/5 | Independence Day | | 15. The Silicon Graphics S2MP architecture (21) |
| 7/10-7/12 | 16. Extending cache coherence (15) | 17. Open MPI | Test 2 |
| 7/17-7/19 | 18. Open RDMA, open fabrics | 19. Memory consistency (22) | 20. Relaxed memory-consistency models (23) |
| 7/24-7/26 | 21. Interconnection network topologies (25) | 22. Routing in interconnection networks (26) | 23. Switch design (27) |
| 7/31-8/2 | | | |
| 8/7-8/9 | Final exam | | |

http://courses.ncsu.edu/csc506/lec/052/lectures/syllabus.html

Page 5

What is Parallel Computer Architecture?

• A parallel computer is a collection of processing elements that cooperate to solve large problems fast

• Some broad issues:
  – Resource allocation:
    » How large a collection?
    » How powerful are the elements?
    » How much memory?
  – Data access, communication, and synchronization:
    » How do the elements cooperate and communicate?
    » How are data transmitted between processors?
    » What are the abstractions and primitives for cooperation?
  – Performance and scalability:
    » How does it all translate into performance?
    » How does it scale?
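As a minimal sketch of processing elements cooperating on one problem, the Python below splits a sum across p workers: each worker is assigned a contiguous block of the input (computed by the hypothetical helper `block_bounds`), produces a partial result, and the partial results are then combined. It uses standard-library threads only to show the structure of decomposition and combination; in CPython the global interpreter lock means this version will not actually run faster, so treat it as an illustration rather than a performance demo.

```python
from concurrent.futures import ThreadPoolExecutor


def block_bounds(n: int, p: int, rank: int) -> tuple[int, int]:
    """Half-open range [start, end) of the n items assigned to worker `rank` of p.

    The first n % p workers each receive one extra item, so block sizes
    differ by at most one.
    """
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def parallel_sum(data: list[int], p: int) -> int:
    """Each of p workers sums its own block; the partial sums are then combined."""

    def worker(rank: int) -> int:
        start, end = block_bounds(len(data), p, rank)
        return sum(data[start:end])

    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = list(pool.map(worker, range(p)))  # one partial result per worker
    return sum(partials)                              # combine step


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101)), 4))  # 5050
```

Giving each worker one contiguous block keeps the assignment simple and balanced; how the partial results travel back to be combined is exactly the communication question raised on this slide.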

Page 6

Historical Perspective

• Historically, parallel computing was represented by competing models, each with its own unique architecture, and there was no clear path for growth

• Competing methods
  – Dataflow
  – Systolic arrays
  – SIMD (bit serial)
  – Shared memory
  – Message passing

• Confusion over which model to use paralyzed parallel software development
  – Section 1.2 shows several architectures:
  – Shared-memory multiprocessors
    » Bus-based; crossbar-based; MIN-based (multistage interconnection network)
  – Message-passing machines (hypercube)
    » IBM SP2 architecture
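The last two competing methods differ in how processing elements communicate. The following Python sketch, using only the standard library's `threading` and `queue` modules, contrasts them on the same task. In the shared-memory version every worker updates one shared accumulator under a lock; in the message-passing version workers share nothing and each sends its partial result as a message. The function names are hypothetical, and Python threads merely stand in for real processors.

```python
import queue
import threading


def shared_memory_sum(chunks: list[list[int]]) -> int:
    """Shared memory: all workers add into one accumulator guarded by a lock."""
    total = 0                  # shared state visible to every worker
    lock = threading.Lock()    # synchronization primitive protecting `total`

    def worker(chunk: list[int]) -> None:
        nonlocal total
        partial = sum(chunk)
        with lock:             # serialize the read-modify-write on shared data
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks: list[list[int]]) -> int:
    """Message passing: workers share nothing and send results as messages."""
    mailbox: queue.Queue[int] = queue.Queue()  # the only channel to the receiver

    def worker(chunk: list[int]) -> None:
        mailbox.put(sum(chunk))  # "send" the partial result

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    # The receiver performs exactly one blocking "receive" per worker.
    result = sum(mailbox.get() for _ in chunks)
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(shared_memory_sum(data), message_passing_sum(data))  # 21 21
```

Both produce the same answer; what differs is the programming abstraction, namely implicit communication through shared variables plus explicit synchronization, versus explicit sends and receives with no shared state.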

Page 7

Why Study Parallel Computer Architecture?

• Role of a computer architect:
  – To design and engineer the various levels of a computer system to maximize performance and programmability within the limits of technology and cost

• Parallelism:
  – Provides an alternative to a faster clock for performance
  – Applies at all levels of system design
  – Is a fascinating perspective from which to view architecture
  – Has traditionally been centered on processing elements in the same location
  – However, greater networking bandwidth is extending parallelism over greater distances

Page 8

Parallel Computation: Why and Why Not?

• Pros
  – Performance
  – Cost-effectiveness (commodity parts)
  – Smooth upgrade path
  – Fault tolerance

• Cons
  – Difficult to parallelize applications
  – Requires automatic parallelization or parallel program development
  – Software!

Page 9

Is Parallel Computing Inevitable?

• Application demands (the need for computing cycles):
  – Petroleum (reservoir analysis)
  – Automotive (crash simulation, drag analysis, combustion efficiency)
  – Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  – Computer-aided design
  – Pharmaceuticals (molecular modeling)
  – Visualization
    » in all of the above
    » entertainment (films like Toy Story, The Hulk)
    » architecture (walk-throughs and rendering)
  – Financial modeling (yield and derivative analysis)
  – Search engines
  – etc.

Page 10

Application Trends

• Application demand for performance fuels advances in hardware, which enable new applications, which in turn demand more performance
  – This cycle drives the exponential increase in microprocessor performance
  – It drives parallel architecture even harder
    » for the most demanding applications

• Range of performance demands
  – Need a range of system performance with progressively increasing cost

[Figure: cycle linking "New Applications" and "More Performance"]

Page 11

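Speedup

• Speedup (p processors):

$$\text{Speedup}(p \text{ processors}) = \frac{\text{Performance}(p \text{ processors})}{\text{Performance}(1 \text{ processor})}$$

• For a fixed problem size (input data set), performance = 1/time

• Speedup for a fixed problem (p processors):

$$\text{Speedup}_{\text{fixed problem}}(p \text{ processors}) = \frac{\text{Time}(1 \text{ processor})}{\text{Time}(p \text{ processors})}$$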
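The two speedup definitions above agree once performance is taken as 1/time for a fixed problem. A minimal Python sketch of both follows; the function names and the timings in the usage example (120 s on 1 processor, 20 s on 8 processors) are hypothetical, chosen only to illustrate the arithmetic.

```python
def speedup_from_performance(perf_1: float, perf_p: float) -> float:
    """Speedup(p) = Performance(p processors) / Performance(1 processor)."""
    return perf_p / perf_1


def speedup_fixed_problem(time_1: float, time_p: float) -> float:
    """Fixed problem size: performance = 1/time, so Speedup(p) = Time(1) / Time(p)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("execution times must be positive")
    return time_1 / time_p


if __name__ == "__main__":
    # Hypothetical measurements for the same input data set.
    t1, t8 = 120.0, 20.0
    print(speedup_fixed_problem(t1, t8))  # 6.0
    # Same result via the performance form, with performance = 1/time.
    print(speedup_from_performance(1 / t1, 1 / t8))
```

Note that the fixed-problem form only holds when both runs solve the same input; if the problem size grows with p, the time ratio no longer measures speedup.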

Page 12

Is Parallel Computing Inevitable?

• Technology trends
  – Chip technology continues to increase in density
  – Driving up the clock frequency of single-core designs requires too much power
  – Use of commodity or off-the-shelf technology for low cost
  – Multi-core processing becoming common among mainstream microprocessors (e.g., AMD, IBM, Intel)
  – Greater interconnect bandwidth becoming generally available
    » Standard interconnects: InfiniBand, 10Gb Ethernet

• Architecture trends
  – Packaging parallel solutions in a common chassis
    » e.g., blade servers (IBM, HP, Dell, etc.)
  – Software being packaged for mainstream solutions
    » e.g., Windows Compute Cluster Server 2003
  – High availability commonly achieved by clustering processing elements

Page 13

Is Parallel Computing Inevitable?

• Economics
  – The falling cost of low-end servers (dual- and quad-socket), combined with high-bandwidth interconnects, is driving applications to become parallel
  – Commodity microprocessors are not only fast but CHEAP
    » Development costs tens of millions of dollars
    » BUT, many more are sold compared to supercomputers
    » Crucial to take advantage of the investment and use the commodity building block
  – Multiprocessors are being pushed by software vendors (e.g., database) as well as hardware vendors
  – Standardization makes small, bus-based SMPs a commodity
  – Desktop: a few smaller processors versus one larger one?
  – Multiprocessor on a chip?

Page 14

Scale Up vs Scale Out Model

[Figure: systems positioned along two axes, "Scale Up / SMP Computing" and "Scale Out / Distributed Computing". Labels: x445, x455, Large SMP, xSeries 335 / eServer 325, High Density Rack Mount, BladeCenter™, Large Parallel Clusters.]

Page 15

Blade Server Example - BladeCenter

• BladeCenter: 7U chassis form factor (Nov 2002)
  – Highest density, lowest cost
  – Super power efficient
  – Consolidated management
  – Applications: Web hosting/serving, FSS, File/Print, Geophysical Analysis, Collaboration, Graphic Rendering

• BladeCenter T: 8U chassis form factor (March 2004)
  – Highly rugged, Telco AC/DC, long life, NEBS, air filtration
  – Applications: Telco/Core Applications, Government, Military, Rugged Industrial, DC, Medical

• BladeCenter H: 9U chassis form factor (Jan 2006)
  – Ultra high performance, 4x IB / 10Gb backplane
  – New management module
  – Applications: HPC Applications, Technical Clusters, Virtual Enterprise Solutions, Future I/O

• Compatible set of blades and switches

One family, many applications, many environments, long-term investment protection - BladeCenter: Simply Smarter IT

Page 16

Blade Server Example – BladeCenter H

• Fourteen blades in a 9U chassis form factor
  – Blade and switch compatibility across BladeCenter and BladeCenter T

• High-performance networking fabrics
  – New high-performance switches and blade I/O
  – Corresponding bridge bays for protocol translation

• Power enhancements
  – Four front-load 2900W power supplies

Page 17

BladeCenter Overview

• Switching modules
  – Ethernet
  – Fibre Channel
  – InfiniBand

• Blade I/O card (or local drive)
  – The I/O card matches the switch technology in the corresponding slot

[Figure: chassis diagram showing a row of CPU blades, two management modules, and two switch modules]

Page 18

BladeCenter H Architecture

• I/O bridge
  – e.g., Ethernet, Fibre Channel, Passthru
  – Dual 4x (16-wire) wiring internally to each HSSM

• High-speed switch
  – Ethernet or InfiniBand
  – 4x (16-wire) blade links
  – 4x (16-wire) bridge links
  – 1x (4-wire) management links
  – Uplinks: up to 12x links for InfiniBand and at least four 10Gb links for Ethernet

[Figure: BladeCenter H block diagram showing Blades 1 through 14, Management Modules 1 and 2, Switch Modules 1 and 2, High-Speed Switches 1 through 4, and I/O Bridges, including I/O Bridge 3 / SM3 and I/O Bridge 4 / SM4]

Page 19

InfiniBand on BladeCenter H

• Expanding the BladeCenter ecosystem with Cisco Systems
  – Switch module and daughter card designed for BladeCenter H
  – Daughter card provides dual-port 4x (10G) InfiniBand connectivity to each blade

• Helps reduce data center complexity
  – Reduces the number of adapters, cables, and switch ports required
  – Manages the addition or removal of I/O or storage bandwidth centrally
  – Enables users to adjust resources on demand without downtime

• High-performance computing features
  – Leverages RDMA to deliver low-latency performance
  – Delivers higher-bandwidth connectivity (160 Gbps to chassis)
  – Achieves blade port consolidation through remote I/O
  – I/O virtualization via Cisco VFrame

Enabling High Performance and Virtualized I/O

The BladeCenter H InfiniBand solution provides high-speed, low-latency connectivity while lowering TCO
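The link-width notation on the last two slides can be checked with a little arithmetic. The sketch below assumes, as the slides imply (1x = 4 wires, 4x = 16 wires), that each InfiniBand lane uses 4 wires (one transmit and one receive differential pair), and it takes the 10G quoted for a 4x port as that port's signaling rate. The helper names are hypothetical, and the results are raw signaling rates, not measured throughput.

```python
# Assumption implied by the slides: 1x = 4 wires and 4x = 16 wires,
# i.e. each lane is one transmit pair plus one receive pair.
WIRES_PER_LANE = 4


def wires(lanes: int) -> int:
    """Number of wires in a link that is `lanes` lanes wide (1x, 4x, 12x, ...)."""
    return lanes * WIRES_PER_LANE


def lane_rate_gbps(port_rate_gbps: float = 10.0, lanes: int = 4) -> float:
    """Per-lane signaling rate, given the rate quoted for a whole port."""
    return port_rate_gbps / lanes


def blade_ib_rate_gbps(ports: int = 2, port_rate_gbps: float = 10.0) -> float:
    """Aggregate signaling rate of a blade's dual-port 4x InfiniBand daughter card."""
    return ports * port_rate_gbps


if __name__ == "__main__":
    for lanes in (1, 4, 12):
        print(f"{lanes}x link: {wires(lanes)} wires")
    print(f"per-lane rate: {lane_rate_gbps()} Gb/s")
    print(f"dual-port 4x card: {blade_ib_rate_gbps()} Gb/s")
```

Under these assumptions a 12x uplink would use 48 wires, each lane signals at 2.5 Gb/s, and a dual-port 4x card gives each blade 20 Gb/s of aggregate signaling bandwidth.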

Page 20

The End

Page 21

Grid Example

[Figure: Intra-Grids, Extra-Grids, and Inter-Grids, with grids attached to NAS/SAN storage and connected over a VPN. Labeled examples: Cactus, NTG(SF), Express Project, MFG, Fin. Services. Timeline labels: 2003; Commerce with Trusted Partners; 2006+ "Full Commercialization" with unknown partners.]

Courtesy of Ellen Stokes