Architecture of Parallel Computers
CSC / ECE 506
Summer 2006
Introduction / Overview
5/22/2006
Dr Steve Hunter
CSC / ECE 506 2Arch of Parallel Computers
Architecture of Parallel Computers
• Taught jointly by Dr Ed Gehringer and Dr Steve Hunter
• Course days: Monday 4:00 to 6:45; Wednesday 4:00 to 5:15
• Goal: Understand the interaction of hardware and software with respect to parallel systems design and implementation.
• Textbook “Parallel Computer Architecture”, by Culler and Singh
• Selected papers may also be assigned
Architecture of Parallel Computers
• Steve’s info:
  – NCSU Adjunct Professor
  – IBM Corporation
  – Website: http://www.ee.duke.edu/~shunter/
  – email: [email protected]
• Academic:
  – Auburn University, BSEE
  – NC State University, MSEE
  – Duke University, PhD
• IBM Corporation:
  – IBM Networking Division, 14 years
  – Systems and Technology Group, 8 years
• Areas of interest:
  – Systems and network architecture and technology
  – Computer and network performance and dependability
  – Server clustering and software dependability
Course Outline (Tentative): sessions Mon. 4:00, Mon. 5:30, Wed. 4:00
• 5/22-5/24: 1. Overview of parallel computation (1); 2. Message-passing and data-parallel models (3-5); 3. Steps in parallelization (6)
• 5/29-5/31: Memorial Day; 4. Parallelizing the Ocean application (7)
• 6/5-6/7: 5. Data-parallel algorithms (8); 6. Cache organization (10); 7. The cache-coherence problem/interleaved memory (12)
• 6/12-6/14: 8. Invalidation-based cache-coherence protocols (13); 9. Update-based coherence protocols and performance (14); Test 1
• 6/19-6/21: 10. Scalable multiprocessors (16); 11. Realizing programming models in scalable systems (17); 12. Design space for communication architectures
• 6/26-6/28: 13. Scalable cache coherence (19); 14. Directory-protocol correctness and performance (20)
• 7/3-7/5: Independence Day; 15. The Silicon Graphics S2MP architecture (21)
• 7/10-7/12: 16. Extending cache coherence (15); 17. Open MPI; Test 2
• 7/17-7/19: 18. Open RDMA, open fabrics; 19. Memory consistency (22); 20. Relaxed memory-consistency models (23)
• 7/24-7/26: 21. Interconnection network topologies (25); 22. Routing in interconnection networks (26); 23. Switch design (27)
• 7/31-8/2: (none listed)
• 8/7-8/9: Final exam
http://courses.ncsu.edu/csc506/lec/052/lectures/syllabus.html
What is Parallel Computer Architecture?
• A parallel computer is a collection of processing elements that cooperate to solve large problems fast
• Some broad issues:
  – Resource allocation:
    » How large a collection?
    » How powerful are the elements?
    » How much memory?
  – Data access, communication, and synchronization:
    » How do the elements cooperate and communicate?
    » How are data transmitted between processors?
    » What are the abstractions and primitives for cooperation?
  – Performance and scalability:
    » How does it all translate into performance?
    » How does it scale?
Historical Perspective
• Parallel computing was historically represented by competing models, each with its own unique architecture and no clear path for growth
• Competing methods:
  – Dataflow
  – Systolic arrays
  – SIMD (bit serial)
  – Shared memory
  – Message passing
• Confusion over which model to use paralyzed parallel software development
• Section 1.2 of the textbook shows several architectures:
  – Shared-memory multiprocessors
    » Bus-based; crossbar-based; MIN-based (multistage interconnection network)
  – Message-passing machines (hypercube)
    » IBM SP2 architecture
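To make the difference between two of the competing models concrete, here is a minimal sketch that sums an array both ways. Python threads stand in for processing elements; the function names (`split`, `shared_memory_sum`, `message_passing_sum`) are hypothetical, and the sketch illustrates only the programming models (shared variable plus synchronization versus explicit messages), not the hardware organizations listed above or any real speedup.

```python
import queue
import threading


def split(data, p):
    """Split data into p contiguous, nearly equal pieces."""
    n = len(data)
    return [data[i * n // p:(i + 1) * n // p] for i in range(p)]


def shared_memory_sum(data, p):
    """Shared-memory model: all workers update one shared variable,
    so every update must be synchronized (here with a lock)."""
    total = 0
    lock = threading.Lock()

    def worker(part):
        nonlocal total
        partial = sum(part)        # private computation
        with lock:                 # synchronized access to shared state
            total += partial

    threads = [threading.Thread(target=worker, args=(part,)) for part in split(data, p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(data, p):
    """Message-passing model: workers share no variables; each sends its
    partial result as a message, and the receiver combines the messages."""
    mailbox = queue.Queue()

    def worker(part):
        mailbox.put(sum(part))     # send a message

    threads = [threading.Thread(target=worker, args=(part,)) for part in split(data, p)]
    for t in threads:
        t.start()
    result = sum(mailbox.get() for _ in range(p))  # receive p messages
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    data = list(range(1, 101))
    print(shared_memory_sum(data, 4), message_passing_sum(data, 4))
```

Both functions compute the same answer; what differs is where coordination happens: inside a lock on shared state, or in the explicit send and receive of messages.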
Why Study Parallel Computer Architecture?
• Role of a computer architect:
  – To design and engineer the various levels of a computer system to maximize performance and programmability within the limits of technology and cost
• Parallelism:
  – Provides an alternative to a faster clock for performance
  – Applies at all levels of system design
  – Is a fascinating perspective from which to view architecture
  – Has traditionally been concentrated in information-processing elements within the same locality
  – However, greater networking bandwidth is extending parallelism over greater distances
Parallel Computation: Why and Why Not?
• Pros
  – Performance
  – Cost-effectiveness (commodity parts)
  – Smooth upgrade path
  – Fault tolerance
• Cons
  – Difficult to parallelize applications
  – Requires automatic parallelization or parallel program development
  – Software!
Is Parallel Computing Inevitable?
• Application demands (the need for computing cycles):
  – Petroleum (reservoir analysis)
  – Automotive (crash simulation, drag analysis, combustion efficiency)
  – Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  – Computer-aided design
  – Pharmaceuticals (molecular modeling)
  – Visualization
    » in all of the above
    » entertainment (films like Toy Story, The Hulk)
    » architecture (walk-throughs and rendering)
  – Financial modeling (yield and derivative analysis)
  – Search engines
  – etc.
Application Trends
• Application demand for performance fuels advances in hardware, which enable new applications, which in turn demand more performance
  – This cycle drives the exponential increase in microprocessor performance
  – It drives parallel architecture even harder, for the most demanding applications
• Range of performance demands
  – Need a range of system performance with progressively increasing cost
[Figure: cycle between "New Applications" and "More Performance"]
Speedup
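• $\text{Speedup}(p\ \text{processors}) = \dfrac{\text{Performance}(p\ \text{processors})}{\text{Performance}(1\ \text{processor})}$
• For a fixed problem size (input data set), $\text{performance} = 1/\text{time}$, so:
  – $\text{Speedup}_{\text{fixed problem}}(p\ \text{processors}) = \dfrac{\text{Time}(1\ \text{processor})}{\text{Time}(p\ \text{processors})}$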
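The two speedup definitions can be checked numerically. This is a minimal sketch; the function names and the timings are hypothetical, chosen only to show that, for a fixed problem, the performance ratio and the inverted time ratio give the same number.

```python
def speedup_from_performance(perf_p: float, perf_1: float) -> float:
    """Speedup(p) = Performance(p processors) / Performance(1 processor)."""
    return perf_p / perf_1


def speedup_fixed_problem(time_1: float, time_p: float) -> float:
    """For a fixed problem size, performance = 1/time, so
    Speedup(p) = Time(1 processor) / Time(p processors)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("execution times must be positive")
    return time_1 / time_p


if __name__ == "__main__":
    # Hypothetical timings (seconds) for one fixed input data set.
    t1 = 120.0  # one processor
    t8 = 20.0   # eight processors
    print(speedup_fixed_problem(t1, t8))  # 6.0
    # The same quantity via performance = 1/time (equal up to rounding).
    print(speedup_from_performance(1 / t8, 1 / t1))
```

A speedup of 6.0 on eight processors is below the processor count; the definitions themselves say nothing about why, only how to measure it.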
Is Parallel Computing Inevitable?
• Technology trends
  – Chip technology continues to increase in density
  – Raising the clock frequency of single-core designs requires too much power
  – Commodity or off-the-shelf technology is used to keep costs low
  – Multi-core processing is becoming common among mainstream microprocessors (e.g., AMD, IBM, Intel)
  – Greater interconnect bandwidth is becoming generally available
    » Standard interconnects: InfiniBand, 10Gb Ethernet
• Architecture trends
  – Parallel solutions are being packaged in a common chassis
    » e.g., blade servers (IBM, HP, Dell, etc.)
  – Software is being packaged for mainstream solutions
    » e.g., Windows Compute Cluster Server 2003
  – High availability is commonly achieved by clustering processing elements
Is Parallel Computing Inevitable?
• Economics
  – The falling cost of low-end (dual- and quad-socket) servers, combined with high-bandwidth interconnects, is driving applications to become parallel
  – Commodity microprocessors are not only fast but CHEAP
    » Development costs tens of millions of dollars
    » BUT many more are sold compared to supercomputers
    » It is crucial to take advantage of that investment and use the commodity building block
  – Multiprocessors are being pushed by software vendors (e.g., database) as well as hardware vendors
  – Standardization makes small, bus-based SMPs a commodity
  – Desktop: a few smaller processors versus one larger one?
  – Multiprocessor on a chip?
Scale Up vs Scale Out Model
[Figure: systems positioned along two axes, Scale Up / SMP Computing and Scale Out / Distributed Computing; labels: Large SMP (x445, x455), High Density Rack Mount (xSeries 335/eServer 325), BladeCenter™, Large Parallel Clusters]
Blade Server Example - BladeCenter
• BladeCenter: 7U chassis form factor (Nov 2002)
  – Highest density, lowest cost, super power efficient, consolidated management
  – Web hosting/serving, FSS, file/print, geophysical analysis, collaboration, graphic rendering
• BladeCenter T: 8U chassis form factor (March 2004)
  – Highly rugged, Telco AC/DC, long life, NEBS, air filtration
  – Telco/core applications, government, military, rugged industrial, DC, medical
• BladeCenter H: 9U chassis form factor (Jan 2006)
  – Ultra high performance, 4x IB/10Gb backplane, new management module
  – HPC applications, technical clusters, virtual enterprise solutions, future I/O
• One family, many applications, many environments, long-term investment protection (BladeCenter: Simply Smarter IT)
• Compatible set of blades and switches
Blade Server Example – BladeCenter H
• Fourteen blades in a 9U chassis form factor
  – Blade and switch compatibility across BladeCenter and BladeCenter T
• High-performance networking fabrics
  – New high-performance switches and blade I/O
  – Corresponding bridge bays for protocol translation
• Power enhancements
  – Four front-load 2900 W power supplies
BladeCenter Overview
• Switching modules: Ethernet, Fibre Channel, InfiniBand
• Blade I/O card (or local drive): the I/O card matches the switch technology in the corresponding slot
[Figure: chassis diagram showing multiple CPU blades, two management modules, and two switch modules]
BladeCenter H Architecture
• I/O bridge: e.g., Ethernet, Fibre Channel, passthru
  – Dual 4x (16-wire) wiring internally to each HSSM
• High-speed switch: Ethernet or InfiniBand
  – 4x (16-wire) blade links
  – 4x (16-wire) bridge links
  – 1x (4-wire) management links
  – Uplinks: up to 12x links for IB and at least four 10Gb links for Ethernet
[Figure: Blades 1 to 14 connected to Switch Modules 1 and 2, Management Modules 1 and 2, High-Speed Switches 1 to 4, and I/O Bridges (including I/O Bridge 3/SM3 and I/O Bridge 4/SM4)]
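The link widths above follow a simple pattern: a "4x" link is four lanes, and each lane contributes four wires. A minimal sketch of that arithmetic, assuming first-generation (SDR) InfiniBand, where each lane is one differential pair in each direction (4 wires) signaling at 2.5 Gbps with 8b/10b encoding. The function names are hypothetical, and the Ethernet uplinks and later, faster InfiniBand generations are not modeled.

```python
# Assumed constants for first-generation (SDR) InfiniBand.
WIRES_PER_LANE = 4            # one differential pair per direction (transmit + receive)
SDR_SIGNAL_GBPS = 2.5         # raw signaling rate per lane, per direction
DATA_BITS, CODE_BITS = 8, 10  # 8b/10b line encoding


def link_wires(width: int) -> int:
    """Wires needed for a link `width` lanes wide (1x, 4x, 12x)."""
    return width * WIRES_PER_LANE


def signaling_gbps(width: int) -> float:
    """Raw signaling rate per direction, before encoding overhead."""
    return width * SDR_SIGNAL_GBPS


def data_gbps(width: int) -> float:
    """Usable data rate per direction after 8b/10b encoding overhead."""
    return signaling_gbps(width) * DATA_BITS / CODE_BITS


if __name__ == "__main__":
    for width in (1, 4, 12):
        print(f"{width}x: {link_wires(width)} wires, "
              f"{signaling_gbps(width)} Gbps signaling, {data_gbps(width)} Gbps data")
```

Under these assumptions the 4x (16-wire) and 1x (4-wire) counts above fall out directly, and a 4x link signals at 10 Gbps per direction.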
InfiniBand on BladeCenter H
• Expanding the BladeCenter ecosystem with Cisco Systems
  – Switch module and daughter card designed for BladeCenter H
  – Daughter card provides dual-port 4x (10G) InfiniBand connectivity to each blade
• Helps reduce data center complexity
  – Reduces the number of adapters, cables, and switch ports required
  – Manages the addition or removal of I/O or storage bandwidth centrally
  – Enables users to adjust resources on demand without downtime
• High-performance computing features
  – Leverages RDMA to deliver low-latency performance
  – Delivers higher-bandwidth connectivity (160 Gbps to the chassis)
  – Achieves blade port consolidation through remote I/O
  – I/O virtualization via Cisco VFrame
Enabling High Performance and Virtualized I/O
The BladeCenter H InfiniBand solution provides high-speed, low-latency connectivity while lowering TCO
The End
Grid Example
[Figure: Intra-Grids, Extra-Grids, and Inter-Grids, with timeline labels 2003, "Commerce with Trusted Partners", and 2006+ "Full Commercialization" with unknown partners; diagram labels include Grid, NAS/SAN, VPN, Cactus, NTG(SF), Express Project, MFG, and Fin. Services]
Courtesy of Ellen Stokes