Parallel Processing
I’ve gotta spend at least 10 hours studying for the IT 344 final!
I’m going to study with 9 friends… we’ll be done in an hour.
Next up: TIPS
• Mega- = 10^6, Giga- = 10^9, Tera- = 10^12, Peta- = 10^15
• BOPS, anyone?
• Light travels about 1 ft per nanosecond (10^-9 s) in free space.
• A terahertz uniprocessor could have no clock-to-clock path longer than 300 microns…
• We already know of problems that require more than a TIP (simulations of weather, weapons, brains).
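The 300-micron figure follows from dividing the speed of light by the clock rate. A minimal sketch of that arithmetic, where the function name max_path_microns is only an illustration and not something from the slides:

```python
# Back-of-envelope check of the clock-path limit quoted above.
C_M_PER_S = 299_792_458  # speed of light in free space, metres per second


def max_path_microns(clock_hz: float) -> float:
    """Distance light covers in one clock period, in microns."""
    period_s = 1.0 / clock_hz
    return C_M_PER_S * period_s * 1e6  # metres -> microns


one_thz = max_path_microns(1e12)   # one clock period is 1 picosecond
one_ghz = max_path_microns(1e9)    # one clock period is 1 nanosecond

print(f"1 THz: {one_thz:.0f} microns per clock")
print(f"1 GHz: {one_ghz / 1e4:.1f} cm per clock (roughly 1 ft)")
```

Real signals on a chip travel slower than light in free space, so the practical limit is tighter than this upper bound.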
Solution: Parallelism
• Pipelining – reasonable for a small number of stages (5-10), after that bypassing and stalls become unmanageable.
• Superscalar – replicate data paths and design control logic to discover parallelism in traditional programs.
• Explicit parallelism – must learn how to write programs that run on multiple CPUs.
Pipelining
Superscalar – How far can it go?
• Multiple functional units (ALUs, Addr, Floating point, etc.)
• Instruction dispatch
• Dynamic scheduling
• Pipelines
• Speculative execution
Explicit Parallelism
• Distributed
– Transaction-oriented
– Geographically dispersed locations
– E.g. SETI@home
• Parallel
– Single goal computing
– Compute-intensive and/or data-intensive
– High-speed data exchange
– Often on custom hardware
– E.g. Geochemical surveys
Challenges
• For distributed processing, parallelism is given and usually cannot easily change. Programming is relatively easy.
• For parallel processing, the programmer defines parallelism by partitioning the serial program(s). Parallel programming in general is more difficult than transaction applications.
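As a rough illustration of the programmer defining the parallelism, the sketch below partitions a serial sum-of-squares loop into slices and hands each slice to a worker. The function names and the choice of four workers are assumptions made for this example. Threads stand in for processors here; in standard CPython a process pool would be needed to see a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` slices take one additional element each.
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def sum_of_squares(chunk):
    """The serial kernel: runs unchanged on the whole list or on a slice."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Partition the data, compute partial sums concurrently, then combine."""
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


data = list(range(1000))
serial_result = sum_of_squares(data)
parallel_result = parallel_sum_of_squares(data)
print(serial_result == parallel_result)
```

The partitioning and the final combine step are the parts the programmer has to design; the kernel itself is the original serial code.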
Other vocabulary
• Decomposition
– The way that a program can be broken up for parallel processing
• Coarse-grain
– Breaks into big chunks (fewer processors)
– SMP
– Distributed (often)
• Fine-grain
– Breaks into small chunks (more processors)
– Image processing
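To make the grain-size distinction concrete, here is a small sketch that decomposes the same workload two ways. The image size and the chunk sizes are hypothetical values chosen for illustration.

```python
def decompose(n_items, chunk_size):
    """Return (start, stop) index ranges that cover n_items in chunk_size pieces."""
    return [(s, min(s + chunk_size, n_items))
            for s in range(0, n_items, chunk_size)]


PIXELS = 1024 * 1024  # hypothetical 1024 x 1024 image

coarse = decompose(PIXELS, PIXELS // 4)  # a few big chunks, e.g. one per CPU
fine = decompose(PIXELS, 1024)           # many small chunks, e.g. one per row

print(len(coarse), "coarse-grain chunks")
print(len(fine), "fine-grain chunks")
```

Fewer, larger chunks mean less communication per unit of work; more, smaller chunks can keep more processors busy but cost more coordination.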
Inter-processor communications
[Diagram: a spectrum from loosely-coupled to tightly-coupled, showing distributed processors, Beowulf clusters, and custom supercomputers]
More Terminology
1. SIMD (Single Instruction Multiple Data)
2. MIMD (Multiple Instruction Multiple Data)
3. MISD (Multiple Instruction Single Data; pipeline)
SIMD
• Same instruction executed in multiple units, on different data
• Examples: Vector processors, AltiVec
[Diagram: one instruction I applied to data D1, D2, D3, D4]
MIMD
• Each unit executes its own instruction on its own data
• Examples: Mercury, Beowulf, etc.
[Diagram: instructions I1, I2, I3, I4 applied to data D1, D2, D3, D4 respectively]
MISD (pipeline)
[Diagram: instructions I1, I2, I3, I4 applied in sequence to the data stream D1, D2, D3, D4]
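The three categories can be modeled in a few lines of plain Python. This is only a sketch of the data flow: it runs sequentially, and the four toy "instructions" are made up for the example.

```python
def simd(instruction, data):
    """SIMD: one instruction applied to every data element."""
    return [instruction(d) for d in data]


def mimd(instructions, data):
    """MIMD: each unit runs its own instruction on its own data element."""
    return [instr(d) for instr, d in zip(instructions, data)]


def misd_pipeline(instructions, data):
    """MISD (pipeline): each data element passes through every stage in order."""
    results = []
    for d in data:
        for instr in instructions:
            d = instr(d)
        results.append(d)
    return results


# Four toy instructions, standing in for I1..I4 (hypothetical).
def double(x): return x * 2
def inc(x): return x + 1
def square(x): return x * x
def negate(x): return -x


data = [1, 2, 3, 4]                       # D1..D4
stages = [double, inc, square, negate]    # I1..I4

simd_out = simd(double, data)
mimd_out = mimd(stages, data)
misd_out = misd_pipeline(stages, data)
print(simd_out, mimd_out, misd_out)
```

On real hardware the SIMD and MIMD units operate at the same time, and pipeline stages overlap on successive data items; the model above only shows which instruction touches which data.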
Distributed Programming Tools
• C/C++ with TCP/IP
• Perl with TCP/IP
• Java
• CORBA
• ASP
• .NET
Parallel Programming Tools
• PVM
• MPI
• Synergy
• Others (proprietary hardware)
Parallel Programming Difficulties
• Program partition and allocation
• Data partition and allocation
• Program (process) synchronization
• Data access mutual exclusion
• Dependencies
• Process(or) failures
• Scalability…
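Mutual exclusion on shared data is the easiest of these difficulties to show in a few lines. The sketch below uses Python threads and a lock as a stand-in for processes on separate CPUs; the Counter class and the worker counts are assumptions for illustration.

```python
import threading


class Counter:
    """A shared counter whose updates are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, n):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += n


def worker(counter, repeats):
    for _ in range(repeats):
        counter.add(1)


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # synchronization point: wait for every worker to finish

print(counter.value)
```

Without the lock, two workers could read the same old value and one update would be lost; the join calls are the synchronization step that makes the final read safe.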
Software techniques
• Shared Memory Buffers – Areas of memory that any node can read or write
• Sockets – Provide full-duplex message passing between processes.
• Semaphores and Spinlocks – Provide locking and synchronization functions
• Mailbox Interrupts – Provide an interrupt-driven communication mechanism
• Direct Memory Access – Provides asynchronous shared memory buffer I/O.
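As one example of these techniques, the sketch below shows full-duplex message passing over a connected socket pair. A thread plays the role of the second process, and the echo_upper peer and the message text are hypothetical.

```python
import socket
import threading


def echo_upper(conn):
    """Hypothetical peer: read one message and reply with it upper-cased."""
    with conn:
        msg = conn.recv(1024)
        conn.sendall(msg.upper())


# socketpair() returns two already-connected sockets; each end can both
# send and receive, which is what full-duplex means here.
left, right = socket.socketpair()

peer = threading.Thread(target=echo_upper, args=(right,))
peer.start()

with left:
    left.sendall(b"hello from node 0")
    reply = left.recv(1024)

peer.join()
print(reply)
```

Between real machines the same send and receive calls would run over a TCP connection instead of a local socket pair.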
Hardware configurations – Interconnects and Memory
Interconnects
Crossbar
Mesh
Interconnects
What it really looks like
Note: this computer would rank well on www.top500.org
Summary
• Prospects for future CPU architectures:
– Pipelining - Well understood, but mined-out
– Superscalar - Nearing its practical limits
– SIMD - Limited use for special applications
– VLIW - Returns control to S/W. The future?
• Prospects for future Computer System architectures:
– SMP - Limited scalability. Harder than it appears.
– MIMD/message-passing - It’s been the future for over 20 years now. How to program?