![Page 1: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour](https://reader036.vdocuments.us/reader036/viewer/2022070413/5697bff71a28abf838cbecf1/html5/thumbnails/1.jpg)
Parallel Processing
I’ve gotta spend at least 10 hours studying for the IT 344 final!
I’m going to study with 9 friends… we’ll be done in an hour.
Next up: TIPS
• Mega- = 10^6, Giga- = 10^9, Tera- = 10^12, Peta- = 10^15
• BOPS, anyone?
• Light travels about 1 ft / 10^-9 secs in free space.
• A Tera-Hertz uniprocessor could have no clock-to-clock path longer than 300 microns…
• We already know of problems that require greater than a TIP (simulations of weather, weapons, brains)
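The 300-micron figure can be checked with a few lines of arithmetic. The sketch below assumes signals move at the free-space speed of light, which is an upper bound (real on-chip signals are slower); the function name `light_distance_per_cycle` is made up for illustration.

```python
# Speed of light in free space, in metres per second (exact SI value).
C_M_PER_S = 299_792_458


def light_distance_per_cycle(clock_hz: float) -> float:
    """Distance in metres that light covers during one clock period."""
    return C_M_PER_S / clock_hz


for label, hz in [("1 GHz", 1e9), ("1 THz", 1e12)]:
    d = light_distance_per_cycle(hz)
    # Report in millimetres and in microns (1 micron = 1e-6 m).
    print(f"{label}: {d * 1e3:.3f} mm per cycle, about {d * 1e6:.0f} microns")
```

At 1 GHz the result is roughly 0.3 m, which is close to the "1 ft per nanosecond" rule of thumb; at 1 THz it shrinks to roughly 300 microns.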
Solution: Parallelism
• Pipelining – reasonable for a small number of stages (5-10); beyond that, bypassing and stalls become unmanageable.
• Superscalar – replicate data paths and design control logic to discover parallelism in traditional programs.
• Explicit parallelism – must learn how to write programs that run on multiple CPUs.
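One way to see why deeper pipelines stop paying off is the textbook relation speedup = depth / (1 + stall cycles per instruction), which assumes perfectly balanced stages and an ideal CPI of 1. The sketch below applies it; the stall rates are hypothetical numbers chosen only to show the trend, not measurements.

```python
def pipeline_speedup(depth: int, stalls_per_instr: float) -> float:
    """Speedup over an unpipelined design.

    Assumes balanced stages and an ideal CPI of 1, so the only loss
    comes from stall cycles added per instruction.
    """
    return depth / (1.0 + stalls_per_instr)


# Hypothetical stall rates: deeper pipelines are assumed to stall more often.
for depth, stalls in [(5, 0.25), (10, 1.0), (20, 3.0)]:
    print(f"depth {depth:2d}, {stalls} stalls/instr -> "
          f"speedup {pipeline_speedup(depth, stalls):.1f}")
```

With these made-up stall rates, doubling the depth from 10 to 20 stages gives no additional speedup.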
Pipelining
Superscalar – How far can it go?
• Multiple functional units (ALUs, Addr, Floating point, etc.)
• Instruction dispatch
• Dynamic scheduling
• Pipelines
• Speculative execution
Explicit Parallelism
• Distributed
  – Transaction-oriented
  – Geographically dispersed locations
  – E.g. SETI@home
• Parallel
  – Single goal computing
  – Computing intense and/or data-intense
  – High-speed data exchange
    • Often on custom hardware
  – E.g. Geochemical surveys
Challenges
• For distributed processing, parallelism is given and usually cannot easily change. Programming is relatively easy.
• For parallel processing, the programmer defines parallelism by partitioning the serial program(s). Parallel programming in general is more difficult than transaction applications.
Other vocabulary
• Decomposition
  – The way that a program can be broken up for parallel processing
• Coarse-grain
  – Breaks into big chunks (fewer processors)
  – SMP
  – Distributed (often)
• Fine-grain
  – Breaks into small chunks (more processors)
  – Image processing
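Decomposition and grain size can be illustrated with a small sketch. It assumes a trivially divisible job (a sum of squares) and uses a thread pool as a stand-in for multiple processors; `decompose` and `parallel_sum_of_squares` are hypothetical helper names. In CPython, threads will not speed up this CPU-bound loop, so the point here is the partitioning, not the timing.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(n_items: int, grain: int) -> list[range]:
    """Split the index range [0, n_items) into chunks of at most `grain` items."""
    return [range(start, min(start + grain, n_items))
            for start in range(0, n_items, grain)]


def parallel_sum_of_squares(data: list[int], grain: int, workers: int = 4) -> int:
    """Compute one partial sum per chunk, then combine the partial results."""
    def partial(chunk: range) -> int:
        return sum(data[i] * data[i] for i in chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial, decompose(len(data), grain)))


data = list(range(1000))
coarse = decompose(len(data), grain=250)  # a few big chunks
fine = decompose(len(data), grain=10)     # many small chunks
print(len(coarse), len(fine))             # 4 100
# Both decompositions must give the same answer.
print(parallel_sum_of_squares(data, 250) == parallel_sum_of_squares(data, 10))
```

A coarse grain keeps per-chunk overhead low but can only occupy a few workers; a fine grain exposes more parallelism at the cost of more scheduling and communication.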
Inter-processor communications
Spectrum from loosely-coupled to tightly-coupled:
• Distributed processors
• Beowulf clusters
• Custom supercomputers
More Terminology
1. SIMD (Single Instruction Multiple Data)
2. MIMD (Multiple Instruction Multiple Data)
3. MISD (Multiple Instruction Single Data; pipeline)
SIMD
• Same instruction executed in multiple units, on different data
• Examples: Vector processors, AltiVec
[Diagram: one instruction I applied to four data items D1, D2, D3, D4]
MIMD
• Each unit executes its own instruction on its own data
• Examples: Mercury, Beowulf, etc.
[Diagram: four instructions I1, I2, I3, I4, each applied to its own data item D1, D2, D3, D4]
MISD (pipeline)
[Diagram: instructions I1, I2, I3, I4 as pipeline stages; data items D1, D2, D3, D4]
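The three classes can be compared with a toy model that borrows the I and D labels from the diagrams. This is only a sketch of the semantics: it runs sequentially in one Python process, and the four example instructions are arbitrary choices made for illustration.

```python
from typing import Callable, Sequence

Instr = Callable[[int], int]


def simd(instr: Instr, data: Sequence[int]) -> list[int]:
    """One instruction I applied to every data item D1..Dn."""
    return [instr(d) for d in data]


def mimd(instrs: Sequence[Instr], data: Sequence[int]) -> list[int]:
    """Unit k applies its own instruction Ik to its own data item Dk."""
    return [instr(d) for instr, d in zip(instrs, data)]


def misd_pipeline(instrs: Sequence[Instr], data: Sequence[int]) -> list[int]:
    """Every data item passes through all stages I1..In in order."""
    results = []
    for d in data:
        for instr in instrs:
            d = instr(d)
        results.append(d)
    return results


def double(x: int) -> int:
    return x * 2


def inc(x: int) -> int:
    return x + 1


def square(x: int) -> int:
    return x * x


def negate(x: int) -> int:
    return -x


data = [1, 2, 3, 4]
stages = [double, inc, square, negate]
print(simd(double, data))           # [2, 4, 6, 8]
print(mimd(stages, data))           # [2, 3, 9, -4]
print(misd_pipeline(stages, data))  # [-9, -25, -49, -81]
```

In real hardware the SIMD lanes, MIMD units, and pipeline stages all work at the same time; the loops here only show which instruction touches which data item.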
Distributed Programming Tools
• C/C++ with TCP/IP
• Perl with TCP/IP
• Java
• CORBA
• ASP
• .NET
Parallel Programming Tools
• PVM
• MPI
• Synergy
• Others (proprietary hardware)
Parallel Programming Difficulties
• Program partition and allocation
• Data partition and allocation
• Program (process) synchronization
• Data access mutual exclusion
• Dependencies
• Process(or) failures
• Scalability…
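Two of the items above, synchronization and mutual exclusion, can be shown in a minimal sketch. It assumes threads within one Python process rather than separate processors, and `run_counter` is a hypothetical name. The lock guards the shared counter (mutual exclusion), and `join` makes the caller wait for every worker (synchronization).

```python
import threading


def run_counter(n_threads: int, increments: int) -> int:
    """Have several threads increment one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return counter


print(run_counter(4, 10_000))  # 40000
```

Without the lock, the read-modify-write on `counter` is not guaranteed to be atomic, so updates from different threads could be lost.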
Software techniques
• Shared Memory Buffers: Areas of memory that any node can read or write
• Sockets: Provide full-duplex message passing between processes.
• Semaphores and Spinlocks: Provide locking and synchronization functions
• Mailbox Interrupts: Provide an interrupt-driven communication mechanism
• Direct Memory Access: Provides asynchronous shared memory buffer I/O.
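As a rough illustration of full-duplex message passing over sockets, the sketch below uses Python's `socket.socketpair()` to create two connected endpoints. It assumes a short message that arrives in a single `recv` call, and it uses a thread in place of a second process; `echo_upper` and `round_trip` are hypothetical names.

```python
import socket
import threading


def echo_upper(conn: socket.socket) -> None:
    """Peer side: receive one short message and reply with it upper-cased."""
    with conn:
        msg = conn.recv(1024)
        conn.sendall(msg.upper())


def round_trip(payload: bytes) -> bytes:
    """Send `payload` to the peer and return its reply on the same socket."""
    ours, theirs = socket.socketpair()  # two connected endpoints
    peer = threading.Thread(target=echo_upper, args=(theirs,))
    peer.start()
    with ours:
        ours.sendall(payload)     # traffic in one direction...
        reply = ours.recv(1024)   # ...and back over the same connection
    peer.join()
    return reply


print(round_trip(b"ping"))  # b'PING'
```

Between machines, the same send and receive calls would run over a TCP connection instead of a local socket pair.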
Hardware configurations – Interconnects and Memory
Interconnects
Crossbar
Mesh
Interconnects
What it really looks like
Note: this computer would rank well on www.top500.org
Summary
• Prospects for future CPU architectures:
  – Pipelining - Well understood, but mined-out
  – Superscalar - Nearing its practical limits
  – SIMD - Limited use for special applications
  – VLIW - Returns control to S/W. The future?
• Prospects for future Computer System architectures:
  – SMP - Limited scalability. Harder than it appears.
  – MIMD/message-passing - It’s been the future for over 20 years now. How to program?