Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour.

Upload: timothy-harrington

Post on 18-Jan-2016


TRANSCRIPT

Page 1

Parallel Processing

I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour.

Page 2

Next up: TIPS

• Mega- = 10^6, Giga- = 10^9, Tera- = 10^12, Peta- = 10^15

• BOPS, anyone?

• Light travels about 1 ft per 10^-9 s (1 ns) in free space.

• A terahertz uniprocessor could have no clock-to-clock path longer than 300 microns…

• We already know of problems that require more than a TIP (simulations of weather, weapons, brains).
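
The 300-micron limit follows from dividing the speed of light by the clock frequency. A quick check of that arithmetic, as a sketch (the function and constant names are illustrative, not from the slides):

```python
# Distance light covers in one clock period, in free space.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
FOOT_M = 0.3048                       # one foot in metres


def light_distance_per_cycle_m(clock_hz: float) -> float:
    """Return how far light travels (in metres) during one clock period."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz


per_ns_ft = light_distance_per_cycle_m(1e9) / FOOT_M   # 1 GHz clock, 1 ns period
per_ps_um = light_distance_per_cycle_m(1e12) * 1e6     # 1 THz clock, 1 ps period

print(f"1 ns: {per_ns_ft:.2f} ft")       # roughly 1 ft
print(f"1 ps: {per_ps_um:.0f} microns")  # roughly 300 microns
```

Real signals on a chip propagate slower than light in free space, so the usable path would be shorter still.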

Page 3

Solution: Parallelism

• Pipelining – reasonable for a small number of stages (5-10), after that bypassing and stalls become unmanageable.

• Superscalar – replicate data paths and design control logic to discover parallelism in traditional programs.

• Explicit parallelism – must learn how to write programs that run on multiple CPUs.

Page 4

Pipelining

Page 5

Superscalar – How far can it go?

• Multiple functional units (ALUs, Addr, Floating point, etc.)

• Instruction dispatch

• Dynamic scheduling

• Pipelines

• Speculative execution

Page 6

Explicit Parallelism

• Distributed
  – Transaction-oriented
  – Geographically dispersed locations
  – E.g. SETI@home

• Parallel
  – Single-goal computing
  – Compute-intensive and/or data-intensive
  – High-speed data exchange
  – Often on custom hardware
  – E.g. Geochemical surveys

Page 7

Challenges

• For distributed processing, parallelism is given and usually cannot easily change. Programming is relatively easy.

• For parallel processing, the programmer defines parallelism by partitioning the serial program(s). Parallel programming in general is more difficult than transaction applications.
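
To make "partitioning the serial program" concrete, here is a minimal sketch that splits a serial sum of squares into slices, hands each slice to a worker, and combines the partial results. All function names are illustrative. It uses threads from the standard library to keep the example short; in CPython, threads do not run CPU-bound Python code simultaneously, so for a real speedup one would swap in `ProcessPoolExecutor`, which offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    """The original serial computation."""
    return sum(v * v for v in values)


def partition(values, parts):
    """Split values into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(values), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(values[start:end])
        start = end
    return slices


def parallel_sum_of_squares(values, workers=4):
    """Partition the data, compute partial sums concurrently, then combine."""
    chunks = partition(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(serial_sum_of_squares, chunks))
    return sum(partials)


data = list(range(1000))
assert parallel_sum_of_squares(data) == serial_sum_of_squares(data)
```

The programmer had to decide how to split the data and how to merge the partial results; neither decision exists in the serial version.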

Page 8

Other vocabulary

• Decomposition
  – The way that a program can be broken up for parallel processing

• Coarse-grain
  – Breaks into big chunks (fewer processors)
  – SMP
  – Distributed (often)

• Fine-grain
  – Breaks into small chunks (more processors)
  – Image processing
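
The difference in grain size can be sketched with one chunking helper and two chunk sizes. The 8x8 "image" and the `decompose` name are made up for illustration:

```python
def decompose(items, chunk_size):
    """Break items into consecutive chunks of at most chunk_size elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Hypothetical 8x8 "image" stored as a flat list of 64 pixel values.
pixels = list(range(64))

coarse = decompose(pixels, 32)  # 2 big chunks, e.g. one per CPU on a small SMP
fine = decompose(pixels, 1)     # 64 tiny chunks, one pixel per processing element

print(len(coarse), len(fine))   # prints: 2 64
```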

Page 9

Inter-processor communications

[Figure: axis labeled Loosely-coupled and Tightly-coupled, with Distributed processors, Beowulf clusters, and Custom supercomputers placed along it]

Page 10

More Terminology

1. SIMD (Single Instruction Multiple Data)

2. MIMD (Multiple Instruction Multiple Data)

3. MISD (Multiple Instruction Single Data; pipeline)

Page 11

SIMD

• Same instruction executed in multiple units, on different data

• Examples: Vector processors, AltiVec

[Figure: one instruction I applied to four data items D1 to D4]

Page 12

MIMD

• Each unit executes its own instruction on its own data

• Examples: Mercury, Beowulf, etc.

[Figure: four instructions I1 to I4, each applied to its own data item D1 to D4]

Page 13

MISD (pipeline)

[Figure: data items D1 to D4 passing through a pipeline of instructions I1 to I4]
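
The three classes above differ only in how instructions are paired with data. The sketch below models that pairing with plain functions standing in for instructions. It runs sequentially, so it shows the data flow only, not the simultaneous execution real hardware would provide. All names are illustrative.

```python
# Four data items and four hypothetical "instructions" (plain functions).
data = [1, 2, 3, 4]
double = lambda x: x * 2
instructions = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

# SIMD: one instruction applied to every data item.
simd = [double(d) for d in data]

# MIMD: each unit applies its own instruction to its own data item.
mimd = [instr(d) for instr, d in zip(instructions, data)]


# MISD (pipeline): every data item passes through all instructions in order.
def pipeline(x):
    for instr in instructions:
        x = instr(x)
    return x


misd = [pipeline(d) for d in data]

print(simd)  # [2, 4, 6, 8]
print(mimd)  # [2, 4, 0, 16]
print(misd)  # [1, 9, 25, 49]
```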

Page 14

Distributed Programming Tools

• C/C++ with TCP/IP

• Perl with TCP/IP

• Java

• CORBA

• ASP

• .NET

Page 15

Parallel Programming Tools

• PVM

• MPI

• Synergy

• Others (proprietary hardware)

Page 16

Parallel Programming Difficulties

• Program partition and allocation

• Data partition and allocation

• Program (process) synchronization

• Data access mutual exclusion

• Dependencies

• Process(or) failures

• Scalability…
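
Two of the items above, synchronization and mutual exclusion on shared data, can be shown in a few lines. This is a minimal sketch using Python threads as stand-ins for processors; the counter and function names are illustrative.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter `times` times."""
    global counter
    for _ in range(times):
        with counter_lock:  # mutual exclusion: one thread updates at a time
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # synchronization: wait for every worker to finish

print(counter)  # 40000
```

Without the lock, the read-modify-write in `counter += 1` from different threads can interleave and lose updates, which is exactly the hazard the "data access mutual exclusion" bullet names.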

Page 17

Software techniques

• Shared Memory Buffers – Areas of memory that any node can read or write.

• Sockets – Provide full-duplex message passing between processes.

• Semaphores and Spinlocks – Provide locking and synchronization functions.

• Mailbox Interrupts – Provide an interrupt-driven communication mechanism.

• Direct Memory Access – Provides asynchronous shared memory buffer I/O.
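
As an illustration of the sockets item, the sketch below passes a message in each direction over one connected socket pair. It uses a thread in place of a second process to stay self-contained, and the `worker` function and message contents are made up for the example.

```python
import socket
import threading


def worker(sock):
    """Receive one request and send one reply over the same socket."""
    request = sock.recv(1024)
    sock.sendall(b"echo:" + request)
    sock.close()


# socketpair() returns two already-connected sockets; each end can both
# send and receive, which is what full-duplex means here.
parent_end, worker_end = socket.socketpair()
t = threading.Thread(target=worker, args=(worker_end,))
t.start()

parent_end.sendall(b"ping")
reply = parent_end.recv(1024)
t.join()
parent_end.close()

print(reply)  # b'echo:ping'
```

A single `recv` is enough for messages this small; longer messages on a stream socket would need explicit framing, such as a length prefix.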

Page 18

Hardware configurations – Interconnects and Memory

Page 19

Interconnects

Page 20

Crossbar

Page 21

Mesh

Page 22

Interconnects

Page 23

What it really looks like

Note: this computer would rank well on www.top500.org

Page 24

Summary

• Prospects for future CPU architectures:
  – Pipelining - Well understood, but mined-out
  – Superscalar - Nearing its practical limits
  – SIMD - Limited use for special applications
  – VLIW - Returns control to S/W. The future?

• Prospects for future Computer System architectures:
  – SMP - Limited scalability. Harder than it appears.
  – MIMD/message-passing - It’s been the future for over 20 years now. How to program?