parallel processing sharing the load. inside a processor chip in package circuits primarily...

Parallel Processing Sharing the load

Upload: adrian-johnston

Post on 03-Jan-2016


TRANSCRIPT


Parallel Processing

Sharing the load


Inside a Processor

Chip in Package

Circuits

• Primarily Crystalline Silicon

• 1 mm – 25 mm on a side

• 100 million to billions of transistors
– current “feature size” (process): ~22 nanometers

• Package provides:
– communication with motherboard
– heat dissipation


Moore's Law

• Number of transistors in same area doubles every 2 years

• Net effect : processing power doubles approximately every 18 months


Adapted from UC Berkeley "The Beauty and Joy of Computing"

Exponential Growth

• Doubling is exponential growth

Year:   0    1.5   3    4.5   6    7.5   9    10.5   12
Speed:  1    2     4    8     16   32    64   128    256
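
The table follows from one doubling every 18 months (1.5 years). A minimal Python sketch that regenerates it; the variable names are illustrative and not from the slides:

```python
# One doubling every 1.5 years, starting from a relative speed of 1.
DOUBLING_PERIOD_YEARS = 1.5

# k counts how many doublings have happened so far.
years = [DOUBLING_PERIOD_YEARS * k for k in range(9)]
speeds = [2 ** k for k in range(9)]

for year, speed in zip(years, speeds):
    print(f"year {year:>4}: {speed}x")
```

After 12 years (8 doublings) the relative speed is 2 to the 8th power, or 256.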


Moore's Law

Gordon Moore, Intel Cofounder


Moore's Law

• If Moore's Law were applicable to the airline industry, a flight from New York to Paris in 1978 that cost $900 and took seven hours, would now cost about $0.01 and take less than one second.


Power Density Prediction circa 2000

[Figure: power density (W/cm²), on a logarithmic axis from 1 to 10,000, versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, P6, and Core 2, with reference levels marked Hot Plate, Nuclear Reactor, Rocket Nozzle, and Sun’s Surface. Source: S. Borkar (Intel)]


MultiCore

• Multicore : Multiple processing cores on one chip
– Each core can run a different program


Going Multi-core Helps Energy Efficiency

• Speed takes power, power = heat
– Can run at 80% speed with 50% power
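
A back-of-the-envelope sketch of why this favors multicore. It assumes, hypothetically, that the work splits perfectly across two cores and that each core follows the 80% speed / 50% power figure; the variable names are illustrative:

```python
# Figures from the slide, relative to one core running at full speed.
SPEED_PER_SLOW_CORE = 0.8   # 80% of full speed
POWER_PER_SLOW_CORE = 0.5   # 50% of full power

cores = 2  # assumption: the workload divides evenly over both cores

total_speed = cores * SPEED_PER_SLOW_CORE   # combined throughput
total_power = cores * POWER_PER_SLOW_CORE   # combined power draw

print(f"throughput: {total_speed:.1f}x, power: {total_power:.1f}x")
```

Under those assumptions, two slower cores deliver about 1.6 times the throughput for the same power as one fast core.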


Moore's Law Related Curves


Moore's Law Related Curves


Issues

• Not every part of a problem scales well
– Parallel : can run at same time
– Serial : must run one at a time in order


Speedup Issues

• 5 workers can do parallel portion in 1/5th the time
• Can't affect serial part

[Figure: time versus number of cores (1 and 5), with each bar split into parallel portion and serial portion]


Speedup Issues

[Figure: time versus number of cores (1 to 5), with each bar split into parallel portion and serial portion]

• Adding workers provides diminishing returns
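
A small sketch of the diminishing returns. The workload sizes (1 time unit of serial work, 5 units of parallel work) are made up for illustration and are not taken from the slides:

```python
# Hypothetical workload: 1 time unit of serial work, 5 units of parallel work.
SERIAL = 1.0
PARALLEL = 5.0

def total_time(cores):
    """Serial part is unaffected; parallel part is split evenly across cores."""
    return SERIAL + PARALLEL / cores

times = [total_time(n) for n in range(1, 6)]

# Time saved by each additional core shrinks as cores are added.
savings = [times[i] - times[i + 1] for i in range(len(times) - 1)]

for n, t in enumerate(times, start=1):
    print(f"{n} core(s): {t:.2f} time units")
```

Going from 1 to 2 cores saves 2.5 time units here, while going from 4 to 5 saves only 0.25.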


Amdahl’s Law

• Amdahl’s law : Predicts how many times faster N workers can do a task in which a portion P is parallel
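
Amdahl's law is usually written as speedup = 1 / ((1 - P) + P / N), where P is the parallel fraction of the task and N is the number of workers. A minimal Python sketch; the function name is illustrative:

```python
def amdahl_speedup(p, n):
    """Speedup with n workers when fraction p of the task is parallel.

    Amdahl's law: speedup = 1 / ((1 - p) + p / n)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)

print(round(amdahl_speedup(0.6, 2), 2))  # 1.43
print(round(amdahl_speedup(0.6, 3), 2))  # 1.67
```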


Amdahl’s Law

• 60% of a job can be made parallel. We use 2 processors:

• 1.43x faster with 2 processors than with 1
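
Worked out with Amdahl's law, taking P = 0.6 as the parallel fraction and N = 2 as the number of processors:

```latex
\[
\text{speedup}
  = \frac{1}{(1 - P) + \frac{P}{N}}
  = \frac{1}{0.4 + \frac{0.6}{2}}
  = \frac{1}{0.7}
  \approx 1.43
\]
```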


Amdahl’s Law

• 60% of a job can be made parallel. We use 3 processors:

• 1.67x faster than with 1 worker
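
The same calculation with P = 0.6 and N = 3 processors:

```latex
\[
\text{speedup}
  = \frac{1}{(1 - P) + \frac{P}{N}}
  = \frac{1}{0.4 + \frac{0.6}{3}}
  = \frac{1}{0.6}
  \approx 1.67
\]
```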


Amdahl’s Law

• Always have to do 40% of the work in serial
• With infinite workers: only 2.5x faster!
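
The 2.5x figure is what Amdahl's law gives for a parallel fraction of 0.6 as the number of workers N grows without bound, because the parallel term shrinks to zero and only the serial 40% remains:

```latex
\[
\lim_{N \to \infty} \frac{1}{0.4 + \frac{0.6}{N}}
  = \frac{1}{0.4}
  = 2.5
\]
```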


Limits

• Max speedup limited by parallel portion of code:
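
In the notation of Amdahl's law (P the parallel fraction, N the number of workers), the bound follows by letting N grow without limit:

```latex
\[
\text{max speedup}
  = \lim_{N \to \infty} \frac{1}{(1 - P) + \frac{P}{N}}
  = \frac{1}{1 - P}
\]
```

For example, P = 0.9 allows at most a 10x speedup and P = 0.99 at most 100x, no matter how many workers are added.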


Speedup Issues : Overhead

• Even assuming no sequential portion, there’s…
– Time to think how to divide the problem up
– Time to hand out small “work units” to workers
– All workers may not work equally fast
– Some workers may fail
– There may be contention for shared resources
– Workers could overwrite each other’s answers
– You may have to wait until the last worker returns to proceed (the slowest / weakest link problem)
– There’s time to put the data back together in a way that looks as if it were done by one


Concurrency

• Concurrency : two things happening at the same time

• Many things don't work well concurrently
– Printers
– Shared memory


No Synchronization

• Race Condition : unpredictable result based on timing of concurrent operations


No Synchronization

• x starts as 5; worker A adds 10 and worker B adds 1. Four possible orderings:

Case 1 (final x = 16)
– A runs, x = 15
– B runs, x = 16

Case 2 (final x = 16)
– B runs, x = 6
– A runs, x = 16

Case 3 (final x = 6)
– A gets x (5)
– B gets x (5)
– A adds 10 (has 15)
– B adds 1 (has 6)
– A stores x = 15
– B stores x = 6

Case 4 (final x = 15)
– A gets x (5)
– B gets x (5)
– A adds 10 (has 15)
– B adds 1 (has 6)
– B stores x = 6
– A stores x = 15
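
The cases can be checked by enumerating every way the get / add / store steps of A and B can interleave. The sketch below is a deterministic simulation rather than real threads, and its function names are illustrative:

```python
from itertools import combinations

STEPS = ("get", "add", "store")
AMOUNT = {"A": 10, "B": 1}  # A adds 10, B adds 1

def interleavings():
    """Yield every ordering of A's and B's steps that keeps each worker's own order."""
    for a_slots in combinations(range(6), 3):  # which 3 of the 6 time slots belong to A
        a_steps, b_steps = iter(STEPS), iter(STEPS)
        yield [("A", next(a_steps)) if i in a_slots else ("B", next(b_steps))
               for i in range(6)]

def run(schedule, start=5):
    """Execute one interleaving and return the final value of x."""
    x = start
    local = {}  # each worker's private copy of x
    for worker, step in schedule:
        if step == "get":
            local[worker] = x
        elif step == "add":
            local[worker] += AMOUNT[worker]
        else:  # "store"
            x = local[worker]
    return x

results = sorted({run(schedule) for schedule in interleavings()})
print(results)  # [6, 15, 16]
```

Only three distinct final values appear, because the two orderings in which one worker finishes before the other starts both give 16.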


Locks

• Can prevent concurrency problems with locks:
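
A minimal sketch of the idea using Python's `threading.Lock`, reusing the earlier example in which x starts at 5, A adds 10, and B adds 1. The names `x_lock` and `add_to_x` are illustrative:

```python
import threading

x = 5
x_lock = threading.Lock()

def add_to_x(amount):
    """Get, add, and store as one unit that no other worker can interrupt."""
    global x
    with x_lock:          # only one worker at a time may hold the lock
        value = x         # get
        value += amount   # add
        x = value         # store

a = threading.Thread(target=add_to_x, args=(10,))
b = threading.Thread(target=add_to_x, args=(1,))
a.start()
b.start()
a.join()
b.join()

print(x)  # 16
```

Whichever worker gets the lock first, the other must wait for it to finish, so the result is 16 regardless of timing.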


Deadlock

• But if we have…
– Mutual exclusion : can't share resources
– Hold and wait : you can reserve one resource while waiting on another
– No preemption : can't remove a resource from a process's control

• Can have deadlock…


Deadlock

• Workers A and B both want to use locked resources X and Y:
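
A sketch of how this goes wrong, assuming A locks X then Y while B locks Y then X. A true deadlock would hang forever, so this version uses a timeout on the second lock to show that neither worker can get it. The barriers exist only to force the unlucky timing, and all names are illustrative:

```python
import threading

lock_x = threading.Lock()
lock_y = threading.Lock()
both_hold_first = threading.Barrier(2)
both_tried = threading.Barrier(2)
got_second = {}

def worker(name, first, second):
    with first:                     # hold one resource...
        both_hold_first.wait()      # (wait until the other worker holds its lock too)
        # ...while waiting on the other. A plain acquire() would wait forever here.
        acquired = second.acquire(timeout=0.2)
        got_second[name] = acquired
        if acquired:
            second.release()
        both_tried.wait()           # keep holding `first` until both have tried

a = threading.Thread(target=worker, args=("A", lock_x, lock_y))
b = threading.Thread(target=worker, args=("B", lock_y, lock_x))
a.start()
b.start()
a.join()
b.join()

print(got_second)  # both values are False
```

Each worker holds the lock the other one needs, so neither second acquire can succeed.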


Breaking Deadlock

• Must remove one condition
– Mutual Exclusion
  • Find a way to share
– Hold and Wait
  • If you wait you must give up other resources
– No preemption
  • Take back a resource someone has claimed
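
One way to remove the hold-and-wait condition can be sketched as follows: a worker that cannot get its second lock immediately gives up the first and retries after a short random pause. This is an illustrative sketch, not a recommended production pattern; the helper names are hypothetical, and the random pause is there to make it unlikely that the two workers keep colliding:

```python
import random
import threading
import time

lock_x = threading.Lock()
lock_y = threading.Lock()
finished = []

def acquire_both(first, second):
    """Take both locks, never holding one while blocked on the other."""
    while True:
        first.acquire()
        if second.acquire(blocking=False):   # try once, do not wait
            return
        first.release()                      # give up what we hold
        time.sleep(random.uniform(0, 0.01))  # back off before retrying

def worker(name, first, second):
    acquire_both(first, second)
    try:
        finished.append(name)                # work that needs both resources
    finally:
        second.release()
        first.release()

a = threading.Thread(target=worker, args=("A", lock_x, lock_y))
b = threading.Thread(target=worker, args=("B", lock_y, lock_x))
a.start()
b.start()
a.join()
b.join()

print(sorted(finished))  # ['A', 'B']
```

Both workers finish even though they request the locks in opposite orders, at the cost of some wasted retries.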


Why Parallelism?

• We have no choice!
– Multicore processors are a plan B, not a triumph for parallelism

• Parallel processing takes new
– Architectures
– Algorithms
– Structures
– Languages