Parallel Processing: Sharing the load
Inside a Processor
Chip in Package
Circuits
• Primarily Crystalline Silicon
• 1 mm – 25 mm on a side
• 100 million to billions of transistors
  – Current “feature size” (process): ~22 nanometers
• Package provides:
  – Communication with motherboard
  – Heat dissipation
Moore's Law
• Number of transistors in same area doubles every 2 years
• Net effect: processing power doubles approximately every 18 months
Adapted from UC Berkeley "The Beauty and Joy of Computing"
Exponential Growth
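• Doubling is exponential growth

| Year  | 0 | 1.5 | 3 | 4.5 | 6  | 7.5 | 9  | 10.5 | 12  |
|-------|---|-----|---|-----|----|-----|----|------|-----|
| Speed | 1 | 2   | 4 | 8   | 16 | 32  | 64 | 128  | 256 |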
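The year/speed values above can be reproduced with a short calculation. This is a minimal sketch that assumes the 18-month doubling period quoted on the Moore's Law slide; the function name `relative_speed` is made up for illustration.

```python
def relative_speed(years, doubling_period=1.5):
    """Speed relative to year 0, assuming one doubling every `doubling_period` years."""
    return 2 ** (years / doubling_period)

years = [0, 1.5, 3, 4.5, 6, 7.5, 9, 10.5, 12]
speeds = [relative_speed(y) for y in years]

for y, s in zip(years, speeds):
    print(f"year {y:>4}: {s:.0f}x")
```

Eight doublings in 12 years give a 256x increase, which is why the curve looks flat at first and then shoots upward.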
Moore's Law
Gordon Moore, Intel cofounder
Moore's Law
• If Moore's Law were applicable to the airline industry, a flight from New York to Paris in 1978 that cost $900 and took seven hours, would now cost about $0.01 and take less than one second.
Power Density Prediction circa 2000
[Figure: log-scale plot of power density (W/cm²), 1 to 10,000, against year, 1970 to 2010. Intel processors plotted: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P6, Core 2. Reference levels: hot plate, nuclear reactor, rocket nozzle, Sun’s surface. Source: S. Borkar (Intel)]
MultiCore
• Multicore: multiple processing cores on one chip
  – Each core can run a different program
Going Multi-core Helps Energy Efficiency
• Speed takes power, power = heat
  – Can run at 80% speed with 50% power
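A quick arithmetic sketch of why that trade helps. It assumes two slower cores and a workload that splits perfectly across them; neither assumption is stated on the slide, and the variable names are illustrative only.

```python
# Baseline: one core at full speed and full power (both normalized to 1.0).
fast_core_speed, fast_core_power = 1.0, 1.0

# From the slide: a core slowed to 80% speed needs only 50% of the power.
slow_core_speed, slow_core_power = 0.8, 0.5

cores = 2  # assumption: two slow cores, perfectly parallel work

total_speed = cores * slow_core_speed   # combined throughput
total_power = cores * slow_core_power   # combined power draw

print(f"throughput: {total_speed:.1f}x, power: {total_power:.1f}x")
```

Under those assumptions, two slower cores deliver about 1.6 times the throughput for the same power budget as one fast core.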
Moore's Law Related Curves
Issues
• Not every part of a problem scales well
  – Parallel: can run at same time
  – Serial: must run one at a time in order
Speedup Issues
• 5 workers can do parallel portion in 1/5th the time
• Can't affect serial part
[Figure: bar chart of time against number of cores (1 and 5), each bar split into parallel portion and serial portion]
Speedup Issues
[Figure: bar chart of time against number of cores (1 to 5), each bar split into parallel portion and serial portion]
• Adding workers provides diminishing returns
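The diminishing returns can be seen with made-up numbers. The sketch below assumes a job with 1 time unit of serial work and 5 units of parallel work; these figures are hypothetical and chosen only to make the pattern visible.

```python
SERIAL_TIME = 1.0    # hypothetical: work that must run in order
PARALLEL_TIME = 5.0  # hypothetical: work that can be split among workers

def total_time(workers):
    """Serial part is fixed; parallel part is divided evenly among workers."""
    return SERIAL_TIME + PARALLEL_TIME / workers

times = [total_time(n) for n in range(1, 6)]
# Time saved by adding each extra worker (2nd, 3rd, 4th, 5th).
savings = [before - after for before, after in zip(times, times[1:])]

for n, t in enumerate(times, start=1):
    print(f"{n} worker(s): {t:.2f} time units")
print("saved by each added worker:", [round(s, 2) for s in savings])
```

Each added worker saves less time than the one before it, and the total can never drop below the serial time.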
Amdahl’s Law
• Amdahl’s law : Predicts how many times faster N workers can do a task in which P portion is parallel
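The formula itself did not survive in this transcript. The standard statement of Amdahl's law is speedup = 1 / ((1 - P) + P / N), and a minimal sketch of it follows; the function name `amdahl_speedup` is an illustrative choice, not something from the slides.

```python
def amdahl_speedup(p, n):
    """Speedup with n workers when fraction p of the job is parallel.

    The serial fraction (1 - p) takes the same time regardless of n;
    the parallel fraction p is divided among the n workers.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)

for workers in (1, 2, 3, 10, 100):
    print(f"P = 0.6, N = {workers:>3}: {amdahl_speedup(0.6, workers):.2f}x")
```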
Amdahl’s Law
• 60% of a job can be made parallel. We use 2 processors:
• 1.43x faster with 2 than 1
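The worked calculation is missing from the transcript. Assuming the standard form of Amdahl's law, with P = 0.6 as the parallel fraction and N = 2 workers, it would read:

$$
\text{speedup} = \frac{1}{(1 - P) + \frac{P}{N}} = \frac{1}{0.4 + \frac{0.6}{2}} = \frac{1}{0.7} \approx 1.43
$$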
Amdahl’s Law
• 60% of a job can be made parallel. We use 3 processors:
• 1.67x faster than with 1 worker
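With the same standard form of Amdahl's law, P = 0.6 and N = 3, the calculation presumably shown on the slide is:

$$
\text{speedup} = \frac{1}{0.4 + \frac{0.6}{3}} = \frac{1}{0.6} \approx 1.67
$$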
Amdahl’s Law
• Always have to do 40% of the work in serial
• With infinite workers: only 2.5x faster!
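The 2.5 figure follows from letting the number of workers N grow without bound in the standard form of Amdahl's law, so that the parallel term P / N vanishes (here P = 0.6):

$$
\lim_{N \to \infty} \frac{1}{(1 - P) + \frac{P}{N}} = \frac{1}{1 - P} = \frac{1}{0.4} = 2.5
$$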
Limits
• Max speedup is limited by how much of the code can run in parallel:
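The chart for this slide is not in the transcript. As a stand-in, the sketch below tabulates the upper bound 1 / (1 - P) for a few parallel fractions; the sample values of P are arbitrary and the function name is illustrative.

```python
def max_speedup(p):
    """Upper bound on speedup with unlimited workers: only the serial part remains."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)

# Arbitrary sample fractions, chosen for illustration.
fractions = [0.50, 0.60, 0.90, 0.95, 0.99]
limits = {p: max_speedup(p) for p in fractions}

for p, limit in limits.items():
    print(f"{p:.0%} parallel -> at most {limit:.1f}x faster")
```

Even a job that is 95% parallel tops out at about 20x, no matter how many cores are added.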
Speedup Issues: Overhead
• Even assuming no sequential portion, there’s…
  – Time to think how to divide the problem up
  – Time to hand out small “work units” to workers
  – All workers may not work equally fast
  – Some workers may fail
  – There may be contention for shared resources
  – Workers could overwrite each other’s answers
  – You may have to wait until the last worker returns to proceed (the slowest / weakest link problem)
  – There’s time to put the data back together in a way that looks as if it were done by one
Concurrency
• Concurrency : two things happening at the same time
• Many things don't work well concurrently
  – Printers
  – Shared memory
No Synchronization
• Race Condition : unpredictable result based on timing of concurrent operations
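No Synchronization
• x starts as 5; A adds 10 and B adds 1. Four possible cases:

| Case 1         | Case 2         | Case 3             | Case 4             |
|----------------|----------------|--------------------|--------------------|
| A runs, x = 15 | B runs, x = 6  | A gets x (5)       | A gets x (5)       |
| B runs, x = 16 | A runs, x = 16 | B gets x (5)       | B gets x (5)       |
|                |                | A adds 10 (has 15) | A adds 10 (has 15) |
|                |                | B adds 1 (has 6)   | B adds 1 (has 6)   |
|                |                | A stores x = 15    | B stores x = 6     |
|                |                | B stores x = 6     | A stores x = 15    |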
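The cases above can be explored without real threads by simulating every interleaving. This is a sketch under the assumption that each worker performs exactly three steps (get, add, store); the helper names `run` and `all_schedules` are invented for illustration.

```python
from itertools import combinations

AMOUNTS = {"A": 10, "B": 1}  # A adds 10, B adds 1, as in the cases above

def run(schedule, start=5):
    """Replay one interleaving, e.g. "AAABBB", and return the final x."""
    x = start
    local = {}                   # each worker's private copy of x
    step = {"A": 0, "B": 0}      # 0 = get, 1 = add, 2 = store
    for worker in schedule:
        if step[worker] == 0:
            local[worker] = x                 # get x
        elif step[worker] == 1:
            local[worker] += AMOUNTS[worker]  # add to the private copy
        else:
            x = local[worker]                 # store back to shared x
        step[worker] += 1
    return x

def all_schedules():
    """Every ordering of three A steps and three B steps."""
    for a_slots in combinations(range(6), 3):
        yield "".join("A" if i in a_slots else "B" for i in range(6))

outcomes = {run(s) for s in all_schedules()}
print(sorted(outcomes))
```

Only the schedules in which one worker finishes before the other starts give 16; whenever the gets overlap, one of the updates is lost.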
Locks
• Can prevent concurrency problems with locks:
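The illustration for this slide is missing. A minimal sketch of the idea using Python's `threading.Lock`, with the variable names and the add-10 / add-1 workers chosen here for illustration:

```python
import threading

x = 5
x_lock = threading.Lock()

def add(amount):
    """Get, add, and store as one unit: no other worker can run in between."""
    global x
    with x_lock:              # only one thread at a time may hold the lock
        current = x           # get
        x = current + amount  # add and store

worker_a = threading.Thread(target=add, args=(10,))
worker_b = threading.Thread(target=add, args=(1,))
worker_a.start()
worker_b.start()
worker_a.join()
worker_b.join()

print(x)
```

Whichever worker gets the lock first, the other must wait, so both updates are applied and the result is 16.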
Deadlock
• But if we have…
  – Mutual exclusion: can't share resources
  – Hold and wait: you can reserve one resource while waiting on another
  – No preemption: can't remove a resource from a process's control
• Can have deadlock…
Deadlock
• Workers A and B both want to use locked resources X and Y:
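The diagram for this slide is missing. The sketch below assumes A locks X then wants Y, while B locks Y then wants X. To keep the demonstration from hanging, each worker gives up on its second lock after a short timeout; a real deadlock would wait forever. The barriers exist only to force the unlucky timing and are not part of the scenario.

```python
import threading

lock_x = threading.Lock()
lock_y = threading.Lock()
both_hold_first = threading.Barrier(2)    # forces: each worker holds one lock
both_tried_second = threading.Barrier(2)  # forces: nobody releases early
got_second = {}

def worker(name, first, second):
    with first:                              # hold the first resource...
        both_hold_first.wait()
        ok = second.acquire(timeout=0.2)     # ...while waiting on the other
        got_second[name] = ok
        if ok:
            second.release()
        both_tried_second.wait()             # keep holding until both have tried

a = threading.Thread(target=worker, args=("A", lock_x, lock_y))
b = threading.Thread(target=worker, args=("B", lock_y, lock_x))
a.start()
b.start()
a.join()
b.join()

print(got_second)
```

Neither worker obtains its second lock: each is waiting for a resource the other refuses to give up.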
Breaking Deadlock
• Must remove one condition
  – Mutual exclusion
    • Find a way to share
  – Hold and wait
    • If you wait you must give up other resources
  – No preemption
    • Take back a resource someone has claimed
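One way to remove the hold-and-wait condition can be sketched as follows: a worker that cannot get its second lock immediately releases the first and tries again after a short random pause. The two-lock setup, the names, and the back-off interval are all assumptions made for illustration.

```python
import random
import threading
import time

lock_x = threading.Lock()
lock_y = threading.Lock()
finished = []

def worker(name, first, second):
    while True:
        with first:
            # Try the second lock without waiting for it.
            if second.acquire(blocking=False):
                try:
                    finished.append(name)   # both resources held: do the work
                    return
                finally:
                    second.release()
        # Could not get both: the first lock was released on leaving `with`.
        # Pause briefly so the two workers do not retry in lockstep.
        time.sleep(random.uniform(0.001, 0.01))

a = threading.Thread(target=worker, args=("A", lock_x, lock_y))
b = threading.Thread(target=worker, args=("B", lock_y, lock_x))
a.start()
b.start()
a.join()
b.join()

print(sorted(finished))
```

Because no worker ever waits while holding a lock, the two cannot block each other permanently, at the cost of occasionally redoing the acquisition.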
Why Parallelism?
• We have no choice!
  – Multicore processors are a plan B, not a triumph for parallelism
• Parallel processing takes new
  – Architectures
  – Algorithms
  – Structures
  – Languages