Download - Computing Parallel Architectures
![Page 1: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/1.jpg)
Parallel Computing ArchitecturesMarcus Chow
![Page 2: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/2.jpg)
Logistics
● Please sign up for a week to present○ One presentation per group for now
● Emerson will present DVFS in Heterogeneous Embedded Systems next week● As always, feel free to message or email me if you have any questions or concerns● Let’s start working!!!!
![Page 3: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/3.jpg)
Today
● Provide you a helpful resource for some concepts, terminology, and history● Accelerated workshop on various computer architectures● Hope to peak your interests enough for you to start to ask questions
![Page 4: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/4.jpg)
Parallel Computing ArchitecturesHow is computing machine is design?
![Page 5: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/5.jpg)
COMPUTING ARCHITECTURES
5
A Computing Architecture is the design of functionality, organization, and implementation of a computing system
Hardware Software User
![Page 6: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/6.jpg)
COMPUTING HARDWARE
6
Physical components of a computing system
ElectricalBiologicalDesigned to solve some computational problem
Application Specific General Purpose
Mechanical
![Page 7: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/7.jpg)
COMPUTING SOFTWARE
7
A set of instructions that directs a computer’s hardware to perform a task
Biological Mechanical Electrical
![Page 8: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/8.jpg)
THE USERS
8
The people who creates software and operates the hardware
Melba Roy Mouton Dorothy Vaughan
Marget Hamiliton
![Page 9: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/9.jpg)
WEAVING AS CODE
9
https://www2.cs.arizona.edu/patterns/weaving/webdocs/gre_dda1.pdfhttp://blog.blprnt.com/blog/blprnt/infinite-weft-exploring-the-old-aesthetic
A warp thread is woven either above or belove the weft threadCan be visualized through binary representation (drawdown)A drawdown pattern can be made to produce a non-repeating automataIt is possible to generate a Turing-complete program woven into the fabric itself
![Page 10: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/10.jpg)
A History of Women’s Role in Computing Architectures
https://www.youtube.com/watch?v=GuEyXXZA3Ms&t=18s
![Page 11: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/11.jpg)
Threads as a streams of information
● Visualize two separate flows of information through a computer● One for instructions (Thread) and the Data that is computed on● Mixing the two streams produces a single output
![Page 12: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/12.jpg)
A thread as a Von Neumon processor
![Page 13: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/13.jpg)
5-stage Compute pipeline
![Page 14: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/14.jpg)
5-stage Compute pipeline
![Page 15: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/15.jpg)
Increased Complexity
![Page 16: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/16.jpg)
Memory Performance Gap
![Page 17: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/17.jpg)
Typical Memory Hierarchy
![Page 18: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/18.jpg)
Principle of Locality
![Page 19: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/19.jpg)
Principle of Locality
![Page 20: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/20.jpg)
Temporal locality in instructions
![Page 21: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/21.jpg)
Temporal locality in Data
![Page 22: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/22.jpg)
Spatial locality in Instructions
![Page 23: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/23.jpg)
Spatial locality in Data
![Page 24: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/24.jpg)
Abstracted CPU
![Page 25: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/25.jpg)
CPU in a Computer System
CPU
Disk
Memory
I/O
GPU / Accelerator
![Page 26: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/26.jpg)
Software-Hardware Stack
![Page 27: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/27.jpg)
The Process in Operating Systems
![Page 28: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/28.jpg)
The Process in Operating Systems
![Page 29: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/29.jpg)
Process vs Thread
![Page 30: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/30.jpg)
Process vs Thread
![Page 31: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/31.jpg)
Parallel Computing ArchitecturesHow do we process more data at once?
![Page 32: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/32.jpg)
Multicore Designs
● Gain parallelism by adding addition cores● Each core is independent of one another● Multiple Instruction Multiple Data● How do you connect each core together?
![Page 33: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/33.jpg)
Interconnection Network
● Independent Cores communicate through higher level caches (L2 and L3)● Are connected through bus system interconnect
![Page 34: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/34.jpg)
Utilizing multiple cores
● Multithreaded process can map a thread to a single core● Can have multiple process run on different cores
![Page 35: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/35.jpg)
Tiled Mesh Networks
● Can connect cores through a Mesh Network
● Each tile is very tiny● Each tile routes messages to its
neighbors● Hammerblade Manycore project● How to efficiently write parallel
algorithms for it?● How can we utilize the fact that each tile
is independent of one another?
![Page 36: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/36.jpg)
Parallel Other than MIMD
● What if we don’t need to have independent cores?● How can we still increase parallelization within the hardware?
![Page 37: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/37.jpg)
Vectorization
● We want to be able process more data● Add more physical compute units● this is how a loom works
![Page 38: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/38.jpg)
Back to Theory
● Single Data -> Multiple Data
![Page 39: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/39.jpg)
Vector Processors
![Page 40: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/40.jpg)
Vector Processors
![Page 41: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/41.jpg)
Vector Operations
![Page 42: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/42.jpg)
Von-Neumann Model with SIMD units
SIMD Unit
![Page 43: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/43.jpg)
Pipeline with SIMD Units
Increased Complexity
Vector RF Vector Unit Pipeline
![Page 44: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/44.jpg)
Abstracted CPU with Vector Units
Vector Units
Vector Units
![Page 45: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/45.jpg)
Raspberry Pi QPU
● Single instruction stream has mix of scalar and vector instructions
● How to program this style of simd unit?
● What are the overheads of needing to copy data to other parts of the chip?
![Page 46: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/46.jpg)
Massively Parallel Vector Processors
Increased Complexity
Vector RF
Vector Unit Pipeline
Vector Unit Pipeline
Vector Unit Pipeline
● Give more space to vector units, reduce the number of scalar units
![Page 47: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/47.jpg)
Massively Parallel Vector Processors
Increased Complexity
Vector RF
Vector Unit Pipeline
Vector Unit Pipeline
Vector Unit Pipeline
Compute Unit / Core
![Page 48: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/48.jpg)
GPU Architecture ● Now we can handle thousands of threads!!● How do we organize thread execution?
![Page 49: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/49.jpg)
Vector Operations
![Page 50: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/50.jpg)
Arrays of Parallel Threads
![Page 51: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/51.jpg)
Thread Blocks: Scalable Cooperation
![Page 52: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/52.jpg)
Transparent Scalability
![Page 53: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/53.jpg)
blockIdx and threadIdx
![Page 54: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/54.jpg)
GPU Hierarchies
![Page 55: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/55.jpg)
CU Scaling● Are using this many cores efficient?● How should we manage all of the CUs/SEs?● How do you keep track of utilization?
8% less energy by using only 32 CUs out of 56 total
![Page 56: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/56.jpg)
GPU in a Computer System
CPU
Disk
Memory
I/O
GPU / Accelerator
![Page 57: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/57.jpg)
Kernel Launching and Tasking● How are kernels launched from the CPU to GPU?● How to enforce dependencies between kernels?● What kind of parallel algorithms does this
mechanism allow us to make?
![Page 58: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/58.jpg)
Integrated GPU Architectures
● What if CPU and GPU are on the same chip?
● How does that affect power consumption?
● How do we control both set of cores now?
![Page 59: Computing Parallel Architectures](https://reader033.vdocuments.us/reader033/viewer/2022061016/629a3862dec5b3428a43bc05/html5/thumbnails/59.jpg)
Sustainable Computing Architectures
● Parallel architectures are energy efficient● Everyone is trying to make using them more efficient and cost effective