“SMT/GPU Provides High Performance; at WSU CAPPLab, we can help you!”
Bogazici University, Istanbul, Turkey
Presented by:
Dr. Abu Asaduzzaman
Assistant Professor in Computer Architecture and Director of CAPPLab
Department of Electrical Engineering and Computer Science (EECS)
Wichita State University (WSU), USA
June 2, 2014
Outline
■ Introduction
  Single-Core to Multicore Architectures
■ Performance Improvement
  Simultaneous Multithreading (SMT)
  (SMT enabled) Multicore CPU with GPUs
■ Energy-Efficient Computing
  Dynamic GPU Selection
■ CAPPLab
  “People First”
  Resources
  Research Grants/Activities
■ Discussion
QUESTIONS? Any time, please!
Thank you!
■ Dr. Professor Can Ozturan
  Chair, ComE Department
  Bogazici University, Istanbul, Turkey
■ Dr. Professor Bayram Yildirim
  Alumni, Bogazici University
  IME Department, Wichita State University
■ Many more…
Introduction
Some Important “Laws”
■ Moore’s law
■ Amdahl’s law vs. Gustafson’s law
■ Law of diminishing returns
■ Koomey's law
■ (Juggling)
  http://www.youtube.com/watch?v=PqBlA9kU8ZE
  http://www.youtube.com/watch?v=S0d3fK9ZHUI
Introduction
Moore’s Law
■ The number of transistors on integrated circuits doubles approximately every two years (often quoted as every 18 months).
Introduction
Amdahl’s law vs. Gustafson’s law
■ Amdahl’s law: the speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program.
■ Gustafson’s law: computations involving arbitrarily large data sets can be efficiently parallelized.
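As a hedged illustration of the two laws, here are their usual textbook forms. The symbols are introduced here and do not appear in the slides: $p$ is the parallelizable fraction of the work and $N$ is the number of processors. For Gustafson's law, $p$ is measured on the parallel run, so the two fractions are not strictly the same quantity.

```latex
S_{\text{Amdahl}}(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
S_{\text{Gustafson}}(N) = (1 - p) + p\,N
```

For example, with $p = 0.9$ and $N = 16$, Amdahl's law gives $1 / (0.1 + 0.05625) = 6.4$, while Gustafson's scaled speedup is $0.1 + 14.4 = 14.5$. As $N$ grows, the Amdahl speedup can never exceed $1/(1-p) = 10$.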
Introduction
Law of diminishing returns
■ In all productive processes, adding more of one factor of production, while holding all others constant, will at some point yield lower per-unit returns.
Introduction
Koomey's law
■ The number of computations per joule of energy dissipated has been doubling approximately every 1.57 years. This trend has been remarkably stable since the 1950s.
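To make the doubling rate concrete, a rough worked form follows. The symbols are introduced here for illustration: $C(t)$ is computations per joule after $t$ years and $C_0$ is the starting value.

```latex
C(t) = C_0 \cdot 2^{\,t / 1.57},
\qquad
\frac{C(10)}{C_0} = 2^{\,10 / 1.57} \approx 2^{6.37} \approx 83
```

Under this assumption, energy efficiency improves by roughly a factor of 80 over a decade.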
Introduction
Single-Core to Multicore Architecture
■ History of Computing
  Word “computer” in 1613 (this is not the beginning)
  Von Neumann architecture (1945) – data/instructions memory
  Harvard architecture (1944) – data memory, instruction memory
■ Single-Core Processors
  In most modern processors: split CL1 (I1, D1), unified CL2, …
  Intel Pentium 4, AMD Athlon Classic, …
■ Popular Programming Languages
  C, …
Introduction
(Single-Core to) Multicore Architecture
[Figure: Input, Process/Store, Output; cache not shown. Courtesy: Jernej Barbič, Carnegie Mellon University]
Multi-tasking: time sharing (juggling!)
Introduction
Single-Core “Core”
[Figure: a single core. Courtesy: Jernej Barbič, Carnegie Mellon University]
A thread is a running “process”
Introduction
Major Steps to Execute an Instruction
[Figure: 68000 CPU and memory. CPU blocks: data registers D7…D0 (bits 31…0), address registers A7’, A7…A0 (bits 31…0), PC (bits 31…0), IR, SR (bits 15…0), ALU, decoder/control unit. Bus labels: 16b, 24b. Steps marked on the diagram: Start; 1: I.F.; 2: I.D.; (3) O.F.; 4: I.E.; (5) W.B.]
Introduction
Thread 1: Integer (INT) Operation (Pipelining Technique)
[Figure: pipeline diagram. Labels: 1: Instruction Fetch; 2: Instruction Decode; (3) Operand(s) Fetch; 4: Integer Operation; Floating Point Operation; Arithmetic Logic Unit; (5) Result Write Back. Shown: Thread 1: Integer Operation]
Introduction
Thread 2: Floating Point (FP) Operation (Pipelining Technique)
[Figure: pipeline diagram. Labels: Instruction Fetch; Instruction Decode; Operand(s) Fetch; Integer Operation; Floating Point Operation; Arithmetic Logic Unit; Result Write Back. Shown: Thread 2: Floating Point Operation]
Introduction
Threads 1 and 2: INT and FP Operations (Pipelining Technique)
[Figure: pipeline diagram. Labels: Instruction Fetch; Instruction Decode; Operand(s) Fetch; Integer Operation; Floating Point Operation; Arithmetic Logic Unit; Result Write Back. Shown: Thread 1: Integer Operation; Thread 2: Floating Point Operation]
POSSIBLE?
Performance Improvement
Threads 1 and 3: Integer Operations
[Figure: pipeline diagram. Labels: Instruction Fetch; Instruction Decode; Operand(s) Fetch; Integer Operation; Floating Point Operation; Arithmetic Logic Unit; Result Write Back. Shown: Thread 1: Integer Operation; Thread 3: Integer Operation]
POSSIBLE?
Performance Improvement
Threads 1 and 3: Integer Operations (Multicore)
[Figure: two pipeline diagrams, Core 1 and Core 2. Labels for each core: Instruction Fetch; Instruction Decode; Operand(s) Fetch; Integer Operation; Floating Point Operation; Arithmetic Logic Unit; Result Write Back. Shown: Thread 1: Integer Operation; Thread 3: Integer Operation]
POSSIBLE?
Performance Improvement
Threads 1, 2, 3, and 4: INT & FP Operations (Multicore)
[Figure: two pipeline diagrams, Core 1 and Core 2. Labels for each core: Instruction Fetch; Instruction Decode; Operand(s) Fetch; Integer Operation; Floating Point Operation; Arithmetic Logic Unit; Result Write Back. Shown: Thread 1: Integer Operation; Thread 2: Floating Point Operation; Thread 3: Integer Operation; Thread 4: Floating Point Operation]
POSSIBLE?
More Performance?
Threads 1, 2, 3, and 4: INT & FP Operations (Multicore)
[Figure: two pipeline diagrams, Core 1 and Core 2, each with Instruction Fetch, Instruction Decode, Operand(s) Fetch, Integer Operation, Floating Point Operation, Arithmetic Logic Unit, and Result Write Back. Shown: Threads 1 and 3 (Integer Operation); Threads 2 and 4 (Floating Point Operation)]
POSSIBLE?
Parallel/Concurrent Computing
Parallel Processing – It is not fun!
Let’s play a game: Paying the lunch bill together

| Friend | Before Eating | Total Bill | Return | Tip | After Paying | Total Spent |
|---|---|---|---|---|---|---|
| A | $10 | | | | $1 | $9 |
| B | $10 | $25 | $5 | $2 | $1 | $9 |
| C | $10 | | | | $1 | $9 |
| Total | $30 | | | $2 | | $27 |

Started with $30; spent $29 ($27 + $2). Where did $1 go?
SMT enabled Multicore CPU with Manycore GPU for Ultimate Performance!
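A brief worked check of the lunch-bill puzzle, offered as a reading of the table rather than as the speaker's own resolution. The $27 paid by the three friends already includes the $2 tip, so adding the tip again double-counts it. The correct balance adds the $3 returned, not the tip.

```latex
3 \times \$9 = \$27 = \$25\ (\text{bill}) + \$2\ (\text{tip}),
\qquad
\$27 + 3 \times \$1\ (\text{returned}) = \$30
```

The same kind of bookkeeping error appears in parallel programs when partial results from several workers are combined without tracking which quantity each one already contains.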
Performance Improvement
Simultaneous Multithreading (SMT)
■ Thread
  A running program (or code segment) is a process
  Process → processes / threads
■ Simultaneous Multithreading (SMT)
  Multiple threads running in a single processor at the same time
  Multiple threads running in multiple processors at the same time
■ Multicore Programming
  Language supports: OpenMP, Open MPI, CUDA, … (C)
Performance Improvement
Simultaneous Multithreading (SMT)
■ Example:
■ Generating/Managing Multiple Threads
  OpenMP, Open MPI, … (C)
Performance Improvement
Identify Challenges
■ Sequential data-independent problems
  C[] ← A[] + B[]
  ♦ C[5] ← A[5] + B[5]
  A’[] ← A[]
  ♦ A’[5] ← A[5]
  SMT capable multicore processor; CUDA/GPU Technology
[Figure: Core 1, Core 2]
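A minimal CUDA sketch of the data-independent case C[] ← A[] + B[], where every output element depends only on the inputs at the same index. The names `vecAdd` and `vecAddLaunch` and the block size of 256 are assumptions made for illustration, not taken from the slides.

```cuda
#include <cuda_runtime.h>

// Each thread computes one element: c[i] = a[i] + b[i].
// No element depends on any other, so threads never need to communicate.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper. The pointers must be accessible from the GPU,
// for example allocated with cudaMallocManaged or cudaMalloc.
cudaError_t vecAddLaunch(const float *a, const float *b, float *c, int n)
{
    if (n <= 0) return cudaSuccess;            // nothing to do
    const int threads = 256;                   // assumed block size
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaError_t err = cudaGetLastError();      // catches launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();            // wait for the kernel to finish
}
```

With managed arrays filled so that `a[5] = 5` and `b[5] = 10`, calling `vecAddLaunch(a, b, c, n)` leaves `c[5]` equal to `a[5] + b[5]`, matching the slide's C[5] ← A[5] + B[5] example.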
Performance Improvement
■ CUDA/GPU Programming
■ GP-GPU Card
  A GPU card with 16 streaming multiprocessors (SMs)
  Inside each SM:
  • 32 cores
  • 64 KB shared memory
  • 32K 32-bit registers
  • 2 schedulers
  • 4 special function units
■ CUDA GPGPU Programming Platform
Performance Improvement
CPU-GPU Technology
■ Tasks/Data exchange mechanism
  Serial Computations – CPU
  Parallel Computations – GPU
Performance Improvement
GPGPU/CUDA Technology
■ The host (CPU) executes a kernel in GPU in 4 steps
  (Step 1) CPU allocates and copies data to GPU
  On CUDA API:
    cudaMalloc()
    cudaMemcpy()
Performance Improvement
GPGPU/CUDA Technology
■ The host (CPU) executes a kernel in GPU in 4 steps
  (Step 2) CPU sends function parameters and instructions to GPU
  CUDA API:
    myFunc<<<Blocks, Threads>>>(parameters)
Performance Improvement
GPGPU/CUDA Technology
■ The host (CPU) executes a kernel in GPU in 4 steps
  (Step 3) GPU executes instructions as scheduled in warps
  (Step 4) Results need to be copied back to host memory (RAM) using cudaMemcpy()
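The four steps can be sketched end to end as follows. This is a minimal illustration, not the presenter's code: the kernel `squareKernel`, the wrapper `squareOnGpu`, and the block size of 256 are hypothetical names and values chosen for the example; only `cudaMalloc()`, `cudaMemcpy()`, and the `<<<Blocks, Threads>>>` launch syntax come from the slides.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Example kernel (hypothetical): out[i] = in[i] * in[i].
__global__ void squareKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// Runs the four host-side steps for n floats; returns the first error seen.
cudaError_t squareOnGpu(const float *hostIn, float *hostOut, int n)
{
    if (n <= 0) return cudaSuccess;
    float *devIn = NULL, *devOut = NULL;
    const size_t bytes = (size_t)n * sizeof(float);

    // Step 1: allocate GPU memory and copy the input to it.
    cudaError_t err = cudaMalloc((void **)&devIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&devOut, bytes);
    if (err != cudaSuccess) { cudaFree(devIn); return err; }
    err = cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Step 2: send the kernel and its parameters to the GPU.
        const int threads = 256;                       // assumed block size
        const int blocks = (n + threads - 1) / threads;
        squareKernel<<<blocks, threads>>>(devIn, devOut, n);
        err = cudaGetLastError();
    }

    // Step 3 happens on the GPU: the blocks are executed warp by warp.

    if (err == cudaSuccess) {
        // Step 4: copy results back to host memory. This blocking copy
        // on the default stream waits for the kernel to finish first.
        err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(devIn);
    cudaFree(devOut);
    return err;
}
```

A caller passes ordinary host arrays, for example `squareOnGpu(in, out, 4)`, and checks the returned `cudaError_t` against `cudaSuccess`.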
Performance Improvement
Case Study 1 (data independent computation without GPU/CUDA)
■ Matrix Multiplication
[Tables: Matrices; Systems]
Performance Improvement
Case Study 1 (data independent computation without GPU/CUDA)
■ Matrix Multiplication
[Figures: Execution Time; Power Consumption]
Performance Improvement
Case Study 2 (data dependent computation without GPU/CUDA)
■ Heat Transfer on 2D Surface
[Figures: Execution Time; Power Consumption]
Performance Improvement
Case Study 3 (data dependent computation with GPU/CUDA)
■ Fast Effective Lightning Strike Simulation
  The lack of lightning strike protection for composite materials limits their use in many applications.
Performance Improvement
Case Study 3 (data dependent computation with GPU/CUDA)
■ Fast Effective Lightning Strike Simulation
■ Laplace’s Equation
■ Simulation
  CPU only
  CPU/GPU w/o shared memory
  CPU/GPU with shared memory
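A rough sketch of what a "CPU/GPU without shared memory" solver for Laplace's equation could look like. It assumes a Jacobi iteration with the standard 5-point finite-difference stencil and fixed boundary values; the slides do not state which iteration scheme was used, and the names `jacobiStep` and `jacobiSolve` and the 16x16 block shape are hypothetical.

```cuda
#include <cuda_runtime.h>

// One Jacobi sweep: each interior point becomes the average of its four
// neighbours from the previous iteration. Boundary points are copied as-is.
__global__ void jacobiStep(const float *in, float *out, int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    int idx = y * nx + x;
    if (x == 0 || y == 0 || x == nx - 1 || y == ny - 1) {
        out[idx] = in[idx];                    // fixed boundary value
    } else {
        out[idx] = 0.25f * (in[idx - 1] + in[idx + 1] +
                            in[idx - nx] + in[idx + nx]);
    }
}

// Hypothetical driver: dA holds the initial grid, dB is scratch space.
// Both must be GPU-accessible buffers of nx * ny floats.
// Returns the buffer that holds the result after `iters` sweeps.
float *jacobiSolve(float *dA, float *dB, int nx, int ny, int iters)
{
    const int T = 16;                          // assumed block edge
    dim3 threads(T, T);
    dim3 blocks((nx + T - 1) / T, (ny + T - 1) / T);
    for (int k = 0; k < iters; k++) {
        // Launches on the same stream run in order, so each sweep sees
        // the complete result of the previous one.
        jacobiStep<<<blocks, threads>>>(dA, dB, nx, ny);
        float *tmp = dA; dA = dB; dB = tmp;    // swap input and output
    }
    cudaDeviceSynchronize();
    return dA;
}
```

Reading from one buffer and writing to another keeps every sweep free of read/write races. A shared-memory variant would stage each block's tile, plus a one-cell border, in `__shared__` storage before averaging.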
Performance Improvement
Case Study 4 (MATLAB vs. GPU/CUDA)
■ Different simulation models
  Traditional sequential program
  CUDA program (no shared memory)
  CUDA program (with shared memory)
  Traditional sequential MATLAB
  Parallel MATLAB
A CUDA/C parallel implementation of the finite-difference solution of Laplace’s equation demonstrates up to 257x speedup and 97% energy savings over a parallel MATLAB implementation while solving a 4Kx4K problem with reasonable accuracy.
Performance Improvement
Identify More Challenges
■ Sequential data-independent problems
  C[] ← A[] + B[]
  ♦ C[5] ← A[5] + B[5]
  A’[] ← A[]
  ♦ A’[5] ← A[5]
  SMT capable multicore processor; CUDA/GPU Technology
■ Sequential data-dependent problems
  B’[] ← B[]
  ♦ B’[5] ← {B[4], B[5], B[6]}
  Communication needed
  ♦ Core 1 and Core 2
[Figures: Core 1, Core 2]
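The data-dependent case B’[5] ← {B[4], B[5], B[6]} can be sketched as a three-point stencil. This is an illustrative sketch under stated assumptions: the slides do not say how the three values are combined, so a plain average is assumed, the two end points are left unchanged, and the names `stencil3`, `stencil3Launch`, and `BLOCK` are hypothetical. Neighbouring threads exchange data through shared memory, and `__syncthreads()` is the synchronization point that makes the exchange safe.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256   // assumed threads per block; the launch must match it

__global__ void stencil3(const float *b, float *bNew, int n)
{
    // The block's own elements plus one halo cell on each side.
    __shared__ float tile[BLOCK + 2];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int t = threadIdx.x + 1;                   // position inside the tile

    tile[t] = (i < n) ? b[i] : 0.0f;
    if (threadIdx.x == 0)                      // left halo cell
        tile[0] = (i > 0 && i - 1 < n) ? b[i - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)         // right halo cell
        tile[t + 1] = (i + 1 < n) ? b[i + 1] : 0.0f;

    // Communication point: wait until every thread has stored its value.
    __syncthreads();

    if (i < n) {
        if (i == 0 || i == n - 1)
            bNew[i] = tile[t];                 // end points kept as-is
        else
            bNew[i] = (tile[t - 1] + tile[t] + tile[t + 1]) / 3.0f;
    }
}

// Hypothetical host wrapper; dB and dBNew must be GPU-accessible.
cudaError_t stencil3Launch(const float *dB, float *dBNew, int n)
{
    if (n <= 0) return cudaSuccess;
    const int blocks = (n + BLOCK - 1) / BLOCK;
    stencil3<<<blocks, BLOCK>>>(dB, dBNew, n);
    cudaError_t err = cudaGetLastError();
    return (err != cudaSuccess) ? err : cudaDeviceSynchronize();
}
```

Removing the `__syncthreads()` call would let a thread read a neighbour's tile entry before it has been written, which is the speed-versus-accuracy trade-off that arises when synchronization is skipped.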
Performance Improvement
Develop Solutions
■ Task Regrouping
  Create threads
■ Data Regrouping
  Regroup data
  Data for each thread
  Threads with G2s first
  Then, threads with G1s
(Step 2 of 5) CPU copies data to GPU
On CUDA API: cudaMemcpy()
Performance Improvement
Assess the Solutions
■ What is the Key?
■ Synchronization
  With synchronization
  Without synchronization
  ♦ Speed vs. accuracy
Threads with G2s first
Then, threads with G1s
(Step 2 of 5) CPU copies data to GPU
On CUDA API: cudaMemcpy()
Energy-Efficient Computing
Kansas Unique Challenge
■ Climate and Energy
  Protect environment from harms due to climate change
  Save natural energy
Energy-Efficient Computing
“Power” Analysis
■ CPU with multiple GPUs
  GPU usages vary
■ Power Requirements
  NVIDIA GTX 460 (336-core) – 160W [1]
  Tesla C2075 (448-core) – 235W [2]
  Intel Core i7 860 (4-core, 8-thread) – 150-245W [3, 4]
■ Dynamic GPU Selection
  Depending on
  ♦ the “tasks”/threads
  ♦ GPU usages
[Figure: one CPU with three GPUs]
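One possible way to sketch dynamic GPU selection with the CUDA runtime API is shown below. The policy is an assumption made for illustration, not the method from the talk: among the visible devices it picks the smallest one whose resident-thread capacity still covers the requested number of threads, treating device size as a crude proxy for power draw. The function name `selectDevice` is hypothetical. Reading actual power figures would need a separate interface such as NVML, which is not shown.

```cuda
#include <cuda_runtime.h>

// Returns the chosen device index and makes it current, or -1 if no
// usable CUDA device is found.
int selectDevice(int threadsNeeded)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return -1;

    int best = -1;       long bestCap = 0;     // smallest device that fits
    int largest = -1;    long largestCap = -1; // fallback: biggest device

    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;

        // Threads that can be resident on the whole device at once.
        long cap = (long)prop.multiProcessorCount *
                   (long)prop.maxThreadsPerMultiProcessor;

        if (cap > largestCap) { largestCap = cap; largest = d; }
        if (cap >= threadsNeeded && (best < 0 || cap < bestCap)) {
            best = d;
            bestCap = cap;
        }
    }

    int chosen = (best >= 0) ? best : largest;
    if (chosen >= 0 && cudaSetDevice(chosen) != cudaSuccess) return -1;
    return chosen;
}
```

A host program would call `selectDevice(n)` before allocating memory for a workload of `n` threads, so that small tasks land on the smaller card and the larger card is used only when needed.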
Energy-Efficient Computing
CPU-to-GPU Memory Mapping
■ GPU Shared Memory
  Improves performance
  CPU to GPU global memory
  GPU global to shared
■ Data Regrouping
  CPU to GPU global memory
Teaching Low-Power HPC Systems
Integrate Research into Education
■ CS 794 – Multicore Architectures Programming
  Multicore Architecture
  Simultaneous Multithreading
  Parallel Programming
  Moore’s law
  Amdahl’s law
  Gustafson’s law
  Law of diminishing returns
  Koomey's law
WSU CAPPLab
CAPPLab
■ Computer Architecture & Parallel Programming Laboratory (CAPPLab)
  Physical location: 245 Jabara Hall, Wichita State University
  URL: http://www.cs.wichita.edu/~capplab/
  E-mail: [email protected]; [email protected]
  Tel: +1-316-WSU-3927
■ Key Objectives
  Lead research in advanced-level computer architecture, high-performance computing, embedded systems, and related fields.
  Teach advanced-level computer systems & architecture, parallel programming, and related courses.
WSU CAPPLab
“People First”
■ Students
  Kishore Konda Chidella, PhD Student
  Mark P Allen, MS Student
  Chok M. Yip, MS Student
  Deepthi Gummadi, MS Student
■ Collaborators
  Mr. John Metrow, Director of WSU HiPeCC
  Dr. Larry Bergman, NASA Jet Propulsion Laboratory (JPL)
  Dr. Nurxat Nuraje, Massachusetts Institute of Technology (MIT)
  Mr. M. Rahman, Georgia Institute of Technology (Georgia Tech)
  Dr. Henry Neeman, University of Oklahoma (OU)
WSU CAPPLab
Resources
■ Hardware
  3 CUDA Servers – CPU: Xeon E5506, 2x 4-core, 2.13 GHz, 8GB DDR3; GPU: Tesla C2075, 14x 32 cores, 6GB GDDR5 memory
  2 CUDA PCs – CPU: Xeon E5506, …
  Supercomputer (Opteron 6134, 32 cores per node, 2.3 GHz, 64 GB DDR3, Kepler card) via remote access to WSU (HiPeCC)
  2 CUDA enabled Laptops
  More …
■ Software
  CUDA, OpenMP, and Open MPI (C/C++ support)
  MATLAB, VisualSim, CodeWarrior, more (as may be needed)
WSU CAPPLab
Scholarly Activities
■ WSU became “CUDA Teaching Center” for 2012-13
  Grants from NSF, NVIDIA, M2SYS, Wiktronics
  Teaching Computer Architecture and Parallel Programming
■ Publications
  Journal: 21 published; 3 under preparation
  Conference: 57 published; 2 under review; 6 under preparation
  Book Chapter: 1 published; 1 under preparation
■ Outreach
  USD 259 Wichita Public Schools
  Wichita Area Technical and Community Colleges
  Open to collaborate
WSU CAPPLab
Research Grants/Activities
■ Grants
  WSU: ORCA
  NSF – KS NSF EPSCoR First Award
  M2SYS-WSU Biometric Cloud Computing Research Grant
  Teaching (Hardware/Financial) Award from NVIDIA
  Teaching (Hardware/Financial) Award from Xilinx
■ Proposals
  NSF: CAREER (working/pending)
  NASA: EPSCoR (working/pending)
  U.S.: Army, Air Force, DoD, DoE
  Industry: Wiktronics LLC, NetApp Inc, M2SYS Technology
Bogazici University; Istanbul, Turkey; 2014
“SMT/GPU Provides High Performance; at WSU CAPPLab, we can help you!”
Thank You!
QUESTIONS?
Contact: Abu Asaduzzaman
E-mail: [email protected]
Phone: +1-316-978-5261
http://www.cs.wichita.edu/~capplab/