TRANSCRIPT — Source: cseweb.ucsd.edu/classes/wi14/cse260-a/Lectures/Lec01.pdf
Lecture 1
Introduction
Welcome to CSE 260!
• Your instructor is Scott Baden
  - [email protected]
  - Office hours in EBU3B Room 3244, Tues/Thurs 4pm to 5pm or by appointment
• Your TA is Jing Zheng ([email protected])
• The class home page is http://www.cse.ucsd.edu/classes/wi14/cse260-a
• CSME?
• Logins – www-cse.ucsd.edu/classes/wi14/cse260-a/lab.html
  - Moodle, Bang, Dirac, Lilliput
©2014 Scott B. Baden / CSE 260 / Winter '14
Text and readings
• Required texts
  - Programming Massively Parallel Processors: A Hands-on Approach, 2nd Ed., by David Kirk and Wen-mei Hwu, Morgan Kaufmann Publishers (2012)
  - An Introduction to Parallel Programming, by Peter Pacheco, Morgan Kaufmann (2011)
• Assigned class readings will include handouts and on-line material
• Lecture slides: www.cse.ucsd.edu/classes/wi14/cse260-a/Lectures
Policies
• Academic Integrity
  - Do your own work
  - Plagiarism and cheating will not be tolerated
• By taking this course, you implicitly agree to abide by the course policies: www.cse.ucsd.edu/classes/wi14/cse260-a/Policies.html
Course Requirements
• Do the readings before lecture and be prepared to discuss them in class
• 3 programming labs
  - Teams of 2 or 3; teams of 1 with permission
  - You may switch teams, but be sure to give notice as explained in the Course Policies
  - Each lab includes a lab report, with greater emphasis (in grading) on the report with each lab
  - Find a partner using the "looking for a partner" Moodle forum
Programming Laboratories
• 3 labs
  - A1: Performance programming a single core
  - A2: GPU Fermi (NERSC's Dirac) – CUDA
  - A3: MPI on a cluster (Bang) – with options
• A3 is worth 50%, one month duration
  - More to be discussed later in the quarter
• Teams of 2 or 3
• Establish a schedule with your partner at the start
Course overview and background
• How to solve computationally intensive problems on parallel computers
  - Software techniques
  - Performance tradeoffs
• Background
  - Graduate standing
  - C or C++
  - Recommended: computer architecture (CSE 240A)
  - Students outside CSE are welcome
  - See me if you are unsure about your background
• Prior experience
  - Parallel computation?
  - Numerical analysis?
Background Markers
• C/C++, Java, Fortran?
• Navier-Stokes equations
• Sparse factorization
• TLB misses
• Multithreading
• MPI
• CUDA, GPUs
• RPC
• Abstract base class

$\nabla \cdot \mathbf{u} = 0, \qquad \frac{D\rho}{Dt} + \rho\,(\nabla \cdot \mathbf{v}) = 0$

$f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots$
What you'll learn in this class
• How to solve computationally intensive problems on parallel computers effectively
  - Theory and practice
  - Software techniques
  - Performance tradeoffs
• Different platforms: GPUs, clusters
• CSE 260 will build on what you learned earlier in your career about programming, algorithm design & analysis, and generalize them
Syllabus
• Fundamentals
  - Motivation, system organization, hardware execution models, limits to performance, program execution models, theoretical models, technology
• Software and programming
  - Programming models and techniques: message passing, multithreading
  - Architectural considerations: GPUs and multicore
  - Higher level run time models, language support
  - CUDA, OpenMP, MPI
• Parallel algorithm design and implementation
  - Case studies to develop a repertoire of problem solving techniques: discretization, sorting, linear algebra, irregular problems
  - Data structures and their efficient implementation: load balancing and performance
  - Optimizing data motion
  - Performance tradeoffs, evaluation, and tuning
Class presentation technique
• Learning is not a passive process
• I won't just "broadcast" the material
• Consider the slides as talking points; class discussions are driven by your interest
• Class participation is important to keep the lecture active
• Complete the assigned readings before class
• Different lecture modalities
  - The 2 minute pause
  - In-class problem solving
The 2 minute pause
• An opportunity in class to develop understanding, to make sure you "grok" it
  - By trying to explain it to someone else
  - By getting your brain actively working on it
• What will happen
  - I pose a question
  - You discuss it with 1-2 people around you
    • Most important is your understanding of why the answer is correct
  - After most people seem to be done
    • I'll ask for quiet
    • A few will share what their group talked about
    • Good answers are those where you were wrong, then realized…
The rest of the lecture
• Syllabus
• Intro to parallel computation
What is parallel processing?
• Decompose a workload onto simultaneously executing physical resources
• Multiple processors co-operate to process a related set of tasks – tightly coupled
• Improve some aspect of performance
  - Speedup: 100 processors run ×100 faster than one
  - Capability: tackle a larger problem, more accurately
  - Algorithmic, e.g. search
  - Locality: more cache memory and bandwidth
• Resilience is an issue at the high end – Exascale: 10^18 flops/sec
Parallel Processing, Concurrency & Distributed Computing
• Parallel processing
  - Performance (and capacity) is the main goal
  - More tightly coupled than distributed computation
• Concurrency
  - Concurrency control: serialize certain computations to ensure correctness, e.g. database transactions
  - Performance need not be the main goal
• Distributed computation
  - Geographically distributed
  - Multiple resources computing & communicating unreliably
  - "Cloud" computing, large amounts of storage
  - Looser, coarser grained communication and synchronization
• May or may not involve separate physical resources, e.g. multitasking ("virtual parallelism")
Granularity
• A measure of how often a computation communicates, and at what scale
  - Distributed computer: a whole program
  - Multicomputer: a function, a loop nest
  - Multiprocessor: + memory reference
  - Multicore: similar to a multiprocessor but perhaps finer grained
  - GPU: kernel thread
  - Instruction level parallelism: instruction, register
The impact of technology
Why is parallelism inevitable?
• Physical limits on processor clock speed and heat dissipation
• A parallel computer increases memory capacity and bandwidth as well as the computational rate

[Figure: average CPU clock speeds over time (via Bill Gropp), http://www.pcpitstop.com/research/cpu.asp; Nvidia]
Today's mobile computer would have been yesterday's supercomputer
• Cray-1 Supercomputer
  - 80 MHz processor
  - 240 Mflops/sec peak
  - 3.4 Mflops Linpack
  - 8 Megabytes memory
  - Water cooled
  - 1.8m H x 2.2m W
  - 4 tons
  - Over $10M in 1976
The progress of technological disruption
• Transformational: modelling, healthcare…
• New capabilities
• Changes the common wisdom for solving a problem, including the implementation

[Figure: timeline of systems – Cray-1, 1976, 240 Megaflops; Connection Machine CM-2, 1987; ASCI Red, 1997, 1 Tflop; Beowulf cluster, late 1990s; Sony Playstation 3, 150 Gflops, 2006; Nvidia Tesla, 4.14 Tflops, 2009; Intel Phi, 50 cores, 2012; Apple A6X, 2012]
The age of the multi-core processor
• On-chip parallel computer
• IBM Power4 (2001); many others follow (Intel, AMD, Tilera, Cell Broadband Engine)
• First dual core laptops (2005-6)
• GPUs (Nvidia, ATI): the desktop supercomputer
• In smart phones, behind the dashboard (blog.laptopmag.com/nvidia-tegrak1-unveiled)
• Everyone has a parallel computer at their fingertips
• If we don't use parallelism, we lose it! (image: realworldtech.com)
The GPU
• Specialized many-core processor
• Massively multithreaded, long vectors
• Reduced on-chip memory per core
• Explicitly managed memory hierarchy
[Figure: GFLOPS by year, 2001–2009, for AMD (GPU), NVIDIA (GPU), and Intel (CPU): many-core GPUs pull away from dual- and quad-core multicore CPUs. Courtesy: John Owens; Christopher Dyken, SINTEF]
The impact
• You are taking this class
• A renaissance in parallel computation
• Parallelism is no longer restricted to the HPC cult with large machine rooms; it is relevant to everyone
• You may or may not know when you are using a parallel computer
Have you written a parallel program?
• Threads
• MPI
• RPC
• C++11 Async
• CUDA
Motivating Applications
A Motivating Application - TeraShake
Simulates a 7.7 earthquake along the southern San Andreas fault near LA using seismic, geophysical, and other data from the Southern California Earthquake Center.
epicenter.usc.edu/cmeportal/TeraShake.html
How TeraShake Works
• Divide up Southern California into blocks
• For each block, get all the data about geological structures, fault information, …
• Map the blocks onto the processors of the supercomputer
• Run the simulation using current information on fault activity and on the physics of earthquakes

[Images: SDSC Machine Room; DataCentral@SDSC]
Animation
The Payoff
• Capability
  - We solved a problem that we couldn't solve before, or under conditions that were not possible previously
• Performance
  - Solve the same problem in less time than before
  - This can provide a capability if we are solving many problem instances
• The result achieved must justify the effort
  - Enable new scientific discovery
  - Software costs must be reasonable
I increased performance – so what's the catch?
• A well behaved single processor algorithm may behave poorly on a parallel computer, and may need to be reformulated numerically
• There is no magic compiler that can turn a serial program into an efficient parallel program all the time and on all machines
  - Performance programming involves low-level details and is heavily application dependent
  - Irregularity in the computation and its data structures forces us to think even harder
  - Users don't start from scratch; they reuse old code. Poorly structured code, or code structured for older architectures, can entail costly reprogramming
What's involved
• Performance programming
  - Low-level details: heavily application dependent
  - Irregularity in the computation and its data structures forces us to think even harder
• Simplified processor design, but more user control over the hardware resources
• Parallelism introduces many new tradeoffs
  - Redesign the software
  - Rethink the problem solving technique
The two mantras for high performance
• Domain-specific knowledge is important in optimizing performance, especially locality
• Significant time investment for each 10-fold increase in performance
[Figure: performance ladder – today, ×10, ×100]
Performance and Implementation Issues
• Data motion cost is growing relative to computation
  - Conserve locality
  - Hide latency
• Little's Law [1961]: # threads = performance × latency
  - T = p × λ
  - p and λ are increasing with time; e.g. p = 1–8 flops/cycle, λ = 500 cycles/word
[Figures: widening processor–memory (DRAM) performance gap over the years; a pipeline holding λ·p⁻¹ operations in flight. Images: fotosearch.com]