lecture 1 multicore programming primer and programming ... · multicore programming primer and...

27
Prof. Saman Amarasinghe, MIT. 1 6.189 IAP 2007 MIT 6.189 IAP 2007 Lecture 1 Multicore Programming Primer and Programming Competition Introduction

Upload: lemien

Post on 16-May-2018

242 views

Category:

Documents


1 download

TRANSCRIPT

Prof. Saman Amarasinghe, MIT. 1 6.189 IAP 2007 MIT

6.189 IAP 2007

Lecture 1

Multicore Programming Primer and Programming Competition

Introduction

2 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

The “Software Crisis”

“To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem."

-- E. Dijkstra, 1972 Turing Award Lecture

3 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

The First Software Crisis

● Time Frame: ’60s and ’70s

● Problem: Assembly Language ProgrammingComputers could handle larger more complex programs

● Needed to get Abstraction and Portability without losing Performance

4 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

How Did We Solve the First Software Crisis?

● High-level languages for von-Neumann machinesFORTRAN and C

● Provided “common machine language” for uniprocessors

Single memory image

Single flow of control

Common Properties

ISA

Functional Units

Register FileDifferences:

5 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

The Second Software Crisis

● Time Frame: ’80s and ’90s

● Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers

Computers could handle larger more complex programs

● Needed to get Composability, Malleability and Maintainability

High-performance was not an issue left for Moore’s Law

6 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

How Did We Solve the Second Software Crisis?

● Object Oriented ProgrammingC++, C# and Java

● Also…Better tools– Component libraries, Purify Better software engineering methodology – Design patterns, specification, testing, code reviews

7 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

● Solid boundary between Hardware and Software

● Programmers don’t have to know anything about the processor

High level languages abstract away the processors– Ex: Java bytecode is machine independent Moore’s law does not require the programmers to know anything about the processors to get good speedups

● Programs are oblivious of the processor work on all processors

A program written in ’70 using C still works and is much faster today

● This abstraction provides a lot of freedom for the programmers

Today: Programmers are Oblivious to Processors

8 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

The Origins of a Third Crisis

● Time Frame: 2005 to 20??

● Problem: Sequential performance is left behind by Moore’s law

● Needed continuous and reasonable performance improvements to support new featuresto support larger datasets

● While sustaining portability, malleability and maintainability without unduly increasing complexity faced by the programmer

critical to keep-up with the current rate of evolution in software

9 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

1

10

100

1000

10000

100000

1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016

Perfo

rman

ce (v

s. V

AX-

11/7

80)

25%/year

52%/year

??%/year

8086

286

386

486

PentiumP2

P3P4

ItaniumItanium 2

The March to Multicore:Moore’s Law

From David Patterson

1,000,000,000

100,000

10,000

1,000,000

10,000,000

100,000,000

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006

Num

ber of Transistors

10 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

The March to Multicore:Uniprocessor Performance (SPECint)

Specint2000

1.00

10.00

100.00

1000.00

10000.00

85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07

i ntel 386

i ntel 486

i ntel pent i um

i ntel pent i um 2

i ntel pent i um 3i ntel pent i um 4

i ntel i tani um

A l pha 21064

A l pha 21164

A l pha 21264

Spar c

Super Spar c

Spar c64

M i ps

HP PAPower PC

AM D K6

AM D K7

AM D x86-64

11 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

The March to Multicore:Uniprocessor Performance (SPECint)

● General-purpose unicores have stopped historic performance scaling

Power consumptionWire delaysDRAM access latencyDiminishing returns of more instruction-level parallelism

From David Patterson

12 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Power Consumption (watts)

Power

1

10

100

1000

85 87 89 91 93 95 97 99 01 03 05 07

intel 386

intel 486

intel pentium

intel pentium 2

intel pentium 3

intel pentium 4

intel i tanium

Alpha 21064

Alpha 21164

Alpha 21264

Spar c

Super Spar c

Spar c64

Mips

HP PA

Power PC

AMD K6

AMD K7

AMD x86-64

13 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Power Efficiency (watts/spec)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

1982 1984 1987 1990 1993 1995 1998 2001 2004 2006

Year

Wat

ts/S

pec

intel 386intel 486intel pentiumintel pentium 2intel pentium 3intel pentium 4intel itaniumAlpha 21064Alpha 21164Alpha 21264SparcSuperSparcSparc64M ipsHP PAPower PCAM D K6AM D K7AM D x86-64

14 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Range of a Wire in One Clock Cycle

00.020.040.060.080.1

0.120.140.160.180.2

0.220.240.260.28

1996 1998 2000 2002 2004 2006 2008 2010 2012 2014Year

Pro

cess

(mic

rons

)

700 MHz

1.25 GHz

2.1 GHz

6 GHz10 GHz

13.5 GHz

• 400 mm2 Die• From the SIA Roadmap

15 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

DRAM Access Latency

● Access times are a speed of light issue

● Memory technology is also changing

SRAM are getting harder to scaleDRAM is no longer cheapest cost/bit

● Power efficiency is an issue here as well

1

100

10000

1000000

1980

1982

1984

1986

1988

1990

1992

1994

1996

1998

2000

2002

2004

Year

Perf

orm

ance

µProc60%/yr.

(2X/1.5yr)

DRAM9%/yr.

(2X/10 yrs)

16 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Diminishing Returns

● The ’80s: Superscalar expansion 50% per year improvement in performanceTransistors applied to implicit parallelism– pipeline processor (10 CPI --> 1 CPI)

● The ’90s: The Era of Diminishing ReturnsSqueaking out the last implicit parallelism– 2-way to 6-way issue, out-of-order issue, branch prediction– 1 CPI --> 0.5 CPIperformance below expectationsprojects delayed & canceled

● The ’00s: The Beginning of the Multicore EraThe need for Explicit Parallelism

17 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

AMD OpteronDual Core

Intel Montecito1.7 Billion transistors

Dual Core IA/64Intel TanglewoodDual Core IA/64

Intel Pentium Extreme3.2GHz Dual Core

Intel Tejas & JayhawkUnicore (4GHz P4)

Intel DempseyDual Core Xeon

Intel Pentium D(Smithfield)

Cancelled

Intel YonahDual Core Mobile

IBM Power 6Dual Core

IBM Power 4 and 5Dual Cores Since 2001

IBM CellScalable Multicore

Sun Olympus and Niagara8 Processor Cores

MIT Raw 16 Cores

Since 2002

… 1H 2005 1H 2006 2H 20062H 20052H 2004

Unicores are on the verge of extinction Multicores are here

18 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

1985 199019801970 1975 1995 2000 2005

Raw

Power4Opteron

Power6

Niagara

YonahPExtreme

Tanglewood

Cell

IntelTflops

Xbox360

CaviumOcteon

RazaXLR

PA-8800

CiscoCSR-1

PicochipPC102

Boardcom 1480

20??

# ofcores

1

2

4

8

16

32

64

128256

512

Opteron 4PXeon MP

AmbricAM2045

Multicores are Here

4004

8008

80868080 286 386 486 Pentium P2 P3P4Itanium

Itanium 2Athlon

19 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Requirements and Outcomes

● RequirementsA good programmer with experience Fluent in C

● OutcomesKnow fundamental concepts of parallel programming (both hardware and software)Understand issues of parallel performance Able to synthesize a fairly complex parallel programHands-on experience with the IBM Cell processor

20 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

The Project● You proposed the projects● We selected 7 teams

Mainly by the strength of the project proposals● Seven Great Projects

Distributed Real-time Ray TracerGlobal IlluminationLinear Algebra PackMolecular Dynamics SimulatorSpeech SynthesizerSoft RadioBackgammon Tutor

● Project Characteristics Ambitious but accomplishable Important and RelevantOpportunity to sizzle

● Get them started ASAP!

21 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

A Note of Caution

● Cell processor is very new● It is not an easy architecture to work with● The tool chain is thin and brittle● Most of the staff have limited experience ● Projects you are doing are of your own making.

They aren’t canned exercises that are tried and proven. ● You will face unexpected problems.● WE ARE ALL IN THIS TOGETHER!!

22 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Grading

● Mini Quizzes 16%At the beginning of each class day5 minutes each

● Lab Projects 24%

● Final Group Project 60%

23 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Final Competition

● The competition will be decided onPerformance Completeness Algorithmic complexityDemo and Presentation

● The winning team willGet gift certificates ($150 each)Be invited to IBM TJ Watson Research Center for a day– Tour of the facilities– Present your project

24 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Staff● Prof. Saman Amarasinghe ([email protected])

Interested in languages, compilers and computer architectureRaw Processor (with Prof. Anant Agarwal)StreamIt language SUIF parallelizing compiler

● Dr. Rodric Rabbah ([email protected])Currently a researcher at IBM Watson Research CenterWas a research scientist at CSAIL before thatInterested in compilers, computer architecture and FPGAs

● TAsDavid Zhang ([email protected])– Course 6 M.Eng.Phil Sung ([email protected])– Course 6 M.Eng.

25 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Guest Lectures

● Dr. Michael PerroneIBM Watson Research CenterExpert in Cell Architecture and Application Development

● Prof. Alan EdelmanMath and CS. Interested in parallel algorithms

● Prof. ArvindParallel architectures, compilers and languages

● Dr. Bradley KuszmaulResearch scientist at CSAIL working on Cilk

● Mike Acton Professional game developer

● Bill ThiesCSAIL PhD candidateArchitect of StreamIt

26 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Lecture Organization

Implicit Explicit

Hardware CompilerSuperscalarProcessors

(start of Lecture 3)

ParallelizingCompilers

(Lectures 11 & 12)

LibraryLanguagesConcurrency

(Lecture 4)

Design Patterns(Lectures 5,6 7)

StreamIt (Lecture 8)Star-P (Lecture 13)BlueSpec (Lecture 14)Cilk (Lecture 15)

Extracting Parallelism

27 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Schedule Monday Tuesday Wednesday Thursday Friday

10:00 – 10:55

Lecture 1: Course Introduction

Recitation 1: Getting to Know Cell

Lecture 3: Introduction to Parallel Architectures

Lecture 5: Parallel Programming Concepts Jan

8 11:05 – 12:00

Lecture 2: Introduction to Cell Processor

Lecture 4: Introduction to Concurrent Programming

Project Reviews

Lecture 6: Design Patterns for Parallel Programming I

10:00 – 10:55

Lecture 7: Design Patterns for Parallel Programming II

Recitation 4: Cell Debugging Tools

Lecture 9: Debugging and Performance Monitoring Jan

15 11:05 – 12:00

Holiday Recitation 2-3: Cell Programming Hands-On

Lecture 8: StreamIt Language

Lecture 10: Performance Optimizations

10:00 – 10:55

Lecture 11: Classic Parallelizing Compilers

Lecture 13: Star-P Lecture 15: Cilk Jan 22

11:05 – 12:00

Lecture 12: StreamIt Parallelizing Compiler

Recitation 5, 6: Cell Performance Monitoring Tools Lecture 14:

Synthesizing Parallel Programs

Lecture 16: Anatomy of a Game

10:00 – 10:55

Lecture 17: The Raw Experience Jan

29 11:05 – 12:00 18: The Future

Group Presentations Awards & Reception