the looming software crisis due to the multicore menace

55
The Looming Software Crisis due to the Multicore Menace Saman Amarasinghe Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Upload: others

Post on 12-Sep-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Looming Software Crisis due to the Multicore Menace

The Looming Software Crisis due to the Multicore Menace

Saman AmarasingheComputer Science and Artificial Intelligence Laboratory

Massachusetts Institute of Technology

Page 2: The Looming Software Crisis due to the Multicore Menace

2

The “Software Crisis”

“To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem."

-- E. Dijkstra, 1972 Turing Award Lecture

Page 3: The Looming Software Crisis due to the Multicore Menace

3

• Time Frame: ’60s and ’70s

• Problem: Assembly Language Programming– Computers could handle larger more complex programs

• Needed to get Abstraction and Portability without losing Performance

The First Software Crisis

Page 4: The Looming Software Crisis due to the Multicore Menace

4

• High-level languages for von-Neumann machines– FORTRAN and C

• Provided “common machine language” for uniprocessors– Only the common properties are exposed– Hidden properties are backed-up by good compiler technology

How Did We Solve the First Software Crisis?

Single memory image

Single flow of control

Common Properties

ISA

Functional Units

Register File

Differences:Register Allocation

Instruction SelectionInstruction Scheduling

Page 5: The Looming Software Crisis due to the Multicore Menace

5

The Second Software Crisis

• Time Frame: ’80s and ’90s

• Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers– Computers could handle larger more complex programs

• Needed to get Composability, Malleability and Maintainability– High-performance was not an issue left to Moore’s Law

Page 6: The Looming Software Crisis due to the Multicore Menace

6

How Did We Solve the Second Software Crisis?

• Object Oriented Programming– C++ – Now C# and Java

• Better tools– Component libraries, Purify

• Better software engineering methodology – Design patterns, specification,

testing, code reviews

Page 7: The Looming Software Crisis due to the Multicore Menace

7

Today: Programmers are Oblivious about the Processors

• Solid boundary between Hardware and Software

• Programmers don’t have to know anything about the processor– High level languages abstract away the processors

– Ex: Java bytecode is machine independent – Moore’s law does not require the programmers to know anything about

the processors to get good speedups

• Programs are oblivious to the processor works on all processors– A program written in ’70 using C still works and is much faster today

• This abstraction provides a lot of freedom for the programmers

Page 8: The Looming Software Crisis due to the Multicore Menace

8

The Origins of a Third Crisis

• Time Frame: 2010 to ??

• Problem: Sequential performance is left behind by Moore’s law– All software developers have abstracted away the processor

assuming that Moore’s law will always provide performance gains!

Page 9: The Looming Software Crisis due to the Multicore Menace

9

1

10

100

1000

10000

100000

1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016

Perfo

rman

ce (v

s. V

AX-

11/7

80)

25%/year

52%/year

??%/year

8086

286

386

486

PentiumP2

P3P4

ItaniumItanium 2

The March to Multicore:Moore’s Law

From David Patterson

1,000,000,000

100,000

10,000

1,000,000

10,000,000

100,000,000

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006

Num

ber of Transistors

Page 10: The Looming Software Crisis due to the Multicore Menace

10

8086

286

386

486

PentiumP2

P3P4

ItaniumItanium 2

1

10

100

1000

10000

100000

1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016

Per

form

ance

(vs.

VA

X-11

/780

)

25%/year

52%/year

The March to Multicore:Uniprocessor Performance (SPECint)

From David Patterson

1,000,000,000

100,000

10,000

1,000,000

10,000,000

100,000,000

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006

Num

ber of Transistors

Page 11: The Looming Software Crisis due to the Multicore Menace

11

8086

286

386

486

PentiumP2

P3P4

ItaniumItanium 2

1

10

100

1000

10000

100000

1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 2016

Perfo

rman

ce (v

s. V

AX-

11/7

80)

25%/year

52%/year

??%/year

The March to Multicore:Uniprocessor Performance (SPECint)

• General-purpose unicores have stopped historic performance scaling– Power consumption– Wire delays– DRAM access latency– Diminishing returns of more instruction-level parallelism

From David Patterson

1,000,000,000

100,000

10,000

1,000,000

10,000,000

100,000,000

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006

Num

ber of Transistors

Page 12: The Looming Software Crisis due to the Multicore Menace

12

How to programmulticores?

1985 199019801970 1975 1995 2000 2005

Raw

Power4Opteron

Power6

Niagara

YonahPExtreme

Tanglewood

Cell

IntelTflops

Xbox360

CaviumOcteon

RazaXLR

PA-8800

CiscoCSR-1

PicochipPC102

Broadcom 1480

20??

# ofcores

1

2

4

8

16

32

64

128256

512

Opteron 4PXeon MP

AmbricAM2045

The Origins of a Third Crisis

4004

8008

80868080 286 386 486 Pentium P2 P3P4Itanium

Itanium 2Athlon

A Program written in the 70’s not only works today…but also runs faster (tracking Moore’s law)

Page 13: The Looming Software Crisis due to the Multicore Menace

13

The Origins of a Third Crisis

• Time Frame: 2010 to ??

• Problem: Sequential performance is left behind by Moore’s law

• Needed continuous and reasonable performance improvements – to support new features– to support larger datasets

• While sustaining portability, malleability and maintainability without unduly increasing complexity faced by the programmer

critical to keep-up with the current rate of evolution in software

Page 14: The Looming Software Crisis due to the Multicore Menace

14

Why Parallelism is Hard

• A huge increase in complexity and work for the programmer– Programmer has to think about performance! – Parallelism has to be designed in at every level

• Humans are sequential beings – Deconstructing problems into parallel tasks is hard for many of us

• Parallelism is not easy to implement– Parallelism cannot be abstracted or layered away– Code and data has to be restructured in very different (non-intuitive) ways

• Parallel programs are very hard to debug– Combinatorial explosion of possible execution orderings – Race condition and deadlock bugs are non-deterministic and illusive – Non-deterministic bugs go away in lab environment and with

instrumentation

Page 15: The Looming Software Crisis due to the Multicore Menace

15

Outline: Ideas on Solvingthe Third Software Crisis

1. Advances in Computer Architecture

2. Novel Programming Models and Languages

3. Aggressive Compilers

4. Tools to support parallelization, debugging and migration

Page 16: The Looming Software Crisis due to the Multicore Menace

16

Computer Architecture

• Current generation of multicores– How can we cobble together something with existing

parts/investments? – Impact of multicores haven’t hit us yet

• The move to multicore will be a disruptive shift – An inversion of the cost model – A forced shift in the programming model

• A chance to redesign the microprocessor from scratch.

• What are the innovations that will reduce/eliminate the extra burden placed on the programmer?

Page 17: The Looming Software Crisis due to the Multicore Menace

17

Novel Opportunities in Multicores

• Don’t have to contend with uniprocessors• Not your same old multiprocessor problem

– How does going from Multiprocessors to Multicores impact programs?

– What changed?– Where is the Impact?

– Communication Bandwidth– Communication Latency

Page 18: The Looming Software Crisis due to the Multicore Menace

18

Communication Bandwidth

• How much data can be communicated between two cores?

• What changed?– Number of Wires– Clock rate– Multiplexing

• Impact on programming model?– Massive data exchange is possible– Data movement is not the bottleneck

processor affinity not that important

32 Giga bits/sec ~300 Tera bits/sec

10,000X

Page 19: The Looming Software Crisis due to the Multicore Menace

19

Communication Latency

• How long does it take for a round trip communication?

• What changed?– Length of wire– Pipeline stages

• Impact on programming model?– Ultra-fast synchronization– Can run real-time apps

on multiple cores

50X

~200 Cycles ~4 cycles

Page 20: The Looming Software Crisis due to the Multicore Menace

Architectural Innovations

The Raw Experience

Supported by a CISE Experimental Partnership

Page 21: The Looming Software Crisis due to the Multicore Menace

The MIT Raw Processor• Raw project started in 1997

Prototype operational in 2003• The Problem: How to keep the

Moore’s Law going with– Increasing processor complexity– Longer wire delays– Higher power consumption

• Raw philosophy – Build a tightly integrated multicore– Off-load most functions to

compilers and software• Raw design

– 16 single issue cores– 4 register-mapped networks– Huge IO bandwidth

• Raw power– 16 Flops/ops per cycle– 16 Memory Accesses per cycle– 208 Operand Routes per cycle– 12 IO Operations per cycle

180 nm ASIC (IBM SA-27E)18.2mm x 18.2mm~100 million transistorsDesigned for 225 MHzTested at 425 MHz

Page 22: The Looming Software Crisis due to the Multicore Menace

Raw’s networks are tightly coupled into the bypass paths

IF RFD

A TL

M1 M2

F P

E

U

TV

F4 WB

r26

r27

r25

r24

NetworkInputFIFOs

r26

r27

r25

r24

NetworkOutputFIFOs

Ex: lb r25, 0x341(r26)

fmul r24, r3, r4

softwarecontrolledcrossbar

softwarecontrolledcrossbar

fadd r5, r3, r24

route P->E route W->P

Page 23: The Looming Software Crisis due to the Multicore Menace

Raw Networks is Rarely the Bottleneck

• Raw has 4 bidirectional, point-to-point mesh networks– Two of them statically routed– Two of the dynamically routed

• A single issue core may read from or write to one network in a given cycle

• The cores cannot saturate the network!

(225 Gb/s @ 225 Mhz)

MIPS-StylePipeline

8 32-bitbuses

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

BitonicSort FFT DCT DES TDE Serpent

ave. bandwidth utilization

ave. instruction utilization

Page 24: The Looming Software Crisis due to the Multicore Menace

24

Outline: Ideas on Solvingthe Third Software Crisis

1. Advances in Computer Architecture

2. Novel Programming Models and Languages

3. Aggressive Compilers

4. Tools to support parallelization, debugging and migration

Page 25: The Looming Software Crisis due to the Multicore Menace

25

Programming Models and Languages

• Critical to solving the third software crisis

• Novel languages were the central solution in the last two crises

Page 26: The Looming Software Crisis due to the Multicore Menace

26

Lessons from the Last Crisis:The OO Revolution

• Object Oriented revolution did not come out of a vacuum

• Hundreds of small experimental languages

• Rely on lessons learned from lesser-known languages– C++ grew out of C, Simula, and other languages– Java grew out of C++, Eiffel, SmallTalk, Objective C, and Cedar/Mesa1

• Depend on results from research community

J. Gosling, H. McGilton, The Java Language Enviornment1

Page 27: The Looming Software Crisis due to the Multicore Menace

27

Object Oriented Languages

• Ada 95• BETA• Boo• C++• C#• ColdFusion• Common Lisp• COOL (Object

Oriented COBOL)• CorbaScript• Clarion• Corn• D• Dylan• Eiffel• F-Script• Fortran 2003• Gambas• Graphtalk• IDLscript• incr Tcl• J• JADE

• Java• Lasso• Lava• Lexico• Lingo• Modula-2• Modula-3• Moto• Nemerle• Nuva• NetRexx• Nuva• Oberon (Oberon-1)• Object REXX• Objective-C• Objective Caml• Object Pascal (Delphi)• Oz• Perl 5• PHP• Pliant• PRM• PowerBuilder

• ABCL• Python• REALbasic• Revolution• Ruby• Scala• Simula• Smalltalk• Self• Squeak• Squirrel• STOOP (Tcl

extension)• Superx++• TADS• Ubercode• Visual Basic• Visual FoxPro• Visual Prolog• Tcl• ZZT-oop

Source: Wikipedia

Page 28: The Looming Software Crisis due to the Multicore Menace

28

Language EvolutionFrom FORTRAN to a few present day languages

Source: Eric Levenez

Page 29: The Looming Software Crisis due to the Multicore Menace

29

Origins of C++

Source: B. Stroustrup, The Design and Evolution of C++

1960

1970

1980

1990

Structural influenceFeature influence

FortranAlgol 60

CPL

BCPL

C

ANSI C

Simula 67

C with Classes

C++

C++arm

C++std

ML CluAlgol 68

Ada

Page 30: The Looming Software Crisis due to the Multicore Menace

30

Academic Influence on C++

“Exceptions were considered in the original design of C++, butwere postponed because there wasn't time to do a thorough job ofexploring the design and implementation issues.

In retrospect, the greatest influence on the C++ exceptionhandling design was the work on fault-tolerant systems started at the University of Newcastle in England by Brian Randell and his colleagues and continued in many places since.”

-- B. Stroustrup, A History of C++

Page 31: The Looming Software Crisis due to the Multicore Menace

31

Origins of Java• Java grew out of C++, Eiffel, SmallTalk, Objective C, and Cedar/Mesa• Example lessons learned:

– Stumbling blocks of C++ removed (multiple inheritance, preprocessing, operator overloading, automatic coercion, etc.)

– Pointers removed based on studies of bug injection– GOTO removed based on studies of usage patterns– Objects based on Eiffel, SmallTalk– Java interfaces based on Objective C protocols– Synchronization follows monitor and condition variable paradigm (introduced by Hoare,

implemented in Cedar/Mesa)– Bytecode approach validated as early as UCSD P-System (‘70s)

Lesser-known precursors essential to Java’s success

Source: J. Gosling, H. McGilton, The Java Language Enviornment

Page 32: The Looming Software Crisis due to the Multicore Menace

32

Why New Programming Models and Languages?

• Paradigm shift in architecture– From sequential to multicore– Need a new “common machine language”

• New application domains– Streaming– Scripting– Event-driven (real-time)

• New hardware features– Transactions – Introspection– Scalar Operand Networks or Core-to-core DMA

• New customers– Mobile devices– The average programmer!

• Can we achieve parallelism without burdening the programmer?

Page 33: The Looming Software Crisis due to the Multicore Menace

33

Domain Specific Languages

• There is no single programming domain!– Many programs don’t fit the OO model (ex: scripting and streaming)

• Need to identify new programming models/domains– Develop domain specific end-to-end systems– Develop languages, tools, applications ⇒ a body of knowledge

• Stitching multiple domains together is a hard problem– A central concept in one domain may not exist in another

– Shared memory is critical for transactions, but not available in streaming – Need conceptually simple and formally rigorous interfaces – Need integrated tools– But critical for many DOD and other applications

Page 34: The Looming Software Crisis due to the Multicore Menace

34

• Two choices:• Bend over backwards to support

old languages like C/C++• Develop parallel architectures

that are hard to program

Programming Languages and Architectures

Modernarchitecture

C von-Neumannmachine

Page 35: The Looming Software Crisis due to the Multicore Menace

Compiler-Aware Language Design

The StreamIt Experience

Speaker

FMDemod

LPF1

Scatter

Gather

LPF2 LPF3

Supported by ITR and NGS Awards

Page 36: The Looming Software Crisis due to the Multicore Menace

programmability

domain specificoptimizations

enable parallelexecution

simple and effective optimizations for domain specific abstractions

boost productivity, enable faster development and rapid prototyping

Is there a win-win situation?

• Some programming models are inherently concurrent– Coding them using a sequential language is…

• Harder than using the right parallel abstraction • All information on inherent parallelism is lost

• There are win-win situations – Increasing the programmer productivity while extracting parallel performance

target tiled architectures, clusters, DSPs, multicores, graphics processors, …

Page 37: The Looming Software Crisis due to the Multicore Menace

• Applications– DES and Serpent [PLDI 05]– MPEG-2 [IPDPS 06]– SAR, DSP benchmarks, JPEG, …

• Programmability– StreamIt Language (CC 02) – Teleport Messaging (PPOPP 05)– Programming Environment in Eclipse (P-PHEC 05)

• Domain Specific Optimizations– Linear Analysis and Optimization (PLDI 03)– Optimizations for bit streaming (PLDI 05)– Linear State Space Analysis (CASES 05)

• Architecture Specific Optimizations– Compiling for Communication-Exposed

Architectures (ASPLOS 02)– Phased Scheduling (LCTES 03)– Cache Aware Optimization (LCTES 05)– Load-Balanced Rendering

(Graphics Hardware 05)– Task, Data and Pipeline Parallelism (ASPLOS 06)

StreamIt Program

Front-end

Stream-AwareOptimizations

Uniprocessorbackend

Clusterbackend

Rawbackend

IBM X10backend

C/C++ C per tile +msg code

StreamingX10 runtime

Annotated Java

MPI-likeC/C++

Simulator(Java Library)

The StreamIt Project

Page 38: The Looming Software Crisis due to the Multicore Menace

Picture Reorder

joiner

joiner

IDCT

IQuantization

splitter

splitter

VLDmacroblocks, motion vectors

frequency encodedmacroblocks differentially coded

motion vectors

motion vectorsspatially encoded macroblocks

recovered picture

ZigZag

Saturation

Channel Upsample Channel Upsample

Motion Vector Decode

Y Cb Cr

quantization coefficients

picture type

<QC>

<QC>

reference picture

Motion Compensation

<PT1> referencepicture

Motion Compensation

<PT1>reference picture

Motion Compensation

<PT1>

<PT2>

Repeat

Color Space Conversion

<PT1, PT2>

add VLD(QC, PT1, PT2);

add splitjoin {

split roundrobin(N∗B, V);

add pipeline {add ZigZag(B);add IQuantization(B) to QC;add IDCT(B);add Saturation(B);

}add pipeline {

add MotionVectorDecode();add Repeat(V, N);

}

join roundrobin(B, V);}

add splitjoin {split roundrobin(4∗(B+V), B+V, B+V);

add MotionCompensation(4∗(B+V)) to PT1;for (int i = 0; i < 2; i++) {

add pipeline {add MotionCompensation(B+V) to PT1;add ChannelUpsample(B);

}}

join roundrobin(1, 1, 1);}

add PictureReorder(3∗W∗H) to PT2;

add ColorSpaceConversion(3∗W∗H);

MPEG bit stream

Streaming Application Abstraction

• Structured block level diagram describes computation and flow of data

• Conceptually easy to understand– Clean abstraction of functionality

• Mapping to C (sequentialization) destroys this simple view

MPEG-2 Decoder

Page 39: The Looming Software Crisis due to the Multicore Menace

StreamIt Improves Productivity

output to player

Picture Reorder

joiner

joiner

IDCT

IQuantization

splitter

splitter

VLDmacroblocks, motion vectors

frequency encodedmacroblocks differentially coded

motion vectors

motion vectorsspatially encoded macroblocks

recovered picture

ZigZag

Saturation

Channel Upsample Channel Upsample

Motion Vector Decode

Y Cb Cr

quantization coefficients

picture type

<QC>

<QC>

reference picture

Motion Compensation

<PT1> referencepicture

Motion Compensation

<PT1>reference picture

Motion Compensation

<PT1>

<PT2>

Repeat

Color Space Conversion

<PT1, PT2>

add VLD(QC, PT1, PT2);

add splitjoin {split roundrobin(N∗B, V);

add pipeline {add ZigZag(B);add IQuantization(B) to QC;add IDCT(B);add Saturation(B);

}add pipeline {

add MotionVectorDecode();add Repeat(V, N);

}

join roundrobin(B, V);}

add splitjoin {split roundrobin(4∗(B+V), B+V, B+V);

add MotionCompensation(4∗(B+V)) to PT1;for (int i = 0; i < 2; i++) {

add pipeline {add MotionCompensation(B+V) to PT1;add ChannelUpsample(B);

}}

join roundrobin(1, 1, 1);}

add PictureReorder(3∗W∗H) to PT2;

add ColorSpaceConversion(3∗W∗H);

MPEG bit stream

Page 40: The Looming Software Crisis due to the Multicore Menace

Picture Reorder

joiner

joiner

IDCT

IQuantization

splitter

splitter

VLDmacroblocks, motion vectors

frequency encodedmacroblocks differentially coded

motion vectors

motion vectorsspatially encoded macroblocks

recovered picture

ZigZag

Saturation

Channel Upsample Channel Upsample

Motion Vector Decode

Y Cb Cr

quantization coefficients

picture type

<QC>

<QC>

reference picture

Motion Compensation

<PT1> referencepicture

Motion Compensation

<PT1>reference picture

Motion Compensation

<PT1>

<PT2>

Repeat

Color Space Conversion

<PT1, PT2>

add VLD(QC, PT1, PT2);

add splitjoin {

split roundrobin(N∗B, V);

add pipeline {add ZigZag(B);add IQuantization(B) to QC;add IDCT(B);add Saturation(B);

}add pipeline {

add MotionVectorDecode();add Repeat(V, N);

}

join roundrobin(B, V);}

add splitjoin {split roundrobin(4∗(B+V), B+V, B+V);

add MotionCompensation(4∗(B+V)) to PT1;for (int i = 0; i < 2; i++) {

add pipeline {add MotionCompensation(B+V) to PT1;add ChannelUpsample(B);

}}

join roundrobin(1, 1, 1);}

add PictureReorder(3∗W∗H) to PT2;

add ColorSpaceConversion(3∗W∗H);

MPEG bit stream

StreamIt Compiler Extracts Parallelism

• Task Parallelism– Thread (fork/join) parallelism– Parallelism explicit in algorithm– Between filters without

producer/consumer relationship

• Data Parallelism– Data parallel loop (forall)– Between iterations of a stateless filter – Can’t parallelize filters with state

• Pipeline Parallelism– Usually exploited in hardware– Between producers and consumers– Stateful filters can be parallelized

MPEG-2 Decoder

Page 41: The Looming Software Crisis due to the Multicore Menace

StreamIt Compiler Parallelism Processor Resources

• StreamIt Compilers Finds the Inherent Parallelism– Graph structure is architecture independent– Abundance of parallelism in the StreamIt domain

• Too much parallelism is as bad as too little parallelism– (remember dataflow!)

• Map the parallelism in to the available resources in a given multicore– Use all available parallelism– Maximize load-balance– Minimize communication

Page 42: The Looming Software Crisis due to the Multicore Menace

0123456789

10111213141516171819

Vocod

er FFT

FMRadio

TDE

Bitonic

SortMPEG2D

ecod

erCha

nnelV

ocod

er

DES

DCT

Filterba

nk

Serpen

t

Radar

Geometr

ic Mea

n

Benchmarks

Thro

ughp

ut N

orm

aliz

ed to

Sin

gle

Cor

e St

ream

It .

StreamIt Performance on Raw

Page 43: The Looming Software Crisis due to the Multicore Menace

43

Outline: Ideas on Solvingthe Third Software Crisis

1. Advances in Computer Architecture

2. Novel Programming Models and Languages

3. Aggressive Compilers

4. Tools to support parallelization, debugging and migration

Page 44: The Looming Software Crisis due to the Multicore Menace

44

Compilers

• Compilers are critical in reducing the burden on programmers– Identification of data parallel loops can be easily automated, but

many current systems (Brook, PeakStream) require the programmer to do it.

• Need to revive the push for automatic parallelization– Best case: totally automated parallelization hidden from the user– Worst case: simplify the task of the programmer

Page 45: The Looming Software Crisis due to the Multicore Menace

Parallelizing Compilers

The SUIF Experience

Page 46: The Looming Software Crisis due to the Multicore Menace

The SUIF Parallelizing Compiler

• The SUIF Project at Stanford in the ’90– Mainly FORTRAN– Aggressive transformations to undo “human optimizations”– Interprocedural analysis framework– Scalar and array data-flow, reduction recognition and a host of

other analyses and transformations

• SUIF compiler had the Best SPEC results by automatic parallelization

Page 47: The Looming Software Crisis due to the Multicore Menace

SPECFP92 performance

• Vector processor Cray C90 540• Uniprocessor Digital 21164 508• SUIF on 8 processors Digital 8400 1,016

spic

e2

g6

do

du

c

f pp

pp

ora

md

lj dp

2

wa

ve5

md

lj sp

2

alv

inn

na

s a7

ea

r

hyd

ro2

d

su2

cor

tom

catv

s wm

25

6

0

200

400

600

800

1000

1200

MF

LO

PS

Nu

mb

er

of

Pr o

ce

ss

or s

1

2

3

4

5

6

7

8

Page 48: The Looming Software Crisis due to the Multicore Menace

Automatic Parallelization “Almost” Worked

• Why did not this reach mainstream?– The compilers were not robust– Clients were impossible (performance at any cost)– Multiprocessor communication was expensive – Had to compete with improvements in sequential performance– The Dogfooding problem

• Today: Programs are even harder to analyze– Complex data structures– Complex control flow– Complex build process – Aliasing problem (type unsafe languages)

Page 49: The Looming Software Crisis due to the Multicore Menace

49

Outline: Ideas on Solvingthe Third Software Crisis

1. Advances in Computer Architecture

2. Novel Programming Models and Languages

3. Aggressive Compilers

4. Tools to support parallelization, debugging and migration

Page 50: The Looming Software Crisis due to the Multicore Menace

50

Tools

• A lot of progress in tools to improve programmer productivity

• Need tools to– Identify parallelism– Debug parallel code– Update and maintain parallel code– Stitch multiple domains together

• Need an “Eclipse platform for multicores”

Page 51: The Looming Software Crisis due to the Multicore Menace

51

Migrating the Dusty Deck

• Impossible to bring them to the new era automatically – Badly mangled, hand-optimized, impossible to analyze code– Automatic compilation, even with a heroic effort, cannot do anything

• Help rewrite the huge stack of dusty deck– Application in use– Source code available– Programmer long gone

• Getting the new program to have the same behavior is hard– “Word pagination problem”

• Can take advantage of many recent advances– Creating test cases– Extracting invariants – Failure oblivious computing

Page 52: The Looming Software Crisis due to the Multicore Menace

52

Facilitate Evaluation and Feedback for Rapid Evolution

Language/Compiler/ToolsIdea

Implementation

Evaluation

Evaluation

Develop aProgram

FunctionalDebugging

PerformanceDebuggingEvaluate

Page 53: The Looming Software Crisis due to the Multicore Menace

53

Rapid Evaluation

• Extremely hard to get– Real users have no interest in flaky tools– Hard to quantify – Superficial users vs. Deep users will give different feedback

– Fatal flaws as well as amazing uses may not come out immediately

• Need a huge, sophisticated (and expensive) infrastructure – How to get a lot of application experts to use the system? – How do you get them to become an expert?– How do you get them to use it for a long time?– How do you scientifically evaluate?– How go you get actionable feedback?

• A “Center for Evaluating Multicore Programming Environments”??

Page 54: The Looming Software Crisis due to the Multicore Menace

54

Build and Mobilize the Community

• Bring the High Performance Languages/Compilers/Tools folks out of the woodwork!– Then: A few customers with the goal of performance at any cost.– Then: Had to compete with Moore’s law– Now: Reasonable performance improvements for the masses

• Bring the folks who won the second crisis– Then: the focus is improving programmer productivity– Now: how to maintain performance in a multicore world– Now: If not solved, all the productivity gains will be lost!

• Get architects to listen to language/compiler people– Then: We don’t need you, we can do everything in hardware– Then: Or here is a cool hardware feature, you figure out how to use it.– Now: Superscalars crashed and burned; cannot keep the status quo!– Now: Need to create a useable programming model

Page 55: The Looming Software Crisis due to the Multicore Menace

55

Conclusions• Programming language research is a critical long-term investment

– In the 1950s, the early background for the Simula language was funded by the Norwegian Defense Research Establishment

– In 2002, the designers received the ACM Turing Award “for ideas fundamental to the emergence of object oriented programming.”

• Compilers and Tools are also essential components

• Computer Architecture is at a cross roads– Once in a lifetime opportunity to redesign from scratch – How to use the Moore’s law gains to improve the programmability?

• Switching to multicores without losing the gains in programmer productivity may be the Grandest of the Grand Challenges– Half a century of work ⇒ still no winning solution– Will affect everyone!

• Need a Grand Partnership between the Government, Industry and Academia to solve this crisis!