From Serial to Parallel

Stephen Blair-Chappell, Intel Compiler Labs

www.intel.com


Agenda

• Why Parallel?

• Optimising Applications

• Steps to move from Serial to Parallel

Copyright © 2007, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners

Intel® Software Development Products Overview, 6/18/2010

[Video: Congratulations BCS FIG.wmv]

Moving to Parallel – a view from some developers

• Top 5 challenges
– Legacy
– Education
– Tools
– Fear of many cores
– Maintainability

Why Parallel?

Section 1

Why is everyone going multi-core?

[Figure: Power Density Race – power density (W/cm², log scale from 1 to 10,000) of Intel processors by decade from the ’70s to the ’10s (4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors), compared with Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun’s Surface]

Moore’s Law reinterpreted

• Speed no longer increasing
• Number of transistors still growing
• Number of cores, rather than clock speed, is doubling every 18 months

From K. Olukotun, L. Hammond, H. Sutter, and B. Smith

Theoretical growth of cores

[Figure: Growth of Multicore – number of cores (log scale, 1 to 10,000) against year, 2000 to 2025, with data points labelled 2, 8, 32, 128, 512 and 2048]

Future: Multicore and Manycore

[Figure: three chip layouts – All Large Core; Mixed Large and Small Core; All Small Core]

Note: the above pictures don’t necessarily represent any current or future Intel products

Connections to memory bank(s), connections between processors, memory coherency models – all come into play. Diversity!

Multi-core: beating the power/performance barrier

Relative single-core frequency and Vcc:

                          Power    Performance
Over-clocked (+20%)       1.73x    1.13x
Design frequency          1.00x    1.00x
Under-clocked (-20%)      0.51x    0.87x
Dual-core (-20%)          1.02x    1.73x
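The power column is consistent with the usual rule of thumb for CMOS dynamic power, where C is the switched capacitance. A worked check, assuming the supply voltage is scaled in proportion to frequency (the slide varies frequency and Vcc together) and that the dual-core design scales perfectly across its two cores:

```latex
\begin{aligned}
P_{\text{dyn}} &\propto C\,V_{cc}^{2}\,f, \qquad V_{cc} \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3} \\
P_{+20\%} &= 1.2^{3} = 1.728 \approx 1.73\times \\
P_{-20\%} &= 0.8^{3} = 0.512 \approx 0.51\times \\
P_{\text{dual},\,-20\%} &= 2 \times 0.512 = 1.024 \approx 1.02\times \\
\text{Perf}_{\text{dual},\,-20\%} &\approx 2 \times 0.87 = 1.74 \approx 1.73\times
\end{aligned}
```

Only the power figures follow from the formula; the single-core performance figures (1.13x and 0.87x) are taken from the slide.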

Industry’s First 45 nm High-K + Metal Gate Transistor Technology

• Improved transistor density: ~2x
• Improved transistor switching speed: >20%
• Reduced transistor switching power: ~30%
• Reduction in gate oxide leakage power: >10x

[Figure: 6-transistor 65 nm S-RAM cell, 0.570 µm², beside 45 nm S-RAM cell, 0.346 µm²; 65 nm transistor beside 45 nm HK + MG transistor]

Enables new features, higher performance, greater energy efficiency

Intel’s Teraflops Research Chip

[Video: Podtech_Intel_Research_Day_Terascale.flv]

Intel’s Teraflops Research Chip

Speed (GHz)   Power (Watts)   Performance (Teraflops)
3.16          62              1.01
5.1           175             1.63
5.7           265             1.81
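Dividing performance by power for each row makes the trade-off explicit. A minimal sketch; GflopsPerWatt is a hypothetical helper name, and the inputs are the figures from the table:

```cpp
#include <cassert>

// Energy efficiency in GFLOPS per watt (1 teraflop = 1,000 gigaflops).
double GflopsPerWatt(double teraflops, double watts)
{
    assert(watts > 0.0);
    return teraflops * 1000.0 / watts;
}
```

Applied to the three rows this gives roughly 16.3, 9.3 and 6.8 GFLOPS/W at 3.16, 5.1 and 5.7 GHz: the fastest setting delivers about 1.8 times the throughput of the slowest for about 4.3 times the power.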

Larrabee


What can we do with faster computers?

[Figure: time (0 to 120) against number of processors (0 to 10)]

• Solve problems faster
– Reduce turn-around time of big jobs
– Increase responsiveness of interactive apps

[Figure: problem size (0 to 700) against number of processors (0 to 10)]

• Get better solutions in the same amount of time
– Increase resolution of models
– Make models more sophisticated

Optimising Code

Section 2

Two points to address before you start parallelising

• Will buying a faster computer solve your problem?

Dr Yann Golanski, York

• Maybe serial optimisation will be sufficient.

A Three-Tiered Tuning Model

System wide
Question being asked: Can my system be ‘tuned’ to improve the performance of my application?
Examples of issues: Network, disk and memory performance. Intrusion by third-party programs such as virus scanners.

Application heuristics
Question being asked: Can my application code or heuristics be changed to improve performance?
Examples of issues: Code redundancy. Inefficient program algorithms. Poor memory allocation strategies. Bad or missing threading implementation.

Architectural bottlenecks
Question being asked: Is the CPU architecture being used at its best?
Examples of issues: Stalls in the CPU pipeline. Data alignment. Cache misses. Using expensive instructions. Failing to use the latest generation of optimised instructions.

Compiler-generated Optimisations

• Global Compiler Options
• Inter-procedural Optimisations
• Profile Guided Optimisations
• Vectorisation
• Parallelisation

Copyright © 2008, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners

From Serial to Parallel, 6/18/2010

Example of hand-crafted SSE instructions

#include <emmintrin.h>  // SSE2 integer intrinsics: __m128i, _mm_and_si128, ...

// SUDOKU (with its BinNum[] array of __m128i bit masks) is defined elsewhere in the application.
bool SSEHasNumber(SUDOKU *pPuzzle, __m128i BinArray[], int i, int j)
{
    // Bits set both in the mask for number j and in BinArray[i]
    __m128i Tmp1 = _mm_and_si128(pPuzzle->BinNum[j-1], BinArray[i]);
    __m128i Tmp2 = _mm_setzero_si128();

    // Each 32-bit lane becomes all ones where Tmp1 is zero, and zero elsewhere
    Tmp2 = _mm_cmpeq_epi32(Tmp2, Tmp1);

    unsigned int p[4];
    _mm_storeu_si128((__m128i *)p, Tmp2);

    // A zero lane means the masks overlap there; only lanes 0-2 are checked
    if (p[0] == 0 || p[1] == 0 || p[2] == 0)
        return true;
    return false;
}

            Time Taken   Speedup
No SSE      4.55 sec     1
With SSE    0.19 sec     24
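The store to p[] and the three scalar comparisons in SSEHasNumber could, as one option, be replaced by the SSE2 intrinsic _mm_movemask_epi8, which packs the top bit of each byte into an int. A minimal sketch under that assumption; LanesIntersect3 is a hypothetical name, and it keeps the original's choice of examining only the first three 32-bit lanes:

```cpp
#include <emmintrin.h>  // SSE2: _mm_and_si128, _mm_cmpeq_epi32, _mm_movemask_epi8
#include <cassert>

// Returns true if (a & b) is non-zero in any of the first three 32-bit lanes,
// i.e. the two bit masks share at least one set bit there. Lane 3 is ignored,
// matching the p[0..2] test in SSEHasNumber.
bool LanesIntersect3(__m128i a, __m128i b)
{
    __m128i both = _mm_and_si128(a, b);
    // Each lane becomes all ones where `both` is zero, and zero otherwise.
    __m128i isZero = _mm_cmpeq_epi32(both, _mm_setzero_si128());
    // One bit per byte: bits 0-11 correspond to lanes 0-2.
    int mask = _mm_movemask_epi8(isZero);
    return (mask & 0x0FFF) != 0x0FFF;
}
```

Keeping the comparison result in a register avoids the round trip through memory; whether that is measurably faster depends on the surrounding code and should be checked with a profiler.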

Modern architectures have lots of features to help speed up code

[Figure: The internals of the Intel low power IA architecture]

Intel® VTune™ Performance Analyzer

• Graphical tool
• Helps characterise runtime performance
• System-wide view of application environment
• Use to tune serial and parallel code
• Use to identify hot spots in code
• Use to generate a call graph

Call graph: Application workflow

The red lines show the critical path. The critical path is the most time-consuming call path. It is based on self time.

Filter view by self time

Bright orange nodes indicate functions with the highest self time.

Intel, VTune, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.
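The slide defines the critical path as the most time-consuming call path, based on self time. One plausible reading, assumed here, is the root-to-leaf path whose summed self time is largest; VTune's exact algorithm may differ. A minimal sketch with a hypothetical CallNode type:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Node of a hypothetical call tree: a function, its self time (time spent in
// its own code, excluding callees) and the functions it calls.
struct CallNode {
    std::string name;
    double selfTime;
    std::vector<CallNode> callees;
};

// Finds the root-to-leaf call path whose summed self time is largest.
// Appends the function names on that path to `path` and returns the sum.
double CriticalPath(const CallNode& node, std::vector<std::string>& path)
{
    path.push_back(node.name);
    double bestTime = 0.0;
    std::vector<std::string> bestTail;
    for (const CallNode& callee : node.callees) {
        std::vector<std::string> tail;
        double t = CriticalPath(callee, tail);
        if (bestTail.empty() || t > bestTime) {
            bestTime = t;
            bestTail = tail;
        }
    }
    path.insert(path.end(), bestTail.begin(), bestTail.end());
    return node.selfTime + bestTime;
}
```

For a root main (self time 1) that calls FindPrimes (2), which calls TestForPrime (90), and also calls ShowProgress (5), the sketch returns the path main, FindPrimes, TestForPrime with a total self time of 93.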

The life of a program instruction

[Figure: processor pipeline – Inst. Fetch, Branch Pred, Decoder, Reservation Station (RS), Execution Units, Memory Sub-system, Reorder Buffer (ROB), Retirement]

1. Instruction read from memory
2. Instruction fed to Decoder
3. Micro-ops (uops) generated
4. uops queued in RS
5. uops dispatched
6. Results sent to ROB
7. Instruction marked: all uops executed
8. Instruction sent for retirement

Hardware Performance Events

[Figure: the pipeline units (Inst. Fetch, Branch Pred, Decoder, Reservation Station, Execution Units, Memory Sub-system, Reorder Buffer, Retirement), labelled with the hardware events below]

• BUS_TRANS_ANY.ALL_AGENTS
• RS_UOPS_DISPATCHED.CYCLES_NONE
• CPU_CLK_UNHALTED.CORE
• INST_RETIRED.ANY
• MEM_LOAD_RETIRED.L2_MISS
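Raw counts of these events are usually read as ratios. A minimal sketch, assuming the counts have already been collected by a profiler such as VTune; EventCounts and the three helper names are hypothetical, and the ratios are common rules of thumb rather than a VTune API: cycles per instruction (CPI), the share of cycles in which no micro-op was dispatched, and L2 load misses per thousand instructions.

```cpp
#include <cassert>

// Raw event counts; the field comments give the matching event names.
struct EventCounts {
    double cpuClkUnhaltedCore;         // CPU_CLK_UNHALTED.CORE
    double instRetiredAny;             // INST_RETIRED.ANY
    double rsUopsDispatchedCyclesNone; // RS_UOPS_DISPATCHED.CYCLES_NONE
    double memLoadRetiredL2Miss;       // MEM_LOAD_RETIRED.L2_MISS
};

// Cycles per instruction retired: lower generally means better throughput.
double Cpi(const EventCounts& e)
{
    assert(e.instRetiredAny > 0.0);
    return e.cpuClkUnhaltedCore / e.instRetiredAny;
}

// Fraction of unhalted cycles in which no uop was dispatched (a stall indicator).
double StallRatio(const EventCounts& e)
{
    assert(e.cpuClkUnhaltedCore > 0.0);
    return e.rsUopsDispatchedCyclesNone / e.cpuClkUnhaltedCore;
}

// L2 load misses per 1,000 retired instructions.
double L2MissesPerKiloInst(const EventCounts& e)
{
    assert(e.instRetiredAny > 0.0);
    return 1000.0 * e.memLoadRetiredL2Miss / e.instRetiredAny;
}
```

For instance, 2×10⁹ unhalted cycles for 10⁹ retired instructions gives a CPI of 2.0; what counts as a good value depends on the processor and the workload.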

Demo 0 – Using Intel® VTune™ Performance Analyzer

From 1 to 1,000,000

Notice the System Wide View – Can you see any problems?

What is the problem here?

• VTune sample-over-time view

The Hotspot

Four Steps in Moving to Parallel

Section 3

Steps in moving from Serial to Parallel

[Figure: a cycle leading from Serial to Parallel through four steps]

1. Architectural Analysis
2. Introducing Parallelism
3. Validating Correctness
4. Performance Tuning

Key Questions

Design
• Is my program parallel?
• Where is the best place to parallelise my program?
• How can I get my program to run faster?
• What’s the expected speedup?

Code & Debug
• How?
• How difficult?
• Is my code still working?

Verify
• Is the parallelism correct?
• Do I have deadlocks or data races?
• Do I have memory errors?
• Does my program still work as intended?

Tune
• Do my tasks do equal amounts of work?
• Is my application scalable?
• Is the threading running efficiently?
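The data-race question can be made concrete with a shared counter updated from several threads. A minimal sketch using C++11 std::thread and std::atomic; CountWithAtomic is a hypothetical name, not part of any Intel tool. Each thread adds to one counter, and declaring it std::atomic<int> makes every increment indivisible:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each of numThreads threads increments one shared counter perThread times.
// With a plain int instead of std::atomic<int> this would be a data race and
// updates could be lost.
int CountWithAtomic(int numThreads, int perThread)
{
    assert(numThreads > 0 && perThread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counter, perThread] {
            for (int k = 0; k < perThread; ++k)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (std::thread& w : workers)
        w.join();  // join() makes all increments visible to this thread
    return counter.load();
}
```

Replacing std::atomic<int> with a plain int would introduce exactly the kind of data race that a correctness checker such as Intel® Thread Checker is designed to report, and the final count could come out too low.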

Intel Software Tools Supporting the Parallel Design Cycle

[Figure: design cycle from Serial to Parallel – Architectural Analysis, Introducing Parallelism, Validating Correctness, Performance Tuning]

Tools

Step                      Existing Intel Software               Intel Parallel Studio
Architectural Analysis    Intel® VTune™ Performance Analyzer    Advisor / Amplifier
Introducing Parallelism   Intel Compilers, Parallel Libraries   Composer
Validating Correctness    Intel® Thread Checker                 Inspector
Performance Tuning        Intel® Thread Profiler                Amplifier

For Microsoft Visual Studio* C++ architects, developers, and software innovators creating parallel Windows* applications.

Intel® Parallel Studio includes:
• Intel® Parallel Advisor Lite **
• Intel® Parallel Composer
• Intel® Parallel Inspector
• Intel® Parallel Amplifier

** Beta – from whatif.intel.com

• Microsoft Visual Studio* plug-in
• End-to-end product suite for parallelism
• Forward scaling to many-core

Step 1

Which part of my code should I make parallel?

[Figure: design cycle from Serial to Parallel – Architectural Analysis, Introducing Parallelism, Validating Correctness, Performance Tuning]

Key Questions – Design

• Is my program parallel?
• Where is the best place to parallelise my program?
• How can I get my program to run faster?
• What’s the expected speedup?

Identifying the best parts to parallelise

[Figure: Phase 1 and Phase 2 of a program, run serially and then with Phase 2 run in parallel]

We need to identify hotspots.

Why use parallelism? Amdahl’s Law

• Describes the upper bound of parallel execution speedup
• Serial code limits speedup

[Figure: the serial run time T_serial split into a serial part (1-P) and a parallelisable part P; with n = 2 the parallel part shrinks to P/2, and with n = ∞ to P/∞ = 0]

P = fraction of the serial run time that can be parallelised; n = number of processors

$T_{\text{parallel}} = \left[(1-P) + \frac{P}{n}\right] T_{\text{serial}}$

$\text{Speedup} = \frac{T_{\text{serial}}}{T_{\text{parallel}}}$

Example with P = 0.5 and T_serial = 1.0:
• n = 2: T_parallel = 0.5 + 0.25 = 0.75, speedup = 1.0/0.75 = 1.33
• n = ∞: T_parallel = 0.5 + 0.0 = 0.5, speedup = 1.0/0.5 = 2.0
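The same relationship as a small function, handy for estimating the expected speedup before any code is changed. A minimal sketch; AmdahlSpeedup and AmdahlLimit are hypothetical names, and P is the parallelisable fraction of the serial run time:

```cpp
#include <cassert>

// Amdahl's law: speedup = Tserial / Tparallel = 1 / ((1 - P) + P / n).
double AmdahlSpeedup(double P, double n)
{
    assert(P >= 0.0 && P <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - P) + P / n);
}

// Upper bound as the number of processors grows without limit: 1 / (1 - P).
double AmdahlLimit(double P)
{
    assert(P >= 0.0 && P < 1.0);
    return 1.0 / (1.0 - P);
}
```

With P = 0.5 this reproduces the slide: AmdahlSpeedup(0.5, 2) is about 1.33 and AmdahlLimit(0.5) is 2.0. Even with 90% of the run time parallelised, the limit is 1/(1 - 0.9) = 10.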

Some code is not worth making parallel…

• Don’t parallelise code
– just because it’s clever
– with low CPU utilisation
– that is I/O bound
• Do parallelise code that
– eats significant CPU cycles
• You need to get visibility of the runtime behaviour

Architectural Extensions can speed up your code

• Always optimise your code
• Even if you don’t go parallel, some architectural features can still give significant speed-up
• For example, SSE extensions

Our Application - Prime Number Generator

#include <math.h>  // sqrtf

// Trial division: tries successive factors from 3 up to round(sqrt(val)).
// Only meaningful for odd val, since a factor of 2 is never tested.
bool TestForPrime(int val)
{   // let’s start checking from 3
    int limit, factor = 3;
    limit = (int)(sqrtf((float)val) + 0.5f);
    while ((factor <= limit) && (val % factor))
        factor++;
    return (factor > limit);
}

// globalPrimes, gPrimesFound and ShowProgress are defined elsewhere in the application.
void FindPrimes(int start, int end)
{
    int range = end - start + 1;
    for (int i = start; i <= end; i += 2)  // odd numbers only, if start is odd
    {
        if (TestForPrime(i))
            globalPrimes[gPrimesFound++] = i;
        ShowProgress(i, range);
    }
}

[Figure: the factors tested for each odd i, e.g. 61: 3, 5, 7; 63: 3; 65: 3, 5; 67: 3, 5, 7]
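A self-contained variant that can be compiled and checked on its own, assuming the results are collected in a std::vector instead of the application's globalPrimes array. FindPrimesVec is a hypothetical name and progress reporting is omitted:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Same trial-division logic as TestForPrime above (odd val only).
bool TestForPrime(int val)
{
    int limit = static_cast<int>(std::sqrt(static_cast<float>(val)) + 0.5f);
    int factor = 3;
    while (factor <= limit && (val % factor) != 0)
        ++factor;
    return factor > limit;
}

// Visits odd numbers in [start, end], as FindPrimes does, and returns the
// values that pass TestForPrime.
std::vector<int> FindPrimesVec(int start, int end)
{
    assert(start % 2 == 1);  // the loop steps by 2, so start must be odd
    std::vector<int> primes;
    for (int i = start; i <= end; i += 2)
        if (TestForPrime(i))
            primes.push_back(i);
    return primes;
}
```

Like the original, this never tests 2 and treats 1 as prime, so for 1 to 1,000,000 it reports 78,498 values (the 78,497 odd primes plus 1).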

Demo 1 – Getting the Benchmark

From 1 to 1,000,000
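A benchmark needs a repeatable timer. A minimal sketch using std::chrono::steady_clock; TimeSeconds and Speedup are hypothetical helpers, not part of the demo. Time the baseline run, time the optimised run, and divide:

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock time, in seconds, taken to run `work` once.
double TimeSeconds(const std::function<void()>& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Speedup of an optimised run relative to a baseline run.
double Speedup(double baselineSecs, double optimisedSecs)
{
    assert(optimisedSecs > 0.0);
    return baselineSecs / optimisedSecs;
}
```

A typical call wraps the benchmark, e.g. TimeSeconds([] { FindPrimes(1, 1000000); }), where FindPrimes stands for the application's own code. Speedup(1.29, 0.66) gives about 1.95 and Speedup(4.55, 0.19) about 24, matching the speedups quoted for the SSE examples.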


Optimise the serial code first

• Using the Intel Compiler to automatically generate SSE instructions:
– Code ran twice as fast
– No change made to original code

Calculating Pi
            Time (secs)   Speedup
No SSE      1.29          1.00
With SSE    0.66          1.95

Example of SSE compiler-generated instructions:

004018D9 movaps xmmword ptr [esp],xmm0
004018DD paddd xmm5,xmm6
004018E1 addpd xmm7,xmm3
004018E5 mulpd xmm7,xmm2
004018E9 add eax,8
004018EC mulpd xmm7,xmm7
004018F0 movaps xmm0,xmmword ptr ds:[406770h]
004018F7 addpd xmm7,xmm1
004018FB divpd xmm0,xmm7
004018FF cvtdq2pd xmm7,xmm5
00401903 paddd xmm5,xmm6
00401907 addpd xmm4,xmm0
0040190B movaps xmm0,xmmword ptr ds:[406770h]
00401912 addpd xmm7,xmm3
00401916 mulpd xmm7,xmm2
0040191A mulpd xmm7,xmm7
0040191E addpd xmm7,xmm1
00401922 divpd xmm0,xmm7
00401926 movaps xmm7,xmmword ptr [esp]
0040192A addpd xmm7,xmm0
0040192E cvtdq2pd xmm0,xmm5
00401932 movaps xmmword ptr [esp],xmm7
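The slide shows only the generated code, not the source. Judging by the instruction sequence (convert the loop counter to double with cvtdq2pd, add an offset, scale, square, add a constant, divide a constant loaded from memory by the result, accumulate), it appears to be the classic midpoint-rule integration for pi. The loop below is an assumed reconstruction of that kind of source, not the demo's actual code; ComputePi is a hypothetical name:

```cpp
#include <cassert>

// Midpoint-rule estimate of pi as the integral of 4 / (1 + x^2) over [0, 1].
// Iterations are independent apart from the running sum, so a compiler that
// is allowed to reorder the floating-point additions can vectorise the loop
// with packed double-precision SSE2 instructions.
double ComputePi(long numSteps)
{
    assert(numSteps > 0);
    const double step = 1.0 / numSteps;
    double sum = 0.0;
    for (long i = 0; i < numSteps; ++i) {
        double x = (i + 0.5) * step;  // midpoint of interval i
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
```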

Demo Demo Demo Demo 2 2 2 2 –––– Using the Intel CompilerUsing the Intel CompilerUsing the Intel CompilerUsing the Intel Compiler

From 1 to 1,000,000

Swapping compilers

• From the solution drop-down menu
• The action is reversible

Program built and run with Intel compiler

Speedup 1.09

Identifying Hotspots

Pinpointing places where an application could be parallelised

The Big Question

“How can I make my code run faster?”

Today’s Question

“Where do I split up my code to take advantage of multiple CPU cores?”

The Task: Identifying the Hot Spot…

… and Splitting up the Work

Demo 2 – Finding the Hotspots

Finding a Hot Spot

Where to Parallelise – Amplifier

[Screenshot labels: Hotspot, Call Stack]

Design: What’s the expected speedup?

• Use Amdahl’s Law:

$$\text{Speedup} = \frac{1}{s + \frac{1 - s}{n} + H(n)}$$

where s is the serial part (a fraction of 1), n is the number of cores, and H(n) is the parallel overhead (ignored here).

With s = 0 and n = 2:

$$\text{Speedup} = \frac{1}{0 + \frac{1 - 0}{2}} = \frac{1}{0 + 0.5} = 2$$

i.e. the new run time is about 0.672 seconds.

Alternate Calculation

$$\text{Speedup} = \frac{1}{s + \frac{1 - s}{n} + H(n)}$$

where s is the serial part (a fraction of 1), H(n) is the parallel overhead (ignored) and n is the number of cores.

$$s = 1 - \frac{1.688 - 0.012}{1.688} \approx 0.007$$

$$\text{Speedup} = \frac{1}{0.007 + \frac{1 - 0.007}{2}} = \frac{1}{0.007 + 0.4965} \approx 1.986$$

i.e. a CPU time of about 0.850 seconds.
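Both estimates can be reproduced with a few lines of C. `AmdahlSpeedup` implements the formula with H(n) ignored, and `SerialFraction` mirrors the alternate calculation. Both names are hypothetical.

```c
#include <assert.h>

/* Amdahl's Law without the overhead term H(n):
   speedup = 1 / (s + (1 - s) / n), s = serial fraction, n = cores. */
double AmdahlSpeedup(double s, int n)
{
    assert(s >= 0.0 && s <= 1.0 && n >= 1);
    return 1.0 / (s + (1.0 - s) / n);
}

/* Serial fraction as in the alternate calculation:
   s = 1 - (total - serial) / total. */
double SerialFraction(double total_time, double serial_time)
{
    assert(total_time > 0.0 && serial_time >= 0.0 && serial_time <= total_time);
    return 1.0 - (total_time - serial_time) / total_time;
}
```

Dividing the measured 1.688 s by the predicted speedup of about 1.986 gives the 0.850 s estimate.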

Step 2 – Implement parallelism in code

[Figure: the Serial-to-Parallel cycle: Architectural Analysis, Introducing Parallelism, Validating Correctness, Performance Tuning]

Key Questions – Code & Debug

• How?
• How difficult?
• Is my code still working?

Common types of parallelism

• Functional or task parallelism
• Data parallelism
• Software pipelining

Task and Data Parallelism

Task parallelism
• A different job for each thread
  – e.g. one thread prints, another reads the keyboard

Data parallelism
• The workload is split between multiple identical threads
  – e.g. three identical threads perform calculations on a data array
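The two models can be sketched with OpenMP in C. The function names are illustrative. `parallel sections` stands in for "a different job per thread", and `parallel for` stands in for "identical threads sharing an array". When OpenMP is not enabled, the compiler ignores the pragmas and the code simply runs serially.

```c
#include <assert.h>

/* Data parallelism: every thread runs the same loop body on its own
   share of the iterations. */
void ScaleArray(double *a, int n, double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= factor;
}

/* Task parallelism: two different jobs, each in its own section, so they
   can run on different threads. Each writes a different output. */
void SumAndMax(const int *a, int n, long *sum, int *max)
{
    assert(n > 0);
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            long s = 0;
            for (int i = 0; i < n; i++)
                s += a[i];
            *sum = s;
        }
        #pragma omp section
        {
            int m = a[0];
            for (int i = 1; i < n; i++)
                if (a[i] > m)
                    m = a[i];
            *max = m;
        }
    }
}
```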

Software Pipeline

Core 1   Collect A    Collect B    Collect C    Collect D    …
Core 2                Transfer A   Transfer B   Transfer C   Transfer D
Core 3                             Polish A     Polish B     Polish C
Core 4                                          Produce A    Produce B
         Time →
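The staggered timing in the pipeline diagram follows a simple rule. The sketch assumes equal-length stages with one stage per core, which is an idealisation because real stages rarely take the same time. Below is a small C sketch of that schedule; the function names are hypothetical.

```c
#include <assert.h>

/* With one stage per core, stage s (0 = Collect, 1 = Transfer,
   2 = Polish, 3 = Produce) works on item t - s during time step t.
   Returns that item index, or -1 if the stage is idle at that step. */
int PipelineItem(int stage, int time_step, int num_items)
{
    assert(stage >= 0 && time_step >= 0);
    int item = time_step - stage;
    return (item >= 0 && item < num_items) ? item : -1;
}

/* Steps needed to push num_items through num_stages: the pipeline fills
   for num_stages - 1 steps, then completes one item per step. */
int PipelineSteps(int num_stages, int num_items)
{
    assert(num_stages > 0 && num_items > 0);
    return num_items + num_stages - 1;
}
```

With four items and four stages, the pipeline needs 7 steps instead of the 16 a serial run would take. The advantage grows as more items flow through.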

Question

• How many different ways can you think of to implement parallelism?

  – e.g. OpenMP, …, …

Auto Parallelism

Loop-level parallelism automatically supplied by the compiler

Auto-parallelization

• Auto-parallelization: automatic threading of loops without having to insert OpenMP* directives manually.

Windows*           Linux*            Mac*
/Qparallel         -parallel         -parallel
/Qpar_report[n]    -par_report[n]    -par_report[n]

• The compiler can identify “easy” candidates for parallelization, but large applications are difficult to analyze.
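The two loops below illustrate the difference between an easy candidate and one the compiler must leave alone. They are sketched in C with hypothetical names. Whether a particular compiler actually threads the first loop also depends on its cost model (trip count, amount of work), and the `/Qpar_report`/`-par_report` output is where to check. `restrict` tells the compiler that the arrays do not overlap, which helps its dependence analysis.

```c
#include <assert.h>

/* Each iteration writes its own c[i] and reads only a[i] and b[i]: no
   iteration depends on another, the kind of "easy" loop an
   auto-parallelising compiler can thread on its own. */
void AddArrays(double *restrict c, const double *restrict a,
               const double *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Iteration i needs the result of iteration i - 1 (a loop-carried
   dependence), so the iterations cannot simply be split between threads. */
void PrefixSum(double *a, int n)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```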

Optimisation Results – pi application

Optimisation            Time Taken (secs)   Speedup
default                 0.938               1
auto-vectorisation      0.375               2.5
auto-parallelism        0.516               1.8
auto-vec. & auto-par.   0.203               4.6

OpenMP Architecture

• Fork-join model
• Worksharing constructs
• Synchronization constructs
• Directive/pragma-based parallelism
• Extensive API for finer control
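All of the listed elements appear in the short C function below. It is a sketch, not the recommended way to sum, since OpenMP's `reduction` clause would do the job in one line. `SumOfSquares` and `MaxThreads` are illustrative names. The `_OPENMP` macro is defined only when OpenMP is enabled, so the file also builds, and runs serially, without it. `omp_get_max_threads()` reflects runtime settings such as the `OMP_NUM_THREADS` environment variable.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of i*i for i = 1..n, showing a parallel region (fork-join),
   a worksharing loop, and a synchronisation construct. */
long SumOfSquares(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel                 /* fork: a team of threads starts here */
    {
        long partial = 0;                /* private: declared inside the region */
        #pragma omp for                  /* worksharing: iterations are divided */
        for (int i = 1; i <= n; i++)
            partial += (long)i * i;
        #pragma omp critical             /* synchronisation: one thread at a time */
        total += partial;
    }                                    /* join: implicit barrier, team ends */
    return total;
}

/* API call for finer control: threads available to the next parallel region. */
int MaxThreads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```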

OpenMP Runtime

[Figure: OpenMP runtime layers: User, Environment Variables, Application, Directive Compiler, Runtime Library, Threads in Operating System]

OpenMP Programming Model

Fork-join parallelism:
• The master thread spawns a team of threads as needed.
• Parallelism is added incrementally until performance goals are met, i.e. the sequential program evolves into a parallel program.

[Figure: fork-join timeline showing the master thread (in red), sequential parts, parallel regions, and a nested parallel region]

Introducing Parallelism

#pragma omp parallel for                   // create threads here for this parallel region
for( int i = start; i <= end; i += 2 ){    // OpenMP divides the iterations of the for loop
    if( TestForPrime(i) )
        globalPrimes[gPrimesFound++] = i;
    ShowProgress(i, range);
}

Demo 3: Adding parallelism using #pragma omp for

Results – OpenMP

Amazing!

We have a speedup!

Code: Is my code still working?

Bother! The number of primes is wrong.

Questions

Are the results right?

Was the run quicker?

Step 3 – Check for any problems

[Figure: the Serial-to-Parallel cycle: Architectural Analysis, Introducing Parallelism, Validating Correctness, Performance Tuning]

Key Questions – Verify

• Is the parallelism correct?
• Do I have deadlocks or data races?
• Do I have memory errors?
• Does my program still work as intended?

New paradigm requires new tools

• Using traditional debugging tools is difficult or impossible
  – printf is not re-entrant
  – debugging several threads is notoriously hard
  – many debuggers and profilers are not multi-core enabled
• Multi-core tools are available

Non-deterministic Error Sources in Parallel Applications

• Shared resources require locks
• Locks can
  – ‘serialize’ a program
  – lead to deadlocks

[Figure: Thread1 and Thread2 each execute X=X+1 on shared memory X, starting from X=0. Without a lock, both read 0 and X ends up as 1, a wrong result (X should be 2); with lock L1 around the update, X ends up as 2]
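A real data race is undefined behaviour in C and cannot be reproduced on demand. The sketch below therefore replays the unlucky interleaving step by step on a single thread. It illustrates the lost update rather than creating a real race, and the function names are illustrative.

```c
#include <assert.h>

/* Unprotected: both "threads" read X before either writes it back. */
int UnprotectedIncrements(void)
{
    int X = 0;
    int t1 = X;      /* Thread1 reads X (0)             */
    int t2 = X;      /* Thread2 reads X (still 0)       */
    X = t1 + 1;      /* Thread1 writes 1                */
    X = t2 + 1;      /* Thread2 writes 1: update lost   */
    return X;
}

/* Locked: the whole read-modify-write of one thread finishes before
   the other thread may read X. */
int LockedIncrements(void)
{
    int X = 0;
    int t1 = X; X = t1 + 1;   /* Thread1 holds the lock  */
    int t2 = X; X = t2 + 1;   /* then Thread2 holds it   */
    return X;
}
```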

Demo 4 – Checking for threading errors

Checking for Errors with Parallel Inspector

The Offending Sources

Protecting shared variables

#pragma omp parallel for
for( int i = start; i <= end; i += 2 ){
    if( TestForPrime(i) )
#pragma omp critical    // will create a critical section for this reference
        globalPrimes[gPrimesFound++] = i;
    ShowProgress(i, range);
}

// Inside ShowProgress (range / 2 calls in total, hence 200 rather than 100):
#pragma omp critical    // will create a critical section for both these references
{
    gProgress++;
    percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
}
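Below is a compilable version of the protected loop, written as a sketch under a few assumptions. `TestForPrime` and `MAX_PRIMES` are assumed, and the progress output is reduced to a counter. The two critical sections are also given names (`primes`, `progress`), which is a small deviation from the slide: unnamed critical sections all share one lock, whereas differently named ones can be entered concurrently.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PRIMES 100000               /* hypothetical capacity */

int globalPrimes[MAX_PRIMES];
int gPrimesFound = 0;
int gProgress = 0;

/* Hypothetical trial division for odd val >= 3. */
static bool TestForPrime(int val)
{
    for (int factor = 3; factor * factor <= val; factor += 2)
        if (val % factor == 0)
            return false;
    return true;
}

/* Each shared counter is only updated inside a critical section, so one
   thread's read-modify-write completes before another's begins. */
void FindPrimes(int start, int end)
{
    assert(start >= 3 && start % 2 == 1);
    #pragma omp parallel for
    for (int i = start; i <= end; i += 2)
    {
        if (TestForPrime(i))
        {
            #pragma omp critical (primes)
            {
                assert(gPrimesFound < MAX_PRIMES);
                globalPrimes[gPrimesFound++] = i;
            }
        }
        #pragma omp critical (progress)
        gProgress++;
    }
}
```

With several threads, the primes land in `globalPrimes` in no particular order, but the count is always the same.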

Demo 5 – Fixing the threading errors

Data Races Have Disappeared!

Number of Primes is Correct

Number of primes is correct

Step 4 – Tune for best performance

[Figure: the Serial-to-Parallel cycle: Architectural Analysis, Introducing Parallelism, Validating Correctness, Performance Tuning]

Key Questions – Tune

• Is the threading running efficiently?
• Do my tasks do equal amounts of work?
• Is my application scalable?

Performance Issues

• Load balancing
• Synchronisation overhead
• Scalability

Difficult to examine without the right tools

A Reminder – Where are we?

Number of primes is correct

Almost as slow as the serial version

Demo 6 – Find the Threading Performance Issues

Hotspot Analysis

Source View

Improving the Performance

Before:

void ShowProgress( int val, int range )
{
    int percentDone;
    gProgress++;
    percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
    if( percentDone % 10 == 0 )
        printf("\b\b\b\b%3d%%", percentDone);
}

The algorithm has many more updates than the 10 needed for showing progress.

After:

void ShowProgress( int val, int range )
{
    int percentDone;
    static int lastPercentDone = 0;
#pragma omp critical
    {
        gProgress++;
        percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
    }
    if( percentDone % 10 == 0 && lastPercentDone < percentDone / 10){
        printf("\b\b\b\b%3d%%", percentDone);
        lastPercentDone++;
    }
}

This change should fix the contention issue.
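The size of the problem shows up when both printing conditions are replayed serially in C. The sketch below only counts how often `printf` would run; it is not the threaded program itself. `CountProgressPrints` is a hypothetical helper that makes one call per odd candidate, i.e. range / 2 calls.

```c
#include <assert.h>

/* Counts how many times each version of the ShowProgress test would print.
   use_last_percent = 0: original test; 1: revised test with lastPercentDone. */
int CountProgressPrints(int range, int use_last_percent)
{
    assert(range > 0);
    int gProgress = 0, lastPercentDone = 0, prints = 0;
    for (int call = 0; call < range / 2; call++)
    {
        gProgress++;
        int percentDone = (int)((float)gProgress / (float)range * 200.0f + 0.5f);
        if (!use_last_percent)
        {
            if (percentDone % 10 == 0)
                prints++;                        /* fires on every call at 0%, 10%, ... */
        }
        else if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10)
        {
            prints++;                            /* fires once per 10% step */
            lastPercentDone++;
        }
    }
    return prints;
}
```

For a range of 1,000 (500 calls), the original condition fires 50 times, five times for most 10% steps. The revised condition fires exactly 10 times.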

Demo 7 – Fixing the Synchronisation Issues

Superb Speedup … ???

Speedup 7.36

On a dual core?

Demo 8 – Getting a New Serial Benchmark

That’s better (but disappointing)…

Speedup 1.55

Demo 9 – Correcting the Synchronisation Issue

Hotspots

Locks & Waits

Source Code View

Fixing the synchronisation issue – 1

This fix removes the need for a critical section

void FindPrimes(int start, int end)
{
    // start is always odd
    int range = end - start + 1;
#pragma omp parallel for
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            // InterlockedIncrement returns the incremented value, so subtract 1
            // to keep the first prime at index 0 (gPrimesFound must be a LONG)
            globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
        ShowProgress(i, range);
    }
}
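`InterlockedIncrement` is Windows-specific. A portable alternative, swapped in here rather than taken from the slides, is C11's `atomic_fetch_add` from `<stdatomic.h>`. It returns the value before the increment, whereas `InterlockedIncrement` returns the value after it, so the two differ by one when used as an index. `AppendPrime` and `MAX_PRIMES` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_PRIMES 100000                 /* hypothetical capacity */

int globalPrimes[MAX_PRIMES];
atomic_int gPrimesFound = 0;

/* Reserve a unique, zero-based slot without a critical section:
   each caller receives a different pre-increment value. */
void AppendPrime(int value)
{
    int slot = atomic_fetch_add(&gPrimesFound, 1);
    assert(slot < MAX_PRIMES);
    globalPrimes[slot] = value;
}
```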

Fixing the synchronisation issue - 2

This fix removes the need for a critical section

void ShowProgress( int val, int range )
{
    int percentDone;         // int, to match the %3d format below
    long localProgress;      // gProgress must be a LONG for InterlockedIncrement
    static int lastPercentDone = 0;
    localProgress = InterlockedIncrement(&gProgress);
    percentDone = (int)((float)localProgress/(float)range*200.0f+0.5f);
    if( percentDone % 10 == 0 && lastPercentDone < percentDone / 10){
        printf("\b\b\b\b%3d%%", percentDone);
        lastPercentDone++;
    }
}
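Even in this version, the `static` `lastPercentDone` is read and updated without synchronisation. Two threads can occasionally both pass the test, and a later 10% step may then be skipped. The sketch below applies the same idea with C11 atomics, a swapped-in portable technique rather than the slides' code. Every call bumps the counter with `atomic_fetch_add`, and a compare-and-swap on the last reported step lets exactly one thread print each new 10% step. The percentage here uses integer arithmetic and rounds down. `gLastStep` and `gReports` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_long gProgress = 0;
static atomic_int gLastStep = 0;     /* last 10% step reported (0..10)      */
static atomic_int gReports = 0;      /* lines printed, kept for checking    */

/* x200 because only odd candidates (range / 2 calls) are tested. */
void ShowProgress(int val, int range)
{
    (void)val;                                   /* unused, as in the slides */
    assert(range > 0);
    long done = atomic_fetch_add(&gProgress, 1) + 1;
    int step = (int)(done * 200 / range) / 10;
    int last = atomic_load(&gLastStep);
    while (last < step)
    {
        /* On failure, last is reloaded with the current value and retried. */
        if (atomic_compare_exchange_weak(&gLastStep, &last, step))
        {
            printf("\b\b\b\b%3d%%", step * 10);
            atomic_fetch_add(&gReports, 1);
            break;
        }
    }
}
```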

That’s better (but still disappointing)…

Speedup 1.6

Demo 9 – Improving the Load Balancing


Threads are not doing equal work


Fixing a Load Imbalance

Distribute the work more evenly

#include <windows.h>   // InterlockedIncrement

void FindPrimes(int start, int end)
{
    // start is always odd
    int range = end - start + 1;

    // Deal iterations out round-robin in chunks of 8, so each thread gets a mix
    // of small and large candidates instead of one contiguous block
    #pragma omp parallel for schedule(static, 8)
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) ) {
            // InterlockedIncrement returns the incremented value; subtract 1 to start at index 0
            globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
        }
        ShowProgress(i, range);
    }
}
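Why the chunked schedule helps can be estimated without timing anything. The sketch below counts the trial divisions a simple odd-divisor prime test would perform for each candidate (a hypothetical cost model, not the demo's `TestForPrime`) and sums them per thread under two assignments: one contiguous block per thread, roughly what `schedule(static)` without a chunk size does and what implementations commonly use when no schedule is given, and round-robin chunks of 8 as in `schedule(static, 8)`. `NTHREADS`, `TrialDivisions`, `OwnerBlock`, `OwnerChunked` and `Imbalance` are illustrative names.

```c
#include <assert.h>

#define NTHREADS 4   /* hypothetical thread count */

/* Hypothetical cost model: trial divisions a simple odd-divisor prime
   test performs on val before it finds a factor or passes sqrt(val). */
long TrialDivisions(int val)
{
    long count = 0;
    for (int f = 3; (long long)f * f <= val; f += 2) {
        count++;
        if (val % f == 0)
            break;
    }
    return count;
}

/* Owner of iteration k when n iterations form one block per thread. */
int OwnerBlock(long k, long n)
{
    long block = (n + NTHREADS - 1) / NTHREADS;
    return (int)(k / block);
}

/* Owner of iteration k when chunks of `chunk` iterations go round-robin. */
int OwnerChunked(long k, int chunk)
{
    return (int)((k / chunk) % NTHREADS);
}

/* Max per-thread work divided by mean per-thread work for the loop
   for (i = start; i <= end; i += 2); 1.0 means perfectly balanced.
   chunk <= 0 selects the one-block-per-thread assignment. */
double Imbalance(int start, int end, int chunk)
{
    long work[NTHREADS] = {0};
    long n = (end - start) / 2 + 1;

    for (long k = 0; k < n; k++) {
        int val = start + 2 * (int)k;
        int t = chunk > 0 ? OwnerChunked(k, chunk) : OwnerBlock(k, n);
        work[t] += TrialDivisions(val);
    }

    long total = 0, max = 0;
    for (int t = 0; t < NTHREADS; t++) {
        total += work[t];
        if (work[t] > max)
            max = work[t];
    }
    return total > 0 ? (double)max * NTHREADS / (double)total : 1.0;
}
```

With four threads over the odd numbers 1 to 199999, `Imbalance(1, 199999, 0)` should come out well above 1 because the thread holding the largest candidates does the most divisions, while `Imbalance(1, 199999, 8)` stays much closer to 1.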


That’s better


Speedup 1.92


A Finely Balanced threaded Program!


Scalability


Key Questions – Tune

•Is the threading running efficiently?

•Do my tasks do equal amounts of work?

•Is my application scalable?
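The speedup figures quoted in this demo are ratios of serial to parallel run time. For the scalability question, the usual companion measure is parallel efficiency. The definitions below are standard; the numeric line is only an illustration that assumes a two-core machine, since the slides do not state the core count.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}

% Illustration only, assuming p = 2 cores:
E_2 = \frac{1.92}{2} = 0.96
```

An efficiency that stays close to 1 as p grows indicates a scalable program; a falling efficiency points back to load imbalance or synchronisation overhead.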


Moving to Parallel – a view from some developers

Top 5 challenges

•Legacy

•Education

•Tools

•Fear of many cores

•Maintainability


Scalability http://paralleluniverse.intel.com


The Results


Without the printfs


A run of 10 million


Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning

(Serial → Parallel)

Tools (Existing Intel Software / Intel Parallel Studio)

•Architectural Analysis: Intel® VTune™ Performance Analyzer / Advisor, Amplifier

•Introducing Parallelism: Intel Compilers, Parallel Libraries / Composer

•Validating Correctness: Intel® Thread Checker / Inspector

•Performance Tuning: Intel® Thread Profiler / Amplifier


Thank you

intel.com/go/parallel


Q&A