6/27/13 | FB Computer Science | Scientific Computing | Michael Burger | 1 / 44

Intel Cluster Studio
HiPerCH April 2013

Michael Burger
FG Scientific Computing
TU Darmstadt
[email protected]


Agenda

● What is the Intel Cluster Studio?

● Planning with the help of Intel Advisor

● Optimizations with the help of Intel Compiler

● Finding errors with Intel Inspector

● Finding MPI problems: Intel Trace Analyzer

● Analyzing code with Vtune Amplifier

● Summary


What is the Intel Cluster Studio?

● Collection of different tools

● Is meant to cover the whole software development process

● Different packages for different platforms


What is the Intel Cluster Studio?

● Advisor: Help to parallelize code

● Composer: Compiler and libraries (no IDE)

● Inspector: Correctness checking

● Analyzer: Correctness checking (MPI)

● Amplifier: Parallel/serial tuning


Intel Advisor

● Threading assistant for C, C++, C#, and Fortran

● Is meant to guide the developer through the process of designing software

● Parallelize existing code

● Compare different alternatives before implementing them

● Helps in finding locks and points for synchronization

● Short demo in a few moments


Intel Advisor

www.intel.com


Intel Advisor


Intel (Parallel) Composer

● Contains compiler (icc/ifort)

● "Special" support for Intel processors

● "Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors."


Intel (Parallel) Composer

● Support of C++ and Fortran. Additionally contains:

– Template Library: Intel Threading Building Blocks (TBB)

– Library: Intel Integrated Performance Primitives (IPP)

– Library: Intel Math Kernel Library (MKL)

– OpenMP / Intel MPI Support


Intel (Parallel) Composer

● The wish / aim:

● Use as many existing libraries and as much automation as possible

www.intel.com


Intel (Parallel) Composer

● For Windows:

● Microsoft Visual Studio is a prerequisite for Composer

● Integration into the existing IDE (DEMO):


Intel (Parallel) Composer

● In Linux:

● Supports GNU tool chain

– Integration into Eclipse possible (tutorials online*)

– Demo

– Or use command line

* http://software.intel.com/en-us/articles/intel-c-compiler-for-linux-using-intel-compilers-with-the-eclipse-ide-pdf#installing


Intel (Parallel) Composer

● It may already help to change the compiler / optimize with the compiler:

– More performance through appropriate flags

– Indication of problematic parts by reports


Intel (Parallel) Composer

● Standard flags: -O3 / Ox

● -vec

● -parallel

● -ip / -ipo

● -fast


Intel (Parallel) Composer

Model                MSVC (2008)    ICC (11)
Venus (7200)         27.02 sec      19.76 sec
Fishes (20000)       52.36 sec      42.82 sec
Fertility (55000)    211.35 sec     186.66 sec


Intel (Parallel) Composer

● Reports

– e.g. vectorization report

● -vec-report3

– Optimization report

● Contains the vectorization report

● Inlining

● Loop unrolling / fusing

● ...
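As a minimal sketch of the kind of loop such a report talks about: the function below (the name `saxpy` and the file name `saxpy.c` are assumptions for illustration, not from the slides) is a typical vectorization candidate, because `restrict` promises that the two arrays do not overlap. With the Intel compilers of that generation, a command along the lines of `icc -O3 -vec-report3 -c saxpy.c` should report for each loop whether it was vectorized and, if not, why. The exact wording of the report depends on the compiler version.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * `restrict` tells the compiler that x and y never overlap, so there is
 * no loop-carried dependency and the loop can be vectorized. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Dropping the two `restrict` qualifiers is an easy way to see the report change, since the compiler then has to consider that `x` and `y` might alias.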


Intel (Parallel) Composer


DEMO: Intel (Parallel) Composer

● Show how to turn on reporting!

● Compile code as RELEASE

● Explain dependencies

● Remove them by reversing the loop

● Explain ivdep
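The demo code is not part of this transcript, so the following is only a hedged sketch of an assumed dependency and of `ivdep`. The function name `scale_shifted` and the `IVDEP` macro are hypothetical. The compiler cannot know the sign of the offset `k`, so it has to assume that an iteration might read a value written by an earlier one. `#pragma ivdep` (Intel) or `#pragma GCC ivdep` (GCC) is the programmer's promise that this does not happen.

```c
#include <stddef.h>

/* Portable spelling of "ignore assumed vector dependencies".
 * The classic Intel compilers accept `#pragma ivdep`, GCC accepts
 * `#pragma GCC ivdep`; elsewhere the macro expands to nothing. */
#if defined(__INTEL_COMPILER)
#  define IVDEP _Pragma("ivdep")
#elif defined(__GNUC__) && !defined(__clang__)
#  define IVDEP _Pragma("GCC ivdep")
#else
#  define IVDEP
#endif

/* Computes a[i] = a[i + k] * c for i in [0, m).
 * For k < 0 every iteration would depend on an earlier one (a true
 * dependency). The caller must guarantee k >= 0 and that a holds at
 * least m + k elements; only then is the pragma a valid promise. */
void scale_shifted(float *a, size_t m, ptrdiff_t k, float c)
{
    IVDEP
    for (size_t i = 0; i < m; ++i) {
        a[i] = a[(ptrdiff_t)i + k] * c;
    }
}
```

The pragma does not check anything. If the promise is wrong (here: a negative `k`), the vectorized loop can silently produce different results than the scalar one.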


Intel Inspector

● Memory error checker (DEMO)

– Leaks, corruption

● Threading checker

– Data races, deadlocks
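To illustrate the two memory error classes named above, here is a small sketch. The function names are hypothetical and the code is not taken from the demo. The comments mark the spots where a typical corruption bug and a typical leak would sit, which is the kind of problem a memory checker is meant to point out.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if the allocation fails.
 * A typical corruption bug is to allocate only strlen(s) bytes and then
 * write the terminating '\0' one byte past the end of the buffer. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;      /* +1 for the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Returns the length of a temporary copy of s, or 0 if allocation fails.
 * Without the free() below, every call would leak the copy. */
size_t copied_length(const char *s)
{
    char *copy = copy_string(s);
    if (copy == NULL) {
        return 0;
    }
    size_t n = strlen(copy);
    free(copy);                      /* omit this line and the block leaks */
    return n;
}
```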


Vtune Amplifier in use

1. Calculate eye ray

2. Test intersection with objects

3. Find nearest intersection point

4. Create reflection and shadow rays

5. Calculate local color

6. Repeat steps 2-5 with reflection rays


Intel Trace Analyzer and Collector

● Analyzing MPI behavior

● Search for bottlenecks, deadlocks, data corruption

● Debugging (call stack, debug information)

● Supports "Intel architecture based cluster systems"

● Only guaranteed to work with Intel MPI


Intel Trace Analyzer and Collector

● Analyzer example screenshot:


Intel Vtune Amplifier

● Supports C, C++, C#, Fortran, assembly, and Java

● Serial and parallel tuning

● Sampling-based

● A normal build with -g can be analyzed

→ Use the release build

● Graphical or command-line execution

● Low overhead → realistic run times and results


Intel Vtune Amplifier

● Workflow for optimizing

● Different types of analyses (see next slides)

● Questions:

– Why optimize?

– What should be optimized?

– What is the aim of optimizing?

– How to optimize?


Intel Vtune Amplifier

● Use top-down-method

● Amount of work increases with the depth of the level

● Additional aims

– Portability

– Code readability

– Maintainability

– Reliability


Intel Vtune Amplifier

            Single Thread    183 Threads    244 Threads & Vectorized
Xeon Phi    5699.55 s        84.622 s       18.664 s

● Example: Diffusion Simulation from Maruyama

● Scale and Vectorize

● Example: 9-point stencil image blur filter

            Single Thread    1 Thread & Vectorized    122 Threads & Vectorized
Xeon Phi    2838.342 s       623.302 s                8.772 s
Xeon        244.178 s        186.585 s                43.862 s

Taken from: “Intel Xeon Phi Coprocessor High-Performance Programming”
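To put the run times above into perspective, the speedup is simply the baseline time divided by the optimized time. The helper below is a trivial sketch (the name `speedup` is hypothetical) that makes this arithmetic explicit.

```c
/* Speedup of an optimized run relative to a baseline run,
 * both given as wall-clock times in seconds. */
double speedup(double baseline_seconds, double optimized_seconds)
{
    return baseline_seconds / optimized_seconds;
}
```

For the first Xeon Phi row, `speedup(5699.55, 84.622)` is roughly 67 for 183 threads, and `speedup(5699.55, 18.664)` is roughly 305 for 244 threads with vectorization, so vectorizing contributes far more than the additional threads alone.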


Intel Vtune Amplifier

● Depends on underlying architecture

● Our new cluster: Sandy Bridge


Intel Vtune Amplifier


Vtune Amplifier in use

● Intel delivers code samples with the Amplifier

● Considered here:

– The Tachyon ray tracer

– Matrix-matrix multiplication

● The samples include intentional errors / problems

● Small changes have a big impact on performance


Vtune Amplifier: DEMO

● DEMO

– Ray tracer with hotspots and locks

– Matrix multiplication and last-level cache (LLC) misses

● Also on Xeon Phi

● Hotspots / Assembler

● (Hardware counters)


Vtune Amplifier: DEMO

● Xeon Phi also supported

● 240 Threads, 60 cores, 8 GB GDDR5 RAM, PCI-E 2.0


Vtune Amplifier: DEMO

● First compile the program and copy it to the Phi


Vtune Amplifier: DEMO

● Run in native mode:


Vtune Amplifier: DEMO

● Set up the project correctly:


Vtune Amplifier: DEMO

● Set up the project correctly:


Vtune Amplifier: DEMO

● Start the correct analysis type

● The sampling driver must be loaded and configured correctly


Vtune Amplifier: DEMO

● Look at the results

● Summary

● Tree View

● Clock ticks per instruction


Vtune Amplifier: DEMO

● Look in the code:


Vtune Amplifier: DEMO

● Change the code:


Vtune Amplifier: DEMO

● Change the code:


Vtune Amplifier: DEMO

● Test if problem is resolved:


Vtune Amplifier in use

● Approach for investigating the problems in the matrix-matrix multiplication (MMM):

– Hardware counters

● Count certain "events":

– Cache / memory accesses

– Use of INT / FP units

– SIMD instructions

– ...

→ must be supported by hardware!
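The sample code from the demo is not reproduced in this transcript. As a hedged illustration of the kind of problem cache-miss counters tend to expose in a matrix-matrix multiplication, the sketch below contrasts a naive loop order with an interchanged one. The function names are hypothetical, and loop interchange is only one common fix, not necessarily the one shown in the demo.

```c
#include <stddef.h>

/* c = a * b for n x n row-major matrices, naive i-j-k order.
 * The inner loop walks b column-wise (stride n), so for large n almost
 * every access to b touches a different cache line. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Same result with i-k-j order: the inner loop now walks both b and c
 * row-wise (stride 1), which is friendlier to caches and to the
 * vectorizer. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            c[i * n + j] = 0.0;
        }
        for (size_t k = 0; k < n; ++k) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both functions perform the same arithmetic, only in a different order. Whether the second one is actually faster on a given machine is exactly the question a hardware-counter analysis can answer.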


Summary

● Cluster Studio is an extensive collection of tools

● Assists in many parts of the software development process

● However, several pitfalls remain

● Only minimal examples covered here


The End :)

Thanks for your attention!