Page 1

Using Intel® VTune™ Amplifier XE and Inspector XE in .NET environment

Levent Akyil

Technical Computing, Analyzers and Runtime

Software and Services group

Page 2

Refresher - Intel® VTune™ Amplifier XE Intel® Inspector XE

Page 3

Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Intel® VTune™ Amplifier XE Performance Profiler

Where is my application…

Spending Time?

• Focus tuning on functions taking time

• See call stacks

• See time on source

Wasting Time?

• See cache misses on your source

• See functions sorted by # of cache misses

Waiting Too Long?

• See locks by wait time

• Red/Green for CPU utilization during wait

• Windows & Linux • Low overhead • No special recompiles

Advanced Profiling For Scalable Multicore Performance

Page 4

Intel® VTune™ Amplifier XE

Tune Applications for Scalable Multicore Performance

Fast, Accurate Performance Profiles

• Hotspot (Statistical call tree)

• Hardware-Event Based Sampling

Thread Profiling

• Visualize thread interactions on timeline

• Balance workloads

Compatible

• Microsoft, GCC, Intel compilers

• C/C++, Fortran, Assembly, .NET

• Latest Intel® processors and compatible processors¹

Windows or Linux

• Visual Studio Integration (Windows)

• Standalone user interface and command line

• 32 and 64-bit

¹ IA32 and Intel® 64 architectures. Many features work with compatible processors. Event based sampling requires a genuine Intel® Processor.

Page 5

Intel® VTune™ Amplifier XE

Easy Predefined Performance Profiles

Quickly Select an Analysis Type

(1) Click New Analysis

(2) Select an Analysis Type

Hotspots – Which functions use the most time? Click [+] for the call stack. Double click to see the source.

Concurrency – Colors show the number of cores used. Add parallelism for hotspots with poor concurrency.

Locks and Waits – Waiting a long time on a lock is bad if the cores are underutilized during the wait.


Page 6

Intel® VTune™ Amplifier XE

Powerful EBS Made Easier

System Wide Event Based Sampling (EBS) uses the on chip PMU to count performance events like cache misses, clock ticks and instructions retired.

Every Intel® Processor has an on chip Performance Monitoring Unit (PMU).

Predefined EBS Profiles: Easy EBS setup for newer processors. No memorizing complex event names. Profiles vary by microarchitecture. (Full custom profiles also available.)

Opportunities Highlighted: General Exploration turns the cell pink when it suspects a tuning opportunity is present. Hover gives suggestions.

Pinpoint tuning opportunities: See opportunities like cache misses. View results on the timeline, in the grid view or on your source.
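As a rough illustration of how event counts such as clock ticks and instructions retired can be turned into a tuning signal, the sketch below computes cycles per instruction (clock ticks divided by instructions retired) for each function. The function names, the counts, and the threshold are invented for illustration; this is not VTune Amplifier XE output or its API.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Hypothetical per-function totals estimated from sampled PMU events."""
    function: str
    clockticks: int
    instructions_retired: int


def cpi(counts: EventCounts) -> float:
    """Cycles per instruction: clock ticks divided by instructions retired."""
    if counts.instructions_retired == 0:
        return float("inf")
    return counts.clockticks / counts.instructions_retired


def flag_high_cpi(rows, threshold=1.0):
    """Return (function, CPI) pairs above the threshold, worst first.

    The threshold is an arbitrary illustrative value, not a tool default.
    """
    flagged = [(r.function, cpi(r)) for r in rows if cpi(r) > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Invented sample data.
rows = [
    EventCounts("multiply", clockticks=9_000, instructions_retired=3_000),
    EventCounts("parse", clockticks=2_000, instructions_retired=4_000),
    EventCounts("copy", clockticks=5_000, instructions_retired=2_500),
]
high = flag_high_cpi(rows)
```

A high ratio only suggests where to look; the source view and the specific events (for example cache misses) are still needed to explain why a function stalls.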

Page 7

Intel® VTune™ Amplifier XE

See Profile Data On Source / Asm

Double Click from Grid or Timeline

Time on Source / Asm

Quickly scroll to hot spots. Scroll Bar “Heat Map” is an overview of hot spots

Click jump to scroll Asm

Quick Asm navigation: Select source to highlight Asm

Right click for instruction reference manual

Page 8

Intel® VTune™ Amplifier XE

Filter Data - Get Actionable Information

Frame Analysis – Analyze Long Latency Activity

Frame: a region executed repeatedly

API marks start and finish

Examples:

• for and while loops

• Game – Compute next graphics frame

• Simulator – Time step loop

• Computation – Convergence loop
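To make the idea of marking a frame's start and finish concrete, here is a minimal sketch in which each iteration of a time step loop is wrapped in a marker that records its duration. `FrameRecorder` and `simulate` are hypothetical names made up for this example; they only mimic the concept and are not the frame API shipped with the tool.

```python
import time
from contextlib import contextmanager


class FrameRecorder:
    """Hypothetical stand-in for a frame-marking API: one duration per frame."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.durations = []

    @contextmanager
    def frame(self):
        start = self._clock()  # mark the start of the frame
        try:
            yield
        finally:
            # mark the finish of the frame and store its duration
            self.durations.append(self._clock() - start)


def simulate(recorder, steps):
    """A time step loop in which each iteration is treated as one frame."""
    state = 0.0
    for step in range(steps):
        with recorder.frame():
            state += sum(i * 0.5 for i in range(1_000 * (step + 1)))
    return state


recorder = FrameRecorder()
simulate(recorder, steps=5)
```

After the run, `recorder.durations` holds one entry per iteration, which is the raw material for spotting unusually slow frames.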

Page 9

Intel® VTune™ Amplifier XE

Find Slow Frames With One Click

(1) Regroup Data

Result: … (Partial list shown)

Page 10

Intel® VTune™ Amplifier XE

Algorithmic Analysis – Frame Analysis

[Figure: frames / iterations grouped as Fast, Good, or Slow]

Page 11

Slow functions in slow frames

Just 2 more clicks show where to focus tuning…

(1) Only show slow frames

(2) Regroup: Show functions

Result: Functions taking a lot of time in slow frames
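The two steps on this slide (keep only the slow frames, then regroup by function) can be sketched as a small aggregation. The sample layout, the function names, and the 33 ms threshold below are assumptions chosen for illustration, not the tool's data model.

```python
from collections import defaultdict


def slow_frame_hotspots(samples, frame_times, slow_threshold):
    """Sum sampled time per function, counting only samples from slow frames.

    samples: iterable of (frame_id, function_name, seconds), a hypothetical layout.
    frame_times: dict mapping frame_id to the frame's total duration in seconds.
    """
    # Step (1): only keep slow frames.
    slow = {fid for fid, t in frame_times.items() if t > slow_threshold}
    # Step (2): regroup the remaining samples by function.
    totals = defaultdict(float)
    for frame_id, function, seconds in samples:
        if frame_id in slow:
            totals[function] += seconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Invented data: frames 1 and 3 exceed the threshold.
frame_times = {0: 0.010, 1: 0.045, 2: 0.012, 3: 0.050}
samples = [
    (0, "render", 0.006), (0, "physics", 0.004),
    (1, "render", 0.010), (1, "load_texture", 0.035),
    (2, "render", 0.007), (2, "physics", 0.005),
    (3, "render", 0.010), (3, "load_texture", 0.040),
]
hotspots = slow_frame_hotspots(samples, frame_times, slow_threshold=0.033)
```

In this made-up data `render` takes the most time overall, but `load_texture` dominates the slow frames, which is the kind of difference the filter is meant to expose.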

Page 12

Profile a Running Application

No need to stop and re-launch the app when profiling

Two Techniques:

Attach to Process:

• Hotspot

• Concurrency

• Locks & Waits

Profile System:

• Lightweight Hotspots

• Advanced & Custom EBS

• Optional: Filter by process after collection

Page 13

Intel® VTune™ Amplifier XE

Compare Results Quickly - Sort By Difference

Quickly identify cause of regressions.

• Run a command line analysis daily

• Identify the function responsible so you know who to alert

Compare 2 optimizations – What improved?

Compare 2 systems – What didn’t speed up as much?
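The comparison workflow above amounts to joining two result sets by function and sorting by the difference. The sketch below assumes each result has already been exported as a mapping from function name to CPU seconds; that format and the names in the sample data are hypothetical.

```python
def compare_profiles(baseline, current):
    """Return (function, baseline_s, current_s, delta_s) rows, largest slowdown first.

    Both arguments are dicts mapping function name to CPU seconds; a function
    missing from one run is treated as taking 0 seconds there.
    """
    names = set(baseline) | set(current)
    rows = [
        (name, baseline.get(name, 0.0), current.get(name, 0.0),
         current.get(name, 0.0) - baseline.get(name, 0.0))
        for name in names
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Invented results from two runs of the same workload.
baseline = {"parse": 2.0, "solve": 5.0, "write": 1.0}
current = {"parse": 2.0, "solve": 8.5, "write": 0.5, "log": 0.25}
report = compare_profiles(baseline, current)
```

Run daily against a stored baseline, the first row of such a report points at the function most likely responsible for a regression.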

Page 14

Windows & Linux Versions Available

Stand-alone GUI, Command line, Visual Studio Integration

Microsoft Windows* OS

• Windows XP*, Windows Vista*, Windows 7*

• Windows Server* 2003, 2008

• Integration with Microsoft Visual Studio* 2005, 2008 and 2010

• Standalone GUI and command line

• IA32 and Intel® 64

Linux* OS

• RHEL*, Fedora*, SUSE*, CentOS*, Ubuntu*

• Additional distributions may also work

• Standalone GUI and command line

• IA32 and Intel® 64

Single user and floating licenses available

Page 15

Refresher Intel® VTune™ Amplifier XE Intel® Inspector XE

Page 16

Intel® Inspector XE

Move correctness analysis earlier in the design cycle

Where are my application’s…

Memory Errors

• Invalid Accesses

• Memory Leaks

• Uninitialized Memory Accesses

Threading Errors

• Races

• Deadlocks

• Cross Stack References

Security Errors

• Buffer overflows and underflows

• Incorrect pointer usage

• Over 250 error types…

• Workflow for developers • Multiple tools – common interface • Windows* & Linux*

"Having such a tool this early in the development stage frees the validation from trivial bug reports and gives our engineers the opportunity to code more efficiently from the very beginning of the product cycle."

Jean Kypreos, Advanced Video Processing Team Manager, Envivio

Developer friendly tools help you find errors earlier

Page 17

Dynamic Analysis

Detects memory and threading errors

• Memory Errors

– Invalid Memory Accesses

– Memory Leaks

– Uninitialized Memory Accesses

– Improper usage of Memory APIs

– Resource Leaks (Windows only)

• Threading Errors

– Data Races

– Deadlock/Lock Hierarchy Violation

– Cross Stack Memory Accesses

• Use your normal build & compiler (dynamic binary instrumentation)

• Analyze DLLs/SOs (source optional)

• Runs threaded if app threaded

• Requires a workload (app is run)

• 32 and 64-bit OSs

• API for custom memory allocators

• Easy user interface + command line
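To illustrate what a lock hierarchy violation looks like in an execution trace, the sketch below records which locks are taken while others are held and reports any pair that was acquired in both orders. This is a simplified textbook check that only catches two-lock inversions; the event layout is an assumption, and it is not a description of how Inspector XE is implemented.

```python
from collections import defaultdict


def lock_order_violations(acquisitions):
    """Find pairs of locks that were acquired in both orders.

    acquisitions: iterable of (thread, held_locks, acquired_lock), where
    held_locks is the tuple of locks the thread already held. This event
    layout is a hypothetical simplification.
    """
    order = defaultdict(set)  # order[a] contains b if b was taken while holding a
    for _thread, held, acquired in acquisitions:
        for h in held:
            order[h].add(acquired)
    violations = set()
    for a, successors in order.items():
        for b in successors:
            # b was taken under a; a violation exists if a was also taken under b.
            if a in order.get(b, ()):
                violations.add(tuple(sorted((a, b))))
    return sorted(violations)


# Invented trace: T1 takes A then B, T2 takes B then A.
events = [
    ("T1", (), "A"),
    ("T1", ("A",), "B"),
    ("T2", (), "B"),
    ("T2", ("B",), "A"),
]
inversions = lock_order_violations(events)
```

The run itself does not have to deadlock for the inversion to be reported, which matches the point that the workload must exercise the path but need not trigger the failure.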

Page 18

Static Analysis

Detects over 250 different kinds of errors and security risks

Coding Errors (partial list)

• Memory and resource leaks

• Incorrect OpenMP*/Intel® Cilk™ directives

• Pointer and array errors

Security Errors (partial list)

• Buffer overflows and underflows

• Uninitialized variables and objects

• Incorrect pointer usage

• Misuse of string, memory formatting libs

• Global analysis crosses subroutine and file boundaries

• No compiler change required

– Existing compiler for code generation

– Intel compiler front end for static analysis

• No workload required (app is not run)

• Fast (compared to dynamic analysis)

• Every developer can run SSA (no central server, it is like a regular build)

• Easy user interface + cmd line

Static Analysis is included in Parallel Studio XE studio bundles. It is not sold separately.

Page 19

Static & Dynamic Analysis Complement Each Other

Dynamic Analysis

• Memory & Threading Errors

• Slow (1x – 20x - 100x workload)

• Workload must exercise path (does not need to cause a program error)

• Fewer false errors – only on real paths

• No source required – check DLLs/SOs

• Use your normal compiler

• Use your current build - No rebuild (debug build with symbols recommended)

Static Analysis

• Memory, Code & Security Errors

• Fast (app is not run)

• All paths checked

• More errors – we rank by risk

• Source required

• No compiler change required - Existing compiler for code generation - Intel compiler front end for static analysis

• No central server to set up

• Just create a build for static analysis - Auto setup available in Visual Studio†

Both reduce total lifecycle costs

† Requires Parallel Studio XE SP1

Page 20

Intel® VTune™ Amplifier XE and Inspector XE

Current .NET support

• Support the analysis of pure .NET applications as well as “mixed” applications that contain both managed and unmanaged code

Inspector XE

• detects potential deadlocks and data races in .NET programs

• tracks object allocations and accesses to shared memory on the garbage-collected heap and in the static data areas

• flags unsynchronized accesses (at least one of which is a write operation) by multiple threads to the same object/class data member as a potential data race
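The data race condition stated on this slide (accesses to the same member from multiple threads, unsynchronized, at least one of them a write) can be written down as a small check over an access log. The sketch treats two accesses as synchronized when they share a held lock, which is a lockset-style simplification assumed for this example rather than the tool's actual algorithm; the member names are invented.

```python
from collections import defaultdict
from itertools import combinations


def potential_races(accesses):
    """Flag members touched by two threads with no common lock and at least one write.

    accesses: iterable of (thread, member, is_write, locks_held) tuples, where
    locks_held is a frozenset of lock names. A hypothetical event layout.
    """
    by_member = defaultdict(list)
    for thread, member, is_write, locks in accesses:
        by_member[member].append((thread, is_write, locks))
    racy = set()
    for member, events in by_member.items():
        for (t1, w1, l1), (t2, w2, l2) in combinations(events, 2):
            different_threads = t1 != t2
            at_least_one_write = w1 or w2
            unsynchronized = not (l1 & l2)  # no lock held in common
            if different_threads and at_least_one_write and unsynchronized:
                racy.add(member)
    return sorted(racy)


# Invented access log.
accesses = [
    ("T1", "Account.balance", True, frozenset()),
    ("T2", "Account.balance", False, frozenset()),
    ("T1", "Cache.hits", True, frozenset({"cache_lock"})),
    ("T2", "Cache.hits", True, frozenset({"cache_lock"})),
    ("T1", "Config.name", False, frozenset()),
    ("T2", "Config.name", False, frozenset()),
]
races = potential_races(accesses)
```

Only `Account.balance` is reported: `Cache.hits` is always written under the same lock, and `Config.name` is only read.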

Page 21

Intel® VTune™ Amplifier XE and Inspector XE

Current .NET support

• Support the analysis of pure .NET applications as well as “mixed” applications that contain both managed and unmanaged code

VTune™ Amplifier XE helps developers

• fine-tune serial and parallel applications for optimal performance, whether pure .NET or mixed

• analyze and visualize the work distribution between threads as well as thread synchronization points

• identify work distribution problems and excessive thread synchronization that prevents parallel execution

• identify micro-architectural performance issues and architectural bottlenecks on a given Intel® processor

Page 22

Intel® VTune™ Amplifier XE and Inspector XE

Tune applications that mix C++ and C#

• See analysis results on your C# and C++ source

• Results mapped to your symbols

• Mixed mode stack walking (View stacks that mix C++ and C#)

• .NET* 2.0, 3.0, 4.0

• Tuning bottlenecks in pure C# may also require additional profiling tools to analyze things like garbage collection

The new .NET 4.0 sync APIs and task APIs are not supported. ASP.NET is not supported.

Page 23

Demo - Intel® VTune™ Amplifier XE Intel® Inspector XE

Page 24

Upcoming features VTune™ Amplifier XE 2013

Page 25

VTune™ Amplifier XE 2013 Beta Features

Brand new 2013 features

– Windows 8 and Visual Studio* 11

– Intel® MIC Support

– Event-based Sampling with Stacks

– Statistical Call Count collection

– Task Analysis

– “Total time” analysis in command line

– gprof-compatible report

– Java* Support

– OpenCL* support

Previously released new features

– Inline function support

– Bandwidth analysis

Page 26

Hardware Event-based Sampling (EBS) with Call Stacks

• When collecting EBS data, turn on the “Collect stacks” box

• Call stacks for the EBS data will be collected

Page 27

Statistical Call Count collection

• Statistical call count collection is supported in the 2013 beta version

Page 28

Page 29

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Legal Disclaimer & Optimization Notice
