Using Intel® VTune™ Amplifier XE
and Inspector XE in a .NET environment
Levent Akyil Technical Computing, Analyzers and Runtime
Software and Services group
Refresher - Intel® VTune™ Amplifier XE Intel® Inspector XE
Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Intel® VTune™ Amplifier XE Performance Profiler
Where is my application…
Spending Time?
• Focus tuning on functions taking time
• See call stacks
• See time on source
Wasting Time?
• See cache misses on your source
• See functions sorted by # of cache misses
Waiting Too Long?
• See locks by wait time
• Red/Green for CPU utilization during wait
• Windows & Linux
• Low overhead
• No special recompiles
Advanced Profiling For Scalable Multicore Performance
Intel® VTune™ Amplifier XE
Tune Applications for Scalable Multicore Performance
Fast, Accurate Performance Profiles
• Hotspot (Statistical call tree)
• Hardware-Event Based Sampling
Thread Profiling
• Visualize thread interactions on timeline
• Balance workloads
Compatible
• Microsoft, GCC, Intel compilers
• C/C++, Fortran, Assembly, .NET
• Latest Intel® processors and compatible processors (1)
Windows or Linux
• Visual Studio Integration (Windows)
• Standalone user i/f and command line
• 32 and 64-bit
(1) IA32 and Intel® 64 architectures. Many features work with compatible processors. Event based sampling requires a genuine Intel® Processor.
Intel® VTune™ Amplifier XE
Easy Predefined Performance Profiles
Quickly Select an Analysis Type
1. Click New Analysis
2. Select an Analysis Type
Hotspots – Which functions use the most time? Click [+] for the call stack. Double click to see the source.
Concurrency – Colors show the number of cores used. Add parallelism for hotspots with poor concurrency.
Locks and Waits – Waiting a long time on a lock is bad if the cores are underutilized during the wait.
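To make the Hotspots idea concrete, here is a minimal sketch of how sampled call stacks can be turned into a per-function time estimate. It is an illustration only, not how VTune Amplifier XE is implemented; the `hotspots` function, the sample data and the 10 ms sampling interval are all assumptions made for this example.

```python
from collections import Counter


def hotspots(samples, interval_ms=10.0):
    """Estimate self and total time per function from sampled call stacks.

    samples: list of call stacks, each ordered from the outermost caller
    to the function that was running when the sample was taken.
    interval_ms: assumed time between samples (hypothetical value).
    """
    self_time = Counter()
    total_time = Counter()
    for stack in samples:
        if not stack:
            continue
        # The innermost frame was on the CPU: charge it self time.
        self_time[stack[-1]] += interval_ms
        # Each distinct function on the stack is charged total time once.
        for func in set(stack):
            total_time[func] += interval_ms
    # Sort by self time, highest first, like a hotspot list.
    ordered = sorted(total_time, key=lambda f: (-self_time[f], f))
    return [(f, self_time[f], total_time[f]) for f in ordered]


# Hypothetical samples: four stacks captured at a fixed interval.
samples = [
    ["main", "render", "blend"],
    ["main", "render", "blend"],
    ["main", "render"],
    ["main", "load"],
]

for func, self_ms, total_ms in hotspots(samples):
    print(f"{func:8s} self={self_ms:5.1f} ms  total={total_ms:5.1f} ms")
```

Self time answers "which functions use the most time?", while total time corresponds to what you see when you expand the call stack.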
Intel® VTune™ Amplifier XE
Powerful EBS Made Easier
System Wide Event Based Sampling (EBS) uses the on chip PMU to count performance events like cache misses, clock ticks and instructions retired.
Every Intel® Processor has an on chip Performance Monitoring Unit (PMU).
Predefined EBS Profiles: Easy EBS setup for newer processors. No memorizing complex event names. Profiles vary by microarchitecture. (Full custom profiles also available)
Opportunities Highlighted: General Exploration turns the cell pink when it suspects a tuning opportunity is present. Hover gives suggestions.
Pinpoint tuning opportunities: See opportunities like cache misses. View results on the timeline, in the grid view or on your source.
Double Click from Grid or Timeline
See Profile Data On Source / Asm
Time on Source / Asm
Quickly scroll to hot spots. Scroll Bar “Heat Map” is an overview of hot spots
Click jump to scroll Asm
Quick Asm navigation: Select source to highlight Asm
Right click for instruction reference manual
Intel® VTune™ Amplifier XE
Intel® VTune™ Amplifier XE
Filter Data - Get Actionable Information
Frame Analysis – Analyze Long Latency Activity
Frame: a region executed repeatedly
API marks start and finish
Examples:
• for and while loops
• Game – Compute next graphics frame
• Simulator – Time step loop
• Computation – Convergence loop
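As a rough sketch of the frame concept, the code below marks the start and finish of each iteration with a timer and then labels each frame as fast, good or slow. The timer calls are only a stand-in for the tool's frame API, and the function names and thresholds (`fast_below_ms`, `slow_above_ms`) are hypothetical values chosen for this example.

```python
import time


def time_frames(step, count):
    """Run step(i) count times and return each frame's duration in ms."""
    durations = []
    for i in range(count):
        start = time.perf_counter()  # mark frame start
        step(i)
        durations.append((time.perf_counter() - start) * 1000.0)  # mark finish
    return durations


def classify_frames(durations_ms, fast_below_ms, slow_above_ms):
    """Label each frame duration as 'fast', 'good' or 'slow'."""
    labels = []
    for duration in durations_ms:
        if duration > slow_above_ms:
            labels.append("slow")
        elif duration < fast_below_ms:
            labels.append("fast")
        else:
            labels.append("good")
    return labels


# Fixed example durations so the output is reproducible.
example = [12.0, 16.0, 40.0, 15.0, 35.0]
print(classify_frames(example, fast_below_ms=14.0, slow_above_ms=33.0))

# Timing a real loop body works the same way; durations vary per run.
measured = time_frames(lambda i: sum(range(10_000)), count=3)
print(len(measured), "frames timed")
```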
Intel® VTune™ Amplifier XE
Find Slow Frames With One Click
(1) Regroup Data
… (Partial list shown)
Intel VTune Amplifier XE Algorithmic Analysis – Frame Analysis
[Figure: frames / iterations, classified as Fast, Good or Slow]
Slow functions in slow frames
Just 2 more clicks show where to focus tuning:
(1) Only show slow frames
(2) Regroup: Show functions
Result: Functions taking a lot of time in slow frames
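The two clicks above amount to a filter followed by a regroup. The sketch below shows the same idea on hypothetical data; the sample tuples, function names and the 33 ms threshold are invented for illustration and do not reflect the tool's internal data model.

```python
from collections import defaultdict


def slow_frame_functions(samples, slow_above_ms):
    """Return (function, ms) pairs for time spent inside slow frames only.

    samples: list of (frame_id, function, cpu_ms) tuples.
    A frame is slow when its summed time exceeds slow_above_ms.
    """
    frame_time = defaultdict(float)
    for frame_id, _func, ms in samples:
        frame_time[frame_id] += ms

    # Step 1: only keep slow frames.
    slow = {f for f, total in frame_time.items() if total > slow_above_ms}

    # Step 2: regroup the remaining samples by function.
    func_time = defaultdict(float)
    for frame_id, func, ms in samples:
        if frame_id in slow:
            func_time[func] += ms

    return sorted(func_time.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical per-frame samples.
samples = [
    (0, "physics", 4.0), (0, "render", 8.0),
    (1, "physics", 5.0), (1, "render", 9.0), (1, "load_texture", 30.0),
    (2, "physics", 4.0), (2, "render", 7.0),
    (3, "render", 10.0), (3, "load_texture", 25.0), (3, "physics", 5.0),
]

for func, ms in slow_frame_functions(samples, slow_above_ms=33.0):
    print(f"{func:14s} {ms:5.1f} ms")
```

In this made-up data, `render` has the most time overall, but `load_texture` dominates the slow frames, which is the kind of distinction the filter is meant to surface.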
Profile a Running Application
No need to stop and re-launch the app when profiling
Two Techniques:
Attach to Process:
• Hotspot
• Concurrency
• Locks & Waits
Profile System:
• Lightweight Hotspots
• Advanced & Custom EBS
• Optional: Filter by process after collection
Intel® VTune™ Amplifier XE
Compare Results Quickly - Sort By Difference
Quickly identify cause of regressions.
• Run a command line analysis daily
• Identify the function responsible so you know who to alert
Compare 2 optimizations – What improved?
Compare 2 systems – What didn’t speed up as much?
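A minimal sketch of "sort by difference", assuming each result has already been reduced to a mapping from function name to CPU time in seconds. The `compare_results` helper and the numbers are hypothetical; the tool's own comparison view works on full result files.

```python
def compare_results(baseline, current):
    """Return (function, baseline_s, current_s, diff_s) rows, biggest regression first."""
    rows = []
    for func in set(baseline) | set(current):
        before = baseline.get(func, 0.0)  # function absent from a run counts as 0
        after = current.get(func, 0.0)
        rows.append((func, before, after, after - before))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


# Hypothetical per-function CPU times from two daily runs.
baseline = {"parse": 1.20, "solve": 3.50, "write": 0.40}
current = {"parse": 1.25, "solve": 5.00, "write": 0.40, "log": 0.30}

for func, before, after, diff in compare_results(baseline, current):
    print(f"{func:6s} {before:5.2f} -> {after:5.2f}  ({diff:+.2f} s)")
```

Sorting by the difference rather than by absolute time puts the function responsible for a regression at the top, even if it is not the largest hotspot.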
Windows & Linux Versions Available
Stand-alone GUI, Command line, Visual Studio Integration
Microsoft Windows* OS
• Windows XP*, Windows Vista*, Windows 7*
• Windows Server* 2003, 2008
• Integration with Microsoft Visual Studio* 2005, 2008 and 2010
• Standalone GUI and command line
• IA32 and Intel® 64
Linux* OS
• RHEL*, Fedora*, SUSE*, CentOS*, Ubuntu*
• Additional distributions may also work
• Standalone GUI and command line
• IA32 and Intel® 64
Single user and floating licenses available
Refresher Intel® VTune™ Amplifier XE Intel® Inspector XE
Where are my application’s…
Memory Errors
• Invalid Accesses
• Memory Leaks
• Uninitialized Memory Accesses
Threading Errors
• Races
• Deadlocks
• Cross Stack References
Security Errors
• Buffer overflows and underflows
• Incorrect pointer usage
• Over 250 error types…
Intel® Inspector XE
Move correctness analysis earlier in the design cycle
• Workflow for developers
• Multiple tools – common i/f
• Windows* & Linux*
"Having such a tool this early in the development stage frees the validation from trivial bug reports and gives our engineers the opportunity to code more efficiently from the very beginning of the product cycle."
Jean Kypreos, Advanced Video Processing Team Manager, Envivio
Developer friendly tools help you find errors earlier
Dynamic Analysis
Detects memory and threading errors
• Memory Errors
  – Invalid Memory Accesses
  – Memory Leaks
  – Uninitialized Memory Accesses
  – Improper usage of Memory APIs
  – Resource Leaks (Windows only)
• Threading Errors
  – Data Races
  – Deadlock/Lock Hierarchy Violation
  – Cross Stack Memory Accesses
• Use your normal build & compiler (dynamic binary instrumentation)
• Analyze DLLs/SOs (source optional)
• Runs threaded if app threaded
• Requires a workload (app is run)
• 32 and 64-bit OSs
• API for custom mem. allocators
• Easy user interface + cmd line
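To illustrate what a lock hierarchy violation looks like, here is a simplified sketch that scans recorded lock events and reports pairs of locks taken in opposite orders by different threads. It is not Inspector XE's algorithm: the trace format and the `lock_order_violations` helper are assumptions for this example, it only finds two-lock inversions, and it expects well-formed traces where every release matches an earlier acquire.

```python
def lock_order_violations(traces):
    """Find lock pairs acquired in opposite orders.

    traces: dict mapping thread name -> list of (op, lock) events,
    where op is "acquire" or "release".
    Returns sorted (lock_a, lock_b, thread_ab, thread_ba) tuples.
    """
    # (first, second) -> a thread that acquired `second` while holding `first`
    order = {}
    for thread, events in traces.items():
        held = []
        for op, lock in events:
            if op == "acquire":
                for outer in held:
                    order.setdefault((outer, lock), thread)
                held.append(lock)
            else:
                held.remove(lock)

    violations = []
    for (a, b), thread_ab in order.items():
        # Report each inverted pair once, keyed on the smaller name first.
        if a < b and (b, a) in order:
            violations.append((a, b, thread_ab, order[(b, a)]))
    return sorted(violations)


# Hypothetical trace: T1 takes A then B, T2 takes B then A.
traces = {
    "T1": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
    "T2": [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")],
}

print(lock_order_violations(traces))
```

The run that produced this trace did not have to deadlock for the inversion to be reported, which matches the point that the workload must exercise the path but need not trigger the failure.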
Static Analysis
Detects over 250 different kinds of errors and security risks
Coding Errors (partial list)
• Memory and resource leaks
• Incorrect OpenMP*/Intel® Cilk™ directives
• Pointer and array errors
Security Errors (partial list)
• Buffer overflows and underflows
• Uninitialized variables and objects
• Incorrect pointer usage
• Misuse of string, memory formatting libs
• Global analysis crosses subroutine and file boundaries
• No compiler change required
  – Existing compiler for code generation
  – Intel compiler front end for static analysis
• No workload required (app is not run)
• Fast (compared to dynamic analysis)
• Every developer can run static security analysis (SSA); there is no central server, and it works like a regular build
• Easy user interface + cmd line
Static Analysis is included in Parallel Studio XE studio bundles. It is not sold separately.
Static & Dynamic Analysis Complement Each Other
Dynamic Analysis
• Memory & Threading Errors
• Slow (1x – 20x – 100x workload)
• Workload must exercise path (does not need to cause a program error)
• Fewer false errors – only on real paths
• No source required – check DLLs/SOs
• Use your normal compiler
• Use your current build – No rebuild (debug build with symbols recommended)
Static Analysis
• Memory, Code & Security Errors
• Fast (app is not run)
• All paths checked
• More errors – we rank by risk
• Source required
• No compiler change required
  – Existing compiler for code generation
  – Intel compiler front end for static analysis
• No central server to set up
• Just create a build for static analysis – Auto setup available in Visual Studio†
Both reduce total lifecycle costs
† Requires Parallel Studio XE SP1
Intel® VTune™ Amplifier XE and Inspector XE
Current .NET support
• Support the analysis of pure .NET applications as well as “mixed” applications that contain both managed and unmanaged code
Inspector XE
• detects potential deadlocks and data races in .NET programs
• monitors object allocations and accesses to shared memory on the garbage-collected heap and in the static data areas
• flags unsynchronized accesses by multiple threads to the same object/class data member, at least one of which is a write operation, as a potential data race
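The data race condition described above (several threads, same data member, at least one write, no synchronization) can be written down as a small check over a recorded access log. This is a simplified sketch, not Inspector XE's detection algorithm: the log format, the member names and the `potential_races` helper are assumptions, and "unsynchronized" is approximated as "the two accesses share no lock", ignoring other ordering such as thread joins.

```python
from itertools import combinations


def potential_races(accesses):
    """Return data members with a potentially racy pair of accesses.

    accesses: list of (thread, member, is_write, locks_held) tuples,
    where locks_held is a set of lock names held during the access.
    """
    races = set()
    for first, second in combinations(accesses, 2):
        thread1, member1, write1, locks1 = first
        thread2, member2, write2, locks2 = second
        if thread1 == thread2 or member1 != member2:
            continue  # need two different threads touching the same member
        if not (write1 or write2):
            continue  # two reads never race
        if set(locks1) & set(locks2):
            continue  # a common lock serializes the two accesses
        races.add(member1)
    return sorted(races)


# Hypothetical access log for three data members.
accesses = [
    ("T1", "Counter.value", True, set()),
    ("T2", "Counter.value", False, set()),
    ("T1", "Cache.items", True, {"cache_lock"}),
    ("T2", "Cache.items", True, {"cache_lock"}),
    ("T1", "Config.name", False, set()),
    ("T2", "Config.name", False, set()),
]

print(potential_races(accesses))
```

Only `Counter.value` is reported here: `Cache.items` is written under a common lock, and `Config.name` is only ever read.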
Intel® VTune™ Amplifier XE and Inspector XE
Current .NET support
• Support the analysis of pure .NET applications as well as “mixed” applications that contain both managed and unmanaged code
VTune™ Amplifier XE helps developers
• fine-tune serial and parallel code for optimal performance in pure .NET or mixed applications
• analyze and visualize the work distribution between threads as well as thread synchronization points
• identify work distribution problems and excessive thread synchronization that prevents parallel execution
• identify micro-architectural performance issues and architectural bottlenecks on a given Intel® processor
Intel® VTune™ Amplifier XE and Inspector XE
Tune applications that mix C++ and C#
• See analysis results on your C# and C++ source
• Results mapped to your symbols
• Mixed mode stack walking (View stacks that mix C++ and C#)
• .NET* 2.0, 3.0, 4.0
• Tuning bottlenecks in pure C# may also require additional profiling tools to analyze things like garbage collection
New .NET 4.0 sync APIs and task APIs are not supported. ASP .NET is not supported.
Demo - Intel® VTune™ Amplifier XE Intel® Inspector XE
Upcoming features VTune™ Amplifier XE 2013
VTune™ Amplifier XE 2013 Beta Features
Brand new 2013 features
– Windows 8 and Visual Studio* 11
– Intel® MIC Support
– Event-based Sampling with Stacks
– Statistical Call Count collection
– Task Analysis
– “Total time” analysis in command line
– gprof-compatible report
– Java* Support
– OpenCL* support
Previously released new features
– Inline function support
– Bandwidth analysis
Hardware Event-based Sampling (EBS) with Call Stacks
• When collecting EBS data, turn on the “Collect stacks” box
• Call stacks for the EBS data will be collected
Statistical Call Count collection
• Statistical call count collection is supported in the 2013 beta version
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimer & Optimization Notice