

Visual C++ 2005 New Optimizations

Ayman Shoukry
Program Manager
Visual C++
Microsoft Corporation

How can your application run faster?

► Maximize optimization for each file.
► Whole Program Optimization (WPO) goes beyond individual files.
► Profile Guided Optimization (PGO) specializes optimizations specifically for your application.
► New Floating Point Model.
► OpenMP
► 64-bit Code Generation.

Maximum Optimization for Each File

► Compiler optimizes each source code file to get the best runtime performance
    The only type of optimization available in Visual C++ 6
► Visual C++ 2005 has better optimization algorithms
    Specialized support for newer processors such as Pentium 4
    Improved speed and better precision of floating point operations
    New optimization techniques like loop unrolling

Whole Program Optimization

► Typically Visual C++ will optimize programs by generating code for object files separately
► Introducing whole program optimization
    First introduced with Visual C++ 2002 and has since improved
    Compiler and linker set with new options (/GL and /LTCG)
    Compiler has freedom to do additional optimizations
        ► Cross-module inlining
        ► Custom calling conventions
    Visual C++ 2005 supports this on all platforms
    Whole program optimization is widely used for Microsoft products.
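As a rough illustration of cross-module inlining, the sketch below marks where two translation units would normally live. The file names (math_utils.cpp, main_loop.cpp) and the function names are hypothetical, and both parts appear in one listing only to keep the example self-contained. Compiled separately without /GL, the second file sees square() only as an external symbol; with /GL and /LTCG the optimizer may inline it at link time.

```cpp
#include <cassert>

// Sketch only: in a real project these two parts would be separate .cpp files.
// A possible command line (file names are hypothetical):
//   cl /O2 /GL math_utils.cpp main_loop.cpp /link /LTCG

// ---- math_utils.cpp (hypothetical) ----
int square(int x)
{
    return x * x;
}

// ---- main_loop.cpp (hypothetical) ----
int square(int x);   // only the declaration is visible to this file

// Sums the squares 1*1 + 2*2 + ... + n*n.
// Compiled file by file, each iteration pays for a real call to square().
// With /GL and /LTCG the optimizer is free to inline the call.
int sum_of_squares(int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += square(i);
    return total;
}
```

Whether the call is actually inlined is the optimizer's decision; the options only make the callee's body visible across files.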

Profile Guided Optimization

► Static analysis leaves many open optimization questions for the compiler, leading to conservative optimizations
► Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application
► Introducing profile guided optimization
    Optimizes code based on how customers actually use the program
    Runs optimizations at link time, like whole program optimization
    Available in Visual Studio 2005
    Widely adopted at Microsoft

    if (p != NULL) {
        /* Perform action with p */
    } else {
        /* Error code */
    }

Is it common for p to be NULL?
If it is not common for p to be NULL, the error code should be collected with other infrequently used code.

PGO: Instrumentation

► We instrument with "probes" inserted into the code
► Two main types of probes
    Value probes
        ► Used to construct histogram of values
    Count (simple/entry) probes
        ► Used to count number of times a path is taken
► We try to insert the minimum number of probes to get full coverage
    Minimizes the cost of instrumentation
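To make the two probe types concrete, here is a hand-written sketch of the bookkeeping they imply. It is not how the Visual C++ instrumentation is implemented; ProfileData, dispatch, and most_frequent_value are names invented for illustration. A count probe increments a counter each time a path runs, and a value probe records a histogram of the values an expression takes.

```cpp
#include <cassert>
#include <map>

// Hypothetical profile store: not the format Visual C++ actually uses.
struct ProfileData {
    unsigned long taken_then = 0;          // count probe: 'if' branch
    unsigned long taken_else = 0;          // count probe: 'else' branch
    std::map<int, unsigned long> values;   // value probe: histogram of 'code'
};

// Hand-instrumented function: returns 1 when code is 10, otherwise 0.
int dispatch(int code, ProfileData& profile)
{
    ++profile.values[code];                // value probe
    if (code == 10) {
        ++profile.taken_then;              // count probe
        return 1;
    } else {
        ++profile.taken_else;              // count probe
        return 0;
    }
}

// Returns the value recorded most often, or -1 if nothing was recorded.
int most_frequent_value(const ProfileData& profile)
{
    int best = -1;
    unsigned long best_count = 0;
    for (const auto& entry : profile.values) {
        if (entry.second > best_count) {
            best = entry.first;
            best_count = entry.second;
        }
    }
    return best;
}
```

After a training run, the branch counters show which side of the if is hot, and the histogram shows which values dominate.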

PGO Optimizations

► Switch expansion
► Better inlining decisions
► Cold code separation
► Virtual call speculation
► Partial inlining

Profile Guided Optimization

Source -> compile with /GL and optimizations on (e.g. /O2) -> Object files
Object files -> link with /LTCG:PGI -> Instrumented image
Instrumented image + scenarios -> output -> Profile data
Object files + profile data -> link with /LTCG:PGO -> Optimized image

PGO: Inlining Sample

► Profile Guided uses call graph path profiling.

[Figure: call graph with nodes a, foo, bat, bar, and baz]

PGO: Inlining Sample (Cont)

► Profile Guided uses call graph path profiling.

[Figure: call graph of a, foo, bat, bar, and baz; each call path is annotated with its profiled call count, ranging from 10 to 100]

PGO – Inlining Sample (cont)

► Inlining decisions are made at each call site.

[Figure: call graph of a, foo, bat, bar, and baz with per-call-site counts, ranging from 10 to 125]

PGO – Switch Expansion

Original code:

    // 90% of the time i = 10;
    switch (i) {
        case 1: …
        case 2: …
        case 3: …
        default: …
    }

After switch expansion (pseudocode):

    if (i == 10)
        goto default;
    switch (i) {
        case 1: …
        case 2: …
        case 3: …
        default: …
    }

Most frequent values are pulled out.
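The transformation can be written out by hand in standard C++, where "goto default" is not available and a named label takes its place. In this sketch the function names and return values are placeholders, and the hot value 10 is assumed to come from profile data, as on the slide.

```cpp
#include <cassert>

// Original form: every value goes through the switch dispatch.
int handle_original(int i)
{
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}

// Expanded form: the hot value is tested first and jumps straight to the
// default handling, skipping the switch dispatch.
int handle_expanded(int i)
{
    if (i == 10)
        goto default_case;
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: break;
    }
default_case:
    return -1;
}
```

Both functions return the same result for every input; only the order of the tests differs.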

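PGO – Code Separation

[Figure: control flow graph with basic blocks A, B, C, and D, with edge counts of 100 on the frequent path and 10 on the infrequent path, shown in the default layout and in the optimized layout]

Basic blocks are ordered so that the most frequent path falls through.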

PGO – Virtual Call Speculation

    class Base { … virtual void call(); };
    class Foo : Base { … void call(); };
    class Bar : Base { … void call(); };

Original code:

    void Func(Base *A)
    {
        …
        while (true) {
            …
            A->call();
            …
        }
    }

After virtual call speculation (pseudocode):

    void Func(Base *A)
    {
        …
        while (true) {
            …
            if (type(A) == Foo:Base) {
                // inline of A->call();
            }
            else
                A->call();
            …
        }
    }

The profiles showed that the type of object A in function Func was almost always Foo.
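A source-level approximation of the same idea is sketched below. The return values and the function names run_virtual and run_speculated are made up for the example, and typeid stands in for whatever internal type check the compiler uses, which is presumably cheaper.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() {}
    virtual int call() { return 0; }
};

struct Foo : Base {
    int call() override { return 1; }
};

struct Bar : Base {
    int call() override { return 2; }
};

// Plain virtual dispatch on every iteration.
int run_virtual(Base* a, int iterations)
{
    int total = 0;
    for (int i = 0; i < iterations; ++i)
        total += a->call();
    return total;
}

// Hand-written speculation: if the dynamic type is exactly Foo, use the
// inlined body of Foo::call(); otherwise fall back to the virtual call.
int run_speculated(Base* a, int iterations)
{
    assert(a != nullptr);
    int total = 0;
    for (int i = 0; i < iterations; ++i) {
        if (typeid(*a) == typeid(Foo))
            total += 1;            // inlined body of Foo::call()
        else
            total += a->call();    // uncommon types stay virtual
    }
    return total;
}
```

The speculated version stays correct for any type derived from Base; only the Foo case gets the fast path.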

PGO – Partial Inlining

[Figure: flow graph of a callee: Basic Block 1, then a condition (Cond) that branches to Hot Code or Cold Code, followed by More Code]

PGO – Partial Inlining (cont)

[Figure: the same flow graph: Basic Block 1, Cond, Hot Code, Cold Code, More Code]

Hot path is inlined, but NOT the cold
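Written by hand, the effect of partial inlining might look like the sketch below. All names (clamp_to_byte, clamp_slow_path, sum_clamped) are hypothetical; the point is only that the condition and the hot path are copied into the caller while the cold path remains an ordinary call.

```cpp
#include <cassert>

// Cold part, kept out of line: out-of-range input is clamped to a bound.
int clamp_slow_path(int value)
{
    return value < 0 ? 0 : 255;
}

// Original callee: cheap test (Cond), hot path, and cold path together.
int clamp_to_byte(int value)
{
    if (value >= 0 && value <= 255)   // Cond
        return value;                 // hot code
    return clamp_slow_path(value);    // cold code
}

// Caller as it might look after partial inlining: the condition and the hot
// path are copied into the loop, while the cold path remains a call.
int sum_clamped(const int* data, int count)
{
    assert(data != nullptr || count == 0);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        int v = data[i];
        if (v >= 0 && v <= 255)
            total += v;                    // inlined hot path
        else
            total += clamp_slow_path(v);   // cold path stays out of line
    }
    return total;
}
```

The caller grows only by the size of the hot path, which is the trade-off that makes this cheaper than inlining the whole callee.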

Demo

Optimizing applications with VC++ 2005

New Floating Point Model

► /Op made your code run slow
    No intermediate switch
► New Floating Point Model
    /fp:fast
    /fp:precise (default)
    /fp:strict
    /fp:except

/fp:precise

► The default floating point switch
► Performance and Precision
► IEEE Conformant
► Round to the appropriate precision
    At assignments, casts and function calls

/fp:fast

► When performance matters most
► You know your application does simple floating point operations
► What can /fp:fast do?
    Association
    Distribution
    Factoring inverse
    Scalar reduction
    Copy propagation
    And others…

/fp:except

► Reliable floating point exceptions
► Thrown and not thrown when expected
    Faults and traps, when reliable, should occur at the line that causes the exception
    FWAITs on x86 might be added
► Cannot be used with /fp:fast or in managed code

/fp:strict

► The strictest FP option
    Turns off contractions
    Assumes floating point control word can change or that the user will examine flags
► /fp:except is implied
► Low double digit percent slowdown versus /fp:fast

What is the output?

    #include <stdio.h>
    int main()
    {
        double x, y, z;
        double sum;
        x = 1e20;
        y = -1e20;
        z = 10.0;
        sum = x + y + z;
        printf("sum=%f\n", sum);
    }

/fp:fast /O2 = 0.000
/fp:strict /O2 = 10.0
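One plausible explanation for the two results is reassociation, which the /fp:fast slide lists among its permitted transformations. The sketch below spells out both groupings explicitly. The function names are made up, and the code assumes arithmetic is carried out in IEEE 754 double precision; extended-precision intermediates, as on x87, can change the second result.

```cpp
#include <cassert>

// Left-to-right grouping, as the source is written: (x + y) + z.
double sum_as_written(double x, double y, double z)
{
    return (x + y) + z;
}

// Reassociated grouping that a fast model is allowed to pick: x + (y + z).
// With y = -1e20 and z = 10.0, z is far smaller than the spacing between
// adjacent doubles near 1e20, so y + z rounds back to y.
double sum_reassociated(double x, double y, double z)
{
    return x + (y + z);
}
```

With x = 1e20, y = -1e20, and z = 10.0, the first grouping cancels x and y exactly before adding z, while the second loses z before the cancellation happens.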

OpenMP

► A specification for writing multithreaded programs
► It consists of a set of simple #pragmas and runtime routines
► Makes it very easy to parallelize loop-based code
► Helps with load balancing, synchronization, etc…
► In Visual Studio, only available in C++

OpenMP Parallelization

► Can parallelize loops and straight-line code
► Includes synchronization constructs

    // a, b, and c are arrays declared elsewhere
    void test(int first, int last)
    {
        #pragma omp parallel for
        for (int i = first; i <= last; ++i) {
            a[i] = b[i] + c[i];
        }
    }

[Figure: with first = 1 and last = 1000, the iterations are divided among four threads: 1 ≤ i ≤ 250, 251 ≤ i ≤ 500, 501 ≤ i ≤ 750, 751 ≤ i ≤ 1000]
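The loop above writes to a different element on each iteration, so it needs no synchronization. When iterations update a shared variable, a reduction clause is one of the simpler constructs to reach for. The sketch below is an illustrative example rather than part of the original talk; dot_product is a hypothetical name, and the code would be built with OpenMP enabled (/openmp in Visual C++).

```cpp
#include <cassert>
#include <vector>

// Dot product of two equally sized vectors.
// The reduction clause gives each thread a private partial sum and combines
// the partial sums at the end, so no explicit lock is needed around 'sum'.
// Built without OpenMP support, the pragma is ignored and the loop runs
// serially with the same result.
long long dot_product(const std::vector<int>& b, const std::vector<int>& c)
{
    assert(b.size() == c.size());
    long long sum = 0;
    const int n = static_cast<int>(b.size());
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += static_cast<long long>(b[i]) * c[i];
    return sum;
}
```

Integer addition is used here so that the result does not depend on the order in which threads combine their partial sums.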

64-bit Compiler in VC2005

► 64-bit Compiler Cross Tools
    Compiler is 32-bit but resulting image is 64-bit
► 64-bit Compiler Native Tools
    Compiler and resulting image are 64-bit binaries.
► All previous optimizations apply for 64-bit as well.

Resources

► Visual C++ Dev Center
    http://msdn.microsoft.com/visualc
    This is the place to go for all our news and whitepapers
    Also VC2005 specific forums at http://forums.microsoft.com
► Myself
    http://blogs.msdn.com/aymans