
Automatic Differentiation

Hamid Reza Ghaffari, Jonathan Li, Yang Li, Zhenghua Nie

Instructor: Prof. Tamás Terlaky
School of Computational Engineering and Science
McMaster University

March 23, 2007

Outline

1 Introductions
2 Forward and Reverse Mode
  Forward methods, Reverse methods, Comparison, Extended knowledge, Case Study
3 Complexity Analysis
  Forward Mode Complexity, Reverse Mode Complexity
4 AD Software
  AD tools in MATLAB
  AD in C/C++ (ADIC): Developers, Introduction, ADIC Anatomy, ADIC Process, Example, Handling Side Effects, References

Introductions

Why Do We Need Derivatives?

Optimization via gradient methods:

Unconstrained optimization: minimizing y = f(x) requires the gradient or the Hessian.
Constrained optimization: minimizing y = f(x) such that c(x) = 0 also requires the Jacobian Jc(x) = [∂cj/∂xi].

Solution of nonlinear equations f(x) = 0 by Newton's method,

\[ x_{n+1} = x_n - \left[ \frac{\partial f(x_n)}{\partial x} \right]^{-1} f(x_n), \]

requires the Jacobian Jf = [∂f/∂x].

Parameter estimation, data assimilation, sensitivity analysis, inverse problems, ...
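To make the Newton iteration above concrete, here is a minimal scalar sketch in C++; the test equation x³ − 2 = 0 and the hand-coded derivative are my illustrative choices, not from the slides. The rest of the talk is about producing such derivatives automatically.

#include <cmath>
#include <cstdio>

// Newton's method for a scalar equation f(x) = 0: each step needs both
// f and its derivative, x_{n+1} = x_n - f(x_n) / f'(x_n).
int main() {
    double x = 1.0;                      // starting guess
    for (int n = 0; n < 8; ++n) {
        double f  = x * x * x - 2.0;     // f(x) = x^3 - 2
        double df = 3.0 * x * x;         // hand-coded derivative f'(x)
        x -= f / df;
    }
    std::printf("x = %.12f (cube root of 2)\n", x);
}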


How Do We Obtain Derivatives?

Any approach is judged by three criteria:

Reliability: the correctness and numerical accuracy of the derivative results.
Computational Cost: the amount of runtime and memory required for the derivative code.
Development Time: the time it takes to design, implement, and verify the derivative code, beyond the time needed to implement the underlying function.


Main Approaches

Hand Coding
Divided Differences
Symbolic Differentiation
Automatic Differentiation

Hand Coding

An analytic expression for the derivative is identified first and then implemented by hand in any high-level programming language.

Advantages:
Accuracy up to machine precision, if care is taken.
A highly optimized implementation, depending on the skill of the implementer.

Disadvantages:
Only applicable to "simple" functions, and error-prone.
Requires considerable human effort.


Divided Differences

Approximate the derivative of a function f w.r.t. the i-th component of x at a particular point x0 by a numerical difference quotient, e.g.

\[ \left. \frac{\partial f(x)}{\partial x_i} \right|_{x_0} \approx \frac{f(x_0 + h e_i) - f(x_0)}{h}, \]

where ei is the i-th Cartesian unit vector.

Divided Differences (Ctd.)

\[ \left. \frac{\partial f(x)}{\partial x_i} \right|_{x_0} \approx \frac{f(x_0 + h e_i) - f(x_0)}{h} \]

Advantages:
Only f is needed; the method is easy to implement and uses f as a "black box".
Easy to parallelize.

Disadvantages:
Accuracy is hard to assess and depends on the choice of h.
Computational complexity is bounded below by (n + 1) × cost(f).
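A minimal sketch of this scheme in C++; the step size h and the test function are my illustrative choices, not from the slides. Note the (n + 1) function evaluations.

#include <cmath>
#include <cstdio>
#include <vector>

// Forward-difference approximation of the gradient of f at x0:
// df/dx_i ~ (f(x0 + h*e_i) - f(x0)) / h, one extra f-evaluation per component.
template <typename F>
std::vector<double> fd_gradient(F f, std::vector<double> x, double h = 1e-6) {
    std::vector<double> g(x.size());
    const double f0 = f(x);              // 1 evaluation
    for (size_t i = 0; i < x.size(); ++i) {
        x[i] += h;                        // perturb along e_i
        g[i] = (f(x) - f0) / h;           // n more evaluations in total
        x[i] -= h;
    }
    return g;                             // (n + 1) * cost(f) overall
}

int main() {
    auto f = [](const std::vector<double>& x) { return x[0] * x[1] + std::sin(x[0]); };
    std::vector<double> x0 = {1.0, 2.0};
    std::vector<double> g = fd_gradient(f, x0);
    std::printf("%f %f\n", g[0], g[1]);   // ~ (2 + cos 1, 1)
}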


Symbolic Differentiation

Find an explicit derivative expression using a computer algebra system.

Disadvantages:
The length of the representation of the resulting derivative expressions grows rapidly with the number n of independent variables.
Inefficient in terms of computing time, due to the rapid growth of the underlying expressions.
Unable to deal with constructs such as branches, loops, or subroutines that are inherent in computer codes.

Automatic Differentiation

What is automatic differentiation?
Algorithmic, or automatic, differentiation (AD) is concerned with the accurate and efficient evaluation of derivatives for functions defined by computer programs. No truncation errors are incurred, and the resulting numerical derivative values can be used in all scientific computations that are based on linear, quadratic, or even higher-order approximations to nonlinear scalar or vector functions.

Automatic Differentiation (Cont.)

What's the idea behind automatic differentiation?
Automatic differentiation techniques rely on the fact that every function, no matter how complicated, is executed on a computer as a (potentially very long) sequence of elementary operations such as additions and multiplications, and elementary functions such as sin and cos. By repeatedly applying the chain rule of derivative calculus to the composition of those elementary operations, one can compute derivatives in a completely mechanical fashion.
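A sketch of that mechanical process on a toy function; the function f(x1, x2) = x1·x2 + sin(x1) and all names are my illustration. Every value statement is paired with a derivative statement obtained from the chain rule.

#include <cmath>
#include <cstdio>

// f(x1, x2) = x1*x2 + sin(x1), decomposed into elementary operations.
// Each value v is paired with dv = dv/dx1, obtained by the chain rule,
// so the derivative statements mirror the value statements one-for-one.
int main() {
    double x1 = 1.0, x2 = 2.0;

    double v1 = x1,  dv1 = 1.0;         // d x1 / d x1
    double v2 = x2,  dv2 = 0.0;         // d x2 / d x1
    double v3 = v1 * v2;
    double dv3 = dv1 * v2 + v1 * dv2;   // product rule
    double v4 = std::sin(v1);
    double dv4 = std::cos(v1) * dv1;    // chain rule through sin
    double v5 = v3 + v4;
    double dv5 = dv3 + dv4;             // sum rule

    std::printf("f = %f, df/dx1 = %f\n", v5, dv5);  // df/dx1 = x2 + cos(x1)
}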

How Good Is AD?

Reliability: accurate to machine precision; no truncation error.
Computational Cost: forward mode, 2 ~ 3n × cost(f); reverse mode, 5 × cost(f).
Human Effort: less time is spent preparing a code for differentiation, in particular in situations where computer models are bound to change frequently.

How Widely Is AD Used?

Sensitivity analysis of a mesoscale weather model (application area: Climate Modeling)
Data assimilation for ocean circulation (application area: Oceanography)
Intensity-modulated radiation therapy (application area: Biomedicine)
Multidisciplinary design of aircraft (application area: Computational Fluid Dynamics)
The NEOS server (application area: Optimization)
...

Source: http://www.autodiff.org/?module=Applications&submenu=&category=all

Forward and Reverse Mode

AD Methods: A Simple Example

[Slide figures: a small example function traced step by step; all variables (inputs, intermediates, and outputs) are unified into a single sequence u_1, ..., u_N.]

Forward Method

Differentiate the code. Write the computation as

\[ u_i = x_i, \quad i = 1, \ldots, n, \qquad u_i = \Phi_i(\{u_j\}_{j \prec i}), \quad i = n+1, \ldots, N. \]

Differentiating, with local partial derivatives c_{i,j} = ∂Φ_i/∂u_j,

\[ \nabla u_i = e_i, \quad i = 1, \ldots, n, \qquad \nabla u_i = \sum_{j \prec i} c_{i,j} \nabla u_j, \quad i = n+1, \ldots, N. \]

Reverse Method

Compute the adjoint of the code,

\[ \bar{u}_j = \frac{\partial y}{\partial u_j} = \frac{\partial (y_1, y_2, \ldots, y_m)}{\partial u_j}. \]

For the dependent variables,

\[ \bar{u}_{n+p+j} = e_j, \quad j = 1, \ldots, m. \]

For the intermediates and independents u_j, j = n + p, ..., 1,

\[ \bar{u}_j = \frac{\partial y}{\partial u_j} = \sum_{i \succ j} \bar{u}_i c_{i,j}. \]

Forward Methods

Method: compute the gradient of each variable, and use the chain rule to pass the gradients along.
Size of the computed objects: each step computes a vector of the input size n.
The computation of each variable's gradient proceeds together with the computation of the variable itself.
Easy to implement (see the sketch below).
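A self-contained sketch of this propagation in C++, carrying a length-n gradient with every intermediate; the test function is again my toy example, not from the slides.

#include <cmath>
#include <cstdio>
#include <vector>

using Grad = std::vector<double>;  // length-n gradient carried with each variable

// Forward mode on f(x1, x2) = x1*x2 + sin(x1): every intermediate u_i carries
// nabla u_i, updated alongside the value via nabla u_i = sum_j c_{i,j} nabla u_j.
int main() {
    const int n = 2;
    double x1 = 1.0, x2 = 2.0;
    Grad g1 = {1.0, 0.0};                     // nabla x1 = e_1
    Grad g2 = {0.0, 1.0};                     // nabla x2 = e_2

    double u3 = x1 * x2;                      // c_{3,1} = x2, c_{3,2} = x1
    Grad g3(n);
    for (int k = 0; k < n; ++k) g3[k] = x2 * g1[k] + x1 * g2[k];

    double u4 = std::sin(x1);                 // c_{4,1} = cos(x1)
    Grad g4(n);
    for (int k = 0; k < n; ++k) g4[k] = std::cos(x1) * g1[k];

    double y = u3 + u4;                       // c = 1 for both parents
    Grad gy(n);
    for (int k = 0; k < n; ++k) gy[k] = g3[k] + g4[k];

    std::printf("f = %f, grad = (%f, %f)\n", y, gy[0], gy[1]);  // (x2 + cos x1, x1)
}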

[Slide figure: one forward sweep computes each variable's value and its gradient side by side.]

Reverse Methods

Method: compute the adjoint of each variable, and pass the adjoints along.
Size of the computed objects: each step computes a vector of the output size m. (Note: in optimization applications the output size is usually 1.)
The computation of the adjoints proceeds only after the computation of all the variables has completed.
Traverse the computational graph in reverse, obtaining the parents of each variable so as to compute its adjoint.
Obtain the gradient by computing each partial derivative one by one.
Harder to implement (see the tape-based sketch below).
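A sketch of a reverse sweep over an explicitly recorded tape; the structure and names are my illustration, and real tools record the tape automatically via operator overloading.

#include <cmath>
#include <cstdio>
#include <vector>

// One tape entry per elementary operation: the result index and the local
// partial derivatives c_{i,j} with respect to its (at most two) arguments.
struct Entry { int res; int arg[2]; double c[2]; int nargs; };

int main() {
    // Forward sweep on f(x1, x2) = x1*x2 + sin(x1): compute values and
    // record each operation with its local derivatives on the tape.
    std::vector<double> u = {1.0, 2.0, 0.0, 0.0, 0.0};     // u0 = x1, u1 = x2
    std::vector<Entry> tape;
    u[2] = u[0] * u[1];    tape.push_back({2, {0, 1}, {u[1], u[0]}, 2});
    u[3] = std::sin(u[0]); tape.push_back({3, {0, 0}, {std::cos(u[0]), 0.0}, 1});
    u[4] = u[2] + u[3];    tape.push_back({4, {2, 3}, {1.0, 1.0}, 2});

    // Reverse sweep: seed the output adjoint, then apply
    // ubar_j += ubar_i * c_{i,j} while walking the tape backwards.
    std::vector<double> ubar(u.size(), 0.0);
    ubar[4] = 1.0;                        // dy/dy = 1
    for (int t = (int)tape.size() - 1; t >= 0; --t)
        for (int k = 0; k < tape[t].nargs; ++k)
            ubar[tape[t].arg[k]] += ubar[tape[t].res] * tape[t].c[k];

    std::printf("grad = (%f, %f)\n", ubar[0], ubar[1]);    // (x2 + cos x1, x1)
}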

[Slide figure: values are computed in a forward sweep; adjoints are then computed in a reverse sweep over the same graph.]

Implementation of Reverse Mode

As mentioned above, the implementation of forward mode is relatively straightforward. Here we only compare the important features of the two implementation techniques, source transformation and operator overloading:
Source transformation: re-order the code upside down.
Operator overloading: record the computation on a "tape".

Re-ordering the code upside down:

[Slide figure: code example of the re-ordering.]

Record the computation on a "tape":
Record each operation and its operands.
Related technique: checkpointing. If the number of operations grows large, checkpointing prevents the program from exhausting all the memory.

Comparison

The comparison between forward mode and reverse mode covers the following topics:
Computational complexity
Memory required
Time to develop

Cost of Forward Propagation of Derivatives

Define
N_{|c|=1}: the number of unit local derivatives c_{i,j} = ±1,
N_{|c|≠1}: the number of non-unit local derivatives c_{i,j} ≠ 0, ±1.

Solve for the derivatives in forward order ∇u_{n+1}, ∇u_{n+2}, ..., ∇u_N,

\[ \nabla u_i = \sum_{j \prec i} c_{i,j} \nabla u_j, \quad i = n+1, \ldots, N, \]

with each ∇u_i = (∂u_i/∂x_1, ..., ∂u_i/∂x_n) a length-n vector. The flop count flops(fwd) is given by

flops(fwd) = n N_{|c|≠1}  (mults. c_{i,j} · ∇u_j, c_{i,j} ≠ ±1, 0)
           + n (N_{|c|≠1} + N_{|c|=1})  (adds./subs. ± c_{i,j} ∇u_j)
           − n (p + m)  (the first term of each of the p + m statements needs no add./sub.)

flops(fwd) = n (2 N_{|c|≠1} + N_{|c|=1} − p − m)

Cost of Reverse Propagation of Adjoints

Solve for the adjoints in reverse order ū_{n+p}, ū_{n+p−1}, ..., ū_1,

\[ \bar{u}_j = \sum_{i \succ j} \bar{u}_i c_{i,j}, \]

with ū_j = ∂(y_1, y_2, ..., y_m)/∂u_j a length-m vector. The flop count flops(rev) is given by

flops(rev) = m N_{|c|≠1}  (mults. ū_i · c_{i,j}, c_{i,j} ≠ ±1, 0)
           + m (N_{|c|=1} + N_{|c|≠1})  (adds./subs. ± ū_i c_{i,j})

flops(rev) = m (2 N_{|c|≠1} + N_{|c|=1})

Memory Required

It is not certain which mode uses more memory; usually, reverse mode takes more.
The memory cost of forward mode comes from:
storing one value per variable;
storing a gradient of input size n per variable.
The memory cost of reverse mode comes from:
storing one value per variable;
storing an adjoint of output size m per variable;
storing the DAG (directed acyclic graph) that represents the function.

Forward mode is more likely to use less memory:
1. if the original function reuses variables;
2. if the function is so large that reverse mode requires a lot of memory to store the DAG.
Reverse mode is more likely to use less memory:
1. if n is relatively large, so that storing the length-n gradients costs more than storing the length-m adjoints.

Time to Develop

Usually it is harder to develop reverse-mode code than forward-mode code, especially when using the source transformation technique.

Conclusion:
Use forward mode when n ≪ m.
Use reverse mode when m ≪ n, such as in optimization, where m = 1 and the full gradient costs at most 5 × cost(f).

Extended Knowledge

Directional derivatives (forward mode):
seed a direction d = (d_1, ..., d_n)^T;
seeding ∇x_i = d_i calculates J_f · d.
Multi-directional derivatives: replace d by D = [d_{ij}], i = 1, ..., n, j = 1, ..., q.

Directional adjoints (reverse mode):
seed a weight vector v = (v_1, ..., v_m);
seeding ȳ_j = v_j calculates v · J_f.
Multi-directional adjoints: replace v by V = [v_{ij}], i = 1, ..., q, j = 1, ..., m.
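Seeding the tangents with a direction d rather than with a Cartesian basis vector gives the Jacobian-vector product J_f · d in a single forward sweep; a small sketch, where the function and the direction are my illustrative choices.

#include <cmath>
#include <cstdio>

// Directional derivative of f(x1, x2) = x1*x2 + sin(x1) along d:
// seed the tangents with d (not with e_i) and run one forward sweep.
int main() {
    double x1 = 1.0, x2 = 2.0;
    double d1 = 0.5, d2 = -1.0;       // direction d

    double t1 = d1, t2 = d2;          // seeded tangents: xdot = d
    double u  = x1 * x2;
    double tu = t1 * x2 + x1 * t2;    // tangent of the product
    double v  = std::sin(x1);
    double tv = std::cos(x1) * t1;    // tangent through sin
    double y  = u + v, ty = tu + tv;  // value and J_f * d for the scalar output

    std::printf("f = %f, grad(f).d = %f\n", y, ty);
}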

Case Study

Using FADBAD++:
FADBAD++ was developed by Ole Stauning and Claus Bendtsen.
Flexible automatic differentiation using templates and operator overloading in ANSI C++.
Distributed as source code only; no additional library is required.
Free to use.
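A minimal forward-mode sketch along the lines of the FADBAD++ interface, on the same test function used below. The header name and the F<>::diff/x/d methods follow the FADBAD++ documentation; treat the details as assumptions rather than a verbatim listing from the talk.

#include <cstdio>
#include "fadiff.h"                    // FADBAD++ forward-mode header

using fadbad::F;

// f(x) = prod x_i, the test function used in the case study.
template <typename T>
T prod(const T* x, int n) {
    T p = x[0];
    for (int i = 1; i < n; ++i) p = p * x[i];
    return p;
}

int main() {
    const int n = 4;
    F<double> x[n];
    for (int i = 0; i < n; ++i) {
        x[i] = double(i + 1);
        x[i].diff(i, n);               // mark x[i] as the i-th of n independents
    }
    F<double> y = prod(x, n);
    std::printf("f = %f\n", y.x());    // value
    for (int i = 0; i < n; ++i)
        std::printf("df/dx%d = %f\n", i, y.d(i));   // forward-mode gradient
}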

Test function: f(x) = ∏ x_i.
Objective: test different codings of the function in forward mode, in particular trying to reuse variables.
Result: basically, no matter how the function is coded, the memory cost is about n × n × 8 bytes; reusing variables or not makes no difference.

Objective: test reverse mode on the same function.
Result: tested up to n = 6500. Forward mode ran out of memory; reverse mode was 127 times faster and took only a few MB.
Remark: we could not see how much memory the DAG takes in reverse mode; that would be easier to observe with fewer independent variables but a more complicated function.

Complexity Analysis

Code List

The code list is given by re-writing the code into elemental binary and unary operations/functions, e.g.

\[ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \log^2(x_1 x_2) + x_2 x_2^3 - a - x_2 \\ \sqrt{b \log(x_1 x_2) + x_2 / x_3} - x_2 x_2^3 + a \end{bmatrix} \]

v1 = x1           v7  = v6 · v2     v13 = v8 − v2
v2 = x2           v8  = v7 − a      v14 = v5^2
v3 = x3           v9  = 1/v3        v15 = √v12
v4 = v1 · v2      v10 = v2 · v9     v16 = v14 + v13
v5 = log(v4)      v11 = b · v5      v17 = v15 − v8
v6 = v2^3         v12 = v11 + v10

so that y1 = v16 and y2 = v17.

Code-List (Ctd.)

Assume the code list contains:
N± additions/subtractions, e.g. v16 = v14 + v13;
N∗ multiplications, e.g. v4 = v1 · v2;
Nf nonlinear functions/operations, e.g. v5 = log(v4), v9 = 1/v3;
a total of p + m = N± + N∗ + Nf statements.

Then:
each addition/subtraction generates two c_{i,j} = ±1;
each multiplication generates two c_{i,j} ≠ ±1, 0;
each nonlinear function generates one c_{i,j} ≠ ±1, 0, requiring one nonlinear function evaluation, e.g. v5 = log(v4) gives c_{5,4} = 1/v4.

So we have
N_{|c|=1} = 2 N±,
N_{|c|≠1} = 2 N∗ + Nf.
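As a sanity check, here is my own tally for the code list above, counting the powers, the reciprocal, the log, and the square root as nonlinear operations:

\[ N_\pm = 5, \quad N_* = 4, \quad N_f = 5, \quad p + m = 14, \]
\[ N_{|c|=1} = 2 N_\pm = 10, \qquad N_{|c|\neq 1} = 2 N_* + N_f = 13. \]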

Forward Mode Complexity

Complexity of Forward Mode

flops(Jf) = flops(f) + flops(c_{i,j}) + flops(fwd)

Assume flops(nonlinear function) = w, with w > 1.

The cost of evaluating the function is

flops(f) = N∗ + N± + w Nf.

The cost of evaluating the local derivatives c_{i,j} is

flops(c_{i,j}) = w Nf.

The cost of the forward propagation of derivatives is

flops(fwd) = n (2 N_{|c|≠1} + N_{|c|=1} − p − m) = n (3 N∗ + N± + Nf).

Complexity of Forward Mode (Ctd.)

Then for forward mode,

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} = 1 + \frac{w N_f + n(3 N_* + N_\pm + N_f)}{N_* + N_\pm + w N_f} = 1 + 3n \hat{N}_* + n \hat{N}_\pm + n \left( \frac{1}{w} + \frac{1}{n} \right) w \hat{N}_f, \]

where

\[ (\hat{N}_*, \hat{N}_\pm, w \hat{N}_f) = \frac{(N_*, N_\pm, w N_f)}{N_* + N_\pm + w N_f}. \]

Since \hat{N}_* + \hat{N}_\pm + w \hat{N}_f = 1 and all coefficients are positive,

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} \le 1 + n \cdot \max\left( 3, \; 1, \; \frac{1}{w} + \frac{1}{n} \right) = 1 + 3n. \]

When n ≪ m, forward mode is preferred.

Reverse Mode Complexity

Complexity of Reverse Mode

\[ \mathrm{flops}(\mathrm{rev}) = m (2 N_{|c|\neq 1} + N_{|c|=1}) = m (4 N_* + 2 N_\pm + 2 N_f), \]

giving

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} = 1 + 4m \hat{N}_* + 2m \hat{N}_\pm + m \left( \frac{2}{w} + \frac{1}{m} \right) w \hat{N}_f \]

and

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} \le 1 + m \cdot \max\left( 4, \; 2, \; \frac{2}{w} + \frac{1}{m} \right) = 1 + 4m. \]

For m = 1,

\[ \mathrm{flops}(\nabla f) \le 5 \, \mathrm{flops}(f). \]

AD Software

AD Tools in MATLAB

Differentiation Arithmetic

\[ \vec{u} = (u, u'), \]

where u denotes the value of the function u: R → R evaluated at the point x_0, and u' denotes the value u'(x_0).

\[ \vec{u} + \vec{v} = (u + v, \; u' + v') \]
\[ \vec{u} - \vec{v} = (u - v, \; u' - v') \]
\[ \vec{u} \times \vec{v} = (uv, \; uv' + u'v) \]
\[ \vec{u} \div \vec{v} = (u/v, \; (u' - (u/v)v')/v) \]

\[ \vec{x} = (x, 1), \qquad \vec{c} = (c, 0) \]

Ref: http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf
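Operator overloading implements exactly this arithmetic. Below is a minimal, self-contained C++ sketch of the (u, u') pairs; it is my illustration of the idea, not INTLAB's actual implementation, which does the same in MATLAB via its gradient class.

#include <cmath>
#include <cstdio>

// A (value, derivative) pair implementing the differentiation arithmetic above.
struct Dual {
    double u, du;                       // u(x0) and u'(x0)
};
Dual operator+(Dual a, Dual b) { return {a.u + b.u, a.du + b.du}; }
Dual operator-(Dual a, Dual b) { return {a.u - b.u, a.du - b.du}; }
Dual operator*(Dual a, Dual b) { return {a.u * b.u, a.u * b.du + a.du * b.u}; }
Dual operator/(Dual a, Dual b) {
    double q = a.u / b.u;               // quotient rule: (a' - q b') / b
    return {q, (a.du - q * b.du) / b.u};
}
Dual var(double x) { return {x, 1.0}; } // independent variable: (x, 1)
Dual cst(double c) { return {c, 0.0}; } // constant: (c, 0)

int main() {
    // f(x) = (x+1)(x-2)/(x+3) at x = 3 -- the rational-function example below.
    Dual x = var(3.0);
    Dual f = (x + cst(1.0)) * (x - cst(2.0)) / (x + cst(3.0));
    std::printf("f(3) = %f, f'(3) = %f\n", f.u, f.du);  // 2/3 and 13/18
}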


Example of a Rational Function

\[ f(x) = \frac{(x+1)(x-2)}{x+3}, \qquad f(3) = 2/3, \quad f'(3) = ? \]

\[ \vec{f}(\vec{x}) = \frac{(\vec{x} + \vec{1})(\vec{x} - \vec{2})}{\vec{x} + \vec{3}} = \frac{((x,1) + (1,0)) \times ((x,1) - (2,0))}{(x,1) + (3,0)} \]

Inserting the value \vec{x} = (3, 1) into \vec{f} produces

\[ \vec{f}(3,1) = \frac{((3,1) + (1,0)) \times ((3,1) - (2,0))}{(3,1) + (3,0)} = \frac{(4,1) \times (1,1)}{(6,1)} = \frac{(4,5)}{(6,1)} = \left( \frac{2}{3}, \frac{13}{18} \right) \]


Derivatives of Elementary Functions

Chain rule:

\[ (g \circ u)'(x) = u'(x) \, (g' \circ u)(x) \]

\[ \vec{g}(\vec{u}) = \vec{g}((u, u')) = (g(u), \; u' g'(u)) \]

\[ \sin \vec{u} = \sin(u, u') = (\sin u, \; u' \cos u) \]
\[ \cos \vec{u} = \cos(u, u') = (\cos u, \; -u' \sin u) \]
\[ e^{\vec{u}} = e^{(u, u')} = (e^u, \; u' e^u) \]
\[ \ldots \]
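Extending the Dual sketch above with these rules; again my illustration of the pattern, while INTLAB's own overloaded sin for gradient objects is shown on the next slide.

#include <cmath>

struct Dual { double u, du; };  // as in the earlier sketch

// Each elementary function maps (u, u') to (g(u), u' * g'(u)).
Dual sin(Dual a) { return {std::sin(a.u),  a.du * std::cos(a.u)}; }
Dual cos(Dual a) { return {std::cos(a.u), -a.du * std::sin(a.u)}; }
Dual exp(Dual a) { return {std::exp(a.u),  a.du * std::exp(a.u)}; }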


Example of Sin

From ../Intlab/gradient/@gradient/sin.m

[Slide figure: listing of INTLAB's overloaded sin for gradient objects.]

Example for Elementary Functions

Evaluate the derivative of f(x) = (1 + x + e^x) sin x at x = 0.

\[ \vec{f}(\vec{x}) = (\vec{1} + \vec{x} + e^{\vec{x}}) \sin \vec{x} \]

\[ \vec{f}(0,1) = \left( (1,0) + (0,1) + e^{(0,1)} \right) \sin(0,1) = \left( (1,1) + (e^0, e^0) \right) (\sin 0, \cos 0) = (2,2)(0,1) = (0,2). \]


High-order Derivatives

\[ \vec{u} = (u, u', u''), \]

\[ \vec{u} + \vec{v} = (u + v, \; u' + v', \; u'' + v'') \]
\[ \vec{u} - \vec{v} = (u - v, \; u' - v', \; u'' - v'') \]
\[ \vec{u} \times \vec{v} = (uv, \; uv' + u'v, \; uv'' + 2u'v' + u''v) \]
\[ \vec{u} \div \vec{v} = \left( u/v, \; (u' - (u/v)v')/v, \; (u'' - 2(u/v)'v' - (u/v)v'')/v \right) \]
\[ \cdots \]
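The same operator-overloading trick extends directly to these triples; a minimal sketch of (u, u', u'') for multiplication, using my own names and test function.

#include <cstdio>

// A (u, u', u'') triple implementing the second-order rules above.
struct Dual2 { double u, du, ddu; };

Dual2 mul(Dual2 a, Dual2 b) {           // Leibniz rule up to second order
    return {a.u * b.u,
            a.u * b.du + a.du * b.u,
            a.u * b.ddu + 2.0 * a.du * b.du + a.ddu * b.u};
}

int main() {
    Dual2 x = {3.0, 1.0, 0.0};          // independent variable: (x, 1, 0)
    Dual2 y = mul(x, mul(x, x));        // y = x^3
    std::printf("%f %f %f\n", y.u, y.du, y.ddu);  // 27, 27, 18
}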

INTLAB

Developers: Institute for Reliable Computing, Hamburg University of Technology
Mode: Forward
Method: Operator overloading
Language: MATLAB
URL: http://www.ti3.tu-harburg.de/rump/intlab/
Licensing: Open Source

Rosenbrock Function

\[ y_1 = 400 x_1 (x_1^2 - x_2) + 2(x_1 - 1) \]
\[ y_2 = 200 (x_1^2 - x_2) \]

One Step of Newton's Method with INTLAB

[Slide figure: MATLAB listing that applies one Newton step, with the Jacobian obtained from INTLAB's gradient class.]

TOMLAB/MAD

Developers: Marcus M. Edvall and Kenneth Holmström, Tomlab Optimization Inc. (TOMLAB/MAD integration); Shaun A. Forth and Robert Ketzscher, Cranfield University (MAD)
Mode: Forward
Method: Operator overloading
Language: MATLAB
URL: http://tomlab.biz/products/mad/
Licensing: Commercial license

One Step of Newton's Method with MAD

[Slide figure: the corresponding MATLAB listing using TOMLAB/MAD.]

ADiMat

Developers: André Vehreschild, Institute for Scientific Computing, RWTH Aachen University
Mode: Forward
Method: Source transformation combined with operator overloading
Language: MATLAB
URL: http://www.sc.rwth-aachen.de/vehreschild/adimat.html
Licensing: under discussion

ADiMat's Example

function [result1, result2] = f(x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for Scientific Computing,
% RWTH Aachen University, D-52056 Aachen, Germany.
% vehreschild@sc.rwth-aachen.de

result1 = sin(x);
result2 = sqrt(x*2);

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

ADiMat's Example (cont.)

>> addiff(@f, 'x', 'result1,result2');
>> p = magic(5);
>> g_p = createFullGradients(p);
>> [g_r1, r1, g_r2, r2] = g_f(g_p, p);
>> J1 = [g_r1{:}]; % and
>> J2 = [g_r2{:}];

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

ADiMat's Example (cont.)

function [g_result1, result1, g_result2, result2] = g_f(g_x, x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for Scientific Computing,
% RWTH Aachen University, D-52056 Aachen, Germany.
% vehreschild@sc.rwth-aachen.de

g_result1 = ((g_x) .* cos(x));
result1 = sin(x);
g_tmp_f_00000 = g_x * 2;
tmp_f_00000 = x * 2;
g_result2 = ((g_tmp_f_00000) ./ (2 .* sqrt(tmp_f_00000)));
result2 = sqrt(tmp_f_00000);

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

Matrix Calculus

Definition: if X is p × q and Y is m × n, then dY: = dY/dX dX:, where A: denotes the column-stacked vector vec(A) and the derivative dY/dX is a large mn × pq matrix.

\[ d(X^2){:} = (X \, dX + dX \, X){:} \]
\[ d(\det(X)) = d(\det(X^T)) = \det(X) \, (X^{-T}){:}^T \, dX{:} \]
\[ d(\ln(\det(X))) = (X^{-T}){:}^T \, dX{:} \]

Ref: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html
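A quick numerical sanity check of the log-determinant rule on my own 2×2 example (not from the slides), comparing the formula entrywise against finite differences:

#include <cmath>
#include <cstdio>

// Check d(ln det X) = (X^{-T}):^T dX: on a 2x2 example by finite differences.
int main() {
    double X[2][2] = {{4.0, 1.0}, {2.0, 3.0}};
    double det = X[0][0]*X[1][1] - X[0][1]*X[1][0];
    // X^{-T} for a 2x2 matrix, written out explicitly.
    double invT[2][2] = {{ X[1][1]/det, -X[1][0]/det},
                         {-X[0][1]/det,  X[0][0]/det}};
    const double h = 1e-7;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            double Y[2][2] = {{X[0][0], X[0][1]}, {X[1][0], X[1][1]}};
            Y[i][j] += h;                         // perturb one entry of X
            double d2 = Y[0][0]*Y[1][1] - Y[0][1]*Y[1][0];
            double fd = (std::log(d2) - std::log(det)) / h;
            std::printf("d ln det / dX[%d][%d]: fd = %f, formula = %f\n",
                        i, j, fd, invT[i][j]);
        }
}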

Vandermonde Function

[Slide figure: definition and MATLAB implementation of the Vandermonde test function.]

Source: Shaun A. Forth. An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB. ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.

Vandermonde Function (cont.)

[Slide figure: CPU-time plot; experiment on a PIV 3.0 GHz PC (Windows XP), MATLAB version 6.5.]

Source: Shaun A. Forth, op. cit.

CPU time in seconds versus problem size:

Method        10     20     40     80     160    320    640     1280
Function      0.000  0.000  0.000  0.000  0.000  0.010  0.000   0.000
MAD(Full)     0.070  0.060  0.070  0.130  0.581  2.664  10.535  45.535
MAD(Sparse)   0.071  0.050  0.060  0.060  0.060  0.070  0.100   0.881
INTLab        0.050  0.040  0.040  0.090  0.040  0.050  0.071   0.120
ADiMat        0.231  0.140  0.271  0.601  1.362  3.044  7.340   21.611

Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

Arrowhead Function

[Slide figure: definition and MATLAB implementation of the arrowhead test function.]

Source: Shaun A. Forth, op. cit.

Arrowhead Function (cont.)

[Slide figure: CPU-time plot; experiment on a PIV 3.0 GHz PC (Windows XP), MATLAB version 6.5.]

Source: Shaun A. Forth, op. cit.

CPU time in seconds versus problem size:

Method        20     40     80     160    320    640    1280
Function      0.010  0.000  0.000  0.000  0.000  0.000  0.000
MAD(Full)     0.180  0.050  0.070  0.200  1.111  4.367  17.796
MAD(Sparse)   0.060  0.060  0.060  0.070  0.080  0.100  0.160
INTLab        0.090  0.051  0.050  0.050  0.081  0.140  0.340
ADiMat        0.911  0.311  0.651  1.262  2.704  6.028  14.581

Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

BDQRTIC mod

[Slide figure: definition of the modified BDQRTIC test function.]

BDQRTIC mod (cont.)

CPU time in seconds versus problem size:

Method        20      40     80     160    320    640     1280
Function      12.809  0.010  0.000  0.000  0.000  0.010   0.000
MAD(Full)     2.604   0.121  0.150  0.490  2.513  10.926  43.162
MAD(Sparse)   0.270   0.120  0.130  0.150  0.201  0.260   0.371
INTLab        2.293   0.080  0.100  0.110  0.150  0.230   0.481
ADiMat        3.455   0.621  1.152  2.544  5.778  14.641  42.671

Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

Summary of AD Software in MATLAB

The operator overloading method for AD forward mode is easy to implement via differentiation arithmetic.
All of the AD tools in MATLAB are easy to use.
Sparse storage provides a good way to improve the performance of AD tools.

AD in C/C++ (ADIC)

The Computational Differentiation Group at Argonne National Laboratory

ADIC was introduced in 1997 by:
Christian Bischof (Scientific Computing, RWTH Aachen University),
Lucas Roh (founder, president, and CEO of Hostway Co.),
and the other team members.

State of ADIC

ADIC is an automatic differentiation tool for ANSI C/C++.
ADIC was introduced in 1997.
Last updated: June 10, 2005.
Official web site: www-new.mcs.anl.gov/adic/down-2.htm
ADIC uses the forward method.
Supported platforms: Unix/Linux.
Selected application: NEOS.
Related research group: Argonne National Laboratory, USA.

ADIC Anatomy

[Slide figure: block diagram of ADIC's components.]

ADIC Process

[Slide figure: the source-to-source differentiation process.]

func.c

#include "func.h"
#include <math.h>

void func(data_t *pdata)
{
    int i;
    double *x = pdata->x;
    double *y = pdata->y;
    double s = 0.0;              /* accumulator for the dot product */
    double temp;

    for (i = 0; i < pdata->len; i++) {
        s = s + x[i]*y[i];
    }

    temp = exp(s);
    pdata->r = temp;             /* r = exp(x . y) */
}

driver.c

[Slide figure: the accompanying driver code listing.]

Commands

[Slide figure: the two ADIC invocation commands.]

The first command generates the header file ad_deriv.h and the derivative function func.ad.c.
The second command compiles and links all the needed functions and generates ad_func.

Handling Side Effects

[Slide figures: a worked example of how ADIC handles side effects in the generated code.]

For Further Reading on ADIC

Christian H. Bischof, Paul D. Hovland, Boyana Norris. Implementation of Automatic Differentiation Tools. PEPM '02, Jan. 14-15, 2002, Portland, OR, USA.

Paul D. Hovland and Boyana Norris. Users' Guide to ADIC 1.1.

C. H. Bischof, L. Roh, A. J. Mauer-Oats. ADIC: An Extensible Automatic Differentiation Tool for ANSI-C. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.

Reference

C. H. Bischof and H. M. Bücker. Computing Derivatives of Computer Programs. In Modern Methods and Algorithms of Quantum Chemistry: Proceedings, Second Edition, edited by J. Grotendorst, NIC Series, 2000, pages 315-327.

C. Bischof, A. Carle, P. Khademi, and G. Pusch. Automatic Differentiation: Obtaining Fast and Reliable Derivatives - Fast. In Control Problems in Industry, edited by I. Lasiecka and B. Morton, 1995, pages 1-16.

Andreas Griewank. On Automatic Differentiation. In Mathematical Programming: Recent Developments and Applications, edited by M. Iri and K. Tanabe, Kluwer Academic Publishers, 1989.

Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math. SIAM, Philadelphia, Penn., 2000.

Shaun Forth. Introduction to Automatic Differentiation. Presentation slides for the 4th International Conference on Automatic Differentiation, July 19-23, 2004, University of Chicago, Gleacher Centre, Chicago, USA.

G. F. Corliss. Automatic Differentiation.

Warwick Tucker. http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf

http://www.autodiff.org/

http://www.ti3.tu-harburg.de/rump/intlab/

http://tomopt.com/tomlab/products/mad/

http://www.sc.rwth-aachen.de/vehreschild/adimat/index.html

Shaun A. Forth. An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB. ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pages 195-222.

Siegfried M. Rump. INTLAB - INTerval LABoratory. In Developments in Reliable Computing, Kluwer Academic Publishers, 1999, pages 77-104.

Christian H. Bischof, H. Martin Bücker, Bruno Lang, A. Rasch, André Vehreschild. Combining Source Transformation and Operator Overloading Techniques to Compute Derivatives for MATLAB Programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002), IEEE Computer Society, 2002.

Thanks & Questions

Thanks!

Questions?
