
Automatic Differentiation

Hamid Reza Ghaffari, Jonathan Li, Yang Li, Zhenghua Nie

Instructor: Prof. Tamás Terlaky
School of Computational Engineering and Science
McMaster University

March 23, 2007

Outline

1 Introductions
2 Forward and Reverse Mode
  Forward methods, Reverse methods, Comparison, Extended knowledge, Case Study
3 Complexity Analysis
  Forward Mode Complexity, Reverse Mode Complexity
4 AD Software
  AD tools in MATLAB
  AD in C/C++ (ADIC): Developers, Introduction, ADIC Anatomy, ADIC Process, Example, Handling Side Effects, References

Introductions

Why Do We Need Derivatives?

Optimization via gradient methods:

Unconstrained optimization: minimizing y = f(x) requires the gradient or the Hessian.
Constrained optimization: minimizing y = f(x) such that c(x) = 0 also requires the Jacobian Jc(x) = [∂cj/∂xi].

Solution of nonlinear equations f(x) = 0 by Newton's method,

\[ x_{n+1} = x_n - \left[ \frac{\partial f(x_n)}{\partial x} \right]^{-1} f(x_n), \]

requires the Jacobian Jf = [∂f/∂x].

Parameter estimation, data assimilation, sensitivity analysis, inverse problems, ...
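To make the Newton iteration above concrete, here is a minimal scalar sketch in C++; the test equation x³ − 2 = 0 and the hand-coded derivative are my illustrative choices, not from the slides. The rest of the talk is about producing such derivatives automatically.

#include <cmath>
#include <cstdio>

// Newton's method for a scalar equation f(x) = 0: each step needs both
// f and its derivative, x_{n+1} = x_n - f(x_n) / f'(x_n).
int main() {
    double x = 1.0;                      // starting guess
    for (int n = 0; n < 8; ++n) {
        double f  = x * x * x - 2.0;     // f(x) = x^3 - 2
        double df = 3.0 * x * x;         // hand-coded derivative f'(x)
        x -= f / df;
    }
    std::printf("x = %.12f (cube root of 2)\n", x);
}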


How Do We Obtain Derivatives?

Any approach is judged by three criteria:

Reliability: the correctness and numerical accuracy of the derivative results.
Computational Cost: the amount of runtime and memory required for the derivative code.
Development Time: the time it takes to design, implement, and verify the derivative code, beyond the time needed to implement the underlying function.


Main Approaches

Hand Coding
Divided Differences
Symbolic Differentiation
Automatic Differentiation

Hand Coding

An analytic expression for the derivative is identified first and then implemented by hand in any high-level programming language.

Advantages:
Accuracy up to machine precision, if care is taken.
A highly optimized implementation, depending on the skill of the implementer.

Disadvantages:
Only applicable to "simple" functions, and error-prone.
Requires considerable human effort.


Divided Differences

Approximate the derivative of a function f w.r.t. the i-th component of x at a particular point x0 by a numerical difference quotient, e.g.

\[ \left. \frac{\partial f(x)}{\partial x_i} \right|_{x_0} \approx \frac{f(x_0 + h e_i) - f(x_0)}{h}, \]

where ei is the i-th Cartesian unit vector.

Divided Differences (Ctd.)

\[ \left. \frac{\partial f(x)}{\partial x_i} \right|_{x_0} \approx \frac{f(x_0 + h e_i) - f(x_0)}{h} \]

Advantages:
Only f is needed; the method is easy to implement and uses f as a "black box".
Easy to parallelize.

Disadvantages:
Accuracy is hard to assess and depends on the choice of h.
Computational complexity is bounded below by (n + 1) × cost(f).
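A minimal sketch of this scheme in C++; the step size h and the test function are my illustrative choices, not from the slides. Note the (n + 1) function evaluations.

#include <cmath>
#include <cstdio>
#include <vector>

// Forward-difference approximation of the gradient of f at x0:
// df/dx_i ~ (f(x0 + h*e_i) - f(x0)) / h, one extra f-evaluation per component.
template <typename F>
std::vector<double> fd_gradient(F f, std::vector<double> x, double h = 1e-6) {
    std::vector<double> g(x.size());
    const double f0 = f(x);              // 1 evaluation
    for (size_t i = 0; i < x.size(); ++i) {
        x[i] += h;                        // perturb along e_i
        g[i] = (f(x) - f0) / h;           // n more evaluations in total
        x[i] -= h;
    }
    return g;                             // (n + 1) * cost(f) overall
}

int main() {
    auto f = [](const std::vector<double>& x) { return x[0] * x[1] + std::sin(x[0]); };
    std::vector<double> x0 = {1.0, 2.0};
    std::vector<double> g = fd_gradient(f, x0);
    std::printf("%f %f\n", g[0], g[1]);   // ~ (2 + cos 1, 1)
}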


Symbolic Differentiation

Find an explicit derivative expression using a computer algebra system.

Disadvantages:
The length of the representation of the resulting derivative expressions grows rapidly with the number n of independent variables.
Inefficient in terms of computing time, due to the rapid growth of the underlying expressions.
Unable to deal with constructs such as branches, loops, or subroutines that are inherent in computer codes.

Automatic Differentiation

What is automatic differentiation?
Algorithmic, or automatic, differentiation (AD) is concerned with the accurate and efficient evaluation of derivatives for functions defined by computer programs. No truncation errors are incurred, and the resulting numerical derivative values can be used in all scientific computations that are based on linear, quadratic, or even higher-order approximations to nonlinear scalar or vector functions.

Automatic Differentiation (Cont.)

What's the idea behind automatic differentiation?
Automatic differentiation techniques rely on the fact that every function, no matter how complicated, is executed on a computer as a (potentially very long) sequence of elementary operations such as additions and multiplications, and elementary functions such as sin and cos. By repeatedly applying the chain rule of derivative calculus to the composition of those elementary operations, one can compute derivatives in a completely mechanical fashion.
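A sketch of that mechanical process on a toy function; the function f(x1, x2) = x1·x2 + sin(x1) and all names are my illustration. Every value statement is paired with a derivative statement obtained from the chain rule.

#include <cmath>
#include <cstdio>

// f(x1, x2) = x1*x2 + sin(x1), decomposed into elementary operations.
// Each value v is paired with dv = dv/dx1, obtained by the chain rule,
// so the derivative statements mirror the value statements one-for-one.
int main() {
    double x1 = 1.0, x2 = 2.0;

    double v1 = x1,  dv1 = 1.0;         // d x1 / d x1
    double v2 = x2,  dv2 = 0.0;         // d x2 / d x1
    double v3 = v1 * v2;
    double dv3 = dv1 * v2 + v1 * dv2;   // product rule
    double v4 = std::sin(v1);
    double dv4 = std::cos(v1) * dv1;    // chain rule through sin
    double v5 = v3 + v4;
    double dv5 = dv3 + dv4;             // sum rule

    std::printf("f = %f, df/dx1 = %f\n", v5, dv5);  // df/dx1 = x2 + cos(x1)
}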

How Good Is AD?

Reliability: accurate to machine precision; no truncation error.
Computational Cost: forward mode, 2 ~ 3n × cost(f); reverse mode, 5 × cost(f).
Human Effort: less time is spent preparing a code for differentiation, in particular in situations where computer models are bound to change frequently.

How Widely Is AD Used?

Sensitivity analysis of a mesoscale weather model (application area: Climate Modeling)
Data assimilation for ocean circulation (application area: Oceanography)
Intensity-modulated radiation therapy (application area: Biomedicine)
Multidisciplinary design of aircraft (application area: Computational Fluid Dynamics)
The NEOS server (application area: Optimization)
...

Source: http://www.autodiff.org/?module=Applications&submenu=&category=all

Forward and Reverse Mode

AD Methods: A Simple Example

[Slide figures: a small example function traced step by step; all variables (inputs, intermediates, and outputs) are unified into a single sequence u_1, ..., u_N.]

Forward Method

Differentiate the code. Write the computation as

\[ u_i = x_i, \quad i = 1, \ldots, n, \qquad u_i = \Phi_i(\{u_j\}_{j \prec i}), \quad i = n+1, \ldots, N. \]

Differentiating, with local partial derivatives c_{i,j} = ∂Φ_i/∂u_j,

\[ \nabla u_i = e_i, \quad i = 1, \ldots, n, \qquad \nabla u_i = \sum_{j \prec i} c_{i,j} \nabla u_j, \quad i = n+1, \ldots, N. \]

Reverse Method

Compute the adjoint of the code,

\[ \bar{u}_j = \frac{\partial y}{\partial u_j} = \frac{\partial (y_1, y_2, \ldots, y_m)}{\partial u_j}. \]

For the dependent variables,

\[ \bar{u}_{n+p+j} = e_j, \quad j = 1, \ldots, m. \]

For the intermediates and independents u_j, j = n + p, ..., 1,

\[ \bar{u}_j = \frac{\partial y}{\partial u_j} = \sum_{i \succ j} \bar{u}_i c_{i,j}. \]

Forward Methods

Method: compute the gradient of each variable, and use the chain rule to pass the gradients along.
Size of the computed objects: each step computes a vector of the input size n.
The computation of each variable's gradient proceeds together with the computation of the variable itself.
Easy to implement (see the sketch below).
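A self-contained sketch of this propagation in C++, carrying a length-n gradient with every intermediate; the test function is again my toy example, not from the slides.

#include <cmath>
#include <cstdio>
#include <vector>

using Grad = std::vector<double>;  // length-n gradient carried with each variable

// Forward mode on f(x1, x2) = x1*x2 + sin(x1): every intermediate u_i carries
// nabla u_i, updated alongside the value via nabla u_i = sum_j c_{i,j} nabla u_j.
int main() {
    const int n = 2;
    double x1 = 1.0, x2 = 2.0;
    Grad g1 = {1.0, 0.0};                     // nabla x1 = e_1
    Grad g2 = {0.0, 1.0};                     // nabla x2 = e_2

    double u3 = x1 * x2;                      // c_{3,1} = x2, c_{3,2} = x1
    Grad g3(n);
    for (int k = 0; k < n; ++k) g3[k] = x2 * g1[k] + x1 * g2[k];

    double u4 = std::sin(x1);                 // c_{4,1} = cos(x1)
    Grad g4(n);
    for (int k = 0; k < n; ++k) g4[k] = std::cos(x1) * g1[k];

    double y = u3 + u4;                       // c = 1 for both parents
    Grad gy(n);
    for (int k = 0; k < n; ++k) gy[k] = g3[k] + g4[k];

    std::printf("f = %f, grad = (%f, %f)\n", y, gy[0], gy[1]);  // (x2 + cos x1, x1)
}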

[Slide figure: one forward sweep computes each variable's value and its gradient side by side.]

Reverse Methods

Method: compute the adjoint of each variable, and pass the adjoints along.
Size of the computed objects: each step computes a vector of the output size m. (Note: in optimization applications the output size is usually 1.)
The computation of the adjoints proceeds only after the computation of all the variables has completed.
Traverse the computational graph in reverse, obtaining the parents of each variable so as to compute its adjoint.
Obtain the gradient by computing each partial derivative one by one.
Harder to implement (see the tape-based sketch below).
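A sketch of a reverse sweep over an explicitly recorded tape; the structure and names are my illustration, and real tools record the tape automatically via operator overloading.

#include <cmath>
#include <cstdio>
#include <vector>

// One tape entry per elementary operation: the result index and the local
// partial derivatives c_{i,j} with respect to its (at most two) arguments.
struct Entry { int res; int arg[2]; double c[2]; int nargs; };

int main() {
    // Forward sweep on f(x1, x2) = x1*x2 + sin(x1): compute values and
    // record each operation with its local derivatives on the tape.
    std::vector<double> u = {1.0, 2.0, 0.0, 0.0, 0.0};     // u0 = x1, u1 = x2
    std::vector<Entry> tape;
    u[2] = u[0] * u[1];    tape.push_back({2, {0, 1}, {u[1], u[0]}, 2});
    u[3] = std::sin(u[0]); tape.push_back({3, {0, 0}, {std::cos(u[0]), 0.0}, 1});
    u[4] = u[2] + u[3];    tape.push_back({4, {2, 3}, {1.0, 1.0}, 2});

    // Reverse sweep: seed the output adjoint, then apply
    // ubar_j += ubar_i * c_{i,j} while walking the tape backwards.
    std::vector<double> ubar(u.size(), 0.0);
    ubar[4] = 1.0;                        // dy/dy = 1
    for (int t = (int)tape.size() - 1; t >= 0; --t)
        for (int k = 0; k < tape[t].nargs; ++k)
            ubar[tape[t].arg[k]] += ubar[tape[t].res] * tape[t].c[k];

    std::printf("grad = (%f, %f)\n", ubar[0], ubar[1]);    // (x2 + cos x1, x1)
}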

[Slide figure: values are computed in a forward sweep; adjoints are then computed in a reverse sweep over the same graph.]

Implementation of Reverse Mode

As mentioned above, the implementation of forward mode is relatively straightforward. Here we only compare the important features of the two implementation techniques, source transformation and operator overloading:
Source transformation: re-order the code upside down.
Operator overloading: record the computation on a "tape".

Re-ordering the code upside down:

[Slide figure: code example of the re-ordering.]

Record the computation on a "tape":
Record each operation and its operands.
Related technique: checkpointing. If the number of operations grows large, checkpointing prevents the program from exhausting all the memory.

Comparison

The comparison between forward mode and reverse mode covers the following topics:
Computational complexity
Memory required
Time to develop

Cost of Forward Propagation of Derivatives

Define
N_{|c|=1}: the number of unit local derivatives c_{i,j} = ±1,
N_{|c|≠1}: the number of non-unit local derivatives c_{i,j} ≠ 0, ±1.

Solve for the derivatives in forward order ∇u_{n+1}, ∇u_{n+2}, ..., ∇u_N,

\[ \nabla u_i = \sum_{j \prec i} c_{i,j} \nabla u_j, \quad i = n+1, \ldots, N, \]

with each ∇u_i = (∂u_i/∂x_1, ..., ∂u_i/∂x_n) a length-n vector. The flop count flops(fwd) is given by

flops(fwd) = n N_{|c|≠1}  (mults. c_{i,j} · ∇u_j, c_{i,j} ≠ ±1, 0)
           + n (N_{|c|≠1} + N_{|c|=1})  (adds./subs. ± c_{i,j} ∇u_j)
           − n (p + m)  (the first term of each of the p + m statements needs no add./sub.)

flops(fwd) = n (2 N_{|c|≠1} + N_{|c|=1} − p − m)

Cost of Reverse Propagation of Adjoints

Solve for the adjoints in reverse order ū_{n+p}, ū_{n+p−1}, ..., ū_1,

\[ \bar{u}_j = \sum_{i \succ j} \bar{u}_i c_{i,j}, \]

with ū_j = ∂(y_1, y_2, ..., y_m)/∂u_j a length-m vector. The flop count flops(rev) is given by

flops(rev) = m N_{|c|≠1}  (mults. ū_i · c_{i,j}, c_{i,j} ≠ ±1, 0)
           + m (N_{|c|=1} + N_{|c|≠1})  (adds./subs. ± ū_i c_{i,j})

flops(rev) = m (2 N_{|c|≠1} + N_{|c|=1})

Memory Required

It is not certain which mode uses more memory; usually, reverse mode takes more.
The memory cost of forward mode comes from:
storing one value per variable;
storing a gradient of input size n per variable.
The memory cost of reverse mode comes from:
storing one value per variable;
storing an adjoint of output size m per variable;
storing the DAG (directed acyclic graph) that represents the function.

Forward mode is more likely to use less memory:
1. if the original function reuses variables;
2. if the function is so large that reverse mode requires a lot of memory to store the DAG.
Reverse mode is more likely to use less memory:
1. if n is relatively large, so that storing the length-n gradients costs more than storing the length-m adjoints.

Time to Develop

Usually it is harder to develop reverse-mode code than forward-mode code, especially when using the source transformation technique.

Conclusion:
Use forward mode when n ≪ m.
Use reverse mode when m ≪ n, such as in optimization, where m = 1 and the full gradient costs at most 5 × cost(f).

Extended Knowledge

Directional derivatives (forward mode):
seed a direction d = (d_1, ..., d_n)^T;
seeding ∇x_i = d_i calculates J_f · d.
Multi-directional derivatives: replace d by D = [d_{ij}], i = 1, ..., n, j = 1, ..., q.

Directional adjoints (reverse mode):
seed a weight vector v = (v_1, ..., v_m);
seeding ȳ_j = v_j calculates v · J_f.
Multi-directional adjoints: replace v by V = [v_{ij}], i = 1, ..., q, j = 1, ..., m.
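Seeding the tangents with a direction d rather than with a Cartesian basis vector gives the Jacobian-vector product J_f · d in a single forward sweep; a small sketch, where the function and the direction are my illustrative choices.

#include <cmath>
#include <cstdio>

// Directional derivative of f(x1, x2) = x1*x2 + sin(x1) along d:
// seed the tangents with d (not with e_i) and run one forward sweep.
int main() {
    double x1 = 1.0, x2 = 2.0;
    double d1 = 0.5, d2 = -1.0;       // direction d

    double t1 = d1, t2 = d2;          // seeded tangents: xdot = d
    double u  = x1 * x2;
    double tu = t1 * x2 + x1 * t2;    // tangent of the product
    double v  = std::sin(x1);
    double tv = std::cos(x1) * t1;    // tangent through sin
    double y  = u + v, ty = tu + tv;  // value and J_f * d for the scalar output

    std::printf("f = %f, grad(f).d = %f\n", y, ty);
}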

Case Study

Using FADBAD++:
FADBAD++ was developed by Ole Stauning and Claus Bendtsen.
Flexible automatic differentiation using templates and operator overloading in ANSI C++.
Distributed as source code only; no additional library is required.
Free to use.
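A minimal forward-mode sketch along the lines of the FADBAD++ interface, on the same test function used below. The header name and the F<>::diff/x/d methods follow the FADBAD++ documentation; treat the details as assumptions rather than a verbatim listing from the talk.

#include <cstdio>
#include "fadiff.h"                    // FADBAD++ forward-mode header

using fadbad::F;

// f(x) = prod x_i, the test function used in the case study.
template <typename T>
T prod(const T* x, int n) {
    T p = x[0];
    for (int i = 1; i < n; ++i) p = p * x[i];
    return p;
}

int main() {
    const int n = 4;
    F<double> x[n];
    for (int i = 0; i < n; ++i) {
        x[i] = double(i + 1);
        x[i].diff(i, n);               // mark x[i] as the i-th of n independents
    }
    F<double> y = prod(x, n);
    std::printf("f = %f\n", y.x());    // value
    for (int i = 0; i < n; ++i)
        std::printf("df/dx%d = %f\n", i, y.d(i));   // forward-mode gradient
}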

Test function: f(x) = ∏ x_i.
Objective: test different codings of the function in forward mode, in particular trying to reuse variables.
Result: basically, no matter how the function is coded, the memory cost is about n × n × 8 bytes; reusing variables or not makes no difference.

Objective: test reverse mode on the same function.
Result: tested up to n = 6500. Forward mode ran out of memory; reverse mode was 127 times faster and took only a few MB.
Remark: we could not see how much memory the DAG takes in reverse mode; that would be easier to observe with fewer independent variables but a more complicated function.

Complexity Analysis

Code List

The code list is given by re-writing the code into elemental binary and unary operations/functions, e.g.

\[ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \log^2(x_1 x_2) + x_2 x_2^3 - a - x_2 \\ \sqrt{b \log(x_1 x_2) + x_2 / x_3} - x_2 x_2^3 + a \end{bmatrix} \]

v1 = x1           v7  = v6 · v2     v13 = v8 − v2
v2 = x2           v8  = v7 − a      v14 = v5^2
v3 = x3           v9  = 1/v3        v15 = √v12
v4 = v1 · v2      v10 = v2 · v9     v16 = v14 + v13
v5 = log(v4)      v11 = b · v5      v17 = v15 − v8
v6 = v2^3         v12 = v11 + v10

so that y1 = v16 and y2 = v17.

Code-List (Ctd.)

Assume the code list contains:
N± additions/subtractions, e.g. v16 = v14 + v13;
N∗ multiplications, e.g. v4 = v1 · v2;
Nf nonlinear functions/operations, e.g. v5 = log(v4), v9 = 1/v3;
a total of p + m = N± + N∗ + Nf statements.

Then:
each addition/subtraction generates two c_{i,j} = ±1;
each multiplication generates two c_{i,j} ≠ ±1, 0;
each nonlinear function generates one c_{i,j} ≠ ±1, 0, requiring one nonlinear function evaluation, e.g. v5 = log(v4) gives c_{5,4} = 1/v4.

So we have
N_{|c|=1} = 2 N±,
N_{|c|≠1} = 2 N∗ + Nf.
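As a sanity check, here is my own tally for the code list above, counting the powers, the reciprocal, the log, and the square root as nonlinear operations:

\[ N_\pm = 5, \quad N_* = 4, \quad N_f = 5, \quad p + m = 14, \]
\[ N_{|c|=1} = 2 N_\pm = 10, \qquad N_{|c|\neq 1} = 2 N_* + N_f = 13. \]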

Forward Mode Complexity

Complexity of Forward Mode

flops(Jf) = flops(f) + flops(c_{i,j}) + flops(fwd)

Assume flops(nonlinear function) = w, with w > 1.

The cost of evaluating the function is

flops(f) = N∗ + N± + w Nf.

The cost of evaluating the local derivatives c_{i,j} is

flops(c_{i,j}) = w Nf.

The cost of the forward propagation of derivatives is

flops(fwd) = n (2 N_{|c|≠1} + N_{|c|=1} − p − m) = n (3 N∗ + N± + Nf).

Complexity of Forward Mode (Ctd.)

Then for forward mode,

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} = 1 + \frac{w N_f + n(3 N_* + N_\pm + N_f)}{N_* + N_\pm + w N_f} = 1 + 3n \hat{N}_* + n \hat{N}_\pm + n \left( \frac{1}{w} + \frac{1}{n} \right) w \hat{N}_f, \]

where

\[ (\hat{N}_*, \hat{N}_\pm, w \hat{N}_f) = \frac{(N_*, N_\pm, w N_f)}{N_* + N_\pm + w N_f}. \]

Since \hat{N}_* + \hat{N}_\pm + w \hat{N}_f = 1 and all coefficients are positive,

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} \le 1 + n \cdot \max\left( 3, \; 1, \; \frac{1}{w} + \frac{1}{n} \right) = 1 + 3n. \]

When n ≪ m, forward mode is preferred.

Reverse Mode Complexity

Complexity of Reverse Mode

\[ \mathrm{flops}(\mathrm{rev}) = m (2 N_{|c|\neq 1} + N_{|c|=1}) = m (4 N_* + 2 N_\pm + 2 N_f), \]

giving

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} = 1 + 4m \hat{N}_* + 2m \hat{N}_\pm + m \left( \frac{2}{w} + \frac{1}{m} \right) w \hat{N}_f \]

and

\[ \frac{\mathrm{flops}(J_f)}{\mathrm{flops}(f)} \le 1 + m \cdot \max\left( 4, \; 2, \; \frac{2}{w} + \frac{1}{m} \right) = 1 + 4m. \]

For m = 1,

\[ \mathrm{flops}(\nabla f) \le 5 \, \mathrm{flops}(f). \]

AD Software

AD Tools in MATLAB

Differentiation Arithmetic

\[ \vec{u} = (u, u'), \]

where u denotes the value of the function u: R → R evaluated at the point x_0, and u' denotes the value u'(x_0).

\[ \vec{u} + \vec{v} = (u + v, \; u' + v') \]
\[ \vec{u} - \vec{v} = (u - v, \; u' - v') \]
\[ \vec{u} \times \vec{v} = (uv, \; uv' + u'v) \]
\[ \vec{u} \div \vec{v} = (u/v, \; (u' - (u/v)v')/v) \]

\[ \vec{x} = (x, 1), \qquad \vec{c} = (c, 0) \]

Ref: http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf
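Operator overloading implements exactly this arithmetic. Below is a minimal, self-contained C++ sketch of the (u, u') pairs; it is my illustration of the idea, not INTLAB's actual implementation, which does the same in MATLAB via its gradient class.

#include <cmath>
#include <cstdio>

// A (value, derivative) pair implementing the differentiation arithmetic above.
struct Dual {
    double u, du;                       // u(x0) and u'(x0)
};
Dual operator+(Dual a, Dual b) { return {a.u + b.u, a.du + b.du}; }
Dual operator-(Dual a, Dual b) { return {a.u - b.u, a.du - b.du}; }
Dual operator*(Dual a, Dual b) { return {a.u * b.u, a.u * b.du + a.du * b.u}; }
Dual operator/(Dual a, Dual b) {
    double q = a.u / b.u;               // quotient rule: (a' - q b') / b
    return {q, (a.du - q * b.du) / b.u};
}
Dual var(double x) { return {x, 1.0}; } // independent variable: (x, 1)
Dual cst(double c) { return {c, 0.0}; } // constant: (c, 0)

int main() {
    // f(x) = (x+1)(x-2)/(x+3) at x = 3 -- the rational-function example below.
    Dual x = var(3.0);
    Dual f = (x + cst(1.0)) * (x - cst(2.0)) / (x + cst(3.0));
    std::printf("f(3) = %f, f'(3) = %f\n", f.u, f.du);  // 2/3 and 13/18
}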


Example of a Rational Function

\[ f(x) = \frac{(x+1)(x-2)}{x+3}, \qquad f(3) = 2/3, \quad f'(3) = ? \]

\[ \vec{f}(\vec{x}) = \frac{(\vec{x} + \vec{1})(\vec{x} - \vec{2})}{\vec{x} + \vec{3}} = \frac{((x,1) + (1,0)) \times ((x,1) - (2,0))}{(x,1) + (3,0)} \]

Inserting the value \vec{x} = (3, 1) into \vec{f} produces

\[ \vec{f}(3,1) = \frac{((3,1) + (1,0)) \times ((3,1) - (2,0))}{(3,1) + (3,0)} = \frac{(4,1) \times (1,1)}{(6,1)} = \frac{(4,5)}{(6,1)} = \left( \frac{2}{3}, \frac{13}{18} \right) \]


Derivatives of Elementary Functions

Chain rule:

\[ (g \circ u)'(x) = u'(x) \, (g' \circ u)(x) \]

\[ \vec{g}(\vec{u}) = \vec{g}((u, u')) = (g(u), \; u' g'(u)) \]

\[ \sin \vec{u} = \sin(u, u') = (\sin u, \; u' \cos u) \]
\[ \cos \vec{u} = \cos(u, u') = (\cos u, \; -u' \sin u) \]
\[ e^{\vec{u}} = e^{(u, u')} = (e^u, \; u' e^u) \]
\[ \ldots \]
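Extending the Dual sketch above with these rules; again my illustration of the pattern, while INTLAB's own overloaded sin for gradient objects is shown on the next slide.

#include <cmath>

struct Dual { double u, du; };  // as in the earlier sketch

// Each elementary function maps (u, u') to (g(u), u' * g'(u)).
Dual sin(Dual a) { return {std::sin(a.u),  a.du * std::cos(a.u)}; }
Dual cos(Dual a) { return {std::cos(a.u), -a.du * std::sin(a.u)}; }
Dual exp(Dual a) { return {std::exp(a.u),  a.du * std::exp(a.u)}; }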


Example of Sin

From ../Intlab/gradient/@gradient/sin.m

[Slide figure: listing of INTLAB's overloaded sin for gradient objects.]

Example for Elementary Functions

Evaluate the derivative of f(x) = (1 + x + e^x) sin x at x = 0.

\[ \vec{f}(\vec{x}) = (\vec{1} + \vec{x} + e^{\vec{x}}) \sin \vec{x} \]

\[ \vec{f}(0,1) = \left( (1,0) + (0,1) + e^{(0,1)} \right) \sin(0,1) = \left( (1,1) + (e^0, e^0) \right) (\sin 0, \cos 0) = (2,2)(0,1) = (0,2). \]


High-order Derivatives

\[ \vec{u} = (u, u', u''), \]

\[ \vec{u} + \vec{v} = (u + v, \; u' + v', \; u'' + v'') \]
\[ \vec{u} - \vec{v} = (u - v, \; u' - v', \; u'' - v'') \]
\[ \vec{u} \times \vec{v} = (uv, \; uv' + u'v, \; uv'' + 2u'v' + u''v) \]
\[ \vec{u} \div \vec{v} = \left( u/v, \; (u' - (u/v)v')/v, \; (u'' - 2(u/v)'v' - (u/v)v'')/v \right) \]
\[ \cdots \]
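The same operator-overloading trick extends directly to these triples; a minimal sketch of (u, u', u'') for multiplication, using my own names and test function.

#include <cstdio>

// A (u, u', u'') triple implementing the second-order rules above.
struct Dual2 { double u, du, ddu; };

Dual2 mul(Dual2 a, Dual2 b) {           // Leibniz rule up to second order
    return {a.u * b.u,
            a.u * b.du + a.du * b.u,
            a.u * b.ddu + 2.0 * a.du * b.du + a.ddu * b.u};
}

int main() {
    Dual2 x = {3.0, 1.0, 0.0};          // independent variable: (x, 1, 0)
    Dual2 y = mul(x, mul(x, x));        // y = x^3
    std::printf("%f %f %f\n", y.u, y.du, y.ddu);  // 27, 27, 18
}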

INTLAB

Developers: Institute for Reliable Computing, Hamburg University of Technology
Mode: Forward
Method: Operator overloading
Language: MATLAB
URL: http://www.ti3.tu-harburg.de/rump/intlab/
Licensing: Open Source

Rosenbrock Function

\[ y_1 = 400 x_1 (x_1^2 - x_2) + 2(x_1 - 1) \]
\[ y_2 = 200 (x_1^2 - x_2) \]

One Step of Newton's Method with INTLAB

[Slide figure: MATLAB listing that applies one Newton step, with the Jacobian obtained from INTLAB's gradient class.]

TOMLAB/MAD

Developers: Marcus M. Edvall and Kenneth Holmström, Tomlab Optimization Inc. (TOMLAB/MAD integration); Shaun A. Forth and Robert Ketzscher, Cranfield University (MAD)
Mode: Forward
Method: Operator overloading
Language: MATLAB
URL: http://tomlab.biz/products/mad/
Licensing: Commercial license

One Step of Newton's Method with MAD

[Slide figure: the corresponding MATLAB listing using TOMLAB/MAD.]

ADiMat

Developers: André Vehreschild, Institute for Scientific Computing, RWTH Aachen University
Mode: Forward
Method: Source transformation combined with operator overloading
Language: MATLAB
URL: http://www.sc.rwth-aachen.de/vehreschild/adimat.html
Licensing: under discussion

ADiMat's Example

function [result1, result2] = f(x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for Scientific Computing,
% RWTH Aachen University, D-52056 Aachen, Germany.
% vehreschild@sc.rwth-aachen.de

result1 = sin(x);
result2 = sqrt(x*2);

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

ADiMat's Example (cont.)

>> addiff(@f, 'x', 'result1,result2');
>> p = magic(5);
>> g_p = createFullGradients(p);
>> [g_r1, r1, g_r2, r2] = g_f(g_p, p);
>> J1 = [g_r1{:}]; % and
>> J2 = [g_r2{:}];

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

ADiMat's Example (cont.)

function [g_result1, result1, g_result2, result2] = g_f(g_x, x)
% Compute the sin and square-root of x*2.
% Very simple example for ADiMat website.
% Andre Vehreschild, Institute for Scientific Computing,
% RWTH Aachen University, D-52056 Aachen, Germany.
% vehreschild@sc.rwth-aachen.de

g_result1 = ((g_x) .* cos(x));
result1 = sin(x);
g_tmp_f_00000 = g_x * 2;
tmp_f_00000 = x * 2;
g_result2 = ((g_tmp_f_00000) ./ (2 .* sqrt(tmp_f_00000)));
result2 = sqrt(tmp_f_00000);

Source: http://www.sc.rwth-aachen.de/vehreschild/adimat/example1.html

Matrix Calculus

Definition: if X is p × q and Y is m × n, then dY: = dY/dX dX:, where A: denotes the column-stacked vector vec(A) and the derivative dY/dX is a large mn × pq matrix.

\[ d(X^2){:} = (X \, dX + dX \, X){:} \]
\[ d(\det(X)) = d(\det(X^T)) = \det(X) \, (X^{-T}){:}^T \, dX{:} \]
\[ d(\ln(\det(X))) = (X^{-T}){:}^T \, dX{:} \]

Ref: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html
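A quick numerical sanity check of the log-determinant rule on my own 2×2 example (not from the slides), comparing the formula entrywise against finite differences:

#include <cmath>
#include <cstdio>

// Check d(ln det X) = (X^{-T}):^T dX: on a 2x2 example by finite differences.
int main() {
    double X[2][2] = {{4.0, 1.0}, {2.0, 3.0}};
    double det = X[0][0]*X[1][1] - X[0][1]*X[1][0];
    // X^{-T} for a 2x2 matrix, written out explicitly.
    double invT[2][2] = {{ X[1][1]/det, -X[1][0]/det},
                         {-X[0][1]/det,  X[0][0]/det}};
    const double h = 1e-7;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            double Y[2][2] = {{X[0][0], X[0][1]}, {X[1][0], X[1][1]}};
            Y[i][j] += h;                         // perturb one entry of X
            double d2 = Y[0][0]*Y[1][1] - Y[0][1]*Y[1][0];
            double fd = (std::log(d2) - std::log(det)) / h;
            std::printf("d ln det / dX[%d][%d]: fd = %f, formula = %f\n",
                        i, j, fd, invT[i][j]);
        }
}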

Vandermonde Function

[Slide figure: definition and MATLAB implementation of the Vandermonde test function.]

Source: Shaun A. Forth. An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB. ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pp. 195-222.

Vandermonde Function (cont.)

[Slide figure: CPU-time plot; experiment on a PIV 3.0 GHz PC (Windows XP), MATLAB version 6.5.]

Source: Shaun A. Forth, op. cit.

CPU time in seconds versus problem size:

Method        10     20     40     80     160    320    640     1280
Function      0.000  0.000  0.000  0.000  0.000  0.010  0.000   0.000
MAD(Full)     0.070  0.060  0.070  0.130  0.581  2.664  10.535  45.535
MAD(Sparse)   0.071  0.050  0.060  0.060  0.060  0.070  0.100   0.881
INTLab        0.050  0.040  0.040  0.090  0.040  0.050  0.071   0.120
ADiMat        0.231  0.140  0.271  0.601  1.362  3.044  7.340   21.611

Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

Arrowhead Function

[Slide figure: definition and MATLAB implementation of the arrowhead test function.]

Source: Shaun A. Forth, op. cit.

Arrowhead Function (cont.)

[Slide figure: CPU-time plot; experiment on a PIV 3.0 GHz PC (Windows XP), MATLAB version 6.5.]

Source: Shaun A. Forth, op. cit.

CPU time in seconds versus problem size:

Method        20     40     80     160    320    640    1280
Function      0.010  0.000  0.000  0.000  0.000  0.000  0.000
MAD(Full)     0.180  0.050  0.070  0.200  1.111  4.367  17.796
MAD(Sparse)   0.060  0.060  0.060  0.070  0.080  0.100  0.160
INTLab        0.090  0.051  0.050  0.050  0.081  0.140  0.340
ADiMat        0.911  0.311  0.651  1.262  2.704  6.028  14.581

Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

BDQRTIC mod

[Slide figure: definition of the modified BDQRTIC test function.]

BDQRTIC mod (cont.)

CPU time in seconds versus problem size:

Method        20      40     80     160    320    640     1280
Function      12.809  0.010  0.000  0.000  0.000  0.010   0.000
MAD(Full)     2.604   0.121  0.150  0.490  2.513  10.926  43.162
MAD(Sparse)   0.270   0.120  0.130  0.150  0.201  0.260   0.371
INTLab        2.293   0.080  0.100  0.110  0.150  0.230   0.481
ADiMat        3.455   0.621  1.152  2.544  5.778  14.641  42.671

Experiment on a PIII 1000 MHz PC (Windows 2000), MATLAB version 7.0.1.24704 (R14) Service Pack 1, TOMLAB v5.6, INTLAB version 5.3, ADiMat (beta) 0.4-r9.

Summary of AD Software in MATLAB

The operator overloading method for AD forward mode is easy to implement via differentiation arithmetic.
All of the AD tools in MATLAB are easy to use.
Sparse storage provides a good way to improve the performance of AD tools.

AD in C/C++ (ADIC)

The Computational Differentiation Group at Argonne National Laboratory

ADIC was introduced in 1997 by:
Christian Bischof (Scientific Computing, RWTH Aachen University),
Lucas Roh (founder, president, and CEO of Hostway Co.),
and the other team members.

State of ADIC

ADIC is an automatic differentiation tool for ANSI C/C++.
ADIC was introduced in 1997.
Last updated: June 10, 2005.
Official web site: www-new.mcs.anl.gov/adic/down-2.htm
ADIC uses the forward method.
Supported platforms: Unix/Linux.
Selected application: NEOS.
Related research group: Argonne National Laboratory, USA.

ADIC Anatomy

[Slide figure: block diagram of ADIC's components.]

ADIC Process

[Slide figure: the source-to-source differentiation process.]

func.c

#include "func.h"
#include <math.h>

void func(data_t *pdata)
{
    int i;
    double *x = pdata->x;
    double *y = pdata->y;
    double s = 0.0;              /* accumulator for the dot product */
    double temp;

    for (i = 0; i < pdata->len; i++) {
        s = s + x[i]*y[i];
    }

    temp = exp(s);
    pdata->r = temp;             /* r = exp(x . y) */
}

driver.c

[Slide figure: the accompanying driver code listing.]

Commands

[Slide figure: the two ADIC invocation commands.]

The first command generates the header file ad_deriv.h and the derivative function func.ad.c.
The second command compiles and links all the needed functions and generates ad_func.

Handling Side Effects

[Slide figures: a worked example of how ADIC handles side effects in the generated code.]

For Further Reading on ADIC

Christian H. Bischof, Paul D. Hovland, Boyana Norris. Implementation of Automatic Differentiation Tools. PEPM '02, Jan. 14-15, 2002, Portland, OR, USA.

Paul D. Hovland and Boyana Norris. Users' Guide to ADIC 1.1.

C. H. Bischof, L. Roh, A. J. Mauer-Oats. ADIC: An Extensible Automatic Differentiation Tool for ANSI-C. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.

Reference

C. H. Bischof and H. M. Bücker. Computing Derivatives of Computer Programs. In Modern Methods and Algorithms of Quantum Chemistry: Proceedings, Second Edition, edited by J. Grotendorst, NIC Series, 2000, pages 315-327.

C. Bischof, A. Carle, P. Khademi, and G. Pusch. Automatic Differentiation: Obtaining Fast and Reliable Derivatives - Fast. In Control Problems in Industry, edited by I. Lasiecka and B. Morton, 1995, pages 1-16.

Andreas Griewank. On Automatic Differentiation. In Mathematical Programming: Recent Developments and Applications, edited by M. Iri and K. Tanabe, Kluwer Academic Publishers, 1989.

Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math. SIAM, Philadelphia, Penn., 2000.

Shaun Forth. Introduction to Automatic Differentiation. Presentation slides for the 4th International Conference on Automatic Differentiation, July 19-23, 2004, University of Chicago, Gleacher Centre, Chicago, USA.

G. F. Corliss. Automatic Differentiation.

Warwick Tucker. http://www.math.uu.se/~warwick/vt07/FMB/avnm1.pdf

http://www.autodiff.org/

http://www.ti3.tu-harburg.de/rump/intlab/

http://tomopt.com/tomlab/products/mad/

http://www.sc.rwth-aachen.de/vehreschild/adimat/index.html

Shaun A. Forth. An Efficient Overloaded Implementation of Forward Mode Automatic Differentiation in MATLAB. ACM Transactions on Mathematical Software, Vol. 32, No. 2, 2006, pages 195-222.

Siegfried M. Rump. INTLAB - INTerval LABoratory. In Developments in Reliable Computing, Kluwer Academic Publishers, 1999, pages 77-104.

Christian H. Bischof, H. Martin Bücker, Bruno Lang, A. Rasch, André Vehreschild. Combining Source Transformation and Operator Overloading Techniques to Compute Derivatives for MATLAB Programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002), IEEE Computer Society, 2002.

Thanks & Questions

Thanks!

Questions?
