Automatic Differentiation of programs
and its applications to Scientific Computing
Laurent Hascoet, INRIA (http://www-sop.inria.fr/tropics)
Manchester, April 7th, 2005
Laurent Hascoet () AD Manchester, april 7th , 2005 1 / 67
Outline
1 Introduction
2 Formalization
   3 Multi-directional
4 Reverse AD
   5 Alternative formalizations
6 Reverse AD for Optimization
7 Performance issues
   8 Static Analyses in AD tools
9 Some AD Tools
10 Validation
   11 Expert-level AD
12 Conclusion
So you need derivatives?...
Given a program P computing a function F

    F : IR^m → IR^n
        X ↦ Y

we want to build a program that computes the derivatives of F.

Specifically, we want the derivatives of the dependent (i.e. some variables in Y) with respect to the independent (i.e. some variables in X).
Which derivatives do you want?
Derivatives come in various shapes and flavors:

- Jacobian matrices: J = (∂yj/∂xi)
- Directional or tangent derivatives, differentials: dY = Ẏ = J × dX = J × Ẋ
- Gradients:
  - when n = 1 output: gradient = J = (∂y/∂xi)
  - when n > 1 outputs: gradient = Ȳᵗ × J
- Higher-order derivative tensors
- Taylor coefficients
- Intervals?
Divided Differences
Given X and a direction Ẋ, run P twice, and compute Ẏ:

    Ẏ = (P(X + εẊ) − P(X)) / ε

Pros: immediate; no thinking required!
Cons: approximation; what ε? ⇒ Not so cheap after all!

Optimization wants inexpensive and accurate derivatives.
⇒ Let's go for exact, analytic derivatives!
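To get a feel for the "what ε?" problem, here is a minimal Python sketch (not from the slides; the function mirrors the FOO example used later in the talk):

```python
def f(v1, v2, p1=2.0):
    # Mirrors the slides' running example: v3 = 2*v1 + 5; v4 = v3 + p1*v2/v3
    v3 = 2.0*v1 + 5.0
    return v3 + p1*v2/v3

def divided_difference(v1, v2, v1dot, v2dot, eps):
    # Ydot ~ (P(X + eps*Xdot) - P(X)) / eps
    return (f(v1 + eps*v1dot, v2 + eps*v2dot) - f(v1, v2)) / eps

# Exact directional derivative at (v1,v2) = (1,3) in direction (1,1):
# 2*(1 - p1*v2/v3**2) + p1/v3 = 86/49 + 14/49 = 100/49
exact = 100.0/49.0
for eps in (1e-2, 1e-6, 1e-10):
    print(eps, abs(divided_difference(1.0, 3.0, 1.0, 1.0, eps) - exact))
```

Too large an ε gives truncation error, too small gives cancellation: there is no universally good choice, which is exactly the "Cons" above.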
AD Example: analytic tangent differentiation
by Program transformation
Original subroutine:

SUBROUTINE FOO(v1, v2, v4, p1)
  REAL v1,v2,v3,v4,p1
  v3 = 2.0*v1 + 5.0
  v4 = v3 + p1*v2/v3
END

Tangent differentiated version (added parameters v1d, v2d, v4d):

SUBROUTINE FOO(v1, v1d, v2, v2d, v4, v4d, p1)
  REAL v1,v2,v3,v4,p1
  REAL v1d,v2d,v3d,v4d
  v3d = 2.0*v1d
  v3 = 2.0*v1 + 5.0
  v4d = v3d + p1*(v2d*v3-v2*v3d)/(v3*v3)
  v4 = v3 + p1*v2/v3
END
Just inserts “differentiated instructions” into FOO
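The tangent routine transcribes directly into Python for a numerical check (an illustrative sketch; the name foo_d and the argument order are assumptions, not tool output):

```python
def foo_d(v1, v1d, v2, v2d, p1):
    # Tangent-differentiated FOO: each original instruction is preceded
    # by its differentiated counterpart.
    v3d = 2.0*v1d
    v3 = 2.0*v1 + 5.0
    v4d = v3d + p1*(v2d*v3 - v2*v3d)/(v3*v3)
    v4 = v3 + p1*v2/v3
    return v4, v4d

# Directional derivative along (v1dot, v2dot) = (1, 0), i.e. d(v4)/d(v1)
v4, v4d = foo_d(1.0, 1.0, 3.0, 0.0, 2.0)
```

At (v1, v2, p1) = (1, 3, 2) this yields v4d = 2 − 12/49 = 86/49, matching the hand derivative.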
Take control away!
We differentiate programs. But control ⇒ non-differentiability!
Freeze the current control:
⇒ the program becomes a simple sequence of instructions
⇒ AD differentiates these sequences:
[Figure: each control path (Control 1 ... Control N) through the Program selects a straight-line CodeList 1 ... CodeList N; AD produces Diff(CodeList 1) ... Diff(CodeList N), reassembled under the same controls into Diff(Program).]
Caution: the program is only piecewise differentiable!
Computer Programs as Functions
Identify sequences of instructions
{I1; I2; . . . Ip−1; Ip; }
with composition of functions.
Each simple instruction
Ik : v4 = v3 + v2/v3
is a function fk : IR^q → IR^q, where the output v4 is built from the inputs v2 and v3, and all other variables are passed unchanged.
Thus we see P : {I1; I2; . . . Ip−1; Ip; } as
f = fp ◦ fp−1 ◦ · · · ◦ f1
Using the Chain Rule
f = fp ◦ fp−1 ◦ · · · ◦ f1
We define for short:
W0 = X and Wk = fk(Wk−1)
The chain rule yields:
    f′(X) = f′p(Wp−1) · f′p−1(Wp−2) · … · f′1(W0)
Tangent mode and Reverse mode
Full J is expensive and often useless.
We'd better compute useful projections of J.
tangent AD:

    Ẏ = f′(X) · Ẋ = f′p(Wp−1) · f′p−1(Wp−2) · … · f′1(W0) · Ẋ

reverse AD:

    X̄ = f′ᵗ(X) · Ȳ = f′ᵗ1(W0) · … · f′ᵗp−1(Wp−2) · f′ᵗp(Wp−1) · Ȳ

Evaluate both from right to left:
⇒ always matrix × vector
Theoretical cost is about 4 times the cost of P
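The right-to-left evaluation rule is easy to check on toy matrices (Python sketch; the Jacobians are made up):

```python
def matvec(A, x):
    return [sum(a*xi for a, xi in zip(row, x)) for row in A]

def matmat(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Three elementary Jacobians and a direction Xdot
J1 = [[1.0, 0.0], [2.0, 1.0]]
J2 = [[0.5, 1.0], [0.0, 3.0]]
J3 = [[1.0, 1.0], [4.0, 0.0]]
xdot = [1.0, 2.0]

# Right-to-left: only matrix-by-vector products are ever formed
ydot = matvec(J3, matvec(J2, matvec(J1, xdot)))

# Same value as forming the full Jacobian product first (more expensive)
ydot_full = matvec(matmat(J3, matmat(J2, J1)), xdot)
assert ydot == ydot_full
```

Right-to-left keeps every intermediate a vector; left-to-right would require matrix-by-matrix products, which is why the tangent and reverse projections stay cheap.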
Costs of Tangent and Reverse AD
F : IRm → IRn
[Figure: the n×m Jacobian J of F (m inputs, n outputs); tangent mode combines its columns, reverse (gradient) mode combines its rows.]

J costs m × 4 × P using the tangent mode. Good if m ≤ n.
J costs n × 4 × P using the reverse mode. Good if m ≫ n (e.g. n = 1 in optimization).
Back to the Tangent Mode example
v3 = 2.0*v1 + 5.0
v4 = v3 + p1*v2/v3
Elementary Jacobian matrices, on variables (v1, v2, v3, v4):

f′(X) = … ×
    [ 1    0         0             0 ]
    [ 0    1         0             0 ]
    [ 0    0         1             0 ]
    [ 0  p1/v3  1 − p1·v2/v3²      0 ]
×
    [ 1  0  0  0 ]
    [ 0  1  0  0 ]
    [ 2  0  0  0 ]
    [ 0  0  0  1 ]
× …

which yields the tangent instructions:

    v̇3 = 2 · v̇1
    v̇4 = v̇3 · (1 − p1·v2/v3²) + v̇2 · p1/v3
Tangent Mode example continued
Tangent AD keeps the structure of P:

...
v3d = 2.0*v1d
v3 = 2.0*v1 + 5.0
v4d = v3d*(1-p1*v2/(v3*v3)) + v2d*p1/v3
v4 = v3 + p1*v2/v3
...

Differentiated instructions inserted into P's original control flow.
Multi-directional mode and Jacobians
If you want Ẏ = f′(X) · Ẋ for the same X and several Ẋ:

- either run the tangent differentiated program several times, evaluating f several times,
- or run a "multi-directional" tangent once, evaluating f once.

Same for X̄ = f′ᵗ(X) · Ȳ for several Ȳ.

In particular, multi-directional tangent or reverse is good to get the full Jacobian.
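A multi-directional tangent of the FOO example could look like this (Python sketch; carrying one dot-value per direction is the idea, the exact calling convention is an assumption):

```python
def foo_dv(v1, v1d, v2, v2d, p1):
    # v1d, v2d carry one tangent value per direction; the primal values
    # v3, v4 are computed only once for all directions.
    v3 = 2.0*v1 + 5.0
    v3d = [2.0*d for d in v1d]
    v4 = v3 + p1*v2/v3
    v4d = [v3d[k] + p1*(v2d[k]*v3 - v2*v3d[k])/(v3*v3)
           for k in range(len(v1d))]
    return v4, v4d

# Two directions at once: (1,0) and (0,1) give both partial derivatives
v4, (d_dv1, d_dv2) = foo_dv(1.0, [1.0, 0.0], 3.0, [0.0, 1.0], 2.0)
```

With m directions this evaluates the primal once instead of m times, which is the whole point of the multi-directional mode.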
Sparse Jacobians with seed matrices
When the Jacobian is sparse, use "seed matrices" to propagate fewer Ẋ or Ȳ.

Multi-directional tangent mode: columns that share no row can share one direction (dots denote structural zeros), e.g.:

    [ a b . . ]   [ 1 . . ]   [ a b . ]
    [ . . c d ] × [ . 1 . ] = [ c . d ]
    [ e f . g ]   [ 1 . . ]   [ e f g ]
                  [ . . 1 ]

Multi-directional reverse mode: rows that share no column can share one adjoint direction, e.g.:

    [ 1 1 . ]   [ a b . . ]   [ a b c d ]
    [ . . 1 ] × [ . . c d ] = [ e f . g ]
                [ e f . g ]
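In Python, with made-up nonzero values standing in for the letters (the sparsity pattern and the grouping of columns {1,3} are an illustrative assumption):

```python
# Sparse 3x4 Jacobian: rows {a,b}, {c,d}, {e,f,g}
J = [[1.0, 2.0, 0.0, 0.0],   # a b . .
     [0.0, 0.0, 3.0, 4.0],   # . . c d
     [5.0, 6.0, 0.0, 7.0]]   # e f . g

# Columns 1 and 3 share no row, so one tangent direction covers both:
# the seed matrix S has 3 columns instead of 4.
S = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]

compressed = [[sum(J[i][k]*S[k][j] for k in range(4)) for j in range(3)]
              for i in range(3)]
```

Every nonzero of J can be read back from the compressed product, so three tangent directions suffice instead of four.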
Focus on the Reverse mode
X̄ = f′ᵗ(X) · Ȳ = f′ᵗ1(W0) · … · f′ᵗp(Wp−1) · Ȳ

    I1; I2; ... Ip−1;
    W̄ = Ȳ;
    W̄ = f′ᵗp(Wp−1) * W̄;
    Restore Wp−2 before Ip−2;
    W̄ = f′ᵗp−1(Wp−2) * W̄;
    ...
    Restore W0 before I1;
    W̄ = f′ᵗ1(W0) * W̄;
    X̄ = W̄;

Instructions differentiated in the reverse order!
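A store-all sketch of this scheme in Python, on the FOO example (illustrative; states are (v1, v2, v3, v4) tuples and the transposed elementary Jacobians are hand-coded):

```python
P1 = 2.0

def f1(w):                            # v3 = 2*v1 + 5
    v1, v2, v3, v4 = w
    return (v1, v2, 2.0*v1 + 5.0, v4)

def f2(w):                            # v4 = v3 + p1*v2/v3
    v1, v2, v3, v4 = w
    return (v1, v2, v3, v3 + P1*v2/v3)

def f1_t(w, wb):
    # Transpose of f1': v1b += 2*v3b; v3b = 0
    v1b, v2b, v3b, v4b = wb
    return (v1b + 2.0*v3b, v2b, 0.0, v4b)

def f2_t(w, wb):
    # Transpose of f2': needs the value of v3 *before* f2 ran
    v1, v2, v3, v4 = w
    v1b, v2b, v3b, v4b = wb
    return (v1b, v2b + v4b*P1/v3, v3b + v4b*(1.0 - P1*v2/(v3*v3)), 0.0)

def reverse(x, ybar):
    states = [x]                      # forward sweep, storing every W_k
    for f in (f1, f2):
        states.append(f(states[-1]))
    wbar = ybar                       # backward sweep, right to left
    for ft, w in zip((f2_t, f1_t), (states[1], states[0])):
        wbar = ft(w, wbar)
    return wbar

xbar = reverse((1.0, 3.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0))
```

One backward sweep returns both partials (∂v4/∂v1, ∂v4/∂v2), the same values the tangent mode needed two runs for.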
Reverse mode: graphical interpretation
[Figure: along the time axis, the forward sweep runs I1 I2 I3 ... Ip−2 Ip−1, then the backward sweep runs Īp Īp−1 ... Ī2 Ī1 in reverse order.]
Bottleneck: memory usage (“Tape”).
Back to the example
v3 = 2.0*v1 + 5.0
v4 = v3 + p1*v2/v3
Transposed Jacobian matrices:

f′ᵗ(X) = … ×
    [ 1  0  2  0 ]
    [ 0  1  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  1 ]
×
    [ 1  0  0        0        ]
    [ 0  1  0      p1/v3      ]
    [ 0  0  1  1 − p1·v2/v3²  ]
    [ 0  0  0        0        ]
× …

which yields the adjoint instructions:

    v̄2 = v̄2 + v̄4 · p1/v3
    v̄3 = v̄3 + v̄4 · (1 − p1·v2/v3²)
    v̄4 = 0
    ...
    v̄1 = v̄1 + 2 · v̄3
    v̄3 = 0
Reverse Mode example continued
Reverse AD inverts the structure of P:

Forward sweep:

...
v3 = 2.0*v1 + 5.0
v4 = v3 + p1*v2/v3
...

Backward sweep:

...
/* restore previous state */
v2b = v2b + p1*v4b/v3
v3b = v3b + (1-p1*v2/(v3*v3))*v4b
v4b = 0.0
/* restore previous state */
v1b = v1b + 2.0*v3b
v3b = 0.0
/* restore previous state */
...

Differentiated instructions inserted into the inverse of P's original control flow.
Control Flow Inversion : conditionals
The control flow of the forward sweep is mirrored in the backward sweep.
Forward sweep:

...
if (T(i).lt.0.0) then
  T(i) = S(i)*T(i)
endif
...

Backward sweep:

...
if (...) then
  Sb(i) = Sb(i) + T(i)*Tb(i)
  Tb(i) = S(i)*Tb(i)
endif
...
Control Flow Inversion : loops
Reversed loops run in the inverse order
Forward loop:

...
Do i = 1,N
  T(i) = 2.5*T(i-1) + 3.5
Enddo
...

Backward sweep:

Do i = N,1,-1
  Tb(i-1) = Tb(i-1) + 2.5*Tb(i)
  Tb(i) = 0.0
Enddo
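The loop and its adjoint transcribe directly to Python for a check (sketch; we seed the adjoint of the final T(N) with 1):

```python
def forward(T, N):
    T = list(T)
    for i in range(1, N + 1):
        T[i] = 2.5*T[i - 1] + 3.5
    return T

def adjoint(Tb, N):
    Tb = list(Tb)
    for i in range(N, 0, -1):         # reversed iteration order
        Tb[i - 1] += 2.5*Tb[i]
        Tb[i] = 0.0
    return Tb

N = 4
Tb = adjoint([0.0]*N + [1.0], N)      # seed on T(N)
# d T(N) / d T(0) accumulates in Tb[0]: 2.5**N
```

Each reversed iteration is the transpose of one forward iteration, so the reversed loop must visit the iterations in the opposite order.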
Control Flow Inversion : spaghetti
Remember the original control flow where the flow merges:

[Figure: the forward sweep through blocks B1 ... B5 executes PUSH(0) or PUSH(1) at each merge point to record which branch was taken; the backward sweep through B5 ... B1 executes POP(test) to retrieve each decision and select the reversed branch.]
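A Python sketch of this PUSH/POP mechanism (the two branches and their partial derivatives are made up):

```python
def forward_sweep(xs):
    trace, ys = [], []
    for x in xs:
        if x < 0.0:
            trace.append(1)           # PUSH(1): took the first branch
            ys.append(x*x)
        else:
            trace.append(0)           # PUSH(0): took the second branch
            ys.append(2.0*x)
    return ys, trace

def backward_sweep(xs, ysb, trace):
    xsb = [0.0]*len(xs)
    for i in reversed(range(len(xs))):
        taken = trace.pop()           # POP(test): replay the decision
        if taken:
            xsb[i] += 2.0*xs[i]*ysb[i]   # adjoint of y = x*x
        else:
            xsb[i] += 2.0*ysb[i]         # adjoint of y = 2*x
    return xsb

ys, trace = forward_sweep([-3.0, 2.0])
xsb = backward_sweep([-3.0, 2.0], [1.0, 1.0], trace)
```

Because the stack is popped in reverse order, the backward sweep retraces exactly the control path the forward sweep took, even when several paths merge.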
Yet another formalization using computation graphs

A sequence of instructions corresponds to a computation graph:
DO i=1,n
IF (B(i).gt.0.0) THEN
r = A(i)*B(i) + y
X(i) = 3*r - B(i)*X(i-3)
y = SIN(X(i)*r)
ENDIF
ENDDO
[Figure: left, the source program; right, the computation graph of the loop body, with inputs y, A(i), B(i), X(i-3), internal nodes *, +, *3, -, *, SIN, and outputs r, y, X(i).]
Jacobians by Vertex Elimination
[Figure: the Jacobian computation graph labels each edge with a partial derivative (e.g. B(i), A(i), X(i-3), 3, -1, COS(X(i)*r), 1). Vertex elimination reduces it to a bipartite Jacobian graph linking the inputs y, A(i), B(i), X(i-3) directly to the outputs r, y, X(i), with accumulated edge labels such as COS(X(i)*r) * (X(i)*A(i) + r*(3*A(i) - X(i-3))).]
Forward vertex elimination ⇒ tangent AD.
Reverse vertex elimination ⇒ reverse AD.
Other orders (“cross-country”) may be optimal.
Yet another formalization: Lagrange multipliers
v3 = 2.0*v1 + 5.0
v4 = v3 + p1*v2/v3

These instructions can be viewed as constraints. We know that the Lagrangian

    L(v1, v2, v3, v4, v̄3, v̄4) = v4 + v̄3·(−v3 + 2·v1 + 5) + v̄4·(−v4 + v3 + p1·v2/v3)

is such that:

    v̄1 = ∂v4/∂v1 = ∂L/∂v1  and  v̄2 = ∂v4/∂v2 = ∂L/∂v2

provided

    ∂L/∂v3 = ∂L/∂v4 = ∂L/∂v̄3 = ∂L/∂v̄4 = 0
The v̄i are the Lagrange multipliers associated to the instruction that sets vi.

For instance, equation ∂L/∂v3 = 0 gives us:

    v̄4·(1 − p1·v2/(v3·v3)) − v̄3 = 0

To be compared with the instruction v3b = v3b + (1-p1*v2/(v3*v3))*v4b (the initial v3b is set to 0.0).
Applications to Optimization
From a simulation program P :
P : γ (design parameters) ↦ J(γ) (cost function)

it takes a gradient J′(γ) to obtain an optimization program.

Reverse mode AD builds a program P̄ that computes J′(γ).
Optimization algorithms (Gradient descent, SQP, . . . )may also use 2nd derivatives. AD can provide them too.
Special case: steady-state
If J is defined on a state W, and W results from an implicit steady state equation

    Ψ(W, γ) = 0

which is solved iteratively (W0, W1, W2, ..., W∞), then pure reverse AD of P may prove too expensive (memory...).

Solutions exist:

- reverse AD on the final steady state only,
- Andreas Griewank's "Piggy-backing",
- reverse AD on Ψ alone + hand-coding.
A color picture (at last!...)

AD-computed gradient of a scalar cost (sonic boom) with respect to skin geometry:
... and after a few optimization steps
Improvement of the sonic boom under the plane after 8 optimization cycles:
(Plane geometry provided by Dassault Aviation)
Time/Memory tradeoffs for reverse AD
From the definition of the gradient X̄

    X̄ = f′ᵗ(X) · Ȳ = f′ᵗ1(W0) · … · f′ᵗp(Wp−1) · Ȳ

we get the general shape of a reverse AD program:
[Figure: forward sweep I1 I2 I3 ... Ip−2 Ip−1, then backward sweep Īp Īp−1 ... Ī2 Ī1, as in the earlier graphical interpretation.]
⇒ How can we restore previous values?
Restoration by recomputation
(RA: Recompute-All)
Restart execution from a stored initial state:
[Figure: Recompute-All: each backward step Īk is preceded by a rerun of I1 ... Ik−1 from a stored initial state.]
Memory use low, CPU use high ⇒ trade-off needed !
Restoration by storage
(SA: Store-All)
Progressively undo the assignments made by the forward sweep:

[Figure: Store-All: the forward sweep I1 I2 I3 ... Ip−1 records overwritten values; the backward sweep Īp Īp−1 ... Ī1 restores them step by step.]
Memory use high, CPU use low ⇒ trade-off needed !
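A minimal tape in Python (sketch of the Store-All mechanism: PUSH the value about to be overwritten, POP to undo overwrites in reverse order):

```python
tape = []

def push(env, name):
    tape.append((name, env[name]))    # save the value about to be lost

def pop(env):
    name, value = tape.pop()
    env[name] = value                 # undo the most recent overwrite

env = {"x": 1.0}
push(env, "x"); env["x"] = 4.0        # I1 overwrites x
push(env, "x"); env["x"] = 9.0        # I2 overwrites x again
pop(env)                              # back to the state before I2
pop(env)                              # back to the state before I1
```

The tape grows with the number of overwrites, which is exactly the memory bottleneck of the Store-All strategy.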
Checkpointing (SA strategy)
On selected pieces of the program, possibly nested, don't store intermediate values, and re-execute the piece when values are required.

[Figure: a checkpointed piece C runs twice: once without storing during the forward sweep, and once more, with storing, just before its backward sweep.]
Memory and CPU grow like log(size(P))
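A recursive bisection sketch of checkpointing in Python, counting live snapshots to exhibit the logarithmic growth (the step function and its adjoint are made up; real tools use finer schedules, e.g. binomial checkpointing):

```python
def step(i, w):
    return 2.0*w + float(i)           # hypothetical elementary step

def adj_step(i, w, wb):
    return 2.0*wb                     # transpose of d(step)/dw

live, peak = [0], [0]

def reverse_ckp(lo, hi, w_lo, wb):
    # Adjoint of steps lo..hi-1, given only the snapshot w_lo.
    if hi - lo == 1:
        return adj_step(lo, w_lo, wb)
    mid = (lo + hi)//2
    live[0] += 1                      # hold snapshot w_lo while recomputing
    peak[0] = max(peak[0], live[0])
    w_mid = w_lo
    for i in range(lo, mid):
        w_mid = step(i, w_mid)        # recompute forward to the midpoint
    wb = reverse_ckp(mid, hi, w_mid, wb)   # adjoint of the second half first
    live[0] -= 1
    return reverse_ckp(lo, mid, w_lo, wb)  # then the first half

grad = reverse_ckp(0, 16, 1.0, 1.0)   # d W_16 / d W_0 = 2**16
```

For p = 16 steps, at most log2(16) = 4 snapshots are ever live at once, while each step is re-executed only O(log p) times.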
Checkpointing on calls (SA)
A classical choice: checkpoint procedure calls!

[Figure: checkpointing applied to the calls of a call tree A, B, C, D: each checkpointed call appears in original form during the forward sweep and is re-executed, storing this time, just before its backward sweep. Legend: x = original form of x; x with forward arrow = forward sweep for x; x with backward arrow = backward sweep for x; filled dot = take snapshot; open dot = use snapshot.]
Memory and CPU grow like log(size(P)) when the call tree is well balanced.

Ill-balanced call trees require not checkpointing some calls.
Careful analysis keeps the snapshots small.
Activity analysis
Finds out the variables that, at some location,
- do not depend on any independent,
- or have no dependent depending on them.
Derivative either null or useless ⇒ simplifications
orig. prog:

    c = a*b
    a = 5.0
    d = a*c
    e = a/c
    e = floor(e)

tangent mode:

    cd = a*bd + ad*b
    c = a*b
    ad = 0.0
    a = 5.0
    dd = a*cd + ad*c
    d = a*c
    ed = ad/c - a*cd/c**2
    e = a/c
    ed = 0.0
    e = floor(e)

with activity analysis:

    cd = a*bd + ad*b
    c = a*b
    a = 5.0
    dd = a*cd
    d = a*c
    e = a/c
    ed = 0.0
    e = floor(e)
Adjoint Liveness
The important result of the reverse mode is in X̄. The original result Y is of no interest.
The last instruction of the program P can be removed from P.
Recursively, other instructions of P can be removed too.
orig. program:

    IF (a.GT.0.) THEN
      a = LOG(a)
    ELSE
      a = LOG(c)
      CALL SUB(a)
    ENDIF
    END

reverse mode:

    IF (a.GT.0.) THEN
      CALL PUSH(a)
      a = LOG(a)
      CALL POP(a)
      ab = ab/a
    ELSE
      a = LOG(c)
      CALL PUSH(a)
      CALL SUB(a)
      CALL POP(a)
      CALL SUB_B(a,ab)
      cb = cb + ab/c
      ab = 0.0
    END IF

adjoint live code:

    IF (a.GT.0.) THEN
      ab = ab/a
    ELSE
      a = LOG(c)
      CALL SUB_B(a,ab)
      cb = cb + ab/c
      ab = 0.0
    END IF
"To Be Restored" analysis
In reverse AD, not all values must be restored during the backwardsweep.
Variables occurring only in linear expressions do not appear in the differentiated instructions ⇒ not To Be Restored.
Original program:

```fortran
x = x + EXP(a)
y = x + a**2
a = 3*z
```

Reverse mode, naive backward sweep:

```fortran
CALL POP(a)
zb = zb + 3*ab
ab = 0.0
CALL POP(y)
ab = ab + 2*a*yb
xb = xb + yb
yb = 0.0
CALL POP(x)
ab = ab + EXP(a)*xb
```

Reverse mode, backward sweep with TBR (only a occurs nonlinearly, so x and y need not be restored):

```fortran
CALL POP(a)
zb = zb + 3*ab
ab = 0.0
ab = ab + 2*a*yb
xb = xb + yb
yb = 0.0
ab = ab + EXP(a)*xb
```
Aliasing
In reverse AD, it is important to know whether two variables in an instruction are the same.

Variables certainly different, a[i] = 3*a[i+1]:

```
ab[i+1] = ab[i+1] + 3*ab[i]
ab[i] = 0.0
```

Variables certainly equal, a[i] = 3*a[i]:

```
ab[i] = 3*ab[i]
```

Unknown (i may equal j), a[i] = 3*a[j] ⇒ split the instruction through a temporary:

```
tmp = 3*a[j]
a[i] = tmp
```

whose adjoint is:

```
tmpb = ab[i]
ab[i] = 0.0
ab[j] = ab[j] + 3*tmpb
```
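The safety of the temporary split can be checked numerically. Below is a small Python sketch (function names and test values are mine, not from the slides) applying the dot-product identity to the assignment a[i] = 3*a[j], both when i ≠ j and when i = j:

```python
def tangent(ad, i, j):
    # Tangent of a[i] = 3*a[j]: the same instruction on the derivatives.
    ad[i] = 3 * ad[j]
    return ad

def adjoint(ab, i, j):
    # Adjoint of the split form (tmp = 3*a[j]; a[i] = tmp):
    # correct whether or not i == j.
    tmpb = ab[i]
    ab[i] = 0.0
    ab[j] = ab[j] + 3 * tmpb
    return ab

dot = lambda u, v: sum(a * b for a, b in zip(u, v))

for i, j in [(0, 1), (1, 1)]:      # non-aliased and aliased cases
    ad = [0.3, -1.2, 2.0]          # arbitrary direction Xdot
    ab = [0.5, 0.7, -0.4]          # arbitrary weights Ybar
    ydot = tangent(list(ad), i, j)
    xbar = adjoint(list(ab), i, j)
    # Dot-product identity: (Xbar . Xdot) == (Ybar . Ydot)
    assert abs(dot(xbar, ad) - dot(ab, ydot)) < 1e-12
```

A naive adjoint of the unsplit instruction would read ab[i] after overwriting it when i == j; the split makes the read happen first.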
Snapshots
Taking small snapshots saves a lot of memory. For a checkpointed piece C and the downstream code D that runs between C's forward and backward sweeps:

Snapshot(C) = Use(C) ∩ (Write(C) ∪ Write(D))
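As a toy illustration (variable names invented, not from the slides), the snapshot rule is plain set arithmetic — save only the values that C reads and that C or D may destroy:

```python
# Hypothetical use/write sets of a checkpointed piece C and of the
# downstream code D executed before C's backward sweep.
use_C = {"a", "b", "x"}       # values C reads
write_C = {"a", "t"}          # values C overwrites
write_D = {"x", "y"}          # values D overwrites

# Snapshot(C) = Use(C) ∩ (Write(C) ∪ Write(D))
snapshot_C = use_C & (write_C | write_D)
print(sorted(snapshot_C))     # -> ['a', 'x']
```

Here b is used by C but never overwritten, so it need not be saved.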
Undecidability
Analyses are static: they operate on the source and know nothing about run-time data.

Undecidability: no static analysis can answer yes or no for every possible program; there will always be programs on which the analysis must answer "I can't tell".

⇒ All tools must be ready to take conservative decisions when an analysis is in doubt.

In practice, tool "laziness" is a far more common cause of undecided analyses and conservative transformations.
Tools for source-transformation AD
AD tools are based on overloadingor on source transformation.
Source transformation requires complex tools, but offersmore room for optimization.
| parsing → | analysis → | differentiation |
|-----------|------------|-----------------|
| f77       | type-checking | tangent |
| f9x       | use/kill      | reverse |
| c         | dependencies  | multi-directional |
| matlab    | activity      | ... |
| ...       | ...           |     |
Some AD tools
NAGWare f95 compiler: overloading; tangent, reverse

adol-c: overloading + tape; tangent, reverse, higher-order

adifor: regeneration; tangent, reverse?, Store-All + checkpointing

tapenade: regeneration; tangent, reverse, Store-All + checkpointing

taf: regeneration; tangent, reverse, Recompute-All + checkpointing
Some Limitations of AD tools
Fundamental problems:
Piecewise differentiability
Convergence of derivatives
Reverse AD of large codes
Technical Difficulties:
Pointers and memory allocation
Objects
Inversion or duplication of random control flow (communications, random numbers, ...)
Validation methods
From a program P that evaluates

F : IRm → IRn
X ↦ Y

tangent AD creates

Ṗ : (X, Ẋ) ↦ (Y, Ẏ)

and reverse AD creates

P̄ : (X, Ȳ) ↦ X̄

How can we validate these programs?

Tangent wrt Divided Differences

Reverse wrt Tangent
Validation of Tangent wrt Divided Differences
For a given X, set g(h) = F(X + h·Ẋ) for h ∈ IR:

g′(0) = lim_{ε→0} ( F(X + ε·Ẋ) − F(X) ) / ε

Also, from the chain rule:

g′(0) = F′(X)·Ẋ = Ẏ

So we can approximate Ẏ by running P twice, at points X and X + ε·Ẋ.
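A minimal Python check of this validation — F and its hand-written tangent below are my own toy example, not from the talk:

```python
import math

def F(x):
    # Toy function IR^2 -> IR
    return math.sin(x[0]) * x[1]

def F_tangent(x, xd):
    # Hand-written tangent: Ydot = F'(X) . Xdot
    return math.cos(x[0]) * xd[0] * x[1] + math.sin(x[0]) * xd[1]

x = [0.7, 1.3]
xd = [0.2, -0.5]
eps = 1e-7

yd_exact = F_tangent(x, xd)
# Run F twice: at X and at X + eps*Xdot
yd_dd = (F([x[0] + eps * xd[0], x[1] + eps * xd[1]]) - F(x)) / eps

# The divided difference agrees with the tangent up to O(eps)
assert abs(yd_exact - yd_dd) < 1e-5
```

Note that eps must be chosen with care: too large loses accuracy in the Taylor remainder, too small loses accuracy to floating-point cancellation.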
Validation of Reverse wrt Tangent
For a given X and Ẋ, the tangent code returned Ẏ.

Initialize Ȳ = Ẏ and run the reverse code, yielding X̄. We have:

(X̄ · Ẋ) = (F′ᵗ(X)·Ẏ · Ẋ)
        = Ẏᵗ · F′(X) · Ẋ
        = Ẏᵗ · Ẏ
        = (Ẏ · Ẏ)
Often called the “dot-product test”
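A runnable sketch of the dot-product test (toy F with hand-written tangent and adjoint, all names mine):

```python
import math

def F_tangent(x, xd):
    # Ydot = F'(X).Xdot for F(x) = (x0*x1, exp(x0))
    return [xd[0] * x[1] + x[0] * xd[1],
            math.exp(x[0]) * xd[0]]

def F_adjoint(x, yb):
    # Xbar = F'(X)^t . Ybar for the same F
    return [yb[0] * x[1] + math.exp(x[0]) * yb[1],
            yb[0] * x[0]]

x = [0.4, 2.0]
xd = [0.3, -0.7]

yd = F_tangent(x, xd)    # run the tangent code
xb = F_adjoint(x, yd)    # set Ybar = Ydot, run the reverse code

dot = lambda u, v: sum(a * b for a, b in zip(u, v))
# (Xbar . Xdot) must equal (Ydot . Ydot)
assert abs(dot(xb, xd) - dot(yd, yd)) < 1e-12
```

Unlike the divided-difference check, this identity holds to machine precision, since no approximation of the derivative is involved.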
Black-box routines
If the tool permits, give the dependency signature (sparsity pattern) of all external procedures ⇒ better activity analysis ⇒ better differentiated program.

[Diagram: dependency signature of a procedure FOO, relating its inputs to its outputs, with identity blocks for passed-through variables]

After AD, provide the required hand-coded derivative (FOO_D or FOO_B).
Linear or auto-adjoint procedures
Make linear or auto-adjoint procedures "black-box".

Since they are their own tangent (resp. reverse) derivatives, provide their original form as the hand-coded derivative.

In many cases, this is more efficient than pure AD of these procedures.
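Why this works for a linear procedure, in a short Python check (matrix and points are mine): if L(x) = A·x, then L′(x)·Ẋ = A·Ẋ = L(Ẋ), so L is its own tangent derivative.

```python
A = [[2.0, -1.0], [0.5, 3.0]]

def L(x):
    # A linear procedure: L(x) = A.x
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

x = [1.0, -2.0]
xd = [0.3, 0.4]
eps = 1e-7

# Divided difference of L in direction xd ...
dd = [(lv - l0) / eps
      for lv, l0 in zip(L([x[0] + eps * xd[0], x[1] + eps * xd[1]]), L(x))]

# ... equals L applied to xd itself: L is its own tangent derivative.
assert all(abs(a - b) < 1e-5 for a, b in zip(dd, L(xd)))
```

The same reasoning with L′ᵗ = A ᵗ explains the auto-adjoint case: when A = Aᵗ, the procedure is its own reverse derivative as well.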
Independent loops
If a loop has independent iterations, possibly terminated by a sum-reduction, then the forward and backward sweeps can be fused:

Standard:

```
do i = 1,N
  body(i)
end
do i = N,1,-1
  adjoint of body(i)
end
```

Improved (equivalent, since iterations are independent):

```
do i = 1,N
  body(i)
  adjoint of body(i)
end
```

In the Recompute-All context, this reduces the memory consumption by a factor of N.
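A sketch of why fusing helps (Python; the toy body working only on index i and all names are mine): the adjoint of iteration i needs only values produced inside iteration i, so it can run immediately after it, and no per-iteration state accumulates across the loop.

```python
import math

N = 4
x = [0.1 * (i + 1) for i in range(N)]
yb = [1.0] * N                       # weights on the outputs

def body(i):
    return math.sin(x[i])            # iteration i touches only index i

def body_adjoint(i):
    return math.cos(x[i]) * yb[i]    # adjoint of iteration i

# Standard scheme: full forward sweep, then full backward sweep
# (requires keeping every iteration's values until the sweep back).
y = [body(i) for i in range(N)]
xb_std = [0.0] * N
for i in reversed(range(N)):
    xb_std[i] = body_adjoint(i)

# Improved scheme: forward and adjoint fused inside one loop;
# nothing from iteration i must survive past iteration i.
xb_fused = [0.0] * N
for i in range(N):
    y[i] = body(i)
    xb_fused[i] = body_adjoint(i)

assert xb_std == xb_fused
```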
AD: Context
Derivatives can come from divided differences (drawback: inaccuracy) or from analytic differentiation.

Analytic differentiation is done either by hand ("maths"; drawback: programming effort) or by AD.

AD is implemented by overloading (strength: flexibility) or by source transformation (strength: efficiency), the latter offering multi-directional, tangent, and reverse modes.
AD: To Bring Home
If you want the derivatives of an implemented math function, you should seriously consider AD.

Divided differences aren't good for you (nor for others...).

Especially think of AD when you need higher order (Taylor coefficients) for simulation, or gradients (reverse mode) for optimization.

Reverse AD is a discrete equivalent of the adjoint methods from control theory: it gives a gradient at remarkably low cost.
AD tools: To Bring Home
AD tools provide you with highly optimized derivativeprograms in a matter of minutes.
AD tools are making progress steadily, but the bestAD will always require end-user intervention.
Thank you for your attention!