
Page 1: Second Order Differentiation

Yi Heng

Second Order Differentiation

Bommerholz – 14.08.2006

Summer School 2006

Page 2: Second Order Differentiation

Outline

Background
- What are derivatives?
- Where do we need derivatives?
- How to compute derivatives?

Basics of Automatic Differentiation
- Introduction
- Forward mode strategy
- Reverse mode strategy

Second-Order Automatic Differentiation Module
- Introduction
- Forward mode strategy
- Taylor series strategy
- Hessian performance

An Application in Optimal Control Problems

Summary

Page 3: Second Order Differentiation

Background

What are derivatives?

Jacobian matrix: the differential of $f\colon \mathbb{R}^n \to \mathbb{R}^m$, $x \mapsto f(x)$, is described by the Jacobian matrix

$$J_f = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix}.$$

Tangents (directional derivatives): $dY = J_f\,dX$.

Gradients: $b_X = b_Y\,J_f$.

Page 4: Second Order Differentiation

Background

What are derivatives?

Hessian matrix: the second order partial derivatives of a function $f\colon \mathbb{R}^m \to \mathbb{R}$ constitute its Hessian matrix

$$H(f) = \left( \frac{\partial^2 f}{\partial x_i\,\partial x_j} \right)_{i,j = 1,\dots,m}.$$

Page 5: Second Order Differentiation

Background

Where do we need derivatives?

• Linear approximation
• Bending and acceleration (second derivatives)
• Solving algebraic and differential equations
• Curve fitting
• Optimization problems
• Sensitivity analysis
• Inverse problems (data assimilation)
• Parameter identification

Page 6: Second Order Differentiation

Background

How to compute derivatives?

Symbolic differentiation
• Derivatives can be computed to machine precision
• Computational work is expensive
• For complicated functions, the representation of the final expression may be an unaffordable overhead

Divided differences
• Easy to implement (the definition of the derivative is used directly)
• Only the original computer program is required (no formula is necessary)
• The approximation contains truncation error

Page 7: Second Order Differentiation

Background

How to compute derivatives?

Automatic differentiation
• Computational work is cheaper
• Only the original computer program is required
• Derivatives are obtained to machine precision

To be continued ...

Page 8: Second Order Differentiation


Basics of Automatic Differentiation

Introduction

Automatic differentiation ...

• Is also known as computational differentiation, algorithmic differentiation, and differentiation of algorithms;

• Is a systematic application of the familiar rules of calculus to computer programs, yielding programs for the propagation of numerical values of first, second, or higher order derivatives;

• Traverses the code list (or computational graph) in the forward mode, the reverse mode, or a combination of the two;

• Typically is implemented by using either source code transformation or operator overloading;

• Is a process for evaluating derivatives which depends only on an algorithmic specification of the function to be differentiated.

Page 9: Second Order Differentiation

Basics of Automatic Differentiation

Introduction

Rules of arithmetic operations for the gradient vector, where $u$ and $v$ are scalar functions of $m$ independent input variables:

$$\nabla(u \pm v) = \nabla u \pm \nabla v,$$
$$\nabla(uv) = u\,\nabla v + v\,\nabla u,$$
$$\nabla(u/v) = \big(\nabla u - (u/v)\,\nabla v\big)/v, \quad v \neq 0,$$
$$\nabla\phi(u) = \phi'(u)\,\nabla u$$

for differentiable functions $\phi$ (such as the standard functions) with known derivatives.
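These rules map directly onto operator overloading. Below is a minimal Python sketch (all names hypothetical, not from the slides or any AD package) that propagates (value, gradient) pairs by exactly the rules above; constants are wrapped with zero gradients:

import math

class GradNum:
    # (value, gradient) pair; grad is a list of length m
    def __init__(self, val, grad):
        self.val, self.grad = val, list(grad)

    def __add__(self, other):      # grad(u + v) = grad u + grad v
        return GradNum(self.val + other.val,
                       [a + b for a, b in zip(self.grad, other.grad)])

    def __mul__(self, other):      # grad(uv) = u grad v + v grad u
        return GradNum(self.val * other.val,
                       [self.val * b + other.val * a
                        for a, b in zip(self.grad, other.grad)])

    def __truediv__(self, other):  # grad(u/v) = (grad u - (u/v) grad v)/v, v != 0
        w = self.val / other.val
        return GradNum(w, [(a - w * b) / other.val
                           for a, b in zip(self.grad, other.grad)])

def cos(u):                        # grad(phi(u)) = phi'(u) grad u
    return GradNum(math.cos(u.val), [-math.sin(u.val) * a for a in u.grad])

# seed the independent variables with Cartesian basis gradients
x = GradNum(1.0, [1.0, 0.0, 0.0])
y = GradNum(2.0, [0.0, 1.0, 0.0])
z = GradNum(3.0, [0.0, 0.0, 1.0])
two, three = GradNum(2.0, [0.0] * 3), GradNum(3.0, [0.0] * 3)
f = (x * y + cos(z)) * (x * x + two * y * y + three * z * z)
print(f.val, f.grad)   # value and gradient of the example on the next slide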

Page 10: Second Order Differentiation

Basics of Automatic Differentiation

Forward mode & Reverse mode

An example: given the function $f(x,y,z) = (xy + \cos z)(x^2 + 2y^2 + 3z^2)$, the partial derivatives are

$$\frac{\partial f}{\partial x} = y\,(x^2 + 2y^2 + 3z^2) + (xy + \cos z)\cdot 2x = 3x^2y + 2y^3 + 3yz^2 + 2x\cos z,$$

$$\frac{\partial f}{\partial y} = x\,(x^2 + 2y^2 + 3z^2) + (xy + \cos z)\cdot 4y = 6xy^2 + 4y\cos z + x^3 + 3xz^2,$$

$$\frac{\partial f}{\partial z} = -\sin z\,(x^2 + 2y^2 + 3z^2) + (xy + \cos z)\cdot 6z = 6xyz + 6z\cos z - x^2\sin z - 2y^2\sin z - 3z^2\sin z,$$

collected in the gradient $\nabla f = \left[\dfrac{\partial f}{\partial x}, \dfrac{\partial f}{\partial y}, \dfrac{\partial f}{\partial z}\right]^T$.

Page 11: Second Order Differentiation

Basics of Automatic Differentiation

Forward mode & Reverse mode - Forward mode

Code list:

$$u_1 = x,\qquad u_2 = y,\qquad u_3 = z,$$
$$u_4 = u_1 u_2,\qquad u_5 = \cos u_3,\qquad u_6 = u_4 + u_5,$$
$$u_7 = u_1^2,\qquad u_8 = 2u_2^2,\qquad u_9 = 3u_3^2,$$
$$u_{10} = u_7 + u_8 + u_9,\qquad u_{11} = u_6\,u_{10}.$$

Gradient entries:

$$\nabla u_1 = [1,0,0],\qquad \nabla u_2 = [0,1,0],\qquad \nabla u_3 = [0,0,1],$$
$$\nabla u_4 = u_1\nabla u_2 + u_2\nabla u_1 = [u_2, u_1, 0],$$
$$\nabla u_5 = (-\sin u_3)\,\nabla u_3 = [0, 0, -\sin u_3],$$
$$\nabla u_6 = \nabla u_4 + \nabla u_5 = [u_2, u_1, -\sin u_3],$$
$$\nabla u_7 = 2u_1\nabla u_1 = [2u_1, 0, 0],$$
$$\nabla u_8 = 4u_2\nabla u_2 = [0, 4u_2, 0],$$
$$\nabla u_9 = 6u_3\nabla u_3 = [0, 0, 6u_3],$$
$$\nabla u_{10} = \nabla u_7 + \nabla u_8 + \nabla u_9 = [2u_1, 4u_2, 6u_3],$$
$$\nabla u_{11} = u_{10}\nabla u_6 + u_6\nabla u_{10} = [\,u_{10}u_2 + 2u_1u_6,\;\; u_{10}u_1 + 4u_2u_6,\;\; -u_{10}\sin u_3 + 6u_3u_6\,].$$

Substituting back the independent variables:

$$\nabla f(x,y,z) = \nabla u_{11} = [\,3x^2y + 2x\cos z + 2y^3 + 3yz^2,\;\; 6xy^2 + 4y\cos z + x^3 + 3xz^2,\;\; 6xyz + 6z\cos z - x^2\sin z - 2y^2\sin z - 3z^2\sin z\,].$$
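The forward sweep can also be transcribed line by line without any class machinery. A small Python sketch (hypothetical names) that mirrors the code list and gradient entries above:

import math

def f_and_grad_forward(x, y, z):
    # gi holds the gradient of ui with respect to (x, y, z)
    g1, g2, g3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    u4 = x * y;       g4 = [y * a + x * b for a, b in zip(g1, g2)]
    u5 = math.cos(z); g5 = [-math.sin(z) * c for c in g3]
    u6 = u4 + u5;     g6 = [a + b for a, b in zip(g4, g5)]
    u7 = x * x;       g7 = [2.0 * x * a for a in g1]
    u8 = 2.0 * y * y; g8 = [4.0 * y * b for b in g2]
    u9 = 3.0 * z * z; g9 = [6.0 * z * c for c in g3]
    u10 = u7 + u8 + u9
    g10 = [a + b + c for a, b, c in zip(g7, g8, g9)]
    u11 = u6 * u10    # f = u11
    g11 = [u10 * a + u6 * b for a, b in zip(g6, g10)]
    return u11, g11

val, grad = f_and_grad_forward(1.0, 2.0, 3.0)  # compare with the closed-form gradient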

Page 12: Second Order Differentiation

Basics of Automatic Differentiation

Forward mode & Reverse mode - Reverse mode

Code list (as before):

$$u_1 = x,\qquad u_2 = y,\qquad u_3 = z,$$
$$u_4 = u_1 u_2,\qquad u_5 = \cos u_3,\qquad u_6 = u_4 + u_5,$$
$$u_7 = u_1^2,\qquad u_8 = 2u_2^2,\qquad u_9 = 3u_3^2,$$
$$u_{10} = u_7 + u_8 + u_9,\qquad u_{11} = u_6\,u_{10}.$$

Adjoints $\bar{u}_i = \partial u_{11}/\partial u_i$, accumulated from the output back to the inputs:

$$\bar{u}_{11} = 1,\qquad \bar{u}_{10} = \bar{u}_{11}\,\frac{\partial u_{11}}{\partial u_{10}} = \bar{u}_{11}\,u_6,\qquad \bar{u}_9 = \bar{u}_{10}\,\frac{\partial u_{10}}{\partial u_9} = \bar{u}_{10},$$
$$\bar{u}_8 = \bar{u}_{10},\qquad \bar{u}_7 = \bar{u}_{10},\qquad \bar{u}_6 = \bar{u}_{11}\,\frac{\partial u_{11}}{\partial u_6} = \bar{u}_{11}\,u_{10},\qquad \bar{u}_5 = \bar{u}_6,\qquad \bar{u}_4 = \bar{u}_6,$$
$$\bar{u}_3 = \bar{u}_9\,\frac{\partial u_9}{\partial u_3} + \bar{u}_5\,\frac{\partial u_5}{\partial u_3} = 6u_3\,\bar{u}_9 - \bar{u}_5\,\sin u_3,$$
$$\bar{u}_2 = \bar{u}_4\,\frac{\partial u_4}{\partial u_2} + \bar{u}_8\,\frac{\partial u_8}{\partial u_2} = u_1\,\bar{u}_4 + 4u_2\,\bar{u}_8,$$
$$\bar{u}_1 = \bar{u}_4\,\frac{\partial u_4}{\partial u_1} + \bar{u}_7\,\frac{\partial u_7}{\partial u_1} = u_2\,\bar{u}_4 + 2u_1\,\bar{u}_7.$$

Substituting back the independent variables:

$$\nabla f(x,y,z) = [\bar{u}_1, \bar{u}_2, \bar{u}_3] = [\,3x^2y + 2x\cos z + 2y^3 + 3yz^2,\;\; 6xy^2 + 4y\cos z + x^3 + 3xz^2,\;\; 6xyz + 6z\cos z - x^2\sin z - 2y^2\sin z - 3z^2\sin z\,].$$
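The same adjoint sweep in Python (a sketch, hypothetical names): one forward pass stores the intermediate values, then one reverse pass accumulates all gradient entries at a cost independent of the number of inputs:

import math

def f_and_grad_reverse(x, y, z):
    # forward value sweep over the code list
    u4 = x * y; u5 = math.cos(z); u6 = u4 + u5
    u7 = x * x; u8 = 2.0 * y * y; u9 = 3.0 * z * z
    u10 = u7 + u8 + u9
    u11 = u6 * u10
    # reverse sweep: b_i = d u11 / d u_i
    b11 = 1.0
    b10 = b11 * u6                        # u11 = u6 u10
    b9 = b8 = b7 = b10                    # u10 = u7 + u8 + u9
    b6 = b11 * u10
    b5 = b4 = b6                          # u6 = u4 + u5
    bz = 6.0 * z * b9 - b5 * math.sin(z)  # u9 = 3 z^2, u5 = cos z
    by = x * b4 + 4.0 * y * b8            # u4 = x y,  u8 = 2 y^2
    bx = y * b4 + 2.0 * x * b7            # u4 = x y,  u7 = x^2
    return u11, [bx, by, bz]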

Page 13: Second Order Differentiation

Second-Order AD Module

Introduction

Divided differences.

First order differentiation:

Forward differentiation: $$\frac{\partial f}{\partial x_m} = \frac{f(x_1,\dots,x_m+h,\dots,x_n) - f(x_1,\dots,x_m,\dots,x_n)}{h} + O(h)$$

Backward differentiation: $$\frac{\partial f}{\partial x_m} = \frac{f(x_1,\dots,x_m,\dots,x_n) - f(x_1,\dots,x_m-h,\dots,x_n)}{h} + O(h)$$

Centered differentiation: $$\frac{\partial f}{\partial x_m} = \frac{f(x_1,\dots,x_m+h,\dots,x_n) - f(x_1,\dots,x_m-h,\dots,x_n)}{2h} + O(h^2)$$

Second order differentiation:

$$\frac{\partial^2 f}{\partial x_m^2} = \frac{f(x_1,\dots,x_m+h,\dots,x_n) - 2f(x_1,\dots,x_m,\dots,x_n) + f(x_1,\dots,x_m-h,\dots,x_n)}{h^2} + O(h^2)$$
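A direct transcription of the centered formulas (a sketch; the step size trades truncation error against rounding error, and h around 1e-4 is a common compromise for second differences in double precision):

def d1_centered(f, x, m, h=1e-6):
    # centered first difference in coordinate m, O(h^2) truncation error
    xp = list(x); xp[m] += h
    xm = list(x); xm[m] -= h
    return (f(xp) - f(xm)) / (2.0 * h)

def d2_centered(f, x, m, h=1e-4):
    # centered second difference in coordinate m, O(h^2) truncation error
    xp = list(x); xp[m] += h
    xm = list(x); xm[m] -= h
    return (f(xp) - 2.0 * f(x) + f(xm)) / (h * h)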

Page 14: Second Order Differentiation

Second-Order AD Module

Introduction

Rules of arithmetic operations for Hessian matrices:

$$H(u \pm v) = H(u) \pm H(v),$$
$$H(uv) = u\,H(v) + \nabla u\,(\nabla v)^T + \nabla v\,(\nabla u)^T + v\,H(u),$$
$$H(u/v) = \Big(H(u) - \nabla(u/v)\,(\nabla v)^T - \nabla v\,\big(\nabla(u/v)\big)^T - (u/v)\,H(v)\Big)\Big/v, \quad v \neq 0,$$
$$H(\phi(u)) = \phi''(u)\,\nabla u\,(\nabla u)^T + \phi'(u)\,H(u)$$

for twice differentiable functions $\phi$ such as the standard functions.
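These rules extend the first order (value, gradient) propagation to (value, gradient, Hessian) triples. A minimal numpy sketch (hypothetical names; only the sum and product rules plus one standard function are shown):

import math
import numpy as np

class HessNum:
    # (value, gradient, Hessian) triple for a scalar function of m inputs
    def __init__(self, val, grad, hess):
        self.val = val
        self.grad = np.asarray(grad, dtype=float)
        self.hess = np.asarray(hess, dtype=float)

    def __add__(self, o):   # H(u + v) = H(u) + H(v)
        return HessNum(self.val + o.val, self.grad + o.grad, self.hess + o.hess)

    def __mul__(self, o):   # H(uv) = u H(v) + grad u grad v^T + grad v grad u^T + v H(u)
        return HessNum(self.val * o.val,
                       self.val * o.grad + o.val * self.grad,
                       self.val * o.hess + np.outer(self.grad, o.grad)
                       + np.outer(o.grad, self.grad) + o.val * self.hess)

def sin(u):                 # H(phi(u)) = phi''(u) grad u grad u^T + phi'(u) H(u)
    s, c = math.sin(u.val), math.cos(u.val)
    return HessNum(s, c * u.grad, -s * np.outer(u.grad, u.grad) + c * u.hess)

Seeding the inputs with unit basis gradients and zero Hessians (and wrapping constants with zero gradient and Hessian) and evaluating the expression reproduces, for instance, the Hessian derived on the next slides.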

Page 15: Second Order Differentiation

Second-Order AD Module

Forward Mode Strategy

An example: given the function $f(x,y) = x^2 + 2y^2 + \sin(xy)$, the second order partial derivatives are

$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\big(2x + y\cos(xy)\big) = 2 - y^2\sin(xy),$$
$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\big(2x + y\cos(xy)\big) = \cos(xy) - xy\sin(xy),$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\big(4y + x\cos(xy)\big) = \cos(xy) - xy\sin(xy),$$
$$\frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\big(4y + x\cos(xy)\big) = 4 - x^2\sin(xy).$$

The Hessian matrix is $H(f) = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y \\ \partial^2 f/\partial y\,\partial x & \partial^2 f/\partial y^2 \end{pmatrix}$.

Page 16: Second Order Differentiation

Second-Order AD Module

Forward Mode Strategy

Code list:

$$u_1 = x,\qquad u_2 = y,\qquad u_3 = u_1^2,\qquad u_4 = 2u_2^2,\qquad u_5 = \sin(u_1 u_2),\qquad u_6 = u_3 + u_4 + u_5.$$

Gradient entries:

$$\nabla u_1 = [1,0],\qquad \nabla u_2 = [0,1],$$
$$\nabla u_3 = 2u_1\nabla u_1 = [2u_1, 0],$$
$$\nabla u_4 = 4u_2\nabla u_2 = [0, 4u_2],$$
$$\nabla u_5 = \cos(u_1u_2)\,\big(u_1\nabla u_2 + u_2\nabla u_1\big) = [\,u_2\cos(u_1u_2),\; u_1\cos(u_1u_2)\,],$$
$$\nabla u_6 = \nabla u_3 + \nabla u_4 + \nabla u_5 = [\,2u_1 + u_2\cos(u_1u_2),\; 4u_2 + u_1\cos(u_1u_2)\,].$$

Page 17: Second Order Differentiation

Second-Order AD Module

Forward Mode Strategy

Hessian matrix entries:

$$H(u_1) = \begin{pmatrix}0&0\\0&0\end{pmatrix},\qquad H(u_2) = \begin{pmatrix}0&0\\0&0\end{pmatrix},$$
$$H(u_3) = H(u_1^2) = 2\,\nabla u_1(\nabla u_1)^T + 2u_1 H(u_1) = \begin{pmatrix}2&0\\0&0\end{pmatrix},$$
$$H(u_4) = H(2u_2^2) = 4\,\nabla u_2(\nabla u_2)^T + 4u_2 H(u_2) = \begin{pmatrix}0&0\\0&4\end{pmatrix},$$
$$H(u_5) = H(\sin(u_1u_2)) = -\sin(u_1u_2)\,\nabla(u_1u_2)\big(\nabla(u_1u_2)\big)^T + \cos(u_1u_2)\,H(u_1u_2),$$

where $\nabla(u_1u_2) = [u_2, u_1]$ and $H(u_1u_2) = \nabla u_1(\nabla u_2)^T + \nabla u_2(\nabla u_1)^T = \begin{pmatrix}0&1\\1&0\end{pmatrix}$, so that

$$H(u_6) = H(u_3) + H(u_4) + H(u_5) = \begin{pmatrix} 2 - u_2^2\sin(u_1u_2) & \cos(u_1u_2) - u_1u_2\sin(u_1u_2) \\ \cos(u_1u_2) - u_1u_2\sin(u_1u_2) & 4 - u_1^2\sin(u_1u_2) \end{pmatrix}.$$
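A quick standalone sanity check of this result: compare the closed-form Hessian with centered second differences at an arbitrary point (both should agree to roughly the truncation error of the differences):

import math

def f(x, y):
    return x * x + 2.0 * y * y + math.sin(x * y)

def hess_closed_form(x, y):
    s, c = math.sin(x * y), math.cos(x * y)
    return [[2.0 - y * y * s, c - x * y * s],
            [c - x * y * s, 4.0 - x * x * s]]

def hess_fd(x, y, h=1e-4):
    # centered second differences for all four entries
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return [[fxx, fxy], [fxy, fyy]]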

Page 18: Second Order Differentiation

Second-Order AD Module

Forward Mode Strategy

Cost of computing the various Hessian types with the forward mode:

Hessian type  | Cost
H(f)          | O(n^2)
H(f) V        | O(n n_v)
V^T H(f) V    | O(n_v^2)
V^T H(f) W    | O(n_v n_w)

where H(f) is an n-by-n matrix, V an n-by-n_v matrix, and W an n-by-n_w matrix.

Page 19: Second Order Differentiation

Second-Order AD Module

Taylor Series Strategy

We consider $f$ as a scalar function $f(t)$ of $t$ along a direction $u$, i.e. $x = x_0 + t\,u$. Its Taylor series, up to second order, is

$$f(t) = f(t_0) + f_t\,(t - t_0) + f_{tt}\,(t - t_0)^2, \qquad f_t = \frac{\partial f}{\partial t}\bigg|_{t_0}, \quad f_{tt} = \frac{1}{2}\,\frac{\partial^2 f}{\partial t^2}\bigg|_{t_0},$$

where $f_t$ and $f_{tt}$ are the first and second order Taylor coefficients.

The uniqueness of the Taylor series implies that for $u = e_i$, the $i$-th basis vector, we obtain

$$f_t = \frac{\partial f}{\partial x_i}\bigg|_{x_0}, \qquad f_{tt} = \frac{1}{2}\,\frac{\partial^2 f}{\partial x_i^2}\bigg|_{x_0}.$$
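A degree-2 Taylor polynomial ("jet") can be propagated with ordinary recurrences. A minimal Python sketch (hypothetical names): each value carries the coefficients (c0, c1, c2) of c0 + c1 t + c2 t^2, so seeding x_i with (x0_i, 1, 0) delivers f_t = c1 and the diagonal Hessian entry as 2 c2:

import math

class Jet2:
    # truncated Taylor polynomial c0 + c1 t + c2 t^2
    def __init__(self, c0, c1=0.0, c2=0.0):
        self.c = (c0, c1, c2)

    def __add__(self, o):
        a, b = self.c, o.c
        return Jet2(a[0] + b[0], a[1] + b[1], a[2] + b[2])

    def __mul__(self, o):   # Cauchy product truncated at t^2
        a, b = self.c, o.c
        return Jet2(a[0] * b[0],
                    a[0] * b[1] + a[1] * b[0],
                    a[0] * b[2] + a[1] * b[1] + a[2] * b[0])

def sin(u):
    # sin(u0 + e) = sin u0 + e cos u0 - (e^2 / 2) sin u0 + O(t^3), e = u1 t + u2 t^2
    u0, u1, u2 = u.c
    return Jet2(math.sin(u0), math.cos(u0) * u1,
                math.cos(u0) * u2 - 0.5 * math.sin(u0) * u1 * u1)

# diagonal entry d2f/dx2 of f = x^2 + 2 y^2 + sin(xy) at (x0, y0):
x0, y0 = 0.7, -1.3
x, y = Jet2(x0, 1.0), Jet2(y0)   # direction u = e_1
f = x * x + Jet2(2.0) * y * y + sin(x * y)
fxx = 2.0 * f.c[2]               # equals 2 - y0^2 sin(x0 y0)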

Page 20: Second Order Differentiation

Second-Order AD Module

Taylor Series Strategy

To compute the $(i,j)$ off-diagonal entry of the Hessian, we set $u = e_i + e_j$. The uniqueness of the Taylor expansion implies

$$f_t = \frac{\partial f}{\partial x_i}\bigg|_{x_0} + \frac{\partial f}{\partial x_j}\bigg|_{x_0},$$
$$f_{tt} = \frac{1}{2}\,\frac{\partial^2 f}{\partial x_i^2}\bigg|_{x_0} + \frac{\partial^2 f}{\partial x_i\,\partial x_j}\bigg|_{x_0} + \frac{1}{2}\,\frac{\partial^2 f}{\partial x_j^2}\bigg|_{x_0},$$

so the mixed derivative follows from $f_{tt}$ and the two diagonal entries.
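As a concrete check with the running example $f(x,y) = x^2 + 2y^2 + \sin(xy)$: propagating a second order Taylor expansion along $u = e_1 + e_2$ gives

$$f_{tt} = 3 + \cos(xy) - \tfrac{1}{2}(x+y)^2\sin(xy),$$

and subtracting the two halved diagonal entries,

$$\frac{\partial^2 f}{\partial x\,\partial y} = f_{tt} - \tfrac{1}{2}\big(2 - y^2\sin(xy)\big) - \tfrac{1}{2}\big(4 - x^2\sin(xy)\big) = \cos(xy) - xy\sin(xy),$$

in agreement with the forward mode result computed earlier.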

Page 21: Second Order Differentiation

Second-Order AD Module

Hessian Performance

The following implementations are compared:

• Twice ADIFOR: first produces a gradient code with ADIFOR 2.0, and then runs the gradient code through ADIFOR again.
• Forward: implements the forward mode.
• Adaptive Forward: uses the forward mode, with preaccumulation at the statement level where deemed appropriate.
• Sparse Taylor Series: uses the Taylor series mode to compute the needed entries.

Page 22: Second Order Differentiation

An Application in OCPs

Problem Definition and Theoretical Analysis

Consider the following problem:

$$f(\dot{x}, x, u, v) = 0, \qquad t \in [t_0, t_f] \tag{1a}$$

with the following set of consistent and non-redundant initial conditions:

$$x(t_0) = x_0(v) \tag{1b}$$

where $x, \dot{x} \in \mathbb{R}^n$ are the state (output) variables and their time derivatives respectively, $u$ are control (input) variables, and $v$ are time-invariant parameters. Depending on the implementation, the control variables may be approximated by some type of discretization which involves some (or all) of the parameters in the set $v$:

$$u = u(v) \tag{1c}$$

Page 23: Second Order Differentiation

An Application in OCPs

The first order sensitivity equations

$$\frac{\partial f}{\partial \dot{x}}\,\frac{\partial \dot{x}}{\partial v} + \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial v} + \frac{\partial f}{\partial u}\,\frac{\partial u}{\partial v} + \frac{\partial f}{\partial v} = 0, \qquad t \in [t_0, t_f] \tag{2a}$$

with the initial conditions:

$$\frac{\partial x}{\partial v}(t_0) = \frac{\partial x_0(v)}{\partial v} \tag{2b}$$
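For an explicit ODE special case of (1a), the state and sensitivity equations can be integrated together as one augmented IVP. A minimal SciPy sketch (the toy system dx/dt = -v x with x(0) = x0 is chosen here only for illustration and is not from the slides):

import numpy as np
from scipy.integrate import solve_ivp

def augmented(t, w, v):
    # w = [x, s] with s = dx/dv; here f = xdot + v x = 0, so (2a) reads
    # sdot + v s + x = 0
    x, s = w
    return [-v * x, -v * s - x]

v, x0 = 0.5, 1.0
sol = solve_ivp(augmented, (0.0, 2.0), [x0, 0.0], args=(v,),
                rtol=1e-10, atol=1e-12)
x_T, s_T = sol.y[:, -1]
# analytic check: x(t) = x0 exp(-v t) and dx/dv = -t x0 exp(-v t)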

Page 24: Second Order Differentiation

An Application in OCPs

The second order sensitivity equations

Differentiating (2a) once more with respect to $v$ gives, in Kronecker product notation,

$$\frac{\partial f}{\partial \dot{x}}\,\frac{\partial^2 \dot{x}}{\partial v^2} + \left(I_n \otimes \frac{\partial \dot{x}^T}{\partial v}\right)\left[\frac{\partial^2 f}{\partial \dot{x}^2}\,\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x\,\partial \dot{x}}\,\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u\,\partial \dot{x}}\,\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v\,\partial \dot{x}}\right]$$
$$+\; \frac{\partial f}{\partial x}\,\frac{\partial^2 x}{\partial v^2} + \left(I_n \otimes \frac{\partial x^T}{\partial v}\right)\left[\frac{\partial^2 f}{\partial \dot{x}\,\partial x}\,\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x^2}\,\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u\,\partial x}\,\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v\,\partial x}\right]$$
$$+\; \frac{\partial f}{\partial u}\,\frac{\partial^2 u}{\partial v^2} + \left(I_n \otimes \frac{\partial u^T}{\partial v}\right)\left[\frac{\partial^2 f}{\partial \dot{x}\,\partial u}\,\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x\,\partial u}\,\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u^2}\,\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v\,\partial u}\right]$$
$$+\; \frac{\partial^2 f}{\partial \dot{x}\,\partial v}\,\frac{\partial \dot{x}}{\partial v} + \frac{\partial^2 f}{\partial x\,\partial v}\,\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial u\,\partial v}\,\frac{\partial u}{\partial v} + \frac{\partial^2 f}{\partial v^2} = 0, \qquad t \in [t_0, t_f] \tag{3a}$$

with initial conditions given by:

$$\frac{\partial^2 x}{\partial v^2}(t_0) = \frac{\partial^2 x_0(v)}{\partial v^2} \tag{3b}$$

Page 25: Second Order Differentiation

An Application in OCPs

The second order sensitivity equations

The result of Eq. (3a) is post-multiplied by a vector $p \in \mathbb{R}^{n_v}$, obtaining:

$$\{\,\text{left-hand side of (3a)}\,\}\;p = 0 \tag{4}$$

By comparing terms, the equivalent form is derived:

$$\frac{\partial f}{\partial \dot{x}}\,\dot{Z} + \frac{\partial f}{\partial x}\,Z + A(\dot{x}, x, u, v) = 0 \tag{5a}$$

with $Z$, $\dot{Z}$ being the matrices whose columns are respectively given by the matrix-vector products

$$z_i = \frac{\partial^2 x_i}{\partial v^2}\,p, \qquad \dot{z}_i = \frac{\partial^2 \dot{x}_i}{\partial v^2}\,p, \qquad i = 1, 2, \dots, n, \qquad t \in [t_0, t_f]. \tag{5b}$$

Finally, the set of initial conditions for these is:

$$Z(x(t_0)) = \frac{\partial^2 x_0(v)}{\partial v^2}\,p \tag{5c}$$

Page 26: Second Order Differentiation

An Application in OCPs

Optimal control problem

Find the control vector $u(t)$ over $t \in [t_0, t_f]$ to minimize (or maximize) a performance index $J$:

$$J(x, u) = \varphi(x(t_f)) \tag{6}$$

subject to a set of ordinary differential equations:

$$\frac{dx}{dt} = f(x(t), u(t), t) \tag{7}$$

where $x$ is the vector of state variables, with initial conditions $x(t_0) = x_0$. An additional set of inequality constraints are the lower and upper bounds on the control variables:

$$u^L \leq u(t) \leq u^U \tag{8}$$
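Control vector parameterization (CVP) reduces (6)-(8) to a finite-dimensional NLP over the parameters v. A minimal sketch (a hypothetical piecewise-constant control and toy dynamics, with phi(x(tf)) = x(tf); none of this is from the slides):

import numpy as np
from scipy.integrate import solve_ivp

def performance_index(v, t0=0.0, tf=1.0, x0=1.0):
    # v holds the piecewise-constant control levels u_k on a uniform grid, cf. (1c)
    edges = np.linspace(t0, tf, len(v) + 1)
    def u(t):
        k = min(np.searchsorted(edges, t, side='right') - 1, len(v) - 1)
        return v[k]
    sol = solve_ivp(lambda t, x: [-u(t) * x[0]], (t0, tf), [x0],
                    rtol=1e-8, max_step=(tf - t0) / len(v))  # do not step over switches
    return sol.y[0, -1]   # J = phi(x(tf)) = x(tf)

J = performance_index(np.array([0.2, 0.8, 0.5]))
# an outer NLP solver, subject to u^L <= v_k <= u^U, would minimize this function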

Page 27: Second Order Differentiation

An Application in OCPs

Truncated Newton method for the solution of the NLP

The truncated Newton method uses an iterative scheme, usually a conjugate gradient method, to solve the Newton equations of the optimization problem approximately:

$$H(x)\,p = -g(x) \tag{9}$$

where $H(x)$ is the Hessian matrix, $p$ is the search direction, and $g(x)$ is the gradient vector.
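A sketch of the inner conjugate gradient iteration (hypothetical names). The "truncation" is the loose stopping rule on the residual, and the Hessian enters only through Hessian-vector products, which is exactly what the second order sensitivities of (5a) provide:

import numpy as np

def truncated_newton_step(g, hessvec, max_cg=20, eta=0.5):
    # approximately solve H p = -g (Eq. 9) by conjugate gradients
    g = np.asarray(g, dtype=float)
    p = np.zeros_like(g)
    r = -g.copy()                  # residual of -g - H p at p = 0
    d = r.copy()
    rr = r @ r
    for _ in range(max_cg):
        if np.sqrt(rr) <= eta * np.linalg.norm(g):
            break                  # truncation: loose relative residual test
        Hd = hessvec(d)            # exact Hessian-vector product, e.g. via (5a)
        curv = d @ Hd
        if curv <= 0.0:
            break                  # negative curvature: stop with the current p
        alpha = rr / curv
        p += alpha * d
        r -= alpha * Hd
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return p if p.any() else -g    # fall back to the steepest descent direction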

Page 28: Second Order Differentiation

An Application in OCPs

Implementation Details

Step 1
• Automatic derivation of the first and second order sensitivity equations to construct a full augmented IVP.
• Creation of the corresponding program subroutines in a format suitable for a standard IVP solver.

Step 2
• Numerical solution of the outer NLP using a truncated Newton method which solves bound-constrained problems.

Page 29: Second Order Differentiation

An Application in OCPs

Two approaches with the TN method

TN algorithm with a finite difference scheme
• Gradient evaluation requires the solution of the first order sensitivity system.
• Gradient information is used to approximate the Hessian vector product with a finite difference scheme.

TN algorithm with exact Hessian vector product calculation
• Uses the second order sensitivity equations defined in Eq. (5a) to obtain the exact Hessian vector product. (Earlier methods of the CVP type were based on first order sensitivities only, i.e. mostly gradient-based algorithms.)
• This approach has been shown to be more robust and reliable due to the use of exact second order information.

Page 30: Second Order Differentiation

Summary

• Basics of derivatives
- Definition of derivatives
- Applications of derivatives
- Methods to compute derivatives

• Basics of AD
- Computing first order derivatives with the forward mode
- Computing first order derivatives with the reverse mode

• Second Order Differentiation
- Computing second order derivatives with the forward mode strategy
- Computing second order derivatives with the Taylor series strategy
- Hessian performance

• An Application in Optimal Control Problems
- First and second order sensitivity equations of the DAE system
- Solving the optimal control problem with the CVP method
- Solving the nonlinear programming problem with the truncated Newton method
- Truncated Newton method with exact Hessian vector product calculation

Page 31: Second Order Differentiation

References

• Abate, Bischof, Roh, Carle, "Algorithms and Design for a Second-Order Automatic Differentiation Module"
• Eva Balsa-Canto, Julio R. Banga, Antonio A. Alonso, Vassilios S. Vassiliadis, "Restricted second order information for the solution of optimal control problems using control vector parameterization"
• Louis B. Rall, George F. Corliss, "An Introduction to Automatic Differentiation"
• Andreas Griewank, "Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation"
• Stephen G. Nash, "A Survey of Truncated-Newton Methods"