
Numerical Methods for Financial Engineering

Alexey Chernov

University of Reading

Spring Term 2015

Alexey Chernov Numerical Methods for Financial Engineering 1 / 182

Introduction

This course is an introduction to the major numerical methods needed for quantitative work in finance.

- General survey of key numerical methods
- Detailed study of methods specific to financial applications
- Practical implementation of numerical methods

Lectures will take place in G51, Mondays, 4–6pm
→ Alexey Chernov, Maths 109, [email protected]

Seminars: computer room, Mondays (from 26/01), 10–11am
→ Claudio Bierig, [email protected]


Outline of the lecture

L 1 Foundations of Numerical Computations
L 2 Option Pricing with Binomial and Trinomial Trees
L 3 Monte Carlo Simulations
L 4 Finite Difference Methods
L 5 Systems of Linear Equations Ax = b
L 6 Nonlinear Equations
L 7 Optimization
L 8 Calibration
L 9 Interpolation

Assessment:
- 50% Project: study and programming of a numerical method
- 50% Exam: 2-hour closed-book examination


Literature

1 M. Gilli, D. Maringer, and E. Schumann, Numerical Methods and Optimization in Finance, Elsevier (2011)
2 S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press (2004)
3 G. Cornuejols and R. Tütüncü, Optimization Methods in Finance, Cambridge University Press (2007)
4 D.J. Duffy, Finite Difference Methods in Financial Engineering: A Partial Differential Equation Approach, Wiley Finance (2006)
5 D.J. Duffy and J. Kienitz, Monte Carlo Frameworks: Building Customisable High-performance C++ Applications, Wiley (2009)
6 G. Fusai and A. Roncoroni, Implementing Models in Quantitative Finance, Springer Finance (2008)
7 P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer (2004)
8 J.C. Hull, Options, Futures and Other Derivatives, 6th edition, Pearson (2006)
9 J. Nocedal and S.J. Wright, Numerical Optimization, Springer (2006)
10 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes: The Art of Scientific Computing, 3rd edition, Cambridge University Press (2007)
11 F.D. Rouah and G. Vainberg, Option Pricing Models and Volatility Using Excel-VBA, Wiley Finance (2007)
12 R. Seydel, Tools for Computational Finance, Springer (2002)


Lecture 1

Foundations of Numerical Computations

- Approximation of Reality with Computer Simulations
- Computer Arithmetic
- Measuring Errors
- Approximating derivatives with finite differences
- Numerical instability and ill-conditioning


Approximation of Reality with Computer Simulations

We encounter three levels of approximation:

(1) Real process
    ↓ (modelling errors)
(2) Mathematical model of the real process
    ↓ (numerical approximation errors)
(3) Numerical approximation to the mathematical model

In this course we focus on the numerical approximation to mathematical models:
- Methods and algorithms
- Control of the numerical error


Computer Arithmetic

Two sources of numerical error:

a. Truncation errors (for example $f'(x) \approx \frac{f(x+h) - f(x)}{h}$)

b. Round-off errors

Round-off is an intrinsic limitation of numerical computations:

- A computer cannot perform exact real-number arithmetic
- e.g. it can happen that $1 + \epsilon = 1$ for $\epsilon \neq 0$

Reason: Real numbers cannot be represented exactly.

Information is represented in “words” (64 bits = double precision)

1 0 0 1 ... ... 1 0 0 1 0 1

b63 b62 b61 b60 ... ... b5 b4 b3 b2 b1 b0


Computer Arithmetic (2)

Representation of real numbers:

$f = \pm n \times b^e$

where
- $n$ is the mantissa (fractional part)
- $b$ is the base ($b = 2$)
- $e$ is the exponent

Example: $91.232 = 0.91232 \times 10^2 = 0.71275 \times 2^7$.

Layout of a 64-bit word:

b63: sign
b62 ... b52: 11-bit (biased) exponent ($s = 11$)
b51 ... b0: 52-bit (normalized) mantissa ($t = 52$)



Example: $s = 2$, $t = 2$

Sign bit b4: 0 = "+", 1 = "−"

Exponent bits b3 b2: 00 = 0, 01 = 1, 10 = 2, 11 = 3

Mantissa bits b1 b0 (normalized, with a leading 1 prepended): 1 00 = 4, 1 01 = 5, 1 10 = 6, 1 11 = 7

sign $\in \{0, 1\}$, $e' \in \{-4, -3, -2, -1\}$, $n \in \{4, 5, 6, 7\}$

Possible numbers $n \times 2^{e'}$:

n \ 2^e'    1/16    1/8    1/4    1/2
4           1/4     1/2    1      2
5           5/16    5/8    5/4    5/2
6           3/8     3/4    3/2    3
7           7/16    7/8    7/4    7/2

[Figure: the representable numbers marked on the real line from −3 to 3, with ticks at ±1/4 and ±1/2]

Notice:

Notice:

- representable real numbers are not equally spaced
- there is a gap around zero (it can be closed by denormalization)

Range of (nonzero) floating point numbers (double precision):

$m \le |f| \le M$, $m \approx 2.22 \times 10^{-308}$, $M \approx 1.79 \times 10^{308}$



Round-off: approximation of exact numbers by floating point numbers, e.g. by chopping digits beyond a given position (a better logic is used in practice).

In the previous example ($s = 2$, $t = 2$):

$\mathrm{float}(0.71) = \frac{5}{8}$, $\mathrm{float}(1.71) = \frac{3}{2}$

[Figure: 0.71 and 1.71 on the real line, chopped to the floating point numbers 5/8 and 3/2]

Notice:

$\mathrm{float}(1.71) - \mathrm{float}(0.71) = \frac{7}{8}$ whereas $1.71 - 0.71 = 1$,

although 1 is exactly representable.



Machine precision is defined as the smallest $\epsilon_{mach} > 0$ such that

$\mathrm{float}(1 + \epsilon_{mach}) > 1$.

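Public Sub ComputeMachineEpsilon()
    ' Halve e until 1 + e can no longer be distinguished from 1
    Dim e As Double
    e = 1
    Do
        e = e / 2
    Loop While (1 + e) > 1
    ' The loop stops one halving too late, hence the factor 2
    MsgBox ("Machine Epsilon is " & 2 * e)
End Sub

$\epsilon_{mach} \approx 2.22 \times 10^{-16}$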

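Example of limitations:

$S_n := \sum_{k=1}^{n} \frac{1}{k} \to \infty$ whereas $\mathrm{float}(S_n) < 100$ (since $\frac{1/n}{S_{n-1}} < \epsilon_{mach}$ for large $n$)

Notice that floating point operations are not associative: for $\delta = \frac{\epsilon_{mach}}{2}$

$\mathrm{float}(\mathrm{float}(1 + \delta) + \delta) = 1$,
$\mathrm{float}(1 + \mathrm{float}(\delta + \delta)) > 1$.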
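The two observations above (machine precision and non-associativity) can be reproduced in a few lines. The following is a minimal Python sketch, assuming IEEE double precision with round-to-nearest; the function name `machine_epsilon` is chosen here for illustration only.

```python
import sys


def machine_epsilon():
    """Halve e as long as 1 + e/2 is still distinguishable from 1."""
    e = 1.0
    while 1.0 + e / 2.0 > 1.0:
        e /= 2.0
    return e


eps = machine_epsilon()
delta = eps / 2.0

# Adding delta twice, one at a time: each intermediate sum rounds back to 1.
left = (1.0 + delta) + delta
# Adding delta + delta = eps in one step is large enough to be seen.
right = 1.0 + (delta + delta)

print("machine epsilon:", eps)
print("library value  :", sys.float_info.epsilon)
print("(1 + d) + d == 1:", left == 1.0)
print("1 + (d + d) > 1:", right > 1.0)
```

The loop returns the same value that the VBA routine reports as `2 * e`.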


Measuring errors

In numerical computations one frequently measures errors:
- between an approximate value $\hat{x}$ and an exact value $x$
- as the distance between two successive values in an iterative algorithm

Absolute error: $e_{abs} = |\hat{x} - x|$

Example:
- $\hat{x} = 3$, $x = 2$: $e_{abs} = 1$ is "large"
- $\hat{x} = 10001$, $x = 10000$: $e_{abs} = 1$ is rather "small"

Relative error: $e_{rel} = \frac{|\hat{x} - x|}{|x|}$

Example: $\hat{x} = 10001$, $x = 10000$: $e_{rel} = 10^{-4}$ is "small"

The relative error does not work when $x$ is zero (or close to zero).

Combined error: $e_{com} = \frac{|\hat{x} - x|}{|x| + 1}$

It combines the properties of the absolute error for $|x| \ll 1$ and of the relative error for $|x| \gg 1$.
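The three error measures translate directly into code. This is a small Python sketch; the function names are illustrative and not part of the lecture.

```python
def abs_error(x_hat, x):
    """Absolute error |x_hat - x|."""
    return abs(x_hat - x)


def rel_error(x_hat, x):
    """Relative error |x_hat - x| / |x|; undefined for x == 0."""
    return abs(x_hat - x) / abs(x)


def combined_error(x_hat, x):
    """Combined error |x_hat - x| / (|x| + 1); safe for x near zero."""
    return abs(x_hat - x) / (abs(x) + 1.0)


for x_hat, x in [(3.0, 2.0), (10001.0, 10000.0), (1e-9, 0.0)]:
    print(x_hat, x, abs_error(x_hat, x), combined_error(x_hat, x))
```

For the last pair the relative error would divide by zero, while the combined error simply reduces to the absolute error.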

Approximating derivatives with finite differences

Paradigm: Instead of working with $f'(x)$ it is possible to approximate it using values of $f$ only (this is important when $f$ is not known in closed form).

Definition: If $f : \mathbb{R} \to \mathbb{R}$ is continuously differentiable, $f \in C^1(\mathbb{R})$, then

$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$. (1)

["infinitely small" difference quotient]

Idea: Use (1) for a "finite" positive value of $h$.
Question: What is the quality of this approximation?

Tool: Taylor expansion

$f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{6} f'''(x) + \dots + \frac{h^n}{n!} f^{(n)}(x) + R_n$

where $R_n = \frac{h^{n+1}}{(n+1)!} f^{(n+1)}(\xi)$, $\xi \in [x, x + h]$, provided $f \in C^{n+1}(\mathbb{R})$.


Approximating derivatives with finite differences (2)

Approximating first-order derivatives:

Second order Taylor expansion:

$f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(\xi)$, for $\xi$ between $x$ and $x \pm h$.

Forward difference approximation

$f'(x) = \frac{f(x + h) - f(x)}{h} - \frac{h}{2} f''(\xi)$,

Backward difference approximation

$f'(x) = \frac{f(x) - f(x - h)}{h} + \frac{h}{2} f''(\xi)$.

In both cases the truncation error is $O(h)$.



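Third order Taylor expansion:

$f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(\xi_\pm)$.

Central difference approximation for $f'(x)$

$f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{h^2}{6} f'''(\xi)$, $\xi \in [x - h, x + h]$

(subtracting the two expansions gives $\frac{h^3}{6}(f'''(\xi_+) + f'''(\xi_-)) = \frac{h^3}{3} f'''(\xi)$, which is then divided by $2h$); the truncation error is $O(h^2)$.

Approximating second-order derivatives:

Central difference approximation for $f''(x)$

$f''(x) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} - \frac{h}{6}\left(f'''(\xi_+) - f'''(\xi_-)\right)$, $\xi_\pm \in [x - h, x + h]$

With this third-order expansion the truncation error is bounded by $\frac{h}{3} \sup |f'''|$, i.e. it is $O(h)$. If $f \in C^4$, a fourth-order expansion gives the sharper bound $O(h^2)$.

Approximation of higher order derivatives is analogous.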


Approximating partial derivatives: Suppose $f = f(x, y)$

Central difference approximation for $\frac{\partial f}{\partial x}$

$\frac{\partial f}{\partial x}(x, y) \approx \frac{f(x + h_x, y) - f(x - h_x, y)}{2 h_x}$

Central difference approximation for $\frac{\partial^2 f}{\partial x \partial y}$

$\frac{\partial^2 f}{\partial x \partial y}(x, y) \approx \frac{1}{4 h_x h_y} \Big( f(x + h_x, y + h_y) - f(x - h_x, y + h_y) - f(x + h_x, y - h_y) + f(x - h_x, y - h_y) \Big)$.

Homework: find the truncation error.
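The difference quotients above are one-liners in code. The following Python sketch collects them; the function names and the test functions ($\sin$, $\sin(x) e^y$) are illustrative choices, not part of the lecture.

```python
import math


def forward_diff(f, x, h):
    """(f(x+h) - f(x)) / h, truncation error O(h)."""
    return (f(x + h) - f(x)) / h


def backward_diff(f, x, h):
    """(f(x) - f(x-h)) / h, truncation error O(h)."""
    return (f(x) - f(x - h)) / h


def central_diff(f, x, h):
    """(f(x+h) - f(x-h)) / (2h), truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_diff(f, x, h):
    """Central approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def mixed_diff(f, x, y, hx, hy):
    """Central approximation of the mixed derivative d^2 f / (dx dy)."""
    return (f(x + hx, y + hy) - f(x - hx, y + hy)
            - f(x + hx, y - hy) + f(x - hx, y - hy)) / (4.0 * hx * hy)


print(forward_diff(math.sin, 1.0, 1e-4), central_diff(math.sin, 1.0, 1e-4), math.cos(1.0))
print(second_diff(math.sin, 1.0, 1e-3), -math.sin(1.0))
print(mixed_diff(lambda x, y: math.sin(x) * math.exp(y), 1.0, 0.5, 1e-3, 1e-3),
      math.cos(1.0) * math.exp(0.5))
```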



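Question: how to choose $h$? If $h$ is too small, it could happen that

$\mathrm{float}(x + h) = \mathrm{float}(x)$ or $\mathrm{float}(f(x + h)) = \mathrm{float}(f(x))$.

Careful analysis: Balance truncation and round-off errors.

Define $f'_h(x) := \frac{f(x + h) - f(x)}{h}$ and $M := \sup_x |f''(x)|$, then

$|f'_h(x) - f'(x)| \le \frac{h}{2} M$.

Assume $|\mathrm{float}(f(x)) - f(x)| \le \epsilon$, then $|\mathrm{float}(f'_h(x)) - f'_h(x)| \le \frac{2\epsilon}{h}$ and

$|\mathrm{float}(f'_h(x)) - f'(x)| \le \frac{2\epsilon}{h} + \frac{M h}{2}$.

The smallest worst-case error is attained at the minimum of $g(h) := \frac{2\epsilon}{h} + \frac{M h}{2}$:

$g'(h) = -\frac{2\epsilon}{h^2} + \frac{M}{2} \overset{!}{=} 0 \quad \Rightarrow \quad h_{opt} = 2 \sqrt{\frac{\epsilon}{M}}$.

If $\epsilon = \epsilon_{mach}$ and $M = 1$, then $h_{opt} \approx 3 \times 10^{-8}$, i.e. of order $10^{-8}$.

Analogously, for the central difference $\frac{f(x + h) - f(x - h)}{2h}$ the round-off error is at most $\frac{2\epsilon}{2h} = \frac{\epsilon}{h}$ and the truncation error at most $\frac{M h^2}{6}$ with $M := \sup_x |f'''(x)|$, so that

$g(h) = \frac{\epsilon}{h} + \frac{M h^2}{6}$ and $h_{opt} = \sqrt[3]{\frac{3\epsilon}{M}}$.

If $\epsilon = \epsilon_{mach}$ and $M = 1$, then $h_{opt} \approx 9 \times 10^{-6}$, i.e. of order $10^{-5}$.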



Example:

$f(x) = \cos(x^x) - \sin(e^x)$, $f'(1.5) = ?$

[Figure: graph of $f$ on the interval $[0, 3]$]

[Figure: absolute error versus $h$ on log-log axes ($h$ from $10^{-16}$ to $1$, error from $10^{-11}$ to $10$) for the forward difference and the central difference]
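The error curves of this example can be recomputed with a short script. The sketch below evaluates both difference quotients at $x = 1.5$ for $h = 10^{-1}, \dots, 10^{-15}$ and compares them with the derivative obtained by hand, $f'(x) = -\sin(x^x)\, x^x (\ln x + 1) - \cos(e^x)\, e^x$; the variable names are illustrative.

```python
import math


def f(x):
    return math.cos(x ** x) - math.sin(math.exp(x))


def f_prime(x):
    # d/dx x^x = x^x (ln x + 1), d/dx e^x = e^x
    return (-math.sin(x ** x) * x ** x * (math.log(x) + 1.0)
            - math.cos(math.exp(x)) * math.exp(x))


x0 = 1.5
exact = f_prime(x0)

fwd_err = {}  # keyed by k, where h = 10^(-k)
cen_err = {}
for k in range(1, 16):
    h = 10.0 ** (-k)
    fwd_err[k] = abs((f(x0 + h) - f(x0)) / h - exact)
    cen_err[k] = abs((f(x0 + h) - f(x0 - h)) / (2.0 * h) - exact)

for k in range(1, 16):
    print("h = 1e-%02d  forward: %.3e  central: %.3e" % (k, fwd_err[k], cen_err[k]))
```

The printed table should show the pattern discussed above: the error first decreases with $h$ (truncation dominates) and then grows again (round-off dominates), with the turning point at a larger $h$ for the central difference.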


Numerical instability and ill-conditioning

Round-off errors may seriously affect the precision of the numerical solution.

Two types/reasons of severe precision reduction:

- Round-off errors are considerably amplified by the algorithm (numerical instability, e.g. subtractive cancellation)
- Small perturbations of the data generate large changes in the solution (ill-conditioned or sensitive problem)


Numerical instability

Example: solve $a x^2 + b x + c = 0$ with $a = 1$, $c = 1$ and different $b$.

$x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$, $x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$

Algorithm 1

$\Delta = \sqrt{b^2 - 4ac}$
$x_1 = \frac{-b - \Delta}{2a}$
$x_2 = \frac{-b + \Delta}{2a}$

Algorithm 2

$\Delta = \sqrt{b^2 - 4ac}$
if ($b < 0$) then $x_1 = \frac{-b + \Delta}{2a}$ else $x_1 = \frac{-b - \Delta}{2a}$ end if
$x_2 = c / (a x_1)$

Performance in 5-digit arithmetic

Algorithm 1:

b       Delta               float(Delta)   float(x2)   x2                     rel. error
5       4.58257569495584    4.5825         -0.20875    -0.208712152522080     1.8e-4
500     499.995999984000    499.99         -0.005      -0.002000008000067     1.49999
5000    4999.99959999998    4999.9         -0.05       -0.000200000008135     248.999

Algorithm 2:

b       float(Delta)   float(x1)   float(x2)   x2                     rel. error
5000    4999.9         -4999.9     -0.0002     -0.000200000008135     4e-8
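The same effect is visible in double precision once $b$ is large enough. The following Python sketch implements both algorithms; the value $b = 10^8$ is an assumption chosen here so that the cancellation shows up with 16 digits instead of 5.

```python
import math


def roots_algorithm_1(a, b, c):
    """Textbook formula; -b + disc cancels catastrophically for large b > 0."""
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)


def roots_algorithm_2(a, b, c):
    """Compute the larger root without cancellation, then use x1 * x2 = c / a."""
    disc = math.sqrt(b * b - 4.0 * a * c)
    if b < 0:
        x1 = (-b + disc) / (2.0 * a)
    else:
        x1 = (-b - disc) / (2.0 * a)
    x2 = c / (a * x1)
    return x1, x2


a, b, c = 1.0, 1.0e8, 1.0
x2_reference = -1.0e-8  # true small root, correct to about 16 digits
_, x2_alg1 = roots_algorithm_1(a, b, c)
_, x2_alg2 = roots_algorithm_2(a, b, c)

print("Algorithm 1: x2 =", x2_alg1, " rel. error =", abs(x2_alg1 - x2_reference) / abs(x2_reference))
print("Algorithm 2: x2 =", x2_alg2, " rel. error =", abs(x2_alg2 - x2_reference) / abs(x2_reference))
```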


Example: ill conditioned problem

Consider the problem: find $x$ solving $Ax = b$ where

$A = \begin{pmatrix} 0.78 & 0.563 \\ 0.913 & 0.659 \end{pmatrix}$, $b = \begin{pmatrix} 0.217 \\ 0.254 \end{pmatrix}$

Exact solution: $x = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$

Numerical solution: $\hat{x} = \begin{pmatrix} 0.999999999970896 \\ -1 \end{pmatrix}$

Perturbed problem: find $y$ solving $(A + E) y = b$ where

$E = \begin{pmatrix} 0.001 & 0.001 \\ -0.002 & -0.001 \end{pmatrix}$

Numerical solution: $\hat{y} = \begin{pmatrix} -4.99999999999318 \\ 7.30851063828823 \end{pmatrix}$

Notice: $y$ is very different from $x$.

Condition number

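For linear systems:

$\kappa(A) = \|A^{-1}\| \, \|A\|$.

Rule: if $\kappa(A) > \frac{1}{\sqrt{\epsilon_{mach}}} \approx 10^8$, the solution might be inaccurate.

Example:

$A = \begin{pmatrix} 0.78 & 0.563 \\ 0.913 & 0.659 \end{pmatrix}$

$\|A\|_2 \approx 1.481$, $\|A^{-1}\|_2 \approx 1.481 \times 10^6$, $\kappa_2(A) \approx 2.1932 \times 10^6$.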
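The numbers of this example can be checked without a linear algebra library. The sketch below solves the $2 \times 2$ systems by Cramer's rule and obtains $\kappa_2(A)$ from the singular values, using $\sigma_{max}^2 + \sigma_{min}^2 = \|A\|_F^2$ and $\sigma_{max} \sigma_{min} = |\det A|$. The helper names `solve_2x2` and `cond_2x2` are illustrative, and the perturbation $E$ is taken with entries $(0.001, 0.001; -0.002, -0.001)$, which reproduces the perturbed solution quoted above.

```python
import math


def solve_2x2(A, b):
    """Cramer's rule for a 2x2 system (fine for illustration, not for large systems)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det)


def cond_2x2(A):
    """2-norm condition number sigma_max / sigma_min of a 2x2 matrix."""
    (a11, a12), (a21, a22) = A
    frob2 = a11 ** 2 + a12 ** 2 + a21 ** 2 + a22 ** 2
    det = abs(a11 * a22 - a12 * a21)
    root = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    sigma_max = math.sqrt((frob2 + root) / 2.0)
    sigma_min = det / sigma_max
    return sigma_max / sigma_min


A = ((0.78, 0.563), (0.913, 0.659))
E = ((0.001, 0.001), (-0.002, -0.001))
b = (0.217, 0.254)
A_pert = tuple(tuple(A[i][j] + E[i][j] for j in range(2)) for i in range(2))

x = solve_2x2(A, b)
y = solve_2x2(A_pert, b)
kappa = cond_2x2(A)

print("x     =", x)
print("y     =", y)
print("kappa =", kappa)
```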


Lecture 2

Option Pricing with Binomial and Trinomial Trees

[Figure: one-period tree. The node $(S_0, C_0)$ branches over a time step $\Delta t$ to $(S_{up}, C_{up})$ and $(S_{down}, C_{down})$; $S$ = stock prices, $C$ = option prices]

Aim: Compute the current option price $C_0$.

- Terminology and foundations
- Introduction to tree-based methods
- Binomial trees
  - Models: Cox-Ross-Rubinstein, Tian, Jarrow-Rudd
  - Pricing algorithm
  - Example: European and American put
  - Approximating the Greeks
- Trinomial trees
  - Pricing algorithm
  - Models: Boyle, Hull-White


Terminology and foundations (1)

An option is a contract which gives the owner the right (but not the obligation) to buy or sell an underlying asset at a specified strike price $K$ on or before a specified date $T$.

Option types (according to the rights):
- Call option is the right to buy
- Put option is the right to sell

Option positions:
- Long position = buying an option
- Short position = selling an option

Option styles:
- European options (exercised at the expiration date only)
- American options (exercised at any time up to the expiration date)
- Exotic options (Asian, Bermudan, Barrier, etc.)



Factors affecting stock option prices:
- Stock price $S_t$ at the current time $t = 0$
- Strike price, $K$
- Time to expiration/maturity, $T$
- Volatility of the stock price, $\sigma$
- Risk-free interest rate, $r$
- Dividends during the lifetime of the option, $q$

Payoff

Example: for long positions (buying) in
- a European call: Payoff $= \max(S_T - K, 0)$
- a European put: Payoff $= \max(K - S_T, 0)$

[Figure: payoff as a function of $S_T$ for both options, with a kink at $K$]

Example: for long positions (buying) in
- a European call: Profit $= \max(S_T - K, 0) - C_0$
- a European put: Profit $= \max(K - S_T, 0) - C_0$

[Figure: profit as a function of $S_T$ for both options, i.e. the payoff shifted down by the option price $C_0$]



Methods for option pricing
- Analytical (closed-form solutions)
- Binomial and trinomial trees (considered today)
- Monte Carlo models
- Finite Difference methods


Trees: introduction

Trees are an intuitive way to explain the logic behind option pricing models and also a powerful computational technique.

Methodology:
- Trees approximate the movements in the price of an asset
- The value of the option is calculated backwards, assuming risk-free rates and risk-neutral probabilities

Benefits:
- Trees can be used to price European and American options

Drawbacks:
- Trees are typically not suitable for pricing path-dependent options


Binomial trees

Binomial model: over a short period of time $\Delta t$ the price of an asset either goes up with probability $p$ or down with probability $1 - p$.

[Figure: one-period tree; $S_0$ moves to $S_{up} = S_0 u$ with probability $p$ or to $S_{down} = S_0 d$ with probability $1 - p$ over $\Delta t$]

Q: How to identify the model parameters $u$, $d$, $p$?

A: By matching with the continuous time model for stock prices!

Price movements can be
- additive (absolute movements: $S_{up} = S_0 + u$, $S_{down} = S_0 + d$)
- multiplicative (relative movements: $S_{up} = S_0 u$, $S_{down} = S_0 d$)

→ in the lecture we consider multiplicative movements only


Continuous time stock price behavior (Black-Scholes)

Model for stock price behavior: geometric Brownian motion

$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$

- $W_t$ is a Wiener process (Brownian motion)
- $\mu = r - q$ is the expected return of the stock
- $\sigma$ is the volatility of the stock price

It can be shown that (*)

- $S_t$ has the lognormal distribution (by Itô's lemma),
- $E[S_t] = S_0 e^{\mu t}$,
- $\mathrm{Var}[S_t] = S_0^2 e^{2\mu t} (e^{\sigma^2 t} - 1)$.

Define $R = e^{\mu t}$ and $Q = e^{2\mu t} e^{\sigma^2 t}$, then

$E[S_t] = S_0 R$, $E[S_t^2] = S_0^2 Q$.

(*) see e.g. Note 2 at www.rotman.utoronto.ca/~hull/TechnicalNotes

Binomial trees: moment matching

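[Figure: one-period tree; $S_0$ moves to $S_0 u$ with probability $p$ or to $S_0 d$ with probability $1 - p$ over $\Delta t$]

Match the moments (with $R = e^{\mu \Delta t}$, $Q = R^2 e^{\sigma^2 \Delta t}$):

$E[S_{\Delta t}] = p S_0 u + (1 - p) S_0 d \overset{!}{=} S_0 R$

$E[S_{\Delta t}^2] = p S_0^2 u^2 + (1 - p) S_0^2 d^2 \overset{!}{=} S_0^2 Q$

This gives the system

$p u + (1 - p) d = R$
$p u^2 + (1 - p) d^2 = Q$
$u d = 1$ [Cox, Ross, Rubinstein '79]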


Binomial trees: moment matching (2)

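Rewrite the system: $p = \frac{R - d}{u - d}$, $p = \frac{Q - d^2}{u^2 - d^2}$, $u d = 1$.

$\Rightarrow R u^2 - (Q + 1) u + R = 0$.

Solution:

$u = \frac{Q + 1 + \sqrt{(Q + 1)^2 - 4R^2}}{2R} = 1 + \sigma \sqrt{\Delta t} + \frac{\sigma^2}{2} \Delta t + o(\Delta t) \approx e^{\sigma \sqrt{\Delta t}}$.

CRR model [Cox, Ross, Rubinstein '79]

$u = e^{\sigma \sqrt{\Delta t}}$, $d = e^{-\sigma \sqrt{\Delta t}}$, $p = \frac{e^{(r - q) \Delta t} - d}{u - d}$.

CRR model with a drift

$u = e^{\eta \Delta t + \sigma \sqrt{\Delta t}}$, $d = e^{\eta \Delta t - \sigma \sqrt{\Delta t}}$, $p = \frac{e^{(r - q) \Delta t} - d}{u - d}$.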


Binomial trees: moment matching (3)

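Tian '93:

Match the 3rd moment instead of setting $u d = 1$.

Tian model

$u = \frac{e^{\mu \Delta t} v}{2} \left( v + 1 + \sqrt{v^2 + 2v - 3} \right)$,

$d = \frac{e^{\mu \Delta t} v}{2} \left( v + 1 - \sqrt{v^2 + 2v - 3} \right)$,

$p = \frac{e^{\mu \Delta t} - d}{u - d}$,

$v = e^{\sigma^2 \Delta t}$.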


Further Binomial tree models

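Jarrow-Rudd equal probability model

$u = e^{(\mu - \sigma^2/2) \Delta t + \sigma \sqrt{\Delta t}}$,
$d = e^{(\mu - \sigma^2/2) \Delta t - \sigma \sqrt{\Delta t}}$,
$p = \frac{1}{2}$.

Jarrow-Rudd risk-neutral model

$u = e^{(\mu - \sigma^2/2) \Delta t + \sigma \sqrt{\Delta t}}$,
$d = e^{(\mu - \sigma^2/2) \Delta t - \sigma \sqrt{\Delta t}}$,
$p = \frac{e^{\mu \Delta t} - d}{u - d}$

(a special case of CRR with a drift, with $\eta = \mu - \sigma^2/2$)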
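The parameter sets of the models above are easy to tabulate and to check against the moment conditions. The Python sketch below returns $(u, d, p)$ for each model and verifies which moments are matched; the function names and the sample values ($r = 0.05$, $q = 0$, $\sigma = 0.2$, $\Delta t = 0.01$) are assumptions made for illustration.

```python
import math


def crr(r, q, sigma, dt, eta=0.0):
    """Cox-Ross-Rubinstein parameters; eta != 0 gives the variant with a drift."""
    u = math.exp(eta * dt + sigma * math.sqrt(dt))
    d = math.exp(eta * dt - sigma * math.sqrt(dt))
    p = (math.exp((r - q) * dt) - d) / (u - d)
    return u, d, p


def tian(r, q, sigma, dt):
    """Tian parameters (third moment matched instead of u * d = 1)."""
    m = math.exp((r - q) * dt)
    v = math.exp(sigma ** 2 * dt)
    root = math.sqrt(v * v + 2.0 * v - 3.0)
    u = 0.5 * m * v * (v + 1.0 + root)
    d = 0.5 * m * v * (v + 1.0 - root)
    p = (m - d) / (u - d)
    return u, d, p


def jarrow_rudd(r, q, sigma, dt, risk_neutral=False):
    """Jarrow-Rudd parameters with p = 1/2 or with the risk-neutral p."""
    nu = r - q - 0.5 * sigma ** 2
    u = math.exp(nu * dt + sigma * math.sqrt(dt))
    d = math.exp(nu * dt - sigma * math.sqrt(dt))
    p = (math.exp((r - q) * dt) - d) / (u - d) if risk_neutral else 0.5
    return u, d, p


def moments(u, d, p):
    """First and second moment of the one-step return S_dt / S_0."""
    return p * u + (1.0 - p) * d, p * u * u + (1.0 - p) * d * d


r, q, sigma, dt = 0.05, 0.0, 0.2, 0.01
R = math.exp((r - q) * dt)
Q = R * R * math.exp(sigma ** 2 * dt)
for name, params in [("CRR", crr(r, q, sigma, dt)),
                     ("Tian", tian(r, q, sigma, dt)),
                     ("JR (p = 1/2)", jarrow_rudd(r, q, sigma, dt)),
                     ("JR (risk-neutral)", jarrow_rudd(r, q, sigma, dt, True))]:
    m1, m2 = moments(*params)
    print("%-18s m1 - R = %+.2e   m2 - Q = %+.2e" % (name, m1 - R, m2 - Q))
```

In this sketch the risk-neutral choices of $p$ match the first moment exactly by construction, while the second moment is matched exactly only by the Tian parameters and up to higher-order terms in $\Delta t$ by the others.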


Pricing algorithm


Step 1: Growing the tree of asset prices

[Figure: recombining binomial tree with nodes $S_{i,j}$, $i = 0, \dots, 4$, $j = 0, \dots, i$, over time steps $\Delta t$]

$S_{i,j}$ represents the price after $i$ periods and $j$ upticks:

$S_{i,j} = S_0 u^j d^{i-j}$ (path-independent)

at time $t = T \frac{i}{n} = i \Delta t$.


Step 2: Computing stock prices at maturity and payoff

[Figure: terminal nodes $C_{4,0}, \dots, C_{4,4}$ of the tree at time step $i = 4$]

Stock prices at maturity:

$S_{n,j} = S_0 u^j d^{n-j}$

Payoff:

Call option: $C_{n,j} = \max(S_{n,j} - K, 0)$

Put option: $C_{n,j} = \max(K - S_{n,j}, 0)$


Step 3: Compute the values of the option backwards

[Figure: tree of option values $C_{i,j}$, $i = 0, \dots, 4$, $j = 0, \dots, i$, filled in from the terminal nodes back to $C_{0,0}$]

European option: $C_{i,j} = (p C_{i+1,j+1} + (1 - p) C_{i+1,j}) e^{-r \Delta t} =: C^*_{i,j}$

American call: $C_{i,j} = \max(C^*_{i,j}, S_{i,j} - K)$

American put: $C_{i,j} = \max(C^*_{i,j}, K - S_{i,j})$
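Steps 1 to 3 can be sketched in Python as follows. The function `binomial_price` and its argument names are illustrative; it keeps a single vector of option values and overwrites it level by level. The usage lines apply it to a two-step put with $S_0 = 50$, $K = 52$, $u = 1.2$, $d = 0.8$, $r = 0.05$, $\Delta t = 1$.

```python
import math


def binomial_price(s0, k, r, dt, n, u, d, p, is_call=False, american=False):
    """Backward induction on a recombining binomial tree with n steps."""
    def payoff(s):
        return max(s - k, 0.0) if is_call else max(k - s, 0.0)

    disc = math.exp(-r * dt)
    # Step 2: payoff at maturity, node j = number of upticks
    values = [payoff(s0 * u ** j * d ** (n - j)) for j in range(n + 1)]
    # Step 3: roll back; values[j] is overwritten in increasing j,
    # so values[j + 1] still belongs to the later time level when it is read
    for i in range(n - 1, -1, -1):
        for j in range(i + 1):
            value = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                value = max(value, payoff(s0 * u ** j * d ** (i - j)))
            values[j] = value
    return values[0]


u, d, r, dt = 1.2, 0.8, 0.05, 1.0
p = (math.exp(r * dt) - d) / (u - d)
european_put = binomial_price(50.0, 52.0, r, dt, 2, u, d, p)
american_put = binomial_price(50.0, 52.0, r, dt, 2, u, d, p, american=True)
print("European put:", round(european_put, 4))
print("American put:", round(american_put, 4))
```

The results differ from values computed with a rounded $p$ only in the fourth decimal place.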


Example: European put option

[Hull, Options Futures and other Derivatives, 6th edition ]

Consider a 2-year European put option with a strike price of $52 on a stock whose current price is $50. Assume that there are only two time steps of 1 year, and in each time step the stock price either moves up by 20% or moves down by 20%. Suppose that the risk-free interest rate is 5% and no dividends are paid. The aim is to find the current option price assuming

$p = \frac{e^{\mu \Delta t} - d}{u - d}$.

Solution:

$u = 1.2$, $d = 0.8$, $K = 52$, $S_0 = 50$, $\Delta t = 1$, $r = 0.05$, $\mu = 0.05$,

$R = e^{\mu \Delta t} \approx 1.05127$, $p = \frac{R - d}{u - d} \approx 0.6282$, $1 - p \approx 0.3718$.

Stock prices $S_{i,j}$ (with $S_{1,1} = S_0 u$, $S_{1,0} = S_0 d$, $S_{2,2} = S_0 u^2$, $S_{2,1} = S_0$, $S_{2,0} = S_0 d^2$) and option values $C_{i,j}$:

node (i,j)    S_{i,j}    C_{i,j}
(0,0)         50         4.1923
(1,1)         60         1.4147
(1,0)         40         9.4636
(2,2)         72         0
(2,1)         48         4
(2,0)         32         20

$C_{n,j} = \max(K - S_{n,j}, 0)$, $C_{i,j} = (p C_{i+1,j+1} + (1 - p) C_{i+1,j}) e^{-r \Delta t}$


Example: American put option

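Same data as in the European example. Early exercise matters at node (1,0): the continuation value is $C^*_{1,0} = 9.4636$, whereas immediate exercise pays $K - S_{1,0} = 12$.

node (i,j)    S_{i,j}    European C_{i,j}    American C_{i,j}
(0,0)         50         4.1923              5.0894
(1,1)         60         1.4147              1.4147
(1,0)         40         9.4636              12
(2,2)         72         0                   0
(2,1)         48         4                   4
(2,0)         32         20                  20

$C^*_{i,j} = (p C_{i+1,j+1} + (1 - p) C_{i+1,j}) e^{-r \Delta t}$, $C_{i,j} = \max(K - S_{i,j}, C^*_{i,j})$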


Remarks:

- There is no need to store the whole matrices $S_{i,j}$, $C_{i,j}$. Column vectors at a time step $i$ are sufficient.
- A simple shortcut for European options:
  - the number of paths leading to the terminal stock price $S_{n,j}$ is $\binom{n}{j}$
  - the probability of a path leading to $S_{n,j}$ equals $p^j (1 - p)^{n-j}$
  - thus the value of the option can be expressed as

    $C_{0,0} = e^{-rT} \sum_{j=0}^{n} \binom{n}{j} p^j (1 - p)^{n-j} \times \mathrm{Payoff}(j)$.

- The parameters $r$ and $q$ do not have to be constant in time; if they are time-dependent, then the pricing formulae do not change, but the probabilities $p$ become time-dependent as well.
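The shortcut for European options avoids the backward loop altogether. Below is a minimal Python sketch of the binomial sum; the function name `european_binomial_sum` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math


def european_binomial_sum(s0, r, dt, n, u, d, p, payoff):
    """European price as a discounted binomial expectation of the payoff."""
    total = 0.0
    for j in range(n + 1):
        weight = math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
        total += weight * payoff(s0 * u ** j * d ** (n - j))
    return math.exp(-r * n * dt) * total


u, d, r, dt = 1.2, 0.8, 0.05, 1.0
p = (math.exp(r * dt) - d) / (u - d)
put_value = european_binomial_sum(50.0, r, dt, 2, u, d, p, lambda s: max(52.0 - s, 0.0))
print("European put via binomial sum:", round(put_value, 4))
```

For the two-step put with $S_0 = 50$ and $K = 52$ this agrees with the backward induction up to rounding.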


Approximating the Greeks: Delta $\Delta$

The Delta $\Delta$ is the change in the option price for a given small change in $S$.

Estimate from the tree:

$\Delta_{0,0} = \frac{C_{1,1} - C_{1,0}}{S_{1,1} - S_{1,0}}$

[Figure: two-step tree with nodes $(S_{i,j}, C_{i,j})$; the nodes $(1,1)$ and $(1,0)$ are used for $\Delta_{0,0}$]


Approximating the Greeks: Gamma $\Gamma$

The Gamma $\Gamma$ is the change in the Delta $\Delta$ given a small change in the spot price $S$.

Estimate from the tree:

$\Gamma_{0,0} = \frac{\Delta_{1,1} - \Delta_{1,0}}{S_{1,1} - S_{1,0}}$ or $\Gamma_{0,0} = \frac{\Delta_{1,1} - \Delta_{1,0}}{\frac{1}{2}(S_{2,2} - S_{2,0})}$,

but

$\Delta_{1,1} = \frac{C_{2,2} - C_{2,1}}{S_{2,2} - S_{2,1}}$, $\Delta_{1,0} = \frac{C_{2,1} - C_{2,0}}{S_{2,1} - S_{2,0}}$,

thus

$\Gamma_{0,0} = \frac{\dfrac{C_{2,2} - C_{2,1}}{S_{2,2} - S_{2,1}} - \dfrac{C_{2,1} - C_{2,0}}{S_{2,1} - S_{2,0}}}{\frac{1}{2}(S_{2,2} - S_{2,0})}$

[Figure: two-step tree with nodes $(S_{i,j}, C_{i,j})$; the three nodes at $i = 2$ are used for $\Gamma_{0,0}$]


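Approximating the Greeks: Theta $\Theta$

The Theta $\Theta$ is the change in the price of the option for a small change in time $t$. If the tree fulfills $u d = 1$, then we may use

$\Theta_{0,0} = \frac{C_{2,1} - C_{0,0}}{2 \Delta t}$.

[Figure: two-step tree with nodes $(S_{i,j}, C_{i,j})$; the nodes $(0,0)$ and $(2,1)$ carry the same stock price when $u d = 1$]

If $u d = 1$ is not fulfilled, Rubinstein (1994) suggested (using the Black-Scholes equation)

$\Theta_{0,0} = r C_{0,0} - (r - q) S_{0,0} \Delta_{0,0} - \frac{1}{2} \sigma^2 S_{0,0}^2 \Gamma_{0,0}$.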


Approximating the Greeks: Vega V

The Vega $\mathcal{V}$ is the change in the price of the option for a small change in volatility $\sigma$.

Approximation of the Vega $\mathcal{V}$:
- compute the option price using the volatility of the underlying: $C_{0,0}(\sigma)$
- compute the option price assuming volatility $\sigma \to \sigma + 1\%$ and building a new tree: $C_{0,0}(\sigma + 0.01)$
- Vega will be the difference between these two option prices divided by 1%:

$\mathcal{V} = \frac{C_{0,0}(\sigma + 0.01) - C_{0,0}(\sigma)}{0.01}$.
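The tree estimates of Delta, Gamma, Theta and Vega can be collected in one routine. The following Python sketch uses a CRR tree (so $u d = 1$ holds and the simple Theta formula applies) for a European option; the function names and the sample parameters ($S_0 = K = 100$, $r = 0.05$, $q = 0$, $\sigma = 0.2$, $T = 1$, $n = 200$) are assumptions made for illustration.

```python
import math


def crr_lattice(s0, k, r, q, sigma, t, n, is_call=False):
    """Full lattices S[i][j], C[i][j] of a European option on a CRR tree."""
    dt = t / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp((r - q) * dt) - d) / (u - d)
    disc = math.exp(-r * dt)
    sign = 1.0 if is_call else -1.0
    S = [[s0 * u ** j * d ** (i - j) for j in range(i + 1)] for i in range(n + 1)]
    C = [None] * (n + 1)
    C[n] = [max(sign * (s - k), 0.0) for s in S[n]]
    for i in range(n - 1, -1, -1):
        C[i] = [disc * (p * C[i + 1][j + 1] + (1.0 - p) * C[i + 1][j])
                for j in range(i + 1)]
    return S, C, dt


def tree_greeks(s0, k, r, q, sigma, t, n, is_call=False, bump=0.01):
    """Price, Delta, Gamma, Theta from one tree; Vega from a second, bumped tree."""
    S, C, dt = crr_lattice(s0, k, r, q, sigma, t, n, is_call)
    delta = (C[1][1] - C[1][0]) / (S[1][1] - S[1][0])
    delta_up = (C[2][2] - C[2][1]) / (S[2][2] - S[2][1])
    delta_dn = (C[2][1] - C[2][0]) / (S[2][1] - S[2][0])
    gamma = (delta_up - delta_dn) / (0.5 * (S[2][2] - S[2][0]))
    theta = (C[2][1] - C[0][0]) / (2.0 * dt)
    _, C_bumped, _ = crr_lattice(s0, k, r, q, sigma + bump, t, n, is_call)
    vega = (C_bumped[0][0] - C[0][0]) / bump
    return {"price": C[0][0], "delta": delta, "gamma": gamma,
            "theta": theta, "vega": vega}


greeks = tree_greeks(100.0, 100.0, 0.05, 0.0, 0.2, 1.0, 200)
for name, value in greeks.items():
    print("%-6s %.4f" % (name, value))
```

Storing the full lattices is done here only for readability; as noted in the remarks on the pricing algorithm, column vectors would suffice.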


Trinomial trees

During a time interval $\Delta t$ the asset price $S(t)$ can move:

[Figure: one-period trinomial tree; $S_{0,0}$ moves to $S_{0,0} u$, $S_{0,0} m$ or $S_{0,0} d$ with probabilities $p_u$, $p_m$, $p_d$ over $\Delta t$]

- In general it is assumed that $m = 1$
- No-arbitrage condition: $d < e^{r \Delta t} < u$
- Risk-neutral probabilities: $p_u$, $p_m$, $p_d$
- To have a tree that is horizontally recombining (has no drift), it is generally assumed that $u d = 1$
- Option pricing is analogous to pricing with binomial trees

European: $C_{i,j} = (p_u C_{i+1,j+2} + p_m C_{i+1,j+1} + p_d C_{i+1,j}) e^{-r \Delta t} =: C^*_{i,j}$

American call: $C_{i,j} = \max(S_{i,j} - K, C^*_{i,j})$

American put: $C_{i,j} = \max(K - S_{i,j}, C^*_{i,j})$


Trinomial tree parameters: moment matching

Moments from a continuous time model:

$E[S_t] = S_0 R$, $R = e^{\mu t}$

$E[S_t^2] = S_0^2 Q$, $Q = e^{(2\mu + \sigma^2) t}$

Moments from the tree:

$E[S_{\Delta t}] = S_0 (p_u u + p_m m + p_d d)$

$E[S_{\Delta t}^2] = S_0^2 (p_u u^2 + p_m m^2 + p_d d^2)$

This gives 3 equations (including $p_u + p_m + p_d = 1$) for 6 unknowns.

Using $m = 1$, $u d = 1$: still 3 equations, 4 unknowns.


Trinomial tree parameters: Boyle and Hull-White

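Boyle suggested $u = e^{\lambda \sigma \sqrt{\Delta t}}$; $\lambda$ is called the stretch parameter. Its typical value is $\lambda = \sqrt{2}$ (Boyle) or $\lambda = \sqrt{3}$ (Hull-White).

The value $\lambda = \sqrt{2}$ implies

$u = e^{\sigma \sqrt{2 \Delta t}}$, $p_u = \left( \frac{e^{\mu \Delta t / 2} - e^{-\sigma \sqrt{\Delta t / 2}}}{e^{\sigma \sqrt{\Delta t / 2}} - e^{-\sigma \sqrt{\Delta t / 2}}} \right)^2$

$d = e^{-\sigma \sqrt{2 \Delta t}}$, $p_d = \left( \frac{e^{\mu \Delta t / 2} - e^{\sigma \sqrt{\Delta t / 2}}}{e^{\sigma \sqrt{\Delta t / 2}} - e^{-\sigma \sqrt{\Delta t / 2}}} \right)^2$

$m = 1$, $p_m = 1 - p_u - p_d$

Other parameters were given by Kamrad/Ritchken and Tian (equal probabilities tree or 4 moments matching tree).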
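A trinomial pricer differs from the binomial one only in the three-point averaging. The Python sketch below uses the parameters for $\lambda = \sqrt{2}$ with $\mu = r - q$; node $(i, j)$, $j = 0, \dots, 2i$, carries the price $S_0 u^{j - i}$. The function names and sample values are illustrative assumptions.

```python
import math


def boyle_params(mu, sigma, dt):
    """Trinomial parameters for the stretch parameter lambda = sqrt(2)."""
    a = math.exp(mu * dt / 2.0)
    up = math.exp(sigma * math.sqrt(dt / 2.0))
    dn = 1.0 / up
    pu = ((a - dn) / (up - dn)) ** 2
    pd = ((a - up) / (up - dn)) ** 2
    pm = 1.0 - pu - pd
    u = math.exp(sigma * math.sqrt(2.0 * dt))
    return u, 1.0 / u, pu, pm, pd


def trinomial_price(s0, k, r, q, sigma, t, n, is_call=False, american=False):
    """Backward induction on a recombining trinomial tree with m = 1, u * d = 1."""
    def payoff(s):
        return max(s - k, 0.0) if is_call else max(k - s, 0.0)

    dt = t / n
    u, d, pu, pm, pd = boyle_params(r - q, sigma, dt)
    disc = math.exp(-r * dt)
    values = [payoff(s0 * u ** (j - n)) for j in range(2 * n + 1)]
    for i in range(n - 1, -1, -1):
        new_values = []
        for j in range(2 * i + 1):
            # children of node (i, j): (i+1, j+2) up, (i+1, j+1) middle, (i+1, j) down
            value = disc * (pu * values[j + 2] + pm * values[j + 1] + pd * values[j])
            if american:
                value = max(value, payoff(s0 * u ** (j - i)))
            new_values.append(value)
        values = new_values
    return values[0]


european_put = trinomial_price(100.0, 100.0, 0.05, 0.0, 0.2, 1.0, 200)
american_put = trinomial_price(100.0, 100.0, 0.05, 0.0, 0.2, 1.0, 200, american=True)
print("European put:", round(european_put, 4))
print("American put:", round(american_put, 4))
```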


Growing trinomial trees

[Figure: recombining trinomial tree with nodes $S_{i,j}$, $i = 0, \dots, 3$, $j = 0, \dots, 2i$, over time steps $\Delta t$]


Lecture 3

Monte Carlo Simulations

- Introduction to the Monte Carlo (MC) method
- Pricing derivatives (solution of SDEs) with MC
  - General framework for option pricing
  - Estimation of the mean-square error
  - Beyond geometric Brownian motion: Euler-Maruyama and Milstein schemes
  - Approximating the Greeks: how to choose parameters?
- Random number generation
  - Introduction
  - Box-Muller and Marsaglia polar method for generating standard normal random variables


Introduction to the Monte Carlo method

Methodology:
- Simulate thousands of paths for the underlying asset, assuming a stochastic price process
- The option price is calculated as the average of the simulated option prices

Benefits:
- Can be used to price European and various path-dependent options (Asian, Barrier, etc.)
- Can be used to price options with several underlying assets
- Suitable for parallel computing

Drawbacks:
- Computationally intensive
- Difficult to use for pricing American options


Pricing derivatives with the Monte Carlo method

Summary of the steps:
1. Simulate a path of the underlying stock from $t = 0$ to $t = T$
2. Calculate the stock price at maturity ($t = T$)
3. Calculate the option price at maturity ($t = T$) using the payoff function

Execute steps 1–3 $M$ times (typically $M = 1000$ or more)

4. Calculate the averaged option price at maturity ($t = T$)
5. Discount the averaged option price to $t = 0$.

→ To begin, we need a model for the evolution of stock prices.


Continuous time stock price behavior (Black-Scholes)

Model for stock price behavior: geometric Brownian motion

$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$

- $S_t$ is the stock price
- $W_t$ is a Wiener process (Brownian motion)
- $\mu = r - q$ is the expected return of the stock (drift coefficient)
- $r$ is the risk-free interest rate
- $q$ are the dividends paid during the lifetime of the option
- $\sigma$ is the volatility of the stock price (diffusion coefficient)

Recall the definition of $W_t$:
i. $W_0 = 0$
ii. The function $t \mapsto W_t$ is almost surely everywhere continuous
iii. The increments of $W_t$ are independent
iv. The increments satisfy $W_{t + \Delta t} - W_t \sim N(0, \Delta t)$

$N(\mu, \sigma^2)$ denotes the normal distribution with expected value $\mu$ and variance $\sigma^2$.


Increments of geometric Brownian motion


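Starting from $dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$, resort to the log-price $x_t := \log S_t$; then by Itô's lemma

$dx_t = \left( \mu - \frac{\sigma^2}{2} \right) dt + \sigma \, dW_t$.

Integrating over $[t, t + \Delta t]$ gives

$x_{t + \Delta t} - x_t = \left( \mu - \frac{\sigma^2}{2} \right) \Delta t + \sigma (W_{t + \Delta t} - W_t)$.

Therefore, even for large $\Delta t$,

$x_{t + \Delta t} = x_t + \left( \mu - \frac{\sigma^2}{2} \right) \Delta t + \sigma Z \sqrt{\Delta t}$,

where $Z \sim N(0, 1)$, or

$S_{t + \Delta t} = S_t \exp\left( \left( \mu - \frac{\sigma^2}{2} \right) \Delta t + \sigma Z \sqrt{\Delta t} \right)$.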


Path generation algorithm (realization of steps 1 and 2)

Calculate the stock log-prices $x_{t_n}$ at a set of discrete mesh points $t_n$:

Monte Carlo path simulation

Specify the pricing parameters $r$, $q$, $\sigma$, $T$, $x_{t_0} = \log S_0$ and $N$ = the number of time steps

Define $\nu := r - q - \frac{\sigma^2}{2}$, $\Delta t = \frac{T}{N}$

for n = 0 ... N−1
{
    draw a sample $Z_n$ from $N(0, 1)$
    compute $x_{t_{n+1}} = x_{t_n} + \nu \Delta t + \sigma Z_n \sqrt{\Delta t}$   (#)
    if necessary, compute $S_{t_{n+1}} = \exp(x_{t_{n+1}})$
}

The value $S_{t_N}$ is the stock price at maturity $S_T$ (step 2).

Remark: there is no need to store $x_{t_n}$ for all mesh points; instead of (#) use

$x_{new} = x_{old} + \nu \Delta t + \sigma Z_n \sqrt{\Delta t}$


Evaluation of the mean option price (steps 3–5)

Compute the option price at maturity

$C_T = \mathrm{payoff}(S_T, K)$   (step 3)

Repeat steps 1–3 $M$ times, computing $S_{T,j}$ and

$C_{T,j} = \mathrm{payoff}(S_{T,j}, K)$, $j = 1, \dots, M$

Compute the average

$\bar{C}_T = \frac{1}{M} \sum_{j=1}^{M} C_{T,j}$   (step 4)

Discount the average option price

$\hat{C}_0 = \exp(-rT) \, \bar{C}_T$   (step 5)
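Steps 1 to 5 fit into a short routine. The Python sketch below simulates log-price paths with the exact increment formula, averages the payoffs and discounts; it also returns the standard error of the estimate. The function names, the seed and the sample parameters are illustrative assumptions, and the standard library generator `random.Random.gauss` is used for the normal samples.

```python
import math
import random


def simulate_terminal_price(s0, r, q, sigma, t, n_steps, rng):
    """Steps 1-2: advance the log-price over n_steps and return S_T."""
    nu = r - q - 0.5 * sigma ** 2
    dt = t / n_steps
    x = math.log(s0)
    for _ in range(n_steps):
        x += nu * dt + sigma * rng.gauss(0.0, 1.0) * math.sqrt(dt)
    return math.exp(x)


def mc_price(payoff, s0, r, q, sigma, t, n_steps, n_paths, seed=0):
    """Steps 3-5: discounted average payoff and its standard error."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        c = payoff(simulate_terminal_price(s0, r, q, sigma, t, n_steps, rng))
        total += c
        total_sq += c * c
    mean = total / n_paths
    variance = max(total_sq / n_paths - mean * mean, 0.0)
    disc = math.exp(-r * t)
    return disc * mean, disc * math.sqrt(variance / n_paths)


call_payoff = lambda s: max(s - 100.0, 0.0)
price, std_err = mc_price(call_payoff, 100.0, 0.05, 0.0, 0.2, 1.0, 4, 20000)
print("MC call price: %.3f +/- %.3f" % (price, std_err))
```

For a payoff that depends on $S_T$ only, a single time step would suffice because the increment formula is exact; several steps are kept here to mirror the path generation algorithm.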


Estimate for the mean-square error

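Suppose $\hat{\alpha}$ is a sample estimate for the deterministic quantity $\alpha$.

The mean-square error (MSE) of $\hat{\alpha}$ is defined as $E[(\alpha - \hat{\alpha})^2]$.

Then it holds for $x = \alpha - E[\hat{\alpha}]$ and $y = \hat{\alpha} - E[\hat{\alpha}]$ (note that $x$ is deterministic and $E[y] = 0$):

$E[(\alpha - \hat{\alpha})^2] = E[(x - y)^2]$
$= E[x^2 - 2xy + y^2]$
$= x^2 - 2x \underbrace{E[y]}_{=0} + E[y^2]$
$= (\alpha - E[\hat{\alpha}])^2 + E[(\hat{\alpha} - E[\hat{\alpha}])^2]$
$= \mathrm{Bias}(\hat{\alpha})^2 + \mathrm{Var}(\hat{\alpha})$.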


Bias

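The estimator $\hat{C}_0$ is unbiased:

$E[\hat{C}_0] = E\left[ \exp(-rT) \frac{1}{M} \sum_{j=1}^{M} C_{T,j} \right]$
$= \exp(-rT) \frac{1}{M} \sum_{j=1}^{M} E[C_{T,j}]$
$= \exp(-rT) \frac{1}{M} \sum_{j=1}^{M} E[C_T]$
$= \exp(-rT) E[C_T]$
$= E[C_0]$.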


Bias

However, for other options the estimators may become biased.

Example: Asian call with fixed strike

$C_T = \max(A - K, 0)$, $A = \int_0^T S_t \, dt$

In the MC simulation the integral cannot be computed exactly:

$C_{T,N} = \max(A_N - K, 0)$, $A_N = \sum_{n=1}^{N} S_{t_n} \Delta t$

so that $E[C_{0,N}] \neq E[C_0]$.

But the estimator is asymptotically unbiased:

$E[C_{0,N}] \to E[C_0]$, $N \to \infty$.


Variance

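Suppose $\bar{\alpha} = \frac{1}{M} \sum_{j=1}^{M} \alpha_j$ where $\alpha_j$ are independent realizations of $\alpha$.

Assume without loss of generality that $E[\alpha] = 0$ (otherwise subtract the mean). Then $E[\bar{\alpha}] = \frac{1}{M} \sum_{j=1}^{M} E[\alpha_j] = 0$ and therefore

$\mathrm{Var}(\bar{\alpha}) = E\left[ \left( \frac{1}{M} \sum_{j=1}^{M} \alpha_j \right)^2 \right] = \frac{1}{M^2} E\left[ \sum_{i,j=1}^{M} \alpha_i \alpha_j \right]$

$= \frac{1}{M^2} \sum_{j=1}^{M} E[\alpha_j^2] = \frac{1}{M^2} \sum_{j=1}^{M} \mathrm{Var}(\alpha)$

$= \frac{\mathrm{Var}(\alpha)}{M}$,

where the mixed terms vanish since $E[\alpha_i \alpha_j] = E[\alpha_i] E[\alpha_j] = 0$ for $i \neq j$.

Mean-square error

$E[(E[C_0] - \hat{C}_{0,N})^2] = (E[C_0] - E[C_{0,N}])^2 + \frac{\mathrm{Var}(C_{0,N})}{M}$.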


Discretization schemes for nonlinear SDEs

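More generally, prices may be modelled by the nonlinear SDE

$dX_t = \mu(X_t) \, dt + \sigma(X_t) \, dW_t$, $0 < t \le T$
$X_0 = A$

Consider the integral form:

$X_{t_{n+1}} = X_{t_n} + \int_{t_n}^{t_{n+1}} \mu(X_t) \, dt + \int_{t_n}^{t_{n+1}} \sigma(X_t) \, dW_t$
$X_0 = A$

Approximate $\mu(X_t)$ and $\sigma(X_t)$ by their values at the left point $t_n$. This defines a sequence $Y_n \approx X_{t_n}$ via

$Y_{n+1} = Y_n + \mu(Y_n) \Delta t + \sigma(Y_n) (W_{t_{n+1}} - W_{t_n})$.

Define $\mu_n = \mu(Y_n)$, $\sigma_n = \sigma(Y_n)$, $\Delta W_n = Z_n \sqrt{\Delta t}$ with $Z_n \sim N(0, 1)$, then . . .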


Discretization schemes for nonlinear SDEs

Explicit Euler(-Maruyama) scheme

$Y_{n+1} = Y_n + \mu_n \Delta t + \sigma_n \Delta W_n$

Analogously we obtain:

Backward Euler scheme

$Y_{n+1} = Y_n + \mu_{n+1} \Delta t + \sigma_n \Delta W_n$

(via approximation of the drift integral by the value at the right point $t_{n+1}$)

Semi-implicit Euler scheme (trapezoidal rule)

$Y_{n+1} = Y_n + \frac{\mu_{n+1} + \mu_n}{2} \Delta t + \sigma_n \Delta W_n$

(via approximation of the drift integral by the average of the values at the end points $t_n$ and $t_{n+1}$)

Milstein scheme: for $\sigma'_n = \sigma'(Y_n)$

$Y_{n+1} = Y_n + \mu_n \Delta t + \sigma_n \Delta W_n + \frac{\sigma'_n \sigma_n}{2} \left( (\Delta W_n)^2 - \Delta t \right)$

(derived via Itô's lemma applied to $\mu$ and $\sigma$)
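The explicit Euler-Maruyama and Milstein schemes can be compared on geometric Brownian motion, where $\mu(X) = \mu X$, $\sigma(X) = \sigma X$, $\sigma'(X) = \sigma$ and the exact solution is known. The Python sketch below drives both schemes and the exact solution with the same Brownian increments and reports the mean absolute error at $T$; the function name and the parameter values are illustrative assumptions.

```python
import math
import random


def strong_errors_gbm(mu, sigma, x0, t, n_steps, n_paths, seed=0):
    """Mean absolute error at time t of Euler-Maruyama and Milstein for GBM."""
    rng = random.Random(seed)
    dt = t / n_steps
    err_euler = 0.0
    err_milstein = 0.0
    for _ in range(n_paths):
        y_e = x0
        y_m = x0
        w = 0.0  # Brownian motion driving this path
        for _ in range(n_steps):
            dw = rng.gauss(0.0, 1.0) * math.sqrt(dt)
            y_e += mu * y_e * dt + sigma * y_e * dw
            # Milstein correction: sigma'(Y) sigma(Y) / 2 = sigma^2 Y / 2
            y_m += (mu * y_m * dt + sigma * y_m * dw
                    + 0.5 * sigma * sigma * y_m * (dw * dw - dt))
            w += dw
        exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w)
        err_euler += abs(y_e - exact)
        err_milstein += abs(y_m - exact)
    return err_euler / n_paths, err_milstein / n_paths


euler_error, milstein_error = strong_errors_gbm(0.05, 0.5, 1.0, 1.0, 50, 2000)
print("Euler-Maruyama error:", euler_error)
print("Milstein error      :", milstein_error)
```

In experiments of this kind the Milstein error is typically noticeably smaller, which reflects the extra correction term.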

Practical considerations

- As mentioned earlier, there is no need to store the entire path of the stock price; it is enough to store the latest stock price only.
- There is no need to store the final payoffs in a vector; it is enough to store their running average $\bar{C}_{T,M}$ only and update

  $\bar{C}_{T,M} = \bar{C}_{T,M-1} + \frac{C_{T,M} - \bar{C}_{T,M-1}}{M}$

  (check that $\bar{C}_{T,M} = \frac{1}{M} \sum_{j=1}^{M} C_{T,j}$)

- For path-dependent options:
  - If the payoff depends on the min/max, then update the min/max at each step for each path
  - If the payoff depends on the average over the path, then compute the sum of prices on the path and store only this sum


Approximating the Greeks

Suppose that we have a randomized mechanism for computing the option price $C$ at $t = 0$ for a given stock price $S$ at $t = 0$ (here we omit the subscript in $C_0$ and $S_0$ for brevity). The aim is to compute sensitivities of

$f(S) = E[C(S)]$,

in particular

$\Delta = \frac{\partial f}{\partial S} \Big|_{t=0}$ and $\Gamma = \frac{\partial^2 f}{\partial S^2} \Big|_{t=0}$.

The natural idea is to use finite differences:

- Simulate $M$ independent option prices $C_1(S), \dots, C_M(S)$
- Simulate additional $M$ option prices $C_1(S + h), \dots, C_M(S + h)$
- Average: $\bar{C}(S) = \frac{1}{M} \sum_{j=1}^{M} C_j(S)$, $\bar{C}(S + h) = \frac{1}{M} \sum_{j=1}^{M} C_j(S + h)$
- Use a finite difference approximation for $\Delta$.


Finite difference estimators for $\Delta$ and $\Gamma$

Approximation of $\Delta$

Forward difference: $\hat{\Delta}_F(M, h) = \frac{\bar{C}(S + h) - \bar{C}(S)}{h}$

Backward difference: $\hat{\Delta}_B(M, h) = \frac{\bar{C}(S) - \bar{C}(S - h)}{h}$

Central difference: $\hat{\Delta}_C(M, h) = \frac{\bar{C}(S + h) - \bar{C}(S - h)}{2h}$

Approximation of $\Gamma$

Central difference: $\hat{\Gamma}_C(M, h) = \frac{\bar{C}(S + h) - 2\bar{C}(S) + \bar{C}(S - h)}{h^2}$

How to choose $M$ and $h$?

Consider the estimator $\hat{\Delta}_F(M, h)$ in detail. Other estimators are left as homework.


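MSE for $\hat{\Delta}_F(M, h)$: bias

Rewrite

$\hat{\Delta}_F(M, h) = \frac{1}{M} \sum_{j=1}^{M} \frac{C_j(S + h) - C_j(S)}{h}$

thus, the MSE error representation can be used:

$E[(\Delta - \hat{\Delta}_F)^2] = \mathrm{Bias}(\hat{\Delta}_F)^2 + \mathrm{Var}(\hat{\Delta}_F)$

Estimating the bias:

$E[\hat{\Delta}_F] = \frac{f(S + h) - f(S)}{h} = f'(S) + \frac{1}{2} f''(S) h + o(h)$

Notice that $\Delta = f'(S)$, therefore

$\mathrm{Bias}(\hat{\Delta}_F) = E[\hat{\Delta}_F] - \Delta = \frac{1}{2} f''(S) h + o(h)$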


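MSE for $\hat{\Delta}_F(M, h)$: variance

Estimating the variance:

Define $\alpha_j := \frac{C_j(S + h) - C_j(S)}{h} - E\left[ \frac{C_j(S + h) - C_j(S)}{h} \right]$;

by assumption, the $\alpha_j$ are independent and identically distributed

$\Rightarrow \mathrm{Var}(\bar{\alpha}) = \frac{\mathrm{Var}(\alpha)}{M}$ (as we have seen before)

$\Rightarrow \mathrm{Var}(\hat{\Delta}_F) = \frac{\mathrm{Var}(C(S + h) - C(S))}{M h^2}$.

If $C_j(S + h)$ and $C_j(S)$ are generated independently, then

$\mathrm{Var}(C(S + h) - C(S)) = \mathrm{Var}(C(S + h)) + \mathrm{Var}(C(S)) \to 2 \mathrm{Var}(C(S))$ as $h \to 0$.

Thus $\mathrm{Var}(\hat{\Delta}_F) = O(M^{-1} h^{-2})$.

Estimating the MSE:

$E[(\Delta - \hat{\Delta}_F)^2] = \mathrm{Bias}(\hat{\Delta}_F)^2 + \mathrm{Var}(\hat{\Delta}_F) = O(h^2) + O(M^{-1} h^{-2})$

Optimal relation between $h$ and $M$ for $\hat{\Delta}_F(M, h)$: both terms balance for $M \sim h^{-4}$.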
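The forward difference estimator with independently generated samples can be sketched as follows. The Python code prices a European call under geometric Brownian motion with one exact time step per path; the function names, the seed and the values $h = 2$, $M = 100000$ are assumptions chosen so that the example runs quickly, not a recommendation.

```python
import math
import random


def mc_mean_price(s0, k, r, sigma, t, n_paths, rng):
    """Discounted average call payoff, one exact GBM step per path."""
    nu = r - 0.5 * sigma ** 2
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(nu * t + sigma * math.sqrt(t) * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths


def delta_forward(s0, h, k, r, sigma, t, n_paths, seed=0):
    """Forward difference Delta; the two averages use independent samples."""
    rng = random.Random(seed)
    c_base = mc_mean_price(s0, k, r, sigma, t, n_paths, rng)
    c_bumped = mc_mean_price(s0 + h, k, r, sigma, t, n_paths, rng)
    return (c_bumped - c_base) / h


delta_estimate = delta_forward(100.0, 2.0, 100.0, 0.05, 0.2, 1.0, 100000)
print("forward difference Delta:", delta_estimate)
```

Decreasing $h$ without increasing $M$ makes the estimate noisier, in line with the $O(M^{-1} h^{-2})$ variance derived above.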


Random number generation

- Genuine random numbers in the strictest sense can only be produced by physical, real-world processes (example: radioactive decay, which needs a measurement device!)
- Random numbers from software may look genuine, but they come from a deterministic algorithm and are therefore called pseudorandom numbers.
  - Drawback: they are not genuinely random
  - Benefits: they can be reproduced; they can accommodate the type of distribution and its properties
- Quasirandom numbers are deterministically constructed to maximize some goodness-of-fit measure. They typically have better theoretical properties for a specific aim than pseudorandom numbers (they are much better spaced, etc.)


Pseudorandom numbers

- Most effort goes into methods for uniform random variables (RV) $U(0, 1)$, since other RV can be obtained from them via a suitable transformation.
- (Linear) sequential congruential generator:

  $X_{n+1} = (a X_n + b) \bmod m$

  where $a$, $b$, $m$ are integers and the seed $X_0$ is given by the user.
  Example: if $a = b = 1$, then $X_n \in \{0, 1, \dots, m - 1\}$.

- Generate rational numbers $0 \le f_n < 1$ by rescaling $f_n = \frac{X_n}{m}$
- The sequence $f_n$ is periodic (i.e. it will repeat itself).
- The parameters $a$ and $b$ should be carefully chosen.
  Example: the IBM randu routine uses $a = 65539$, $b = 0$, $m = 2^{31}$.
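A linear congruential generator takes only a few lines. The Python sketch below uses the randu parameters quoted above as defaults; the generator function `lcg` and the seed value 1 are illustrative.

```python
def lcg(seed, a=65539, b=0, m=2 ** 31):
    """Yield f_n = X_n / m with X_{n+1} = (a X_n + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x / m


gen = lcg(seed=1)
sample = [next(gen) for _ in range(5)]
print(sample)
```

With $a = 65539 = 2^{16} + 3$ and $m = 2^{31}$ one can check that $X_{n+2} = 6 X_{n+1} - 9 X_n \bmod m$, so consecutive triples are strongly correlated; this is one reason why the parameters need to be chosen carefully.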


Random variables

Discrete random variables – simple example:

Task: Simulate $X$ such that $P(X = 5) = 0.7$ and $P(X = 6) = 0.3$.

Solution:
- Simulate $Y \sim U(0, 1)$
- Set $X = 5$ if $Y < 0.7$, otherwise $X = 6$.

Continuous random variables:
If a variable $X$ has the (continuous) cumulative distribution function (CDF) $F$, then $F(X)$ follows a uniform distribution. Thus, $X$ can be simulated via:

- Simulate $Y$ that follows a $U(0, 1)$ distribution
- Compute $X = G(Y)$ where $G = F^{-1}$ is the inverse of $F$

Note: often there is no closed-form expression for $F$ (and hence for $G$); these are then computed using numerical approximations.
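Both recipes can be tried out directly. The Python sketch below samples the discrete variable from the task and, as an illustration of the inverse transform that is not taken from the lecture, an exponential variable with $F(x) = 1 - e^{-\lambda x}$, for which $G(y) = -\ln(1 - y) / \lambda$ is available in closed form.

```python
import math
import random


def sample_discrete(rng):
    """X = 5 with probability 0.7, X = 6 with probability 0.3."""
    return 5 if rng.random() < 0.7 else 6


def sample_exponential(rate, rng):
    """Inverse transform: X = G(Y) with G(y) = -ln(1 - y) / rate."""
    y = rng.random()  # uniform on [0, 1), so 1 - y lies in (0, 1]
    return -math.log(1.0 - y) / rate


rng = random.Random(0)
n = 100000
frequency_of_5 = sum(1 for _ in range(n) if sample_discrete(rng) == 5) / n
exponential_mean = sum(sample_exponential(2.0, rng) for _ in range(n)) / n
print("relative frequency of X = 5:", frequency_of_5)
print("sample mean of Exp(2):", exponential_mean)
```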


Box-Muller and Marsaglia polar method

Both methods generate two independent standard normal variables $y_1, y_2 \sim N(0, 1)$ from two uniform variables $x_1, x_2 \sim U[0, 1]$.

Box-Muller method

$y_1 = \cos(2\pi x_2) \sqrt{-2 \log x_1}$,
$y_2 = \sin(2\pi x_2) \sqrt{-2 \log x_1}$.

Marsaglia polar method

Define $u = 2x_1 - 1$, $v = 2x_2 - 1$, $s = u^2 + v^2$.
If $s < 1$ (true for $\frac{\pi}{4} \approx 78.5\%$ of samples!), set

$y_1 = u \sqrt{\frac{-2 \log s}{s}}$, $y_2 = v \sqrt{\frac{-2 \log s}{s}}$.

If $s \ge 1$, disregard the pair $(x_1, x_2)$ and resample.

Notice that $y_1$ and $y_2$ will be independent standard normal variables (use only $y_1$ if only one RV is needed).
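Both transformations are sketched below in Python on top of the standard library uniform generator. The function names are illustrative; two small guards ($x_1 > 0$ in Box-Muller and $s > 0$ in the polar method) are added here to avoid $\log 0$ and are not part of the formulas above.

```python
import math
import random


def box_muller(rng):
    """Two independent N(0, 1) samples from two uniforms."""
    x1 = 1.0 - rng.random()  # lies in (0, 1], so log(x1) is finite
    x2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(x1))
    return radius * math.cos(2.0 * math.pi * x2), radius * math.sin(2.0 * math.pi * x2)


def marsaglia_polar(rng):
    """Two independent N(0, 1) samples; rejects points outside the unit disc."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor


def sample_stats(sampler, n_pairs, seed=0):
    """Sample mean and variance of 2 * n_pairs generated values."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_pairs):
        values.extend(sampler(rng))
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


print("Box-Muller      :", sample_stats(box_muller, 50000))
print("Marsaglia polar :", sample_stats(marsaglia_polar, 50000))
```

The polar method avoids the trigonometric functions at the price of rejecting roughly a fifth of the uniform pairs.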


Lecture 4

Variance reduction for Monte Carlo methods and Finite Difference Methods for option pricing

- Variance reduction for MC methods
  - Antithetic variables
  - Control variate technique
- Finite Difference (FD) Methods for option pricing
  - Black-Scholes equation and its discretisation
  - Implicit FD method
  - Explicit FD method
  - Crank-Nicolson method


Recall: Pricing derivatives with the Monte Carlo method

1. Simulate a random path of the underlying stock

$$S_t = S_t(Z) \quad \text{for } 0 \leq t \leq T$$

2. Evaluate the stock price at maturity $S_T(Z)$

3. Calculate the option price at maturity using the payoff function

$$C_T(Z) = \mathrm{payoff}(S_T(Z), K)$$

Repeat steps 1–3 $M$ times ($M \sim$ thousands or more) getting

$$C_{T,j} = C_T(Z_j), \quad j = 1, \dots, M$$

4. Calculate the averaged option price at maturity: $\bar{C}_T = \frac{1}{M} \sum_{j=1}^{M} C_{T,j}$

5. Discount the averaged option price to $t = 0$: $\bar{C}_0 = e^{-rT} \bar{C}_T$
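Steps 1–5 can be sketched as follows for a European call. The sketch assumes geometric Brownian motion under the risk-neutral measure, so that $S_T$ can be sampled exactly in one step; the parameter values and the Black-Scholes reference formula (no dividends) are illustrative additions, not part of the slides.

```python
import math
import random

def mc_european_call(s0, k, r, sigma, t, m, seed=0):
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)                  # steps 1-2: sample S_T(Z)
        s_t = s0 * math.exp(drift + vol * z)
        total += max(s_t - k, 0.0)               # step 3: payoff at maturity
    return math.exp(-r * t) * total / m          # steps 4-5: average and discount

def bs_call(s0, k, r, sigma, t):
    """Black-Scholes closed form (no dividends), used only as a reference."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)

price_mc = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 200_000)
price_bs = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
```

The Monte Carlo estimate should agree with the closed-form value up to a statistical error of order $M^{-1/2}$.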


Variance reduction techniques

Denote by $C_{\mathrm{true}} = E[C]$ the fair option price and $\bar{C} = \frac{1}{M} \sum_{j=1}^{M} C_j$

(where we have dropped the time index in $C_0$ and write just $C$ to simplify the notation).

Then: $E[(C_{\mathrm{true}} - \bar{C})^2] = \mathrm{Bias}(\bar{C})^2 + \mathrm{Var}(\bar{C})$.

Estimators with smaller variance are more accurate.

Variance reduction techniques:
  Antithetic variates (considered in this lecture)
  Control variate technique (considered in this lecture)
  Importance sampling
  Stratified sampling
  Moment matching
  Quasi-random sequences


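Variance reduction techniques: Antithetic variates

Idea
Take $M$ independent sample paths $Z_1, Z_2, \dots, Z_M$ with increments $\sim N(0, \Delta t)$. In addition, for every $Z_j$ generate a new path $\tilde{Z}_j$ satisfying

$$\mathrm{Covar}(C(Z_j), C(\tilde{Z}_j)) < 0$$

and define

$$\bar{C}_{\mathrm{anti}} := \frac{1}{M} \sum_{j=1}^{M} C_j^*, \qquad C_j^* := \frac{C(Z_j) + C(\tilde{Z}_j)}{2}.$$

Claim: $\bar{C}_{\mathrm{anti}}$ has the same mean as $\bar{C}$ but a smaller variance.
Proof: $Z_j$ and $\tilde{Z}_j$ are samples from the same distribution, thus

$$E[C(Z_j)] = E[C(\tilde{Z}_j)] \quad \text{(write } C_j = C(Z_j),\ \tilde{C}_j = C(\tilde{Z}_j) \text{ for short)}$$

Moreover:

$$\mathrm{Var}(C_j^*) = \frac{1}{4}\left(\mathrm{Var}(C_j) + 2\,\mathrm{Covar}(C_j, \tilde{C}_j) + \mathrm{Var}(\tilde{C}_j)\right) < \frac{1}{2}\mathrm{Var}(C_j)$$

$C_j^*$ are iid $\Rightarrow$

$$\mathrm{Var}(\bar{C}_{\mathrm{anti}}) = \frac{1}{M}\mathrm{Var}(C_j^*) < \frac{1}{2M}\mathrm{Var}(C_j) = \frac{1}{2}\mathrm{Var}(\bar{C}).$$

Typical choice for Gaussian random variables: $\tilde{Z}_j = -Z_j$.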
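The effect can be checked numerically. The sketch below assumes a European call under geometric Brownian motion sampled in one step (so each path is driven by a single Gaussian $Z_j$) and uses the choice $\tilde{Z}_j = -Z_j$; all parameter values are illustrative.

```python
import math
import random

def discounted_call_payoff(z, s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0):
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - k, 0.0)

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(1)
zs = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

plain = [discounted_call_payoff(z) for z in zs]                   # C_j
anti = [0.5 * (discounted_call_payoff(z) + discounted_call_payoff(-z))
        for z in zs]                                              # C*_j

price_plain = sum(plain) / len(plain)
price_anti = sum(anti) / len(anti)
var_plain = sample_variance(plain)      # estimates Var(C_j)
var_anti = sample_variance(anti)        # estimates Var(C*_j)
```

Since the call payoff is monotone in $Z$, the covariance of $C(Z_j)$ and $C(-Z_j)$ is negative and `var_anti` should come out below `0.5 * var_plain`, as the claim predicts. Note that each antithetic sample costs two payoff evaluations.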

Variance reduction: Control variate technique

The aim is to price an option A by Monte Carlo simulations (assuming there is no closed-form pricing formula).
Consider a similar derivative B for which a closed-form pricing formula is available (B is called a control variate).
Simulate the prices for both options (using the same set of simulated random paths $Z_1, \dots, Z_M$) $\to$ $C_A^{\mathrm{sim}}$ and $C_B^{\mathrm{sim}}$.
Compute the price of option B using the analytic result $\to$ $C_B^{\mathrm{ana}}$.
Compute the pricing error given by simulations for option B:

$$C_B^{\mathrm{ana}} - C_B^{\mathrm{sim}}$$

Correct the simulated price for option A with this pricing error; the corrected price is

$$C_A^* := C_A^{\mathrm{sim}} + \beta\,(C_B^{\mathrm{ana}} - C_B^{\mathrm{sim}}), \quad \beta > 0.$$


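Variance reduction: Control variate technique

Compute the mean value and variance of $C_A^*$:

$$E[C_A^*] = C_{A,\mathrm{true}} \quad \text{when } E[C_B^{\mathrm{ana}}] = E[C_B^{\mathrm{sim}}]$$

$$\mathrm{Var}(C_A^*) = \mathrm{Var}(C_A^{\mathrm{sim}} - \beta C_B^{\mathrm{sim}}) = \mathrm{Var}(C_A^{\mathrm{sim}}) - 2\beta\,\mathrm{Covar}(C_A^{\mathrm{sim}}, C_B^{\mathrm{sim}}) + \beta^2\,\mathrm{Var}(C_B^{\mathrm{sim}})$$

The RHS is minimized at

$$\beta^* := \frac{\mathrm{Covar}(C_A^{\mathrm{sim}}, C_B^{\mathrm{sim}})}{\mathrm{Var}(C_B^{\mathrm{sim}})}.$$

For $\beta = \beta^*$ we have:

$$\mathrm{Var}(C_A^*) = (1 - \rho^2)\,\mathrm{Var}(C_A^{\mathrm{sim}}), \quad \text{where } \rho = \frac{\mathrm{Covar}(C_A^{\mathrm{sim}}, C_B^{\mathrm{sim}})}{\mathrm{Var}(C_A^{\mathrm{sim}})^{1/2}\,\mathrm{Var}(C_B^{\mathrm{sim}})^{1/2}}.$$

Evidently, the larger $\rho$ is (with $0 \leq \rho \leq 1$), the smaller $\mathrm{Var}(C_A^*)$ is.

Antithetic variables and control variates can be used jointly.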
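A small numerical sketch of the control variate correction. As a deliberately simple stand-in for "option B", the sketch uses the discounted terminal stock price, whose exact value is $S_0$ (assuming no dividends); this choice, the parameter values and the estimation of $\beta^*$ from the same sample (which introduces a small bias) are all assumptions of this example.

```python
import math
import random

def simulate(m, s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0, seed=2):
    rng = random.Random(seed)
    disc = math.exp(-r * t)
    a, b = [], []
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
        a.append(disc * max(s_t - k, 0.0))   # option A: European call
        b.append(disc * s_t)                 # control B: discounted stock, E[B] = s0
    return a, b

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

a, b = simulate(50_000)
beta = cov(a, b) / cov(b, b)                 # sample estimate of beta*
c_ana_b = 100.0                              # analytic price of B (= s0)
price_cv = mean(a) + beta * (c_ana_b - mean(b))
rho = cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))
var_ratio = 1.0 - rho ** 2                   # Var(C_A^*) / Var(C_A^sim) at beta*
```

Because the call payoff and the stock price are strongly correlated, `var_ratio` should be well below 1.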

By now we have considered two numerical methods for option pricing:
  Binomial and trinomial trees
  Monte Carlo method

Both methods include the following steps:
  i) Simulate (an approximate) stock price evolution
  ii) Calculate the value of the option at maturity
  iii) Calculate the value of the option at the current time

A model for the stock price evolution (geometric Brownian motion):

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \quad \text{(a stochastic model!)}$$

Black and Scholes came up with an alternative view on option pricing. In their approach the steps i) and iii) are "merged" in such a way that the randomness is completely eliminated: the option price $C = C(S, t)$ becomes a function of the stock price $S$, and $S$ becomes an independent variable $\Rightarrow$ Black-Scholes equation.
The Black-Scholes equation can be solved approximately by the Finite Difference method.


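The Black-Scholes equation (1)

Itô lemma
Suppose $S$ follows the Itô process $dS = \mu S\,dt + \sigma S\,dW$. Then $C = C(S, t)$ follows

$$dC = \frac{\partial C}{\partial S}\,dS + \frac{\partial C}{\partial t}\,dt + \frac{\sigma^2 S^2}{2}\frac{\partial^2 C}{\partial S^2}\,dt.$$

Crucial: both $dC$ and $dS$ are driven by the same process $dW$.
Therefore there exists a riskless portfolio of
  one short option (sold option) with price $C$,
  long (bought) $\Delta$ units of the underlying stock with price $S$.

The value of the portfolio is $P = \Delta \cdot S - C$, the rate of return equals

$$dP = d(\Delta \cdot S - C) + q\Delta \cdot S\,dt = \left(\Delta - \frac{\partial C}{\partial S}\right) dS - \left(\frac{\partial C}{\partial t} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 C}{\partial S^2} - q\Delta \cdot S\right) dt.$$

When $\Delta = \frac{\partial C}{\partial S}$ the value of the portfolio is purely deterministic.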



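When the value $P$ is invested at the risk-free rate at the bond market, the rate of return equals

$$dP = rP\,dt = r\left(\frac{\partial C}{\partial S} S - C\right) dt$$

No-arbitrage assumption: rates of return for all riskless instruments are equal (American option: "$\geq$").

$$-\left(\frac{\partial C}{\partial t} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 C}{\partial S^2} - qS\frac{\partial C}{\partial S}\right) = rS\frac{\partial C}{\partial S} - rC$$

Black-Scholes eq:

$$\frac{\partial C}{\partial t} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 C}{\partial S^2} + (r - q)S\frac{\partial C}{\partial S} = rC,$$

Terminal condition: $C(S, T) = \mathrm{payoff}(S, K)$, for $0 \leq S < \infty$,

Boundary conditions: $C(0, t) = F_1(t)$, $C(\infty, t) = F_2(t)$, $0 \leq t \leq T$.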



Black-Scholes equation in log-price (call option)

Define: $x = \log S$, $f(x, t) = C(S, t)$

$$\frac{\partial f}{\partial t} + \left(r - q - \frac{\sigma^2}{2}\right)\frac{\partial f}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2 f}{\partial x^2} = rf,$$

Terminal condition: $f(x, T) = \max(e^x - K, 0)$, for $-\infty < x < \infty$.

In practice the domain of $x$ is truncated from $(-\infty, \infty)$ to $(a, b)$:

Boundary conditions: $f(a, t) = 0$, $f(b, t) = e^b - e^{-r(T-t)}K$, $0 \leq t \leq T$.

The Finite Difference method solves the resulting parabolic PDE numerically by approximating all partial derivatives $\frac{\partial f}{\partial t}$, $\frac{\partial f}{\partial x}$ and $\frac{\partial^2 f}{\partial x^2}$ by finite differences on a grid.

Consider the uniform partitions:

$$0 = t_0 < t_1 < \dots < t_m = T, \quad t_i = i\tau, \quad \tau = \frac{T}{m},$$

$$a = x_0 < x_1 < \dots < x_n = b, \quad x_j = a + jh, \quad h = \frac{b - a}{n}.$$

[Figure: grid of values $f_{i,j}$, $i = 0, \dots, m$, $j = 0, \dots, n$; horizontal axis $t$ with nodes $0, \tau, 2\tau, \dots, m\tau = T$; vertical axis $x$ with nodes $a, a + h, a + 2h, \dots, a + nh = b$]

Then $f_{i,j} \approx f(t_i, x_j)$ and

$$\frac{\partial f}{\partial t}(t_i, x_j) \approx \frac{f_{i+1,j} - f_{i,j}}{\tau} \quad \text{(forward difference)}$$

Discretization of the $x$-derivatives determines the method: Explicit / Implicit / Crank-Nicolson method.


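Implicit method: approximation of partial derivatives

$f_{i,j} \approx f(t_i, x_j)$,

$$\frac{\partial f}{\partial t}(t_i, x_j) \approx \frac{f_{i+1,j} - f_{i,j}}{\tau} \quad \text{(forward difference)}$$

$$\frac{\partial f}{\partial x}(t_i, x_j) \approx \frac{f_{i,j+1} - f_{i,j-1}}{2h} \quad \text{(central difference)}$$

$$\frac{\partial^2 f}{\partial x^2}(t_i, x_j) \approx \frac{f_{i,j+1} - 2f_{i,j} + f_{i,j-1}}{h^2} \quad \text{(central difference)}$$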


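Implicit method: time stepping

The equation $\frac{\partial f}{\partial t} + \left(r - q - \frac{\sigma^2}{2}\right)\frac{\partial f}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2 f}{\partial x^2} = rf$ implies

$$\frac{f_{i+1,j} - f_{i,j}}{\tau} + \left(r - q - \frac{\sigma^2}{2}\right)\frac{f_{i,j+1} - f_{i,j-1}}{2h} + \frac{\sigma^2}{2}\,\frac{f_{i,j+1} - 2f_{i,j} + f_{i,j-1}}{h^2} = r f_{i,j}.$$

This simplifies to $A f_{i,j-1} + B f_{i,j} + C f_{i,j+1} = f_{i+1,j}$, where

$$A = -\frac{\tau\sigma^2}{2h^2} + \frac{\tau}{2h}\left(r - q - \frac{\sigma^2}{2}\right),$$

$$B = \frac{\tau\sigma^2}{h^2} + 1 + r\tau,$$

$$C = -\frac{\tau\sigma^2}{2h^2} - \frac{\tau}{2h}\left(r - q - \frac{\sigma^2}{2}\right).$$

The option prices at time $T$ and at the lower and upper boundaries are known.
The option price is computed backwards in time on the grid by solving a system of simultaneous equations at each time step.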


Implicit method: matrix form

$$\begin{bmatrix} B & C & 0 & \dots & 0 \\ A & B & C & \dots & 0 \\ 0 & A & B & \dots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \dots & A & B \end{bmatrix} \begin{bmatrix} f_{i,1} \\ f_{i,2} \\ \vdots \\ f_{i,n-2} \\ f_{i,n-1} \end{bmatrix} = \begin{bmatrix} f_{i+1,1} - A f_{i,0} \\ f_{i+1,2} \\ \vdots \\ f_{i+1,n-2} \\ f_{i+1,n-1} - C f_{i,n} \end{bmatrix}, \quad i = m - 1, \dots, 0.$$

Here $f_{i,0}$ and $f_{i,n}$ are the values of the derivative at the lower and upper boundaries (known from the boundary conditions).
Such a matrix is called a tridiagonal matrix; the corresponding system can be solved in $O(n)$ flops by the LU factorization (see Lecture 2).
If the step sizes $\tau$ and $h$ or the parameters $r$, $q$, $\sigma$ are variable, the matrix entries $A$, $B$, $C$ will vary as well.
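The implicit scheme can be sketched as follows for a European call in log-price. The truncation of the domain to $\log S_0 \pm 5\sigma\sqrt{T}$, the grid sizes and the parameter values are assumptions of this example; the tridiagonal system is solved by elimination without pivoting.

```python
import math

def solve_tridiagonal(sub, diag, sup, rhs):
    """Elimination for a tridiagonal system (no pivoting); sub[0] is unused."""
    n = len(diag)
    d, y = diag[:], rhs[:]
    for i in range(1, n):
        w = sub[i] / d[i - 1]
        d[i] -= w * sup[i - 1]
        y[i] -= w * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - sup[i] * x[i + 1]) / d[i]
    return x

def implicit_fd_call(s0, k, r, q, sigma, t, n=400, m=400):
    x0 = math.log(s0)
    half_width = 5.0 * sigma * math.sqrt(t)      # truncation of (a, b): an assumption
    a, b = x0 - half_width, x0 + half_width
    h, tau = (b - a) / n, t / m
    nu = r - q - 0.5 * sigma ** 2
    A = -tau * sigma ** 2 / (2 * h * h) + tau * nu / (2 * h)
    B = 1.0 + tau * sigma ** 2 / (h * h) + r * tau
    C = -tau * sigma ** 2 / (2 * h * h) - tau * nu / (2 * h)
    f = [max(math.exp(a + j * h) - k, 0.0) for j in range(n + 1)]  # values at t = T
    for i in range(m - 1, -1, -1):               # step backwards in time
        lower = 0.0                                               # f(a, t_i)
        upper = math.exp(b) - math.exp(-r * (t - i * tau)) * k    # f(b, t_i)
        rhs = f[1:n]
        rhs[0] -= A * lower
        rhs[-1] -= C * upper
        inner = solve_tridiagonal([A] * (n - 1), [B] * (n - 1), [C] * (n - 1), rhs)
        f = [lower] + inner + [upper]
    return f[n // 2]                             # log(s0) is the middle node (n even)

price_fd = implicit_fd_call(100.0, 100.0, 0.05, 0.0, 0.2, 1.0)
```

For these parameters the result should be close to the Black-Scholes value of about 10.45.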


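Explicit method: approximation of partial derivatives

$$\frac{\partial f}{\partial x}(t_i, x_j) \approx \frac{f_{i+1,j+1} - f_{i+1,j-1}}{2h} \quad \text{(central difference)}$$

$$\frac{\partial^2 f}{\partial x^2}(t_i, x_j) \approx \frac{f_{i+1,j+1} - 2f_{i+1,j} + f_{i+1,j-1}}{h^2} \quad \text{(central difference)}$$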


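Explicit method: time stepping

The equation $\frac{\partial f}{\partial t} + \left(r - q - \frac{\sigma^2}{2}\right)\frac{\partial f}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2 f}{\partial x^2} = rf$ implies

$$\frac{f_{i+1,j} - f_{i,j}}{\tau} + \left(r - q - \frac{\sigma^2}{2}\right)\frac{f_{i+1,j+1} - f_{i+1,j-1}}{2h} + \frac{\sigma^2}{2}\,\frac{f_{i+1,j+1} - 2f_{i+1,j} + f_{i+1,j-1}}{h^2} = r f_{i,j}$$

This simplifies to $f_{i,j} = D f_{i+1,j-1} + E f_{i+1,j} + F f_{i+1,j+1}$, where

$$D = \frac{1}{1 + r\tau}\left(\frac{\tau\sigma^2}{2h^2} - \frac{\tau}{2h}\left(r - q - \frac{\sigma^2}{2}\right)\right),$$

$$E = \frac{1}{1 + r\tau}\left(1 - \frac{\tau\sigma^2}{h^2}\right),$$

$$F = \frac{1}{1 + r\tau}\left(\frac{\tau\sigma^2}{2h^2} + \frac{\tau}{2h}\left(r - q - \frac{\sigma^2}{2}\right)\right).$$

The values $f_{i+1,0}$ and $f_{i+1,n}$ are known from the boundary conditions.

No matrix inversion is needed; however, there is a stability requirement: $\tau \leq (h/\sigma)^2$.

Typical choice: $h = \sigma\sqrt{3\tau}$ [Hull, Options, Futures and Other Derivatives, '06]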


Explicit method: trinomial tree interpretation

The explicit Finite Difference method is equivalent to the trinomial tree approach with

$$p_d = (1 + r\tau) D, \quad p_m = (1 + r\tau) E, \quad p_u = (1 + r\tau) F.$$
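A sketch of the explicit scheme for a European call in log-price, together with the trinomial probabilities $p_d$, $p_m$, $p_u$. The step sizes follow the typical choice $h = \sigma\sqrt{3\tau}$; the domain truncation and the parameter values are assumptions of this example.

```python
import math

def explicit_fd_call(s0, k, r, q, sigma, t, m=500):
    tau = t / m
    h = sigma * math.sqrt(3.0 * tau)             # typical choice h = sigma*sqrt(3*tau)
    half = int(math.ceil(5.0 * sigma * math.sqrt(t) / h))   # truncation: an assumption
    n = 2 * half
    a = math.log(s0) - half * h
    b = a + n * h
    nu = r - q - 0.5 * sigma ** 2
    disc = 1.0 / (1.0 + r * tau)
    D = disc * (tau * sigma ** 2 / (2 * h * h) - tau * nu / (2 * h))
    E = disc * (1.0 - tau * sigma ** 2 / (h * h))
    F = disc * (tau * sigma ** 2 / (2 * h * h) + tau * nu / (2 * h))
    f = [max(math.exp(a + j * h) - k, 0.0) for j in range(n + 1)]  # values at t = T
    for i in range(m - 1, -1, -1):               # step backwards in time
        new = [0.0] * (n + 1)                    # new[0] = f(a, t_i) = 0
        new[n] = math.exp(b) - math.exp(-r * (t - i * tau)) * k    # f(b, t_i)
        for j in range(1, n):
            new[j] = D * f[j - 1] + E * f[j] + F * f[j + 1]
        f = new
    p_d, p_m, p_u = D / disc, E / disc, F / disc  # trinomial tree probabilities
    return f[half], (p_d, p_m, p_u)              # f[half] sits at x = log(s0)

price_ex, probs = explicit_fd_call(100.0, 100.0, 0.05, 0.0, 0.2, 1.0)
```

With $h = \sigma\sqrt{3\tau}$ the three weights are positive and sum to one, which is what allows the probabilistic (trinomial tree) reading of the scheme.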


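Crank-Nicolson scheme

The approximations for the partial derivatives take the averages of the values in the explicit and implicit schemes:

$$\frac{\partial f}{\partial t}(t_i, x_j) \approx \frac{f_{i+1,j} - f_{i,j}}{\tau}, \quad \text{(forward difference)}$$

$$\frac{\partial f}{\partial x}(t_i, x_j) \approx \frac{1}{2}\left[\frac{f_{i,j+1} - f_{i,j-1}}{2h} + \frac{f_{i+1,j+1} - f_{i+1,j-1}}{2h}\right],$$

$$\frac{\partial^2 f}{\partial x^2}(t_i, x_j) \approx \frac{1}{2}\left[\frac{f_{i,j+1} - 2f_{i,j} + f_{i,j-1}}{h^2} + \frac{f_{i+1,j+1} - 2f_{i+1,j} + f_{i+1,j-1}}{h^2}\right].$$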


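Crank-Nicolson scheme: time stepping

The equation $\frac{\partial f}{\partial t} + \left(r - q - \frac{\sigma^2}{2}\right)\frac{\partial f}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2 f}{\partial x^2} = rf$ implies

$$\tilde{A} f_{i,j-1} + \tilde{B} f_{i,j} + \tilde{C} f_{i,j+1} = \tilde{D} f_{i+1,j-1} + \tilde{E} f_{i+1,j} + \tilde{F} f_{i+1,j+1}$$

where

$$\tilde{A} = \frac{A}{2}, \quad \tilde{B} = \frac{B + 1}{2}, \quad \tilde{C} = \frac{C}{2}, \quad \tilde{D} = \frac{D}{2}, \quad \tilde{E} = \frac{E + 1}{2}, \quad \tilde{F} = \frac{F}{2}.$$

The $\theta$-scheme is a generalization of the above methods.
For approximation of $x$-derivatives it uses $\theta$-weighting, e.g.

$$\frac{\partial f}{\partial x}(t_i, x_j) \approx \theta\,\frac{f_{i,j+1} - f_{i,j-1}}{2h} + (1 - \theta)\,\frac{f_{i+1,j+1} - f_{i+1,j-1}}{2h} \quad (0 \leq \theta \leq 1)$$

Then $\theta = 0$ is explicit, $\theta = 1$ is implicit and $\theta = \frac{1}{2}$ is Crank-Nicolson.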


Remarks

Crank-Nicolson:
  It can be shown that no condition on the size of $\tau$ and $h$ is required for stability.
  However, $\tau \sim h^2$ is still required for numerical accuracy (the same applies to the implicit method).

General remarks on the Finite Difference method:
  Pricing of American options is possible. One solves

$$\frac{\partial C}{\partial t} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 C}{\partial S^2} + (r - q)S\frac{\partial C}{\partial S} \leq rC.$$

  A discretization of this formulation leads to a system of inequalities (more difficult to solve).
  Pricing of path-dependent options with the Finite Difference method is more difficult.


Lecture 5

Systems of linear equations

Introduction
Direct methods
  Triangular systems
  LU factorization
  Cholesky factorization
  QR decomposition
  Singular Value Decomposition
  LU factorization for tridiagonal systems
Iterative methods
  Jacobi
  Gauss-Seidel
  SOR


Introduction

For given values $a_{ij}$, $b_i$, $i, j = 1, \dots, n$, find the values $x_i$ such that

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2 \\ \qquad\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n = b_n \end{cases} \iff \underbrace{\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}}_{=A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{=x} = \underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}}_{=b}$$

Systems of linear equations (as above) arise very frequently in numerical methods and algorithms.
The choice of an efficient solution method is very important.
Typically, the more information about $a_{ij}$ is available, the more efficient a solution algorithm can be proposed.
However, no best algorithm for all cases is available.


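Types of matrices

Dense: almost all elements are nonzero

$$\begin{bmatrix} 3 & 5 & 2.1 & 4.71 & 2 \\ 1.2 & 7 & 4.5 & 2 & 4 \\ 1 & 2 & 4 & 3 & 0 \\ 0 & 1.3 & 1 & 1 & 9 \\ 2 & 1.1 & 11 & 2 & 5 \end{bmatrix}$$

Sparse: only a fraction of all elements are nonzero

$$\begin{bmatrix} 3 & 0 & 0 & 4.71 & 0 \\ 0 & 7 & 0 & 0 & 4 \\ 0 & 2 & 4 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1.1 & 0 & 0 & 5 \end{bmatrix}$$

Banded: nonzero elements are near the diagonal

$$\begin{bmatrix} 3 & 5 & 0 & 0 & 0 \\ 1.2 & 7 & 4.5 & 0 & 0 \\ 0 & 2 & 4 & 3 & 0 \\ 0 & 0 & 1 & 1 & 9 \\ 0 & 0 & 0 & 2 & 5 \end{bmatrix}$$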

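Types of matrices

Triangular: all elements above/below the diagonal are zero

Lower triangular

$$\begin{bmatrix} 3 & 0 & 0 & 0 & 0 \\ 1.2 & 7 & 0 & 0 & 0 \\ 1 & 2 & 4 & 0 & 0 \\ 0 & 1.3 & 1 & 1 & 0 \\ 2 & 1.1 & 11 & 2 & 5 \end{bmatrix}$$

Upper triangular

$$\begin{bmatrix} 3 & 5 & 2.1 & 4.71 & 2 \\ 0 & 7 & 4.5 & 2 & 4 \\ 0 & 0 & 4 & 3 & 0 \\ 0 & 0 & 0 & 1 & 9 \\ 0 & 0 & 0 & 0 & 5 \end{bmatrix}$$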


Numerical Methods

$$Ax = b$$

Direct methods: algorithms computing the exact solution $x$ (if we ignore round-off errors).
Iterative methods: generate a (typically infinite) sequence of approximations $x^{(0)}, x^{(1)}, x^{(2)}, \dots, x^{(k)}, \dots \to x$.

Assume from now on that the matrix $A$ is invertible, i.e. there exists $A^{-1}$ such that

$$x = A^{-1}b$$

(recall: a necessary and sufficient condition for this is $\det A \neq 0$).


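Direct methods: Triangular systems (1)

Example: consider

$$\begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix},$$

then (if all $a_{ii} \neq 0$!)

$$x_1 = b_1 / a_{11},$$

$$x_2 = (b_2 - a_{21}x_1) / a_{22}.$$

This can be easily generalized to $n \times n$ matrices!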



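Lower triangular systems: $Lx = b$

Forward substitution

$$x_i = \left(b_i - \sum_{j=1}^{i-1} \ell_{ij} x_j\right) / \ell_{ii}, \quad i = 1, \dots, n.$$

$$\begin{bmatrix} \ell_{11} & & & & \\ \ell_{21} & \ell_{22} & & 0 & \\ \vdots & & \ddots & & \\ \ell_{i1} & \ell_{i2} & \dots & \ell_{ii} & \\ \vdots & & & & \ddots \\ \ell_{n1} & \ell_{n2} & \dots & \ell_{n,n-1} & \ell_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_i \\ \vdots \\ b_n \end{bmatrix}$$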



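Upper triangular systems: $Ux = b$

Back-substitution

$$x_i = \left(b_i - \sum_{j=i+1}^{n} u_{ij} x_j\right) / u_{ii}, \quad i = n, n - 1, \dots, 1.$$

$$\begin{bmatrix} u_{11} & u_{12} & \dots & u_{1,n-1} & u_{1n} \\ & u_{22} & \dots & u_{2,n-1} & u_{2n} \\ & & \ddots & & \vdots \\ & & u_{ii} & \dots & u_{in} \\ & 0 & & \ddots & \vdots \\ & & & & u_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_i \\ \vdots \\ b_n \end{bmatrix}$$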
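Forward and back-substitution translate almost literally into code. The sketch below stores matrices as lists of rows and assumes nonzero diagonal entries; the small test systems are made up for illustration.

```python
def forward_substitution(L, b):
    """Solve L x = b for a lower triangular matrix L."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

def back_substitution(U, b):
    """Solve U x = b for an upper triangular matrix U."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

L = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, -1.0, 5.0]]
U = [[2.0, 1.0, -1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 4.0]]
x_lower = forward_substitution(L, [2.0, 7.0, 17.0])   # exact solution (1, 2, 3)
x_upper = back_substitution(U, [1.0, 12.0, 12.0])     # exact solution (1, 2, 3)
```

Both loops need about $n^2$ floating point operations.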


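Recall: definition of matrix product

For two matrices

$$B = \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1m} \\ b_{21} & b_{22} & \dots & b_{2m} \\ & & \dots & \\ b_{n1} & b_{n2} & \dots & b_{nm} \end{bmatrix}, \qquad C = \begin{bmatrix} c_{11} & c_{12} & \dots & c_{1p} \\ c_{21} & c_{22} & \dots & c_{2p} \\ & & \dots & \\ c_{m1} & c_{m2} & \dots & c_{mp} \end{bmatrix}$$

their product $A = BC$ is an $n \times p$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ & & \dots & \\ a_{n1} & a_{n2} & \dots & a_{np} \end{bmatrix} \quad \text{with entries } a_{ij} = \sum_{k=1}^{m} b_{ik} c_{kj}.$$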


Direct methods: LU factorization

Idea: Attempt to represent $A$ as a product of two matrices

$$A = LU$$

where $L$ is lower triangular ($\ell_{ii} = 1$) and $U$ is upper triangular.

Then $Ax = b$ can be solved in two steps:
1. Find $y$, a solution of $Ly = b$,
2. Find $x$, a solution of $Ux = y$.

Matrices $L$ and $U$ can be computed out of $A$ simultaneously, element by element.

Tool: Summation/subtraction of matrix rows ($\to$ Gauss elimination) (equivalent to elimination of unknowns $x_i$ from linear equations).

# floating point operations $\sim 2n^3/3$, i.e. of order $O(n^3)$.

LU factorization is the method of choice if $A$ is dense and has no particular structure.


Direct methods: Cholesky factorization

Cholesky factorization can be applied if
  $A$ is symmetric, i.e. $A = A^\top$ or equivalently $a_{ij} = a_{ji}$,
  $A$ is positive definite, i.e. for any $x = [x_1, x_2, \dots, x_n]^\top \neq 0$ it holds $x^\top A x > 0$.

Idea: Represent $A = GG^\top$ where $G$ is a lower triangular matrix.

$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ & & \dots & \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} = \begin{bmatrix} g_{11} & & & \\ g_{21} & g_{22} & & 0 \\ & & \ddots & \\ g_{n1} & g_{n2} & \dots & g_{nn} \end{bmatrix} \begin{bmatrix} g_{11} & g_{21} & \dots & g_{n1} \\ & g_{22} & \dots & g_{n2} \\ & & \ddots & \\ 0 & & & g_{nn} \end{bmatrix}$$

$\to$ more details

Direct methods: Cholesky factorization (algorithm)

$$\begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix} = \begin{bmatrix} g_{11} & & 0 \\ \vdots & \ddots & \\ g_{n1} & \dots & g_{nn} \end{bmatrix} \begin{bmatrix} g_{11} & \dots & g_{n1} \\ & \ddots & \vdots \\ 0 & & g_{nn} \end{bmatrix}$$

Suppose that the first $j - 1$ columns of $G$ are known (all $g_{ik}$, $k < j$);

Claim: the $j$-th column of $G$ is computable as follows:

Notice:

$$a_{ij} = \sum_{k=1}^{j} g_{ik} g_{jk} \quad \Rightarrow \quad g_{ij} g_{jj} = \underbrace{a_{ij} - \sum_{k=1}^{j-1} g_{ik} g_{jk}}_{= v_i, \text{ computable}}$$

Observe that $g_{jj}^2 = v_j$ (set $i = j$ to see this).

This implies $g_{ij} = v_i / \sqrt{v_j}$.


Direct methods: Cholesky factorization (algorithm)

for j = 1...n do                      ' loop over all columns
    for i = j...n do                  ' loop over unknown elements in the jth column
        v(i) = a(i,j)                 ' initialize v
        for k = 1...j-1 do            ' compute v
            v(i) = v(i) - g(i,k) * g(j,k)
        end for
        g(i,j) = v(i) / sqrt(v(j))    ' compute the jth column of g
    end for
end for

More efficient realizations are possible.

# flops $\sim n^3/3$, i.e. roughly half the cost of LU, but still $O(n^3)$.
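The algorithm carries over directly to Python. This is a plain, unoptimized sketch with matrices stored as lists of rows; the test matrix is made up for illustration. For a matrix that is not positive definite, `math.sqrt` will raise an error (or a division by zero will occur).

```python
import math

def cholesky(a):
    """Return the lower triangular G with A = G G^T (A symmetric positive definite)."""
    n = len(a)
    g = [[0.0] * n for _ in range(n)]
    for j in range(n):                       # loop over all columns
        v = [0.0] * n
        for i in range(j, n):                # unknown elements in the jth column
            v[i] = a[i][j] - sum(g[i][k] * g[j][k] for k in range(j))
            g[i][j] = v[i] / math.sqrt(v[j])
    return g

a = [[4.0, 2.0, -2.0], [2.0, 10.0, 2.0], [-2.0, 2.0, 6.0]]
g = cholesky(a)   # expected: [[2, 0, 0], [1, 3, 0], [-1, 1, 2]]
```

Since the inner loop starts at $i = j$, the value $v_j$ is always available before it is needed under the square root.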


Direct methods: QR decomposition

Idea: Decompose $A = QR$, where
  $R$ is upper triangular,
  $Q$ is orthogonal, i.e. $Q^\top Q = I$, where $I$ is the identity matrix.

Then $Ax = b$ can be solved in two steps:

1. Compute $f = Q^\top b$,
2. Find $x$, a solution of $Rx = f$ (by back-substitution).

QR decomposition is defined for rectangular matrices, $m \geq n$:

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ & & \dots & \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}}_{m \times n} = \underbrace{\begin{bmatrix} q_{11} & q_{12} & \dots & q_{1m} \\ q_{21} & q_{22} & \dots & q_{2m} \\ & & \dots & \\ q_{m1} & q_{m2} & \dots & q_{mm} \end{bmatrix}}_{m \times m} \underbrace{\begin{bmatrix} r_{11} & r_{12} & \dots & r_{1n} \\ & r_{22} & \dots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \\ & 0 & & \end{bmatrix}}_{m \times n}$$

# flops $\sim 4m^2 n$


Direct methods: Singular Value Decomposition (SVD)

Suppose $A$ is an $(m \times n)$ matrix.

SVD represents $A = U \Sigma V^\top$ where
  $U$ is an $m \times m$ orthogonal matrix (i.e. $U^\top U = I_{m \times m}$),
  $V$ is an $n \times n$ orthogonal matrix (i.e. $V^\top V = I_{n \times n}$),
  $\Sigma$ is an $m \times n$ diagonal matrix with diagonal entries $\sigma_1, \sigma_2, \dots, \sigma_p$ and zeros elsewhere.

$\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_p$, $p = \min(m, n)$, are called singular values.


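Direct methods: for tridiagonal systems

A matrix $A = \{a_{ij}\}$ is called tridiagonal if $a_{ij} = 0$ for $|i - j| > 1$.
Storage requirements: $3n - 2$ floating point numbers.
LU decomposition can be realized very efficiently:

$$\underbrace{\begin{bmatrix} d_1 & q_1 & & & \\ p_1 & d_2 & q_2 & & \\ & p_2 & d_3 & q_3 & \\ & & \ddots & \ddots & q_{n-1} \\ & & & p_{n-1} & d_n \end{bmatrix}}_{=A} = \underbrace{\begin{bmatrix} 1 & & & & \\ \ell_1 & 1 & & & \\ & \ell_2 & 1 & & \\ & & \ddots & \ddots & \\ & & & \ell_{n-1} & 1 \end{bmatrix}}_{=L} \underbrace{\begin{bmatrix} u_1 & r_1 & & & \\ & u_2 & r_2 & & \\ & & u_3 & r_3 & \\ & & & \ddots & r_{n-1} \\ & & & & u_n \end{bmatrix}}_{=U}$$

It holds that $q_i = r_i$; $u_i$ and $\ell_i$ are computed as follows:

u(1) = d(1)
for i = 2...n do
    l(i-1) = p(i-1) / u(i-1)
    u(i) = d(i) - l(i-1) * q(i-1)
end for

# flops = $4n - 4$
Forward and back-substitution can be realized in $7n - 5$ flops.
Complete solution in $O(n)$ flops: "a linear complexity solver".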
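The tridiagonal LU factorization and the two substitution sweeps can be sketched as follows. The three diagonals are stored as separate lists (an implementation choice of this example), no pivoting is performed, and the test system is made up for illustration.

```python
def tridiagonal_lu(p, d, q):
    """p: sub-diagonal (n-1 entries), d: diagonal (n), q: super-diagonal (n-1)."""
    n = len(d)
    u = [d[0]] + [0.0] * (n - 1)
    l = [0.0] * (n - 1)
    for i in range(1, n):
        l[i - 1] = p[i - 1] / u[i - 1]
        u[i] = d[i] - l[i - 1] * q[i - 1]
    return l, u

def tridiagonal_solve(p, d, q, b):
    l, u = tridiagonal_lu(p, d, q)
    n = len(d)
    y = [b[0]] + [0.0] * (n - 1)
    for i in range(1, n):                 # forward substitution, L y = b
        y[i] = b[i] - l[i - 1] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / u[-1]
    for i in range(n - 2, -1, -1):        # back-substitution, U x = y (r_i = q_i)
        x[i] = (y[i] - q[i] * x[i + 1]) / u[i]
    return x

p, d, q = [1.0, 1.0, 1.0], [4.0, 4.0, 4.0, 4.0], [1.0, 1.0, 1.0]
x = tridiagonal_solve(p, d, q, [6.0, 12.0, 18.0, 19.0])   # exact solution (1, 2, 3, 4)
```

Each loop runs once over the unknowns, which gives the linear complexity.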


Iterative methods

Generate a (typically infinite) sequence of approximations $x^{(0)}, x^{(1)}, x^{(2)}, \dots, x^{(k)}, \dots \to x$
Tool: matrix-vector multiplication
Beneficial for sparse matrices $A$
Efficiency depends on the speed of convergence

Example:

Consider

$$\begin{cases} a_{11}x_1 + a_{12}x_2 = b_1 \\ a_{21}x_1 + a_{22}x_2 = b_2 \end{cases} \quad \text{then} \quad \begin{cases} x_1 = (b_1 - a_{12}x_2)/a_{11} \\ x_2 = (b_2 - a_{21}x_1)/a_{22} \end{cases}$$

Jacobi iteration:

$$\begin{cases} x_1^{(k+1)} = (b_1 - a_{12}x_2^{(k)})/a_{11} \\ x_2^{(k+1)} = (b_2 - a_{21}x_1^{(k)})/a_{22} \end{cases}$$

Gauss-Seidel iteration:

$$\begin{cases} x_1^{(k+1)} = (b_1 - a_{12}x_2^{(k)})/a_{11} \\ x_2^{(k+1)} = (b_2 - a_{21}x_1^{(k+1)})/a_{22} \end{cases}$$


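Iterative methods: Jacobi, Gauss-Seidel, SOR

In general: consider an additive decomposition $A = L + D + U$

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \dots & a_{3n} \\ & & & \dots & \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn} \end{bmatrix}}_{=A} = \underbrace{\begin{bmatrix} 0 & & & & \\ a_{21} & 0 & & 0 & \\ a_{31} & a_{32} & 0 & & \\ & & & \ddots & \\ a_{n1} & a_{n2} & a_{n3} & \dots & 0 \end{bmatrix}}_{=L} + \underbrace{\begin{bmatrix} a_{11} & & & & \\ & a_{22} & & 0 & \\ & & a_{33} & & \\ & 0 & & \ddots & \\ & & & & a_{nn} \end{bmatrix}}_{=D} + \underbrace{\begin{bmatrix} 0 & a_{12} & a_{13} & \dots & a_{1n} \\ & 0 & a_{23} & \dots & a_{2n} \\ & & 0 & \dots & a_{3n} \\ & 0 & & \ddots & \\ & & & & 0 \end{bmatrix}}_{=U}$$

Jacobi iteration: ($D$ is diagonal!)

$$Dx^{(k+1)} = b - (L + U)x^{(k)}$$

Gauss-Seidel iteration: (solve by forward substitution)

$$(D + L)x^{(k+1)} = b - Ux^{(k)}$$

Successive over-relaxation (SOR): (solve by forward substitution)

$$(D + \omega L)x^{(k+1)} = \omega b - \left(\omega U + (\omega - 1)D\right)x^{(k)}$$

Notice: the starting value $x^{(0)}$ has to be chosen in advance.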

Iterative methods: Jacobi, Gauss-Seidel, SOR

All three methods can be written as

$$Mx^{(k+1)} = Nx^{(k)} + b$$

Jacobi: $M = D$, $N = -L - U$;
Gauss-Seidel: $M = D + L$, $N = -U$;
SOR: $M = \frac{1}{\omega}D + L$, $N = \left(\frac{1}{\omega} - 1\right)D - U$.

Notice that $A = M - N$, yielding

$$Ax = b \iff Mx = Nx + b.$$

Subtracting we get

$$M(x - x^{(k+1)}) = N(x - x^{(k)}), \quad \text{or}$$

$$x - x^{(k+1)} = M^{-1}N(x - x^{(k)}) = \dots = (M^{-1}N)^{k+1}(x - x^{(0)}).$$

Convergence: $x^{(k)} \to x \iff (M^{-1}N)^k \to 0 \iff \rho(M^{-1}N) < 1$.


Iterative methods: sufficient conditions for convergence

$\rho(M^{-1}N)$ is the spectral radius of the matrix $M^{-1}N$.

$\rho(M^{-1}N) < 1$ is guaranteed if ...

Jacobi: $A$ is strictly diagonally dominant, i.e.

$$|a_{ii}| > \sum_{j = 1,\ j \neq i}^{n} |a_{ij}|, \quad i = 1, \dots, n.$$

Gauss-Seidel: $A$ is symmetric and positive definite.
SOR: $A$ is symmetric and positive definite, $0 < \omega < 2$.

Remarks:

SOR with $\omega = 1$ is identical with Gauss-Seidel.
SOR with $\omega \neq 1$ can converge better than Gauss-Seidel.


Iterative methods: general structure of algorithms

Iterative methods typically converge only in the limit $k \to \infty$.

Stopping criterion: stop when changes in the solution vector become small, e.g. for sufficiently small $\varepsilon > 0$

$$\frac{|x_i^{(k+1)} - x_i^{(k)}|}{|x_i^{(k)}| + 1} < \varepsilon, \quad i = 1, 2, \dots, n.$$

Usually, it is helpful to prescribe the maximal number of iterations, e.g. maxit = 1000.

initialize x(0), x(1), eps, maxit, it = 1
while ( max_i |x(1)_i - x(0)_i| / (|x(0)_i| + 1) >= eps ) do
    x(0) = x(1)                       ' store prev. iteration in x(0)
    compute x(1) with Jacobi, G-S or SOR
    it = it + 1
    if (it > maxit) stop end if
end while
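The general structure above can be sketched in Python with component-wise sweeps. The SOR sweep is the component form of $(D + \omega L)x^{(k+1)} = \omega b - (\omega U + (\omega - 1)D)x^{(k)}$; the test matrix, the value $\omega = 1.05$ and the function names are assumptions of this example.

```python
def jacobi_step(a, b, x):
    """One Jacobi sweep: only values from the previous iterate are used."""
    n = len(b)
    return [(b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)]

def sor_step(a, b, x, omega):
    """One SOR sweep (omega = 1 gives Gauss-Seidel); new values are used at once."""
    x = x[:]
    n = len(b)
    for i in range(n):
        s = sum(a[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / a[i][i]
    return x

def iterate(step, x0, eps=1e-10, maxit=1000):
    x_old = x0
    for it in range(1, maxit + 1):
        x_new = step(x_old)
        change = max(abs(xn - xo) / (abs(xo) + 1.0) for xn, xo in zip(x_new, x_old))
        if change < eps:                  # stopping criterion from the slide
            return x_new, it
        x_old = x_new
    return x_new, maxit

a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]   # SPD, strictly diag. dominant
b = [6.0, 12.0, 14.0]                                      # exact solution (1, 2, 3)
x_jac, it_jac = iterate(lambda x: jacobi_step(a, b, x), [0.0, 0.0, 0.0])
x_gs, it_gs = iterate(lambda x: sor_step(a, b, x, 1.0), [0.0, 0.0, 0.0])
x_sor, it_sor = iterate(lambda x: sor_step(a, b, x, 1.05), [0.0, 0.0, 0.0])
```

For this matrix all three sufficient conditions hold, so every variant converges; Gauss-Seidel should need fewer sweeps than Jacobi.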


Lecture 6

Numerical methods for nonlinear equations

Find $x$ satisfying $f(x) = 0$

Scalar nonlinear equations (1D): $f(x) = 0$
  Bracketing
  Bisection
  Fixed point iteration
  Newton's method

Systems of nonlinear equations (nD): $f_1(x_1, \dots, x_n) = 0, \ \dots, \ f_n(x_1, \dots, x_n) = 0$
  Fixed point methods (Jacobi, G-S, SOR)
  Newton's method
  Quasi-Newton methods
  Other variations of Newton's method


Scalar nonlinear equations: Introduction

Problem
Find $x$ satisfying $f(x) = 0$.

Such $x$ is called a root or a zero of $f$.

Some equations can be solved analytically:

$$1.45\,e^{-0.2x} + 0.1 = 3 \iff \underbrace{e^{-0.2x} - 2}_{= f(x)} = 0 \iff x = -5 \ln 2.$$

A rough estimate can be obtained just by plotting:

[Figure: plot of $f(x) = 0.1x - \sin(x)/\ln(x + 2)$ for $0 \leq x \leq 12$]

Four roots on $[0, 10]$: $x_1 = 0$, $x_2 \approx 3$, $x_3 \approx 7$, $x_4 \approx 8.5$.


Scalar nonlinear equations: Bracketing

Bracketing
Aim: construct subintervals of $[a, b]$ likely to contain zeros.

[Figure: plot of $f(x) = 0.1x - \sin(x)/\ln(x + 2)$ for $0 \leq x \leq 12$ with the grid points marked]

Idea: Check the sign of $f(x)$ in the grid points.

Algorithm (Bracketing)
Initialize a, b, N;                   ' N = number of subintervals
h = |b - a| / N; c = a;               ' h = step size
for i = 1...N do
    d = c + h;
    if (sign f(c) ≠ sign f(d)) then save [c, d] end if
    c = d;
end for

+ Simple, allows one to localize zeros
− Very expensive as a global search $\to$ local search!


Scalar nonlinear equations: Bisection

Bisection

Aim: find a zero of $f(x)$ on $[a, b]$

Algorithm (Bisection)
Initialize a, b, eps;                 ' eps = tolerance
if (sign f(a) ≠ sign f(b)) then
    h = |b - a|;
    while (h > eps) do
        c = a + h/2;
        if (sign f(a) ≠ sign f(c)) then
            b = c;                    ' zero is in [a, c]
        else
            a = c;                    ' zero is in [c, b]
        end if
        h = |b - a|;
    end while
end if

[Figure: plot of $f(x)$ for $0 \leq x \leq 5$ with the points $a$, $b$, $c$ and the values $f(a)$, $f(c)$, $f(b)$ marked]
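A sketch of the bisection algorithm, applied to the function $f(x) = 0.1x - \sin(x)/\ln(x + 2)$ from the introduction. The bracket $[2, 4]$ around the root near 3 and the tolerance are choices made for this example.

```python
import math

def f(x):
    return 0.1 * x - math.sin(x) / math.log(x + 2.0)

def bisection(func, a, b, eps=1e-10):
    fa, fb = func(a), func(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while abs(b - a) > eps:
        c = a + 0.5 * (b - a)
        fc = func(c)
        if fa * fc <= 0:      # zero is in [a, c]
            b = c
        else:                 # zero is in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

root = bisection(f, 2.0, 4.0)
```

The interval is halved in every step, so reaching a tolerance $\varepsilon$ takes about $\log_2(|b - a|/\varepsilon)$ function evaluations.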


Scalar nonlinear equations: Fixed point iteration

Idea: find a fixed point instead of a zero
Find $x$ satisfying $x = g(x)$.

Such $x$ is called a fixed point of $g$.

Notice: $f(x) = 0 \iff x = g(x)$ for $g(x) = x + f(x)$.

Fixed point iteration
Given $x_0$, compute $x_{k+1} = g(x_k)$, $k = 0, 1, 2, \dots$

Algorithm (Fixed point iteration)
Initialize x0, y, eps                 ' x0 is a starting value
do
    y = x0;
    x0 = g(y);
while ( |x0 - y| / (1 + |y|) > eps );


Scalar nonlinear equations: Fixed point iteration (2)

What can happen?

Convergence
Divergence
Convergence not to the nearest fixed point
Oscillation

Example: $f(x) = x - x^{1.4} + 0.2$

[Figure: two panels on $0 \leq x \leq 3.5$, each showing the iteration function together with the line $y = x$; one panel for $g_1$, one for $g_2$]

$g_1(x) = x^{1.4} - 0.2$

$g_2(x) = (x + 0.2)^{5/7}$
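The different behaviour of $g_1$ and $g_2$ can be observed numerically. In the sketch below the starting value 1.5, the tolerance and the iteration cap are choices made for this example; both functions have the same fixed point $x^* \approx 1.4$.

```python
def fixed_point(g, x0, eps=1e-12, maxit=1000):
    for it in range(1, maxit + 1):
        y = x0
        x0 = g(y)
        if abs(x0 - y) / (1.0 + abs(y)) <= eps:
            return x0, it
    raise RuntimeError("no convergence within maxit iterations")

g1 = lambda x: x ** 1.4 - 0.2
g2 = lambda x: (x + 0.2) ** (5.0 / 7.0)

x_star, iterations = fixed_point(g2, 1.5)    # converges: |g2'(x*)| < 1

# g1 has the same fixed point, but |g1'(x*)| > 1: the iterates move away from it.
x = 1.5
for _ in range(10):
    x = g1(x)
drift = abs(x - x_star)
```

After ten steps with $g_1$ the iterate is further from $x^*$ than the starting value was, while the iteration with $g_2$ settles at $x^*$.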


Scalar nonlinear equations: Fixed point iteration (3)

Theorem (a sufficient condition)
Suppose
  the interval $[a, b]$ contains a fixed point $x^*$ of $g(x)$,
  $|g'(x)| < 1$ for any $x \in [a, b]$,

then the fixed point iteration $x_{k+1} = g(x_k)$ converges to $x^*$ for any starting value $x_0 \in [a, b]$.


Scalar nonlinear equations: Newton's method

Newton's method:

Aim: find a zero of $f(x)$.

Taylor expansion:

$$f(x + h) = f(x) + f'(x)h + R$$

Idea: Given $x$, find $h$ such that $f(x + h) = 0$.

Neglecting the remainder: $h \approx -\frac{f(x)}{f'(x)}$.

Newton iteration

Given $x_0$, compute $x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$, $k = 0, 1, 2, \dots$


Scalar nonlinear equations: Newton's method

Algorithm (Newton's method)
Initialize x0, eps
do
    y = x0;
    fval = f(y);
    dfval = f'(y);
    x0 = y - fval/dfval;
while ( |x0 - y| / (1 + |y|) > eps )

Theorem (quadratic convergence)

Suppose
  $f'(x) \neq 0$,
  $f''(x)$ is finite,
  $x_0$ is "sufficiently close" to the root $x^*$.
Then

$$|x^* - x_{k+1}| \leq C\,|x^* - x_k|^2.$$
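A sketch of Newton's method, applied to $f(x) = e^{-0.2x} - 2$ with the known root $x = -5\ln 2$. The starting value, the tolerance and the iteration cap are assumptions of this example; the derivative is supplied analytically.

```python
import math

def newton(f, df, x0, eps=1e-12, maxit=50):
    history = [x0]
    for _ in range(maxit):
        y = x0
        x0 = y - f(y) / df(y)
        history.append(x0)
        if abs(x0 - y) / (1.0 + abs(y)) <= eps:
            return x0, history
    raise RuntimeError("no convergence within maxit iterations")

f = lambda x: math.exp(-0.2 * x) - 2.0
df = lambda x: -0.2 * math.exp(-0.2 * x)

root, history = newton(f, df, 0.0)
exact = -5.0 * math.log(2.0)
errors = [abs(x - exact) for x in history]   # roughly squares in each step near the root
```

Inspecting `errors` shows the number of correct digits roughly doubling per step once the iterates are close to the root.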


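Systems of nonlinear equations

$$F(x) = 0 \iff \begin{cases} f_1(x_1, \dots, x_n) = 0, \\ \quad\dots \\ f_n(x_1, \dots, x_n) = 0, \end{cases}$$

where at least one of $f_1, \dots, f_n$ is nonlinear.

For such $F$ we define the Jacobi matrix

$$\nabla F(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \dots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}.$$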


Systems of nonlinear equations: Fixed point methods

Rewrite $F(x) = 0$ as $x = G(x)$, such that

$$\begin{cases} x_1 = g_1(x_2, \dots, x_n) \\ \quad\dots \\ x_i = g_i(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) \\ \quad\dots \\ x_n = g_n(x_1, \dots, x_{n-1}) \end{cases}$$

Jacobi iteration:

$$x_i^{(k+1)} = g_i(x_1^{(k)}, \dots, x_{i-1}^{(k)}, x_{i+1}^{(k)}, \dots, x_n^{(k)}), \quad i = 1, 2, \dots, n$$

Gauss-Seidel iteration:

$$x_i^{(k+1)} = g_i(x_1^{(k+1)}, \dots, x_{i-1}^{(k+1)}, x_{i+1}^{(k)}, \dots, x_n^{(k)}), \quad i = 1, 2, \dots, n$$

Stopping criterion:

$$\max_{i = 1, \dots, n} \left(\frac{|x_i^{(k+1)} - x_i^{(k)}|}{1 + |x_i^{(k)}|}\right) < \varepsilon.$$


Systems of nonlinear equations: Newton's method

The idea is similar to the scalar case.

$$F(x + h) = F(x) + \nabla F(x)h + R$$

Find $h \in \mathbb{R}^n$ such that $F(x + h) = 0$:

$$h \approx -\nabla F(x)^{-1}F(x)$$

Newton iteration for systems

Given $x^{(0)}$, compute $x^{(k+1)} = x^{(k)} - \nabla F(x^{(k)})^{-1}F(x^{(k)})$

Algorithm (Newton's method for systems)

Initialize x(0), eps
do
    y = x(0); b = -F(y); A = ∇F(y);
    find h satisfying A h = b;
    x(0) = y + h;
while ( max_{i=1...n} |h_i| / (1 + |y_i|) > eps )


Systems of nonlinear equations: Newton's method (2)

Under certain assumptions on $F(x)$ Newton's method converges quadratically:

$$\|x^* - x^{(k+1)}\| \leq C\,\|x^* - x^{(k)}\|^2$$

Comments:
Newton's method involves
  evaluation of $n^2$ partial derivatives,
  solution of a linear system as a building block: $O(n^3)$.

If $A = \nabla F$ is sparse and not difficult to evaluate, Newton's method gets very efficient due to quadratic convergence.

Quasi-Newton: Instead of evaluating the Jacobi matrix, compute inexpensive approximations to it and keep updating them.
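Newton's method for systems can be illustrated on a small example with two unknowns. In the sketch below the linear system $Ah = b$ is solved by Cramer's rule, which is only sensible for the $2 \times 2$ case; the test system, the starting value and the function names are assumptions of this example.

```python
def newton_system_2d(F, J, x, eps=1e-12, maxit=50):
    for it in range(1, maxit + 1):
        f1, f2 = F(x)
        (a11, a12), (a21, a22) = J(x)
        det = a11 * a22 - a12 * a21
        # solve A h = -F(x) for the 2 x 2 case (Cramer's rule)
        h = ((-f1 * a22 + f2 * a12) / det, (-a11 * f2 + a21 * f1) / det)
        y, x = x, (x[0] + h[0], x[1] + h[1])
        if max(abs(hi) / (1.0 + abs(yi)) for hi, yi in zip(h, y)) <= eps:
            return x, it
    raise RuntimeError("no convergence within maxit iterations")

# Example system: x1^2 + x2^2 = 4 and x1 * x2 = 1
F = lambda x: (x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0)
J = lambda x: ((2.0 * x[0], 2.0 * x[1]), (x[1], x[0]))   # Jacobi matrix of F

solution, n_iter = newton_system_2d(F, J, (2.0, 0.5))
residual = F(solution)
```

For larger $n$ the linear solve would be replaced by one of the direct or iterative methods for $Ax = b$.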


Systems of nonlinear equations: Broyden's method

Broyden's method

Given $x^{(0)}$, compute $x^{(k+1)} = x^{(k)} - (B^{(k)})^{-1}F(x^{(k)})$

where $B^{(k+1)}$ is the "minimal" variation of $B^{(k)}$:

$$B^{(k+1)} = B^{(k)} + \frac{\left(F(x^{(k+1)}) - F(x^{(k)}) - B^{(k)}h\right)h^\top}{h^\top h}$$

and $h$ is the solution of $B^{(k)}h = -F(x^{(k)})$.

Properties:
+ Fast rank 1 update in only $O(n^2)$ operations
+ Direct update of $(B^{(k)})^{-1}$ is possible at the same cost
− Quadratic convergence is typically destroyed
$\Rightarrow$ Faster iterations, but usually more iterations are needed.


Other variations of Newton's method

Potential problem: if the starting point is too far from the solution, Newton's method may not converge.

Reason: the direction and the length of the step are unreliable.

Alternatives:

Damped Newton method:

$$x^{(k+1)} = x^{(k)} + \alpha^{(k)}h^{(k)}$$

where $h^{(k)}$ is the regular Newton step, and $\alpha^{(k)}$ is a number. Typically $0 < \alpha^{(k)} < 1$ far from the solution and $\alpha^{(k)} = 1$ close to the solution ($\alpha^{(k)}$ may be linked to $\|F(x^{(k)})\|$).

Trust region: Estimate the radius of the region where Newton's step is constrained to stay.

Solution by minimization: Minimize $\|F(x)\|^2$ instead,

$$\min_x \frac{1}{2}F(x)^\top F(x) \quad \text{(optimization problem)}.$$


Lecture 7

Numerical optimization

Overview
Unconstrained optimization in 1D
  Golden section search
  Newton's method
Unconstrained optimization in multiple dimensions (nD)
  Gradient descent method
  Conjugate Gradient method ($\to$ linear systems of equations)
  Fletcher-Reeves method
  Newton's method
  Broyden-Fletcher-Goldfarb-Shanno (BFGS)
Equality/inequality constrained nonlinear optimization $\to$ next Lecture
Linear Programming, Least Squares $\to$ next Lecture


Optimization: Overview

An optimization problem

$$\underset{x}{\text{minimize}} \ f(x)$$

where

$f$ is the objective function (typically real-valued),
$x$ are the decision variables (possibly constrained).

Notice: "minimize" $\leftrightarrow$ "maximize" if $f \leftrightarrow (-f)$

Typical example: minimize average losses or variability of returns $f$ with given (limited) resources $x$.


Optimization: Overview (2)

Definition
An optimization problem is the task of finding $x^*$ such that

$$f(x^*) = \min \left\{ f(x) \ : \ f_i(x) \leq 0,\ i = 1, \dots, p; \quad h_j(x) = 0,\ j = 1, \dots, m \right\}$$

where

$x = (x_1, \dots, x_n)^\top$ are the optimization/decision variables (possibly constrained),
$f$ is the objective function (real-valued),
$f_i$ and $h_j$ are constraint functions,
$x^*$ is the sought optimum (solution).


Optimization: Overview (3)

Main categories:

Unconstrained optimisation problems: $p = 0$, $m = 0$;

Linear program: $f$, $f_i$, $h_j$ are linear (i.e. $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$) or, more generally, affine (i.e. linear + constant);

Quadratic programming: $f$ is quadratic, $f_i$, $h_j$ are affine;

Convex optimisation problems: $f$, $f_i$ are convex (i.e. $f(\alpha x + \beta y) \leq \alpha f(x) + \beta f(y)$ for $\alpha, \beta > 0$, $\alpha + \beta = 1$) and $h_j$ are affine.

Remarks on convex optimisation:
A local optimum is also a global optimum!

If unconstrained, the optimality condition is $\nabla f(x) = 0$.

Equality constraints typically can be eliminated (simplify before solving!)


Optimization: Overview (4)

Definition
A solution method is an algorithm that computes a solution of the underlying problem (to some accuracy).

Methodology and typical difficulties:
  Transform the problem into an easier one (standard form)
  Starting values
  Algorithm: does it find a solution?
  Is the solution a local or a global optimum?
  Rate of convergence?


Unconstrained optimization in 1d

Golden section search

Applies to a unimodal function $f(x)$ in the interval $x \in [a, b]$, i.e. a function $f$ for which there exists $x^* \in [a, b]$ such that
  $f$ is monotonically decreasing on $[a, x^*]$,
  $f$ is monotonically increasing on $[x^*, b]$.

$\Rightarrow f(x) > f(x^*)$ if $x \neq x^*$.

Assume $a < x_1 < x_2 < b$, then

(i) $f(x_1) \geq f(x_2) \Rightarrow x^* \in [x_1, b]$,
(ii) $f(x_1) < f(x_2) \Rightarrow x^* \in [a, x_2]$.

Therefore $x^* \in [a_1, b_1]$ where

$$[a_1, b_1] := \begin{cases} [x_1, b], & \text{in case (i)}, \\ [a, x_2], & \text{in case (ii)}. \end{cases}$$

Notice that $[a_1, b_1]$ is strictly smaller than $[a, b]$ $\to$ iterate!

Unconstrained optimization in 1d: Golden section (2)

Question: How to choose $x_1$, $x_2$?

Consider intervals with lengths reduced by a constant ratio:

$$\tau|b - a| = |b_1 - a_1|, \quad \tau|b_1 - a_1| = |b_2 - a_2|, \quad \dots$$

After two steps we may obtain the following configuration:

[Figure: interval $[a, b]$ with interior points $x_1 < x_2$ and segment lengths $A$, $B$, $C$; here $[a_1, b_1] = [x_1, b]$ and $[a_2, b_2] = [x_2, b]$]

To obtain $\tau$ we observe

$$B = A + C, \quad \frac{A}{B} = \frac{C}{A} =: \tau \quad \Rightarrow \quad \tau^2 = 1 - \tau \quad \Rightarrow \quad \tau = \frac{\sqrt{5} - 1}{2} \approx 0.618.$$

Notice that by symmetry of the refinement scheme, other possible configurations will yield the same value $\tau$.


Newton's method

Idea: For a given $x$, approximate $f$ by a quadratic local model (function).

$$f(x + h) \approx g_x(h) = \underbrace{f(x) + f'(x)h + \frac{1}{2}f''(x)h^2}_{\text{local model}}$$

Notice: $g_x$ achieves its minimum at $h$ such that $g_x'(h) = 0$

$$\Rightarrow \ f'(x) + f''(x)h = 0 \iff h = -\frac{f'(x)}{f''(x)}$$

Newton iteration (optimization)

Given $x_0$, compute $x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$, $k = 0, 1, 2, \dots$

Another interpretation: finding the roots of $f'(x) = 0$.

Unconstrained optimization in multiple dimensions

Unconstrained optimization

Find $x^* \in \mathbb{R}^n$ such that $x^* = \operatorname{argmin}_x f(x)$, $x = (x_1, \dots, x_n)^\top$.

Iteration: $x^{(k+1)} = x^{(k)} + \lambda_k h^{(k)}$, $k = 0, 1, 2, \dots$
Question: how to choose $\lambda_k$ and $h^{(k)}$?

Recall the definition of the gradient: $\nabla f(x) = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right)^\top$.


Unconstrained optimization in multiple dimensions

Gradient descent (steepest descent) method:

Observation: $f(x)$ decays at $x^{(k)}$ the fastest in the direction

$$h_g^{(k)} := -\nabla f(x^{(k)})$$

Thus, if $\lambda_k$ is small enough,

$$f(x^{(k+1)}) \leq f(x^{(k)}) \quad \text{for } x^{(k+1)} = x^{(k)} + \lambda_k h_g^{(k)}$$

Notice:
  $\lambda_k$ can change every iteration,
  $\lambda_k$ is a real number, therefore finding the optimal $\lambda_k$ can be realised by unconstrained optimisation algorithms in 1D:

$$\lambda_k := \operatorname{argmin}_\lambda f(x^{(k)} + \lambda h^{(k)})$$

Benefits/drawbacks:
+ Simple
− Convergence may be very slow ($\to$ zigzag path)
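A sketch of gradient descent in which the step length $\lambda_k$ is found by a one-dimensional golden section search. The search interval $[0, 1]$ for $\lambda$, the quadratic test function and the tolerances are assumptions of this example.

```python
import math

def golden_section(phi, a, b, eps=1e-10):
    """Simple golden section search (re-evaluates both interior points)."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > eps:
        x1, x2 = b - tau * (b - a), a + tau * (b - a)
        if phi(x1) >= phi(x2):
            a = x1
        else:
            b = x2
    return 0.5 * (a + b)

def gradient_descent(f, grad, x, tol=1e-6, maxit=10_000):
    for it in range(maxit):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            return x, it
        h = [-gi for gi in g]                         # steepest descent direction
        lam = golden_section(
            lambda l: f([xi + l * hi for xi, hi in zip(x, h)]), 0.0, 1.0)
        x = [xi + lam * hi for xi, hi in zip(x, h)]
    return x, maxit

f = lambda x: 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)    # elongated quadratic bowl
grad = lambda x: [x[0], 10.0 * x[1]]
x_min, n_iter = gradient_descent(f, grad, [10.0, 1.0])
```

Even for this two-dimensional quadratic the method needs dozens of iterations, which illustrates the zigzag behaviour.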


Unconstrained optimization: CG

Conjugate Gradient method (CG):

Basic ideas:
  Choose "sufficiently distinct" descent directions $h^{(k)}$.
  The notion "sufficiently distinct" should depend on $f$.

Consider a special case: a quadratic unconstrained minimization problem with the functional

$$f(x) = \frac{1}{2}x^\top Ax - x^\top b,$$

where $A$ is symmetric ($A = A^\top$) and positive definite ($x^\top Ax > 0$ if $x \neq 0$).

Notice that $\nabla f(x) = 0 \iff Ax - b = 0 \iff Ax = b$
(write $\nabla f(x)$ carefully in components to see this!)

The Conjugate Gradient Method is (also) an iterative method for the solution of linear systems of equations.


Unconstrained optimization: CG (2)

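Define the “residual” $r_k = b - Ax^{(k)}$;
notice that $r_k = -\nabla f(x^{(k)}) = h_g^{(k)}$, which is the gradient descent direction!

Instead of $h_g^{(k)}$ we choose conjugate directions $p_k$:

$$p_i^\top A p_j = 0 \ \text{ if } i \ne j, \quad \text{and} \quad \operatorname{span}\{p_0, \dots, p_k\} = \operatorname{span}\{r_0, \dots, r_k\}.$$

Once $p_k$ is known, define $\alpha_k$ by the line search (exact solution)

$$\alpha_k = \arg\min_{\alpha} f(x^{(k)} + \alpha p_k) = \frac{r_k^\top p_k}{p_k^\top A p_k}.$$

Go to the next iteration

$$x^{(k+1)} = x^{(k)} + \alpha_k p_k.$$

Convergence properties:
• CG converges in at most $n$ steps in exact arithmetic.
• $\|x^{(k)} - x^*\|_A \le 2 \left( \dfrac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k \|x^{(0)} - x^*\|_A$, where $\kappa(A)$ is the condition number of $A$ (→ preconditioning).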
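A minimal sketch of CG for a small symmetric positive definite system follows. The slides do not say how the conjugate directions $p_k$ are generated; the code assumes the standard recurrence $p_{k+1} = r_{k+1} + \beta_k p_k$ with $\beta_k = r_{k+1}^\top r_{k+1} / r_k^\top r_k$. The helper names and the 2×2 test system are hypothetical.

```python
def matvec(A, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, tol=1e-12):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # residual r_0 = b - A x_0 with x_0 = 0
    p = list(r)                 # first direction = gradient descent direction
    rr = dot(r, r)
    for _ in range(n):          # at most n steps in exact arithmetic
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = dot(r, p) / dot(p, Ap)          # exact line search
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr                      # assumed standard CG recurrence
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Hypothetical test system; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)
```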


Unconstrained optimization: nonlinear CG

The Fletcher-Reeves method is the nonlinear CG method for a general nonlinear $f$.

Necessary modifications:
• set $r_k = -\nabla f(x^{(k)})$ instead of $r_k = b - Ax^{(k)}$;
• $\alpha_k$ should be determined by a numerical line search

$$\alpha_k = \arg\min_{\alpha} f(x^{(k)} + \alpha p_k).$$

Convergence results for the Fletcher-Reeves method are weaker than for CG.


Unconstrained optimization: Newton’s method

Newton’s method

Interpretation via local quadratic approximation:

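$$f(x + h) \approx g_x(h) = f(x) + h^\top \nabla f(x) + \frac{1}{2} h^\top \nabla^2 f(x) h$$

$$\nabla g_x(h) = 0 \ \Leftrightarrow \ \nabla f(x) + \nabla^2 f(x) h = 0 \ \Leftrightarrow \ h = -(\nabla^2 f(x))^{-1} \nabla f(x)$$

Newton iteration (optimization)

Given $x^{(0)}$, compute $x^{(k+1)} = x^{(k)} + \alpha^{(k)} h_N^{(k)}$, where

$$h_N^{(k)} = -(\nabla^2 f(x^{(k)}))^{-1} \nabla f(x^{(k)}), \quad k = 0, 1, \dots$$

Interpretation via systems of nonlinear equations:

$$\min_x f(x) \ \Leftrightarrow \ \nabla f(x) = 0.$$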
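As an illustration of the Newton iteration, here is a minimal two-dimensional sketch with a fixed step $\alpha^{(k)} = 1$. The test function $f(x, y) = e^{x+y} + x^2 + y^2$, the function names and the tolerances are assumptions made for this example; the 2×2 linear system is solved by Cramer's rule, which only works in two dimensions.

```python
import math

def newton_minimise(grad, hess, x0, alpha=1.0, tol=1e-12, max_iter=50):
    """Newton's method in 2D: solve hess * h = -grad, then x <- x + alpha * h."""
    x, y = x0
    for _ in range(max_iter):
        g1, g2 = grad(x, y)
        if math.hypot(g1, g2) < tol:
            break
        (a, b), (c, d) = hess(x, y)
        det = a * d - b * c
        # Cramer's rule for the 2x2 system H h = -g
        h1 = (-g1 * d + b * g2) / det
        h2 = (-a * g2 + c * g1) / det
        x, y = x + alpha * h1, y + alpha * h2
    return x, y

# Hypothetical strictly convex test function f(x, y) = exp(x + y) + x^2 + y^2
def grad(x, y):
    e = math.exp(x + y)
    return e + 2.0 * x, e + 2.0 * y

def hess(x, y):
    e = math.exp(x + y)
    return (e + 2.0, e), (e, e + 2.0)

x_star, y_star = newton_minimise(grad, hess, (0.0, 0.0))
print(x_star, y_star)
```

By symmetry the minimiser satisfies $x = y = t$ with $e^{2t} + 2t = 0$, i.e. $t \approx -0.2835716$, which the iteration reaches in a handful of steps.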


Unconstrained optimization: Broyden-Fletcher-Goldfarb-Shanno method (BFGS)

BFGS is a Quasi-Newton Method:

Use a simple-to-calculate matrix $B^{(k)}$ instead of $\nabla^2 f(x^{(k)})$.

Algorithm (BFGS)

1. solve $B^{(k)} h^{(k)} = -\nabla f(x^{(k)})$
2. $s = \alpha h^{(k)}$ (compute $\alpha$ with the line search)
3. $x^{(k+1)} = x^{(k)} + s$
4. $y = \nabla f(x^{(k+1)}) - \nabla f(x^{(k)})$
5. update $B^{(k+1)} = B^{(k)} + U$

with the update matrix

$$U = \frac{y y^\top}{y^\top s} - \frac{(B^{(k)} s)(B^{(k)} s)^\top}{s^\top B^{(k)} s}.$$

BFGS is similar to Broyden’s method for nonlinear systems of equations.
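The five steps of the algorithm can be sketched as follows. Several details are assumptions not fixed by the slides: $B^{(0)} = I$, a simple backtracking (Armijo) line search for $\alpha$, skipping the update when $y^\top s$ is not safely positive, and the hypothetical test function $f(x, y) = e^{x+y} + x^2 + y^2$.

```python
import math

def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    T = [row[:] + [r] for row, r in zip(M, rhs)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(T[r][col]))
        T[col], T[piv] = T[piv], T[col]
        for r in range(col + 1, n):
            factor = T[r][col] / T[col][col]
            for c in range(col, n + 1):
                T[r][c] -= factor * T[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (T[i][n] - sum(T[i][j] * z[j] for j in range(i + 1, n))) / T[i][i]
    return z

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    n = len(x0)
    x, g = list(x0), grad(x0)
    B = [[float(i == j) for j in range(n)] for i in range(n)]   # assumed B^(0) = I
    for _ in range(max_iter):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        h = solve_linear(B, [-gi for gi in g])                   # step 1
        slope = sum(gi * hi for gi, hi in zip(g, h))
        alpha, fx = 1.0, f(x)
        for _halving in range(40):                               # backtracking line search
            trial = [xi + alpha * hi for xi, hi in zip(x, h)]
            if f(trial) <= fx + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        s = [alpha * hi for hi in h]                             # step 2
        x_new = [xi + si for xi, si in zip(x, s)]                # step 3
        g_new = grad(x_new)
        y = [gn - go for gn, go in zip(g_new, g)]                # step 4
        ys = sum(yi * si for yi, si in zip(y, s))
        if ys > 1e-14:                                           # step 5 (skipped if y^T s is tiny)
            Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
            sBs = sum(si * bi for si, bi in zip(s, Bs))
            B = [[B[i][j] + y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs
                  for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x

f = lambda v: math.exp(v[0] + v[1]) + v[0] ** 2 + v[1] ** 2
grad = lambda v: [math.exp(v[0] + v[1]) + 2.0 * v[0], math.exp(v[0] + v[1]) + 2.0 * v[1]]
x_bfgs = bfgs(f, grad, [1.0, 0.0])
print(x_bfgs)
```

Only gradients are evaluated; the Hessian never appears, which is the point of the quasi-Newton approach.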


Lecture 8

Numerical optimization and calibration

• Equality/inequality constrained nonlinear optimization
  - two examples and some theory
• Direct search methods: Nelder-Mead method
• Linear Programming: Simplex method
• Calibration of problem parameters
  - calibration of the volatility in the Black-Scholes model
• Least Squares method
  - Choosing weights
  - Linear Least Squares (→ Cholesky, QR-decomposition, SVD)
  - Nonlinear Least Squares (Newton, Gauss-Newton, Levenberg-Marquardt)
  - Tikhonov Regularization


Nonlinear constrained optimization

An example: Markowitz’ portfolio optimization problem

Given:
• money to be invested in $n$ securities (stocks, bonds, etc.)
• the expected return $\mu_i$ of security $i$
• the standard deviation $\sigma_i$ of the return of security $i$
• the correlation $\rho_{ij}$ of two securities $i$ and $j$

Find the optimal portfolio $x = (x_1, \dots, x_n)$ with
• the smallest variance of the total return
• and at least a target value $R$ of the total expected return

This means:

$$x_1 + \dots + x_n = 1 \quad (x_i \text{ is the proportion of the total amount})$$

$$x_i \ge 0 \quad (\text{only investment is allowed})$$

$$E[x] = \mu_1 x_1 + \dots + \mu_n x_n = \mu^\top x \ge R$$

$$\mathrm{Var}[x] = \sum_{i,j} \underbrace{\rho_{ij} \sigma_i \sigma_j}_{= Q_{ij}} x_i x_j = x^\top Q x \to \min$$

Standard form: Find $x^* = \arg\min_x x^\top Q x$ subject to the constraints

$$e^\top x = 1, \qquad \mu^\top x \ge R, \qquad x \ge 0,$$

where $e^\top = (1, \dots, 1)$.


Nonlinear constrained optimization

Constrained optimization:

$$f(x^*) = \min_{\substack{f_i(x) \le 0,\ i = 1, \dots, p \\ h_j(x) = 0,\ j = 1, \dots, m}} f(x)$$

Introduce new variables $\lambda_i$, $\mu_j$ and the Lagrange functional

$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{p} \lambda_i f_i(x) + \sum_{j=1}^{m} \mu_j h_j(x)$$

Theorem (necessary conditions)
If $x^*$ is a solution of the optimization problem, then there exist $\lambda_i$, $\mu_j$ such that

$$\frac{\partial}{\partial x_\ell} L(x, \lambda, \mu) = 0 \ \text{ at } x = x^* \quad \text{(system of nonlinear equations)}$$

$$\lambda_i \ge 0, \quad f_i(x^*) \le 0, \quad \lambda_i f_i(x^*) = 0 \quad \text{(Karush-Kuhn-Tucker conditions)}$$

$$h_j(x^*) = 0$$

Numerical methods: active set methods, interior point methods; see also: augmented Lagrangian, penalty methods.


Constrained optimization: elimination

Notice: equality constraints can frequently be eliminated!

constrained problem → unconstrained problem

Example: Markowitz’ problem with two securities

$$x^* = \arg\min_{x_1 + x_2 = 1} f(x), \qquad f(x_1, x_2) = x_1^2 + \frac{1}{4} x_2^2$$

Then

$$f(x^*) = \min_{x_2} f(1 - x_2, x_2) = \min_{x_2} g(x_2), \qquad g(x_2) = \frac{5}{4} x_2^2 - 2 x_2 + 1$$

Steps:
1. Eliminate constraints
2. Solve the unconstrained optimization problem for $g(x_2)$: $x_2^* = \frac{4}{5}$
3. Find $x_1^* = 1 - x_2^* = \frac{1}{5}$
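For completeness, the minimiser in the second step follows from the first-order condition for the one-dimensional function $g$; a short worked check, using only the quantities defined in this example:

```latex
\begin{align*}
g'(x_2) &= \tfrac{5}{2} x_2 - 2 = 0
  \quad\Rightarrow\quad x_2^* = \tfrac{4}{5},
  \qquad g''(x_2) = \tfrac{5}{2} > 0 \ \text{(a minimum)}, \\
f(x^*) &= g\!\left(\tfrac{4}{5}\right)
  = \tfrac{5}{4} \cdot \tfrac{16}{25} - \tfrac{8}{5} + 1 = \tfrac{1}{5}
  = \left(\tfrac{1}{5}\right)^2 + \tfrac{1}{4} \left(\tfrac{4}{5}\right)^2 .
\end{align*}
```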


Direct search methods

Gradient-based methods are very efficient if $f$ is nicely behaved. In practice, alternative methods might be better, e.g. if
• derivatives of $f(x)$ are not available or do not exist;
• evaluation of $f(x)$ and its derivatives is very expensive;
• the values $f(x)$ are noisy;
• not the exact optimum $x^*$ with $f(x^*) \le f(x)$ is sought, but only an improvement of the current state, i.e. $x^{(1)}$ with $f(x^{(1)}) \le f(x^{(0)})$.

We discuss a variant of the Nelder-Mead method (also known as the downhill simplex method; despite the name, it is a different algorithm from the simplex method for linear programming):

Ideas and ingredients:
a. $f(x)$ is evaluated at the $n + 1$ vertices $x_i$ of an $n$-dimensional simplex.
b. Based on $f(x_i)$, the simplex evolves towards $\min_x f(x)$.
c. Basic rule: relocate the worst vertex to a better position.


Nelder-Mead algorithm

1) Given the starting simplex $(x^{(1)}, \dots, x^{(n+1)})$, evaluate $f^{(i)} = f(x^{(i)})$ and reorder the vertices so that
   $f^{(1)} \le f^{(2)} \le \dots \le f^{(n+1)}$. Aim: relocate $x^{(n+1)}$!

2) Compute the mean of all vertices except the last: $\bar{x} = \dfrac{x^{(1)} + \dots + x^{(n)}}{n}$.

3) Reflect $x^{(n+1)}$ through the mean: $x^{(R)} = (1 + \rho)\bar{x} - \rho x^{(n+1)}$.
   Set $f^{(R)} = f(x^{(R)})$ and check the following cases:

   a. if $f^{(1)} \le f^{(R)} < f^{(n)}$: replace $x^{(n+1)}$ by $x^{(R)}$; go to Step 1) (“reflection”)
   b. if $f^{(R)} < f^{(1)}$: replace $x^{(n+1)}$ by $x^{(E)} = (1 + \rho) x^{(R)} - \rho \bar{x}$; go to Step 1) (“expansion”)
   c. if $f^{(n)} \le f^{(R)} \le f^{(n+1)}$: set $x^{(C)} = (1 + \kappa\rho)\bar{x} - \kappa\rho x^{(n+1)}$ (“out-contraction”)
   d. if $f^{(R)} > f^{(n+1)}$: set $x^{(C)} = (1 - \kappa\rho)\bar{x} + \kappa\rho x^{(n+1)}$ (“in-contraction”)
   e. if $f^{(C)} < f^{(n+1)}$: replace $x^{(n+1)}$ by $x^{(C)}$; go to Step 1);
      else “shrink the simplex”: $x^{(i)} = x^{(1)} + (x^{(i)} - x^{(1)})/\sigma$, $i = 2, \dots, n + 1$; go to Step 1).

Typical parameter values: $\rho = 1$, $\kappa = 0.5$, $\sigma = 2$.
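The steps above can be sketched as follows. Three things are assumptions of this sketch and not part of the algorithm as stated: the contraction and shrink parameters are called `kappa` and `sigma`, the loop stops when the spread of the vertex values drops below `tol`, and the expansion point is accepted only if it improves on the reflection point (a common safeguard). The test function and starting simplex are hypothetical.

```python
def nelder_mead(f, simplex, rho=1.0, kappa=0.5, sigma=2.0, tol=1e-12, max_iter=5000):
    pts = [list(p) for p in simplex]
    n = len(pts) - 1
    for _ in range(max_iter):
        pts.sort(key=f)                                   # step 1: best first, worst last
        fv = [f(p) for p in pts]
        if fv[-1] - fv[0] < tol:                          # assumed stopping rule
            break
        worst = pts[-1]
        xbar = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]     # step 2
        xr = [(1 + rho) * m - rho * w for m, w in zip(xbar, worst)]    # step 3
        fr = f(xr)
        if fv[0] <= fr < fv[-2]:                          # a. reflection
            pts[-1] = xr
        elif fr < fv[0]:                                  # b. expansion
            xe = [(1 + rho) * r - rho * m for r, m in zip(xr, xbar)]
            pts[-1] = xe if f(xe) < fr else xr            # safeguard (assumption)
        else:
            if fr <= fv[-1]:                              # c. out-contraction
                xc = [(1 + kappa * rho) * m - kappa * rho * w
                      for m, w in zip(xbar, worst)]
            else:                                         # d. in-contraction
                xc = [(1 - kappa * rho) * m + kappa * rho * w
                      for m, w in zip(xbar, worst)]
            if f(xc) < fv[-1]:                            # e. accept the contraction
                pts[-1] = xc
            else:                                         # shrink towards the best vertex
                best = pts[0]
                pts = [best] + [[b + (pj - b) / sigma for b, pj in zip(best, p)]
                                for p in pts[1:]]
    return min(pts, key=f)

# Hypothetical test function with minimum at (1, -2)
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
x_nm = nelder_mead(f, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(x_nm)
```

Note that no derivatives of `f` are used anywhere; only function values at the simplex vertices are compared.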


Linear programming

Linear program in standard form

Find $x^* = \arg\min_x c^\top x$ subject to the constraints

$$Ax = b, \qquad x \ge 0.$$

Remarks:
• The feasible region $Ax = b$, $x \ge 0$ is a convex polytope in $\mathbb{R}^n$ (a polygon in two dimensions).
• The optimum $x^*$ is attained at one of the vertices of this polytope (because $c^\top x$ is linear!).

Idea of the Simplex method: travel along the edges of the polytope, successively decreasing the objective function $c^\top x$.

Figure source: http://en.wikipedia.org/wiki/File:Simplex-method-3-dimensions.png

The Simplex method is very efficient in practice; however, examples can be constructed for which its convergence is extremely slow!

Calibration

Calibration is the process of finding model parameters that match data observed in practice.

Interpretation in the context of option pricing:
• Option pricing models represent the price of the option as a function of the underlier.
• This underlier is typically modelled by Stochastic Differential Equations (SDEs).
• SDE models typically include model parameters (e.g. the volatility $\sigma$ of the underlier).
• Choosing such parameters is called calibration.


Calibration

Example: Black-Scholes model

The stock price $S_t$ under the risk-neutral measure follows

$$dS_t = (r - q) S_t\,dt + \sigma S_t\,dW_t$$

• $S_t$, $\sigma$ are the price and volatility of the underlying asset
• $r$ is the annualized continuously compounded risk-free rate of return
• $q$ is the annualized continuously compounded yield (e.g. the dividend yield) of the underlying asset
• $t$ is the time

The price of the European call option is

$$C(S, t) = e^{-q\tau} S_t N(d) - X e^{-r\tau} N(d - \sigma\sqrt{\tau}), \qquad d = \frac{\log(S_t / X) + \left(r - q + \frac{\sigma^2}{2}\right)\tau}{\sigma\sqrt{\tau}}.$$

• $\tau = T - t$ is the time to maturity
• $X$ is the strike price
• $N(x)$ is the standard normal CDF, $N(x) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{x} e^{-t^2/2}\,dt$

To compute the option price we should know the parameters!
What are the right parameter values?


Calibration

If all parameters except the volatility are fixed, the option price $C(\sigma)$ is an increasing function of $\sigma$.

Suppose $C_M$ is the observed price of the call option; then $\sigma^*$ is the unique solution of the nonlinear equation

$$C(\sigma) - C_M = 0.$$

This $\sigma^*$ is calibrated to the observed market price and is called the implied volatility.

In practice (# observations) $\gg$ (# parameters), so that market prices cannot be matched exactly.

Aim: find parameter values which represent observed market prices “as well as possible”.

Importance:

Market data → Calibration → Pricing & hedging → Risk management
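Because $C(\sigma)$ is increasing in $\sigma$, the implied volatility can be found by any bracketing root finder. The sketch below uses bisection together with the Black-Scholes call formula from the previous slide; the function names, the bracket $[10^{-6}, 5]$, the tolerance and the market data (spot 100, strike 100, $r = 5\%$, $q = 0$, one year to maturity) are hypothetical choices for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, X, r, q, sigma, tau):
    """Black-Scholes price of a European call."""
    d = (math.log(S / X) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return (math.exp(-q * tau) * S * norm_cdf(d)
            - X * math.exp(-r * tau) * norm_cdf(d - sigma * math.sqrt(tau)))

def implied_vol(C_M, S, X, r, q, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Solve C(sigma) - C_M = 0 by bisection, using that C is increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, X, r, q, mid, tau) < C_M:
            lo = mid          # model price too low: volatility must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price an option at sigma = 0.2, then recover sigma from the price.
price = bs_call(100.0, 100.0, 0.05, 0.0, 0.2, 1.0)
sigma_star = implied_vol(price, 100.0, 100.0, 0.05, 0.0, 1.0)
print(price, sigma_star)
```

Newton's method with the vega as derivative would converge faster; bisection is chosen here only because it cannot leave the bracket.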


Calibration

Fitting “as well as possible” is frequently realized by the Least Squares Method:

• $x = (x_1, \dots, x_n)^\top$ is the vector of parameters,
• $b = (b_1, \dots, b_m)^\top$ are the observed market prices, $m > n$,
• $f(x) = (f_1(x), \dots, f_m(x))^\top$ are the corresponding model prices.

The aim is to find $x$ so that $r_i := b_i - f_i(x)$ is “small”.

Idea: Minimise the least squares objective functional

$$g(x) = \frac{1}{2} \sum_{i=1}^{m} \omega_i |b_i - f_i(x)|^2, \qquad \omega_i > 0.$$

Methods to choose weights:

• $\omega_i = \dfrac{1}{|b_i^{\mathrm{bid}} - b_i^{\mathrm{ask}}|^2}$,
• $\omega_i = \dfrac{1}{\nu(\sigma_i)^2}$, where $\nu(\sigma_i)$ is the option’s vega at the implied volatility $\sigma_i$.

Meaning: more important observations have larger weights.

Calibration: Linear Least Squares

Consider unit weights $\omega_i = 1$ in what follows.

If $f(x) = Ax$ is linear, one speaks of linear least squares.

$$\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{i1} & \dots & a_{in} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_i \\ \vdots \\ b_m \end{pmatrix}$$

In general, if $m > n$ this is an overdetermined system and therefore has no solution.

Minimisation of

$$g(x) = \frac{1}{2} r(x)^\top r(x) = \frac{1}{2} (b - Ax)^\top (b - Ax)$$

is equivalent to $\nabla g(x) = 0$. We have $\nabla g(x) = A^\top A x - A^\top b$.

Normal equations: $A^\top A x = A^\top b$.


Calibration: Linear Least Squares

Normal equations: $A^\top A x = A^\top b$.

Normal equations can be solved (see Lecture 2) by:

• Cholesky factorisation $A^\top A = G G^\top$ (when $A^\top A$ is positive definite), solving

$$G y = A^\top b, \qquad G^\top x = y.$$

• QR-decomposition: $A = QR$

$$2 g(x) = \|r(x)\|^2 = \|Q^\top r(x)\|^2 = \|Q^\top Q R x - Q^\top b\|^2 = \left\| \begin{pmatrix} R_1 \\ 0 \end{pmatrix} x - \begin{pmatrix} Q_1^\top \\ Q_2^\top \end{pmatrix} b \right\|^2 = \|R_1 x - Q_1^\top b\|^2 + \|Q_2^\top b\|^2.$$

  The optimal $x$ is the solution of $R_1 x = Q_1^\top b$.

• SVD (similar to the QR-decomposition)
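The Cholesky route can be sketched in plain Python for a small problem. The function names and the data (fitting a straight line $b \approx x_0 + x_1 t$ to four points) are hypothetical; for ill-conditioned $A$ the QR or SVD route is usually preferred, since forming $A^\top A$ squares the condition number.

```python
import math

def normal_equations(A, b):
    """Form M = A^T A and c = A^T b."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return M, c

def cholesky(M):
    """Lower-triangular G with M = G G^T (M symmetric positive definite)."""
    n = len(M)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(G[i][k] * G[j][k] for k in range(j))
            G[i][j] = math.sqrt(s) if i == j else s / G[j][j]
    return G

def solve_least_squares(A, b):
    M, c = normal_equations(A, b)
    G = cholesky(M)
    n = len(c)
    y = [0.0] * n
    for i in range(n):                       # forward substitution: G y = A^T b
        y[i] = (c[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # back substitution: G^T x = y
        x[i] = (y[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x

# Hypothetical data: fit b ~ x0 + x1 * t through (0,1), (1,2), (2,2), (3,4)
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0, 4.0]
x_ls = solve_least_squares(A, b)
print(x_ls)
```

For this data set the normal equations are $4x_0 + 6x_1 = 9$, $6x_0 + 14x_1 = 18$, with solution $x_0 = x_1 = 0.9$.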

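How to solve a general nonlinear Least Squares problem?

$$g(x) = \frac{1}{2} \sum_{i=1}^{m} |r_i(x)|^2, \qquad r_i(x) = b_i - f_i(x).$$

Newton’s method: fit $g(x)$ by a local quadratic model

Newton iteration (least squares)

Given $x^{(0)}$, compute $x^{(k+1)} = x^{(k)} + h_N^{(k)}$, where

$$h_N^{(k)} = -(\nabla^2 g(x^{(k)}))^{-1} \nabla g(x^{(k)}), \quad k = 0, 1, \dots$$

Details:

$$\frac{\partial g}{\partial x_k} = \sum_{i=1}^{m} r_i \frac{\partial r_i}{\partial x_k}, \qquad \frac{\partial^2 g}{\partial x_k \partial x_\ell} = \sum_{i=1}^{m} \left( \frac{\partial r_i}{\partial x_\ell} \frac{\partial r_i}{\partial x_k} + r_i \frac{\partial^2 r_i}{\partial x_k \partial x_\ell} \right),$$

$$\nabla g(x) = \nabla r(x)^\top r(x), \qquad \nabla^2 g(x) = \nabla r(x)^\top \nabla r(x) + S(x), \qquad S(x) = \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x),$$

$$\nabla r(x) = \begin{bmatrix} \frac{\partial r_1}{\partial x_1} & \dots & \frac{\partial r_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial r_m}{\partial x_1} & \dots & \frac{\partial r_m}{\partial x_n} \end{bmatrix}, \qquad \nabla^2 r_i(x) = \begin{bmatrix} \frac{\partial^2 r_i}{\partial x_1 \partial x_1} & \dots & \frac{\partial^2 r_i}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 r_i}{\partial x_n \partial x_1} & \dots & \frac{\partial^2 r_i}{\partial x_n \partial x_n} \end{bmatrix}.$$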


In practice, evaluation of $S(x)$ is very difficult and time-consuming.

Gauss-Newton method:
Drop $S(x)$. This makes sense if $r(x)$ is small.

Gauss-Newton iteration

Given $x^{(0)}$, compute $x^{(k+1)} = x^{(k)} + h_{GN}^{(k)}$, where

$$\nabla r(x^{(k)})^\top \nabla r(x^{(k)})\, h_{GN}^{(k)} = -\nabla r(x^{(k)})^\top r(x^{(k)}), \quad k = 0, 1, \dots$$

We solve normal equations in every step → linear least squares.

Levenberg-Marquardt method:
Replace $S(x)$ by $\mu_k I$; $\mu_k = 0.01$ is a typical parameter value.

Levenberg-Marquardt iteration

Given $x^{(0)}$, compute $x^{(k+1)} = x^{(k)} + h_{LM}^{(k)}$, where

$$\left( \nabla r(x^{(k)})^\top \nabla r(x^{(k)}) + \mu_k I \right) h_{LM}^{(k)} = -\nabla r(x^{(k)})^\top r(x^{(k)}), \quad k = 0, 1, \dots$$

Notice: LM is a combination of the Gauss-Newton and the gradient descent methods!
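A minimal sketch of the Levenberg-Marquardt iteration with a fixed $\mu_k = 0.01$ is given below (setting `mu=0.0` gives Gauss-Newton). The two-parameter model $f_i(x) = a\,e^{c t_i}$, the synthetic noise-free data, the starting point and the function name are assumptions for illustration; practical implementations adapt $\mu_k$ from step to step, which this sketch does not do.

```python
import math

def levenberg_marquardt(ts, bs, x0, mu=0.01, n_iter=50):
    """Fit b_i ~ a * exp(c * t_i); residuals r_i = b_i - a * exp(c * t_i)."""
    a, c = x0
    for _ in range(n_iter):
        r = [b - a * math.exp(c * t) for t, b in zip(ts, bs)]
        # Rows of the Jacobian of r: (dr_i/da, dr_i/dc)
        J = [(-math.exp(c * t), -a * t * math.exp(c * t)) for t in ts]
        # M = J^T J + mu * I  and  rhs = -J^T r
        m11 = sum(j[0] * j[0] for j in J) + mu
        m12 = sum(j[0] * j[1] for j in J)
        m22 = sum(j[1] * j[1] for j in J) + mu
        g1 = -sum(j[0] * ri for j, ri in zip(J, r))
        g2 = -sum(j[1] * ri for j, ri in zip(J, r))
        det = m11 * m22 - m12 * m12
        h1 = (g1 * m22 - m12 * g2) / det      # 2x2 solve by Cramer's rule
        h2 = (m11 * g2 - m12 * g1) / det
        a, c = a + h1, c + h2
    return a, c

# Synthetic data generated from a = 2, c = -1 (no noise)
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
bs = [2.0 * math.exp(-t) for t in ts]
a_fit, c_fit = levenberg_marquardt(ts, bs, (1.5, -0.8))
print(a_fit, c_fit)
```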


Regularization:

Sometimes it is necessary to give preference to particular solutions. Bayesian analysis is an example: if prior information on the solution is available, a penalty term can be introduced which penalizes large differences between the computed quantity $Q(x)$ and the prior $Q(x_0)$.

This approach is called Tikhonov regularization:

$$g_{\mathrm{reg}}(x) = \frac{1}{2} r(x)^\top r(x) + \frac{1}{2} \mu F(Q(x), Q(x_0))$$

“Minimization with two objective functionals”

If $r = b - Ax$, $Q(x) = x$, $x_0 = 0$, $F(x, 0) = x^\top x$, it reduces to the Levenberg-Marquardt method, since

$$\nabla g_{\mathrm{reg}}(x) = A^\top (Ax - b) + \mu x = 0 \quad \Leftrightarrow \quad (A^\top A + \mu I) x = A^\top b.$$

(Tikhonov regularization improves conditioning!)


Lecture 9

Interpolation

Given: values $f_0, f_1, \dots, f_n$ at the $n + 1$ nodes $x_0, x_1, \dots, x_n$, i.e. $n + 1$ pairs $(x_0, f_0), (x_1, f_1), \dots, (x_n, f_n)$.

Aim: find a function $P(x)$ passing through these values:

$$P(x_0) = f_0, \quad P(x_1) = f_1, \quad \dots, \quad P(x_n) = f_n.$$

• Piecewise linear interpolation
• Polynomial interpolation
  - Lagrange’s formula
  - Neville’s algorithm
• Spline interpolation
  - Hermite cubic splines
  - Natural cubic splines


Given: $n + 1$ pairs $(x_0, f_0), (x_1, f_1), \dots, (x_n, f_n)$.
Aim: find a function $P(x)$ passing through these values:

$$P(x_0) = f_0, \quad P(x_1) = f_1, \quad \dots, \quad P(x_n) = f_n.$$

By this we reconstruct / fit an unknown function $f(x)$ based on its nodal values $f_k = f(x_k)$.

Simplest idea: connect the data points by straight line segments (piecewise linear interpolation).

$$P(x) = \begin{cases} P_1(x), & x_0 \le x \le x_1 \\ P_2(x), & x_1 \le x \le x_2 \\ \dots \\ P_n(x), & x_{n-1} \le x \le x_n \end{cases} \qquad \text{where } P_k(x) = a_k x + b_k.$$


Piecewise linear interpolation

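How to determine $a_k$, $b_k$? → Use the interpolation conditions!

Consider the leftmost interval:

$$\begin{cases} P(x_0) = f_0 \\ P(x_1) = f_1 \\ P \text{ is linear} \end{cases} \quad \Rightarrow \quad P(x) = \frac{x_1 - x}{x_1 - x_0} f_0 + \frac{x - x_0}{x_1 - x_0} f_1.$$

In general: for $x_i \le x \le x_{i+1}$

$$P(x) = \frac{x_{i+1} - x}{x_{i+1} - x_i} f_i + \frac{x - x_i}{x_{i+1} - x_i} f_{i+1}.$$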
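The general formula translates directly into code. In this sketch the nodes are assumed to be sorted in increasing order, the interval containing $x$ is located with the standard-library `bisect` module, and evaluation outside $[x_0, x_n]$ is rejected; the function name and the sample data are hypothetical.

```python
from bisect import bisect_right

def piecewise_linear(xs, fs, x):
    """Evaluate the piecewise linear interpolant through (xs[i], fs[i]) at x.

    xs must be sorted in strictly increasing order.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the interpolation range")
    # Index i with xs[i] <= x <= xs[i+1]; the right endpoint uses the last interval.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - w) * fs[i] + w * fs[i + 1]

xs, fs = [0.0, 1.0, 3.0], [1.0, 3.0, 2.0]
print(piecewise_linear(xs, fs, 0.5))   # halfway between f = 1 and f = 3
print(piecewise_linear(xs, fs, 2.0))   # halfway between f = 3 and f = 2
```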



Benefits:
• Simple
• Extends to higher dimensions

Example: Bilinear interpolation in 2d:

Given $(x_0, y_0, f_{00})$, $(x_0, y_1, f_{01})$, $(x_1, y_0, f_{10})$, $(x_1, y_1, f_{11})$, interpolate first in $x$, then in $y$ and observe:

$$P(x, y) = \frac{x_1 - x}{x_1 - x_0} \frac{y_1 - y}{y_1 - y_0} f_{00} + \frac{x_1 - x}{x_1 - x_0} \frac{y - y_0}{y_1 - y_0} f_{01} + \frac{x - x_0}{x_1 - x_0} \frac{y_1 - y}{y_1 - y_0} f_{10} + \frac{x - x_0}{x_1 - x_0} \frac{y - y_0}{y_1 - y_0} f_{11}$$



Drawbacks:
• Non-smooth
• Accuracy might be insufficient

Theorem: if $f(x)$ is twice continuously differentiable, then

$$|f(x) - P(x)| \le \frac{\max_\xi |f''(\xi)|}{8} h^2, \qquad \text{where } h = \max_{k = 1, \dots, n} |x_k - x_{k-1}|.$$


Polynomial interpolation:

Given: $n + 1$ pairs $(x_0, f_0), (x_1, f_1), \dots, (x_n, f_n)$.

Aim: find a polynomial $P(x) = a_0 + a_1 x + \dots + a_n x^n$ such that

$$P(x_0) = f_0, \quad P(x_1) = f_1, \quad \dots, \quad P(x_n) = f_n.$$

Theorem: if all nodes $x_i$ are distinct, then $P(x)$ is unique.


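How to obtain an explicit formula for $P(x)$?

Consider the Lagrange basis polynomials $\ell_0(x), \dots, \ell_n(x)$ defined by two conditions:

• $\ell_i(x)$ are polynomials of degree $n$,
• $\ell_i(x_k) = \begin{cases} 1, & i = k, \\ 0, & i \ne k. \end{cases}$

We observe

$$\ell_i(x) = \left( \frac{x - x_0}{x_i - x_0} \right) \cdots \left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \left( \frac{x - x_{i+1}}{x_i - x_{i+1}} \right) \cdots \left( \frac{x - x_n}{x_i - x_n} \right)$$

or shortly

$$\ell_i(x) = \prod_{\substack{0 \le m \le n \\ m \ne i}} \frac{x - x_m}{x_i - x_m}.$$

Then $P(x) = \sum_{i=0}^{n} f_i \ell_i(x)$ (Lagrange’s formula).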


Polynomial interpolation

Example: Given three pairs $(x_i, f_i) = (0, 1), (1, 3), (3, 2)$.

a) Find $P(2)$, where $P(x)$ is the quadratic polynomial interpolating this data set.

$$\ell_0(x) = \frac{(x - 1)(x - 3)}{(0 - 1)(0 - 3)}, \qquad \ell_1(x) = \frac{(x - 0)(x - 3)}{(1 - 0)(1 - 3)}, \qquad \ell_2(x) = \frac{(x - 0)(x - 1)}{(3 - 0)(3 - 1)}$$

$$P(2) = 1 \cdot \ell_0(2) + 3 \cdot \ell_1(2) + 2 \cdot \ell_2(2) = 1 \cdot \left( -\frac{1}{3} \right) + 3 \cdot 1 + 2 \cdot \frac{1}{3} = \frac{10}{3}$$

b) Find $P(x)$.

$$P(x) = 1 \cdot \ell_0(x) + 3 \cdot \ell_1(x) + 2 \cdot \ell_2(x) = \dots = -\frac{5}{6} x^2 + \frac{17}{6} x + 1.$$

Lagrange’s formula is useful when many interpolation problems are to be solved for the same abscissas $x_i$ but different values $f_i$.
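The worked example can be reproduced with a direct implementation of Lagrange’s formula. The sketch uses exact rational arithmetic from the standard-library `fractions` module so that the result can be compared with $10/3$ without rounding; this restricts the inputs to integers or `Fraction` values. The function names are hypothetical.

```python
from fractions import Fraction

def lagrange_basis(xs, i, x):
    """Value of the i-th Lagrange basis polynomial at x (integer or Fraction inputs)."""
    value = Fraction(1)
    for m, xm in enumerate(xs):
        if m != i:
            value *= Fraction(x - xm, xs[i] - xm)
    return value

def lagrange_interpolate(xs, fs, x):
    """P(x) = sum_i f_i * l_i(x)."""
    return sum(f * lagrange_basis(xs, i, x) for i, f in enumerate(fs))

xs, fs = [0, 1, 3], [1, 3, 2]
p2 = lagrange_interpolate(xs, fs, 2)
print(p2)   # exact rational value of P(2)
```

Since the basis values $\ell_i(x)$ depend only on the nodes, they can be computed once and reused for several data vectors $f$, which is the use case mentioned above.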


Neville’s Algorithm

Neville’s algorithm is particularly suitable for the evaluation of $P(x)$ at a single point $x$.

Notation: for a given data set $(x_i, f_i)$, $i = 0, \dots, n$, we denote by $P_{i_0 i_1 \dots i_k}(x)$ a polynomial of degree $k$ for which

$$P_{i_0 i_1 \dots i_k}(x_{i_j}) = f_{i_j}, \quad j = 0, \dots, k.$$

Theorem: Such polynomials are linked as follows:

$$P_i(x) := f_i,$$

$$P_{i_0 i_1 \dots i_k}(x) := \frac{(x - x_{i_0}) P_{i_1 \dots i_k}(x) - (x - x_{i_k}) P_{i_0 \dots i_{k-1}}(x)}{x_{i_k} - x_{i_0}}.$$

Check that $P_{i_0 i_1 \dots i_k}(x_{i_j}) = f_{i_j}$ indeed!

But then $P(x) = P_{i_0 i_1 \dots i_n}(x)$.


Neville’s Algorithm

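Assemble these polynomials in a tableau:

$$\begin{array}{c|llll}
 & k = 0 & 1 & 2 & 3 \\ \hline
x_0 & f_0 = P_0(x) & & & \\
 & & P_{01}(x) & & \\
x_1 & f_1 = P_1(x) & & P_{012}(x) & \\
 & & P_{12}(x) & & P_{0123}(x) = P(x) \\
x_2 & f_2 = P_2(x) & & P_{123}(x) & \\
 & & P_{23}(x) & & \\
x_3 & f_3 = P_3(x) & & &
\end{array}$$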


Neville’s Algorithm

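Example: Given three pairs $(x_i, f_i) = (0, 1), (1, 3), (3, 2)$.

Find $P(2)$, where $P(x)$ is the quadratic polynomial interpolating this data set.

$$\begin{array}{c|lll}
 & k = 0 & 1 & 2 \\ \hline
x_0 & f_0 = P_0(2) = 1 & & \\
 & & P_{01}(2) = 5 & \\
x_1 & f_1 = P_1(2) = 3 & & P_{012}(2) = \frac{10}{3} \\
 & & P_{12}(2) = 2.5 & \\
x_2 & f_2 = P_2(2) = 2 & &
\end{array}$$

$$P_{01}(2) = \frac{(2 - 0) \cdot 3 - (2 - 1) \cdot 1}{1 - 0} = 5$$

$$P_{12}(2) = \frac{(2 - 1) \cdot 2 - (2 - 3) \cdot 3}{3 - 1} = 2.5$$

$$P_{012}(2) = \frac{(2 - 0) \cdot 2.5 - (2 - 3) \cdot 5}{3 - 0} = \frac{10}{3}$$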
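The tableau can be built column by column, keeping only the current column in memory. The sketch below follows the recursion of the theorem with consecutive index sets $i, i+1, \dots, i+k$ and uses exact rational arithmetic (an assumption made so the result can be compared with the hand computation); the function name is hypothetical.

```python
from fractions import Fraction

def neville(xs, fs, x):
    """Evaluate the interpolating polynomial at x with Neville's algorithm."""
    n = len(xs)
    P = [Fraction(f) for f in fs]          # column k = 0: P_i(x) = f_i
    for k in range(1, n):
        # After this pass, P[i] holds P_{i, i+1, ..., i+k}(x).
        P = [((x - xs[i]) * P[i + 1] - (x - xs[i + k]) * P[i]) / (xs[i + k] - xs[i])
             for i in range(n - k)]
    return P[0]

p2 = neville([0, 1, 3], [1, 3, 2], 2)
print(p2)   # exact rational value of P(2)
```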


Polynomial interpolation

The main drawback of global polynomial interpolation:

$P(x)$ might oscillate between interpolation points.

A “compromise” between linear and polynomial interpolation is spline interpolation:

• A spline is a piecewise polynomial.
• A spline possesses a high degree of smoothness at the places where the polynomial pieces connect (typically continuous 1st and 2nd derivatives).
• Typically polynomials of low degree are used (cubic).


Cubic splines have the general form

$$P(x) = \begin{cases} P_1(x), & x_0 \le x \le x_1 \\ P_2(x), & x_1 \le x \le x_2 \\ \dots \\ P_n(x), & x_{n-1} \le x \le x_n \end{cases} \qquad \text{where } P_k(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3.$$

• Cubic polynomials are used between the nodes.
• There are 4 unknown parameters on each interval: $a_0, a_1, a_2, a_3$, but 2 of them are fixed by the interpolation condition $P(x_k) = f_k$ at the two endpoints of the interval ($2n$ parameters are still undetermined).
• $n - 1$ parameters are fixed by the smoothness condition

$$P_k'(x_k) = P_{k+1}'(x_k)$$

  at interior nodes ($n + 1$ parameters are still undetermined).
• The remaining parameters can be fixed by different methods, e.g. Hermite cubic splines and Natural cubic splines.


Spline interpolation

Hermite cubic spline interpolation conditions:

1. $P(x_k) = f_k$ in all nodes $k = 0, \dots, n$
2. $P_k'(x_k) = P_{k+1}'(x_k)$ in inner nodes $k = 1, \dots, n - 1$
3. $P'(x_k) = f_k'$ in all nodes $k = 0, \dots, n$

Benefit: Interpolation conditions can be localized

$$\text{1. to 3.} \ \Leftrightarrow \ \begin{cases} P(x_k) = f_k, & P(x_{k+1}) = f_{k+1}, \\ P'(x_k) = f_k', & P'(x_{k+1}) = f_{k+1}'. \end{cases}$$

Drawback: Additional data $f_k'$ are needed in all nodes!
Alternatively, $f_k'$ may be approximated, e.g. from the values $f_k$.



Example:
Consider $g(x) = e^{5x} - 1$ and the data set $(x_k, f_k, f_k')$ with $x_k = \frac{k}{10}$, $f_k = g(x_k)$ and $f_k' = g'(x_k)$, $k = 0, \dots, 10$. Find the Hermite interpolant $H(x)$ for this data set.

Solution:
The task splits into 10 local interpolation problems. Consider e.g. the leftmost interval. The data set for $0 \le x \le 0.1$ is $(x_k, f_k, f_k') = (0, 0, 5), (0.1, e^{0.5} - 1, 5e^{0.5})$. We have

$$H(x) = a + bx + cx^2 + dx^3, \qquad H'(x) = b + 2cx + 3dx^2.$$

Then $a = 0$, $b = 5$ and

$$\begin{cases} 5 \times 10^{-1} + 10^{-2} c + 10^{-3} d = e^{0.5} - 1 \\ 5 + 2 \times 10^{-1} c + 3 \times 10^{-2} d = 5 e^{0.5} \end{cases} \quad \Leftrightarrow \quad \begin{cases} 10c + d = 1000 e^{0.5} - 1500 \\ 20c + 3d = 500 e^{0.5} - 500 \end{cases}$$

Solution: $d = -1500 e^{0.5} + 2500$, $c = 250 e^{0.5} - 400$.
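The same local problem can be solved in closed form on any interval $[x_0, x_1]$. The sketch below writes the cubic in the local variable $s = x - x_0$ (an assumption of this sketch; on the leftmost interval $s = x$, so the coefficients coincide with $a, b, c, d$ above) and checks the result against the hand computation. The function name is hypothetical.

```python
import math

def hermite_cubic_coeffs(x0, x1, f0, f1, df0, df1):
    """Coefficients (a, b, c, d) of H(s) = a + b*s + c*s**2 + d*s**3 with s = x - x0,
    matching values f0, f1 and derivatives df0, df1 at x0 and x1."""
    h = x1 - x0
    slope = (f1 - f0) / h
    a, b = f0, df0
    c = (3.0 * slope - 2.0 * df0 - df1) / h
    d = (df0 + df1 - 2.0 * slope) / h ** 2
    return a, b, c, d

# Leftmost interval of the example: g(x) = exp(5x) - 1 on [0, 0.1]
E = math.exp(0.5)
a, b, c, d = hermite_cubic_coeffs(0.0, 0.1, 0.0, E - 1.0, 5.0, 5.0 * E)
print(a, b, c, d)

# Value of the interpolant at the midpoint, compared with g(0.05)
s = 0.05
print(a + b * s + c * s ** 2 + d * s ** 3, math.exp(5 * s) - 1.0)
```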


Natural cubic spline interpolation conditions:

1. $P(x_k) = f_k$ in all nodes $k = 0, \dots, n$
2. $P_k'(x_k) = P_{k+1}'(x_k)$ in interior nodes $k = 1, \dots, n - 1$
3. $P_k''(x_k) = P_{k+1}''(x_k)$ in interior nodes $k = 1, \dots, n - 1$
4. $P''(x_0) = 0$, $P''(x_n) = 0$

Benefit:
$P(x)$ is twice continuously differentiable.

Drawback:
Computation of $P(x)$ results in a global linear system of equations for its coefficients. Therefore, interpolation by Natural cubic splines is computationally more expensive than interpolation by Hermite cubic splines.

