
University of Rome II “Tor Vergata”

Faculty of Engineering

Master of Science in Models and Systems Engineering

Master’s Thesis

Affine Models in Credit Risk

Student

Vincenzo Ferrazzano

Advisors

Prof. Claudia Klüppelberg

Technische Universität München

Prof. Benedetto Scoppola

University of Rome II “Tor Vergata”

Academic year 2007-2008


In loving memory of two great persons:

To my grandfather Vincenzo, the very first engineer of my family;

I bear his name with pride.

To Prof. Roberta DalPasso, who showed me the rewards of hard work.


Martyrdom, sir, is what these people like: it is the only way

in which a man can become famous without ability.

George Bernard Shaw, “The Devil’s Disciple, Act II”.

Shut up and calculate!

Attributed to Richard Feynman.

Contents

Introduction X

1 A Primer in Credit Risk 1

1.1 Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Term structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Credit rating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Historical data of default . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.5 Recovery rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.6 Netting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.7 Credit Default Swaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.8 Credit spread options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Intensity-Based Modelling of Default 9

2.1 Counting processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2 Poisson processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 Doubly stochastic process . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Risk-neutral probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.5 Useful results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.5.1 Survival analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.5.2 Correlated jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3 Affine processes and transforms 22

3.1 Affine processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2 First examples of affine processes . . . . . . . . . . . . . . . . . . . . . . . 26

3.3 Extending the transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.3.1 Extended transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.3.2 Fourier transform inversion . . . . . . . . . . . . . . . . . . . . . . . 30

3.3.3 Fourier representation . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.3.4 Time dependence and multiple jump types . . . . . . . . . . . . . . 33


3.4 An optimization idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.5 A more general result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.5.1 Diagonal diffusion matrix . . . . . . . . . . . . . . . . . . . . . . . 38

3.6 More examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.6.1 CIR with jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.6.2 Bates model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.7 Further considerations and reference . . . . . . . . . . . . . . . . . . . . . 43

3.7.1 Infinite activity vs. finite activity . . . . . . . . . . . . . . . . . . . 43

3.7.2 Statistical estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4 Numerical Methods 45

4.1 Runge-Kutta methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.1.1 Facts on ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.1.2 Remarks on Generalized Riccati Equations . . . . . . . . . . . . . . 46

4.1.3 Analysis of one step methods . . . . . . . . . . . . . . . . . . . . . 47

4.1.4 Runge-Kutta methods . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.1.5 Derivation of an explicit RK method . . . . . . . . . . . . . . . . . 53

4.1.6 Global error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.1.7 On higher order RK methods . . . . . . . . . . . . . . . . . . . . . 57

4.1.8 Step adaptivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.1.9 Our choice: Dormand-Prince method . . . . . . . . . . . . . . . . . 61

4.2 Numerical integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.2.1 One dimensional integration . . . . . . . . . . . . . . . . . . . . . . 63

4.2.2 Step adaptivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.2.3 Domain transformation . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.2.4 Multidimensional integral . . . . . . . . . . . . . . . . . . . . . . . 68

4.3 Main sources and further readings . . . . . . . . . . . . . . . . . . . . . . . 69

5 Applications 70

5.1 Defaultable claims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.1.1 No recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.1.2 Claims with recovery . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.1.3 Unpredictable Default Recovery . . . . . . . . . . . . . . . . . . . . 72

5.1.4 Fractional loss of value on default . . . . . . . . . . . . . . . . . . . 73

5.1.5 Netting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.2 Credit derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.2.1 Credit spread options . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.2.2 Credit Default Swaps . . . . . . . . . . . . . . . . . . . . . . . . . . 75


5.3 A multiname model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.3.1 Pricing a CDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6 A numerical example 81

A Measure Theory 92

A.1 Stieltjes-Lebesgue integration . . . . . . . . . . . . . . . . . . . . . . . . . 92

A.2 Lebesgue measure theorems . . . . . . . . . . . . . . . . . . . . . . . . . . 93

B Stochastic processes 95

B.1 Definitions and basic results . . . . . . . . . . . . . . . . . . . . . . . . . . 95

B.2 Lévy Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

B.2.1 Compound Poisson process . . . . . . . . . . . . . . . . . . . . . . . 99

B.3 Infinitesimal generator of a Markov Process . . . . . . . . . . . . . . . . . 100

C Risk-neutral Valuation 102

C.1 The market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

C.2 Risk-neutral Valuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

D Numerical Codes 104

List of Tables

1.1 Cumulative probabilities of default (%). Source: Moody’s (1970-2003) . . . 5

1.2 Recovery rates on corporate bonds. Source: Moody’s (1982-2003) . . . . . 6

6.1 Parameters for Bates model. . . . . . . . . . . . . . . . . . . . . . . . . . 83


List of Figures

3.1 Solution of the coefficients α and β for the Vasicek model, with T = 10, σ = 0.012, k = 0.05, γ = 0.03 . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.1 How to use local truncation error to estimate global error . . . . . . . . . . 56

4.2 Example of the evaluation algorithm, to get a 2³ = 8 interval integration without evaluating the same point twice . . . . . . . . . . . . . . . . . . . . 65

4.3 The change of variable y = sinh((π/2) sinh x) (lower curve) and its derivative (upper curve). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6.1 Local error tolerance: 10⁻¹, evaluations needed: 37; the algorithm shows a numerical instability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

6.2 Local error tolerance: 10⁻², evaluations needed: 42; no instability. . . . . . 84

6.3 Local error tolerance: 10⁻⁵, evaluations needed: 127; perfect matching. . . 84

6.4 Local error tolerance: 10⁻⁵, evaluations needed: 127; perfect matching. The price is increasing with maturity, i.e., it is convenient to make a long-run investment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

6.5 Local error tolerance: 10⁻⁵, evaluations needed: 127; l_Y = l_λ = l_r = 0; the price still grows, but at a slower pace. . . . . . . . . . . . . . . . . . . . 86

6.6 Local error tolerance: 10⁻⁵, evaluations needed: 127; γ_λ = 0.50. This asset is very likely to default in the future, hence the price drops. . . . . . . . . . 87

6.7 Local error tolerance: 10⁻⁵, evaluations needed: 127; λ_0 = 0.80, δ = 2, µ_S = 0.85. This asset is very likely to default in the near future, hence the price drops, but mean reversion makes the price rise in the long run. . . . . 88

6.8 Local error tolerance: 10⁻⁵, evaluations needed: 127. Upper picture: discounted price of the asset; middle picture: price of the asset without discounting; bottom picture: their ratio. . . . . . . . . . . . . . . . . . . . . . 89


6.9 Local error tolerance: 10⁻⁵, evaluations needed: 127; λ_0 = 0.50, l_λ = 0.15. Upper picture: discounted price of the asset; middle picture: price of the asset without discounting; bottom picture: their ratio. . . . . . . . . . . . . 90

Acknowledgments

This thesis is based upon studies conducted from March 2008 to September 2008 at the Chair of Statistical Mathematics, Department of Mathematics, Technische Universität München, Germany.

First and foremost I would like to thank Professor Doktor Claudia Klüppelberg for all the support, the advice, and the kind hospitality. Her experience and kindness made this thesis possible.

In addition, I am very grateful to Peter Hepperger for all the advice and corrections about numerics and for proofreading my manuscript; special thanks go to Frau Grant for helping me settle into a fruitful life in Munich.

Last but not least, I sincerely thank my parents for their love and support. I am very grateful that their confidence and encouragement allowed me to enjoy so many opportunities.


Introduction

Risk is, in its broadest meaning, the possibility that something goes worse than expected, involving a loss or a missed gain. It seems clear that this concept is indissolubly tied to the more abstract and mathematical ones of probability and uncertainty. In finance there are many different risks to be taken into account:

• Market risk.

• Liquidity risk.

• Currency risk.

• Interest rate risk.

• Credit risk.

• And so on…

This thesis is focused on credit risk, which is the risk related to the probability that one of the counterparties to a contract does not honor its obligations; e.g., the borrower of a loan is not able to pay the money back at the arranged date or to pay interest. This event is usually called a credit event, and the counterparty is said to be in default.

Thinking about risk usually implies both losses and gains, since in everyday life the more we risk, the more we can perhaps gain. That is in general not true for credit risk: if everything goes smoothly, an investor gets the promised amount; otherwise, if something goes wrong, the investor loses an amount of money that depends on the nature of the contract he signed. So, to counterbalance the risk, an investor has to buy some form of protection and to act on the interest rate, asking for a discount. In light of these considerations, and of the recent subprime mortgage crisis, we can understand why financial institutions actively seek (and sell) protection via credit derivatives like Credit Default Swaps (CDSs) and other derivatives.

A central problem in Mathematical Finance is to price products, and the question we have in mind is: “what is a fair price for that protection?”.


It is well known that the main tool used in practice to price derivatives is the Monte Carlo Method (MCM), and it is equally well known that this method is slow, inaccurate, and in general inefficient for low-dimensional problems. On the other hand, we often have to deal with problems depending on only a few econometric variables, to keep the calibration effort manageable, and a prompt and fast solution is a feature desirable to practitioners.

Those drawbacks can be addressed using an affine framework: the class of affine processes is composed of n-dimensional stochastic processes (X_t)_{t≥0} characterized by an “affine” generator. Let us fix a complete probability space (Ω, F, P) and a filtration (F_t)_{t≥0} satisfying the usual conditions (cf. Section B.2), and let us denote the conditional expectation E[·|F_t] =: E_t[·]. The “core” feature of affine processes is the following result:

E_t[ e^{−∫_t^T R(X_s) ds + v·X_T} ] = e^{α + β·X_t}, 0 ≤ t ≤ T, (I.1)

with R(X_t) = ρ_0 + ρ_1 · X_t, where · denotes the usual inner product in R^n. The quantities α and β can be evaluated explicitly by solving n + 1 ordinary differential equations (ODEs), called generalized Riccati equations (GREs). Solving ODEs instead of using the MCM provides some straightforward, useful features:

• The numerical solution of ODEs is one of the most studied problems, with many established results and well-tested algorithms and routines.

• Algorithms for solving an ODE can be very fast, very accurate, and often both, provided the right-hand side of the ODE is sufficiently well behaved (which can happen in our case).

Why affine processes and credit risk? To begin with, let us take a look at the pricing problem. As customary in mathematical finance, the price S_t of a financial product at time t is computed via the formula

S_t = E_t[ e^{−∫_t^T r_s ds} f(…, T) ], 0 ≤ t ≤ T, (I.2)

where (r_t)_{t≥0} is the short-rate process and f is the payoff of the product considered; the short rate is the risk-free rate of return of an investment over an infinitesimal period. Now compare with equation (I.1): if we let r and f depend on an affine process (X_t)_{t≥0} in an affine and exponential-affine fashion, we can easily solve the valuation problem. The meaning of these dependences will be made clear in Chapter 3.

In the second place, we recall that in a credit risk setting the payoff function is a contingent claim, of the form

f(…, t) = F 1_{t<τ} + R 1_{t≥τ},


where τ is the time of the credit event (the default time), F is the promised amount, and R is the recovery, if any, in case of default. The quantities F and R are in general F_T-measurable random variables.

For the sake of simplicity, let us suppose that there is no recovery. Then, if we let all the default-related quantities depend on an exogenous parameter λ_t, called the default intensity, we can show, under some technical conditions, that the price can be rewritten as

E_t[ e^{−∫_t^T r_s ds} F 1_{T<τ} ] = E_t[ e^{−∫_t^T (r_s + λ_s) ds} F ], 0 ≤ t ≤ T. (I.3)

Intuitively, the parameter λ_t acts as a risk premium, correcting the interest rate to compensate for the risk associated with default.

If my four (or even fewer) readers are patient enough, I will explain all the relations presented here, showing applications and numerical results throughout the chapters of this work.

Chapter 1

A Primer in Credit Risk

This chapter is meant as an introduction to some concepts about credit risk and credit-related products. We refer the interested reader to [34].

1.1 Bonds

A bond is a debt security in which the authorized issuer owes the holders a debt and is obliged to repay the principal and the interest (the coupon) at a later date, called maturity.

A bond is simply a loan in the form of a security, with different terminology: the issuer corresponds to the borrower, the bond holder to the lender, and the coupon to the interest. Bonds enable the issuer to finance long-term investments with external funds.

A bond can be issued by a firm (corporate bond) or by national entities, both in local currency (government bond) and in foreign currency (sovereign bond). Usually government bonds are considered risk-free¹, but other kinds of risk can occur, since that form of investment depends both on inflation (for domestic investors) and on currency exchange rates.

¹ Default of a nation on such bonds is an extremely rare, but not impossible, event; consider, e.g., the Russian crisis of 1998.

Bonds and stocks are both securities, but the major difference between the two is that stockholders are owners of the company (i.e., they have an equity stake), whereas bondholders are lenders to the issuing company. Another difference is that bonds usually have a defined term, or maturity, after which the bond is redeemed.

To define a bond we have to specify some features, namely:

• Issue price or, shortly, price.

• Face value F , or principal.


• Maturity T .

• Coupon and coupon dates.

• Indentures and Covenants.

There may be other options embedded in a bond, but these are the minimal features.

As said before, when an investor buys a bond he is, as a matter of fact, lending money to the issuer. The amount actually paid for the bond is called the issue price. At a certain date, the maturity, the bond expires and an amount of money, the face value, is paid. Normally some smaller amounts, called coupons², are paid regularly before maturity; coupons can be paid at various frequencies, but usually semiannually in Europe and annually in the US. If no coupons are paid we have a zero-coupon bond; otherwise we are dealing with a coupon-bearing bond.

² This name dates back to the time when bonds were actually made of paper, the coupon being a small piece of paper to be physically detached from the bond in order for the interest to be paid.

With a bond comes a formal debt agreement (the indenture), and each term of this agreement is called a covenant. A positive covenant requires certain actions, and a negative covenant limits certain actions. An indenture is a legally binding contract and can be enforced by law. It is then appropriate to distinguish between technical default and debt-service default: the first is the default induced by breaking a covenant, the second occurs when the borrower has not made a scheduled payment of interest or principal. Throughout, we will denote by default the debt-service default, although these two forms of default are correlated (cf. [4]).

1.2 Term structure

Let us consider a zero-coupon bond, assuming that no default can occur and that the interest rate is fixed. Then the price of such a bond is given by

S_t = e^{−r(T−t)} F, 0 ≤ t ≤ T.

Then

r = −(1/(T−t)) log(S_t/F), 0 ≤ t ≤ T, (1.1)

is the zero rate, the rate that quantifies the performance of the bond. The zero-coupon bond is the simplest example of a credit derivative, and since the payoff is fixed at the issue of the bond, a zero-coupon bond can be defined, for the sake of simplicity, with F = 1. Similarly, to quantify the performance of a coupon-bearing bond,


let us consider the coupon dates 0 < T_1 < … < T_n = T, with coupon c. Then we want to find the rate y such that

S_0 = c ∑_{i=1}^{n} e^{−y T_i} + e^{−y T} F = c ∑_{i=1}^{n−1} e^{−y T_i} + e^{−y T} (F + c) (1.2)

holds. This rate is called the yield rate, and it makes the discounted promised payments equal to the market price.
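As an illustration, (1.2) can be solved for y with any one-dimensional root finder; the following minimal Python sketch uses Brent's method (the coupon schedule and market price are made-up numbers, not data from this thesis).

import numpy as np
from scipy.optimize import brentq

def bond_value(y, c, F, T_i):
    # Discounted promised payments of a coupon-bearing bond, cf. (1.2).
    return c * np.sum(np.exp(-y * T_i)) + F * np.exp(-y * T_i[-1])

def yield_rate(S0, c, F, T_i):
    # Solve (1.2) for y by Brent's method on a wide bracket.
    return brentq(lambda y: bond_value(y, c, F, T_i) - S0, -0.5, 1.0)

# Hypothetical bond: face value 1, 5% annual coupon, 5 years, market price 0.98
T_i = np.arange(1.0, 6.0)          # coupon dates T_1 < ... < T_n = T
y = yield_rate(0.98, 0.05, 1.0, T_i)
print(f"yield: {y:.4%}")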

It is common in finance to consider only zero-coupon bonds, since a coupon-bearing bond can be treated like a portfolio of zero-coupon bonds with face value c and maturities T_i. When inferred from market data, it is common practice to plot the zero rate r as a function of T − t, to get information about the interest rate. The curve r(T − t) is called the term structure of interest rates, zero curve, or yield curve.

Depending on the shape of the curve r(T − t), it is possible to get information about the attitude of investors toward the economy; usually three different shapes are observed:

• Increasing curve: in a normal economy the curve should have this shape. It means that at the moment all the economic indicators involved are positive (e.g., inflation, interest rates) and investors are prone to risk. Under such conditions, investors expect higher yields for fixed-income instruments with long-term maturities; besides, the yield is higher to compensate for the prolonged exposure to uncertainty.

• Flat curve: the market is sending mixed signals to investors, and they do not know what the interest rate will do. A flat yield curve can therefore herald a change of trend in market behavior. This situation, if the long-term rate decreases, can degenerate into an inverted curve. Under these conditions investors can maximize their risk/return tradeoff by choosing fixed-income securities with the least risk, i.e., the highest credit quality.

• Decreasing curve: these yield curves are rare, and they form during extraordinary market conditions in which the expectations of investors are completely the inverse of those underlying the normal yield curve. In such abnormal market environments, bonds with maturity dates further into the future are expected to offer lower yields than bonds with shorter maturities. The inverted yield curve indicates that the market currently expects interest rates to decline as time moves farther into the future, which in turn means the market expects yields of long-term bonds to decline.


Recall our claim (I.1). Then the price of a zero-coupon bond, considering also the possibility of default (i.e., (r_t)_{t≥0} is not the risk-free rate, or short rate), is

S_t = E_t[ e^{−∫_t^T r_s ds} F ] = F e^{α(t,T) + β(t,T) r_t},

and using (1.1) we have

r = −(1/(T−t)) (α(t,T) + β(t,T) r_t). (1.3)

What is the role of credit risk here? Consider the case of a defaultable bond and recall (I.3): the r inferred from corporate bonds also includes a credit premium λ, called the spread, which rewards the investor for bearing the risk of default.
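To make the role of the spread concrete, a small sketch (with hypothetical prices, not market data) backs out zero rates via (1.1) from a treasury and a corporate zero-coupon bond, and reads off the spread as their difference:

import math

def zero_rate(S_t, F, ttm):
    # Zero rate from a zero-coupon bond price, cf. (1.1); ttm = T - t.
    return -math.log(S_t / F) / ttm

ttm = 5.0
r_treasury = zero_rate(0.82, 1.0, ttm)     # hypothetical risk-free bond price
r_corporate = zero_rate(0.78, 1.0, ttm)    # hypothetical defaultable bond price
spread = r_corporate - r_treasury          # credit premium, the "spread"
print(f"treasury: {r_treasury:.4%}, corporate: {r_corporate:.4%}, spread: {spread:.4%}")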

A common practice is to use as reference rates the treasury zero rates, i.e., the zero rates inferred from government bonds, which are considered risk-free. For a more detailed discussion of term structure theory we refer to [28].

1.3 Credit rating

Rating agencies such as Moody's and Standard & Poor's (S&P) provide ratings describing the creditworthiness of corporate bonds. Hence the rating is valuable information for investors, and reputation and reliability are fundamental for a rating agency. As a matter of fact, many US state and federal laws and regulations³, and many corporate bylaws, require the rating agency to be accredited as a “Nationally Recognized Statistical Rating Organization” (NRSRO) by the Securities and Exchange Commission (SEC). That condition, known as a regulatory barrier, makes the rating business a quite closed one.

³ E.g., money funds can invest only in AAA-rated bonds, and many pension funds are restricted to investment-grade bonds.

Using Moody's system, the best rating is Aaa. Bonds with this rating are considered to have almost no chance of defaulting. The next best rating is Aa, followed by A, Baa, Ba, B, and Caa. Only bonds with ratings of Baa or above are considered investment grade. The S&P ratings corresponding to Moody's Aaa, Aa, A, Baa, Ba, B, and Caa are AAA, AA, A, BBB, BB, B, and CCC, respectively. To create finer rating measures, Moody's divides the Aa rating category into Aa1, Aa2, and Aa3; it divides A into A1, A2, and A3; and so on. Similarly, S&P divides its AA rating category into AA+, AA, and AA−; it divides its A rating category into A+, A, and A−; and so on. (Only the Aaa category for Moody's and the AAA category for S&P are not subdivided.)

Bond traders have developed procedures for taking credit risk into account when pricing corporate bonds. They collect market data on actively traded bonds to calculate a


generic zero-coupon yield curve for each credit rating category. These zero-coupon yield curves are then used to value other bonds. For example, a newly issued A-rated bond would be priced using the zero-coupon yield curve calculated from other A-rated bonds.

The spread normally increases as the rating declines, and it increases with maturity. We point out that the spread tends to increase faster with maturity for low credit ratings than for high credit ratings: for example, the difference between the five-year spread and the one-year spread is greater for a BBB-rated bond than for a AAA-rated bond.

1.4 Historical data of default

We report in Table 1.1 some historical data, relative to corporate bonds, aggregated by rating; they show how the probability of default of bonds with a given initial rating varies over the years. We can see the difference in behavior between investment-grade bonds and those of speculative grade: while healthy firms need some time to default, in the speculative zone the first years are crucial. Of course, bonds with lower ratings offer higher spreads to compensate for the credit risk, hence the adjective “speculative”.

Table 1.1: Cumulative probabilities of default (%). Source: Moody’s (1970-2003)

           Time (years)
Rating      1      2      3      4      5      7     10     15     20
Aaa      0.00   0.00   0.00   0.04   0.12   0.29   0.62   1.21   1.55
Aa       0.02   0.03   0.06   0.15   0.24   0.43   0.68   1.51   2.70
A        0.02   0.09   0.23   0.38   0.54   0.91   1.59   2.94   5.24
Baa      0.20   0.57   1.03   1.62   2.16   3.24   5.10   9.12  12.59
Ba       1.26   3.48   6.00   8.59  11.17  15.44  21.01  30.88  38.56
B        6.21  13.76  20.65  26.66  31.99  40.79  50.02  59.21  60.73
Caa     23.65  37.20  48.02  55.56  60.83  69.36  77.91  80.23  80.23
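The cumulative figures in Table 1.1 can be turned into conditional default probabilities, P[default in (t_{k−1}, t_k] | survival up to t_{k−1}] = (P_k − P_{k−1})/(1 − P_{k−1}), which makes the remark above explicit: for speculative grades the default risk is front-loaded. A small sketch using two rows of the table:

# Conditional default probabilities from the cumulative ones of Table 1.1:
# P(default in (t_{k-1}, t_k] | survival up to t_{k-1}) = (P_k - P_{k-1}) / (1 - P_{k-1})
years = [1, 2, 3, 4, 5, 7, 10, 15, 20]
cumulative = {                       # cumulative default probabilities, in %
    "Baa": [0.20, 0.57, 1.03, 1.62, 2.16, 3.24, 5.10, 9.12, 12.59],
    "B":   [6.21, 13.76, 20.65, 26.66, 31.99, 40.79, 50.02, 59.21, 60.73],
}
for rating, P in cumulative.items():
    cond = [P[0]] + [100 * (P[k] - P[k-1]) / (100 - P[k-1]) for k in range(1, len(P))]
    print(rating, [f"{c:.2f}" for c in cond])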

1.5 Recovery rate

When a corporation defaults on its obligations, counterparties look for an agreement to settle the debt. If the firm is unable to fulfil all the requests, it can file voluntarily, or be filed, for bankruptcy, with modalities that vary from nation to nation. Usually the money to refund the investors is collected by selling the firm's assets (liquidation, Chapter 7 of the US Bankruptcy Code), or it is possible to keep the firm in business while a bankruptcy court supervises the “reorganization” of the company's contractual and debt obligations (reorganization, Chapter 11 of the US Bankruptcy Code). The largest Chapter 11 case is the 2008 Lehman Brothers bankruptcy, with $613 billion of debts⁴.

Of course, there is no refund available for everybody. As a general rule, some creditors have a higher priority class than others, as specified by covenants in the indenture.

Table 1.2: Recovery rates on corporate bonds. Source: Moody’s (1982-2003)

Class                 Average (%)
Senior secured             51.61
Senior unsecured           36.1
Senior subordinated        32.5
Subordinated               31.1
Junior subordinated        24.5

Thus, we can express the price of a corporate bond with recovery as

S_t = E_t[ e^{−∫_t^T r_s ds} ( F 1_{T<τ} + R 1_{T≥τ} ) ] = E_t[ e^{−∫_t^T r_s ds} F 1_{T<τ} ] + E_t[ e^{−∫_t^τ r_s ds} R 1_{T≥τ} ]. (1.4)

It is evident that the price is the price of the (no-recovery) zero-coupon bond plus the price of the protection.

1.6 Netting

A complication in the estimation of the losses that will be taken in the event of a

counterparty default is netting. This is a clause in most contracts written by financial

institutions. It states that if a counterparty defaults on one contract with the financial

institution then it must default on all outstanding contracts with the financial institution.

That is useful because two financial institutions usually are mutually on short or long

position with each other, and this prevents a counterpart to default voluntarily on some

contracts while keeping others, which are more advantageous (“cherry picking”). This

situation is known as moral hazard, defined as: “the risk that a party to a transaction has

not entered into the contract in good faith, has provided misleading information about its

assets, liabilities or credit capacity, or has an incentive to take unusual risks in a desperate

attempt to earn a profit before the contract settles.”5. We present an example from [34]

to clarify the matter:

⁴ http://www.bloomberg.com/apps/news?pid=20601103&sid=aI_Hue3zUKgs&refer=us
⁵ http://www.investopedia.com/terms/m/moralhazard.asp


Consider a financial institution that has three contracts outstanding with a particular counterparty. The contracts are worth $10 million, $30 million, and −$25 million to the financial institution. Suppose the counterparty runs into financial difficulties and defaults on its outstanding obligations. To the counterparty, the three contracts have values of −$10 million, −$30 million, and +$25 million, respectively. Without netting, the counterparty would default on the first two contracts and retain the third, for a loss to the financial institution of $40 million. With netting, it is compelled to default on all three contracts, for a loss to the financial institution of $15 million. (If the third contract had been worth −$45 million to the financial institution, the counterparty would choose not to default and there would be no loss to the financial institution.)
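The arithmetic of this example can be captured in a few lines (a sketch; the contract values are those quoted above):

# Contract values to the financial institution, in $ millions (example above).
contracts = [10, 30, -25]

# Without netting: the counterparty defaults only on contracts positive to the
# institution (i.e., negative to itself) and keeps the rest.
loss_without_netting = sum(v for v in contracts if v > 0)   # 40

# With netting: default is all-or-nothing, and the institution loses the net
# value only if that net is positive to it.
net = sum(contracts)                                        # 15
loss_with_netting = max(net, 0)

print(loss_without_netting, loss_with_netting)              # 40 15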

We will now present some credit derivatives that can be priced within the affine framework.

1.7 Credit Default Swaps

A credit default swap (CDS) is a contract that provides insurance against the risk of default by a particular company. The company is known as the reference entity. The buyer of the insurance obtains the right to sell a particular bond issued by the company for its face value when a credit event occurs. The bond is known as the reference obligation, and the total face value of the bond that can be sold is known as the swap's notional principal. The buyer of the CDS makes periodic payments to the seller until the end of the life of the CDS or until a credit event occurs. In the latter case, the recovery is determined and the seller refunds the buyer with the difference between the face value and the recovery. The settlement is then valued at 1 − W of the face value, where W is the recovery.

Therefore CDSs can be thought of as nothing but an additional form of protection against losses; however, as with many other derivatives, in recent times CDSs have lost much of their original “protection” function, becoming a speculative instrument: since an investor does not have to actually own the bond underlying the CDS, he is effectively betting on the default of a title. In the light of those considerations, binary CDSs were developed, which consist of a CDS with a fixed refund. We will deal only with standard CDSs, it being understood that a binary CDS is a standard CDS whose recovery is fixed.

As said in Section 1.5, in practice the determination of the actual recovery rate can take a long time; it is therefore common practice to use market data to estimate the recovery. A common method is to compute the mid-price, i.e., the mean between the bid and the offer for a similar bond. CDSs allow companies to manage their credit risk actively: it is quite common for a financial institution to sell protection in markets which differ from its usual field of operation, in order to diversify risks. Practitioners know this as the commandment “don't put all your eggs in the same basket”. The CDS market has been growing larger and larger, reaching the astounding value of $58 trillion⁶ ($58·10¹²). For a quick comparison, we point out that the world's gross domestic product (GDP) was $54 trillion in 2006⁷.

It is common practice to use CDSs to calibrate models, since they are widely traded and more liquid than the underlying bonds (i.e., they can be bought and sold much more easily at any time). This practice is often questioned in light of asymmetric-information arguments: roughly speaking, the CDS rate strictly depends on the default probability, which is in general known with varying degrees of precision, while for other kinds of derivatives the information is the same for everybody (e.g., interest rates, exchange rates, etc.). For example, the CEO of a firm knows the firm's real status better than a private citizen who wants to invest his savings. We refer to [34], p. 537, and to [8] for further reading.

1.8 Credit spread options

A credit spread option is basically a European option written on a bond. Let us consider, for simplicity, a zero-coupon bond with maturity T. Let us suppose that the bond has a zero rate r = Y_t + s, where Y_t is some reference index (e.g., Libor) and s is the spread of the owned bond. We denote by S_t the spread of bonds available on the market at time t. Furthermore, let us suppose that we are able to exercise the option only if the underlying bond is not in default at time t.

The option would be exercised at time t if there are bonds with a greater spread, selling the old bond to buy the one with the greater spread (i.e., better performance); the payoff of such an option is then

Z_t = ( e^{−(Y_t+s)(T−t)} − e^{−(Y_t+S_t)(T−t)} )⁺. (1.5)

On the other hand, for a call option, we would buy the bond only if its spread is higher than the spreads available on the market, netting:

Z′_t = ( e^{−(Y_t+S_t)(T−t)} − e^{−(Y_t+s)(T−t)} )⁺.
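A quick sketch of the two payoffs (with made-up values for the reference index, the spreads, and the time to maturity):

import math

def put_payoff(Y, s, S, ttm):
    # Credit spread put payoff, cf. (1.5): worth exercising if the market spread S
    # exceeds the owned bond's spread s.
    return max(math.exp(-(Y + s) * ttm) - math.exp(-(Y + S) * ttm), 0.0)

def call_payoff(Y, s, S, ttm):
    # Call variant: worth exercising if s exceeds the spreads on the market.
    return max(math.exp(-(Y + S) * ttm) - math.exp(-(Y + s) * ttm), 0.0)

Y, s, S, ttm = 0.04, 0.01, 0.02, 3.0   # hypothetical index, spreads, T - t
print(put_payoff(Y, s, S, ttm), call_payoff(Y, s, S, ttm))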

⁶ http://www.sec.gov/news/testimony/2008/ts092308cc.htm
⁷ http://siteresources.worldbank.org/DATASTATISTICS/Resources/GDP.pdf

Chapter 2

Intensity-Based Modelling of Default

As we have seen in the Introduction, the critical aspect of credit risk is to predict if and when a default event is likely to occur. To this end, two kinds of approaches have been used in the literature to describe default dynamics: the structural approach and the reduced-form approach.

Structural models use the value of the firm to determine the default time τ, defined as the time at which the firm's value falls below a certain level, e.g. the total value of its liabilities; the reduced-form approach, instead, models the firm's status (alive or defaulted) as a jump process, with τ the time of the first jump. The most prominent examples of structural models are [39] and [6], both based on the Black-Scholes framework; cf. for instance [24] for a description and some criticism of such models.

In this thesis we will adopt the reduced-form approach, as it leads naturally to affine processes. We assume throughout that all stochastic elements are defined on a complete probability space (Ω, F, P) with a filtration (F_t)_{t≥0} satisfying the usual conditions (Definition B.2).

2.1 Counting processes

Definition 2.1 (Counting process). Let (T_n)_{n≥0} be an increasing sequence of random variables such that T_0 = 0 a.s. A stochastic process (N_t)_{t≥0} is the counting process associated to the sequence (T_n)_{n≥0} if

N_t = n if t ∈ [T_n, T_{n+1}),
N_t = ∞ if t ≥ lim_{n→∞} T_n.

If lim_{n→∞} T_n = ∞, then (N_t)_{t≥0} is called non-explosive. We point out that a counting process is right-continuous by definition.


Definition 2.2 (Intensity of a counting process). Let (λ_t)_{t≥0} be a non-negative predictable process such that

∫_0^t λ_s ds < ∞ a.s., for all t.

Let (N_t)_{t≥0} be a counting process. If (N_t − ∫_0^t λ_s ds)_{t≥0} is a local martingale, then (λ_t)_{t≥0} is called the intensity of (N_t)_{t≥0}.

We can consider the intensity unique; indeed, the following result holds.

Theorem 2.1.1 ([7]). Let (λ_t)_{t≥0} and (λ̃_t)_{t≥0} be two intensities for the counting process (N_t)_{t≥0}. Then

∫_0^t |λ_s − λ̃_s| λ_s ds = 0 a.s., for all t. (2.1)

If we take the intensities to be strictly positive, then from (2.1) it follows that λ_t = λ̃_t a.s., for all t ≥ 0.

We can get rid of the localness by assuming some technical conditions, leading to the following result:

Proposition 2.1.2. Suppose (N_t)_{t≥0} is an (F_t)-adapted counting process and (λ_t)_{t≥0} a non-negative (F_t)-predictable process such that, for all t, E[∫_0^t λ_s ds] < ∞, with (F_t)_{t≥0} satisfying the usual conditions. Then the following are equivalent:

(i) (N_t)_{t≥0} is non-explosive and (λ_t)_{t≥0} is the intensity of (N_t)_{t≥0}.

(ii) (N_t − ∫_0^t λ_s ds)_{t≥0} is a martingale.

Proof. (ii ⇒ i). A martingale is obviously also a local martingale, so (λ_t)_{t≥0} is the intensity of (N_t)_{t≥0}. By the definition of a martingale,

E[ N_t − ∫_0^t λ_s ds ] ≤ E[ |N_t − ∫_0^t λ_s ds| ] < ∞;

since E[∫_0^t λ_s ds] < ∞ by hypothesis, it follows that E[N_t] < ∞, so (N_t)_{t≥0} is non-exploding.

(i ⇒ ii). Since N_t and ∫_0^t λ_u du are increasing and positive, we have for all t that

|N_t − ∫_0^t λ_u du| ≤ |N_t| + |∫_0^t λ_u du| = N_t + ∫_0^t λ_u du

and hence

sup_{s≤t} |N_s − ∫_0^s λ_u du| ≤ N_t + ∫_0^t λ_u du.

Then

E[ sup_{s≤t} |N_s − ∫_0^s λ_u du| ] < ∞,

and by Theorem B.1.3 we have the desired result. ∎

Remark 1. From Proposition 2.1.2 (ii) we can observe, for s > t:

E_t[ N_s − ∫_0^s λ_u du ] = N_t − ∫_0^t λ_u du  ⇒  E_t[N_s − N_t] = E_t[ ∫_t^s λ_u du ].

Proposition 2.1.3 (T8, T9, [7], 27-28). Suppose (N_t)_{t≥0} is a non-explosive (F_t)-adapted counting process with intensity (λ_t)_{t≥0}, with ∫_0^t λ_s ds < ∞ a.s. for all t, and (F_t)_{t≥0} satisfying the usual conditions. Let M_t = N_t − ∫_0^t λ_s ds, t ≥ 0. Then for every predictable process (H_t)_{t≥0} such that ∫_0^t |H_s| λ_s ds < ∞ a.s. for all t, the process

Y_t = ∫_0^t H_s dM_s = ∫_0^t H_s dN_s − ∫_0^t H_s λ_s ds

is well defined and is a local martingale. In addition, if E[∫_0^t |H_s| λ_s ds] < ∞ for all t, then (Y_t)_{t≥0} is a martingale.

Remark 2. If (Y_t)_{t≥0} is a martingale, then for s > t,

E_t[Y_s] = Y_t  ⇒  E_t[ ∫_t^s H_u dN_u ] = E_t[ ∫_t^s H_u λ_u du ].

The first jump of a counting process will be central in our context; hence we pose the following definition.

Definition 2.3 (Intensity for a stopping time). Let (N_t)_{t≥0} be a non-explosive counting process with intensity (λ_t)_{t≥0}, and let τ := inf{t : N_t = 1}. Then the stopping time τ is said to have intensity (λ_t)_{t≥0}.

For multiname risk modelling it can be useful to characterize the first jump among n counting processes:

Lemma 2.1.4. Let τ_i, i = 1, …, n, be a sequence of stopping times with intensities (λ_t^i)_{t≥0}. If P[τ_i = τ_j] = δ_{ij}, then τ := min(τ_1, …, τ_n) has intensity ∑_{i=1}^n λ^i.

Proof. By Definition 2.3, for each i we have that M_t^i = N_t^i − ∫_0^t λ_s^i ds is a local martingale. Then also

M_t := ∑_{i=1}^n M_t^i = ∑_{i=1}^n N_t^i − ∫_0^t ∑_{i=1}^n λ_u^i du (2.2)

is a local martingale. Let us denote by (N_t)_{t≥0} the counting process associated to τ; since we assumed P[τ_i = τ_j] = δ_{ij}, we can write

N_t = N_t^1 + ⋯ + N_t^n, t ≤ τ;

then, by (2.2), ∑_{i=1}^n λ^i is the intensity of τ. ∎

2.2 Poisson processes

We present one of the many equivalent definitions of a Poisson process, which qualifies it as a Lévy process (cf. Definition B.10).

Definition 2.4 (Poisson process). An (F_t)-adapted, non-exploding counting process (N_t)_{t≥0} is a Poisson process if

1. ∀ s, t with 0 ≤ s ≤ t < ∞: N_t − N_s is independent of G_s := σ(N_u : u ≤ s), the history of the process up to time s (independent increments);

2. ∀ s, t, u, v with 0 ≤ s ≤ t < ∞, 0 ≤ u ≤ v < ∞ and v − u = t − s: N_t − N_s =d N_v − N_u (stationary increments).

Theorem 2.2.1. Let (N_t)_{t≥0} be a Poisson process. Then

P[N_t = n] = e^{−λt} (λt)^n / n!, n ∈ N_0, (2.3)

for some λ > 0 and all t ≥ 0. That is, the random variable N_t has a Poisson distribution with parameter λt, for some deterministic λ > 0. For a proof we refer to [46].

Theorem 2.2.2. Let (N_t)_{t≥0} be a Poisson process. Then

E[N_t] = λt, Var(N_t) = λt.

In addition, (N_t − λt)_{t≥0} and ((N_t − λt)² − λt)_{t≥0} are martingales.

Proof. The calculation of mean and variance is straightforward and will be omitted. Since λt is deterministic,

E[N_t] = λt ⇒ E[N_t − λt] = 0,
E[(N_t − λt)²] = λt ⇒ E[(N_t − λt)² − λt] = 0.

To verify that these “compensated” processes are martingales, we note that for all t ≥ s:

E[ N_t − λt − (N_s − λs) | G_s ] = E[ N_t − N_s − λ(t − s) ] = 0.

The same holds for ((N_t − λt)² − λt)_{t≥0}. ∎

Definition 2.2 in combination with Theorem 2.2.2 shows that λ is the (deterministic) intensity of the Poisson process (N_t)_{t≥0}. As can be seen in the proof of Theorem 2.2.2, we have

E[N_t − N_s] = λ(t − s).

Hence λ represents the jump rate per unit of time, and the name “intensity” is well chosen.
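As a quick sanity check of Theorem 2.2.2, a Poisson process can be simulated from i.i.d. exponential interarrival times (a standard construction, not spelled out in this chapter), and the sample mean and variance of N_t compared with λt; a minimal sketch:

import numpy as np

rng = np.random.default_rng(0)

def poisson_count(lam, t, rng):
    # Number of jumps in [0, t] of a Poisson process with intensity lam,
    # built from i.i.d. exponential interarrival times.
    n, total = 0, rng.exponential(1.0 / lam)
    while total <= t:
        n += 1
        total += rng.exponential(1.0 / lam)
    return n

lam, t = 2.0, 5.0
samples = np.array([poisson_count(lam, t, rng) for _ in range(20000)])
print(samples.mean(), samples.var())   # both should be close to lam * t = 10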

2.3 Doubly stochastic process

Now we will extend the concept of a Poisson process. Let (G_t)_{t≥0} and (F_t)_{t≥0} be filtrations and F_s ∨ G_t := σ(F_s ∪ G_t).

Definition 2.5 (Doubly stochastic process). Let (N_t)_{t≥0} be a non-explosive counting process with intensity (λ_t)_{t≥0}, and let (F_t)_{t≥0} satisfy the usual conditions, with F_t ⊂ G_t, t ≥ 0. If (λ_t)_{t≥0} is (F_t)-predictable and, for all t and s > t, N_s − N_t conditioned on F_s ∨ G_t has a Poisson distribution with parameter ∫_t^s λ_u du, then (N_t)_{t≥0} is called a doubly stochastic process, driven by (F_t)_{t≥0}. That is:

P[ N_s − N_t = n | F_s ∨ G_t ] = ( (∫_t^s λ_u du)^n / n! ) e^{−∫_t^s λ_u du}. (2.4)

Remark 3. If we multiply (2.4) by n and sum over n ∈ N_0, recalling that ∑_{i=0}^∞ x^i/i! = e^x, we get

E[ N_s − N_t | F_s ∨ G_t ] = ∫_t^s λ_u du, 0 ≤ t < s;

once again, the name “intensity” is well chosen. Of course, the quantity ∫_t^s λ_u du is a stochastic process; otherwise we would get the same result as for the Poisson process. An alternative representation is by means of the characteristic function:

E[ e^{iu(N_s − N_t)} | F_s ∨ G_t ] = exp[ (e^{iu} − 1) ∫_t^s λ_u du ], 0 ≤ t < s, u ∈ R. (2.5)

We will use a doubly stochastic process to model the status of a debtor: the process (N_t)_{t≥0} starts at N_0 = 0, and we denote by τ the time of the first jump (i.e., τ := inf{t : N_t = 1}). For N_t = 0, t < τ, the debtor is alive; at time τ the process jumps and we have the credit event. We see from the definition of a doubly stochastic process that there are two different flows of information: (F_t)_{t≥0} and (G_t)_{t≥0}, the first smaller than the other.

So, at the initial time t, the investors know λ_t from historical data, and they want to forecast the probability of default, i.e. P[N_s − N_t > 0 | G_t], and other correlated quantities. To know the probability of having n jumps in the time interval [t, s], we need the information contained in F_s, in order for ∫_t^s λ_u du to make sense, and the information contained in G_t.

For our applications, we can think of the two filtrations as representing two different actors: one equipped with the σ-algebra F_s, which contains the statistical information on the process (N_t)_{t≥0} up to time s, and the other equipped with G_t, which has information about (λ_t)_{t≥0} only up to time t but has access to other kinds of information. The notion of doubly stochastic process can be effortlessly generalized to a multidimensional context.

Definition 2.6. The process N_t = (N_t^1, …, N_t^n), t ≥ 0, is said to be an n-dimensional doubly stochastic process, driven by (F_t)_{t≥0}, with intensity λ_t = (λ_t^1, …, λ_t^n), t ≥ 0, if

• (N_t^i)_{t≥0} is a doubly stochastic process, driven by (F_t)_{t≥0}, with intensity (λ_t^i)_{t≥0}, for i = 1, …, n;

• the increments N_u^i − N_t^i, t ≤ u ≤ s, i = 1, …, n, conditional on the σ-algebra G_t ∨ F_s, are independent.

As noted before, the emphasis is on the filtration (G_t)_{t≥0}, and usually it can be constructed from (F_t)_{t≥0}.

Proposition 2.3.1. Let (F_t)_{t≥0} satisfy the usual conditions, and let (λ_t)_{t≥0} be an (F_t)-predictable, non-negative process with ∫_0^t λ_s ds < ∞ a.s. for all t. Let (Z_i)_{i∈N} be i.i.d. random variables with standard (unit-parameter) exponential distribution, independent of F_t for all t. If the counting process (N_t)_{t≥0} is associated to the sequence

T_0 = 0,
T_n = inf{ t ≥ T_{n−1} : ∫_{T_{n−1}}^t λ_s ds ≥ Z_n }, n ∈ N, (2.6)

then (N_t)_{t≥0} is a non-exploding counting process with intensity (λ_t)_{t≥0}, doubly stochastic driven by (F_t)_{t≥0}, with G_t the σ-algebra generated by F_t together with σ(N_s : 0 ≤ s ≤ t).

Proof. Let us take n = 1; then

T_1 = inf{ t ≥ 0 : ∫_0^t λ_s ds ≥ Z_1 };

by definition,

P[ T_1 > t | F_t ] = P[ ∫_0^t λ_s ds < Z_1 | F_t ] = e^{−∫_0^t λ_s ds}.

By the strong Markov property we can apply the same reasoning, starting from (T_1, Z_1). From the independence of Z_1 and Z_2 it follows that T_2 − T_1 is independent of T_1; hence the (T_i − T_{i−1})_{i∈N} are i.i.d. Therefore N_t, conditioned on F_t, has a Poisson distribution with parameter ∫_0^t λ_s ds, and N_s − N_t, conditional on F_s ∨ G_t, has a Poisson distribution as well, i.e. (N_t)_{t≥0} is a doubly stochastic process. ∎
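Construction (2.6) translates directly into a simulation recipe: draw a path of (λ_t)_{t≥0}, accumulate its integral, and record a default when the integral first exceeds an independent unit exponential. Below is a minimal Monte Carlo sketch (the mean-reverting intensity dynamics and all parameter values are illustrative assumptions, not taken from this thesis), which also checks P[τ > T] against E[e^{−∫_0^T λ_s ds}] (cf. (2.10) in Section 2.5):

import numpy as np

rng = np.random.default_rng(1)
T, n_steps, n_paths = 5.0, 500, 10000
dt = T / n_steps
k, gamma, sigma, lam0 = 0.5, 0.03, 0.01, 0.02    # illustrative parameters

lam = np.full(n_paths, lam0)
integral = np.zeros(n_paths)
Z = rng.exponential(1.0, n_paths)                # unit exponentials Z_n of (2.6)
no_default = np.ones(n_paths, dtype=bool)        # flags for "no jump before T"

for _ in range(n_steps):
    integral += lam * dt
    no_default &= integral < Z                   # default once the integral passes Z
    # Euler step for an illustrative mean-reverting intensity
    lam += k * (gamma - lam) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    lam = np.maximum(lam, 0.0)                   # keep the intensity non-negative

print(no_default.mean())            # Monte Carlo estimate of P[tau > T]
print(np.exp(-integral).mean())     # E[exp(-integral)] -- should agree, cf. (2.10)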

2.4 Risk-neutral probability

Up to now, all definitions depended on the physical probability measure P; now we need a risk-neutral probability measure Q to price the derivatives.

We recall the staples of risk-neutral pricing theory in Appendix C; in addition, we will suppose the existence of a short interest rate (r_t)_{t≥0} such that ∫_0^t r_u du < ∞ a.s. for all t (in analogy with the hypothesis made on (λ_t)_{t≥0}).

Before proceeding, we answer the question: is a counting process with respect to P still a counting process under an equivalent probability measure Q? Here the following helps:

Proposition 2.4.1 ([1]). Suppose that a non-explosive counting process (N_t)_{t≥0} has a P-intensity process (λ_t)_{t≥0}, and that Q is any probability measure equivalent to P. Then (N_t)_{t≥0} has a Q-intensity process (λ_t^Q)_{t≥0}.

The ratio λ_t^Q/λ_t, t ≥ 0, represents the risk premium associated with the uncertainty about the time of default. Of course we suppose (λ_t)_{t≥0} strictly positive, since we suppose there is always a possibility of a default event.

Lemma 2.4.2. Let (N_t)_{t≥0} be a non-explosive counting process with intensity (λ_t)_{t≥0}, and let T_i := inf{t : N_t = i} (i.e., T_i is the i-th jump time of N_t). Let (ϕ_t)_{t≥0} be a strictly positive predictable process such that ∫_0^T λ_s ϕ_s ds < ∞ a.s. for fixed T. Then

ξ_t = exp( ∫_0^t (1 − ϕ_s) λ_s ds ) ∏_{i : T_i ≤ t} ϕ_{T_i}, t ≤ T, (2.7)

is well defined and a local martingale.

Proof. Comparing (2.7) with (A.1.1), we can see from Theorem A.1.1, setting a(t) = M_t = N_t − ∫_0^t λ_u du and u(t) = ϕ_t − 1, that ξ_t solves the equation

ξ_t = 1 + ∫_0^t ξ_{s−}(ϕ_s − 1) dM_s, t ≥ 0, (2.8)

since ΔM_t = ΔN_t = 1 and M_t^c = −∫_0^t λ_u du. Looking at (2.8) we see that the integrand is predictable; then by Proposition 2.1.3, (ξ_t)_{t≥0} is a local martingale. ∎


We now present a version of Girsanov's Theorem that allows us to find a probability measure under which a non-explosive counting process remains a counting process. We first present some sufficient conditions for (2.7) to be a martingale.

Lemma 2.4.3. Let (N_t)_{t≥0} be a non-explosive counting process with intensity (λ_t)_{t≥0}. If (λ_t)_{t≥0} is bounded and deterministic and (ϕ_t)_{t≥0} is bounded on [0, T], then (ξ_t)_{t∈[0,T]}, as defined in (2.7), is a martingale.

Proof. Let us consider for simplicity λ = 1 and T = 1. Since (ϕ_t)_{t≥0} is bounded, there exists K > 1 such that ϕ_t ≤ K, t ∈ [0, 1]; from (2.7) we have

ξ_t ≤ e^{Kt} K^{N_t}. (2.9)

Then

∫_0^t |ξ_{s−}(ϕ_s − 1)| ds ≤ (K + 1) ∫_0^t e^{Ks} K^{N_{s−}} ds ≤ (K + 1) e^K K^{N_{1−}} t ≤ (K + 1) e^K K^{N_1},

where we used (2.9) and the fact that e^{Ks} K^{N_{s−}} is increasing in s, with t ∈ [0, 1]. Then

E[ ∫_0^t |ξ_{s−}(ϕ_s − 1)| ds ] ≤ E[ (K + 1) e^K K^{N_1} ] < ∞,

since N_t is non-exploding; then by Proposition 2.1.3,

∫_0^t ξ_{s−}(ϕ_s − 1) dM_s, t ∈ [0, 1],

is a martingale, and in light of (2.8), (ξ_t)_{t∈[0,1]} is a martingale as well. ∎

Theorem 2.4.4 (Girsanov's Theorem - 1, [21], 57 and [7], T3, 166-167). Suppose that the local martingale (ξ_t)_{t∈[0,T]} is a martingale. Then an equivalent probability measure Q is defined by dQ/dP = ξ_T. Restricted to the time interval [0, T], under the probability measure Q, (N_t)_{t≥0} is a non-explosive counting process with intensity λ_t^Q = λ_t ϕ_t, t ∈ [0, T].

We note that the Radon-Nikodym derivative is completely determined by the process ϕ_t = λ_t^Q/λ_t. So if we are able to find (ϕ_t)_{t≥0}, then we have an equivalent measure under which the non-explosive counting property still holds.

Asking for more, we can also preserve the doubly stochastic property.

Lemma 2.4.5. Let (N_t)_{t≥0} be doubly stochastic, driven by (F_t)_{t≥0}, with intensity (λ_t)_{t≥0}. For a fixed time T > 0, let (ϕ_t)_{t≥0} be an (F_t)-predictable process, with (λ_t)_{t≥0} and (ϕ_t)_{t≥0} bounded on [0, T]. Then (ξ_t)_{t∈[0,T]}, defined as in (2.7), is a martingale.


Proof. Recall that, by the law of iterated expectations, for a random variable ξ_t:

E[ E[ξ_t | F_t] ] = E[ξ_t].

The proof is then analogous to that of Lemma 2.4.3, using the doubly stochastic property. ∎

Theorem 2.4.6 (Girsanov's Theorem bis, [21], 57). Suppose (N_t)_{t≥0} is doubly stochastic, driven by (F_t)_{t≥0}, with intensity (λ_t)_{t≥0}, where G_t is the completion of σ(N_s : 0 ≤ s ≤ t) ∨ F_t, t ≥ 0. For a fixed time T > 0, let (ϕ_t)_{t≥0} be an (F_t)-predictable process with ∫_0^T ϕ_s λ_s ds < ∞ a.s. Let (ξ_t)_{t∈[0,T]} be defined by (2.7), and suppose that (ξ_t)_{t∈[0,T]} is a martingale. Let Q be the probability measure with dQ/dP = ξ_T. Then, restricted to the time interval [0, T], under the probability measure Q and with respect to the filtration (G_t)_{t≥0}, (N_t)_{t≥0} is doubly stochastic, driven by (F_t)_{t≥0}, with intensity λ_t^Q = ϕ_t λ_t, t ∈ [0, T].

Remark 4. As said before, the σ-algebra G_t contains the market information up to time t. The hypotheses in Theorem 2.4.6 and Proposition 2.3.1 strengthen this interpretation, since they tell us that it has to contain information about λ_t and N_t (respectively, in our context, the intensity of default and the status of the firm), which are market data.

We have given a density ξ which provides an equivalent measure Q, but we also need the discounted price of an asset to be a martingale under Q. This casts some restrictions on the parameters of an affine model; we refer to [23], Sections 3.1-3.2.

2.5 Useful results

Now we are ready to derive some useful relations, which can be evaluated effectively jointly with affine process theory. We will often use this classical result in probability:

Theorem 2.5.1 (Law of iterated expectations). Let A, G ⊂ F be two σ-algebras such that A ⊂ G. Then E[X | A] = E[ E[X | A] | G ] = E[ E[X | G] | A ].

Proof. E[X | A] is, by definition, an A-measurable random variable and, since A ⊂ G, it is also G-measurable; therefore E[X | A] = E[ E[X | A] | G ]. Let us prove the other relation: each A ∈ A belongs also to G, hence

E[ E[X | G] 1_A ] = E[X 1_A],

and the relation follows from the definition of conditional expectation. ∎


2.5.1 Survival analysis

Let us denote by A the event {N_s − N_t = 0}, i.e., no default in the time interval [t, s]. Then the survival probability, conditioned on the information available to investors up to time t, is, exploiting Theorem 2.5.1:

P[τ > s | G_t] = E[1_A | G_t] = E[ E[1_A | G_t ∨ F_s] | G_t ] = E[ P[N_s − N_t = 0 | G_t ∨ F_s] | G_t ].

But recalling (2.4) we have

P[τ > s | G_t] = E[ e^{−∫_t^s λ_u du} | G_t ], (2.10)

and comparing (2.10) with (I.2): evaluating the survival probability from the market data contained in G_t is equivalent to pricing a zero-coupon bond discounted by the intensity (λ_t)_{t≥0}.

Credit risk can be naturally approached with survival analysis, and the reduced-form approach yields some nice results.

Definition 2.7 (Hazard rate). If τ is an absolutely continuous non-negative random variable, its hazard rate function is defined by

h(t) = f(t)/S(t), t ≥ 0,

where f(t) is the density of τ and S(t) = ∫_t^∞ f(u) du is the survival function.

Note that P(τ ≤ t + dt | τ > t) ≈ h(t) dt: the hazard rate is thus the instantaneous rate of failure at time t.

Let us denote by p(t) = P(τ > t) the survival function p : [0, ∞) → [0, 1]. The density of the stopping time is π(t) = −dp/dt, and the hazard rate h : [0, ∞) → [0, ∞) is

h(t) = π(t)/p(t) = −(d/dt) log p(t); (2.11)

integrating both sides, we get

p(t) = P(τ > t) = e^{−∫_0^t h(u) du}.

Analogously, we can define all these quantities conditioned on G_t, adding the subscript t as a shorthand notation. Then we know from (2.10) that

p_t(s) = P_t(τ > s) = E_t[ e^{−∫_t^s λ_u du} ]

and, therefore,

π_t(s) = −(d/ds) p_t(s) = −(d/ds) E_t[ e^{−∫_t^s λ_u du} ].

If we were allowed to exchange expectation and derivative, we would easily obtain

π_t(s) = E_t[ e^{−∫_t^s λ_u du} λ_s ]. (2.12)

Since λ_s is usually positive and bounded, we can use standard results from measure theory to exchange expectation and derivative. Actually, more can be said:

Theorem 2.5.2 ([31], 106-107). Let (N_t)_{t≥0} be a doubly stochastic process, driven by (F_t)_{t≥0}, such that

• there exists a constant C such that E(λ_t²) < C for all t;

• for all ε > 0 and almost every t,

lim_{δ→0} P( |λ(t+δ) − λ(t)| < ε ) = 1.

Then

(d/ds) E_t[ e^{−∫_t^s λ_u du} ] = E_t[ −λ_s e^{−∫_t^s λ_u du} ].

If we assume that (I.1) holds, in a doubly stochastic setting with λ_t = l_0 + l_1 · X_{t−} and (X_t)_{t≥0} an affine process, then all these quantities can be calculated easily. Let us begin with the survival function:

p_t(s) = P_t(τ > s) = E_t[ e^{−∫_t^s λ_u du} ] = e^{α(t,s) + β(t,s)·X_t},

where α(t,s) and β(t,s) solve the GREs. Recalling equation (2.11),

h_t(s) = −(d/ds) log p_t(s) = −(d/ds)(α(t,s) + β(t,s)·X_t) = −∂_s α − ∂_s β · X_t.

Once we have solved the GREs, we also know explicitly the time derivatives involved in the calculation of h_t(s). Again from (2.11), the density of the stopping time is

π_t(s) = h_t(s) p_t(s) = −(∂_s α + ∂_s β · X_t) e^{α(t,s) + β(t,s)·X_t}.

Therefore, by solving the GREs we can find all the related quantities, and recalling (2.12) we obtain

E_t[ e^{−∫_t^s (l_0 + l_1·X_u) du} (l_0 + l_1·X_s) ] = −(∂_s α + ∂_s β · X_t) e^{α(t,s) + β(t,s)·X_t}.

Actually, a more general result will be shown, but this example shows how, in some sense, the affine structure can be extended.
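To make this concrete, here is a minimal numerical sketch for a one-dimensional Vasicek-type intensity dλ_t = k(γ − λ_t)dt + σ dW_t, so that, in the notation of Chapter 3, K_0 = kγ, K_1 = −k, H_0 = σ², no jumps, ρ_0 = 0, ρ_1 = 1 and u = 0; all parameter values are illustrative assumptions. It integrates the GREs (3.6) backward from α(T) = β(T) = 0 and evaluates the survival probability p_t(T); in this special case β has the closed form (e^{−k(T−t)} − 1)/k, which the sketch uses as a check.

import numpy as np
from scipy.integrate import solve_ivp

k, gamma, sigma = 0.5, 0.03, 0.01     # illustrative Vasicek-type parameters
lam_t, t, T = 0.02, 0.0, 5.0          # current intensity, current time, horizon

def gre(s, y):
    # Generalized Riccati equations for the survival transform (rho_0 = 0,
    # rho_1 = 1, u = 0): beta' = 1 + k*beta, alpha' = -k*gamma*beta - sigma^2*beta^2/2.
    alpha, beta = y
    return [-k * gamma * beta - 0.5 * sigma**2 * beta**2, 1.0 + k * beta]

# Integrate backward in time from the terminal conditions alpha(T) = beta(T) = 0.
sol = solve_ivp(gre, [T, t], [0.0, 0.0], rtol=1e-10, atol=1e-12)
alpha, beta = sol.y[0, -1], sol.y[1, -1]

p = np.exp(alpha + beta * lam_t)                 # survival probability p_t(T)
beta_closed = (np.exp(-k * (T - t)) - 1.0) / k   # closed form for beta
print(p, beta, beta_closed)                      # beta should match the closed form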


2.5.2 Correlated jumps

The doubly stochastic framework allows us to obtain some results also in the n-dimensional case, where we can consider each component of an n-dimensional doubly stochastic process (N_t)_{t≥0} as a different entity, which is only conditionally independent of the other components.

Proposition 2.5.3. Let (N_t)_{t≥0} be an n-dimensional doubly stochastic process with intensity (λ_t)_{t≥0}, and let τ_i := inf{t : N_t^i ≥ 1}. Then:

(i) P_t[τ > T] = E_t[ e^{−∫_t^T Λ_s ds} ];

(ii) P_t[τ_1 > t_1, …, τ_n > t_n] = E_t[ e^{−∫_t^{t_n} Γ_s ds} ];

where τ = inf{τ_1, …, τ_n}, Λ_s = ∑_{i=1}^n λ_s^i, Γ_s = ∑_{i : t_i > s} λ_s^i, and t ≤ t_1 ≤ … ≤ t_n.

Proof. (i) By the law of iterated expectations we have

P_t[τ > T] = E_t[ P[τ > T | F_T ∨ G_t] ].

By the definition of an n-dimensional doubly stochastic process, the jump times are independent conditional on F_T ∨ G_t, so the result follows from Lemma 2.1.4 and (2.10).

(ii) By the law of iterated expectations and conditional independence we have

P_t[τ_1 > t_1, …, τ_n > t_n] = E_t[ P[τ_1 > t_1, …, τ_n > t_n | F_{t_n} ∨ G_t] ]
= E_t[ P[τ_1 > t_1 | F_{t_n} ∨ G_t] ⋯ P[τ_n > t_n | F_{t_n} ∨ G_t] ] = E_t[ e^{−∫_t^{t_1} λ_s^1 ds} ⋯ e^{−∫_t^{t_n} λ_s^n ds} ]. ∎

We observe that, if the intensities are affine, then Λ_s is affine and Γ_s is piecewise affine, so both are tractable within the affine framework. Let us consider case (ii), since (i) is trivial. Suppose (I.1) holds; since Γ_s = Γ(X_s, s) is affine for s ∈ (t_{k−1}, t_k), we have

E_{t_{k+1}}[ e^{−∫_{t_{k+1}}^{t_{k+2}} Γ(X_s,s) ds} ] = e^{α(k+1) + β(k+1)·X_{t_{k+1}}}

and, by the law of iterated expectations,

E_{t_k}[ e^{−∫_{t_k}^{t_{k+2}} Γ(X_s,s) ds} ] = E_{t_k}[ E_{t_{k+1}}[ e^{−∫_{t_k}^{t_{k+2}} Γ(X_s,s) ds} ] ] = E_{t_k}[ e^{−∫_{t_k}^{t_{k+1}} Γ(X_s,s) ds} E_{t_{k+1}}[ e^{−∫_{t_{k+1}}^{t_{k+2}} Γ(X_s,s) ds} ] ],

whence

E_{t_k}[ e^{−∫_{t_k}^{t_{k+1}} Γ(X_s,s) ds} e^{α(k+1) + β(k+1)·X_{t_{k+1}}} ] = e^{α(k) + β(k)·X_{t_k}}.

Solving the latter equation backward, up to time t_0 = t, we obtain, by Proposition 2.5.3 (ii):

P_t[τ_1 > t_1, …, τ_n > t_n] = E_t[ e^{−∫_t^{t_n} Γ_s ds} ] = e^{α(0) + β(0)·X_t};

i.e., we have an analytical expression for the joint distribution of the default times.

The special case n = 2 is of interest: many contracts involving two actors are valid if and only if both actors are not in default at maturity, i.e. {τ_1 > T} ∩ {τ_2 > T}. We are, of course, in case (i) of Proposition 2.5.3, i.e.

P_t[ {τ_1 > T} ∩ {τ_2 > T} ] = E_t[ e^{−∫_t^T (λ_s^1 + λ_s^2) ds} ].

Resorting to the inclusion-exclusion formula, it is easy to find an expression also for the event {τ_1 > T} ∪ {τ_2 > T}, which means that at least one actor is alive at maturity:

P_t[ {τ_1 > T} ∪ {τ_2 > T} ] = P_t[τ_1 > T] + P_t[τ_2 > T] − P_t[ {τ_1 > T} ∩ {τ_2 > T} ];

using the doubly stochastic property we have, as usual,

P_t[ {τ_1 > T} ∪ {τ_2 > T} ] = E_t[ e^{−∫_t^T λ_s^1 ds} ] + E_t[ e^{−∫_t^T λ_s^2 ds} ] − E_t[ e^{−∫_t^T (λ_s^1 + λ_s^2) ds} ]. (2.13)
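A small Monte Carlo sketch of (2.13), reusing the construction of Section 2.3: we simulate two intensity paths (independent and mean-reverting here purely for illustration; the dynamics and all parameter values are assumptions, not taken from this thesis), average the conditional survival probabilities e^{−∫λ}, and combine them by inclusion-exclusion.

import numpy as np

rng = np.random.default_rng(2)
T, n_steps, n_paths = 5.0, 500, 10000
dt = T / n_steps
k, gamma, sigma = 0.5, 0.03, 0.01               # illustrative parameters
lam = np.vstack([np.full(n_paths, 0.02),        # initial intensity of name 1
                 np.full(n_paths, 0.04)])       # initial intensity of name 2

I = np.zeros((2, n_paths))                      # integrated intensities
for _ in range(n_steps):
    I += lam * dt
    lam += k * (gamma - lam) * dt + sigma * np.sqrt(dt) * rng.standard_normal((2, n_paths))
    lam = np.maximum(lam, 0.0)                  # keep intensities non-negative

p1 = np.exp(-I[0]).mean()                       # P_t[tau_1 > T]
p2 = np.exp(-I[1]).mean()                       # P_t[tau_2 > T]
p_both = np.exp(-I.sum(axis=0)).mean()          # P_t[tau_1 > T, tau_2 > T], case (i)
print(p1, p2, p_both, p1 + p2 - p_both)         # last value: (2.13)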

Chapter 3

Affine processes and transforms

We now present and prove the results behind (I.1). Here we deal with a simpler theory of affine processes, developed mainly in [23] for jump-diffusion processes; a fully fledged theory can be found in [22]. For notational simplicity we will suppose that the functions considered do not depend explicitly on time and that there is only one type of jump, but all the results also apply in a wider context, as discussed in Section 3.3.4.

We will denote by $\cdot$ the usual inner product in $\mathbb{R}^n$ and by $\otimes$ the dyadic product

$$\mathbb{R}^{n\times n}\ni(a\otimes b)_{ij} := a_ib_j, \qquad a, b \in \mathbb{R}^n,$$

while $:$ is the scalar product over the space of tensors,

$$A : B := \mathrm{tr}\left[AB^T\right] = \sum_{i,j=1}^n (A)_{ij}(B)_{ij}, \qquad A, B \in \mathbb{R}^{n\times n}.$$

3.1 Affine processes

Let us pose some definitions that will turn out useful. Throughout, all random elements are defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$. Details and definitions can be found in Appendix B.

Definition 3.1 (Jump-Diffusion Process). Let us fix the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a filtration $(\mathcal{F}_t)_{t\ge0}$ satisfying the usual hypotheses, and suppose that $(X_t)_{t\ge0}$ is a Markov process in some state space $D \subset \mathbb{R}^n$. $(X_t)_{t\ge0}$ is a jump-diffusion (JD) process if the transition semigroup has an infinitesimal generator $\mathcal{D}$ of the Lévy type defined, for any bounded $f : D \to \mathbb{R}$, $f \in C^2(D)$ with bounded derivatives, by

$$\mathcal{D}f(x) = \nabla_x f\cdot\mu(x) + \frac12\nabla^2_x f : \left(\sigma(x)\sigma^T(x)\right) + \lambda(x)\int_{\mathbb{R}^n}\left[f(x+z) - f(x)\right]d\nu(z) \tag{3.1}$$



where $\mu : D \to \mathbb{R}^n$, $\sigma : D \to \mathbb{R}^{n\times n}$, and $\nu$ is the fixed jump distribution on $\mathbb{R}^n$ of a compound Poisson process $(Z_t)_{t\ge0}$ with intensity $(\lambda(X_t))_{t\ge0}$, where $\lambda : D \to \mathbb{R}_+$.

Defining this generator, we have defined a process driven by a finite activity Lévy process (cf. Proposition B.3.3) with triplet $(0, 0, \lambda_t\nu)$.

Equivalently, we can think of $(X_t)_{t\ge0}$ as a process solving

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t + dZ_t, \tag{3.2}$$

where $X_0$ has, for simplicity, a known distribution. We denote by $(W_t)_{t\ge0}$ an $(\mathcal{F}_t)$-Brownian motion in $\mathbb{R}^n$ and by $(Z_t)_{t\ge0}$ a compound Poisson process valued in $\mathbb{R}^n$ with fixed jump distribution $\nu$ and intensity $(\lambda(X_t))_{t\ge0}$.
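As a minimal illustration (not part of the thesis; all coefficients are hypothetical), a scalar process of the form (3.2) with affine drift $\mu(x) = K_0 + K_1x$, CIR-type variance $\sigma(x)^2 = H_1x$, state-dependent jump intensity $\lambda(x) = l_0 + l_1x$ and exponential jump sizes can be simulated with a crude Euler scheme:

import numpy as np

rng = np.random.default_rng(0)
K0, K1, H1 = 0.02, -0.5, 0.02          # hypothetical affine coefficients
l0, l1, d = 0.1, 2.0, 0.01             # jump intensity lambda(x) = l0 + l1 x
T, n_steps, X = 1.0, 1000, 0.03
h = T / n_steps

for _ in range(n_steps):
    lam = max(l0 + l1 * X, 0.0)
    # diffusion part: Euler step for drift and volatility
    X += (K0 + K1 * X) * h + np.sqrt(max(H1 * X, 0.0) * h) * rng.standard_normal()
    # jump part: with probability ~ lambda*h one exponential jump arrives
    if rng.random() < lam * h:
        X += rng.exponential(d)

print("X_T =", X)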

The choice of $D$ is arbitrary and casts restrictions on $\mu$, $\sigma$, $\nu$, along with the necessity to have a strong solution to (3.2). Prior attempts to define those conditions and the form of $D$ were made in [20] and [16]. In [22] the authors established a full characterization of affine processes on the state space $D = \mathbb{R}^m_+\times\mathbb{R}^{n-m}$, $m = 1,\ldots,n$, which is considered to be the standard state space for financial applications. We will present those conditions in Section 3.5, referring to [22] for the details.

Definition 3.2 (Affine characteristic of a JD process). Let $(X_t)_{t\ge0}$ be a JD process with parameters $\mu, \lambda, \sigma, \nu$; we define the Laplace transform of the jump distribution as

$$\theta(c) = \int_{\mathbb{R}^n} e^{c\cdot z}\,d\nu(z), \qquad c \in \mathbb{C}^n. \tag{3.3}$$

If we let the parameters depend affinely on $(X_t)_{t\ge0}$,

$$\begin{aligned}
\mu(x) &= K_0 + K_1x, & K &:= (K_0, K_1) \in (\mathbb{R}^n, \mathbb{R}^{n\times n}),\\
\left(\sigma(x)\sigma^T(x)\right)_{ij} &= (H_0)_{ij} + \sum_{k=1}^n(H_1)_{ijk}\,x_k, & H &:= (H_0, H_1) \in (\mathbb{R}^{n\times n}, \mathbb{R}^{n\times n\times n}),\\
\lambda(x) &= l_0 + l_1\cdot x, & l &:= (l_0, l_1) \in (\mathbb{R}, \mathbb{R}^n),
\end{aligned} \tag{3.4}$$

then the quadruple $(K, H, l, \theta)$ is called the (affine) characteristic of $(X_t)_{t\ge0}$, and $(X_t)_{t\ge0}$ is called an affine jump-diffusion (AJD) process or, for short, affine.

Definition 3.3 (Transform). Let $(X_t)_{t\ge0}$ be an affine process with characteristic $\chi = (K, H, l, \theta)$, and let $R : D \to \mathbb{R}_+$ be a discount-rate function such that $R(x) = \rho_0 + \rho_1\cdot x$, $\rho_0 \in \mathbb{R}$, $\rho_1 \in \mathbb{R}^n$. Then for $t \le T$ the function $\psi : \mathbb{C}^n\times D\times\mathbb{R}_+\times\mathbb{R}_+ \to \mathbb{C}$ is well defined by

$$\psi(u, X_t, t, T) = E_t\left[e^{-\int_t^T R(X_s)\,ds}e^{u\cdot X_T}\right]. \tag{3.5}$$


Definition 3.4. A characteristic $\chi = (K, H, l, \theta)$ is well-behaved at $(u, T) \in \mathbb{C}^n\times[0,\infty)$ if $\alpha$ and $\beta$ solve the GREs

$$\begin{cases}
\dot\beta(t) = \rho_1 - K_1^T\beta(t) - \tfrac12\beta^T(t)H_1\beta(t) - l_1\left(\theta(\beta(t)) - 1\right)\\
\dot\alpha(t) = \rho_0 - K_0\cdot\beta(t) - \tfrac12\beta^T(t)H_0\beta(t) - l_0\left(\theta(\beta(t)) - 1\right)\\
\alpha(T) = 0, \quad \beta(T) = u,
\end{cases} \tag{3.6}$$

and if

1. $E\left[\int_0^T|\gamma_t|\,dt\right] < \infty$, with $\gamma_t = \Psi_t\left(\theta(\beta(t)) - 1\right)\lambda(X_t)$;

2. $E\left[\int_0^T\eta_t\cdot\eta_t\,dt\right] < \infty$, with $\eta_t = \Psi_t\beta^T(t)\sigma(X_t)$;

3. $E\left[|\Psi_T|\right] < \infty$;

where $\Psi_t = e^{-\int_0^t R(X_s)\,ds}e^{\alpha(t)+\beta(t)\cdot X_t}$.

We observe that actually only the first equation is a proper ODE; the second is just a definite integral once $\beta$ is known. The equations (3.6) are ODEs backwards in time, and it is useful to operate the transformation

$$t \to s = T - t,$$

obtaining

$$\begin{cases}
\dot\beta(s) = -\rho_1 + K_1^T\beta(s) + \tfrac12\beta^T(s)H_1\beta(s) + l_1\left(\theta(\beta(s)) - 1\right)\\
\dot\alpha(s) = -\rho_0 + K_0\cdot\beta(s) + \tfrac12\beta^T(s)H_0\beta(s) + l_0\left(\theta(\beta(s)) - 1\right)\\
\alpha(0) = 0, \quad \beta(0) = u.
\end{cases} \tag{3.7}$$

The time dependence will be dropped in the following pages for ease of notation. We keep the time dependence in the boundary conditions, it being understood that if we have $\alpha(0)$ and $\beta(0)$ as boundary conditions we are solving the initial time problem (i.e. (3.7)); otherwise we are solving the backwards time problem.

And here is the result we were all waiting for:

Theorem 3.1.1. Let $(K, H, l, \theta)$ be well-behaved at $(u, T)$; then

$$\psi(u, X_t, t, T) = e^{\alpha(t)+\beta(t)\cdot X_t}. \tag{3.8}$$

Proof. We have to show that $(\Psi_t)_{t\ge0}$ is a martingale, because, for $s \ge t$,

$$E_t\left[\Psi_s\right] = \Psi_t \;\Rightarrow\; e^{\int_0^t R(X_u)\,du}E_t\left[\Psi_s\right] = e^{\int_0^t R(X_u)\,du}\Psi_t. \tag{3.9}$$

Since $e^{\int_0^t R(X_u)\,du}$ is $\mathcal{G}_t$-measurable, taking $s = T$ and using the boundary conditions in (3.6):

$$E_t\left[e^{-\int_t^T R(X_s)\,ds}e^{u\cdot X_T}\right] = e^{\alpha(t)+\beta(t)\cdot X_t}. \tag{3.10}$$


We will denote by $(T_i)_{i\ge0}$ the sequence of jump times of $(X_t)_{t\ge0}$ and by $(N_t)_{t\ge0}$ the counting process associated to that sequence.

Applying Itô's formula (Theorem B.3.4) to the real and imaginary parts, we can write

$$\Psi_t = \Psi_0 + \int_0^t\frac{\partial\Psi_s}{\partial s}\,ds + \int_0^t\nabla_x\Psi_s\cdot dX^c_s + \frac12\int_0^t\nabla^2_x\Psi_s : \left(\sigma(X_s)\sigma^T(X_s)\right)ds + \sum_{0<T_i\le t}\left(\Psi_{T_i} - \Psi_{T_i-}\right).$$

We can compute the derivatives involved, recalling the affine hypothesis (3.4):

$$\begin{aligned}
\frac{\partial}{\partial t}\Psi_t &= \Psi_t\left[\dot\alpha - \rho_0 + x\cdot\left(\dot\beta - \rho_1\right)\right],\\
\nabla_x\Psi_t &= \Psi_t\beta \;\Rightarrow\; \nabla_x\Psi_t\cdot dX^c_t = \Psi_t\,\beta\cdot\left[(K_0 + K_1x)\,dt + \sigma(x)\,dW_t\right],\\
\nabla^2_x\Psi_t &= \Psi_t(\beta\otimes\beta) \;\Rightarrow\; \nabla^2_x\Psi_t : \left(\sigma\sigma^T\right) = \Psi_t\,\beta^T(H_0 + H_1x)\beta;
\end{aligned}$$

then, adding and subtracting $\int_0^t\gamma_s\,ds$ and grouping all the terms, we get

$$\begin{aligned}
\Psi_t = \Psi_0 &+ \int_0^t\Psi_s\left[\dot\alpha - \rho_0 + \beta\cdot K_0 + \tfrac12\beta\cdot H_0\beta + l_0(\theta(\beta) - 1)\right]ds + \int_0^t\Psi_s\beta^T\sigma\,dW_s\\
&+ \int_0^t\Psi_sX_s\cdot\left[\dot\beta - \rho_1 + K_1^T\beta + \tfrac12\beta\cdot H_1\beta + l_1(\theta(\beta) - 1)\right]ds\\
&+ \sum_{0<T_i\le t}\left(\Psi_{T_i} - \Psi_{T_i-}\right) - \int_0^t\gamma_s\,ds.
\end{aligned}$$

Since $\alpha$ and $\beta$ solve the GREs, the first and the third integrands are equal to $0$, and under hypothesis 2 the second integral is a martingale (cf. [43]).

It remains to show that the last addend,

$$J_t := \sum_{0<T_i\le t}\left(\Psi_{T_i} - \Psi_{T_i-}\right) - \int_0^t\gamma_s\,ds,$$

is a martingale. That holds if and only if

$$E_t\left[\sum_{0<T_i\le s}\left(\Psi_{T_i} - \Psi_{T_i-}\right) - \int_0^s\gamma_u\,du\right] = \sum_{0<T_i\le t}\left(\Psi_{T_i} - \Psi_{T_i-}\right) - \int_0^t\gamma_u\,du.$$

Splitting the expectations (allowed by hypothesis 1) we get

$$E_t\left[\sum_{t<T_i\le s}\left(\Psi_{T_i} - \Psi_{T_i-}\right)\right] = E_t\left[\int_t^s\gamma_u\,du\right].$$

We observe that

$$\Psi_{T_i} - \Psi_{T_i-} = e^{-\int_0^{T_i}R(X_s)\,ds}e^{\alpha(T_i)+\beta(T_i)\cdot X_{T_i}} - e^{-\int_0^{T_i-}R(X_s)\,ds}e^{\alpha(T_i-)+\beta(T_i-)\cdot X_{T_i-}},$$

but since $\alpha$, $\beta$ and the integral are continuous functions, we can regroup:

$$\Psi_{T_i} - \Psi_{T_i-} = e^{-\int_0^{T_i-}R(X_s)\,ds}e^{\alpha(T_i-)+\beta(T_i-)\cdot X_{T_i-}}\left(e^{\beta(T_i-)\cdot(X_{T_i}-X_{T_i-})} - 1\right) = \Psi_{T_i-}\left(e^{\beta(T_i-)\cdot\Delta X_{T_i}} - 1\right).$$


Then, using the law of iterated expectations and the above relation:

$$\begin{aligned}
E_t\left[\sum_{t<T_i\le s}\left(\Psi_{T_i} - \Psi_{T_i-}\right)\right] &= E_t\left[\sum_{t<T_i\le s}E\left[\left(\Psi_{T_i} - \Psi_{T_i-}\right)\middle|X_{T_i-}\right]\right]\\
&= E_t\left[\sum_{t<T_i\le s}\Psi_{T_i-}\left(\theta(\beta(T_i)) - 1\right)\right] = E_t\left[\sum_i\int_{T_{i-1}+}^{T_i}\Psi_{u-}\left(\theta(\beta(u)) - 1\right)dN_u\right]\\
&= E_t\left[\int_t^s\Psi_{u-}\left(\theta(\beta(u)) - 1\right)dN_u\right]. \tag{3.11}
\end{aligned}$$

In the light of Remark 2, we note that, given hypothesis 1, $(\Psi_{t-}(\theta(\beta(t)) - 1))_{t\ge0}$ is a $(\mathcal{G}_t)$-predictable process and $(\lambda_t)_{t\ge0}$ is the intensity of the jump-counting process $(N_t)_{t\ge0}$, so

$$E_t\left[\int_t^s\Psi_{u-}\left(\theta(\beta(u)) - 1\right)dN_u\right] = E_t\left[\int_t^s\Psi_u\left(\theta(\beta(u)) - 1\right)\lambda_u\,du\right],$$

and the integrand on the right hand side is precisely the definition of $\gamma_t$.

Hence $(J_t)_{t\ge0}$, and with it $(\Psi_t)_{t\ge0}$, is a martingale.

Remark 5. As we have seen, up to now we have dealt with three different filtrations. In Chapter 2 we had a process $\lambda^D_t$ which is $\mathcal{F}_t$-predictable, where $(\mathcal{F}_t)_{t\ge0}$ is a filtration satisfying the usual hypotheses such that $\mathcal{F}_t \subset \mathcal{G}_t$. We have written $\lambda^D_t$ to stress the fact that the intensity of the doubly stochastic process is not the same as that of the pure jump process seen in (3.2). In addition, we have seen that the filtration $(\mathcal{G}_t)_{t\ge0}$ can be built directly from $(\mathcal{F}_t)_{t\ge0}$. On the other hand, in the current chapter we only need a filtration with respect to which $(X_t)_{t\ge0}$ is adapted, so $(\mathcal{G}_t)_{t\ge0}$ fits well since $\mathcal{F}_t \subset \mathcal{G}_t$.

In order to apply the affine property to the right hand side of Claim I.3, it appears clear that $\lambda^D_t$ also has to be affine. Then $\lambda^D_t = \Lambda(X_{t-})$, where $\Lambda(x)$ is an affine function, and a good choice is $\mathcal{F}_t := \sigma(X_t)$. About the conditions under which a Markov process generates a filtration satisfying the usual conditions, see [13], Theorem 4, 61.

From now on $E_t := E[\,\cdot\,|\mathcal{G}_t]$, and we will use $E[\,\cdot\,|\mathcal{F}_t]$ or $E[\,\cdot\,|X_t]$ interchangeably.

3.2 First examples of affine processes

An easy first application of the affine property provided by Theorem 3.1.1 is the evaluation of a zero-coupon bond, obtained by posing $u = 0$ in (3.8). Zero-coupon bonds are, fundamentally, interest rate derivatives, so it is natural to take an interest rate process as driving process; we will take a one-dimensional interest rate factor for simplicity of notation.


In other words, we want to evaluate

$$E_t\left[e^{-\int_t^T R(X_s)\,ds}\right]$$

with $R(X_t) = \rho_0 + \rho_1r_t = 0 + 1\cdot r_t$; the focus then moves to the choice of the process $(r_t)_{t\ge0}$, which has to be affine.

Two popular affine models are the Vasiček model, described by

$$dr_t = k(\gamma - r_t)\,dt + \sigma\,dW_t,$$

and the Cox-Ingersoll-Ross (CIR) model,

$$dr_t = k(\gamma - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t.$$

By simple inspection we can find the characteristics of both models:

$$\begin{array}{c|cc}
 & \text{Vasiček} & \text{CIR}\\
\hline
K & (k\gamma, -k) & (k\gamma, -k)\\
H & (\sigma^2, 0) & (0, \sigma^2)\\
l & (0, 0) & (0, 0)\\
\theta & 1 & 1\\
\rho & (0, 1) & (0, 1)
\end{array}$$

and then we can easily write down the associated GREs.

For the Vasiček model we have

$$\begin{cases}
\dot\beta = -1 - k\beta\\
\dot\alpha = k\gamma\beta + \tfrac12\sigma^2\beta^2\\
\alpha(0) = 0, \quad \beta(0) = 0,
\end{cases}$$

and the solution is

$$\begin{cases}
\beta(t) = \dfrac1k\left(e^{-k(T-t)} - 1\right)\\[1ex]
\alpha(t) = (T-t)\left(\dfrac{\sigma^2}{2k^2} - \gamma\right) + \left(1 - e^{-k(T-t)}\right)\left(\dfrac\gamma k - \dfrac{\sigma^2}{k^3}\right) + \dfrac{\sigma^2}{4k^3}\left(1 - e^{-2k(T-t)}\right).
\end{cases}$$

On the other hand, the associated GREs for the CIR model are

$$\begin{cases}
\dot\beta = -1 - k\beta + \tfrac12\sigma^2\beta^2\\
\dot\alpha = k\gamma\beta\\
\alpha(0) = 0, \quad \beta(0) = 0;
\end{cases}$$

this is a classical homogeneous Riccati equation, and the following result holds:


[Figure 3.1: Solution of the coefficients $\alpha$ and $\beta$ for the Vasiček model, with $T = 10$, $\sigma = 0.012$, $k = 0.05$, $\gamma = 0.03$.]

Lemma 3.2.1. Let us consider the following initial value problem (homogeneous Riccati equation):

$$\dot x = Ax^2 + Bx - C, \qquad x(0) = x_0 \le 0.$$

Then, for $A, C \ge 0$ and $B \in \mathbb{R}$, its unique solution is

$$x(t) = \frac{-2C\left(e^{\rho t} - 1\right) - \left(\rho\left(e^{\rho t} + 1\right) + B\left(e^{\rho t} - 1\right)\right)x_0}{(\rho - B)\left(e^{\rho t} - 1\right) + 2\rho - 2A\left(e^{\rho t} - 1\right)x_0},$$

where $\rho := \sqrt{B^2 + 4AC}$.

Therefore the solution is, posing $a = \sqrt{k^2 + 2\sigma^2}$:

$$\begin{cases}
\beta(t) = \dfrac{-2\left(e^{a(T-t)} - 1\right)}{(a+k)\left(e^{a(T-t)} - 1\right) + 2a}\\[1ex]
\alpha(t) = \dfrac{2\gamma k}{\sigma^2}\ln\left(\dfrac{2a\,e^{(a+k)(T-t)/2}}{(a+k)\left(e^{a(T-t)} - 1\right) + 2a}\right).
\end{cases}$$

Those results are somewhat classic; they can also be derived with an equilibrium approach (cf. the original papers [15], [53]). The existence of an analytical solution, together with (1.3), explains the popularity of those models. We can (and we will) use the existence of those analytical solutions to test our numerical algorithms.
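For instance, a minimal sketch of such a test (reusing the hypothetical parameter values of Figure 3.1) integrates the Vasiček GREs (3.7) numerically and compares the result with the closed-form solution above:

import numpy as np
from scipy.integrate import solve_ivp

k, gamma, sigma, T = 0.05, 0.03, 0.012, 10.0   # parameters of Figure 3.1

def vasicek_gre(s, y):
    # initial-time Vasicek GREs (3.7): alpha' and beta' with alpha(0)=beta(0)=0
    alpha, beta = y
    return [k * gamma * beta + 0.5 * sigma**2 * beta**2, -1.0 - k * beta]

num = solve_ivp(vasicek_gre, (0.0, T), [0.0, 0.0], rtol=1e-10, atol=1e-12)
alpha_num, beta_num = num.y[:, -1]

# closed-form solution evaluated at t = 0 (so that T - t = T)
beta_exact = (np.exp(-k * T) - 1.0) / k
alpha_exact = (T * (sigma**2 / (2 * k**2) - gamma)
               + (1 - np.exp(-k * T)) * (gamma / k - sigma**2 / k**3)
               + sigma**2 / (4 * k**3) * (1 - np.exp(-2 * k * T)))

print("beta error :", abs(beta_num - beta_exact))
print("alpha error:", abs(alpha_num - alpha_exact))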


3.3 Extending the transform

The transform (3.8) can be a bit limiting for pricing purposes, since the payoff has to have the peculiar form $e^{v\cdot X_T}$. While it is trivial to observe that we can handle the case $e^{u + v\cdot X_T}$ simply by replacing the condition $\alpha(0) = 0$ with $\alpha(0) = u$, actually a bit more can be said.

3.3.1 Extended transform

Intuitively, we can exploit the approach used on page 19, differentiating both sides of (3.8) and moving the derivative through the expectation, to get

$$E_t\left[e^{-\int_t^T R(X_s)\,ds}\left(u\cdot X_T\right)e^{v\cdot X_T}\right] = e^{\alpha(t)+\beta(t)\cdot X_t}\left(A(t) + B(t)\cdot X_t\right), \tag{3.12}$$

where $A(t)$ and $B(t)$ have the same dynamics as $\alpha$, $\beta$ but different boundary conditions, chosen so as to satisfy the identity between the RHS and the LHS of (3.12) for $t = T$.

Then $A$ and $B$ have to solve the following linear ODEs, obtained by differentiating both sides of (3.6):

$$\begin{cases}
\dot B(t) = -K_1^TB(t) - \beta^T(t)H_1B(t) - l_1\nabla\theta(\beta(t))\cdot B(t)\\
\dot A(t) = -K_0\cdot B(t) - \beta^T(t)H_0B(t) - l_0\nabla\theta(\beta(t))\cdot B(t)\\
A(T) = 0, \quad B(T) = u.
\end{cases} \tag{3.13}$$

A fully fledged proof of this moves along the lines of the proof of Theorem 3.1.1, so we will only state the result, along with a necessary definition containing technical hypotheses fully analogous to the ones in Definition 3.4.

Definition 3.5. A characteristic $(K, H, l, \theta)$ is extended well-behaved at $(u, v, T)$ if:

• (3.6) are solved uniquely by $\alpha$ and $\beta$;

• the Laplace jump transform $\theta$ is differentiable at $\beta(t)$, $t \le T$ (it suffices that $\nu$ is well defined and finite at $\beta(t)$);

• (3.13) are solved uniquely by $A$ and $B$;

and

1. $E\left[\int_0^T|\gamma_t|\,dt\right] < \infty$, with $\gamma_t = \lambda(X_t)\left(\Phi_t\left(\theta(\beta(t)) - 1\right) + \Psi_t\nabla\theta(\beta(t))\cdot B(t)\right)$;

2. $E\left[\int_0^T\eta_t\cdot\eta_t\,dt\right] < \infty$, with $\eta_t = \Phi_t\left(\beta^T(t) + B^T(t)\right)\sigma(X_t)$;

3. $E\left[|\Phi_T|\right] < \infty$;

where $\Phi_t = \Psi_t\left(A(t) + B(t)\cdot X_t\right)$.

Given Definition 3.5, we have the extended result:

Theorem 3.3.1. Let $(K, H, l, \theta)$ be extended well-behaved at $(u, v, T)$; then (3.12) holds.

Of course all the considerations made for (3.6) also hold for (3.13).
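The extended transform can also be sanity-checked numerically: the LHS of (3.12) is exactly the directional derivative $\frac{d}{d\varepsilon}\psi(v + \varepsilon u, X_t, t, T)\big|_{\varepsilon=0}$, so a central finite difference of the ordinary transform must match $e^{\alpha+\beta\cdot X_t}(A + B\cdot X_t)$. A minimal sketch under the hypothetical Vasiček assumptions used above (only the finite-difference side is computed here; solving (3.13) for $A$, $B$ gives the value to compare against):

import numpy as np
from scipy.integrate import solve_ivp

k, gamma, sigma, T, r0 = 0.05, 0.03, 0.012, 10.0, 0.03  # hypothetical Vasicek

def psi(u):
    # ordinary transform via the backward GREs (3.6) with beta(T) = u
    def rhs(t, y):
        alpha, beta = y
        return [-k * gamma * beta - 0.5 * sigma**2 * beta**2,  # alpha'
                1.0 + k * beta]                                # beta'
    sol = solve_ivp(rhs, (T, 0.0), [0.0, u], rtol=1e-12, atol=1e-14)
    alpha, beta = sol.y[:, -1]
    return np.exp(alpha + beta * r0)

v, u, eps = -0.5, 1.0, 1e-5
print("finite-difference LHS of (3.12):",
      (psi(v + eps * u) - psi(v - eps * u)) / (2 * eps))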

3.3.2 Fourier transform inversion

The transform can be further extended to evaluate payoffs of the form $(e^{d\cdot X_T} - c)^+$, i.e. an option. Let us define

$$C(d, c, t, T) := E_t\left[e^{-\int_t^T R(X_s)\,ds}\left(e^{d\cdot X_T} - c\right)^+\right];$$

noting that $(e^{d\cdot X_T} - c)^+ = (e^{d\cdot X_T} - c)\mathbf{1}_{d\cdot X_T\ge\ln c}$, we can rewrite

$$C(d, c, t, T) = G_{d,-d}(-\ln c; X_t, t, T) - c\,G_{0,-d}(-\ln c; X_t, t, T), \tag{3.14}$$

denoting by

$$G_{a,b}(y; X_t, t, T) = E_t\left[e^{-\int_t^T R(X_s)\,ds}e^{a\cdot X_T}\mathbf{1}_{b\cdot X_T\le y}\right]. \tag{3.15}$$

Performing the Fourier-Stieltjes transform (cf. Definition A.3) of $G_{a,b}(y; X_t, t, T)$, denoted by $\widehat G_{a,b}(v; X_t, t, T)$, we have

$$\widehat G_{a,b}(v; X_t, t, T) = \int_{\mathbb{R}} e^{ivy}\,dG_{a,b}(y; X_t, t, T);$$

differentiating through the expectation, exchanging integral and expectation and observing that $\frac{d}{dy}\mathbf{1}_{b\cdot X_T\le y} = \delta(b\cdot X_T - y)$, we have

$$\widehat G_{a,b}(v; X_t, t, T) = E_t\left[e^{-\int_t^T R(X_s)\,ds}e^{(a+ivb)\cdot X_T}\right] = \psi(a + ivb, X_t, t, T),$$

using Theorem 3.1.1.

Thus, knowing the parameters of an option, we are able to easily determine the Fourier-Stieltjes transform of (3.14).

Theorem 3.3.2 (Transform Inversion). Let $(K, H, l, \theta)$ be well-behaved at $(a + ivb, T)$, for fixed $T \in [0, \infty)$, $a, b \in \mathbb{R}^n$ and any $v \in \mathbb{R}$, and let

$$\int_{\mathbb{R}}\left|\psi(a + ivb, x, t, T)\right|dv < \infty. \tag{3.16}$$

Then $G_{a,b}$ is well defined by (3.15) and can be expressed as

$$G_{a,b}(y; X_t, t, T) = \frac{\psi(a, X_t, t, T)}{2} - \frac1\pi\int_0^\infty\frac{\Im\left[\psi(a + ivb, X_t, t, T)e^{-ivy}\right]}{v}\,dv, \tag{3.17}$$

where $\Im[c]$ denotes the imaginary part of any $c \in \mathbb{C}$.


Proof. Let us fix $y \in \mathbb{R}$ and, for any $\tau \in (0, \infty)$, consider

$$\frac{1}{2\pi}\int_{-\tau}^{\tau}\frac{e^{ivy}\psi(a - ivb, x, t, T) - e^{-ivy}\psi(a + ivb, x, t, T)}{iv}\,dv = \frac{1}{2\pi}\int_{-\tau}^{\tau}\int_{\mathbb{R}}\frac{e^{-iv(z-y)} - e^{iv(z-y)}}{iv}\,dG_{a,b}(z; x, t, T)\,dv.$$

Since $|e^{iv} - e^{iu}| \le |v - u|$ for all $u, v \in \mathbb{R}$, we have

$$\left|\frac{e^{-iv(z-y)} - e^{iv(z-y)}}{iv}\right| \le 2|z - y|,$$

and since

$$\lim_{y\to+\infty}G_{a,b}(y; x, t, T) = \psi(a, x, t, T) < \infty, \qquad \lim_{y\to-\infty}G_{a,b}(y; x, t, T) = 0, \tag{3.18}$$

we can use Fubini's theorem to exchange the order of integration, getting

$$\frac{1}{2\pi}\int_{\mathbb{R}}\int_{-\tau}^{\tau}\frac{e^{-iv(z-y)} - e^{iv(z-y)}}{iv}\,dv\,dG_{a,b}(z; x, t, T). \tag{3.19}$$

Recalling the Euler formula $\frac{e^{ia} - e^{-ia}}{2i} = \sin(a)$ for all $a \in \mathbb{R}$, we can write

$$\frac{1}{2\pi}\,\frac{e^{-iv(z-y)} - e^{iv(z-y)}}{iv} = -\frac{\sin(v(z-y))}{\pi v} = -\,\mathrm{sgn}(z-y)\,\frac{\sin(v|z-y|)}{\pi v};$$

therefore (3.19) can be rewritten as

$$-\int_{\mathbb{R}}\mathrm{sgn}(z-y)\left[\int_{-\tau}^{\tau}\frac{\sin(v|z-y|)}{\pi v}\,dv\right]dG_{a,b}(z; x, t, T), \tag{3.20}$$

with the inner integral bounded for every $\tau, z$ and for fixed $y$.

Recalling that $\int_{\mathbb{R}}\frac{\sin(\alpha x)}{x}\,dx = \pi$ for all $\alpha > 0$, the bounded convergence theorem yields

$$\begin{aligned}
\lim_{\tau\to\infty}\frac{1}{2\pi}\int_{-\tau}^{\tau}\frac{e^{ivy}\psi(a - ivb, x, t, T) - e^{-ivy}\psi(a + ivb, x, t, T)}{iv}\,dv &= -\int_{\mathbb{R}}\mathrm{sgn}(z-y)\,dG_{a,b}(z; x, t, T)\\
&= -\left[\int_y^\infty dG_{a,b}(z; x, t, T) - \int_{-\infty}^{y-}dG_{a,b}(z; x, t, T)\right]\\
&= -\left.G_{a,b}(z; x, t, T)\right|_{z=y}^{z=\infty} + \left.G_{a,b}(z; x, t, T)\right|_{z=-\infty}^{z=y-}.
\end{aligned}$$

Recalling condition (3.18) we get

$$= -\psi(a, x, t, T) + G_{a,b}(y; x, t, T) + G_{a,b}(y-; x, t, T),$$

where

$$G_{a,b}(y-; x, t, T) = \lim_{z\to y,\,z\le y}E_t\left[e^{-\int_t^T R(X_s)\,ds}e^{a\cdot X_T}\mathbf{1}_{b\cdot X_T\le z}\right]\Big|_{X_t = x}.$$

Using (3.16), by the dominated convergence theorem we have

$$G_{a,b}(y-; x, t, T) = G_{a,b}(y; x, t, T);$$


then

$$G_{a,b}(y; x, t, T) = \frac{\psi(a, x, t, T)}{2} + \frac{1}{4\pi}\int_{-\infty}^{\infty}\frac{e^{ivy}\psi(a - ivb, x, t, T) - e^{-ivy}\psi(a + ivb, x, t, T)}{iv}\,dv.$$

Then, observing that $e^{ivy}\psi(a - ivb, x, t, T)$ is the complex conjugate of $e^{-ivy}\psi(a + ivb, x, t, T)$, we get (3.17) from the fact that, for all $c \in \mathbb{C}$, $c = \Re[c] + i\Im[c]$ gives $c^* - c = \Re[c] - i\Im[c] - \Re[c] - i\Im[c] = -2i\Im[c]$, denoting by $c^*$ the complex conjugate of $c$.

Remark 6. This result is of little practical utility when $\psi$ is not known in closed form, since each evaluation of the integrand in

$$\int_0^\infty\frac{\Im\left[\psi(a + ivb, X_t, t, T)e^{-ivy}\right]}{v}\,dv$$

requires the solution of a set of complex GREs (the boundary condition of the GREs depends on $v$), and that can be painful from the computational point of view. The case in which the transform is explicitly known is different: then (3.17) only requires the computation of a complex integral.
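To illustrate, here is a minimal sketch (not from the thesis) of a direct trapezoidal quadrature of (3.17) under hypothetical one-factor Vasiček dynamics, where each $\psi(a + ivb, \cdot)$ is obtained from a quick complex-valued GRE solve, precisely the per-node cost Remark 6 warns about:

import numpy as np
from scipy.integrate import solve_ivp

k, gamma, sigma, T, r0 = 0.05, 0.03, 0.012, 1.0, 0.03  # hypothetical Vasicek

def psi(u):
    # transform psi(u, r0, 0, T) from the backward GREs (3.6), complex u allowed
    def rhs(t, y):
        alpha, beta = y
        return [-k * gamma * beta - 0.5 * sigma**2 * beta**2, 1.0 + k * beta]
    sol = solve_ivp(rhs, (T, 0.0), [0.0 + 0j, u], rtol=1e-10, atol=1e-12)
    alpha, beta = sol.y[:, -1]
    return np.exp(alpha + beta * r0)

def G(y, a, b, v_max=100.0, n=500):
    # inversion formula (3.17); every quadrature node costs one GRE solve
    v = np.linspace(1e-6, v_max, n)        # avoid the v = 0 singularity
    vals = np.array([np.imag(psi(a + 1j * vi * b) * np.exp(-1j * vi * y)) / vi
                     for vi in v])
    trap = (0.5 * (vals[1:] + vals[:-1]) * np.diff(v)).sum()
    return psi(a).real / 2 - trap / np.pi

print("G_{0,-1}(-ln c), c = 0.97:", G(-np.log(0.97), 0.0, -1.0))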

3.3.3 Fourier representation

Theorem 3.1.1 allows us to price a payoff of the form $e^{u\cdot X_T}$, $u \in \mathbb{C}^n$, and it is straightforward to apply the transform to the class of functions that admit an integral representation, like a Fourier or Laplace one.

Proposition 3.3.3. Let the hypotheses of Theorem 3.1.1 hold and let us suppose that the payoff of a contingent claim admits a multidimensional Fourier representation, with $\omega, \omega_0 \in \mathbb{R}^n$:

$$f(X_T) = \int_{\mathbb{R}^n}e^{(\omega_0 + i\omega)\cdot X_T}F(\omega)\,d\omega.$$

If $E_t\left[e^{-\int_t^T R(X_s)\,ds + \omega_0\cdot X_T}\right] < \infty$, then the price can be expressed by the formula

$$S_t = \int_{\mathbb{R}^n}\psi(\omega_0 + i\omega, X_t, t, T)F(\omega)\,d\omega. \tag{3.21}$$

Proof.

$$S_t = E_t\left[e^{-\int_t^T R(X_s)\,ds}\int_{\mathbb{R}^n}e^{(\omega_0 + i\omega)\cdot X_T}F(\omega)\,d\omega\right].$$

Using Fubini's theorem,

$$S_t = \int_{\mathbb{R}^n}E_t\left[e^{-\int_t^T R(X_s)\,ds}e^{(\omega_0 + i\omega)\cdot X_T}\right]F(\omega)\,d\omega,$$

and the result follows by Theorem 3.1.1.


Remark 7. Like for Fourier inversion, (3.21) can be really hard to evaluate, since it requires an integration over $\mathbb{R}^n$ whose integrand is itself expensive: for each $\omega$, the whole trajectory of the GREs has to be re-evaluated.

3.3.4 Time dependence and multiple jump types

All the proofs so far were carried out for the process $(X_t)_{t\ge0}$ defined by the generator (3.1). We can redefine $D$ as a subset of $\mathbb{R}^n\times[0,\infty)$ and take the infinitesimal generator

$$\mathcal{D}f(x) = \frac{\partial f}{\partial t} + \nabla_x f\cdot\mu(x,t) + \frac12\nabla^2_x f : \left(\sigma(x,t)\sigma^T(x,t)\right) + \sum_{i=1}^m\lambda^i(x,t)\int_{\mathbb{R}^n}\left[f(x+z) - f(x)\right]d\nu^i_t(z),$$

with $f : D \to \mathbb{R}$ a sufficiently smooth function and $\nu^i_t(z)$, $i = 1,\ldots,m$, time-dependent jump distributions. Suppose that all the quantities listed above are continuous and bounded with respect to $t$ on the interval $[0,\infty)$, let $\theta^i(c,t) := \int_{\mathbb{R}^n}e^{c\cdot z}\,d\nu^i_t(z)$, and let

$$\begin{aligned}
\mu(x,t) &= K_0(t) + K_1(t)x, & K(t) &:= (K_0(t), K_1(t)) \in (\mathbb{R}^n, \mathbb{R}^{n\times n}),\\
\left(\sigma(x,t)\sigma^T(x,t)\right)_{ij} &= (H_0(t))_{ij} + \sum_{k=1}^n(H_1(t))_{ijk}\,x_k, & H(t) &:= (H_0(t), H_1(t)) \in (\mathbb{R}^{n\times n}, \mathbb{R}^{n\times n\times n}),\\
\lambda^i(x,t) &= l^i_0(t) + l^i_1(t)\cdot x, & l^i(t) &:= (l^i_0(t), l^i_1(t)) \in (\mathbb{R}, \mathbb{R}^n).
\end{aligned} \tag{3.22}$$

Theorems 3.1.1 and 3.3.1 still hold, simply replacing $l_0(\theta(c) - 1)$ and $l_1(\theta(c) - 1)$ in the GREs with $\sum_{i=1}^m l^i_0(\theta^i(c,t) - 1)$ and $\sum_{i=1}^m l^i_1(\theta^i(c,t) - 1)$, respectively.

3.4 An optimization idea

Let us consider a firm issuing a contingent claim as a form of financing, with a fixed maturity date $T$ and a payoff $S_T = F(v) = e^{v\cdot X_T}$. The payoff then depends on the linear combination, with coefficients $v_i$, $i = 1,\ldots,n$, of $n$ econometric variables $X^i_T$. Up to now we have treated the coefficients $v$ as given, but a question arises:

How should the firm choose the coefficients $v$?

A possible answer is: the firm should choose $v$ so as to get the most money from selling the claim. In other words,

$$\max_{v\in V}\;S_0 := E_0\left[e^{-\int_0^T R(X_s)\,ds}e^{v\cdot X_T}\right],$$

where $V \subset \mathbb{R}^n$ is the (closed) set of all admissible $v$.


Using the affine framework, this can be written as the optimization problem

$$\begin{aligned}
&\max_{v\in V}\;\alpha(T) + X_0\cdot\beta(T)\\
&\text{s.t.}\quad
\begin{cases}
\dot\beta = -\rho_1 + K_1^T\beta + \tfrac12\beta^TH_1\beta + l_1(\theta(\beta) - 1)\\
\dot\alpha = -\rho_0 + K_0\cdot\beta + \tfrac12\beta^TH_0\beta + l_0(\theta(\beta) - 1)\\
\alpha(0) = 0\\
\beta(0) = v.
\end{cases}
\end{aligned} \tag{3.23}$$

A more reasonable choice criterion, since the dynamics are nonlinear, would be to look for a static control $v$ that maximizes the income/outcome ratio, that is,

$$\max_{v\in V}\;\frac{E_0\left[e^{-\int_0^T R(X_s)\,ds}e^{v\cdot X_T}\right]}{E_0\left[e^{v\cdot X_T}\right]}.$$

Of course the maximum value attainable is $1$, and the problem can be read as the firm looking for the way to pay as little interest as possible on the bond.

In our friendly affine context, this can be written as

$$\begin{aligned}
&\max_{v\in V}\;\alpha(T) - \bar\alpha(T) + X_0\cdot\left(\beta(T) - \bar\beta(T)\right)\\
&\text{s.t.}\quad
\begin{cases}
\dot\beta = -\rho_1 + K_1^T\beta + \tfrac12\beta^TH_1\beta + l_1(\theta(\beta) - 1)\\
\dot\alpha = -\rho_0 + K_0\cdot\beta + \tfrac12\beta^TH_0\beta + l_0(\theta(\beta) - 1)\\
\dot{\bar\beta} = K_1^T\bar\beta + \tfrac12\bar\beta^TH_1\bar\beta + l_1(\theta(\bar\beta) - 1)\\
\dot{\bar\alpha} = K_0\cdot\bar\beta + \tfrac12\bar\beta^TH_0\bar\beta + l_0(\theta(\bar\beta) - 1)\\
\alpha(0) = \bar\alpha(0) = 0\\
\beta(0) = \bar\beta(0) = v,
\end{cases}
\end{aligned} \tag{3.24}$$

where $\bar\alpha$, $\bar\beta$ solve the undiscounted GREs (i.e. those with $\rho_0 = \rho_1 = 0$).

It is possible to show that those problems are well posed.

Proposition 3.4.1. Let $V \ne \emptyset$ be a closed and bounded set. Then a solution to problem (3.24) exists.

Proof. By the Heine-Borel theorem, $V$ is a compact set. The solutions of the GREs can be expressed by functions $\alpha(v,t)$, $\bar\alpha(v,t)$, $\beta(v,t)$, $\bar\beta(v,t)$ which, by Theorems 4.1.2-4.1.3, are continuous with respect to $v$. Then, by Weierstrass' theorem, the continuous function $v \mapsto \alpha(v,T) - \bar\alpha(v,T) + X_0\cdot(\beta(v,T) - \bar\beta(v,T))$ attains a maximum over $V$.

This approach can be extended to any setting in which the objective function can be expressed in a way that is tractable with affine processes.
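As an illustration, a minimal sketch of problem (3.23) for a hypothetical scalar CIR factor (all coefficients invented for the example), with the GREs solved numerically inside the objective, could be (note that maximizing $S_0 = e^{\alpha(T)+X_0\cdot\beta(T)}$ is equivalent to maximizing the exponent in (3.23)):

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

K0, K1, H1 = 0.04, -0.8, 0.02        # hypothetical scalar CIR characteristics
rho0, rho1, T, X0 = 0.0, 1.0, 5.0, 0.03

def S0(v):
    # initial price exp(alpha(T) + X0 beta(T)) from the GREs (3.7), beta(0) = v
    def rhs(s, y):
        alpha, beta = y
        return [-rho0 + K0 * beta,                       # alpha' (H0 = 0, no jumps)
                -rho1 + K1 * beta + 0.5 * H1 * beta**2]  # beta'
    sol = solve_ivp(rhs, (0.0, T), [0.0, v], rtol=1e-10, atol=1e-12)
    alpha, beta = sol.y[:, -1]
    return np.exp(alpha + X0 * beta)

# maximize S0 over the admissible (closed, bounded) set V = [-1, 1]
res = minimize_scalar(lambda v: -S0(v), bounds=(-1.0, 1.0), method="bounded")
print("optimal v:", res.x, " S0:", S0(res.x))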


3.5 A more general result

As said in Section 3.1, the state space is taken to be $D := \mathbb{R}^m_+\times\mathbb{R}^{n-m}$, $m = 1,\ldots,n$. We will denote by $\mathrm{Sem}_n$ the space of $n\times n$ positive semi-definite symmetric matrices, and set $I := \{1,\ldots,m\}$ and $J := \{m+1,\ldots,n\}$. Moreover, in this section we will write

$$a\cdot b = a_1b_1 + \ldots + a_nb_n, \qquad a, b \in \mathbb{C}^n,$$

which is not the standard inner product on $\mathbb{C}^n$.

Definition 3.6. A Markov process $(X_t)_{t\ge0}$ is called regular affine if its characteristic function has exponential-affine dependence on the initial state, i.e. for $t \in \mathbb{R}_+$, $u \in i\mathbb{R}^n$, there exist $\phi(t,u) \in \mathbb{C}$ and $\psi(t,u) \in \mathbb{C}^n$ such that for all $x \in D$

$$E\left[e^{u\cdot X_t}\,\middle|\,X_0 = x\right] = e^{\phi(t,u)+\psi(t,u)\cdot x}, \tag{3.25}$$

and, moreover, the functions $\phi$ and $\psi$ are continuous in $t$, and $\frac{\partial}{\partial t}\psi(t,u)\big|_{t=0+}$, $\frac{\partial}{\partial t}\phi(t,u)\big|_{t=0+}$ exist and are continuous at $u = 0$.

It has been shown that regular affine processes are fully characterized as follows.

Theorem 3.5.1 (Theorem 2.7, [22]). A regular affine process is a Feller process with infinitesimal generator

$$\mathcal{D}f(x) = \nabla^2_x f : A(x) + \nabla_x f\cdot B(x) - C(x)f(x) + \int_{D\setminus\{0\}}\left(f(x+\xi) - f(x) - \nabla_x f\cdot\chi(\xi)\right)M(x, d\xi) \tag{3.26}$$

for $f \in C^2_c(D)$, where

$$\begin{aligned}
A(x) &= a + \sum_{i=1}^m x_i\alpha_i, & a, \alpha_i &\in \mathbb{R}^{n\times n},\\
B(x) &= b + \sum_{i=1}^n x_i\beta_i, & b, \beta_i &\in \mathbb{R}^n,\\
C(x) &= c + \sum_{i=1}^m x_i\gamma_i, & c, \gamma_i &\in \mathbb{R}_+,\\
M(x, d\xi) &= m(d\xi) + \sum_{i=1}^m x_i\mu_i(d\xi), & m, \mu_i &\text{ Radon measures over } D\setminus\{0\},
\end{aligned} \tag{3.27}$$

and $\chi : \mathbb{R}^n \to \mathbb{R}^n$, $\chi_i(\xi) = \min(1, |\xi_i|)\,\mathrm{sgn}(\xi_i)$, is a bounded truncation. In order to ensure that the process will not leave $D$:

• $a \in \mathrm{Sem}_n$, with $(a)_{ii} = 0$, $i \in I$;

• $\alpha_j \in \mathrm{Sem}_n$, with $(\alpha_j)_{kk} = 0$ for $j, k \in I$, $k \ne j$;

• $b \in D$;

• $(\beta_j)_i = 0$ for $i \in I$, $j \in J$, and $(\beta_i)_j \in \mathbb{R}_+$ for $i, j \in I$, $i \ne j$;

• $\displaystyle\int_{D\setminus\{0\}}\Big(\sum_{i\in I}\chi_i(\xi) + \sum_{j\in J}|\chi_j(\xi)|^2\Big)m(d\xi) < \infty$;

• $\displaystyle\int_{D\setminus\{0\}}\Big(\sum_{j\in I\setminus\{i\}}\chi_j(\xi) + \sum_{j\in J\cup\{i\}}|\chi_j(\xi)|^2\Big)\mu_i(d\xi) < \infty$, $i \in I$.

Furthermore, $\phi$ and $\psi$ in (3.25) solve the generalized Riccati equations

$$\begin{cases}
\frac{\partial}{\partial t}\phi(t,u) = F(\psi(t,u))\\
\frac{\partial}{\partial t}\psi(t,u) = R(\psi(t,u))\\
\phi(0,u) = 0, \quad \psi(0,u) = u,
\end{cases} \tag{3.28}$$

where

$$\begin{aligned}
F(u) &= au\cdot u + b\cdot u - c + \int_{D\setminus\{0\}}\left(e^{u\cdot\xi} - 1 - u\cdot\chi(\xi)\right)m(d\xi),\\
R_i(u) &= \alpha_iu\cdot u + \beta_i\cdot u - \gamma_i + \int_{D\setminus\{0\}}\left(e^{u\cdot\xi} - 1 - u\cdot\chi(\xi)\right)\mu_i(d\xi), \quad i \in I,\\
R_j(u) &= \beta_j\cdot u, \quad j \in J.
\end{aligned} \tag{3.29}$$

Conversely, for any choice of admissible parameters a, αi, b, βi, c, γi, m, µi, there exists

a unique regular affine process with generator (3.26).

Remark 8. Let $\beta \in \mathbb{R}^{n\times n}$ be the matrix whose $i$-th column is $\beta_i$; then $\beta$ has the block structure

$$\beta = \begin{pmatrix}
* & + & \cdots & + & 0 & \cdots & 0\\
+ & \ddots & \ddots & \vdots & \vdots & & \vdots\\
\vdots & \ddots & \ddots & + & \vdots & & \vdots\\
+ & \cdots & + & * & 0 & \cdots & 0\\
* & \cdots & \cdots & * & * & \cdots & *\\
\vdots & & & \vdots & \vdots & & \vdots\\
* & \cdots & \cdots & * & * & \cdots & *
\end{pmatrix} \tag{3.30}$$

where $+$ denotes an element of $\mathbb{R}_+$ and $*$ an element of $\mathbb{R}$; the upper-left block is $m\times m$ and the upper-right zero block is $m\times(n-m)$.

Then the components of (3.28) corresponding to $j \in J$ form a linear autonomous system that can be solved separately, yielding

$$\begin{cases}
\psi_j(t,u) = \left(e^{\tilde\beta t}w\right)_{j-m}, & j \in J,\\
\frac{\partial}{\partial t}\psi_i(t,u) = R_i(\psi(t,u)), & i \in I,\\
\phi(t,u) = \int_0^t F(\psi(s,u))\,ds,\\
\psi(0,u) = u,
\end{cases} \tag{3.31}$$

where $w \in \mathbb{R}^{n-m}$ contains the last $n-m$ components of $u$, $\tilde\beta \in \mathbb{R}^{(n-m)\times(n-m)}$ is the bottom-right submatrix of $\beta$, and $e^{\tilde\beta t} = \sum_{k=0}^\infty\frac{(\tilde\beta t)^k}{k!}$.

Remark 9. $A$ is the diffusion matrix $\sigma\sigma^T$. The restrictions on $a$ and $\alpha_i$ imply that $(a)_{kl} = (a)_{lk} = (\alpha_j)_{kl} = (\alpha_j)_{lk} = 0$, $j, k \in I$, $k \ne j$, imposing a rigid dependence structure on the matrix $A$. E.g., for $n = 3$: if $m = 0$, $A$ is an arbitrary positive semi-definite matrix and it cannot depend on any component of $(X_t)_{t\ge0}$. For $1 \le m \le 3$, we show the pattern of $a$ and of the $\alpha_i$, $i \in I$:

$$m = 1:\quad a = \begin{pmatrix} 0 & 0 & 0\\ 0 & + & *\\ 0 & * & + \end{pmatrix},\quad \alpha_1 = \begin{pmatrix} + & * & *\\ * & + & *\\ * & * & + \end{pmatrix};$$

$$m = 2:\quad a = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & + \end{pmatrix},\quad \alpha_1 = \begin{pmatrix} + & 0 & *\\ 0 & 0 & 0\\ * & 0 & + \end{pmatrix},\quad \alpha_2 = \begin{pmatrix} 0 & 0 & 0\\ 0 & + & *\\ 0 & * & + \end{pmatrix};$$

$$m = 3:\quad a = 0,\quad \alpha_1 = \begin{pmatrix} + & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix},\quad \alpha_2 = \begin{pmatrix} 0 & 0 & 0\\ 0 & + & 0\\ 0 & 0 & 0 \end{pmatrix},\quad \alpha_3 = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & + \end{pmatrix};$$

where $+$ is a nonnegative number and $*$ is a number such that positive semi-definiteness holds. For $m = n$, $A$ is diagonal with nonnegative elements and has a straightforward square root; hence a regular affine process is $\mathbb{R}^n_+$-valued if and only if it has a multifactor CIR diffusion $\sigma(X)$.

A small code was written to plot the structure of those matrices (cf. Appendix D).
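A minimal stand-in for such a plotting routine (not the Appendix D code itself) that prints the admissible sparsity patterns of $a$ and the $\alpha_i$ for given $n$ and $m$ could look like:

import numpy as np

def diffusion_patterns(n, m):
    # admissible patterns on R^m_+ x R^{n-m}: 'a' may load only on the real
    # block J; alpha_i (i in I) may load on (i,i) and on J plus couplings to i
    J = range(m, n)                       # 0-based indices of the real block
    a = np.full((n, n), '0', dtype=object)
    for r in J:
        for c in J:
            a[r, c] = '+' if r == c else '*'
    print("a:\n", a)
    for i in range(m):
        alpha = np.full((n, n), '0', dtype=object)
        idx = [i] + list(J)               # rows/cols where alpha_i may be nonzero
        for r in idx:
            for c in idx:
                alpha[r, c] = '+' if r == c else '*'
        print(f"alpha_{i+1}:\n", alpha)

diffusion_patterns(3, 2)                  # reproduces the m = 2 pattern above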

Remark 10. The parameters in (3.27) are not time dependent. As in the simpler case of jump-diffusion processes, the setting can be extended to the time-dependent case, assuming that all the parameters are continuous in time; for further details we refer to [27].

The reader may have already noticed that the equations (3.28) are similar to (3.7) with $C(x) = 0$, except that the discounting part is missing; to clarify this, the following result was shown in [22], Section 11:

Proposition 3.5.2 (Proposition 11.2, [22], 45). Let $(X_t)_{t\ge0}$ be an affine process with $c = 0$ and $\gamma_i = 0$ (cf. Theorem 3.5.1), and let $(x, r) \in D\times\mathbb{R}$. Then

$$E\left[e^{qR^r_t + u\cdot X_t}\,\middle|\,X_0 = x\right] = e^{\phi'(t,u,q) + \psi'(t,u,q)\cdot x + qr},$$

where $R^r_t = r + \int_0^t\left(l + \lambda\cdot X_s\right)ds$, $l, r \in \mathbb{R}$, $\lambda \in \mathbb{R}^n$, $q \in i\mathbb{R}$, and $\psi'$, $\phi'$ solve

$$\begin{cases}
\frac{\partial}{\partial t}\phi'(t,u,q) = F(\psi'(t,u,q)) + lq\\
\frac{\partial}{\partial t}\psi'(t,u,q) = R(\psi'(t,u,q)) + \lambda q\\
\phi'(0,u,q) = 0, \quad \psi'(0,u,q) = u.
\end{cases} \tag{3.32}$$

Although the case of interest, $q = -1$, is not strictly covered by this Proposition ($i\mathbb{R} := \{c \in \mathbb{C} : \Re[c] = 0\}$), it can be extended to include $q = -1$ as well, provided

$$E\left[e^{-R^r_t}\,\middle|\,X_0 = x\right] < \infty, \qquad \forall x \in D,$$

which is always satisfied if we assume $R^r_t$ to be positive (as is usual in financial applications, since $R^r_t$ plays the role of a discount rate).

Then, in the light of Proposition 3.5.2, Theorem 3.5.1 fully extends the results given in Section 3.1, providing existence restrictions on the parameters and allowing for infinite activity jumps.

3.5.1 Diagonal diffusion matrix

It is common in the literature to take $\sigma$ to be a diagonal matrix (e.g. [16]), but imposing an instantaneously uncorrelated diffusion could lead to a loss of generality, with a possibly poor fit to data as a consequence. Let us suppose we have an affine jump-diffusion process $(X_t)_{t\ge0}$ with state space $D := \mathbb{R}^m_+\times\mathbb{R}^{n-m}$, and a linear transformation $\Lambda : D \to D$, where $\Lambda \in \mathbb{R}^{n\times n}$ is a nonsingular matrix. Applying Itô's formula to $Y_t = \Lambda X_t$ we get

$$dY_t = \left(\Lambda b + \Lambda\beta\Lambda^{-1}Y_t\right)dt + \Lambda\sigma(\Lambda^{-1}Y_t)\,dW_t + \Lambda\,dZ_t,$$

which is still affine, and $b' := \Lambda b$, $\beta' := \Lambda\beta\Lambda^{-1}$, the transformed jump measure $m'(\cdot) = m(\Lambda^{-1}\cdot)$ and $\sigma' = \Lambda\sigma(\Lambda^{-1}Y_t)$ satisfy the conditions of Theorem 3.5.1. If we look at the diffusion matrix of $(Y_t)_{t\ge0}$ we get

$$\sigma'(\sigma')^T = \Lambda a\Lambda^T + \Lambda\alpha_1\Lambda^Tx_1 + \cdots + \Lambda\alpha_m\Lambda^Tx_m.$$

If we are able to find a nonsingular matrix $\Lambda$ such that $\Lambda a\Lambda^T, \Lambda\alpha_1\Lambda^T, \ldots, \Lambda\alpha_m\Lambda^T$ are diagonal, then we can assume a diagonal diffusion matrix without loss of generality.

To partially clarify the matter, there is the following result:

Theorem 3.5.3 ([12]). Let $(X_t)_{t\ge0}$ be an affine jump-diffusion process with state space $D$ and diffusion matrix $A : D \to \mathbb{R}^{n\times n}$. If $m \le 1$ or $m \ge n - 1$, then there exists a regular $n\times n$ matrix $\Lambda : D \to D$ such that $\Lambda a\Lambda^T, \Lambda\alpha_1\Lambda^T, \ldots, \Lambda\alpha_m\Lambda^T$ are diagonal.

In particular, for $n \le 3$ one of those conditions is always satisfied.


Proof. The proof is essentially constructive, and it deals separately with the four cases $m = 0$, $m = 1$, $m = n - 1$, $m = n$. Let us denote by $e_i$ the $i$-th element of the standard basis of $\mathbb{R}^n$. For summations we use the Einstein convention, i.e. if an index is repeated twice, the expression is summed over all the possible values of that index; we refer to [52] for more details. Finally, we recall that $\Lambda\alpha_k\Lambda^T$ is diagonal if and only if

$$(\Lambda\alpha_k\Lambda^T)_{ij} = e_i\cdot\Lambda\alpha_k\Lambda^Te_j = \Lambda^Te_i\cdot\alpha_k\Lambda^Te_j = 0, \qquad k = 0,\ldots,m,\; i, j = 1,\ldots,n,\; i \ne j,$$

where $\alpha_0 := a$.

Case $m = 0$: we have only $a$, an arbitrary positive semi-definite matrix, so there exists an orthogonal matrix $\Lambda$ such that $\Lambda a\Lambda^T$ is diagonal.

Case $m = 1$: we have $a$ and $\alpha_1$, with

$$a = \begin{bmatrix} 0 & 0\\ 0 & A \end{bmatrix}$$

and $A$ an $(n-1)\times(n-1)$ positive semi-definite matrix. Concerning $\alpha_1$, there are two cases: $(\alpha_1)_{11} = 0$ and $(\alpha_1)_{11} > 0$. If $(\alpha_1)_{11} > 0$ we define

$$(\bar\alpha_1)_{ij} = (\alpha_1)_{ij} - \frac{(\alpha_1)_{i1}(\alpha_1)_{1j}}{(\alpha_1)_{11}},$$

for which $(\bar\alpha_1)_{i1} = (\bar\alpha_1)_{1j} = 0$, $i, j = 1,\ldots,n$; from the definition of $\bar\alpha_1$ it follows that

$$x\cdot\bar\alpha_1x = x_i(\bar\alpha_1)_{ij}x_j = x_i(\alpha_1)_{ij}x_j - \frac{x_i(\alpha_1)_{i1}(\alpha_1)_{1j}x_j}{(\alpha_1)_{11}} = x_i(\alpha_1)_{ij}x_j - \frac{\left(x_i(\alpha_1)_{ik}\delta_{k1}\right)\left(\delta_{k1}(\alpha_1)_{kj}x_j\right)}{(\alpha_1)_{11}}, \tag{3.33}$$

where $\delta_{ij}$ is the Kronecker delta. Since $(e_i)_j = \delta_{ij}$ and $\alpha_1$ is symmetric, it is straightforward to recognize that

$$x\cdot\bar\alpha_1x = x\cdot\alpha_1x - \frac{(x\cdot\alpha_1e_1)^2}{(\alpha_1)_{11}}.$$

By the Cauchy-Schwarz inequality we have $x\cdot\alpha_1e_1 \le \sqrt{x\cdot\alpha_1x}\sqrt{e_1\cdot\alpha_1e_1}$; hence $\bar\alpha_1$ is also positive semi-definite. On the other hand, if $(\alpha_1)_{11} = 0$, we set $\bar\alpha_1 := \alpha_1$. In either case $\bar\alpha_1$ is of the form

$$\bar\alpha_1 = \begin{bmatrix} 0 & 0\\ 0 & B \end{bmatrix},$$

with $B$ positive semi-definite. Then, by Theorem 8.7.1 of [30], there exists an orthogonal matrix $Q \in \mathbb{R}^{(n-1)\times(n-1)}$ such that $QAQ^T$ and $QBQ^T$ are both diagonal. Let us consider the matrix

$$\Lambda := \begin{bmatrix}
1 & 0 & \cdots & 0\\
\Lambda_{21} & & &\\
\vdots & & Q &\\
\Lambda_{n1} & & &
\end{bmatrix} \tag{3.34}$$

where the $(\Lambda)_{k1}$ are chosen such that $\Lambda^Te_k\cdot\alpha_1e_1 = (\Lambda)_{kj}(\alpha_1)_{j1} = 0$, $k \ge 2$. Dropping for a moment the summation convention, that means computing

$$\Lambda_{k1} = -\frac{\sum_{j\ge2}(\alpha_1)_{j1}(\Lambda)_{kj}}{(\alpha_1)_{11}}, \qquad k \ge 2,$$

if $(\alpha_1)_{11} > 0$, while the condition is always satisfied if $(\alpha_1)_{11} = 0$. With such a $\Lambda$ we have that both $\Lambda a\Lambda^T$ and $\Lambda\bar\alpha_1\Lambda^T$ are diagonal, and if $(\alpha_1)_{11} = 0$ then also $\Lambda\alpha_1\Lambda^T$ is diagonal. Otherwise, if $(\alpha_1)_{11} > 0$, reasoning in a way similar to (3.33), we have

$$\Lambda^Te_i\cdot\alpha_1\Lambda^Te_j = e_i\cdot\Lambda\bar\alpha_1\Lambda^Te_j + \frac{\left(\Lambda^Te_i\cdot\alpha_1e_1\right)\left(\Lambda^Te_j\cdot\alpha_1e_1\right)}{(\alpha_1)_{11}} = 0, \qquad i \ne j,$$

i.e. $\Lambda\alpha_1\Lambda^T$ is diagonal. In addition, it is straightforward to see from (3.34) that $\Lambda : \mathbb{R}_+\times\mathbb{R}^{n-1} \to \mathbb{R}_+\times\mathbb{R}^{n-1}$.

Case $m = n - 1$: the only nonzero element of $a$ is $(a)_{nn} \ge 0$, while, for $i = 1,\ldots,n-1$, the nonzero elements of $\alpha_i$ are $(\alpha_i)_{ii}$, $(\alpha_i)_{in}$, $(\alpha_i)_{ni}$, $(\alpha_i)_{nn}$. We take $\Lambda$ equal to the identity except for its last row,

$$\Lambda = \begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & \ddots & & \vdots\\
\vdots & & 1 & 0\\
\Lambda_{n1} & \cdots & \Lambda_{n,n-1} & \Lambda_{nn}
\end{bmatrix}, \tag{3.35}$$

where

$$\Lambda_{ni} = \begin{cases}
-\dfrac{(\alpha_i)_{in}}{(\alpha_i)_{ii}}, & 1 \le i \le n-1, \text{ if } (\alpha_i)_{ii} > 0,\\
0, & 1 \le i \le n-1, \text{ if } (\alpha_i)_{ii} = 0,\\
1, & i = n.
\end{cases} \tag{3.36}$$

To verify that such a $\Lambda$ diagonalizes the $\alpha_i$, let us write

$$\Lambda^Te_j\cdot\alpha_i\Lambda^Te_l = (\Lambda)_{jh}(\alpha_i)_{hk}(\Lambda)_{lk}, \qquad i = 1,\ldots,n-1,\; j \ne l.$$

Writing the summation explicitly and recalling the restrictions on $\alpha_i$, this equals

$$(\Lambda)_{ji}(\alpha_i)_{ii}(\Lambda)_{li} + (\Lambda)_{ji}(\alpha_i)_{in}(\Lambda)_{ln} + (\Lambda)_{jn}(\alpha_i)_{ni}(\Lambda)_{li} + (\Lambda)_{jn}(\alpha_i)_{nn}(\Lambda)_{ln}, \qquad i = 1,\ldots,n-1,\; j \ne l.$$

For $j \ne i$, $l \ne n$ all the components of $\Lambda$ involved in the summation are null. It remains to check the values $j = i$, $l = n$ and $l = i$, $j = n$, and also in those cases the summation is $0$, by definition of $\Lambda$. It is also clear that $\Lambda : \mathbb{R}^{n-1}_+\times\mathbb{R} \to \mathbb{R}^{n-1}_+\times\mathbb{R}$.

Case $m = n$: $a$ is filled with zeroes, and the only nonzero element of $\alpha_i$, $i = 1,\ldots,n$, is $(\alpha_i)_{ii}$. Then $\Lambda$ can be taken to be the identity matrix.


Remark 11. In practice we do not need to know the matrix $\Lambda$ explicitly; it suffices to know that we can use a diagonal diffusion matrix without fear of losing the ability to match important features of the data. On the other hand, for an $\mathbb{R}^n_+$-valued affine process the diagonal diffusion matrix is the only admissible one.

Remark 12. It was shown in [12] that there exists an affine process in $\mathbb{R}^2_+\times\mathbb{R}^2$ whose diffusion matrix cannot be diagonalized with a regular matrix $\Lambda$. Therefore, care has to be taken with the assumption of instantaneously uncorrelated state variables in the cases not covered by Theorem 3.5.3.

3.6 More examples

We will present two affine jump-diffusion models, showing how simple generalizations can make solving the GREs explicitly much more difficult.

We will also give a hint of how the affine framework works in our credit risk setting (anticipating the treatment of Chapter 5), introducing a complete 4-dimensional model which features mean reversion and stochastic volatility.

3.6.1 CIR with jumps

Let $(X_t)_{t\ge0}$ be an $\mathbb{R}_+$-valued Markov process with generator

$$\mathcal{D}f(x) = \frac12\sigma^2x\frac{\partial^2f(x)}{\partial x^2} + k(\gamma - x)\frac{\partial f(x)}{\partial x} + \frac{l}{d}\int_{\mathbb{R}_+}\left(f(x+z) - f(x)\right)e^{-z/d}\,dz. \tag{3.37}$$

The process is a jump-diffusion CIR model, with jump sizes distributed exponentially with mean $d$ and with jump intensity $l$. The jump transform is

$$\theta(s) = \frac1d\int_{\mathbb{R}_+}e^{sz}e^{-z/d}\,dz = \frac{1}{1 - ds}, \qquad s \in \mathbb{C},\; \Re[s] < 1/d.$$

Therefore the GREs associated with the problem $E^{\mathbb{Q}}_t\left[e^{-\int_t^T(\rho_0 + \rho_1X_{u-})\,du}e^{v + uX_T}\right]$ are

$$\begin{cases}
\dot\beta = -\rho_1 - k\beta + \tfrac12\sigma^2\beta^2\\
\dot\alpha = -\rho_0 + k\gamma\beta + l\,\dfrac{d\beta}{1 - d\beta}\\
\alpha(0) = v, \quad \beta(0) = u.
\end{cases}$$

Those equations admit a closed form solution (we used Mathematica for the formal calculations):

$$\begin{cases}
\beta(t) = \dfrac{1 + a_1e^{b_1(T-t)}}{c_1 + d_1e^{b_1(T-t)}}\\[1ex]
\alpha(t) = v + \left(\dfrac{k\gamma}{c_1} + \dfrac{l}{c_2} - l - \rho_0\right)(T-t) + \dfrac{k\gamma(a_1c_1 - d_1)}{b_1c_1d_1}\log\dfrac{c_1 + d_1e^{b_1(T-t)}}{c_1 + d_1} + \dfrac{l(a_2c_2 - d_2)}{b_2c_2d_2}\log\dfrac{c_2 + d_2e^{b_2(T-t)}}{c_2 + d_2},
\end{cases}$$

where the coefficients are

$$\begin{aligned}
c_1 &= \frac{k + \sqrt{k^2 + 2\sigma^2\rho_1}}{-2\rho_1}, & d_1 &= (1 - c_1u)\,\frac{-k + \sigma^2u + \sqrt{k^2 + 2\sigma^2\rho_1}}{2u + \sigma^2u^2 - 2\rho_1},\\
a_1 &= (d_1 + c_1)u - 1, & b_1 &= \frac{-d_1(k + 2\rho_1c_1) + a_1(\sigma^2 - kc_1)}{a_1c_1 - d_1},\\
a_2 &= \frac{d_1}{c_1}, & b_2 &= b_1,\\
c_2 &= 1 - dc_1, & d_2 &= \frac{d_1 - da_1}{c_1}.
\end{aligned}$$

This model can be used to model default intensities or the short rate.
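Since closed forms like the one above are easy to get wrong, a cheap sanity check (with hypothetical parameter values) is to integrate the GREs directly; note how the jump transform enters the RHS only through the term $l\,d\beta/(1-d\beta)$:

import numpy as np
from scipy.integrate import solve_ivp

k, gamma, sigma = 0.5, 0.03, 0.1       # hypothetical CIR parameters
l, d = 0.2, 0.01                        # jump intensity and mean jump size
rho0, rho1, T = 0.0, 1.0, 5.0
u, v = 0.0, 0.0                         # zero-coupon-bond boundary conditions

def rhs(s, y):
    alpha, beta = y
    theta_minus_1 = d * beta / (1.0 - d * beta)   # theta(beta) - 1, exp jumps
    return [-rho0 + k * gamma * beta + l * theta_minus_1,     # alpha'
            -rho1 - k * beta + 0.5 * sigma**2 * beta**2]      # beta'

sol = solve_ivp(rhs, (0.0, T), [v, u], rtol=1e-10, atol=1e-12)
alpha, beta = sol.y[:, -1]
X0 = 0.03
print("bond price exp(alpha + beta X0) =", np.exp(alpha + beta * X0))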

3.6.2 Bates model

The Bates model was presented in [2] to extend the Heston model by allowing jumps in the price, and it is described by the following SDE:

$$\begin{cases}
\dfrac{dS_t}{S_t} = \mu^S\,dt + \sqrt{V_t}\,dW^S_t + dJ_t\\
dV_t = k^V(\gamma^V - V_t)\,dt + \sigma^V\sqrt{V_t}\,dW^V_t.
\end{cases} \tag{3.38}$$

This model features a stochastic volatility $V_t$, modelled with a mean-reverting CIR process, and jumps in the price $S_t$, with intensity $\lambda$ and jump distribution $f$, while the Brownian motions $(W^S_t)_{t\ge0}$ and $(W^V_t)_{t\ge0}$ are correlated with constant coefficient $\rho$.

At first glance the process is not affine, but if we perform the transformation $Y_t = \ln S_t$, applying Itô's formula to (3.38) we get

$$\begin{cases}
dY_t = \left(\mu^S - \tfrac12V_t\right)dt + \sqrt{V_t}\,dW^S_t + d\widetilde J_t\\
dV_t = k^V(\gamma^V - V_t)\,dt + \sigma^V\sqrt{V_t}\,dW^V_t,
\end{cases} \tag{3.39}$$

where

$$\widetilde J_t = \sum_{i=1}^{N_t}\ln\left(1 + \frac{\Delta S_{T_i}}{S_{T_i-}}\right).$$

Model (3.39) is affine, with characteristics

$$\mu = \begin{pmatrix}\mu^S\\ k^V\gamma^V\end{pmatrix} + \begin{bmatrix}0 & -\tfrac12\\ 0 & -k^V\end{bmatrix}\begin{pmatrix}Y_t\\ V_t\end{pmatrix}, \qquad \sigma\sigma^T = \begin{bmatrix}1 & \rho\sigma^V\\ \rho\sigma^V & (\sigma^V)^2\end{bmatrix}V_t, \qquad \theta(c) = \int_{\mathbb{R}}e^{cz}f(z)\,dz,$$

where $f$ is the jump distribution of $(\widetilde J_t)_{t\ge0}$.

Usually $f$ is chosen in such a way that $(J_t)_{t\ge0}$ has a desired jump distribution; e.g. in [2] the jump sizes of $(J_t)_{t\ge0}$ are log-normally distributed, so that the jump sizes of $(\widetilde J_t)_{t\ge0}$ are normally distributed, a case in which explicit solutions are available.
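Indeed, if the jump sizes of $(\widetilde J_t)_{t\ge0}$ are $N(\mu_J, \sigma_J^2)$ (the symbols $\mu_J$, $\sigma_J$ are just illustrative names for the jump-size parameters), the jump transform reduces to a Gaussian Laplace transform,

$$\theta(c) = \int_{\mathbb{R}}e^{cz}\,\frac{1}{\sqrt{2\pi}\,\sigma_J}e^{-\frac{(z-\mu_J)^2}{2\sigma_J^2}}\,dz = e^{c\mu_J + \frac12c^2\sigma_J^2},$$

which can be plugged directly into the GREs.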


Remark 13. Let us model the price of a defaultable claim with the Bates model; then the price of that claim would be, using (I.3),

$$E^{\mathbb{Q}}_t\left[e^{-\int_t^Tr_s\,ds}S_T\mathbf{1}_{\tau\ge T}\right] = E^{\mathbb{Q}}_t\left[e^{-\int_t^T(r_s + \lambda_s)\,ds}e^{Y_T}\right];$$

if we model $(r_t)_{t\ge0}$ and $(\lambda_t)_{t\ge0}$ with one-dimensional affine processes (e.g. CIR with jumps), then the price of this claim falls within the cases handled by Theorem 3.1.1. To be more precise, we have an affine process $(X_t)_{t\ge0}$ in $\mathbb{R}^3_+\times\mathbb{R}$, with $X_t = (V_t, r_t, \lambda_t, Y_t)$ and coefficients (cf. Definition 3.4) $u = (0, 0, 0, 1)$, $\rho_0 = 0$, $\rho_1 = (0, 1, 1, 0)$, while the remaining coefficients depend on the choices made for $(r_t)_{t\ge0}$ and $(\lambda_t)_{t\ge0}$. We will return to this complete example in Chapter 6.

We want to point out that for such a model the hypotheses of Theorem 3.5.3 hold, and therefore, for calibration purposes, the diffusion part of the model can be taken diagonal.

3.7 Further considerations and references

Here we mention two topics that are of practical interest but were not treated in detail, giving references for the interested reader.

3.7.1 Infinite activity vs. finite activity

We have seen, in Sections 3.1 and 3.5, similar results for two classes of processes: affine jump-diffusion processes and regular affine processes, respectively.

Basically, the difference between these processes lies in the jump part: the former are common diffusion processes, punctuated by "rare" jumps which can represent sudden events, like upheavals, crashes, discoveries etc.; the latter, on the other hand, may exhibit infinitely many small jumps, which can "move" the process even without a diffusion part.

Affine jump-diffusion processes are easy to simulate, say, for a Monte Carlo method, and the dynamical structure of the process is easy to understand and describe, since the distribution of the jump sizes is known. They are often used for the purpose of implied volatility smile interpolation, as in [23]; for more on this subject we refer to [14], Chapter 13.

On the other hand, for infinite activity processes the familiar concept of a jump distribution no longer applies, and they are less trivial to simulate, but in return they are considered able to reproduce historical price data in a realistic way.

The choice between compound Poisson processes and infinite activity processes is thus just a matter of modelling preference, and there is no ultimate solution.


3.7.2 Statistical estimation

One important aspect of any model is calibration, i.e. finding parameters of the model that reflect real market conditions. A wide class of techniques is based on the inversion of the characteristic function

$$\Phi(u, X_t, t, T) = E_t\left[e^{iu\cdot X_T}\right], \qquad u \in \mathbb{R}^n.$$

Let us consider the characteristic function $\Phi(u, X_{t_n}, t_n, t_{n+1})$; using a standard Fourier analysis result we get the conditional density

$$f\left(X_{t_{n+1}}\middle|X_{t_n}\right) = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}e^{-iu\cdot X_{t_{n+1}}}\Phi(u, X_{t_n}, t_n, t_{n+1})\,du. \tag{3.40}$$

In [50], (3.40) is exploited to construct generalized method-of-moments and maximum likelihood estimators. However, this approach can be slow, since the evaluation of (3.40), as already pointed out, can be a demanding problem.

If the state process contains latent components which cannot be directly inferred from data sources (e.g. the volatility in a stochastic volatility model), additional filtering procedures come into question. Typically, Kalman filters are used for all model classes: see e.g. [19] and [32].

Still on the topic of latent variables, Bates proposed in [3] a "direct filtration-based maximum likelihood method" for his model, which we have already encountered in Section 3.6.2.

Chapter 4

Numerical Methods

So far we have seen that we face two classical problems in Numerical Analysis: the solution of a set of $n + 1$ ODEs and the integration of a function $\mathbb{R}^n \to \mathbb{R}$. We will devote one section of this chapter to each problem: the first is devoted to Runge-Kutta (RK) methods, the second to numerical integration.

4.1 Runge-Kutta methods

RK methods are the most famous and widely used family of algorithms for the approximation of solutions to ODEs. To begin with, we recall some fundamental facts about ODEs.

4.1.1 Facts on ODEs

The Cauchy problem (also known as the initial-value problem) consists of finding the solution of an ODE, in the scalar or vector case, given suitable initial conditions. In particular, in the scalar case, denoting by $I \subset \mathbb{R}$ an interval containing the point $t_0$, the Cauchy problem associated with a first order ODE reads: find a real-valued function $y \in C^1(I)$ such that

$$\begin{cases}
\dot y(t) = f(t, y(t)), & t \in I,\\
y(t_0) = y_0,
\end{cases} \tag{4.1}$$

where $f(t, y)$ is a given real-valued function on the strip $S = I\times(-\infty, \infty)$, continuous with respect to both variables. If $f$ depends on $t$ only through $y$, the differential equation is called autonomous.

Theorem 4.1.1 (Existence and uniqueness theorem, [29]). Let $\Omega$ be an open set in $\mathbb{R}\times\mathbb{R}^n$. If $f : \Omega \to \mathbb{R}^n$ is a continuous function and

$$\begin{cases}
\exists\,\delta, \varepsilon, L > 0 : [t_0 - \delta, t_0 + \delta]\times\overline{B_\varepsilon(y_0)} \subseteq \Omega,\\
\|f(t, y') - f(t, y'')\| \le L\|y' - y''\| \quad \text{(Lipschitz in the second variable)}\\
\forall t \in [t_0 - \delta, t_0 + \delta],\ \forall y', y'' \in \overline{B_\varepsilon(y_0)}
\end{cases} \tag{4.2}$$

holds, then there exists an open set containing $t_0$ on which one and only one solution to the Cauchy problem (4.1) is defined. We denote by $B_\varepsilon(y_0)$ the open ball centred at $y_0$ with radius $\varepsilon$, and by $\overline{B_\varepsilon(y_0)}$ its closure.

Therefore, we will suppose throughout the chapter that the RHS of (4.1) is Lipschitz.

We have seen in Section 3.4 that we face a problem with variable initial value; some regularity results are available.

Theorem 4.1.2 (Continuous dependence theorem, [29]). Let $\Omega$ be an open set in $\mathbb{R}\times\mathbb{R}^n$ and $f_k : \Omega \to \mathbb{R}^n$ a sequence of continuous functions such that $f_k \to f$ pointwise on $\Omega$. In addition, let us suppose that on every compact set $K \subset \Omega$ the convergence is uniform, and let properties (4.2) hold for every $x \in \Omega$ and every $f_k$, with the constants $\delta$, $\varepsilon$ and $L$ independent of $k$.

Moreover, let $t_0 \in \mathbb{R}$ and let $y^0_k \in \mathbb{R}^n$ be a sequence such that $(t_0, y^0_k) \to (t_0, y^0)$, where for every $k$: $(t_0, y^0_k), (t_0, y^0) \in \Omega$.

Then the problem (4.1) and the problems

$$\begin{cases}
\dot y_k = f_k(t, y_k(t)), & t \in I,\\
y_k(t_0) = y^0_k,
\end{cases}$$

have unique solutions $y(t)$ and $y_k(t)$ such that $y_k(t) \to y(t)$ uniformly.

We can see the solution of (4.1) as a function $\varphi(t, y_0)$ that is continuous with respect to both $t$ and $y_0$. Of course the solution is differentiable with respect to $t$, while for the differentiability with respect to $y_0$ the following theorem holds:

Theorem 4.1.3 ([44]). Let the hypotheses of the existence and uniqueness theorem hold, and let the partial derivatives $(\nabla_yf)_{ij}$ be continuous over $\Omega$ for each $i, j = 1,\ldots,n$. Then the derivatives $(\nabla_{y_0}\varphi)_{ij}$ exist and are continuous. Moreover, the derivatives $\frac{\partial}{\partial t}(\nabla_{y_0}\varphi)_{ij}$ are also continuous.

4.1.2 Remarks on Generalized Riccati Equations

Looking at (3.28), the presence of $\int_{D\setminus\{0\}}\left(e^{u\cdot\xi} - 1 - u\cdot\chi(\xi)\right)\mu_i(d\xi)$ does not ensure that $R(u)$ is Lipschitz.


E.g., let us consider a process $(X_t)_{t\ge0}$ in $\mathbb{R}_+$ with generator

$$\mathcal{D}f(x) = \frac{2x}{\sqrt\pi}\frac{\partial f}{\partial x} + \int_{\mathbb{R}_+\setminus\{0\}}\left(f(x+\xi) - f(x) - \frac{\partial f}{\partial x}\chi(\xi)\right)\frac{x}{2\sqrt\pi}\frac{d\xi}{\xi^{3/2}}.$$

Then it is easy to recognize that the characteristics are

$$\mu(d\xi) = \frac{1}{2\sqrt\pi}\frac{d\xi}{\xi^{3/2}}, \qquad \beta = \frac{2}{\sqrt\pi},$$

so the GREs are

$$\begin{cases}
\frac{\partial\phi}{\partial t} = 0\\
\frac{\partial\psi}{\partial t} = \frac{2}{\sqrt\pi}\psi + \int_{\mathbb{R}_+\setminus\{0\}}\left(e^{\psi\xi} - 1 - \psi\chi(\xi)\right)\frac{1}{2\sqrt\pi}\frac{d\xi}{\xi^{3/2}}\\
\phi(0) = 0, \quad \psi(0) = v.
\end{cases} \tag{4.3}$$

The integral can be explicitly evaluated, yielding

$$\int_{\mathbb{R}_+\setminus\{0\}}\left(e^{\psi\xi} - 1 - \psi\chi(\xi)\right)\frac{1}{2\sqrt\pi}\frac{d\xi}{\xi^{3/2}} = -\sqrt{-\psi} - \frac{2\psi}{\sqrt\pi}, \qquad \Re[\psi] \le 0.$$

Of course the RHS of (4.3) is not Lipschitz at the origin, and a solution is

$$\phi(t, v) = 0, \qquad \psi(t, v) = -\left(2\sqrt{-v} + t\right)^2/4. \tag{4.4}$$

To ensure the existence and uniqueness of a solution to the GREs, the following result holds:

Theorem 4.1.4 (Proposition 6.1, [22], 24). For every $u \in \mathbb{C}^m_{--}\times i\mathbb{R}^{n-m}$ there exists a unique solution $\psi(\cdot, u)$ and $\phi(\cdot, u)$ to (3.28), with values in $\mathbb{C}^m_{--}\times i\mathbb{R}^{n-m}$ and $\mathbb{C}$ respectively. Moreover, $\phi$ and $\psi$ are continuous on $\mathbb{R}_+\times\mathbb{C}^m_{--}\times i\mathbb{R}^{n-m}$. We denote $\mathbb{C}_{--} = \{c \in \mathbb{C} : \Re[c] < 0\}$.

4.1.3 Analysis of one step methods

Let us pose some definitions that will become useful in the treatment of RK methods, which are basically the most prominent class within the family of one-step methods. Our analysis will concentrate on a single differential equation (scalar case), but all the presented results can be effortlessly extended to the general $n$-dimensional case, just using an appropriate norm in place of the modulus.

Fix $0 < T < \infty$, let $I = (t_0, t_0 + T)$ and, for $h > 0$, define $t_n = t_0 + nh$, $n = 1,\ldots,N_h$, where $N_h$ is the greatest integer such that $t_{N_h} \le t_0 + T$. Moreover, let us denote by $u_j$ the approximation of the exact solution $y_j = y(t_j)$. In a similar way we define $f_j := f(t_j, u_j)$.


Definition 4.1. A numerical method for the approximation of problem (4.1) is called a one-step method if, for $n = 1,\ldots,N_h$, $u_{n+1}$ depends only on $u_n$. Otherwise, the scheme is called a multistep method.

Definition 4.2 (Explicit and implicit methods). A method is called explicit if $u_{n+1}$ can be computed directly in terms of (some of) the previous values $u_k$, $k \le n$. A method is said to be implicit if $u_{n+1}$ depends implicitly on itself through $f$.

We want to point out that one-step methods are well suited to be used in cooperation with step-adaptive techniques: that is, instead of fixing $h$ at the beginning, we use the time increment rule $t_{n+1} = t_n + h_n$, with $h_n$ chosen according to some error-controlling criterion. Such techniques can provide a great increase in performance, since, if an error estimate is available, we can spare computational power on the easier parts of the integration process, keeping at the same time the error at bay.

Each one-step method can be written in the form

$$u_{n+1} = u_n + h\Phi(t_n, u_n, f_n; h), \tag{4.5}$$

where the function $\Phi$ is called the increment function. A straightforward property that we assume for a method is

$$\lim_{h\to0}\Phi(t_n, u_n, f(t_n, y_n); h) = f(t_n, y_n). \tag{4.6}$$

In terms of the exact solution $y$ we can write

$$y_{n+1} = y_n + h\Phi(t_n, y_n, f(t_n, y_n); h) + \varepsilon_{n+1}, \tag{4.7}$$

where $\varepsilon_{n+1}$ is the error we commit using the numerical scheme assuming that $u_n = y_n$, that is, without taking error propagation into account. We can rewrite

$$\varepsilon_{n+1} = h\tau_{n+1}(h),$$

where $\tau_{n+1}(h)$ is the local truncation error at the time $t_{n+1}$, and we define

$$\tau(h) = \max_{0\le n\le N_h-1}|\tau_{n+1}(h)|$$

as the global truncation error. We want to stress that the truncation errors (local and global) depend on the solution $y$ of (4.1). That means, in other words, that the numerical method has to be chosen according to the problem.

We pose in addition the following definitions:

Definition 4.3 (Consistency). A numerical method is said to be consistent if and only if

$$\lim_{h\to0}\tau(h) = 0.$$


Remark 14. Property (4.6) implies consistency.

Definition 4.4 (Order of a method). A numerical method is said to be of order $p$ if and only if, $\forall t \in I$, the solution $y(t)$ of (4.1) is such that

$$\tau(h) = O(h^p), \qquad h \to 0.$$

Up to now we have been working in exact arithmetic, never taking into account the (actual) problem of working in finite precision arithmetic (we want the computer to do the work for us!). To address this question, let us state the following:

Definition 4.5 (Zero-stability of one-step methods). The numerical method (4.5) for the approximation of the solution to problem (4.1) is zero-stable if, for a fixed $\varepsilon$,

$$\exists h_0 > 0, \exists C > 0 : \forall h \in (0, h_0],\quad \left|z^{(h)}_n - u^{(h)}_n\right| \le C\varepsilon, \quad 0 \le n \le N_h, \tag{4.8}$$

where $z^{(h)}_n$, $u^{(h)}_n$ are the solutions of the problems

$$\begin{cases}
z^{(h)}_{n+1} = z^{(h)}_n + h\left[\Phi\left(t_n, z^{(h)}_n, f(t_n, z^{(h)}_n); h\right) + \delta_{n+1}\right]\\
z^{(h)}_0 = y_0 + \delta_0,
\end{cases} \tag{4.9}$$

$$\begin{cases}
u^{(h)}_{n+1} = u^{(h)}_n + h\Phi\left(t_n, u^{(h)}_n, f(t_n, u^{(h)}_n); h\right)\\
u^{(h)}_0 = y_0,
\end{cases} \tag{4.10}$$

for $0 \le n \le N_h - 1$ and $|\delta_k| \le \varepsilon$, $0 \le k \le N_h$.

Zero-stability thus requires that, on a bounded interval, (4.8) holds for any value $h \le h_0$. This property deals, in particular, with the behavior of the numerical method in the limit case $h \to 0$, and this justifies the name zero-stability. The latter is therefore a distinguishing property of the numerical method itself, not of the Cauchy problem (which, indeed, is stable due to the uniform Lipschitz continuity of $f$). Property (4.8) ensures that the numerical method has a weak sensitivity with respect to small changes in the data, so the request of a zero-stable numerical method is an answer to the problems that could arise in the presence of finite precision arithmetic.

We want to point out that the name zero-stability is attributable to the fact that we want this property to hold for $h$ belonging to a neighborhood of the origin. Another similar concept is that of "absolute stability", which holds for $h$ not necessarily near zero.

Theorem 4.1.5 (Zero-stability of one-step methods). Consider the explicit one-step method (4.5) for the numerical solution of the Cauchy problem (4.1). Assume that the increment function $\Phi$ is Lipschitz continuous with respect to the second argument, with constant $\Lambda$ independent of $h$ and of the nodes $t_j \in [t_0, t_0 + T]$, that is,

$$\exists h_0 > 0, \exists\Lambda > 0 : \forall h \in (0, h_0],\ 0 \le n \le N_h,$$
$$\left|\Phi\left(t_n, u^{(h)}_n, f(t_n, u^{(h)}_n); h\right) - \Phi\left(t_n, z^{(h)}_n, f(t_n, z^{(h)}_n); h\right)\right| \le \Lambda\left|u^{(h)}_n - z^{(h)}_n\right|. \tag{4.11}$$

Then the method (4.5) is zero-stable.

Proof. Let us define $w^{(h)}_j := u^{(h)}_j - z^{(h)}_j$ and subtract (4.9) from (4.10) to get

$$w^{(h)}_{j+1} = w^{(h)}_j + h\left[\Phi\left(t_j, u^{(h)}_j, f(t_j, u^{(h)}_j); h\right) - \Phi\left(t_j, z^{(h)}_j, f(t_j, z^{(h)}_j); h\right)\right] - h\delta_{j+1},$$

with $|\delta_k| \le \varepsilon$ and $w^{(h)}_0 = -\delta_0$. Summing over $j$ we get, for $n = 1,\ldots,N_h$,

$$w^{(h)}_n = w^{(h)}_0 + h\sum_{j=0}^{n-1}\left(\Phi\left(t_j, u^{(h)}_j, f(t_j, u^{(h)}_j); h\right) - \Phi\left(t_j, z^{(h)}_j, f(t_j, z^{(h)}_j); h\right)\right) - h\sum_{j=0}^{n-1}\delta_{j+1}.$$

Taking absolute values on both sides and using (4.11), we have

$$|w^{(h)}_n| \le |w^{(h)}_0| + h\Lambda\sum_{j=0}^{n-1}|w^{(h)}_j| + h\sum_{j=0}^{n-1}|\delta_{j+1}|, \qquad n = 1,\ldots,N_h. \tag{4.12}$$

Applying the discrete Gronwall lemma, stated below, we get

$$|w^{(h)}_n| \le (1 + hn)\varepsilon e^{nh\Lambda}, \qquad n = 1,\ldots,N_h,$$

and noticing that $n \le N_h \Rightarrow hn \le hN_h \le T$,

$$|w^{(h)}_n| \le (1 + T)e^{T\Lambda}\varepsilon = C\varepsilon.$$

Lemma 4.1.6 (Discrete Gronwall lemma, [47]). Let $k_n$ be a nonnegative sequence and $\varphi_n$ a sequence such that

$$\varphi_0 \le g_0, \qquad \varphi_n \le g_0 + \sum_{s=0}^{n-1}p_s + \sum_{s=0}^{n-1}k_s\varphi_s.$$

If $g_0 \ge 0$ and $p_n \ge 0$ for all $n \ge 0$, then

$$\varphi_n \le \left(g_0 + \sum_{s=0}^{n-1}p_s\right)\exp\left(\sum_{s=0}^{n-1}k_s\right).$$

To proceed with our treatment, let us state another definition.

Definition 4.6. A method is said to be convergent if

$$|u_n - y_n| \le C(h), \qquad \forall n = 0,\ldots,N_h,$$

where $C(h) \to 0$ as $h \to 0$. In that case, it is said to be convergent with order $p$ if $\exists C > 0$ such that $C(h) = Ch^p$.

Theorem 4.1.7 (Convergence of one-step methods). Under the same assumptions as in Theorem 4.1.5, we have

$$|y_n - u_n| \le \left(|y_0 - u_0| + nh\tau(h)\right)e^{nh\Lambda}, \qquad 1 \le n \le N_h.$$

Therefore, if the consistency assumption (4.6) holds and $|y_0 - u_0| \to 0$ as $h \to 0$, then the method is convergent. Moreover, if $|y_0 - u_0| = O(h^p)$ and the method has order $p$, then it is also convergent with order $p$.

Proof. Let $w_j := y_j - u_j$; subtracting (4.5) from (4.7) and proceeding along the lines of the previous proof, we get to

$$|w_n| \le |y_0 - u_0| + h\Lambda\sum_{j=0}^{n-1}|w_j| + h\sum_{j=0}^{n-1}|\tau_{j+1}(h)|, \qquad n = 1,\ldots,N_h;$$

we apply again the discrete Gronwall lemma, getting the desired result. Since $nh \le T$ and $\tau(h) = O(h^p)$,

$$|y_n - u_n| \le \left(|y_0 - u_0| + T\tau(h)\right)e^{T\Lambda}, \qquad 1 \le n \le N_h;$$

therefore we can find a constant $C$, dependent on $T$ and $\Lambda$ but not on $h$, such that

$$|y_n - u_n| \le Ch^p.$$

These results are very important: to ensure convergence and zero-stability we only need to check the hypotheses of Theorem 4.1.5 and (4.6).

4.1.4 Runge Kutta methods

Now we are going to discuss the method chosen to solve our GREs; we will stick again to the one-dimensional case, since RK methods work in the same way in the general multi-dimensional case. In its most general form, an RK method can be written as (indicating by $u_n$ the approximate solution at step $n$)

$$u_{n+1} = u_n + hF(t_n, u_n, h; f), \tag{4.13}$$

where $f$ is the RHS of (4.1), $h$ is the temporal increment, $t_n$ is the value of time at stage $n$ of the integration, and $F$ is the increment function, defined as follows:

$$F(t_n, u_n, h; f) = \sum_{i=1}^sb_iK_i, \qquad K_i = f\Big(t_n + hc_i,\; u_n + h\sum_{j=1}^sa_{ij}K_j\Big), \tag{4.14}$$

where $s$ denotes the number of stages of the method. We see that each step of an $s$-stage method involves at least $s$ evaluations of the RHS of (4.1), and thus higher-stage RK methods can be inappropriate for systems with a hard-to-evaluate RHS.

In our case of study the RHS is quite easy to evaluate, so RK appears to be the most suitable method for our applications.

Things can turn out differently if we allow the process to jump, since we add the (expensive!) jump transform to the RHS of the GREs (cf. Section ). Anyway, for the most common jump distributions the transform is known in closed form, so RK remains the method of choice.

At first glance, we can see from (4.14) that an RK method satisfies (4.6) if and only if $\sum_{i=1}^sb_i = 1$. Actually more can be said; see [38] for more general conditions.

In addition, we can see that the increment function is Lipschitz in the second variable (it is a convex combination of Lipschitz functions). Therefore the method is zero-stable by Theorem 4.1.5, and, ensuring consistency, we have convergence as well by Theorem 4.1.7.

The coefficients $\{a_{ij}\}$, $\{c_i\}$ and $\{b_i\}$ fully characterize an RK method and are usually collected in the so-called Butcher array (or tableau)

$$\begin{array}{c|cccc}
c_1 & a_{11} & a_{12} & \cdots & a_{1s}\\
c_2 & a_{21} & a_{22} & \cdots & a_{2s}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
c_s & a_{s1} & a_{s2} & \cdots & a_{ss}\\
\hline
 & b_1 & b_2 & \cdots & b_s
\end{array}
\qquad\text{or, in compact notation,}\qquad
\begin{array}{c|c}
c & A\\
\hline
 & b^T
\end{array}$$

with $A \in \mathbb{R}^{s\times s}$, $(A)_{ij} = a_{ij}$, $b, c \in \mathbb{R}^s$, $(b)_i = b_i$, $(c)_i = c_i$, $\forall\,i, j = 1,\ldots,s$. Usually the coefficients $c_i$ are taken as $c_i = \sum_{j=1}^sa_{ij}$.

At this stage of the treatment, we can distinguish three different kinds of RK methods:

1. fully implicit methods, characterized by a full matrix $A$;

2. semi-implicit methods, with $A$ lower triangular, including the diagonal (that is, $a_{ij} = 0$, $j > i$);

3. explicit methods, with $A$ strictly lower triangular (that is, $a_{ij} = 0$, $j \ge i$).

An RK method requires, at each step, finding the values $K_i$; that involves, respectively, the solution of a full nonlinear system of $s$ equations, of $s$ fixed-point problems, or of a simple recursion.

The first two choices skyrocket the computational demands of the method; so, for non-stiff problems, where absolute stability is not a must-have feature, the first two variants are considered exaggeratedly expensive. We have therefore chosen to stick to explicit methods, since they are a good compromise between performance and accuracy.

We anticipate here that our choice will be an order 5 method with a built-in error estimate, in order to use step adaptivity.

4.1.5 Derivation of an explicit RK method

The standard technique for deriving an explicit RK method consists of enforcing that the highest possible number of terms in the Taylor expansion of the exact solution $y_{n+1}$ at $t_n$ coincide with those of the approximate solution $u_{n+1}$, assuming that we take one step of the RK method starting from the exact solution $y_n$. We provide an example of this technique in the case of an explicit 2-stage RK method. Let us consider a 2-stage explicit RK method and assume we have at our disposal the exact solution $y_n$ at the $n$-th step. Then

$$u_{n+1} = y_n + hF(t_n, y_n, h; f) = y_n + h(b_1K_1 + b_2K_2), \qquad K_1 = f_n,\quad K_2 = f(t_n + c_2h, y_n + a_{21}K_1h).$$

If we perform a Taylor expansion of $K_2$, we get

$$K_2 = f_n + hc_2\frac{\partial f_n}{\partial t} + ha_{21}K_1\frac{\partial f_n}{\partial y} + O(h^2);$$

putting the linearized $K$'s into the expression for $u_{n+1}$ we get

$$u_{n+1} = y_n + h(b_1 + b_2)f_n + h^2b_2\left(c_2\frac{\partial f_n}{\partial t} + a_{21}f_n\frac{\partial f_n}{\partial y}\right) + O(h^3);$$

performing the Taylor expansion of the exact solution to third order,

$$y_{n+1} = y_n + hf_n + \frac{h^2}{2}\left(\frac{\partial f_n}{\partial t} + f_n\frac{\partial f_n}{\partial y}\right) + O(h^3).$$

If we subtract the scheme expansion from the exact solution, recalling that $c_i = \sum_{j=1}^sa_{ij}$, we can write two equations:

$$b_1 + b_2 = 1, \qquad b_2c_2 = 1/2.$$


These lead to a method which has a local truncation error of order 2 and is convergent with order 2.¹ In general, in this way we can find restraints on the constants involved in an explicit RK method, which are not sufficient to determine them uniquely.
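For instance, the choice $b_1 = b_2 = 1/2$, $c_2 = a_{21} = 1$ satisfies both equations (Heun's method); a minimal sketch (on an invented test problem) verifying second-order convergence empirically:

import numpy as np

def heun(f, t0, y0, T, n):
    # 2-stage explicit RK with b1 = b2 = 1/2, c2 = a21 = 1 (order 2)
    h, t, y = (T - t0) / n, t0, y0
    for _ in range(n):
        K1 = f(t, y)
        K2 = f(t + h, y + h * K1)
        y += h * (K1 + K2) / 2
        t += h
    return y

f = lambda t, y: -y + np.sin(t)                # simple test RHS
# exact solution of y' = -y + sin t, y(0) = 1, evaluated at t = 1
exact = 1.5 * np.exp(-1.0) + 0.5 * (np.sin(1.0) - np.cos(1.0))
for n in (50, 100, 200):
    err = abs(heun(f, 0.0, 1.0, 1.0, n) - exact)
    print(n, err)   # error shrinks ~4x when h halves, consistent with order 2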

Just to make this statement clear, we report some calculations for an RK method with $s = 4$. For a system of the form (4.1), writing $f_t$, $f_y$, etc. for the partial derivatives,

$$\begin{aligned}
\dot y &= f,\\
\ddot y &= f_t + f_yf,\\
\dddot y &= f_{tt} + 2f_{yt}f + f_yf_t + f_{yy}f^2 + f_y^2f,\\
y^{(4)} &= f_{ttt} + 3f_{ytt}f + 3f_{yt}f_t + 3f_{yy}f_tf + 5f_yf_{yt}f + f_{yyy}f^3 + f_yf_{tt} + 3f_{yyt}f^2 + f_y^2f_t + 4f_{yy}f_yf^2 + f_y^3f,\\
&\;\;\vdots
\end{aligned}$$

and we can see how the derivatives grow in complexity. Using those derivatives in a Taylor expansion retaining all the terms up to the fifth order, we reach the equations

$$\begin{aligned}
b_1 + b_2 + b_3 + b_4 &= 1,\\
b_2c_2 + b_3c_3 + b_4c_4 &= 1/2,\\
b_2c_2^2 + b_3c_3^2 + b_4c_4^2 &= 1/3,\\
b_3a_{32}c_2 + b_4a_{42}c_2 + b_4a_{43}c_3 &= 1/6,\\
b_2c_2^3 + b_3c_3^3 + b_4c_4^3 &= 1/4,\\
b_3c_3a_{32}c_2 + b_4c_4a_{42}c_2 + b_4c_4a_{43}c_3 &= 1/8,\\
b_3a_{32}c_2^2 + b_4a_{42}c_2^2 + b_4a_{43}c_3^2 &= 1/12,\\
b_4a_{43}a_{32}c_2 &= 1/24,
\end{aligned}$$

which has no unique solution and can lead to various 4th-order methods. In general, to solve those equations we need to impose some additional conditions, e.g. the minimization of some form of error. All those conditions can be recast in an algebraic setting, but the theory is far too complex to be presented here, so we refer the reader to [9].
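For example, the classical 4th-order Runge-Kutta method is one well-known solution of the system above; its Butcher array is

$$\begin{array}{c|cccc}
0 & & & &\\
1/2 & 1/2 & & &\\
1/2 & 0 & 1/2 & &\\
1 & 0 & 0 & 1 &\\
\hline
 & 1/6 & 1/3 & 1/3 & 1/6
\end{array}$$

and it is easy to check that these coefficients satisfy all eight conditions.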

4.1.6 Global error

Now we will explain the relation between the local truncation error and the convergence order of an RK method. We anticipate that a Runge-Kutta method with truncation error of order $p$ is also convergent with order $p$.

To begin with, we show a preliminary result.

Lemma 4.1.8. Let $f : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ be the RHS of (4.1), Lipschitz with constant $L$. Let $y_0, z_0 \in \mathbb{R}$ be two input values to a step of the RK method $(A, b^T, c)$ with stepsize $h \le h_0$, where $h_0L\rho(|A|) < 1$, and let $y_1, z_1$ be the corresponding output values. Then

$$|y_1 - z_1| \le (1 + hL^*)|y_0 - z_0|,$$

where $L^* = L|b^T|(I - h_0L|A|)^{-1}\mathbf{1}$, denoting $(\mathbf{1})_i = 1$, $(|b|)_i = |b_i|$, $(|A|)_{ij} = |(A)_{ij}|$, $i, j = 1,\ldots,s$.

Proof. Let us denote the increments $K_i$, defined in (4.14), of the two instances by $Y_i$ and $Z_i$. We easily obtain

$$Y_i - Z_i = y_0 - z_0 + h\sum_{j=1}^sa_{ij}\left(f(t, Y_j) - f(t, Z_j)\right);$$

from the triangle inequality and the Lipschitz property of $f$ we get

$$|Y_i - Z_i| \le |y_0 - z_0| + h_0L\sum_{j=1}^s|a_{ij}||Y_j - Z_j|,$$

and substituting into

$$|y_1 - z_1| \le |y_0 - z_0| + hL\sum_{j=1}^s|b_j||Y_j - Z_j|$$

we get the desired result.

And this is the main theorem:

Theorem 4.1.9. Let $h_0$ and $L^*$ be as above, and let the local truncation error be bounded by

$$\tau_k(h) \le Ch^p, \qquad \forall k = 1,\ldots,N_h,\; h \le h_0.$$

Then the global error is bounded by

$$|u_n - y_n| \le \begin{cases}
\dfrac{e^{L^*T} - 1}{L^*}Ch^p, & L^* > 0,\\[1ex]
CTh^p, & L^* = 0.
\end{cases}$$

Therefore the scheme is convergent with order $p$.

Proof. Let us consider the RK method starting at time $t_0$ and run up to time $t_0 + T$. As appears clear from Figure 4.1, we can estimate the error with

$$|u_n - y_n| \le \sum_{i=1}^{N_h-1}\Delta_i + \delta_{N_h}, \tag{4.15}$$

where $\Delta_i$ is the distance at time $t_0 + T$ between two approximated solutions, one originating in $y_i$ and the other in $y_{i-1}$. We denote by $\delta_i$ the error between the exact solution $y_i$ and the approximated one started at the previous time $t_{i-1}$; the initial error $\delta_i$ propagates, forming $\Delta_i$, as shown in Figure 4.1. In addition, from the definition of $\tau_n(h)$ it follows that $\delta_i \le Ch^{p+1}$.

[Figure 4.1: How to use the local truncation error to estimate the global error.]

We can apply repeatedly Lemma 4.1.8 to get an estimate of the error ∆i, getting

∆i ≤ δi (1 + hL∗)i ≤ Chp+1 (1 + hL∗)i ,

thus (4.15) becomes

$$|u_n - y_n| \le C h^{p+1} \sum_{i=0}^{N_h - 1} (1 + hL^*)^i;$$

the case $L^* = 0$ follows directly from the fact that $h N_h = T$, while if $L^* > 0$ we recall

$$\sum_{i=0}^{n-1} r^i = \frac{1 - r^n}{1 - r}. \tag{4.16}$$

Then the value of our sum is

$$\sum_{i=0}^{N_h - 1} (1 + hL^*)^i = \frac{(1 + hL^*)^{N_h} - 1}{hL^*}.$$

Since $(1 + hL^*)^{N_h} \le e^{h N_h L^*} = e^{T L^*}$, we obtain the desired result.

This theoretical estimate is too complicated to be used in practice, and we will present an alternative strategy to evaluate the error committed by the method.

4.1.7 On higher order RK methods

Up to now, we have shown that an RK method with truncation order p is also convergent with order p. The main question at this point is: what is the role of the number of stages s? An answer to this question actually exists, but it is too complex to be shown here and goes beyond the scope of this thesis, so we report some results, referring the reader once again to [9] for the proofs. To begin with, we recall that each step of an s-stage RK method requires s evaluations of the RHS of the problem. Intuitively, it would seem clear that higher stage methods bring higher accuracy², but this gain has to be quantified.

For this purpose, we cite the following:

Theorem 4.1.10. If an explicit s-stage RK method has order p, then $s \ge p$.

Reasonably, this states: "there is no free lunch", i.e. we cannot get an order-p method without evaluating the RHS at least p times. The following theorem also holds:

Theorem 4.1.11. If an explicit s-stage RK method has order $p \ge 5$, then $s > p$. Moreover, the following conditions hold:

$$s - p \ge 1 \ \ (p \ge 5), \qquad s - p \ge 2 \ \ (p \ge 7), \qquad s - p \ge 3 \ \ (p \ge 8).$$

This theorem says that if we want more and more precision, we are doomed to use a lot of evaluations of the RHS of (4.1). This is why the RK method of order 4 is so popular: it has the best trade-off between accuracy and computational cost. To make even clearer how the number of stages diverges with the order of the method, we state the following:

Theorem 4.1.12. For any positive integer p, an explicit RK method exists with order p and s stages, where

$$s = \begin{cases} \dfrac{3p^2 - 10p + 24}{8} & p \text{ even}\\[1ex] \dfrac{3p^2 - 4p + 9}{8} & p \text{ odd.}\end{cases}$$

²This statement is quite vague; we point out that higher order methods are actually more accurate only for very regular RHS, since higher order Taylor expansions are involved.

This result does not rule out methods with fewer stages: in fact, the following are the minimum numbers of stages required to reach orders from 1 to 8:

order   1  2  3  4  5  6  7   8
s_min   1  2  3  4  6  7  9  11        (4.17)

Up to now, we have worked in a scalar setting, claiming that all the methods can be ported to a multidimensional setting. That is true: all the results we have shown are independent of the dimension of the problem, and all of them assume that the considered method has order p. The main problem is that the order of an RK method in the scalar case does not necessarily coincide with that in the vector case. In general, a method with order $p \ge 5$ in the scalar case does not retain order p in the vector case, while the converse is always true. An intuitive motivation for this claim can again be found in the increasing complexity of the derivatives; a precise motivation can be found in the usual [9], par. 316, pp. 148-149.

4.1.8 Step adaptivity

The main idea behind step adaptivity is to adapt the step, or another error control parameter, in order to keep the error below a user-specified tolerance. One-step methods are well suited to adapting the stepsize h, provided that an efficient estimator of the local error is available. Usually, a tool of this kind is an a posteriori error estimator, since the a priori local error estimates are too complicated to be used in practice.

Roughly speaking, the process can be schematized in the following way:

1. From $u_n$, calculate $u_{n+1}$ using the one-step method.

2. Use the estimator to evaluate the local truncation error.

3. Choose a new step h, according to a rule depending on the error estimate.

This technique is fundamental in every well-designed code, since always getting exactly the desired precision allows us to spare computational power.

One possible method is to use two RK methods of different order, respectively p and p + 1, but with the same number s of stages and the same values $K_i$. We can denote this kind of method with a modified version of the Butcher tableau

c | A
  | b^T
  | b̂^T
  | E^T

where $b^T$ denotes the coefficients of the order-p method and $\hat{b}^T$ those of the order-(p+1) method. We denote the order-p solution by $u_{n+1}$ and the order-(p+1) solution by $\hat{u}_{n+1}$.

We can estimate the error of the order-p solution in this way:

$$\hat{u}_{n+1} - u_{n+1} = h\sum_{i=1}^{s} K_i(\hat{b}_i - b_i) = h\sum_{i=1}^{s} K_i E_i;$$

this estimate tends to underestimate the local truncation error, since we are actually computing

$$\hat{u}_{n+1} - u_{n+1} = O(h^{p+1}) + O(h^{p+2}) = O(h^{p+1}), \quad h \downarrow 0,$$

and otherwise it is $O(h^{p+2})$; hence this estimate is not reliable for large values of h. We want to point out that the estimate is related to the order-p method, but since an order-(p+1) method is available, it is best practice to use the higher order solution instead of the lower one. Thus the estimate is even cruder, and these methods are not suitable when extreme accuracy is an essential feature.

Let us now consider an integration problem over an interval $[a,b]$, using a method of order p, with a step $h \ll 1$. Integrating from x with a step h we have truncation error $C(x)h^p$. Since we are using a (to-be-determined) step-adapting strategy, the step is a function of the position, $h := H(x)$. If the steps are small we can approximate the global error as

$$E(H) = \int_a^b C(x) H^p(x)\,dx$$

and the number of steps as

$$S(H) = \int_a^b \frac{1}{H(x)}\,dx.$$

An optimal policy would be to find a function $H(x)$ that minimizes $E(H)$ while keeping $S(H)$ bounded, i.e., in the language of optimisation,

$$\min_H E(H) \quad \text{s.t.} \quad S(H) \le N. \tag{4.18}$$

It is well known that a necessary condition (Karush-Kuhn-Tucker) for $H^*$ to solve (4.18) is that $H^*$ also minimises the Lagrangian function, for some $\lambda \ge 0$,

$$L(H) = E(H) + \lambda(S(H) - N) = \int_a^b \left[C(x)H^p(x) + \frac{\lambda}{H(x)}\right]dx - \lambda N;$$

we refer the interested reader to any book on optimization theory.

We recall this basic result from the calculus of variations:

Theorem 4.1.13 (Euler-Lagrange formula, [44], 570). Let $f : \mathbb{R}\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ and let $L$ be the functional

$$L(f) = \int f\left(x, H(x), \dot{H}(x)\right)dx.$$

Then $L(f)$ has a stationary value if the Euler-Lagrange differential equation is satisfied:

$$\frac{\partial f}{\partial H} - \frac{d}{dx}\frac{\partial f}{\partial \dot{H}} = 0.$$

Since our Lagrangian does not depend on $\dot{H}$, applying the Euler-Lagrange formula reduces to solving

$$\frac{\partial}{\partial H}\left(C(x)H^p(x) + \frac{\lambda}{H(x)}\right) = 0,$$

which leads to

$$C(x)H^{p+1}(x) = \frac{\lambda}{p} = c,$$

where c is a constant. Then the optimal policy is to keep the local truncation error constant; therefore we choose the new step $h'$ as

$$h' = h\left[\frac{tol}{err}\right]^{\frac{1}{p+1}};$$

we point out that we use the error estimated for the previous step to generate the new step length, instead of the error of the step we are about to perform. Due to this additional approximation, we reduce the suggested step by a safety factor γ. On the other hand, we want the step length h to follow as smoothly as possible the evolution of the problem, avoiding abrupt variations in the step size, since those can lead to strange behavior on the error side.

Mixing all those considerations, we can write down the updating rule for the step size:

$$h' = rh, \qquad r = \max\left(\alpha,\ \min\left(\beta,\ \gamma\left(\frac{tol}{err}\right)^{\frac{1}{p+1}}\right)\right) \tag{4.19}$$

where α, β, γ are design parameters; typical values are $\alpha = 0.5$, $\beta = 2.0$, $\gamma = 0.9$. The value of h at the first step can be taken directly from (4.7) and from the definition of the truncation error; therefore we can take for the first step

$$h = \frac{tol^{\frac{1}{p+1}}}{2}.$$

In some applications it can be useful to impose a minimum value $h_{min}$, e.g. the machine precision. In this case the rule (4.19) can readily be rewritten in the form

$$h' = \max\left(h_{min},\ rh\right), \qquad r = \max\left(\alpha,\ \min\left(\beta,\ \gamma\left(\frac{tol}{err}\right)^{\frac{1}{p+1}}\right)\right),$$

as in the sketch below.
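A minimal C++ sketch of this controller, assuming an error estimate err for the last step is available (the names update_step, h_min etc. are ours, for illustration only):

```cpp
#include <algorithm>
#include <cmath>

// Sketch of the update rule (4.19) with the lower bound h_min.
double update_step(double h, double err, double tol, int p, double h_min,
                   double alpha = 0.5, double beta = 2.0, double gamma = 0.9) {
    err = std::max(err, 1e-300);            // guard against a zero estimate
    // Keep the local error close to tol: r = gamma * (tol/err)^(1/(p+1)).
    double r = gamma * std::pow(tol / err, 1.0 / (p + 1));
    r = std::max(alpha, std::min(beta, r)); // avoid abrupt step variations
    return std::max(h_min, r * h);          // enforce the minimum step length
}
```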


The value $h_{min}$ can also be imposed to create a lower bound on the running time, since there is a class of numerical problems called stiff: if a numerical method is forced to use, in a certain interval of integration, a step length which is excessively small in relation to the smoothness of the exact solution in that interval, then the problem is said to be stiff in that interval. Usually it is not possible to foretell whether the problem is stiff or not; a typical approach is to use a Runge-Kutta method of order 4-5 to probe the problem, check the behavior of the method, and in that case adopt a more suitable method.

4.1.9 Our choice: Dormand-Prince method

Up to now, we have shown some general results and techniques. Let us recall the features we want for our method:

• Runge-Kutta explicit method.

• Step adaptivity.

• Medium accuracy (∼ 10⁻⁴-10⁻⁶).

In the light of those considerations, we have chosen to use RK methods of order 4-5 with embedded error estimation, which, at the price of 6 function evaluations³ and one more computation with respect to the ordinary RK5, allow us to control the stepsize. The first one of this family is the Runge-Kutta-Fehlberg method (RKF45):

0     |
1/4   | 1/4
3/8   | 3/32        9/32
12/13 | 1932/2197  -7200/2197   7296/2197
1     | 439/216    -8           3680/513     -845/4104
1/2   | -8/27       2          -3544/2565     1859/4104   -11/40
      | 25/216      0           1408/2565     2197/4104   -1/5      0
      | 16/135      0           6656/12825    28561/56430 -9/50     2/55
                                                                        (4.20)

³Compare with table (4.17).

Two variants of this method are the Cash-Karp method (RKCK):

0    |
1/5  | 1/5
3/10 | 3/40        9/40
3/5  | 3/10       -9/10        6/5
1    | -11/54      5/2        -70/27       35/27
7/8  | 1631/55296  175/512     575/13824   44275/110592  253/4096
     | 37/378      0           250/621     125/594       0          512/1771
     | 2825/27648  0           18575/48384 13525/55296   277/14336  1/4
                                                                        (4.21)

and the Dormand-Prince method (RKDP):

0    |
1/5  | 1/5
3/10 | 3/40         9/40
4/5  | 44/45       -56/15       32/9
8/9  | 19372/6561  -25360/2187  64448/6561   -212/729
1    | 9017/3168   -355/33      46732/5247    49/176    -5103/18656
1    | 35/384       0           500/1113      125/192   -2187/6784     11/84
     | 5179/57600   0           7571/16695    393/640   -92097/339200  187/2100  1/40
     | 35/384       0           500/1113      125/192   -2187/6784     11/84     0
                                                                        (4.22)

These methods were described for the first time, respectively, in [26], [10] and [18].

The method (4.22) is a 7-stage method, but actually only 6 evaluations per step are needed. This property is called FSAL (First Same As Last), since the first stage of step n is the same as the last stage of step n - 1. To show this, consider the method (4.22) at step n:

$$K_{7,n} = f\left(t_n + c_7 h,\ u_n + h\left(a_{71}K_1 + a_{72}K_2 + a_{73}K_3 + a_{74}K_4 + a_{75}K_5 + a_{76}K_6\right)\right) \tag{4.23}$$

and the solution at step n + 1 is

$$u_{n+1} = u_n + h\left(b_1K_1 + b_2K_2 + b_3K_3 + b_4K_4 + b_5K_5 + b_6K_6 + b_7K_7\right). \tag{4.24}$$

When we start the (n+1)-th step,

$$K_{1,n+1} = f(t_{n+1}, u_{n+1}).$$

Since $c_7 = 1$, $t_{n+1} = t_n + h$ and $a_{7i} = b_i$, it is clear from (4.23) and (4.24) that $K_{1,n+1} = K_{7,n}$. What makes method (4.22) different from (4.20) and (4.21) is that the coefficients are chosen to minimize the norm of the error of the 5th order method, while the others are designed to be used only with the 4th order method. That makes method (4.22) fitter to be used in 5th order mode, while the other two methods are often used improperly. Therefore this method is the common choice for solving non-stiff problems, used also in Matlab™ and Octave, and it will also be our choice; a sketch of one adaptive RKDP step follows.
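The following C++ fragment is a minimal sketch (ours, not the thesis routine) of one scalar RKDP step with the embedded error estimate; the tableau entries are transcribed from (4.22), and k1 is passed in so that the FSAL property can be exploited across steps:

```cpp
#include <cmath>
#include <functional>

struct StepResult { double y; double err; double k_last; };

// One Dormand-Prince step for y' = f(t, y); k1 = f(t, y) is supplied by
// the caller (equal to the previous step's k_last, by FSAL).
StepResult rkdp_step(const std::function<double(double, double)>& f,
                     double t, double y, double h, double k1) {
    double k2 = f(t + h/5.0,      y + h*(k1/5.0));
    double k3 = f(t + 3.0*h/10.0, y + h*(3.0*k1/40.0 + 9.0*k2/40.0));
    double k4 = f(t + 4.0*h/5.0,  y + h*(44.0*k1/45.0 - 56.0*k2/15.0 + 32.0*k3/9.0));
    double k5 = f(t + 8.0*h/9.0,  y + h*(19372.0*k1/6561.0 - 25360.0*k2/2187.0
                                        + 64448.0*k3/6561.0 - 212.0*k4/729.0));
    double k6 = f(t + h, y + h*(9017.0*k1/3168.0 - 355.0*k2/33.0
                              + 46732.0*k3/5247.0 + 49.0*k4/176.0
                              - 5103.0*k5/18656.0));
    // 5th order solution; the 7th row of A equals b, hence FSAL.
    double y5 = y + h*(35.0*k1/384.0 + 500.0*k3/1113.0 + 125.0*k4/192.0
                     - 2187.0*k5/6784.0 + 11.0*k6/84.0);
    double k7 = f(t + h, y5);
    // Embedded 4th order solution, used only for the error estimate.
    double y4 = y + h*(5179.0*k1/57600.0 + 7571.0*k3/16695.0 + 393.0*k4/640.0
                     - 92097.0*k5/339200.0 + 187.0*k6/2100.0 + k7/40.0);
    return { y5, std::fabs(y5 - y4), k7 };
}
```

Combined with the update_step controller sketched above, accepting the step when err ≤ tol and retrying with the reduced h otherwise gives an adaptive integrator of the kind used throughout this chapter.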

4.2 Numerical integration

The other numerical duty required by the usage of affine models is numerical integration: if we allow our model to jump with law ν(z), we have to add the term

$$\theta(c) = \int_{\mathbb{R}^n} e^{c\cdot z}\,d\nu(z), \qquad c \in \mathbb{C}^n,$$

to the RHS of the GREs; therefore, for an n-dimensional affine process with jumps, we have to perform an integration over $\mathbb{R}^n$. We want to point out that for the most common laws the transform is known in closed form, but we want to provide a routine to test different laws, and to evaluate inverse transforms for option pricing applications (Theorem 3.3.2).

4.2.1 One dimensional integration

We will present some formulas and focus on implementation issues, without claiming to be exhaustive. We refer the reader to any good book on Numerical Analysis, and to the given references.

The classical trapezium rule, for $x_{i-1} < x_i$:

$$\int_{x_{i-1}}^{x_i} f(x)dx \approx \frac{x_i - x_{i-1}}{2}\left[f(x_i) + f(x_{i-1})\right]. \tag{4.25}$$

The error committed using this formula is, if $f \in C^2([x_{i-1},x_i])$,

$$E_{tr} = -\frac{(x_i - x_{i-1})^3}{12}f''(\xi), \qquad \xi \in [x_{i-1}, x_i].$$

Since the integral is additive, we can partition $[a,b]$ into N equally spaced intervals, with $x_0 = a$, $x_N = b$, such that

$$\int_a^b f(x)dx = \sum_{i=1}^N \int_{x_{i-1}}^{x_i} f(x)dx \approx h\left[\tfrac{1}{2}f_0 + f_1 + \dots + f_{N-1} + \tfrac{1}{2}f_N\right] \tag{4.26}$$

where $h = (b-a)/N$. Using this extended formula, the error committed is

$$E^N_{tr} = -\frac{(b-a)^3}{12N^2}f''(\xi), \qquad \xi\in[a,b].$$

We will call equation (4.25) the trapezium rule and (4.26) the extended trapezium rule.

Now we turn our attention to the Cavalieri-Simpson rule:

$$\int_{x_{i-1}}^{x_i} f(x)dx \approx \frac{x_i - x_{i-1}}{6}\left[f(x_{i-1}) + 4f\left(\frac{x_i + x_{i-1}}{2}\right) + f(x_i)\right]; \tag{4.27}$$

the error committed using this formula, if $f \in C^4([x_{i-1},x_i])$, is

$$E_{CS} = -\frac{(x_i - x_{i-1})^5}{2^5\, 90}f^{IV}(\xi), \qquad \xi \in [x_{i-1}, x_i].$$

In (4.27), for one interval $[x_{i-1},x_i]$ we need to evaluate the integrand f three times; as done in (4.26), we can obtain the extended Cavalieri-Simpson rule with N intervals:

$$h\left[\tfrac{1}{3}f_0 + \tfrac{4}{3}f_1 + \tfrac{2}{3}f_2 + \tfrac{4}{3}f_3 + \dots + \tfrac{2}{3}f_{2N-2} + \tfrac{4}{3}f_{2N-1} + \tfrac{1}{3}f_{2N}\right] \tag{4.28}$$

where $h = (b-a)/2N$ and

$$f_i = \begin{cases} f\left(\frac{x_i + x_{i-1}}{2}\right) & \text{if } i \text{ is odd}\\ f(x_i) & \text{if } i \text{ is even.}\end{cases}$$

The error for this method is

$$E^N_{CS} = -\left(\frac{b-a}{2}\right)^5 \frac{1}{90N^4} f^{IV}(\xi), \qquad \xi\in[a,b].$$

Usually the integration domain is fixed, so in the error terms we emphasize the number of nodes, which is the parameter to vary to enhance accuracy. As can be seen from the error terms, the second rule has higher accuracy only if f is regular enough; hence (4.26) is more suitable in the presence of an irregular integrand.

4.2.2 Step adaptivity

A simple rule to check convergence is to double the number of evaluation points. Let us fix an error tolerance tol and consider the numerical integration of f over $[a,b]$ with the extended trapezium rule, subdividing the domain into $2^n$ subintervals of length $h = (b-a)/2^n$. We denote this by $I_{2^n}(f)$, leaving the extremes of the integration domain unexpressed for simplicity of notation. We stop the integration process if

$$|I_{2^{n+1}}(f) - I_{2^n}(f)| \le tol.$$

To avoid early convergence, we impose at least 5 refinements of the integration domain. With a clever choice of the points, the trapezium rule is quite effective when used in combination with step adaptivity.

Let us denote by $\mathcal{I}_n$ the sets of evaluation points, taken in the following way:

$$\mathcal{I}_n = \begin{cases} \{x_0 = a,\ x_1 = b\} & \text{if } n = 1\\ \left\{x_i = a + h\left(\tfrac{1}{2} + i\right),\ i = 0,\dots,2^{n-2}-1\right\} & \text{if } n \ge 2,\end{cases}$$

with $|\mathcal{I}_n| = 2^{n-2}$ for $n \ge 2$.

Denote

$$\Lambda_n = \sum_{x\in \mathcal{I}_n} f(x);$$

then we can write the integral with the trapezium rule as

$$I_1 = \frac{b-a}{2}\Lambda_0, \qquad I_{2^n} = \frac{b-a}{2^n}\left(\frac{1}{2}\Lambda_0 + \sum_{i=1}^{n-1}\Lambda_i\right), \quad n \ge 1.$$

In this way we can reuse the function evaluations of the previous steps, as shown in Figure 4.2.

Figure 4.2: Example of the evaluation algorithm: an integration over 2³ = 8 intervals without evaluating the same point twice.

The integration process can thus be schematized as follows:

1. At step n, compute $\Lambda_n$ (cost: $2^{n-2}$ evaluations).

2. Add $\Lambda_n$ to $\lambda = \tfrac{1}{2}\Lambda_0 + \sum_{i=1}^{n-2}\Lambda_i$ obtained at the previous step, and update the value of λ.

3. The value of the integral is $I_{2^n}(f) = h\lambda$, where $h = (b-a)/2^n$.

4. Compute $|I_{2^n}(f) - I_{2^{n-1}}(f)|$; if it is less than or equal to tol, return $I_{2^n}$, else increase n by one, halve h, and go ahead.

We impose a limit NMAX = 20 on the iterations of the above algorithm. That implies, recalling the identity (4.16),

$$2 + \sum_{n=2}^{NMAX} 2^{n-2} = 2 + \sum_{n=0}^{NMAX-2} 2^n = 2^{NMAX-1} + 1 = 524289$$

evaluations, which are sufficient for the most common applications.

The main advantage of the trapezium rule is that we can use it as a building block for the Cavalieri-Simpson rule. Let us compute

$$S_{2^n}(f) := \frac{4}{3}I_{2^n}(f) - \frac{1}{3}I_{2^{n-1}}(f);$$

if we perform this calculation explicitly with (4.26) in mind, we get exactly (4.28). Since to get $I_{2^n}(f)$ we necessarily have to compute $I_{2^{n-1}}(f)$, we have an almost free higher order integration method; a sketch follows.
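A minimal C++ sketch of the doubling scheme with the almost-free Simpson value (illustrative code, not the thesis routine; the function and parameter names are ours):

```cpp
#include <cmath>
#include <functional>

// Doubling trapezium rule with S_{2^n} = (4 I_{2^n} - I_{2^(n-1)}) / 3.
double adaptive_simpson(const std::function<double(double)>& f,
                        double a, double b, double tol,
                        int min_refine = 5, int nmax = 20) {
    double fa = f(a), fb = f(b);
    double lambda = 0.5 * (fa + fb);        // running weighted sum
    double I_prev = (b - a) * lambda;       // I_1: plain trapezium
    double S_prev = I_prev;
    long intervals = 1;
    for (int n = 1; n <= nmax; ++n) {
        intervals *= 2;
        double h = (b - a) / intervals;
        // Only the new midpoints are evaluated; old ones are reused via lambda.
        double sum_new = 0.0;
        for (long i = 0; i < intervals / 2; ++i)
            sum_new += f(a + h * (2 * i + 1));
        lambda += sum_new;
        double I = h * lambda;
        double S = (4.0 * I - I_prev) / 3.0; // extended Simpson value (4.28)
        if (n >= min_refine && std::fabs(S - S_prev) <= tol)
            return S;
        I_prev = I; S_prev = S;
    }
    return S_prev; // NMAX reached; the caller may flag non-convergence
}
```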

4.2.3 Domain transformation

Let us take a look at the problem

$$I(f) = \int_{-\infty}^{+\infty} f(x)dx,$$

where we assume that the integrand f has no singularities on the real axis. A naive approach is to approximate

$$\int_{-\infty}^{+\infty} f(x)dx \approx \int_{-a}^{a} f(x)dx,$$

taking a large, e.g. $10^{34}$, and using an ordinary integration routine like (4.26) on this integral. It is interesting to notice that, for such an approach, the trapezium rule is optimal in the class of quadrature methods with constant step ([42], Section 2.3), i.e., under some technical hypotheses, it attains the minimum error among methods with the same step length.

This approach is not efficient, since (4.26) uses a constant integration step over the domain, and a necessary condition on the integrand to have $|I(f)| < \infty$ is

$$|f(x)| \le \frac{1}{|x|^{1+\varepsilon}}, \qquad x\to\pm\infty,\ \varepsilon > 0;$$

this means that the evaluations performed at the extremes are bounded by $1/a$, i.e. almost zero. In other words, the most meaningful values of f are packed far away from the extremes of the domain. Then two possible approaches are available:

• adapt the step size locally;

• perform a change of variable, to "tame" the integrand.

The first option would force us to rewrite part of the code written for the finite-domain case, so we opt for the second one. Let us choose a change of variable $x(t)$ such that $x : [a,b]\to[c,d]$; then

$$\int_a^b f(y)dy = \int_c^d f(y(x))\frac{dy}{dx}dx.$$

A possible change of variable for our purposes is the so-called Double Exponential (DE) rule:

$$y = \sinh(c\sinh x), \qquad \frac{dy}{dx} = c\cosh(c\sinh x)\cosh x.$$

Figure 4.3: The change of variable $y = \sinh\left(\frac{\pi}{2}\sinh x\right)$ (lower curve) and its derivative (upper curve).

Typical values of c are 1 or π/2. The usual interval of integration is $[-4, 4]$, which is roughly equivalent to $[-2\cdot 10^{18},\ 2\cdot 10^{18}]$.

If the integration interval is only $(0,\infty)$, as in the case of the CIR with jumps model (Section 3.6.1), the DE rule can easily be written as

$$y = e^{2c\sinh x}, \qquad \frac{dy}{dx} = 2c\cosh(x)\,e^{2c\sinh x}.$$

This rule derives from analogous techniques developed to cope with integrals with end-point singularities; we refer the interested reader to [42] and to the original papers [40] and [41], besides the sources listed in Section 4.3. It has to be said that the DE rule is optimal with respect to the trapezoidal rule, in the sense that there exists no other transformation that allows a lower error with the same h. We want to point out that domain transformation techniques really depend on the integrand, and therefore on the specific singularity. We limit ourselves to a general purpose integrator, and for real world applications we suggest choosing the integration method according to the nature of the integrand in (3.3). A minimal sketch of the DE-transformed trapezium rule follows.
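A C++ sketch of the DE rule over the whole real line, combining the change of variable above with the trapezium rule in the transformed coordinate (the constants follow the text; the function name and the fixed node count are illustrative assumptions):

```cpp
#include <cmath>
#include <functional>

// Trapezium rule in x on [-4, 4] after y = sinh(c*sinh(x)).
double integrate_DE(const std::function<double(double)>& f,
                    int n_points = 200, double c = 1.5707963267948966 /* pi/2 */) {
    const double x_min = -4.0, x_max = 4.0;  // maps to roughly [-2e18, 2e18]
    const double h = (x_max - x_min) / n_points;
    double sum = 0.0;
    for (int i = 0; i <= n_points; ++i) {
        double x  = x_min + i * h;
        double y  = std::sinh(c * std::sinh(x));                    // change of variable
        double dy = c * std::cosh(c * std::sinh(x)) * std::cosh(x); // Jacobian dy/dx
        double w  = (i == 0 || i == n_points) ? 0.5 : 1.0;          // trapezium weights
        sum += w * f(y) * dy;
    }
    return h * sum;
}
```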

4.2.4 Multidimensional integral

Up to now, we have presented methods to deal with one-dimensional integration. To extend those methods to n-dimensional integrals like (3.3), we have a classical result from multidimensional Riemann integral calculus, which is a special case of the Fubini Theorem (cf. Theorem A.2.3):

Theorem 4.2.1 (reduction theorem, [29]). Let f be an integrable function, $f : \mathbb{R}^m\times\mathbb{R}^n \to \mathbb{R}^p$. Consider for every $y \in \mathbb{R}^n$ the function

$$x \to f(x, y), \qquad x \in \mathbb{R}^m,$$

and suppose it to be integrable over $\mathbb{R}^m$. Then the function

$$y \to \int_{\mathbb{R}^m} f(x, y)dx, \qquad y \in \mathbb{R}^n,$$

is integrable over $\mathbb{R}^n$ and

$$\int_{\mathbb{R}^m\times\mathbb{R}^n} f(x,y)\,dx\,dy = \int_{\mathbb{R}^n}\left(\int_{\mathbb{R}^m} f(x,y)\,dx\right)dy.$$

Nothing stops us from using the reduction theorem repeatedly, getting

$$\int_{\mathbb{R}^n} f(x_1,\dots,x_n)\,dx_1\dots dx_n = \int_{\mathbb{R}}\int_{\mathbb{R}}\dots\int_{\mathbb{R}} f(x_1,\dots,x_n)\,dx_n\dots dx_2\,dx_1. \tag{4.29}$$

Then we can evaluate the integral one variable after another. In this way we can use our one-dimensional routine recursively to solve an n-dimensional integral, as in the sketch below. Needless to say, recursion is the enemy of efficiency, therefore this method is unsuitable for high-dimensional integrals, say $n \ge 4$. Anyway, we are dealing with low dimension processes, so the integration itself will not be a problem.
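A hypothetical C++ sketch of (4.29), recursing one coordinate at a time; integrate_1d stands for any of the one-dimensional rules above (e.g. the adaptive_simpson sketch), and the bounded box replaces the DE-transformed infinite domain for simplicity:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

double integrate_1d(const std::function<double(double)>& f,
                    double a, double b); // assumed available, e.g. adaptive_simpson

// Iterated integral over the box [lo[0],hi[0]] x ... x [lo[n-1],hi[n-1]].
double integrate_nd(const std::function<double(const std::vector<double>&)>& f,
                    const std::vector<double>& lo, const std::vector<double>& hi,
                    std::vector<double>& point, std::size_t dim = 0) {
    if (dim == lo.size() - 1) {
        // Innermost integral: all outer coordinates are already fixed in `point`.
        return integrate_1d([&](double x) { point[dim] = x; return f(point); },
                            lo[dim], hi[dim]);
    }
    // Outer integrals: integrate the remaining inner integral in one variable.
    return integrate_1d([&](double x) {
        point[dim] = x;
        return integrate_nd(f, lo, hi, point, dim + 1);
    }, lo[dim], hi[dim]);
}
```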

Some problems could arise when using an RK method to integrate the GREs, if we were too demanding with the precision: each step of an s-stage RK method requires s evaluations of the RHS, i.e. s integrations per step. Assuming that we use the same error tolerance for the RK method and for the integration method, requesting great precision implies increasing the number of time steps, and therefore the number of evaluations of the RHS of the GREs, which become harder and harder to evaluate as the precision requirement grows. If high precision is needed, it could be useful to choose another method: possible candidates are the Bulirsch-Stoer method and Predictor-Corrector methods. Both are suitable for problems that require high precision with a hard-to-evaluate RHS. The first is the common choice for high precision problems; the second performs better with a very smooth RHS, a property that, in our case, depends mainly on ν(z) in (3.3). We chose RK methods for their applicability to a broad spectrum of problems but, as usual, if more information is available on the nature of the integrand(s), a specifically designed method should be adopted.

4.3 Main sources and further readings

There are a lot of books on numerical calculus, and we suggest to the reader some of the authors we consulted to write this part of the thesis. To the practitioner we definitely recommend beginning with two handbooks: [47] and [45], the first devoted to a quick review of a great number of numerical methods, the second to implementation issues in C++. Earlier editions of [45], dealing with algorithms written in Fortran and C, can be found. Both books are supplied with broad and useful references. To go into the details of RK methods, we refer the reader to the comprehensive [9], whose author is the putative father of modern RK theory; a useful addition to this can be [38]. With regard to numerical integration, as said before, we do not refer to any particular book, since the details can be found in virtually any introductory book on Numerical Calculus, like [51]. Regarding the DE rule, we refer to [42], an interesting article dealing, from a historical and technical point of view, with the discovery of this optimal rule for evaluating improper integrals.

Chapter 5

Applications

Now we are ready to make joint use of the theory exposed up to now, and we will show why affine processes and reduced-form models work nicely together. All the measure-dependent quantities (expectation, probability, intensity, etc.) are considered under the risk-neutral measure Q. To stress this fact, we will use the superscript Q, e.g.: $\mathbb{E}^Q$, $\lambda^Q$ and so on. The goal of this chapter is to bring the payoff of as many financial products as possible into a form tractable with affine process theory, and we will start with the last claim to be proved, the price of a defaultable zero-coupon bond.

5.1 Defaultable claims

Now we will show how the doubly stochastic framework works jointly with affine process theory. First we give a precise definition of a defaultable contingent claim.

Definition 5.1. A defaultable contingent claim with maturity T is a claim whose payoff is defined as

$$F\,1_{\{\tau > T\}} + W_\tau\,1_{\{\tau \le T\}}$$

where

• F is a $(\mathcal{G}_T)$-measurable bounded random variable, called the promised payout;

• $(W_t)_{t\ge 0}$ is a $(\mathcal{G}_t)$-adapted stochastic process with $W_t = 0$ for $t > T$, called the recovery process.

Remark 15. The case of a bond is covered simply by taking $F = 1$ and $W_t \le 1$ for all t.

5.1.1 No recovery

Theorem 5.1.1. Suppose we have a defaultable contingent claim without recovery, and that $(r_t)_{t\ge0}$ and $(\lambda^Q_t)_{t\ge0}$ are bounded processes, with $(r_t)_{t\ge0}$ $\mathcal{F}_t$-adapted. Furthermore, suppose that, under Q, τ is doubly stochastic driven by a filtration $(\mathcal{F}_t)_{t\ge0}$, with intensity process $\lambda^Q$. Fix any $t < T$; then, for $t \ge \tau$, we have $S_t = 0$, otherwise

$$S_t = \mathbb{E}^Q_t\left[e^{-\int_t^T (r_u + \lambda^Q_u)\,du}\,F\right], \qquad t < \tau. \tag{5.1}$$

Proof. Since τ is doubly stochastic, there exists a filtration $(\mathcal{G}_t)_{t\ge0}$ such that $\mathcal{F}_t \subset \mathcal{G}_t$ for every t. Then, by the law of iterated expectations,

$$S_t = \mathbb{E}^Q_t\left[\mathbb{E}^Q\left[e^{-\int_t^T r_u du}\,1_{\{\tau>T\}}F \,\middle|\, \mathcal{G}_t \vee \mathcal{F}_T\right]\right] = \mathbb{E}^Q_t\left[e^{-\int_t^T r_u du}\,F\,\mathbb{E}^Q\left[1_{\{\tau>T\}} \,\middle|\, \mathcal{G}_t \vee \mathcal{F}_T\right]\right]$$

by the measurability hypothesis on $(r_t)_{t\ge0}$ and F. Recalling that $\mathbb{E}^Q[1_{\{\tau>T\}}|\mathcal{G}_t \vee \mathcal{F}_T] = \mathbb{P}^Q(\tau > T\,|\,\mathcal{G}_t \vee \mathcal{F}_T) = \mathbb{P}^Q(N_T - N_t = 0\,|\,\mathcal{G}_t \vee \mathcal{F}_T)$, by the definition of doubly stochastic process we have

$$\mathbb{P}^Q\left[N_T - N_t = 0 \,\middle|\, \mathcal{G}_t \vee \mathcal{F}_T\right] = e^{-\int_t^T \lambda^Q_u du};$$

then the result, and claim I.3, follows.

Remark 16. In Chapter 2 the process that here plays the role of the interest rate is supposed to be $(\mathcal{F}_t)$-predictable, not $(\mathcal{F}_t)$-adapted. But if we take $r_t = \Lambda(X_{t-})$, as said in Remark 5, we have that

$$\int_0^t \Lambda(X_{s-})\,ds = \int_0^t \Lambda(X_s)\,ds, \quad \text{a.s. for each } t.$$

5.1.2 Claims with recovery

As we have seen in Section 1.5, the price of a defaultable bond with recovery can be split into two parts: one inherent to the face value, and the other inherent to the recovery. That case is analyzed in the following:

Theorem 5.1.2. Consider a contingent claim with payoff F and recovery $(w_t)_{t\ge0}$. Suppose that $(w_t)_{t\ge0}$, $(\lambda^Q_t)_{t\ge0}$ and $(r_t)_{t\ge0}$ are bounded. Furthermore, suppose that τ is doubly stochastic under Q, driven by a filtration $(\mathcal{F}_t)_{t\ge0}$ with the property that $(r_t)_{t\ge0}$ and $(w_t)_{t\ge0}$ are $\mathcal{F}_t$-adapted. Then, for $t > \tau$, we have $S_t = 0$; otherwise,

$$S_t = \mathbb{E}^Q_t\left[e^{-\int_t^T (r_u+\lambda^Q_u)du}\,F\right] + \int_t^T \Sigma(t,u)\,du, \qquad t < \tau; \tag{5.2}$$

where

$$\Sigma(t,u) = \mathbb{E}^Q_t\left[e^{-\int_t^u (r_z+\lambda^Q_z)dz}\,\lambda^Q_u w_u\right]. \tag{5.3}$$

Proof. We already know the first part of (5.2); then we have to evaluate

$$\mathbb{E}^Q_t\left[e^{-\int_t^{\tau\wedge T} r_z dz}\,1_{\{\tau\le T\}}\,w_\tau\right]. \tag{5.4}$$

From (2.12), we already know that the conditional density of τ is $\pi_t(u) = \mathbb{E}^Q_t[e^{-\int_t^u \lambda^Q_z dz}\lambda^Q_u]$; then (5.4) can be expressed as an integral with respect to $\pi_t(u)du$, getting

$$\mathbb{E}^Q_t\left[e^{-\int_t^{\tau\wedge T} r_z dz}\,1_{\{\tau\le T\}}\,w_\tau\right] = \int_t^T e^{-\int_t^u r_z dz}\,w_u\,\pi_t(u)\,du = \int_t^T e^{-\int_t^u r_z dz}\,w_u\,\mathbb{E}^Q_t\left[e^{-\int_t^u \lambda^Q_z dz}\lambda^Q_u\right]du = \int_t^T \mathbb{E}^Q_t\left[e^{-\int_t^u (r_z+\lambda^Q_z)dz}\,\lambda^Q_u w_u\right]du$$

by the measurability hypothesis made on $(r_t)_{t\ge0}$ and $(w_t)_{t\ge0}$.

While the first addend can be evaluated with Theorem 3.1.1, the integrand (5.3), provided that $w_\tau = e^{a + b\cdot X_{\tau-}}$, is naturally evaluated via Theorem 3.3.1. If the GREs associated to (5.3) are, as expected, to be solved numerically, we have the evaluation of the integral in (5.2) for free: $\Sigma(t,u)$ is of the form (3.12), and an RK method employed on the associated GREs will return $\alpha(t_i - t)$, $\beta(t_i - t)$, $A(t_i - t)$, $B(t_i - t)$, for $i = 1,\dots,N$.

In order to have better control on the error of $\int_t^s \Sigma(t,u)du$, we could force the RK method to work with a fixed h; then, using the trapezium rule with the same h, we have

$$\int_t^s \Sigma(t,u)du = \int_t^s \left(A(u-t) + B(u-t)\cdot X_t\right)e^{\alpha(u-t)+\beta(u-t)\cdot X_t}du$$
$$\approx h\sum_{i=1}^{N-1}\left(A(t_i-t) + B(t_i-t)\cdot X_t\right)e^{\alpha(t_i-t)+\beta(t_i-t)\cdot X_t} + \frac{h}{2}\left[w_t\lambda^Q_t + \left(A(s-t) + B(s-t)\cdot X_t\right)e^{\alpha(s-t)+\beta(s-t)\cdot X_t}\right].$$

Of course the integration with RK and the evaluation of this formula can be performed in parallel, without any need to store the whole GRE trajectory; a sketch follows.
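A hypothetical C++ sketch of this idea: a streaming trapezium accumulator fed from inside the fixed-step RK loop, so that the value of the integrand at each node is consumed immediately and the GRE trajectory is never stored (the struct and its names are ours, for illustration):

```cpp
// Accumulates h * [v0/2 + v1 + ... + v_{N-1} + vN/2] from a stream of
// integrand values v_i = (A + B.X_t) * exp(alpha + beta.X_t) at u_i = t + i*h.
struct TrapeziumAccumulator {
    double h, sum = 0.0, last = 0.0;
    bool first = true;
    explicit TrapeziumAccumulator(double step) : h(step) {}
    void add(double value) {           // call once per RK node, in order
        sum += first ? 0.5 * value : value;
        first = false;
        last = value;
    }
    double result() const {            // call after the final node
        return h * (sum - 0.5 * last); // corrects the last weight to 1/2
    }
};
```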

5.1.3 Unpredictable Default Recovery

The previous theorem supposes that the recovery process is known as time passes, but this can be considered an unrealistic hypothesis. Without modifying the other hypotheses, a result similar to Theorem 5.1.2 can be obtained if the size of the recovery is revealed only at default time.

Theorem 5.1.3. Let the hypotheses of Theorem 5.1.2 hold, with the difference that w is a $\mathcal{G}_\tau$-measurable random variable. Then there exists a $(\mathcal{G}_t)$-predictable process $(W_t)_{t\ge0}$ such that

$$\Sigma(t,u) = \mathbb{E}^Q_t\left[e^{-\int_t^u (r_z+\lambda^Q_z)dz}\,\lambda^Q_u W_u\right]. \tag{5.5}$$

Proof. From [17], Theorem IV.67(b), there is a $(\mathcal{G}_t)$-predictable process $(W_t)_{t\ge0}$ such that $W_s = \mathbb{E}^Q[w_\tau 1_{\{\tau\le T\}}\,|\,\mathcal{G}_{s-}]$. Then, since $s > t$, by the law of iterated expectations,

$$\mathbb{E}^Q_t\left[e^{-\int_t^{s\wedge T} r_z dz}\,1_{\{\tau\le T\}}\,w_\tau\right] = \mathbb{E}^Q_t\left[e^{-\int_t^{s\wedge T} r_z dz}\,\mathbb{E}^Q\left[w_\tau 1_{\{\tau\le T\}}\,\middle|\,\mathcal{G}_{s-}\right]\right] = \mathbb{E}^Q_t\left[e^{-\int_t^{s\wedge T} r_z dz}\,W_s\right] = \int_t^T \mathbb{E}^Q_t\left[e^{-\int_t^u (r_z+\lambda^Q_z)dz}\,\lambda^Q_u W_u\right]du \tag{5.6}$$

where the last equation, and the result, is obtained moving along the lines of Theorem 5.1.2.

5.1.4 Fractional loss of value on default

A particularly interesting result is obtained if we suppose the recovery to be a fraction of the security's value just before default.

Definition 5.2 (recovery of market value (RMV)). Let $V^{RMV}_t$ be the value of a contingent claim that pays F at time T. The recovery in case of default at time τ is given by

$$W_\tau = (1 - L_\tau)\,V^{RMV}_\tau, \qquad \tau \le T.$$

The fractional loss process $L = (L_t)_{t>0}$ is supposed to satisfy $L_t \in [0,1]$ and to be predictable.

Theorem 5.1.4 ([33]). Consider a contingent claim that pays F at time T, where F is $(\mathcal{G}_T)$-measurable. The recovery process $(W_t)_{t\ge0}$ is $(\mathcal{G}_t)$-adapted and defined by the RMV assumption. Let $S^{RMV}_t$ denote the price of the claim. With $q_t = r_t + \lambda^Q_t L_t$, let

$$S_t = \mathbb{E}^Q_t\left[e^{-\int_t^T q_s ds}F\right], \quad t \le T; \qquad S_t = 0, \quad t > T,$$

where it is assumed that $(S_t)_{t\ge0}$ does not jump at the time of default τ (i.e. $\Delta S_\tau = 0$ a.s.). Then for $t < \tau$ we have

$$S^{RMV}_t = S_t.$$

Unfortunately, to keep the affinity of the modified discount rate (qt)t≥0 we have to

impose that the fractional loss process (Lt)t≥0 is deterministic.


5.1.5 Netting

Let us consider two financial institutions entering into a contract with netting as a covenant. The first firm sells a defaultable claim that pays A at time T and buys from the other firm another defaultable claim that pays B. Those claims have, respectively, recovery at default $W^A$ and $W^B$. We assume for simplicity that only two claims are involved, while the assumption of the same maturity is made so as not to give an advantage to the firm which bought the shorter-lived claim: if at any time t a firm has already monetized the claim it possessed, it has no more reason not to default voluntarily.

Then the total value of the agreement, from the first firm's perspective, is, for $A > 0$ and $B < 0$,

$$S_t = \mathbb{E}^Q_t\left[e^{-\int_t^T r_u du}(A+B)\right] + \mathbb{E}^Q_t\left[e^{-\int_t^{\tau\wedge T} r_u du}(W^A + W^B)\,1_{\{\tau\le T\}}\right],$$

where $\tau = \tau_1 \wedge \tau_2$. Let us suppose that $W^A$ and $W^B$ fall within the hypotheses of Theorem 5.1.4; then, using Proposition 2.5.3,

5.1.4, then using the Proposition 2.5.3:

St = EQt

[e−

R Tt ru+LAu (λAu+λBu )duA

]+ EQ

t

[e−

R Tt ru+LBu (λAu+λBu )duB

]; (5.7)

the RHS of (5.7) can easily be evaluated within the affine framework, imposing the usual structure on the parameters. We point out that the intensities $(\lambda^A_t)_{t\ge0}$, $(\lambda^B_t)_{t\ge0}$ and the interest rate have to be taken as affine, positive processes. It follows from Remark 9 that they must be instantaneously uncorrelated, which is not a strong limitation, since we can consider that the netting clause gets rid of moral hazard.

Let us compare (5.7) with the value of the same agreement without netting:

$$V_t = \mathbb{E}^Q_t\left[e^{-\int_t^T (r_u + L^A_u\lambda'^A_u)\,du}\,A\right] + \mathbb{E}^Q_t\left[e^{-\int_t^T (r_u + L^B_u\lambda'^B_u)\,du}\,B\right];$$

we used different intensities $\lambda'^A_t$ and $\lambda'^B_t$ since, without netting, the moral hazard component is relevant, increasing the intensity of a (perhaps voluntary) default. In this case, the imposition of instantaneously uncorrelated intensities could be limiting.

5.2 Credit derivatives

5.2.1 Credit spread options

The case of an option with an exponential affine payout has been discussed in Section 3.3.2; here we deal with a put credit spread option, since the call option can be analogously approached with the put-call parity formula. Taking up the notation used in Section 1.8, the price of a credit spread option is

$$\mathbb{E}^Q\left[e^{-\int_0^t r_u du}\,1_{\{\tau>t\}}\,Z_t\right]. \tag{5.8}$$

In (1.5) the price of the bond can be evaluated as usual, again using the affine framework. If the conditions of Theorem 5.1.4 hold, with $L_t = l$, and l deterministic and constant, we have, using the fact that $r_t + l\lambda^Q_t$ is affine and Theorem 3.1.1,

$$e^{-(Y_t+S_t)(T-t)} = \mathbb{E}^Q_t\left[e^{-\int_t^T (r_u + l\lambda^Q_u)\,du}\right] = e^{\alpha(t)+\beta(t)\cdot X_t}. \tag{5.9}$$

Then $(Y_t + S_t)(T-t)$ is affine, and since t is fixed (it is the maturity of the option) we drop the dependence on t from all deterministic coefficients. We define

$$m = T-t, \qquad Y_t = y_0 + y_1\cdot X_t, \qquad S_t = s_0 + s_1\cdot X_t,$$
$$\bar{\alpha} = (y_0 + s_0)m, \qquad \bar{\beta} = (y_1 + s_1)m, \qquad \alpha = (y_0 + s)m, \qquad \beta = y_1 m,$$
$$d = \bar{\beta} - \beta, \qquad c = \alpha - \bar{\alpha}, \qquad r_t + l\lambda^Q_t = \rho_0 + \rho_1\cdot X_{t-},$$

pointing out that the coefficients depend implicitly on l, $(\lambda_t)_{t\ge0}$, $(r_t)_{t\ge0}$, since α and β are obtained by solving the GREs involved by the latter identity in (5.9). Then the payoff (1.5) can be rewritten as

$$Z_t = \left(e^{\bar{\alpha} + \bar{\beta}\cdot X_t} - e^{\alpha + \beta\cdot X_t}\right)1_{\{d\cdot X_t \ge c\}}$$

and (5.8), recalling the definition (3.15), becomes

$$\mathbb{E}^Q\left[e^{-\int_0^t r_u du}\,1_{\{\tau>t\}}\,Z_t\right] = e^{\bar{\alpha}}\,G_{-\bar{\beta},-d}(-c; X_0, 0, t) - e^{\alpha}\,G_{-\beta,-d}(-c; X_0, 0, t),$$

where the RHS can be evaluated using Theorem 3.3.2.

5.2.2 Credit Default Swaps

Let us consider a CDS on a bond, with recovery W. Then the value of the protection is

$$B = \mathbb{E}^Q_t\left[e^{-\int_0^{\tau\wedge T} r_u du}\,(1-W)\,1_{\{\tau\le T\}}\right] \tag{5.10}$$

where T is the maturity of the bond and τ the time of default. The buyer has to pay a rate r at fixed times $t_1 < \dots < t_n = T$, whose market value is

$$A = r\sum_{i=1}^n \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} r_u du}\,1_{\{\tau > t_i\}}\right]. \tag{5.11}$$

In the light of no-arbitrage considerations, r must make (5.11) and (5.10) equal, which yields

$$r = \frac{\mathbb{E}^Q_t\left[e^{-\int_0^{\tau\wedge T} r_u du}\,(1-W)\,1_{\{\tau\le T\}}\right]}{\displaystyle\sum_{i=1}^n \mathbb{E}^Q_t\left[e^{-\int_0^{t_i} r_u du}\,1_{\{\tau>t_i\}}\right]}. \tag{5.12}$$

The numerator of (5.12), depending on the nature of W, can be valued with Theorems 5.1.2, 5.1.3 and 5.1.4, while the denominator is simply a sum of defaultable bonds with maturity $t_i$; a trivial sketch of the final assembly follows.
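A short hypothetical C++ fragment for (5.12), assuming both legs have already been produced by the affine-transform evaluations above (the names are ours):

```cpp
#include <vector>

// Par CDS rate: protection-leg value over the premium annuity, which is
// just the sum of the defaultable bond prices at the payment dates t_i.
double cds_rate(double protection_leg, const std::vector<double>& premium_bonds) {
    double annuity = 0.0;
    for (double b : premium_bonds) annuity += b;
    return protection_leg / annuity;
}
```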

On the other hand, a binary CDS with premium F has rate/premium ratio

$$\frac{r}{F} = \frac{\mathbb{E}^Q_t\left[e^{-\int_0^{\tau\wedge T} r_u du}\,1_{\{\tau\le T\}}\right]}{\displaystyle\sum_{i=1}^n \mathbb{E}^Q_t\left[e^{-\int_0^{t_i} r_u du}\,1_{\{\tau>t_i\}}\right]} = \frac{\displaystyle\int_t^T \mathbb{E}^Q_t\left[e^{-\int_t^u (r_z+\lambda^Q_z)dz}\,\lambda^Q_u\right]du}{\displaystyle\sum_{i=1}^n \mathbb{E}^Q_t\left[e^{-\int_0^{t_i} r_u du}\,1_{\{\tau>t_i\}}\right]}, \tag{5.13}$$

using (5.6).

5.3 A multiname model

The following model was presented in [11]. Let us consider a scenario with n firms, and let us define a Markov process $(X_t)_{t\ge0}$ valued in $\mathbb{R}^{2n+1}_+$, $X_t = (X^0_t,\dots,X^{2n}_t)$, $t \ge 0$, with infinitesimal generator

$$\mathcal{D}f(x) = \sum_{i=0}^{n} \alpha_i x_i \frac{\partial^2 f(x)}{\partial x_i^2} + \sum_{i=0}^{n} (b_i + \beta_i\cdot x)\frac{\partial f(x)}{\partial x_i} + \sum_{p\in I}\left(f\left(x + \sum_{i=n+1}^{2n} p_i e_i\right) - f(x)\right)(l_p + \lambda_p\cdot x) \tag{5.14}$$

where $e_i$ denotes the i-th element of the standard basis of $\mathbb{R}^{2n+1}$ (i.e. $(e_i)_j = \delta_{ij}$, $i,j = 0,\dots,2n$) and $p = (p_{n+1},\dots,p_{2n}) \in I = \{0,1\}^n$. Of course, such a process is an affine jump-diffusion process. The coefficients $\alpha_i$, $b_i$, $l_p$ are non-negative scalars, $\lambda_p \in \mathbb{R}^{2n+1}_+$, and $\beta_i = (\beta_{i,0},\dots,\beta_{i,2n})$.

The process $(X^0_t)_{t\ge0}$ is the short rate; $(X^i_t)_{t\ge0}$, $i = 1,\dots,n$, is the rating of the i-th firm; and $(X^{i+n}_t)_{t\ge0}$, $i = 1,\dots,n$, is the status of the i-th firm (i.e. defaulted or not). Ratings are taken with the convention "the lower, the better"; if $X^{i+n}_t = 0$, the i-th firm has not defaulted by time t, and we assume that $X^{i+n}_0 = 0$. Moreover, the default time of the i-th firm is $\tau_i = \inf\{t \ge 0 : X^{n+i}_t = 1\}$. In the light of these considerations we can give motivations and interpretations for the involved parameters and restrictions.

• $\alpha_i \ge 0$, since we are in $\mathbb{R}^{2n+1}_+$, as a consequence of Theorem 3.5.1; therefore all diffusion parts are uncorrelated, and correlation between ratings has to be achieved by acting on the drift part.

• The quantity $b_i + \beta_i\cdot x$ describes the dependence of the credit rating of firm i on all entities on the market. To be more precise, the coefficient $\beta_{i,j}$ quantifies the dependence on $X^j_t$, and the two can be positively correlated or not. We will assume that $\beta_{i,j} \ge 0$ for all $i \ne j$.

• The vector p is a possible failure scenario, and to each scenario is related an intensity $l_p + \lambda_p\cdot x$; the considerations made on $\beta_i$ also apply to $\lambda_p$. If a scenario p is considered impossible, just set $\lambda_p$ and $l_p$ to 0. We point out that when the i-th firm defaults, $X^i_t$ keeps evolving and influencing the other variables. This effect can be corrected (or removed) by adding $-\sum_{i=1}^{n} \frac{\partial f}{\partial x_{i+n}}\beta_{i+n}x_{i+n}$ to the generator (5.14).

• The short rate $X^0_t$ acts on all the quantities via $\beta_{0,i}$, but it is unreasonable that other quantities would act on the short rate. Therefore $\beta_{i,0} = 0$, $i = 1,\dots,2n$.

The default indicator function can be written as a limit of an exponential function, i.e.

$$\lim_{k\to\infty} e^{-kX^{i+n}_t} = 1_{\{\tau_i>t\}}.$$

Given the peculiar form of this model, we can evaluate the price of a contingent claim simply by solving a slightly modified version of the GREs. We consider the case $n = 3$ in the following.

Proposition 5.3.1. For $t \le T$, $v \in \mathbb{R}^7_-$, $\delta \ge 0$ and $p \in I$ we have

$$\mathbb{E}_t\left[e^{-\delta\int_t^T X^0_s ds}\,e^{v\cdot X_T}\lim_{k\to\infty} e^{-k(p_4 X^4_T + p_5 X^5_T + p_6 X^6_T)}\right] = e^{\phi(T-t,v;\delta;p) + \sum_{i\in\{0,\dots,3\}\cup J_0(p)} \psi_i(T-t,v;\delta;p)\,X^i_t}\prod_{j\in J_1(p)} 1_{\{X^j_t = 0\}}$$

where $J_0(p) := \{4 \le j \le 6 : p_j = 0\}$, $J_1(p) := \{4 \le j \le 6 : p_j = 1\}$, and the $\mathbb{R}_-$-valued functions $\phi = \phi(T-t,v;\delta;p)$ and $\psi_i = \psi_i(T-t,v;\delta;p)$ solve

$$\begin{aligned}
\tfrac{\partial}{\partial t}\phi &= \sum_{k=0}^{3} b_k\psi_k + \sum_{q\in I_0(p)} l_q\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) - \sum_{q\in I_1(p)} l_q\\
\tfrac{\partial}{\partial t}\psi_i &= \alpha_i\psi_i^2 + \sum_{k=0}^{3}\beta_{k,i}\psi_k + \sum_{q\in I_0(p)}\lambda_{q,i}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) - \sum_{q\in I_1(p)}\lambda_{q,i} - \delta 1_{\{i=0\}}\\
\tfrac{\partial}{\partial t}\psi_j &= \sum_{k=0}^{3}\beta_{k,j}\psi_k + \sum_{q\in I_0(p)}\lambda_{q,j}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) - \sum_{q\in I_1(p)}\lambda_{q,j}\\
\phi(0,v;\delta;p) &= 0, \qquad \psi_i(0,v;\delta;p) = v_i, \qquad \psi_j(0,v;\delta;p) = v_j
\end{aligned} \tag{5.15}$$

for $i = 0,\dots,3$ and $j \in J_0(p)$, where $I_0(p) := \{q \in I : q_j = 0\ \forall j \in J_1(p)\}$ and $I_1(p) := I\setminus I_0$.

Proof. By dominated convergence,

$$\mathbb{E}_t\left[e^{-\delta\int_t^T X^0_s ds}\,e^{v\cdot X_T}\lim_{k\to\infty} e^{-k(p_4 X^4_T + p_5 X^5_T + p_6 X^6_T)}\right] = \lim_{k\to\infty}\mathbb{E}_t\left[e^{-\delta\int_t^T X^0_s ds}\,e^{v\cdot X_T}\,e^{-k(p_4 X^4_T + p_5 X^5_T + p_6 X^6_T)}\right]$$
$$= \lim_{k\to\infty} e^{\phi(T-t,\,v-k(p_4e_4+p_5e_5+p_6e_6);\,\delta) + \psi(T-t,\,v-k(p_4e_4+p_5e_5+p_6e_6);\,\delta)\cdot X_t} \tag{5.16}$$

where ψ and φ solve the usual GREs:

$$\begin{aligned}
\tfrac{\partial}{\partial t}\phi &= \sum_{k=0}^{3} b_k\psi_k + \sum_{q\in I} l_q\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right)\\
\tfrac{\partial}{\partial t}\psi_i &= \alpha_i\psi_i^2 + \sum_{k=0}^{3}\beta_{k,i}\psi_k + \sum_{q\in I}\lambda_{q,i}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) - \delta 1_{\{i=0\}}\\
\tfrac{\partial}{\partial t}\psi_j &= \sum_{k=0}^{3}\beta_{k,j}\psi_k + \sum_{q\in I}\lambda_{q,j}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right)\\
\phi(0,v;\delta;p) &= 0, \qquad \psi_i(0,v;\delta;p) = v_i, \qquad \psi_j(0,v;\delta;p) = v_j
\end{aligned} \tag{5.17}$$

with $i = 0,\dots,3$ and $j = 4,5,6$. By Theorem 4.1.4 we have that $\psi\in\mathbb{R}^7_-$, hence $\tfrac{\partial}{\partial t}\psi_j \le 0$; therefore

$$\psi_j\left(t,\,v-k(p_4e_4+p_5e_5+p_6e_6);\,\delta\right) \le \psi_j\left(0,\,v-k(p_4e_4+p_5e_5+p_6e_6);\,\delta\right) = v_j - k\,1_{\{p_j=1\}}, \qquad t \ge 0.$$

Then it is straightforward to split $\{4,5,6\}$ into two sets depending on the choice of p, $J_1(p)$ and $J_0(p)$; we have

$$\lim_{k\to\infty}\psi_j\left(t,\,v-k(p_4e_4+p_5e_5+p_6e_6);\,\delta\right) = \begin{cases} -\infty, & j\in J_1(p)\\ \psi_j(t,v;\delta), & j\in J_0(p).\end{cases}$$

We also split I into two sets, $I_0(p)$ and $I_1(p)$; in the light of the new notation, we can rewrite (5.16) and (5.17) as

$$\lim_{k\to\infty} e^{\phi + \sum_{i\in\{0,\dots,3\}\cup J_0(p)}\psi_i X^i_t}\prod_{j\in J_1(p)} e^{\psi_j X^j_t}$$

and

$$\begin{aligned}
\tfrac{\partial}{\partial t}\phi &= \sum_{k=0}^{3} b_k\psi_k + \sum_{q\in I_0(p)} l_q\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) + \sum_{q\in I_1(p)} l_q\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right)\\
\tfrac{\partial}{\partial t}\psi_i &= \alpha_i\psi_i^2 + \sum_{k=0}^{3}\beta_{k,i}\psi_k + \sum_{q\in I_0(p)}\lambda_{q,i}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) + \sum_{q\in I_1(p)}\lambda_{q,i}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) - \delta 1_{\{i=0\}}\\
\tfrac{\partial}{\partial t}\psi_j &= \sum_{k=0}^{3}\beta_{k,j}\psi_k + \sum_{q\in I_0(p)}\lambda_{q,j}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right) + \sum_{q\in I_1(p)}\lambda_{q,j}\left(e^{q_4\psi_4+q_5\psi_5+q_6\psi_6}-1\right).
\end{aligned}$$

The result follows from the observations that $\lim_{k\to\infty} e^{\psi_j X^j_t} = 1_{\{X^j_t = 0\}}$ for $j \in J_1(p)$, and $\lim_{k\to\infty} e^{q_4\psi_4+q_5\psi_5+q_6\psi_6} = 0$ for $q \in I_1(p)$.

Remark 17. Proposition 5.3.1 considers a contingent claim of the form

$$e^{v\cdot X_T}\lim_{k\to\infty} e^{-k(p_4 X^4_T + p_5 X^5_T + p_6 X^6_T)},$$

which is, obviously, dependent on the choice of a default scenario $p \in I$. E.g., let us take $p = (1,1,1)$ or $p = (1,0,1)$; then

$$e^{v\cdot X_T}\lim_{k\to\infty} e^{-k(X^4_T+X^5_T+X^6_T)} = e^{v\cdot X_T}\,1_{\{\tau_1>T\}}1_{\{\tau_2>T\}}1_{\{\tau_3>T\}} = e^{v\cdot X_T}\,1_{\{\tau_1\wedge\tau_2\wedge\tau_3>T\}}$$

or

$$e^{v\cdot X_T}\lim_{k\to\infty} e^{-k(X^4_T+X^6_T)} = e^{v\cdot X_T}\,1_{\{\tau_1>T\}}1_{\{\tau_3>T\}} = e^{v\cdot X_T}\,1_{\{\tau_1\wedge\tau_3>T\}}.$$

It is thus possible to price a contingent claim depending on a precise default scenario, simply observing that $1_{\{\tau\le T\}} = 1 - 1_{\{\tau>T\}}$ and $1_{\{s<\tau\le t\}} = 1_{\{\tau>s\}} - 1_{\{\tau>t\}}$.

5.3.1 Pricing a CDS

In 5.2.2 we already approached the evaluation of a CDS, assuming that neither the buyer nor the seller of the protection can default, mainly due to the difficulty of expressing more than two defaults in the doubly stochastic setting. Let us consider the time between two payments, $t \in (t_{k-1}, t_k]$, $k \le n$; then three different scenarios can occur:

1. If none of the actors defaults up to $t_k$ (i.e. $t_k < \tau_1\wedge\tau_2\wedge\tau_3$), the buyer (firm 2) performs a payment to the seller (firm 3), amounting to an r to be determined.

2. If the reference entity (firm 1) has defaulted in the period $(t_{k-1},t_k]$ ($t_{k-1} < \tau_1 \le t_k$), the seller has not defaulted yet ($t_k < \tau_3$), and the buyer had not defaulted ($t_{k-1} < \tau_2$), then the seller pays $1 - e^{a + b\cdot X_{t_k}}$, $a \in \mathbb{R}_-$, $b \in \mathbb{R}^7_-$, and the contract terminates.

3. In any other situation, the contract resolves without any effect.

Then, taking up again the notation of Section 5.2.2, we have, for $t \le t_1$,

$$B = \sum_{i=1}^n \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} X^0_u du}\left(1 - e^{a+b\cdot X_{t_i}}\right)1_{\{t_{i-1}<\tau_1\le t_i\}}\,1_{\{t_i<\tau_3\}}\,1_{\{t_{i-1}<\tau_2\}}\right]$$

and

$$A = r\sum_{i=1}^n \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} X^0_u du}\,1_{\{t_i<\tau_1\wedge\tau_2\wedge\tau_3\}}\right]. \tag{5.18}$$

According to Remark 17 we can write $1_{\{t_i<\tau_1\wedge\tau_2\wedge\tau_3\}} = \lim_{k\to\infty} e^{-k(X^4_{t_i}+X^5_{t_i}+X^6_{t_i})}$ and $1_{\{t_{i-1}<\tau_1\le t_i\}}1_{\{t_i<\tau_3\}}1_{\{t_{i-1}<\tau_2\}} = \lim_{k,l\to\infty}\left(e^{-lX^4_{t_{i-1}}} - e^{-kX^4_{t_i}}\right)e^{-lX^5_{t_{i-1}} - kX^6_{t_i}}$. Then, as we did in (5.12), we can write

$$r = \frac{\displaystyle\sum_{i=1}^n B^{1,i}_t - B^{2,i}_t - B^{3,i}_t + B^{4,i}_t}{\displaystyle\sum_{i=1}^n \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} X^0_u du}\,1_{\{t_i<\tau_1\wedge\tau_2\wedge\tau_3\}}\right]},$$

where

$$\begin{aligned}
B^{1,i}_t &= \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} X^0_u du}\lim_{k,l\to\infty} e^{-l\left(X^4_{t_{i-1}}+X^5_{t_{i-1}}\right) - kX^6_{t_i}}\right]\\
B^{2,i}_t &= \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} X^0_u du}\lim_{k,l\to\infty} e^{-k\left(X^4_{t_i}+X^6_{t_i}\right) - lX^5_{t_{i-1}}}\right]\\
B^{3,i}_t &= \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} X^0_u du}\,e^{a+b\cdot X_{t_i}}\lim_{k,l\to\infty} e^{-l\left(X^4_{t_{i-1}}+X^5_{t_{i-1}}\right) - kX^6_{t_i}}\right]\\
B^{4,i}_t &= \mathbb{E}^Q_t\left[e^{-\int_t^{t_i} X^0_u du}\,e^{a+b\cdot X_{t_i}}\lim_{k,l\to\infty} e^{-k\left(X^4_{t_i}+X^6_{t_i}\right) - lX^5_{t_{i-1}}}\right].
\end{aligned} \tag{5.19}$$

The argument of the sum in (5.18) can be evaluated by simply applying Proposition 5.3.1, while the terms in (5.19) can be evaluated using the law of iterated expectations, taking as inner conditional expectation $\mathbb{E}_{t_{k-1}}$ and then using (5.18) twice. For the sake of brevity, we do not report those passages here, referring to [11], pp. 13-14, for the complete expressions.

Chapter 6

A numerical example

We take up again the examples shown in Section 3.6, combining them to build a complete model to evaluate a defaultable claim. Let us consider, as introduced in Remark 13, a defaultable claim modelled with the Bates model; for simplicity, let us suppose that there is no recovery. Then the price of such a claim is

$$\mathbb{E}^Q_t\left[e^{-\int_t^T (r_s+\lambda_s)ds}\,e^{Y_T}\right] = e^{\alpha(t,T)+\beta_1(t,T)V_t+\beta_2(t,T)r_t+\beta_3(t,T)\lambda_t+\beta_4(t,T)Y_t}; \tag{6.1}$$

where

$$\begin{aligned}
dV_t &= k_V(\gamma_V - V_t)dt + \sigma_V\sqrt{V_t}\,dW^V_t\\
dr_t &= k_r(\gamma_r - r_t)dt + \sigma_r\sqrt{r_t}\,dW^r_t + dJ^r_t\\
d\lambda_t &= k_\lambda(\gamma_\lambda - \lambda_t)dt + \sigma_\lambda\sqrt{\lambda_t}\,dW^\lambda_t + dJ^\lambda_t\\
dY_t &= \left(\mu_S - \tfrac{1}{2}V_t\right)dt + \sqrt{V_t}\,dW^S_t + dJ^S_t.
\end{aligned} \tag{6.2}$$

We have modelled $(\lambda_t)_{t\ge0}$ and $(r_t)_{t\ge0}$ with two CIR-with-jumps models, and the log-price with the affine Bates model with Gaussian jumps, distributed as $N\left(\ln(1+k) - \frac{\delta^2}{2},\ \delta^2\right)$, according to [14], Section 15.2; the respective jump intensities are denoted by $l_\lambda$, $l_r$, $l_Y$. Moreover, we suppose that $(W^V_t)_{t\ge0}$ and $(W^Y_t)_{t\ge0}$ are instantaneously correlated with correlation ρ. Then we can readily write the characteristics of such a model, recalling that, in case of multiple jump types, the GREs have to be modified according to Section 3.3.4:

$$\mu = \begin{pmatrix} k_V\gamma_V\\ k_r\gamma_r\\ k_\lambda\gamma_\lambda\\ \mu_S \end{pmatrix} + \begin{pmatrix} -k_V & 0 & 0 & 0\\ 0 & -k_r & 0 & 0\\ 0 & 0 & -k_\lambda & 0\\ -\tfrac{1}{2} & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} V_t\\ r_t\\ \lambda_t\\ Y_t \end{pmatrix}; \tag{6.3}$$

$$\sigma\sigma^T = \begin{pmatrix} \sigma_V^2 & 0 & 0 & \rho\sigma_V\\ 0&0&0&0\\ 0&0&0&0\\ \rho\sigma_V&0&0&1 \end{pmatrix}V_t + \begin{pmatrix} 0&0&0&0\\ 0&\sigma_r^2&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix}r_t + \begin{pmatrix} 0&0&0&0\\ 0&0&0&0\\ 0&0&\sigma_\lambda^2&0\\ 0&0&0&0 \end{pmatrix}\lambda_t;$$

$$\theta^1(c) = \frac{1}{1 - d_r c_2}, \qquad \theta^2(c) = \frac{1}{1 - d_\lambda c_3}, \qquad \theta^3(c) = \exp\left(\left[\ln(1+k) - \frac{\delta^2}{2}\right]c_4 + \frac{(\delta c_4)^2}{8}\right);$$

$$l_{10} = l_r, \qquad l_{20} = l_\lambda, \qquad l_{30} = l_Y; \qquad \rho_1 = (0, 1, 1, 0);$$

all the above characteristics satisfy the conditions of Theorem 3.5.1. We want to point out that, unrealistically, the process $(Y_t)_{t\ge0}$ is independent of $(\lambda_t)_{t\ge0}$. While the structure of the covariance matrix is fixed by Theorem 3.5.1, we could add negative components in the last row of the matrix in (6.3), so that the higher the intensity of default, the lower the price.

Now we can write down the GREs associated to the problem (6.1):

$$\begin{aligned}
\dot{\beta}_1 &= -k_V\beta_1 - \tfrac{1}{2}\beta_4 + \tfrac{1}{2}\beta_1^2 + \tfrac{1}{2}\rho\sigma_V\beta_1\beta_4\\
\dot{\beta}_2 &= -1 - k_r\beta_2 + \tfrac{1}{2}\sigma_r^2\beta_2^2\\
\dot{\beta}_3 &= -1 - k_\lambda\beta_3 + \tfrac{1}{2}\sigma_\lambda^2\beta_3^2\\
\dot{\beta}_4 &= \tfrac{1}{2}\rho\sigma_V\beta_1^2 + \tfrac{1}{2}\sigma_V^2\beta_1\beta_4\\
\dot{\alpha} &= k_V\gamma_V\beta_1 + k_r\gamma_r\beta_2 + k_\lambda\gamma_\lambda\beta_3 + \mu_S\beta_4 + l_r\frac{d_r\beta_2}{1-d_r\beta_2} + l_\lambda\frac{d_\lambda\beta_3}{1-d_\lambda\beta_3} + l_Y\left(e^{\left[\ln(1+k)-\frac{\delta^2}{2}\right]\beta_4 + \frac{(\delta\beta_4)^2}{8}} - 1\right)\\
\alpha(0) &= 0, \qquad \beta(0) = (0, 0, 0, 1).
\end{aligned} \tag{6.4}$$
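A hypothetical C++ transcription of the RHS of (6.4), in the form $y' = F(y)$ with $y = (\beta_1,\beta_2,\beta_3,\beta_4,\alpha)$, suitable for feeding an RK stepper like the RKDP sketch of Section 4.1.9 (the struct and field names are ours; the formulas transcribe (6.4) as reconstructed above):

```cpp
#include <array>
#include <cmath>

struct Params {
    double kV, gV, sV, kr, gr, sr, kl, gl, sl, muS, rho; // drift/diffusion
    double lr, dr, ll, dl, lY, kbar, delta;              // jump parts
};

std::array<double, 5> gre_rhs(const std::array<double, 5>& y, const Params& p) {
    const double b1 = y[0], b2 = y[1], b3 = y[2], b4 = y[3];
    std::array<double, 5> d;
    d[0] = -p.kV * b1 - 0.5 * b4 + 0.5 * b1 * b1 + 0.5 * p.rho * p.sV * b1 * b4;
    d[1] = -1.0 - p.kr * b2 + 0.5 * p.sr * p.sr * b2 * b2;
    d[2] = -1.0 - p.kl * b3 + 0.5 * p.sl * p.sl * b3 * b3;
    d[3] = 0.5 * p.rho * p.sV * b1 * b1 + 0.5 * p.sV * p.sV * b1 * b4;
    d[4] = p.kV * p.gV * b1 + p.kr * p.gr * b2 + p.kl * p.gl * b3 + p.muS * b4
         + p.lr * p.dr * b2 / (1.0 - p.dr * b2)
         + p.ll * p.dl * b3 / (1.0 - p.dl * b3)
         + p.lY * (std::exp((std::log(1.0 + p.kbar) - 0.5 * p.delta * p.delta) * b4
                            + std::pow(p.delta * b4, 2) / 8.0) - 1.0);
    return d;
}
```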

Here $\beta_2$ and $\beta_3$ are decoupled and can be solved explicitly: an application of Lemma 3.2.1 yields

$$\beta_2(t) = -\frac{2\left(e^{\rho_2(T-t)} - 1\right)}{(\rho_2 + k_r)\left(e^{\rho_2(T-t)} - 1\right) + 2\rho_2}, \qquad \beta_3(t) = -\frac{2\left(e^{\rho_3(T-t)} - 1\right)}{(\rho_3 + k_\lambda)\left(e^{\rho_3(T-t)} - 1\right) + 2\rho_3},$$

leaving the system

$$\begin{aligned}
\dot{\beta}_1 &= -k_V\beta_1 - \tfrac{1}{2}\beta_4 + \tfrac{1}{2}\beta_1^2 + \tfrac{1}{2}\rho\sigma_V\beta_1\beta_4\\
\dot{\beta}_4 &= \tfrac{1}{2}\rho\sigma_V\beta_1^2 + \tfrac{1}{2}\sigma_V^2\beta_1\beta_4\\
\dot{\alpha} &= k_V\gamma_V\beta_1 + k_r\gamma_r\beta_2 + k_\lambda\gamma_\lambda\beta_3 + \mu_S\beta_4 + l_r\frac{d_r\beta_2}{1-d_r\beta_2} + l_\lambda\frac{d_\lambda\beta_3}{1-d_\lambda\beta_3} + l_Y\left(e^{\left[\ln(1+k)-\frac{\delta^2}{2}\right]\beta_4 + \frac{(\delta\beta_4)^2}{8}} - 1\right)\\
\alpha(0) &= 0, \qquad \beta_1(0) = 0, \qquad \beta_4(0) = 0,
\end{aligned}$$

where $\rho_2 = \sqrt{k_r^2 + 2\sigma_r^2}$ and $\rho_3 = \sqrt{k_\lambda^2 + 2\sigma_\lambda^2}$. The knowledge of $\beta_2(t)$ and $\beta_3(t)$ can be exploited to test our implementation of RKDP, as in the sketch below.
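A minimal self-contained C++ sketch of this test (ours): integrate the decoupled $\beta_2$ equation numerically and compare with the closed form, using the Table 6.1 parameters $k_r = 2$, $\sigma_r = 0.65$; a plain fixed-step RK4 loop stands in for the adaptive RKDP routine, purely for illustration:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double kr = 2.0, sr = 0.65;
    const double rho2 = std::sqrt(kr * kr + 2.0 * sr * sr);
    // beta_2' = -1 - kr*beta_2 + (1/2)*sr^2*beta_2^2, beta_2(0) = 0.
    auto f = [&](double b) { return -1.0 - kr * b + 0.5 * sr * sr * b * b; };
    double b = 0.0, h = 0.01;
    for (double tau = 0.0; tau < 5.0; tau += h) {  // integrate on [0, 5]
        double k1 = f(b), k2 = f(b + 0.5 * h * k1); // classic RK4 step
        double k3 = f(b + 0.5 * h * k2), k4 = f(b + h * k3);
        b += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
    }
    double e = std::exp(rho2 * 5.0);                // closed form at tau = 5
    double exact = -2.0 * (e - 1.0) / ((rho2 + kr) * (e - 1.0) + 2.0 * rho2);
    std::printf("numerical %.8f  exact %.8f\n", b, exact);
    return 0;
}
```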

Table 6.1: Parameters for the Bates model.

(V_t):  drift k_V = 0.9, γ_V = 0.6;   diffusion σ_V = 0.6;   no jumps;                          V_0 = 0.70
(r_t):  drift k_r = 2,   γ_r = 0.07;  diffusion σ_r = 0.65;  jumps l_r = 0.12, d_r = 0.1;       r_0 = 0.08
(λ_t):  drift k_λ = 1.5, γ_λ = 0.08;  diffusion σ_λ = 0.4;   jumps l_λ = 0.09, d_λ = 0.15;      λ_0 = 0.05
(Y_t):  drift μ_S = 0.65;             diffusion ρ = -0.50;   jumps l_Y = 0.1, k = 1, δ = 0.5;   Y_0 = 5

Table 6.1 reports the parameters used to integrate (6.4); we assume these parameters in each figure, unless stated differently. To begin with, we test our algorithm by integrating $\beta_2$ with different values of the tolerance (Figures 6.1, 6.2, 6.3); for comparison we plot the exact solution with a continuous red line.

Figure 6.1: Local error tolerance: 10⁻¹, evaluations needed: 37; the algorithm shows a numerical instability.

We want to point out how well the step adaptivity works in Figure 6.3: it relaxes the step length on the linear part and tightens it where the solution varies faster.

Figure 6.2: Local error tolerance: 10⁻², evaluations needed: 42; no instability.

Figure 6.3: Local error tolerance: 10⁻⁵, evaluations needed: 127; perfect matching.

For our pricing purposes we are particularly interested in the values of $\alpha(0,T)$ and $\beta(0,T)$. Let us consider the forward GREs (cf. (3.7)); in order to ensure that we get the solution at $s = T$, we check at every step that $s_n \le T$. If at any point $s_n > T$ and $|s_n - T| \ge h_{min}$, we force the algorithm to perform the last step again with $s_n = T$.

Figure 6.4 (panels: α, β₁, β₂, β₃, β₄ and the price, as functions of the maturity T): Local error tolerance: 10⁻⁵, evaluations needed: 127. The price is increasing with maturity, i.e. it is convenient to perform a long-run investment.

Since (6.4) is an autonomous system (all the coefficients are supposed constant), solutions depend only on $T - t$; we therefore consider the problem

$$\mathbb{E}^Q_0\left[e^{-\int_0^T (r_s+\lambda_s)ds}\,e^{Y_T}\right] = e^{\alpha(0,T)+\beta_1(0,T)V_0+\beta_2(0,T)r_0+\beta_3(0,T)\lambda_0+\beta_4(0,T)Y_0}; \tag{6.5}$$

i.e. we look at the same product, having at hand the initial values at time $t = 0$, but for different maturities T, and see how the price changes. We have run some simulations with different parameters, to show how many price structures can be reproduced by this toy model. We observe that $\beta_1$, $\beta_2$ and $\beta_3$ are negative, which is reasonable, since higher initial values of, respectively, volatility, interest rate and intensity of default would make the investment less appealing, i.e. give it a lower price, even with the same initial value of $Y_0$. On the other hand, $\beta_4$ is positive and decreasing; thus a higher value of $Y_0$ would mean a higher price, but this effect becomes less significant as maturity increases.

Figure 6.5 (panels: α, β₁, β₂, β₃, β₄ and the price, as functions of the maturity T): Local error tolerance: 10⁻⁵, evaluations needed: 127; $l_Y = l_\lambda = l_r = 0$; the price still grows but at a slower pace.

Another computation that can be carried out is the evaluation of the income/outcome ratio defined in Section 3.4; we can see how this ratio varies as the maturity increases. Figures 6.8 and 6.9 show a decreasing ratio: the effect of the corrected interest rate (corrected since it contains also the risk spread) becomes more and more relevant as the maturity increases.

Figure 6.6 (panels: α, β₁, β₂, β₃, β₄ and the price, as functions of the maturity T): Local error tolerance: 10⁻⁵, evaluations needed: 127; $\gamma_\lambda = 0.50$. This asset is really likely to default in the future, so the price is dropping.

Figure 6.7 (panels: α, β₁, β₂, β₃, β₄ and the price, as functions of the maturity T): Local error tolerance: 10⁻⁵, evaluations needed: 127; $\lambda_0 = 0.80$, $\delta = 2$, $\mu_S = 0.85$. This asset is really likely to default in the near future, so the price drops, but the mean reversion makes the price rise in the long run.

Figure 6.8: Local error tolerance: 10⁻⁵, evaluations needed: 127. Upper picture: discounted price of the asset; middle picture: price of the asset without discount; bottom picture: ratio.

Figure 6.9: Local error tolerance: 10⁻⁵, evaluations needed: 127; $\lambda_0 = 0.50$, $l_\lambda = 0.15$. Upper picture: discounted price of the asset; middle picture: price of the asset without discount; bottom picture: ratio.

Conclusion

In this thesis, affine models and their applications were studied and investigated. Our additions to this already well-developed theory are some minor proofs and the pricing of some derivatives. Our focus was on the numerical solution of the GREs, and we wrote a state-of-the-art Runge-Kutta routine from scratch in C++. That routine performs very well in the given example, and in many other instances. Execution times were always of the order of a second on an ordinary laptop, for a local error tolerance of 10⁻⁵. Hence the evaluation of products modelled with affine processes can be performed on any computer, including a personal one. Future developments in this field of application could be:

• To develop more products that can be priced via the affine framework.

• To explore the effect of adding jumps whose Laplace transform is not known in closed form. Then the integral

$$\int_{D\setminus\{0\}}\left(e^{u\cdot\xi} - 1 - u\cdot\chi(\xi)\right)\mu(d\xi)$$

has to be evaluated numerically, and the code has to be written according to the particular jump measure μ.

• Every evaluation of the RHS of the GREs can be very expensive, and methods that require fewer evaluations of the RHS should be employed, like Bulirsch-Stoer.

• It could be useful to develop an "affine numerical toolbox", i.e. black-box software oriented to the evaluation of financial products modelled with affine processes. That should include all the improvements proposed in the previous points, along with model calibration and a user-friendly interface, to fulfill the needs of practitioners.


Appendix A

Measure Theory

A.1 Stieltjes-Lebesgue integration

We present here some definitions about functions of finite variation.

Definition A.1. Let $f : [0,t]\to\mathbb{R}$ be such that

$$V_f(t) = \sup_D \sum_{i=1}^{N} |f(t_i) - f(t_{i-1})| < \infty$$

where D is the set of finite partitions of $[0,t]$: $0 = t_0 < t_1 < \dots < t_N = t$. Then $V_f(t)$ is called the variation of f over $[0,t]$, and f is said to be of finite variation if $V_f$ is finite on each compact interval of $\mathbb{R}_+$.

It is well known that any function of finite variation can be decomposed into the difference of two increasing functions, i.e. if $g : [0,\infty)\to\mathbb{R}$ is a function of finite variation then there exist monotone increasing functions $a : [0,\infty)\to\mathbb{R}$ and $b : [0,\infty)\to\mathbb{R}$ such that $g(t) = a(t) - b(t)$. To a and b correspond two measures

$$\mu_a((0,t]) = a(t) = V_f(t), \qquad \mu_b((0,t]) = b(t).$$

Thus it is sufficient to define the Stieltjes integral for monotone increasing functions, since for every measurable function u and every g of finite variation we can write

$$\int_0^t u(s)\,dg(s) = \int_0^t u(s)\,da(s) - \int_0^t u(s)\,db(s).$$


Definition A.2. Let $f : [0,\infty)\to\mathbb{R}$ be a deterministic function and $g : [0,\infty)\to\mathbb{R}$ a monotone increasing function. Let p be a finite partition of the interval $[a,b]$, and let $|p| := \sup_i |t_{i+1} - t_i|$. The Stieltjes integral of the function f with respect to the function g over an interval $[a,b]$ is defined as

$$\int_a^b f(s)\,dg(s) = \lim_{|p|\to 0}\sum_{i=1}^{n} f(\varepsilon_i)\left(g(t_{i+1}) - g(t_i)\right)$$

where $\varepsilon_i \in [t_i, t_{i+1}]$.
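A simple worked example (ours): if g is a pure jump increasing function, e.g. a counting process path $N_s$ with jump times $T_i$, the Stieltjes integral collapses to a sum,

$$\int_0^t f(s)\,dN_s = \sum_{i\,:\,T_i \le t} f(T_i),$$

which is the form that appears in intensity-based computations with counting processes.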

Theorem A.1.1 (Exponential formula, [7], T4, 337). Let a be a right continuous increasing function with $a(0) = 0$ and let u be such that

$$\int_0^t |u(s)|\,da(s) < \infty, \qquad t \ge 0;$$

then the equation

$$x(t) = x(0) + \int_0^t x(s-)\,u(s)\,da(s), \qquad t \ge 0,$$

admits a unique locally bounded ($\sup_{s\in[0,t]}|x(s)| < \infty$, $t \ge 0$) solution, given by

$$x(t) = x(0)\prod_{0<s\le t}\left(1 + u(s)\Delta a(s)\right)\exp\left(\int_0^t u(s)\,da^c(s)\right), \qquad t \ge 0, \tag{A.1.1}$$

where $\Delta a(t) = a(t) - a(t-)$ and $a^c(t) = a(t) - \sum_{0<s\le t}\Delta a(s)$.

It is possible to write the Fourier transform of a function in terms of a Stieltjes integral:

Definition A.3 (Fourier-Stieltjes transform). Let α be a monotone increasing, real-valued function of finite variation; then

$$f(x) = \int_{\mathbb{R}} e^{ixt}\,d\alpha(t)$$

is well defined. The function $f(x)$ is called the Fourier-Stieltjes transform of α.

A.2 Lebesgue measure theorems

These are some classical convergence results for the Lebesgue measure; all the functions are considered to be positive, since it is always possible to take $f = f^+ - f^-$, where $f^+$ and $f^-$ are positive functions. We took [37] as reference.

Theorem A.2.1 (Dominated convergence theorem). If the sequence $f_n(x)\to f(x)$ as $n\to\infty$ and if for all n

$$f_n(x) \le \varphi(x),$$

where $\varphi(x)$ is integrable, then the limit function $f(x)$ is integrable and

$$\int_A f_n(x)\,d\mu(x) \to \int_A f(x)\,d\mu(x).$$

Theorem A.2.2 (Bounded convergence theorem). If the sequence $f_n(x)\to f(x)$ as $n\to\infty$ and if for all n

$$f_n(x) \le K,$$

then the limit function $f(x)$ is integrable and

$$\int_A f_n(x)\,d\mu(x) \to \int_A f(x)\,d\mu(x).$$

Theorem A.2.3 (Fubini theorem). Let the measures $\mu_x$ and $\mu_y$ be defined on Borel rings, σ-additive and complete; let, moreover,

$$\mu = \mu_x \otimes \mu_y,$$

and let the function $f(x,y)$ be integrable with respect to the measure μ on the set

$$A = A_{x_0} \times A_{y_0}.$$

Then

$$\int_A f(x,y)\,d\mu = \int_Y\left(\int_{A^y} f(x,y)\,d\mu_x\right)d\mu_y = \int_X\left(\int_{A_x} f(x,y)\,d\mu_y\right)d\mu_x.$$

We denote $A_x = \{y : (x,y)\in A\}$ and $A^y = \{x : (x,y)\in A\}$.

Appendix B

Stochastic processes

Here we recall results and definitions used throughout the thesis.

B.1 Definitions and basic results

Definition B.1 (Filtration). A filtration on $(\Omega,\mathcal{F},\mathbb{P})$ is a family $(\mathcal{F}_t)_{t\ge0}$ of sigma-algebras $\mathcal{F}_t \subset \mathcal{F}$ such that

$$\mathcal{F}_t \subset \mathcal{F}_s, \qquad 0 \le t < s$$

(i.e. it is increasing). When it is clear from the context, the dependence on the elements of the probability space will be omitted.

Definition B.2 (Usual conditions). A filtration (Ft)t≥0 is said to satisfy the usual con-

ditions if

• F0 contains all null sets included in F

• ∀t,Ft =⋂s>t Fs, i.e. the filtration is right continuous.

Definition B.3 (Martingale). An n-dimensional stochastic process $(X_t)_{t\ge0}$ on $(\Omega,\mathcal{F},\mathbb{P})$ is called a martingale with respect to the filtration $(\mathcal{F}_t)_{t\ge0}$ (for short, an $(\mathcal{F}_t)$-martingale or $\mathbb{P}$-martingale) if

(i) Xt is Ft−measurable for all t.

(ii) EP [|Xt|] <∞ for all t.

(iii) EP [Xs|Ft] = Xt for all 0 ≤ t ≤ s.

If it is clear from the context, the dependence on P or (Ft)t≥0 will be omitted.

Definition B.4 (Adapted process). Let $(\mathcal{F}_t)_{t\ge0}$ be a filtration defined on the probability space $(\Omega,\mathcal{F},\mathbb{P})$. A real-valued process $(X_t(\omega))_{t\ge0}$ is said to be $(\mathcal{F}_t)$-adapted if for each $t \ge 0$ the function

$$\omega \to X_t(\omega)$$

is $\mathcal{F}_t$-measurable.

Definition B.5 (Predictable process). Let $(\mathcal{F}_t)_{t\ge0}$ be a filtration defined on the probability space $(\Omega,\mathcal{F},\mathbb{P})$, and let $\mathcal{P}(\mathcal{F}_t)$ be the σ-algebra generated by the rectangles of the form

$$(s,t]\times A; \qquad 0 \le s \le t,\ A \in \mathcal{F}_s.$$

A real-valued process $(X_t)_{t\ge0}$ such that $X_0$ is $\mathcal{F}_0$-measurable, and the mapping $(t,\omega)\to X_t(\omega)$ is $\mathcal{P}(\mathcal{F}_t)$-measurable, is said to be $(\mathcal{F}_t)$-predictable.

A useful class of predictable processes is that of left-continuous processes, as shown by the following theorem:

Theorem B.1.1. An ℝⁿ-valued process (X_t)_{t≥0} adapted to (F_t)_{t≥0} and left continuous is (F_t)-predictable.

Proof. It suffices to prove the theorem for an ℝ-valued process; for the case of ℝⁿ the proof is carried out component-wise.

Since X_t(ω) is left continuous, for all (t, ω) ∈ [0, +∞) × Ω we have

X_t(ω) = lim_{n→∞} [ Σ_{q=0}^{n2^n − 1} X_{q/2^n}(ω) 1_{q/2^n < t ≤ (q+1)/2^n} + X_n(ω) 1_{t > n} ].

For any n ≥ 1 and 0 ≤ q ≤ n2^n − 1, X_n and X_{q/2^n} are F_n- and F_{q/2^n}-measurable, respectively. Then X_{q/2^n}(ω) 1_{q/2^n < t ≤ (q+1)/2^n} and X_n(ω) 1_{t > n} are P(F_{q/2^n})- and P(F_n)-measurable. Summing over q and passing to the limit in n, we conclude that X_t(ω) is P(F_t)-measurable for all t ≥ 0.

In our context all the predictable processes of use will be left continuous, and it is therefore common to take the class of adapted, left-continuous processes as a definition of predictable processes (cf. [21] and [14]).

Theorem B.1.2 (Corollary 3.2.6, [43]). Let f_t(ω) : [0, ∞) × Ω → ℝⁿ be a function such that, for all t ≥ 0,

(i) (t, ω) → f_t(ω) is B × F-measurable, where B denotes the Borel σ-algebra on [0, ∞);

(ii) f_t(ω) is F_t-adapted;


(iii) E[∫_0^t f_s(ω) · f_s(ω) ds] < ∞.

Then the integral, in the Itô sense,

∫_0^t f_s dW_s, t ≥ 0,

is an (F_t)-martingale.

Definition B.6 (Local martingale). Let (X_t)_{t≥0} be a càdlàg, adapted process. (X_t)_{t≥0} is a local martingale if there exists an increasing sequence of stopping times (T_n)_{n∈ℕ} with lim_{n→∞} T_n = ∞ a.s., such that

(X_{t∧T_n} 1_{T_n > 0})_{t≥0}

is a martingale.

Theorem B.1.3. Let (X_t)_{t≥0} be a local martingale. If E[sup_{s≤t} |X_s|] < ∞ for all t, then (X_t)_{t≥0} is a martingale.

Proof. Let (T_n)_{n∈ℕ} be a localizing sequence of stopping times for (X_t)_{t≥0}. Then E[X_{t∧T_n} | F_s] = X_{s∧T_n}. Letting n tend to infinity, the dominated convergence theorem yields E[X_t | F_s] = X_s.

Definition B.7 (Càdlàg processes). A process (X_t)_{t≥0} is said to be càdlàg if

• lim_{s↓t} X_s = X_t a.s. for all t ≥ 0, i.e. it is right continuous;

• lim_{s↑t} X_s exists a.s. for all t ≥ 0, i.e. it has left limits.

Definition B.8 (Modification). Two processes (X_t)_{t≥0} and (Y_t)_{t≥0} are modifications of each other if X_t = Y_t a.s. for each t ≥ 0.

Theorem B.1.4 (Corollary 1, p. 8, [46]). If X = (X_t)_{0≤t<∞} is a martingale, then there exists a unique modification Y of (X_t)_{t≥0} such that Y is càdlàg.

B.2 Lévy Processes

Here we present the definition of and some results about Lévy processes; we refer to [14] for all proofs and details, located mainly in Chapters 3 and 4. We denote a set A equipped with a σ-algebra 𝒜 by (A, 𝒜), and the usual Borel σ-algebra by B.

Definition B.9 (Radon measure). Let E ⊂ Rd. A Radon measure on (E,B) is a measure

µ such that for every compact measurable set B ∈ B, µ(B) <∞.

Definition B.10 (Lévy process). A càdlàg stochastic process (X_t)_{t≥0} on (Ω, F, P) with values in ℝ^d such that X_0 = 0 is called a Lévy process if:

1. Independent increments: for every increasing sequence of times t_0, ..., t_n, the random variables X_{t_0}, X_{t_1} − X_{t_0}, ..., X_{t_n} − X_{t_{n−1}} are independent.

2. Stationary increments: X_{t+h} − X_t has the same distribution as X_{s+h} − X_s, for fixed h and all s, t ≥ 0.

3. Stochastic continuity: lim_{h→0} P[|X_{t+h} − X_t| ≥ ε] = 0 for all ε > 0.

Definition B.11. Let (Ω, F, P) be a probability space, E ⊂ ℝ^d and μ a given (positive) Radon measure on (E, E). A Poisson random measure on E with intensity measure μ is an integer-valued random measure M(ω, A), ω ∈ Ω, A ∈ E, such that:

1. For almost all ω ∈ Ω, M(ω, ·) is an integer-valued Radon measure on E: for any bounded measurable A ⊂ E, M(A) < ∞ is an integer-valued random variable.

2. For each measurable set A ⊂ E, M(·, A) = M(A) is a Poisson random variable with parameter μ(A):

P[M(A) = k] = e^{−μ(A)} μ(A)^k / k!, k ∈ ℕ.

3. For disjoint measurable sets A_1, ..., A_n, the variables M(A_1), ..., M(A_n) are independent.

Definition B.12 (Jump measure). Let (X_t)_{t≥0} be a càdlàg process on ℝ^d. The measure J_X on [0, ∞) × ℝ^d is defined by

J_X(B) = |{t : (t, X_t − X_{t−}) ∈ B}|

for any measurable set B ⊂ [0, ∞) × ℝ^d. We denote by | · | the cardinality of a set.

Definition B.13 (Lévy measure). Let (X_t)_{t≥0} be a Lévy process on ℝ^d. The measure ν on ℝ^d defined by

ν(A) = E[ |{t ∈ [0, 1] : ΔX_t ≠ 0, ΔX_t ∈ A}| ], A ∈ B(ℝ^d),

is called the Lévy measure of (X_t)_{t≥0}: ν(A) is the expected number, per unit time, of jumps whose size belongs to A.

Proposition B.2.1 (Lévy–Itô decomposition). Let (X_t)_{t≥0} be a Lévy process on ℝ^d and ν its Lévy measure.

• ν is a Radon measure on ℝ^d \ {0} and satisfies

∫_{|x|≤1} |x|² ν(dx) < ∞, ∫_{|x|≥1} ν(dx) < ∞.

• The jump measure of (X_t)_{t≥0}, denoted by J_X, is a Poisson random measure on [0, ∞) × ℝ^d with intensity measure ν(dx)dt.

• There exist a vector γ and a Brownian motion (W_t)_{t≥0} with covariance matrix A such that

X_t = γt + W_t + X_t^l + lim_{ε↓0} X_t^ε,
X_t^l = ∫_{|x|≥1, s∈[0,t]} x J_X(ds × dx),
X_t^ε = ∫_{ε≤|x|≤1, s∈[0,t]} x (J_X(ds × dx) − ν(dx)ds). (B.2.1)

The terms in (B.2.1) are independent, and the convergence in the last term is a.s. and uniform in t ∈ [0, T]. The triplet (A, γ, ν) is called the characteristic triplet of (X_t)_{t≥0}.

The first result tells us that ν is a Radon measure over ℝ^d \ {0}, but nothing prevents ν(ℝ^d) from being infinite. Since ν can diverge at the origin, we write lim_{ε↓0} X_t^ε. In our context we are interested in finite activity processes (i.e. ν(ℝ^d) < ∞), so we have no problem setting directly ε = 0.

B.2.1 Compound Poisson process

Definition B.14 (Compound Poisson process). A compound Poisson process (or pure jump process) with intensity λ > 0 and jump size distribution f is a stochastic process (X_t)_{t≥0} defined as

X_t = Σ_{i=1}^{N_t} Y_i,

where the jump sizes Y_i are i.i.d. with distribution f and (N_t)_{t≥0} is a Poisson process with intensity λ, independent of (Y_i)_{i≥1}.
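To make the definition concrete, here is a minimal C++ sketch (our own illustration, independent of the thesis code in Appendix D; the Gaussian choice for the jump size distribution f and all parameter values are assumptions made only for this example). It simulates X_T directly from the definition, drawing N_T ~ Poisson(λT) and then summing N_T i.i.d. jumps:

    #include <iostream>
    #include <random>

    // Simulate X_T = sum_{i=1}^{N_T} Y_i for a compound Poisson process
    // with intensity lambda and (assumed) N(mu, sigma^2) jump sizes.
    double simulateCompoundPoisson(double T, double lambda,
                                   double mu, double sigma,
                                   std::mt19937& gen) {
        std::poisson_distribution<int> N(lambda * T);   // N_T ~ Poisson(lambda*T)
        std::normal_distribution<double> Y(mu, sigma);  // i.i.d. jump sizes
        double X = 0.0;
        int n = N(gen);
        for (int i = 0; i < n; ++i) X += Y(gen);        // sum the N_T jumps
        return X;
    }

    int main() {
        std::mt19937 gen(42);
        // Monte Carlo sanity check: E[X_T] = lambda * T * E[Y_1]
        double sum = 0.0;
        const int paths = 100000;
        for (int i = 0; i < paths; ++i)
            sum += simulateCompoundPoisson(1.0, 2.0, 0.5, 0.1, gen);
        std::cout << "sample mean: " << sum / paths
                  << " (theory: " << 2.0 * 1.0 * 0.5 << ")\n";
    }

The sample mean should be close to λT·E[Y_1] = 1.0, in agreement with the compensator of the process.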

The compound Poisson process is, of course, a Lévy process. More can be said:

Proposition B.2.2. (X_t)_{t≥0} is a compound Poisson process if and only if it is a Lévy process and its sample paths are piecewise constant functions.

This result, combined with the following proposition,


Proposition B.2.3 (Jump measure of a compound Poisson process). Let (X_t)_{t≥0} be a compound Poisson process with intensity λ and jump size distribution f. Its jump measure J_X is a Poisson random measure on ℝ^d × [0, ∞) with intensity measure μ(dx × dt) = ν(dx)dt = λf(dx)dt.

allows us to recognize that the characteristic triplet of a compound Poisson process is (0, 0, λf).

B.3 Infinitesimal generator of a Markov Process

We present here some results about infinitesimal generators, without claiming to be exhaustive. We refer to [25] for further details.

Definition B.15 (Markov process). An ℝⁿ-valued process (X_t)_{t≥0} is called a Markov process if

P[X_t ∈ B | σ(X_u : 0 ≤ u ≤ s)] = P[X_t ∈ B | σ(X_s)]

for all 0 ≤ s ≤ t and for all B ∈ B.

The definition of a Lévy process, namely the independent increments hypothesis, classifies it as a Markov process. A stronger property holds:

Theorem B.3.1 (Strong Markov property). Let (X_t)_{t≥0} be a Lévy process. If T is a nonanticipating random time, then the process Y_t = X_{t+T} − X_T, t ≥ 0, is again a Lévy process, independent of F_T and with the same law as (X_t)_{t≥0}.

Theorem B.3.2. Let (X_t)_{t≥0} be a Markov process, let f : ℝⁿ → ℝ, and let C_0 be the set of continuous functions vanishing at infinity. Its transition operator, defined as

P_t f(x) = E[f(x + X_t)],

is a semigroup, i.e.

P_t P_s = P_{t+s},

and, if P_t f ∈ C_0,

lim_{t↓0} P_t f(x) = f(x), ∀x ∈ ℝⁿ (Feller property),

where the convergence is with respect to the sup norm on C_0.

Definition B.16. Let f ∈ C_0 and let (X_t)_{t≥0} be a Markov process. Then its infinitesimal generator is defined as

D f = lim_{t↓0} (P_t f − f) / t.

Again, the limit is taken with respect to the sup norm on C_0.
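As a standard example (our own illustration): for a one-dimensional standard Brownian motion we have P_t f(x) = E[f(x + W_t)], and a second-order Taylor expansion together with E[W_t] = 0 and E[W_t²] = t gives

D f(x) = lim_{t↓0} (E[f(x + W_t)] − f(x)) / t = ½ f''(x), f ∈ C²_0(ℝ),

consistent with Proposition B.3.3 below for the characteristic triplet (A, γ, ν) = (1, 0, 0).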


The following result allows us to link the infinitesimal generator of a Lévy process to its characteristic triplet:

Proposition B.3.3. Let (X_t)_{t≥0} be a Lévy process on ℝ^d with characteristic triplet (A, γ, ν). Then the infinitesimal generator of (X_t)_{t≥0} is defined for any f ∈ C²_0(ℝ^d) as

D f(x) = ∇_x f(x) · γ + ½ ∇²_x f(x) : A + ∫_{ℝ^d} [f(x + z) − f(x) − ∇_x f(x) · z 1_{|z|≤1}] ν(dz), (B.3.2)

where C²_0(ℝ^d) is the set of twice continuously differentiable functions vanishing at infinity.

Since we are dealing with finite activity processes, we do not need the jump truncation 1_{|z|≤1} and we can neglect the last term in the integral. In a Lévy process the characteristics are independent of t and ω, but the results can be extended to a more general class of processes, taking Lévy processes as building blocks (cf. [36], [35] and references therein). Roughly speaking, it is possible to think of a more general process with characteristics (b, c, F) which, locally after t, resembles a Lévy process with triplet (b, c, F)(ω, t). The generator of such a process has the same form, with parameters depending on ω and t.
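For instance (our own illustration), for a finite activity process with triplet (0, γ, λF), where F is the jump size distribution, the compensator term ∫_{|z|≤1} z ν(dz) is finite and can be absorbed into a modified drift γ̃, so that (B.3.2) reduces to

D f(x) = γ̃ · ∇_x f(x) + λ ∫_{ℝ^d} [f(x + z) − f(x)] F(dz),

which is the form of the generator for pure jump components with drift.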

Theorem B.3.4 (Itô formula for jump-diffusion processes). Let (X_t)_{t≥0} be a diffusion process with jumps, defined in integral notation by

X_t = X_0 + ∫_0^t μ_s ds + ∫_0^t σ_s dW_s + Σ_{i=1}^{N_t} ΔX_i,

where μ_t and σ_t are continuous adapted processes with

E[∫_0^T σ_s² ds] < ∞.

Then, for any f : [0, T] × ℝ → ℝ which is once continuously differentiable in the first variable and twice in the second one, the process Y_t = f(t, X_t), t ≥ 0, can be represented as

f(t, X_t) = f(0, X_0) + ∫_0^t [∂f/∂s (s, X_s) + μ_s ∂f/∂x (s, X_s)] ds + ½ ∫_0^t σ_s² ∂²f/∂x² (s, X_s) ds
+ ∫_0^t ∂f/∂x (s, X_s) σ_s dW_s + Σ_{i≥1, T_i≤t} [f(T_i, X_{T_i−} + ΔX_{T_i}) − f(T_i, X_{T_i−})].

Of course this result also holds in the n-dimensional case, replacing derivatives and products with the appropriate gradients and inner products.
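To illustrate the dynamics appearing in Theorem B.3.4, the following C++ sketch (our own illustration, not the thesis code; the Euler discretization, the constant coefficients and the Gaussian jump sizes are assumptions made only for this example) simulates one path of a jump-diffusion on a uniform grid:

    #include <cmath>
    #include <iostream>
    #include <random>
    #include <vector>

    // Euler scheme for dX = mu dt + sigma dW + jumps, where jumps arrive
    // at rate lambda and have (assumed) N(0, delta^2) sizes.
    std::vector<double> simulatePath(double X0, double mu, double sigma,
                                     double lambda, double delta,
                                     double T, int steps, std::mt19937& gen) {
        std::normal_distribution<double> Z(0.0, 1.0);
        std::normal_distribution<double> J(0.0, delta);
        std::uniform_real_distribution<double> U(0.0, 1.0);
        const double dt = T / steps;
        std::vector<double> X(steps + 1);
        X[0] = X0;
        for (int k = 0; k < steps; ++k) {
            double dX = mu * dt + sigma * std::sqrt(dt) * Z(gen); // diffusion part
            if (U(gen) < lambda * dt)   // P[one jump in (t, t+dt]] ~ lambda*dt
                dX += J(gen);           // jump part Delta X
            X[k + 1] = X[k] + dX;
        }
        return X;
    }

    int main() {
        std::mt19937 gen(7);
        std::vector<double> path = simulatePath(0.0, 0.1, 0.2, 1.0, 0.3,
                                                1.0, 1000, gen);
        std::cout << "X_T = " << path.back() << "\n";
    }

The Bernoulli approximation of the jump arrival is adequate for small dt, since the probability of two jumps in one step is of order (λ dt)².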

Appendix C

Risk-neutral Valuation

For an intuitive introduction to the concept of risk neutrality we refer to [34] and [48]; for the theory of continuous-time finance we refer to [5]. In the following we present the main results for the pricing of financial derivatives.

C.1 The market

We assume a continuous-time security market where investors are allowed to trade continuously up to some fixed finite planning horizon T. Uncertainty in the financial market is modelled by a probability space (Ω, F, P), equipped with a filtration (F_t)_{t≥0} which satisfies the usual conditions. Let (r_t)_{t≥0} be a short rate process, and define B(t) as the riskless money market account, where we assume B(0) = 1 and

B(t) = e^{∫_0^t r_s ds}.

We take B(t) as our numeraire, a French term indicating an item or commodity acting as a measure of value or as a standard for currency exchange. A mathematical definition is:

Definition C.1 (Numeraire). A numeraire is a price process X = (X_t)_{t≥0} that is strictly positive a.s. for each t ∈ [0, T].
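For example (our own illustration), with a constant short rate r_t ≡ r the money market account is B(t) = e^{rt}, and the discount factor appearing in the valuation formula below is B(t)/B(T) = e^{−r(T−t)}.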

Another basic concept is that of an equivalent martingale measure:

Definition C.2 (Equivalent martingale measure). A probability measure Q defined on (Ω, F) is an equivalent martingale measure if:

(i) Q is equivalent to P, i.e. Q[A] = 0 if and only if P[A] = 0, for all A ∈ F.

(ii) For every dividend-free traded asset with price process P(t), the discounted price process P(t)/B(t) is a martingale under Q.


The market is said to be arbitrage free if there are no arbitrage opportunities. By arbitrage we mean an investment requiring no initial contribution that produces, with probability different from zero, a positive payoff.

It can be shown that the existence of an equivalent martingale measure implies that the market is arbitrage free. This strengthens the intuition that a martingale is a "fair" game:

Theorem C.1.1 (Arbitrage-free market, [5], Theorem 6.1.1). If an equivalent martingale measure exists, then the market model contains no arbitrage opportunities.

C.2 Risk-neutral Valuation

If we want to determine the fair price of a financial instrument, we must ensure that the discounted price process of the asset is a martingale (recall that a martingale can be seen as a fair game). This approach is known as risk-neutral pricing.

Theorem C.2.1 (Risk-neutral valuation formula, [49], Theorem 4.7). Let X be an F_T-measurable random variable that is bounded from below, and let Q be a martingale measure on the market for the underlying assets. An arbitrage-free price of a contingent claim paying X at T > t is

S_t = E^Q[ (B(t)/B(T)) X | F_t ],

where the expectation is taken under the pricing measure Q.

Under sufficient technical conditions (market completeness) this price can be taken to be unique.
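As a minimal Monte Carlo sketch of Theorem C.2.1 (our own illustration; the geometric Brownian motion dynamics under Q, the call payoff, and all parameter values are assumptions made only for this example, not models treated in this thesis), one can estimate S_0 = E^Q[e^{−rT} max(S_T − K, 0)] by simulating the terminal asset value under the risk-neutral drift r:

    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <random>

    int main() {
        const double S0 = 100.0, K = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
        const int paths = 200000;
        std::mt19937 gen(1);
        std::normal_distribution<double> Z(0.0, 1.0);
        double sum = 0.0;
        for (int i = 0; i < paths; ++i) {
            // Under Q the drift of the asset is the short rate r.
            double ST = S0 * std::exp((r - 0.5 * sigma * sigma) * T
                                      + sigma * std::sqrt(T) * Z(gen));
            // Discounted payoff B(0)/B(T) * X = e^{-rT} max(S_T - K, 0).
            sum += std::exp(-r * T) * std::max(ST - K, 0.0);
        }
        std::cout << "MC price: " << sum / paths << "\n"; // ~10.45 (Black-Scholes)
    }

The sample average converges to the risk-neutral expectation, i.e. to the arbitrage-free price of the claim.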

Appendix D

Numerical Codes

Here we report, thoroughly commented, all the C++ code written for this thesis. Why C++? We have chosen C++ because:

• it is easy to write, yet powerful, flexible and fast;

• there is a lot of source code available in C++, and a large literature about numerical computing written in C++, such as [45];

• it is widely employed in the financial sector by practitioners; e.g. the MoSes™ financial modeling software is basically a platform written in C++.

The codes are based on the algorithms explained in Chapter 4, and the only dependency is the library Template Numerical Toolkit for C++ (TNT)¹.

This is a linear algebra library developed by the National Institute of Standards and Technology (NIST), which allows a simple usage of linear algebra data structures. The library was adopted to avoid the "reinventing the squared wheel" issue: if something is already well implemented, why not use it?
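As a minimal usage sketch (our own illustration; the header name tnt.h and the Array1D/Array2D interface reflect the NIST distribution, but should be treated as assumptions about the installed version):

    #include <iostream>
    #include "tnt.h"

    int main() {
        TNT::Array2D<double> A(2, 2, 0.0);   // 2x2 matrix, zero-filled
        A[0][0] = 2.0; A[0][1] = 1.0; A[1][1] = 3.0;
        TNT::Array1D<double> x(2, 1.0);      // vector (1, 1)
        TNT::Array1D<double> y(2, 0.0);      // y = A x, computed by hand
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                y[i] += A[i][j] * x[j];
        std::cout << "y = (" << y[0] << ", " << y[1] << ")\n";
    }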

There were some problems with the library: some bugs had to be fixed, and we added some operations between tensors that the author of the library had not yet implemented.

Last but not least, we decided to adopt a structured style of programming (i.e. without objects), since we felt that the code was not complex enough to warrant an object-oriented design.

All the codes are on the enclosed CD-ROM.

¹Freely available at http://math.nist.gov/tnt/


Acknowledgements

Here comes the hardest part, the acknowledgements! I entrust myself to my mother tongue, because only with it can I fully express what I think and feel.

First of all I want to thank my family: Fulvio, Gina and Mario. Without them I would not be where I am now, and Munich, which for me used to be just the folkloristic city of Brez'n and chubby drunken Germans, would never have become first a dream, and then a reality of life and work. Thanks again for everything.

Now on to my friends.

Luca "Cube" Burini: director of the London branch of IMS-Electronics, avid kitesurfer, purger of chicks, French and otherwise; may your kidneys forgive you for the protein abuse. And remember: he who wounds by turkey...

Claudio "Skw" Palandra: originally the businessman behind the IMS-Electronics business plan, today a consultant at KPMG Italia. Whether you steal money from little old ladies in Verona or lick stamps in Rome, you will always be the biggest con artist. Oh dear, might I have violated some privacy regulation?!?!?! I am still waiting for a true "Last Kebab Standing"...

Michele "Dottor Morte\Migga da Nigga" Schirru: a person of noble heart and pure feelings (although advanced studies have detected none), he decided to devote 12 years of his life in order to devote what remains of it to others. At least until his horde of macrocephalic, orchitis-carrying killer zombies invades the earth... A friend since forever, I am sure you will do great things! And that I will never be your patient...

Sirio "I would be the maddest if..." Valent: the (pen-)armed wing of a deviant lodge of Confindustria, he tries to convince us that the ultimate goal of Italian scientific research is the cloning of our "dear leader". And being so absurd, it is surely true and nobody believes it! May you realize a small part of your dreams, or at least not die crushed under their weight.

Marco "Pikachu" Petralia: so many things in a single name: decadent aesthete, approximate physician, amateur of the bizarre, prankster Viking, lover of animals (sic!) and of young maidens whose clothes are steeped in the scents of our joyful Latium countryside: caciotta cheese, hay, manure... Try to come of age in your brain before I retire!

Emanuele "Ghiottolino" Mattei: although the nickname might suggest a teddy bear, he is better! Less hair, he eats less without, however, compromising his softness to the touch, and if you press his tummy he even speaks Japanese! What more could you want from life? Courage, Lele, the turning point lies in wait for everyone! Just like a Senegalese armed with a club...

Matteo "CoffeeMan" Benvenuti: analyse carefully what is written in these pages, I want your professional opinion... Do not tell me things I already know, such as:

1. That I am crazy.

2. That I have an unhealthy relationship with sex.

3. That my mother did not breastfeed me.

Surprise me!

Daniele "Gufo" De Carolis: what can I say, if he did not exist he would have to be invented. Broken. With cheap materials. And the designers, hired by the Iranian government to create an Islamic supersoldier, would all be executed after the failure. I love you as if you were the brother I never wanted!

Dario "Were-doormat" Cardilli: by day a distinguished consultant for Accenture Technology Solutions, by night (and also on weekends and holidays) a pet and a piece of furniture of vague Ikea inspiration (model "Dorrmatta"). Finding the purpose of one's life in a person other than oneself is as clever as locking yourself in a lion's cage to protect yourself from thieves: the cure will end up hurting you more than the problem.

Ambra: the first person I have ever seen as a partner; thank you for giving me a dream, even though it died at the dawn of a new day. I hope you will know how to treat the next person at your side better.

I dedicate this thesis also to my friends who, by putting up with me for months, made Munich an experience full of emotions: Monica known as "Chicca", Elisa, Sabine, Fiorella, Alex known as "Sacco", Elisabetta known as "Fratello", Valeria, Stefano, Maria and many others.

If it had not been for you, Bavaria would not be emerald green, the Madonna would not shine golden over Marienplatz, and the Weissbier would not taste so good.

Last but not least, I thank God for making me an atheist², caffeine, Weissbier and heavy metal in all its forms.

Alza tu cerveza, brinda por la libertad [...] — Raise your beer, toast to freedom [...]
llegar a la meta no es vencer. — reaching the finish line is not winning.
Lo importante es el camino y en él, — What matters is the road, and on it,
caer, levantarse, insistir, aprender. — to fall, to get up, to persist, to learn.

La Posada De Los Muertos - Mägo de Oz

²I have been waiting for this moment for years!!!!

Bibliography

[1] P. Artzner and F. Delbaen. Default risk and incomplete insurance markets. Mathematical Finance, 5:187–195, 1995.

[2] D. Bates. Jumps and stochastic volatility: the exchange rate processes implicit in Deutsche Mark options. Review of Financial Studies, 9:69–107, 1996.

[3] D. Bates. Maximum likelihood estimation of latent affine processes. Review of Financial Studies, 19(3):909–965, 2006.

[4] Messod D. Beneish and Eric G. Press. Interrelation among events of default. Contemporary Accounting Research, 12:57–84, 1995.

[5] N. Bingham and R. Kiesel. Risk-Neutral Valuation: Pricing and Hedging of Financial Derivatives. Springer Verlag, 2004.

[6] F. Black and J. Cox. Valuing corporate securities: some effects of bond indenture provisions. Journal of Finance, 31:351–367, 1976.

[7] P. Brémaud. Point Processes and Queues: Martingale Dynamics. Springer Verlag, 1981.

[8] Wolfgang Bühler and Monika Trapp. Credit and liquidity risk in bond and CDS markets. Working Paper, University of Mannheim, 2005.

[9] J. C. Butcher. Numerical Methods for Ordinary Differential Equations. John Wiley and Sons, 2003.

[10] J. R. Cash and A. H. Karp. A variable order Runge–Kutta method for initial value problems with rapidly varying right-hand sides. ACM Transactions on Mathematical Software, 16:201–222, 1990.

[11] Li Chen and Damir Filipovic. Credit derivatives in an affine framework. Asia-Pacific Financial Markets, 13:123–140, 2007.

[12] Patrick Cheridito, Damir Filipovic, and Robert L. Kimmel. A note on the Dai–Singleton canonical representation of affine term structure models. Forthcoming in Mathematical Finance.

[13] K. Chung. Lectures from Markov Processes to Brownian Motion. Springer Verlag, 1982.

[14] Rama Cont and Peter Tankov. Financial Modelling with Jump Processes. Chapman & Hall - CRC Press, 2003.

[15] J. Cox, J. E. Ingersoll, and S. A. Ross. A theory of the term structure of interest rates. Econometrica, 53:385–402, 1985.

[16] Qiang Dai and Kenneth J. Singleton. Specification analysis of affine term structure models. The Journal of Finance, 55(5):1943–1978, 2000.

[17] C. Dellacherie and P.-A. Meyer. Probabilities and Potential. North Holland, 1978.

[18] J. R. Dormand and P. J. Prince. A family of embedded Runge–Kutta formulae. Journal of Computational and Applied Mathematics, 6(1):19–26, 1980.

[19] J.-C. Duan and J. G. Simonato. Estimating and testing exponential affine term structure models by Kalman filter. Review of Quantitative Finance and Accounting, 13(2):111–135, 1999.

[20] Darrell Duffie and Rui Kan. A yield-factor model for interest rates. Mathematical Finance, 6(4):379–406, 1996.

[21] Darrell Duffie. Credit risk modeling with affine processes. Technical report, Stanford University and Scuola Normale Superiore, Pisa, 2004.

[22] Darrell Duffie, Damir Filipovic, and Walter Schachermayer. Affine processes and applications in finance. Annals of Applied Probability, 13(3):984–1053, 2003.

[23] Darrell Duffie, Jun Pan, and Kenneth Singleton. Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6):1343–1376, 2000.

[24] Abel Elizalde. Credit risk models II: structural models. CEMFI Working Paper No. 0606, September 2003.

[25] S. Ethier and T. Kurtz. Markov Processes: Characterization and Convergence. John Wiley and Sons, 1986.

[26] Erwin Fehlberg. Low-order classical Runge–Kutta formulas with step size control and their application to some heat transfer problems. NASA Technical Report, 315, 1969.

[27] Damir Filipovic. Time-inhomogeneous affine processes. Stochastic Processes and Their Applications, 115:639–659, 2005.

[28] Damir Filipovic. Term-Structure Models: An Introduction. Springer Verlag, 2008.

[29] Gianni Gilardi. Analisi Due. McGraw-Hill, 1996.

[30] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.

[31] Jan Grandell. Doubly Stochastic Poisson Processes, volume 529 of Lecture Notes in Mathematics. Springer Verlag, 1976.

[32] A. Harvey, E. Ruiz, and N. Shephard. Multivariate stochastic variance models. Review of Economic Studies, 61(2):247–264, 1994.

[33] Stephanie Höfling. Credit risk modeling and valuation: the reduced form approach and copula models. Master's thesis, Technische Universität München, 2006.

[34] John C. Hull. Options, Futures and Other Derivatives, 6th edition. Prentice Hall, 2006.

[35] J. Jacod and A. N. Shiryaev. Limit Theorems for Stochastic Processes. Springer Verlag, 1987.

[36] Jan Kallsen. A didactic note on affine stochastic volatility models. In From Stochastic Calculus to Mathematical Finance, pages 343–368. Springer Verlag, 2006.

[37] A. N. Kolmogorov and S. V. Fomin. Measure, Lebesgue Integrals, and Hilbert Space. Academic Press, 1961.

[38] J. Lambert. Numerical Methods for Ordinary Differential Systems. John Wiley and Sons, 1991.

[39] R. Merton. On the pricing of corporate debt: the risk structure of interest rates. Journal of Finance, 29:449–470, 1974.

[40] H. Takahasi and M. Mori. Quadrature formulas obtained by variable transformation. Numerische Mathematik, 21:206–219, 1973.

[41] M. Mori. Quadrature formulas obtained by variable transformation and the DE rule. Journal of Computational and Applied Mathematics, 12-13:119–130, 1985.

[42] Masatake Mori. Discovery of the double exponential transformation and its developments. Publications of the Research Institute for Mathematical Sciences, 41:897–935, 2005.

[43] Bernt Øksendal. Stochastic Differential Equations. Springer Verlag, 1991.

[44] C. D. Pagani and S. Salsa. Analisi Matematica, Volume 2. Masson, 2004.

[45] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes: The Art of Scientific Computing, 3rd edition. Cambridge University Press, 2007.

[46] Philip E. Protter. Stochastic Integration and Differential Equations. Springer Verlag, 2005.

[47] Alfio Quarteroni, Riccardo Sacco, and Fausto Saleri. Numerical Mathematics. Springer Verlag, 2000.

[48] Walter Schachermayer. The notion of arbitrage and free lunch in mathematical finance. In Aspects of Mathematical Finance, pages 15–22. Springer Verlag, 2008.

[49] P. J. Schönbucher. Credit Derivatives Pricing Models. John Wiley and Sons, 2003.

[50] Kenneth J. Singleton. Estimation of affine asset pricing models using the empirical characteristic function. Journal of Econometrics, 102:111–141, 2001.

[51] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer Verlag, 1991.

[52] Morton Gurtin. An Introduction to Continuum Mechanics. Academic Press, 1982.

[53] Oldrich Vasicek. An equilibrium characterization of the term structure. Journal of Financial Economics, 5:177–188, November 1977.