
Stat Papers
DOI 10.1007/s00362-014-0606-6

REGULAR ARTICLE

Estimating doubly stochastic Poisson process with affine intensities by Kalman filter

Alan De Genaro · Adilson Simonis

Received: 17 December 2012 / Revised: 25 May 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract This paper proposes a Kalman filter formulation for parameter estimation of doubly stochastic Poisson processes (DSPP) with stochastic affine intensities. To achieve this aim, an analytical expression for the probability distribution functions of the corresponding DSPP for any intensity from the class of affine diffusions is obtained. More detailed results are provided for one- and two-factor Feller and Ornstein–Uhlenbeck diffusions. A Monte Carlo study indicates that the proposed method is a reliable procedure for moderate sample sizes. An empirical analysis of one- and two-factor Feller and Ornstein–Uhlenbeck models is carried out using high frequency transaction data.

Keywords Doubly stochastic Poisson process · Affine diffusion · Kalman filter · Order book

Mathematics Subject Classification 62M99 · 62P05

1 Introduction

Doubly stochastic Poisson processes (DSPPs) were introduced by Cox (1955) by allowing the intensity of the Poisson process to be described by a positive random process and not just a deterministic function. (In recognition of the value of the paper by Cox (1955), one also uses the term Cox processes.) The aim of such a construction was to allow the dynamics of a random mechanism that is exogenous to the model to influence transitions in the point process we are concerned with. Point processes have applications in several areas of research, among which we mention biostatistics, finance, insurance and reliability theory. In biostatistics they form a theoretical framework for analyzing recurrent events, as was done in Gail et al. (1980), in studies of the size of tumors in rats over a period of time. In finance, Lando (1998) was the pioneer in using point processes for describing occurrences of credit events. Point processes, in turn, are used in insurance for modeling claims arrivals, either assuming a constant intensity model as in Seal (1983) or a general stochastic intensity as outlined in Grandell (1991). A recent application of Cox processes in insurance can be found in Dassios and Jang (2012). Finally, in reliability theory, the work by Dalal and McIntosh (1994) has developed criteria for determining an optimal stopping time for testing and validating software.

An earlier version of this paper circulated under the title "Doubly stochastic Poisson processes with affine intensities". The results of this paper are part of the first author's Ph.D. thesis completed under the supervision of the second author.

A. De Genaro, Department of Economics, University of Sao Paulo and Securities, Commodities and Futures Exchange, BM&FBOVESPA, São Paulo, Brazil
e-mail: [email protected]

A. Simonis, Institute of Mathematics and Statistics, University of Sao Paulo, São Paulo, Brazil

For the interested reader, the books by Grandell (1976) and Snyder and Miller (1991) are classical references on DSPPs and provide in-depth coverage of their main properties in terms of standard constructions of probability theory. Alternatively, in the books of Brémaud (1972) and Daley and Vere-Jones (1988) the presentation is based on the concept and properties of martingales, while Kallenberg (1986) presents the material through the notion of random measures. However, all these sources focus on deriving general properties of DSPPs, without exploring the functional form of the intensity of the process. Consequently, they attracted a limited number of applications.

The study of DSPPs took a new turn once the functional form for the intensity of the process had been specified. As a result, it became possible to obtain analytical expressions for probability density functions for various types of processes. In this context we can cite Bouzas et al. (2002), who made use of a truncated normal distribution to describe the intensity of the process, and Bouzas et al. (2006), who generalized the form of the intensity to include the case of a harmonic oscillator. The contribution of these works was that closed-form expressions for the density functions of the DSPP were obtained, as well as their moments. However, both papers used constructions in which the non-negativity restriction on the intensity measure was not maintained. In order to get around this limitation a region in the parameter space was defined where the probability of occurrence of negative values for the intensity is reduced. With the intention of guaranteeing the preservation of the non-negativity condition, Basu and Dassios (2002) and Kozachenko and Pogorilyak (2008) suggested the adoption of a lognormal model for the intensity process. A dynamic formulation of the intensity was developed in Dassios and Jang (2003) and Dassios and Jang (2008), who used the functional form of a shot-noise process, thus guaranteeing non-negativity of the intensity. On the other hand, Wei et al. (2002) assumed that the intensity is governed by a one-dimensional Feller process and obtained a form of the probability density function for the corresponding DSPP.

The use of point processes in finance, especially of DSPPs, progressed considerably at the end of the 1990s, with the development of models for managing and pricing credit risk. In particular, Duffie and Singleton (1999) and Duffie et al. (2003) formalized the construction of the probability density of the first jump in the process. More precisely, the time to the first jump represents, in the case of credit risk, the time until the bankruptcy (default) of a company (and/or a country). Here, once the absorbing state has been reached, it is unnecessary to study the further dynamics of the DSPP.

Fig. 1 Average number of sell orders per minute during October 2009, BRL/USD Futures contract at BM&FBOVESPA, Brazil (ticker DOL FUT, expiry NOV09)

Recently a new area of applications of point processes in finance has emerged, along with the use of these models for describing the arrival process of bid and ask orders in an electronic trading environment. In these models, the arrival process of orders changes over time; the idea is to characterize the dynamics of this process and obtain expressions that may be treated analytically, describing the probability that an order sent under a given market configuration is executed before the price changes (Cont et al. 2010).

Perhaps the best-known point process is the homogeneous Poisson process. For this process the arrival rate is constant. However, in many applications the assumption of a constant arrival rate is not realistic. Indeed, in financial data we tend to observe bursts of trading activity followed by lulls. This feature becomes apparent when looking at the series of actual order arrivals. Figure 1 presents the average number of sell orders for BRL/USD FX futures submitted to the Brazilian Exchange, BVMF. The plot indicates a non-constant behavior for the average number of submitted orders; the homogeneous Poisson model is clearly not suitable for such data.

In fact, the number of bids and asks getting into the order book may depend upon a number of factors external to the model. For example, it may be the level of investors' risk aversion on a given day, intra-day seasonality, the disclosure of a piece of information, or a new technology producing an impact on a certain sector of the economy. For this reason, in our opinion, the construction of a model that can be treated analytically and incorporates endogenously a stochastic behavior of the intensity of a point process represents a contribution to the literature.

In this situation it seems reasonable to focus on a particular class of models and attempt to use the approach based on point processes in order to study their dynamics over time. For this purpose we selected the affine diffusions, as formalized by Duffie and Kan (1996). Affine processes can be viewed as multi-dimensional generalizations of Ornstein–Uhlenbeck (O–U) processes and continuous-branching processes, such as the Feller (1951) diffusion model of population-size fluctuations. In the 1970s and 1980s, Gaussian O–U processes and Feller diffusions were popular foundations for models of the term structure of interest rates due to Vasicek (1977) and Cox et al. (1985).

In light of the above, the main aim of this paper is to propose a new technique for estimating the parameters of a DSPP with stochastic affine intensity. Our approach is based on the Kalman filter methodology and exploits a relationship between the DSPP and the state variables for the purpose of the subsequent estimation of the collection of parameters of an affine diffusion. The virtue of this approach lies in using the Kalman filter to identify the underlying state variables which control the DSPP dynamics. Once the unobserved component has been filtered, the model's parameters can be estimated by maximum likelihood. To achieve our main aim, we first need to find the expression of the probability distribution function (pdf) for the DSPP when the intensity is given by an arbitrary affine diffusion. In this way, the models from Wei et al. (2002), Basu and Dassios (2002) and Kozachenko and Pogorilyak (2008) can be seen as particular cases of the model proposed in this paper. Finally, we apply our methodology to estimate the parameters of one- and two-factor CIR and O–U processes using high frequency transactional data from FX futures contracts traded at BVMF, Brazil.

The structure of the remainder of the paper is as follows. In Sect. 2 we outline the basic structure required to properly work with DSPPs and affine diffusions. In Sect. 3 we present a closed-form expression for the probability density function of a DSPP with general affine intensity. Inference for DSPPs with affine intensities by means of a maximum likelihood estimator combined with the Kalman filter is provided in Sect. 4. In Sect. 5 we present estimation results for DSPP models with affine intensity applied to high-frequency transactional data. Section 6 presents the results of simulation experiments illustrating the performance of the proposed estimator. Finally, Sect. 7 presents the final remarks. Various technical details are included in "Appendices 1–3".

2 The basic structure

For a general definition of a point process on $\mathbb{R}_+$ the reader is referred to Brémaud (1972), Grandell (1976), Snyder and Miller (1991), Kallenberg (1986) and Daley and Vere-Jones (1988). The concept of the DSPP is introduced in the following definition:



Definition 1 Let λ(t), t ≥ 0, be a random process with non-negative values and set

$$\Lambda_{t,t'} = \int_{t}^{t'} \lambda(u)\,du, \quad \text{for } 0 \le t < t'.$$

The doubly stochastic Poisson process (DSPP) with intensity process λ(t) is a point process $N_t$, t ≥ 0, satisfying the following property. For every T > 0, points $0 = t_0 < t_1 < \cdots < t_l = T$ and values $k_1, \ldots, k_l$ in {0, 1, . . .}:

$$P\left(N_{t_j} - N_{t_{j-1}} = k_j,\ 1 \le j \le l\right) = E\left[e^{-\Lambda_{0,T}} \prod_{j=1}^{l} \frac{1}{k_j!}\,\Lambda_{t_{j-1},t_j}^{k_j}\right] \tag{1}$$

An important example of an intensity process emerges in relation to Markov processes on $\mathbb{R}_+ = [0, \infty)$; in this case an important task is to connect probabilities (1) with local parameters of the underlying Markov process. For example, if λ(t) is related to a diffusion then it would be interesting to either obtain a closed form for probabilities (1) or provide realistic algorithms for their numerical simulation, in terms of the drift and diffusion coefficient of the underlying process. In this paper we consider a particular form of the intensity process.
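To make Definition 1 concrete, the following is a minimal simulation sketch. It assumes a one-factor Feller intensity, an Euler discretisation truncated at zero, and hypothetical parameter values; none of these choices is taken from the paper, and all function names are illustrative. Conditional on the simulated intensity path, the count over the horizon is drawn from a Poisson distribution whose mean is the cumulative intensity.

```python
import math
import random


def simulate_feller_path(x0, kappa, theta, sigma, horizon, n_steps, rng):
    """Euler scheme, truncated at zero, for dX = kappa*(theta - X) dt + sigma*sqrt(X) dW."""
    dt = horizon / n_steps
    path = [x0]
    x = x0
    for _ in range(n_steps):
        shock = sigma * math.sqrt(x * dt) * rng.gauss(0.0, 1.0)
        x = max(x + kappa * (theta - x) * dt + shock, 0.0)
        path.append(x)
    return path, dt


def integrated_intensity(path, dt):
    """Trapezoidal approximation of the cumulative intensity over the path."""
    return dt * (sum(path) - 0.5 * (path[0] + path[-1]))


def poisson_draw(mean, rng):
    """Poisson variate by inversion of the cdf (adequate for small means)."""
    u = rng.random()
    k = 0
    p = math.exp(-mean)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= mean / k
        cdf += p
    return k


def simulate_dspp_count(x0, kappa, theta, sigma, horizon, n_steps, rng):
    """One draw of N_T - N_0: simulate the intensity, then a conditional Poisson count."""
    path, dt = simulate_feller_path(x0, kappa, theta, sigma, horizon, n_steps, rng)
    return poisson_draw(integrated_intensity(path, dt), rng)


# Hypothetical parameters: start at the long-run level theta = 2 over a unit horizon.
rng = random.Random(0)
counts = [simulate_dspp_count(2.0, 1.5, 2.0, 0.5, 1.0, 100, rng) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
```

Because the expected cumulative intensity equals θ times the horizon when the state starts at θ, the sample mean of the simulated counts should be close to 2 in this configuration, up to Monte Carlo and discretisation error.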

Definition 2 In this paper, an intensity process λ(t), t ≥ 0, is defined by:

$$\lambda(t) = \lambda(X(t)) = \rho_0 + \sum_{i=1}^{d} \rho_i X_i(t). \tag{2}$$

Here $\rho_0 \in \mathbb{R}$ and $\rho = (\rho_1, \ldots, \rho_d) \in \mathbb{R}^d$ are constants (scalar and vector, respectively), and $X(t) = (X_1(t), \ldots, X_d(t))$, t ≥ 0, is an underlying random process (called a state process) which is supposed to be a d-dimensional diffusion. Of course, we want to maintain the property that λ(t) ≥ 0 for every t ≥ 0. The integral

$$\Lambda_{0,t} = \int_{0}^{t} \lambda(u)\,du$$

determines the process of the cumulative intensity, or the hazard process, for the intensity process λ(t).

The hazard process plays an important role in the martingale approach to both credit risk and asset pricing in finance because it is the compensator of the associated doubly stochastic Poisson process (Bielecki and Rutkowski 2002).

We now proceed with specifying the class of state processes X(t) under consideration.


Definition 3 A state process X(t) satisfying a stochastic differential equation of the form

$$dX(t) = \mu(X(t))\,dt + \Sigma(X(t))\,dW(t), \quad t > 0, \tag{3}$$

is an affine diffusion if μ and Σ satisfy:

1. Affine drift

$$\mu(X(t)) = K_0 + K_1 X(t), \quad X(t) = (X_1(t), \ldots, X_d(t)) \in \mathbb{R}^d \tag{4}$$

for some $K_0 \in \mathbb{R}^d$ and $K_1 \in \mathbb{R}^{d \times d}$.

2. Diffusion coefficient

$$\Sigma(X(t)) = A\sqrt{V(X(t))}, \quad X(t) = (X_1(t), \ldots, X_d(t)) \in \mathbb{R}^d \tag{5}$$

where A and V(X(t)) are d × d matrices. Moreover, A is a constant matrix and V(X(t)) is a diagonal matrix-function, with the i-th diagonal element of the form $v_i = v_i(X(t))$. Here a (scalar) function $v_i$ is given by:

$$v_i = a_i + b_i \cdot x, \quad x \in \mathbb{R}^d,\ i = 1, \ldots, d, \tag{6}$$

for some $a_i \in \mathbb{R}$ and $b_i \in \mathbb{R}^d$.

According to Duffie and Kan (1996) the coefficients $v_1, \ldots, v_d$ in (6) generate stochastic volatility. The open domain $D \subset \mathbb{R}^d$ corresponding to nonnegative values for the $v_i$'s is defined as follows:

$$D = \left\{x \in \mathbb{R}^d : a_i + b_i \cdot x \ge 0,\ 1 \le i \le d\right\}. \tag{7}$$

It was proved in Duffie and Kan (1996) that if, under Condition A below, the initial point X(0) ∈ D then the solution to the affine diffusion exists and is unique, for every t > 0; moreover, the process X(t) stays in D.

Condition A (Duffie and Kan 1996)

1. For every i and $x \in \mathbb{R}^d$ such that $v_i(x) = 0$, the inequality

$$b_i^{T}\left(K_0 + K_1 x\right) > \frac{1}{2}\, b_i A A^{T} b_i^{T}$$

holds true.

2. For every i, j = 1, . . . , d, if $(b_i A)_j \ne 0$ then

$$v_i \equiv k\, v_j$$

for some constant k > 0.



An additional set of constraints must be imposed to guarantee the non-negativity condition for λ(t), t ≥ 0. The open domain C indicating positive intensities is defined by:

$$C = \left\{x \in \mathbb{R}^d : \lambda(x) > 0\right\}. \tag{8}$$

In view of the above result, a natural way to guarantee that the condition λ(t) ≥ 0 is fulfilled is to tie it with (7), i.e. to require that C ⊆ D. In other words, if we choose positive $\alpha_1, \ldots, \alpha_d$ for (2) and set

$$\rho_0 = \sum_{j=1}^{d} \alpha_j a_j, \quad \rho_i = \sum_{j=1}^{d} \alpha_j b_j^{(i)} \tag{9}$$

then (8) will be satisfied, since the intensity will be given by

$$\lambda(t) = \sum_{i=1}^{d} \alpha_i v_i(X(t)) \tag{10}$$

with $v_i(X(t)) \ge 0$. There are also other means to secure the property λ(t) > 0.

The Laplace transform of a stochastic process is a key ingredient to achieve our results. The first key result in Duffie et al. (2010) is that an extended form of the Laplace and Fourier transforms of X(t) and of certain related random variables is known in closed form up to the resolution of an ODE. However, the Laplace transform for an integral of a stochastic process may be obtained in closed form only for a limited class of processes.

A convenient result was established in Albanese and Lawi (2004), giving a criterion for when a particular analytic form of the Laplace transform can be ascertained. More precisely, Albanese and Lawi analyze the following expectation:

$$F(x, t) = E\left[f(X(T)) \exp\left(-\mu \int_{t}^{T} \phi(X(u))\,du\right) \Bigg|\, \mathcal{G}_t\right]. \tag{11}$$

Here 0 < t ≤ T, and $f, \phi : \mathbb{R} \to \mathbb{R}$ are two Borel functions.

When the Laplace transform exists,¹ the expected value in Eq. (11) is often referred to as a Feynman–Kac integral. The following general result is well-known (Karatzas and Shreve 1991):

Result 1 (Feynman–Kac) Let X(t), t ≥ 0, be a diffusion obeying SDE (3). Assume that the function $F \in C^{2,1}$ and that the function $\phi \in C^{1}$ is bounded. Given T > 0 and $\mu \in \mathbb{R}$, consider the function

$$F(x, t) = E\left[f(X(T)) \exp\left(-\mu \int_{t}^{T} \phi(X(u))\,du\right) \Bigg|\, X(t) = x\right], \quad 0 < t < T. \tag{12}$$

Then F(x, t) yields a unique solution to the partial differential equation (PDE) with a terminal condition:

$$\begin{cases} \dfrac{\partial F}{\partial t}(x, t) = \mathcal{A}F(x, t) - \phi(X(t))F(x, t), & 0 < t < T, \\ F(x, T) = f(x), \end{cases} \tag{13}$$

with its infinitesimal generator given by:

$$\mathcal{A}F(x, t) = \frac{\partial F}{\partial t}(x, t) + \frac{\partial F}{\partial x}(x, t)\,\mu(x, t)^{T} + \frac{1}{2}\,\mathrm{tr}\left[\Sigma(x, t)\Sigma(x, t)^{T}\, \frac{\partial^2 F(x, t)}{\partial x\,\partial x}\right] \tag{14}$$

¹ In "Appendix 3" we outline, based on Albanese and Lawi (2004), the class of diffusions whose Laplace transform exists.

In the case of an affine diffusion, Duffie et al. (2010), Duffie and Kan (1996) and Duffie et al. (2003) produced a particularly simple form of the function F(x, t):

Result 2 (Duffie et al. 2003) Take the function φ of the same form as in (2):

$$\phi(x) = \rho_0 + \sum_{i=1}^{d} \rho_i x_i, \quad x = (x_1, \ldots, x_d) \in \mathbb{R}^d.$$

Set f(x) := 1. Suppose that X(t), t ≥ 0, is an affine diffusion [cf. (4), (5)], satisfying some regularity conditions.

Then the function F(x, t) from (12) takes the form:

$$F(x, t) = e^{\alpha(t) + \beta(t)x^{T}} \tag{15}$$

where the coefficients $\alpha(t) \in \mathbb{R}$ and $\beta(t) = (\beta_1(t), \ldots, \beta_d(t)) \in \mathbb{R}^d$ are deterministic and satisfy

$$\frac{d}{dt}\beta(t) = \rho - K_1^{T}\beta(t) - \frac{1}{2}\sum_{i=1}^{d} b_i \left(\beta(t)A\right)_i^2, \tag{16}$$

$$\frac{d}{dt}\alpha(t) = \rho_0 - K_0^{T}\beta(t) - \frac{1}{2}\sum_{i=1}^{d} a_i \left(\beta(t)A\right)_i^2 \tag{17}$$

with vectors $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$ and $b_i = (b_i^{(1)}, \ldots, b_i^{(d)}) \in \mathbb{R}^d$, 1 ≤ i ≤ d, as in (6), and boundary conditions α(0) = 0 and β(0) = 0.

Proof The proof is given in "Appendix 1". □


In some cases, explicit solutions for α and β are known. A general analytical framework for explicit solutions is provided by Grasselli and Tebaldi (2008). One can alternatively solve the ODE numerically, for example by a Runge–Kutta method. An explicit fourth-order Runge–Kutta method is often used in financial applications. For cases in which the ODE is stiff,² an implicit Runge–Kutta method may be more effective.
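The numerical route can be illustrated on a case where the answer is known. The sketch below assumes a one-factor Feller state dX = κ(θ − X)dt + σ√X dW with λ = X, and writes the transform in the textbook Cox–Ingersoll–Ross convention E[exp(−μ∫X du) | X = x] = exp(A(τ) − B(τ)x), so that A′ = −κθB and B′ = μ − κB − ½σ²B² with A(0) = B(0) = 0. This sign convention, the function names and the parameter values are assumptions made for illustration and may differ from the conventions used elsewhere in the paper. A hand-written explicit fourth-order Runge–Kutta step is compared with the closed-form solution.

```python
import math


def riccati_rhs(a, b, mu, kappa, theta, sigma):
    """Right-hand side of A' = -kappa*theta*B and B' = mu - kappa*B - 0.5*sigma^2*B^2."""
    return (-kappa * theta * b, mu - kappa * b - 0.5 * sigma ** 2 * b ** 2)


def solve_riccati_rk4(tau, mu, kappa, theta, sigma, n_steps=1000):
    """Integrate (A, B) from 0 to tau with the classical explicit RK4 scheme."""
    h = tau / n_steps
    a, b = 0.0, 0.0
    for _ in range(n_steps):
        k1 = riccati_rhs(a, b, mu, kappa, theta, sigma)
        k2 = riccati_rhs(a + 0.5 * h * k1[0], b + 0.5 * h * k1[1], mu, kappa, theta, sigma)
        k3 = riccati_rhs(a + 0.5 * h * k2[0], b + 0.5 * h * k2[1], mu, kappa, theta, sigma)
        k4 = riccati_rhs(a + h * k3[0], b + h * k3[1], mu, kappa, theta, sigma)
        a += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        b += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return a, b


def closed_form_feller(tau, mu, kappa, theta, sigma):
    """Textbook CIR solution of the same Riccati system."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * mu * sigma ** 2)
    denom = (gamma + kappa) * (math.exp(gamma * tau) - 1.0) + 2.0 * gamma
    b = 2.0 * mu * (math.exp(gamma * tau) - 1.0) / denom
    a = (2.0 * kappa * theta / sigma ** 2) * math.log(
        2.0 * gamma * math.exp(0.5 * (gamma + kappa) * tau) / denom
    )
    return a, b


def prob_no_arrival(x, tau, kappa, theta, sigma):
    """P(no event over a window of length tau | X = x), i.e. the transform at mu = 1."""
    a, b = closed_form_feller(tau, 1.0, kappa, theta, sigma)
    return math.exp(a - b * x)


# Hypothetical parameter values, chosen only for illustration.
a_num, b_num = solve_riccati_rk4(1.0, 1.0, 1.5, 2.0, 0.5)
a_cf, b_cf = closed_form_feller(1.0, 1.0, 1.5, 2.0, 0.5)
```

With a smooth, non-stiff system such as this one, the explicit scheme agrees with the closed form to many digits; an implicit scheme would only become attractive for parameter sets with very large κ relative to the step size.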

As we will show later, Results 1 and 2 are needed to prove Theorem 1 below.

3 On the distribution of a DSPP

For a non-homogeneous Poisson process $N_t$ with arbitrary stochastic intensity λ(t), the probability of k occurrences within the interval [t, T] is given by:

$$P(N_T - N_t = k) = \frac{1}{k!}\, E\left[\left(\int_{t}^{T} \lambda(u)\,du\right)^{k} \exp\left(-\int_{t}^{T} \lambda(u)\,du\right)\right] \tag{18}$$

for k = 0, 1, 2, . . . Thus it is possible to state:

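Theorem 1 The conditional probabilities for the DSPP with an affine diffusion-driven intensity $\lambda(X_t)$, as in (2)–(3) and (4)–(5), are given by:

$$P\left(N_T - N_t = k \mid \lambda(X_t) = \lambda(x_t)\right) = \frac{1}{k!}\,\frac{\partial^k}{\partial \mu^k} \exp\left(\alpha(T - t; \mu) + \beta(T - t; \mu)\lambda(x_t)^{T}\right)\Bigg|_{\mu = 1} \tag{19}$$

Here the coefficients $\alpha(t) \in \mathbb{R}$ and $\beta(t) = (\beta_1(t), \ldots, \beta_d(t)) \in \mathbb{R}^d$ are deterministic and satisfy:

$$\frac{d}{dt}\beta(t) = \rho - K_1^{T}\beta(t) - \frac{1}{2}\sum_{i=1}^{d} b_i \left(\beta(t)A\right)_i^2, \tag{20}$$

$$\frac{d}{dt}\alpha(t) = \rho_0 - K_0^{T}\beta(t) - \frac{1}{2}\sum_{i=1}^{d} a_i \left(\beta(t)A\right)_i^2 \tag{21}$$

with vectors $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$ and $b_i = (b_i^{(1)}, \ldots, b_i^{(d)}) \in \mathbb{R}^d$, 1 ≤ i ≤ d, as in (6), and boundary conditions α(0) = 0 and β(0) = 0.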
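As a sanity check on the structure of this derivative formula, consider the degenerate case of a constant intensity λ(t) ≡ λ₀ > 0. This case is an illustrative assumption and not one of the models studied here. Write L(μ) for the Laplace transform of the hazard over [t, T], and assume the convention in which the k-th derivative of L carries the factor (−1)^k; whether that factor appears explicitly depends on the sign convention adopted for μ and β.

```latex
\begin{align*}
L(\mu) &= E\left[e^{-\mu \Lambda_{t,T}}\right] = e^{-\mu \lambda_0 (T-t)}, \\
\frac{\partial^k}{\partial \mu^k} L(\mu) &= \left(-\lambda_0 (T-t)\right)^k e^{-\mu \lambda_0 (T-t)}, \\
\frac{(-1)^k}{k!}\,\frac{\partial^k}{\partial \mu^k} L(\mu)\Big|_{\mu=1}
  &= \frac{\left(\lambda_0 (T-t)\right)^k}{k!}\, e^{-\lambda_0 (T-t)} .
\end{align*}
```

The last line is the usual Poisson probability of k events over a window of length T − t, which is what (18) reduces to when the intensity is deterministic and constant.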

² A stiff equation is a differential equation for which certain numerical methods for solving the equation are numerically unstable, unless the step size is taken to be extremely small. It has proven difficult to formulate a precise definition of stiffness, but the main idea is that the equation includes some terms that can lead to rapid variation in the solution.


Proof Let us first solve (18) for the case k = 0:

$$P(N_T - N_t = 0) = E\left[\exp\left(-\mu \int_{t}^{T} \lambda(X(u))\,du\right)\right] \tag{22}$$

According to our assumption λ(X(t)) is an affine diffusion; thus, setting f(X(T)) := 1 in Eq. (13), we can solve the RHS of (22) using Result 2:

$$E\left[\exp\left(-\mu \int_{t}^{T} \lambda(X(u))\,du\right) \Bigg|\, \lambda(X_t) = \lambda(x)\right] = \exp\left(\alpha(T - t; \mu) + \beta(T - t; \mu)\lambda(x)^{T}\right)\Bigg|_{\mu = 1} \tag{23}$$

and the coefficients $\alpha(t) \in \mathbb{R}$ and $\beta(t) = (\beta_1(t), \ldots, \beta_d(t)) \in \mathbb{R}^d$ are deterministic and satisfy equations (20) and (21). The exact form of these coefficients depends on which element of the affine diffusion family we choose to describe the intensity λ(X(t)).

Next, for k ≥ 1 we need to calculate:

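$$E\left[\left(\int_{t}^{T} \lambda(X(u))\,du\right)^{k} \exp\left(-\int_{t}^{T} \lambda(X(u))\,du\right)\right] \tag{24}$$

Observe that the LHS of Eq. (23) can be identified as the moment generating function (MGF) of the hazard process $\Lambda_{t,T} := \int_{t}^{T} \lambda(X(u))\,du$. Thus, as in Bouzas et al. (2002), if $G_{\Lambda_{t,T}}(\mu)$ is the MGF of $\Lambda_{t,T}$ conditional on λ(X(t)), then:

$$E\left(\Lambda_{t,T}^{k}\, e^{-\mu \Lambda_{t,T}}\right) = \frac{1}{k!}\,\frac{\partial^k}{\partial \mu^k}\, G_{\Lambda_{t,T}}(\mu) \tag{25}$$

$$= \frac{1}{k!}\,\frac{\partial^k}{\partial \mu^k} \exp\left(\alpha(T - t; \mu) + \beta(T - t; \mu)\lambda(x)^{T}\right)\Bigg|_{\mu = 1} \tag{26}$$

for every k > 0, where $\alpha(t) \in \mathbb{R}$ and $\beta(t) = (\beta_1(t), \ldots, \beta_d(t)) \in \mathbb{R}^d$ remain as described above. □

Based on Theorem 1 we can obtain more detailed results when assuming two well-known multifactor affine intensities: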

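Corollary 1 (Multifactor Feller intensity) Assume that the intensity λ(X(t)) is given by:

$$\lambda(X(t)) = \sum_{i=1}^{d} X_i(t) \tag{27}$$

where each $X_i(t)$ is an independent Feller diffusion satisfying:

$$dX_i(t) = \kappa_i\left(\theta_i - X_i(t)\right)dt + \sigma_i \sqrt{X_i(t)}\,dW_i(t). \tag{28}$$

Then the probabilities for the corresponding DSPP are given by:

$$P\left(N_T - N_t = k \mid \lambda(X(t))\right) = \frac{1}{k!}\,\frac{\partial^k}{\partial \mu^k} \exp\left(\sum_{i=1}^{d} \alpha_i(T - t; \mu) + \beta_i(T - t; \mu)X_i(t)\right)\Bigg|_{\mu = 1} \tag{29}$$

where

$$\alpha_i(T - t; \mu) = \frac{2\kappa_i \theta_i}{\sigma_i^2} \ln\left(\frac{2\gamma_i(\mu)\, e^{(\gamma_i(\mu) + \kappa_i)/2}}{(\gamma_i(\mu) + \kappa_i)\left(e^{-\gamma_i(\mu)(T - t)} - 1\right) + 2\gamma_i(\mu)}\right), \tag{30}$$

and

$$\beta_i(T - t; \mu) = \left[\frac{2\mu\left(e^{-\gamma_i(\mu)(T - t)} - 1\right)}{(\gamma_i(\mu) + \kappa_i)\left(e^{-\gamma_i(\mu)(T - t)} - 1\right) + 2\gamma_i(\mu)}\right], \tag{31}$$

with

$$\gamma_i(\mu) = \sqrt{\kappa_i^2 + 2\mu\sigma_i^2} \tag{32}$$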


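Corollary 2 (Multifactor Ornstein–Uhlenbeck intensity) Assume that the intensity λ(X(t)) is given by:

$$\lambda(X(t)) = \sum_{i=1}^{d} X_i(t) \tag{33}$$

where each $X_i(t)$ is an independent Ornstein–Uhlenbeck diffusion satisfying:

$$dX_i(t) = \kappa_i\left(\theta_i - X_i(t)\right)dt + \sigma_i\,dW_i(t). \tag{34}$$

Then the probabilities for the corresponding DSPP are given by:

$$P\left(N_T - N_t = k \mid \lambda(X(t))\right) = \frac{1}{k!}\,\frac{\partial^k}{\partial \mu^k} \exp\left(\sum_{i=1}^{d} \alpha_i(T - t; \mu) + \beta_i(T - t; \mu)X_i(t)\right)\Bigg|_{\mu = 1} \tag{35}$$

where

$$\alpha_i(T - t; \mu) = \left\{\left(\theta_i - \frac{\mu\sigma_i^2}{2\kappa_i^2}\right)\left[\beta_i(T - t; \mu) - \mu(T - t)\right] - \frac{\sigma_i^2}{4\kappa_i}\,\beta_i(T - t; \mu)^2\right\}, \tag{36}$$

and

$$\beta_i(T - t; \mu) = \frac{\mu - \mu e^{-\kappa_i(T - t)}}{\kappa_i}. \tag{37}$$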
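For a single O–U factor the coefficients above can be cross-checked against a direct Gaussian computation. The sketch below assumes the sign convention exp(α − βx) for the transform E[exp(−μ∫X du) | X = x], which is the one that matches the Gaussian moments of the time-integral of an O–U process; the function names and parameter values are hypothetical.

```python
import math


def ou_coefficients(tau, mu, kappa, theta, sigma):
    """alpha and beta of the one-factor O-U transform, in the form of (36)-(37)."""
    beta = mu * (1.0 - math.exp(-kappa * tau)) / kappa
    alpha = (theta - mu * sigma ** 2 / (2.0 * kappa ** 2)) * (beta - mu * tau) \
        - sigma ** 2 * beta ** 2 / (4.0 * kappa)
    return alpha, beta


def ou_integral_moments(x, tau, kappa, theta, sigma):
    """Mean and variance of the integral of an O-U process over [0, tau], started at x."""
    b = (1.0 - math.exp(-kappa * tau)) / kappa
    mean = theta * tau + (x - theta) * b
    var = (sigma ** 2 / kappa ** 2) * (
        tau - 2.0 * b + (1.0 - math.exp(-2.0 * kappa * tau)) / (2.0 * kappa)
    )
    return mean, var


def transform_via_moments(x, tau, mu, kappa, theta, sigma):
    """E[exp(-mu * I)] for Gaussian I, using exp(-mu*mean + 0.5*mu^2*var)."""
    mean, var = ou_integral_moments(x, tau, kappa, theta, sigma)
    return math.exp(-mu * mean + 0.5 * mu ** 2 * var)


def transform_via_coefficients(x, tau, mu, kappa, theta, sigma):
    """The same quantity written as exp(alpha - beta * x)."""
    alpha, beta = ou_coefficients(tau, mu, kappa, theta, sigma)
    return math.exp(alpha - beta * x)


# Hypothetical parameters; x = -2 is deliberately negative.
positive_state = transform_via_coefficients(2.0, 1.0, 1.0, 1.0, 2.0, 0.5)
negative_state = transform_via_coefficients(-2.0, 1.0, 1.0, 1.0, 2.0, 0.5)
```

The two routes agree to floating-point precision. Because an O–U state can be negative, the expression exceeds one for a sufficiently negative state (as in the `negative_state` example), in which case it can no longer be read as a probability.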




The Ornstein–Uhlenbeck (OU) process is perhaps the simplest example of an affine diffusion. Observe that while the multifactor OU model is sufficiently flexible to incorporate covariance between the state variables ($X_i$), we elected to force independence. The consequent reduction in the parameter space is helpful in ensuring the identification of model parameters and reducing the complexity of our optimization algorithm. Also, by its very nature, the Ornstein–Uhlenbeck process lives on the whole line $\mathbb{R}$, which violates the non-negativity condition for λ(X(t)). A simple way to secure non-negativity is to impose an additional set of constraints as stated in (8). Although in theory the intensity could be negative, the probability of this is almost negligible in practice with estimated parameters.

4 Inference of doubly stochastic Poisson process with affine intensity

The preceding sections summarized some theoretical material used to analyze a DSPP whose intensity is generated from an affine diffusion. An affine diffusion is determined by a collection of parameters, and an important issue is how to estimate them from observable data. Unfortunately, the existing mathematical literature gives no straightforward recommendations on this issue; we must, therefore, turn to the econometric literature to handle this important topic. Since the seminal paper by Engle and Russell (1998), modeling of point processes in finance has been an ongoing topic in the area of financial econometrics. Financial point processes are associated with the random arrival of specific events, such as transactions, quote updates, limit orders or price changes, which are observable from financial high-frequency data. But while Engle and Russell (1998) and Engle (2000) investigate the trade duration (i.e. the interarrival time between transactions), we advocate analyzing the point process directly and modeling this process dynamically. Hence, instead of looking at the duration in time between observations, we investigate the dual of this problem, that is, its associated counting process.

Although the literature on the parametric estimation of point processes (including financial point processes) is as large as the theoretical literature, there is as yet no consensus as to what the best approach could be. The majority of statistical studies of DSPPs in the literature mainly focus on constructing and analyzing parametric models. In addition to those works already mentioned in the introduction, we acknowledge, as recent contributions to the literature, Bouzas et al. (2010), where the stochastic intensity of the process for counting emitted particles is estimated by functional principal components analysis and confidence bands are provided for two radioactive isotopes, and Minozzo and Centanni (2012), who consider a class of DSPPs in which the intensity is driven by a shot-noise process and its parameters are estimated by means of an importance sampling algorithm. Although effective for studying the stochastic dynamics of interest when they are correctly specified, parametric models are not always applicable for data analysis. Recently, Zhang and Kou (2010) suggested a nonparametric method for inference of Cox processes with applications in biophysics and physical chemistry. In particular, the authors develop kernel-based estimators for the arrival rate λ(t) and its autocorrelation function (ACF). Since the choice of the bandwidth h affects the performance of the kernel estimator, the authors determine the optimal h that gives the smallest mean square error (MSE):

$$h_{opt} = \left[\frac{E(\lambda_0)}{C'(0^{+})\,\gamma_f} \int_{-b}^{b} f^2(r)\,dr\right]^{1/2}, \tag{38}$$

where the constant $\gamma_f$ is strictly negative as long as f is a density function.

The term C′(t) denotes the first derivative of the autocorrelation function for the arrival rate λ(t): C(t) = cov(λ(0), λ(t)). The ACF is of interest because it directly measures the strength of dependence and reveals the internal structure of the system. However, C(t) is unobservable and must be replaced by its empirical estimator, $\hat{C}(t)$, which the authors prove to be a consistent estimator of C(t) provided that C(t) decays reasonably fast. This assumption embeds the intuitive idea that as time-points move far away from each other, their dependence should eventually vanish.

Table 1 Empirical ACF for order arrivals

                 Lag 5   Lag 10   Lag 15
Empirical ACF    0.42    0.40     0.30

While a short-range dependent arrival rate can be a reasonable hypothesis for some biophysical applications, this desirable pattern is not present in financial data, as we can observe from the ACF for order arrivals in Table 1.

We observe from Table 1 that order arrivals display a positive, significant and slowly decaying autocorrelation function. These characteristics violate the conditions required for obtaining a consistent estimator of C(t) and consequently for $h_{opt}$. As observed by Zhang and Kou (2010), the performance of the kernel method for estimating λ(t) depends crucially on the choice of the bandwidth, and therefore we pursue a different approach.
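The kind of quantity reported in Table 1 can be computed as follows. The estimator used for the table is not spelled out, so this sketch assumes the standard (biased) sample autocorrelation applied to a series of order counts per time bin; the function name and the toy series are hypothetical.

```python
def sample_acf(series, lag):
    """Biased sample autocorrelation of a sequence at a given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((value - mean) ** 2 for value in series) / n
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / n
    return covariance / variance


# Toy series that alternates perfectly, so the lag-1 autocorrelation is strongly negative.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
lag1 = sample_acf(alternating, 1)
lag2 = sample_acf(alternating, 2)
```

Applied to binned order counts, values that stay well above zero at long lags would indicate the slowly decaying dependence discussed above.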

4.1 Kalman filtering of DSPP

We propose a new technique which properly handles the dependency empirically found in transactional data. Our approach is based on the Kalman filter methodology and exploits a relationship between the DSPP and the state variables for the purpose of the subsequent estimation of the collection of parameters of an affine diffusion. The strength of this approach is that the Kalman filter acts as an identifier of the underlying (and unobserved) state variables controlling the DSPP dynamics. Once the unobserved component is filtered, the maximum likelihood estimator will be able to determine the model parameters.³ For particulars of the Kalman filter methodology the reader is referred to Hamilton (1994), and to Bolder (2001) for a financial application.

³ The estimation procedure and the Kalman filter algorithm were implemented in this work in accordance with Bolder (2001).

In order to estimate parameters and extract the unobservable state variables, we restrict (29) to $P(N_T - N_t = 0 \mid \lambda(X(t)))$, the conditional probability of no arrivals within the time interval (t, T), given the value of the intensity⁴ at time t. Without any loss of generality, we proceed from here on assuming the intensity of the DSPP is represented by an arbitrary affine intensity as a way to obtain a testable expression for our estimation procedure. Here,

$$P(N_T - N_t = 0 \mid \lambda(t)) = \exp\left(\sum_{i=1}^{d} \left(\alpha_i(T - t) - \beta_i(T - t)X_i(t)\right)\right) \tag{39}$$

where $\alpha_i(T - t)$ and $\beta_i(T - t)$ are deterministic and satisfy equations (20) and (21) with μ = 1. The exact form of these coefficients depends on which element of the affine diffusion family we choose to describe the intensity λ(X(t)). Passing to logarithms, we obtain the measurement equation:

$$\ln P(N_T - N_t = 0 \mid \lambda(t)) = \sum_{i=1}^{d} \left(\alpha_i(T - t) - \beta_i(T - t)X_i(t)\right) \tag{40}$$

It is convenient to introduce a grid of time points 0, τ, 2τ, . . . where τ is a chosen spacing. Then, with t = sτ and T = (s + 1)τ, s = 0, 1, 2, . . ., we set:

$$z_s := \ln P\left(N_{(s+1)\tau} - N_{s\tau} = 0 \mid \lambda(s\tau)\right) = \sum_{i=1}^{d} \left(\alpha_i(\tau) - \beta_i(\tau)X_i(s\tau)\right) + \xi_s, \tag{41}$$

with IID, normally distributed error terms:

$$\xi_s \sim N(0, \zeta), \quad s = 1, 2, \ldots$$

The inclusion of an error term in (41) is motivated by the fact that the underlying intensity process chosen may be inadequate. Indeed, under an incorrect intensity function, (39) will be functionally misspecified, and estimates of λ(t) will contain a systematic error. In this case the probability of no arrivals within the interval (t, T) implied by the affine processes will systematically deviate from observed arrivals. Consequently, in a correctly specified model the errors $\xi_s$ should be serially and cross-sectionally uncorrelated and have mean zero.

Essentially, the Kalman filter is a recursive algorithm. It begins with an educated guess as to the initial values for the state variables and a measure of the certainty of this guess; in our case, we use the unconditional mean and variance of our state variables. The Kalman filter technique then proceeds to use these initial state variable values to infer the value of the measurement equation. Using the observed value, we can then update our inferences about the current value of the transition system. These updated values are then used to predict the subsequent value of the state variables. We then repeat the process for the next time period. In this manner, we recurse through the entire data sample and construct a time series for our unobserved state variables.

⁴ We have described the intensity by λ(X(t)) as a way to make explicit the role played by the state variable X(t); from here on we simplify this cumbersome notation to λ(t).


Next, following Harvey (1989), we can write down the log-likelihood function:

$$\log L(\Theta) = -\frac{dS}{2}\log 2\pi - \frac{1}{2}\sum_{s=1}^{S} \log |Q_s| - \frac{1}{2}\sum_{s=1}^{S} \tilde{z}_s^{T} Q_s^{-1} \tilde{z}_s. \tag{42}$$

Here S is the sample size, d is the number of state variables, and $\tilde{z}_s$ represents the measurement-system prediction error:

$$\tilde{z}_s = z_s - \hat{z}_s \tag{43}$$

where $z_s$ is the observable variable and $\hat{z}_s$ is the prediction implied by (41).

Further, as usual in Kalman filter applications, $Q_s$ represents the prediction error variance:

$$Q_s = \mathrm{Var}\left(\lambda(s) \mid \lambda(s - 1)\right) \tag{44}$$

Maximization of the likelihood function (42) is performed using standard numericaloptimization and gradients of the likelihood are computed numerically.
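The recursion and the prediction-error likelihood can be sketched for the simplest case of a single state variable and a scalar observation. The code below is a generic linear Gaussian filter and not the implementation used in the paper: it assumes a measurement equation z = a − b·x + ξ with Var(ξ) = ζ and a transition x′ = c + φ·x + η with constant Var(η) = q, and all names and numerical values are hypothetical.

```python
import math


def kalman_loglik(observations, a, b, c, phi, q, zeta):
    """Scalar Kalman filter; returns the Gaussian log-likelihood and filtered states."""
    # Initialise at the unconditional mean and variance of the state.
    x = c / (1.0 - phi)
    p = q / (1.0 - phi ** 2)
    loglik = 0.0
    filtered = []
    for z in observations:
        # Predict the state and its variance one step ahead.
        x_pred = c + phi * x
        p_pred = phi ** 2 * p + q
        # Predict the measurement and form the prediction error and its variance.
        z_pred = a - b * x_pred
        err = z - z_pred
        err_var = b ** 2 * p_pred + zeta
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(err_var) + err ** 2 / err_var)
        # Update the state; the observation loads on the state with coefficient -b.
        gain = -b * p_pred / err_var
        x = x_pred + gain * err
        p = (1.0 + gain * b) * p_pred
        filtered.append(x)
    return loglik, filtered


# Hypothetical inputs: observations that sit exactly at the model-implied mean.
a, b, c, phi, q, zeta = -0.1, 0.5, 0.2, 0.9, 0.01, 0.05
state_mean = c / (1.0 - phi)
loglik, states = kalman_loglik([a - b * state_mean] * 20, a, b, c, phi, q, zeta)
```

For a one-factor Feller or O–U state sampled at spacing τ, the exact conditional mean corresponds to φ = exp(−κτ) and c = θ(1 − exp(−κτ)). A numerical optimiser would then maximise the returned log-likelihood over the model parameters.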

For a Gaussian disturbance model, the Kalman filter provides an optimal solution to prediction, updating and evaluating the likelihood function. On the other hand, as pointed out by Hamilton (1994) and Duan and Simonato (1999), even if the disturbances are non-Gaussian we can still use the Kalman filter to calculate the linear projection of λ on past observations, and the resulting filter is quasi-optimal. According to Bollerslev and Wooldridge (1992), the use of this quasi-optimal filter yields an approximate quasi-likelihood function with which parameter estimation can be carried out, and the estimates obtained would be consistent and asymptotically normal.

4.2 Known issues and limitations

It is known that the exact transition density p(τ, x, y) = P(λ((s + 1)τ) ∈ dy | λ(sτ) = x)/dy for a Feller intensity process is a convolution of a collection of non-central χ²-densities. As pointed out by Dyrting (2004), although the non-central χ²-distribution is a well-known special function, it is sometimes difficult to evaluate accurately and efficiently. In part this is due to its multiple arguments. Most special functions have one or two, but the non-central χ²-distribution has three: the number of degrees of freedom, the non-centrality, and the distribution's boundary. As a result, fast methods such as polynomial or rational function approximation are not very useful. Another difficulty in evaluating this function is due to the large range of the non-centrality parameter. One of the key parameters in the Feller process is the time step. This time parameter is in turn related to the non-centrality parameter such that for large times the non-centrality approaches zero and for small times it approaches infinity. The most widely used method for evaluating the non-central χ²-distribution is by its gamma series representation.

Dyrting (2004) also shows that there are parameter values where analytic and asymptotic methods have an accuracy that is unacceptable by today's standards, and where the series representation is either inefficient or inaccurate.

As a way to overcome the numerical issues related to evaluating the χ²-distribution, and simultaneously to meet the requirements for estimating parameters by maximum likelihood combined with the Kalman filter, we adopted an analytic approximation for the non-central χ²-distribution due to Sankaran (1978):⁵

$$\sqrt{\chi^2_{\nu}(\alpha\lambda(s);\ \nu, \xi)} \ \overset{a}{\sim}\ N\big(\xi + [(\nu - 1)/2],\ 1\big) \qquad (45)$$

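where:

$$\alpha \equiv \frac{4\kappa}{\sigma^2\left(1 - e^{-\kappa(s-t)}\right)}, \qquad \nu \equiv \frac{4\kappa\theta}{\sigma^2}, \qquad \xi \equiv \alpha\,\lambda_t\,e^{-\kappa(s-t)}, \qquad t < s \qquad (46)$$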

This approximation, which is quite accurate when the difference between s and t tends to zero, allows us to write the likelihood function using the Normal distribution.
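To make the quantities in (46) concrete, the following sketch computes α, ν and ξ for one time step and derives the conditional mean and variance of the Feller intensity from the standard moments of a non-central χ² variable (mean ν + ξ, variance 2(ν + 2ξ)). The function names and the numerical values in the usage note are illustrative assumptions, not part of the original estimation code.

```python
import math


def feller_transition_params(lam_t, kappa, theta, sigma, dt):
    """Return (alpha, nu, xi) as defined in Eq. (46) for a step dt = s - t."""
    decay = math.exp(-kappa * dt)
    alpha = 4.0 * kappa / (sigma ** 2 * (1.0 - decay))
    nu = 4.0 * kappa * theta / sigma ** 2      # degrees of freedom
    xi = alpha * lam_t * decay                 # non-centrality
    return alpha, nu, xi


def feller_conditional_moments(lam_t, kappa, theta, sigma, dt):
    """Mean and variance of lambda(s) given lambda(t).

    alpha * lambda(s) is non-central chi-square with nu degrees of freedom
    and non-centrality xi, so its moments are rescaled by alpha.
    """
    alpha, nu, xi = feller_transition_params(lam_t, kappa, theta, sigma, dt)
    mean = (nu + xi) / alpha
    variance = 2.0 * (nu + 2.0 * xi) / alpha ** 2
    return mean, variance
```

For example, `feller_conditional_moments(0.05, 0.2, 0.04, 0.13, 1.0)` returns a mean equal to θ(1 − e^{−κ}) + 0.05·e^{−κ}; the variance is one possible choice for the conditional variance used in the prediction step of the filter.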

Therefore, the estimation of the unobservable state variables λ(sτ), s = 1, 2, . . ., by means of the Kalman filter, in combination with a quasi-maximum-likelihood (QML) estimation of the model parameters Θ = {θ, κ, σ}, can be carried out by replacing the exact transition density by an analytical approximation as proposed by Sankaran (1978).

Finally, the normal distribution (either approximate or exact) proposed here has a potential drawback, because λ(sτ) may become negative, which is inconsistent with the required positivity of λ. This issue can be tackled by means of different approaches. Here we follow Geye and Pichler (1999) and modify the standard Kalman filter by simply replacing any negative element of the state estimate λ(sτ) with zero. Therefore, in general, the Kalman filter is not strictly a linear estimator for the state variables, but it is linear for λ(sτ) > 0. Another way to get around the non-negativity constraint is to skip the updating step whenever an element of λ(sτ) becomes negative and set λ(sτ) = λ(τ(s − 1)).
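The recursion described in this section can be sketched as follows for a one-factor Feller intensity. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear measurement equation z_s = meas_const + meas_slope·λ(sτ) + ε_s with Var(ε_s) = ζ² is hypothetical (the paper's measurement equation is defined in an earlier section), and only the truncation-at-zero variant of the non-negativity fix is shown.

```python
import math


def kalman_qml_loglik(z, kappa, theta, sigma, zeta,
                      meas_const=0.0, meas_slope=1.0, tau=1.0):
    """Quasi-log-likelihood and filtered states for a one-factor Feller state.

    Assumed (hypothetical) measurement equation:
        z_s = meas_const + meas_slope * lambda_s + eps_s,  Var(eps_s) = zeta**2
    """
    phi = math.exp(-kappa * tau)
    const = theta * (1.0 - phi)
    lam = theta                                  # stationary mean
    p = theta * sigma ** 2 / (2.0 * kappa)       # stationary variance
    loglik, filtered = 0.0, []
    for obs in z:
        # Prediction step: conditional mean and variance of the Feller state.
        q = (lam * sigma ** 2 * phi * (1.0 - phi) / kappa
             + theta * sigma ** 2 * (1.0 - phi) ** 2 / (2.0 * kappa))
        lam_pred = const + phi * lam
        p_pred = phi ** 2 * p + q
        # Innovation and its variance (prediction-error decomposition).
        innov = obs - (meas_const + meas_slope * lam_pred)
        f = meas_slope ** 2 * p_pred + zeta ** 2
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + innov ** 2 / f)
        # Updating step.
        gain = p_pred * meas_slope / f
        lam = lam_pred + gain * innov
        p = (1.0 - gain * meas_slope) * p_pred
        # Non-negativity fix: replace a negative state estimate with zero.
        lam = max(lam, 0.0)
        filtered.append(lam)
    return loglik, filtered
```

The returned quasi-log-likelihood is the quantity a numerical optimizer would maximize over (κ, θ, σ, ζ). The alternative fix mentioned in the text, skipping the update when the estimate turns negative, would replace the `max` line.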

We examine in Sect. 6 the seriousness of these potential biases by performing a Monte Carlo analysis of the approximate maximum likelihood estimator.

5 The empirical analysis

5.1 Data description

In this section, we apply our estimation technique to some transaction data from the Brazilian Exchange (BM&FBOVESPA).⁶ The sample is formed by all submitted sell orders for the BRL/USD FX futures contract⁷ traded on BM&FBOVESPA during two days in October 2008, with the expiry date of November 1, 2008. The BM&FBOVESPA FX futures contract is one of the most liquid FX contracts in emerging markets worldwide, and the average volume of 300,000 traded daily is significant even for developed markets.

5 A comparison among different approximations to the non-central chi-square can be found in Johnson and Kotz (1970).

6 BM&FBOVESPA is the fourth largest exchange in the world in terms of market capitalization. This exchange has a vertically integrated business model with a trade platform and clearing for equities, derivatives and cash market for currency, government and private bonds.

7 Ticker: FUT DOLX08.


The BM&FBOVESPA electronic trade system (PUMA) uses the concept of a limit order book, matching orders by price/time priority. A lower ask price takes precedence over higher ask prices, and a higher bid price takes precedence over lower bid prices. If there is more than one bid or ask at the same price, the earlier bids and asks take precedence over later ones. The offers are recorded in milliseconds; this allows the system to use a high precision for the determination of precedence criteria. We also observe that no two consecutive orders arrive in the order book in an interval smaller than 10 ms, probably due to the internal network latency.

While the probabilities of non-arrival are not themselves directly observable, we can use the empirical frequency of order arrivals as a proxy. For a given day, we constructed the empirical frequency by partitioning the trading session into 1-min intervals, totaling 540 1-min intervals per day. (This means that τ has been chosen to equal 1 min and S = 540.) For a given 1-min interval, considering the 10 ms network latency, it would be possible, at least theoretically, for a maximum of 6,000 sell orders to arrive in the order book in this time interval. Thus, for each 1-min interval the empirical frequency is calculated as the ratio of submitted orders to 6,000.

Therefore, the observable variable is constructed as:⁸

$$z_s = \frac{1}{6{,}000}\,\#(\text{sell orders arrived within the } s\text{-th 1-min interval}), \qquad s = 1, \ldots, 540. \qquad (47)$$
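The construction in (47) amounts to counting arrivals per 1-min bucket and dividing by the theoretical maximum. A minimal sketch follows; the input format (arrival times in milliseconds since the start of the trading session) and the function name are assumptions made for illustration.

```python
def empirical_frequency(arrival_ms, n_intervals=540,
                        interval_ms=60_000, max_orders=6_000):
    """Return z_1, ..., z_S of Eq. (47) from sell-order arrival times.

    arrival_ms: arrival times in milliseconds since the session opened
                (hypothetical input format).
    """
    counts = [0] * n_intervals
    for t in arrival_ms:
        s = int(t // interval_ms)        # zero-based index of the 1-min interval
        if 0 <= s < n_intervals:
            counts[s] += 1
    return [c / max_orders for c in counts]
```

Changing `max_orders` rescales every z_s, which is the dependence on the normalizing denominator that Remark 1 points out.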

Remark 1 The normalization by 6,000 is not unquestionable, and our final numerical results are to a certain degree dependent on the choice of the normalizing denominator.

5.2 Estimation results

In this section we apply our estimation technique to estimate the parameters of a DSPP for one- and two-factor O–U and Feller intensities using high-frequency data. As mentioned above, on a given day we have a total of 540 observations, representing the empirical frequency of sell-order arrivals for the BRL/USD FX futures contract.

To illustrate our findings we randomly selected one day of October to exhibit the results of our estimation. For reasons of space we do not report the parameter estimates for all days in our sample, although they are available upon request.

The numerical optimization routine used in this study is the trust-region-reflective algorithm.⁹ The convergence criterion, based on the maximum absolute difference in both parameter and functional values between two successive iterations, is set to 0.0001. For each model examined below, the approximate Kalman filter recursion is initialized with the stationary mean and variance of the unobserved state variable(s).

8 Here # stands for the number of arrivals within the given time interval.

9 Definition and properties of the trust-region-reflective algorithm can be found in Byrd et al. (1987).

Table 2 Estimated parameters of the DSPP with affine intensity applied to BRL/USD futures sell-order data (ticker DOL FUT, maturity NOV08), October 23, 2008. Standard errors are given in parentheses.

| | Feller, 1 factor | Feller, 2 factors | O–U, 1 factor | O–U, 2 factors |
|---|---|---|---|---|
| κ1 | 0.23 (1.91E−2) | 0.239 (1.07E−8) | 0.286 (1.41E−6) | 0.505 (4.04E−8) |
| θ1 | 0.061 (3.445E−6) | 0.07 (5.885E−7) | 0.059 (3.74E−6) | 0.04 (3.31E−7) |
| σ1 | 0.187 (1.39E−5) | 0.164 (2.24E−9) | 0.031 (2.25E−5) | 0.008 (2.61E−7) |
| ζ | 0.005 (3.67E−8) | 0.005 (1.26E−9) | 0.005 (3.16E−8) | 0.004 (2.57E−7) |
| κ2 | – | 1.211 (1.83E−8) | – | 2.036 (2.90E−8) |
| θ2 | – | 0.0112 (2.93E−10) | – | 0.028 (4.27E−9) |
| σ2 | – | 0.18 (2.03E−7) | – | 0.282 (5.26E−9) |
| log L(Θ) | −1.81E+3 | −5.38E+6 | −1.81E+3 | −1.54E+6 |
| LR | – | 2.97E+3 | – | 8.51E+2 |
| LB(15) | 16.27 | 17.81 | 16.67 | 18.55 |
| LM(15) | 27.23 | 29.56 | 27.34 | 29.76 |

Next, in order to avoid getting trapped at a local minimum point, we tried different sets of starting values for the model. The initial values for the parameters Θ = {θ, κ, σ} are based on 1,000 random samples of each parameter drawn from a reasonable range. We have not come across an instance in which two different starting values led to different estimates.

The estimation results for each model are reported in the columns of Table 2. This table contains parameter estimates along with their corresponding standard errors in parentheses. As Table 2 shows, we obtain quite significant parameter estimates (at the 1 % level) for both one- and two-factor models, and the log-likelihood value increases after the inclusion of a second factor in both cases.

However, rather than focusing on the many individual parameter estimates for the O–U and Feller models, we examine likelihood ratio tests for the significance of these extra parameters. Here we compare the unrestricted (two-factor) model with the restricted (one-factor) model. Specifically, for either the Feller or the O–U intensity, we test whether the inclusion of a second factor improves the model performance. We therefore perform a joint test with the null hypothesis κ2 = θ2 = σ2 = 0. At this point we are only testing whether the inclusion of a second factor improves the model performance, not which model (Feller or O–U) best fits our data. These likelihood ratio tests are shown at the bottom of Table 2 as LR. For both Feller and O–U intensities, the likelihood ratio (LR) statistic rejects, at the 1.0 % level, the null hypothesis that the additional factors are not jointly significant (Table 2).
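As an illustration of the mechanics of such a test, the sketch below computes the textbook statistic 2(log L_unrestricted − log L_restricted) and its p-value under a χ² reference distribution with three degrees of freedom, one per restricted parameter. The χ²(3) reference and the log-likelihood values in the usage note are assumptions for illustration; the paper does not spell out the reference distribution it used.

```python
import math


def chi2_sf_3dof(x):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def likelihood_ratio_test(loglik_restricted, loglik_unrestricted):
    """Return the LR statistic and its p-value under a chi-square(3) reference."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, chi2_sf_3dof(lr)
```

With hypothetical values, `likelihood_ratio_test(-1810.0, -1800.0)` gives a statistic of 20.0, which exceeds the 1 % critical value of about 11.34, so the one-factor restriction would be rejected in that example.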

In addition, it is worth mentioning that the estimated standard deviation of the measurement error ζ is smaller than the diffusion parameter σ in both models. This fact indicates that the measuring process has not overshadowed the state variable dynamics during our estimation procedure.

In order to assess the fitted models, we conducted diagnostic checks for possible misspecification based on the standardized residuals. If a model is well specified, it is able to capture all systematic patterns that were present in the original series, and no significant autocorrelation is left in the residuals. Two commonly used indicators for this are the Ljung–Box (LB) (Ljung and Box 1978) and Lagrange Multiplier (LM) (Breusch 1979; Godfrey 1978) tests. In short, both the Ljung–Box and the Lagrange Multiplier tests are designed, based on different assumptions, to test the hypothesis that the residuals are serially uncorrelated. At the bottom of Table 2 we report the values of the LB(15) and LM(15) autocorrelation tests for the one- and two-factor Feller and O–U models. At a high significance level, these statistics indicate that for all models the null hypothesis of no autocorrelation is not rejected at all 15 lags.
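For reference, the Ljung–Box statistic with h lags is Q = n(n + 2) Σ_{k=1}^{h} ρ_k² / (n − k), where ρ_k is the lag-k sample autocorrelation of the standardized residuals. A minimal sketch follows; the function name is an assumption and the residual series has to be supplied by the user.

```python
def ljung_box(residuals, lags=15):
    """Ljung-Box Q statistic for the first `lags` sample autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        # Lag-k sample autocorrelation.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q
```

Under the null of no autocorrelation, Q is compared with a χ² distribution with 15 degrees of freedom, whose 5 % and 1 % critical values are about 25.00 and 30.58.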

We observe that both the two-factor Feller and the two-factor O–U models exhibit superior performance over their one-factor counterparts according to the LR test. Even though the log-likelihood of the two-factor Feller model is almost 3 times greater than that of the two-factor O–U model, it is not possible to make a direct comparison between them to select the most appropriate model, because these two competing models are non-nested.¹⁰ There exists a wide variety of criteria in the model selection literature for selecting the model that best fits the data. Apart from a few exceptions, the Kullback–Leibler Information Criterion (KLIC) has played a vital role in the development of a number of non-nested test statistics. For instance, Akaike (1973) suggested comparing the values of the log-likelihood penalized by the number of parameters and selecting the model that yields the highest value, without taking into consideration whether the differences in the penalized log-likelihoods are statistically significant or not. This latter point is addressed by Vuong (1989), whose approach sets the model selection criteria in a hypothesis testing framework. More specifically, it tests whether the models under consideration are equally close to the true model, where proximity (closeness) is measured by the KLIC. The null hypothesis states that both models are equivalent (i.e. equally close to the truth in the KLIC sense), against the alternative that $H_f$ is better than $H_g$ or $H_g$ is better than $H_f$. Therefore, we are interested in comparing pairs of competing models, which are defined as:

$$H_f = \{ f(y \mid x, \theta);\ \theta \in \Theta \}$$

$$H_g = \{ g(y \mid x, \delta);\ \delta \in \Delta \}$$

10 Generally speaking, two models, say $H_f$ and $H_g$, are said to be non-nested if it is not possible to derive $H_f$ (or $H_g$) from the other model either by means of an exact set of parametric restrictions or as a result of a limiting process.

Fig. 2 Fitting performance of two-factor models (series shown: data, two-factor Feller, two-factor OU). The y-axis represents $P(N_t = 0 \mid t \in 60\ \mathrm{s})$ and the x-axis represents the trading hours

For strictly non-nested models, Vuong (1989) proposed the following statistic, which under H0 satisfies:

$$\frac{LR_N(\theta_N, \delta_N)}{\sqrt{N}\,\omega_N} \ \xrightarrow{D}\ N(0, 1) \qquad (48)$$

where N is the sample size and $\theta_N$ and $\delta_N$ are the maximum likelihood estimates. Additionally, the numerator is the difference in the summed log-likelihoods for the two models, $LR_N(\theta_N, \delta_N) = \log L(\theta_N) - \log L(\delta_N)$, and $\omega_N$ is a normalizing scalar whose square equals the mean of the squares of the pointwise log-likelihood ratios:

$$\ell_i = \log\frac{f(y_i \mid x_i, \theta_N)}{g(y_i \mid x_i, \delta_N)}, \qquad i = 1, \ldots, N \qquad (49)$$

The Vuong (1989) test has been used to compare the two-factor Feller and two-factor O–U models. We have defined $H_f$ as the two-factor Feller process and $H_g$ as the two-factor O–U process. The statistic defined in (48) equals 5.393; based on the asymptotic normality of the statistic, we can claim that the two-factor Feller model should be preferred to the two-factor O–U model.
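A minimal sketch of how the statistic in (48) can be computed from the pointwise log-likelihood contributions of the two fitted models is given below. It assumes that ω_N is taken as the square root of the mean of the squared pointwise log-likelihood ratios in (49); the function name and inputs are illustrative.

```python
import math


def vuong_statistic(loglik_f, loglik_g):
    """Vuong statistic from pointwise log-likelihoods of models H_f and H_g."""
    n = len(loglik_f)
    ratios = [lf - lg for lf, lg in zip(loglik_f, loglik_g)]   # l_i in Eq. (49)
    lr = sum(ratios)                                           # LR_N
    omega = math.sqrt(sum(r * r for r in ratios) / n)          # normalizing scalar
    return lr / (math.sqrt(n) * omega)
```

A large positive value favours H_f and a large negative value favours H_g; values inside the usual two-sided normal bounds (for example ±1.96 at the 5 % level) do not discriminate between the models.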

Figure 2 makes it possible to compare the fitting performance of the two-factor Feller and two-factor OU models with the actual data. Visually, the two-factor Feller process seems to fit the actual order flow quite well, while the two-factor OU model tends to provide poorer fitting results. It is also possible to see the model's flexibility in reacting to different changes in the number of orders submitted.

6 Simulation results

In this section, we analyze the performance of our estimation algorithm for estimating one- and two-factor Feller processes. To this end, we simulate various DSPP outcomes by using a collection of known parameters and proceed to estimate the unknown parameters of the model. This simulation exercise is intended to indicate how effective this technique is in terms of identifying parameters.

Assume that the intensity λ(X(t)) is given by:

$$\lambda(X(t)) = \sum_{i=1}^{d} X_i(t) \qquad (50)$$

where each $X_i(t)$ is an independent Feller diffusion satisfying:

$$dX_i(t) = \kappa_i(\theta_i - X_i(t))\,dt + \sigma_i\sqrt{X_i(t)}\,dW_i(t). \qquad (51)$$

Based on results of Feller (1951), Cox et al. (1985) noted that the distribution of $X_i(t)$ given $X_i(u)$ for some u < t is, up to a scale factor, a non-central chi-square distribution. Thus, for the Feller intensity the transition law of $X_i(t)$ can be expressed as:

$$X_i(t) = \frac{\sigma^2\left(1 - e^{-\kappa(t-u)}\right)}{4\kappa}\ \chi'^{2}_{d}\!\left(\frac{4\kappa\,e^{-\kappa(t-u)}\,X_i(u)}{\sigma^2\left(1 - e^{-\kappa(t-u)}\right)}\right), \qquad t > u \qquad (52)$$

where

$$d = \frac{4\kappa\theta}{\sigma^2} \qquad (53)$$

Therefore, we can simulate the process (50) exactly on a discrete time grid provided we can sample from the non-central chi-square distribution. As pointed out by Dyrting (2004), sampling from a non-central chi-square can be performed by different methods, where the trade-off between the accuracy of the method and its efficiency should be properly addressed. We adopted the gamma series representation. This approach takes advantage of the well-known recurrence relations for both the Poisson weights and the incomplete gamma functions, which allows replacing an infinite series representation by one with only a few algebraic operations (Johnson and Kotz 1970; Dyrting 2004). Additionally, the measurement errors ξs have been simulated as normal random variables with zero means.
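One standard way to draw from the transition law (52), shown here only as an illustrative sketch and not necessarily the routine used for the reported results, exploits the fact that a non-central χ² variable with d degrees of freedom and non-centrality c is a central χ² with d + 2N degrees of freedom, where N is Poisson with mean c/2. The function names and the simple Poisson sampler are assumptions of this sketch.

```python
import math
import random


def sample_poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals falling in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def feller_step(x_u, kappa, theta, sigma, dt, rng):
    """Exact draw of X(t) given X(u) = x_u for dt = t - u, following Eq. (52)."""
    decay = math.exp(-kappa * dt)
    scale = sigma ** 2 * (1.0 - decay) / (4.0 * kappa)
    dof = 4.0 * kappa * theta / sigma ** 2            # d in Eq. (53)
    noncentrality = x_u * decay / scale
    n = sample_poisson(noncentrality / 2.0, rng)
    # Central chi-square with dof + 2n degrees of freedom = Gamma(shape, scale=2).
    chi2 = rng.gammavariate(dof / 2.0 + n, 2.0)
    return scale * chi2


def simulate_feller_path(x0, kappa, theta, sigma, dt, n_steps, rng):
    """Simulate a Feller path on an equally spaced grid of n_steps points."""
    path, x = [], x0
    for _ in range(n_steps):
        x = feller_step(x, kappa, theta, sigma, dt, rng)
        path.append(x)
    return path
```

For instance, `simulate_feller_path(0.04, 0.2, 0.04, 0.13, 1.0, 540, random.Random(1))` produces one intensity path of the same length as a trading day in the empirical analysis; every draw is non-negative by construction.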

The simulation of a sample path for the DSPP with 540 elements, followed by an application of the estimation algorithm, has been repeated 500 times. Table 3 summarizes the results of this simulation exercise: it reports the true values, the mean estimate over the 500 simulations and the associated standard deviation of the estimates.

Table 3 This table summarizes the results of the simulation exercise for the one- and two-factor Feller models

| | One-factor: actual value | One-factor: mean estimate | One-factor: standard deviation | Two-factor: actual value | Two-factor: mean estimate | Two-factor: standard deviation |
|---|---|---|---|---|---|---|
| θ1 | 0.04 | 0.033 | 4.8E−04 | 0.04 | 0.05 | 1.1E−01 |
| κ1 | 0.2 | 0.208 | 0.032 | 0.2 | 0.203 | 0.021 |
| σ1² | 0.017 | 0.0165 | 8.2E−04 | 0.017 | 0.016 | 2.1E−04 |
| θ2 | – | – | – | 1.2 | 1.15 | 1.3E−02 |
| κ2 | – | – | – | 0.02 | 0.023 | 2.6E−01 |
| σ2² | – | – | – | 0.062 | 0.065 | 5.1E−03 |

We found that the estimation errors in the Kalman filter appear to be small in the Monte Carlo simulations for the one- and two-factor Feller models, and the biases in the κ and θ parameters could be nothing more than the familiar finite-sample biases found in autoregressive models. Compared, for example, with Chen and Scott (2003) and Duan and Simonato (1999), who applied the Kalman filter in conjunction with QML to estimate the parameters of affine term structure models, our findings on the finite-sample distributions of the estimated parameters are very similar.

Finally, as a way of assessing the possible bias in the use of approximations to describe the non-central chi-square distribution, we altered the previous configuration in two directions. First, we replaced the approximation proposed by Sankaran (1978) (Eq. 45) with a normal distribution adjusted so that its first two moments are equal to the moments of the Feller process, as suggested by Geye and Pichler (1999). The results of the simulation exercise were very similar to those presented in Table 3. A possible explanation is the very small time increments used to evaluate the transition density in both cases (Sankaran 1978; Geye and Pichler 1999), which suggests that the Sankaran (1978) approximation is, in fact, very close to the normal distribution. The closeness of the approximation proposed by Sankaran (1978) to the normal distribution supports the implementation of the Kalman filter adopted and, therefore, takes advantage of its known optimal filtering performance for parameter estimation under the normal distribution.

The second modification was to alter the way we sample elements of Eq. (52), replacing the gamma series representation with the Bessel method. Once again the results were equal to those in the above table up to the fifth decimal place. These results are in line with Dyrting (2004), who, by comparing the simulation performance of Eq. (52) using three different methods (the gamma series representation, the Bessel function series representation and Penev and Raykov's analytic approximation), found equal values up to the fifth decimal place for a wide range of parameters.

Based on the above results, we conclude that the estimation process proposed does not depend on the choice adopted to approximate the non-central chi-square distribution and proves itself to be a robust model for estimating the parameters of the Cox process with affine intensity.

7 Final remarks

We have presented a computationally tractable method for estimating DSPP parameters with stochastic affine intensity. Our approach is based on the Kalman filter methodology and exploits a relationship between the DSPP and the state variables for the purpose of the subsequent estimation of a collection of parameters of the affine diffusion.

Making use of the theoretical framework developed for modeling the interest rate term structure allows us to obtain expressions for probabilities in a DSPP when the intensity process belongs to a family of affine diffusions. More detailed results are provided for one- and two-factor Feller and Ornstein–Uhlenbeck diffusions.

To illustrate the virtues of this approach, parameters of DSPPs with one- and two-factor Feller and O–U intensities are estimated using high-frequency transaction data from FX futures contracts traded in Brazil. In this way, we have estimated a total of four different specifications: one- and two-factor Feller and O–U models. In general, we have found highly significant coefficients associated with both Feller and O–U intensities. After testing restricted and unrestricted nested models, we end up with two competing non-nested specifications, the two-factor Feller and O–U models. Therefore, we performed a Vuong (1989) test, whose approach sets the model selection criteria within a hypothesis testing framework, concluding that the two-factor Feller model provided the best fit to our data.

To examine the statistical properties of the proposed method, a Monte Carlo study is carried out. The simulation study suggests that the method is reasonably reliable for estimating parameters of one- and two-factor Feller intensities.

In a future publication, we will compare this framework with other existing methods for estimating DSPP parameters.

Therefore, the main thrust of this work is a new methodology that has been shown to provide reasonably accurate results. Consequently, this methodology may be recommended in a number of situations where a doubly stochastic point-type dynamic is present.

Acknowledgments Alan De Genaro would like to thank Marco Avellaneda, Jorge Zubelli, Cristiano Fernandes, Julio Stern, Peter Carr, Cris Rogers, Jean Pierre Fouque and seminar participants at NYU-Courant, IMPA, FEA/USP and SUNY Stony Brook for helpful comments. We also thank the three reviewers for their thorough review and highly appreciate the comments and suggestions, which significantly contributed to improving the quality of this paper. Special thanks are due to Yuri Suhov for his invaluable suggestions.

Appendix 1: Proof of Result 2

To convey the spirit of the proofs used for this kind of result, we give here a short demonstration. Substituting the form of the processes X(t), t ≥ 0, and λ(t), t ≥ 0, as in (3) and (2), we obtain:

$$0 = -(\rho_0 + \rho_1 \cdot X(t))\,F(x, t) + \frac{\partial F(x, t)}{\partial t} + \frac{\partial F(x, t)}{\partial x}\,(K_0 + K_1 \cdot X(t)) + \frac{1}{2}\sum_{i,j}\frac{\partial^2 F(x, t)}{\partial x_i\,\partial x_j}\,(a_{ij} + b_{ij} \cdot X(t)). \qquad (54)$$


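Inserting $F(x, t) = e^{\alpha(t) + \beta(t)\cdot x}$ into the PDE above and grouping the terms in x:

$$u(\cdot)\,x + v(\cdot) = 0$$

where

$$u(\cdot) = -\beta'(t) + \rho_1 - K_1^{\top}\beta(t) - \frac{1}{2}\,\beta(t)^{\top} b\,\beta(t) \qquad (55)$$

$$v(\cdot) = -\alpha'(t) + \rho_0 - K_0\,\beta(t) - \frac{1}{2}\,\beta(t)^{\top} a\,\beta(t) \qquad (56)$$

Since this identity must hold for every x, the separation of variables technique gives u(·) = 0 and v(·) = 0, so that α and β satisfy a Riccati equation with boundary conditions α(0) = 0 and β(0) = 0. □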

Appendix 2: Proof of Corollary 1 and 2

To obtain the PDF of a DSPP with a given affine intensity in closed form, we need, from Theorem 1, to solve:

$$\begin{cases} 0 = -\beta'(t) + \rho_1 - K_1^{\top}\beta(t) - \frac{1}{2}\,\beta(t)^{\top} b\,\beta(t) \\ 0 = -\alpha'(t) + \rho_0 - K_0\,\beta(t) - \frac{1}{2}\,\beta(t)^{\top} a\,\beta(t) \end{cases} \qquad (57)$$

The exact solution for each case can be obtained by inserting the appropriate parametrization:

1. Feller intensity: $K_0 = \kappa\theta$, $K_1 = -\kappa$, $a = 0$ and $b = \sigma^2$

2. O–U intensity: $K_0 = \kappa\theta$, $K_1 = -\kappa$, $a = \sigma^2$ and $b = 0$

into (57) and solving a Riccati equation for α and β with boundary conditions α(0) = 0 and β(0) = 0.

As the multidimensional case is merely a sum of decoupled one-dimensional solutions, its derivation is identical to the one described above. □
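As a numerical check of the one-factor Feller case, the sketch below integrates the Riccati system with a fourth-order Runge–Kutta scheme and compares it with the well-known Cox–Ingersoll–Ross closed form. It is written under explicit assumptions: ρ0 = 0 and ρ1 = 1 (so that λ = X), the equations are expressed in time-to-go τ with log A(0) = 0 and B(0) = 0, and the no-arrival probability is taken to be E[exp(−∫λ)] = exp(log A − B·λ0). The sign and time conventions may therefore differ from the α and β used in the text.

```python
import math


def riccati_rhs(b, kappa, theta, sigma):
    """Right-hand sides dB/dtau and d(log A)/dtau for the Feller case."""
    return 1.0 - kappa * b - 0.5 * sigma ** 2 * b * b, -kappa * theta * b


def integrate_riccati(kappa, theta, sigma, horizon, steps=2000):
    """Fourth-order Runge-Kutta integration from tau = 0 to tau = horizon."""
    h = horizon / steps
    b, log_a = 0.0, 0.0
    for _ in range(steps):
        k1b, k1a = riccati_rhs(b, kappa, theta, sigma)
        k2b, k2a = riccati_rhs(b + 0.5 * h * k1b, kappa, theta, sigma)
        k3b, k3a = riccati_rhs(b + 0.5 * h * k2b, kappa, theta, sigma)
        k4b, k4a = riccati_rhs(b + h * k3b, kappa, theta, sigma)
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
        log_a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
    return log_a, b


def closed_form(kappa, theta, sigma, horizon):
    """Closed-form solution of the same system (CIR bond-price formula)."""
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2)
    growth = math.exp(gamma * horizon) - 1.0
    denom = (gamma + kappa) * growth + 2.0 * gamma
    b = 2.0 * growth / denom
    log_a = (2.0 * kappa * theta / sigma ** 2) * math.log(
        2.0 * gamma * math.exp(0.5 * (kappa + gamma) * horizon) / denom)
    return log_a, b


def prob_no_arrival(lam0, kappa, theta, sigma, horizon):
    """P(no arrival over the horizon) = exp(log A - B * lambda_0)."""
    log_a, b = closed_form(kappa, theta, sigma, horizon)
    return math.exp(log_a - b * lam0)
```

With the one-factor Feller estimates of Table 2 (κ = 0.23, θ = 0.061, σ = 0.187), the numerical and closed-form solutions should agree to many decimal places over a one-unit horizon, which gives a quick sanity check of an implementation.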

Appendix 3: Conditions for the existence of the Laplace transform of $\int_t^T \lambda(t)\,dt$

Without going into the details of the Albanese–Lawi result, we describe below the class of (scalar) processes introduced in Albanese and Lawi (2004). It consists of diffusions X(t), t ≥ 0, solving the following SDE:

$$dX(t) = \frac{2h'(X(t))}{h(X(t))}\,\frac{A(X(t))^2}{R(X(t))}\,dt + \frac{\sqrt{2}\,A(X(t))}{\sqrt{R(X(t))}}\,dW(t). \qquad (58)$$

Here A(x), R(x) and h(x) are second-order polynomials and in addition:

1. A(x) belongs to the set $\{1,\ x,\ x(1 - x),\ x^2 + 1\}$ and R(X(t)) ≥ 0;

2. the function h(x) is a linear combination of hypergeometric functions of confluent type ${}_1F_1$ if $A(x) \in \{1, x\}$ and of Gaussian type ${}_2F_1$ if $A(x) \in \{x(1 - x),\ x^2 + 1\}$.

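A hypergeometric function in its general form may be written as

$${}_pF_q(\alpha_1, \ldots, \alpha_p;\ \gamma_1, \ldots, \gamma_q;\ z).$$

For $p \le q + 1$, $\gamma_j \in \mathbb{C} \setminus \mathbb{Z}_+$, it can be represented by using Taylor's expansion around z = 0:

$${}_pF_q(\alpha_1, \ldots, \alpha_p;\ \gamma_1, \ldots, \gamma_q;\ z) = \sum_{n=0}^{\infty} \frac{(\alpha_1)_n \cdots (\alpha_p)_n}{(\gamma_1)_n \cdots (\gamma_q)_n}\,\frac{z^n}{n!}.$$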
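The series above can be evaluated term by term, since the ratio of consecutive terms is Π(α_i + n) / Π(γ_j + n) · z / (n + 1). The following sketch sums a fixed number of terms; the function name and the truncation length are illustrative choices, and the partial sum is only meaningful where the series converges.

```python
def hypergeometric_series(alphas, gammas, z, terms=200):
    """Partial sum of the pFq Taylor series around z = 0."""
    total, term = 0.0, 1.0            # the n = 0 term equals 1
    for n in range(terms):
        total += term
        ratio = z / (n + 1)
        for a in alphas:
            ratio *= a + n            # (a)_{n+1} / (a)_n = a + n
        for g in gammas:
            ratio /= g + n
        term *= ratio
    return total
```

As a check, `hypergeometric_series([1.0, 1.0], [2.0], 0.5)` should be close to −log(1 − 0.5)/0.5, the known closed form of ${}_2F_1(1, 1; 2; z)$.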

As an example of the application of the Albanese–Lawi result, let the intensity process λ(t) follow a one-dimensional Feller diffusion. To this end, assume that the functions A(x), h(x) and R(x) are defined as:

$$A(x) = x, \qquad R(x) = \frac{2x}{\sigma^2}, \qquad h(x) = x^{a/\sigma^2}\,e^{-\frac{b}{\sigma^2}x} \qquad (59)$$

Substituting these functions into (58) and performing the change of parameters $a = \kappa\theta/\sigma^2$ and $b = \kappa/\sigma^2$, we conclude our example. □

References

Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd international symposium on information theory, pp 267–281

Albanese C, Lawi S (2004) Laplace transforms for integrals of Markov processes. Markov Process Rel Fields 11:677–724

Basu S, Dassios A (2002) A Cox process with log-normal intensity. Insur Math Econ 31:297–302

Bielecki TR, Rutkowski M (2002) Credit risk: modeling, valuation and hedging. Springer, Berlin

Bolder DJ (2001) Yield curve modelling at the Bank of Canada. Bank of Canada working paper 2001–2015

Bollerslev T, Wooldridge JM (1992) Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econ Rev 11:143–172

Bouzas PR, Valderrama MJ, Aguilera AM (2002) Forecasting a class of doubly Poisson processes. Stat Pap 43:507–523

Bouzas PR, Valderrama MJ, Aguilera AM (2006) On the characteristic functional of a doubly stochastic Poisson process: application to a narrow-band process. Appl Math Model 30:1021–1032

Bouzas PR, Ruiz-Fuentes N, Mantilla A, Valderrama MJ, Aguilera AM (2010) A Cox model for radioactive counting measure: inference on the intensity process. Chemometr Intell Lab 103:116–121

Brémaud P (1972) Point processes and queues: martingale dynamics. Springer, New York

Breusch TS (1979) Testing for autocorrelation in dynamic linear models. Aust Econ Pap 17:334–355

Byrd RH, Schnabel RB, Schultz GA (1987) A trust region algorithm for nonlinearly constrained optimization. SIAM J Numer Anal 24:1152–1170

Chen R-R, Scott L (2003) Multi-factor Cox–Ingersoll–Ross models of the term structure: estimates and tests from a Kalman filter model. J Real Estate Financ Econ 27:143–172

Cont R, Stoikov S, Talreja R (2010) A stochastic model for order book dynamics. Oper Res 10(3):549–563

Cox DR (1955) Some statistical methods connected with series of events. J R Stat Soc B 17:129–164

Cox J, Ingersoll J, Ross S (1985) A theory of the term structure of interest rates. Econometrica 53:385–408

Dalal S, McIntosh A (1994) When to stop testing for large software systems with changing code. IEEE Trans Softw Eng 20:318–323

Daley DJ, Vere-Jones D (1988) An introduction to theory of point processes. Springer, New York

De Genaro A (2011) Cox processes with affine intensity. PhD Thesis, Institute of Mathematics and Statistics-IME USP, Sao Paulo

Dassios A, Jang J (2003) Pricing of catastrophe reinsurance and derivatives using the Cox process with shot noise intensity. Financ Stoch 7:73–95

Dassios A, Jang J (2008) The distribution of the interval between events of a Cox process with shot noise intensity. J Appl Math Stoch Anal 2008:1–14

Dassios A, Jang J (2012) A double shot-noise process and its application in insurance. J Math Syst Sci 2:82–93

Duan J, Simonato J (1999) Estimating and testing exponential-affine term structure models by Kalman filter. Rev Quant Financ Acc 13:111–135

Duffie D, Pan J, Singleton K (2010) Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68(6):1343–1376

Duffie D, Filipovic D, Schachermayer W (2003) Affine processes and applications in finance. Ann Appl Probab 13:984–1053

Duffie D, Kan R (1996) A yield-factor model of interest rates. Math Financ 6:379–406

Duffie D, Singleton K (1999) Modeling term structures of defaultable bonds. Rev Financ Stud 12:687–720

Dyrting S (2004) Evaluating the noncentral chi-square distribution for the Cox–Ingersoll–Ross process. Comput Econ 24:35–50

Engle R, Russell J (1998) Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica 66:1127–1162

Engle R, Russell J (2000) The econometrics of ultra-high-frequency data. Econometrica 68–1:1–22

Feller W (1951) Two singular diffusion problems. Ann Math 54:173–182

Gail M, Santner T, Brown C (1980) An analysis of comparative carcinogenesis experiments based on multiple times to tumor. Biometrics 36:255–266

Geye A, Pichler S (1999) A state-space approach to estimate and test multifactor Cox–Ingersoll–Ross models of the term structure of interest rates. J Financ Res 22:107–130

Godfrey LG (1978) Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46:1293–1302

Grandell J (1976) Doubly stochastic process, 1st edn. Springer, New York

Grandell J (1991) Aspects of risk theory. Springer, New York

Grasselli M, Tebaldi C (2008) Solvable affine term structure models. Math Financ 18:135–153

Hamilton J (1994) Time series analysis. Princeton University Press, Princeton

Harvey A (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge

Johnson N, Kotz S (1970) Distributions in statistics: continuous univariate distributions, vol 2. Wiley, New York

Kallenberg O (1986) Random measures, 4th edn. Academic Press, London

Karatzas I, Shreve S (1991) Brownian motion and stochastic calculus, 2nd edn. Springer, New York

Karlin S, Taylor H (1981) A second course in stochastic process. Academic Press, New York

Kozachenko YuV, Pogorilyak OO (2008) A method of modelling log Gaussian Cox process. Theory Probab Math Stat 77:91–105

Lando D (1998) On Cox processes and credit risky securities. Rev Deriv Res 2:99–120

Ljung GM, Box G (1978) On a measure of lack of fit in time series models. Biometrika 62–2:297–303

Minozzo M, Centanni S (2012) Monte Carlo likelihood inference for marked doubly stochastic Poisson processes with intensity driven by marked point processes. Working Paper Series, Dept. Economics, University of Verona

Seal H (1983) The Poisson process: its failure in risk theory. Insur Math Econ 2–4:287–288. London: Croom Helm, 1979

Sankaran M (1963) Approximations to the non-central chi-square distribution. Biometrika 50:199–204

Snyder D, Miller M (1991) Random point processes in time and space, 2nd edn. Springer, New York

Vasicek O (1977) An equilibrium characterization of the term structure. J Financ Econ 5:177–188

Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypothesis. Econometrica 57:307–333

Wei G, Clifford P, Feng J (2002) Population death sequences and Cox processes driven by interacting Feller diffusions. J Phys A Math Gen 35:9–31

Zhang T, Kou S (2010) Nonparametric inference of doubly stochastic Poisson process data via kernel method. Ann Appl Stat 4:1913–1941
