Estimation of the score vector and observed information matrix in intractable models
Arnaud Doucet (University of Oxford)
Pierre E. Jacob (University of Oxford)
Sylvain Rubenthaler (Universite Nice Sophia Antipolis)
April 15th, 2015
Pierre Jacob Derivative estimation 1/ 35
Outline
1 Context
2 General results
3 Monte Carlo
4 Hidden Markov models
1 Context
Motivation
Derivatives of the likelihood help with optimization and sampling.
For many models they are not available.
One can resort to approximation techniques.
Motivation
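Let $\ell(\theta)$ denote a “log-likelihood”, but it could be any function.
Let $L(\theta) = \exp \ell(\theta)$, the “likelihood”.
Assume that we have access to estimators $\hat{L}(\theta)$ of $L(\theta)$ such that
$$\mathbb{E}[\hat{L}(\theta)] = L(\theta) \quad\text{and}\quad \mathbb{V}[\hat{L}(\theta)] = \frac{L(\theta)^2\, v(\theta)}{M}.$$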
Finite difference
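First derivative:
$$\hat{\ell}^{(1)}(\theta^\star) = \frac{\log \hat{L}(\theta^\star + h) - \log \hat{L}(\theta^\star - h)}{2h}$$
converges to $\nabla\ell(\theta^\star)$ when $M \to \infty$ and $h \to 0$.
Second derivative:
$$\hat{\ell}^{(2)}(\theta^\star) = \frac{\log \hat{L}(\theta^\star + h) - 2\log \hat{L}(\theta^\star) + \log \hat{L}(\theta^\star - h)}{h^2}$$
converges to $\nabla^2\ell(\theta^\star)$ when $M \to \infty$ and $h \to 0$.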
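The two finite-difference estimators can be sketched as follows. This is a toy illustration, not the setting of the talk: the log-likelihood `log_lik`, the log-normal noise model in `lik_hat`, and the values of `h`, `M` and `s` are all assumptions chosen so that the estimator is unbiased with relative variance of order 1/M.

```python
import numpy as np

def log_lik(theta):
    # Toy log-likelihood (assumption): l(theta) = -theta^2 / 2, so the score is -theta.
    return -0.5 * theta**2

def lik_hat(theta, M, rng, s=0.5):
    # Unbiased noisy estimator of L(theta): the exact value times the mean of M
    # log-normal weights with expectation 1, so the relative variance is v / M.
    w = rng.lognormal(mean=-0.5 * s**2, sigma=s, size=M)
    return np.exp(log_lik(theta)) * w.mean()

def fd_score(theta, h, M, rng):
    # Central finite difference of the estimated log-likelihood.
    up = np.log(lik_hat(theta + h, M, rng))
    down = np.log(lik_hat(theta - h, M, rng))
    return (up - down) / (2 * h)

def fd_hessian(theta, h, M, rng):
    # Second-order central difference of the estimated log-likelihood.
    up, mid, down = (np.log(lik_hat(t, M, rng)) for t in (theta + h, theta, theta - h))
    return (up - 2 * mid + down) / h**2

rng = np.random.default_rng(0)
print(fd_score(1.0, h=0.1, M=100_000, rng=rng))    # exact score is -1
print(fd_hessian(1.0, h=0.5, M=100_000, rng=rng))  # exact second derivative is -1
```

Shrinking `h` at fixed `M` inflates the noise, since the difference of noisy log-estimates is divided by `h` or `h**2`.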
Finite difference
Optimal rate of convergence for the first derivative:
$h \sim M^{-1/6}$ leading to $\mathrm{MSE} \sim M^{-2/3}$.
For the second derivative:
$h \sim M^{-1/8}$ leading to $\mathrm{MSE} \sim M^{-1/2}$.
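A heuristic way to recover these rates is the usual bias-variance trade-off. This is a sketch, assuming $\ell$ is smooth enough for central differences to have bias of order $h^2$ and that the variance of $\log \hat{L}$ is of order $1/M$; the constants $C_1, \dots, C_4$ are unspecified.

```latex
\mathrm{MSE}_1(h) \approx \underbrace{C_1 h^4}_{\text{bias}^2} + \underbrace{\frac{C_2}{M h^2}}_{\text{variance}}
\;\Longrightarrow\; h \sim M^{-1/6}, \quad \mathrm{MSE}_1 \sim M^{-2/3},
\qquad
\mathrm{MSE}_2(h) \approx C_3 h^4 + \frac{C_4}{M h^4}
\;\Longrightarrow\; h \sim M^{-1/8}, \quad \mathrm{MSE}_2 \sim M^{-1/2}.
```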
2 General results
Iterated Filtering
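Given a log-likelihood $\ell$ and a point $\theta^\star$, consider a prior
$$\theta \sim \mathcal{N}(\theta^\star, \tau^2).$$
Posterior expectation when the prior variance goes to zero
First-order moments give first-order derivatives:
$$\left| \tau^{-2}\left(\mathbb{E}[\theta \mid Y] - \theta^\star\right) - \nabla\ell(\theta^\star) \right| \le C\tau.$$
Phrased simply,
$$\frac{\text{posterior mean} - \text{prior mean}}{\text{prior variance}} \approx \text{score}.$$
Result from Ionides, Bhadra, Atchade, King, Iterated filtering, 2011.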
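The "posterior mean minus prior mean over prior variance" reading can be checked numerically in one dimension. The sketch below assumes a hypothetical smooth log-likelihood `log_lik` and computes the posterior mean by Gauss-Hermite quadrature rather than by Monte Carlo, so that only the bias in $\tau$ is visible.

```python
import numpy as np

def posterior_mean(log_lik, theta_star, tau, n_nodes=40):
    # Probabilists' Gauss-Hermite nodes integrate against exp(-x^2 / 2), so
    # theta = theta_star + tau * x covers the N(theta_star, tau^2) prior.
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    theta = theta_star + tau * x
    ll = log_lik(theta)
    p = w * np.exp(ll - ll.max())   # unnormalised posterior weights on the nodes
    p /= p.sum()
    return np.sum(p * theta)

def log_lik(theta):
    # Hypothetical log-likelihood; its score is cos(theta) - 0.2 * theta.
    return np.sin(theta) - 0.1 * theta**2

theta_star = 0.5
exact_score = np.cos(theta_star) - 0.2 * theta_star
for tau in (0.4, 0.2, 0.1, 0.05):
    shift = (posterior_mean(log_lik, theta_star, tau) - theta_star) / tau**2
    print(tau, shift, abs(shift - exact_score))
```

The printed error should shrink as `tau` decreases.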
Stein’s lemma
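Stein’s lemma states that
$$\theta \sim \mathcal{N}(\theta^\star, \tau^2)$$
if and only if for any function $g$ such that $\mathbb{E}[|\nabla g(\theta)|]$ is finite,
$$\mathbb{E}[(\theta - \theta^\star)\, g(\theta)] = \tau^2\, \mathbb{E}[\nabla g(\theta)].$$
If we choose the function $g : \theta \mapsto \exp \ell(\theta) / Z$ with $Z = \mathbb{E}[\exp \ell(\theta)]$ and apply Stein’s lemma we obtain
$$\frac{1}{Z}\,\mathbb{E}[\theta \exp \ell(\theta)] - \theta^\star = \frac{\tau^2}{Z}\,\mathbb{E}[\nabla\ell(\theta) \exp \ell(\theta)]
\;\Leftrightarrow\; \tau^{-2}\left(\mathbb{E}[\theta \mid Y] - \theta^\star\right) = \mathbb{E}[\nabla\ell(\theta) \mid Y].$$
Notation: $\mathbb{E}[\varphi(\theta) \mid Y] := \mathbb{E}[\varphi(\theta) \exp \ell(\theta)] / Z$.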
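Stein’s identity is easy to verify numerically. The sketch below uses Gauss-Hermite quadrature for the Gaussian expectation. The test function `g = sin` and the values of `theta_star` and `tau` are arbitrary choices for illustration.

```python
import numpy as np

x, w = np.polynomial.hermite_e.hermegauss(60)
w = w / w.sum()                  # normalise the weights to a N(0, 1) expectation

theta_star, tau = 0.3, 0.7       # hypothetical prior mean and standard deviation
theta = theta_star + tau * x     # quadrature nodes for N(theta_star, tau^2)

g, dg = np.sin, np.cos           # test function and its derivative

lhs = np.sum(w * (theta - theta_star) * g(theta))   # E[(theta - theta_star) g(theta)]
rhs = tau**2 * np.sum(w * dg(theta))                # tau^2 E[g'(theta)]
print(lhs, rhs)
```

Both sides agree up to quadrature error, whatever smooth `g` is plugged in.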
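Stein’s lemma
For the second derivative, we consider
$$h : \theta \mapsto (\theta - \theta^\star) \exp \ell(\theta) / Z.$$
Then
$$\mathbb{E}\left[(\theta - \theta^\star)^2 \mid Y\right] = \tau^2 + \tau^4\, \mathbb{E}\left[\nabla^2\ell(\theta) + \nabla\ell(\theta)^2 \mid Y\right].$$
Adding and subtracting terms also yields
$$\tau^{-4}\left(\mathbb{V}[\theta \mid Y] - \tau^2\right) = \mathbb{E}\left[\nabla^2\ell(\theta) \mid Y\right] + \left\{\mathbb{E}\left[\nabla\ell(\theta)^2 \mid Y\right] - \left(\mathbb{E}[\nabla\ell(\theta) \mid Y]\right)^2\right\}.$$
. . . but what we really want is
$$\nabla\ell(\theta^\star), \quad \nabla^2\ell(\theta^\star)$$
and not
$$\mathbb{E}[\nabla\ell(\theta) \mid Y], \quad \mathbb{E}\left[\nabla^2\ell(\theta) \mid Y\right].$$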
Core Idea
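The prior is a normal distribution $\mathcal{N}(\theta^\star, \tau^2)$.
The prior moments behave like:
$$\mathbb{E}_\tau[\varphi(\Theta)] = \varphi(\theta^\star) + \frac{\tau^2}{2}\nabla^2\varphi(\theta^\star) + O(\tau^4).$$
The posterior moments behave like:
$$\mathbb{E}_\tau[\varphi(\Theta) \mid Y] = \varphi(\theta^\star) + \frac{\tau^2}{2}\left(\nabla^2\varphi(\theta^\star) + 2\nabla\varphi(\theta^\star)\nabla\ell(\theta^\star)\right) + O(\tau^4).$$
Our arXived proof suffers from an overdose of Taylor expansions.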
Proof: prior moments
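Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a four times continuously differentiable function. Assume that there exist a finite constant $M$ and $\delta > 0$ such that
$$\left|\frac{d^4\varphi(\theta)}{d\theta^4}\right| \le M \quad\text{for all } \theta \in B(\theta^\star, \delta).$$
Cut the expectation into two parts:
$$\mathbb{E}_\tau[\varphi(\Theta)] = \int_{B(\theta^\star,\delta)} \varphi(\theta)\, p_\tau(\theta)\, d\theta + \int_{B^c(\theta^\star,\delta)} \varphi(\theta)\, p_\tau(\theta)\, d\theta.$$
The second term is $o(\tau^k)$ for any power $k$.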
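Proof: prior moments
Let’s deal with the first term:
$$\int_{B(\theta^\star,\delta)} \varphi(\theta)\, p_\tau(d\theta).$$
Taylor expansion:
$$\forall \theta \in B(\theta^\star,\delta) \quad \varphi(\theta) = \varphi(\theta^\star) + \sum_{k=1,2,3} \frac{d^k\varphi(\theta^\star)}{d\theta^k}\, \frac{1}{k!}\, (\theta - \theta^\star)^k + R_3(\theta, \theta^\star).$$
The Gaussian prior integrates any $(\theta - \theta^\star)^k$ with odd $k$ to zero over $B(\theta^\star, \delta)$, so
$$\int_{B(\theta^\star,\delta)} \varphi(\theta)\, p_\tau(d\theta) = \varphi(\theta^\star) + \frac{d^2\varphi(\theta^\star)}{d\theta^2} \int_{B(\theta^\star,\delta)} \frac{1}{2}(\theta - \theta^\star)^2\, p_\tau(d\theta) + \int_{B(\theta^\star,\delta)} R_3(\theta, \theta^\star)\, p_\tau(d\theta).$$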
Proof: prior moments
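For
$$\frac{d^2\varphi(\theta^\star)}{d\theta^2} \int_{B(\theta^\star,\delta)} \frac{1}{2}(\theta - \theta^\star)^2\, p_\tau(d\theta)$$
we “complete the integral” over all $\mathbb{R}$ and subtract an integral over $B^c(\theta^\star, \delta)$ which is $o(\tau^k)$ for any $k$. We are left with
$$\frac{d^2\varphi(\theta^\star)}{d\theta^2}\, \frac{\tau^2}{2} + o(\tau^k) \quad\text{for any } k.$$
For
$$\int_{B(\theta^\star,\delta)} R_3(\theta, \theta^\star)\, p_\tau(d\theta)$$
we use the assumption to say that for all $\theta$ in $B(\theta^\star, \delta)$ there is $\tilde{\theta}$ in $B(\theta^\star, \delta)$ such that
$$R_3(\theta, \theta^\star) = \frac{d^4\varphi(\tilde{\theta})}{d\theta^4}\, \frac{(\theta - \theta^\star)^4}{4!}.$$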
Proof: prior moments
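Since $\left|d^4\varphi(\theta)/d\theta^4\right|$ is bounded by $M$ on $B(\theta^\star, \delta)$ by assumption:
$$\left|\int_{B(\theta^\star,\delta)} R_3(\theta, \theta^\star)\, p_\tau(d\theta)\right| \le M \int_{B(\theta^\star,\delta)} \frac{(\theta - \theta^\star)^4}{4!}\, p_\tau(d\theta) \le M \int_{\mathbb{R}} \frac{(\theta - \theta^\star)^4}{4!}\, p_\tau(d\theta) = \tau^4 \times C.$$
Combining all the terms, we obtain
$$\mathbb{E}_\tau[\varphi(\Theta)] = \varphi(\theta^\star) + \frac{\tau^2}{2}\nabla^2\varphi(\theta^\star) + O(\tau^4).$$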
Proof: posterior moments
We want to obtain the posterior moments:
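$$\mathbb{E}_\tau[\varphi(\Theta) \mid Y] = \varphi(\theta^\star) + \frac{\tau^2}{2}\left(\nabla^2\varphi(\theta^\star) + 2\nabla\varphi(\theta^\star)\nabla\ell(\theta^\star)\right) + O(\tau^4).$$
We write:
$$\mathbb{E}_\tau[\varphi(\Theta) \mid Y] = \frac{\mathbb{E}_\tau[\varphi(\Theta) L(\Theta)]}{\mathbb{E}_\tau[L(\Theta)]}.$$
Then we apply the prior moment expansion for $\varphi \times L$ and for $L$, and we simplify the ratio of two expansions.
We need to assume that the likelihood is four times continuously differentiable, with bounded fourth derivatives around $\theta^\star$.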
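The simplification of the ratio can be spelled out in one dimension. This is a sketch, assuming $L(\theta^\star) > 0$, with every function and derivative evaluated at $\theta^\star$ and primes denoting derivatives in $\theta$.

```latex
\frac{\mathbb{E}_\tau[\varphi(\Theta) L(\Theta)]}{\mathbb{E}_\tau[L(\Theta)]}
= \frac{\varphi L + \frac{\tau^2}{2}\left(\varphi'' L + 2\varphi' L' + \varphi L''\right) + O(\tau^4)}
       {L + \frac{\tau^2}{2} L'' + O(\tau^4)}
= \varphi + \frac{\tau^2}{2}\left(\varphi'' + 2\varphi'\,\frac{L'}{L}\right) + O(\tau^4),
\qquad \frac{L'}{L} = \nabla\ell .
```

The $\varphi L''$ term in the numerator cancels against the $L''$ term in the denominator, which is why only $\nabla\ell$ survives at order $\tau^2$.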
Main results
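In general, with a prior $\mathcal{N}(\theta^\star, \tau^2\Sigma)$, when $\Sigma$ is fixed and $\tau$ goes to zero,
$$\tau^{-2}\Sigma^{-1}\, \mathbb{E}_{\text{post}}[\Theta - \theta^\star] = \nabla\ell(\theta^\star) + O(\tau^2),$$
$$\tau^{-4}\Sigma^{-1}\left(\mathbb{V}_{\text{post}}[\Theta] - \tau^2\Sigma\right)\Sigma^{-1} = \nabla^2\ell(\theta^\star) + O(\tau^2).$$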
Extension of Iterated Filtering
Posterior variance when the prior variance goes to zero
Second-order moments give second-order derivatives:
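$$\left| \tau^{-4}\left(\mathrm{Cov}[\theta \mid Y] - \tau^2\right) - \nabla^2\ell(\theta^\star) \right| \le C\tau^2.$$
Phrased simply,
$$\frac{\text{posterior variance} - \text{prior variance}}{\text{prior variance}^2} \approx \text{hessian}.$$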
Result from Doucet, Jacob, Rubenthaler on arXiv, 2013.
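The variance version can be checked the same way as the mean version. The sketch below is one-dimensional, uses a hypothetical log-likelihood `log_lik`, and computes the posterior variance by Gauss-Hermite quadrature so that Monte Carlo noise does not mask the small difference between posterior and prior variance.

```python
import numpy as np

def posterior_moments(log_lik, theta_star, tau, n_nodes=40):
    # Quadrature nodes for the N(theta_star, tau^2) prior, reweighted by the likelihood.
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    theta = theta_star + tau * x
    ll = log_lik(theta)
    p = w * np.exp(ll - ll.max())
    p /= p.sum()
    mean = np.sum(p * theta)
    var = np.sum(p * (theta - mean) ** 2)
    return mean, var

def log_lik(theta):
    # Hypothetical log-likelihood; its second derivative is -sin(theta) - 0.2.
    return np.sin(theta) - 0.1 * theta**2

theta_star = 0.5
exact_hessian = -np.sin(theta_star) - 0.2
for tau in (0.4, 0.2, 0.1):
    _, var = posterior_moments(log_lik, theta_star, tau)
    print(tau, (var - tau**2) / tau**4, exact_hessian)
```

Note the division by `tau**4`: any error in the posterior variance is amplified accordingly, which is what makes the Monte Carlo version of this estimator delicate.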
Proximity mapping
Given a real function $f$ and a point $\theta^\star$, consider for any $\tau^2 > 0$
$$\theta \mapsto f(\theta) \exp\left\{-\frac{1}{2\tau^2}(\theta - \theta^\star)^2\right\}.$$
Figure: Example for $f : \theta \mapsto \exp(-|\theta|)$ and three values of $\tau^2$.
Proximity mapping
Proximity mapping
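The $\tau^2$-proximity mapping is defined by
$$\operatorname{prox}_f : \theta_0 \mapsto \operatorname*{argmax}_{\theta \in \mathbb{R}}\; f(\theta) \exp\left\{-\frac{1}{2\tau^2}(\theta - \theta_0)^2\right\}.$$
Moreau approximation
The $\tau^2$-Moreau approximation is defined by
$$f_{\tau^2} : \theta_0 \mapsto C \sup_{\theta \in \mathbb{R}}\; f(\theta) \exp\left\{-\frac{1}{2\tau^2}(\theta - \theta_0)^2\right\},$$
where $C$ is a normalizing constant.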
Proximity mapping
Figure: $\theta \mapsto f(\theta)$ and $\theta \mapsto f_{\tau^2}(\theta)$ for three values of $\tau^2$.
Proximity mapping
Property
Those objects are such that
$$\frac{\operatorname{prox}_f(\theta_0) - \theta_0}{\tau^2} = \nabla \log f_{\tau^2}(\theta_0) \xrightarrow[\tau^2 \to 0]{} \nabla \log f(\theta_0).$$
Moreau (1962), Fonctions convexes duales et points proximaux dans un espace Hilbertien.
Pereyra (2013), Proximal Markov chain Monte Carlo algorithms.
Proximity mapping
Bayesian interpretation
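If $f$ is seen as a likelihood function then
$$\theta \mapsto f(\theta) \exp\left\{-\frac{1}{2\tau^2}(\theta - \theta_0)^2\right\}$$
is an unnormalized posterior density function based on a Normal prior with mean $\theta_0$ and variance $\tau^2$.
Hence
$$\frac{\operatorname{prox}_f(\theta_0) - \theta_0}{\tau^2} \xrightarrow[\tau^2 \to 0]{} \nabla \log f(\theta_0)$$
can be read
$$\frac{\text{posterior mode} - \text{prior mode}}{\text{prior variance}} \approx \text{score}.$$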
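The mode-based statement can be illustrated with $f : \theta \mapsto \exp(-|\theta|)$, the function used in the figure. In the sketch below the proximity mapping is found by brute-force grid search. The point `theta0 = 1.5`, the grid width and the grid size are arbitrary assumptions.

```python
import numpy as np

def prox_grid(log_f, theta0, tau2, half_width=5.0, n=200_001):
    # Brute-force argmax of log f(theta) - (theta - theta0)^2 / (2 tau2) on a grid.
    grid = np.linspace(theta0 - half_width, theta0 + half_width, n)
    objective = log_f(grid) - (grid - theta0) ** 2 / (2.0 * tau2)
    return grid[np.argmax(objective)]

log_f = lambda t: -np.abs(t)   # f(theta) = exp(-|theta|)
theta0 = 1.5                   # hypothetical point, away from the kink at 0

for tau2 in (1.0, 0.5, 0.1):
    shift = (prox_grid(log_f, theta0, tau2) - theta0) / tau2
    print(tau2, shift)         # the derivative of log f at theta0 is -1
```

For this particular $f$ the maximiser is the soft-thresholding of `theta0`, so the shift equals the score as soon as `tau2` is smaller than `abs(theta0)`.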
3 Monte Carlo
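Moment shift estimator
How to estimate
$$S(\theta^\star) = \tau^{-2}\Sigma^{-1}\, \mathbb{E}_{\text{post}}[\Theta - \theta^\star] = \nabla\ell(\theta^\star) + O(\tau^2)\;?$$
Importance sampling estimator, with $\theta_1, \dots, \theta_N$ drawn from the prior:
$$S_N(\theta^\star) = \tau^{-2}\Sigma^{-1}\left(\sum_{i=1}^N W_i\theta_i - \theta^\star\right)
\quad\text{where}\quad W_i = \frac{\hat{L}(\theta_i)}{\sum_{j=1}^N \hat{L}(\theta_j)}.$$
Turns out it is better to use
$$S_N(\theta^\star) = \tau^{-2}\Sigma^{-1}\left(\sum_{i=1}^N W_i\theta_i - \frac{1}{N}\sum_{i=1}^N \theta_i\right).$$
Any idea why?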
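The two centrings can be compared empirically. The sketch below is one-dimensional with $\Sigma = 1$, uses a hypothetical log-likelihood, and evaluates the likelihood exactly instead of through a noisy estimator. All of these are simplifying assumptions. One plausible reading of the question is that the prior sample mean acts as a control variate for the weighted mean.

```python
import numpy as np

def log_lik(theta):
    # Hypothetical log-likelihood; its score is cos(theta) - 0.2 * theta.
    return np.sin(theta) - 0.1 * theta**2

def moment_shift_scores(theta_star, tau, N, rng):
    theta = theta_star + tau * rng.standard_normal(N)   # draws from the prior
    ll = log_lik(theta)
    W = np.exp(ll - ll.max())
    W /= W.sum()                                         # normalised weights
    post_mean = np.sum(W * theta)
    naive = (post_mean - theta_star) / tau**2            # centred at the prior mean
    centred = (post_mean - theta.mean()) / tau**2        # centred at the sample mean
    return naive, centred

rng = np.random.default_rng(0)
runs = np.array([moment_shift_scores(0.5, 0.1, 1000, rng) for _ in range(500)])
print("exact score:", np.cos(0.5) - 0.1)
print("means (naive, centred):", runs.mean(axis=0))
print("variances (naive, centred):", runs.var(axis=0))
```

In this toy setting both versions target the same quantity, and the variance of the sample-mean-centred version is markedly smaller.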
Moment shift estimator
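For the second-order derivative, we want to estimate:
$$\tau^{-4}\Sigma^{-1}\left(\mathbb{V}_{\text{post}}[\Theta] - \tau^2\Sigma\right)\Sigma^{-1} = \nabla^2\ell(\theta^\star) + O(\tau^2).$$
We propose:
$$\tau^{-4}\Sigma^{-1}\left[\sum_{i=1}^N W_i\left(\theta_i - \sum_{j=1}^N W_j\theta_j\right)^2 - \frac{1}{N}\sum_{i=1}^N\left(\theta_i - \frac{1}{N}\sum_{j=1}^N \theta_j\right)^2\right]\Sigma^{-1}.$$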
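A one-dimensional sketch of this second-order estimator, with $\Sigma = 1$, is given below. The log-likelihood is hypothetical and evaluated exactly, and the values of `tau` and `N` are arbitrary. Since the difference of variances is divided by $\tau^4$, a large `N` is used.

```python
import numpy as np

def log_lik(theta):
    # Hypothetical log-likelihood; its second derivative is -sin(theta) - 0.2.
    return np.sin(theta) - 0.1 * theta**2

def moment_shift_hessian(theta_star, tau, N, rng):
    theta = theta_star + tau * rng.standard_normal(N)    # draws from the prior
    ll = log_lik(theta)
    W = np.exp(ll - ll.max())
    W /= W.sum()                                          # normalised weights
    post_var = np.sum(W * (theta - np.sum(W * theta)) ** 2)
    prior_var = np.mean((theta - theta.mean()) ** 2)      # empirical prior variance
    return (post_var - prior_var) / tau**4

rng = np.random.default_rng(0)
print(moment_shift_hessian(0.5, 0.3, 200_000, rng), -np.sin(0.5) - 0.2)
```

Using the empirical prior variance instead of the exact $\tau^2$ mirrors the sample-mean centring used for the score.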
Moment shift estimator
We retrieve the same rates of convergence as in finite difference, e.g. if $\tau \sim N^{-1/6}$, the MSE is in $N^{-2/3}$.
Then why bother? Why has better performance been observed in practice in some scenarios?
Different behaviour in the non-asymptotic regime.
For any function $v$ and tuning parameter $M$,
$$\mathbb{V}\left(S_N(\theta^\star)\right) \le \tau^{-2} C_N,$$
where $C_N$ depends neither on $v$ nor on $M$.
4 Hidden Markov models
Hidden Markov models
Figure: Graph representation of a general hidden Markov model, with hidden states $X_0, X_1, \dots, X_T$, observations $y_1, \dots, y_T$ and parameter $\theta$.
Hidden Markov models
Direct application of the previous results
1 Prior distribution $\mathcal{N}(\theta_0, \sigma^2)$ on the parameter $\theta$.
2 The derivative approximations involve $\mathbb{E}[\theta \mid Y]$ and $\mathrm{Cov}[\theta \mid Y]$.
3 Posterior moments for HMMs can be estimated by particle MCMC, SMC$^2$, ABC or your favourite method.
Ionides et al. proposed another approach.
Iterated Filtering
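Modification of the model: $\theta$ is time-varying.
The associated log-likelihood is
$$\ell(\theta_{1:T}) = \log p(y_{1:T}; \theta_{1:T}) = \log \int_{\mathcal{X}^{T+1}} \prod_{t=1}^T g(y_t \mid x_t, \theta_t)\; \mu(dx_1 \mid \theta_1) \prod_{t=2}^T f(dx_t \mid x_{t-1}, \theta_t).$$
Introducing $\theta \mapsto (\theta, \theta, \dots, \theta) := \theta^{[T]} \in \mathbb{R}^T$, we have
$$\ell(\theta^{[T]}) = \ell(\theta)$$
and the chain rule yields
$$\frac{d\ell(\theta)}{d\theta} = \sum_{t=1}^T \frac{\partial \ell(\theta^{[T]})}{\partial \theta_t}.$$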
Iterated Filtering
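Choice of prior on $\theta_{1:T}$:
$$\theta_1 = \theta_0 + V_1, \quad V_1 \sim \tau^{-1}\kappa\left\{\tau^{-1}(\cdot)\right\}$$
$$\theta_{t+1} - \theta_0 = \rho\,(\theta_t - \theta_0) + V_{t+1}, \quad V_{t+1} \sim \sigma^{-1}\kappa\left\{\sigma^{-1}(\cdot)\right\}$$
Choose $\sigma^2$ such that $\tau^2 = \sigma^2 / (1 - \rho^2)$. Covariance of the prior on $\theta_{1:T}$:
$$\Sigma_T = \tau^2 \begin{pmatrix}
1 & \rho & \cdots & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \ddots & \rho^{T-3} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\rho^{T-1} & \cdots & \cdots & \rho & 1
\end{pmatrix}.$$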
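The prior covariance has entries $\tau^2 \rho^{|s-t|}$, and its inverse is tridiagonal, which is what later allows the score estimator to be written in closed form. A small sketch follows. The function name and the values of `T`, `tau` and `rho` are illustrative assumptions.

```python
import numpy as np

def ar1_prior_covariance(T, tau, rho):
    # Sigma_T[s, t] = tau^2 * rho^|s - t|
    idx = np.arange(T)
    return tau**2 * rho ** np.abs(idx[:, None] - idx[None, :])

T, tau, rho = 5, 0.2, 0.8
Sigma = ar1_prior_covariance(T, tau, rho)
precision = np.linalg.inv(Sigma)

# Rescaled by tau^2 (1 - rho^2), the precision has diagonal (1, 1 + rho^2, ..., 1 + rho^2, 1),
# first off-diagonals equal to -rho, and zeros elsewhere.
print(np.round(precision * tau**2 * (1 - rho**2), 6))
```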
Iterated Filtering
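Applying the general results for this prior yields, with $|x| = \sum_{t=1}^T |x_t|$:
$$\left| \nabla\ell(\theta_0^{[T]}) - \Sigma_T^{-1}\left(\mathbb{E}[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right) \right| \le C\tau^2.$$
Moreover we have
$$\left| \sum_{t=1}^T \frac{\partial \ell(\theta^{[T]})}{\partial \theta_t} - \sum_{t=1}^T \left\{\Sigma_T^{-1}\left(\mathbb{E}[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right)\right\}_t \right|
\le \sum_{t=1}^T \left| \frac{\partial \ell(\theta^{[T]})}{\partial \theta_t} - \left\{\Sigma_T^{-1}\left(\mathbb{E}[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right)\right\}_t \right|$$
and
$$\frac{d\ell(\theta)}{d\theta} = \sum_{t=1}^T \frac{\partial \ell(\theta^{[T]})}{\partial \theta_t}.$$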
Iterated Filtering
The estimator of the score is thus given by
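$$\sum_{t=1}^T \left\{\Sigma_T^{-1}\left(\mathbb{E}[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right)\right\}_t$$
which can be reduced to
$$S_{\tau,\rho,T}(\theta_0) = \frac{\tau^{-2}}{1 + \rho}\left[(1 - \rho)\left\{\sum_{t=2}^{T-1} \mathbb{E}(\theta_t \mid Y)\right\} - \left\{(1 - \rho)\,T + 2\rho\right\}\theta_0 + \mathbb{E}(\theta_1 \mid Y) + \mathbb{E}(\theta_T \mid Y)\right],$$
given the form of $\Sigma_T^{-1}$. Note that in the quantities $\mathbb{E}(\theta_t \mid Y)$, $Y = Y_{1:T}$ is the complete dataset, thus those expectations are with respect to the smoothing distribution.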
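The reduction can be verified numerically by comparing the direct sum over the components of $\Sigma_T^{-1}(\mathbb{E}[\theta_{1:T} \mid Y] - \theta_0^{[T]})$ with the closed form. In the sketch below the vector `m` of smoothing means is made up for illustration. In practice it would come from a smoothing algorithm.

```python
import numpy as np

def score_direct(post_means, theta0, tau, rho):
    # Sum of the components of Sigma_T^{-1} (E[theta_{1:T} | Y] - theta0).
    T = len(post_means)
    idx = np.arange(T)
    Sigma = tau**2 * rho ** np.abs(idx[:, None] - idx[None, :])
    return np.sum(np.linalg.solve(Sigma, post_means - theta0))

def score_reduced(post_means, theta0, tau, rho):
    # Closed form obtained from the tridiagonal structure of Sigma_T^{-1}.
    T = len(post_means)
    inner = (1 - rho) * post_means[1:-1].sum()
    bracket = inner - ((1 - rho) * T + 2 * rho) * theta0 + post_means[0] + post_means[-1]
    return bracket / (tau**2 * (1 + rho))

rng = np.random.default_rng(0)
m = 0.3 + 0.01 * rng.standard_normal(10)   # hypothetical values of E[theta_t | Y]
print(score_direct(m, 0.3, 0.1, 0.5), score_reduced(m, 0.3, 0.1, 0.5))
```

Setting `rho = 0` in either function gives $\tau^{-2} \sum_t (\mathbb{E}(\theta_t \mid Y) - \theta_0)$.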
Iterated Filtering
If ρ = 1, then the parameters follow a random walk:
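$$\theta_1 = \theta_0 + \mathcal{N}(0, \tau^2) \quad\text{and}\quad \theta_{t+1} = \theta_t + \mathcal{N}(0, \sigma^2).$$
In this case Ionides et al. proposed the estimator
$$S_{\tau,\sigma,T} = \tau^{-2}\left(\mathbb{E}(\theta_T \mid Y) - \theta_0\right)$$
as well as
$$S^{(\text{bis})}_{\tau,\sigma,T} = \sum_{t=1}^T V_{P,t}^{-1}\left(\theta_{F,t} - \theta_{F,t-1}\right)$$
with $V_{P,t} = \mathrm{Cov}[\theta_t \mid y_{1:t-1}]$ and $\theta_{F,t} = \mathbb{E}[\theta_t \mid y_{1:t}]$.
Those expressions only involve expectations with respect to filtering distributions.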
Iterated Filtering
If ρ = 0, then the parameters are i.i.d:
$$\theta_1 = \theta_0 + \mathcal{N}(0, \tau^2) \quad\text{and}\quad \theta_{t+1} = \theta_0 + \mathcal{N}(0, \tau^2).$$
In this case the expression of the score estimator reduces to
$$S_{\tau,T} = \tau^{-2} \sum_{t=1}^T \left(\mathbb{E}(\theta_t \mid Y) - \theta_0\right)$$
which involves smoothing distributions.
There’s only one parameter $\tau^2$ to choose for the prior.
However smoothing for general hidden Markov models is difficult, and typically resorts to “fixed lag approximations”.
Numerical results
Linear Gaussian state space model where the ground truth is available through the Kalman filter.
$$X_0 \sim \mathcal{N}(0, 1), \quad X_t = \rho X_{t-1} + \mathcal{N}(0, V), \quad Y_t = \eta X_t + \mathcal{N}(0, W).$$
Generate $T = 100$ observations and set $\rho = 0.9$, $V = 0.7$, $\eta = 0.9$ and $W = 0.1, 0.2, 0.4, 0.9$.
240 independent runs, matching the computational costs between methods in terms of number of calls to the transition kernel.
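For this model the exact log-likelihood, and hence a reference value for the score, can be computed with a scalar Kalman filter. The sketch below simulates data and evaluates the log-likelihood. It uses $W = 0.1$, the first of the listed values, as an assumption, and obtains a reference score in $\rho$ by a finite difference on the exact log-likelihood. It is not the experimental code of the talk.

```python
import numpy as np

def simulate(T, rho, V, eta, W, rng):
    x = rng.standard_normal()                    # X_0 ~ N(0, 1)
    y = np.empty(T)
    for t in range(T):
        x = rho * x + np.sqrt(V) * rng.standard_normal()
        y[t] = eta * x + np.sqrt(W) * rng.standard_normal()
    return y

def kalman_loglik(y, rho, V, eta, W):
    m, P = 0.0, 1.0                              # mean and variance of X_0
    ll = 0.0
    for obs in y:
        m, P = rho * m, rho**2 * P + V           # predict X_t given y_{1:t-1}
        S = eta**2 * P + W                       # variance of the predicted observation
        resid = obs - eta * m
        ll += -0.5 * (np.log(2 * np.pi * S) + resid**2 / S)
        K = P * eta / S                          # Kalman gain
        m, P = m + K * resid, (1 - K * eta) * P  # update with y_t
    return ll

rng = np.random.default_rng(1)
y = simulate(100, rho=0.9, V=0.7, eta=0.9, W=0.1, rng=rng)
h = 1e-5
score_rho = (kalman_loglik(y, 0.9 + h, 0.7, 0.9, 0.1)
             - kalman_loglik(y, 0.9 - h, 0.7, 0.9, 0.1)) / (2 * h)
print(kalman_loglik(y, 0.9, 0.7, 0.9, 0.1), score_rho)
```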
Numerical results
Figure: 240 runs for Iterated Smoothing and Finite Difference. Panels: Finite Difference (RMSE against h) and Iterated Smoothing (RMSE against tau), for parameters 1 to 4; RMSE axis from 10 to 10000.
Numerical results
Figure: 240 runs for Iterated Smoothing and Iterated Filtering. Panels: Iterated Smoothing, Iterated Filtering 1 and Iterated Filtering 2 (RMSE against tau), for parameters 1 to 4; RMSE axis from 10 to 10000.
Bibliography
Main references:
Inference for nonlinear dynamical systems, Ionides, Breto, King, PNAS, 2006.
Iterated filtering, Ionides, Bhadra, Atchade, King, Annals of Statistics, 2011.
Efficient iterated filtering, Lindstrom, Ionides, Frydendall, Madsen, 16th IFAC Symposium on System Identification.
Derivative-Free Estimation of the Score Vector and Observed Information Matrix, Doucet, Jacob, Rubenthaler, 2013 (on arXiv).