TRANSCRIPT
Introduction to Rare Event Simulation
Brown University: Summer School on Rare Event Simulation
Jose Blanchet
Columbia University. Department of Statistics, Department of IEOR.
Blanchet (Columbia) 1 / 31
Goals / questions to address in this lecture
The need for rare event simulation by introducing several problems.
Why do we care about rare event simulation?
How does one measure the efficiency of rare event simulation algorithms?
Describe basic rare event simulation techniques.
Why are Rare Events Difficult to Assess?
Typically no closed forms.
But crudely implemented simulation might not be good.
[Figure: dartboard-style illustration; the target (red) region has probability 1/10.]
Why are Rare Events Difficult to Assess?
Relative mean squared error (MSE) per trial = stdev / mean = √(P(RED)(1 − P(RED))) / P(RED) ≈ 1/√P(RED).
Moral of the story: naive Monte Carlo needs N = O(1/P(Rare Event)) replications to obtain a given relative accuracy.
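The moral above can be checked numerically. A minimal Python sketch (all names and parameters are illustrative, not from the slides): with N·p ≈ 0.1 the crude estimator is usually exactly zero, while N on the order of 100/p or more pins the probability down.

```python
import random

def crude_mc(p, n_reps, seed=0):
    """Crude Monte Carlo estimate of a rare-event probability p:
    the sample mean of n_reps i.i.d. Bernoulli(p) indicators."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p for _ in range(n_reps))
    return hits / n_reps

# Relative error per trial is about 1/sqrt(p), so with N replications
# it is about 1/sqrt(N*p): fixing it at 10% forces N = O(1/p).
p = 1e-3
print(crude_mc(p, 100))        # N*p = 0.1: almost always 0, useless
print(crude_mc(p, 1_000_000))  # N*p = 1000: relative error ~ 3%
```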
Introduction to Rare Event Simulation
Rare event simulation = techniques for efficient estimation of rare-event probabilities AND conditional expectations given such rare events.
Arises in settings such as:
1 Finance and insurance,
2 Queueing networks (operations and communications),
3 Environmental and physical sciences,
4 Statistical inference and combinatorics,
5 Stochastic optimization.
Insurance Reserve Setup
Vi's i.i.d. (independent and identically distributed) claim sizes
τi's i.i.d. inter-arrival times
An = time of the n-th arrival
Constant premium p
Reserve process:
R(t) = b + pt − ∑_{j=1}^{N(t)} Vj
N(t) = # arrivals up to time t
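A minimal simulation of the reserve at claim epochs. The exponential inter-arrival and claim distributions, and all numbers, are illustrative assumptions, not taken from the slides.

```python
import random

def simulate_reserve(b, p, lam, claim_mean, horizon, seed=0):
    """Track R(t) = b + p*t - sum_{j<=N(t)} V_j at arrival epochs.

    Assumed (illustrative) model: inter-arrival times tau_i ~ Exp(lam),
    claim sizes V_j ~ Exp(1/claim_mean). Returns (ruined, ruin_time).
    """
    rng = random.Random(seed)
    t, reserve = 0.0, b
    while t < horizon:
        tau = rng.expovariate(lam)                    # tau_i
        t += tau
        reserve += p * tau                            # premium income
        reserve -= rng.expovariate(1.0 / claim_mean)  # claim V_j
        if reserve < 0:
            return True, t
    return False, None

# Positive safety loading: premium rate 1.2 vs. claims arriving at rate
# 1.0 with mean 1.0, so the reserve drifts up and ruin is rare for large b.
ruined, when = simulate_reserve(b=50.0, p=1.2, lam=1.0, claim_mean=1.0,
                                horizon=1000.0)
```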
Insurance Reserve Plot
[Figure: sample path of the risk reserve R(t), starting at b, increasing at rate p between claims, and jumping down at the arrival times A1, A2, A3, ...]
Insurance Reserve and Random Walks
Evaluating the reserve at arrival times, we get a random walk.
Suppose Y1, Y2, ... are i.i.d. and let
S(n) = b + Y1 + ... + Yn.
Then R(An) = S(n) is the reserve at arrival times, with Yn = pτn − Vn.
[Figure: sample path of R(t) with the embedded random walk S(n) marked at the arrival times A1, A2, A3.]
Question: What is the chance that inf{R(t) : t ≥ 0} < 0? And how does this happen?
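A crude Monte Carlo sketch of this ruin probability for the random-walk formulation, with illustrative Gaussian increments Yi ~ N(1, 1) (an assumption; the slides only fix the mean and variance). It is feasible only because the initial reserve b is small here.

```python
import random

def ruin_prob_crude(b, n_steps, n_reps, seed=0):
    """Crude Monte Carlo for P(min_n S(n) < 0) with
    S(n) = b + Y_1 + ... + Y_n and (illustrative) Y_i ~ N(1, 1).

    We truncate at n_steps: with drift +1, paths surviving that
    long essentially never ruin afterwards."""
    rng = random.Random(seed)
    ruins = 0
    for _ in range(n_reps):
        s = b
        for _ in range(n_steps):
            s += rng.gauss(1.0, 1.0)
            if s < 0:
                ruins += 1
                break
    return ruins / n_reps

# For large b the ruin probability decays exponentially in b and
# crude MC breaks down, exactly as on the earlier slide.
est = ruin_prob_crude(b=2.0, n_steps=50, n_reps=20000)
```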
How Does Ruin Occur with Light-Tailed Increments?
Yi's are Gaussian with EYi = 1.
[Figure: random walk conditioned on ruin.]
Light tails: exponential, Gamma, Gaussian, mixtures of these, etc.
Picture generated with Siegmund's 1976 algorithm.
Back to Heavy-tailed Insurance
[Figure: two panels, "Ruin with Pareto Claims" (reserve vs. time) and "Ruin with Powerlaw Type Increments" (random walk value vs. time), each over roughly 200 time steps with values ranging from about −200 to 400.]
Yi's are t-distributed with 4 degrees of freedom, EYi = 1, Var(Yi) = 1.
Heavy-tailed random walk conditioned on ruin.
Picture generated with Blanchet and Glynn's 2009 algorithm.
A Two-node Tandem Network
Poisson arrivals, exponential service times, and Markovian routing...
Example: A Simple Stochastic Network
Questions of interest:
How would the system perform if the TOTAL POPULATION reaches n inside a busy period?
How likely is this event under different parameters / designs?
Naive Benchmark: Crude Monte Carlo
Say λ = .1, µ1 = .5, µ2 = .4, p = .1
Overflow probability ≈ 10^−7
Each replication in crude Monte Carlo takes ≈ 10^−3 secs
Time to estimate with 10% precision ≈ 10^7/1000 = 10^4 secs ≈ 2.7 hours
What if you want to do sensitivity analysis for a wide range of parameters?
What about different network designs?
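A crude-Monte-Carlo sketch for a simplified version of the tandem network. This is an assumption-laden toy: Poisson arrivals and exponential services as on the slide, but the routing probability p is omitted (no feedback), and all function names are hypothetical. One replication simulates a busy period via the embedded jump chain of the Markov chain.

```python
import random

def overflow_in_busy_period(lam, mu1, mu2, level, seed=0):
    """One busy-period replication for a simplified two-node tandem
    queue (Poisson(lam) arrivals, Exp(mu1) and Exp(mu2) services,
    no feedback -- the slide's routing parameter p is omitted).

    Returns True iff the total population reaches `level` before the
    system empties, simulating the embedded jump chain of the CTMC.
    """
    rng = random.Random(seed)
    q1, q2 = 1, 0                      # busy period opens with one arrival
    while True:
        if q1 + q2 >= level:
            return True
        if q1 + q2 == 0:
            return False
        # competing exponential clocks: pick the next event by its rate
        rates = (lam, mu1 if q1 > 0 else 0.0, mu2 if q2 > 0 else 0.0)
        u = rng.random() * sum(rates)
        if u < rates[0]:
            q1 += 1                    # external arrival
        elif u < rates[0] + rates[1]:
            q1, q2 = q1 - 1, q2 + 1    # node-1 completion feeds node 2
        else:
            q2 -= 1                    # departure from node 2

def crude_overflow_prob(lam, mu1, mu2, level, n_reps, seed=0):
    hits = sum(overflow_in_busy_period(lam, mu1, mu2, level, seed=seed + i)
               for i in range(n_reps))
    return hits / n_reps

# With the slide's rates and a high level, the answer is ~1e-7 and a
# crude run needs hours, as computed above; a low level is cheap:
est = crude_overflow_prob(0.1, 0.5, 0.4, level=5, n_reps=20000)
```

For level 2 this toy model can be solved by hand (a two-state first-step analysis gives exactly 1/3), which makes a convenient sanity check on the simulator.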
Rare Events in Networks
Question: How is the TOTAL CONTENT OF THE SYSTEM most likely to reach a given level in an operation cycle?
Say λ = .2, µ1 = .3, µ2 = .5... How can the total content reach 50?
Naive Monte Carlo takes ≈ 115 days.
Each picture below took ≈ .01 seconds.
Rare Events in Networks
λ = .1, µ1 = .3, µ2 = .5
[Figure: plot of the conditional path to overflow in the (X1, X2) plane.]
Rare Events in Networks
λ = .1, µ1 = .5, µ2 = .2
[Figure: plot of the conditional path to overflow in the (X1, X2) plane.]
Rare Events in Networks
λ = .1, µ1 = .4, µ2 = .4
[Figure: plot of the conditional path to overflow in the (X1, X2) plane.]
Several Notions of Efficiency
Assume the estimator Z is unbiased: E(Z) = P(A), and we consider the regime P(A) → 0.
Logarithmic efficiency (weak efficiency, or asymptotic optimality):
lim_{P(A)→0} log E(Z²) / (2 log P(A)) = 1.
Bounded relative error (strong efficiency):
E(Z²) / P(A)² = O(1) as P(A) → 0.
Efficiency of Naive Monte Carlo
If Z = I(A) then E(Z²) = E(Z) = P(A).
Therefore
lim_{P(A)→0} log P(A) / (2 log P(A)) = 1/2.
How to design efficient rare event simulation estimators? Importance sampling and other variance reduction techniques...
Graphical Interpretation of Importance Sampling
Importance sampling (I.S.): sample from the important region and correct via the likelihood ratio.
RED AREA ≈ PROPORTION OF DARTS IN RED AREA × 1/9
[Figure: two unit squares; in the second, darts are thrown only into the smaller subregion (of area 1/9) containing the red area.]
Importance Sampling
Goal: Estimate P(A) > 0.
Choose P̃ and simulate ω from it.
The importance sampling (I.S.) estimator per trial is
I.S. Estimator = L(ω) I(ω ∈ A),
where L(ω) is the likelihood ratio (i.e., L(ω) = dP(ω)/dP̃(ω)).
NOTE: P̃(·) is called a change-of-measure.
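A minimal concrete instance of this recipe, under illustrative assumptions: A = {X > c} for X ~ N(0,1), with P̃ a mean shift onto the rare region. Every sample from P̃ is reweighted by the likelihood ratio L.

```python
import math
import random

def is_tail_prob(c, n_reps, seed=0):
    """Importance sampling for P(A) with A = {X > c}, X ~ N(0,1)
    (a standard textbook illustration, not from the slides).

    Change of measure P-tilde: X ~ N(c, 1), the mean shifted onto
    the rare region. The likelihood ratio is
        L(x) = phi(x) / phi(x - c) = exp(-c*x + c*c/2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        x = rng.gauss(c, 1.0)                      # omega drawn from P-tilde
        if x > c:                                  # I(omega in A)
            total += math.exp(-c * x + c * c / 2)  # L(omega)
    return total / n_reps

c = 5.0
est = is_tail_prob(c, 100_000)
exact = 0.5 * math.erfc(c / math.sqrt(2))          # closed form, ~ 2.9e-7
```

Crude Monte Carlo would need on the order of 10^7 samples just to see this event once; the weighted estimator above resolves it to a few percent with 10^5 samples.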
Importance Sampling
Suppose we choose P̃(·) = P(·|A). Then
L(ω) = dP(ω)/dP̃(ω) = dP(ω) / [I(ω ∈ A) dP(ω)/P(A)],
so L(ω) I(ω ∈ A) = P(A).
The estimator has zero variance, but requires knowledge of P(A).
Lesson: Try choosing P̃(·) close to P(·|A)!
Large Deviations of the Empirical Mean
Suppose X1, X2, ... are i.i.d. and H(θ) = log E exp(θX) < ∞ for θ in a neighborhood of the origin.
Assume that E(Xi) = 0.
Suppose that for each a > 0, there is θa such that H′(θa) = a.
Consider the problem of estimating via simulation
P(S(n) > na),
where S(n) = X1 + ... + Xn.
Exponential Tilting
Suppose that Xi has density f(·) and define
fθ(x) = exp(θx − H(θ)) f(x).
Note that
log Eθ(exp(ηX)) = log ∫ fθ(x) exp(ηx) dx
= log ∫ f(x) exp((η + θ)x − H(θ)) dx
= H(η + θ) − H(θ).
Therefore Eθ(X) = H′(θ) and Varθ(X) = H″(θ).
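The identity Eθ(X) = H′(θ) can be checked by sampling. An illustrative choice (not from the slides): X ~ Exp(1), for which H(θ) = −log(1 − θ) for θ < 1, and the tilted density works out to be exponential again with rate 1 − θ.

```python
import random

# Numerical check of E_theta(X) = H'(theta) for X ~ Exp(1).
# Here H(theta) = -log(1 - theta), theta < 1, and
# f_theta(x) = exp(theta*x - H(theta)) * exp(-x)
#            = (1 - theta) * exp(-(1 - theta)*x),
# i.e. the tilt of Exp(1) is exactly Exp(1 - theta).

def tilted_exp_sample(theta, n, seed=0):
    """Sample the exponential tilt of Exp(1): exactly Exp(1 - theta)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 - theta) for _ in range(n)]

theta = 0.5
xs = tilted_exp_sample(theta, 200_000)
mean_emp = sum(xs) / len(xs)
h_prime = 1.0 / (1.0 - theta)   # H'(theta); equals 2 at theta = 0.5
```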
Exponential Tilting in Action
Cramér's theorem (strong version, sketch):
P(Sn > na) = ∫ f(x1)...f(xn) I(sn > na) dx_{1:n}
= ∫_{sn > na} fθa(x1)...fθa(xn) exp(−θa sn + nH(θa)) dx_{1:n}
= exp(−n[aθa − H(θa)]) Eθa[exp(−θa(Sn − na)) I(Sn > na)]
= exp(−nI(a)) Eθa[exp(−θa(Sn − na)) I(Sn > na)].
Use the CLT and approximate (Sn − na) by N(0, nH″(θa)) to obtain
Eθa[exp(−θa(Sn − na)) I(Sn > na)] ≈ 1 / (n^{1/2} (2πH″(θa))^{1/2} θa).
Why is Exponential Tilting Useful?
Theorem. Under the assumptions imposed,
P(X1 ∈ dx1 | Sn > na) → Pθa(X1 ∈ dx1).
In simple words, exponential tilting approximates the zero-variance (optimal!) change-of-measure!
Proof.
P(X1 ∈ dx1 | Sn > na) = [P(Sn−1 > na − x1) / P(Sn > na)] f(x1) dx1.
Now apply Cramér's theorem with a ←− (na − x1)/(n − 1) and n ←− n − 1 in the numerator and simplify, observing that
θ((na − x1)/(n − 1)) = θ(a + (a − x1)/(n − 1)) ≈ θ(a) + (1/H″(θa)) · (a − x1)/(n − 1).
First Asymptotically Optimal Rare Event Simulation Estimator
Recall the identity used in Cramér's theorem:
P(Sn > na) = ∫ f(x1)...f(xn) I(sn > na) dx_{1:n}
= ∫_{sn > na} fθa(x1)...fθa(xn) exp(−θa sn + nH(θa)) dx_{1:n}
= Eθa[exp(−θa Sn + nH(θa)) I(Sn > na)].
IMPORTANCE SAMPLING ESTIMATOR:
Z = exp(−θa Sn + nH(θa)) I(Sn > na),
with X1, ..., Xn simulated using the density fθa(·).
Let us now verify that Z is asymptotically efficient!
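The estimator Z becomes fully concrete for Gaussian increments, an illustrative special case: for Xi ~ N(0,1), H(θ) = θ²/2, so θa = a and the tilted density fθa is simply N(a, 1). The exact answer P(Sn > na) = P(N(0, n) > na) is available for comparison.

```python
import math
import random

def tilted_estimator(n, a, n_reps, seed=0):
    """The importance-sampling estimator Z for P(S_n > n*a), made
    concrete for i.i.d. X_i ~ N(0,1) (illustrative choice).

    Then H(theta) = theta^2/2, so H'(theta_a) = a gives theta_a = a,
    and sampling from f_{theta_a} means X_i ~ N(a, 1).
    """
    rng = random.Random(seed)
    theta = a                          # theta_a
    total = 0.0
    for _ in range(n_reps):
        s = sum(rng.gauss(a, 1.0) for _ in range(n))   # S_n under the tilt
        if s > n * a:
            # Z = exp(-theta_a*S_n + n*H(theta_a)) * I(S_n > n*a)
            total += math.exp(-theta * s + n * theta * theta / 2)
    return total / n_reps

n, a = 25, 0.8
est = tilted_estimator(n, a, 50_000)
exact = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2))  # S_n ~ N(0, n)
```

Here exact ≈ 3·10^−5, so crude Monte Carlo would need millions of replications for the same precision the tilted estimator reaches with 50,000.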
First Asymptotically Optimal Rare Event Simulation Estimator
IMPORTANCE SAMPLING ESTIMATOR:
Z = exp(−θa Sn + nH(θa)) I(Sn > na),
with X1, ..., Xn simulated using the density fθa(·).
Therefore,
Eθa(Z²) = Eθa[exp(−2θa Sn + 2nH(θa)) I(Sn > na)]
= exp(−2nI(a)) Eθa[exp(−2θa(Sn − na)) I(Sn > na)].
Thus
log Eθa(Z²) / (2 log P(Sn > na)) = 2nI(a)/(2nI(a)) + o(1) → 1.
How to Simulate under Exponential Tilting
Key is to identify: Hθ(η) = H(η + θ) − H(θ).
Example 1 (Gaussian): if X ∼ N(µ, 1),
H(θ) = θ²/2 + µθ, Hθ(η) = η²/2 + (θ + µ)η.
Thus, under fθ, X ∼ N(µ + θ, 1).
Example 2 (Poisson): if X ∼ Poisson(λ), then
H(θ) = λ(exp(θ) − 1), Hθ(η) = λ exp(θ) · (exp(η) − 1).
Thus, under fθ, X ∼ Poisson(λ exp(θ)).
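Example 2 in code: because the tilted law is again Poisson, sampling under fθ reduces to sampling Poisson(λ exp(θ)). A minimal sketch (the sampler choice is an assumption; any Poisson sampler would do), with an empirical check that the tilted mean matches λ exp(θ).

```python
import math
import random

def sample_tilted_poisson(lam, theta, n, seed=0):
    """Sample the exponential tilt of Poisson(lam).

    By the identity H_theta(eta) = lam*exp(theta)*(exp(eta) - 1), the
    tilted law is just Poisson(lam*exp(theta)), so we sample that
    directly (Knuth's product-of-uniforms method, fine for small rates).
    """
    rng = random.Random(seed)
    lam_t = lam * math.exp(theta)
    threshold = math.exp(-lam_t)
    out = []
    for _ in range(n):
        k, prod = 0, rng.random()
        while prod > threshold:       # multiply uniforms until <= e^{-lam_t}
            k += 1
            prod *= rng.random()
        out.append(k)
    return out

lam, theta = 2.0, 0.7
xs = sample_tilted_poisson(lam, theta, 100_000)
mean_emp = sum(xs) / len(xs)          # should be near lam*exp(theta)
```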
Conclusions
1 Rare event simulation is a powerful numerical tool applicable in a wide range of examples.
2 Importance sampling can be used to construct efficient estimators.
3 The best importance sampler is the conditional distribution given the event of interest.
4 Efficient algorithms come basically in two flavors: logarithmically and strongly efficient.
5 Exponential tilting can be used to approximate the conditional distribution given the event of interest.