Terminating Statistical Analysis
By Dr. Jason Merrick
Simulation with Arena — Intermediate Modeling and Terminating Statistical Analysis C6/2
Statistical Analysis of Output Data: Terminating Simulations
• Random input leads to random output (RIRO)
• Run a simulation (once): what does it mean?
– Was this run “typical” or not?
– Variability from run to run (of the same model)?
• Need statistical analysis of output data
• Time frame of simulations
– Terminating: Specific starting, stopping conditions
– Steady-state: Long-run (technically forever)
– Here: Terminating
Point and Interval Estimation
• Suppose we are trying to estimate an output measure $E[Y] = \theta$ based upon a simulated sample $Y_1, \ldots, Y_n$
• We come up with an estimate $\hat{\theta}$
– For instance, the sample mean $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
• How good is this estimate?
– Unbiased: $E[\hat{\theta}] = \theta$
– Low variance (possibly minimum variance)
– Consistent
– Confidence interval
T-distribution
• The t-statistic is given by $t = \dfrac{\hat{\theta} - \theta}{\hat{\sigma}(\hat{\theta})}$
– If the $Y_1, \ldots, Y_n$ are normally distributed and $\hat{\theta} = \bar{Y}$, then the t-statistic is t-distributed
– If the $Y_1, \ldots, Y_n$ are not normally distributed but $\hat{\theta} = \bar{Y}$, then the t-statistic is approximately t-distributed thanks to the Central Limit Theorem
• Requires a reasonably large sample size n
– We require an estimate of the variance of $\hat{\theta}$, denoted $\hat{\sigma}^2(\hat{\theta})$
T-distribution Confidence Interval
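• An approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is then $\left[\hat{\theta} - t_{f,\,1-\alpha/2}\,\hat{\sigma}(\hat{\theta}),\ \hat{\theta} + t_{f,\,1-\alpha/2}\,\hat{\sigma}(\hat{\theta})\right]$
– The center of the confidence interval is $\hat{\theta}$
– The half-width of the confidence interval is $t_{f,\,1-\alpha/2}\,\hat{\sigma}(\hat{\theta})$
– $t_{f,\,1-\alpha/2}$ is the $100(1-\alpha/2)\%$ percentile of a t-distribution with f degrees of freedom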
[Figure: axes are Parameter Value and Sample Repetition]
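One way to see what an approximate confidence interval means is to build it from many repeated samples and count how often it contains the true value. The sketch below does that under assumptions that are not in the slides: independent normal observations, a hypothetical true mean of 10 and standard deviation of 2, samples of size 10, and the tabled critical value 2.262 for 9 degrees of freedom at the 95% level. The function names are illustrative only.

```python
import random
import statistics

N = 10          # sample size per repetition (assumption)
T_CRIT = 2.262  # tabled t value for f = N - 1 = 9 and 1 - alpha/2 = 0.975

def t_interval(sample, t_crit):
    """Return (lower, upper) of the t interval for the mean of an independent sample."""
    center = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / len(sample) ** 0.5
    return center - half_width, center + half_width

def coverage(true_mean=10.0, sd=2.0, repetitions=2000, seed=1):
    """Fraction of repeated-sample intervals that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sd) for _ in range(N)]
        lower, upper = t_interval(sample, T_CRIT)
        hits += lower <= true_mean <= upper
    return hits / repetitions

print(coverage())
```

With these assumptions the printed fraction should land close to 0.95, since about 95 out of every 100 intervals built this way are expected to cover the true mean.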
T-distribution Confidence Interval
• Case 1: Y1,…,Yn are independent
– This is the case when you are making n independent replications of the simulation
• Terminating simulations
• Try to force this with steady-state simulations
– Compute your estimate $\hat{\theta}$ and then compute the sample variance $s^2 = \dfrac{\sum_{i=1}^{n} (Y_i - \hat{\theta})^2}{n - 1}$
– $s^2$ is an unbiased estimator of the population variance, so $s^2/n$ is an unbiased estimator of $\sigma^2(\hat{\theta})$ with f = n-1 degrees of freedom
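The Case 1 recipe (point estimate, sample variance, then s²/n with n-1 degrees of freedom) can be sketched in a few lines. The five output values below are hypothetical, and the critical value 2.776 is the tabled t value for 4 degrees of freedom at the 95% level; it is passed in by hand so the sketch needs only the standard library.

```python
import math

def replication_interval(values, t_crit):
    """Return (center, half_width) from independent replication outputs."""
    n = len(values)
    theta_hat = sum(values) / n                                # point estimate
    s2 = sum((y - theta_hat) ** 2 for y in values) / (n - 1)   # sample variance
    half_width = t_crit * math.sqrt(s2 / n)                    # t * sigma_hat(theta_hat)
    return theta_hat, half_width

# Hypothetical outputs from n = 5 independent replications.
outputs = [4.0, 6.0, 5.0, 7.0, 3.0]
center, half = replication_interval(outputs, t_crit=2.776)
print(center - half, center + half)
```

For this data the center is 5.0, s² is 2.5, and s²/n is 0.5, so the half-width is 2.776 times the square root of 0.5.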
T-distribution Confidence Interval
• Case 2: Y1,…,Yn are not independent
– This is the case when you are using data generated within a single simulation run
• Sequences of observations in long-run steady-state simulations
– $s^2/n$ is a biased estimator of $\sigma^2(\hat{\theta})$
– $Y_1, \ldots, Y_n$ is an auto-correlated sequence, or time series
– Suppose that our point estimator for $\theta$ is $\hat{\theta} = \bar{Y}$. A general result from mathematical statistics is $\sigma^2(\hat{\theta}) = \dfrac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{cov}(Y_i, Y_j)$
T-distribution Confidence Interval
• Case 2: Y1,…,Yn are not independent
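– For n observations there are $n^2$ covariances to estimate
– However, most simulations are covariance stationary, that is, $\mathrm{cov}(Y_i, Y_{i+k}) = \mathrm{cov}(Y_j, Y_{j+k})$ for all i, j and k
– Recall that k is the lag, so for a given lag, the covariance remains the same throughout the sequence
– If this is the case then there are n-1 lagged covariances to estimate, denoted $\gamma_k$. With $\sigma^2 = \gamma_0$ and lag-k autocorrelation $\rho_k = \gamma_k / \sigma^2$,
$\sigma^2(\hat{\theta}) = \dfrac{\sigma^2}{n} \left[ 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \dfrac{k}{n} \right) \rho_k \right]$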
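Under covariance stationarity the double sum over all pairs collapses to a single sum over lags, because each lag k occurs n-k times on either side of the diagonal. The sketch below checks numerically that the two forms agree. The autocovariances (4 times 0.6 to the power k, for 50 observations) are hypothetical values chosen only for illustration.

```python
def var_mean_double_sum(gamma):
    """(1/n^2) * sum_i sum_j cov(Y_i, Y_j), where cov depends only on |i - j|."""
    n = len(gamma)
    total = sum(gamma[abs(i - j)] for i in range(n) for j in range(n))
    return total / n ** 2

def var_mean_lag_form(gamma):
    """(sigma^2 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho_k]."""
    n = len(gamma)
    sigma2 = gamma[0]                      # lag-0 covariance is the variance
    rho = [g / sigma2 for g in gamma]      # lag-k autocorrelations
    c = 1 + 2 * sum((1 - k / n) * rho[k] for k in range(1, n))
    return sigma2 / n * c

# Hypothetical autocovariances gamma_0 .. gamma_{n-1} for n = 50 observations.
gamma = [4.0 * 0.6 ** k for k in range(50)]
print(var_mean_double_sum(gamma), var_mean_lag_form(gamma))
```

The two printed values should match up to floating-point rounding.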
Time-Series Examples
[Figure: four time-series plots of Observed Value against Time or Observations]
• Positively correlated sequence with lag 1
• Positively correlated sequence with lags 1 & 2
• Negatively correlated sequence with lag 1
• Positively correlated, covariance non-stationary sequence
T-distribution Confidence Interval
• Case 2: Y1,…,Yn are not independent
– What is the effect of this bias term? $E[s^2/n] = B\,\sigma^2(\hat{\theta})$, where $B = \dfrac{n/c - 1}{n - 1}$, $c = 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \dfrac{k}{n} \right) \rho_k$, and $\rho_k$ is the lag-k autocorrelation
– For primarily positively correlated sequences B < 1, so the half-width of the confidence interval will be too small
• Overstating the precision => make conclusions you shouldn’t
– For primarily negatively correlated sequences B > 1, so the half-width of the confidence interval will be too large
• Underestimating the precision => don’t make conclusions you should
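To get a feel for the size of the bias factor B, the sketch below evaluates c and B for a hypothetical autocorrelation pattern in which the lag-k autocorrelation is phi to the power k. That pattern is an assumption made for illustration and does not come from the slides. A positive phi gives a primarily positively correlated sequence, a negative phi a primarily negatively correlated one.

```python
def bias_factor(rho, n):
    """B = (n/c - 1) / (n - 1), with c = 1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)."""
    c = 1 + 2 * sum((1 - k / n) * rho(k) for k in range(1, n))
    return (n / c - 1) / (n - 1)

n = 100  # number of within-run observations (assumption)
for phi in (0.5, 0.0, -0.5):
    b = bias_factor(lambda k: phi ** k, n)
    print(f"phi = {phi:+.1f}  B = {b:.3f}")
```

With phi = 0 the observations are uncorrelated, c equals 1, and B equals 1, so s²/n is unbiased. The positive case gives B below 1 and the negative case gives B above 1, matching the two bullets above.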
Strategy for Terminating Simulations
• For terminating case, make IID replications
– Simulate module: Number of Replications field
– Check both boxes for Initialization Between Reps.
– Get multiple independent Summary Reports
– Different random seeds for each replication
• How many replications?
– Trial and error (now)
– Approximate no. for acceptable precision
– Sequential sampling
• Save summary statistics (e.g. average, variance) across replications
– Statistics Module, Outputs Area, save to files
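Outside Arena, the same strategy can be sketched in plain Python: each replication starts from the same empty state, uses its own random seed, and contributes one summary value. The model below is a hypothetical toy single-server queue, not the Arena model from the course. The arrival and service rates, the customer count, and the seeding scheme (base seed plus replication index) are all assumptions.

```python
import random
import statistics

def one_replication(seed, customers=200, arrival_rate=0.8, service_rate=1.0):
    """Average wait in queue for a toy single-server queue that starts empty."""
    rng = random.Random(seed)   # separate generator: fresh stream, fresh state
    wait = 0.0                  # the first customer finds the system empty
    total_wait = 0.0
    for _ in range(customers):
        total_wait += wait
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        # Lindley recursion: wait of the next customer in line.
        wait = max(0.0, wait + service - interarrival)
    return total_wait / customers

def run_replications(n, base_seed=12345):
    """One summary value per replication, each with a different seed."""
    return [one_replication(base_seed + r) for r in range(n)]

averages = run_replications(10)
print(statistics.fmean(averages), statistics.variance(averages))
```

The list `averages` plays the role of the saved across-replication statistics: one value per replication, ready for the confidence-interval formulas.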
Half Width and Number of Replications
• Prefer smaller confidence intervals: precision
• Notation:
– n = no. replications
– $\bar{Y}$ = sample mean
– s = sample standard deviation
– $t_{n-1,\,1-\alpha/2}$ = critical value from tables
• Confidence interval: $\bar{Y} \pm t_{n-1,\,1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$
• Half-width = $t_{n-1,\,1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$
– Want this to be “small,” say < h where h is prespecified
Half Width and Number of Replications
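Replication | Within-run observations | Replication average
1 | $Y_{11}, Y_{12}, \ldots, Y_{1m_1}$ | $\bar{Y}_1$
2 | $Y_{21}, Y_{22}, \ldots, Y_{2m_2}$ | $\bar{Y}_2$
... | ... | ...
n | $Y_{n1}, Y_{n2}, \ldots, Y_{nm_n}$ | $\bar{Y}_n$
Across replications: $\bar{Y}$ and $s^2$
• To improve the half-width $t_{n-1,\,1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$, we can
– Increase the length of each simulation run and so increase the $m_i$
– What does increasing the run length do?
– Increase the number of replications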
Half Width and Number of Replications (cont’d.)
• Set half-width = h, solve for n: $n = t_{n-1,\,1-\alpha/2}^2\,\dfrac{s^2}{h^2}$
• Not really solved for n (t, s depend on n)
• Approximation:
– Replace t by z, corresponding normal critical value
– Pretend that current s will hold for larger samples
– Get $n \cong z_{1-\alpha/2}^2\,\dfrac{s^2}{h^2}$, where s = sample standard deviation from “initial” number $n_0$ of replications
• Easier but different approximation: $n \cong n_0\,\dfrac{h_0^2}{h^2}$, where $h_0$ = half width from “initial” number $n_0$ of replications
• n grows quadratically as h decreases.
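Both approximations are easy to evaluate. The sketch below uses the standard library's normal quantile for z and rounds the result up to a whole number of replications. The rounding is a choice made here, not something stated on the slide. The input values (s = 5, h = 2, n0 = 10, h0 = 3, h = 1.5) are hypothetical.

```python
import math
from statistics import NormalDist

def n_from_normal_approx(s, h, alpha=0.05):
    """n ~= z_{1-alpha/2}^2 * s^2 / h^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # normal critical value
    return math.ceil(z ** 2 * s ** 2 / h ** 2)

def n_from_initial_half_width(n0, h0, h):
    """n ~= n0 * h0^2 / h^2, rounded up."""
    return math.ceil(n0 * h0 ** 2 / h ** 2)

print(n_from_normal_approx(s=5.0, h=2.0))
print(n_from_initial_half_width(n0=10, h0=3.0, h=1.5))
```

In the second call, halving the target half-width from 3.0 to 1.5 raises the requirement from 10 to 40 replications, which is the quadratic growth noted above.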