simulation modelling in natural resource management
1
SIMULATION MODELLING IN NATURAL RESOURCE MANAGEMENT
Photo: Kendra Holt
For i = minE To maxE Step Estep
    Cells(Erow, Ecol) = i
    SolverOk SetCell:="$B$26", MaxMinVal:=2
    SolverSolve True
    neg_log_L(i) = Cells(L_row, L_col)
    If neg_log_L(i) <= min_L Then
        min_L = neg_log_L(i)
        Ebest = i
    End If
Next i
for (i = 1; i <= nobs; i++)
{
  ad_begin_funnel();
  x = S(i);
  y = R(i);
  lim1 = x * mfexp(-6 * sx);
  lim2 = x * mfexp(8 * sx);
  p = adromb(&model_parameters::fz, lim1, lim2, nsteps);
  resid += log(p + 1.e-300);
}
2
Statistical Simulation
Statistics
Power analysis
Bootstrapping
Randomization
Simulation/Estimation for testing statistical methods
Fitting models to data
3
Statistics
The ability to simplify means to eliminate the unnecessary so that the necessary may speak
Hans Hofmann
Unfortunately, acquisition of statistical knowledge is blocked by a formidable wall of mathematics
It is equally unfortunate that much of this mathematical basis relies on relatively strong assumptions about such things as large sample sizes and asymptotic normality
On the positive side, modern computing power allows us to relax many of these assumptions, particularly those that commonly arise in natural resource applications
4
Statistics
Three main objectives of Statistics:
1. How should data be collected in order to:
   a. test hypotheses?
   b. estimate parameters for a model?
   c. make decisions?
2. How should data be summarized and analyzed?
3. How accurate are the summaries and analyses? Do they adequately reflect the “truth”?
Data Collection
Analysis
Inference
5
Statistical Inference
A sample of 15 (X,Y) pairs from the universe of 100 possible pairs
6
Statistical Inference
A sample of 15 (X,Y) pairs from the universe of 100 possible pairs
The complete universe of 100 possible pairs
“The Truth”
Statistics tries to infer the picture on the right from the picture on the left
7
Statistical Inference – the mean
X = {76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9}
mean(X) = 83.1   sd(X) = 10.2
How accurate is the mean of X?
Accuracy formula for the mean (and only the “mean”):
    se(x̄) = s/√n,  i.e.  se²(x̄) = s²/n
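These summary values and the standard error can be recomputed directly from the sample. A quick Python sketch (Python is used for the illustrations here; the deck’s own code is R and VBA):

```python
import math

# The nine X values from the slide
X = [76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9]

n = len(X)
mean = sum(X) / n
# sample variance (n - 1 denominator) and standard deviation
var = sum((x - mean) ** 2 for x in X) / (n - 1)
sd = math.sqrt(var)
# standard error of the mean: se = s / sqrt(n)
se = sd / math.sqrt(n)

print(round(mean, 1), round(sd, 1), round(se, 1))  # → 83.1 10.2 3.4
```

The standard error shrinks with √n, which is why even a modest sample pins down the mean fairly well.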
8
Bootstrapping
That was easy! But try that for the correlation coefficient
A statistical formula might be available, but it will require assumptions about the distribution of the data
9
Bootstrapping – what’s the big idea?
X = (x₁, x₂, …, xₙ)   Original dataset
X*1  X*2  X*3  …   Bootstrap samples
s(X*1)  s(X*2)  s(X*3)  …   Bootstrap replications of the function s(X*)
There is no real limit to what the function s() represents!!
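The scheme above is easy to sketch in code: resample with replacement, apply s(), repeat. A minimal Python illustration (the statistic and seed are arbitrary placeholders):

```python
import random

random.seed(1)  # reproducible resampling

def bootstrap(X, s, B=1000):
    """Return B bootstrap replications s(X*) of the statistic s()."""
    n = len(X)
    reps = []
    for _ in range(B):
        # resample n observations with replacement
        Xstar = [random.choice(X) for _ in range(n)]
        reps.append(s(Xstar))
    return reps

# s() can be any statistic -- here, the sample mean
X = [76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9]
reps = sorted(bootstrap(X, lambda x: sum(x) / len(x)))
print(reps[24], reps[974])  # rough 95% percentile interval
```

Swapping the lambda for any other function of the sample bootstraps that statistic instead, with no new mathematics.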
10
Bootstrapping
There is no real limit to what the function s() represents!!
For example, s() could be the mean or another summary statistic. It could be a correlation coefficient, a regression estimate, or the parameter estimates for a non-linear model
The data could be multivariate, in which case more than one response variable is observed for each sample. Thus, s() could be a Principal Component statistic.
Most, if not all, statistical functions are already implemented in software like R, SAS, Systat, S-Plus, etc. Therefore, you can make inferences just by (1) resampling the data, (2) repeating the analysis, (3) summarizing your estimates, and (4) drawing inferences. All without much mathematical or statistical training.
11
Bootstrapping - Example
Suppose we wish to infer the regression slope for our hypothetical universe of possible data using only the sample
1. Resample the 12 data points 1000 times
2. Compute the regression slope for each
3. Compute the mean
4. Compute 95% confidence limits by:
   a. sorting the slope estimates
   b. L95 is the 25th estimate
   c. U95 is the 975th estimate
12
Bootstrapping – R code for regression

# initialise vector to hold slope estimates
slope.boot <- numeric(1000)
# generate 1000 bootstrap replications of X, Y, slope
for (i in 1:1000) {
  # randomly choose nobs = 12 index values
  boot <- sample(seq(1, nobs, 1), size = nobs, replace = TRUE)
  # create X, Y bootstrap pair
  Xboot <- X[boot]
  Yboot <- Y[boot]
  # perform regression using lm() function
  reg.boot <- lm(Yboot ~ Xboot)
  # extract slope from resulting list
  slope.boot[i] <- reg.boot$coeff[2]
}
# generate histogram
hist(slope.boot, main = "")
# generate summary stats
print(mean(slope.boot))
slope.sorted <- sort(slope.boot)
L95 <- slope.sorted[25]; U95 <- slope.sorted[975]
print(c(L95, U95))
[Code annotations: bootstrap reps of X*, Y*; the function s(X,Y); the distribution F̂*; the inference]
13
Bootstrapping - Example
[Figure: the original data (left) and the bootstrap distribution F̂* of the slope (right), marked with the bootstrap mean, the true slope, and the 0.025 and 0.975 quantiles]
14
Model fitting
Curve fitting and model fitting, including some important definitions
Probability models and Likelihood Functions
Likelihood functions that obey constraints
“Safe” parameterization of non-linear models
Linear vs. non-linear estimation
Estimation
15
Curve fitting
A toxicologist has performed an experiment on accumulation of a chemical pollutant in tissue where she obtained the following data:
She then wishes to summarize this data into a more comprehensible and reduced form.
She selects from a class of functions, e.g.,
    C = β₀ + β₁B + β₂B² + … + βₘBᵐ
16
Curve fitting
A toxicologist has performed an experiment on accumulation of a chemicalpollutant in tissue where she obtained the following data:
This is called CURVE FITTING
She then wishes to summarize this data into a more comprehensible and reduced form.
She selects from a class of functions, e.g.,
the curve and parameters that best fit her data.
mmBBBC 2
210
17
Curve fitting
Curve fitting typically involves polynomials
    Ĉ = β₀ + β₁B + β₂B² + … + βₘBᵐ
Values for the parameters β₀, β₁, …, βₘ are chosen so as to get the best possible fit to the data
In curve fitting, the most common objective function used to judge the fit between curve and data is a least-squares criterion
    SS = Σⱼ₌₁ⁿ (Cⱼ − Ĉⱼ)²  =  Σⱼ₌₁ⁿ (Cⱼ − Σᵢ₌₀ᵐ βᵢ xᵢⱼ)²
         (observed − predicted)
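The least-squares criterion itself is simple to compute. A minimal Python sketch (the data values are hypothetical, chosen to lie exactly on a known line):

```python
# Least-squares criterion: SS = sum over j of (observed_j - predicted_j)^2
def ss(params, B, C):
    """Sum of squares for a polynomial with coefficients params = (b0, b1, ...)."""
    total = 0.0
    for Bj, Cj in zip(B, C):
        Cpred = sum(b * Bj ** i for i, b in enumerate(params))  # predicted value
        total += (Cj - Cpred) ** 2
    return total

# hypothetical data lying exactly on C = 1 + 2*B
B = [1, 2, 3]
C = [3.0, 5.0, 7.0]
print(ss([1.0, 2.0], B, C))  # → 0.0
print(ss([0.0, 2.0], B, C))  # → 3.0
```

The “best fit” is simply the parameter vector that makes this number as small as possible.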
18
Curve fitting
Curve fitting involves two levels of arbitrariness:
1. The function used to predict the data is arbitrary, being dictated only to a minor extent by the process from which the data came
2. The best-fit criterion is arbitrary, being independent of statistical considerations (e.g., a sum of squares is not a probability distribution)
19
Curve fitting
Curve fitting as described is also easy because equations like:
    C = β₀ + β₁B + β₂B² + … + βₘBᵐ
are linear functions of the parameters and can be solved analytically
To see why this is linear, examine the independent variables in tabular or experimental design form:
    X₀   X₁   X₂   …   Xₘ
    1    B    B²   …   Bᵐ
20
Curve fitting
The original equation can now be rewritten from
    C = β₀ + β₁B + β₂B² + … + βₘBᵐ
to
    C = β₀ + β₁X₁ + β₂X₂ + … + βₘXₘ
which is just a linear model (e.g., multiple linear regression) that can be solved by setting all derivatives
    dSS/dβᵢ = 0
and solving the resulting set of simultaneous linear equations for the βᵢ
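Setting the derivatives to zero yields the “normal equations”, a linear system in β₀, …, βₘ. A self-contained Python sketch that builds and solves them by Gaussian elimination (the data are hypothetical, generated from a known quadratic so the fit should recover its coefficients):

```python
# Normal equations for least squares:
#   sum_k b_k * sum_i x_i^(j+k) = sum_i y_i * x_i^j    for j = 0..m
def polyfit_ls(x, y, m):
    """Least-squares polynomial fit via the normal equations."""
    n = m + 1
    # normal-equation matrix A and right-hand side c
    A = [[sum(xi ** (j + k) for xi in x) for k in range(n)] for j in range(n)]
    c = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for k in range(i, n):
                A[r][k] -= f * A[i][k]
            c[r] -= f * c[i]
    # back substitution
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, n))) / A[i][i]
    return b

# hypothetical data generated from a known quadratic: C = 1 + 2B + 3B^2
x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
b = polyfit_ls(x, y, 2)
print([round(v, 6) for v in b])  # → [1.0, 2.0, 3.0]
```

In practice one would call a library routine (e.g., R’s lm()), which solves exactly this system.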
21
Model fitting
Suppose now that our toxicologist friend is familiar with the biophysical laws governing accumulation of contaminants in tissue
She may then choose to derive an equation that obeys these laws, e.g.,
    C = β₁B / (β₂ + B)
The parameters β₁ and β₂ now have a biophysical interpretation:
    β₁ = contaminant scaling constant
    β₂ = body size at which contaminant reaches half the maximum
The function also obeys the biological constraint C ≥ 0.
22
Model fitting
When an equation is derived based on theoretical considerations, the procedure of finding the best fitting parameters is called MODEL FITTING
She may choose similar goodness of fit criteria, but the form of the equation is no longer guided by computational convenience
In this case, the model
    C = β₁B / (β₂ + B)
is no longer linear in the parameters
Computing the “best-fit” must involve some numerical procedure
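One such numerical procedure is sketched below in Python (the data are hypothetical, generated from known parameter values). Because the model is linear in β₁ for any fixed β₂, β₁ can be profiled out analytically and β₂ found by a simple grid search:

```python
# Grid-search fit of the nonlinear model C = b1*B / (b2 + B).
# For a fixed trial b2 the model is linear in b1, so b1 has a closed form
# and only b2 needs to be searched. Data are hypothetical.
def fit_mm(B, C):
    best = None
    for step in range(1, 2001):           # trial b2 values 0.01 .. 20.00
        b2 = step * 0.01
        x = [Bi / (b2 + Bi) for Bi in B]  # model becomes C = b1 * x
        b1 = sum(xi * Ci for xi, Ci in zip(x, C)) / sum(xi * xi for xi in x)
        ss = sum((Ci - b1 * xi) ** 2 for xi, Ci in zip(x, C))
        if best is None or ss < best[0]:
            best = (ss, b1, b2)
    return best

B = [1, 2, 4, 8, 16]
C = [10 * Bi / (5 + Bi) for Bi in B]      # generated with b1 = 10, b2 = 5
ss_min, b1, b2 = fit_mm(B, C)
print(round(b1, 1), round(b2, 2))  # → 10.0 5.0
```

Real fitting software replaces the grid with a smarter search (Gauss-Newton, Nelder-Mead, etc.), but the logic is the same: propose parameters, predict, score, repeat.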
23
Model estimation
Because parameters from model fitting have some natural interpretation, we may wish to ask: “What is the true value of β₂ in nature?”
The imprecise nature of the measurements means that we can never answer this question with absolute certainty
Also, if she performed this experiment on a new set of subjects, she may get a different best-fitting β₂ value
The process of finding parameter values that
a. Fit the data well
b. Come close on average to the true values
c. Do not vary excessively from one experiment to the next
is called MODEL ESTIMATION
Model estimation is a critical component of Simulation Modelling!
24
Statistical estimation
Determining the parameters of a probability distribution is called STATISTICAL ESTIMATION
For example, the observed value of a random variable h may be the height of trees from an even-aged stand of lodgepole pine
If we assume that h has a normal distribution with mean h₀ and standard deviation σ, then the probability density function (pdf) of h is
    p(h) = (1/√(2πσ²)) exp( −(h − h₀)² / (2σ²) )
with the usual estimates
    ĥ₀ = (1/n) Σᵢ₌₁ⁿ hᵢ   and   σ̂² = (1/(n−1)) Σᵢ₌₁ⁿ (hᵢ − ĥ₀)²
Statistical estimation is also a critical component of Simulation Modelling!
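The usual estimates are one-liners in code. A Python sketch with hypothetical tree heights:

```python
import math

# hypothetical lodgepole pine heights (metres)
h = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]

n = len(h)
h0_hat = sum(h) / n                                          # estimate of the mean h0
sigma2_hat = sum((hi - h0_hat) ** 2 for hi in h) / (n - 1)   # estimate of sigma^2
sigma_hat = math.sqrt(sigma2_hat)

def pdf(x, mu, sigma):
    """Normal probability density function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

print(round(h0_hat, 2), round(sigma_hat, 2))  # → 14.92 0.81
```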
25
Parameter estimation
Model estimation can be combined with Statistical estimation in the following way:
Suppose the measured concentration of pollutant Cᵢ taken at body size Bᵢ is a random variable whose mean is given by
    C̄ᵢ = β₁Bᵢ / (β₂ + Bᵢ)
If many measurements were taken at body size Bᵢ we would expect C values to fluctuate around this mean with standard deviation σ.
If we assume that these variations have a normal distribution, then the probability density function for Cᵢ is
    p(Cᵢ) = (1/√(2πσ²)) exp( −(1/(2σ²)) (Cᵢ − β₁Bᵢ/(β₂ + Bᵢ))² )
26
Parameter estimation
[Figure: Cᵢ plotted against Bᵢ, showing the fitted mean curve Ĉᵢ = β₁Bᵢ/(β₂ + Bᵢ) and the normal spread of Cᵢ around it]
    p(Cᵢ) = (1/√(2πσ²)) exp( −(1/(2σ²)) (Cᵢ − β₁Bᵢ/(β₂ + Bᵢ))² )
Each value of Cᵢ will have a mean that depends on Bᵢ and the spread of Cᵢ values depends on the standard deviation σ
27
Parameter estimation
For several measurements (Cᵢ, Bᵢ) where i = {1, 2, …, n}, we have a pdf p(Cᵢ) for each observation
28
Parameter estimation
Although each value of Cᵢ will have a mean that depends on Bᵢ, the set of parameters
    (β₁, β₂, σ)
are common to all measurements!
Estimating parameters that are common to both the model and the probability density function of the observations is called PARAMETER ESTIMATION
Because parameter estimation encompasses all other forms of estimation, it is the most critical component of Simulation Modelling!
29
Simulation vs. Estimation
Simulation generates predicted values of state variables (observations) from a known set of parameter values
[Flowchart: Input parameters of function equations → Echo parameter input, output initial conditions → Initial Conditions (t=0) → Spring Juvenile Production → Summer Juvenile Survival → Update Fall Abundance → Calculate Harvest (using hunting effort in yr t) → Winter Survival → Adult Spring Abundance → Output results yr t → t=T? If No, set t=t+1 and repeat; if Yes, End. Parameters in; observations out]
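The flowchart can be sketched as a short annual-cycle loop. A minimal Python illustration; all rates and the effort control are hypothetical, and the harvest step uses a standard catch equation (harvest = fall abundance × (1 − e^(−qE))), which may differ from the deck’s actual formulation:

```python
import math

# A minimal sketch of the annual-cycle loop in the flowchart. All rates
# (production, survival, catchability q) and the effort control are hypothetical.
def simulate(N0, T, production=1.2, s_summer=0.6, s_winter=0.7, q=0.05, effort=10):
    N = N0                                           # adult spring abundance
    trajectory = [N]
    for t in range(T):
        juv = production * N                         # spring juvenile production
        juv *= s_summer                              # summer juvenile survival
        fall = N + juv                               # update fall abundance
        harvest = fall * (1 - math.exp(-q * effort)) # calculate harvest
        N = (fall - harvest) * s_winter              # winter survival -> next spring
        trajectory.append(N)                         # output results for year t
    return trajectory

traj = simulate(N0=1000, T=5)
```

Given known parameters and controls, the loop deterministically generates the state variables; adding random noise to the rates would turn it into a stochastic simulation.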
30
The key elements of simulation models are Parameters, State Variables, and Controls
We can use the notation
    {Θ, Z} → Y
to state that The Simulation Problem is to go from known Parameters and Controls to unknown State Variables:
    Parameters        Θ   Survival, production rates, error variances, etc.
    Controls          Z   Hunting effort, harvest rates
    State Variables*  Y   Duck abundance, or index
* “State Variable” in this context includes a measurement of state such as an index. In that case, the parameter set includes parameters of the measurement system
Simulation vs. Estimation
31
Simulation vs. Estimation
Simulation generates predicted values of state variables (observations) from a known set of parameter values
[Flowchart repeated from the simulation slide, now annotated: parameters in = Θ, hunting effort control = Z, observations out = Y]
32
Simulation vs. Estimation
Estimation is concerned with finding the models and parameter values that generate the observed data.
The Estimation Problem is to go from known (observed) State Variables and Controls to unknown Parameters:
    {Y, Z} → Θ
In this case the “known” state variables Y represent the data or observations
33
Simulation vs. Estimation
    Simulation:  {Θ, Z} → Y
    Estimation:  {Y, Z} → Θ
34
Yeah, but how does estimation work?
For the contaminant problem, we have
    State Variables   Y = C   (observed concentration, including measurement error e)
    Controls          Z = B   (body size)
    Model             Yᵢ = β₁Zᵢ / (β₂ + Zᵢ)
    Parameters        Θ = (β₁, β₂, σ)
35
Yeah, but how does estimation work?
Estimation: {Y, Z} → Θ
1. Guess initial parameters Θ⁰ at iteration i = 0
2. Use the simulation step to “predict” state variables given the parameters at the current iteration:
    {Θ, Zᵢ} → Ŷᵢ
3. Calculate the likelihood that the data would have arisen if these parameters were true:
    p(Y | Θ, Z) = Πᵢ₌₁ⁿ (1/√(2πσ²)) exp( −(1/(2σ²)) (Yᵢ − β₁Zᵢ/(β₂ + Zᵢ))² )
4. Repeat 1-3 after adjusting parameters in the “best-fit” direction
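Steps 1-4 can be sketched as a simple grid search, in the spirit of the likelihood profiles on the following slides. A Python illustration with hypothetical data (β₁ and σ held fixed while β₂ is varied):

```python
import math

# Grid search over b2 with b1 and sigma held fixed; data are hypothetical,
# generated with b2 = 40 so the likelihood should peak there.
def neg_log_like(b1, b2, sigma, Z, Y):
    nll = 0.0
    for Zi, Yi in zip(Z, Y):
        pred = b1 * Zi / (b2 + Zi)   # step 2: simulate the predicted state variable
        # step 3: accumulate the normal negative log-likelihood
        nll += 0.5 * math.log(2 * math.pi * sigma ** 2) + (Yi - pred) ** 2 / (2 * sigma ** 2)
    return nll

Z = [1, 2, 4, 8, 16, 32]
Y = [10 * Zi / (40 + Zi) for Zi in Z]
b2_grid = [20, 28, 40, 52, 60]       # step 4: trial values of b2
nlls = [neg_log_like(10, b2, 0.2, Z, Y) for b2 in b2_grid]
best = b2_grid[nlls.index(min(nlls))]
print(best)  # → 40
```

Minimizing the negative log-likelihood is equivalent to maximizing the likelihood; real optimizers refine the search rather than stepping through a fixed grid.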
36
The simulated predictions and likelihood function
[Figure sequence: the predicted curve and its corresponding likelihood evaluated at a series of trial parameter values, with β₁ and σ held fixed while β₂ steps through 60, 52, 40, 28, 20]
41
The total likelihood function
[Figure: log-likelihood plotted against β₂]
An optimization procedure is used to find the value β₂* that maximizes the likelihood
42
The “Model” can be anything used to predict data
[Diagram: Parameters and Controls feed the Simulation Model, which produces Predicted Data (Ypred); Predicted Data and Observed Data (Yobs) feed the Likelihood Function: the confrontation between Model and Data]
43
The Likelihood Function
The likelihood function used depends on the type of data/observations
Most stats books have Appendices listing distributions for particular data types, pdfs, expected values (means, variances), and random generation
The likelihood can only tell you which hypotheses/parameters are more likely than others. It cannot tell you the probability that a given hypothesis is true; that requires a Bayesian approach