simulation modelling in natural resource management
1
SIMULATION MODELLING IN NATURAL RESOURCE MANAGEMENT
Photo: Kendra Holt
For i = minE To maxE Step Estep
    Cells(Erow, Ecol) = i
    SolverOk SetCell:="$B$26", MaxMinVal:=2
    SolverSolve True
    neg_log_L(i) = Cells(L_row, L_col)
    If neg_log_L(i) <= min_L Then
        min_L = neg_log_L(i)
        Ebest = i
    End If
Next i
for (i = 1; i <= nobs; i++)
{
  ad_begin_funnel();
  x = S(i);
  y = R(i);
  lim1 = x * mfexp(-6 * sx);
  lim2 = x * mfexp(8 * sx);
  p = adromb(&model_parameters::fz, lim1, lim2, nsteps);
  resid += log(p + 1.e-300);
}
2
Statistical Simulation
Statistics
Power analysis
Bootstrapping
Randomization
Simulation/Estimation for testing statistical methods
Fitting models to data
3
Statistics
The ability to simplify means to eliminate the unnecessary so that the necessary may speak
Hans Hofmann
Unfortunately, acquisition of statistical knowledge is blocked by a formidable wall of mathematics
It is equally unfortunate that much of this mathematical basis relies on relatively strong assumptions about such things as large sample sizes and asymptotic normality
On the positive side, modern computing power allows us to relax many of these assumptions, particularly those that commonly arise in natural resource applications
4
Statistics
Three main objectives of Statistics:
1. How should data be collected in order to:
   a. test hypotheses?
   b. estimate parameters for a model?
   c. make decisions?
2. How should data be summarized and analyzed?
3. How accurate are the summaries and analyses? Do they adequately reflect the “truth”?
Data Collection
Analysis
Inference
5
Statistical Inference
A sample of 15 (X,Y) pairs from the universe of 100 possible pairs
6
Statistical Inference
A sample of 15 (X,Y) pairs from the universe of 100 possible pairs
The complete universe of 100 possible pairs
“The Truth”
Statistics tries to infer the picture on the right from the picture on the left
7
Statistical Inference – the mean
X = {76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9}
mean(X) = 83.1   sd(X) = 10.2
How accurate is the mean of X?
Accuracy formula for the mean (and only the “mean”):
    se(x̄) = s/√n,  i.e.  se²(x̄) = s²/n
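These summary values and the standard error can be recomputed directly from the sample. A quick Python sketch (Python is used for the illustrations here; the deck’s own code is R and VBA):

```python
import math

# The nine X values from the slide
X = [76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9]

n = len(X)
mean = sum(X) / n
# sample variance (n - 1 denominator) and standard deviation
var = sum((x - mean) ** 2 for x in X) / (n - 1)
sd = math.sqrt(var)
# standard error of the mean: se = s / sqrt(n)
se = sd / math.sqrt(n)

print(round(mean, 1), round(sd, 1), round(se, 1))  # → 83.1 10.2 3.4
```

The standard error shrinks with √n, which is why even a modest sample pins down the mean fairly well.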
8
Bootstrapping
That was easy! But try that for the correlation coefficient
A statistical formula might be available, but it will require assumptions about the distribution of the data
9
Bootstrapping – what’s the big idea?
X = (x₁, x₂, …, xₙ)   Original dataset
X*1  X*2  X*3  …   Bootstrap samples
s(X*1)  s(X*2)  s(X*3)  …   Bootstrap replications of the function s(X*)
There is no real limit to what the function s() represents!!
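The scheme above is easy to sketch in code: resample with replacement, apply s(), repeat. A minimal Python illustration (the statistic and seed are arbitrary placeholders):

```python
import random

random.seed(1)  # reproducible resampling

def bootstrap(X, s, B=1000):
    """Return B bootstrap replications s(X*) of the statistic s()."""
    n = len(X)
    reps = []
    for _ in range(B):
        # resample n observations with replacement
        Xstar = [random.choice(X) for _ in range(n)]
        reps.append(s(Xstar))
    return reps

# s() can be any statistic -- here, the sample mean
X = [76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9]
reps = sorted(bootstrap(X, lambda x: sum(x) / len(x)))
print(reps[24], reps[974])  # rough 95% percentile interval
```

Swapping the lambda for any other function of the sample bootstraps that statistic instead, with no new mathematics.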
10
Bootstrapping
There is no real limit to what the function s() represents!!
For example, s() could be the mean or another summary statistic. It could be a correlation coefficient, a regression estimate, or the parameter estimates for a non-linear model
The data could be multivariate, in which case more than one response variable is observed for each sample. Thus, s() could be a Principal Component statistic.
Most, if not all, statistical functions are already implemented in software like R, SAS, Systat, S-Plus, etc. Therefore, you can make inferences just by (1) resampling the data, (2) repeating the analysis, (3) summarizing your estimates, and (4) drawing inferences. All without much mathematical or statistical training.
11
Bootstrapping - Example
Suppose we wish to infer the regression slope for our hypothetical universe of possible data using only the sample
1. Resample the 12 data points 1000 times
2. Compute the regression slope for each
3. Compute the mean
4. Compute 95% confidence limits by:
   a. sorting the slope estimates
   b. L95 is the 25th estimate
   c. U95 is the 975th estimate
12
Bootstrapping – R code for regression

# initialise vector to hold slope estimates
slope.boot <- numeric(1000)
# generate 1000 bootstrap replications of X, Y, slope
for (i in 1:1000) {
  # randomly choose nobs = 12 index values
  boot <- sample(seq(1, nobs, 1), size = nobs, replace = TRUE)
  # create X, Y bootstrap pair
  Xboot <- X[boot]
  Yboot <- Y[boot]
  # perform regression using lm() function
  reg.boot <- lm(Yboot ~ Xboot)
  # extract slope from resulting list
  slope.boot[i] <- reg.boot$coeff[2]
}
# generate histogram
hist(slope.boot, main = "")
# generate summary stats
print(mean(slope.boot))
slope.sorted <- sort(slope.boot)
L95 <- slope.sorted[25]; U95 <- slope.sorted[975]
print(c(L95, U95))
[Code annotations: bootstrap reps of X*, Y*; the function s(X,Y); the distribution F̂*; the inference]
13
Bootstrapping - Example
[Figure: the original data (left) and the bootstrap distribution F̂* of the slope (right), marked with the bootstrap mean, the true slope, and the 0.025 and 0.975 quantiles]
14
Model fitting
Curve fitting and model fitting, including some important definitions
Probability models and Likelihood Functions
Likelihood functions that obey constraints
“Safe” parameterization of non-linear models
Linear vs. non-linear estimation
Estimation
15
Curve fitting
A toxicologist has performed an experiment on accumulation of a chemical pollutant in tissue where she obtained the following data:
She then wishes to summarize this data into a more comprehensible and reduced form.
She selects from a class of functions, e.g.,
    C = β₀ + β₁B + β₂B² + … + βₘBᵐ
16
Curve fitting
A toxicologist has performed an experiment on accumulation of a chemicalpollutant in tissue where she obtained the following data:
This is called CURVE FITTING
She then wishes to summarize this data into a more comprehensible and reduced form.
She selects from a class of functions, e.g.,
the curve and parameters that best fit her data.
mmBBBC 2
210
17
Curve fitting
Curve fitting typically involves polynomials
    Ĉ = β₀ + β₁B + β₂B² + … + βₘBᵐ
Values for the parameters β₀, β₁, …, βₘ are chosen so as to get the best possible fit to the data
In curve fitting, the most common objective function used to judge the fit between curve and data is a least-squares criterion
    SS = Σⱼ₌₁ⁿ (Cⱼ − Ĉⱼ)²  =  Σⱼ₌₁ⁿ (Cⱼ − Σᵢ₌₀ᵐ βᵢ xᵢⱼ)²
         (observed − predicted)
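The least-squares criterion itself is simple to compute. A minimal Python sketch (the data values are hypothetical, chosen to lie exactly on a known line):

```python
# Least-squares criterion: SS = sum over j of (observed_j - predicted_j)^2
def ss(params, B, C):
    """Sum of squares for a polynomial with coefficients params = (b0, b1, ...)."""
    total = 0.0
    for Bj, Cj in zip(B, C):
        Cpred = sum(b * Bj ** i for i, b in enumerate(params))  # predicted value
        total += (Cj - Cpred) ** 2
    return total

# hypothetical data lying exactly on C = 1 + 2*B
B = [1, 2, 3]
C = [3.0, 5.0, 7.0]
print(ss([1.0, 2.0], B, C))  # → 0.0
print(ss([0.0, 2.0], B, C))  # → 3.0
```

The “best fit” is simply the parameter vector that makes this number as small as possible.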
18
Curve fitting
Curve fitting involves two levels of arbitrariness:
1. The function used to predict the data is arbitrary, being dictated only to a minor extent by the process from which the data came
2. The best-fit criterion is arbitrary, being independent of statistical considerations (e.g., a sum of squares is not a probability distribution)
19
Curve fitting
Curve fitting as described is also easy because equations like:
    C = β₀ + β₁B + β₂B² + … + βₘBᵐ
are linear functions of the parameters and can be solved analytically
To see why this is linear, examine the independent variables in tabular or experimental design form:
    X₀   X₁   X₂   …   Xₘ
    1    B    B²   …   Bᵐ
20
Curve fitting
The original equation can now be rewritten from
    C = β₀ + β₁B + β₂B² + … + βₘBᵐ
to
    C = β₀ + β₁X₁ + β₂X₂ + … + βₘXₘ
which is just a linear model (e.g., multiple linear regression) that can be solved by setting all derivatives
    dSS/dβᵢ = 0
and solving the resulting set of simultaneous linear equations for the βᵢ
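Setting the derivatives to zero yields the “normal equations”, a linear system in β₀, …, βₘ. A self-contained Python sketch that builds and solves them by Gaussian elimination (the data are hypothetical, generated from a known quadratic so the fit should recover its coefficients):

```python
# Normal equations for least squares:
#   sum_k b_k * sum_i x_i^(j+k) = sum_i y_i * x_i^j    for j = 0..m
def polyfit_ls(x, y, m):
    """Least-squares polynomial fit via the normal equations."""
    n = m + 1
    # normal-equation matrix A and right-hand side c
    A = [[sum(xi ** (j + k) for xi in x) for k in range(n)] for j in range(n)]
    c = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for k in range(i, n):
                A[r][k] -= f * A[i][k]
            c[r] -= f * c[i]
    # back substitution
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, n))) / A[i][i]
    return b

# hypothetical data generated from a known quadratic: C = 1 + 2B + 3B^2
x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
b = polyfit_ls(x, y, 2)
print([round(v, 6) for v in b])  # → [1.0, 2.0, 3.0]
```

In practice one would call a library routine (e.g., R’s lm()), which solves exactly this system.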
21
Model fitting
Suppose now that our toxicologist friend is familiar with the biophysical laws governing accumulation of contaminants in tissue
She may then choose to derive an equation that obeys these laws, e.g.,
    C = β₁B / (β₂ + B)
The parameters β₁ and β₂ now have a biophysical interpretation:
    β₁ = contaminant scaling constant
    β₂ = body size at which contaminant reaches half the maximum
The function also obeys the biological constraint C ≥ 0.
22
Model fitting
When an equation is derived based on theoretical considerations, the procedure of finding the best fitting parameters is called MODEL FITTING
She may choose similar goodness of fit criteria, but the form of the equation is no longer guided by computational convenience
In this case, the model
    C = β₁B / (β₂ + B)
is no longer linear in the parameters
Computing the “best-fit” must involve some numerical procedure
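One such numerical procedure is sketched below in Python (the data are hypothetical, generated from known parameter values). Because the model is linear in β₁ for any fixed β₂, β₁ can be profiled out analytically and β₂ found by a simple grid search:

```python
# Grid-search fit of the nonlinear model C = b1*B / (b2 + B).
# For a fixed trial b2 the model is linear in b1, so b1 has a closed form
# and only b2 needs to be searched. Data are hypothetical.
def fit_mm(B, C):
    best = None
    for step in range(1, 2001):           # trial b2 values 0.01 .. 20.00
        b2 = step * 0.01
        x = [Bi / (b2 + Bi) for Bi in B]  # model becomes C = b1 * x
        b1 = sum(xi * Ci for xi, Ci in zip(x, C)) / sum(xi * xi for xi in x)
        ss = sum((Ci - b1 * xi) ** 2 for xi, Ci in zip(x, C))
        if best is None or ss < best[0]:
            best = (ss, b1, b2)
    return best

B = [1, 2, 4, 8, 16]
C = [10 * Bi / (5 + Bi) for Bi in B]      # generated with b1 = 10, b2 = 5
ss_min, b1, b2 = fit_mm(B, C)
print(round(b1, 1), round(b2, 2))  # → 10.0 5.0
```

Real fitting software replaces the grid with a smarter search (Gauss-Newton, Nelder-Mead, etc.), but the logic is the same: propose parameters, predict, score, repeat.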
23
Model estimation
Because parameters from model fitting have some natural interpretation, we may wish to ask: “What is the true value of β₂ in nature?”
The imprecise nature of the measurements means that we can never answer this question with absolute certainty
Also, if she performed this experiment on a new set of subjects, she may get a different best-fitting β₂ value
The process of finding parameter values that
a. Fit the data well
b. Come close on average to the true values
c. Do not vary excessively from one experiment to the next
is called MODEL ESTIMATION
Model estimation is a critical component of Simulation Modelling!
24
Statistical estimation
Determining the parameters of a probability distribution is called STATISTICAL ESTIMATION
For example, the observed value of a random variable h may be the height of trees from an even-aged stand of lodgepole pine
If we assume that h has a normal distribution with mean h₀ and standard deviation σ, then the probability density function (pdf) of h is
    p(h) = (1/√(2πσ²)) exp( −(h − h₀)² / (2σ²) )
with the usual estimates
    ĥ₀ = (1/n) Σᵢ₌₁ⁿ hᵢ   and   σ̂² = (1/(n−1)) Σᵢ₌₁ⁿ (hᵢ − ĥ₀)²
Statistical estimation is also a critical component of Simulation Modelling!
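The usual estimates are one-liners in code. A Python sketch with hypothetical tree heights:

```python
import math

# hypothetical lodgepole pine heights (metres)
h = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]

n = len(h)
h0_hat = sum(h) / n                                          # estimate of the mean h0
sigma2_hat = sum((hi - h0_hat) ** 2 for hi in h) / (n - 1)   # estimate of sigma^2
sigma_hat = math.sqrt(sigma2_hat)

def pdf(x, mu, sigma):
    """Normal probability density function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

print(round(h0_hat, 2), round(sigma_hat, 2))  # → 14.92 0.81
```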
25
Parameter estimation
Model estimation can be combined with Statistical estimation in the following way:
Suppose the measured concentration of pollutant Cᵢ taken at body size Bᵢ is a random variable whose mean is given by
    C̄ᵢ = β₁Bᵢ / (β₂ + Bᵢ)
If many measurements were taken at body size Bᵢ we would expect C values to fluctuate around this mean with standard deviation σ.
If we assume that these variations have a normal distribution, then the probability density function for Cᵢ is
    p(Cᵢ) = (1/√(2πσ²)) exp( −(1/(2σ²)) (Cᵢ − β₁Bᵢ/(β₂ + Bᵢ))² )
26
Parameter estimation
[Figure: Cᵢ plotted against Bᵢ, showing the fitted mean curve Ĉᵢ = β₁Bᵢ/(β₂ + Bᵢ) and the normal spread of Cᵢ around it]
    p(Cᵢ) = (1/√(2πσ²)) exp( −(1/(2σ²)) (Cᵢ − β₁Bᵢ/(β₂ + Bᵢ))² )
Each value of Cᵢ will have a mean that depends on Bᵢ and the spread of Cᵢ values depends on the standard deviation σ
27
Parameter estimation
For several measurements (Cᵢ, Bᵢ) where i = {1, 2, …, n}, we have a pdf p(Cᵢ) for each observation
28
Parameter estimation
Although each value of Cᵢ will have a mean that depends on Bᵢ, the set of parameters
    (β₁, β₂, σ)
are common to all measurements!
Estimating parameters that are common to both the model and the probability density function of the observations is called PARAMETER ESTIMATION
Because parameter estimation encompasses all other forms of estimation, it is the most critical component of Simulation Modelling!
29
Simulation vs. Estimation
Simulation generates predicted values of state variables (observations) from a known set of parameter values
[Flowchart: Input parameters of function equations → Echo parameter input, output initial conditions → Initial Conditions (t=0) → Spring Juvenile Production → Summer Juvenile Survival → Update Fall Abundance → Calculate Harvest (using hunting effort in yr t) → Winter Survival → Adult Spring Abundance → Output results yr t → t=T? If No, set t=t+1 and repeat; if Yes, End. Parameters in; observations out]
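The flowchart can be sketched as a short annual-cycle loop. A minimal Python illustration; all rates and the effort control are hypothetical, and the harvest step uses a standard catch equation (harvest = fall abundance × (1 − e^(−qE))), which may differ from the deck’s actual formulation:

```python
import math

# A minimal sketch of the annual-cycle loop in the flowchart. All rates
# (production, survival, catchability q) and the effort control are hypothetical.
def simulate(N0, T, production=1.2, s_summer=0.6, s_winter=0.7, q=0.05, effort=10):
    N = N0                                           # adult spring abundance
    trajectory = [N]
    for t in range(T):
        juv = production * N                         # spring juvenile production
        juv *= s_summer                              # summer juvenile survival
        fall = N + juv                               # update fall abundance
        harvest = fall * (1 - math.exp(-q * effort)) # calculate harvest
        N = (fall - harvest) * s_winter              # winter survival -> next spring
        trajectory.append(N)                         # output results for year t
    return trajectory

traj = simulate(N0=1000, T=5)
```

Given known parameters and controls, the loop deterministically generates the state variables; adding random noise to the rates would turn it into a stochastic simulation.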
30
The key elements of simulation models are Parameters, State Variables, and Controls
We can use the notation
    {Θ, Z} → Y
to state that The Simulation Problem is to go from known Parameters and Controls to unknown State Variables:
    Parameters        Θ   Survival, production rates, error variances, etc.
    Controls          Z   Hunting effort, harvest rates
    State Variables*  Y   Duck abundance, or index
* “State Variable” in this context includes a measurement of state such as an index. In that case, the parameter set includes parameters of the measurement system
Simulation vs. Estimation
31
Simulation vs. Estimation
Simulation generates predicted values of state variables (observations) from a known set of parameter values
[Flowchart repeated from the simulation slide, now annotated: parameters in = Θ, hunting effort control = Z, observations out = Y]
32
Simulation vs. Estimation
Estimation is concerned with finding the models and parameter values that generate the observed data.
The Estimation Problem is to go from known (observed) State Variables and Controls to unknown Parameters:
    {Y, Z} → Θ
In this case the “known” state variables Y represent the data or observations
33
Simulation vs. Estimation
    Simulation:  {Θ, Z} → Y
    Estimation:  {Y, Z} → Θ
34
Yeah, but how does estimation work?
For the contaminant problem, we have
    State Variables   Y = C   (observed concentration, including measurement error e)
    Controls          Z = B   (body size)
    Model             Yᵢ = β₁Zᵢ / (β₂ + Zᵢ)
    Parameters        Θ = (β₁, β₂, σ)
35
Yeah, but how does estimation work?
Estimation: {Y, Z} → Θ
1. Guess initial parameters Θ⁰ at iteration i = 0
2. Use the simulation step to “predict” state variables given the parameters at the current iteration:
    {Θ, Zᵢ} → Ŷᵢ
3. Calculate the likelihood that the data would have arisen if these parameters were true:
    p(Y | Θ, Z) = Πᵢ₌₁ⁿ (1/√(2πσ²)) exp( −(1/(2σ²)) (Yᵢ − β₁Zᵢ/(β₂ + Zᵢ))² )
4. Repeat 1-3 after adjusting parameters in the “best-fit” direction
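Steps 1-4 can be sketched as a simple grid search, in the spirit of the likelihood profiles on the following slides. A Python illustration with hypothetical data (β₁ and σ held fixed while β₂ is varied):

```python
import math

# Grid search over b2 with b1 and sigma held fixed; data are hypothetical,
# generated with b2 = 40 so the likelihood should peak there.
def neg_log_like(b1, b2, sigma, Z, Y):
    nll = 0.0
    for Zi, Yi in zip(Z, Y):
        pred = b1 * Zi / (b2 + Zi)   # step 2: simulate the predicted state variable
        # step 3: accumulate the normal negative log-likelihood
        nll += 0.5 * math.log(2 * math.pi * sigma ** 2) + (Yi - pred) ** 2 / (2 * sigma ** 2)
    return nll

Z = [1, 2, 4, 8, 16, 32]
Y = [10 * Zi / (40 + Zi) for Zi in Z]
b2_grid = [20, 28, 40, 52, 60]       # step 4: trial values of b2
nlls = [neg_log_like(10, b2, 0.2, Z, Y) for b2 in b2_grid]
best = b2_grid[nlls.index(min(nlls))]
print(best)  # → 40
```

Minimizing the negative log-likelihood is equivalent to maximizing the likelihood; real optimizers refine the search rather than stepping through a fixed grid.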
36
The simulated predictions and likelihood function
[Figure sequence: the predicted curve and its corresponding likelihood evaluated at a series of trial parameter values, with β₁ and σ held fixed while β₂ steps through 60, 52, 40, 28, 20]
41
The total likelihood function
[Figure: log-likelihood plotted against β₂]
An optimization procedure is used to find the value β₂* that maximizes the likelihood
42
The “Model” can be anything used to predict data
[Diagram: Parameters and Controls feed the Simulation Model, which produces Predicted Data (Ypred); Predicted Data and Observed Data (Yobs) feed the Likelihood Function: the confrontation between Model and Data]
43
The Likelihood Function
The likelihood function used depends on the type of data/observations
Most stats books have Appendices listing distributions for particular data types, pdfs, expected values (means, variances), and random generation
The likelihood can only tell you which hypotheses/parameters are more likely than others. It cannot tell you the probability that a given hypothesis is true; that requires a Bayesian approach