Download - Ten Lectures on Ecological Statistics

Ten Lectures on Ecological Statistics

John [email protected]

Department of Statistical Science, Cornell University

1


An example
Mean number of eggs laid by birds nesting in a forest: changing?
Population: closed (no birth, death, or immigration/emigration)
Population is the target of inference: from known (data) to unknown (population)
Statistical epistemology

“All models are wrong, but some are useful” – George E. P. Box

2

3

Collect sample: finite-population and infinite-population definitions.
Sampling unit; sample size.
n = 41.

1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0, 3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0

Probability model. Probability distribution controlled/determined by one or more parameters.
Parameter: population. Statistic: data.

4

[Figure: Poisson probabilities P(j) for j = 0 to 20, with lambda = 1.0, 2.5, and 5.0]

Model: the Poisson distribution (Siméon-Denis Poisson, 1781-1840)
Assigns probability to nonnegative integers
One-parameter (λ > 0) model
λ = mean of distribution: mean, median, mode

$$p(j) = \frac{e^{-\lambda}\lambda^{j}}{j!}, \qquad j = 0, 1, 2, \ldots$$

5

[Figure: fitted Poisson probabilities P(j), lambda = xbar = 2.585, plotted against the data (proportion) for j = 0 to 20]

• How to fit model?
• Statistical inference
  • Parameter estimation
  • Hypothesis testing
• How to assess fit of model?

j    data (count)   data (prop)   p(j) fitted, lambda = 2.585
0    7              0.171         0.075
1    7              0.171         0.195
2    9              0.220         0.252
3    7              0.171         0.217
4    3              0.073         0.140
5    4              0.098         0.073
6    1              0.024         0.031
7    2              0.049         0.012
8    1              0.024         0.004
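The fitted table can be reproduced from the raw sample. The following is a minimal sketch using only the Python standard library; the variable and function names are illustrative and not part of the lecture.

```python
import math
from collections import Counter

# Egg counts for the n = 41 nests listed in the sample above.
eggs = [1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0,
        3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0]


def poisson_pmf(j, lam):
    """Poisson probability p(j) = exp(-lam) * lam**j / j!."""
    return math.exp(-lam) * lam ** j / math.factorial(j)


n = len(eggs)
lam_hat = sum(eggs) / n      # sample mean, used as the fitted lambda
counts = Counter(eggs)       # how many nests had j eggs

print(f"n = {n}, lambda_hat = {lam_hat:.3f}")
print(" j  count   prop  fitted")
for j in range(max(eggs) + 1):
    prop = counts[j] / n
    print(f"{j:2d}  {counts[j]:5d}  {prop:.3f}  {poisson_pmf(j, lam_hat):.3f}")
```

Comparing the `prop` and `fitted` columns row by row is the informal version of the goodness-of-fit question raised above.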

Statistic vs. parameter
Estimator vs. estimate
Assumptions: data arise from Poisson distribution; the notion of i.i.d.:

$$x_1, \ldots, x_n \overset{\text{i.i.d.}}{\sim} \text{Poisson}(\lambda)$$

Maximum likelihood estimate:

$$\hat{\lambda} = \bar{x} = 2.585366 \approx 2.59$$

MLE is optimal in several senses: consistent, efficient, asymptotically normal

6
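Why the maximum likelihood estimate is the sample mean here: a short derivation under the i.i.d. Poisson assumption stated above. The log-likelihood symbol $\ell$ is notation introduced for this sketch only.

```latex
\begin{aligned}
L(\lambda) &= \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}, \\
\ell(\lambda) &= \log L(\lambda)
  = -n\lambda + \Big(\sum_{i=1}^{n} x_i\Big)\log\lambda - \sum_{i=1}^{n}\log(x_i!), \\
\frac{d\ell}{d\lambda} &= -n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} = \frac{106}{41} \approx 2.585 .
\end{aligned}
```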

Still not much use without error term: 2.59 +/- ??
Theory also provides standard error. Program to find SE:
1. Find theoretical variance of MLE;
2. Find empirical approximation to (1);
3. Take square root of (2).

In our example,

$$SE = \frac{s}{\sqrt{n}} = \frac{2.109213}{\sqrt{41}} = 0.329404 \approx 0.329$$

So we can write: 2.59 with SE of 0.33.

7

Still not much use.
Notion of confidence interval: We are 95% confident that the true value lies in a certain range.
Often: estimate +/- 1.96*SE ≈ estimate +/- 2*SE
Example: 2.59 +/- 2*0.33 = 2.59 +/- 0.66 = (1.93, 3.25); the unrounded interval, 2.585 +/- 1.96*0.329, is (1.940, 3.231).

We are 95% confident that the true mean # of eggs per nest lies in this range.
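The standard error and the 95% interval can be checked directly from the sample. This is a minimal sketch that follows the slide's recipe (sample standard deviation over the square root of n, then estimate +/- 1.96*SE); the variable names are illustrative.

```python
import math
import statistics

# Egg counts for the n = 41 nests in the example.
eggs = [1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0,
        3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0]

n = len(eggs)
xbar = statistics.mean(eggs)     # point estimate of the mean
s = statistics.stdev(eggs)       # sample SD (n - 1 in the denominator)
se = s / math.sqrt(n)            # standard error of the mean

z = 1.96                         # normal quantile for 95% confidence
lower, upper = xbar - z * se, xbar + z * se

print(f"xbar = {xbar:.3f}, s = {s:.6f}, SE = {se:.6f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval relies on the asymptotic normality of the estimator, so it is an approximation rather than an exact statement for n = 41.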

8

Claim: Historical norm for mean # of eggs/nest is 3.6.
Question: Does new data contradict this?
Hypothesis testing. Conceptual framework:
Null hypothesis H0: situation unchanged, no difference, “nothing is happening.”
Alternative hypothesis HA (or H1): difference from H0. “Hypothesis of interest.”

9

State of nature (rows) / decision (columns)

        H0              HA
H0      Correct         Type I error
HA      Type II error   Correct

How far is 2.59 from 3.6?
Test statistic T measures distance of observed data from null hypothesis.

H0: mean μ = μ0 = 3.6
HA: mean μ ≠ μ0 = 3.6
Two-sided alternative.

$$T = \frac{\bar{x} - \mu_0}{SE} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.59 - 3.6}{0.33} = -3.06$$

What does -3.06 mean? Need null distribution of test statistic.

10

In hypothesis testing, assume H0 true throughout. Test evaluates whether data is consistent with H0.
Two approaches (overall equivalent):
• Fixed-level: compare test statistic T to some cutoff value. If T > (<) cutoff, “reject” H0, otherwise “accept” or “fail to reject” H0. α = 0.05 “significant”; α = 0.01 “highly significant.”
• P-value: compute p-value of test = probability, given H0 true, of observing data “as or more extreme” than what actually occurred.

11

12

[Figure: standard normal (Gaussian) density on -4 to 4, with the observed T = -3.06 marked]

LH tail area = 0.001107
p-value = 2*0.001107 = 0.002213 < .01 < .05

Reject H0.
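The test can be traced numerically. The sketch below uses the rounded values quoted on the slides (2.59, 3.6, 0.33) and the standard normal as the null distribution; `NormalDist` is from the Python standard library, and the variable names are illustrative.

```python
from statistics import NormalDist

xbar, mu0, se = 2.59, 3.6, 0.33          # rounded values used on the slides

# Test statistic, rounded to two decimals as on the slide.
t_stat = round((xbar - mu0) / se, 2)

# Two-sided p-value: twice the left-hand tail area under the standard normal.
left_tail = NormalDist().cdf(t_stat)
p_value = 2 * left_tail

alpha = 0.05
reject_h0 = p_value < alpha               # fixed-level decision at alpha

print(f"T = {t_stat}, tail = {left_tail:.6f}, p = {p_value:.6f}, reject = {reject_h0}")
```

Both approaches described earlier agree here: the p-value is below 0.05 and 0.01, and |T| exceeds the corresponding two-sided cutoffs.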

Foregoing: frequentist, parametric analysis.

13

Philosophy/approach   Parametric                                            Nonparametric
Frequentist           Most “classical”: requires parametric model fit       Less efficient but does not require model fit
Bayesian              Philosophically appealing, numerically more direct    Difficult or not obvious

Bayesian (parametric) statistical analysis: the notion of a prior distribution.
Prior represents investigator's prior belief or information, before performing experiment/collecting data, regarding value(s) of parameter(s).
Subjective, objective Bayesianism. Elicitation of priors; noninformative or objective priors (Jeffreys', reference).
Modern Bayesian computation: MCMC etc.

14

Parametric Bayesian (point) estimation
Poisson case: conjugate prior Γ(α,β)

Bayesian program: (1) establish prior; (2) collect data; (3) update prior based on data, to obtain posterior.
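For the Poisson model with a Gamma prior, the three-step program has a closed form: a Gamma(α, β) prior (shape α, rate β) combined with n counts summing to S gives a Gamma(α + S, β + n) posterior. The sketch below illustrates this; the hyperparameters are hypothetical choices for illustration, not the priors used in the lecture, so the numbers will not match the slides.

```python
# Egg counts for the n = 41 nests in the example.
eggs = [1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0,
        3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0]

# (1) Establish prior: Gamma(shape, rate). Hypothetical values, assumed here.
alpha_prior, beta_prior = 2.0, 1.0

# (2) Collect data: sufficient statistics are the total count and sample size.
total, n = sum(eggs), len(eggs)

# (3) Update: conjugacy gives the posterior in closed form.
alpha_post = alpha_prior + total
beta_post = beta_prior + n

post_mean = alpha_post / beta_post
post_mode = (alpha_post - 1) / beta_post   # valid because alpha_post > 1

print(f"posterior: Gamma({alpha_post}, {beta_post})")
print(f"posterior mean = {post_mean:.3f}, posterior mode = {post_mode:.3f}")
```

A more diffuse prior (smaller α and β) pulls the posterior summaries toward the sample mean; a more focused prior pulls them toward the prior mean.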

15

[Figure: three prior densities for λ on 0 to 6: moderate, diffuse, focused]

Posterior mode = 2.65 ≠ 2.59

16

[Figure: diffuse prior and resulting posterior density for λ on 0 to 6]

95% highest posterior density (HPD) region; credible region ≠ 95% confidence interval

Bayesian hypothesis testing
Assign prior probabilities to null & alternative hypotheses. Noninformative/objective: 0.5.
Collect data; compute posterior probability that each hypothesis is true.
Bayes factor: roughly, likelihood of data under H0 / likelihood of data under HA.
(more advanced topic)

17

18

j     data (count)   P(j), lambda = 2.585   expected
0     7              0.075                  3.090
1     7              0.195                  7.989
2     9              0.252                  10.327
3     7              0.217                  8.900
4     3              0.140                  5.752
5     4              0.073                  2.974
6     1              0.031                  1.282
7     2              0.012                  0.473
8     1              0.004                  0.153
>=9   0              0.001                  0.058

chi-square = 17.050; p-value = 0.030

Quantitative goodness-of-fit assessment for bird nesting data
Naïve chi-square test, 10 - 1 - 1 = 8 d.f.

Test accepts Poisson model at level α = .01, rejects at α = .05
Actually problem with test: chi-square distribution of test statistic is asymptotic; requires cell counts >= 5 (but see literature); fails in example.
Alternative GOF tests are possible.
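The naïve chi-square computation can be sketched as follows. The p-value uses the closed-form upper tail of the chi-square distribution that holds for even degrees of freedom, which avoids any non-standard library; the helper name `chi2_sf_even_df` is introduced here for illustration.

```python
import math

# Observed nest counts for j = 0..8 eggs, plus the pooled cell j >= 9.
observed = [7, 7, 9, 7, 3, 4, 1, 2, 1, 0]
n = sum(observed)
lam_hat = 106 / 41                       # fitted lambda = sample mean

# Expected counts under the fitted Poisson model.
probs = [math.exp(-lam_hat) * lam_hat ** j / math.factorial(j) for j in range(9)]
probs.append(1.0 - sum(probs))           # tail cell: P(j >= 9)
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1               # cells - 1 - one estimated parameter


def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) for chi-square with even df (closed form)."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(chi2, df)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print("cells with expected count < 5:", sum(e < 5 for e in expected))
```

The last line makes the caveat concrete: most cells have expected counts below 5, so the asymptotic chi-square approximation is doubtful for these data.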

19

20

Philosophy/approach   Parametric             Nonparametric
Frequentist           Parameter estimation   Parameter estimation
                      Hypothesis testing     Hypothesis testing
Bayesian              Parameter estimation   Parameter estimation
                      Hypothesis testing     Hypothesis testing

Will look @ classical nonparametrics in multiple-sample context

21

Predictor (independent) variable (rows) / response (dependent) variable (columns)

               Qualitative                                          Quantitative
Qualitative    Contingency tables, Χ2 test; log-linear models       Two-sample tests; ANOVA
Quantitative   Logistic regression, data mining, machine learning   Regression, linear models of various types

Permeability constants of human chorioamnion (a placental membrane) at term (X) and between 12 to 26 weeks gestational age (Y). Alternative of interest is greater permeability for term pregnancy.

22

X: term   Y: age
0.80      1.15
0.83      0.88
1.89      0.90
1.04      0.74
1.45      1.21
1.38
1.91
1.64
0.73
1.46

t-Test: Two-Sample, Unequal Variances

                          X: term   Y: age
Mean                      1.313     0.976
Variance                  0.195     0.039
Observations              10        5
Hypothesized Mean Diff.   0
df                        13
t Stat                    2.041
P(T<=t) one-tail          0.031
t Critical one-tail       1.771
P(T<=t) two-tail          0.062
t Critical two-tail       2.160
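The t statistic and degrees of freedom in this output follow from the unequal-variance (Welch) formulas. A minimal sketch, with illustrative function and variable names:

```python
import math
import statistics

# Permeability constants: X at term, Y at 12 to 26 weeks gestational age.
x_term = [0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]
y_early = [1.15, 0.88, 0.90, 0.74, 1.21]


def welch_t(a, b):
    """Return (t, df) for the two-sample t-test with unequal variances."""
    va = statistics.variance(a) / len(a)     # squared SE contribution of a
    vb = statistics.variance(b) / len(b)     # squared SE contribution of b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch_t(x_term, y_early)
print(f"t = {t_stat:.3f}, df = {df:.2f} (reported as {round(df)})")
```

The tail probabilities in the table additionally require the t distribution function, which is not in the standard library and is left out of this sketch.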

Mann-Whitney-Wilcoxon (rank-sum) test

group   X     Y     X     X     Y     Y     X     Y     Y     X     X     X     X     X     X
value   0.73  0.74  0.80  0.83  0.88  0.90  1.04  1.15  1.21  1.38  1.45  1.46  1.64  1.89  1.91
rank    1     2     3     4     5     6     7     8     9     10    11    12    13    14    15

Sum of X ranks = 90; sum of Y ranks = 30; W = 35

One-sided p-value = 0.1272
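The rank sums and the exact one-sided p-value can be obtained by enumeration. This sketch assumes no ties (true for these data) and uses the fact that, under H0, every choice of 5 ranks out of 15 is equally likely for the Y sample; the names are illustrative.

```python
from itertools import combinations

x_term = [0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]
y_early = [1.15, 0.88, 0.90, 0.74, 1.21]

# Rank the pooled sample (1 = smallest); the data contain no ties.
pooled = sorted(x_term + y_early)
rank = {value: i + 1 for i, value in enumerate(pooled)}

w_x = sum(rank[v] for v in x_term)           # sum of X ranks
w_y = sum(rank[v] for v in y_early)          # sum of Y ranks
u_x = w_x - len(x_term) * (len(x_term) + 1) // 2   # Mann-Whitney form

# Exact null distribution of the Y rank sum: all C(15, 5) rank subsets.
total, m = len(pooled), len(y_early)
null_sums = [sum(c) for c in combinations(range(1, total + 1), m)]

# Alternative of interest: X tends to be larger, i.e. Y ranks tend to be small.
p_one_sided = sum(s <= w_y for s in null_sums) / len(null_sums)

print(f"W_X = {w_x}, W_Y = {w_y}, U_X = {u_x}, one-sided p = {p_one_sided:.4f}")
```

Enumeration is feasible here because there are only 3003 subsets; for larger samples a normal approximation or a dedicated routine would normally be used instead.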

T-test: H0: μ1 = μ2; HA: μ1 ≠ μ2
W-test: H0: Δ = 0; HA: Δ ≠ 0

Multiple populations or groups
ANOVA: H0: μ1 = μ2 = … = μk; HA: not H0
Nonparametric version: Kruskal-Wallis test. H0: no location shift; HA: not H0

SAS: PROC NPAR1WAY
Followups: multiple comparisons (for selection of the best)
Simultaneous inference.

24

k confidence intervals or tests simultaneously: use α/k.

Example: simultaneous 95% confidence intervals for 2 means (X & Y) in permeability example. k = 2; α = 0.05 (95% = 100*(1 - 0.05)% = 100*(1 - α)%).

α/2 = 0.05/2 = 0.025, so use 100*(1 - α/2)% = 97.5% confidence intervals for simultaneous 95% confidence.
z = 2.2414: use est +/- 2.2414*SE
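The multiplier 2.2414 is the two-sided normal quantile at the adjusted level α/k. A minimal sketch of that adjustment (the function name `bonferroni_z` is introduced here for illustration):

```python
from statistics import NormalDist


def bonferroni_z(alpha, k):
    """Two-sided normal critical value for k simultaneous intervals at overall level alpha."""
    per_interval_alpha = alpha / k
    return NormalDist().inv_cdf(1 - per_interval_alpha / 2)


z_single = bonferroni_z(0.05, 1)   # one interval: the usual 1.96
z_two = bonferroni_z(0.05, 2)      # two intervals: each at 97.5% confidence

print(f"k = 1: z = {z_single:.4f}")
print(f"k = 2: z = {z_two:.4f}")
```

The adjustment is conservative: the intervals get wider as k grows, in exchange for the overall coverage guarantee.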

25

Qualitative-qualitative: contingency tables.
The 1894-96 Calcutta cholera study

26

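Observed counts:
                 infected   not infected   total
inoculated       3          276            279
not inoculated   66         473            539
total            69         749            818

Expected counts under independence:
                 infected   not infected   total
inoculated       23.53      255.47         279
not inoculated   45.47      493.53         539
total            69         749            818

Cell contributions to Χ2:
                 infected   not infected
inoculated       17.92      1.65
not inoculated   9.27       0.85

Χ2 = 29.70
p = 5.05481E-08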
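The expected counts, the chi-square statistic, and the p-value for the cholera table can be recomputed in a few lines. For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x/2)), which keeps the sketch within the standard library; the variable names are illustrative.

```python
import math

# Rows: inoculated, not inoculated. Columns: infected, not infected.
observed = [[3, 276],
            [66, 473]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Upper tail of chi-square with (2 - 1) * (2 - 1) = 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected = {[[round(e, 2) for e in row] for row in expected]}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3e}")
```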

Quantitative-quantitative: Linear regression

27

[Figure: scatterplot of systolic bp against age, with fitted line f(x) = 0.9709x + 98.7147, R² = 0.4324]

Basic model equation:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

Fitted equation:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Basic hypothesis test:

$$H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0$$

28

Regression Statistics
Multiple R          0.658
R Square            0.432
Adjusted R Square   0.412
Standard Error      17.314
Observations        30.000

ANOVA
             df       SS          MS         F        Significance F
Regression   1.000    6394.023    6394.023   21.330   0.000
Residual     28.000   8393.444    299.766
Total        29.000   14787.467

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   98.715         10.000           9.871    0.000     78.230      119.200     78.230        119.200
age         0.971          0.210            4.618    0.000     0.540       1.401       0.540         1.401
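Several entries in this output are functions of the ANOVA sums of squares alone. The raw age and blood-pressure data are not given in the slides, so the sketch below starts from the reported sums of squares and shows how the summary statistics are derived from them; the variable names are illustrative.

```python
import math

n = 30                                    # observations
ss_regression = 6394.023                  # from the ANOVA table
ss_residual = 8393.444
df_regression = 1                         # one predictor (age)
df_residual = n - df_regression - 1

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - ms_residual / (ss_total / (n - 1))
f_stat = ms_regression / ms_residual
residual_se = math.sqrt(ms_residual)      # "Standard Error" in the output

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}, residual SE = {residual_se:.3f}")
```

With a single predictor, F is the square of the slope's t statistic (4.618² ≈ 21.33), so the F test and the t test of H0: β1 = 0 are equivalent.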

29

Regression Statistics
Multiple R          0.668
R Square            0.447
Adjusted R Square   0.406
Standard Error      17.405
Observations        30.000

ANOVA
             df       SS          MS         F        Significance F
Regression   2.000    6608.329    3304.165   10.907   0.000
Residual     27.000   8179.137    302.931
Total        29.000   14787.467

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   94.306         11.338           8.318    0.000     71.043      117.569     71.043        117.569
age         0.949          0.213            4.456    0.000     0.512       1.386       0.512         1.386
X2          0.929          1.105            0.841    0.408     -1.338      3.196       -1.338        3.196

Multiple regression
Simultaneous inference
Large p, small n problems

Estimating biodiversity: species richness from abundance data
ICoMM dataset ABR 0005 2005 01 07
Application of the 454 technology to active-but-rare biosphere in the oceans: large-scale basin-wide comparison in the Pacific Ocean (Hamasaki & Taniguchi)

30

freq   count     freq   count     freq   count     freq   count
1      2013      23     1         56     1         165    1
2      416       25     3         57     1         173    1
3      173       27     2         59     1         191    1
4      85        28     1         71     1         195    1
5      63        29     3         73     1         201    1
6      43        30     1         76     1         202    1
7      39        31     3         80     1         208    1
8      18        32     1         84     1         223    1
9      24        33     1         85     1         225    1
10     8         34     1         93     1         233    1
11     17        35     1         94     1         254    1
12     8         36     1         114    1         319    2
13     6         38     2         119    1         328    1
14     3         40     1         122    1         548    1
15     4         42     1         123    2         560    1
16     6         43     1         131    1         675    1
17     9         46     1         148    1         1036   1
18     2         48     1         150    1         1361   1
20     6         53     2         154    1         1526   1
22     4         54     1         155    1         1784   1

[Figure: observed frequency counts and best fitted model (ThreeMixedExp, Tau 119); x-axis: Frequency (0 to 250), y-axis: Counts (0 to 2000)]

31

32

Total Number of Observed Species = 3018

                Model           Tau    Observed Sp   Estimated Total Sp   SE        Lower CB   Upper CB   GOF0   GOF5
Best Model      ThreeMixedExp   119    2989          15369                1322      13037      18243      0.01   0.32
Model 2a        ThreeMixedExp   71     2980          16032                1615      13231      19600      0.07   0.09
Model 2b        FourMixedExp    254    3008          16245                1833      13111      20352      0.00   0.08
Model 2c        TwoMixedExp     13     2913          16604                1607      13802      20134      0.00   0.01
Non-P 1         Chao1           10     2882          7888                 330       7283       8580
Non-P 2         ACE1            10     2882          13519                777       12104      15156
Parm Max Tau    ThreeMixedExp   1784   3018          13476                810       12005      15188      0
Non-P Max Tau   ACE1            1784   3018          12227106             4012967   6532741    22887348

Analysis output from CatchAll
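As a point of reference for the nonparametric rows, the classical Chao1 lower-bound estimator needs only the observed richness and the numbers of singletons and doubletons. The sketch below uses that classical form, S_obs + f1²/(2 f2); whether CatchAll applies exactly this variant is an assumption here, although the value it gives agrees with the Chao1 row of the output.

```python
def chao1(s_obs, f1, f2):
    """Classical Chao1 estimate of total richness.

    s_obs: number of observed species
    f1:    number of species seen exactly once (singletons)
    f2:    number of species seen exactly twice (doubletons)
    """
    if f2 == 0:
        raise ValueError("classical Chao1 is undefined when f2 = 0")
    return s_obs + f1 ** 2 / (2 * f2)


# Values from the frequency-count table: 3018 observed species,
# 2013 singletons, 416 doubletons.
estimate = chao1(3018, 2013, 416)
print(f"Chao1 estimate of total richness: {estimate:.0f}")
```

Because it uses only the two lowest frequency counts, Chao1 is best read as a lower bound, which is consistent with its estimate falling well below the parametric ones in the table.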

• True (population) vs. observed (sample) richness
• Sample size, bias, & standard error
  • Asymptotic normality
• Parametric vs. coverage-based nonparametric estimation
  • Bayesian methods
  • Nonparametric maximum likelihood estimation
  • Linear modeling of ratios of successive frequency counts
• Standard error computation
• The issue of τ

33

Comparison of two populations

34

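[Figure: overlapping species sets A and B]

Jaccard index: $J = \dfrac{|A \cap B|}{|A \cup B|}$

Sorensen index: $S = \dfrac{2\,|A \cap B|}{|A| + |B|}$

As stated: sample-based.
If separate estimates of true, population |A|, |B| & |A∩B| are available, can estimate Jaccard (& Sorensen)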
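Both indices depend only on which species are present. A minimal sketch with two small hypothetical species lists (the species labels are invented for illustration and are not from the lecture):

```python
def jaccard(a, b):
    """Jaccard index: shared species over all species in either set."""
    return len(a & b) / len(a | b)


def sorensen(a, b):
    """Sorensen index: twice the shared species over the sum of set sizes."""
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical observed species sets for two communities.
site_a = {"sp1", "sp2", "sp3", "sp4"}
site_b = {"sp3", "sp4", "sp5"}

print(f"Jaccard  = {jaccard(site_a, site_b):.3f}")
print(f"Sorensen = {sorensen(site_a, site_b):.3f}")
```

The two indices are monotone transformations of each other, S = 2J / (1 + J), so they always rank pairs of communities in the same order.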

35

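Bray-Curtis:

$$\frac{2\sum_{i \in A \cap B} \min(a_i, b_i)}{\sum_{j \in A} a_j + \sum_{k \in B} b_k} \;=\; 1 - \frac{\sum_{i \in A \cup B} |a_i - b_i|}{\sum_{i \in A \cup B} (a_i + b_i)}$$

Morisita-Horn:

$$\frac{2\sum_{i \in A \cap B} \dfrac{a_i}{n_A}\,\dfrac{b_i}{n_B}}{\sum_{j \in A} \left(\dfrac{a_j}{n_A}\right)^{2} + \sum_{k \in B} \left(\dfrac{b_k}{n_B}\right)^{2}}$$

a_i = abundance of ith species in population A
b_i = abundance of ith species in population B
n_A, n_B = total abundance in A and in B
Based on existing sample (abundance) data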

36

A (a_i)   B (b_i)
0         10
13        0
0         12
12        6
7         0
1         10
18        10
14        0
17        20
0         14
5         0
15        2
9         2
0         3
16        0

Sample-based:
J = 0.375
B-C = 0.351852
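The abundance-based indices can be computed directly from paired abundances. In the sketch below the (a_i, b_i) pairs are read from the example table, so treat that parsing as an assumption; the Morisita-Horn function uses total abundances n_A and n_B as the normalizers, and its value is not reported on the slide.

```python
# Paired abundances (a_i, b_i) as read from the example table.
pairs = [(0, 10), (13, 0), (0, 12), (12, 6), (7, 0), (1, 10), (18, 10), (14, 0),
         (17, 20), (0, 14), (5, 0), (15, 2), (9, 2), (0, 3), (16, 0)]


def bray_curtis(pairs):
    """Bray-Curtis similarity: 2 * sum of min(a, b) over the total abundance."""
    # min(a, b) is zero for unshared species, so summing over all rows is safe.
    shared_min = sum(min(a, b) for a, b in pairs)
    total = sum(a + b for a, b in pairs)
    return 2 * shared_min / total


def morisita_horn(pairs):
    """Morisita-Horn similarity based on relative abundances."""
    n_a = sum(a for a, _ in pairs)
    n_b = sum(b for _, b in pairs)
    cross = sum((a / n_a) * (b / n_b) for a, b in pairs)
    squares = (sum((a / n_a) ** 2 for a, _ in pairs)
               + sum((b / n_b) ** 2 for _, b in pairs))
    return 2 * cross / squares


bc = bray_curtis(pairs)
mh = morisita_horn(pairs)
print(f"Bray-Curtis   = {bc:.6f}")
print(f"Morisita-Horn = {mh:.4f}")
```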

There exist adjusted versions of Jaccard & Sorensen which do (attempt to) account for unseen species: see SPADE; Chao/Chazdon/Colwell/Shen (2005).

37

species/occasion   1   2   3   4   5   6   freq
1                  0   1   0   0   1   0   2
2                  0   1   0   0   0   1   2
3                  0   0   1   0   0   0   1
4                  1   0   0   1   0   0   2
5                  0   0   0   1   0   0   1
6                  1   1   0   0   1   1   4
7                  0   0   0   1   0   0   1
8                  0   0   1   0   1   1   3
9                  1   1   1   0   1   0   4
10                 0   1   1   0   1   0   3
11                 0   1   0   0   0   1   2
12                 1   0   0   1   0   0   2
13                 0   0   0   0   1   0   1
14                 0   1   1   0   0   1   3
15                 1   0   0   1   0   0   2

Incidence data (vs. abundance)

Example: 6 “occasions” (samples, etc.); 15 total species observed.

Only presence/absence recorded (not abundance).

Also capture-recapture, multiple recapture, etc.

Models: M0, Mh, Mt, Mb, Mth, Mtb, Mtbh

38

M(h) - 2-point finite mixture
Population size (CI)   15.7 (15.0 - 1567.9)
Capture probability    0.3508 (per occasion)
Capture probability    0.9251 (overall)
Npar                   4
Log likelihood         -31.079
AIC                    70.157
AICc                   74.157

M(0) - ML estimator (Otis et al. 1978)
Population size (CI)   16.0 ± 1.4 (15.0 - 20.0)
Capture probability    0.3438 (per occasion)
Capture probability    0.9201 (overall)
Npar                   2
Log likelihood         -59.003
AIC                    122.005
AICc                   123.005

0: null model, homogeneous capture probabilities

h: heterogeneous capture probabilities

t: time (occasion, sample) effect

b: “behavioral” effect

Program DENSITY
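The M(0) point estimates can be recovered with a simple profile-likelihood search. The sketch below is not Program DENSITY's algorithm; it assumes the standard M(0) likelihood with a common capture probability p, profiles p out at n/(T*N) for each candidate N, and searches integer N only. It uses the incidence table's totals: 6 occasions, 15 species detected, 33 detections in all.

```python
import math

T = 6            # sampling occasions
D = 15           # distinct species detected at least once
n_caps = 33      # total detections (sum of the freq column)


def profile_loglik(N):
    """M(0) log-likelihood at population size N, with p set to its MLE given N.

    Terms that do not depend on N are dropped, so values are only
    comparable to each other, not to a program's reported log-likelihood.
    """
    p = n_caps / (T * N)
    return (math.lgamma(N + 1) - math.lgamma(N - D + 1)
            + n_caps * math.log(p)
            + (T * N - n_caps) * math.log(1 - p))


# Search integer population sizes from the number detected upward.
n_hat = max(range(D, 101), key=profile_loglik)
p_hat = n_caps / (T * n_hat)             # per-occasion capture probability
p_overall = 1 - (1 - p_hat) ** T         # probability of being detected at all

print(f"N_hat = {n_hat}, p (per occasion) = {p_hat:.4f}, p (overall) = {p_overall:.4f}")
```

The upper search limit of 100 is an arbitrary choice for this example; the profile likelihood falls off steadily beyond the maximum.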

39

Multivariate data: Principal components analysis

x1      x2      x3
0.553   0.710   0.544
0.846   0.793   0.807
0.374   0.474   0.553
0.563   0.570   0.376
0.495   0.634   0.664
0.322   0.440   0.617
0.551   0.373   0.220
0.616   0.508   0.524
0.780   0.915   1.003
0.109   0.175   0.083

[Figures: pairwise scatterplots of x2 vs. x1, x3 vs. x1, and x3 vs. x2]

Correlation matrix:
     x1      x2      x3
x1   1.000
x2   0.854   1.000
x3   0.687   0.889   1.000

• PCA creates a new coordinate system, i.e., a new set of variables
• New coordinate system is orthogonal
• New variables are linear combinations of original variables
• New variables capture variance of original variables (data) in optimal way
• Dimension reduction: often possible to use only a few (e.g., 2) principal components instead of original variables, but still capture most of the information (variance) in data.

40

41

[Figure: scree plot; eigenvalue and cumulative variability (%) by axis F1, F2, F3]

[Figure: observations Obs1 to Obs10 plotted on axes F1 (87.44 %) and F2 (10.47 %); axes F1 and F2 together: 97.91 %]

                  F1      F2      F3
Eigenvalue        2.62    0.31    0.06
Variability (%)   87.44   10.47   2.09
Cumulative %      87.44   97.91   100.00
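The first eigenvalue and its share of variance can be recovered from the three-variable correlation matrix alone. The sketch below uses power iteration, a simple stand-in for the full eigendecomposition a statistics package would perform; it starts from the rounded correlations shown on the slide, so agreement is to about two decimals.

```python
import math

# Correlation matrix of x1, x2, x3 (rounded values from the slide).
corr = [[1.000, 0.854, 0.687],
        [0.854, 1.000, 0.889],
        [0.687, 0.889, 1.000]]


def mat_vec(matrix, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def leading_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))  # Rayleigh quotient
    return eigenvalue, v


eigenvalue_1, direction_1 = leading_eigenpair(corr)

# For a correlation matrix the total variance equals the number of variables.
share_1 = 100 * eigenvalue_1 / len(corr)

print(f"first eigenvalue = {eigenvalue_1:.2f}, variability = {share_1:.2f} %")
print("first component weights:", [round(x, 3) for x in direction_1])
```

The weights in `direction_1` are the coefficients of the linear combination of standardized x1, x2, x3 that defines the first principal component.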

42

x1      x2      x3      x4      x5
0.553   0.710   0.544   0.916   0.659
0.846   0.793   0.807   0.911   0.958
0.374   0.474   0.553   0.277   0.552
0.563   0.570   0.376   0.377   0.263
0.495   0.634   0.664   0.581   0.824
0.322   0.440   0.617   0.426   0.601
0.551   0.373   0.220   0.415   0.113
0.616   0.508   0.524   0.740   0.465
0.780   0.915   1.003   0.845   1.073
0.109   0.175   0.083   0.407   0.326

Correlation matrix:
     x1      x2      x3      x4      x5
x1   1.000
x2   0.854   1.000
x3   0.687   0.889   1.000
x4   0.713   0.752   0.614   1.000
x5   0.526   0.809   0.919   0.677   1.000

                  F1      F2      F3      F4      F5
Eigenvalue        3.99    0.57    0.35    0.06    0.03
Variability (%)   79.76   11.38   7.05    1.20    0.61
Cumulative %      79.76   91.14   98.19   99.39   100.00

[Figure: scree plot; eigenvalue and cumulative variability (%) by axis F1 to F5]

43

[Figure: observations Obs1 to Obs10 plotted on axes F1 (79.76 %) and F2 (11.38 %); axes F1 and F2 together: 91.14 %]

Multidimensional scaling
• Represents N data points in low (2 or 3)-dimensional space.
• Visualization technique
  • Data analysis (exploration, description) vs. statistical inference
• Input is dissimilarity or distance matrix
• Attempts to reproduce given distances with minimum stress
• Metric: preserves distances as closely as possible
• Nonmetric: preserves order of distances as closely as possible

44
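"Stress" measures how far the distances in the low-dimensional configuration are from the input dissimilarities. The sketch below computes Kruskal's stress-1 in the form used for absolute (metric) MDS, comparing configuration distances with the dissimilarities directly; nonmetric MDS would first replace the dissimilarities with monotone-regression disparities, a step omitted here. The points and matrices are hypothetical.

```python
import math
from itertools import combinations


def kruskal_stress1(points, dissimilarities):
    """Kruskal's stress-1 between a configuration and a dissimilarity matrix.

    points:           list of coordinate tuples, one per object
    dissimilarities:  symmetric matrix (list of lists) of target distances
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])           # distance in the configuration
        numerator += (d - dissimilarities[i][j]) ** 2
        denominator += d ** 2
    return math.sqrt(numerator / denominator)


# Hypothetical 2-D configuration: a 3-4-5 right triangle.
config = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

# Dissimilarities that the configuration reproduces exactly.
exact = [[0, 3, 4],
         [3, 0, 5],
         [4, 5, 0]]

# Dissimilarities that no longer match the configuration.
perturbed = [[0, 3, 4],
             [3, 0, 7],
             [4, 7, 0]]

print(f"stress (exact fit)     = {kruskal_stress1(config, exact):.3f}")
print(f"stress (perturbed fit) = {kruskal_stress1(config, perturbed):.3f}")
```

An MDS routine searches over configurations to make this quantity as small as possible; lower values indicate that the plot represents the input distances more faithfully.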

45

A B C D E F G H

A 0.000 0.372 0.300 0.289 0.120 0.015 0.885 0.444

B 0.372 0.000 0.351 0.128 0.000 0.353 0.301 0.102

C 0.300 0.351 0.000 0.772 0.000 0.889 0.087 0.648

D 0.289 0.128 0.772 0.000 0.000 0.051 0.019 0.042

E 0.120 0.000 0.000 0.000 0.000 0.012 0.000 0.856

F 0.015 0.353 0.889 0.051 0.012 0.000 0.680 0.556

G 0.885 0.301 0.087 0.019 0.000 0.680 0.000 0.113

H 0.444 0.102 0.648 0.042 0.856 0.556 0.113 0.000

Distance matrix:
Symmetric, zeroes on diagonal
Distances may be derived from, e.g., community comparison metrics etc.
Be careful of similarity vs. dissimilarity (distance)

46

[Figure: absolute MDS configuration of points A to H on axes Dim1 and Dim2 (Kruskal's stress (1) = 0.569)]

Absolute MDS
Dimensions             2
Kruskal's stress (1)   0.569
Iterations             16
Convergence            0.000

47

[Figure: non-metric MDS configuration of points A to H on axes Dim1 and Dim2]

Non-metric MDS
Dimensions             2
Kruskal's stress (1)   0.134
Iterations             86
Convergence            0.000

48

[Figures: absolute MDS and non-metric MDS configurations of points A to H, side by side, on axes Dim1 and Dim2]

Invariant to rotation & symmetry
