Ten Lectures on Ecological Statistics
John [email protected]
Department of Statistical Science, Cornell University
An example
Mean number of eggs laid by birds nesting in a forest: changing?
Population: closed (no birth, death, or immigration/emigration)
Population is the target of inference: from known (data) to unknown (population)
Statistical epistemology
“All models are wrong, but some are useful” (George E. P. Box)
Collect sample: finite-population and infinite-population definitions.
Sampling unit; sample size.
n = 41.
1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0, 3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0
Probability model. Probability distribution controlled/determined by one or more parameters.
Parameter: population. Statistic: data.
[Figure: Poisson probabilities P(j), j = 0 to 20, for lambda = 1.0, 2.5, and 5.0]
Model: the Poisson distribution (Siméon-Denis Poisson, 1781-1840)
Assigns probability to nonnegative integers
One-parameter (λ > 0) model
λ = mean of distribution; mean, median, mode
$p(j) = \dfrac{e^{-\lambda}\,\lambda^{j}}{j!}, \quad j = 0, 1, 2, \ldots$
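The Poisson probabilities can be sketched in a few lines of Python. This is only an illustration: the helper name `poisson_pmf` is introduced here, and the three λ values are the ones in the plot legend.

```python
import math

def poisson_pmf(j, lam):
    """Probability that a Poisson(lam) variable equals the nonnegative integer j."""
    return math.exp(-lam) * lam**j / math.factorial(j)

# Probabilities for the three lambda values shown in the plot, j = 0..20.
for lam in (1.0, 2.5, 5.0):
    probs = [poisson_pmf(j, lam) for j in range(21)]
    print(f"lambda = {lam}: P(0) = {probs[0]:.3f}, sum over j = 0..20: {sum(probs):.6f}")
```

The probabilities over j = 0..20 sum to nearly 1 for these λ, which is why the plot can stop at 20.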
[Figure: fitted Poisson P(j), lambda = xbar = 2.585, plotted against data (proportion), j = 0 to 20]
• How to fit model?
• Statistical inference
• Parameter estimation
• Hypothesis testing
• How to assess fit of model?

j   data (count)   data (prop)   p(j) fitted, lambda = 2.585
0   7              0.171         0.075
1   7              0.171         0.195
2   9              0.220         0.252
3   7              0.171         0.217
4   3              0.073         0.140
5   4              0.098         0.073
6   1              0.024         0.031
7   2              0.049         0.012
8   1              0.024         0.004
Statistic vs. parameter
Estimator vs. estimate
Assumptions:
data arise from Poisson distribution
the notion of i.i.d.: $x_1, \ldots, x_n \sim \text{i.i.d. Poisson}(\lambda)$
Maximum likelihood estimate: $\hat{\lambda} = \bar{x} = 2.585366 \approx 2.59$
MLE is optimal in several senses: consistent, efficient, asymptotically normal
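A minimal sketch of the fit, using the fact that the Poisson MLE is the sample mean. The variable names (`eggs`, `lam_hat`) are introduced here for illustration.

```python
import math
from collections import Counter

# The n = 41 egg counts from the sample.
eggs = [1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0,
        3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0]

n = len(eggs)
lam_hat = sum(eggs) / n              # Poisson MLE = sample mean

counts = Counter(eggs)
print(f"n = {n}, lambda-hat = {lam_hat:.6f}")
print("j  count  prop   fitted")
for j in range(max(eggs) + 1):
    prop = counts[j] / n                                           # observed proportion
    fitted = math.exp(-lam_hat) * lam_hat**j / math.factorial(j)   # fitted p(j)
    print(f"{j}  {counts[j]:5d}  {prop:.3f}  {fitted:.3f}")
```

The printed columns correspond to the count, proportion, and fitted columns of the table of observed vs. fitted values.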
Still not much use without error term: 2.59 +/- ??
Theory also provides standard error. Program to find SE:
1. Find theoretical variance of MLE;
2. Find empirical approximation to (1);
3. Take square root of (2).
In our example, $SE = \dfrac{s}{\sqrt{n}} = \dfrac{2.109213}{\sqrt{41}} = 0.329404 \approx 0.329$
So we can write: 2.59 w/SE of 0.33.
Still not much use.
Notion of confidence interval:
We are 95% confident that the true value lies in a certain range.
Often: estimate +/- 1.96*SE ≈ estimate +/- 2*SE
Example: 2.59 +/- 2*0.33 = 2.59 +/- 0.66 = (1.93, 3.25); with 1.96*SE and unrounded values, (1.940, 3.231)
We are 95% confident that the true mean # of eggs per nest lies in this range.
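The standard error and interval can be reproduced as follows. This is a sketch that assumes, as the reported numbers suggest, that the SE is taken as s/√n with the usual n − 1 sample standard deviation.

```python
import math
import statistics

eggs = [1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0,
        3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0]

n = len(eggs)
xbar = statistics.mean(eggs)
s = statistics.stdev(eggs)           # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)                # standard error of the mean

# Approximate 95% confidence interval: estimate +/- 1.96 * SE.
lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
print(f"xbar = {xbar:.3f}, s = {s:.6f}, SE = {se:.6f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```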
Claim: Historical norm for mean # of eggs/nest is 3.6.
Question: Does new data contradict this?
Hypothesis testing. Conceptual framework:
Null hypothesis H0: situation unchanged, no difference, “nothing is happening.”
Alternative hypothesis HA (or H1): difference from H0. “Hypothesis of interest.”

State of nature / decision   H0              HA
H0                           Correct         Type I error
HA                           Type II error   Correct
How far is 2.59 from 3.6?
Test statistic T measures distance of observed data from null hypothesis.
H0: mean μ = μ0 = 3.6
HA: mean μ ≠ μ0 = 3.6
Two-sided alternative.
$T = \dfrac{\bar{x} - \mu_0}{SE} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{2.59 - 3.6}{0.33} = -3.06$
What does -3.06 mean? Need null distribution of test statistic.
In hypothesis testing, assume H0 true throughout. Test evaluates whether data is consistent with H0.
Two approaches (overall equivalent):
• Fixed-level: compare test statistic T to some cutoff value. If T > (<) cutoff, “reject” H0, otherwise “accept” or “fail to reject” H0. α = 0.05 “significant”; α = 0.01 “highly significant.”
• P-value: compute p-value of test = probability, given H0 true, of observing data “as or more extreme” than what actually occurred.
[Figure: standard normal (Gaussian) distribution, -4 to 4, with T = -3.06 marked]
LH tail area = 0.001107
p-value = 2*0.001107 = 0.002213 < .01 < .05
Reject H0.
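A sketch of the p-value calculation under the standard normal null distribution. It uses the rounded slide values (2.59, 3.6, 0.33) and rounds T to two decimals, as the slide does; `normal_cdf` is a helper name introduced here.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

xbar, mu0, se = 2.59, 3.6, 0.33      # rounded values from the slides
T = round((xbar - mu0) / se, 2)      # test statistic, rounded as on the slide

left_tail = normal_cdf(-abs(T))      # area to the left of -|T|
p_value = 2 * left_tail              # two-sided alternative: double the tail area

print(f"T = {T}, left tail = {left_tail:.6f}, p-value = {p_value:.6f}")
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {'reject' if p_value < alpha else 'fail to reject'} H0")
```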
Foregoing: frequentist, parametric analysis.
Philosophy/approach   Parametric                                           Nonparametric
Frequentist           Most “classical”: requires parametric model fit      Less efficient but does not require model fit
Bayesian              Philosophically appealing, numerically more direct   Difficult or not obvious

Bayesian (parametric) statistical analysis: the notion of a prior distribution.
Prior represents investigator's prior belief or information, before performing experiment/collecting data, regarding value(s) of parameter(s).
Subjective, objective Bayesianism. Elicitation of priors; noninformative or objective priors (Jeffreys', reference).
Modern Bayesian computation: MCMC etc.
Parametric Bayesian (point) estimation
Poisson case: conjugate prior Γ(α,β)
Bayesian program: (1) establish prior; (2) collect data; (3) update prior based on data, to obtain posterior.
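The three-step program can be sketched for the Poisson case with a conjugate Gamma prior. Assumptions, all introduced here for illustration: Γ(α,β) is read in the shape/rate parameterization, and the prior values α = 2, β = 1 are hypothetical (they are not the priors behind the slides, so the mode printed here need not match the slides' posterior mode).

```python
eggs = [1, 2, 2, 0, 1, 2, 3, 5, 5, 0, 0, 4, 0, 1, 3, 3, 2, 4, 2, 7, 0,
        3, 1, 2, 3, 2, 8, 3, 3, 2, 7, 2, 4, 1, 1, 6, 5, 0, 5, 1, 0]

def gamma_poisson_update(alpha, beta, data):
    """Posterior Gamma (shape, rate) for i.i.d. Poisson data under a Gamma(alpha, beta) prior."""
    return alpha + sum(data), beta + len(data)

# (1) establish prior: hypothetical values, shape/rate parameterization assumed
alpha0, beta0 = 2.0, 1.0
# (2) collect data: `eggs`; (3) update prior to obtain posterior
shape, rate = gamma_poisson_update(alpha0, beta0, eggs)

post_mean = shape / rate
post_mode = (shape - 1) / rate       # valid when shape >= 1
print(f"posterior: Gamma(shape = {shape}, rate = {rate})")
print(f"posterior mean = {post_mean:.3f}, posterior mode = {post_mode:.3f}")
```

A more diffuse prior (smaller α and β) pulls the posterior mode closer to the MLE; a focused prior pulls it toward the prior.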
[Figure: densities for λ on 0 to 6, labeled moderate, diffuse, focused]
Posterior mode = 2.65 ≠ 2.59
[Figure: diffuse prior and posterior densities for λ on 0 to 6]
95% highest posterior density (HPD) region; credible region ≠ 95% confidence interval
Bayesian hypothesis testing
Assign prior probabilities to null & alternative hypotheses. Noninformative/objective: 0.5.
Collect data; compute posterior probability that each hypothesis is true
Bayes factor: roughly, likelihood of data under H0 / likelihood of data under HA.
(more advanced topic)
Quantitative goodness-of-fit assessment for bird nesting data
j     data (count)   P(j), lambda = 2.585   expected
0     7              0.075                  3.090
1     7              0.195                  7.989
2     9              0.252                  10.327
3     7              0.217                  8.900
4     3              0.140                  5.752
5     4              0.073                  2.974
6     1              0.031                  1.282
7     2              0.012                  0.473
8     1              0.004                  0.153
>=9   0              0.001                  0.058
chi-square = 17.050; p-value = 0.030
Naïve chi-square test, 10 - 1 - 1 = 8 d.f.
Test accepts Poisson model @ level α = .01, rejects @ α = .05
Actually problem with test: chi-square distribution of test statistic is asymptotic; requires cell counts >=5 (but see literature); fails in example.
Alternative GOF tests are possible.
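The naïve chi-square calculation can be sketched as below. To stay within the standard library, the p-value uses the closed-form upper tail of the chi-square distribution for even degrees of freedom; the helper `chi2_sf_even_df` is introduced here and is not a library function.

```python
import math

observed = [7, 7, 9, 7, 3, 4, 1, 2, 1, 0]     # counts for j = 0..8 and the cell j >= 9
n = sum(observed)
lam = sum(j * c for j, c in enumerate(observed)) / n   # fitted mean (tail cell is empty)

# Fitted cell probabilities: j = 0..8, then everything from 9 upward.
pmf = [math.exp(-lam) * lam**j / math.factorial(j) for j in range(9)]
pmf.append(1.0 - sum(pmf))
expected = [n * p for p in pmf]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                    # cells - 1 - one estimated parameter

def chi2_sf_even_df(x, df):
    """P(chi-square with even df exceeds x), via the closed-form series."""
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

p_value = chi2_sf_even_df(chi2, df)
small_cells = sum(e < 5 for e in expected)    # cells violating the usual >= 5 rule
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.3f}")
print(f"{small_cells} of {len(expected)} expected cell counts are below 5")
```

The last line makes the caveat concrete: most of the contribution to the statistic comes from cells with very small expected counts.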
Philosophy/approach   Parametric             Nonparametric
Frequentist           Parameter estimation   Parameter estimation
                      Hypothesis testing     Hypothesis testing
Bayesian              Parameter estimation   Parameter estimation
                      Hypothesis testing     Hypothesis testing

Will look @ classical nonparametrics in multiple-sample context
Predictor (independent) variable / response (dependent) variable

Predictor / response   Qualitative                                          Quantitative
Qualitative            Contingency tables, Χ2 test; log-linear models       Two-sample tests; ANOVA
Quantitative           Logistic regression, data mining, machine learning   Regression, linear models of various types
Permeability constants of human chorioamnion (a placental membrane) at term (X) and between 12 and 26 weeks gestational age (Y). Alternative of interest is greater permeability for term pregnancy.
X: term   Y: age
0.80      1.15
0.83      0.88
1.89      0.90
1.04      0.74
1.45      1.21
1.38
1.91
1.64
0.73
1.46

t-Test: Two-Sample, Unequal Variances
                          X: term   Y: age
Mean                      1.313     0.976
Variance                  0.195     0.039
Observations              10        5
Hypothesized Mean Diff.   0
df                        13
t Stat                    2.041
P(T<=t) one-tail          0.031
t Critical one-tail       1.771
P(T<=t) two-tail          0.062
t Critical two-tail       2.160
group X Y X X Y Y X Y Y X X X X X X
value 0.73 0.74 0.80 0.83 0.88 0.90 1.04 1.15 1.21 1.38 1.45 1.46 1.64 1.89 1.91
rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Mann-Whitney-Wilcoxon (rank-sum) test
sum of X ranks = 90
sum of Y ranks = 30
W = 35
One-sided p-value = 0.1272
T-test: H0: μ1 = μ2; HA: μ1 ≠ μ2
W-test: H0: Δ = 0; HA: Δ ≠ 0
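Both two-sample statistics for the permeability data can be sketched with the standard library. Assumptions made here: the t statistic is the Welch (unequal-variance) form, and the exact one-sided rank-sum p-value is obtained by enumerating all equally likely assignments of ranks to the Y group, which is feasible because there are no ties and only C(15, 5) = 3003 assignments.

```python
import math
import statistics
from itertools import combinations

x = [0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46]   # term
y = [1.15, 0.88, 0.90, 0.74, 1.21]                                  # 12 to 26 weeks

# Welch t statistic and its approximate degrees of freedom.
vx, vy = statistics.variance(x) / len(x), statistics.variance(y) / len(y)
t_stat = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))

# Rank-sum statistic: with no ties, a value's rank is its position in the pooled sort.
pooled = sorted(x + y)
rank = {value: i + 1 for i, value in enumerate(pooled)}
sum_x = sum(rank[v] for v in x)
sum_y = sum(rank[v] for v in y)
w = sum_x - len(x) * (len(x) + 1) // 2

# Exact one-sided p-value: fraction of rank assignments giving Y a rank sum
# at least as small as observed (equivalently, X a rank sum at least as large).
all_sums = [sum(c) for c in combinations(range(1, len(pooled) + 1), len(y))]
p_one_sided = sum(s <= sum_y for s in all_sums) / len(all_sums)

print(f"t = {t_stat:.3f}, df = {df:.1f}")
print(f"rank sums: X = {sum_x}, Y = {sum_y}, W = {w}, one-sided p = {p_one_sided:.4f}")
```

The rank-based test discards the magnitudes of the differences, which is one reason its p-value is larger than the t-test's here.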
Multiple populations or groups
ANOVA: H0: μ1 = μ2 = … = μk; HA: not H0
Nonparametric version: Kruskal-Wallis test
H0: no location shift; HA: not H0
SAS: PROC NPAR1WAY
Followups: multiple comparisons (for selection of the best)
Simultaneous inference.
k confidence intervals or tests simultaneously: use α/k.
Example: simultaneous 95% confidence intervals for 2 means (X & Y) in permeability example. k = 2; α = 0.05 (95% = 100*(1-.05) = 100*(1-α)).
α/k = 0.05/2 = 0.025, so use 100*(1-α/k)% = 97.5% confidence intervals for simultaneous 95% confidence.
z = 2.2414: use est +/- 2.2414*SE
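The adjusted critical value can be checked with `statistics.NormalDist` (Python 3.8+). The function name `bonferroni_z` is introduced here for illustration.

```python
from statistics import NormalDist

def bonferroni_z(alpha, k):
    """Two-sided normal critical value for each of k simultaneous intervals."""
    per_interval_alpha = alpha / k                     # split the error rate over k intervals
    return NormalDist().inv_cdf(1 - per_interval_alpha / 2)

z_single = bonferroni_z(0.05, 1)   # ordinary 95% interval
z_pair = bonferroni_z(0.05, 2)     # two simultaneous intervals, each at 97.5%
print(f"k = 1: z = {z_single:.4f}")
print(f"k = 2: z = {z_pair:.4f}")
```

Each interval gets wider as k grows, which is the price of simultaneous coverage.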
Qualitative-qualitative: contingency tables.
The 1894-96 Calcutta cholera study
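Observed:
                 infected   not infected   total
inoculated       3          276            279
not inoculated   66         473            539
total            69         749            818

Expected:
                 infected   not infected   total
inoculated       23.53      255.47         279
not inoculated   45.47      493.53         539
total            69         749            818

Χ2 contributions:
                 infected   not infected
inoculated       17.92      1.65
not inoculated   9.27       0.85

Χ2 = 29.70
p = 5.05481E-08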
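A sketch of the Χ2 calculation for the cholera table. For a 2×2 table the statistic has 1 degree of freedom, and the upper tail of that distribution can be written with the complementary error function, so no external library is needed.

```python
import math

observed = [[3, 276],      # inoculated:     infected, not infected
            [66, 473]]     # not inoculated: infected, not infected

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# With 1 d.f., P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected inoculated/infected = {expected[0][0]:.2f}")
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3e}")
```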
Quantitative-quantitative: Linear regression
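[Figure: scatter plot of systolic bp vs. age (age 10 to 80, bp 100 to 240) with fitted line f(x) = 0.9709 x + 98.7147, R² = 0.4324]
Basic model equation: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$
Fitted equation: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Basic hypothesis test: $H_0: \beta_1 = 0 \quad \text{vs.} \quad H_A: \beta_1 \neq 0$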
Regression Statistics
Multiple R          0.658
R Square            0.432
Adjusted R Square   0.412
Standard Error      17.314
Observations        30.000

ANOVA
             df       SS          MS         F        Significance F
Regression   1.000    6394.023    6394.023   21.330   0.000
Residual     28.000   8393.444    299.766
Total        29.000   14787.467

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   98.715         10.000           9.871    0.000     78.230      119.200     78.230        119.200
age         0.971          0.210            4.618    0.000     0.540       1.401       0.540         1.401
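The raw age and blood-pressure data are not listed, but the summary quantities in the output are tied together by simple identities. A sketch that recomputes them from the reported sums of squares (variable names introduced here):

```python
import math

# Reported ANOVA quantities for the one-predictor fit.
ss_reg, ss_res = 6394.023, 8393.444
df_reg, df_res = 1, 28
n = df_reg + df_res + 1                               # 30 observations

ss_total = ss_reg + ss_res
r_squared = ss_reg / ss_total                         # share of variance explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_res
ms_res = ss_res / df_res
std_error = math.sqrt(ms_res)                         # residual standard error
f_stat = (ss_reg / df_reg) / ms_res

# With a single predictor, the slope's t statistic satisfies t^2 = F.
t_slope = math.sqrt(f_stat)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"standard error = {std_error:.3f}, F = {f_stat:.3f}, |t| for slope = {t_slope:.3f}")
```

The t² = F identity holds only for the one-predictor model; in the multiple-regression fit, F tests all slopes jointly.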
Regression Statistics
Multiple R          0.668
R Square            0.447
Adjusted R Square   0.406
Standard Error      17.405
Observations        30.000

ANOVA
             df       SS          MS         F        Significance F
Regression   2.000    6608.329    3304.165   10.907   0.000
Residual     27.000   8179.137    302.931
Total        29.000   14787.467

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   94.306         11.338           8.318    0.000     71.043      117.569     71.043        117.569
age         0.949          0.213            4.456    0.000     0.512       1.386       0.512         1.386
X2          0.929          1.105            0.841    0.408     -1.338      3.196       -1.338        3.196

Multiple regression
Simultaneous inference
Large p, small n problems
Estimating biodiversity: species richness from abundance data
ICoMM dataset ABR 0005 2005 01 07
Application of the 454 technology to active-but-rare biosphere in the oceans: large-scale basin-wide comparison in the Pacific Ocean (Hamasaki & Taniguchi)

freq   count   freq   count   freq   count   freq   count
1      2013    23     1       56     1       165    1
2      416     25     3       57     1       173    1
3      173     27     2       59     1       191    1
4      85      28     1       71     1       195    1
5      63      29     3       73     1       201    1
6      43      30     1       76     1       202    1
7      39      31     3       80     1       208    1
8      18      32     1       84     1       223    1
9      24      33     1       85     1       225    1
10     8       34     1       93     1       233    1
11     17      35     1       94     1       254    1
12     8       36     1       114    1       319    2
13     6       38     2       119    1       328    1
14     3       40     1       122    1       548    1
15     4       42     1       123    2       560    1
16     6       43     1       131    1       675    1
17     9       46     1       148    1       1036   1
18     2       48     1       150    1       1361   1
20     6       53     2       154    1       1526   1
22     4       54     1       155    1       1784   1

[Figure: observed counts vs. frequency (0 to 250), with fitted Best model (ThreeMixedExp, Tau 119)]
Total Number of Observed Species = 3018

                Model           Tau    Observed Sp   Estimated Total Sp   SE        Lower CB   Upper CB   GOF0   GOF5
Best Model      ThreeMixedExp   119    2989          15369                1322      13037      18243      0.01   0.32
Model 2a        ThreeMixedExp   71     2980          16032                1615      13231      19600      0.07   0.09
Model 2b        FourMixedExp    254    3008          16245                1833      13111      20352      0.00   0.08
Model 2c        TwoMixedExp     13     2913          16604                1607      13802      20134      0.00   0.01
Non-P 1         Chao1           10     2882          7888                 330       7283       8580
Non-P 2         ACE1            10     2882          13519                777       12104      15156
Parm Max Tau    ThreeMixedExp   1784   3018          13476                810       12005      15188      0
Non-P Max Tau   ACE1            1784   3018          12227106             4012967   6532741    22887348
Analysis output from CatchAll
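As a small illustration of a nonparametric estimator, the Chao1 figure in the output can be reproduced from the first two frequency counts. This sketch assumes the classic (not bias-corrected) Chao1 form added to the total observed richness; that assumption is made here because it matches the reported 7888, not because the CatchAll internals are documented in these slides.

```python
def chao1(s_obs, f1, f2):
    """Classic Chao1 estimate: observed richness plus f1^2 / (2 * f2) unseen species."""
    return s_obs + f1**2 / (2 * f2)

s_obs = 3018     # total number of observed species
f1 = 2013        # singletons: species seen exactly once
f2 = 416         # doubletons: species seen exactly twice

estimate = chao1(s_obs, f1, f2)
print(f"Chao1 estimated total richness = {estimate:.0f}")
print(f"estimated unseen species = {estimate - s_obs:.0f}")
```

Chao1 uses only the two rarest frequency classes, which is why it sits well below the parametric mixture-model estimates in the table.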
• True (population) vs. observed (sample) richness
• Sample size, bias, & standard error
• Asymptotic normality
• Parametric vs. coverage-based nonparametric estimation
• Bayesian methods
• Nonparametric maximum likelihood estimation
• Linear modeling of ratios of successive frequency counts
• Standard error computation
• The issue of τ
Comparison of two populations
[Figure: Venn diagram of sets A and B]
Jaccard index: $J = \dfrac{|A \cap B|}{|A \cup B|}$
Sorensen index: $S = \dfrac{2\,|A \cap B|}{|A| + |B|}$
As stated: sample-based.
If separate estimates of true, population |A|, |B| & |A∩B| are available, can estimate Jaccard (& Sorensen)
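Bray-Curtis: $\dfrac{2 \sum_{i \in A \cap B} \min(a_i, b_i)}{\sum_{j \in A} a_j + \sum_{k \in B} b_k} = 1 - \dfrac{\sum_{i \in A \cup B} |a_i - b_i|}{\sum_{i \in A \cup B} (a_i + b_i)}$
Morisita-Horn: $\dfrac{2 \sum_{i \in A \cap B} \left(\dfrac{a_i}{n_A}\right)\left(\dfrac{b_i}{n_B}\right)}{\sum_{j \in A} \left(\dfrac{a_j}{n_A}\right)^2 + \sum_{k \in B} \left(\dfrac{b_k}{n_B}\right)^2}$
ai = abundance of ith species in population A
bi = abundance of ith species in population B
Based on existing sample (abundance) data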
A (ai)   B (bi)
0        10
13       0
0        12
12       6
7        0
1        10
18       10
14       0
17       20
0        14
5        0
15       2
9        2
0        3
16       0

Sample-based:
J = 0.375
B-C = 0.351852
There exist adjusted versions of Jaccard & Sorensen which do (attempt to) account for unseen species: see SPADE; Chao/Chazdon/Colwell/Shen (2005).
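The abundance-based indices can be sketched as follows. Assumptions made here: the example table is read as 15 (ai, bi) pairs, and n_A and n_B in the Morisita-Horn index are taken to be the total abundances in A and B (the usual convention). The function names are introduced for illustration.

```python
# Abundances of each species in samples A and B (15 species).
a = [0, 13, 0, 12, 7, 1, 18, 14, 17, 0, 5, 15, 9, 0, 16]
b = [10, 0, 12, 6, 0, 10, 10, 0, 20, 14, 0, 2, 2, 3, 0]

def bray_curtis(a, b):
    """Bray-Curtis similarity: 2 * sum of min(a_i, b_i) over total abundance."""
    return 2 * sum(min(x, y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def morisita_horn(a, b):
    """Morisita-Horn similarity on relative abundances a_i / n_A and b_i / n_B."""
    n_a, n_b = sum(a), sum(b)          # total abundances (assumed meaning of n_A, n_B)
    cross = sum((x / n_a) * (y / n_b) for x, y in zip(a, b))
    squares = sum((x / n_a) ** 2 for x in a) + sum((y / n_b) ** 2 for y in b)
    return 2 * cross / squares

print(f"Bray-Curtis   = {bray_curtis(a, b):.6f}")
print(f"Morisita-Horn = {morisita_horn(a, b):.4f}")
```

Species absent from one sample contribute zero to the numerators, so summing over all 15 species is the same as summing over the shared ones.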
species/occasion   1   2   3   4   5   6   freq
1                  0   1   0   0   1   0   2
2                  0   1   0   0   0   1   2
3                  0   0   1   0   0   0   1
4                  1   0   0   1   0   0   2
5                  0   0   0   1   0   0   1
6                  1   1   0   0   1   1   4
7                  0   0   0   1   0   0   1
8                  0   0   1   0   1   1   3
9                  1   1   1   0   1   0   4
10                 0   1   1   0   1   0   3
11                 0   1   0   0   0   1   2
12                 1   0   0   1   0   0   2
13                 0   0   0   0   1   0   1
14                 0   1   1   0   0   1   3
15                 1   0   0   1   0   0   2
Incidence data (vs. abundance)
Example: 6 “occasions” (samples, etc.);15 total species observed.
Only presence/absence recorded (not abundance).
Also capture-recapture, multiple recapture, etc.
Models: M0, Mh, Mt, Mb, Mth, Mtb, Mtbh
M(h) - 2-point finite mixture
Population size (CI)   15.7 (15.0 - 1567.9)
Capture probability    0.3508 (per occasion)
Capture probability    0.9251 (overall)
Npar                   4
Log likelihood         -31.079
AIC                    70.157
AICc                   74.157

M(0) - ML estimator (Otis et al. 1978)
Population size (CI)   16.0 ± 1.4 (15.0 - 20.0)
Capture probability    0.3438 (per occasion)
Capture probability    0.9201 (overall)
Npar                   2
Log likelihood         -59.003
AIC                    122.005
AICc                   123.005
0: null model, homogeneous capture probabilities
h: heterogeneous capture probabilities
t: time (occasion, sample) effect
b: “behavioral” effect
Program DENSITY
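The model-comparison numbers in the output follow from the log likelihoods and parameter counts. A sketch using the standard AIC and small-sample AICc formulas; the effective sample size n = 15 (the number of observed species) is an assumption made here because it reproduces the reported AICc values.

```python
def aic(log_lik, n_par):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_par - 2 * log_lik

def aicc(log_lik, n_par, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    return aic(log_lik, n_par) + 2 * n_par * (n_par + 1) / (n - n_par - 1)

n_species = 15                        # assumed effective sample size
models = {"M(h)": (-31.079, 4),       # (log likelihood, Npar) as reported
          "M(0)": (-59.003, 2)}

for name, (log_lik, n_par) in models.items():
    print(f"{name}: AIC = {aic(log_lik, n_par):.3f}, "
          f"AICc = {aicc(log_lik, n_par, n_species):.3f}")
```

Lower values indicate the preferred model, so on these criteria M(h) is favored despite its extremely wide interval for population size.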
Multivariate data: Principal components analysis
x1      x2      x3
0.553   0.710   0.544
0.846   0.793   0.807
0.374   0.474   0.553
0.563   0.570   0.376
0.495   0.634   0.664
0.322   0.440   0.617
0.551   0.373   0.220
0.616   0.508   0.524
0.780   0.915   1.003
0.109   0.175   0.083

[Figure: pairwise scatter plots: x2 vs. x1, x3 vs. x1, x3 vs. x2]

Correlation matrix:
     x1      x2      x3
x1   1.000
x2   0.854   1.000
x3   0.687   0.889   1.000

• PCA creates a new coordinate system, i.e., a new set of variables
• New coordinate system is orthogonal
• New variables are linear combinations of original variables
• New variables capture variance of original variables (data) in optimal way
• Dimension reduction: often possible to use only a few (e.g., 2) principal components instead of original variables, but still capture most of the information (variance) in data.
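The eigenvalues behind a PCA of the three-variable example can be sketched from its correlation matrix. To avoid external libraries, this uses power iteration with deflation, a simple technique chosen here for illustration; it is not necessarily what any statistics package does. Because the input correlations are rounded to three decimals, results agree with package output only to about two decimals.

```python
import math

# Correlation matrix of x1, x2, x3 (rounded to three decimals).
R = [[1.000, 0.854, 0.687],
     [0.854, 1.000, 0.889],
     [0.687, 0.889, 1.000]]

def dominant_eigenpair(m, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    size = len(m)
    v = [1.0 / (i + 1) for i in range(size)]           # arbitrary nonzero start vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(v[i] * m[i][j] * v[j] for i in range(size) for j in range(size))
    return value, v

def eigenvalues(m):
    """All eigenvalues of a symmetric positive semidefinite matrix, largest first."""
    m = [row[:] for row in m]                          # work on a copy
    values = []
    for _ in range(len(m)):
        value, v = dominant_eigenpair(m)
        values.append(value)
        for i in range(len(m)):
            for j in range(len(m)):
                m[i][j] -= value * v[i] * v[j]         # deflate: remove the component found
    return values

eig = eigenvalues(R)
total = sum(eig)
cumulative = 0.0
for k, e in enumerate(eig, start=1):
    cumulative += 100 * e / total
    print(f"F{k}: eigenvalue = {e:.2f}, variability = {100 * e / total:.2f} %, "
          f"cumulative = {cumulative:.2f} %")
```

The eigenvalues sum to the number of variables (the trace of the correlation matrix), so each eigenvalue divided by 3 is that component's share of the variance.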
[Figure: scree plot; eigenvalue and cumulative variability (%) by axis F1 to F3]
[Figure: observations Obs1 to Obs10 on axes F1 and F2 (97.91 %); F1 (87.44 %), F2 (10.47 %)]

                  F1      F2      F3
Eigenvalue        2.62    0.31    0.06
Variability (%)   87.44   10.47   2.09
Cumulative %      87.44   97.91   100.00
x1      x2      x3      x4      x5
0.553   0.710   0.544   0.916   0.659
0.846   0.793   0.807   0.911   0.958
0.374   0.474   0.553   0.277   0.552
0.563   0.570   0.376   0.377   0.263
0.495   0.634   0.664   0.581   0.824
0.322   0.440   0.617   0.426   0.601
0.551   0.373   0.220   0.415   0.113
0.616   0.508   0.524   0.740   0.465
0.780   0.915   1.003   0.845   1.073
0.109   0.175   0.083   0.407   0.326

Correlation matrix:
     x1      x2      x3      x4      x5
x1   1.000
x2   0.854   1.000
x3   0.687   0.889   1.000
x4   0.713   0.752   0.614   1.000
x5   0.526   0.809   0.919   0.677   1.000

                  F1      F2      F3      F4      F5
Eigenvalue        3.99    0.57    0.35    0.06    0.03
Variability (%)   79.76   11.38   7.05    1.20    0.61
Cumulative %      79.76   91.14   98.19   99.39   100.00

[Figure: scree plot; eigenvalue and cumulative variability (%) by axis F1 to F5]
[Figure: observations Obs1 to Obs10 on axes F1 and F2 (91.14 %); F1 (79.76 %), F2 (11.38 %)]
Multidimensional scaling
• Represents N data points in low (2 or 3)-dimensional space.
• Visualization technique
• Data analysis (exploration, description) vs. statistical inference
• Input is dissimilarity or distance matrix
• Attempts to reproduce given distances with minimum stress
• Metric: preserves distances as closely as possible
• Nonmetric: preserves order of distances as closely as possible
A B C D E F G H
A 0.000 0.372 0.300 0.289 0.120 0.015 0.885 0.444
B 0.372 0.000 0.351 0.128 0.000 0.353 0.301 0.102
C 0.300 0.351 0.000 0.772 0.000 0.889 0.087 0.648
D 0.289 0.128 0.772 0.000 0.000 0.051 0.019 0.042
E 0.120 0.000 0.000 0.000 0.000 0.012 0.000 0.856
F 0.015 0.353 0.889 0.051 0.012 0.000 0.680 0.556
G 0.885 0.301 0.087 0.019 0.000 0.680 0.000 0.113
H 0.444 0.102 0.648 0.042 0.856 0.556 0.113 0.000
Distance matrix:
Symmetric, zeroes on diagonal
Distances may be derived from, e.g., community comparison metrics etc.
Be careful of similarity vs. dissimilarity (distance)
[Figure: Absolute MDS configuration of points A to H, Dim1 vs. Dim2 (Kruskal's stress (1) = 0.569)]
Absolute MDS
Dimensions             2
Kruskal's stress (1)   0.569
Iterations             16
Convergence            0.000
[Figure: Non-metric MDS configuration of points A to H, Dim1 vs. Dim2 (Kruskal's stress (1) = 0.134)]
Non-metric MDS
Dimensions             2
Kruskal's stress (1)   0.134
Iterations             86
Convergence            0.000
[Figure: side-by-side configurations of points A to H: Absolute MDS (Kruskal's stress (1) = 0.569) and Non-metric MDS (Kruskal's stress (1) = 0.134)]
Invariant to rotation & symmetry