Body size distribution of European Collembola. Lecture 9: Moments of distributions
Posted on 19-Dec-2015
Body size distribution of European Collembola
Species | Body weight [mg] | ln weight
Tetrodontophora bielanensis (Waga 1842) | 13.471729 | 2.6006
Orchesella chiantica Frati & Szeptycki 1990 | 13.471729 | 2.6006
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005 | 12.924837 | 2.5592
Orchesella dallaii Frati & Szeptycki 1990 | 9.4503028 | 2.246
Seira pini Jordana & Arbea 1989 | 9.4503028 | 2.246
Isotomurus pentodon (Kos,1937) | 7.1044808 | 1.9607
Heteromurus (V.) longicornis (Absolon 1900) | 7.1044808 | 1.9607
Pogonognathellus flavescens (Tullberg 1871) | 6.9512714 | 1.9389
Orchesella hoffmanni Stomp 1968 | 6.9512714 | 1.9389
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007 | 6.3862223 | 1.8541
Pogonognathellus longicornis (Müller 1776) | 6.2133935 | 1.8267
Orchesella devergens Handschin 1924 | 6.2133935 | 1.8267
Orchesella flavescens (Bourlet 1839) | 6.2133935 | 1.8267
Orchesella quinquefasciata (Bourlet 1841) | 6.2133935 | 1.8267

ln body weight [mg] class means | Number of species
-4.71511 | 7
-4.018377 | 53
-3.321643 | 133
-2.624909 | 224
-1.928176 | 353
-1.231442 | 395
-0.534708 | 325
0.162025 | 126
0.858759 | 45
1.555493 | 24
2.252226 | 9
The histogram of raw data
[Figure: Collembola, number of species per ln body weight class (class means -4.72 to 2.25).]
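Modus (mode): the most frequent class, i.e. the highest bar of the histogram.
Weighted mean of k classes, where class i holds n_i observations and n is their total:
$$\bar{x} = \frac{\sum_{i=1}^{k} n_i x_i}{\sum_{i=1}^{k} n_i} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} f(x_i)\,x_i, \qquad f(x_i) = \frac{n_i}{n}$$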
Class 1 (N = 25, mean 1.8169079): 2.6005933, 2.5591508, 2.2460468, 2.2460468, 1.9607257, 1.9607257, 1.9389246, 1.9389246, 1.8541429, 1.8267072, 1.8267072, 1.8267072, 1.8267072, 1.8267072, 1.584378, 1.584378, 1.584378, 1.584378, 1.584378, 1.584378, 1.5326904, 1.5326904, 1.5064044, 1.4529137, 1.4529137
Class 2 (N = 31, mean 1.032923): 1.313477, 1.313477, 1.313477, 1.313477, 1.313477, 1.301948, 1.225568, 1.165038, 1.165038, 1.165038, 1.165038, 1.006355, 1.006355, 1.006355, 1.006355, 1.006355, 1.006355, 1.006355, 1.006355, 1.006355, 1.006355, 0.939683, 0.871022, 0.871022, 0.835906, 0.835906, 0.800247, 0.800247, 0.764026, 0.756712, 0.727225
Class 3 (N = 43, mean 0.531059): 0.651808 (17 times), 0.613152, 0.573835, 0.573835, 0.533834, 0.493125, 0.493125, 0.493125, 0.493125, 0.493125, 0.489014, 0.451682, 0.451682, 0.451682, 0.451682, 0.409479, …
Three Collembolan weight classes
What is the average body weight?
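$$\bar{x} = \frac{25}{99}\cdot 1.817 + \frac{31}{99}\cdot 1.033 + \frac{43}{99}\cdot 0.531 \approx 1.013$$
Population mean: $$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Sample mean: $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$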
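The sketch below computes this pooled mean in Python. It assumes the class sizes and class means from the three-class table, and weights each class mean by its share n_i/n of all values.

```python
# Class sizes N and class means from the "Three Collembolan weight classes" table.
sizes = [25, 31, 43]
means = [1.8169079, 1.032923, 0.531059]

n = sum(sizes)  # 99 values in total
# Overall mean: sum of (n_i / n) * mean_i
overall_mean = sum(n_i * m_i for n_i, m_i in zip(sizes, means)) / n

print(n, round(overall_mean, 3))  # 99 1.013
```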
ln body weight [mg] class means | Number of species | Frequency f(x_i) | Arithmetic mean f(x_i)·x_i | Variance (x_i - x̄)²·f(x_i)
-4.72 | 7 | =B2/B14 | =A2*C2 | =(A2-D14)^2*C2
-4.02 | 53 | 0.031286895 | -0.125723 | 0.202268085
-3.32 | 133 | 0.078512397 | -0.26079 | 0.267516588
-2.62 | 224 | 0.132231405 | -0.347095 | 0.174619987
-1.93 | 353 | 0.208382527 | -0.401798 | 0.042653444
-1.23 | 395 | 0.233175915 | -0.287143 | 0.013917567
-0.53 | 325 | 0.191853601 | -0.102586 | 0.169898317
0.16 | 126 | 0.074380165 | 0.0120514 | 0.199510727
0.86 | 45 | 0.026564345 | 0.0228124 | 0.144774029
1.56 | 24 | 0.014167651 | 0.0220377 | 0.130178627
2.25 | 9 | 0.005312869 | 0.0119658 | 0.073837264
Sum | 1694 | | -1.475751 | 1.462535979
StDev | | | | 1.209353538
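The spreadsheet columns can be reproduced in a few lines of Python. This sketch assumes the unrounded class means from the first slide (the table shows them rounded to two decimals). The variable names are illustrative.

```python
import math

# ln body weight class means (unrounded) and species counts per class.
class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
               -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

n = sum(counts)                     # 1694 species
freqs = [c / n for c in counts]     # f(x_i) = n_i / n  (column "Frequency")

# Weighted mean: sum of f(x_i) * x_i  (column "Arithmetic mean")
mean = sum(f * x for f, x in zip(freqs, class_means))
# Variance: sum of (x_i - mean)^2 * f(x_i)  (column "Variance")
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_means))
stdev = math.sqrt(variance)

print(n, round(mean, 4), round(variance, 4), round(stdev, 4))  # 1694 -1.4758 1.4625 1.2094
```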
[Figure: Collembola, relative frequency (0 to 0.25) per ln body weight class.]
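Relative frequency: $$f(x_i) = \frac{n_i}{n}$$
Weighted mean
Discrete distributions:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} f(x_i)\,x_i$$
Continuous distributions:
$$\bar{x} = \int_{\min}^{\max} x\,f(x)\,dx$$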
The average European springtail has a body weight of e^{-1.476} ≈ 0.23 mg.
The most frequently encountered weight is around e^{-1.23} ≈ 0.29 mg.
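Back-transforming with the exponential gives both weights. The sketch below makes the same assumption as the class table (unrounded class means). Note that exponentiating the mean of ln x gives the geometric mean of the weights, not their arithmetic mean.

```python
import math

# ln body weight class means (unrounded) and species counts per class.
class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
               -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

# Modus: the class holding the most species
mode_ln = class_means[counts.index(max(counts))]
# Weighted mean on the log scale
mean_ln = sum(c * x for c, x in zip(counts, class_means)) / sum(counts)

# Back-transform both values to milligrams
print(round(math.exp(mean_ln), 2), round(math.exp(mode_ln), 2))  # 0.23 0.29
```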
Why did we use log transformed values?
Species | Average body length [mm] | Body weight [mg]
Tetrodontophora bielanensis (Waga 1842) | 7 | 13.472
Orchesella chiantica Frati & Szeptycki 1990 | 7 | 13.472
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005 | 6.875 | 12.925
Orchesella dallaii Frati & Szeptycki 1990 | 6 | 9.4503
Seira pini Jordana & Arbea 1989 | 6 | 9.4503
Isotomurus pentodon (Kos,1937) | 5.3 | 7.1045
Heteromurus (V.) longicornis (Absolon 1900) | 5.3 | 7.1045
Pogonognathellus flavescens (Tullberg 1871) | 5.25 | 6.9513
Orchesella hoffmanni Stomp 1968 | 5.25 | 6.9513
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007 | 5.06 | 6.3862
Pogonognathellus longicornis (Müller 1776) | 5 | 6.2134
Orchesella devergens Handschin 1924 | 5 | 6.2134
Orchesella flavescens (Bourlet 1839) | 5 | 6.2134
Orchesella quinquefasciata (Bourlet 1841) | 5 | 6.2134
… | 5 | =IF(B86=0;0;EXP(-1.875+LN(B86)*2.3))
$$W[\mathrm{mg}] = e^{-1.875}\, L[\mathrm{mm}]^{2.3}$$
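The spreadsheet formula translates directly into a function. This minimal sketch assumes lengths in millimetres. `body_weight_mg` is a hypothetical name, and returning 0 for a zero length mirrors the IF() guard in the sheet.

```python
import math

def body_weight_mg(length_mm: float) -> float:
    """W[mg] = exp(-1.875) * L[mm]**2.3, computed as exp(-1.875 + 2.3 * ln L).

    A length of 0 (an empty cell in the sheet) returns 0, like the IF() guard.
    """
    if length_mm == 0:
        return 0.0
    return math.exp(-1.875 + 2.3 * math.log(length_mm))

for length in (7, 6, 5.3, 5):
    print(length, round(body_weight_mg(length), 4))
# 7 13.4717
# 6 9.4503
# 5.3 7.1045
# 5 6.2134
```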
[Figure: Collembola, number of species per body weight class. Left panel: log-transformed data (ln body weight class, -6 to 4). Right panel: linear data (body weight class, 0 to 10).]
On the linear scale, the distribution is skewed to the right.
Body weight [mg] class means | Number of species | Frequency f(x_i) | Arithmetic mean f(x_i)·x_i | Geometric mean f(x_i)·ln x_i
0.01 | 7 | 0.004132231 | 3.702E-05 | -0.019483926
0.02 | 53 | 0.031286895 | 0.0005626 | -0.125722539
0.04 | 133 | 0.078512397 | 0.0028338 | -0.260790153
0.07 | 224 | 0.132231405 | 0.0095797 | -0.347095405
0.15 | 353 | 0.208382527 | 0.0303016 | -0.401798187
0.29 | 395 | 0.233175915 | 0.0680574 | -0.287142615
0.59 | 325 | 0.191853601 | 0.1123956 | -0.102585655
1.18 | 126 | 0.074380165 | 0.0874629 | 0.012051446
2.36 | 45 | 0.026564345 | 0.062698 | 0.02281237
4.74 | 24 | 0.014167651 | 0.0671181 | 0.022037681
9.51 | 9 | 0.005312869 | 0.0505194 | 0.011965782
Sum | 1694 | | 0.491566 | -1.4757512
Exp() | | | | 0.228606933
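The sketch below reproduces the two column sums. It assumes the unrounded ln class means from the first slide; the linear class means are their exponentials. The arithmetic mean of the skewed linear data comes out about twice the geometric mean.

```python
import math

# ln body weight class means (unrounded) and species counts per class.
class_means_ln = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
                  -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

n = sum(counts)
freqs = [c / n for c in counts]
weights_mg = [math.exp(x) for x in class_means_ln]   # linear class means [mg]

# Arithmetic mean of the linear data: sum of f(x_i) * x_i
arithmetic = sum(f * w for f, w in zip(freqs, weights_mg))
# Geometric mean: exp of the weighted mean of ln x_i
geometric = math.exp(sum(f * math.log(w) for f, w in zip(freqs, weights_mg)))

print(round(arithmetic, 3), round(geometric, 3))  # 0.492 0.229
```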
[Figure: Collembola, number of species per body weight class on the linear scale (0 to 10).]
$$W = W_0 L^{z} \quad\Longleftrightarrow\quad \ln W = \ln W_0 + z \ln L$$
$$W[\mathrm{mg}] = e^{-1.875}\, L[\mathrm{mm}]^{2.3}$$
In the case of exponentially distributed data we have to use the geometric mean. To make things easier, we first log-transform our data.
Geometric mean:
$$\bar{x}_{geom} = \sqrt[n]{\prod_{i=1}^{n} x_i} = e^{\frac{1}{n}\sum_{i=1}^{n} \ln x_i}$$
The average European springtail has a body weight of e^{-1.476} ≈ 0.23 mg.
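Both forms of the geometric mean give the same number. The logarithmic form avoids the huge or tiny intermediate products that long lists of weights produce. The sketch uses the species weights of the length table. That table is only an excerpt of the full species list, so the result is not the 0.23 mg above. `math.prod` and `statistics.geometric_mean` require Python 3.8 or later.

```python
import math
import statistics

# Body weights [mg] of the species in the length table (an excerpt of the full list).
weights = [13.472, 13.472, 12.925, 9.4503, 9.4503, 7.1045, 7.1045,
           6.9513, 6.9513, 6.3862, 6.2134, 6.2134, 6.2134, 6.2134]
n = len(weights)

# n-th root of the product of all values
gm_root = math.prod(weights) ** (1 / n)
# exp of the mean of ln x_i: same value, without a huge intermediate product
gm_log = math.exp(sum(math.log(w) for w in weights) / n)

print(math.isclose(gm_root, gm_log))                             # True
print(math.isclose(gm_log, statistics.geometric_mean(weights)))  # True
print(gm_log < statistics.mean(weights))  # True: geometric mean <= arithmetic mean
```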
lb-scaled (log base 2) weight classes: each class mean is about twice the previous one.
How to use geometric means
A tropical forest is logged over three years: 0.1% of the area in the first year, 1% in the second and 10% in the third.
Hence the total decrease in forest area is
$$A_3 = (1-0.001)(1-0.01)(1-0.1)\,A_0 = 0.890\,A_0$$
11% of the area has been logged during the three years. What is the mean logging rate per year?
Arithmetic mean
$$\frac{0.999 + 0.99 + 0.9}{3} = 0.963 \quad\Rightarrow\quad A_3 = 0.963^3 A_0 = 0.893\,A_0$$
Geometric mean
$$(0.999 \cdot 0.99 \cdot 0.9)^{1/3} = 0.962 \quad\Rightarrow\quad A_3 = 0.962^3 A_0 = 0.890\,A_0$$
In multiplicative processes we should use the geometric mean.
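Here is a sketch of the forest example. It assumes the three yearly logging fractions above. Only the geometric mean of the yearly remaining fractions reproduces the true three-year total.

```python
import math

# Fraction of the forest remaining after each year (0.1 %, 1 % and 10 % logged).
remaining = [1 - 0.001, 1 - 0.01, 1 - 0.1]
years = len(remaining)

total = math.prod(remaining)              # A3 / A0
arithmetic = sum(remaining) / years       # arithmetic mean of the yearly factors
geometric = total ** (1 / years)          # geometric mean of the yearly factors

print(round(total, 3))                    # 0.89
print(round(arithmetic ** years, 3))      # 0.893: overestimates the remaining area
print(round(geometric ** years, 3))       # 0.89: reproduces the true total
print(f"mean logging rate: {1 - geometric:.1%} per year")  # mean logging rate: 3.8% per year
```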
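Variance
Sample variance: $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
Population variance: $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$$
Frequency form, with f(x_i) = n_i/n: $$s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 f(x_i)$$
Continuous distributions: $$s^2 = \int_{\min}^{\max}(x-\bar{x})^2 f(x)\,dx$$
Standard deviation: $$s = \sqrt{s^2}$$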
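The frequency form divides by n. It therefore equals the population variance of the data expanded to one value per species, and the sample variance s² with n - 1 is slightly larger. The sketch checks this on the Collembola classes with Python's standard `statistics` module, assuming the unrounded class means.

```python
import statistics

# ln body weight class means (unrounded) and species counts per class.
class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
               -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]
n = sum(counts)

# Frequency form: s^2 = sum of (x_i - mean)^2 * f(x_i), with f(x_i) = n_i / n
mean = sum(c * x for c, x in zip(counts, class_means)) / n
s2_freq = sum(c / n * (x - mean) ** 2 for c, x in zip(counts, class_means))

# The same data expanded to one value per species
raw = [x for c, x in zip(counts, class_means) for _ in range(c)]

print(abs(s2_freq - statistics.pvariance(raw)) < 1e-9)               # True: divisor n
print(abs(statistics.variance(raw) - s2_freq * n / (n - 1)) < 1e-9)  # True: divisor n - 1
```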
[Figure: a distribution with the mean and ± 1 SD marked.]
The standard deviation is a measure of the width of the statistical distribution that has the same dimension as the mean.
Degrees of freedom
The standard deviation as a measure of error
Environmental pollution
Station | NOx [ppm]
1 | 8.49
2 | 1.12
3 | 9.11
4 | 7.75
5 | 0.75
6 | 8.23
7 | 0.97
8 | 6.06
9 | 8.48
10 | 5.88
11 | 8.51
12 | 9.62
13 | 3.35
14 | 7.74
15 | 2.03
16 | 5.06
17 | 7.61
18 | 0.99
19 | 2.55
20 | 8.91
Mean: 5.66
Variance: 10.45
Standard deviation: 3.23
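The summary values can be checked with Python's standard `statistics` module. This sketch assumes the slide used the sample variance (divisor n - 1), which is what reproduces 10.45.

```python
import statistics

# NOx concentrations [ppm] at the 20 stations.
nox = [8.49, 1.12, 9.11, 7.75, 0.75, 8.23, 0.97, 6.06, 8.48, 5.88,
       8.51, 9.62, 3.35, 7.74, 2.03, 5.06, 7.61, 0.99, 2.55, 8.91]

mean = statistics.mean(nox)
var = statistics.variance(nox)   # sample variance, divides by n - 1
sd = statistics.stdev(nox)       # square root of the sample variance

print(f"{mean:.2f} {var:.2f} {sd:.2f}")  # 5.66 10.45 3.23
```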
Distance [km] | Average NOx concentration | Standard deviation
1 | 9.53 | 1.70
2 | 7.37 | 1.18
3 | 5.24 | 0.86
4 | 3.15 | 0.26
5 | 2.17 | 0.18
6 | 1.05 | 0.09
7 | 0.84 | 0.14
8 | 0.63 | 0.10
9 | 0.32 | 0.03
10 | 0.21 | 0.02
The precision of derived metrics should always match the precision of the raw data
[Figure: NOx concentration against distance [km], 1 to 10 km.]
± 1 standard deviation is the most often used estimator of error. For normally distributed data, about 68% of the observations lie within ± 1 standard deviation of the mean, and about 95% lie within ± 2 standard deviations.
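The 68% and 95% figures hold only for normally distributed data. The sketch computes them from the error function in Python's `math` module. `prob_within` is an illustrative helper name.

```python
import math

def prob_within(k: float) -> float:
    """P(|X - mu| < k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

print(f"{prob_within(1):.3f} {prob_within(2):.3f}")  # 0.683 0.954
```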
± 1 standard deviation
Mean | Standard deviation
5.44 | 4.15
4.49 | 5.29
5.55 | 3.39
5.56 | 3.13
Standard deviation and standard error
Environmental pollution
The standard deviation is constant irrespective of sample size.
The precision of the estimate of the mean should increase with sample size n.
The standard error is a measure of precision.
$$SE = \frac{SD}{\sqrt{n}}$$
Distance [km] | Average NOx concentration | Standard deviation | Standard error (n = 20)
1 | 9.53 | 3.32 | 0.74
2 | 7.37 | 2.45 | 0.55
3 | 5.24 | 1.24 | 0.28
4 | 3.15 | 0.67 | 0.15
5 | 2.17 | 0.87 | 0.19
6 | 1.05 | 0.34 | 0.08
7 | 0.84 | 0.14 | 0.03
8 | 0.63 | 0.10 | 0.02
9 | 0.32 | 0.03 | 0.01
10 | 0.21 | 0.02 | 0.01
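The standard errors follow from SE = SD/√n with n = 20. This sketch uses the rounded standard deviations from the table as input.

```python
import math

n = 20  # stations per distance class
# Standard deviations from the table, distance 1 ... 10 km
sds = [3.32, 2.45, 1.24, 0.67, 0.87, 0.34, 0.14, 0.10, 0.03, 0.02]

# SE = SD / sqrt(n)
ses = [sd / math.sqrt(n) for sd in sds]
print([round(se, 2) for se in ses])
```

The last entry rounds to 0.0 here rather than the table's 0.01. The table was presumably computed from unrounded standard deviations.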
[Figure: NOx concentration against distance [km], 1 to 10 km.]
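$$s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 f(x_i) = \sum_{i=1}^{n} x_i^2 f(x_i) - 2\bar{x}\sum_{i=1}^{n} x_i f(x_i) + \bar{x}^2\sum_{i=1}^{n} f(x_i)$$
Using $\sum_{i=1}^{n} x_i f(x_i) = \bar{x}$ and $\sum_{i=1}^{n} f(x_i) = 1$:
$$s^2 = \sum_{i=1}^{n} x_i^2 f(x_i) - 2\bar{x}^2 + \bar{x}^2 = \sum_{i=1}^{n} x_i^2 f(x_i) - \bar{x}^2 = E(x^2) - [E(x)]^2$$
$$\sigma^2 = E(x^2) - [E(x)]^2$$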
The variance is the difference between the mean of the squared values and the squared mean
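The identity s² = E(x²) - [E(x)]² can be checked numerically. This sketch applies it to the Collembola classes (unrounded class means assumed) and compares the shortcut with the direct sum of squared deviations.

```python
# ln body weight class means (unrounded) and species counts per class.
class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
               -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]
n = sum(counts)
f = [c / n for c in counts]

e_x = sum(fi * x for fi, x in zip(f, class_means))         # E(x)
e_x2 = sum(fi * x * x for fi, x in zip(f, class_means))    # E(x^2)
direct = sum(fi * (x - e_x) ** 2 for fi, x in zip(f, class_means))
shortcut = e_x2 - e_x ** 2                                 # E(x^2) - [E(x)]^2

print(round(shortcut, 4), round(direct, 4))  # 1.4625 1.4625
```

When the mean is large compared with the spread, the shortcut subtracts two large, nearly equal numbers. In floating point the direct form is then the safer one.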
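Mathematical expectation: the k-th moment of a distribution is
$$E(X^k) = \sum_{i=1}^{n} x_i^k f(x_i) \quad \text{(discrete)}, \qquad E(X^k) = \int x^k f(x)\,dx \quad \text{(continuous)}$$
Central moments: the k-th central moment is $$E\big((X-\mu)^k\big)$$
The second central moment is the variance:
$$\sigma^2 = \sum_{i=1}^{n}(x_i-\mu)^2 f(x_i) = E\big((X-\mu)^2\big)$$
The first moment, E(X) = μ, is the moment of central tendency (the mean). The first central moment, E(X - μ), is always zero.
With f(x_i) = 1/n:
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2$$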
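This sketch implements the moment definitions for a discrete distribution. It uses the NOx sample from the pollution example and gives each value probability 1/n. `raw_moment` and `central_moment` are illustrative names.

```python
from typing import Sequence

def raw_moment(xs: Sequence[float], fs: Sequence[float], k: int) -> float:
    """k-th moment E(X^k) = sum of x_i^k * f(x_i)."""
    return sum(f * x ** k for x, f in zip(xs, fs))

def central_moment(xs: Sequence[float], fs: Sequence[float], k: int) -> float:
    """k-th central moment E((X - mu)^k), with mu = E(X)."""
    mu = raw_moment(xs, fs, 1)
    return sum(f * (x - mu) ** k for x, f in zip(xs, fs))

# NOx sample from the pollution example, each value given probability 1/n.
nox = [8.49, 1.12, 9.11, 7.75, 0.75, 8.23, 0.97, 6.06, 8.48, 5.88,
       8.51, 9.62, 3.35, 7.74, 2.03, 5.06, 7.61, 0.99, 2.55, 8.91]
fs = [1 / len(nox)] * len(nox)

print(abs(central_moment(nox, fs, 1)) < 1e-9)  # True: first central moment is zero
print(round(central_moment(nox, fs, 2), 4))    # 9.9305: variance with divisor n
```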
Frequency distributions of resource use or wealth in a population can be described by a power law (the famous Pareto-Zipf law) with exponents that often have values around -5/2. What are the mean and the variance of such a power function distribution?
$$f(x) = a x^{z}, \qquad \sum_{x_{\min}}^{x_{\max}} f(x) = 1$$
z | Mean | Variance | StDev
2.5 | 1.50942 | 18.9121 | 4.34881

x | f(x) | f(x)/sum | x·f(x) | (x-m)²f(x)
1 | 1 | 0.756475 | 0.756475 | 0.566929
2 | 0.176777 | 0.133727 | 0.267454 | 1.542484
3 | 0.06415 | 0.048528 | 0.145584 | 1.860055
4 | 0.03125 | 0.02364 | 0.094559 | 2.001836
5 | 0.017889 | 0.013532 | 0.067661 | 2.078674
6 | 0.01134 | 0.008579 | 0.051472 | 2.12562
7 | 0.007714 | 0.005835 | 0.040846 | 2.156716
8 | 0.005524 | 0.004179 | 0.033432 | 2.178547
9 | 0.004115 | 0.003113 | 0.028018 | 2.194559
10 | 0.003162 | 0.002392 | 0.023922 | 2.206711
Sum | 1.32192 | 1 | 1.50942 | 18.9121
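This sketch recomputes the table for f(x) = x^{-5/2} on the wealth classes 1 to 10. The normalizing sum (1.32192) and the mean (1.50942) agree with the table. However, evaluating Σ(x - m)²f(x) gives a variance of about 1.52 (standard deviation about 1.23), not 18.9121. The (x - m)²f(x) column of the table is therefore worth re-checking against its header.

```python
# Discrete power law f(x) = a * x**z on the wealth classes x = 1 ... 10, with z = -5/2.
z = -2.5
xs = list(range(1, 11))
raw = [x ** z for x in xs]
total = sum(raw)                    # normalizing sum (column "f(x)", row "Sum")
p = [r / total for r in raw]        # normalized probabilities (column "f(x)/sum"), a = 1/total

mean = sum(x * pi for x, pi in zip(xs, p))
variance = sum((x - mean) ** 2 * pi for x, pi in zip(xs, p))
stdev = variance ** 0.5

print(round(total, 5), round(mean, 5), round(variance, 4), round(stdev, 3))  # 1.32192 1.50942 1.5199 1.233
```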
[Figure: Discrete distribution. Frequency against wealth class on log-log axes, with f(x) = x^{-5/2} and the normalized f(x) = x^{-5/2}/1.32.]
Most people are in the lowest income class, and the average lies halfway between the first and the second class.
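Normalizing over the class limits 0.5 to 10.5:
$$1 = \int_{0.5}^{10.5} a x^{-5/2}\,dx = \left[-\frac{2}{3} a x^{-3/2}\right]_{0.5}^{10.5} = \frac{2}{3}a\left(0.5^{-3/2} - 10.5^{-3/2}\right) \approx 1.866\,a \quad\Rightarrow\quad a \approx 0.536$$
Mean:
$$\bar{x} = \int_{0.5}^{10.5} x \cdot a x^{-5/2}\,dx = \left[-2a x^{-1/2}\right]_{0.5}^{10.5} = 2a\left(0.5^{-1/2} - 10.5^{-1/2}\right) \approx 1.18$$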
Continuous approximation
[Figure: Continuous approximation of the wealth distribution. Frequency (log scale) against wealth class, 0 to 10.]
An upper bound of 10 would cover only half of the last column; the integration therefore runs from 0.5 to 10.5.
Note that the y-axis is on a log scale.
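With a taken from the discrete normalization instead (a = 1/1.32 ≈ 0.757):
$$\bar{x} = \int_{0.5}^{10.5} x \cdot a x^{-5/2}\,dx = 2a\left(0.5^{-1/2} - 10.5^{-1/2}\right) \approx 1.67$$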
The estimate of a is imprecise
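The two continuous estimates can be compared in code. This sketch assumes the class limits 0.5 and 10.5 and z = -5/2. `integral_power` is an illustrative helper that uses the closed-form antiderivative of x^k. Both means differ from the discrete mean of about 1.51.

```python
z = -2.5
lo, hi = 0.5, 10.5      # class limits: class x covers [x - 0.5, x + 0.5]

def integral_power(k: float, a: float = 1.0) -> float:
    """Integral of a * x**k from lo to hi, via the antiderivative x**(k+1)/(k+1) (k != -1)."""
    return a * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)

# a from the continuous normalization: the integral of a * x^z equals 1
a_cont = 1 / integral_power(z)
mean_cont = integral_power(z + 1, a_cont)      # integral of x * a * x^z

# a from the discrete normalization: sum of x^z over x = 1 ... 10
a_disc = 1 / sum(x ** z for x in range(1, 11))
mean_disc = integral_power(z + 1, a_disc)

print(round(a_cont, 3), round(mean_cont, 2))   # 0.536 1.18
print(round(a_disc, 3), round(mean_disc, 2))   # 0.756 1.67
```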
The Arrhenius probability model assumes the same probability of an event irrespective of the time that has elapsed since the start.
What are the mean and the variance of such a distribution?
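Any probability density integrates to one: $$\int_{\min}^{\max} f(x)\,dx = 1$$
$$f(t) = a e^{-at}, \qquad \int_0^{\infty} a e^{-at}\,dt = a\left[-\frac{1}{a}e^{-at}\right]_0^{\infty} = 1$$
The corresponding cumulative distribution function is F(t) = 1 - e^{-at}.
Mean and variance, shown for a = 1:
$$E(t) = \int_0^{\infty} t e^{-t}\,dt = \left[-e^{-t}(t+1)\right]_0^{\infty} = 1$$
$$E(t^2) = \int_0^{\infty} t^2 e^{-t}\,dt = \left[-e^{-t}(t^2+2t+2)\right]_0^{\infty} = 2, \qquad \sigma^2 = E(t^2) - [E(t)]^2 = 2 - 1 = 1$$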
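For a general rate a, the same integrals give mean 1/a and variance 1/a²; a = 1 reproduces the values above. This sketch checks the result numerically with a hand-written trapezoidal rule. The rate a = 2 and the cut-off at t = 50/a are illustrative choices.

```python
import math

def integrate(g, lo: float, hi: float, steps: int = 100_000) -> float:
    """Composite trapezoidal rule for the integral of g from lo to hi."""
    h = (hi - lo) / steps
    inner = sum(g(lo + i * h) for i in range(1, steps))
    return h * (0.5 * (g(lo) + g(hi)) + inner)

a = 2.0                  # illustrative rate; the slide works with a = 1
upper = 50 / a           # e^(-50) is negligible, so this stands in for infinity

def pdf(t: float) -> float:
    return a * math.exp(-a * t)

area = integrate(pdf, 0.0, upper)
mean = integrate(lambda t: t * pdf(t), 0.0, upper)
second = integrate(lambda t: t * t * pdf(t), 0.0, upper)
variance = second - mean ** 2

print(round(area, 4), round(mean, 4), round(variance, 4))  # 1.0 0.5 0.25
```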
[Figure: f(x) against x, 0 to 8.]
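Skewness: $$\gamma = \frac{E\big((X-\mu)^3\big)}{\sigma^3}$$
Third central moment:
$$E\big((X-\mu)^3\big) = E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3 = E(X^3) - 3\mu E(X^2) + 2\mu^3$$
Kurtosis: $$\delta = \frac{E\big((X-\mu)^4\big)}{\sigma^4} - 3$$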
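The third- and fourth-moment expansions can be checked on the exponential distribution, whose raw moments are E(X^k) = k!/a^k. This sketch uses the kurtosis definition above (subtracting 3). The exponential distribution gives skewness 2 and kurtosis 6 for any a.

```python
import math

def central_from_raw(m1, m2, m3, m4):
    """Third and fourth central moments from the raw moments E(X), E(X^2), E(X^3), E(X^4)."""
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3, mu4

a = 1.0
raw = [math.factorial(k) / a ** k for k in range(1, 5)]   # exponential: E(X^k) = k!/a^k
mu3, mu4 = central_from_raw(*raw)
sigma2 = raw[1] - raw[0] ** 2                             # E(X^2) - [E(X)]^2

skewness = mu3 / sigma2 ** 1.5
kurtosis = mu4 / sigma2 ** 2 - 3      # 0 for the normal distribution

print(skewness, kurtosis)  # 2.0 6.0
```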
[Figure: example distributions f(x). Skewness: γ = 0, symmetric distribution; γ > 0, right-skewed distribution; γ < 0, left-skewed distribution. Kurtosis: δ = 0 and δ > 0.]
How to get the modus?
We need the maximum of the pdf
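[Figure: the pdf y = x e^{-x} for x from 0 to 6, with the mode marked.]
$$y = x e^{-x}$$
Mode: $$\frac{dy}{dx} = e^{-x} - x e^{-x} = e^{-x}(1-x) = 0 \quad\Rightarrow\quad x = 1$$
It is a probability distribution because $$\int_0^{\infty} x e^{-x}\,dx = \left[-e^{-x}(x+1)\right]_0^{\infty} = 1$$
Arithmetic mean: $$E(x) = \int_0^{\infty} x \cdot x e^{-x}\,dx = \left[-e^{-x}(x^2+2x+2)\right]_0^{\infty} = 2$$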
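This is a numerical check of the mode and mean of f(x) = x e^{-x}. The sketch assumes a grid step of 0.001 and a cut-off at x = 40; the neglected tail is of order e^{-40}.

```python
import math

def pdf(x: float) -> float:
    """f(x) = x e^{-x}."""
    return x * math.exp(-x)

h = 0.001
grid = [i * h for i in range(40_001)]    # 0 ... 40

# Mode: the grid point where the pdf is largest
mode = max(grid, key=pdf)

# Mean: trapezoidal rule for the integral of x * f(x)
g = [x * pdf(x) for x in grid]
mean = h * (sum(g) - 0.5 * (g[0] + g[-1]))

print(round(mode, 3), round(mean, 3))  # 1.0 2.0
```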
Body volumes are estimated from measurements of height × length × width. Assume you estimated the thorax volume of insects and used this volume to infer the body weight.
$$V = c \cdot \text{Length} \cdot \text{Height} \cdot \text{Width}$$
$$W[\mathrm{mg}] = a\,(\text{Length} \cdot \text{Height} \cdot \text{Width})^{z}$$
How to get the parameters a and z?
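The standard deviation is a measure of accuracy (error). For independent measurements:
$$\sigma_{total}^2 = \sum_{i=1}^{n}\sigma_i^2; \qquad \sigma_{total} = \sqrt{\sum_{i=1}^{n}\sigma_i^2}$$
$$\sigma_{total}^2 = 0.02^2 + 0.02^2 + 0.02^2 = 0.0012; \qquad \sigma_{total} = \sqrt{0.0012} \approx 0.035$$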
Body weights are estimated by regressing species weights against thorax volume.
[Figure: dry weight against thorax volume, with the fitted power function y = 1.7754x^{0.6072}.]
The body weight of a new species is estimated from the regression function
$$W[\mathrm{mg}] = a\,(\text{Length} \cdot \text{Height} \cdot \text{Width})^{z}$$
Height, length and width could be measured with an accuracy of ± 2%.
The error of the thorax estimate is 3.5%.
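For independent errors, the relative errors add in quadrature, which gives the 3.5% for the volume. The sketch then goes one step further than the slide, as an assumption. It propagates that error through the fitted power function W = a V^z using the first-order rule that the relative error is multiplied by |z|. It ignores the uncertainty of the regression parameters themselves.

```python
import math

# Relative errors of length, height and width (+/- 2 % each), assumed independent.
rel_errors = [0.02, 0.02, 0.02]

# Independent errors add in quadrature: sigma_total = sqrt(sum of sigma_i^2)
rel_volume = math.sqrt(sum(e ** 2 for e in rel_errors))

# For W = a * V**z, the first-order (linear) approximation gives rel_W = |z| * rel_V
z = 0.6072                      # exponent of the fitted regression y = 1.7754 x^0.6072
rel_weight = abs(z) * rel_volume

print(f"{rel_volume:.1%} {rel_weight:.1%}")  # 3.5% 2.1%
```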
Homework and literature
Refresh:
• Arithmetic, geometric, harmonic mean
• Cauchy inequality
• Statistical distribution
• Probability distribution
• Moments of distributions
• Error law of Gauß
• Bootstrap
Prepare for the next lecture:
• Binomial distribution
• Mean and variance of the binomial distribution
• Poisson distribution
• Mean and variance of the Poisson distribution
• Moments of distributions
• DNA mutations
• Transition matrix
Literature:
Łomnicki: Statystyka dla biologów (Statistics for biologists)
Binomial distribution: http://www.stat.yale.edu/Courses/1997-98/101/binom.htm
Poisson distribution: http://en.wikipedia.org/wiki/Poisson_distribution