logistic regression. analysis of proportion data we know how many times an event occurred, and how...

Upload: randall-owen

Posted on 16-Jan-2016

Page 1

Logistic regression

[Figure: title-slide plot; y axis 0 to 1.2, x axis 0 to 150]

Page 2

Analysis of proportion data

• We know how many times an event occurred, and how many times did not occur.

• We want to know whether these proportions are affected by a treatment or a factor.

• Examples:
    • Proportion dying
    • Proportion responding to a treatment
    • Proportion of individuals of a given sex
    • Proportion flowering

Page 3

The old-fashioned way:

• People used to model these data using percentage mortality as the response variable.

• The problems with this are:
    • Errors are not normally distributed
    • The variance is not constant
    • The response is bounded (between 0 and 1)

• We lose information on the size of the sample.
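To make the last point concrete, here is a minimal Python sketch (standard library only). The counts are hypothetical, chosen only for illustration: two samples with the same 50% mortality carry very different amounts of information, as the binomial standard error sqrt(p(1 - p)/n) shows.

```python
import math

def proportion_and_se(deaths: int, n: int) -> tuple[float, float]:
    """Observed proportion and its binomial standard error sqrt(p(1 - p)/n)."""
    p = deaths / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical samples, both with 50% mortality.
small = proportion_and_se(deaths=2, n=4)
large = proportion_and_se(deaths=200, n=400)

print(f"n = 4:   p = {small[0]:.2f}, SE = {small[1]:.3f}")
print(f"n = 400: p = {large[0]:.2f}, SE = {large[1]:.3f}")
```

Both samples report the same percentage, but the standard error of the larger one is ten times smaller. A percentage response discards that difference, whereas a binomial model keeps it by working with the counts of events and non-events.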

Page 4

However…

• Some data, such as percentage plant cover, are better analyzed with the conventional models (normal errors and constant variance) after an arcsine transformation, with the transformed response measured in radians:

$$\sin^{-1}\sqrt{\text{proportion}}$$

Page 5

[Figure: arcsine transformation, $\sin^{-1}\sqrt{\text{proportion}}$ (y axis, 0 to 1.6) against proportion (x axis, 0 to 1)]
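The arcsine transformation is easy to compute directly. This is a minimal Python sketch using only the standard library; the function name `arcsine_sqrt` is our own, not from any package.

```python
import math

def arcsine_sqrt(p: float) -> float:
    """Arcsine square-root transform of a proportion; the result is in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    return math.asin(math.sqrt(p))

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f} -> {arcsine_sqrt(p):.3f} rad")
```

The transformed values run from 0 (at p = 0) to pi/2, about 1.571 rad (at p = 1), which is why the plotted curve tops out just below 1.6. The transformation stretches the scale near 0 and 1, where binomial variance is smallest.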

Page 6

If the response variable takes the form of a percentage change in some measurement, it is usually better to use either:

• analysis of covariance, using final weight as the response variable and initial weight as the covariate, or

• a relative growth rate, measured as log(final/initial), as the response variable.

Both of these can be analyzed with normal errors without further transformation.
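One reason the log ratio is more convenient than a percentage change is that it treats growth and shrinkage symmetrically. This minimal Python sketch uses hypothetical weights (one individual doubles, another halves); the helper names are illustrative only.

```python
import math

def relative_growth_rate(initial: float, final: float) -> float:
    """Relative growth rate as log(final/initial), using the natural log."""
    return math.log(final / initial)

def percent_change(initial: float, final: float) -> float:
    """Percentage change from initial to final."""
    return 100.0 * (final - initial) / initial

# Hypothetical weights: one individual doubles, another halves.
for w0, w1 in [(10.0, 20.0), (10.0, 5.0)]:
    print(f"{w0} -> {w1}: {percent_change(w0, w1):+.0f}% change, "
          f"log(final/initial) = {relative_growth_rate(w0, w1):+.3f}")
```

Doubling and halving give +100% and -50% as percentage changes, an asymmetric pair, whereas the log ratios are +0.693 and -0.693, equal in size and opposite in sign.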

Page 7

Rationale for logistic regression

• The traditional transformation of proportion data was the arcsine, which took care of the non-normal error distribution. There is nothing wrong with this transformation, but a simpler approach is often preferable and is likely to produce a model that is easier to interpret.

Page 8

The logistic curve

• The logistic curve is commonly used to describe data on proportions.

• It asymptotes at 0 and 1, so that negative proportions and responses of more than 100 % cannot be predicted.

Page 9

Binomial errors

• If p is the proportion of individuals observed to respond in a given way, the proportion of individuals that respond in the alternative way is 1 - p, and we shall call this proportion q.

• n is the size of the sample (or number of attempts).

• An important point is that the variance of the binomial distribution is not constant. In fact, the variance of a binomial distribution with mean np is:

$$s^2 = npq$$

So the variance changes with the mean like this:

[Figure: S2 (y axis, 0 to 0.3) against the proportion p (x axis, 0 to 1)]
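The variance formula can be tabulated directly. This minimal Python sketch assumes a single trial (n = 1), so the variance reduces to pq; that choice of n is ours, made only to show the shape of the curve.

```python
def binomial_variance(n: int, p: float) -> float:
    """Variance of a binomial count with n trials and success probability p: npq."""
    q = 1.0 - p
    return n * p * q

# With n = 1 the variance is p*q, which is largest at p = 0.5.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  s^2 = {binomial_variance(1, p):.2f}")
```

The variance is zero at p = 0 and p = 1, symmetric about p = 0.5, and peaks there at 0.25, so a model assuming constant variance misrepresents both the middle and the extremes of the range.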

Page 10

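The logistic model

The logistic model for p as a function of x is given by:

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

This model is bounded since (for $\beta_1 > 0$):

if $x \to -\infty$, then $p \to 0$; if $x \to \infty$, then $p \to 1$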
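A minimal Python sketch of the logistic model above, with hypothetical coefficients (b0 = -5, b1 = 0.1) chosen only for illustration. It evaluates p far out in both directions to show the bounds.

```python
import math

def logistic(x: float, beta0: float, beta1: float) -> float:
    """p = exp(beta0 + beta1*x) / (1 + exp(beta0 + beta1*x))."""
    eta = beta0 + beta1 * x
    # Two algebraically identical forms; branching on the sign of eta
    # keeps exp() from overflowing for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical coefficients, for illustration only.
b0, b1 = -5.0, 0.1
for x in (-1000, 0, 50, 100, 1000):
    print(x, round(logistic(x, b0, b1), 4))
```

However extreme x becomes, the returned p stays within [0, 1]; at x = -b0/b1 = 50 the linear predictor is zero and p is exactly one half.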

Page 11

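The trick of linearizing the logistic model is a simple transformation. Starting from

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

the logit (log odds) of p is linear in x:

$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x$$

See the class website for a fuller description of the logit transformation.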
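The algebra behind the logit transformation, written out step by step from the logistic model with linear predictor $\beta_0 + \beta_1 x$:

$$
\begin{aligned}
p &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \\
1 - p &= \frac{1}{1 + e^{\beta_0 + \beta_1 x}} \\
\frac{p}{1 - p} &= e^{\beta_0 + \beta_1 x} \\
\ln\left(\frac{p}{1 - p}\right) &= \beta_0 + \beta_1 x
\end{aligned}
$$

Dividing the first line by the second cancels the common denominator, and taking logs removes the exponential, leaving a quantity that is linear in x. This is what lets the model be fitted as a generalized linear model with a logit link.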

Page 12

Hypericum cumulicola:

• Small, short-lived perennial herb

• Narrowly endemic and endangered

• Flowers are small and bisexual

• Self-compatible, but requires pollinators to set seed

Menges et al. (1999)

Dolan et al. (1999)

Boyle and Menges (2001)

Page 13

Demographic data

• 15 populations (various patch sizes)

• >80 individuals per population each year

• Data on height and number of reproductive structures

• Survival between August 1994 and August 1995

Page 14

[Figure: histogram of height (cm), Hypericum cumulicola (1994)]

Page 15

Call:
glm(formula = survival ~ rep_structures * height, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0576  -0.9510   0.5748   0.7394   1.5518

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)             2.043e+00  1.888e-01  10.819  < 2e-16 ***
rep_structures         -9.112e-03  2.518e-03  -3.619 0.000296 ***
height                 -2.717e-02  7.588e-03  -3.581 0.000343 ***
rep_structures:height   1.219e-04  4.096e-05   2.977 0.002912 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1018.68  on 878  degrees of freedom
Residual deviance:  925.22  on 875  degrees of freedom
AIC: 933.22

Number of Fisher Scoring iterations: 4
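The deviances in this summary can be used directly. This minimal Python sketch (standard library only) computes the likelihood-ratio test of the full model against the intercept-only model and checks the reported AIC. The chi-square tail function uses the closed form that holds only for 3 degrees of freedom; a general-purpose routine from a statistics library would normally be used instead.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form valid for df = 3 only: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Values copied from the glm summary.
null_dev, null_df = 1018.68, 878
resid_dev, resid_df = 925.22, 875
n_params = 4  # intercept, rep_structures, height, rep_structures:height

lr_stat = null_dev - resid_dev  # drop in deviance from adding the three terms
lr_df = null_df - resid_df      # number of parameters added beyond the intercept
print(f"LR statistic = {lr_stat:.2f} on {lr_df} df, p = {chi2_sf_df3(lr_stat):.3g}")

# With 0/1 responses the saturated log-likelihood is 0, so
# AIC = residual deviance + 2 * (number of parameters).
print(f"AIC = {resid_dev + 2 * n_params:.2f}")
```

The deviance drops by 93.46 on 3 degrees of freedom, far beyond what chance would produce, and 925.22 + 2 × 4 reproduces the AIC of 933.22 shown in the output.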

Page 16

Calculating a given proportion

• You can back-transform from logits (z) to proportions (p) by

$$p = \frac{1}{1 + \frac{1}{\exp(z)}} = \frac{1}{1 + e^{-z}}$$
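A minimal Python sketch of the back-transformation, applied to the estimates from the glm summary. The plant values (50 reproductive structures, 20 cm tall) are hypothetical, chosen only to show the calculation.

```python
import math

def inv_logit(z: float) -> float:
    """Back-transform a logit z to a proportion: p = 1 / (1 + exp(-z))."""
    # math.exp overflows for z below about -709; fine for the moderate z used here.
    return 1.0 / (1.0 + math.exp(-z))

# Estimates from the glm summary (survival ~ rep_structures * height).
B0, B_REP, B_HEIGHT, B_INTER = 2.043, -9.112e-03, -2.717e-02, 1.219e-04

def predicted_survival(rep_structures: float, height: float) -> float:
    """Linear predictor on the logit scale, then back-transform to a probability."""
    z = (B0 + B_REP * rep_structures + B_HEIGHT * height
         + B_INTER * rep_structures * height)
    return inv_logit(z)

# Hypothetical plant: 50 reproductive structures, 20 cm tall.
print(f"{predicted_survival(50, 20):.3f}")
```

All four coefficients enter the linear predictor z first; only the final sum is back-transformed, because the coefficients are additive on the logit scale, not on the probability scale.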

Page 17

Survival vs height

Page 18

Survival vs rep_structures

Page 19

Height × rep structures interaction

[Figure: survival against height (cm), shown for 0, 100, 200 and 1000 fruits]
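Because of the interaction term, the effect of height on the logit of survival depends on the number of reproductive structures. This minimal Python sketch uses the glm estimates and assumes that the fruit counts in the figure are values of `rep_structures`.

```python
# Estimates from the glm summary (logit scale).
B_HEIGHT, B_INTER = -2.717e-02, 1.219e-04

def height_slope(fruits: float) -> float:
    """Change in log odds of survival per cm of height at a fixed number of fruits."""
    return B_HEIGHT + B_INTER * fruits

for fruits in (0, 100, 200, 1000):
    print(f"{fruits:4d} fruits: slope = {height_slope(fruits):+.5f} per cm")

# Number of fruits at which the height effect changes sign.
print(f"slope is zero at about {-B_HEIGHT / B_INTER:.0f} fruits")
```

Taken at face value, the estimates imply that taller plants have lower predicted survival when they carry few fruits (negative slope at 0, 100 and 200 fruits) and higher predicted survival when they carry many (positive slope at 1000 fruits), with the switch at roughly 220 fruits.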

Page 20