Logistic Regression, Hal Whitehead, BIOL4062/5062


Post on 19-Jan-2018


Logistic Regression

Hal Whitehead
BIOL4062/5062

• Categorical data
• Logistic regression on binary data
• Odds ratio
• Logits
• Probit regression
• With many categories

Categorical data

• Categorical data
  – Sex, species, morph, physiological state
• Categorical vs. continuous
  – Continuous => Continuous: Linear regression
  – Categorical => Continuous: ANOVA
  – Categorical => Categorical: Log-linear models
  – Continuous => Categorical: Logistic regression

Also: Continuous + Categorical => Categorical

Logistic Regression on Binary Data

• Binary data
  – two categories
  – proportions
  – want to work out probability of being in a category, P
• Logistic regression

$$P = \frac{e^{Z}}{1 + e^{Z}}, \qquad Z = \beta_0 + \beta_1 X_1 + \dots$$

Logistic Regression

• If Z is large and positive, P ~ 1.0
• If Z is large and negative, P ~ 0.0
• Fit β0, β1, … using maximum likelihood
• X's can be categorical as well as continuous

$$P = \frac{e^{Z}}{1 + e^{Z}}, \qquad Z = \beta_0 + \beta_1 X_1 + \dots$$
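The limiting behaviour of P can be checked with a few lines of Python. This is only a minimal sketch: the function names `logistic` and `probability` are illustrative and do not come from the slides.

```python
import math


def logistic(z):
    """Map a linear predictor Z to a probability P = e^Z / (1 + e^Z)."""
    # Use the algebraically equivalent form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probability(x, beta0, beta1):
    """P for a single predictor X1 = x, with Z = beta0 + beta1 * x."""
    return logistic(beta0 + beta1 * x)


p_large_positive = logistic(10.0)   # close to 1.0
p_large_negative = logistic(-10.0)  # close to 0.0
p_zero = logistic(0.0)              # exactly 0.5 when Z = 0
```

Whatever values the coefficients take, P stays strictly between 0 and 1 for moderate Z, which is why the logistic form suits a binary dependent variable.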

Logistic Regression Outputs

• Estimates of regression coefficients
  – β0, β1, …
• Significance of regression coefficients and overall logistic regression
• Quantile probabilities
• Accuracy of prediction
• Odds ratios

Logistic Regression

• Regression coefficients estimated by maximizing log-likelihood iteratively
• Significance of coefficients indicated by
  – likelihood ratio test (theoretically best)
  – Wald test (normal approximation)
• Can reduce numbers of independent variables using stepwise elimination
• Or choose "best" model using AIC

Example: Fruit-fly Death

Dose     Dead   Alive
0.01     1      4
0.1      3      2
1.0      2      3
10.0     4      1
100.0    5      0

[Figure: probability of death against Log(Dose)]

Logistic Regression

• β0 = 0.56 (P = 0.255)
  – Constant
• β1 = 0.92 (P = 0.020)
  – × Log(Dose)

Overall P = 0.0064

[Figure: probability of death against Log(Dose), with fitted logistic curve]
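The fit can be reproduced from the fruit-fly table with a short maximum-likelihood routine. The sketch below uses Newton-Raphson in plain Python and rests on some assumptions: that Log(Dose) means log10 (consistent with the "10-fold" wording of the odds-ratio slide), that the coefficient P-values are Wald tests, and that the overall P is a likelihood ratio test against the constant-only model. All function names are illustrative.

```python
import math

# Fruit-fly data as (dose, dead, alive), taken from the example table.
DATA = [(0.01, 1, 4), (0.1, 3, 2), (1.0, 2, 3), (10.0, 4, 1), (100.0, 5, 0)]
X = [math.log10(dose) for dose, _, _ in DATA]  # assumption: Log(Dose) is log10
DEAD = [dead for _, dead, _ in DATA]
N = [dead + alive for _, dead, alive in DATA]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1):
    """Log-likelihood of the individual outcomes (no combinatorial constant)."""
    total = 0.0
    for x, dead, n in zip(X, DEAD, N):
        p = logistic(b0 + b1 * x)
        total += dead * math.log(p) + (n - dead) * math.log(1.0 - p)
    return total


def fit(iterations=25):
    """Newton-Raphson for Z = b0 + b1 * x; returns estimates and standard errors."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, dead, n in zip(X, DEAD, N):
            p = logistic(b0 + b1 * x)
            resid = dead - n * p        # contribution to the score vector
            w = n * p * (1.0 - p)       # contribution to the information matrix
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # At convergence the last step is negligible, so the information matrix
    # from the final pass gives the standard errors.
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)


b0, b1, se0, se1 = fit()
ll_model = log_likelihood(b0, b1)

# Constant-only model: P is the overall proportion dead.
p_null = sum(DEAD) / sum(N)
ll_null = log_likelihood(math.log(p_null / (1.0 - p_null)), 0.0)

# Wald tests (two-sided normal approximation) for each coefficient.
p_wald_b0 = math.erfc(abs(b0 / se0) / math.sqrt(2.0))
p_wald_b1 = math.erfc(abs(b1 / se1) / math.sqrt(2.0))

# Likelihood ratio test, chi-square with 1 degree of freedom.
lr_stat = 2.0 * (ll_model - ll_null)
p_overall = math.erfc(math.sqrt(lr_stat / 2.0))
```

Under these assumptions the estimates, log-likelihoods and P-values agree with the rounded values quoted on the slides.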

Model selection using AIC

• Constant only: Log(L) = -16.825, AIC = 35.650
• Const., dose: Log(L) = -13.112, AIC = 30.224
• Const., dose, dose²: Log(L) = -12.869, AIC = 31.738
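The AIC values follow from AIC = 2k - 2 Log(L), where k is the number of fitted coefficients. A minimal sketch, assuming k = 1, 2 and 3 for the three models (the model labels are illustrative):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 Log(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# (Log(L), number of coefficients) for each candidate model on the slide.
models = {
    "constant only": (-16.825, 1),
    "constant + dose": (-13.112, 2),
    "constant + dose + dose^2": (-12.869, 3),
}

aic_values = {name: aic(ll, k) for name, (ll, k) in models.items()}

# The preferred model is the one with the lowest AIC.
best_model = min(aic_values, key=aic_values.get)
```

Adding the dose² term raises Log(L) only slightly, not enough to offset the penalty for the extra coefficient, so the constant-plus-dose model has the lowest AIC.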

Accuracy of prediction

             Predicted
Actual       Died    Lived
Died         10.6    4.4
Lived        4.4     5.6
Correct      0.7     0.6

Overall correct: 0.65

[Figure: probability of death against Log(Dose)]
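The non-integer entries suggest the table holds expected counts, that is, sums of fitted probabilities, rather than counts obtained by thresholding at P = 0.5. That reading is an assumption, but the sketch below reproduces the slide values from the rounded coefficients β0 = 0.56 and β1 = 0.92, again taking Log(Dose) as log10.

```python
import math

# Fruit-fly data as (dose, dead, alive), taken from the example table.
DATA = [(0.01, 1, 4), (0.1, 3, 2), (1.0, 2, 3), (10.0, 4, 1), (100.0, 5, 0)]
B0, B1 = 0.56, 0.92  # rounded coefficients quoted on the slides

died_pred_died = died_pred_lived = 0.0
lived_pred_died = lived_pred_lived = 0.0
for dose, dead, alive in DATA:
    z = B0 + B1 * math.log10(dose)   # assumption: Log(Dose) is log10
    p = 1.0 / (1.0 + math.exp(-z))   # fitted probability of death
    died_pred_died += dead * p
    died_pred_lived += dead * (1.0 - p)
    lived_pred_died += alive * p
    lived_pred_lived += alive * (1.0 - p)

n_died = died_pred_died + died_pred_lived
n_lived = lived_pred_died + lived_pred_lived
correct_died = died_pred_died / n_died
correct_lived = lived_pred_lived / n_lived
overall_correct = (died_pred_died + lived_pred_lived) / (n_died + n_lived)
```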

Odds ratio

• Compares odds of something happening at two values of independent variable
  – ω = [P(A)/(1-P(A))] / [P(B)/(1-P(B))]
• "Odds of dying in next 5 years are ω times greater for smokers than non-smokers"
• Log(ω) = β
  – when the independent variable changes by one, the odds of the event happening are multiplied by ω = e^β; the regression coefficient is the log of the odds ratio

Odds ratio

• Odds ratio for β1 = 2.5
  – 95% c.i.: 1.2–5.4
• Odds of dying are 2.5 times greater when dose is 10-fold stronger

[Figure: probability of death against Log(Dose)]
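The link between β1 and the odds ratio can be verified numerically. This sketch uses the rounded coefficients β0 = 0.56 and β1 = 0.92 from the fruit-fly example and assumes the predictor is log10(dose), so a change of one unit is a 10-fold change in dose. The helper name `odds` is illustrative.

```python
import math

B0, B1 = 0.56, 0.92  # rounded coefficients quoted on the slides


def odds(x):
    """Odds P / (1 - P) of dying at predictor value x = log10(dose)."""
    p = 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))
    return p / (1.0 - p)


# Odds ratio implied by the coefficient: omega = e^beta1.
odds_ratio = math.exp(B1)

# The same ratio from the odds at two doses that differ 10-fold.
ratio_from_odds = odds(1.0) / odds(0.0)
```

The ratio is the same whichever pair of doses one unit apart is chosen, because the odds are exactly e^Z.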

Example: Matriarchs As Repositories of Social Knowledge in African Elephants

• Playback vocalizations of other elephants to matriarchal groups of elephants
• Do they "bunch"?

McComb et al., Science 2001

Elephant Knowledge

• Dependent variable: Bunch / not bunch
• Independent variables:
  – Family [Categorical]
  – Age of matriarch
  – Mean age of other females
  – Number of females in group
  – Number of calves in group
  – Age of youngest calf
  – Presence of adult males
  – Association index between group and playback individual
  – Interactions
    • Age of matriarch ×

Logistic Regression: Elephant Bunching on

                                         β        df    P
Variables included in final model
Family                                   -        20    0.029
Age of matriarch                         -0.514   1     0.005
Association index                        980      1     0.147
Age of matriarch × association index     -431     1     0.011

Variables excluded from final model
Age of other females                     -0.201   1     0.248
Females in group                         0.033    1     0.867
Calves in group                          0.015    1     0.946
Age of youngest calf                     0.032    1     0.194
Presence of males                        -0.851   1     0.166
Other interactions with Age of matriarch

Logistic Regression: Elephant Bunching on

                                         β        df    P
Variables included in final model
Family                                   -        20    0.029
Age of matriarch                         -0.514   1     0.005
Association index                        980      1     0.147
Age of matriarch × association index     -431     1     0.011

[Figure: curves for 55 yr-old matriarchs and 35 yr-old matriarchs]

"sensitivity of the bunching response to the association index increased with the age of the matriarch"

McComb et al., Science 2001

Logistic regression:

$$P = \frac{e^{Z}}{1 + e^{Z}}, \qquad Z = \beta_0 + \beta_1 X_1 + \dots$$

Logit transformation:

$$Z = \log\left(\frac{P}{1 - P}\right)$$

Logit

• Logit transformation is inverse of logistic function
• Logit differences are logs of odds-ratios
• Logit regression (almost) equivalent to logistic regression
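The first two bullets can be checked directly. A minimal sketch with illustrative function names and arbitrary example probabilities:

```python
import math


def logistic(z):
    """P = e^Z / (1 + e^Z)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Z = Log(P / (1 - P)), defined for 0 < P < 1."""
    return math.log(p / (1.0 - p))


# Logit undoes the logistic function.
z = 1.3
round_trip = logit(logistic(z))

# A difference of logits equals the log of the odds ratio.
p_a, p_b = 0.8, 0.5  # arbitrary example probabilities
logit_difference = logit(p_a) - logit(p_b)
log_odds_ratio = math.log((p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b)))
```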

Probit Regression

• Transforms values in range [0, 1] using inverse cumulative normal function
• Useful for proportions (when numbers are not available)
• Type of generalized linear model

[Figure: Probit(Y) against Y]
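The probit transform is available in the Python standard library through `statistics.NormalDist`. A small sketch, with `probit` and `inverse_probit` as illustrative names:

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit(y):
    """Inverse cumulative standard normal; y must lie strictly between 0 and 1."""
    return _STANDARD_NORMAL.inv_cdf(y)


def inverse_probit(v):
    """Cumulative standard normal, mapping any real value back into (0, 1)."""
    return _STANDARD_NORMAL.cdf(v)


probit_half = probit(0.5)      # a proportion of 0.5 maps to 0
probit_upper = probit(0.975)   # about 1.96
round_trip = inverse_probit(probit(0.3))
```

As with the logit, proportions near 0 or 1 are stretched far more than those near 0.5, and the endpoints 0 and 1 themselves cannot be transformed.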

With Many Categories

• Logistic regression for one category against rest
• Canonical Variate Analysis

Page 7: Logistic Regression Hal Whitehead BIOL4062/5062

Logistic Regression

• Regression coefficients estimated by maximizing log-likelihood iteratively

• Significance of coefficients indicated by
  – likelihood ratio test (theoretically best)
  – Wald test (normal approximation)

• Can reduce number of independent variables using stepwise elimination

• Or choose "best" model using AIC
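The iterative maximization can be sketched with Newton-Raphson steps on a two-parameter model. This is a minimal illustration, not the routine used in the lecture: it assumes the fruit-fly counts from the example in this deck, assumes Log(Dose) is base 10, and the function names (`logistic`, `log_likelihood`, `fit`) are made up for the sketch.

```python
import math

# Fruit-fly example: log10(dose), number dead, number tested (assumed base-10 log).
X = [-2.0, -1.0, 0.0, 1.0, 2.0]
DEAD = [1, 3, 2, 4, 5]
N = [5, 5, 5, 5, 5]


def logistic(z):
    """P = e^z / (1 + e^z), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1):
    """Binomial log-likelihood, omitting the constant combinatorial term."""
    total = 0.0
    for x, d, n in zip(X, DEAD, N):
        p = logistic(b0 + b1 * x)
        total += d * math.log(p) + (n - d) * math.log(1.0 - p)
    return total


def fit(max_iter=50, tol=1e-10):
    """Newton-Raphson: solve (information matrix) * step = score, then repeat."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score: gradient of the log-likelihood
        i00 = i01 = i11 = 0.0    # Fisher information matrix entries
        for x, d, n in zip(X, DEAD, N):
            p = logistic(b0 + b1 * x)
            w = n * p * (1.0 - p)
            g0 += d - n * p
            g1 += x * (d - n * p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


b0, b1 = fit()
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, logL = {log_likelihood(b0, b1):.3f}")
```

Under these assumptions the estimates should land close to the values reported for the fruit-fly example (β0 near 0.56, β1 near 0.92, Log(L) near -13.112).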

Example: Fruit-fly Death

Dose     Dead   Alive
0.01     1      4
0.1      3      2
1.0      2      3
10.0     4      1
100.0    5      0

[Figure: probability of death (0.0 to 1.0) plotted against Log(Dose) (-4 to 4)]
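As a quick check on the data, the observed proportion dead at each dose can be tabulated. This sketch assumes the doses are 0.01, 0.1, 1, 10 and 100 and that Log(Dose) means log base 10 (consistent with the later statement about a 10-fold stronger dose); the variable names are illustrative.

```python
import math

# (dose, number dead, number alive): the five rows of the fruit-fly table.
rows = [(0.01, 1, 4), (0.1, 3, 2), (1.0, 2, 3), (10.0, 4, 1), (100.0, 5, 0)]

for dose, dead, alive in rows:
    n = dead + alive
    print(f"log10(dose) = {math.log10(dose):+.0f}   proportion dead = {dead / n:.1f}")

# Pooled over all doses: this proportion is what a constant-only model would predict.
total_dead = sum(dead for _, dead, _ in rows)
total = sum(dead + alive for _, dead, alive in rows)
overall = total_dead / total
```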

Logistic Regression

• β0 = 0.56
  – Constant

• β1 = 0.92
  – × Log(Dose)

[Figures: two plots of probability of death (0.0 to 1.0) against Log(Dose) (-4 to 4)]

P = 0.255

P = 0.020

Overall P = 0.0064
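The three P values can be approximately reproduced from the fruit-fly counts. The sketch below is an assumption about how they arise, not a record of the original software's output: it takes the quoted coefficients as given, uses Wald z-tests from the inverse Fisher information for the individual coefficients, and a likelihood ratio test against the constant-only model for the overall value. Log(Dose) is assumed to be base 10.

```python
import math

X = [-2.0, -1.0, 0.0, 1.0, 2.0]    # log10(dose), assumed base 10
DEAD = [1, 3, 2, 4, 5]
N = [5, 5, 5, 5, 5]
B0, B1 = 0.56, 0.92                # coefficients quoted for the fruit-fly fit


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_lik(b0, b1):
    """Binomial log-likelihood without the combinatorial constant."""
    total = 0.0
    for x, d, n in zip(X, DEAD, N):
        p = logistic(b0 + b1 * x)
        total += d * math.log(p) + (n - d) * math.log(1.0 - p)
    return total


def two_sided_normal_p(z):
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_1df_p(stat):
    """Upper tail of a chi-square with 1 df, i.e. P(Z^2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))


# Fisher information at the fitted coefficients; its inverse gives the variances.
w = [n * logistic(B0 + B1 * x) * (1.0 - logistic(B0 + B1 * x)) for x, n in zip(X, N)]
i00 = sum(w)
i01 = sum(wi * x for wi, x in zip(w, X))
i11 = sum(wi * x * x for wi, x in zip(w, X))
det = i00 * i11 - i01 * i01
se_b0 = math.sqrt(i11 / det)
se_b1 = math.sqrt(i00 / det)

wald_p_b0 = two_sided_normal_p(B0 / se_b0)
wald_p_b1 = two_sided_normal_p(B1 / se_b1)

# Likelihood ratio test: dose model against the constant-only model.
p_null = sum(DEAD) / sum(N)
null_b0 = math.log(p_null / (1.0 - p_null))
lr_stat = 2.0 * (log_lik(B0, B1) - log_lik(null_b0, 0.0))
lr_p = chi2_1df_p(lr_stat)

print(f"Wald P (constant) = {wald_p_b0:.3f}")
print(f"Wald P (slope)    = {wald_p_b1:.3f}")
print(f"LR overall P      = {lr_p:.4f}")
```

Computed this way, the Wald value for the constant comes out near 0.255, the Wald value for the slope near 0.020, and the likelihood ratio value near 0.0064, which suggests that is how the three figures map onto the model.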

Model selection using AIC

• Constant only: Log(L) = -16.825, AIC = 35.650

• Constant, dose: Log(L) = -13.112, AIC = 30.224

• Constant, dose, dose²: Log(L) = -12.869, AIC = 31.738
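The AIC values follow from AIC = 2k - 2 Log(L), where k is the number of estimated parameters. A short sketch, assuming k = 1, 2 and 3 for the three models listed (the labels and the `aic` helper are illustrative):

```python
# (label, Log(L), number of estimated parameters k)
models = [
    ("constant only", -16.825, 1),
    ("constant + dose", -13.112, 2),
    ("constant + dose + dose^2", -12.869, 3),
]


def aic(log_l, k):
    """Akaike information criterion: penalize fit by the parameter count."""
    return 2.0 * k - 2.0 * log_l


table = [(label, aic(log_l, k)) for label, log_l, k in models]
for label, value in table:
    print(f"{label:26s} AIC = {value:.3f}")

# The preferred model is the one with the lowest AIC.
best_label, best_aic = min(table, key=lambda row: row[1])
```

Adding the squared term raises Log(L) only slightly, not enough to offset the extra parameter, so the model with the constant and dose has the lowest AIC.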

Accuracy of prediction

               Predicted
Actual     Died    Lived
Died       10.6    4.4
Lived      4.4     5.6
Correct    0.7     0.6

Overall correct: 0.65

[Figures: two plots of probability of death (0.0 to 1.0) against Log(Dose) (-4 to 4)]
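The non-integer counts in the table are consistent with an expected classification table, in which each fly contributes its fitted probability of dying to the "predicted died" column and the remainder to "predicted lived". The sketch below rests on that interpretation, on the quoted coefficients (β0 = 0.56, β1 = 0.92), and on base-10 Log(Dose); all three are assumptions.

```python
import math

X = [-2.0, -1.0, 0.0, 1.0, 2.0]    # log10(dose), assumed base 10
DEAD = [1, 3, 2, 4, 5]
N = [5, 5, 5, 5, 5]
B0, B1 = 0.56, 0.92                # coefficients quoted for the fruit-fly fit


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


died_pred_died = died_pred_lived = 0.0
lived_pred_died = lived_pred_lived = 0.0
for x, d, n in zip(X, DEAD, N):
    p = logistic(B0 + B1 * x)            # fitted probability of death at this dose
    died_pred_died += d * p
    died_pred_lived += d * (1.0 - p)
    lived_pred_died += (n - d) * p
    lived_pred_lived += (n - d) * (1.0 - p)

# Fraction of each actual outcome that is predicted correctly.
correct_died = died_pred_died / sum(DEAD)
correct_lived = lived_pred_lived / (sum(N) - sum(DEAD))
overall = (died_pred_died + lived_pred_lived) / sum(N)

print(f"Died : {died_pred_died:5.1f} {died_pred_lived:5.1f}")
print(f"Lived: {lived_pred_died:5.1f} {lived_pred_lived:5.1f}")
print(f"Correct: {correct_died:.1f} {correct_lived:.1f}  overall {overall:.2f}")
```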

Odds ratio

• Compares probabilities of something happening at two values of the independent variable
  – ω = [P(A)/(1-P(A))] / [P(B)/(1-P(B))]

• "Odds of dying in next 5 years are ω times greater for smokers than non-smokers"

• Log(ω) = β
  – the regression coefficient is the log of the odds ratio for a one-unit change in the independent variable, so the odds are multiplied by e^β
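The relation Log(ω) = β can be checked numerically: under Z = β0 + β1 X, the odds at X + 1 divided by the odds at X equal e^β1 whatever the starting X. A small sketch, using the fruit-fly coefficients purely as illustrative inputs (the helper names are made up):

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


def odds_ratio(b0, b1, x_a, x_b):
    """Odds at x_a divided by odds at x_b under Z = b0 + b1 * x."""
    return odds(logistic(b0 + b1 * x_a)) / odds(logistic(b0 + b1 * x_b))


b0, b1 = 0.56, 0.92                       # illustrative coefficients
ratio = odds_ratio(b0, b1, 1.0, 0.0)      # one-unit change starting at x = 0
ratio_elsewhere = odds_ratio(b0, b1, -1.0, -2.0)  # one-unit change starting at x = -2

print(f"odds ratio = {ratio:.3f}, exp(b1) = {math.exp(b1):.3f}")
```

The ratio does not depend on β0 or on where the one-unit step starts, which is why a single number summarizes the effect of the variable.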

Odds ratio

• Odds ratio for β1 = 2.5
  – 95% c.i.: 1.2-5.4

• Odds of dying are 2.5 times greater when dose is 10-fold stronger

[Figures: two plots of probability of death (0.0 to 1.0) against Log(Dose) (-4 to 4)]
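The odds ratio and its interval can be approximated from the fruit-fly fit as exp(β1) and exp(β1 ± 1.96 SE). This is a sketch under assumptions: a Wald-type interval, the standard error taken from the inverse Fisher information at the quoted coefficients, and base-10 Log(Dose). The original software may have computed the interval differently.

```python
import math

X = [-2.0, -1.0, 0.0, 1.0, 2.0]    # log10(dose), assumed base 10
N = [5, 5, 5, 5, 5]
B0, B1 = 0.56, 0.92                # coefficients quoted for the fruit-fly fit


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Fisher information for (b0, b1); var(b1) is the matching entry of its inverse.
w = [n * logistic(B0 + B1 * x) * (1.0 - logistic(B0 + B1 * x)) for x, n in zip(X, N)]
i00 = sum(w)
i01 = sum(wi * x for wi, x in zip(w, X))
i11 = sum(wi * x * x for wi, x in zip(w, X))
se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))

z95 = 1.96                          # normal quantile for a 95% interval
or_hat = math.exp(B1)
lower = math.exp(B1 - z95 * se_b1)
upper = math.exp(B1 + z95 * se_b1)

print(f"odds ratio = {or_hat:.1f}, 95% c.i. = {lower:.1f}-{upper:.1f}")
```

Because one unit of log10(dose) is a 10-fold change in dose, the odds ratio describes the effect of a 10-fold stronger dose.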

Example: Matriarchs As Repositories of Social Knowledge in African Elephants

• Playback vocalizations of other elephants to matriarchal groups of elephants

• Do they "bunch"?

McComb et al., Science 2001

Elephant Knowledge

• Dependent variable: Bunch / not bunch

• Independent variables:
  – Family [Categorical]
  – Age of matriarch
  – Mean age of other females
  – Number of females in group
  – Number of calves in group
  – Age of youngest calf
  – Presence of adult males
  – Association index between group and playback individual
  – Interactions
    • Age of matriarch ×
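A categorical predictor such as Family enters the linear predictor Z through indicator (dummy) columns, and an interaction enters as the product of two predictors. The sketch below shows one row of such a design matrix. The family labels and the numeric values are invented for illustration and are not taken from the study; `design_row` is a hypothetical helper.

```python
def design_row(family, families, matriarch_age, association_index):
    """Intercept, family dummies (first family is the reference level),
    two continuous predictors, and their product (the interaction term)."""
    dummies = [1.0 if family == f else 0.0 for f in families[1:]]
    return [1.0] + dummies + [
        matriarch_age,
        association_index,
        matriarch_age * association_index,
    ]


families = ["A", "B", "C"]                    # made-up family labels
row = design_row("B", families, 40.0, 0.25)   # made-up covariate values
print(row)
```

A categorical variable with m levels contributes m - 1 columns, which is why it uses m - 1 degrees of freedom in the fitted model.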

Logistic Regression Elephant Bunching on

Variables included in final model          β        df    P
Family                                     -        20    0.029
Age of matriarch                           -0.514   1     0.005
Association index                          980      1     0.147
Age of matriarch × association index       -431     1     0.011

Variables excluded from final model        β        df    P
Age of other females                       -0.201   1     0.248
Females in group                           0.033    1     0.867
Calves in group                            0.015    1     0.946
Age of youngest calf                       0.032    1     0.194
Presence of males                          -0.851   1     0.166
Other interactions with Age of matriarch

Logistic Regression Elephant Bunching on

Variables included in final model
Variable                                  β         df    P
Family                                    -         20    0.029
Age of matriarch                          -0.514    1     0.005
Association index                         980       1     0.147
Age of matriarch × association index      -431      1     0.011

[Figure: separate curves for 55 yr-old matriarchs and 35 yr-old matriarchs]

"sensitivity of the bunching response to the association index increased with the age of the matriarch"

McComb et al., Science 2001
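
One way to read the interaction term in the table is to write out the linear predictor and differentiate it with respect to the association index. The subscripted symbols below are labels introduced here for illustration (they do not appear on the slide), and treating Family as a per-family offset is an assumption based on its 20 degrees of freedom.

$$Z = \beta_0 + \beta_{\mathrm{Family}(f)} + \beta_A\,\mathrm{Age} + \beta_I\,\mathrm{AI} + \beta_{A \times I}\,(\mathrm{Age} \times \mathrm{AI})$$

$$\frac{\partial Z}{\partial\,\mathrm{AI}} = \beta_I + \beta_{A \times I}\,\mathrm{Age}$$

Under this reading, the slope of Z against the association index is not a single number: it changes linearly with the age of the matriarch, and with a negative interaction coefficient it decreases as age increases. That is consistent with the two age-specific curves having different steepness.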

Page 18: Logistic Regression Hal Whitehead BIOL4062/5062

Logit

Logistic regression:

$$P = \frac{e^{Z}}{1 + e^{Z}}, \qquad Z = \beta_0 + \beta_1 X_1 + \dots$$

Logit transformation:

$$Z = \log\left(\frac{P}{1 - P}\right)$$

• Logit transformation is inverse of logistic function
• Logit differences are logs of odds-ratios
• Logit regression (almost) equivalent to logistic regression
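
The first two points above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the two probabilities are hypothetical and chosen purely for illustration.

```python
import math


def logistic(z):
    """Logistic function P = e^Z / (1 + e^Z), arranged to avoid overflow."""
    if z >= 0:
        # Algebraically identical to e^Z / (1 + e^Z), but safe for large Z.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Logit transformation Z = log(P / (1 - P)), defined for 0 < P < 1."""
    return math.log(p / (1.0 - p))


def odds(p):
    """Odds corresponding to probability P."""
    return p / (1.0 - p)


# Hypothetical probabilities for two groups (illustration only).
p_a, p_b = 0.8, 0.5

# Applying logit after logistic should return the starting value of Z.
z_back = logit(logistic(1.7))

# A difference of logits should equal the log of the odds ratio.
logit_difference = logit(p_a) - logit(p_b)
log_odds_ratio = math.log(odds(p_a) / odds(p_b))

print(z_back, logit_difference, log_odds_ratio)
```

The two printed quantities `logit_difference` and `log_odds_ratio` should agree up to floating-point rounding, which is the sense in which logit differences are logs of odds ratios.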

Page 19: Logistic Regression Hal Whitehead BIOL4062/5062

Probit Regression
• Transforms values in range [0, 1] using inverse cumulative normal function
• Useful for proportions (when numbers are not available)
• Type of generalized linear model

[Figure: Probit(Y) plotted against Y]
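
The probit transformation can be sketched with the standard library's `statistics.NormalDist`, whose `inv_cdf` method is the inverse cumulative normal function. The proportions below are hypothetical, and the helper names are introduced here for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit(p):
    """Inverse cumulative normal of a proportion, for 0 < p < 1.

    NormalDist.inv_cdf raises StatisticsError at exactly 0 or 1,
    where the transformed value would be infinite.
    """
    return _STANDARD_NORMAL.inv_cdf(p)


def inverse_probit(z):
    """Cumulative normal: maps a probit value back to a proportion."""
    return _STANDARD_NORMAL.cdf(z)


# Hypothetical proportions (illustration only).
proportions = [0.05, 0.25, 0.5, 0.75, 0.95]
probits = [probit(p) for p in proportions]

for p, z in zip(proportions, probits):
    print(f"Y = {p:.2f}  Probit(Y) = {z:+.3f}")
```

Proportions near 0 or 1 are stretched out strongly by the transformation, while values near 0.5 map close to zero, which gives the S-shaped relationship between Y and Probit(Y).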

Page 20: Logistic Regression Hal Whitehead BIOL4062/5062

With Many Categories
• Logistic regression for one category against rest
• Canonical Variate Analysis
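
The "one category against rest" approach starts by recoding a many-category response into one binary response per category, each of which could then be passed to an ordinary binary logistic regression. A minimal sketch of that recoding step, with hypothetical morph labels and a helper name introduced here for illustration:

```python
def one_against_rest(labels):
    """Recode category labels as one 0/1 response vector per category.

    Returns a dict mapping each category to a list that holds 1 where
    the observation belongs to that category and 0 otherwise.
    """
    categories = sorted(set(labels))
    return {
        category: [1 if label == category else 0 for label in labels]
        for category in categories
    }


# Hypothetical categorical response (illustration only).
morphs = ["red", "green", "red", "blue", "green", "red"]

binary_responses = one_against_rest(morphs)

for category, response in binary_responses.items():
    # Each vector is the binary response for "this category vs. the rest".
    print(category, response)
```

Only the recoding is shown; fitting each binary model is a separate step. Because the models are fitted separately, their predicted probabilities for one observation are not guaranteed to sum to 1.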
