lecture 7lecturecontent.s3.amazonaws.com/pdf/15067.pdf · lecture 7 . section a multiple logistic...

Post on 27-Jun-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Logistic Regression for Estimation, Adjustment and Basic Prediction

Lecture 7

Section A

Multiple Logistic Regression: Some Examples

3

Learning Objectives

n  Interpret the estimates from multiple logistic regression models in a scientific context

n  Compare the results from simple and multiple logistic regression models to assess confounding

4

Example 1: Breast Feeding and Sex

n  Data from a random sample of 192 Nepali children [12, 36) months old

n  Question: what is the relationship between breast feeding and sex of a child?

n  Data: -  Breast fed: 70% -  Sex: 48% female (1= male, 0 = female)

5

Example 1: Breast Feeding and Sex

n  The resulting unadjusted association

: the ln(odds ratio) of being breast fed for males to females is 0.02

: the ln(odds) of being breast fed for female children is 1.12

02.01̂ =β

12.1ˆ =oβ

102.012.11

ln xpp

+=⎟⎟⎠

⎞⎜⎜⎝

6

Example 1: Breast Feeding , Sex and Age

n  Now for the results of a multiple regression of breast feeding status on sex, and age of the child (months)

with x1 = sex (1=F) and x2 =age (months), p = probability (proportion) of y=1, where y = 1 if breastfed, 0 if not

n  Additionally: -  95% CI for -  95% CI for

21 24.027.02.71

ln xxpp

−++=⎟⎟⎠

⎞⎜⎜⎝

48.0),.041 ,50.0(:1 =− pvalueβ

001.0 ,.17) ,31.0(:2 <−− pvalueβ

7

Example 1: Breast Feeding , Sex and Age

n  The slope estimate for sex is -  Still an estimated ln(odds ratio) of breast feeding for male

children to female children, who are of the same age -  This called the age adjusted association between breast feeding

and sex

n  The resulting odds ratio estimate is e0.27≈1.30; Male children in the sample have 30% greater odds of being breastfed than female children in the sample of the same age

n  The 95% for the age adjusted odds ratio for males compared to females is (e-0.50, e1.04) →(0.61 , 2.82)

27.01̂ =β

8

Ex1: Predictors of Arm Circumference: Height &Weight

n  The slope estimate for age is -  Still an estimated ln(odds ratio) of breast feeding for children

who differ by one month in age (older to younger), but are of the same sex

-  This called the sex adjusted association between breast feeding and age

n  The resulting odds ratio estimate is e-0.24≈.79; A one month difference in age is associatedwith a 21% reduction in the odds of being breast fed (older to younger) among children of the same sex

n  The 95% for the true sex-adjusted odds ratio for age is (e-0.31, e-0.17) →(0.73 , 0.84)

24.0ˆ2 −=β

Example 1: Presentation of Findings

n  In research articles, frequently a single table of unadjusted and adjusted associations will be presented (for non-randomized studies)

9

Table  1:  Logistic  Regression  Results  for  Predictors  of  Breast  FeedingOdds  Ratio  (95%  CI)

Predictor Unadjusted Adjusted

Sexfemale 1 1male 1.02  (0.55,  1.90) 1.30  (0.61,  2.82)

age  (months) 0.79  (0.73  -­‐  0.84) 0.79  (0.73,  0.84)

Baseline  Odds   1,333(exponetiated  intercept)

Example 1: Additional Predictors

n  Some other additional predictors of interest include -  Maternal Parity

No Previous Children 17% 1 Previous Child 16% 2 Previous Children 14% 3 Previous Children 15% >= 4 Previous Children 38%

-  Maternal Age mean = 27.7 years, range 17-43 years

10

Example 1: Presentation of Findings

n  Sometimes the results from several models will be presented

11

Table  1:  Logistic  Regression  Results  for  Predictors  of  Breast  FeedingOdds  Ratio  (95%  CI)

Predictor Unadjusted Model  2 Model  3 Model  4

Sexfemale 1 1 1 1male 1.02  (0.55,  1.90) 1.30  (0.61,  2.82)male 1.02  (0.55,  1.90) 1.30  (0.61,  2.82) 1.23  (0.54,  2.77) 1.22  (0.54,  1.77)

age  (months) 0.79  (0.73  -­‐  0.84) 0.79  (0.73,  0.84) 0.77  (0.72,  0.83) 0.77  (0.71,  0.83)

Maternal  Parity p=0.40 p=0.12 p=0.11  No  previous  children 1 1 1

1  previous  child 0.38  (.12,  1.22) 0.23  (0.05,  1.01) 0.24  (0.05,  1.14)2  previous  children 0.50  (0.15,  1.69) 0.36  (0.08,  1.54) 0.39  (0.08,  1.83)3  previous  children 0.34  (0.11,  1.10) 0.18  (0.04,  1.05) 0.21  (0.04,  1.04)>=4  previous  children 0.61  (0.18,  2.08) 0.61  (0.18,  2.08) 0.75  (0.14,  4.2)

Mother's  Age  (years) 0.99  (0.94,  1.04) 0.98  (0.89,  1.08)

Baseline  Odds   1,333 4,932 7,071(exponetiated  intercept)

Example 1: Presentation of Findings

n  Sometimes the results from several models will be presented

12

Table  1:  Logistic  Regression  Results  for  Predictors  of  Breast  FeedingOdds  Ratio  (95%  CI)

Predictor Model  4

Sexfemale 1malemale 1.22  (0.54,  1.77)

age  (months) 0.77  (0.71,  0.83)

Maternal  Parity p=0.11  No  previous  children 1

1  previous  child 0.24  (0.05,  1.14)2  previous  children 0.39  (0.08,  1.83)3  previous  children 0.21  (0.04,  1.04)>=4  previous  children 0.75  (0.14,  4.2)

Mother's  Age  (years) 0.98  (0.89,  1.08)

Baseline  Odds   7,071

13

Example 2: Predictors of Obesity

n  Data from 2009-10 NHANES1

n  Sample of over 6,400 US residents, 16-80 years old

n  HDL levels: mean 52.4 mg/dl, sd = 16, range 11-14, 15% of sample is obese by (BMI)

n  Other potential predictors include sex, age (yrs), and marital status

14

Example 2: Predictors of Obesity

n  Data from 2009-10 NHANES

Sex Distribution: 49% F, 51% M Age (years): 46.3 years, range 16 to 80 Marital Status:

Married 52% Widowed 9% Divorced 11% Separated 3%

Never Married 18% Living Together 7%

15

Example 2: Predictors of Obesity

n  Note about the obesity/age relationship: here are the results from a lowess plot (this shows the unadjusted association)

-3-2

-10

1

Nat

ural

log

(ln) O

dds

of O

besi

ty

20 40 60 80Age (Years)

bandwidth = .8

2009-10 NHANESNatural ln Odds of Obesity as a Function of Age

Example 2: Logistic Regressions

16

Table  1:  Logistic  Regression  Results  for  Predictors  of    ObesityOdds  Ratio  (95%  CI)

Predictor Unadjusted Model  2 Model  3

HDL  (  mg/dL) 0.967  (0.961,  0.973) 0.956  (0.951,  0.962) 0.958  (0.952,  0.964)

Males 1.75  (1.52,2.01) 2.63  (2.25,  3.07) 2.61  (2.22,  3.08)

Age  Category p<0.001 p  <  0.001 p<  0.001<  30  years 1 1 130-­‐46  years 1.79  (1.46,  2.19) 1.76  (1.42,  2.17) 1.62  (1.25,  2.10)46-­‐62  years 1.82  (1.49,  2.24) 1.95  (1.57,  2.43) 1.79  (1.37,  2.36)>=  62  years 1.47  (1.19,  1.81) 1.66  (1.34,  2.07) 1.53  (1.15,  2.05)

Marital  Status p=0.69 oMarried 1 1Widowed 1.10  (0.85,  1.41) 1.05  (0.79,  1.41)Divorced 1.13  (0.90,  1.42) 1.14  (0.89,  1.44)Separated 1.18  (0.81,  1.73) 1.16  (0.78,  1.72)

Never  Married 0.99  (0.81,  1.20) 1.19  (0.94,  1.50)Living  together 0.91  (0.69,  1.20) 0.92  (0.68,  1.24)

Baseline  Odds   0.61 0.58(exponentiated  intercept)

Example 2: Logistic Regressions

17

Table  1:  Logistic  Regression  Results  for  Predictors  of    ObesityOdds  Ratio  (95%  CI)

Predictor Model  2

HDL  (  mg/dL) 0.956  (0.951,  0.962)

Males 2.63  (2.25,  3.07)

Age  Category p  <  0.001<  30  years 130-­‐46  years 1.76  (1.42,  2.17)46-­‐62  years 1.95  (1.57,  2.43)>=  62  years 1.66  (1.34,  2.07)

Marital  StatusMarriedWidowedDivorcedSeparated

Never  MarriedLiving  together

Baseline  Odds   0.61

Summary

n  Summary

18

Section B

Multiple Logistic Regression: Basics of Model Selection and Estimating Outcomes

20

Learning Objectives

n  Understand the “linearity assumption” as it applies to multiple logistic regression

n  Explain different strategies for picking a “final” multiple logistic regression model among candidates

n  Use the results of multiple logistic regression models to compare groups who differ by more than one predictor, and estimate proportions/probabilities for groups given their x values

21

n  The algorithm to estimate the equation of the multiple logistic regression is called the maximum likelihood estimation (same process used with simple logistic)

n  The estimates for the intercept ( ) and the slopes ( ) are the values that make the observed data “most” likely among all choices for and .

n  This must be done by computer.

Estimating for Logistic Regression

oβ̂ p21ˆ,....ˆ,ˆ βββ

oβ̂ p21ˆ,....ˆ,ˆ βββ

n  With more than 1 predictor, the logistic regression model is no longer estimating a line in two dimensional space

22

Resulting Shape for Multiple Logistic Regression

n  The “linearity” assumption assumes that the adjusted relationship being estimated between the ln(odds of y=1), for a binary outcome y, and each xi is linear in nature (this is an issue for continuous predictors, but not for binary or multi-categorical predictors)

23

Linearity Assumption

n  When faced with potentially many possible predictors, how does a researcher go about choosing a “best” model?

n  Model building and selection is a combination of science, statistics, and the research goal(s)

24

Choosing a “Final Model”

n  If goal is to maximize precision of adjusted estimates

-  Keep only those predictors that are statistically significant in final model

n  If goal is to present results comparable to results of similar analyses presented by other researchers (on similar or different populations)

-  Present at least one model that includes the same predictor set as the other research

25

Choosing a “Final Model”

n  If goal is to show what happens to magnitude of association with different levels of adjustment -  Present the results from several models that include different

subsets or combinations of adjustment variables

n  If goal is prediction…..well this is a slightly more complicated “story”, and we will discuss briefly later in the course

26

Choosing a “Final Model”

Example 1: Prediction with Regression Results

n  Recall the models for looking at predictors of breast feeding status (y/n) in Nepali children, 12-36 months

27

Table  1:  Logistic  Regression  Results  for  Predictors  of  Breast  FeedingOdds  Ratio  (95%  CI)

Predictor Unadjusted Adjusted

Sexfemale 1 1male 1.02  (0.55,  1.90) 1.30  (0.61,  2.82)

age  (months) 0.79  (0.73  -­‐  0.84) 0.79  (0.73,  0.84)

Baseline  Odds   1,333(exponetiated  intercept)

Example 1: Prediction with Regression Results

n  Adjusted model on regression scale:

where x1 age (months), x2 = sex (1 M, 0 F)

n  Estimate the probability (proportion of) that female children 22 months old are breast fed

28

21 27.024.02.7breastfed) being of oddsln( xx +−+=

Example 1: Prediction with Regression Results

n  FYI:

29

Predictor Adjusted

Sexfemale 1male 1.30  (0.61,  2.82)

age  (months) 0.79  (0.73,  0.84)

Baseline  Odds   1,333(exponetiated  intercept)

Example 1: Prediction with Regression Results

n  Possible way to present the associated estimated probabilities from the logistic regression results

30

.2.4

.6.8

1

Est

imat

ed P

roba

bilit

y (P

ropo

rtion

)

10 15 20 25 30 35Age of Child (months)

Male Female

estimated from multiple logistic regression

Sample of 192 Nepali Children, 12-36 MonthsEstimated Probabiliy of Being Breast Fed by Age and Sex

Example 1: Prediction with Regression Results

n  Adjusted model on regression scale:

where x1 age (months), x2 = sex (1 F, 0 M)

n  Estimate the odds ratio of being breast fed for female children 22 months compared to male children , 19 months old

31

321 30.027.024.02.7breastfed) being of oddsln( xxx ++−+=

Example 1: Prediction with Regression Results

n  Computations

32

92.1)0(27.0)22(24.0 2.7 ln(odds) :months 22 F, =+−+=

91.2)1(27.0)19(24.0 2.7 ln(odds) :months 19 M, =+−+=

Example 2: Predictors of Obesity: NHANES

33

Table  1:  Logistic  Regression  Results  for  Predictors  of    ObesityOdds  Ratio  (95%  CI)

Predictor Unadjusted Model  2 Model  3

HDL  (  mg/dL) 0.967  (0.961,  0.973) 0.956  (0.951,  0.962) 0.958  (0.952,  0.964)

Males 1.75  (1.52,2.01) 2.63  (2.25,  3.07) 2.61  (2.22,  3.08)

Age  Category p<0.001 p  <  0.001 p<  0.001<  30  years 1 1 130-­‐46  years 1.79  (1.46,  2.19) 1.76  (1.42,  2.17) 1.62  (1.25,  2.10)46-­‐62  years 1.82  (1.49,  2.24) 1.95  (1.57,  2.43) 1.79  (1.37,  2.36)>=  62  years 1.47  (1.19,  1.81) 1.66  (1.34,  2.07) 1.53  (1.15,  2.05)

Marital  Status p=0.69 oMarried 1 1Widowed 1.10  (0.85,  1.41) 1.05  (0.79,  1.41)Divorced 1.13  (0.90,  1.42) 1.14  (0.89,  1.44)Separated 1.18  (0.81,  1.73) 1.16  (0.78,  1.72)

Never  Married 0.99  (0.81,  1.20) 1.19  (0.94,  1.50)Living  together 0.91  (0.69,  1.20) 0.92  (0.68,  1.24)

Baseline  Odds   0.61 0.58(exponentiated  intercept)

Example 2: Logistic Regressions

34

Table  1:  Logistic  Regression  Results  for  Predictors  of    ObesityOdds  Ratio  (95%  CI)

Predictor Model  2

HDL  (  mg/dL) 0.956  (0.951,  0.962)

Males 2.63  (2.25,  3.07)

Age  Category p  <  0.001<  30  years 130-­‐46  years 1.76  (1.42,  2.17)46-­‐62  years 1.95  (1.57,  2.43)>=  62  years 1.66  (1.34,  2.07)

Marital  StatusMarriedWidowedDivorcedSeparated

Never  MarriedLiving  together

Baseline  Odds   0.61

5432

1

97.056.067.056.0045.05.0obese) being of oddsln(

xxxxx

+++

+−+−=

Example 2: Logistic Regressions

n  Estimate the proportion (probability) of 50 year old males with HDL of 80 mg/dL who are obese

35

46.2)1(97.0)0(56.0)1(67.0)0(56.0)80(045.05.0obese) being of oddsln(

−=+++

+−+−=

Example 2: Logistic Regressions

n  Potential Presentation of Results

36

0.2

.4.6

Pre

dict

ed P

roba

bilit

y

0 50 100 150HDL

< 30 years 30-44 years44-62 years >= 62 years

By HDL levels and AgePredicted Probability of Obesity for Males

0.2

.4.6

Pre

dict

ed P

roba

bilit

y

0 50 100 150HDL

< 30 years 30-44 years44-62 years >= 62 years

By HDL levels and AgePredicted Probability of Obesity for Females

Summary

n  Multiple logistic regression results can be used to estimate probabilities (proportions) of binary outcomes for a given subset in a population given their predictor values

n  Multiple logistic results can be used to estimate odds ratios between groups who differ by more than one characteristic (predictor)

n  Confidence intervals can be estimated for each of the above, but a computer is required

37

Section C

Multiple Logistic Regression: Some Examples from the Literature

39

Learning Objectives

n  Interpret the results from simple and multiple logistic regression models presented in published journal articles

Example 1: Chemotherapy Utilization1

n  From abstract

n  1 Du x. et al., Discrepancy Between Consensus Recommendations and Actual Community Use of Adjuvant Chemotherapy in Women with Breast Cancer. Archives of Internal Medicine, 203, Vol. 138, 90-07

40

Example 1: Chemotherapy Utilization

n  From methods section “we used multivariable logistic regression analysis to generate the odds ratio of receiving chemotherapy in women with breast cancer and to determine the effect of age (Table 1) on chemotherapy use. In this model, we adjusted for race (white, black, or others), tumor stage (stage I, stage II, or stage IIIA), node status, hormone receptor status (Table 2), whether the patient had received surgery and radiation therapy (categorized as breast-conserving surgery without radiation, breast-conserving surgery with radiation, or mastectomy), and adjuvant hormone therapy use (yes or no). In addition to odds ratios, we generated the probabilities of receiving chemotherapy from the parameters of the logistic regression for women with different ages by holding other factors constant. “

41

Example 1: Chemotherapy Utilization

n  Results table, Table 4

42

Example 2: Obesity and Neighborhood Condition2

n  From abstract

n  3 Singh G. et al., Neighborhood Socioeconomic Conditions, Built Environments, And Childhood Obesity. Health Affairs, 2010, Vol. 29: No 3: 503-512

43

Example 2: Obesity and Neighborhood Condition

n  From abstract

44

Example 2: Obesity and Neighborhood Condition

n  Extensive Footnote

45

Example 2: Obesity and Neighborhood Condition

n  Results

46

Example 3: HIV, HBV, HCV in IVDUs3

n  Abstract 3 Akselrod H, et al. Seroprevalence of HIV, Hepatitis B Virus, and HCV Among Injection Drug Users in Connecticut: Understanding Infection and Coinfection Risks in a Nonurban Population. American Journal of Public Health. (Published online ahead of print October 17, 2013)

47

Example 3: HIV, HBV, HCV in IVDUs

n  From Methods Section

48

Example 3: HIV, HBV, HCV in IVDUs

n  Results: small portion of table

49

Example 3: HIV, HBV, HCV in IVDUs

n  Results: another portion of table (with header appended)

50

Example 3: HIV, HBV, HCV in IVDUs

n  Interesting….

51

Example 4: Alcohol and Mortalilty: Case/Control 4

n  Abstract, part 1 4 Leon D, et al. Hazardous alcohol drinking and premature mortality in Russia: a population based case-control study. Lancet. 2007; 369: 2001–09

52

Example 4: Alcohol and Mortalilty: Case/Control 4

n  Abstract, part 2 4 Leon D, et al. Hazardous alcohol drinking and premature mortality in Russia: a population based case-control study. Lancet. 2007; 369: 2001–09

53

Example 4: Alcohol and Mortality: Case/Control

n  From Methods Section

“Logistic regression was used to estimate the strength of association of factors with mortality, with all analyses done with STATA (version 9.1). In all models, age was included in six 5-year categories. Education, smoking, and marital status were treated as potential confounders, and where appropriate were introduced into models as categorical variables.”

54

Example 4: Alcohol and Mortality: Case/Control

n  From Methods Section

“Logistic regression was used to estimate the strength of association of factors with mortality, with all analyses done with STATA (version 9.1). In all models, age was included in six 5-year categories. Education, smoking, and marital status were treated as potential confounders, and where appropriate were introduced into models as categorical variables.”

55

Example 4: Alcohol and Mortality: Case/Control

n  Results table

56

Example 5: How Not to Present Results

n  From an article that will not be named

57

top related