
1/4/2017


RACE 616: Advance Analysis in Medical Research

Jan 5th – Feb 7th 2017

Ammarin Thakkinstian, Ph.D.

Section for Clinical Epidemiology and Biostatistics (CEB)

Email: [email protected]

http://www.ceb-rama.org

Application: CEB-RAMA

Phone: 022011762

Course outline

Contents                                          Session   Module     Assignments
Logistic regression                               1         I, P1
ROC curve analysis & Clinical prediction score    1         I
Clinical prediction scores (cont.)                1         I, P2      1
Log-linear & Poisson regression                   1         II         2
Tutoring: wrap up, questions & answers
Survival analysis: KM & Cox regression            1         III, P1    3
Survival analysis II: Competing risk model        2         III, P2
Survival analysis III: Multi-state model          3         III, P3    4
Sample size estimation                            4         III, P4
Longitudinal data analysis I                      1         IV
Longitudinal data analysis II                     2         IV         5

• Evaluation

– Five assignments, each due about 2 weeks after its topic is finished

• Resource

– http://www.ra.mahidol.ac.th/dpt/CEB/Downloadmodule

– CEB-RAMA application

• Modules

• Data

• Assignments

• Slides

• Further readings – Appendix 1-12




Logistic regression analysis

Objective
• Apply logistic regression properly
• Construct the logit equation
• Estimate the probability of the event, the adjusted odds ratio and its 95% confidence interval
• Interpret the results of logistic regression analysis
• Assess goodness of fit of the logit model and diagnostic measures


Objective
• Develop a clinical prediction model using the logit equation
• Calibrate the cut-off or threshold
• Assess model performance
  – Calibration
  – Validation
• Perform internal and external validations

Outline of talk
• Construct logistic equation
  – Simple logistic model
  – Multiple logistic model
• Model selection
  – Assessing goodness of fit of the model
  – Diagnostic measures
• Creating a clinical prediction score
  – Derivation phase
  – Validation phase
    • Internal validation
    • External validation


When should logistic regression be applied?
• Assessing association between factors and the outcome
  – Outcome
    • Dichotomous only: DM/Non-DM, HT/Non-HT, CKD/Non-CKD, Retinopathy/Non-retinopathy
  – Studied factors
    • Can be either continuous or categorical variables
• Constructing risk/prognostic prediction models

Example 1
Factors associated with acute stroke
• Design: Case-control study
• Outcome variable: Case vs Control
  – A case is a patient who has been diagnosed with haemorrhagic or ischaemic stroke
  – A control is a subject with no history of stroke


• Variables of interest
  – Demographic variables
    • Age, gender, BMI, waist-hip ratio
  – Risk behaviour
    • Smoking, alcohol consumption
    • Physical activity
  – History of illness
    • DM
    • HT
    • High cholesterol, LDL, HDL, Trig
  – Genetic factors
    • Tissue-type plasminogen activator (t-PA)
    • R353Q polymorphism of the Factor VII gene
    • Platelet glycoprotein (GP 1bα) gene: Thr/Met & Kozak polymorphisms


Example 2
Prognostic factors of retinopathy in type 2 diabetic patients
• Design
  – Cohort study
• Study period
  – 10 years
• Outcome
  – Retinopathy vs Non-retinopathy
• Variables
  – Demographic data
    • Age, gender, BMI/waist-hip ratio, smoking, alcohol
  – History of disease
    • HT
    • Abnormal lipid profile
  – Clinical data
    • SBP/DBP
    • Kidney function (GFR or Cr)
    • HbA1c
    • Medication
      – ACE-I, ARB


Example 3
Risk factors of chronic kidney disease (CKD)
• Design
  – Cross-sectional survey study
• Outcome
  – CKD versus non-CKD
• Variables
  – Demographic variables
    • Age, gender, BMI/waist-hip ratio
  – Risk/preventive behaviours
    • Alcohol consumption
    • Smoking
    • Exercise & physical activity
  – Co-morbidity
    • DM, HT, abnormal lipid profile, kidney stone
  – Medications
    • NSAID, cyclo-oxygenase type 2 inhibitor (Cox-2), traditional medicine


Example 4
Is MPV associated with progression of cardiovascular diseases?
• Design
  – EGAT cohort with major cardiovascular diseases
• Study period
  – 1997-2012
• Outcome
  – Cardiovascular death
• Studied variable
  – MPV
• Covariables
  – Demographic variables
    • Age, gender, BMI/waist-hip ratio
  – Risk/preventive behaviours
    • Alcohol consumption
    • Smoking
    • Exercise & physical activity
  – Co-morbidity
    • DM, HT, abnormal lipid profile


Example 5
Factors associated with sleep apnea
• Design
  – Cross-sectional study of subjects who were on the waiting list for polysomnography at the sleep lab centre, Royal Newcastle Hospital, Newcastle, Australia
• Variables
  – Demographic variables
  – Sleep variables
  – Co-morbidity

Variables             Description                                   Categories    Code/value
Age                   Age at performing PS, year                    < 30          1
                                                                    30 - 44       2
                                                                    45 - 59       3
                                                                    > 60          4
BMI                   Body mass index, weight/ht^2 (m)              < 25          1
                                                                    25 - 29.9     2
                                                                    30 - 39.9     3
                                                                    > 40          4
Sex                   Gender                                        Male          1
                                                                    Female        2
Snoring               History of snoring                            Yes           1
                                                                    No            2
Stopping breathing    History of stop breathing                     Yes           1
                                                                    No            2
Choking               History of choking during sleeping            Yes           1
                                                                    No            2
Waking up refreshed   History of being refreshed after waking up    Yes           1
                                                                    Sometime      2
                                                                    No            3
Leg kicking           History of kicking leg during sleep           Yes           1
                                                                    Sometime      2
                                                                    No            3
DM                    History of diabetes                           Yes           1
                                                                    No            2
Hypertension          History of high blood pressure                Yes           1
                                                                    No            2
Allergy               History of allergy                            Yes           1
                                                                    No            2
Outcome: ahi          Apnoea-hypopnoea index                        > 5           1
                                                                    ≤ 5           0


Assess associations between categorical variables

2x2 contingency table

           |        snore
        SA |         1          2 |     Total
-----------+----------------------+----------
         0 |       149        119 |       268
         1 |       488         81 |       569
-----------+----------------------+----------
     Total |       637        200 |       837

H0: Snoring and sleep apnea (SA) are independent, or H0: P1 = P2

• Statistical test
  – Chi-square
  – Exact test
• Magnitude of association
  – Odds ratio
  – Risk ratio


| snore

SA | 1 2 | Total

-----------+----------------------+----------

0 | 149 119 | 268

| 55.60 44.40 | 100.00

-----------+----------------------+----------

1 | 488 81 | 569

| 85.76 14.24 | 100.00

-----------+----------------------+----------

Total | 637 200 | 837

| 76.11 23.89 | 100.00

Pearson chi2(1) = 91.1762 Pr = 0.000

cc SA snore2

Proportion

| Exposed Unexposed | Total Exposed

-----------------+------------------------+------------------------

Cases | 488 81 | 569 0.8576

Controls | 149 119 | 268 0.5560

-----------------+------------------------+------------------------

Total | 637 200 | 837 0.7611

| |

| Point estimate | [95% Conf. Interval]

|------------------------+------------------------

Odds ratio | 4.811666 | 3.387801 6.83543 (exact)

Attr. frac. ex. | .7921718 | .7048233 .8537034 (exact)

Attr. frac. pop | .6794022 |

+-------------------------------------------------

chi2(1) = 91.18 Pr>chi2 = 0.0000
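The odds ratio and Pearson chi-square in the output above can be reproduced by hand from the four cell counts. The following is a minimal Python sketch, not part of the original Stata session; the variable names are illustrative, and the log-based (Woolf) interval is an assumption swapped in for the exact interval that `cc` reports, so its limits differ slightly from 3.39 to 6.84.

```python
import math

# 2x2 table from the cc output (exposure = history of snoring)
a, b = 488, 81    # cases:    exposed, unexposed
c, d = 149, 119   # controls: exposed, unexposed
n = a + b + c + d

# Cross-product odds ratio
odds_ratio = (a * d) / (b * c)

# Pearson chi-square for a 2x2 table, without continuity correction
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Woolf (log-based) 95% CI; Stata's cc reports an exact interval instead
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.4f}, chi2(1) = {chi2:.2f}")
print(f"Woolf 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

The standard error of the log odds ratio computed here is the same quantity that a simple logistic regression of SA on snoring reports for the snoring coefficient.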


Attributable risk
• Attributable fraction among the exposed (AF)
  AF = (OR - 1)/OR
  – The proportion (or number) of exposed cases that can be attributed to the exposure
• Population attributable risk (PAR)
  PAR = AF x a/n1, or
  PAR = Pe(ORe - 1) / [1 + Pe(ORe - 1)]
  – The proportion (or number) of cases that would not occur if the factor were eliminated
  – a = number of exposed cases
  – n1 = number of cases
  – Pe = proportion exposed in the source population (estimated from the controls in a case-control study)
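As a check on the two PAR formulas, the sketch below applies them to the snoring table (488 and 81 cases, 149 and 119 controls). It is an illustration in Python rather than the course's Stata code, and it assumes that Pe is estimated from the proportion of exposed controls; under that assumption the two expressions agree algebraically.

```python
# Cell counts from the snoring by SA table
a, b = 488, 81    # cases:    exposed, unexposed
c, d = 149, 119   # controls: exposed, unexposed

odds_ratio = (a * d) / (b * c)

# Attributable fraction among the exposed: AF = (OR - 1) / OR
af_exposed = (odds_ratio - 1) / odds_ratio

# PAR, first form: AF x a / n1 (n1 = number of cases)
par_from_cases = af_exposed * a / (a + b)

# PAR, second form, with Pe taken from the controls (assumption)
p_e = c / (c + d)
par_levin = p_e * (odds_ratio - 1) / (1 + p_e * (odds_ratio - 1))

print(f"AF = {af_exposed:.7f}")
print(f"PAR (AF x a/n1)  = {par_from_cases:.7f}")
print(f"PAR (Pe formula) = {par_levin:.7f}")
```

Both values match the "Attr. frac. ex." and "Attr. frac. pop" rows of the `cc` output (0.7921718 and 0.6794022).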

2x4 contingency table

tab SA age_gr, col

+-------------------+

| Key |

|-------------------|

| frequency |

| column percentage |

+-------------------+

| age_gr

SA | <30 30-44 45-60 60+ | Total

-----------+--------------------------------------------+----------

0 | 53 99 79 37 | 268

| 70.67 40.41 25.99 17.37 | 32.02

-----------+--------------------------------------------+----------

1 | 22 146 225 176 | 569

| 29.33 59.59 74.01 82.63 | 67.98

-----------+--------------------------------------------+----------

Total | 75 245 304 213 | 837

| 100.00 100.00 100.00 100.00 | 100.00


H0: Odds1 = Odds2 = ... = Oddsk

tabodds SA age_gr

------------------------------------------------------------------------

age_gr | cases controls odds [95% Conf.Interval]

------------+------------------------------------------------------------

<30 | 22 53 0.41509 0.25250 0.68238

30-44 | 146 99 1.47475 1.14261 1.90344

45-60 | 225 79 2.84810 2.20413 3.68021

60+ | 176 37 4.75676 3.33708 6.78041

-------------------------------------------------------------------------

Test of homogeneity (equal odds): chi2(3) = 85.36

Pr>chi2 = 0.0000

tabodds SA agegr, or

-------------------------------------------------------------------------

agegr | Odds Ratio chi2 P>chi2 [95% Conf. Interval]

-------------+----------------------------------------------------------

1 | 1.000000 . . . .

2 | 3.552801 21.02 0.0000 1.991142 6.339274

3 | 6.861335 52.77 0.0000 3.751518 12.549031

4 | 11.459459 73.08 0.0000 5.643031 23.271040

-------------------------------------------------------------------------

Test of homogeneity (equal odds): chi2(3) = 85.36

Pr>chi2 = 0.0000

Score test for trend of odds: chi2(1) = 76.90

Pr>chi2 = 0.0000
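The odds and odds ratios in the two `tabodds` tables follow directly from the case and control counts per age group. A short Python sketch (illustrative names, not the original Stata code) that takes the youngest group as the reference:

```python
# Cases (SA) and controls (non-SA) per age group, from the tabodds output
cases = {"<30": 22, "30-44": 146, "45-60": 225, "60+": 176}
controls = {"<30": 53, "30-44": 99, "45-60": 79, "60+": 37}

# Odds of SA within each age group
odds = {group: cases[group] / controls[group] for group in cases}

# Odds ratio of each group relative to the reference group (<30)
reference_odds = odds["<30"]
odds_ratios = {group: odds[group] / reference_odds for group in odds}

for group in cases:
    print(f"{group:>6}: odds = {odds[group]:.5f}, OR = {odds_ratios[group]:.6f}")
```

The confidence intervals and the homogeneity and trend tests are not reproduced here; only the point estimates are.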


Confounder effects
• Confounders
• Crude OR versus adjusted OR

-> snore = 1

+-------------------+
| Key               |
|-------------------|
| frequency         |
| column percentage |
+-------------------+

           |          SA
   choking |         0          1 |     Total
-----------+----------------------+----------
         0 |        31         93 |       124
           |     20.81      19.06 |     19.47
-----------+----------------------+----------
         1 |       118        395 |       513
           |     79.19      80.94 |     80.53
-----------+----------------------+----------
     Total |       149        488 |       637
           |    100.00     100.00 |    100.00


-> snore = 0

+-------------------+

| Key |

|-------------------|

| frequency |

| column percentage |

+-------------------+

| SA

choking | 0 1 | Total

-----------+----------------------+----------

0 | 58 32 | 90

| 48.74 39.51 | 45.00

-----------+----------------------+----------

1 | 61 49 | 110

| 51.26 60.49 | 55.00

-----------+----------------------+----------

Total | 119 81 | 200

| 100.00 100.00 | 100.00
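To make the crude versus adjusted comparison concrete, the sketch below computes the choking odds ratio within each snoring stratum, the crude odds ratio from the collapsed table, and a Mantel-Haenszel pooled odds ratio. The slides do not say which adjustment method was intended, so Mantel-Haenszel is an assumption here, as is reading choking = 1 as a positive history.

```python
# Each stratum: (a, b, c, d) =
#   (choking & SA, choking & non-SA, no choking & SA, no choking & non-SA)
strata = {
    "snore = 1": (395, 118, 93, 31),
    "snore = 0": (49, 61, 32, 58),
}

# Stratum-specific odds ratios
stratum_or = {name: (a * d) / (b * c) for name, (a, b, c, d) in strata.items()}

# Crude odds ratio: collapse the strata into one 2x2 table
a_all, b_all, c_all, d_all = (sum(cells) for cells in zip(*strata.values()))
crude_or = (a_all * d_all) / (b_all * c_all)

# Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n)
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

print({name: round(value, 3) for name, value in stratum_or.items()})
print(f"crude OR = {crude_or:.3f}, Mantel-Haenszel OR = {mh_or:.3f}")
```

The crude estimate (about 1.77) is noticeably larger than the pooled estimate (about 1.24), which is the pattern expected when snoring confounds the choking and SA association.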


Effect modifier

Logistic equations

• Consider > 2 variables simultaneously

• Linear regression


Age & Sleep apnea

Group   Age     SA    Non-SA   n     Mean (P)
1       < 30    22    53       75    0.29
2       30-44   146   99       245   0.60
3       45-60   225   79       304   0.74
4       60+     176   37       213   0.83


• Mean value of SA given age group

• E(Y|X)

• Expected value (mean) of SA given X

0 ≤E(Y|X) ≤ 1


Logit equation:
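The equation on this slide did not survive extraction. For reference, the standard form of the logit model with a single predictor X is sketched below; the symbols β0 and β1 are the usual intercept and slope and are assumed notation, not taken from the slide.

```latex
\[
\operatorname{logit}(P) = \ln\!\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X,
\qquad
P = E(Y \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
\]
```

Because the logit ranges over the whole real line while P stays between 0 and 1, this form respects the constraint 0 ≤ E(Y|X) ≤ 1.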


Simple logistic regression

• Fit equation


Performing analysis in STATA

xi: logit SA i.snore, nolog

i.snore _Isnore_1-2 (naturally coded; _Isnore_2 omitted)

Logistic regression Number of obs = 837

LR chi2(1) = 86.63

Prob > chi2 = 0.0000

Log likelihood = -481.49775 Pseudo R2 = 0.0825

------------------------------------------------------------------------------

SA | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

_Isnore_1 | 1.571043 .1717837 9.15 0.000 1.234354 1.907733

_cons | -.3846743 .1440453 -2.67 0.008 -.6669979 -.1023508

------------------------------------------------------------------------------

Interpretation

• Patients with a history of snoring have the logit of sleep apnea 1.57 higher than patients without a history of snoring.
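Exponentiating the coefficient turns the difference in logits into an odds ratio. The Python sketch below (an illustration, not the course's Stata code) does this for the snoring coefficient and its standard error from the output above, and also forms the Wald z statistic as the coefficient divided by its standard error.

```python
import math

# Values copied from the logit output for _Isnore_1
coef = 1.571043
std_err = 0.1717837

# Odds ratio and Wald 95% CI on the odds-ratio scale
z_crit = 1.959964                      # 97.5th percentile of the standard normal
odds_ratio = math.exp(coef)
ci_low = math.exp(coef - z_crit * std_err)
ci_high = math.exp(coef + z_crit * std_err)

# Wald z statistic
wald_z = coef / std_err

print(f"OR = {odds_ratio:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"Wald z = {wald_z:.2f}")
```

The odds ratio equals the cross-product ratio from the 2x2 table (4.81), as expected for a model with one binary predictor.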


Interpretation

• The logit of sleep apnea for patients with and without a history of snoring is therefore
  logit(SA) = -0.385 + 1.571 x snore, where snore = 1 for a history of snoring and 0 otherwise

Interpretation


Testing association

• Wald test

Testing association

• Likelihood ratio test
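The likelihood ratio statistic compares the log likelihood of the null (intercept-only) model with that of the fitted model. The sketch below reproduces the LR chi2(1) and pseudo R2 of the snoring model in Python; the null log likelihood is taken from the ll(null) column of the `estat ic` output shown later for the same 837 observations, and the chi-square tail probability uses the closed form that holds for 1 degree of freedom only.

```python
import math

ll_null = -524.8103     # ll(null), from the estat ic output
ll_model = -481.49775   # log likelihood of the model with snoring

# Likelihood ratio statistic: G = -2[LL0 - LL1]
g = -2 * (ll_null - ll_model)

# Upper-tail chi-square probability for df = 1: P(X > g) = erfc(sqrt(g / 2))
p_value = math.erfc(math.sqrt(g / 2))

# McFadden pseudo R2, as reported by Stata's logit
pseudo_r2 = 1 - ll_model / ll_null

print(f"G = {g:.2f}, p = {p_value:.3g}, pseudo R2 = {pseudo_r2:.4f}")
```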


Estimate probability of having event
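The slide content for this step is missing from the transcript. As a sketch under that caveat, the predicted probability follows from inverting the logit, p = 1 / (1 + exp(-xb)); the Python below applies this to the coefficients of the snoring model. The function name is hypothetical.

```python
import math

def predicted_probability(constant, coefficient, x):
    """Inverse logit of the linear predictor constant + coefficient * x."""
    linear_predictor = constant + coefficient * x
    return 1 / (1 + math.exp(-linear_predictor))

# Coefficients from the logit output (_cons and _Isnore_1)
constant, coefficient = -0.3846743, 1.571043

p_snore = predicted_probability(constant, coefficient, 1)
p_no_snore = predicted_probability(constant, coefficient, 0)

print(f"P(SA | snoring)    = {p_snore:.4f}")
print(f"P(SA | no snoring) = {p_no_snore:.4f}")
```

With a single binary predictor the fitted probabilities equal the observed proportions, 488/637 and 81/200.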

Multiple logistic regression
• Multiple factors are associated with the outcome of interest
• Osteoporotic hip fracture
  – Age, BMI, use of corticosteroids, alcohol consumption, calcium intake, etc.
• CKD
  – Age, gender, BMI, use of NSAID, diabetes, HT, Chol
• SA
  – Age, gender, BMI, snoring, stopping breathing, etc.

Multiple logistic regression
• Consider > 1 factor simultaneously
• Several factors together can predict the event better than one factor alone
• Control confounding effects, i.e., assess the effect of each factor while controlling for the other factors


Steps of analysis
• Model selection
  – Not too many variables
  – Only variables that explain the event of interest well
    • Clinical significance
    • Statistical significance

Model selection
  – Univariate analysis
  – Multivariate model
• Selection methods
  – Backward
  – Forward
• Model comparison
  – Likelihood ratio test: G = -2[LL0 - LL1]
  – Wald test: z = β/se
  – AIC/BIC


Model selection
I) Univariate analysis
• Demographic variables
  – age_gr, sex, BMI_gr
• Sleep variables
  – snore, stop_bre, choking, awake_re, kick_leg, accident, ess
• Risk behaviour
  – smoker, alcohol
• Co-morbidity
  – ht, dm, allergy

Dealing with continuous variables
• Compare the mean/median between the two SA groups
• Fit the variable as it is in the logit model
  – Keeps all possible information
  – Linear, polynomial, or fractional polynomial relationship
• Categorization
  – Using a previous reference range
  – Likelihood ratio test
  – Youden's index


Fractional polynomial regression

Allows

• Logarithm transformation

• Non-integer powers (e.g., -0.5), and

• Repeated powers (e.g., 0.5, 0.5)

• Equation


Age

fp <age>: logit SA <age>

(fitting 44 models)

(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)

Fractional polynomial comparisons:

--------------------------------------------------------------------

age | df Deviance Dev. dif. P(*) Powers

-------------+------------------------------------------------------

omitted | 0 1049.621 91.311 0.000

linear | 1 967.235 8.926 0.030 1

m = 1 | 2 958.513 0.204 0.903 -1

m = 2 | 4 958.309 0.000 -- -.5 3

--------------------------------------------------------------------

(*) P = sig. level of model with m = 2 based on chi^2 of dev. dif.

• FP searches for the best-fitting powers up to degree 2 (m = 2), choosing from the set (-2, -1, -0.5, 0, 0.5, 1, 2, 3)
• The deviance (-2LL) of each model is compared with that of the model with the lowest deviance
  – The linear model (df 1) is compared with the m = 2 model (df 4)
  – D = 967.235 - 958.309 = 8.926; df = 4 - 1 = 3; p value = 0.030
  – The FP model with powers (-.5 3) fits better than the linear model
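The deviance comparisons in the `fp` table can be checked by hand. The Python sketch below counts the candidate models (8 single powers plus 36 pairs, repeated powers allowed), then computes the deviance differences and their chi-square p values. The helper `chi2_sf` is hypothetical and covers only the closed forms for 1, 2 and 3 degrees of freedom.

```python
import math
from itertools import combinations_with_replacement

powers = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

# m = 1 models plus m = 2 models (pairs, repeated powers allowed)
n_models = len(powers) + len(list(combinations_with_replacement(powers, 2)))

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1, 2, 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1, 2, 3 are supported in this sketch")

# Deviances from the fp output
dev_linear, dev_m1, dev_m2 = 967.235, 958.513, 958.309

# Each model is compared with the m = 2 model (df 4)
p_linear_vs_m2 = chi2_sf(dev_linear - dev_m2, df=4 - 1)
p_m1_vs_m2 = chi2_sf(dev_m1 - dev_m2, df=4 - 2)

print(f"models fitted: {n_models}")
print(f"linear vs m = 2: D = {dev_linear - dev_m2:.3f}, p = {p_linear_vs_m2:.3f}")
print(f"m = 1 vs m = 2:  D = {dev_m1 - dev_m2:.3f}, p = {p_m1_vs_m2:.3f}")
```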


Age is transformed to age_1 and age_2; details of the transformation can be seen from

. fracpoly logit SA age
........
-> gen double Iage__1 = X^-.5-.4491638012 if e(sample)
-> gen double Iage__2 = X^3-121.7787491 if e(sample)
(where: X = age/10)

That is, age is first rescaled to X = age/10, then raised to each power and centred:
  age_1 = X^(-0.5) - 0.4491638012
  age_2 = X^3 - 121.7787491

• The powers suggested by FP can be used to generate new fp variables

fp gen age1 = age^(-.5 3)
logit SA age1_1 age1_2

• Or fit the model directly with

fp <age>, fp(-.5 3): logit SA <age>
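To see what the two generated variables contain, the sketch below applies the transformation printed by `fracpoly` to a single age. It is a Python illustration with a hypothetical function name; the centring constants are copied from the Stata log above and are specific to this dataset.

```python
def fp_terms(age):
    """Return (age_1, age_2) for the FP model with powers (-0.5, 3)."""
    x = age / 10                        # fracpoly rescales age to X = age/10
    age_1 = x ** -0.5 - 0.4491638012    # power -0.5, centred
    age_2 = x ** 3 - 121.7787491        # power 3, centred
    return age_1, age_2

age_1, age_2 = fp_terms(50)
print(f"age 50: age_1 = {age_1:.7f}, age_2 = {age_2:.7f}")
```

Both terms are close to zero near age 50, which reflects the centring applied by `fracpoly`.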


TABLE 1. Patients' characteristics between SA and non-SA groups

Factors                          SA, n = (%)     Non-SA, n = (%)     P value
Age
  < 30
  30 - 44
  45 - 59
  > 60
Gender
  Male
  Female
BMI
  < 25
  25 - 29.9
  30 - 39.9
  > 40
Snoring
  Yes
  No
Stopping breathing
  Yes
  No
Choking
  Yes
  No
Waking up refreshed
  Yes
  Sometime
  No
Leg kicking
  Yes
  Sometime
  No
Accident due to sleepiness
  Yes
  No
ESS score, median (range)
Smoking
  Yes
  Ex-smoker
  No
Alcohol consumption
  Yes
  No
Hypertension
  Yes
  No
Diabetes mellitus
  Yes
  No
Allergy
  Yes
  No

TABLE 2. Factors associated with SA (AHI > 5): multiple logistic regression analysis

Factors     Coefficient     SE     P value     OR (95% CI)


TABLE 3. Scoring scheme: steps used to calculate prediction scores

Factors            Scoring            Score for individual
........
........
........
........
........
Total score                           ........

TABLE 4. Percentage of sleep apnoea according to prediction score category in derivation and validation phases

                               Derivation                                Validation
Score     Probability of SA    SA    Non-SA    LR+ (95% CI)    PPV       SA    Non-SA    LR+ (95% CI)    PPV
low
medium
high


Model selection
• II) Multivariate analysis: enter the variables with p < 0.10 into the model simultaneously
  – Stepwise backward/forward selection using the LR test
• III) AIC/BIC
  – Leaps-and-bounds selection
  – gvselect (SJ15-4)
• Akaike information criterion (AIC)
  – Fitness and complexity


• Bayesian information criterion (BIC)
  – BIC = -2(LL) + ln(N)k
  – N = number of observations
  – k = number of parameters estimated
• Given two models fitted to the same data, the model with the smaller value of the information criterion is considered better
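Both criteria are simple functions of the log likelihood. The sketch below uses the BIC formula from the slide and the standard AIC definition, AIC = -2(LL) + 2k, which is not shown in the transcript and is stated here as an assumption. The inputs are the log likelihood, parameter count and sample size of the 9-predictor model reported by `estat ic`.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: -2*LL + 2*k (standard definition)."""
    return -2 * log_likelihood + 2 * k

def bic(log_likelihood, k, n_obs):
    """Bayesian information criterion: -2*LL + ln(N)*k, as on the slide."""
    return -2 * log_likelihood + math.log(n_obs) * k

# 9 predictors plus the constant give k = 10; N = 837 observations
ll_model, k, n_obs = -391.8127, 10, 837

print(f"AIC = {aic(ll_model, k):.4f}")
print(f"BIC = {bic(ll_model, k, n_obs):.4f}")
```

Because ln(837) is about 6.73, BIC penalises each extra parameter more heavily than AIC does, which is why the two criteria can prefer models of different size.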

Leaps-and-bounds selection

xi: gvselect <term> (i.age_gr) i.sex (i.BMI_gr) i.snore i.stop (i.choking) i.awake i.accident i.ht : logit SA <term>

Optimal models:

# Preds LL AIC BIC

1 -479.1029 962.2059 971.6655

2 -452.2264 910.4527 924.6422

3 -438.4437 884.8875 903.8067

4 -425.4238 860.8477 884.4968

5 -415.7195 843.4391 871.818

6 -407.2472 828.4944 861.6032

7 -401.5824 819.1647 857.0033

8 -396.221 810.4419 853.0104

9 -391.8127 803.6253 850.9236

10 -390.6937 803.3873 855.4154

predictors for each model:

1 : _Istop_bre_2

2 : _Isex_2 _Isnore_1

3 : _Iage_gr_4 _Isex_2 _Isnore_1

4 : _Iage_gr_4 _Isex_2 _Istop_bre_2 _Isnore_1

5 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Isnore_1

6 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1

7 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _Iage_gr_2

8 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _IBMI_gr_39

_Iage_gr_2

9 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _IBMI_gr_39

_Iage_gr_2 _IBMI_gr_29

10 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _IBMI_gr_39

_Iage_gr_2 _IBMI_gr_29 _Iawake_re_2


logit SA stop_bre1 agegr2 agegr3 agegr4 sex2 BMI_gr2 BMI_gr3 BMI_gr4 snore2

estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------

Model | Obs ll(null) ll(model) df AIC BIC

-------------+---------------------------------------------------------------

. | 837 -524.8103 -391.8127 10 803.6253 850.9236

-----------------------------------------------------------------------------

Note: N=Obs used in calculating BIC; see [R] BIC note.

Performance of the model

• Calibration
  – How similar are the predicted and observed outcomes?
  – Testing: goodness of fit

All possible patterns = 2x4x4x2x2 = 128


Hosmer-Lemeshow GOF

estat gof, table gr(10)

Logistic model for SA, goodness-of-fit test

(Table collapsed on quantiles of estimated probabilities)

+--------------------------------------------------------+

| Group | Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |

|-------+--------+-------+-------+-------+-------+-------|

| 1 | 0.2359 | 8 | 10.3 | 79 | 76.7 | 87 |

| 2 | 0.4791 | 37 | 33.8 | 52 | 55.2 | 89 |

| 3 | 0.6222 | 56 | 54.0 | 38 | 40.0 | 94 |

| 4 | 0.6885 | 49 | 44.7 | 19 | 23.3 | 68 |

| 5 | 0.7651 | 57 | 59.1 | 24 | 21.9 | 81 |

|-------+--------+-------+-------+-------+-------+-------|

| 6 | 0.8307 | 86 | 89.9 | 25 | 21.1 | 111 |

| 7 | 0.8681 | 56 | 57.9 | 12 | 10.1 | 68 |

| 8 | 0.8926 | 90 | 87.2 | 8 | 10.8 | 98 |

| 9 | 0.9417 | 103 | 102.4 | 7 | 7.6 | 110 |

| 10 | 0.9750 | 27 | 29.7 | 4 | 1.3 | 31 |

+--------------------------------------------------------+

number of observations = 837

number of groups = 10

Hosmer-Lemeshow chi2(8) = 10.65

Prob > chi2 = 0.2224

.

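#delimit ;

disp ((8-10.3)^2/(10.3*(1-10.3/87))+
(37-33.8)^2/(33.8*(1-33.8/89))+
(56-54.0)^2/(54.0*(1-54.0/94))+
(49-44.7)^2/(44.7*(1-44.7/68))+
(57-59.1)^2/(59.1*(1-59.1/81))+
(86-89.9)^2/(89.9*(1-89.9/111))+
(56-57.9)^2/(57.9*(1-57.9/68))+
(90-87.2)^2/(87.2*(1-87.2/98))+
(103-102.4)^2/(102.4*(1-102.4/110))+
(27-29.7)^2/(29.7*(1-29.7/31))) ;

Result: approximately 10.76, close to the 10.65 reported by estat gof; the small difference arises because the expected counts in the table are rounded to one decimal place.

HL chi2 = sum over j of (oj - ej)^2 / [ej(1 - ej/nj)], j = 1,...,10 groups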


O/E

#delimit ;

disp ((8/10.3)+(79/76.7)+(37/33.8)+(52/55.2)+
(56/54.0)+(38/40.0)+(49/44.7)+(19/23.3)+
(57/59.1)+(24/21.9)+(86/89.9)+(25/21.1)+
(56/57.9)+(12/10.1)+(90/87.2)+(8/10.8)+
(103/102.4)+(7/7.6)+(27/29.7)+(4/1.3))/20 ;

Result: approximately 1.09

Note: mean of oj/ej over the 20 observed/expected cells, j = 1,...,20
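The hand calculations above are easy to mistype, so a loop over the ten groups is a safer cross-check. The Python sketch below recomputes the Hosmer-Lemeshow statistic and the mean O/E ratio from the rounded table values (which is why the statistic comes out near 10.76 rather than Stata's 10.65), and evaluates the chi-square tail probability for Stata's 10.65 with 8 degrees of freedom. The helper `chi2_sf_even_df` is hypothetical and valid for even degrees of freedom only.

```python
import math

# (observed events, expected events, group size) per decile group, from estat gof
groups = [
    (8, 10.3, 87), (37, 33.8, 89), (56, 54.0, 94), (49, 44.7, 68),
    (57, 59.1, 81), (86, 89.9, 111), (56, 57.9, 68), (90, 87.2, 98),
    (103, 102.4, 110), (27, 29.7, 31),
]

# HL chi2 = sum (o - e)^2 / [e (1 - e/n)]
hl_chi2 = sum((o - e) ** 2 / (e * (1 - e / n)) for o, e, n in groups)

# Mean O/E over the 20 cells (events and non-events in each group)
ratios = []
for o, e, n in groups:
    ratios.append(o / e)               # events
    ratios.append((n - o) / (n - e))   # non-events
mean_oe = sum(ratios) / len(ratios)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df (closed-form sum)."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_stata = chi2_sf_even_df(10.65, df=8)

print(f"HL chi2 from rounded table = {hl_chi2:.3f}")
print(f"mean O/E = {mean_oe:.4f}")
print(f"p value for chi2(8) = 10.65: {p_stata:.4f}")
```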

Model performance

• Discrimination
  – Assign the cut-off/threshold
  – Construct a 2x2 table
  – Estimate predictive values
    • Sensitivity
    • Specificity
    • PPV, NPV
    • Accuracy
    • Area under the ROC curve or concordance (C) statistic
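Once a cut-off turns predictions into positive/negative calls, the measures listed above come straight from the resulting 2x2 table. The slides give no classification table for the final model, so this Python sketch uses a stand-in, treating a history of snoring as the "positive test" for SA (the 2x2 table from earlier in the deck). That choice is purely an assumption for illustration.

```python
# Snoring as a stand-in classifier for SA (illustrative assumption)
tp, fn = 488, 81    # SA:     snoring, no snoring
fp, tn = 149, 119   # non-SA: snoring, no snoring

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)
lr_positive = sensitivity / (1 - specificity)

print(f"Sens = {sensitivity:.4f}, Spec = {specificity:.4f}")
print(f"PPV = {ppv:.4f}, NPV = {npv:.4f}, Accuracy = {accuracy:.4f}")
print(f"LR+ = {lr_positive:.3f}")
```

PPV and NPV depend on the prevalence of SA in the sample (about 68% here), whereas sensitivity, specificity and LR+ do not.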


Model discrimination

• Area under the ROC curve
  – Also known as the C statistic
  – A summary statistic that tells us how well the logit model discriminates diseased from non-diseased subjects
  – The curve plots sensitivity versus 1 - specificity (false positive rate) over the whole range of estimated probabilities


Interpretation of ROC

Area under ROC       Interpretation
0.5 ≤ ROC < 0.6      Fail
0.6 ≤ ROC < 0.7      Poor
0.7 ≤ ROC < 0.8      Fair
0.8 ≤ ROC < 0.9      Good
≥ 0.9                Excellent
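The C statistic is the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, counting ties as one half. As an illustration only, the Python sketch below computes it for age group used as a single ordered predictor of SA (counts from the age by SA table), then labels the result with the bands above. The function names are hypothetical, and this is not the AUC of the final multivariable model.

```python
def auc_from_groups(cases, controls):
    """C statistic for ordered risk groups (lowest risk first); ties count 1/2."""
    concordant = 0
    tied = 0
    controls_below = 0
    for n_case, n_control in zip(cases, controls):
        concordant += n_case * controls_below   # case ranked above these controls
        tied += n_case * n_control              # same group: tied pair
        controls_below += n_control
    return (concordant + 0.5 * tied) / (sum(cases) * sum(controls))

def interpret_auc(auc):
    """Map an AUC to the labels in the interpretation table."""
    if auc >= 0.9:
        return "Excellent"
    if auc >= 0.8:
        return "Good"
    if auc >= 0.7:
        return "Fair"
    if auc >= 0.6:
        return "Poor"
    return "Fail"

# SA and non-SA counts for age groups <30, 30-44, 45-60, 60+
sa_counts = [22, 146, 225, 176]
non_sa_counts = [53, 99, 79, 37]

auc_age = auc_from_groups(sa_counts, non_sa_counts)
print(f"AUC for age group alone = {auc_age:.4f} ({interpret_auc(auc_age)})")
```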


Diagnostic measures
• Residuals
  – Pearson's chi-square residual
  – Deviance residual
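The residual formulas on this slide are missing from the transcript. The sketch below uses the standard definitions for a covariate pattern with m subjects, y observed events and fitted probability p, which are assumed here rather than taken from the slide. For illustration it treats decile group 10 of the Hosmer-Lemeshow table (27 events observed, 29.7 expected, 31 subjects) as if it were one covariate pattern.

```python
import math

def pearson_residual(y, m, p):
    """(observed - expected) / sqrt(m p (1 - p)) for one covariate pattern."""
    return (y - m * p) / math.sqrt(m * p * (1 - p))

def deviance_residual(y, m, p):
    """Signed square root of the pattern's deviance contribution (needs 0 < y < m)."""
    contribution = (y * math.log(y / (m * p))
                    + (m - y) * math.log((m - y) / (m * (1 - p))))
    return math.copysign(math.sqrt(2 * contribution), y - m * p)

# Group 10 of the HL table, used as a stand-in covariate pattern
y, m = 27, 31
p = 29.7 / m   # fitted probability implied by the expected count

r_pearson = pearson_residual(y, m, p)
r_deviance = deviance_residual(y, m, p)

print(f"Pearson residual  = {r_pearson:.4f} (squared: {r_pearson ** 2:.4f})")
print(f"Deviance residual = {r_deviance:.4f}")
```

The squared Pearson residual (about 5.85) is this group's contribution to the Hosmer-Lemeshow statistic, which shows that a single poorly fitted group accounts for more than half of the total.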


Outliers
• Leverage (hjj) values
• Reflect the distance of Xj from the centre (mean)
• The higher the hjj, the longer the distance from the centre (mean)

Influence of outliers
• Influence on the predicted value of Y
• Including/excluding the outlying pattern(s) would change the Y values
• Pearson residual change


• Deviance residual change

Influence on coefficient estimation
