1/4/2017
1
RACE 616: Advance Analysis in Medical Research
Jan 5th – Feb 7th 2017
Ammarin Thakkinstian, Ph.D.
Section for Clinical Epidemiology and Biostatistics (CEB)
Email: [email protected]
http://www.ceb-rama.org
Application: CEB-RAMA
Phone: 022011762
Course outline
Contents | Session | Module | Assignment
Logistic regression | 1 | I, P1 |
ROC curve analysis & Clinical prediction score | 1 | I |
Clinical prediction scores (cont.) | 1 | I, P2 | 1
Log-linear & Poisson regression | 1 | II | 2
Tutoring: wrap up, questions & answers | | |
Survival analysis: KM & Cox regression | 1 | III, P1 | 3
Survival analysis II: Competing risk model | 2 | III, P2 |
Survival analysis III: Multi-state model | 3 | III, P3 | 4
Sample size estimation | 4 | III, P4 |
Longitudinal data analysis I | 1 | IV |
Longitudinal data analysis II | 2 | IV | 5
• Evaluation
– Five assignments, each due about 2 weeks after its topic is finished
• Resource
– http://www.ra.mahidol.ac.th/dpt/CEB/Downloadmodule
– CEB-RAMA application
• Modules
• Data
• Assignments
• Slides
• Further readings – Appendix 1-12
Reference
• Hosmer DW, Lemeshow S. Applied logistic regression, 2nd edition. New York: John Wiley & Sons, Inc 2000.
• Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied regression analysis and other multivariable methods, 3rd edition. Washington: Duxbury Press 1998; 39-212.
• Pagano M, Gauvreau K. Principles of Biostatistics. California: Duxbury Press 1993; 379-424.
• Moons KG, Kengne AP, Woodward M, et al. Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart 2012;98(9):683-90.
• North RA, McCowan LM, Dekker GA, et al. Clinical risk prediction for pre-eclampsia in nulliparous women: development of model in international prospective cohort. BMJ 2011;342:d1875.
• Cook NR, Paynter NP. Performance of reclassification statistics in comparing risk prediction models. Biom J 2011;53(2):237-58.
• Pencina MJ, D'Agostino RB, Sr., D'Agostino RB, Jr., et al. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27(2):157-72; discussion 207-12.
Logistic regression analysis
Objective
• Apply logistic regression properly
• Construct the logit equation
• Estimate the probability of the event, the adjusted odds ratio and its 95% confidence interval
• Interpret the results of logistic regression analysis
• Assess goodness of fit of the logit model & diagnostic measures
Objective
• Develop a clinical prediction model using the logit equation
• Calibrate the cut-off or threshold
• Assess model performances
– Calibration
– Validation
• Perform internal and external validations
Outline of talk
• Construct logistic equation
– Simple logistic model
– Multiple logistic model
• Model selection
– Assessing goodness of fit of the model
– Diagnostic measures
• Creating a clinical prediction score
– Derivation phase
– Validation phase
• Internal validation
• External validation
When to apply logistic regression
• Assessing association between factors and the outcome
– Outcome
• Dichotomous only – DM/non-DM, HT/non-HT, CKD/non-CKD, retinopathy/non-retinopathy
– Studied factors
• Can be either continuous or categorical variables
• Constructing risk/prognostic prediction models
Example 1
Factors associated with acute stroke
• Design: Case-control study
• Outcome variable: Case vs control
– A case is a patient who has been diagnosed with haemorrhagic or ischaemic stroke
– A control is a subject with no history of stroke
• Variables of interest
– Demographic variables
• Age, gender, BMI, waist-hip ratio
– Risk behaviour
• Smoking, alcohol consumption
• Physical activity
– History of illness
• DM
• HT
• High cholesterol, LDL, HDL, triglycerides
• Variables (cont.)
– Genetic factors
• Tissue-type plasminogen activator (t-PA)
• R353Q polymorphism of the Factor VII gene
• Platelet glycoprotein (GP 1bα) gene – Thr/Met & Kozak polymorphisms
Example 2
Prognostic factors of retinopathy in type 2 diabetic patients
• Design
– Cohort study
• Study period
– 10 years
• Outcome
– Retinopathy vs non-retinopathy
• Variables
– Demographic data
• Age, gender, BMI/waist-hip ratio, smoking, alcohol
– History of disease
• HT
• Abnormal lipid profile
– Clinical data
• SBP/DBP
• Kidney function (GFR or Cr)
• HbA1c
• Medication
– ACE-I, ARB
Example 3
Risk factors of chronic kidney disease (CKD)
• Design
– Cross-sectional survey study
• Outcome
– CKD versus non-CKD
• Variables
– Demographic variables
• Age, gender, BMI/waist-hip ratio
– Risk/preventive behaviours
• Alcohol consumption
• Smoking
• Exercise & physical activity
– Co-morbidity
• DM, HT, abnormal lipid profile, kidney stone
– Medications
• NSAID, cyclo-oxygenase type 2 (Cox-2) inhibitor, traditional medicine
Example 4
Is MPV associated with progression of cardiovascular diseases?
• Design
– EGAT cohort with major cardiovascular diseases
• Study period
– 1997-2012
• Outcome
– Cardiovascular death
• Studied variable
– MPV
• Covariables
– Demographic variables
• Age, gender, BMI/waist-hip ratio
– Risk/preventive behaviours
• Alcohol consumption
• Smoking
• Exercise & physical activity
– Co-morbidity
• DM, HT, abnormal lipid profile
Example 5
Factors associated with sleep apnea
• Design
– Cross-sectional study of subjects who were on the waiting list for polysomnography at the sleep lab centre, Royal Newcastle Hospital, Newcastle, Australia
• Variables
– Demographic variables
– Sleep variables
– Co-morbidity
Variable | Description | Categories (code/value)
Age | Age at performing PS, years | < 30 (1); 30 - 44 (2); 45 - 59 (3); > 60 (4)
BMI | Body mass index, weight (kg) / height (m)^2 | < 25 (1); 25 - 29.9 (2); 30 - 39.9 (3); > 40 (4)
Sex | Gender | Male (1); Female (2)
Snoring | History of snoring | Yes (1); No (2)
Stopping breathing | History of stopping breathing | Yes (1); No (2)
Choking | History of choking during sleep | Yes (1); No (2)
Waking up refreshed | History of being refreshed after waking up | Yes (1); Sometimes (2); No (3)
Leg kicking | History of kicking legs during sleep | Yes (1); Sometimes (2); No (3)
DM | History of diabetes | Yes (1); No (2)
Hypertension | History of high blood pressure | Yes (1); No (2)
Allergy | History of allergy | Yes (1); No (2)
Outcome: ahi | Apnoea-hypopnoea index | > 5 (1); ≤ 5 (0)
Assess associations between categorical variables
2x2 contingency table
| snore
SA | 1 2 | Total
-----------+----------------------+----------
0 | 149 119 | 268
1 | 488 81 | 569
-----------+----------------------+----------
Total | 637 200 | 837
H0: Snoring and sleep apnea (SA) are independent, or H0: P1 = P2
• Statistical test
– Chi-square
– Exact test
• Magnitude of association
– Odds ratio
– Risk ratio
| snore
SA | 1 2 | Total
-----------+----------------------+----------
0 | 149 119 | 268
| 55.60 44.40 | 100.00
-----------+----------------------+----------
1 | 488 81 | 569
| 85.76 14.24 | 100.00
-----------+----------------------+----------
Total | 637 200 | 837
| 76.11 23.89 | 100.00
Pearson chi2(1) = 91.1762 Pr = 0.000
cc SA snore2
Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+------------------------
Cases | 488 81 | 569 0.8576
Controls | 149 119 | 268 0.5560
-----------------+------------------------+------------------------
Total | 637 200 | 837 0.7611
| |
| Point estimate | [95% Conf. Interval]
|------------------------+------------------------
Odds ratio | 4.811666 | 3.387801 6.83543 (exact)
Attr. frac. ex. | .7921718 | .7048233 .8537034 (exact)
Attr. frac. pop | .6794022 |
+-------------------------------------------------
chi2(1) = 91.18 Pr>chi2 = 0.0000
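The odds ratio and the Pearson chi-square reported above can be reproduced by hand from the four cell counts. The following is a minimal Python sketch, not part of the original Stata session; the variable names are illustrative only.

```python
# 2x2 table from the output above:
# rows = SA (cases / controls), columns = snoring (exposed / unexposed).
a, b = 488, 81    # cases: exposed, unexposed
c, d = 149, 119   # controls: exposed, unexposed
n = a + b + c + d

# Odds ratio = cross-product ratio.
odds_ratio = (a * d) / (b * c)

# Pearson chi-square for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(f"OR = {odds_ratio:.4f}, chi2(1) = {chi2:.2f}")
```

Only the point estimate and test statistic are reproduced here; the exact confidence interval printed by `cc` needs a different calculation.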
Attributable risk
• Attributable fraction of exposure
AF = (OR-1)/OR
– The proportion (or number) of exposed cases that can be attributed to that exposure
• Population attributable risk (PAR)
PAR = AF x a/n1, or
PAR = Pe (OR-1) / [1 + Pe (OR-1)]
– The proportion (or number) of cases that would not occur if the factor was eliminated
– a = number of exposed cases
– n1 = number of cases
– Pe = proportion exposed in the population (estimated from the controls)
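As a check on these formulas, the attributable fractions in the `cc` output (.7921718 and .6794022) can be recomputed from the snoring table. This is a short Python sketch under the assumption that Pe is estimated by the proportion of exposed controls; the names are illustrative.

```python
a, n1 = 488, 569   # exposed cases, all cases
c, n0 = 149, 268   # exposed controls, all controls

odds_ratio = (a * (n0 - c)) / ((n1 - a) * c)

# Attributable fraction among the exposed: AF = (OR - 1) / OR
af_exposed = (odds_ratio - 1) / odds_ratio

# Population attributable risk, first form: PAR = AF x a / n1
par_from_cases = af_exposed * a / n1

# Second form, with Pe taken as the exposure proportion among controls (assumption).
pe = c / n0
par_from_pe = pe * (odds_ratio - 1) / (1 + pe * (odds_ratio - 1))

print(f"AF = {af_exposed:.7f}, PAR = {par_from_cases:.7f} / {par_from_pe:.7f}")
```

With Pe estimated this way the two PAR expressions are algebraically identical, so they agree to rounding error.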
2x4 contingency table
tab SA age_gr, col
+-------------------+
| Key |
|-------------------|
| frequency |
| column percentage |
+-------------------+
| age_gr
SA | <30 30-44 45-60 60+ | Total
-----------+--------------------------------------------+----------
0 | 53 99 79 37 | 268
| 70.67 40.41 25.99 17.37 | 32.02
-----------+--------------------------------------------+----------
1 | 22 146 225 176 | 569
| 29.33 59.59 74.01 82.63 | 67.98
-----------+--------------------------------------------+----------
Total | 75 245 304 213 | 837
| 100.00 100.00 100.00 100.00 | 100.00
H0: Odds1 = Odds2 = … = Oddsk
tabodds SA age_gr
------------------------------------------------------------------------
age_gr | cases controls odds [95% Conf.Interval]
------------+------------------------------------------------------------
<30 | 22 53 0.41509 0.25250 0.68238
30-44 | 146 99 1.47475 1.14261 1.90344
45-60 | 225 79 2.84810 2.20413 3.68021
60+ | 176 37 4.75676 3.33708 6.78041
-------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3) = 85.36
Pr>chi2 = 0.0000
tabodds SA agegr, or
-------------------------------------------------------------------------
agegr | Odds Ratio chi2 P>chi2 [95% Conf. Interval]
-------------+----------------------------------------------------------
1 | 1.000000 . . . .
2 | 3.552801 21.02 0.0000 1.991142 6.339274
3 | 6.861335 52.77 0.0000 3.751518 12.549031
4 | 11.459459 73.08 0.0000 5.643031 23.271040
-------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3) = 85.36
Pr>chi2 = 0.0000
Score test for trend of odds: chi2(1) = 76.90
Pr>chi2 = 0.0000
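The odds and odds ratios in the two `tabodds` tables follow directly from the case and control counts per age group. Here is a brief Python sketch (illustrative names, point estimates only; the confidence intervals and trend test are not reproduced).

```python
# Cases (SA = 1) and controls (SA = 0) per age group, from the tabodds output.
cases = {"<30": 22, "30-44": 146, "45-60": 225, "60+": 176}
controls = {"<30": 53, "30-44": 99, "45-60": 79, "60+": 37}

# Odds of SA within each age group.
odds = {g: cases[g] / controls[g] for g in cases}

# Odds ratios relative to the youngest group (the reference category).
reference = odds["<30"]
odds_ratios = {g: odds[g] / reference for g in odds}

for g in cases:
    print(f"{g:>6}: odds = {odds[g]:.5f}, OR = {odds_ratios[g]:.6f}")
```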
Confounder effects
• Confounders
• Crude OR versus adjusted OR
-> snore = 1
+-------------------+
| Key |
|-------------------|
| frequency |
| column percentage |
+-------------------+
| SA
choking | 0 1 | Total
-----------+----------------------+----------
0 | 31 93 | 124
| 20.81 19.06 | 19.47
-----------+----------------------+----------
1 | 118 395 | 513
| 79.19 80.94 | 80.53
-----------+----------------------+----------
Total | 149 488 | 637
| 100.00 100.00 | 100.00
-> snore = 0
+-------------------+
| Key |
|-------------------|
| frequency |
| column percentage |
+-------------------+
| SA
choking | 0 1 | Total
-----------+----------------------+----------
0 | 58 32 | 90
| 48.74 39.51 | 45.00
-----------+----------------------+----------
1 | 61 49 | 110
| 51.26 60.49 | 55.00
-----------+----------------------+----------
Total | 119 81 | 200
| 100.00 100.00 | 100.00
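To make the crude-versus-adjusted comparison concrete, the two snoring strata can be combined in a few lines. The sketch below uses the Mantel-Haenszel pooled odds ratio as the adjusted estimate; that estimator is an assumption chosen for illustration and is not named on the slides.

```python
# Per stratum: (choking & SA, choking & no SA, no choking & SA, no choking & no SA)
strata = {
    "snore = 1": (395, 118, 93, 31),
    "snore = 0": (49, 61, 32, 58),
}

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio of a 2x2 table."""
    return (a * d) / (b * c)

# Stratum-specific odds ratios of SA for choking.
stratum_or = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Crude OR: collapse the two strata into a single table.
crude_or = odds_ratio(*[sum(col) for col in zip(*strata.values())])

# Mantel-Haenszel OR: weighted combination across strata.
mh_num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
mh_den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = mh_num / mh_den

print(stratum_or, round(crude_or, 3), round(mh_or, 3))
```

The crude odds ratio (about 1.77) is larger than either stratum-specific value and than the pooled estimate (about 1.24), which is the pattern expected when snoring confounds the choking-SA association.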
Effect modifier
Logistic equations
• Consider > 2 variables simultaneously
• Linear regression
Age & Sleep apnea
Group | Age | SA | Non-SA | n | Mean (P)
1 | < 30 | 22 | 53 | 75 | 0.29
2 | 30-44 | 146 | 99 | 245 | 0.60
3 | 45-60 | 225 | 79 | 304 | 0.74
4 | 60+ | 176 | 37 | 213 | 0.83
• Mean value of SA given age group
• E(Y|X)
• Expected value (mean) of SA given X
0 ≤E(Y|X) ≤ 1
Logit equation:
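For reference, the standard form of the logit equation for a single factor X is sketched below. The notation is assumed rather than taken from the slide: P = E(Y|X) is the probability of the event, β0 the intercept and β1 the coefficient of X.

```latex
\operatorname{logit}(P) = \ln\!\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X,
\qquad
P = E(Y \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
```

The second expression is the inverse of the first and keeps P between 0 and 1, which a linear regression of Y on X does not guarantee.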
Simple logistic regression
• Fit equation
Performing analysis in STATA
xi: logit SA i.snore, nolog
i.snore _Isnore_1-2 (naturally coded; _Isnore_2 omitted)
Logistic regression Number of obs = 837
LR chi2(1) = 86.63
Prob > chi2 = 0.0000
Log likelihood = -481.49775 Pseudo R2 = 0.0825
------------------------------------------------------------------------------
SA | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Isnore_1 | 1.571043 .1717837 9.15 0.000 1.234354 1.907733
_cons | -.3846743 .1440453 -2.67 0.008 -.6669979 -.1023508
------------------------------------------------------------------------------
Interpretation
• The logit (log odds) of sleep apnea is 1.57 higher in patients with a history of snoring than in patients without a history of snoring.
Interpretation
• The logits of sleep apnea for patients with and without a history of snoring are therefore
logit(SA | snoring) = -0.385 + 1.571 = 1.186
logit(SA | no snoring) = -0.385
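The fitted coefficients can be turned into an odds ratio and into predicted probabilities with the inverse logit. The following is a small Python sketch using the coefficients and confidence limits from the Stata output above; the function and variable names are illustrative.

```python
import math

# Coefficients and 95% CI of the snoring coefficient from the logit output.
b0, b1 = -0.3846743, 1.571043
ci_low, ci_high = 1.234354, 1.907733

def inv_logit(x):
    """Convert a logit (log odds) back to a probability."""
    return 1 / (1 + math.exp(-x))

logit_snore = b0 + b1     # patients with a history of snoring
logit_no_snore = b0       # patients without a history of snoring

odds_ratio = math.exp(b1)
or_ci = (math.exp(ci_low), math.exp(ci_high))

p_snore = inv_logit(logit_snore)
p_no_snore = inv_logit(logit_no_snore)

print(f"OR = {odds_ratio:.3f} ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
print(f"P(SA | snore) = {p_snore:.4f}, P(SA | no snore) = {p_no_snore:.4f}")
```

The odds ratio matches the 2x2 table (about 4.81), and the predicted probabilities equal the observed proportions 488/637 and 81/200. The interval obtained by exponentiating the coefficient limits is a Wald interval, so it differs slightly from the exact interval reported by `cc`.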
Testing association
• Wald test
Testing association
• Likelihood ratio test
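Both tests can be recomputed from numbers that appear in the Stata output of these slides. The sketch below assumes the usual definitions (Wald z = coefficient / standard error; likelihood ratio G = -2[LL0 - LL1]) and takes the null log likelihood from the `estat ic` output shown later in the deck; the names are illustrative.

```python
import math

b1, se = 1.571043, 0.1717837               # snoring coefficient and its standard error
ll_null, ll_model = -524.8103, -481.49775  # log likelihoods: intercept-only, with snoring

# Wald test: z statistic and two-sided p-value from the standard normal.
z = b1 / se
p_wald = math.erfc(abs(z) / math.sqrt(2))

# Likelihood ratio test: G follows a chi-square with 1 df here.
G = -2 * (ll_null - ll_model)
p_lr = math.erfc(math.sqrt(G / 2))   # chi-square(1) upper tail

print(f"Wald z = {z:.2f} (p = {p_wald:.2g}); LR chi2(1) = {G:.2f} (p = {p_lr:.2g})")
```

The closed form used for the chi-square tail is valid only for 1 degree of freedom.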
Estimate probability of having event
Multiple logistic regression
• Multiple factors are associated with the outcome of interest
• Osteoporotic hip fracture
– Age, BMI, use of corticosteroids, alcohol consumption, calcium intake, etc.
• CKD
– Age, gender, BMI, use of NSAIDs, diabetes, HT, cholesterol
• SA
– Age, gender, BMI, snoring, stopping breathing, etc.
Multiple logistic regression
• Consider > 1 factor simultaneously
• Several factors together can predict the event better than a single factor
• Control confounding effects, i.e., assess the effect of each factor while controlling for the other factors
Steps of analysis
• Model selection
– Not too many variables
– Only variables that explain the event of interest well
• Clinical significance
• Statistical significance
Model selection
– Univariate analysis
– Multivariate model
• Selection methods
– Backward
– Forward
• Model comparison
– Likelihood ratio test: G = -2[LL0 - LL1]
– Wald test: z = β/se(β)
– AIC/BIC
Model selection
I) Univariate analysis
• Demographic variables
– age_gr, sex, BMI_gr
• Sleep variables
– snore, stop_bre, choking, awake_re, kick_leg, accident, ess
• Risk behaviour
– smoker, alcohol
• Co-morbidity
– ht, dm, allergy
Dealing with continuous variables
• Compare the mean/median between the two SA groups
• Fit the variable as it is in the logit model
– Keeps all possible information
– Linear, polynomial, or fractional polynomial relationship
• Categorization
– Using previous reference ranges
– Likelihood ratio test
– Youden's index
Fractional polynomial regression
Allows
• Logarithm transformation
• Non-integer powers (e.g., -0.5), and
• Repeated powers (e.g., 0.5, 0.5)
• Equation
Age
fp <age>: logit SA <age>
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
--------------------------------------------------------------------
age | df Deviance Dev. dif. P(*) Powers
-------------+------------------------------------------------------
omitted | 0 1049.621 91.311 0.000
linear | 1 967.235 8.926 0.030 1
m = 1 | 2 958.513 0.204 0.903 -1
m = 2 | 4 958.309 0.000 -- -.5 3
--------------------------------------------------------------------
(*) P = sig. level of model with m = 2 based on chi^2 of dev. dif.
• FP searches for the best-fitting powers, up to degree 2, from the set (-2, -1, -0.5, 0, 0.5, 1, 2, 3); this gives 8 one-power models plus 36 two-power models, i.e. the 44 models fitted above
• The deviance (-2LL) of each model is compared with that of the model with the lowest deviance
– The linear model (df 1) is compared with the m = 2 model (df 4)
– D = 967.235 - 958.309 = 8.926; df = 4 - 1 = 3; p value = 0.030
– The FP model with powers (-.5 3) therefore fits better than the linear model
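The deviance comparisons in the FP table can be verified with closed-form chi-square tail probabilities. This is a minimal Python sketch; the helper `chi2_sf` is hypothetical and only covers 1 to 3 degrees of freedom, which is all these two comparisons need.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1, 2, 3 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1, 2, 3 are implemented in this sketch")

# (df, deviance) for each model in the fractional polynomial comparison table.
models = {"linear": (1, 967.235), "m = 1": (2, 958.513), "m = 2": (4, 958.309)}
best_df, best_dev = models["m = 2"]

p_values = {}
for name in ("linear", "m = 1"):
    df, dev = models[name]
    p_values[name] = chi2_sf(dev - best_dev, best_df - df)
    print(f"{name} vs m = 2: dev. dif. = {dev - best_dev:.3f}, p = {p_values[name]:.3f}")
```

The linear model is rejected against the m = 2 model (p about 0.030), while the m = 1 model with power -1 is not significantly worse than m = 2 (p about 0.903), which is worth weighing when choosing between the two FP models.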
Age is transformed into two terms, age_1 and age_2; the details of the transformation can be seen from
. fracpoly logit SA age
........
-> gen double Iage__1 = X^-.5-.4491638012 if e(sample)
-> gen double Iage__2 = X^3-121.7787491 if e(sample)
(where: X = age/10)
That is, age is first scaled to X = age/10, and then
age_1 = X^(-0.5) - 0.4491638012
age_2 = X^3 - 121.7787491
where the subtracted constants centre each term.
• The powers suggested by FP can be used to generate new fp variables
fp gen age1 = age^(-.5 3)
logit SA age1_1 age1_2
• Or fit the model directly with the command fp <age>, fp(-.5 3): logit SA <age>
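The transformation printed by `fracpoly` can be written out explicitly. The sketch below is a hypothetical Python helper (`fp_terms` is not a Stata or library function) that applies the scaling X = age/10, the powers -0.5 and 3, and the centring constants exactly as shown in the output.

```python
def fp_terms(age):
    """Return the two centred fractional polynomial terms for a given age in years."""
    x = age / 10                          # scaling reported by fracpoly: X = age/10
    age_1 = x ** -0.5 - 0.4491638012      # power -0.5, centred
    age_2 = x ** 3 - 121.7787491          # power 3, centred
    return age_1, age_2

# Age at which the first term is exactly zero, implied by its centring constant.
centre_age = 10 * 0.4491638012 ** -2

for age in (30, 50, 70):
    t1, t2 = fp_terms(age)
    print(f"age {age}: age_1 = {t1:+.4f}, age_2 = {t2:+.2f}")
print(f"both terms are close to zero at age = {centre_age:.2f}")
```

Both constants correspond to the same age of roughly 49.6 years, which suggests (as an inference, not something stated in the output) that the terms are centred at the sample mean age.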
TABLE 1. Patients' characteristics between SA and non-SA groups
Factors | Group | SA, n (%) | Non-SA, n (%) | P value
Age | < 30; 30 - 44; 45 - 59; > 60
Gender | Male; Female
BMI | < 25; 25 - 29.9; 30 - 39.9; > 40
Snoring | Yes; No
Stopping breathing | Yes; No
Choking | Yes; No
Waking up refreshed | Yes; Sometimes; No
Leg kicking | Yes; Sometimes; No
Accident due to sleepiness | Yes; No
ESS score, median (range) |
Smoking | Yes; Ex-smoker; No
Alcohol consumption | Yes; No
Hypertension | Yes; No
Diabetes mellitus | Yes; No
Allergy | Yes; No
TABLE 2. Factors associated with SA (AHI > 5): multiple logistic regression analysis
Factors | Coefficient | SE | P value | OR (95% CI)
TABLE 3. Scoring scheme: steps used to calculate prediction scores
Factors | Scoring | Score for individual
…………………….. | …………………….. | ……………………..
Total score | | ……………………..
TABLE 4. Percentage of sleep apnoea according to prediction score category in derivation and validation phases
Score | Probability of SA | Derivation: SA, Non-SA, LR+ (95% CI), PPV | Validation: SA, Non-SA, LR+ (95% CI), PPV
Low |
Medium |
High |
Model selection
• II) Multivariate analysis: enter variables with p < 0.10 into the model simultaneously
– Stepwise backward/forward selection using the LR test
• III) AIC/BIC
– Leaps-and-bounds selection
– gvselect (SJ15-4)
• Akaike information criterion (AIC)
• AIC = -2(LL) + 2k, which balances fit (-2LL) against complexity (2k)
• Bayesian information criterion (BIC)
• BIC = -2(LL) + ln(N)k
– N = number of observations
– k = number of parameters estimated
• Given two models fit on the same data, the model with the smaller value of the information criterion is considered to be better
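Both criteria are simple functions of the log likelihood. As an illustration, the Python sketch below recomputes AIC and BIC for the final model reported by `estat ic` in these slides (ll(model) = -391.8127, df = 10, N = 837), assuming the standard definition AIC = -2(LL) + 2k; the function names are illustrative.

```python
import math

def aic(ll, k):
    """Akaike information criterion: -2 LL + 2k."""
    return -2 * ll + 2 * k

def bic(ll, k, n):
    """Bayesian information criterion: -2 LL + ln(N) k."""
    return -2 * ll + math.log(n) * k

ll_model, k, n = -391.8127, 10, 837   # values from the estat ic output

model_aic = aic(ll_model, k)
model_bic = bic(ll_model, k, n)
print(f"AIC = {model_aic:.4f}, BIC = {model_bic:.4f}")
```

Because ln(837) is about 6.73, each extra parameter costs more under BIC than under AIC, so BIC tends to favour smaller models.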
Leaps-and-bounds selection
xi: gvselect <term> (i.age_gr) i.sex (i.BMI_gr) i.snore i.stop(i.choking) i.awake i.accident i.ht : logit SA <term>
Optimal models:
# Preds LL AIC BIC
1 -479.1029 962.2059 971.6655
2 -452.2264 910.4527 924.6422
3 -438.4437 884.8875 903.8067
4 -425.4238 860.8477 884.4968
5 -415.7195 843.4391 871.818
6 -407.2472 828.4944 861.6032
7 -401.5824 819.1647 857.0033
8 -396.221 810.4419 853.0104
9 -391.8127 803.6253 850.9236
10 -390.6937 803.3873 855.4154
predictors for each model:
1 : _Istop_bre_2
2 : _Isex_2 _Isnore_1
3 : _Iage_gr_4 _Isex_2 _Isnore_1
4 : _Iage_gr_4 _Isex_2 _Istop_bre_2 _Isnore_1
5 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Isnore_1
6 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1
7 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _Iage_gr_2
8 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _IBMI_gr_39
_Iage_gr_2
9 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _IBMI_gr_39
_Iage_gr_2 _IBMI_gr_29
10 : _Iage_gr_4 _Isex_2 _IBMI_gr_40 _Istop_bre_2 _Iage_gr_3 _Isnore_1 _IBMI_gr_39
_Iage_gr_2 _IBMI_gr_29 _Iawake_re_2
logit SA stop_bre1 agegr2 agegr3 agegr4 sex2 BMI_gr2 BMI_gr3 BMI_gr4 snore2
estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 837 -524.8103 -391.8127 10 803.6253 850.9236
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note.
Performance of the model
• Calibration
– How similar are the predicted and observed outcomes?
– Testing: goodness of fit
All possible covariate patterns = 2x4x4x2x2 = 128
Hosmer-Lemeshow GOF
estat gof, table gr(10)
Logistic model for SA, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
+--------------------------------------------------------+
| Group | Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
|-------+--------+-------+-------+-------+-------+-------|
| 1 | 0.2359 | 8 | 10.3 | 79 | 76.7 | 87 |
| 2 | 0.4791 | 37 | 33.8 | 52 | 55.2 | 89 |
| 3 | 0.6222 | 56 | 54.0 | 38 | 40.0 | 94 |
| 4 | 0.6885 | 49 | 44.7 | 19 | 23.3 | 68 |
| 5 | 0.7651 | 57 | 59.1 | 24 | 21.9 | 81 |
|-------+--------+-------+-------+-------+-------+-------|
| 6 | 0.8307 | 86 | 89.9 | 25 | 21.1 | 111 |
| 7 | 0.8681 | 56 | 57.9 | 12 | 10.1 | 68 |
| 8 | 0.8926 | 90 | 87.2 | 8 | 10.8 | 98 |
| 9 | 0.9417 | 103 | 102.4 | 7 | 7.6 | 110 |
| 10 | 0.9750 | 27 | 29.7 | 4 | 1.3 | 31 |
+--------------------------------------------------------+
number of observations = 837
number of groups = 10
Hosmer-Lemeshow chi2(8) = 10.65
Prob > chi2 = 0.2224
.
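#delimit ;
disp ((8-10.3)^2/(10.3*(1-10.3/87))+
(37-33.8)^2/(33.8*(1-33.8/89))+
(56-54.0)^2/(54.0*(1-54.0/94))+
(49-44.7)^2/(44.7*(1-44.7/68))+
(57-59.1)^2/(59.1*(1-59.1/81))+
(86-89.9)^2/(89.9*(1-89.9/111))+
(56-57.9)^2/(57.9*(1-57.9/68))+
(90-87.2)^2/(87.2*(1-87.2/98))+
(103-102.4)^2/(102.4*(1-102.4/110))+
(27-29.7)^2/(29.7*(1-29.7/31))) ;
HL chi2 = sum[(oj-ej)^2/(ej(1-ej/nj))], j = 1, ..., 10
With the rounded values printed in the table this sum is about 10.76, close to the 10.65 that Stata computes from the unrounded expected counts.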
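The same Hosmer-Lemeshow sum, together with its p-value on 10 - 2 = 8 degrees of freedom, can be sketched in Python. The observed and expected counts are those printed for all ten groups of the table (so the expected counts are rounded), and `chi2_sf_even` is a hypothetical helper that is valid only for even degrees of freedom.

```python
import math

# (observed events, expected events, group size) for the ten groups in the table.
groups = [
    (8, 10.3, 87), (37, 33.8, 89), (56, 54.0, 94), (49, 44.7, 68), (57, 59.1, 81),
    (86, 89.9, 111), (56, 57.9, 68), (90, 87.2, 98), (103, 102.4, 110), (27, 29.7, 31),
]

# HL chi2 = sum over groups of (o - e)^2 / (e * (1 - e/n))
hl = sum((o - e) ** 2 / (e * (1 - e / n)) for o, e, n in groups)

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability, closed form for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

df = len(groups) - 2
p_from_table = chi2_sf_even(hl, df)      # from rounded table values
p_from_stata = chi2_sf_even(10.65, df)   # from the statistic Stata reports

print(f"HL chi2({df}) = {hl:.2f}, p = {p_from_table:.4f}; Stata: 10.65, p = {p_from_stata:.4f}")
```

Either way the p-value is well above 0.05, so the test gives no evidence of poor calibration.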
O/E
#delimit ;
disp ((8/10.3)+(79/76.7)+(37/33.8)+(52/55.2)+
(56/54.0)+(38/40)+(49/44.7)+(19/23.3)+
(57/59.1)+(24/21.9)+(86/89.9)+(25/21.1)+
(56/57.9)+(12/10.1)+(90/87.2)+(8/10.8)+
(103/102.4)+(7/7.6)+(27/29.7)+(4/1.3))/20 ;
Notes: mean of oj/ej over the 20 cells (10 groups x 2 outcomes), j = 1, ..., 20; with the table values this is about 1.09.
Model performance
• Discrimination – Assign the cut-off/threshold
– Construct 2x2 table
– Estimate diagnostic performance
• Sensitivity
• Specificity
• PPV, NPV
• Accuracy
• Area under the ROC curve or concordance (C) statistic
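As an illustration of these measures, the snoring table from earlier in the deck can be read as a one-cut-off "test" for SA (snoring = test positive). The Python sketch below is an assumption-laden example for teaching purposes, not the prediction model itself; the names are illustrative.

```python
# Snoring as a single-threshold classifier for SA, using the 2x2 table above.
tp, fn = 488, 81    # SA cases: snoring, not snoring
fp, tn = 149, 119   # non-SA:   snoring, not snoring

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)

# With one cut-off the ROC curve has a single interior point, so the area
# under it reduces to the average of sensitivity and specificity.
auc_single_cutoff = (sensitivity + specificity) / 2

print(f"Sen = {sensitivity:.3f}, Spec = {specificity:.3f}, PPV = {ppv:.3f}, "
      f"NPV = {npv:.3f}, Acc = {accuracy:.3f}, AUC = {auc_single_cutoff:.3f}")
```

Snoring alone is sensitive but not specific. PPV and NPV also depend on the proportion of SA in the sample, which is high in this sleep-lab population.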
Model discrimination
• Area under the ROC curve
– Also known as the C statistic
– A summary statistic that tells us whether the logit model can discriminate diseased from non-diseased subjects.
– The ROC curve plots sensitivity versus 1-specificity (the false positive rate) over the whole range of estimated probabilities
Interpretation of ROC
Area under ROC Interpretation
0.5 ≤ ROC < 0.6 Fail
0.6 ≤ ROC < 0.7 Poor
0.7 ≤ ROC < 0.8 Fair
0.8 ≤ ROC < 0.9 Good
ROC ≥ 0.9 Excellent
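A rough area under the ROC curve can be obtained from the Hosmer-Lemeshow table shown earlier, by treating each of the ten risk groups as one tied score. The Python sketch below is only a grouped approximation under that assumption (the exact C statistic needs the individual predicted probabilities), and both function names are hypothetical.

```python
# Observed SA (Obs_1) and non-SA (Obs_0) counts per risk group, lowest risk first.
obs_1 = [8, 37, 56, 49, 57, 86, 56, 90, 103, 27]
obs_0 = [79, 52, 38, 19, 24, 25, 12, 8, 7, 4]

def grouped_auc(events, non_events):
    """Concordance from grouped data: a case/non-case pair is concordant when the
    case sits in a higher risk group; pairs within a group count as one half."""
    concordant = 0.0
    lower_non_events = 0
    for e, ne in zip(events, non_events):
        concordant += e * (lower_non_events + 0.5 * ne)
        lower_non_events += ne
    return concordant / (sum(events) * sum(non_events))

def interpret(auc):
    """Map an AUC to the labels of the interpretation table above."""
    for cut, label in [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Fair"),
                       (0.6, "Poor"), (0.5, "Fail")]:
        if auc >= cut:
            return label
    return "Below 0.5 (not covered by the table)"

auc = grouped_auc(obs_1, obs_0)
print(f"approximate AUC = {auc:.3f} ({interpret(auc)})")
```

Grouping discards the ordering of subjects within each group, so this value should be read as an approximation to the C statistic of the model.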
Diagnostic measures
• Residuals
– Pearson's chi-square residual
– Deviance residual
Outliers
• Leverage (hjj) values
• Reflect the distance of Xj from the centre (mean) of the covariates
• The higher the hjj, the farther the pattern is from the centre
Influence of outliers
• Influence on the predicted value of Y
• Including or excluding outlying covariate patterns would change the predicted Y values
• Pearson residual change
• Deviance residual change
Influence on coefficient estimation