feng guo ph d feng guo ph.d. department of statistics ...feng guo ph d feng guo ph.d. department of...
TRANSCRIPT
-
Feng Guo Ph D Feng Guo Ph.D. Department of Statistics
VTTI, CASRVi i i T hVirginia Tech
First Human Factors Symposium: Naturalistic Driving Methods & Analyses
August 26th 2008
-
EventExposure
Crash/near crash
Cell Phone use
Weather condition crashWeather condition
drowsiness
….
Hypothetical examples: Example 1I t li ti t d it i f d th t i 95 t f 100 In a naturalistic study, it is found that in 95 out of 100
crashes observed, the driver was listening to music. Can we conclude that listening to music contributes to crashes?crashes?
-
Example 2Example 2If it is found in 10 crashes, the driver fallen in sleep for more than 6 seconds. Can we conclude that drowsiness/fatigue contributes conclude that drowsiness/fatigue contributes to crashes?
Have to compare with “Normal” (Baseline) conditions!Have to compare with Normal (Baseline) conditions!
• 95% of the times people are listening to music when driving : listening to music is unlikely a risky behavior.
•Essentially nobody sleep when driving: Sleeping during driving is dangerous.
-
• Cohort• Case-control• Case-cohort• Case-crossover
• Major issue: how to reduce bias. • Analysis/modeling is directly related to study design!
-
Exposure Outcome Perspective:Study begins
YES
Case
No No Case
Case
NO
Case
No CaseCase
Forward direction: from exposure to case
-
Exposure Outcome
Retrospective: Study begins
YES
Case
No No Case
Case
NO
Case
No CaseCase
Forward direction: from exposure to case
-
Pros:L t t bi• Least prone to bias– Relative to other observational study designs
• Can address several diseases in same study• Retrospective can be relatively low cost and quick
– Frequently used in occupational studiesCons:Cons:•Loss to follow-up is potential source of bias•Prospective cohort study
–Quite costly and time consumingQ y g–May not find enough cases if disease is rare
-
Exposure Outcome Study begins
YES
Case
NO
Case
YES
NO
Control
Backward direction: from outcome to exposureBackward timing: study begins after outcome
NO
Backward timing: study begins after outcome
-
Pros:Pros:• Less expensive and time-consuming• Optimal for rare diseasesp
– Subjects selected based on disease status• Allows several exposures to be evaluated
– Multiple etiologic factors for a single disease
-
Cons:• More susceptible to selection bias (than cohort studies)
– Presence or absence of exposure may influence selection of disease and non-disease groups
Mo e s s eptible to info mation bias• More susceptible to information bias– Observer bias – Recall bias
• Does not allow direct estimation of risk– Not possible to calculate rate of development of disease given exposure
status • Does not allow several diseases to be evaluated• Generally not feasible for rare exposures
-
•Mixture of cohort case-control •Mixture of cohort, case-control, crossover, and cross-sectional design
•Case-cohort •Case-crossover
-
Exposure Outcome Study begins
YES
Case
NO
Case
YES
NO
Control
Backward direction: from outcome to exposureForward timing: study begins BEFORE outcome
NO
Forward timing: study begins BEFORE outcome
-
Exposure Outcome Study begins
YES
Case
NO
Case
YES
NO
Control
Backward direction: from outcome to exposureForward timing: study begins BEFORE outcome
NO
Forward timing: study begins BEFORE outcome
-
• Several diseases can be studied– In contrast to case-control study
• Less costly and more efficient than cohort study– Smaller number of non-cases
• More prone to measurement error than cohort study– Exposure status determined after cases and control
Unless exposure status at initial cohort enrollment– Unless exposure status at initial cohort enrollment
• Can be more expensive and time-consuming than case-control studycontrol study– Requires identifying original cohort
-
• Case-control Studies: exposure odds ratiop
• Cohort studies: risk odds ratio (ROR)
-
• Cohort StudyE E
• Case-Control StudyE E t t l
FIXED
E+ E-D+ A CD- B D
E+ E- totalCase a c M1Control b d M0D B D
total N+ N-Control b d M0
P(D+|E+)= (A/N+) P(D+|E-)= (C/N-)
P(E+|Case)=a/M1P(E+|Control)=b/M0
aRisk Ratio (RR)= P(D+|E+)/ P(D+|E-)
OR = (a/c) / (b/d)= (ad/bc)
ca
Mc
Ma
CaseEPCaseEPCaseEOdds ==+−
+=+
1
1)|(1
)|()|(
( ) ( ) ( )
Although conceptually very different, the formulas for Risk OR and Exposure OR are the same: AD/BC
-
In case control studies the exposure odds ratio (EOR) In case-control studies, the exposure odds ratio (EOR) approximates the risk ratio when the following 3 conditions are satisfied:
• 1. The rare disease assumption holds
• 2. The choice of controls in the case-control study must be representative of the source population from which the case developed.
• 3. The cases must be incident cases
-
# of Event under drowsinessRate1:Miles (time) traveled under drowsiness
# of Event under NO drowsinessRate2 :Rate2 :Miles (time) traveled under NO drowsiness
If R t 1 i i ifi tl t th R t 2
Drowsy driving length Non-Drowsy driving lengthCases
P bl H k il /ti
If Rate1 is significantly greater than Rate2, we considered drowsiness is a risk factor for safety.
Problem: How can we know miles/time traveled under drowsiness?
-
Odds Ratio Approximation to Rate Ratio
• Cohort StudyE+ E-
• Case-Control StudyE+ E- totalE+ E
Dis+ A C total PT+ PT-
E+ E- totalCase a c M1Control b d M0
• IDR = (A/PT+) / (C/PT-)= (A/C) / (PT+/PT-) • OR = (a/c) / (b/d)
≈ (a/c) / (PT+ / PT-)= IDR
A tiAssumptions:1. M0 subjects are randomly selected via source population2. Their exposure odds (b/d) similar to that in source
population (PT+/PT )population (PT+/PT-).3. Steady state
-
• Modeling 100 car (STSCE): ode g 00 ca (S SC )– Random sampling case-cohort design: non-
matched design– Confounding/interaction factors controlled through – Confounding/interaction factors controlled through
modeling– Incorporate driver specific correlation through
modelsmodels
• Case-crossover design (NHTSA)– Case-crossover sampling: matched design– Part of confounding/interaction factors controlled
through baseline sampling
-
Principle: ideal control group is representative of the source population from which the cases are derivedsource population from which the cases are derived
1. Time variant exposures: risk rate2 Sampling should reflect the odds ratio to risk rate 2. Sampling should reflect the odds ratio to risk rate
principles3. Random sampling stratified by vehicle was adopted
Drowsy driving length Non-Drowsy driving length
Cases
-
• Control for confounding and interaction Control for confounding and interaction factors.
• Multiple events for same participant: u t pe e e ts o sa e pa t cpa tdriver specific correlations!
-
• Stratified analysisStratified analysis– Categorize control variables and form combinations of
categories or strataDrawback of running out of numbers when the number – Drawback of running out of numbers when the number of strata is large
M h i l d li• Mathematical modeling– Use a mathematical expression for predicting the outcome from the
exposure and the control variables– Considerations on choice of model and variables to include in initial
and final model
-
• Generalized linear model (GLM) frameworkBaseline Multinomial model • Baseline Multinomial model – Contrast crash, near-crash, and critical incident with base-line
separately in a same model Th dd ti i dj t d ith t t th i bl i th – The odds ratio is adjusted with respect to other variables in the model
~ (1, )iy Multinomial py is a categorical variable corresponding to the events and baseline
0
)log( r rpp
= X β
Where pr is the probability of in rth eventrp0 is the probability of baselineX is the covariates matrixβr is the vector of parameters for rth event,
it has a direct relationship with odds ratio it has a direct relationship with odds ratio.
-
Independent assumption for the basic modelOne driver have multiple event (baseline)
• Random effect model
p ( )They should be correlated: good driver, bad driver.
Random effect model– Extension of the basic model
)log( ijr r ij iijp
= +ZX β α
– is the driver specific random effect
• Generalized Estimation Equation (GEE) model
0r ij i
ijijp
iα• Generalized Estimation Equation (GEE) model
– Commonly used in longitudinal data analysis – Quasi-likelihood based method
-
ect
ed
S l
on
co
lle Sample
exposure immediate before
orm
ati
o before crashes
Sample
ure
in
fo
Sample exposure for time interval some period b f h
Exp
osu before crash
Control exposure Case Exposure Crash
-
Pros: 1. Less prone to biased2. More efficient in evaluating the effects of
transient exposure factorsCons: 1 Cannot be used to evaluate time invariant effect 1. Cannot be used to evaluate time-invariant effect
such as age and gender.2. Bring another level of correlation into the modelg
-
•Matched set correlation
•Driver specific correlation
Control exposure Case Exposure Crash p p
-
• Nested random effects model C diti l l i ti i d l• Conditional logistic regression model
• Bayesian hierarchical model Fit th t t t ll – Fit the context naturally
– Easy to expend to accommodate more levels (multicenter study)(multicenter study)
… ……
M t h d
Individual
Site
… …Matched Set
-
Model setup
( )
logit( )ijk ijk
ijk ijk ijk
Y Bernoulli p
p = +X β Z α
∼ Site i, individual j, event klogit( )ijk ijk ijkp +X β Z α
Prior:
( )Nβ Σ~ ( )
~ ( )
N
Nβ
α
β μ,Σ
α 0,ΣVague: fixed large varianceInformative: prior elicitation
•from previous study•From expert opinion•From expert opinion
-
• Appropriate baseline sampling scheme is Appropriate baseline sampling scheme is critical part of analyses.
• Analysis models should reflect the ays s odes s oud e ect t ecorresponding sampling scheme.
• Considering analysis at the beginning of g y g gthe study!
-
• Questions?Questions?• …• Thanks!• Thanks!