feng guo ph d feng guo ph.d. department of statistics ...feng guo ph d feng guo ph.d. department of...

Feng Guo Ph D Feng Guo Ph.D. Department of Statistics

VTTI, CASRVi i i T hVirginia Tech

First Human Factors Symposium: Naturalistic Driving Methods & Analyses

August 26th 2008

EventExposure

Crash/near crash

Cell Phone use

Weather condition crashWeather condition

drowsiness

….

Hypothetical examples: Example 1I t li ti t d it i f d th t i 95 t f 100 In a naturalistic study, it is found that in 95 out of 100

crashes observed, the driver was listening to music. Can we conclude that listening to music contributes to crashes?crashes?

Example 2Example 2If it is found in 10 crashes, the driver fallen in sleep for more than 6 seconds. Can we conclude that drowsiness/fatigue contributes conclude that drowsiness/fatigue contributes to crashes?

Have to compare with “Normal” (Baseline) conditions!Have to compare with Normal (Baseline) conditions!

• 95% of the times people are listening to music when driving : listening to music is unlikely a risky behavior.

•Essentially nobody sleep when driving: Sleeping during driving is dangerous.

• Cohort• Case-control• Case-cohort• Case-crossover

• Major issue: how to reduce bias. • Analysis/modeling is directly related to study design!

Exposure Outcome Perspective:Study begins

YES

Case

No No Case

Case

NO

Case

No CaseCase

Forward direction: from exposure to case

Exposure Outcome

Retrospective: Study begins

YES

Case

No No Case

Case

NO

Case

No CaseCase

Forward direction: from exposure to case

Pros:L t t bi• Least prone to bias– Relative to other observational study designs

• Can address several diseases in same study• Retrospective can be relatively low cost and quick

– Frequently used in occupational studiesCons:Cons:•Loss to follow-up is potential source of bias•Prospective cohort study

–Quite costly and time consumingQ y g–May not find enough cases if disease is rare

Exposure Outcome Study begins

YES

Case

NO

Case

YES

NO

Control

Backward direction: from outcome to exposureBackward timing: study begins after outcome

NO

Backward timing: study begins after outcome

Pros:Pros:• Less expensive and time-consuming• Optimal for rare diseasesp

– Subjects selected based on disease status• Allows several exposures to be evaluated

– Multiple etiologic factors for a single disease

Cons:• More susceptible to selection bias (than cohort studies)

– Presence or absence of exposure may influence selection of disease and non-disease groups

Mo e s s eptible to info mation bias• More susceptible to information bias– Observer bias – Recall bias

• Does not allow direct estimation of risk– Not possible to calculate rate of development of disease given exposure

status • Does not allow several diseases to be evaluated• Generally not feasible for rare exposures

•Mixture of cohort case-control •Mixture of cohort, case-control, crossover, and cross-sectional design

•Case-cohort •Case-crossover

Exposure Outcome Study begins

YES

Case

NO

Case

YES

NO

Control

Backward direction: from outcome to exposureForward timing: study begins BEFORE outcome

NO

Forward timing: study begins BEFORE outcome

• Several diseases can be studied– In contrast to case-control study

• Less costly and more efficient than cohort study– Smaller number of non-cases

• More prone to measurement error than cohort study– Exposure status determined after cases and control

Unless exposure status at initial cohort enrollment– Unless exposure status at initial cohort enrollment

• Can be more expensive and time-consuming than case-control studycontrol study– Requires identifying original cohort

• Case-control Studies: exposure odds ratiop

• Cohort studies: risk odds ratio (ROR)

• Cohort StudyE E

• Case-Control StudyE E t t l

FIXED

E+ E-D+ A CD- B D

E+ E- totalCase a c M1Control b d M0D B D

total N+ N-Control b d M0

P(D+|E+)= (A/N+) P(D+|E-)= (C/N-)

P(E+|Case)=a/M1P(E+|Control)=b/M0

aRisk Ratio (RR)= P(D+|E+)/ P(D+|E-)

OR = (a/c) / (b/d)= (ad/bc)

ca

Mc

Ma

CaseEPCaseEPCaseEOdds ==+−

+=+

1

1)|(1

)|()|(

( ) ( ) ( )

Although conceptually very different, the formulas for Risk OR and Exposure OR are the same: AD/BC

In case control studies the exposure odds ratio (EOR) In case-control studies, the exposure odds ratio (EOR) approximates the risk ratio when the following 3 conditions are satisfied:

• 1. The rare disease assumption holds

• 2. The choice of controls in the case-control study must be representative of the source population from which the case developed.

• 3. The cases must be incident cases

# of Event under drowsinessRate1:Miles (time) traveled under drowsiness

# of Event under NO drowsinessRate2 :Rate2 :Miles (time) traveled under NO drowsiness

If R t 1 i i ifi tl t th R t 2

Drowsy driving length Non-Drowsy driving lengthCases

P bl H k il /ti

If Rate1 is significantly greater than Rate2, we considered drowsiness is a risk factor for safety.

Problem: How can we know miles/time traveled under drowsiness?

Odds Ratio Approximation to Rate Ratio

• Cohort StudyE+ E-

• Case-Control StudyE+ E- totalE+ E

Dis+ A C total PT+ PT-

E+ E- totalCase a c M1Control b d M0

• IDR = (A/PT+) / (C/PT-)= (A/C) / (PT+/PT-) • OR = (a/c) / (b/d)

≈ (a/c) / (PT+ / PT-)= IDR

A tiAssumptions:1. M0 subjects are randomly selected via source population2. Their exposure odds (b/d) similar to that in source

population (PT+/PT )population (PT+/PT-).3. Steady state

• Modeling 100 car (STSCE): ode g 00 ca (S SC )– Random sampling case-cohort design: non-

matched design– Confounding/interaction factors controlled through – Confounding/interaction factors controlled through

modeling– Incorporate driver specific correlation through

modelsmodels

• Case-crossover design (NHTSA)– Case-crossover sampling: matched design– Part of confounding/interaction factors controlled

through baseline sampling

Principle: ideal control group is representative of the source population from which the cases are derivedsource population from which the cases are derived

1. Time variant exposures: risk rate2 Sampling should reflect the odds ratio to risk rate 2. Sampling should reflect the odds ratio to risk rate

principles3. Random sampling stratified by vehicle was adopted

Drowsy driving length Non-Drowsy driving length

Cases

• Control for confounding and interaction Control for confounding and interaction factors.

• Multiple events for same participant: u t pe e e ts o sa e pa t cpa tdriver specific correlations!

• Stratified analysisStratified analysis– Categorize control variables and form combinations of

categories or strataDrawback of running out of numbers when the number – Drawback of running out of numbers when the number of strata is large

M h i l d li• Mathematical modeling– Use a mathematical expression for predicting the outcome from the

exposure and the control variables– Considerations on choice of model and variables to include in initial

and final model

• Generalized linear model (GLM) frameworkBaseline Multinomial model • Baseline Multinomial model – Contrast crash, near-crash, and critical incident with base-line

separately in a same model Th dd ti i dj t d ith t t th i bl i th – The odds ratio is adjusted with respect to other variables in the model

~ (1, )iy Multinomial py is a categorical variable corresponding to the events and baseline

0

)log( r rpp

= X β

Where pr is the probability of in rth eventrp0 is the probability of baselineX is the covariates matrixβr is the vector of parameters for rth event,

it has a direct relationship with odds ratio it has a direct relationship with odds ratio.

Independent assumption for the basic modelOne driver have multiple event (baseline)

• Random effect model

p ( )They should be correlated: good driver, bad driver.

Random effect model– Extension of the basic model

)log( ijr r ij iijp

= +ZX β α

– is the driver specific random effect

• Generalized Estimation Equation (GEE) model

0r ij i

ijijp

iα• Generalized Estimation Equation (GEE) model

– Commonly used in longitudinal data analysis – Quasi-likelihood based method

ect

ed

S l

on

co

lle Sample

exposure immediate before

orm

ati

o before crashes

Sample

ure

in

fo

Sample exposure for time interval some period b f h

Exp

osu before crash

Control exposure Case Exposure Crash

Pros: 1. Less prone to biased2. More efficient in evaluating the effects of

transient exposure factorsCons: 1 Cannot be used to evaluate time invariant effect 1. Cannot be used to evaluate time-invariant effect

such as age and gender.2. Bring another level of correlation into the modelg

•Matched set correlation

•Driver specific correlation

Control exposure Case Exposure Crash p p

• Nested random effects model C diti l l i ti i d l• Conditional logistic regression model

• Bayesian hierarchical model Fit th t t t ll – Fit the context naturally

– Easy to expend to accommodate more levels (multicenter study)(multicenter study)

… ……

M t h d

Individual

Site

… …Matched Set

Model setup

( )

logit( )ijk ijk

ijk ijk ijk

Y Bernoulli p

p = +X β Z α

∼ Site i, individual j, event klogit( )ijk ijk ijkp +X β Z α

Prior:

( )Nβ Σ~ ( )

~ ( )

N

Nβ

α

β μ,Σ

α 0,ΣVague: fixed large varianceInformative: prior elicitation

•from previous study•From expert opinion•From expert opinion

• Appropriate baseline sampling scheme is Appropriate baseline sampling scheme is critical part of analyses.

• Analysis models should reflect the ays s odes s oud e ect t ecorresponding sampling scheme.

• Considering analysis at the beginning of g y g gthe study!

• Questions?Questions?• …• Thanks!• Thanks!

feng guo ph d feng guo ph.d. department of statistics ...feng guo ph d feng guo ph.d. department of...

Documents