biostatistical considerations

Biostatistical Considerations Nae-Yuh Wang, PhD ICTR Clinical Registries Workshop November 3, 2010


Page 1: Biostatistical Considerations

Biostatistical Considerations

Nae-Yuh Wang, PhD

ICTR Clinical Registries Workshop

November 3, 2010

Page 2: Biostatistical Considerations

OVERVIEW

Descriptive vs Analytic Goals

Selection of Controls

Confounding

Measurement errors

Missing data

Page 3: Biostatistical Considerations

Purposes of Patient Registry

Document natural history of disease

Evaluate effectiveness of treatment

Monitoring safety

Measuring quality

Frequently multipurpose, addressing scientific, clinical, and policy questions

Page 4: Biostatistical Considerations

Natural History of Disease

Document characteristics, management, and outcomes

May be variable across subgroups

May be variable over time

Change after new guidelines or treatments

Page 5: Biostatistical Considerations

Smoking Prevalence (%) Among U.S. Male Adults* and 1,213 White Male Physicians: The Precursors Study

[Figure: prevalence (%, axis from 0% to 70%) by year (1955, 1965, 1970, 1975, 1980, 1985, 1990, 1994). Series: US male population; male physicians, The Precursors Study.]

*CDC, National Health Interview Surveys, 18 years and older, 1965-1994

Page 6: Biostatistical Considerations
Page 7: Biostatistical Considerations

Effectiveness of Treatment

RCTs usually have well defined populations

RCTs usually are short term

Clinical effectiveness, cost effectiveness

Comparative effectiveness --- indirect comparisons on differences between treatments

Page 8: Biostatistical Considerations

Safety Monitoring

Adverse event (AE) reporting relies on the clinician recognizing the AE and making the effort to report it, so it is frequently nonsystematic

Serves as active surveillance

Provides denominator to estimate incidence

Enables comparison to a reference rate
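The last two points can be sketched numerically. The following is a minimal illustration, not part of the original slides: the event count, person-time, and reference rate are hypothetical, and the interval is an approximate log-scale Wald interval under an assumed Poisson model.

```python
import math

def incidence_rate(events, person_years, z=1.96):
    """Crude incidence rate with an approximate (log-scale Wald) 95% CI."""
    if events <= 0 or person_years <= 0:
        raise ValueError("need at least one event and positive person-time")
    rate = events / person_years
    half_width = z / math.sqrt(events)  # SE of log(rate) under a Poisson model
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)

def standardized_incidence_ratio(events, person_years, reference_rate):
    """Observed events divided by the number expected at the reference rate."""
    expected = reference_rate * person_years
    return events / expected

# Hypothetical registry: 12 adverse events over 4,000 person-years,
# compared with an assumed reference rate of 2 per 1,000 person-years.
rate, lower, upper = incidence_rate(12, 4000)
sir = standardized_incidence_ratio(12, 4000, 0.002)
print(f"rate per 1,000 PY: {1000 * rate:.2f} ({1000 * lower:.2f}, {1000 * upper:.2f})")
print(f"ratio to reference rate: {sir:.2f}")
```

The registry supplies the person-time denominator; without it only the numerator (reported events) would be known.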

Page 9: Biostatistical Considerations

Health Care Quality

Compare performance measures (treatments provided or outcomes achieved) against evidence based guidelines or benchmarks (adjusted survival, infection rates) between provider or patient subgroups

Identify disparity in access to care

Demonstrate opportunities for improvement

Establish payment differentials

Page 10: Biostatistical Considerations

Types of Registry

Product registries (drug, device)

Health services registries (procedure)

Disease or condition registries

Patients defined by exposure to a product, procedure, or disease/condition

Frequently combination of types

Page 11: Biostatistical Considerations

Design of Registry

Research questions, stakeholders, and practical factors (regulatory, political, funding) define purpose and type of registry, and other design considerations such as sampling plan, data collection, validity, sample size, and analytic approaches

Types define the patient population

Purposes define the outcomes

Outcomes define the duration

Page 12: Biostatistical Considerations

Design of Clinical Research

Research questions (descriptive vs. hypothesis based --- analytic)

Population; outcome and exposure

Sampling (recruitment); measurements, duration and frequency of data collection

Internal / external validity (bias / generalizability)

Sample size (precision of estimates/degree of association, feasibility, resources)

Analytic plan

Page 13: Biostatistical Considerations

Sampling Design

External validity (generalizability): all patients, or patients from a tertiary medical center only? Single center or multiple centers? Which is more representative of the target population under study?

Do I need controls? (CI in children, language outcomes vs. meningitis)

Selection of controls

Match or not to match?

Page 14: Biostatistical Considerations

Cohort Design

Sampling based on predictor (exposure variables) of interest (collect as many exposure variables as possible). Good for rare exposure.

Follow up patients for outcomes, could study multiple outcomes (long time for outcomes to develop?)

Census (not feasible when population and per capita cost are large)

Simple random sampling (SRS), stratified random sampling (oversampling a subgroup), cluster random sampling (cluster characteristics as the aim), multistage random sampling

Nonrandom: case series / consecutive sampling
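As a rough sketch of stratified random sampling with oversampling of a small subgroup: the sampling frame, stratum labels, and sample sizes below are hypothetical, and the helper `stratified_sample` is an illustrative name rather than anything from the slides.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample within each stratum.

    Returns (unit, weight) pairs, where weight = stratum size / sample size,
    so that weighted totals estimate totals for the whole sampling frame.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for label, members in strata.items():
        n = min(n_per_stratum[label], len(members))
        weight = len(members) / n
        sample.extend((unit, weight) for unit in rng.sample(members, n))
    return sample

# Hypothetical frame: 900 patients in subgroup "A" and 100 in subgroup "B".
frame = [("A", i) for i in range(900)] + [("B", i) for i in range(100)]
# Oversample the small subgroup: 50 patients from each stratum.
sample = stratified_sample(frame, lambda unit: unit[0], {"A": 50, "B": 50})
print(len(sample), sum(weight for _, weight in sample))
```

Each sampled patient in subgroup A stands for 18 patients and each in subgroup B for 2, so analyses of the combined sample need the weights to represent the frame.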

Page 15: Biostatistical Considerations

Case-Control Design

Stratified RS based on case status

Oversampling cases, good for rare diseases

No long follow up for disease development

Study multiple exposure variables

Exposure ascertainment is key

Nested case-control study using existing registry

Selection of controls, match or not to match
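Because a case-control sample fixes the number of cases and controls, the usual measure of association is the exposure odds ratio. A minimal sketch with hypothetical counts, using a Woolf (log-scale Wald) interval as one common choice:

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Exposure odds ratio with a Woolf (log-scale Wald) confidence interval."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)

# Hypothetical unmatched study: 100 cases (40 exposed) and 100 controls (20 exposed).
or_hat, or_lower, or_upper = odds_ratio_ci(40, 60, 20, 80)
print(f"OR = {or_hat:.2f} ({or_lower:.2f}, {or_upper:.2f})")
```

This applies to unmatched designs; a matched design would call for a matched analysis (for example, conditional methods), which is not shown here.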

Page 16: Biostatistical Considerations

Measurement & Data Collection

Data from clinically based electronic sources only?

Linking from different sources (e.g., NDI searches)

Measurement (different labs) and coding consistency

Additional data collections --- potential confounders, nonclinical outcomes (e.g., QoL, QALY), medications

Page 17: Biostatistical Considerations

Measurement & Data Collection

Research versus clinical protocol (BP, busy schedule)

New / changes in treatments and guidelines over time

Changes / improvement in measurement precision and generation of technology over time

Change of outcome definition over time (clinical designation or collect and record raw measures)

Analytic corrections can be made only if the needed data / information are available

Page 18: Biostatistical Considerations

Internal Validity --- Sources of Bias

Information bias: AEs are under-reported if the reporter (provider) will be viewed negatively on care quality; self-reported weight

Selection bias: patients included are not representative (unintentional incentives for provider / patient), loss to follow-up, common exposure to an unaccounted confounder

Confounding by indication: newest drug given to patients with worse prognosis

Survival bias: patients must live long enough with the exposure to be selected

Page 19: Biostatistical Considerations

Internal Validity --- Sources of Bias

Confounding:

CVD risk, age, gray hair

Controlled by matching through study design

Accounted for through stratification, covariate adjustment, or propensity score adjustment during analyses

These approaches work only if data on confounders were collected, so confounders need to be considered at the design stage
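The gray hair example can be made concrete with a stratified analysis. The counts below are hypothetical and chosen so that gray hair has no association with CVD within either age stratum; the Mantel-Haenszel estimator is used as one standard way to pool strata, not necessarily the approach the speaker had in mind.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = exposed cases / non-cases; c, d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts: exposure = gray hair, outcome = CVD, stratified by age.
younger = (10, 90, 40, 360)   # 10% CVD risk with or without gray hair
older = (120, 280, 30, 70)    # 30% CVD risk with or without gray hair

# Collapsing over age mixes the strata and creates a spurious association.
crude = odds_ratio(*(sum(cells) for cells in zip(younger, older)))
adjusted = mantel_haenszel_or([younger, older])
print(f"crude OR = {crude:.2f}, age-adjusted OR = {adjusted:.2f}")
# prints: crude OR = 2.16, age-adjusted OR = 1.00
```

The adjustment is possible only because age was recorded for every patient.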

Page 20: Biostatistical Considerations

Internal Validity --- Sources of Bias

Measurement errors:

Mean of 3 repeatedly measured BP readings used in RCT versus single BP used in clinic

Measured versus self reported body weight

Fruit / vegetable availability in an area used as proxy measure of fruit / vegetable consumption value

Areas measured by 2nd vs. 1st generation CT

Page 21: Biostatistical Considerations

Measurement Errors

Nondifferential ME in outcome causes no bias. Greater variability in outcomes due to ME reduces statistical power

Differential ME in outcome causes violation of constant variance assumption in regression.

Nondifferential ME in covariate causes underestimation of association (bias towards the null)

Page 22: Biostatistical Considerations

ME in Covariate

$$\beta^* = \lambda\beta, \quad \text{where} \quad \lambda = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_\varepsilon^2}$$
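A small simulation can check the attenuation factor for a linear model with classical, nondifferential error. Everything here (sample size, β = 1, unit variance of X, the three error SDs, the seed) is an illustrative assumption.

```python
import random

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def naive_slope(sd_e, rng, n=20000, beta=1.0, sd_x=1.0):
    """Slope of Y on the error-prone W = X + e, with e independent of X and Y."""
    x = [rng.gauss(0, sd_x) for _ in range(n)]
    w = [xi + rng.gauss(0, sd_e) for xi in x]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    return slope(w, y)

rng = random.Random(2010)
results = {}
for sd_e in (0.5, 1.0, 2.0):
    lam = 1.0 / (1.0 + sd_e ** 2)          # lambda with sigma_x^2 = 1
    results[sd_e] = (naive_slope(sd_e, rng), lam)
    print(f"sd_e = {sd_e}: naive slope = {results[sd_e][0]:.3f}, lambda * beta = {lam:.3f}")
```

The naive slope shrinks toward zero as the error variance grows, tracking λβ.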

Page 23: Biostatistical Considerations

ME in Covariate Models

E ( Y | X ) = μ ( Xβ )

Classical error model: W = X + ε, X ⊥ ε (Note: non-differential)

i. X the measured weight, W the self reported weight

ii. X the measured BMI, W the self reported BMI

Berkson error model: X = W + ε, W ⊥ ε (Note: non-differential)

i. X the “true” F/V consumption, W the proxy value
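The two error models behave differently in a linear regression: classical error attenuates the slope, while Berkson error leaves the slope unbiased (at the cost of extra residual variance). The simulation below is a sketch under assumed values (β = 1, all variances 1, linear μ); the result for Berkson error does not carry over exactly to nonlinear models.

```python
import random

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(23)
n, beta = 20000, 1.0

# Classical: W = X + e, with e independent of the true X.
x_c = [rng.gauss(0, 1) for _ in range(n)]
w_c = [x + rng.gauss(0, 1) for x in x_c]
y_c = [beta * x + rng.gauss(0, 1) for x in x_c]

# Berkson: X = W + e, with e independent of the assigned proxy W.
w_b = [rng.gauss(0, 1) for _ in range(n)]
x_b = [w + rng.gauss(0, 1) for w in w_b]
y_b = [beta * x + rng.gauss(0, 1) for x in x_b]

classical_slope = slope(w_c, y_c)   # expected near 0.5 (lambda = 1 / (1 + 1))
berkson_slope = slope(w_b, y_b)     # expected near 1.0
print(f"classical: {classical_slope:.3f}, Berkson: {berkson_slope:.3f}")
```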

Page 24: Biostatistical Considerations

ME in Covariate Models

Goal: E ( Y | X ) = μ ( Xβ )

Actual: E ( Y | W ) = μ ( Wβ* )

Need to correct the estimate of β* to get proper estimate of β

Need to quantify ME so proper correction of β* is possible:

Validation: a subsample with both X and W

Replications: repeated measures of W (e.g., BP)

Transportability: information from another study if valid
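As one example of the replication route, two repeated measurements per person allow the error variance to be estimated and the naive slope to be rescaled. This is a method-of-moments sketch for the classical additive error model with simulated, hypothetical data (β = 2, σ_x = 1, σ_ε = 0.8); it is not presented as the correction used by the speaker.

```python
import random
import statistics

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
n, beta, sd_x, sd_e = 20000, 2.0, 1.0, 0.8

x = [rng.gauss(0, sd_x) for _ in range(n)]     # true covariate, unobserved in practice
w1 = [xi + rng.gauss(0, sd_e) for xi in x]     # first error-prone measurement
w2 = [xi + rng.gauss(0, sd_e) for xi in x]     # replicate measurement
y = [beta * xi + rng.gauss(0, 1) for xi in x]

naive = slope(w1, y)                           # estimates beta* = lambda * beta

# E[(W1 - W2)^2] = 2 * sigma_e^2, so half the mean squared difference estimates sigma_e^2.
error_var = sum((a - b) ** 2 for a, b in zip(w1, w2)) / (2 * n)
lambda_hat = (statistics.variance(w1) - error_var) / statistics.variance(w1)
corrected = naive / lambda_hat

print(f"naive = {naive:.3f}, lambda_hat = {lambda_hat:.3f}, corrected = {corrected:.3f}")
```

A standard error for the corrected estimate would also need to account for the uncertainty in the estimated λ, which this sketch omits.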

Page 25: Biostatistical Considerations

ME in Covariate

Non-differential ME is a key assumption and is not testable without validation data

When a covariate with ME is in the model, estimates for covariates without ME may also be biased. The directions of such biases depend on the directions of association among Y and the covariates in the model

ME model could be complicated: combined classical & Berkson’s error model, additive versus multiplicative ME

Differential ME: bias direction depends on how ME relates to Y

Page 26: Biostatistical Considerations

Design Considerations for ME

Conduct periodic validation study on small random sample of participants (e.g. self report vs. measured weight, outcomes coded by billing vs. coded under research protocol)

If not available from external sources, repeat assessments using old and new instruments in random sample of participants during transition to collect calibration data.

Sources of external validation/calibration data

Page 27: Biostatistical Considerations

Missing Data

Inevitable in population research

Prevention is better than statistical treatments

Too much missing information invalidates a study

Validity of methods accommodating missing data depends on the missing data mechanism and the analytic approach

Page 28: Biostatistical Considerations

Missing Data Mechanism

Missing completely at random (MCAR):

Pr (missing) is unrelated to process under study

Missing at random (MAR):

Pr (missing) depends only on observed data; potentially “ignorable”

Not missing at random (NMAR):

Pr (missing) depends on both observed and unobserved data; non-ignorable
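The three mechanisms can be illustrated with a toy example in which y0 is always observed and y1 may be missing. The missingness probabilities, correlation, and sample size are hypothetical choices that give about 25% missing under each mechanism; the true mean of y1 is 0.

```python
import random

def observed_mean_y1(mechanism, n=20000, seed=1):
    """Mean of y1 among subjects whose y1 is observed, under a given mechanism."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        y0 = rng.gauss(0, 1)                  # always observed
        y1 = 0.6 * y0 + rng.gauss(0, 0.8)     # may be missing; Corr(y0, y1) = 0.6
        if mechanism == "MCAR":
            p_missing = 0.25                  # unrelated to the data
        elif mechanism == "MAR":
            p_missing = 0.45 if y0 < 0 else 0.05   # depends on observed y0 only
        elif mechanism == "NMAR":
            p_missing = 0.45 if y1 < 0 else 0.05   # depends on the unobserved y1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            kept.append(y1)
    return sum(kept) / len(kept)

means = {m: observed_mean_y1(m) for m in ("MCAR", "MAR", "NMAR")}
for mechanism, value in means.items():
    print(f"{mechanism}: complete-case mean of y1 = {value:.3f}")
```

Under MCAR the complete-case mean stays near 0. Under MAR and NMAR it drifts upward, but only under MAR does the observed y0 carry the information needed to correct it.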

Page 29: Biostatistical Considerations

Simulations

N = 100, repeated outcome: y0, y1

Group = 0, 1 (n = 50 / 50)

FV = 0:

y0 ~ N(0,1) if Group = 0

y0 ~ N(1,1) if Group = 1

FV = 1:

y1 ~ N(0,1) if Group = 0

y1 ~ N(1,1) if Group = 1

E( y0) = E( y1) = 0.5, SD( y0) = SD( y1) = 1.12

Corr( y0, y1 | Group) = 0.6, Corr( y0, y1 ) = 0.68
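The marginal SD and correlation quoted above follow from the within-group values by the laws of total variance and total covariance, assuming the two groups are equally likely (50 / 50) and writing G for Group and t for the visit (0 or 1):

```latex
\begin{aligned}
\operatorname{Var}(y_t) &= \mathrm{E}[\operatorname{Var}(y_t \mid G)] + \operatorname{Var}(\mathrm{E}[y_t \mid G])
  = 1 + (0.5)(0.5)(1 - 0)^2 = 1.25, \\
\operatorname{SD}(y_t) &= \sqrt{1.25} \approx 1.12, \\
\operatorname{Cov}(y_0, y_1) &= \mathrm{E}[\operatorname{Cov}(y_0, y_1 \mid G)]
  + \operatorname{Cov}(\mathrm{E}[y_0 \mid G], \mathrm{E}[y_1 \mid G])
  = 0.6 + 0.25 = 0.85, \\
\operatorname{Corr}(y_0, y_1) &= \frac{0.85}{1.25} = 0.68.
\end{aligned}
```

The group difference in means therefore adds 0.25 to both the variance and the covariance, which is why the marginal correlation (0.68) exceeds the within-group correlation (0.6).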

Page 30: Biostatistical Considerations

Analytic Approach

Likelihood approach

Mixed effects models

Mean model = Intercept + FV versus Intercept + FV + Group

Correlation model: Working independent (WI) versus Unstructured (UN)

Model-based versus robust SE

Page 31: Biostatistical Considerations

Simulations

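Full Sample:

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 100 | 0.54 | 1.09 |
| y1 – y0 | 100 | 0.069 | 0.79 |

MCAR: 25% random missing at FV1

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 75 | 0.56 | 1.13 |
| y1 – y0 | 75 | 0.088 | 0.83 |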

Page 32: Biostatistical Considerations

Simulations

(y1 – y0): No Missing .069 (.079); MCAR .088 (.096)

| Mean Model [Corr] | No Missing: Model-Based | No Missing: Robust | MCAR: Model-Based | MCAR: Robust |
|---|---|---|---|---|
| Int. + FV [WI] | .069 (.150) | .069 (.079) | .090 (.166) | .090 (.100) |
| Int. + FV [UN] | .069 (.079) | .069 (.079) | .089 (.094) | .089 (.094) |
| Int. + FV (+ GP) [WI] | .069 (.134) | .069 (.079) | .084 (.149) | .084 (.095) |
| Int. + FV (+ GP) [UN] | .069 (.079) | .069 (.079) | .087 (.093) | .087 (.093) |

Page 33: Biostatistical Considerations

Simulations

Full Sample:

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 100 | 0.54 | 1.09 |
| y1 – y0 | 100 | 0.069 | 0.79 |

MAR1: 25% missing in Group 0 at FV1

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 75 | 0.75 | 1.16 |
| y1 – y0 | 75 | 0.094 | 0.76 |

Page 34: Biostatistical Considerations

Simulations

(y1 – y0): No Missing .069 (.079); MAR1 .094 (.088)

| Mean Model [Corr] | No Missing: Model-Based | No Missing: Robust | MAR1: Model-Based | MAR1: Robust |
|---|---|---|---|---|
| Int. + FV [WI] | .069 (.150) | .069 (.079) | .279 (.159) | .279 (.102) |
| Int. + FV [UN] | .069 (.079) | .069 (.079) | .137 (.086) | .137 (.086) |
| Int. + FV (+ GP) [WI] | .069 (.134) | .069 (.079) | .129 (.147) | .129 (.093) |
| Int. + FV (+ GP) [UN] | .069 (.079) | .069 (.079) | .103 (.085) | .103 (.085) |

Page 35: Biostatistical Considerations

Simulations

Full Sample:

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 100 | 0.54 | 1.09 |
| y1 – y0 | 100 | 0.069 | 0.79 |

MAR2: 25% missing depends on values of y0

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 75 | 0.29 | 1.04 |
| y1 – y0 | 75 | 0.213 | 0.78 |

Page 36: Biostatistical Considerations

Simulations

(y1 – y0): No Missing .069 (.079); MAR2 .213 (.090)

| Mean Model [Corr] | No Missing: Model-Based | No Missing: Robust | MAR2: Model-Based | MAR2: Robust |
|---|---|---|---|---|
| Int. + FV [WI] | .069 (.150) | .069 (.079) | -.187 (.158) | -.187 (.117) |
| Int. + FV [UN] | .069 (.079) | .069 (.079) | .154 (.090) | .154 (.090) |
| Int. + FV (+ GP) [WI] | .069 (.134) | .069 (.079) | -.194 (.138) | -.194 (.115) |
| Int. + FV (+ GP) [UN] | .069 (.079) | .069 (.079) | .106 (.091) | .106 (.091) |

Page 37: Biostatistical Considerations

Simulations

Full Sample:

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 100 | 0.54 | 1.09 |
| y1 – y0 | 100 | 0.069 | 0.79 |

NMAR: 25% missing depends on values of y1

| | N | Sample mean | Sample SD |
|---|---|---|---|
| y0 (FV=0) | 100 | 0.47 | 1.04 |
| y1 (FV=1) | 75 | 0.11 | 0.83 |
| y1 – y0 | 75 | -0.127 | 0.72 |

Page 38: Biostatistical Considerations

Simulations

(y1 – y0): No Missing .069 (.079); NMAR -.127 (.083)

| Mean Model [Corr] | No Missing: Model-Based | No Missing: Robust | NMAR: Model-Based | NMAR: Robust |
|---|---|---|---|---|
| Int. + FV [WI] | .069 (.150) | .069 (.079) | -.367 (.141) | -.367 (.090) |
| Int. + FV [UN] | .069 (.079) | .069 (.079) | -.228 (.080) | -.228 (.080) |
| Int. + FV (+ GP) [WI] | .069 (.134) | .069 (.079) | -.360 (.120) | -.360 (.090) |
| Int. + FV (+ GP) [UN] | .069 (.079) | .069 (.079) | -.257 (.081) | -.257 (.081) |

Page 39: Biostatistical Considerations

Simulations

| Model: Mean / Corr | No Missing | MCAR | MAR1 | MAR2 | NMAR |
|---|---|---|---|---|---|
| (y1 – y0) | .069 (.079) | .088 (.096) | .094 (.088) | .213 (.090) | -.127 (.083) |
| Int. + FV / WI (Model-based) | .069 (.150) | .090 (.166) | .279 (.159) | -.187 (.158) | -.367 (.141) |
| Int. + FV / UN (Robust) | .069 (.079) | .089 (.094) | .137 (.086) | .154 (.090) | -.228 (.080) |
| Int. + FV (+ GP) / WI (Model-based) | .069 (.134) | .084 (.149) | .129 (.147) | -.194 (.138) | -.360 (.120) |
| Int. + FV (+ GP) / UN (Robust) | .069 (.079) | .087 (.093) | .103 (.085) | .106 (.091) | -.257 (.081) |

Page 40: Biostatistical Considerations

Observations

MCAR:

» Requires only correct mean model for valid inferences

» Complete case analysis is valid, but not efficient for estimating fully observed variables

» Approaches valid for MAR also valid under MCAR

» Unlikely to be true in population based research

Page 41: Biostatistical Considerations

Observations

MAR:

» Ignorability of missingness is possible but not a given

» Requires correct specification of likelihood (both mean and covariance model) for the observed data to achieve valid inferences

» Empirically cannot be confirmed without auxiliary data

Page 42: Biostatistical Considerations

Observations

NMAR:

» Empirically cannot be ruled out without auxiliary data

» Likelihood, multiple imputation, propensity score, inverse weighting approach cannot completely eliminate bias

» Need to conduct sensitivity analyses under various plausible NMAR scenarios to evaluate potential impacts on inferences
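One simple way to run such a sensitivity analysis is a delta adjustment in the pattern-mixture style: assume the missing values average some amount delta above (or below) the observed mean and see how the overall estimate moves. The sketch below reuses the NMAR summary from the simulation slides (75 observed y1 values with mean 0.11, 25 missing, full-sample mean 0.54); the function name and the grid of delta values are illustrative assumptions.

```python
def delta_adjusted_mean(observed_mean, n_observed, n_missing, delta):
    """Overall mean if the missing values average `delta` above the observed mean."""
    imputed_mean = observed_mean + delta
    total = n_observed * observed_mean + n_missing * imputed_mean
    return total / (n_observed + n_missing)

# NMAR scenario from the simulation: 75 observed y1 values (mean 0.11), 25 missing.
adjusted_means = {
    delta: delta_adjusted_mean(0.11, 75, 25, delta)
    for delta in (0.0, 0.5, 1.0, 1.72)
}
for delta, value in adjusted_means.items():
    print(f"delta = {delta:.2f}: adjusted mean of y1 = {value:.3f}")
```

Here delta = 0 corresponds to treating the missing values like the observed ones, and a delta of about 1.72 would be needed to recover the full-sample mean of 0.54. In a real registry the plausible range of delta has to come from subject-matter knowledge or auxiliary data.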

Page 43: Biostatistical Considerations

Observations

Observational studies face similar issues as RCTs with missing data:

» Bias due to missing data is a form of selection bias

» Proper selection of analytic models may eliminate bias if the “selection” is based on observed data values, i.e. we have data to adjust for selection

» Bias due to “selection” according to data values not observed will be hard to correct
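When selection depends only on observed data, weighting the complete cases by the inverse of their probability of being observed is one way to adjust. The sketch below loosely mirrors the MAR1 setting (missingness only in Group 0) with simulated, hypothetical data; it is not the likelihood-based analysis shown in the simulation tables.

```python
import random

rng = random.Random(3)
n = 20000
records = []                                   # (group, y1, observed)
for i in range(n):
    group = i % 2                              # balanced groups
    y1 = rng.gauss(group, 1)                   # mean 0 in Group 0, mean 1 in Group 1
    observed = group == 1 or rng.random() >= 0.5   # half of Group 0 is missing
    records.append((group, y1, observed))

full_mean = sum(y for _, y, _ in records) / n
complete = [(g, y) for g, y, obs in records if obs]
cc_mean = sum(y for _, y in complete) / len(complete)

# Estimate Pr(observed | group) from the data, then weight by its inverse.
p_observed = {}
for g in (0, 1):
    n_group = sum(1 for grp, _, _ in records if grp == g)
    n_group_observed = sum(1 for grp, _ in complete if grp == g)
    p_observed[g] = n_group_observed / n_group

weights = [1 / p_observed[g] for g, _ in complete]
ipw_mean = sum(w * y for w, (_, y) in zip(weights, complete)) / sum(weights)

print(f"full = {full_mean:.3f}, complete-case = {cc_mean:.3f}, IPW = {ipw_mean:.3f}")
```

The complete-case mean over-represents Group 1, while the weighted mean recovers the full-sample value because Group is recorded for everyone. No such weights can be built when selection depends on values that were never observed.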

Page 44: Biostatistical Considerations

Sample Size Considerations

Descriptive: estimation precision

Hypothesis based: power to detect association

Design effects

Longitudinal correlations
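The four considerations above map onto standard textbook approximations. The functions below are a sketch using normal-approximation formulas; the example inputs (proportion 0.5, margin 0.05, clusters of 20 with intraclass correlation 0.05, effect size 0.5 SD) are hypothetical.

```python
import math

def n_for_proportion(p, margin, z=1.96):
    """Descriptive goal: n so a 95% CI for a proportion has the given half-width."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def n_per_group_two_means(delta, sd, z_alpha=1.96, z_beta=0.8416):
    """Analytic goal: n per group to detect a mean difference `delta`
    (two-sided alpha = 0.05, power = 80%, normal approximation)."""
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def design_effect(cluster_size, icc):
    """Variance inflation from cluster sampling: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def sd_of_change(sd, rho):
    """SD of a within-person change when both visits have SD `sd` and correlation `rho`."""
    return sd * math.sqrt(2 * (1 - rho))

n_srs = n_for_proportion(0.5, 0.05)
n_clustered = math.ceil(n_srs * design_effect(20, 0.05))
n_group = n_per_group_two_means(delta=0.5, sd=1.0)
print(n_srs, n_clustered, n_group, round(sd_of_change(1.0, 0.5), 3))
```

Higher longitudinal correlation shrinks the SD of within-person change, so fewer participants are needed to estimate change than to compare independent groups.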