
MEASUREMENT ERROR IN HEALTH STUDIES

• Lecture 1 — Introduction, Examples, Effects of Measurement Error in Linear Models

• Lecture 2 — Data Types, Nondifferential Error, Estimating Attenuation, Exact Predictors, Berkson/Classical

• Lecture 3 — Distinguishing Berkson from Classical, Structural/Functional, Regression Calibration, Non-Additive Errors

• Lecture 4 — SIMEX and Instrumental Variables

• Lecture 5 — Bayesian Methods

• Lecture 6 — Nonparametric Regression with Measurement Error

Lecture 1 1

LECTURE 1: INTRODUCTION, EXAMPLES AND LINEAR

MEASUREMENT ERROR MODELS

OUTLINE

• Why We Need Special Methods For Measurement Errors

• Measurement Error Examples

• Structure of a Measurement Error Problem

• Classical Error Model in Linear Regression

Lecture 1 2

WHY WE NEED SPECIAL METHODS TO HANDLE MEASUREMENT

ERRORS

• These lectures are about strategies for regression when some predictors are mea-

sured with error.

• Remember your introductory regression text. . .

∗ Snedecor and Cochran (1967), "Thus far we have assumed that the X-variable in regression is measured without error. Since no measuring instrument is perfect this assumption is often unrealistic."

∗ Steele and Torrie (1980), ". . . if the X's are also measured with error, . . . an alternative computing procedure should be used. . ."

∗ Neter and Wasserman (1974), "Unfortunately, a different situation holds if the independent variable X is known only with measurement error."

Lecture 1 3

WHY SPECIAL METHODS ARE NEEDED — CONT.

• Measurement error in the outcome: If

Y = β0 + β1X + ε

and we observe Y* = Y + U, then

Y* = β0 + β1X + (ε + U)

∗ All that has happened is that the error variance is bigger

∗ Standard regression applies

Lecture 1 4

WHY SPECIAL METHODS ARE NEEDED — CONT.

• Consider now:

Y = β0 + β1X + ε

W = X + U

and only W and Y are observed. Then

Y = β0 + β1W + (ε − β1U)

∗ This seems like a standard regression problem with additional error.

I Can we analyze it that way?

Lecture 1 5

WHY SPECIAL METHODS ARE NEEDED — CONT.

• From previous page: X = W − U implies

Y = β0 + β1W + (ε − β1U)

• Problem: W is correlated with the "error" ε − β1U

• Better to replace W by a function of W, namely E(X|W), since

X = E(X|W) + V, where E(X|W) and V are uncorrelated

and then we have the model

Y = β0 + β1E(X|W) + (ε + β1V)

∗ special case of "regression calibration"

∗ leads to unbiased estimates

Lecture 1 6

EXAMPLES OF MEASUREMENT ERROR PROBLEMS

• Different measures of nutrient intake, e.g., by a food frequency questionnaire

(FFQ)

• Systolic Blood Pressure

• Radiation Dosimetry

• Exposure to arsenic in drinking water, dust in the workplace, radon gas in the

home, and other environmental hazards

Lecture 1 7

MEASURES OF NUTRIENT INTAKE

• Y = average daily % calories from fat by a FFQ.

• X = true long-term average daily percentage of calories from fat

• Assume Y = β0 + βxX + ε

∗ we do not assume that the FFQ is unbiased

• X is never observable. It is measured with error:

∗ Along with the FFQ, on 6 days over the course of a year women are interviewed by phone and asked to recall their food intake over the previous 24 hours (24-hour recalls). Their average is recorded and denoted by W.

I W is an "alloyed gold standard"

∗ The analysis of 24-hour recalls introduces some error =⇒ analysis error

∗ Measurement error = sampling error + analysis error

∗ Classical measurement error model:

Wi = Xi + Ui, Ui are measurement errors

Lecture 1 8

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE

• Y = indicator of Coronary Heart Disease (CHD)

• X = true long-term average systolic blood pressure (SBP) (maybe transformed)

• Assume P(Y = 1) = H(β0 + βxX), H is the logistic or probit function

• Data are CHD indicators and determinations of systolic blood pressure for n = 1600 in the Framingham Heart Study

Lecture 1 9

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE — CONT.

• X measured with error:

∗ SBP measured at two exams (and averaged) =⇒ sampling error

∗ The determination of SBP is subject to machine and reader variability =⇒ analysis error

∗ Measurement error = sampling error + analysis error

∗ Measurement error model

Wi = Xi + Ui, Ui are measurement errors

Lecture 1 10

GENERAL STRUCTURE OF A MEASUREMENT ERROR PROBLEM

• Y = response, Z = error-free predictor, X = error-prone predictor

• E(Y|Z, X) = f(Z, X, β) (outcome model)

• Observed data: (Yi, Zi, Wi), i = 1, . . . , n

∗ E(Y|Z, W) ≠ f(Z, W, β) (source of our worries)

• Error model relating Wi and Xi (measurement model)

∗ Wi = Xi + Ui (classical error model)

∗ Wi = γ0,em + γᵗx,em Xi + γᵗz,em Zi + Ui (error calibration model)

Lecture 1 11

A CLASSICAL ERROR MODEL

• Wi = Xi + Ui (additive)

• Ui are:

∗ independent of all Yi, Zi and Xi

∗ IID (0, σ²u)

• In addition, we may have a model for the distribution of (X, Z) (exposure model)

∗ called a structural model

Lecture 1 12

SIMPLE LINEAR REGRESSION WITH A CLASSICAL ERROR MODEL

• Y = response, X = error-prone predictor

• Y = β0 + βxX + ε

• Observed data: (Yi, Wi), i = 1, . . . , n

• Wi = Xi + Ui (additive)

• Ui are:

∗ independent of all Yi, Zi and Xi

∗ IID (0, σ²u)

What are the effects of measurement error on the usual analysis?

Lecture 1 13

SIMULATION STUDY

• Generate X1, . . . , X50, IID N(0, 1)

• Generate Yi = β0 + βxXi + εi

∗ εi IID N(0, 1/9)

∗ β0 = 0

∗ βx = 1

• Generate U1, . . . , U50, IID N(0, 1)

• Set Wi = Xi + Ui

• Regress Y on X and Y on W and compare (a small sketch follows below)
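The following is a minimal Python sketch of this simulation; the seed and the moment-based slope helper are illustrative choices, not from the slides.

```python
# Regress Y on the true X and on the error-prone W and compare the fitted slopes.
import numpy as np

rng = np.random.default_rng(0)
n, beta0, betax = 50, 0.0, 1.0

x = rng.normal(0.0, 1.0, n)                       # X ~ N(0, 1)
y = beta0 + betax * x + rng.normal(0.0, 1/3, n)   # eps ~ N(0, 1/9)
w = x + rng.normal(0.0, 1.0, n)                   # classical error, sigma_u^2 = 1

def ols_slope(pred, resp):
    """Least-squares slope of resp on pred."""
    return np.cov(pred, resp)[0, 1] / np.var(pred, ddof=1)

print("slope of Y on X:", ols_slope(x, y))        # near 1
print("slope of Y on W:", ols_slope(w, y))        # near lambda = 1/(1 + 1) = 0.5
```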

Lecture 1 14

[Figure: "Effects of Measurement Error" — scatterplot and fitted line for the reliable data; true data without measurement error.]

Lecture 1 15

[Figure: "Effects of Measurement Error" — reliable data versus error-prone data; observed data with measurement error.]

Lecture 1 16

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS

• Least Squares Estimate of Slope:

β̂x = Sy,w / S²w = {n⁻¹ Σ (Y − Ȳ)(W − W̄)} / {n⁻¹ Σ (W − W̄)²}

where

Sy,w −→ Cov(Y, W) = Cov(Y, X + U) = Cov(Y, X) = σy,x

S²w −→ Var(W) = Var(X + U) = σ²x + σ²u

So

β̂x −→ σy,x / (σ²x + σ²u) = {σ²x / (σ²x + σ²u)} (σy,x / σ²x) = λβx

where

λ = σ²x / (σ²x + σ²u) = σ²x / σ²w = attenuation factor = reliability ratio

∗ It is the relative size of the error that matters

Lecture 1 17

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS

• Least Squares Estimate of Intercept:

β̂0 = Ȳ − β̂xW̄ −→ µy − λβxµx = β0 + (1 − λ)βxµx

• Estimate of Residual Variance:

MSE −→ σ²ε + (1 − λ)β²xσ²x

Lecture 1 18

MORE THEORY: JOINT NORMALITY

• Y, X, W jointly normal =⇒

∗ Y | W ∼ Normal

∗ E(Y | W) = β0 + (1 − λ)βxµx + λβxW

∗ Var(Y | W) = σ²ε + (1 − λ)β²xσ²x

• Intercept is shifted by (1 − λ)βxµx

• Slope is attenuated by the factor λ

• Residual variance is inflated by (1 − λ)β²xσ²x

Lecture 1 19

MORE THEORY: IMPLICATIONS FOR TESTING HYPOTHESES

• Because

βx = 0 iff λβx = 0

it follows that

[H0 : βx = 0] ≡ [H0 : λβx = 0]

so the naive test of βx = 0 is valid (correct Type I error rate).

• The naive test of H0 : βx = 0 is asymptotically efficient when E(X | W) is linear in W.

• The discussion of naive tests when there are multiple predictors measured with error, or error-free predictors, is more complicated

Lecture 1 20

[Figure: Sample size for 80% power versus measurement error variance. True slope βx = 0.75. Variances σ²x = σ²ε = 1.]

Lecture 1 21

MULTIPLE LINEAR REGRESSION WITH ERROR

• Model

Y = β0 + βᵗzZ + βᵗxX + ε

W = X + U

• Regressing Y on Z and W estimates

(βz∗, βx∗)ᵗ = Λ (βz, βx)ᵗ [≠ (βz, βx)ᵗ]

• Λ is the attenuation matrix or reliability matrix (a small numerical sketch follows this slide)

Λ = ( σzz  σzx ; σxz  σxx + σuu )⁻¹ ( σzz  σzx ; σxz  σxx )

– Biases in the components of βx∗ and βz∗ can be multiplicative or additive =⇒

∗ Naive test of H0 : βx = 0, βz = 0 is valid
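A hedged numerical sketch of the attenuation matrix; the covariance values (sigma_zz, sigma_zx, sigma_xx, sigma_uu) and the coefficient vector below are illustrative, not from the slides.

```python
# Lambda = [[S_zz, S_zx],[S_xz, S_xx + S_uu]]^{-1} [[S_zz, S_zx],[S_xz, S_xx]]
import numpy as np

sigma_zz, sigma_zx, sigma_xx, sigma_uu = 1.0, 0.4, 1.0, 0.5

cov_true = np.array([[sigma_zz, sigma_zx],
                     [sigma_zx, sigma_xx]])
cov_obs = cov_true + np.diag([0.0, sigma_uu])   # measurement error only inflates var(W)

Lam = np.linalg.inv(cov_obs) @ cov_true          # attenuation / reliability matrix
beta = np.array([1.0, 2.0])                      # (beta_z, beta_x), illustrative
print("naive limit of (beta_z*, beta_x*):", Lam @ beta)
```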

Lecture 1 22

From previous page:

• (βz∗, βx∗)ᵗ = Λ (βz, βx)ᵗ [≠ (βz, βx)ᵗ]

• Λ = ( σzz  σzx ; σxz  σxx + σuu )⁻¹ ( σzz  σzx ; σxz  σxx )

∗ Naive test of H0 : βx = 0 is valid

∗ Naive test of H0 : βx,1 = 0 is typically not valid (βx,1 denotes a subvector of βx)

∗ Naive test of H0 : βz = 0 is typically not valid (same is true for subvectors)

Lecture 1 23

MULTIPLE LINEAR REGRESSION WITH ERROR

• For X scalar, the attenuation factor in βx∗ is

λ1 = σ²x|z / (σ²x|z + σ²u)

∗ σ²x|z = residual variance in the regression of X on Z

∗ σ²x|z ≤ σ²x =⇒

λ1 = σ²x|z / (σ²x|z + σ²u) ≤ σ²x / (σ²x + σ²u) = λ

∗ =⇒ Collinearity accentuates attenuation

• Biased estimates of βz:

βz∗ = βz + (1 − λ1)βxΓz,

∗ Γz is from E(X | Z) = Γ1 + ΓᵗzZ

Lecture 1 24

ANALYSIS OF COVARIANCE

• These results have implications for the two group ANCOVA.

∗ X = true covariate

∗ Z = dummy indicator of group 1, say

• We are interested in estimating βz, the group effect. Biased estimates of βz:

βz∗ = βz + (1 − λ1)βxΓz,

∗ Γz is from E(X | Z) = Γ1 + ΓᵗzZ

∗ Γz is the difference in the mean of X between the two groups.

∗ Thus, biased unless X and Z are unrelated.

I Use a randomized study!!!

Lecture 1 25

[Figure: Unbalanced ANCOVA, response versus predictor. Red = True Data, Blue = Observed. Solid = First Group, Open = Second Group. No Difference In Groups.]

Lecture 1 26

WHAT IF WE WILL PREDICT FUTURE OUTCOMES BASED ON THE OBSERVED COVARIATE?

• If we wish to predict Y based on W, then the regression of Y on W is the correct model to use

∗ Though this assumes that the distribution of X and the measurement error distribution will be the same in the future as in the study

I since λ = σ²x/(σ²x + σ²u)

• In public health, interventions will change the true X, not W, so we are interested in the regression of Y on X

Lecture 1 27

SUMMARY OF THE EFFECTS OF MEASUREMENT ERROR IN SIMPLE LINEAR REGRESSION

• Regression Model: Y = β0 + βxX + ε, W = X + U

• Attenuation Factor (Reliability Ratio)

λ = σ²x / (σ²x + σ²u)

∗ 0 < λ ≤ 1

∗ λ = 1 ⇐⇒ σ²u = 0

Regression | Intercept              | Slope | Residual Variance
Y on X     | β0                     | βx    | σ²ε
Y on W     | β0 + (1 − λ)βxµx       | λβx   | σ²ε + (1 − λ)β²xσ²x

Lecture 1 28

END OF LECTURE 1

Lecture 2 1

LECTURE 2: DATA TYPES, NONDIFFERENTIAL ERROR, ESTIMATING ATTENUATION, EXACT PREDICTORS, BERKSON MODEL

OUTLINE

• Nondifferential measurement error

• Estimating the attenuation

• Replication and validation data

• Internal and external subsets

• Transportability across data sets

• Is there an “exact” predictor?

• Berkson and classical measurement error

Lecture 2 2

THE BASIC DATA

• A response Y

• Predictors X measured with error (unobserved).

• Predictors Z measured without error.

• A major proxy W for X.

• Sometimes, a second proxy T for X.

Lecture 2 3

NONDIFFERENTIAL ERROR

• Roughly speaking, the error is said to be nondifferential if W and T would not be measured if one could have measured X.

• More formally, (W, T) are conditionally independent of Y given (X, Z).

⇒ (W, T) would provide no additional information about Y if X were observed

• This often makes sense, but it may be fairly subtle in each application.

Lecture 2 4

NONDIFFERENTIAL ERROR

• Many crucial theoretical calculations revolve around nondifferential error.

• Consider simple linear regression: Y = β0 + βxX + ε, ε independent of X.

E(Y|W) = E[{E(Y|X, W)}|W] = E[{E(Y|X)}|W] = β0 + βxE(X|W),

where the middle equality (1) holds because the error is nondifferential.

∗ This reduces the problem to estimating E(X|W). For example,

E(X|W) = λW + (1 − λ)µx (under joint normality and classical error)

where λ = σ²x/σ²w.

• If the error is differential, then (1) fails, and no simplification is possible.

Lecture 2 5

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE

• Y = indicator of Coronary Heart Disease (CHD)

• X = true long-term average systolic blood pressure (SBP) (maybe transformed)

• Assume P(Y = 1) = H(β0 + βxX)

• Data are CHD indicators and determinations of systolic blood pressure for n = 1600 in the Framingham Heart Study

• X measured with error:

∗ SBP measured at two exams (and averaged) =⇒ sampling error

∗ The determination of SBP is subject to machine and reader variability

• It is hard to believe that the short-term average of two days carries any additional information about the subject's chance of CHD over and above true SBP.

• Hence, nondifferential

Lecture 2 6

IS THIS NONDIFFERENTIAL?

• From Tosteson et al. (1989).

• Y = I{wheeze}.

• X is personal exposure to NO2.

• W = (NO2 in kitchen, NO2 in bedroom) is observed in the primary study.

Lecture 2 7

IS THIS NONDIFFERENTIAL?

• From Kuchenhoff & Carroll

• Y = I{lung irritation}.

• X is actual personal long-term dust exposure

• W = dust exposure as measured by occupational epidemiology techniques.

∗ They sampled the plant for dust.

∗ Then they tried to match the person to where he/she worked.

Lecture 2 8

WHAT IS NECESSARY TO DO AN ANALYSIS?

• In linear regression with classical additive error W = X + U, one needs:

∗ Nondifferential error

∗ An estimate of the error variance var(U) — since λ = (σ²w − σ²u)/σ²w.

• How do we get the latter information?

• The best way is to get a subsample of the study in which X is observed. This is called validation.

∗ In many applications not possible.

• Another method is to do replications of the process, often called calibration.

• A third way is to get the value from another similar study.

Lecture 2 9

REPLICATION

• In a replication study, for some of the study participants you measure more than one W.

• The standard model is

Wij = Xi + Uij, j = 1, . . . , mi.

• This is a one-factor ANOVA, with the mean squared error σ²u estimated by

σ̂²u = Σi Σj (Wij − W̄i•)² / Σi (mi − 1).

• As the proxy for Xi one would use W̄i•, where

W̄i• = Xi + Ūi•, var(Ūi•) = σ²u/mi.

A small numerical sketch of this estimator follows below.
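A minimal sketch of the one-factor ANOVA estimate of σ²u from replicate measurements; the data below (and the per-subject replicate counts) are simulated for illustration only.

```python
# W_ij = X_i + U_ij; `reps` holds each subject's replicate measurements.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 1.0, 200)
reps = [xi + rng.normal(0.0, 0.7, rng.integers(2, 4)) for xi in x]   # 2-3 replicates each

num = sum(((w - w.mean()) ** 2).sum() for w in reps)   # within-subject sum of squares
den = sum(len(w) - 1 for w in reps)                    # sum of (m_i - 1)
sigma2_u_hat = num / den

w_bar = np.array([w.mean() for w in reps])             # proxy for X_i; var(U_bar_i) = sigma_u^2 / m_i
print("sigma_u^2 estimate:", sigma2_u_hat)             # near 0.7**2 = 0.49
```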

Lecture 2 10

REPLICATION

• Replication allows you to test whether your model is additive with constant error

variance.

• If Wij = Xi + Uij with Uij symmetrically distributed about zero and independent

of Xi, we have a major fact:

∗ The sample mean and sample standard deviation are uncorrelated.

I Eckert, Carroll, and Wang (1997) use this fact to find a transformation to

additivity

• Also, if Uij are normally distributed, then so too are differencesWi1 − Wi2 =

Ui1 − Ui2.

• Graphical diagnostics can be implemented easily in any package.

Lecture 2 11

REPLICATION: WISH

• The WISH study measured caloric intake using a 24–hour recall.

∗ There were 6 replicates per woman in the study.

• A plot of the caloric intake data showed that W was nowhere close to being normally distributed in the population.

∗ If additive, then either X or U is not normal.

• When plotting standard deviation versus the mean, one often uses the rule that the method "passes" the test if the max-to-min ratio is less than 2.0.

∗ A little bit of non-constant variance does not seem to hurt. See Carroll & Ruppert (1988)

∗ Caveat: Carroll & Ruppert studied effects of heteroscedasticity on efficiency, not on bias due to measurement error.

Lecture 2 12

[Figure: WISH Calories, FFQ — Normal Q–Q plot. WISH, Caloric Intake, Q–Q plot of W.]

Lecture 2 13

[Figure: WISH Calories, 24-hour recalls — Normal Q–Q plot of pairwise differences. WISH, Caloric Intake, Q–Q plot of differenced Ws.]

Lecture 2 14

[Figure: WISH Calories, 24-hour recalls — s.d. versus mean. WISH, Caloric Intake, plot for additivity, loess and OLS.]

Lecture 2 15

REPLICATION: WISH

• Taking logarithm ofW improves all the plots.

Lecture 2 16

[Figure: WISH Log-Calories, FFQ — Normal Q–Q plot. WISH, Log Caloric Intake, Q–Q plot of observed W.]

Lecture 2 17

[Figure: WISH Log-Calories, 24-hour recalls — Normal Q–Q plot of pairwise differences. WISH, Log Caloric Intake, Q–Q plot of differenced Ws.]

Lecture 2 18

[Figure: WISH Log-Calories, 24-hour recalls — s.d. versus mean. WISH, Log Caloric Intake, plot for additivity, loess and OLS.]

• Perhaps a bit over-transformed — square-root might be better.

Lecture 2 19

TRANSPORTABILITY OF ESTIMATES

• In linear regression, we only need the measurement error variance (after checking for semi-constant variance, additivity, normality).

• In general though, more is needed. Let's remember that if we observe W instead of X, then the observed data have a regression of Y on W that effectively acts as if

E(Y|W) = β0 + βxE(X|W) ≈ β0 + βx{λW + (1 − λ)µx}.

• As we will see later, in general problems we can do a likelihood analysis if we know the distribution of X given W.

Lecture 2 20

TRANSPORTABILITY OF ESTIMATES

• It is tempting to try to use outside data and transport an estimate of λ to your problem.

∗ Bad idea!!!

λ = σ²x / (σ²x + σ²u)

∗ Note how this depends on the distribution of X.

∗ It is rarely the case that two populations have the same X distribution, even when the same instrument is used.

I the instrument affects only σ²u, not σ²x

• Maybe one should transport the estimate of σ²u and use

λ̂ = (σ²w − σ²u,transported) / σ²w

Lecture 2 21

EXTERNAL DATA AND TRANSPORTABILITY

• We say that a model is transportable across studies if the model holds with the same parameters in the two studies.

∗ Internal data are ideal since there is no question about transportability.

• With external data, transportability back to the primary study cannot be taken for granted.

∗ Sometimes transportability clearly will not hold. Then the value of the external data is, at best, questionable.

∗ Even if transportability seems to be a reasonable assumption, it is still just that, an assumption.

Lecture 2 22

EXTERNAL DATA AND TRANSPORTABILITY

• As an illustration, consider two nutrition data sets which use exactly the same FFQ

• Nurses Health Study

∗ Nurses in the Boston Area

• American Cancer Society

∗ National sample

• Since the same instrument is used, error properties should be about the same.

∗ But maybe not the entire distribution!!

∗ var(differences, NHS) = 47

∗ var(differences, ACS) = 45

∗ var(sum, NHS) = 152

∗ var(sum, ACS) = 296

Lecture 2 23

[Figure: FFQ histograms in NHS and ACS (variables pccal.nhs and pccal.acs).]

Lecture 2 24

IS THERE AN “EXACT” PREDICTOR?

• One can distinguish between two concepts of X:

∗ the actual exact predictor, which may never be observable under any circumstances.

∗ the operationally-defined exact predictor, which could be observed, albeit at great cost and effort.

• Example: X is long-term caloric intake.

∗ actual X is long-term intake over a lifetime.

∗ operationally-defined X is long-term intake since inception of the study, as measured by the best available instrument.

Lecture 2 25

IS THERE AN “EXACT” PREDICTOR? —CONTINUED

• One could probably construct hypothetical examples where the effects of actual X and of operationally-defined X are quite different, even opposite in sign.

• Obviously, a policy to change X, say to reduce intake of saturated fat or of calories, affects the actual X, not the operationally-defined X.

• Nonetheless, the operationally-defined X is all we can work with.

• "Gold standard" generally means the operationally-defined X.

∗ We will take "exact predictor" and "gold standard" to both refer to the operationally-defined X.

∗ However, for some authors, "gold standard" means the actual exact X even if not operationally-defined.

Lecture 2 26

THE BERKSON MODEL

• The classical Berkson model says that True Exposure = Observed Exposure + Mean-Zero Error:

X = W + Ub (or X = W × Ub),

• It is assumed that W and Ub are independent and E(Ub) = 0 (additive error) or E(Ub) = 1 (multiplicative error), so that

E(X|W) = W

• Compare with the classical measurement error model, where

W = X + U

and E(X|W) = λW + (1 − λ)µx. A small simulation contrasting the two error models follows below.
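An illustrative simulation (not from the slides) contrasting the two error models: under Berkson error the naive slope is unbiased, while under classical error it is attenuated.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta0, beta1 = 100_000, 0.0, 1.0

def slope(pred, resp):
    return np.cov(pred, resp)[0, 1] / np.var(pred, ddof=1)

# Classical: W = X + U, with X varying in the population
x = rng.normal(0, 1, n)
w = x + rng.normal(0, 1, n)
y = beta0 + beta1 * x + rng.normal(0, 0.3, n)
print("classical, slope of Y on W:", slope(w, y))    # ~ lambda * beta1 = 0.5

# Berkson: X = W + U_b, with W the assigned (nominal) value
w_b = rng.normal(0, 1, n)
x_b = w_b + rng.normal(0, 1, n)
y_b = beta0 + beta1 * x_b + rng.normal(0, 0.3, n)
print("berkson, slope of Y on W:", slope(w_b, y_b))  # ~ beta1 = 1, larger error about the line
```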

Lecture 2 27

THE BERKSON MODEL — CONT.

• From previous page: E(X|W) = W

• In the linear regression model,

∗ Ignoring the error still leads to unbiased intercept and slope estimates,

∗ but the error about the line is increased.

I To see this, note that

Y = β0 + β1(W + Ub) + ε = β0 + β1W + (β1Ub + ε)

• In the logistic model with normally distributed measurement error, ignoring the error

∗ Leads to a bias in the slope (and intercept):

Est. Slope ≈ True Slope / √(1 + β²1σ²u,b/2.9)

∗ For many problems, this attenuation is only minor.

Lecture 2 28

END OF LECTURE 2

Lecture 3 1

LECTURE 3: BERKSON/CLASSICAL,

STRUCTURAL/FUNCTIONAL,

REGRESSION CALIBRATION

OUTLINE

• What is Berkson? What is classical?

• Functional versus structural modeling

∗ Classical and flexible structural modeling

• Regression calibration

• Multiplicative error

Lecture 3 2

WHAT'S BERKSON? WHAT'S CLASSICAL?

• Berkson: X = W + Ub, and W and Ub are independent

• Classical: W = X + Uc, and X and Uc are independent

• In practice, it may be hard to distinguish between the classical and the Berkson error models.

∗ In some instances, neither holds exactly.

∗ In some complex situations, errors may have both Berkson and classical components, e.g., when the observed predictor is a combination of 2 or more error-prone predictors.

• Berkson model: a nominal value is assigned.

∗ Direct measures cannot be taken, nor can replicates.

• Classical error structure: direct individual measurements are taken, and can be replicated, but with variability.

Lecture 3 3

WHAT'S BERKSON? WHAT'S CLASSICAL?

• The reason why people have trouble distinguishing the two is that in real life, it

takes hard thought to do so.

Lecture 3 4

LET’S PLAY STUMP THE EXPERTS!

• Framingham Heart Study

∗ Predictor is systolic blood pressure

• All workers with the same job classification and age are assigned the same expo-

sure based on job exposure studies.

• Using a phantom, all persons of a given height and weight with a given recorded

dose are assigned the same radiation exposure.

• The radon gas concentration in houses that have been torn down is estimated by

the average concentration in a sample of nearby houses

Lecture 3 5

BERKSON ERROR IN NONLINEAR MODELS

• Suppose that Y = f(X, β) + ε

• Then

Y = f(W, β) + (1/2)f″(W, β)σ²b + {f′(W, β)Ub + (1/2)f″(W, β)(U²b − σ²b) + ε}

∗ Here

f′(W, β) = ∂f(W, β)/∂W

and so forth

∗ The nature of the bias (the term (1/2)f″(W, β)σ²b) depends on the sign of f″(W, β)

∗ In the linear case f″(W, β) = 0 and we get the previous result: no bias

Lecture 3 6

MIXTURES OF BERKSON AND CLASSICAL ERROR

• Let L be the long-term average radon concentration for all houses in a neighborhood

• X is the long-term average radon concentration in a specific house in this neighborhood

• W is the average measured radon concentration for a sample of houses in this neighborhood

• Model of Tosteson and Tsiatis (1988)

X = L + Ub

W = L + Uc

L, Ub, Uc are independent

∗ Used by Reeves, Cox, Darby, and Whitley (1998) and Mallick, Hoffman, and Carroll (2002)

Lecture 3 7

MIXTURES OF BERKSON AND CLASSICAL ERROR

• From previous page: Model of Tosteson and Tsiatis (1988)

X = L + Ub

W = L + Uc

• Uc = 0 ⇒ X = W + Ub ⇒ Berkson

• Ub = 0 ⇒ W = X + Uc ⇒ Classical

• More generally,

W = X + Uc − Ub

and Uc is independent of X while Ub is independent of W

Lecture 3 8

FUNCTIONAL AND STRUCTURAL MODELING CLASSICAL ERROR

MODELS

• The common linear regression texts make the distinction:

∗ Functional: X's are fixed constants

∗ Structural: X's are random variables

• If you pretend that the X's are fixed constants, it seems plausible to try to estimate them as well as all the other model parameters.

• This is the functional maximum likelihood estimator.

∗ Every textbook has the linear regression functional maximum likelihood estimator.

• Unfortunately, the functional MLE in nonlinear problems has two defects.

∗ It's really nasty to compute.

∗ It's a lousy estimator (badly inconsistent).

• Structural ⇒ empirical Bayes type analysis

Lecture 3 9

FUNCTIONAL AND STRUCTURAL MODELING CLASSICAL ERROR

MODELS

• More useful distinction:

∗ Functional modeling: No assumptions made about the X's (could be random or fixed)

∗ Classical structural modeling: Strong parametric assumptions made about the distribution of X. Generally normal, lognormal or gamma.

∗ Flexible structural modeling: Structural, but with a flexible parametric family. Tries to get the best of both worlds.

• Flexible is similar in spirit to Quasi-structural modeling (Pierce et al., 1992)

∗ Assume that the distribution of X is the (unknown) empirical distribution of the X values in the sample

Lecture 3 10

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• Key questions:

∗ How sensitive are inferences to an assumed STRUCTURAL model?

∗ How much does it "cost" to be functional?

• FUNCTIONAL: No need to perform extensive sensitivity analyses.

• Many functional methods are simple to implement (and some are computed using little more than standard software).

• Functional modeling focuses emphasis on the error model.

• Because of "latent-model" robustness, a functional analysis serves as a useful check on a parametric structural model.

Lecture 3 11

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• Structural models can be viewed as an empirical Bayes method of dealing with a large number of nuisance parameters (the true covariate values) (Whittemore, 1989)

• Best known functional methods can be inefficient (missing data, thresholds, nonparametric regression).

Lecture 3 12

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• One should consider the distribution of the true X's

∗ Example: A-bomb survivor study (Pierce et al., 1992):

I All subjects survived the acute effects

I This provides some information

I A high radiation measurement is likely to be due to a high measurement error

∗ This information can, perhaps, be captured during structural modeling

I The distribution of true doses is different among survivors than among all exposed to the attacks

Lecture 3 13

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• My opinion is that one should use structural models

∗ But care is needed when modeling the distribution of X

Lecture 3 14

SOME FUNCTIONAL METHODS

• Regression Calibration/Substitution

∗ Replaces the true exposure X by an estimate of it based only on covariates, but not on the response.

∗ In the linear model with additive errors, this is the classical correction for attenuation.

∗ In the Berkson model, one simply ignores the measurement error.

• SIMEX is a fairly general functional method.

∗ It assumes only that you have an error model and that you can "add on" measurement error to make the problem worse.

Lecture 3 15

REGRESSION CALIBRATION — BASIC IDEAS

• Key idea: replace the unknown X by E(X|Z, W), which depends only on the known (Z, W).

∗ This provides an approximate model for Y in terms of (Z, W).

∗ Called the "conditional expectation approach" by Lyles and Kupper (1997)

• Generally applicable.

∗ Depends on the measurement error being "not too large" in order for the approximation to be sufficiently accurate.

∗ For some models, if var(X|Z, W) is constant then the only bias due to the regression calibration approximation is in the intercept parameter.

Lecture 3 16

REGRESSION CALIBRATION — BASIC IDEAS

Why does regression calibration work?

• X = E(X|Z, W) + U, where U is uncorrelated with any function of (Z, W)

• If

Y = Xᵀβx + Zᵀβz + ε

then

Y = E(X|Z, W)ᵀβx + Zᵀβz + (Uᵀβx + ε)

where (Uᵀβx + ε) is uncorrelated with the regressors Z and E(X|Z, W)

• Therefore, regression of Y on E(X|Z, W) and Z gives unbiased estimates

– E(X|Z, W) could be replaced by the linear regression (best linear predictor) of X using (Z, W)

Lecture 3 17

THE REGRESSION CALIBRATION ALGORITHM

• The general algorithm is:

∗ Using replication, validation, or instrumental data, develop a model E(X|Z, W) = m(Z, W, γcm) and estimate γcm.

∗ Replace X by m(Z, W, γ̂cm) and run your favorite analysis.

∗ Obtain standard errors by the bootstrap or the "sandwich method."

I Can automatically correct for uncertainty about γcm

• In linear regression, regression calibration is equivalent to the "correction for attenuation." A minimal sketch of the algorithm follows below.
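A hedged sketch of the algorithm for the simplest case — a classical error model with two replicates, no Z, and a linear outcome model; the data and helper function below are illustrative, not from the slides.

```python
import numpy as np

def regression_calibration(w1, w2, y):
    """w1, w2: two replicate error-prone measurements; y: outcome (linear model)."""
    w_bar = (w1 + w2) / 2.0
    sigma2_u = np.mean((w1 - w2) ** 2) / 2.0            # error variance from differences
    sigma2_x = np.var(w_bar, ddof=1) - sigma2_u / 2.0   # var(W_bar) = sigma_x^2 + sigma_u^2/2
    lam = sigma2_x / (sigma2_x + sigma2_u / 2.0)
    x_hat = w_bar.mean() + lam * (w_bar - w_bar.mean()) # estimated E(X | W_bar) under normality
    return np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 500)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 500)
w1, w2 = x + rng.normal(0, 1, 500), x + rng.normal(0, 1, 500)
print(regression_calibration(w1, w2, y))   # roughly 2, versus an attenuated naive fit on W_bar
```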

Lecture 3 18

THE REGRESSION CALIBRATION ALGORITHM

• Easily adjusts for different amounts of replication

• If W̄i is the average of mi replicates then

λi = σ²x / (σ²x + σ²u/mi)

E(Xi|W̄i) = λiW̄i + (1 − λi)µx

Lecture 3 19

LOGISTIC REGRESSION, NORMAL X

• Consider the logistic regression model

Pr(Y = 1|X) = {1 + exp(−β0 − βxX)}⁻¹ = H(β0 + βxX).

• Suppose that

∗ X and U are normally distributed.

∗ Then X given W is normal with mean E(X|W) = µx(1 − λ) + λW and variance λσ²u.

• If the event is "rare", then H(β0 + βxX) ≈ exp(β0 + βxX).

• Then, by using moment generating functions, the observed data follow

Pr(Y = 1|W) = E[{Pr(Y = 1|X, W)}|W] = (why?) E[{Pr(Y = 1|X)}|W]

≈ E{exp(β0 + βxX)|W} = exp{β0 + (1/2)β²xλσ²u + βxE(X|W)}

≈ H{β0 + (1/2)β²xλσ²u + βxE(X|W)}.

• Typically, the regression calibration approximation works fine in this case.

Lecture 3 20

LOGISTIC REGRESSION, NORMAL X

• A different approximation uses the fact that H(x) ≈ Φ(kx) with k = 0.588 ≈ 1/1.70

[Figure: normal (Φ(kx)) versus logistic (H(x)) curves; error = {Φ(kx) − H(x)} and relative error = error/H(x).]

Lecture 3 21

LOGISTIC REGRESSION, NORMAL X

Pr(Y = 1|X) = Φ{k(β0 + βxX)} = Pr{U ≤ k(β0 + βxX)}

Therefore

Pr(Y = 1|W) = E[Pr{U ≤ k(β0 + βxX)}|W]

= Pr[U − βxkUb ≤ k{β0 + βxE(X|W)}]

= Φ[ k{β0 + βxE(X|W)} / (1 + β²xk²σ²X|W)^{1/2} ]

≈ H{ (β0 + βxE(X|W)) / (1 + β²xk²σ²X|W)^{1/2} }

Here

Ub ∼ N(0, σ²X|W) = N(0, λσ²u)

Lecture 3 22

LOGISTIC REGRESSION, NORMAL X

A third approximation is

Pr(Y = 1|W) = E[H{β0 + βx(E(X|W) + Ub)}|W]

≈ H{β0 + βxE(X|W)} + (1/2)β²x H″{β0 + βxE(X|W)}σ²X|W

Note that

H″(x) > 0 if x < 0

H″(x) < 0 if x > 0

• This approximation is applicable to most models, not just logistic regression

Lecture 3 23

ESTIMATING THE CALIBRATION FUNCTION

• Need to estimate E(X|Z, W).

∗ How this is done depends, of course, on the type of auxiliary data available.

• Easy case: validation data

∗ suppose one has internal validation data.

∗ then one can simply regress X on (Z, W) and transport the model to the non-validation data.

∗ of course, for the validation data one regresses Y on (Z, X), and this estimate must be combined with the one from the non-validation data.

• The same approach can be used for external validation data, but with the usual concern for non-transportability.

Lecture 3 24

ESTIMATING THE CALIBRATION FUNCTION: INSTRUMENTAL DATA: ROSNER'S METHOD

• Internal unbiased instrumental data:

∗ suppose E(T|Z, X) = E(T|Z, X, W) = X, so that T is an unbiased instrument.

∗ If T is expensive to measure, then T might be available for only a subset of the study. W will generally be available for all subjects.

∗ then E(T|Z, W) = E{E(T|Z, X, W)|Z, W} = E(X|Z, W).

• Thus, T regressed on (Z, W) follows the same model as X regressed on (Z, W), although with greater variance.

∗ So one regresses T on (Z, W) to estimate the parameters in the regression of X on (Z, W).

ROSNER'S METHOD, CONT.

• More formally, suppose we have the model Wi = γ0,em + γᵗx,em Xi + γᵗz,em Zi + Ui (error calibration model)

• Under joint normality, this implies a calibration model

Xi = γ0,cm + γᵗw,cm Wi + γᵗz,cm Zi + Ucm,i

∗ Sometimes one can use validation data to regress X on Z and W

∗ Suppose we only have an "alloyed gold standard" Ti = Xi + UT,i on a "validation sample"

I then regress T on Z and W

I This works! — think of Ti = Xi + UT,i as outcome measurement error

Lecture 3 26

ESTIMATING THE CALIBRATION FUNCTION: REPLICATION DATA

• Suppose that one has unbiased internal replicate data:

∗ Wij = Xi + Uij, i = 1, . . . , n and j = 1, . . . , ki, where E(Uij|Zi, Xi) = 0.

∗ W̄i· := (1/ki) Σj Wij.

∗ Notation: µz is E(Z), Σxz is the covariance (matrix) between X and Z, etc.

• Use standard least squares theory to get the best linear unbiased predictor of X from (W̄, Z):

E(X|Z, W̄) ≈ µx + (Σxx  Σᵗxz) ( Σxx + Σuu/k  Σxz ; Σᵗxz  Σzz )⁻¹ ( W̄ − µw ; Z − µz )

(best linear approximation = exact conditional expectation under joint normality).

• Need to estimate the unknown µ's and Σ's.

• Essentially a flexible structural method, since the existence of µx and Σxx is assumed.

Lecture 3 27

ESTIMATING THE CALIBRATION FUNCTION: REPLICATION DATA, CONTINUED

• µ̂z and Σ̂zz are the "usual" estimates, since the Z's are observed.

• µ̂x = µ̂w = Σⁿi=1 ki W̄i· / Σⁿi=1 ki.

• Σ̂xz = Σⁿi=1 ki (W̄i· − µ̂w)(Zi − µ̂z)ᵗ / ν, where ν = Σ ki − Σ k²i / Σ ki.

• Σ̂uu = Σⁿi=1 Σj (Wij − W̄i·)(Wij − W̄i·)ᵗ / Σⁿi=1 (ki − 1).

• Σ̂xx = [ {Σⁿi=1 ki (W̄i· − µ̂w)(W̄i· − µ̂w)ᵗ} − (n − 1)Σ̂uu ] / ν.

A scalar sketch of these estimators follows below.
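A scalar-X, scalar-Z sketch of the slide's estimators on simulated replicate data; the data, seed, and the helper e_x_given_zw are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
z = rng.normal(0, 1, n)
x = 0.5 * z + rng.normal(0, 1, n)
k = rng.integers(2, 4, n)                                # 2-3 replicates per subject
w = [x[i] + rng.normal(0, 0.8, k[i]) for i in range(n)]
w_bar = np.array([wi.mean() for wi in w])

mu_z, sig_zz = z.mean(), np.var(z, ddof=1)
mu_x = (k * w_bar).sum() / k.sum()                       # equals mu_w
nu = k.sum() - (k ** 2).sum() / k.sum()
sig_uu = sum(((wi - wi.mean()) ** 2).sum() for wi in w) / (k - 1).sum()
sig_xz = (k * (w_bar - mu_x) * (z - mu_z)).sum() / nu
sig_xx = ((k * (w_bar - mu_x) ** 2).sum() - (n - 1) * sig_uu) / nu

def e_x_given_zw(wbar_i, z_i, k_i):
    """Best linear predictor of X from (W_bar, Z), per the slide's approximation."""
    top = np.array([sig_xx, sig_xz])
    mid = np.array([[sig_xx + sig_uu / k_i, sig_xz], [sig_xz, sig_zz]])
    return mu_x + top @ np.linalg.solve(mid, np.array([wbar_i - mu_x, z_i - mu_z]))

print(e_x_given_zw(w_bar[0], z[0], k[0]))
```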

Lecture 3 28

REGRESSION CALIBRATION VERSUS CORRECTING THE NAIVE ESTIMATOR FOR ATTENUATION

• If

g{E(Y|X)} = β0 + β1X (GLM)

and

E(X|W) = α0 + λW

then these two slope estimators are equivalent:

∗ Regress Y on E(X|W) (e.g., Rosner, Spiegelman, Willett, 1989)

∗ Regress Y on W and divide the slope by λ (e.g., Carroll, Ruppert, Stefanski, 1995)

• Equivalence does not hold in general, e.g., for multiplicative error, when E(X|W) is nonlinear.

Lecture 3 29

MULTIPLICATIVE ERROR

• The multiplicative lognormal error model is

W = X U, log(U) ∼ N(µu, σ²u) (1)

• If log(X) is normal then we can convert (1) to a model for X given W:

log(X) = α + λ log(W) + Ub

where Ub is N(0, σ²b) and independent of W

• If Y were linear in log(X) then we would work with log(X) and have an additive error model

∗ But typically it is assumed that Y is linear in X, not log(X)

Lecture 3 30

MULTIPLICATIVE ERROR

• The multiplicative error model with outcome linear in exposure is well supported by empirical work:

∗ Lyles and Kupper (1997, Biometrics):

I "there is much evidence for this model"

I "the more biologically relevant predictor is true mean exposure on the original scale"

Lecture 3 31

MULTIPLICATIVE ERROR

∗ Pierce, Stram, Vaeth, Schafer (1992, JASA)

I data from Radiation Effects Research Foundation (RERF) in Hiroshima

I “On both radiobiological and empirical grounds, focus is on models where

the expected response is linear or quadratic in dose rather than linear on

logistic and logarithmic scales.”

I “It is accepted that radiation dose-estimation errors are more homogeneous

on a multiplicative than on an additive scale”

I “The distribution of true doses is extremely nonnormal”

I However, it seemed unlikely that a lognormal model for the true doses would

fit as well as the Weibull model that they used

Lecture 3 32

MULTIPLICATIVE ERROR

∗ Seychelles study (preliminary analysis of Thurston)

I validation data (brain versus maternal hair MeHg): error appears to be mul-

tiplicative and lognormal

I researchers use MeHg concentration, not log concentration, as the exposure

Lecture 3 33

MULTIPLICATIVE ERROR

• From previous page:

log(X) = α + λ log(W) + Ub (2)

• From (2),

X = W^λ exp(Ub)

and

E(X|W) = W^λ exp(α + σ²b/2)

Not linear in W

Lecture 3 34

MULTIPLICATIVE ERROR

• Therefore if E(Y|X) = β0 + β1X then

E(Y|W) = β0 + {β1 exp(α + σ²b/2)} W^λ

• Therefore, regress Y on W^λ and divide the slope estimate by exp(α + σ²b/2)

∗ This is regression calibration again (a simulation sketch follows below)

• Note that the regression of Y on W is not linear even though the regression of Y on X is linear

∗ This is because the regression of X on W is not linear

MULTIPLICATIVE ERROR

[Figure: Simulation, n = 50 — outcome versus the true X and versus the surrogate W. W = XU, log(X) ∼ N(0, 1/4), log(U) ∼ N(−1/8, 1/4), Y = X/2 + N(0, 0.04).]

Lecture 3 36

MULTIPLICATIVE ERROR

[Figure: n = 10000 — outcome versus W, with the fit to the X–Y relationship and the fit to the W–Y relationship. W = XU, log(X) ∼ N(0, 1/4), log(U) ∼ N(−1/8, 1/4), Y = X/2 + N(0, 0.04).]

Y = X/2 + N(0, 0.04)

Lecture 3 37

MULTIPLICATIVE ERROR

[Figure: n = 10000 — outcome versus W^{1/2} with its fit. W = XU, log(X) ∼ N(0, 1/4), log(U) ∼ N(−1/8, 1/4), Y = X/2 + N(0, 0.04). Note: λ = 1/2.]

Lecture 3 38

MULTIPLICATIVE ERROR

Other work

• Hwang (1986) regresses Y on W (not W^λ) and then corrects

∗ This is the "correction method" that is equivalent to regression calibration only when E(X|W) is linear in W — which is not the case here

∗ Consistent but badly biased in the simulations of Lyles and Kupper

∗ Bias is still noticeable for n = 10,000

∗ The previous plots suggest why

∗ These results are consistent with those of Iturria, Carroll, and Firth (1999)

Lecture 3 39

MULTIPLICATIVE ERROR

Other work

• Lyles and Kupper (1999) propose a method based on quasi-likelihood

∗ somewhat superior to regression calibration in a simulation study

I especially when measurement error is large relative to equation error (plus

measurement error inY )

∗ it is a weighted version of regression calibration

I more efficient than regression calibration because it weights according to

inverse conditional variances

∗ a special case of expanded regression calibration (Carroll, Ruppert, Stefanski,

1995)

Lecture 3 40

MULTIPLICATIVE ERROR

Other work

• Iturria, Carroll, and Firth (1999) study polynomial regression with multiplicative

error

∗ One of their general methods is a special case of Hwang’s

∗ Their “partial regression” estimator assumes lognormality and generalizes the

estimator discussed earlier

∗ In a simulation with lognormal X and U, the partial regression estimator is superior to the ones that make fewer assumptions

I this is similar to the results of Lyles and Kupper

Lecture 3 41

TRANSFORMATIONS TO ADDITIVE ERROR

• Model is h(W) = h(X) + U

∗ Includes additive and multiplicative error as special cases

• Eckert, Carroll, and Wang (1997) consider parametric (power) and nonparametric (monotonic cubic spline) models for h

∗ can transform to a specific error distribution

∗ could, instead, transform to constant error variance

∗ if the model holds, then both methods have the same target transformation

∗ it could be more efficient to use both sources of information (Ruppert and Aldershof, 1989)

• Nusser et al. (1996) use transformations to estimate the distribution of X, where X is the daily intake of a dietary component

Lecture 3 42

END OF LECTURE 3

Lecture 4 1

LECTURE 4: SIMEX AND INSTRUMENTAL VARIABLES

OUTLINE

• The Key Idea Behind SIMEX (Simulation/Extrapolation)

• An Empirical Version of SIMEX

• Details of Simulation Extrapolation Algorithm

• Example: Measurement Error in SBP in Framingham Study

• IV (Instrumental Variables): Rationale

• IV via prediction and the IV algorithm

• Example: Calibrating a FFQ

• Example: CHD and Cholesterol

Lecture 4 2

ABOUT SIMULATION EXTRAPOLATION

• A functional method

∗ no assumptions about the trueX values

• For bias reduction and variance estimation

∗ like bootstrap and jackknife

• Not model dependent

∗ like bootstrap and jackknife

• Handles complicated problems

• Computer intensive

• Approximate, less efficient for certain problems

Lecture 4 3

THE KEY IDEA

• The effects of measurement error on an estimator can be studied with a simulation experiment in which additional measurement error is added to the measured data and the estimator recalculated.

• The "response variable" is the estimator under study

• The "independent factor" is the measurement error variance

∗ "Factor levels" are the variances of the added measurement errors

• The objective is to study how the estimator depends on the variance of the measurement error

Lecture 4 4

OUTLINE OF THE ALGORITHM

• Add measurement error to the variable measured with error

∗ Λ controls the amount of added measurement error

∗ σ²u is increased to (1 + Λ)σ²u

∗ Average over many simulations to remove Monte Carlo variation

• Recalculate the estimates — called pseudo-estimates

• Plot the Monte Carlo average of the pseudo-estimates versus Λ

• Extrapolate to Λ = −1

∗ Λ = −1 corresponds to the case of no measurement error

Lecture 4 5

[Figure: Illustration of SIMEX — the naive estimate plotted against the measurement error variance: your estimate when you ignore measurement error.]

Lecture 4 6

[Figure: Illustration of SIMEX — what happens to your estimate when you have more error, which you add on by simulation, but you still ignore the error.]

Lecture 4 7

[Figure: Illustration of SIMEX — what statistician can resist fitting a curve?]

Lecture 4 8

[Figure: Illustration of SIMEX — now extrapolate to the case of no measurement error to obtain the SIMEX estimate.]

Lecture 4 9

AN EMPIRICAL VERSION OF SIMEX: FRAMINGHAM DATA EXAMPLE

• Data

∗ Y = indicator of CHD

∗ Wk = SBP at Exam k, k = 1, 2

∗ X = "true" SBP

∗ Data:

(Yj, W1,j, W2,j), j = 1, . . . , 1660

• Model Assumptions

∗ W1, W2 | X iid N(X, σ²u)

∗ Pr(Y = 1 | X) = H(α + βX), H logistic

Lecture 4 10

FRAMINGHAM DATA EXAMPLE: THREE NAIVE ANALYSES

• Regress Y on W̄• ↦ β̂A

• Regress Y on W1 ↦ β̂1

• Regress Y on W2 ↦ β̂2

Λ   | Measurement Error Variance = (1 + Λ)σ²u/2 | Slope Estimate
−1  | 0                                          | ?
0   | σ²u/2                                      | β̂A
1   | σ²u                                        | β̂1, β̂2

Lecture 4 11

Logistic regression fits in Framingham using

first replicate, second replicate and average of both

Lecture 4 12

A SIMEX–type plot for the Framingham data,

where the errors are not computer–generated.

Lecture 4 13

A SIMEX–type extrapolation for the Framingham data,

where the errors are not computer–generated.

Lecture 4 14

SIMULATION AND EXTRAPOLATION STEPS: ADDING MEASUREMENT ERROR

• Framingham Example:

∗ W̄• ↦ W1 = W̄• + (W1 − W2)/2

∗ W̄• ↦ W2 = W̄• + (W2 − W1)/2

• In General:

∗ Best Estimator(data) ↦ Best Estimator(data + √Λ × {independent N(0, σ²u) error})

∗ Λ controls the amount (variance) of added measurement error

Lecture 4 15

SIMULATION AND EXTRAPOLATION ALGORITHM: ADDING MEASUREMENT ERROR (DETAILS)

• For Λ ∈ {Λ1, . . . , ΛM}

• For b = 1, . . . , B, compute:

∗ the bth pseudo data set

Wb,i(Λ) = Wi + √Λ × Normal(0, σ²u)b,i

∗ the bth pseudo estimate

θ̂b(Λ) = θ̂({Yi, Wb,i(Λ)}ⁿ₁)

∗ the average of the pseudo estimates

θ̂(Λ) = B⁻¹ Σᴮb=1 θ̂b(Λ) ≈ E(θ̂b(Λ) | {Yj, Xj}ⁿ₁)

Lecture 4 16

SIMULATION AND EXTRAPOLATION STEPS: EXTRAPOLATION

• Framingham Example: (two points, Λ = 0, 1)

∗ Linear Extrapolation — a + bΛ

• In General: (multiple Λ points)

∗ Linear — a + bΛ

∗ Quadratic — a + bΛ + cΛ²

∗ Rational Linear — (a + bΛ)/(c + Λ) (exact for linear regression)

Lecture 4 17

SIMULATION AND EXTRAPOLATION ALGORITHM: EXTRAPOLATION (DETAILS)

• Plot θ̂(Λ) vs Λ (Λ > 0)

• Extrapolate to Λ = −1 to get θ̂(−1) = θ̂SIMEX

• For certain models, and assuming the extrapolant function is chosen correctly,

E{θ̂(−1) | True Data} = θTRUE

A minimal SIMEX sketch follows below.
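A minimal SIMEX sketch under stated assumptions: classical additive normal error with known σ²u, a quadratic extrapolant, and simple linear regression as the naive estimator. The data, grid and seed are illustrative, not from the slides.

```python
import numpy as np

def naive_slope(w, y):
    return np.cov(w, y)[0, 1] / np.var(w, ddof=1)

def simex_slope(w, y, sigma2_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200, seed=0):
    rng = np.random.default_rng(seed)
    lam_grid = [0.0] + list(lambdas)
    means = []
    for lam in lam_grid:
        if lam == 0.0:
            est = [naive_slope(w, y)]
        else:   # pseudo data sets with extra N(0, lam*sigma2_u) error added on
            est = [naive_slope(w + np.sqrt(lam * sigma2_u) * rng.normal(size=w.size), y)
                   for _ in range(B)]
        means.append(np.mean(est))
    coef = np.polyfit(lam_grid, means, 2)          # quadratic extrapolant
    return np.polyval(coef, -1.0)                  # extrapolate to Lambda = -1

rng = np.random.default_rng(6)
x = rng.normal(0, 1, 2000)
y = 1.0 * x + rng.normal(0, 0.3, 2000)
w = x + rng.normal(0, 1, 2000)
# naive ~ 0.5; quadratic SIMEX moves it part of the way back toward the true slope 1
print(naive_slope(w, y), simex_slope(w, y, sigma2_u=1.0))
```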

Lecture 4 18

EXAMPLE: MEASUREMENT ERROR IN SYSTOLIC BLOOD PRESSURE

• Framingham Data: (Yj, Agej, Smokej, Cholj, WA,j), j = 1, . . . , 1615

∗ Y = indicator of CHD

∗ Age (at Exam 2)

∗ Smoking Status (at Exam 1)

∗ Serum Cholesterol (at Exam 3)

∗ Transformed SBP

WA = (W1 + W2)/2, Wk = ln(SBP − 50) at Exam k

Lecture 4 19

EXAMPLE: PARAMETER ESTIMATION

• Consider logistic regression of Y on Age, Smoke, Chol and SBP, with transformed SBP measured with error

∗ The plots on the following page illustrate the simulation extrapolation method for estimating the parameters in the logistic regression model

Lecture 4 20

[Figure: SIMEX extrapolation plots of the coefficients for Age, Smoking, Cholesterol, and Log(SBP−50) versus Λ; the labeled values are the naive and SIMEX estimates (e.g., 1.71 and 1.93 for Log(SBP−50)).]

Lecture 4 21

INSTRUMENTAL VARIABLES: RATIONALE

• Remember, W = X + U, U ∼ Normal(0, σ²u).

• The most direct and efficient way to get information about σ²u is to observe X on a subset of the data.

• The next best way is via replication, namely to take ≥ 2 independent replicates

∗ W1 = X + U1 and W2 = X + U2.

∗ If these are indeed replicates, then we can estimate σ²u via a components-of-variance analysis.

• The third method is to use Instrumental Variables.

∗ Sometimes X cannot be observed and replicates cannot be taken.

∗ Then IVs can help.

Lecture 4 22

WHAT IS AN INSTRUMENTAL VARIABLE?

Y = β0 + βxX + ε;

W = X + U;

U ∼ Normal(0, σ²u).

• In linear regression, an instrumental variable T is a random variable which has three properties:

∗ T is independent of ε

∗ T is independent of U

∗ T is related to X.

∗ You only measure T to get information about measurement error: it is not part of the model.

∗ In our parlance, T is a surrogate for X!

• Whether T qualifies as an instrumental variable can be a difficult question.

Lecture 4 23

AN EXAMPLE : CALIBRATING A QUESTIONNAIRE

X = usual (long–term) average intake of Fat (log scale);

Y = Fat as measured by a questionnaire;

W = Fat as measured by 6 days of 24–hour recalls

T = Fat as measured by a diary record

• In this example, the time ordering was:

∗ Questionnaire

∗ Then one year later, the recalls were done fairly close together in time.

∗ Then 6 months later, the diaries were measured.

• One could think of the recalls as replicates, but some researchers have worriedthat substantial correlations exist, i.e., they are notindependentreplicates.

• The 6–month gap with the recalls and the 18–month gap with the questionnairemakes the diary records a good candidate for an instrument.

Lecture 4 24

USING INSTRUMENTAL VARIABLES: MOTIVATION

• In what follows, we will use underscores to denote what coefficients go where.

• For example,βY |1X is the coefficient forX in the regression ofY onX.

• Let's do a little algebra:

Y = βY|1X + βY|1X X + ε;

W = X + U;

(ε, U) = independent of T.

• This means

E(Y | T) = βY|1T + βY|1T T = βY|1X + βY|1X E(X | T)

= βY|1X + βY|1X E(W | T) = βY|1T + βY|1X βW|1T T.

Lecture 4 25

MOTIVATION — CONT.

• From previous page

E(Y | T) = βY|1T + βY|1X βW|1T T.

• We want to estimate βY|1X

∗ From above,

βY|1T = βY|1X βW|1T

∗ Equivalently,

βY|1X = βY|1T / βW|1T.

∗ Regress Y on T and divide its slope by the slope of the regression of W on T!

Lecture 4 26

THE DANGERS OF A WEAK INSTRUMENT

• Remember that we get the IV estimate using the relationship

βY|1X = βY|1T / βW|1T.

• The division causes increased variability.

∗ If the instrument is very weak, the slope βW|1T will be near zero.

∗ This will make the IV estimate very unstable.

IV VIA PREDICTION

• Here's another way to do the same thing and get the IV estimator without doing the division explicitly:

∗ Regress W on T.

∗ Form the predicted values: β̂W|1T + β̂W|1T T.

∗ Regress Y on these predicted values.

∗ The slope is the instrumental variables estimate (a sketch follows below).
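A sketch of the IV-via-prediction steps for simple linear regression on simulated data; T, X, W and Y below are illustrative and constructed so that T satisfies the instrument conditions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
t = rng.normal(0, 1, n)                    # instrument
x = 0.8 * t + rng.normal(0, 0.6, n)        # true exposure, related to T
w = x + rng.normal(0, 1, n)                # classical measurement error
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)  # outcome model, true slope 2

def ols(pred, resp):
    b = np.cov(pred, resp)[0, 1] / np.var(pred, ddof=1)
    return resp.mean() - b * pred.mean(), b

# Steps 1-2: regress W on T and form the predicted values
a_wt, b_wt = ols(t, w)
w_hat = a_wt + b_wt * t
# Steps 3-4: regress Y on the predicted values; the slope is the IV estimate
_, beta_iv = ols(w_hat, y)
_, beta_naive = ols(w, y)
print("naive:", beta_naive, "IV:", beta_iv)   # naive attenuated, IV near 2
```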

Lecture 4 28

LOGISTIC REGRESSION EXAMPLE

• These data come from a paper by Satten and Kupper.

Y = Evidence of Coronary Heart Disease (binary)

W = Cholesterol Level

T = LDL

Z = Age and Smoking Status

• It's not particularly clear that T is really an instrument. While it may well be uncorrelated with the error in W and the random variation in Y, it might also be included as part of the model itself!

∗ There are no replicates here to compare with.

• Note here that we have added variables, Z, which are measured without error.

• The algorithm does not change: all regressions are done with Z as an extra covariate.

Lecture 4 29

IV ALGORITHM: NON-GAUSSIAN OUTCOME

• The IV algorithm in linear regression is:

STEP 1: Regress W on T and Z (may be a multivariate regression)

STEP 2: Form the predicted values of this regression

STEP 3: Regress Y on the predicted values and Z.

STEP 4: The regression coefficients are the IV estimates.

• Only Step 3 changes if you do not have linear regression but instead have logistic regression or a GLM.

∗ Then the "regression" is logistic or GLM.

∗ Very simple to compute.

∗ Easily bootstrapped.

• This method is "valid" in GLMs to the extent that regression calibration is valid.

Lecture 4 30

LOGISTIC REGRESSION EXAMPLE: RESULTS

• We used LDL/100, Cholesterol/100, Age/100

• The naive logistic regression leads to the following analysis:

∗ Slope and bootstrap s.e. for Cholesterol: 0.65 and 0.29

∗ Slope and bootstrap s.e. for Smoking: 0.065 and 0.260

∗ Slope and bootstrap s.e. for Age: 7.82 and 4.26

• The IV logistic regression leads to the following analysis:

∗ Slope and bootstrap s.e. for Cholesterol: 0.91 and 0.33

∗ Slope and bootstrap s.e. for Smoking: 0.056 and 0.259

∗ Slope and bootstrap s.e. for Age: 7.79 and 4.31

• Note here how only the coefficient for cholesterol is affected by the measurement error.

Lecture 4 31

END OF LECTURE 4

Lecture 5 1

LECTURE 5: BAYESIAN METHODS

OUTLINE

• Likelihood: Outcome, measurement, and exposure models

• Priors and posteriors

• MCMC

• Example: linear regression

• Example: Mixtures of Berkson and classical error and application to thyroid can-

cer and exposure to fallout

Lecture 5 2

BAYESIAN METHODS

Outcome model: (true disease model)

f(Y|X, Z, θO)

Could be a GLM or a "nonparametric" spline model

Measurement model: (measurement error model)

f(W|Y, X, Z, θM) = f(W|X, Z, θM) if nondifferential (which is assumed here)

Could be a classical Gaussian measurement error model

Exposure model: (predictor distribution model)

f(X|Z, θX) f(Z|θZ) (usually f(Z|θZ) is ignored)

Could be a low-dimensional parametric model or a more flexible model (e.g., a normal mixture model)

Lecture 5 3

PRIOR

π(θO), π(θM), π(θX), π(θZ) (usually π(θZ) is ignored)

Lecture 5 4

LIKELIHOOD

f(Y, W|θO, θM, θX, Z) = ∫ f(Y|X, Z, θO) f(W|X, Z, θM) f(X|Z, θX) dX

Major problem: computing the integral

We will go in another direction — Bayesian analysis by MCMC.

Lecture 5 5

POSTERIOR AND MCMC

f(θO, θM, θX, X|Y, W, Z) ∝ f(Y, W, X, θO, θM, θX|Z)

= f(Y|X, Z, θO) f(W|X, Z, θM) f(X|Z, θX) π(θO) π(θM) π(θX)

MCMC — for example, we might sample successively from:

1. [θO | others]

2. [θM | others]

3. [θX | others]

4. [X | others]

Lecture 5 6

MCMC: LINEAR REGRESSION WITH REPLICATE MEASUREMENTS

[Yi|Xi, Zi, β, σε] ∼ N(Zᵀi βz + Xiβx, σ²ε) (outcome model)

[Wi,j|Xi] ∼ IN(Xi, σ²u), j = 1, . . . , Ji (measurement model)

[Xi|Zi] = [Xi] ∼ N(µx, σ²x) (exposure model)

[βx] = N(0, σ²β), [βz] = N(0, σ²βI), [µx] = N(0, σ²µ)

[σ²ε] = IG(δ, δ), [σ²u] = IG(δ, δ), [σ²x] = IG(δ, δ),

σβ and σµ are "large" and δ is "small"

unknowns: (βx, βz, σε), (σu), (X1, . . . , XN), (µx, σx)

Lecture 5 7

MCMC: LINEAR REGRESSION — LIKELIHOOD

Let

Ci = (Zi, Xi)ᵀ, Y = (Y1, . . . , YN)ᵀ, and β = (βz, βx)ᵀ

Likelihood:

[Yi, Wi,1, . . . , Wi,Ji | others] ∝ {1/(σε σu^Ji)} exp[ −(1/2){ (Yi − Cᵀi β)²/σ²ε + Σ^{Ji}_{j=1}(Wi,j − Xi)²/σ²u } ]

Lecture 5 8

MCMC: LINEAR REGRESSION — POSTERIOR FOR β

[β|others] ∝ exp{ −(1/(2σ²ε)) Σᴺi=1 (Yi − Cᵀi β)² − (1/(2σ²β)) ‖β‖² }

Let C have ith row Cᵀi and let λ = σ²ε/σ²β. Then

[β|others] = N( {CᵀC + λI}⁻¹ CᵀY, σ²ε (CᵀC + λI)⁻¹ )

• Exactly what we get for linear regression without measurement error.

∗ Except now C will vary on each iteration of the MCMC, since it contains imputed Xs.

Lecture 5 9

MCMC: LINEAR REGRESSION — POSTERIOR FOR µx

[µx|others] ∝ exp{ −Σᴺi=1 (Xi − µx)²/(2σ²x) − µ²x/(2σ²µ) }

Let σ²X̄ = σ²x/N. Then

[µx|others] ∼ N( (1/σ²X̄ + 1/σ²µ)⁻¹ (X̄/σ²X̄ + 0/σ²µ), (1/σ²X̄ + 1/σ²µ)⁻¹ )

≈ N(X̄, σ²X̄) if σµ is large

Lecture 5 10

MCMC: LINEAR REGRESSION — POSTERIOR FOR Xi

Define

σ²w,i = σ²u/Ji

Then

[Xi|others] ∝ exp[ −(1/2){ (Yi − Xiβx − Zᵀi βz)²/σ²ε + (Xi − µx)²/σ²x + (W̄i − Xi)²/σ²w,i } ]

After some algebra, [Xi|others] is normal with mean

[ {(Yi − Zᵀi βz)/βx}(β²x/σ²ε) + µx(1/σ²x) + W̄i(1/σ²w,i) ] / [ β²x/σ²ε + 1/σ²x + 1/σ²w,i ]

and variance

{ β²x/σ²ε + 1/σ²x + 1/σ²w,i }⁻¹

Lecture 5 11

MCMC: LINEAR REGRESSION — POSTERIOR FOR σ²ε

Prior:

[σ²ε] = IG(δ, δ)

IG(α, β): density ∝ x^{−(α+1)} exp(−β/x), with mean β/(α − 1) for α > 1

Posterior:

[σ²ε|others] ∝ (σ²ε)^{−(δ+N/2+1)} exp{ −[δ + (1/2) Σᴺi=1 (Yi − Xiβx − Zᵀi βz)²]/σ²ε }

⇒ [σ²ε|others] = IG{ δ + N/2, δ + (1/2) Σᴺi=1 (Yi − Xiβx − Zᵀi βz)² }

Lecture 5 12

MCMC: LINEAR REGRESSION — POSTERIOR FOR σ²x

[σ²x|others] ∝ (σ²x)^{−(δ+N/2+1)} exp{ −[δ + (1/2) Σᴺi=1 (Xi − µx)²]/σ²x }

⇒ [σ²x|others] = IG{ δ + N/2, δ + (1/2) Σᴺi=1 (Xi − µx)² }

Lecture 5 13

MCMC: LINEAR REGRESSION — POSTERIOR FOR σ²u

[σ²u|others] ∝ (σ²u)^{−(δ + Σᴺi=1 Ji/2 + 1)} exp{ −[δ + (1/2) Σᴺi=1 Σ^{Ji}_{j=1} (Wi,j − Xi)²]/σ²u }

⇒ [σ²u|others] = IG{ δ + (Σᴺi=1 Ji)/2, δ + (1/2) Σᴺi=1 Σ^{Ji}_{j=1} (Wi,j − Xi)² }

A compact Gibbs-sampler sketch assembling these full conditionals follows below.
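A compact Gibbs-sampler sketch putting the above full conditionals together, simplified to an intercept plus a single error-prone X (no Z) and equal replication; all data, seeds and hyperparameter values are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(8)
N, J, delta, s2_beta = 300, 2, 1e-3, 1e6

# simulate data: Y = 1 + 2*X + eps,  W_ij = X_i + U_ij   (illustrative values)
x_true = rng.normal(1.0, 1.0, N)
y = 1.0 + 2.0 * x_true + rng.normal(0, 0.5, N)
w = x_true[:, None] + rng.normal(0, 0.8, (N, J))
w_bar = w.mean(axis=1)

beta = np.zeros(2); x = w_bar.copy(); mu_x = w_bar.mean()
s2_eps = s2_x = s2_u = 1.0
keep = []

for it in range(3000):
    # [beta | others]: normal full conditional with C = [1, X], lambda = s2_eps/s2_beta
    C = np.column_stack([np.ones(N), x])
    A = C.T @ C + (s2_eps / s2_beta) * np.eye(2)
    beta = rng.multivariate_normal(np.linalg.solve(A, C.T @ y), s2_eps * np.linalg.inv(A))
    # [X_i | others]: normal, combining outcome, exposure and measurement models
    prec = beta[1] ** 2 / s2_eps + 1.0 / s2_x + J / s2_u
    mean = ((y - beta[0]) * beta[1] / s2_eps + mu_x / s2_x + J * w_bar / s2_u) / prec
    x = mean + rng.normal(0.0, np.sqrt(1.0 / prec), N)
    # [mu_x | others] with a very diffuse prior
    mu_x = rng.normal(x.mean(), np.sqrt(s2_x / N))
    # inverse-gamma full conditionals for the three variances
    s2_eps = 1.0 / rng.gamma(delta + N / 2, 1.0 / (delta + 0.5 * ((y - beta[0] - beta[1] * x) ** 2).sum()))
    s2_x = 1.0 / rng.gamma(delta + N / 2, 1.0 / (delta + 0.5 * ((x - mu_x) ** 2).sum()))
    s2_u = 1.0 / rng.gamma(delta + N * J / 2, 1.0 / (delta + 0.5 * ((w - x[:, None]) ** 2).sum()))
    if it >= 1000:
        keep.append(beta[1])

print("posterior mean of beta_x:", np.mean(keep))   # near the true slope 2, unlike a naive fit on W_bar
```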

Lecture 5 14

MIXTURES OF BERKSON AND CLASSICAL MEASUREMENT ERROR

Mallick, Hoffman, and Carroll (2002) study models where both Berkson and classical measurement errors are present.

• W = observed dose of radiation from fallout due to nuclear testing in Nevada

• W = C × DCF × I × TD × FP

∗ C = time-integrated radioiodine concentration of milk

I specific to producer, not individual (Berkson)

I but one component is the deposition rate of I-131 across regions of the study (Classical)

Lecture 5 15

MIXTURES OF BERKSON AND CLASSICAL MEASUREMENT ERROR

∗ DCF = ingestion dose conversion factor

I specific to age and isotope (Berkson) (Classical)

∗ I = individual milk intake rate in liters per day, measured by a food frequency questionnaire (FFQ) (Berkson)

∗ TD = time delay (depends on source of milk — from FFQ) (Berkson)

∗ FP = frequency of purchase correction factor (from FFQ) (Berkson)

Lecture 5 16

LATENT VARIABLE MODEL

• W = measured dose

• X = true dose

• L = latent intermediate variable

• Berkson and classical errors are combined in:

log(X) = log(L) + Ub

log(W ) = log(L) + Uc

∗ Ub = 0 ⇒ log(X) = log(L) ⇒ log(W ) = log(X) + Uc (Classical)

∗ Uc = 0⇒ log(W ) = log(L) ⇒ log(X) = log(W ) + Ub (Berkson)

Lecture 5 17

LATENT VARIABLE MODEL

• σ²b + σ²c is known

• p = σ²b/(σ²b + σ²c) is unknown

∗ informative priors:

I p = 1 (Berkson)

I p = 0 (Classical)

I p is uniform on [0.2, 0.8] (Mixture)

I compare results from these three priors (sensitivity analysis)

• Z = age at exposure, sex (known exactly)

• state = state of residence (Utah, Nevada, Arizona) (known exactly)

Lecture 5 18

EXPOSURE AND OUTCOME MODELS

• Exposure model:

[log(L)|Z, state = s] ∼ N(αs0 + Zᵀαs1, σ²s)

• Outcome model:

∗ Y = indicator of thyroid cancer

∗ logit{pr(Y = 1|Z, X)} = β0 + Zᵀβ1 + log(1 + θX)

Lecture 5 19

APPROXIMATE MODEL FOR [Y|W, Z]

• Outcome model (from previous page):

∗ Y = indicator of thyroid cancer

∗ logit{pr(Y = 1|Z, X)} = β0 + Zᵀβ1 + log(1 + θX)

• Outcome model based on W:

∗ logit{pr(Y = 1|Z, W)} ≈ β0 + Zᵀβ1 + log(1 + θ γ W^{λx|w,t})

where

γ = exp{ (1 − λx|w,t)(µL|Z + σ²L|Z/2) + σ²b/2 }

I µL|Z and σ²L|Z are the mean and variance of log(L) given Z

and

λx|w,t = σ²L|Z / (σ²L|Z + σ²c)

Lecture 5 20

BERKSON CASE

• λx|w,t = 1

• γ = exp(σ²b/2)

and the model on the previous page simplifies to

logit{pr(Y = 1|Z, W)} ≈ β0 + Zᵀβ1 + log(1 + θ γ W)

∗ γ > 1, so the effect of Berkson error is for the naive analysis to overestimate the effect of dose

∗ In the regression, replace W by γW to correct for the bias (regression calibration)

Lecture 5 21

BAYESIAN ANALYSIS

• Several parametric and semiparametric models were used

• nonparametric model for dose is constrained to be monotonically increasing

• nonparametric model for the distribution of log(L) uses a Polya-tree prior (Lavine, 1992)

Lecture 5 22

BAYESIAN ANALYSIS: SOME RESULTS FOR PARAMETRIC MODELS

Error model   Post. mean of θ   RR at 1 Gy   95% credible interval for RR
No error            38.9            9.4          (4.5, 13.8)
Classical           74.1           17.1          (8.5, 24.6)
Berkson             31.9            7.9          (3.8, 11.4)
Mixture             56.1           13.2          (5.0, 23.1)

• Effects on a naive analysis:

∗ Berkson error ⇒ overestimate effect

∗ Classical error ⇒ underestimate effect

∗ Mixture of errors ⇒ underestimate effect (but less than if only classical error)

Lecture 5 23

BAYESIAN ANALYSIS: SEMIPARAMETRIC MODELS

• the semiparametric models have better fit by DIC

• the semiparametric model for log(L):

∗ leads to slightly smaller estimated posterior means ofθ and RR

∗ credible intervals also widen

• the semiparametric outcome model:

∗ leads to slightly higher estimated posterior means ofθ and RR

∗ credible intervals also widen

• Overall effect of both semiparametric models on mixture of errors model:

∗ raise RR from 13.2 to 14.2

∗ change CI from (5.0, 23.1) to (1.7, 33.6)

Lecture 5 24

BAYESIAN ANALYSIS: ASSUMPTION OF INDEPENDENT ERROR

• almost certainly false

• C (radioiodine concentration) includes

∗ deposition of I-131 by region

∗ mass interception on vegetation

∗ consumption of vegetation by cows

∗ effective half-life of I-131 in vegetation

∗ milk transfer coefficient (MTC)

• Utah study generated a distribution of log-normally distributed MTC's with estimated mean and variance

∗ if parameters known, then error is Berkson

∗ parameters estimated from historical data and literature review

⇒ shared classical errors

Lecture 5 25

SENSITIVITY ANALYSIS

• For the six gender/state combinations, Berkson errors for individuals in a group had a common correlation of ρ

ρ            0.0           0.2           0.4           0.6
E(θ|data)    56.1          65.9          84.1          95.1
CI           (18.6, 102)   (21.4, 120)   (30.4, 143)   (38.2, 152)

Lecture 5 26

END OF LECTURE 5

Lecture 6 1

LECTURE 6: NONPARAMETRIC REGRESSION WITH

MEASUREMENT ERROR

OUTLINE

• Earlier approaches: deconvolution kernels, SIMEX, regression calibration

• New Bayesian spline approach (Berry, Carroll, and Ruppert, 2002, JASA)

• Simulation results

• Example: A clinical trial

• Example: TSP and mortality, a sensitivity study

Lecture 6 2

THE PROBLEM OF MEASUREMENT ERROR — ILLUSTRATION

[Figure: two panels. Left, "Gold Standard": data, true curve, and fitted spline plotted against x. Right, "Data with measurement error": the same y plotted against w.]

Lecture 6 3

THE PROBLEM OF MEASUREMENT ERROR — ILLUSTRATION

[Figure: two panels. Left, "Gold Standard": data, true curve, and fitted spline against x. Right, "Data with measurement error": y against w.]

Lecture 6 4

THE PROBLEM OF MEASUREMENT ERROR

• The regression model is

Y = m(X) + ε

where m is only known to be smooth

• Observe

Y and W = X + U, where

– E(U|X) = 0 and var(U|X) = σ²u

– U|X normally distributed

• Measurement error variance σ²u is estimated from internal replicate data. (Observe Wij, j = 1, . . . , Ji.)

Lecture 6 5

AVAILABLE METHODS

• Deconvolution kernels

• SIMEX

• Structural spline

• Bayesian spline

Lecture 6 6

DECONVOLUTION KERNELS

• Globally consistent nonparametric regression by deconvolution kernels (Fan and

Truong, 1993, Annals)

– does not work so well

∗ Fan & Truong show very poor asymptotic rates of convergence

∗ simulations show poor finite-sample behavior

– no methodology for bandwidth selection or inference

Lecture 6 7

SIMEX

• The SIMEX method is due to Cook & Stefanski (1995, JASA).

• SIMEX has been previously applied to parametric problems.

• Makes no assumptions about the true X's. (Functional)

• Results in estimators which are approximately consistent, i.e., consistent at least to order O(σ⁶u).

Lecture 6 8

SIMEX

• Carroll, Maca, Ruppert (2001, Biometrika) (CMR) applied SIMEX to nonparametric regression.

• CMR have asymptotic theory in the local polynomial regression (LPR) context.

∗ The estimators have the usual rates of convergence.

∗ They are approximately consistent, to order O(σ⁶u).

• An asymptotic theory with rates seems very difficult for splines

∗ but, simulations in CMR indicate that SIMEX/splines works a little better than SIMEX/kernel

∗ problem seems due to undersmoothing

• Staudenmayer and Ruppert (2004, JRSS-B) look at bandwidth selection for SIMEX/LPR.

∗ With better bandwidth selection, SIMEX/LPR is competitive with other methods.

Lecture 6 9
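For concreteness, here is a minimal SIMEX sketch for a scatterplot smoother. A global polynomial fit stands in for the local polynomial or spline smoothers studied by CMR, and the quadratic extrapolant is one common choice; this is not the CMR implementation, only an illustration of the simulation/extrapolation idea.

```python
import numpy as np

def simex_smoother(y, w, s2u, x_grid, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, degree=4, seed=0):
    """SIMEX sketch for nonparametric regression with W = X + U:
    add extra error at levels lambda, re-fit the naive smoother, then
    extrapolate each fitted value back to lambda = -1 (no error)."""
    rng = np.random.default_rng(seed)

    def naive_fit(pred):
        # naive smoother: global polynomial fit of y on the (possibly remeasured) predictor
        return np.polyval(np.polyfit(pred, y, degree), x_grid)

    lam_all = [0.0] + list(lambdas)
    fits = [naive_fit(w)]                       # lambda = 0: the naive fit itself
    for lam in lambdas:
        sims = [naive_fit(w + np.sqrt(lam * s2u) * rng.standard_normal(len(w)))
                for _ in range(B)]              # simulation step: add extra error
        fits.append(np.mean(sims, axis=0))
    fits = np.array(fits)                       # rows: lambda levels; columns: grid points
    # extrapolation step: quadratic in lambda at each grid point, evaluated at -1
    return np.array([np.polyval(np.polyfit(lam_all, fits[:, g], 2), -1.0)
                     for g in range(fits.shape[1])])
```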

STRUCTURAL MODELING

• The regression of Y on the observed W is

E(Y | W) = E{m(X) | W} = ∫ m(x) f(x|W) dx.

• Suppose that we had:

∗ a convenient flexible form for m(x; β)

∗ a convenient flexible distribution for f(x|W).

• Then we could estimate m(X; β) by minimizing over the data

Σ_{i=1}^n { Yi − ∫ m(x; β) f(x|Wi) dx }².

Lecture 6 10
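A sketch of this structural idea under the simplest assumption that X is normal and the error is classical and normal, so that [X | W] is normal; the integral E{m(X; β) | Wi} is approximated by Monte Carlo and the least-squares criterion is minimized numerically. The function name, arguments, and the cubic-polynomial example of m(x; β) are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def structural_fit(y, w, s2u, mu_x, s2x, m, beta0, n_mc=200, seed=0):
    """Minimize sum_i {Y_i - E[m(X;beta) | W_i]}^2 over beta, where the conditional
    expectation is approximated by Monte Carlo draws from the normal [X | W_i]."""
    rng = np.random.default_rng(seed)
    lam = s2x / (s2x + s2u)                    # attenuation / reliability ratio
    cond_mean = mu_x + lam * (w - mu_x)        # E(X | W_i)
    cond_sd = np.sqrt((1.0 - lam) * s2x)       # SD(X | W_i)
    # one fixed set of draws per observation keeps the objective smooth in beta
    Xdraws = cond_mean[:, None] + cond_sd * rng.standard_normal((len(w), n_mc))

    def resid(beta):
        return y - m(Xdraws, beta).mean(axis=1)   # Y_i - E{m(X;beta) | W_i}

    return least_squares(resid, beta0).x

# Example flexible form (illustrative): a cubic polynomial in x.
# beta_hat = structural_fit(y, w, s2u, mu_x, s2x,
#                           m=lambda x, b: b[0] + b[1]*x + b[2]*x**2 + b[3]*x**3,
#                           beta0=np.zeros(4))
```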

REGRESSION SPLINES — "PLUS FUNCTION" BASIS

• Model

E(Y|X) = m(X; β) := Σ_{j=0}^J βj X^j + Σ_{k=1}^K β_{k+J} (X − ξk)^J_+

[Figure: four panels on [0, 1] illustrating the plus-function basis and the fits it produces.]

Lecture 6 11
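A small helper that builds the plus-function design matrix above; the function name and the quantile-based knot choice in the usage note are illustrative.

```python
import numpy as np

def plus_function_basis(x, knots, degree=2):
    """Design matrix for the plus-function spline basis:
    columns 1, x, ..., x^J, then (x - xi_k)_+^J for each knot xi_k."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, degree + 1, increasing=True)            # 1, x, ..., x^J
    plus = np.maximum(x[:, None] - np.asarray(knots), 0.0) ** degree
    return np.hstack([poly, plus])

# e.g. a 10-knot quadratic spline fit by least squares (illustrative):
# knots = np.quantile(x, np.linspace(0.05, 0.95, 10))
# B = plus_function_basis(x, knots, degree=2)
# beta_hat = np.linalg.lstsq(B, y, rcond=None)[0]
```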

10-KNOT QUADRATIC SPLINE APPROXIMATIONS

[Figure: four panels on [0, 1] showing each function ("blue") and its 10-knot quadratic spline approximation ("red"): sin(15x), sin{5(1 + x + x²)}, Φ{18(x − .05)}, and 0.8x + exp{−35(x − 0.5)}.]

Lecture 6 12

REGRESSION SPLINES

• Model

E(Y|X) = m(X; β) := Σ_{j=0}^J βj X^j + Σ_{k=1}^K β_{k+J} (X − ξk)^J_+

• X^j is replaced by E(X^j | W)

• (X − ξk)^J_+ is replaced by E{(X − ξk)^J_+ | W}

• The key remaining issue: the joint distribution of X and U.

∗ CMR used a mixture of normals for [X] and Gibbs sampling to estimate the parameters. (Flexible structural)

I This is an extension to measurement error of an idea of Roeder & Wasserman (JASA, 1997).

Lecture 6 13
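A sketch of this structural-spline idea: replace each basis column by its conditional expectation given W. For simplicity the sketch uses a single normal structural model for X and Monte Carlo averaging, rather than the normal-mixture/Gibbs approach of CMR; names and defaults are illustrative.

```python
import numpy as np

def expected_basis_given_w(w, knots, s2u, mu_x, s2x, degree=2, n_mc=500, seed=0):
    """Structural-spline design matrix: each column X^j and (X - xi_k)_+^J is
    replaced by its conditional expectation given W, approximated by Monte Carlo
    under X ~ N(mu_x, s2x) with classical normal error of variance s2u."""
    rng = np.random.default_rng(seed)
    lam = s2x / (s2x + s2u)
    cond_mean = mu_x + lam * (np.asarray(w) - mu_x)
    cond_sd = np.sqrt((1.0 - lam) * s2x)
    Xd = cond_mean[:, None] + cond_sd * rng.standard_normal((len(w), n_mc))
    cols = [np.mean(Xd**j, axis=1) for j in range(degree + 1)]            # E(X^j | W)
    cols += [np.mean(np.maximum(Xd - xi, 0.0)**degree, axis=1) for xi in knots]
    return np.column_stack(cols)   # then regress Y on this matrix
```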

FULLY BAYESIAN MODEL

What's New? Answer: Fully Bayesian MCMC method in Berry, Carroll, and Ruppert (2002, JASA) (BCR)

• Uses splines

∗ smoothing or penalized

∗ P-splines in this lecture

• Structural

∗ Xi are iid normal

∗ but seems robust to violations of normality

Lecture 6 14

FULLY BAYESIAN MODEL

• Smoothing parameter is automatic

• Inference adjusts for the data-based smoothing parameter and for measurement

error

• Allows global confidence bands

Lecture 6 15

FULLY BAYESIAN MODEL — PARAMETERS

• Regression Model

Yi = m(xi; β) + εi

– m(xi; β) is a P-spline

– εi iid N(0, σ²ε)

• Measurement Error Model

Wij = Xi + Uij where Uij iid N(0, σ²u)

• Structural Model

Xi iid N(µx, σ²x)

• Parameters: β, σ²e, σ²u, µx, σ²x

Lecture 6 16

FULLY BAYESIAN MODEL — PARAMETERS

• Priors

– β is N(0, (γK)⁻¹) where K is known. [α := γσ²e is the smoothing parameter.]

– γ is Gamma(Aγ, Bγ)

– σ²e is Inv-Gamma(Ae, Be)

– σ²u is Inv-Gamma(Au, Bu)

– µx is N(dx, t²x)

– σ²x is Inv-Gamma(Ax, Bx)

• Hyperparameters: Ae, Be, Au, Bu, Ax, Bx, dx, t²x, Aγ, Bγ

– all fixed at values making the priors noninformative

∗ E.g., t²x = 10⁶.

Lecture 6 17

FULLY BAYESIAN MODEL — SMOOTHING

• Model

E(Y|X) = m(X; β) := Σ_{j=0}^J βj X^j + Σ_{k=1}^K β_{k+J} (X − ξk)^J_+

• β is N(0, (γK)⁻¹) where K is known. [α := γσ²e is the smoothing parameter.]

• Let

K = diag(∆, . . . , ∆, 1, . . . , 1)

∗ ∆ is "small" — essentially 0

∗ J + 1 ∆'s followed by K ones

∗ The prior on the jumps at the knots is that they are iid.

∗ The prior on the polynomial coefficients is "diffuse," i.e., "non-informative".

Lecture 6 18

GIBBS SAMPLING

• Iterate through β, σ²e, σ²u, σ²x, µx, γ, X1, . . . , Xn.

• All steps except one are easy, either gamma, inverse-gamma, or normal

– E.g.,

[β | other parameters, Y, W] ∼ Normal

Mean = (X^T X + γK)⁻¹ X^T Y

Cov = σ²e (X^T X + γK)⁻¹.

∗ Here X is one of the "other parameters"

∗ Essentially we're fitting a spline to the imputed X's and the observed Y's

Lecture 6 19
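A sketch of the β step, written to match the mean and covariance on the slide, with K = diag(∆, . . . , ∆, 1, . . . , 1); the function name and the value used for ∆ are illustrative.

```python
import numpy as np

def draw_beta(Xmat, y, gamma, s2e, degree, n_knots, rng):
    """One Gibbs step for beta in the P-spline model: beta | rest is normal with
    mean (X'X + gamma*K)^{-1} X'y and covariance s2e*(X'X + gamma*K)^{-1}.
    Xmat is the plus-function design matrix evaluated at the current imputed X's."""
    Delta = 1e-8                                     # "essentially 0": polynomial part unpenalized
    K = np.diag(np.r_[np.full(degree + 1, Delta), np.ones(n_knots)])
    prec = (Xmat.T @ Xmat + gamma * K) / s2e         # posterior precision of beta
    cov = np.linalg.inv(prec)                        # = s2e * (X'X + gamma*K)^{-1}
    mean = cov @ (Xmat.T @ y) / s2e                  # = (X'X + gamma*K)^{-1} X'y
    return rng.multivariate_normal(mean, cov)
```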

GIBBS SAMPLING

∗ Estimate of β, call it β̂, is

(X^T X + γK)⁻¹ X^T Y

averaged over γ and X.

Lecture 6 20

GIBBS SAMPLING

• The exception to the sampling being quick and easy is that a Metropolis-Hastings step is needed for X1, . . . , Xn.

[Xi | µx, σ²x, β, σ²e, σ²u, Y, W]

∝ exp[ −(1/(2σ²u)) Σ_{j=1}^{mi} (Wij − Xi)² − (1/(2σ²e)) {Yi − m(Xi; β)}² − (1/(2σ²x)) (Xi − µx)² ]

Lecture 6 21
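A sketch of the Metropolis-Hastings update for a single Xi; the target is the full conditional above, while the Gaussian random-walk proposal and its step size are arbitrary illustrative choices (BCR's actual proposal may differ).

```python
import numpy as np

def mh_step_x(Xi, Wi, Yi, m, beta, mu_x, s2x, s2e, s2u, rng, step=0.25):
    """One random-walk Metropolis-Hastings update for a single latent X_i."""
    def log_target(x):
        return (-0.5 / s2u * np.sum((Wi - x) ** 2)       # replicate W's
                - 0.5 / s2e * (Yi - m(x, beta)) ** 2     # spline regression term
                - 0.5 / s2x * (x - mu_x) ** 2)           # structural (normal) prior
    prop = Xi + step * rng.standard_normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(Xi):
        return prop                                      # accept
    return Xi                                            # reject: keep current value
```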

BAYESIAN INFERENCE

• Let X be the spline basis functions evaluated on a fine grid over some interval [a, b].

• Xβ is the curve on [a, b].

• Xβ̂ is the estimated curve.

• Let Kα be the (1 − α) MCMC sample quantile of

max_grid { |X(β − β̂)| / SD(Xβ) }.

• Then,

Xβ̂ ± Kα SD(Xβ)

is a 100(1 − α)% simultaneous confidence band for the curve on [a, b].

Lecture 6 22
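A sketch of this band construction from saved MCMC draws of β; the same function gives derivative bands if the derivative basis X′ is passed instead. The function name and arguments are illustrative.

```python
import numpy as np

def simultaneous_band(Xgrid, beta_draws, alpha=0.05):
    """Simultaneous credible band from MCMC output: K_alpha is the (1 - alpha)
    sample quantile over draws of max_grid |Xgrid(beta - beta_hat)| / SD(Xgrid beta).
    beta_draws has one posterior draw of beta per row."""
    curves = beta_draws @ Xgrid.T              # each row: the curve on the grid
    fit = curves.mean(axis=0)                  # Xgrid @ beta_hat (posterior mean)
    sd = curves.std(axis=0, ddof=1)            # pointwise SD(Xgrid beta)
    max_dev = np.max(np.abs(curves - fit) / sd, axis=1)
    K = np.quantile(max_dev, 1.0 - alpha)
    return fit - K * sd, fit + K * sd          # lower and upper band
```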

BAYESIAN INFERENCE FOR DERIVATIVES

• Let X′ be the derivatives of the spline basis functions evaluated on a fine grid over [a, b].

• X′β is the curve's derivative on [a, b].

• X′β̂ is the estimated derivative.

• Let K′α be the (1 − α) MCMC sample quantile of

max_grid { |X′(β − β̂)| / SD(X′β) }.

• Then,

X′β̂ ± K′α SD(X′β)

is a 100(1 − α)% simultaneous confidence band for the derivative on [a, b].

Lecture 6 23

SIMULATIONS

Six cases were considered; ni ≡ 2 in each case.

Lecture 6 24

SIMULATIONS

• [Case 1]

The regression function is

m(x) = sin(πx/2) / [1 + 2x²{sign(x) + 1}],

with n = 100, σ²ε = 0.3², σ²u = 0.8², µx = 0 and σ²x = 1.

[Figure: plot of m(x) on [−2, 2].]

Lecture 6 25

SIMULATIONS

• [Case 2] Same as Case 1 except n = 200.

• [Case 3] A modification of Case 1 above except that n = 500.

• [Case 4]

Case 1 of CMR, so that

m(x) = 1000 x³_+ (1 − x)³_+, where x_+ = x I(x > 0),

with n = 200, σ²ε = 0.0015², σ²u = (3/7)σ²x, µx = 0.5 and σ²x = 0.25².

[Figure: plot of m(x) on [−0.5, 1.5].]

Lecture 6 26

SIMULATIONS

• [Case 5]

A modification of Case 4 of CMR, so that

m(x) = 10 sin(4πx),

with n = 500, σ²ε = 0.05², σ²u = 0.141², µx = 0.5 and σ²x = 0.25².

[Figure: plot of m(x) on [−0.5, 1.5].]

• [Case 6]

The same as Case 1 above except that X is a normalized chi-square(4) random variable. (Tests robustness against violation of the structural assumptions.)

Lecture 6 27

SIMULATIONS

Mean Squared Bias × 10²

Method                        Case 1   Case 2   Case 3   Case 4   Case 5   Case 6
Naive                          5.59     4.92     5.21    1,108    3,733     4.83
Bayes                          0.78     0.38     1.04     17.4      468     1.74
Flex. Structural, 5 knots      1.38     0.62     0.46      3.7      838     1.47
Flex. Structural, 15 knots     1.44     0.60     0.66      3.3      226     1.75

Mean Squared Error × 10²

Method                        Case 1   Case 2   Case 3   Case 4   Case 5   Case 6
Naive                          6.91     5.57     5.38    1,155    3,793     5.77
Bayes                          2.84     1.56     1.47      195    1,031     2.69
Flex. Structural, 5 knots      8.17     3.82     1.73      217    2,032     7.27
Flex. Structural, 15 knots     9.90     5.40     1.85      237      799     6.94

Results based on 200 Monte Carlo simulations for each case. SIMEX was not included in the table — it was not among the best estimators.

Lecture 6 28

EXAMPLE — SIMULATED

• Y = sin(2X) + ε

• X is N(1, 1)

• σu = 1

• σe = 0.15

• n = 201

• ni = 2 for all i

• 15 knot quadratic P-splines

• 2,000 iterations of Gibbs. First 667 deleted as burn-in.

Lecture 6 29
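A sketch of the data-generating step for this simulated example, using the values listed on the slide; the seed is arbitrary.

```python
import numpy as np

# Data generation for the simulated example (values from the slide above).
rng = np.random.default_rng(1)
n, n_rep = 201, 2
sigma_u, sigma_e = 1.0, 0.15
X = rng.normal(1.0, 1.0, n)                                   # X ~ N(1, 1)
Y = np.sin(2 * X) + sigma_e * rng.standard_normal(n)          # Y = sin(2X) + eps
W = X[:, None] + sigma_u * rng.standard_normal((n, n_rep))    # 2 replicates per subject
Wbar = W.mean(axis=1)
# Y vs Wbar gives the "data with measurement error" panel; Y vs X is the gold standard.
```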

[Figure: four panels. "Gold Standard" (y vs x); "Data with measurement error" (y vs w̄); w̄ vs x; and the fitted curve, true curve, and gold-standard fit plotted against x.]

Lecture 6 30

[Figure: MCMC trace plots for beta(1), yhat(1), yhat(51), yhat(101), x(1), sigmau, log(gamma), log(σe), and mux.]

Results of Gibbs Sampling. Every twentieth iteration.

Note: X(1) = −1.45 and W(1) = −0.8. Also, log(σe) = −1.9.

Lecture 6 31

EXAMPLE — SIMULATED

Why does the Bayes approach work so well? Here's my explanation:

Bayes uses all possible information to estimate X and, especially, m(X).

• ‖m(X) − E{m(X) | W, Y, other param.}‖ ≈ ‖m(X) − ave{m(X)}‖ = 2.47

• ‖m(X) − m(E{X | W, Y, other param.})‖ ≈ ‖m(X) − m(ave{X})‖ = 4.67

• ‖m(X) − m(E(X|W))‖ = 10.25

• ‖m(X) − m(W)‖ = 12.36

Lecture 6 32

EXAMPLE — CLINICAL TRIAL

• Study of a psychiatric medication.

• Treatment and control group.

• Evaluation at baseline (W ) and at end of study (Y ).

– smaller values → more severe disease

– scale is a combination of self-report and clinical interview so there is considerable measurement error

– it is believed that σ²u ≈ 0.35.

• We are interested in ∆(X) := m(X) − X = E(Y − W | X).

• Preliminary Wilcoxon test found a highly significant treatment effect.

• Question: How does the treatment effect depend upon the baseline value?

Lecture 6 33

[Figure: estimated change versus true baseline score, with separate curves for Treatment and Control.]

Lecture 6 34

[Figure: estimated treatment effect versus true baseline score.]

Lecture 6 35

MAXIMUM LIKELIHOOD VIA MCEM

Ganguli, Staudenmayer, and Wand (2005, to appear):

• use the same model as Berry, Carroll, and Ruppert (2002)

• compute MLEs using van Dyk’s (2002) nested Monte Carlo EM algorithm

• their estimator works well

∗ this suggests that the success of the Berry, Carroll, and Ruppert method is due to using the likelihood rather than to information in the prior

• Ganguli, Staudenmayer, and Wand extend their method to additive models

Lecture 6 36

EXAMPLE — A SENSITIVITY ANALYSIS

Original study from Zanobetti, Wand, Schwartz, and Ryan (2000).

This analysis from Ganguli, Staudenmayer, and Wand (2005, to appear) — also see

Ruppert, Wand, Carroll (2003, Semiparametric Regression)

• Outcome = log(natural mortality)

• Exposure

∗ TSP = total suspended particles

• Confounders

∗ DAY = days since beginning of study

∗ Mean daily temperature

∗ Relative humidity

Lecture 6 37

EXAMPLE — A SENSITIVITY ANALYSIS

reliability ratio = var{log(TSPi)} / [var{log(TSPi)} + σ²u].

Problem: No replicates, so the measurement error variance σ²u cannot be estimated.

Lecture 6 38
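With no replicates, one can instead fix the reliability ratio at a few plausible values and back out the implied σ²u, which is the sensitivity analysis shown on the next slide. A small helper (illustrative name and usage):

```python
import numpy as np

def sigma2_u_from_reliability(log_tsp, reliability):
    """Solve r = var{log(TSP)} / [var{log(TSP)} + sigma2_u] for sigma2_u,
    given an assumed reliability ratio r."""
    v = np.var(log_tsp, ddof=1)
    return v * (1.0 - reliability) / reliability

# e.g. implied error variances for the ratios used in the example:
# for r in (0.9, 0.8, 0.7):
#     print(r, sigma2_u_from_reliability(np.log(tsp), r))
```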

EXAMPLE — A SENSITIVITY ANALYSIS

[Figure: estimated additive model terms. The term for log(TSP) is shown with no measurement error and with reliability ratios 0.9, 0.8, and 0.7; the remaining panels show the additive model terms for Day, Mean Daily Temperature, and Relative Humidity.]

Lecture 6 39

DISCUSSION

• With the work of CMR and BCR we now have reasonably efficient estimators for nonparametric regression with measurement error.

– SIMEX (LPR and splines) — in CMR

– (Flexible) Structural splines — in CMR

– Fully Bayesian (hardcore structural) — in BCR

Lecture 6 40

DISCUSSION

• With BCR we have a methodology that

– automatically selects the amount of smoothing

– estimates the unknown X's

– allows inference that takes account of the effects of smoothing parameter selection and measurement error

• Most efficient methods appear to be structural, though SIMEX may be competitive

– hardcore structural methods seem reasonably robust

Lecture 6 41

END OF LECTURE 6

THANK YOU FOR YOUR ATTENTION!!!
