Partially missing at random and ignorable inferences for parameter subsets with missing data
Roderick Little
Outline
• Survey Bayesics in three slides
• Inference with missing data: Rubin's (1976) paper on conditions for ignoring the missing-data mechanism
• Rubin’s standard conditions are sufficient but not necessary: example
• Propose definitions of MAR, ignorability for likelihood (and Bayes) inference for subsets of parameters
• Examples
• Joint work with Sahar Zangeneh
Graybill Conference: Partially Missing at Random 2
Calibrated Bayes
– Frequentists should be Bayesian
• Bayes is optimal under assumed model
– Bayesians should be frequentist
• We never know the model (and all models are wrong)
• Inferences should have good repeated sampling characteristics
– Calibrated Bayes (e.g. Box 1980, Rubin 1984, Little 2012)
• Inference based on a Bayesian model
• Model chosen to yield inferences that are well-calibrated in a frequentist sense
• Aim for posterior probability intervals that have (approximately) nominal frequentist coverage
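A minimal sketch of what "well-calibrated" means in practice, assuming a normal sample with known standard deviation and a flat prior on the mean. These modelling choices, the function name `posterior_interval_coverage`, and all numeric settings are assumptions made for this illustration and do not come from the slides. Under that model the 95% posterior interval is the sample mean plus or minus 1.96 standard errors, and the simulation checks how often it covers the true mean in repeated sampling.

```python
import math
import random


def posterior_interval_coverage(true_mean=1.0, sigma=2.0, n=25,
                                reps=4000, seed=0):
    """Estimate frequentist coverage of a 95% posterior interval.

    Assumed model (illustration only): y_i ~ N(mu, sigma^2) with sigma
    known and a flat prior on mu, so mu | y ~ N(ybar, sigma^2 / n).
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    covered = 0
    for _ in range(reps):
        # Draw one sample and form the posterior interval around its mean.
        ybar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if ybar - half_width <= true_mean <= ybar + half_width:
            covered += 1
    return covered / reps


coverage = posterior_interval_coverage()
print(f"Estimated coverage of the 95% posterior interval: {coverage:.3f}")
```

An estimate close to 0.95 is what calibration asks for; a misspecified model would typically show up as coverage well below the nominal level.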
Calibrated Bayes models for surveys should incorporate sample design features
– All models are wrong, some models are useful
• Design-assisted: make the estimator more robust
• Calibrated Bayes: make the model more robust – many models yield design-consistent estimates
– Models that ignore features like survey weights are vulnerable to misspecification
– But models can be successfully applied in survey setting, with attention to design features
• Weighting, stratification, clustering
– Capture design weights as covariates in the prediction model (e.g. Gelman 2007)
Benefits of Bayes
• Unified approach to all problems
– Avoids current approach -- “inferential schizophrenia”
• Not asymptotic
– Propagates errors in estimating parameters
• Avoids frequentist pitfalls:
– Conditions on ancillaries
– Obeys likelihood principle
There are those who predict…
… and those who weight
Rubin (1976 Biometrika)
• Landmark paper (3700+ citations, after being rejected by many journals!)
– RL wrote his first (11 page) referee report, and an obscure discussion
• Modeled the missing data mechanism by treating missingness indicators as random variables, assigning them a distribution
• Sufficient conditions under which missing data mechanism can be ignored for likelihood and frequentist inference about parameters
– Focus here on likelihood, Bayes
Ignoring the mechanism
$D = (D_{obs}, D_{mis})$: data with no missing values; $D_{obs}$ observed, $D_{mis}$ missing
$R$ = response indicator matrix
$f_{D,R}(D, R \mid \theta, \psi) = f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)$
• Full likelihood: $L(\theta, \psi \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)\, dD_{mis}$
• Likelihood ignoring mechanism: $L_{ign}(\theta \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, dD_{mis}$
• Missing data mechanism can be ignored for likelihood inference when $L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$
Rubin’s sufficient conditions for ignoring the mechanism
• Missing data mechanism can be ignored for likelihood inference when
– (a) the missing data are missing at random (MAR): $f_{R|D}(R \mid D_{obs}, D_{mis}, \psi) = f_{R|D}(R \mid D_{obs}, \psi)$ for all $D_{mis}, \psi$
– (b) distinctness of the parameters of the data model and the missing-data mechanism: $(\theta, \psi) \in \Omega_\theta \times \Omega_\psi$; for Bayes, $\theta$ and $\psi$ a-priori independent
• MAR is the key condition: without (b), inferences are valid but not fully efficient
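To see why MAR does the work, the factorization can be written out. This is a sketch in the notation assumed throughout ($\theta$ for the data model, $\psi$ for the mechanism); $L_{ign}$ and $L_{rest}$ are the factors named on the "Ignoring the mechanism" slide. Under MAR the mechanism density does not involve $D_{mis}$, so it can be moved outside the integral:

```latex
\begin{aligned}
L(\theta,\psi \mid D_{obs}, R)
  &= \text{const.} \times \int f_D(D_{obs}, D_{mis} \mid \theta)\,
     f_{R|D}(R \mid D_{obs}, D_{mis}, \psi)\, dD_{mis} \\
  &= \text{const.} \times f_{R|D}(R \mid D_{obs}, \psi)
     \int f_D(D_{obs}, D_{mis} \mid \theta)\, dD_{mis}
     \qquad \text{(by MAR)} \\
  &= L_{rest}(\psi \mid D_{obs}, R) \times L_{ign}(\theta \mid D_{obs}, R).
\end{aligned}
```

If $\theta$ and $\psi$ are also distinct, the first factor carries no information about $\theta$, so likelihood or Bayes inference for $\theta$ can be based on $L_{ign}$ alone.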
“Sufficient for ignorable” is not the same as “ignorable”
• These definitions have come to define ignorability (e.g. Little and Rubin 2002)
• However, Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data".
• These conditions are not necessary for ignoring the mechanism in all situations.
MAR+distinctness $\Rightarrow$ ignorable
ignorable $\nRightarrow$ MAR+distinctness
Example 1: Nonresponse with auxiliary data
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m\}$, $D_{aux} = \{y^*_{j1},\ j = 1, \ldots, n\}$
[Data pattern figure: D_aux holds Y1 for n units (or the whole population, N); D_resp has response indicator R and (Y1, Y2), with "?" marking values missing for nonrespondents; the two files are not linked]
$(y_{i1}, y_{i2}) \sim_{ind} f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
$D_{aux}$ includes the respondent values of $Y_1$, but we do not know which they are.
Not MAR -- $y_{i1}$ missing for nonrespondents $i$
But... mechanism is ignorable, does not need to be modeled:
Marginal distribution of $Y_1$ estimated from $D_{aux}$
Conditional of $Y_2$ given $Y_1$ estimated from $D_{resp}$
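The estimation strategy in this example can be tried on simulated data. The sketch below assumes binary Y1 and Y2 and made-up probabilities; the function names `simulate_example1` and `estimate_mean_y2` and every numeric setting are hypothetical choices for illustration. It combines the marginal of Y1 from the unlinked auxiliary file with the conditional of Y2 given Y1 from respondents, and compares the result with the complete-case mean.

```python
import random


def simulate_example1(n=20000, seed=1):
    """Simulate unlinked auxiliary data with nonresponse depending on Y1.

    Hypothetical settings: Y1 ~ Bernoulli(0.5),
    Pr(Y2 = 1 | Y1) = 0.2 + 0.5 * Y1, Pr(respond | Y1) = 0.3 + 0.5 * Y1.
    """
    rng = random.Random(seed)
    d_aux = []   # Y1 for every unit, with no link to the respondent file
    d_resp = []  # (Y1, Y2) pairs for respondents only
    for _ in range(n):
        y1 = 1 if rng.random() < 0.5 else 0
        y2 = 1 if rng.random() < 0.2 + 0.5 * y1 else 0
        d_aux.append(y1)
        if rng.random() < 0.3 + 0.5 * y1:
            d_resp.append((y1, y2))
    rng.shuffle(d_aux)  # make explicit that order carries no linkage
    return d_aux, d_resp


def estimate_mean_y2(d_aux, d_resp):
    """Combine the marginal of Y1 (D_aux) with Y2 given Y1 (D_resp)."""
    p_y1 = sum(d_aux) / len(d_aux)
    estimate = 0.0
    for level, weight in ((0, 1.0 - p_y1), (1, p_y1)):
        y2_values = [y2 for y1, y2 in d_resp if y1 == level]
        estimate += weight * sum(y2_values) / len(y2_values)
    return estimate


d_aux, d_resp = simulate_example1()
combined = estimate_mean_y2(d_aux, d_resp)
complete_case = sum(y2 for _, y2 in d_resp) / len(d_resp)
true_mean = 0.5 * 0.2 + 0.5 * 0.7  # E[Y2] under the assumed settings
print(f"true {true_mean:.3f}  combined {combined:.3f}  "
      f"complete-case {complete_case:.3f}")
```

Neither step models the response mechanism, yet the combined estimate should track the true mean while the complete-case mean overstates it, because respondents over-represent units with Y1 = 1.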
MAR, ignorability for parameter subsets
• MAR and ignorability are defined in terms of the complete set of parameters in the data model for D
• It would be useful to have a definition of MAR that applies to subsets of parameters, including parameters of substantive interest.
• A trivial example: It seems plausible that a nonignorable mechanism would be MAR for the parameters of distributions of variables that are not missing.
MAR, ignorability for parameter subsets
$\theta = (\theta_1, \theta_2)$
Mechanism is partially MAR for likelihood inference about $\theta_1$, denoted P-MAR($\theta_1$), if:
$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = L_{ign}(\theta_1 \mid D_{obs}, R) \times L_{rest}(\theta_2, \psi \mid D_{obs}, R)$ for all $\theta_1, \theta_2, \psi$
Mechanism is IGN($\theta_1$) if P-MAR($\theta_1$) and $\theta_1$ and $(\theta_2, \psi)$ distinct
MAR, ignorability for parameter subsets
Special case where $\theta_1 = \theta$
Mechanism is P-MAR($\theta$) if:
$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$ for all $\theta, \psi$
A consequence of (but does not imply) Rubin's MAR condition
IGN($\theta$) if P-MAR($\theta$) and $\theta$ and $\psi$ distinct
Partial MAR given a function of mechanism
Harel and Schafer (2009) define a different kind of Partial MAR:
Mechanism is partially MAR given $g(R)$ if:
$P(R \mid Y_{obs}, Y_{mis}, g(R), \theta, \psi) = P(R \mid Y_{obs}, g(R), \theta, \psi)$ for all $R$, $Y_{obs}$, $\theta$, $\psi$
Here "partial" relates to the mechanism.
In my definition "partial" relates to the parameters.
These ideas seem quite distinct.
Example 1: Auxiliary Survey Data
$D = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, n\}$
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m\}$, $D_{aux} = \{y^*_{j1},\ j = 1, \ldots, n\}$
[Data pattern figure: D_aux holds Y1 for n units; D_resp has response indicator R and (Y1, Y2), with "?" marking values missing for nonrespondents; the two files are not linked]
$(y_{i1}, y_{i2}) \sim f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
$D_{aux}$ includes the respondent values of $Y_1$, but we do not know which they are.
Easy to show that mechanism is P-MAR($\theta$), and IGN($\theta$) if $\theta$, $\psi$ are distinct
Ex. 2: MNAR Monotone Bivariate Data
$D = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, n\}$
$D_{obs} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m$ and $y_{i1},\ i = m+1, \ldots, n\}$
$(y_{i1}, y_{i2}) \sim f(y_{i1}, y_{i2} \mid \theta) = f(y_{i1} \mid \theta_1)\, f(y_{i2} \mid y_{i1}, \theta_2)$
$\Pr(r_{i2} = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, y_{i2}, \psi)$ (MNAR)
[Data pattern figure: columns M, Y1, Y2; Y1 fully observed, "?" marks missing Y2]
COMMENT: Clearly, inference about parameters $\theta_1$ of the marginal distribution of $Y_1$ can ignore mechanism, since $Y_1$ has no missing values. In proposed definition, this mechanism is P-MAR($\theta_1$), and IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi)$ distinct
• Paper presents more interesting case with Y1, Y2 blocks of variables and missing data in each block
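The comment can be checked by writing out the full likelihood. This is a sketch that assumes independent units and that $r_{i2} = 1$ means $y_{i2}$ is observed; the coding of the indicator is an assumption here, not something the slide states.

```latex
\begin{aligned}
L(\theta_1,\theta_2,\psi \mid D_{obs}, R)
 &= \prod_{i=1}^{n} f(y_{i1} \mid \theta_1) \\
 &\quad\times \prod_{i=1}^{m} f(y_{i2} \mid y_{i1}, \theta_2)\,
      g(y_{i1}, y_{i2}, \psi) \\
 &\quad\times \prod_{i=m+1}^{n} \int f(y_{2} \mid y_{i1}, \theta_2)\,
      \{1 - g(y_{i1}, y_{2}, \psi)\}\, dy_{2} \\
 &= L_{ign}(\theta_1 \mid D_{obs}, R) \times
    L_{rest}(\theta_2, \psi \mid D_{obs}, R).
\end{aligned}
```

The first product involves only $\theta_1$ and the fully observed $Y_1$, which is the P-MAR($\theta_1$) factorization. The remaining factors keep $\theta_2$ and $\psi$ entangled, so the mechanism cannot be ignored for $\theta_2$.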
More generally…
$(Y_1, R^{(1)})$, $(Y_2, R^{(2)})$ blocks of incomplete variables, and
$f(y_{i1}, y_{i2}, r_i^{(1)}, r_i^{(2)}) = f_1(y_{i1} \mid \theta_1)\, \Pr(r_i^{(1)} \mid y_{i1}, \psi_1) \times f_2(y_{i2} \mid y_{i1}, \theta_2)\, \Pr(r_i^{(2)} \mid r_i^{(1)}, y_{i1}, y_{i2}, \psi_2)$
Assume: $\Pr(r_i^{(1)} \mid y_{i1}; \psi_1) = g_1(y_{1,obs,i}, \psi_1)$ for all $y_{1,mis,i}$,
$\Pr(r_i^{(2)} \mid r_i^{(1)}, y_{i1}, y_{i2}; \psi_2) = g_2(r_i^{(1)}, y_{i1}, y_{i2}, \psi_2)$
Mechanism is P-MAR($\theta_1$), IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi_1, \psi_2)$ are distinct
Ex. 3: Complete Case Analysis in Regression
$D = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, n\}$
$D_{obs} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m\}$; $(y_{i1}, y_{i2}) \sim f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[Data pattern figure: columns R, Y1, Y2; "?" marks Y1 and Y2 missing for incomplete cases]
Let $f(y_{i1}, y_{i2} \mid \theta) = f_1(y_{i1} \mid \theta_1)\, f_2(y_{i2} \mid y_{i1}, \theta_2)$
$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = \text{const.} \times L_1(\theta_2 \mid D_{obs}) \times L_2(\theta_1, \psi \mid D_{obs}, R)$,
where $L_1(\theta_2 \mid D_{obs}) = \prod_{i=1}^{m} f_2(y_{i2} \mid y_{i1}, \theta_2)$
MNAR, but P-MAR($\theta_2$), and IGN($\theta_2$) if $\theta_2$, $(\theta_1, \psi)$ distinct
MNAR, but inference about parameters of conditional distribution of $Y_2$ given $Y_1$ based on complete cases is valid, ignoring the mechanism.
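A small simulation can illustrate the claim. The sketch below assumes a linear regression with normal errors and a logistic response function of Y1; these choices, the function names `simulate_complete_cases` and `ols_fit`, and all numeric values are hypothetical. It fits ordinary least squares to the complete cases only and also reports the complete-case mean of Y2 for contrast.

```python
import math
import random


def simulate_complete_cases(n=20000, seed=2):
    """Return the complete cases when response depends only on Y1.

    Hypothetical settings: Y1 ~ N(0, 1), Y2 = 1 + 2 * Y1 + N(0, 1) noise,
    Pr(r = 1 | Y1) = logistic(1.5 * Y1). Y1 and Y2 are both unobserved
    for nonrespondents, so the mechanism is MNAR.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = 1.0 + 2.0 * y1 + rng.gauss(0.0, 1.0)
        if rng.random() < 1.0 / (1.0 + math.exp(-1.5 * y1)):
            cases.append((y1, y2))
    return cases


def ols_fit(cases):
    """Least-squares intercept and slope for the regression of Y2 on Y1."""
    k = len(cases)
    mean1 = sum(y1 for y1, _ in cases) / k
    mean2 = sum(y2 for _, y2 in cases) / k
    sxy = sum((y1 - mean1) * (y2 - mean2) for y1, y2 in cases)
    sxx = sum((y1 - mean1) ** 2 for y1, _ in cases)
    slope = sxy / sxx
    return mean2 - slope * mean1, slope


cases = simulate_complete_cases()
intercept, slope = ols_fit(cases)
cc_mean_y2 = sum(y2 for _, y2 in cases) / len(cases)
print(f"complete-case intercept {intercept:.3f}, slope {slope:.3f}")
print(f"complete-case mean of Y2 {cc_mean_y2:.3f} (true marginal mean is 1)")
```

The regression coefficients should land near the generating values of 1 and 2, while the complete-case mean of Y2 is pulled upward. That is the sense in which the mechanism is ignorable for the conditional-distribution parameters but not for the marginal ones.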
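Ex. 4: A normal pattern-mixture model
$D_{obs} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m$ and $y_{i1},\ i = m+1, \ldots, n\}$
$f(D, R_2 \mid \phi, \pi) = f_{D|R}(D \mid R_2, \phi)\, f_R(R_2 \mid \pi)$
$(y_{i1}, y_{i2} \mid r_{i2} = j, \phi) \sim_{ind} G(\mu^{(j)}, \Sigma^{(j)}),\ j = 0, 1$; $r_{i2} \sim_{ind} \text{Bern}(\pi)$
Assume $\Pr(r_{i2} = 1 \mid y_{i1}, y_{i2}) = g(y_{i2})$, $g$ unknown (MNAR)
[Data pattern figure: columns R2, Y1, Y2; Y1 fully observed, "?" marks missing Y2]
COMMENT: Distribution of $Y_1$ given $Y_2$ and $R_2$ is independent of $R_2$, so it can be estimated from complete cases, ignoring the mechanism
Parameters: $\theta_{1 \cdot 2} = (\beta_{10 \cdot 2}, \beta_{12 \cdot 2}, \sigma_{11 \cdot 2})$, $\mu_2^{(0)}$, $\sigma_{22}^{(0)}$, $\mu_1^{(1)}$, $\sigma_{11}^{(1)}$
$L(\phi, \pi \mid D_{obs}, R_2) = \text{const.} \times L_1(\theta_{1 \cdot 2} \mid D_{obs}, R_2) \times L_2(\phi, \pi \mid D_{obs}, R_2)$, where
$L_1(\theta_{1 \cdot 2} \mid D_{obs}, R_2) = \prod_{i=1}^{m} f(y_{i1} \mid y_{i2}, \theta_{1 \cdot 2})$
MNAR, but P-MAR($\theta_{1 \cdot 2}$), not IGN($\theta_{1 \cdot 2}$) since $\theta_{1 \cdot 2}$ and $\phi$ are not distinct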
Ex. 5: Subsample ignorable likelihood
• Interest concerns parameters of regression of Y on (Z,X,W)
• Z complete, W and (X,Y) incomplete. W complete in P1.
• Division of covariates into W, X is based on following MNAR assumptions about the missing data mechanism:
– Pr(W complete) = fn(W,X,Z) (not Y)
– (X,Y) MAR in subsample with W fully observed (that is, P1)
Pattern  Z  W  X  Y
P1       √  √  ?  ?
P2       √  ?  ?  ?
√ = fully observed; ? = observed or missing. Columns could be vectors.
This mechanism is P-MAR for the parameters of the regression of Y on (Z,X,W); corresponding analysis is to apply an ignorable likelihood method, discarding data in P2
Little and Zhang (2011)
Ex. 6: Auxiliary data, survey nonresponse
$D = \{(y_{i1}, y_{i2}, y_{i3}),\ i = 1, \ldots, n\}$
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}, y_{i3}),\ i = 1, \ldots, r;\ (y_{i1}),\ i = r+1, \ldots, n\}$
$D_{aux} = \{y^*_{j2},\ j = 1, \ldots, N\}$, $N$ = population size
[Data pattern figure: D_aux holds Y2 for units 1..N; D_resp holds Y1, Y2, Y3 for units 1..r and Y1 only for units r+1..n ("?" marks missing values); the two files are not linked]
$(y_{i1}, y_{i2}, y_{i3}) \sim f(y_{i1}, y_{i2}, y_{i3} \mid \theta)$
$\Pr(m_i = 1 \mid y_{i1}, y_{i2}, y_{i3}, \psi) = g(y_{i1}, y_{i2}, \psi)$
NOT MAR -- $y_{i2}$ missing for nonrespondents
But mechanism is P-MAR($\theta$) if $g(y_{i1}, y_{i2}, \psi)$ additive function of $(y_{i1}, y_{i2})$
Marginal of $Y_2$ from $D_{aux}$, marginal of $Y_1$ from $D_{resp}$
Conditional of $Y_3$ given $Y_1$, $Y_2$ from complete cases in $D_{resp}$
Simulation Study
$[Y_1, Y_2, Y_3, M] = [Y_1, Y_2]\,[Y_3 \mid Y_1, Y_2]\,[M \mid Y_1, Y_2, Y_3]$
$[Y_1, Y_2] \sim$ multinomial
$[Y_3 \mid Y_1, Y_2]$ generated as $\text{logit}\,\Pr(Y_3 = 1 \mid Y_1, Y_2) = 0.5 + \beta_1 Y_1 + \beta_2 Y_2 + \beta_{12} Y_1 * Y_2$
$[M \mid Y_1, Y_2]$ generated as $\text{logit}\,\Pr(M = 1 \mid Y_1, Y_2) = 0.5 + \gamma_1 Y_1 + \gamma_2 Y_2 + \gamma_{12} Y_1 * Y_2$
Each $\beta_j$, $\gamma_j$ set to zero or two (various combinations)
$N$ = 100,000; $n$ = 200, 1000 and 10,000
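A data-generating step of this kind might be sketched as follows. The slide does not give the multinomial cell probabilities or the signs of the logit terms, so the code assumes four equally likely (Y1, Y2) cells and plus signs throughout. The names `logistic`, `generate_sample`, `beta` and `gamma` are hypothetical, and the coefficient combination shown is only one possibility.

```python
import math
import random


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def generate_sample(n, beta, gamma, seed=0):
    """Draw n units of (Y1, Y2, Y3, M) under a design like the one above.

    beta and gamma are (main1, main2, interaction) coefficient triples.
    Assumptions for illustration: the four (Y1, Y2) cells are equally
    likely and every term enters the logit with a plus sign.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1, y2 = rng.choice([(0, 0), (0, 1), (1, 0), (1, 1)])
        eta_y3 = 0.5 + beta[0] * y1 + beta[1] * y2 + beta[2] * y1 * y2
        eta_m = 0.5 + gamma[0] * y1 + gamma[1] * y2 + gamma[2] * y1 * y2
        y3 = 1 if rng.random() < logistic(eta_y3) else 0
        m = 1 if rng.random() < logistic(eta_m) else 0
        rows.append((y1, y2, y3, m))
    return rows


# One "zero or two" combination: additive model for Y3, M depends on Y1 only.
sample = generate_sample(n=10000, beta=(2, 2, 0), gamma=(2, 0, 0), seed=3)
rate_m = sum(row[3] for row in sample) / len(sample)
print(f"Share of units with M = 1: {rate_m:.3f}")
```

Setting the third element of `gamma` to two instead of zero would give a scenario in which M depends on the Y1 * Y2 interaction.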
Simulation Study: methods
CC: Complete Case estimates based on the responding units
M1: ML based on a logistic regression with interaction for Y3
M2: ML based on an additive logistic regression for Y3
NR: Weighting class estimates where nonresponse weights are obtained based on Y1
PS: Post-stratification weighted estimates based on Y2
NRPS: Adjust weights using both Y1 and Y2. For the case of categorical variables, this method is equivalent to linear calibration regression, or generalized raking estimates
Simulation: summary findings
• When response depends on Y1 * Y2 interaction, all methods do poorly
• When data are MCAR, all methods do similarly well
• Model-based methods remove almost all the bias and perform better when response doesn’t depend on Y1 * Y2 interaction
• Qualitative patterns hold for different sample sizes
Frequentist inference
• Rubin’s (1976) sufficient conditions for ignorability for frequentist inference were even stronger (essentially MCAR)
• These can be weakened too – for example asymptotic frequentist inference based on ML and observed information matrix works under conditions given here
• Small sample inference seems more problematic
Summary
• Proposed definitions of partial MAR, ignorability for subsets of parameters
• Expands range of situations where missing data mechanism can be ignored
• Though, in some cases, MAR analysis entails a loss of information
– How much is lost is an interesting question, varies by context
References
Harel, O. and Schafer, J. L. (2009). Partial and Latent Ignorability in missing data problems. Biometrika, 1-14.
Little, R. J. A. (1993). Pattern-Mixture Models for Multivariate Incomplete Data. JASA, 88, 125-134.
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Wiley.
Little, R. J. and Zangeneh, S. Z. (2013). Missing at random and ignorability for inferences about subsets of parameters with missing data. University of Michigan Biostatistics Working Paper Series.
Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. JRSSC, 60, 4, 591–605.
Rubin, D. B. (1976). Inference and Missing Data. Biometrika, 63, 581-592.