TWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND
CONDITIONALLY IMPUTED NONRESPONDENTS
By
Hansheng Wang
A dissertation submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
(Statistics)
at the
UNIVERSITY OF WISCONSIN – MADISON
2006
Abstract
We consider estimating the cell probabilities and testing hypotheses in a two-way
contingency table where two-dimensional categorical data have nonrespondents
imputed using either conditional imputation or marginal imputation. Under
simple random sampling, we derive asymptotic distributions for cell probability
estimators based on the imputed data. Under conditional imputation, we also
show that these estimators are more efficient than those obtained by ignoring
nonrespondents when the proportion of nonrespondents is large. A Wald-type
test and a Rao-and-Scott-type corrected chi-square test for goodness-of-fit
are derived. We show that the naive chi-square test for independence,
which treats imputed values as observed data, is still asymptotically valid under
marginal imputation. Provided we make a simple adjustment of multiplying by
an appropriate factor, the naive chi-square test for independence is also valid
under conditional imputation. We present simulation results which examine the
size and compare the power of these tests. Some of the results are extended
to stratified sampling with imputation within each stratum or across strata.
Asymptotics are studied under two types of stratified sampling: 1) when the
number of strata is fixed with large stratum sizes and 2) when the number of
strata is large with small stratum sizes.
Acknowledgements
First, I want to express my deepest gratitude to my Ph.D. adviser, Prof. Jun
Shao. It was he who suggested my thesis topic and led me into the field of
sample surveys and imputation. For the first time in my life, I found the world
of statistics truly exciting! My curiosity, enthusiasm, and ambition were always
encouraged and appreciated. It is his endless help and encouragement that
made my academic life in Madison so challenging and productive. Prof. Shao
also helped me develop my own research style, which emphasizes both theory
and application. It was also Prof. Shao who introduced me to Dr. Shein-Chung
Chow, another respected researcher to whom I am very grateful.

Although Dr. Chow did not help me with sample surveys and imputation,
it was he who led me into the field of pharmaceutical statistics, where I believe
I will build my own career. The most important thing I learned from Dr. Chow
is "practical sense," which gave me a unique understanding of what statistics
is and what statistics should do. Statistics is neither science nor mathematics.
Instead, it is a way of reasoning and a philosophy of understanding when
unexplained variation exists in the data. I believe this understanding will play
an important role in guiding my future career and research.

Next, I want to thank all my friends for their help and support. I want to
thank Landon Sego, David Dahl, Emmily Chow, and JoAnne Pinto for their
careful proofreading of my thesis. Without their help, I could not have finished
my thesis writing in such a short time. I also want to thank Bing Chen, Quan
Hong, and Yuefeng Lu for their help and support when I was defending my
thesis in Madison. I also want to thank my college classmates Xuan Liu and
Xiaohuang Hong for their long-time support and encouragement whenever I
encountered difficulty.

Furthermore, I want to thank my Ph.D. defense committee, which includes
Prof. Richard Johnson, Prof. Kam-Wah Tsui, Prof. Yi Lin, and Prof. Jun
Zhu, for their careful reading and constructive comments.

Finally, I want to give special thanks to my parents. As the only child of
the family, I was given all the love, blessings, and wishes they could give. They
are my support and motivation whenever I want to give up. During the past
three years of study in the USA, I missed them so much. I wish my Ph.D.
degree could bring them happiness and pride!
Contents

Abstract
Acknowledgements

1 Introduction
  1.1 Background
  1.2 An Outline

2 Imputation Under Simple Random Sampling
  2.1 Introduction
      2.1.1 Statistical Model for Nonresponse
  2.2 Marginal and Conditional Imputation
  2.3 Asymptotic Distribution
      2.3.1 The Case Where A and B Are Independent
      2.3.2 The Case Where A and B Are Dependent
  2.4 Weighted Mean Squared Error
  2.5 Testing for Goodness-of-Fit
  2.6 Testing for Independence

3 Simulation Study Under Simple Random Sampling
  3.1 Introduction
  3.2 Asymptotic Normality
  3.3 Weighted Mean Squared Error
  3.4 Testing for Goodness-of-Fit
  3.5 Testing for Independence
      3.5.1 Marginal Imputation
      3.5.2 Conditional Imputation
      3.5.3 Relative Efficiency
  3.6 Conclusion

4 Imputation Under Stratified Sampling
  4.1 Introduction
  4.2 Imputation Within Each Stratum
      4.2.1 Asymptotic Distribution
      4.2.2 Rao's Test for Goodness-of-Fit
  4.3 Imputation Across Strata with Small H
      4.3.1 Asymptotic Distribution
      4.3.2 Rao's Test for Goodness-of-Fit
  4.4 Imputation Across Strata with Large H
      4.4.1 Asymptotic Distribution
      4.4.2 Asymptotic Covariance and Estimation

5 Simulation Study Under Stratified Sampling
  5.1 Introduction
  5.2 Imputation Within Each Stratum
      5.2.1 Wald's Test for Goodness-of-Fit
      5.2.2 Rao's Test for Goodness-of-Fit
  5.3 Imputation Across Strata with Small H
      5.3.1 Wald's Test for Goodness-of-Fit
      5.3.2 Rao's Test for Goodness-of-Fit
  5.4 Imputation Across Strata with Large H
  5.5 Conclusion

6 Real Data Study
  6.1 The Beaver Dam Eye Study
  6.2 Victimization Incidents Study

Bibliography
Chapter 1
Introduction
1.1 Background
Two-way contingency tables are widely used for summarizing two-dimensional
categorical data. Each cell in a two-way contingency table is a category defined
by the two categorical variables. Sample cell frequencies are often
computed based on the observed responses (of the two-dimensional categorical
variables) from a sample of units (subjects). Statistical inferences including es-
timating cell probabilities, testing the hypothesis of independence, and testing
goodness-of-fit are often carried out.
In sample surveys or medical studies, it is not uncommon for one or both of the
categorical responses to be missing. Sampled units for which both components
are missing (unit nonrespondents) can be handled by a suitable adjustment of
sampling weights. In practice, however, many sampled units may have exactly
one missing component in their responses (item nonrespondents). The approach
of ignoring data from sampled units with exactly one missing component is not
acceptable, because throwing away the observed data may result in a serious
decrease in efficiency of the analysis.
A popular method to handle item nonresponses is imputation, which inserts
values for the unobserved items. Justification for the use of imputation with
practical considerations can be found in Kalton and Kasprzyk (1986). After im-
putation, statistical inferences can then be made by treating the imputed values
as the observed data using formulas designed for the case of no nonresponse.
Various imputation methods have been proposed and studied by different
authors (Little and Rubin, 1987; Schafer, 1997). All the imputation methods
can be roughly divided into two categories: model based imputation methods
and nonparametric imputation methods. The model based imputation methods
assume a parametric or semi-parametric model for the responses and the miss-
ingness. The most typical example is regression imputation, which assumes a
linear model between the response and the observed covariates. The situation
where the random errors in the linear model are normally distributed was studied
by Srivastava and Carter (1986). Shao and Wang (2001) extended these results
to the case in which no parametric assumption is made on the random error.
The nonparametric imputation methods make no parametric assumption on the
distribution of the responses and the missingness. Typical approaches in this
category include hot deck imputation, cold deck imputation, and nearest
neighbor imputation (Chen and Shao, 2000, 2001).
However, all the above methods investigate either continuous data or one-
dimensional categorical data. The imputation methods for multi-dimensional
categorical data are not well studied. For example, for a two-way contingency
table, which is essentially a multi-dimensional categorical data problem, the
statistical problems of interest include the following: How does one impute
the data? How does the relative efficiency of imputation compare with that of
other methods (e.g., the re-weighting method)? How can tests be performed in a valid
way?
Another important problem for imputation is the variance/covariance esti-
mation. It is well known that the variance/covariance of the estimators given
by imputation may be different from the variance/covariance of the estimators
for complete data sets. As a result, the estimators designed to estimate the
variance and covariance of estimators for complete data sets may not be valid
for the estimators generated by imputation. There are three commonly used
approaches to obtain estimators for the variance and covariance of estimators
given by imputation. One is linearization, which uses a Taylor expansion to
obtain an explicit theoretical formula for the covariance structure of the
estimators and then replaces all the unknown quantities by their consistent
point estimators. The merit of the linearization method is that it requires less computation
as compared with other methods, e.g., resampling methods. However, it is not
uncommon that the theoretical formula is too complex to be used. As an al-
ternative, resampling methods such as jackknife and bootstrap (Rao and Shao,
1992) are commonly used to obtain the variance/covariance estimators. The
third approach is multiple imputation (Rubin, 1987), which imputes the same
data set more than once and then obtains the variance estimator by combining
the between and within imputation variability in an appropriate way.
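The combination step of multiple imputation can be written explicitly. For $m$ imputed data sets yielding point estimates $\hat\theta^{(t)}$ and complete-data variance estimates $\widehat{\mathrm{var}}(\hat\theta^{(t)})$, Rubin's (1987) rule combines the within- and between-imputation variability as:

```latex
\bar\theta_m = \frac{1}{m}\sum_{t=1}^{m}\hat\theta^{(t)}, \qquad
\bar W_m = \frac{1}{m}\sum_{t=1}^{m}\widehat{\mathrm{var}}\big(\hat\theta^{(t)}\big), \qquad
B_m = \frac{1}{m-1}\sum_{t=1}^{m}\big(\hat\theta^{(t)} - \bar\theta_m\big)^2,
\qquad
T_m = \bar W_m + \Big(1 + \frac{1}{m}\Big) B_m .
```

Here $T_m$ estimates the variance of the combined point estimate $\bar\theta_m$.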
The main purpose of this thesis is to investigate the statistical properties of
a conditional imputation method, which imputes nonrespondents using the esti-
mated conditional probabilities. More specifically, we study (i) the consistency
of estimators of cell probabilities based on imputed data; (ii) the asymptotic
variances and covariances of estimators of cell probabilities, which lead to con-
sistent variance and covariance estimators; and (iii) the validity of chi-square
type tests for goodness of fit or independence. For testing independence of
the two components of the categorical variable, we also study a marginal im-
putation method, which imputes nonrespondents using the estimated marginal
probabilities.
1.2 An Outline
The rest of this thesis is organized as follows. In Chapter 2, we study both con-
ditional and marginal imputation under simple random sampling. In Chapter 3,
extensive simulations are performed to evaluate the finite sample performance
of the procedures described in Chapter 2. In Chapter 4, we study conditional
imputation for stratified sampling, which includes imputation within stratum
and imputation across strata. For the method of imputation across strata, two
different types of asymptotics are considered. One deals with a small num-
ber of strata with a large stratum size. The other is a large number of strata
with a small stratum size. Extensive simulations are carried out in Chapter 5
to evaluate the finite sample performance of the procedures obtained in Chap-
ter 4. Finally, several real data sets are presented to illustrate the proposed
imputation methods in Chapter 6.
Chapter 2
Imputation Under Simple
Random Sampling
2.1 Introduction
In this chapter, we introduce two imputation methods under simple random
sampling: marginal imputation and conditional imputation. Our results show
that the point estimators obtained by conditional imputation are consistent,
but those obtained by marginal imputation are usually not unless the two
components of the categorical variables are independent of each other. The
asymptotic distributions of point estimators under both imputation methods
are derived when appropriate. In order to evaluate the statistical performances
of the point estimators, we propose a measure called Weighted Mean Squared
Error (WMSE). Then the estimators given by conditional imputation and re-
weighting methods are compared in terms of WMSE. The results show that
conditional imputation can improve efficiency when the proportion of complete
units is small. Testing goodness-of-fit is also considered. We first propose a
Wald type statistic, which is asymptotically valid. Then we show that perform-
ing the naive method of treating the imputed value as observed and applying
Pearson’s chi-square test is not valid. We propose a Rao type correction to the
naive method. Finally, the performances of Wald type statistic and Rao type
statistic are compared. Results show that their performances are comparable
in terms of type I error. Testing independence is also considered. We show
that the naive method of applying Pearson’s chi-square statistic directly is still
asymptotically valid under marginal imputation but not under conditional
imputation. After multiplying by an appropriate constant, however, the naive
method is also asymptotically correct under conditional imputation.
2.1.1 Statistical Model for Nonresponse
Consider a two-dimensional response vector (A,B)′, where both A and B are
categorical responses taking values from {1, · · · , a} and {1, · · · , b}, respectively.
In practice, imputation is carried out by first creating some imputation
classes and imputing nonrespondents within each imputation class. Imputation
classes are sub-populations of the original population and are usually formed by
using an auxiliary variable without nonresponse. For example, in many business
surveys, imputation classes are strata or unions of strata. In medical studies, if
data are obtained under several different treatments, the treatment groups are
imputation classes.
Throughout this chapter, we make the following assumption:
Assumption A. For each sampled unit within an imputation class, πA de-
notes the probability of observing A and missing B, πB denotes the probability
of observing B and missing A, and πC denotes the probability of observing both
A and B.
As discussed in the previous chapter, units with both components missing
(unit nonrespondents) can be ignored by suitably adjusting the sampling
weights. As a result, we assume there is no unit nonresponse, i.e., $\pi_A + \pi_B + \pi_C = 1$.
Note that the probabilities $\pi_A$, $\pi_B$, and $\pi_C$ may differ across imputation classes.
For simplicity and without loss of generality, we only consider the case of
simple random sampling with replacement. In practice, the data may be ob-
tained by sampling without replacement. Our derived results are still valid if
the sampling fraction is negligible.
For the sake of convenience, we assume that there is only one imputation
class, since the extension to multiple imputation classes is straightforward.
2.2 Marginal and Conditional Imputation
Suppose there are n sampled units, which are indexed by k (i.e., (Ak, Bk), k =
1, · · · , n). To simplify the notation, (Ak, Bk) may also be referred to as (A,B).
Let CA, CB, and CC be the collection of the indices of the units with B missing,
A missing, and neither A nor B missing, respectively. Let nA = |CA|, nB = |CB|,
and nC = |CC |, where |S| denotes the number of elements contained in a finite
set S. In other words, nA, nB, and nC are the number of the units with B
missing, with A missing, and neither A nor B missing, respectively. Therefore,
the total sample size is given by $n = n_A + n_B + n_C$. Let $n^C_{ij}$ denote the total
number of completers with $(A,B) = (i,j)$. Let $p_{ij} = P((A,B) = (i,j))$,
$p_{i\cdot} = P(A = i)$, and $p_{\cdot j} = P(B = j)$. Typical estimators of $p_{i\cdot}$ and
$p_{\cdot j}$ based on the completers are

$$\hat p^C_{i\cdot} = \frac{\sum_{j=1}^{b} n^C_{ij}}{\sum_{ij} n^C_{ij}} = \frac{n^C_{i\cdot}}{n^C}
\qquad \text{and} \qquad
\hat p^C_{\cdot j} = \frac{\sum_{i=1}^{a} n^C_{ij}}{\sum_{ij} n^C_{ij}} = \frac{n^C_{\cdot j}}{n^C},$$

where $n^C_{i\cdot} = \sum_{j=1}^{b} n^C_{ij}$, $n^C_{\cdot j} = \sum_{i=1}^{a} n^C_{ij}$, and $n^C = \sum_{ij} n^C_{ij}$.
Given an incompleter $(A,B) = (i,*)$, where $*$ denotes the missing value,
marginal imputation imputes the missing value $j$ ($1 \le j \le b$) with
probability $\hat p^C_{\cdot j}$. That is, the missing value is imputed according to its estimated
marginal distribution without conditioning on the observed item (A = i). Miss-
ing values from incompleters are imputed independently. Intuitively, parameters
such as p·j and pi· can be estimated consistently based on marginally imputed
data, but parameters such as pij cannot be estimated consistently, since the
relationship between A and B is not preserved during marginal imputation.
The conditional probability $P(B = j \mid A = i)$ is denoted by $p_{ij|A} = p_{ij}/p_{i\cdot}$.
Thus, an estimator of $p_{ij|A}$ based on the completers is given by

$$\hat p^C_{ij|A} = \frac{\hat p^C_{ij}}{\hat p^C_{i\cdot}} = \frac{n^C_{ij}/n^C}{n^C_{i\cdot}/n^C} = \frac{n^C_{ij}}{n^C_{i\cdot}}.$$
Conditional imputation imputes the missing value $j$ ($1 \le j \le b$) with probability
$\hat p^C_{ij|A}$. In other words, given the completers, conditional imputation imputes
a missing value according to its estimated conditional distribution given the
observed component. Imputation for an incompleter with A missing is similar,
and incompleters are imputed independently, conditioning on the completers
and their observed components.
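The two schemes can be sketched in a few lines of code. This is an illustrative implementation, not the thesis's: the function name and table layout are ours, and we assume every completer count $n^C_{ij}$ (and every margin) is positive so that the conditional probabilities are well defined.

```python
import numpy as np

def impute(table_AB, miss_B_given_A, miss_A_given_B, rng):
    """Draw imputed cells for item nonrespondents in a two-way table.

    table_AB        : (a, b) array of completer counts n^C_ij (all > 0 assumed).
    miss_B_given_A  : length-a counts of units with A = i observed, B missing (C_A).
    miss_A_given_B  : length-b counts of units with B = j observed, A missing (C_B).
    Returns (marg, cond): (a, b) tables of imputed counts under marginal
    and conditional imputation, respectively.
    """
    a, b = table_AB.shape
    nC = table_AB.sum()
    p_col = table_AB.sum(axis=0) / nC              # \hat p^C_{.j}
    p_row = table_AB.sum(axis=1) / nC              # \hat p^C_{i.}
    marg = np.zeros((a, b))
    cond = np.zeros((a, b))
    for i in range(a):                             # units in C_A with A = i
        # marginal: draw B from the estimated marginal \hat p^C_{.j}
        marg[i] += rng.multinomial(miss_B_given_A[i], p_col)
        # conditional: draw B from \hat p^C_{ij} / \hat p^C_{i.}
        cond[i] += rng.multinomial(miss_B_given_A[i],
                                   table_AB[i] / table_AB[i].sum())
    for j in range(b):                             # units in C_B with B = j
        marg[:, j] += rng.multinomial(miss_A_given_B[j], p_row)
        cond[:, j] += rng.multinomial(miss_A_given_B[j],
                                      table_AB[:, j] / table_AB[:, j].sum())
    return marg, cond
```

Adding the imputed counts to the completer counts and dividing by $n$ gives the post-imputation estimates $\hat p^I_{ij}$ used throughout this chapter.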
After imputation, estimators of $p_{ij}$ are obtained using the standard formulas
for a two-way contingency table, treating the imputed values as observed
data. We denote these estimators (based on either marginal or conditional
imputation) by $\hat p^I_{ij}$.
Recall that CA is the collection of the indices of the units with A observed
and B missing. Let
$$\hat p^A_{ij} = \frac{1}{n_A} \sum_{k \in C_A} I\{(A_k, B_k) = (i,j)\},$$

where $B_k$ is the value obtained by imputation (either marginal or conditional)
for any $k \in C_A$. $\hat p^B_{ij}$ and $\hat p^C_{ij}$ are defined similarly. The
relationship between $\hat p^I_{ij}$ and $\hat p^A_{ij}$, $\hat p^B_{ij}$, $\hat p^C_{ij}$ is given by

$$\hat p^I_{ij} = \frac{n_A \hat p^A_{ij} + n_B \hat p^B_{ij} + n_C \hat p^C_{ij}}{n}.$$
For the sake of convenience, we define
$$
\begin{aligned}
p &= (p_{11}, \cdots, p_{1b}, \cdots, p_{a1}, \cdots, p_{ab})' \\
p_A &= (p_{1\cdot}, \cdots, p_{a\cdot})' \\
p_B &= (p_{\cdot 1}, \cdots, p_{\cdot b})' \\
\hat p^I &= (\hat p^I_{11}, \cdots, \hat p^I_{1b}, \cdots, \hat p^I_{a1}, \cdots, \hat p^I_{ab})' \\
\hat p^A &= (\hat p^A_{11}, \cdots, \hat p^A_{1b}, \cdots, \hat p^A_{a1}, \cdots, \hat p^A_{ab})' \\
\hat p^B &= (\hat p^B_{11}, \cdots, \hat p^B_{1b}, \cdots, \hat p^B_{a1}, \cdots, \hat p^B_{ab})' \\
\hat p^C &= (\hat p^C_{11}, \cdots, \hat p^C_{1b}, \cdots, \hat p^C_{a1}, \cdots, \hat p^C_{ab})'.
\end{aligned}
\tag{2.1}
$$
2.3 Asymptotic Distribution
In order to obtain the limiting distribution of $\hat p^I_{ij}$, Lemma 1 is given here without
proof. Lemma 1 is also used when we study stratified sampling in Chapter 4.
A more general form of the lemma can be found in Schenker and Welsh (1988).
Lemma 1 Let $X_n$ be a sample, and let $U_n(X_n)$ and $W_n$ be two random vectors such that
$$U_n \to_d N(0, \Sigma_1)$$
and
$$W_n \mid X_n \to_d N(0, \Sigma_2);$$
then
$$U_n + W_n \to_d N(0, \Sigma_1 + \Sigma_2).$$
2.3.1 The Case Where A and B Are Independent
When A and B are independent, the estimators produced by both marginal and
conditional imputation are consistent. However, their variances and covariances
are different from the variances and covariances of the standard estimator of pij
when there is no nonresponse. The following theorem establishes the asymptotic
distributions and covariance matrices of $\hat p^I$, defined in (2.1), under both
conditional imputation and marginal imputation.
Theorem 1 Assume that $A$ and $B$ are independent. If $\pi_C > 0$, then, as $n \to \infty$,
$$\sqrt{n}(\hat p^I - p) \to_d N(0, \Sigma),$$
where

(a) under marginal imputation,
$$\Sigma = P_A \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_A p_A') \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_B p_B');$$

(b) under conditional imputation,
$$\Sigma = \Big(\frac{1}{\pi_C} + 1 - \pi_C\Big) P_A \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_A p_A') \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_B p_B').$$

Here $\otimes$ is the Kronecker product; $p_A$ and $p_B$ are given in (2.1);
$P_A = \mathrm{diag}(p_A) - p_A p_A'$, where $\mathrm{diag}(p_A)$ denotes the diagonal matrix
whose $i$th ($1 \le i \le a$) diagonal element is the $i$th component of $p_A$; and
$P_B = \mathrm{diag}(p_B) - p_B p_B'$.
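As a numerical sanity check, the limiting covariance matrices can be assembled directly with Kronecker products. This is an illustrative sketch (the function name is ours): the $\pi_A$ coefficient attaches to $(p_A p_A') \otimes P_B$, matching the variance decomposition in the proof, and every row of $\Sigma$ sums to zero because the cell estimates sum to one.

```python
import numpy as np

def sigma_theorem1(pA, pB, piA, piB, piC, conditional=False):
    """Asymptotic covariance of sqrt(n)(p^I_hat - p) under independence."""
    pA = np.asarray(pA, float)
    pB = np.asarray(pB, float)
    PA = np.diag(pA) - np.outer(pA, pA)
    PB = np.diag(pB) - np.outer(pB, pB)
    cA = (piC + 2 * piC * piA + piA ** 2) / piC   # with (pA pA') x PB
    cB = (piC + 2 * piC * piB + piB ** 2) / piC   # with PA x (pB pB')
    lead = (1.0 / piC + 1.0 - piC) if conditional else 1.0
    return (lead * np.kron(PA, PB)
            + cA * np.kron(np.outer(pA, pA), PB)
            + cB * np.kron(PA, np.outer(pB, pB)))
```

Note that the conditional-imputation matrix dominates the marginal one: their difference is $(\pi_C^{-1} - \pi_C) P_A \otimes P_B$, which is positive semidefinite.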
Proof:
After imputation, each unit becomes "complete." For a given unit $(A_k, B_k)$,
define $I^A_k$ to be the $a$-dimensional vector whose $i$th element equals 1 and
whose other elements equal 0 if $A_k = i$. Similarly, define $I^B_k$ to be the
$b$-dimensional vector whose $j$th element equals 1 and whose other elements
equal 0 if $B_k = j$. Under the hypothesis that $A$ and $B$ are independent,
$I^A_k$ and $I^B_k$ are independent. According to (2.1), note that
$$
\begin{aligned}
\hat p^I &= \frac{1}{n} \sum_{k=1}^{n} I^A_k \otimes I^B_k \\
\hat p^A &= \frac{1}{n_A} \sum_{k \in C_A} I^A_k \otimes I^B_k \\
\hat p^B &= \frac{1}{n_B} \sum_{k \in C_B} I^A_k \otimes I^B_k \\
\hat p^C &= \frac{1}{n_C} \sum_{k \in C_C} I^A_k \otimes I^B_k.
\end{aligned}
$$
It follows that
$$
\begin{aligned}
\sqrt{n}(\hat p^I - p)
&= \sqrt{n}\left[\frac{n_A(\hat p^A - p) + n_B(\hat p^B - p) + n_C(\hat p^C - p)}{n}\right] \\
&= \sqrt{n}\left[\frac{n_A(E(\hat p^A|\sigma(C)) - p) + n_B(E(\hat p^B|\sigma(C)) - p) + n_C(\hat p^C - p)}{n}\right] \\
&\quad + \sqrt{n}\left[\frac{n_A(\hat p^A - E(\hat p^A|\sigma(C))) + n_B(\hat p^B - E(\hat p^B|\sigma(C)))}{n}\right],
\end{aligned}
$$
where $E(\cdot\,|\sigma(C))$ denotes $E(\cdot\,|\,n_A, n_B, n_C, (A_k, B_k),\ k \in C_C)$;
in other words, $E(\cdot\,|\sigma(C))$ denotes the expectation conditional on the
completers and the numbers of incompleters. Let
$$U_n = \sqrt{n}\left[\frac{n_A(E(\hat p^A|\sigma(C)) - p) + n_B(E(\hat p^B|\sigma(C)) - p) + n_C(\hat p^C - p)}{n}\right], \tag{2.2}$$
and
$$W_n = \sqrt{n}\left[\frac{n_A(\hat p^A - E(\hat p^A|\sigma(C))) + n_B(\hat p^B - E(\hat p^B|\sigma(C)))}{n}\right]. \tag{2.3}$$
(a) Marginal imputation.

Given $\sigma(C)$ (i.e., $n_A$, $n_B$, $n_C$, and the completers),
$\{I^A_k \otimes I^B_k\}_{k \in C_A}$ are i.i.d. random vectors with mean
$E(\hat p^A|\sigma(C))$. According to the Central Limit Theorem,
$$\sqrt{n_A}\,(\hat p^A - E(\hat p^A|\sigma(C)))\,\big|\,\sigma(C) \to_d N(0, \Sigma_W),$$
where
$$\Sigma_W = \mathrm{diag}\{E(\hat p^A|\sigma(C))\} - (E(\hat p^A|\sigma(C)))(E(\hat p^A|\sigma(C)))'.$$
On the other hand,
$$E(\hat p^A_{ij}|\sigma(C)) = p_{i\cdot}\,\hat p^C_{\cdot j} \to_{a.s.} p_{i\cdot}\,p_{\cdot j} = p_{ij} \quad \text{as } n_C \to \infty.$$
Therefore,
$$\Sigma_W \to_{a.s.} \mathrm{diag}\{p\} - pp' \doteq P, \tag{2.4}$$
where "$\doteq$" denotes "defined to be". This leads to
$\sqrt{n_A}\,(\hat p^A - E(\hat p^A|\sigma(C)))\,|\,\sigma(C) \to_d N(0, P)$, and the same
limit holds with $A$ replaced by $B$. Consequently,
$$
\begin{aligned}
W_n \,\big|\, \sigma(C)
&= \frac{\sqrt{n_A}}{\sqrt{n}}\,\sqrt{n_A}\,(\hat p^A - E(\hat p^A|\sigma(C)))
 + \frac{\sqrt{n_B}}{\sqrt{n}}\,\sqrt{n_B}\,(\hat p^B - E(\hat p^B|\sigma(C))) \\
&= \sqrt{\pi_A}\,\sqrt{n_A}\,(\hat p^A - E(\hat p^A|\sigma(C)))
 + \sqrt{\pi_B}\,\sqrt{n_B}\,(\hat p^B - E(\hat p^B|\sigma(C))) + o_p(1) \\
&\to_d \sqrt{\pi_A}\,N(0, P) + \sqrt{\pi_B}\,N(0, P) = N(0, (1 - \pi_C)P).
\end{aligned}
$$
Under the assumption that $A$ and $B$ are independent, it follows that
$$
E(\hat p^A|\sigma(C)) - p =
\begin{pmatrix}
p_{1\cdot}\,\hat p^C_{\cdot 1} - p_{11} \\ \vdots \\ p_{1\cdot}\,\hat p^C_{\cdot b} - p_{1b} \\ \vdots \\
p_{a\cdot}\,\hat p^C_{\cdot 1} - p_{a1} \\ \vdots \\ p_{a\cdot}\,\hat p^C_{\cdot b} - p_{ab}
\end{pmatrix}
=
\begin{pmatrix}
p_{1\cdot}(\hat p^C_{\cdot 1} - p_{\cdot 1}) \\ \vdots \\ p_{1\cdot}(\hat p^C_{\cdot b} - p_{\cdot b}) \\ \vdots \\
p_{a\cdot}(\hat p^C_{\cdot 1} - p_{\cdot 1}) \\ \vdots \\ p_{a\cdot}(\hat p^C_{\cdot b} - p_{\cdot b})
\end{pmatrix}
= p_A \otimes \Big[\frac{1}{n_C} \sum_{k \in C_C} (I^B_k - p_B)\Big]
= \frac{1}{n_C} \sum_{k \in C_C} p_A \otimes (I^B_k - p_B).
$$
Similarly,
$$E(\hat p^B|\sigma(C)) - p = \frac{1}{n_C} \sum_{k \in C_C} (I^A_k - p_A) \otimes p_B.$$
Thus, we have
$$
\begin{aligned}
U_n &= \frac{1}{\sqrt{n}}\left[n_A(E(\hat p^A|\sigma(C)) - p) + n_B(E(\hat p^B|\sigma(C)) - p) + n_C(\hat p^C - p)\right] \\
&= \frac{n_A}{\sqrt{n}\sqrt{n_C}}\,\sqrt{n_C}\,(E(\hat p^A|\sigma(C)) - p)
 + \frac{n_B}{\sqrt{n}\sqrt{n_C}}\,\sqrt{n_C}\,(E(\hat p^B|\sigma(C)) - p)
 + \frac{\sqrt{n_C}}{\sqrt{n}}\,\sqrt{n_C}\,(\hat p^C - p) \\
&= \frac{\pi_A}{\sqrt{\pi_C}}\,\sqrt{n_C}\,(E(\hat p^A|\sigma(C)) - p)
 + \frac{\pi_B}{\sqrt{\pi_C}}\,\sqrt{n_C}\,(E(\hat p^B|\sigma(C)) - p)
 + \sqrt{\pi_C}\,\sqrt{n_C}\,(\hat p^C - p) + o_p(1) \\
&= \frac{1}{\sqrt{n_C}} \sum_{k \in C_C} \left[\sqrt{\pi_C}\,(I^A_k - p_A) \otimes (I^B_k - p_B)
 + \frac{\pi_C + \pi_A}{\sqrt{\pi_C}}\,p_A \otimes (I^B_k - p_B)
 + \frac{\pi_C + \pi_B}{\sqrt{\pi_C}}\,(I^A_k - p_A) \otimes p_B\right] + o_p(1) \\
&\to_d N(0, \Sigma_U),
\end{aligned}
$$
where
$$\Sigma_U = \mathrm{var}\left(\sqrt{\pi_C}\,(I^A_k - p_A) \otimes (I^B_k - p_B)
+ \frac{\pi_C + \pi_A}{\sqrt{\pi_C}}\,p_A \otimes (I^B_k - p_B)
+ \frac{\pi_C + \pi_B}{\sqrt{\pi_C}}\,(I^A_k - p_A) \otimes p_B\right).$$
Let $P_A = \mathrm{diag}\{p_A\} - p_A p_A'$ and $P_B = \mathrm{diag}\{p_B\} - p_B p_B'$. Then we have
$$
\begin{aligned}
\mathrm{var}[(I^A_k - p_A) \otimes (I^B_k - p_B)]
&= E\{[(I^A_k - p_A) \otimes (I^B_k - p_B)][(I^A_k - p_A) \otimes (I^B_k - p_B)]'\} \\
&= E\{[(I^A_k - p_A)(I^A_k - p_A)'] \otimes [(I^B_k - p_B)(I^B_k - p_B)']\} \\
&= E[(I^A_k - p_A)(I^A_k - p_A)'] \otimes E[(I^B_k - p_B)(I^B_k - p_B)'] \\
&= P_A \otimes P_B.
\end{aligned}
$$
The third equality holds because $I^A_k$ and $I^B_k$ are independent. Similarly,
$$
\begin{aligned}
\mathrm{var}[p_A \otimes (I^B_k - p_B)] &= (p_A p_A') \otimes P_B, \\
\mathrm{var}[(I^A_k - p_A) \otimes p_B] &= P_A \otimes (p_B p_B'), \\
\mathrm{cov}[(I^A_k - p_A) \otimes (I^B_k - p_B),\; p_A \otimes (I^B_k - p_B)]
&= E\{[(I^A_k - p_A)p_A'] \otimes [(I^B_k - p_B)(I^B_k - p_B)']\} \\
&= 0 \otimes P_B = 0, \\
\mathrm{cov}[(I^A_k - p_A) \otimes (I^B_k - p_B),\; (I^A_k - p_A) \otimes p_B]
&= E\{[(I^A_k - p_A)(I^A_k - p_A)'] \otimes [(I^B_k - p_B)p_B']\} \\
&= P_A \otimes 0 = 0, \\
\mathrm{cov}[p_A \otimes (I^B_k - p_B),\; (I^A_k - p_A) \otimes p_B]
&= E\{[p_A(I^A_k - p_A)'] \otimes [(I^B_k - p_B)p_B']\} \\
&= 0.
\end{aligned}
$$
As a result, $\Sigma_U$ is given by
$$\Sigma_U = \pi_C\, P_A \otimes P_B
+ \frac{(\pi_C + \pi_A)^2}{\pi_C}\,(p_A p_A') \otimes P_B
+ \frac{(\pi_C + \pi_B)^2}{\pi_C}\,P_A \otimes (p_B p_B').$$
Therefore, $U_n \to_d N(0, \Sigma_U)$. Since $W_n \to_d N(0, (1 - \pi_C)P)$ and
$$
\begin{aligned}
P &= \mathrm{diag}\{p\} - pp' \\
&= \mathrm{diag}\{p_A \otimes p_B\} - (p_A p_A') \otimes (p_B p_B') \\
&= (\mathrm{diag}\{p_A\} - p_A p_A') \otimes (\mathrm{diag}\{p_B\} - p_B p_B')
 + (p_A p_A') \otimes (\mathrm{diag}\{p_B\} - p_B p_B')
 + (\mathrm{diag}\{p_A\} - p_A p_A') \otimes (p_B p_B') \\
&= P_A \otimes P_B + (p_A p_A') \otimes P_B + P_A \otimes (p_B p_B'),
\end{aligned}
$$
we have
$$\sqrt{n}(\hat p^I - p) = W_n + U_n \to_d N(0, (1 - \pi_C)P + \Sigma_U) = N(0, \Sigma),$$
where
$$\Sigma = P_A \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_A p_A') \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_B p_B').$$
(b) Conditional imputation.

Suppose the total sample size is large, so that $P(n_C = 0)$ is negligible. As in
part (a), under conditional imputation we have
$$W_n \,\big|\, \sigma(C) \to_d N(0, (1 - \pi_C)P).$$
On the other hand, by Taylor's expansion,
$$
\begin{aligned}
E(\hat p^A|\sigma(C)) - p_A \otimes p_B
&=
\begin{pmatrix}
p_{1\cdot}\big(\hat p^C_{11}/\hat p^C_{1\cdot} - p_{\cdot 1}\big) \\ \vdots \\
p_{1\cdot}\big(\hat p^C_{1b}/\hat p^C_{1\cdot} - p_{\cdot b}\big) \\ \vdots \\
p_{a\cdot}\big(\hat p^C_{a1}/\hat p^C_{a\cdot} - p_{\cdot 1}\big) \\ \vdots \\
p_{a\cdot}\big(\hat p^C_{ab}/\hat p^C_{a\cdot} - p_{\cdot b}\big)
\end{pmatrix} \\
&=
\begin{pmatrix}
\hat p^C_{11} - p_{1\cdot}p_{\cdot 1} \\ \vdots \\ \hat p^C_{1b} - p_{1\cdot}p_{\cdot b} \\ \vdots \\
\hat p^C_{a1} - p_{a\cdot}p_{\cdot 1} \\ \vdots \\ \hat p^C_{ab} - p_{a\cdot}p_{\cdot b}
\end{pmatrix}
-
\begin{pmatrix}
(\hat p^C_{1\cdot} - p_{1\cdot})p_{\cdot 1} \\ \vdots \\ (\hat p^C_{1\cdot} - p_{1\cdot})p_{\cdot b} \\ \vdots \\
(\hat p^C_{a\cdot} - p_{a\cdot})p_{\cdot 1} \\ \vdots \\ (\hat p^C_{a\cdot} - p_{a\cdot})p_{\cdot b}
\end{pmatrix}
+ o_p(n_C^{-1/2}) \\
&= \frac{1}{n_C} \sum_{k \in C_C} \big[(I^A_k \otimes I^B_k - p_A \otimes p_B) - (I^A_k - p_A) \otimes p_B\big] + o_p(n_C^{-1/2}) \\
&= \frac{1}{n_C} \sum_{k \in C_C} \big[(I^A_k - p_A) \otimes (I^B_k - p_B) + p_A \otimes (I^B_k - p_B)\big] + o_p(n_C^{-1/2}).
\end{aligned}
$$
Similarly,
$$E(\hat p^B|\sigma(C)) - p_A \otimes p_B
= \frac{1}{n_C} \sum_{k \in C_C} \big[(I^A_k - p_A) \otimes (I^B_k - p_B) + (I^A_k - p_A) \otimes p_B\big] + o_p(n_C^{-1/2}).$$
As a result,
$$
\begin{aligned}
U_n &= \frac{1}{\sqrt{n_C}} \sum_{k \in C_C} \Big[\sqrt{\pi_C}\,(I^A_k \otimes I^B_k - p_A \otimes p_B) \\
&\qquad\qquad + \frac{\pi_A}{\sqrt{\pi_C}}\big((I^A_k - p_A) \otimes (I^B_k - p_B) + p_A \otimes (I^B_k - p_B)\big) \\
&\qquad\qquad + \frac{\pi_B}{\sqrt{\pi_C}}\big((I^A_k - p_A) \otimes (I^B_k - p_B) + (I^A_k - p_A) \otimes p_B\big)\Big] + o_p(1) \\
&= \frac{1}{\sqrt{n_C}} \sum_{k \in C_C} \Big[\frac{1}{\sqrt{\pi_C}}\,(I^A_k - p_A) \otimes (I^B_k - p_B)
 + \frac{\pi_C + \pi_A}{\sqrt{\pi_C}}\,p_A \otimes (I^B_k - p_B)
 + \frac{\pi_C + \pi_B}{\sqrt{\pi_C}}\,(I^A_k - p_A) \otimes p_B\Big] + o_p(1) \\
&\to_d N\Big(0,\; \frac{1}{\pi_C}\,P_A \otimes P_B
 + \frac{(\pi_C + \pi_A)^2}{\pi_C}\,(p_A p_A') \otimes P_B
 + \frac{(\pi_C + \pi_B)^2}{\pi_C}\,P_A \otimes (p_B p_B')\Big).
\end{aligned}
$$
Consequently,
$$W_n + U_n \to_d N(0, \Sigma),$$
where
$$\Sigma = \Big(\frac{1}{\pi_C} + 1 - \pi_C\Big) P_A \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_A + \pi_A^2}{\pi_C}\,(p_A p_A') \otimes P_B
+ \frac{\pi_C + 2\pi_C\pi_B + \pi_B^2}{\pi_C}\,P_A \otimes (p_B p_B').$$
2.3.2 The Case Where A and B Are Dependent
When A and B are dependent, the point estimators obtained by marginal im-
putation are not consistent. Therefore, marginal imputation is usually not con-
sidered in this case. The asymptotic results are established for conditional
imputation only.
Theorem 2 Assume that $\pi_C > 0$. Under conditional imputation,
$$\sqrt{n}(\hat p^I - p) \to_d N(0, \Sigma),$$
where $\Sigma = MPM' + (1 - \pi_C)P$,
$$M = \frac{1}{\sqrt{\pi_C}}\,I_{a \times b}
- \frac{\pi_A}{\sqrt{\pi_C}}\,\mathrm{diag}\{p_{B|A}\}\,(I_a \otimes U_b)
- \frac{\pi_B}{\sqrt{\pi_C}}\,\mathrm{diag}\{p_{A|B}\}\,(U_a \otimes I_b), \tag{2.5}$$
and
$$
\begin{aligned}
p_{A|B} &= (p_{11}/p_{\cdot 1}, \cdots, p_{1b}/p_{\cdot b}, \cdots, p_{a1}/p_{\cdot 1}, \cdots, p_{ab}/p_{\cdot b})', \\
p_{B|A} &= (p_{11}/p_{1\cdot}, \cdots, p_{1b}/p_{1\cdot}, \cdots, p_{a1}/p_{a\cdot}, \cdots, p_{ab}/p_{a\cdot})'.
\end{aligned}
\tag{2.6}
$$
$I_d$ ($d = a \times b$, $a$, or $b$) is the $d$-dimensional identity matrix, and $U_d$ is the
$d$-dimensional square matrix with all elements equal to 1.
Proof:
$U_n$ and $W_n$ are defined in (2.2) and (2.3). Under conditional imputation, we have
$$W_n \,\big|\, \sigma(C) \to_d N(0, (1 - \pi_C)P),$$
where $P$ is given in (2.4). On the other hand,
$$
\sqrt{n_C}\,[E(\hat p^A|\sigma(C)) - p]
= \sqrt{n_C}
\begin{pmatrix}
p_{1\cdot}\,\hat p^C_{11}/\hat p^C_{1\cdot} - p_{11} \\ \vdots \\
p_{1\cdot}\,\hat p^C_{1b}/\hat p^C_{1\cdot} - p_{1b} \\ \vdots \\
p_{a\cdot}\,\hat p^C_{a1}/\hat p^C_{a\cdot} - p_{a1} \\ \vdots \\
p_{a\cdot}\,\hat p^C_{ab}/\hat p^C_{a\cdot} - p_{ab}
\end{pmatrix}
= \sqrt{n_C}
\begin{pmatrix}
\hat p^C_{11} - p_{11} \\ \vdots \\ \hat p^C_{1b} - p_{1b} \\ \vdots \\
\hat p^C_{a1} - p_{a1} \\ \vdots \\ \hat p^C_{ab} - p_{ab}
\end{pmatrix}
- \sqrt{n_C}
\begin{pmatrix}
\frac{p_{11}}{p_{1\cdot}}(\hat p^C_{1\cdot} - p_{1\cdot}) \\ \vdots \\
\frac{p_{1b}}{p_{1\cdot}}(\hat p^C_{1\cdot} - p_{1\cdot}) \\ \vdots \\
\frac{p_{a1}}{p_{a\cdot}}(\hat p^C_{a\cdot} - p_{a\cdot}) \\ \vdots \\
\frac{p_{ab}}{p_{a\cdot}}(\hat p^C_{a\cdot} - p_{a\cdot})
\end{pmatrix}
+ o_p(1).
$$
As a result,
$$\sqrt{n_C}\,[E(\hat p^A|\sigma(C)) - p]
= \big[I_{a \times b} - \mathrm{diag}\{p_{B|A}\}\,(I_a \otimes U_b)\big]\big[\sqrt{n_C}\,(\hat p^C - p)\big] + o_p(1),$$
where $p_{B|A}$ is defined in (2.6). Similarly, it can be shown that
$$\sqrt{n_C}\,[E(\hat p^B|\sigma(C)) - p]
= \big[I_{a \times b} - \mathrm{diag}\{p_{A|B}\}\,(U_a \otimes I_b)\big]\big[\sqrt{n_C}\,(\hat p^C - p)\big] + o_p(1).$$
Hence,
$$
\begin{aligned}
U_n &= \sqrt{n_C}\Big[\frac{\pi_A}{\sqrt{\pi_C}}\,(E(\hat p^A|\sigma(C)) - p)
 + \frac{\pi_B}{\sqrt{\pi_C}}\,(E(\hat p^B|\sigma(C)) - p)
 + \sqrt{\pi_C}\,(\hat p^C - p)\Big] + o_p(1) \\
&= M\,\sqrt{n_C}\,(\hat p^C - p) + o_p(1) \\
&\to_d N(0, MPM'),
\end{aligned}
$$
where $M$ is given in (2.5). Consequently,
$$\sqrt{n}(\hat p^I - p) = W_n + U_n \to_d N(0, MPM' + (1 - \pi_C)P) = N(0, \Sigma).$$
Thus, the asymptotic covariance matrix can be estimated by replacing $p_{ij}$, $\pi_A$,
$\pi_B$, and $\pi_C$ in $\Sigma$ by $\hat p^I_{ij}$, $n_A/n$, $n_B/n$, and $n_C/n$, respectively.
The resulting estimator is denoted by $\hat\Sigma$.
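The matrix $\Sigma = MPM' + (1 - \pi_C)P$ of Theorem 2 can be formed directly from (2.5). The sketch below (the function name is ours) vectorizes the $a \times b$ table in the order of (2.1); substituting $\hat p^I_{ij}$, $n_A/n$, $n_B/n$, $n_C/n$ for the inputs yields the plug-in estimator $\hat\Sigma$.

```python
import numpy as np

def sigma_theorem2(p, piA, piB, piC):
    """Asymptotic covariance M P M' + (1 - piC) P from Theorem 2.

    p : (a, b) matrix of cell probabilities p_ij, rows vectorized as in (2.1).
    """
    a, b = p.shape
    pv = p.reshape(-1)                                   # (p_11,...,p_1b,...,p_ab)'
    P = np.diag(pv) - np.outer(pv, pv)                   # diag{p} - pp'
    pBgA = (p / p.sum(axis=1, keepdims=True)).reshape(-1)  # p_{B|A}
    pAgB = (p / p.sum(axis=0, keepdims=True)).reshape(-1)  # p_{A|B}
    KB = np.kron(np.eye(a), np.ones((b, b)))             # I_a (x) U_b
    KA = np.kron(np.ones((a, a)), np.eye(b))             # U_a (x) I_b
    M = (np.eye(a * b)
         - piA * np.diag(pBgA) @ KB
         - piB * np.diag(pAgB) @ KA) / np.sqrt(piC)      # eq. (2.5)
    return M @ P @ M.T + (1 - piC) * P
```

When $A$ and $B$ are independent, this matrix coincides with the Theorem 1(b) expression, which provides a useful cross-check.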
2.4 Weighted Mean Squared Error
Let $\hat p$ be an arbitrary estimator of the cell probability vector $p$. To evaluate
its performance, we propose a measure called the weighted mean squared error
(WMSE), which is defined by
$$\mathrm{WMSE}(\hat p) = \sum_{ij} \frac{E(\hat p_{ij} - p_{ij})^2}{p_{ij}}.$$
Theorem 3 Under conditional imputation,
$$\mathrm{WMSE}(\hat p^I) = \frac{1}{n}\Big[\frac{1}{\pi_C}\big(ab + \pi_A^2 a + \pi_B^2 b - 2\pi_A a - 2\pi_B b + 2\pi_A\pi_B
+ 2\pi_A\pi_B\delta\big) - \pi_C ab + (ab - 1)\Big] + o\Big(\frac{1}{n}\Big),$$
where $\delta$ is a non-centrality parameter given by
$$\delta = \sum_{ij} \frac{(p_{ij} - p_{i\cdot}p_{\cdot j})^2}{p_{i\cdot}p_{\cdot j}}.$$
Intuitively, $\delta$ can be interpreted as a measure of the dependence between $A$ and
$B$. When $A$ and $B$ are independent, $\delta$ attains its smallest possible value, 0.
Proof:
It has been shown that $\sqrt{n}(\hat p^I - p) \to_d N(0, \Sigma)$, where $\Sigma$ is given in
Theorem 2. For the sake of convenience, we define
$$
\begin{aligned}
1/\sqrt{p} &= (1/\sqrt{p_{11}}, \cdots, 1/\sqrt{p_{1b}}, \cdots, 1/\sqrt{p_{a1}}, \cdots, 1/\sqrt{p_{ab}})', \\
p^2/(p_A p_B) &= (p_{11}^2/(p_{1\cdot}p_{\cdot 1}), \cdots, p_{1b}^2/(p_{1\cdot}p_{\cdot b}), \cdots, p_{a1}^2/(p_{a\cdot}p_{\cdot 1}), \cdots, p_{ab}^2/(p_{a\cdot}p_{\cdot b}))'.
\end{aligned}
$$
It follows that $\sqrt{n}\,\mathrm{diag}\{1/\sqrt{p}\}(\hat p^I - p) \to_d N(0, \Sigma^*)$, where
$$\Sigma^* = \mathrm{diag}\{1/\sqrt{p}\} \cdot \Sigma \cdot \mathrm{diag}\{1/\sqrt{p}\}.$$
On the other hand, using $Mp = \sqrt{\pi_C}\,p$,
$$\Sigma = M \cdot \mathrm{diag}\{p\} \cdot M' + (1 - \pi_C) \cdot \mathrm{diag}\{p\} - pp',$$
and
$$
\begin{aligned}
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big] &= ab, \\
\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot pp' \cdot \mathrm{diag}\{1/\sqrt{p}\}\big] &= 1.
\end{aligned}
$$
As a result, we only need to evaluate
$$\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot M \cdot \mathrm{diag}\{p\} \cdot M' \cdot \mathrm{diag}\{1/\sqrt{p}\}\big].$$
It can be verified that
$$
\begin{aligned}
a &= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p_{B|A}\} \cdot (I_a \otimes U_b) \cdot \mathrm{diag}\{p\} \cdot (I_a \otimes U_b) \cdot \mathrm{diag}\{p_{B|A}\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big], \\
b &= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p_{A|B}\} \cdot (U_a \otimes I_b) \cdot \mathrm{diag}\{p\} \cdot (U_a \otimes I_b) \cdot \mathrm{diag}\{p_{A|B}\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big], \\
a &= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p\} \cdot (I_a \otimes U_b) \cdot \mathrm{diag}\{p_{B|A}\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big], \\
a &= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p_{B|A}\} \cdot (I_a \otimes U_b) \cdot \mathrm{diag}\{p\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big], \\
b &= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p_{A|B}\} \cdot (U_a \otimes I_b) \cdot \mathrm{diag}\{p\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big], \\
b &= \mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p\} \cdot (U_a \otimes I_b) \cdot \mathrm{diag}\{p_{A|B}\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big].
\end{aligned}
$$
Noting that
$$\delta = \sum_{ij} \frac{(p_{ij} - p_{i\cdot}p_{\cdot j})^2}{p_{i\cdot}p_{\cdot j}}
= \sum_{ij} \frac{p_{ij}^2}{p_{i\cdot}p_{\cdot j}} - 2\sum_{ij} p_{ij} + \sum_{ij} p_{i\cdot}p_{\cdot j}
= \sum_{ij} \frac{p_{ij}^2}{p_{i\cdot}p_{\cdot j}} - 1
= \mathrm{tr}\big(\mathrm{diag}\{p^2/(p_A p_B)\}\big) - 1,$$
it follows that
$$
\begin{aligned}
&\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p_{A|B}\} \cdot (U_a \otimes I_b) \cdot \mathrm{diag}\{p\} \cdot (I_a \otimes U_b) \cdot \mathrm{diag}\{p_{B|A}\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big] \\
&\qquad = \mathrm{tr}\big(\mathrm{diag}\{p^2/(p_A p_B)\}\big) = \delta + 1, \\
&\mathrm{tr}\big[\mathrm{diag}\{1/\sqrt{p}\} \cdot \mathrm{diag}\{p_{B|A}\} \cdot (I_a \otimes U_b) \cdot \mathrm{diag}\{p\} \cdot (U_a \otimes I_b) \cdot \mathrm{diag}\{p_{A|B}\} \cdot \mathrm{diag}\{1/\sqrt{p}\}\big] \\
&\qquad = \mathrm{tr}\big(\mathrm{diag}\{p^2/(p_A p_B)\}\big) = \delta + 1.
\end{aligned}
$$
Thus,
$$\mathrm{tr}(\Sigma^*) = \frac{1}{\pi_C}\big(ab + \pi_A^2 a + \pi_B^2 b - 2\pi_A a - 2\pi_B b + 2\pi_A\pi_B + 2\pi_A\pi_B\delta\big) - \pi_C ab + (ab - 1).$$
The proof is completed by noting that
$$n\,\mathrm{WMSE}(\hat p^I) = E\big\|\sqrt{n}\,\mathrm{diag}\{1/\sqrt{p}\}(\hat p^I - p)\big\|^2 = \mathrm{tr}(\Sigma^*) + o(1).$$
According to Theorem 3, $\mathrm{WMSE}(\hat p^I)$ depends on the probabilities $\pi_A$, $\pi_B$,
and $\pi_C$, and on the cell probabilities through the non-centrality parameter $\delta$.
Moreover, $\mathrm{WMSE}(\hat p^I)$ is an increasing function of $\delta$.

Under Assumption A, $\hat p^C$ (the estimator using the complete units only) is
also consistent. The relative efficiency of $\hat p^I$ with respect to $\hat p^C$ can be
assessed by the difference between $\mathrm{WMSE}(\hat p^I)$ and $\mathrm{WMSE}(\hat p^C)$.
Our simulation results in Chapter 3 show that the estimators given by conditional
imputation can increase efficiency when the proportion of completers is small.
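The leading term of Theorem 3 is simple to evaluate numerically; the sketch below (the function name is ours) returns $n \cdot \mathrm{WMSE}(\hat p^I)$ to first order. A quick check: with no nonresponse ($\pi_A = \pi_B = 0$, $\pi_C = 1$) it reduces to $ab - 1$, the complete-data value.

```python
def wmse_leading(a, b, piA, piB, piC, delta):
    """n * WMSE(p^I_hat) to first order, from Theorem 3."""
    return ((a * b + piA ** 2 * a + piB ** 2 * b
             - 2 * piA * a - 2 * piB * b
             + 2 * piA * piB + 2 * piA * piB * delta) / piC
            - piC * a * b + (a * b - 1))
```

The expression is increasing in $\delta$ whenever $\pi_A \pi_B > 0$, in line with the remark above.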
2.5 Testing for Goodness-of-Fit
A direct application of Theorem 2 is a Wald-type test for goodness-of-fit.
Consider the null hypothesis $H_0: p = p^0$, where $p^0$ is a known vector. Under $H_0$,
$$X^2_W \doteq n(\hat p^* - p^{0*})'\,\hat\Sigma^{*-1}(\hat p^* - p^{0*}) \to_d \chi^2_{ab-1},$$
where $\chi^2_v$ denotes a chi-square random variable with $v$ degrees of freedom;
$\hat p^*$ ($p^{0*}$) is obtained by dropping the last component of $\hat p^I$ ($p^0$);
and $\hat\Sigma^*$ is the estimated asymptotic covariance matrix of $\hat p^*$, obtained
by dropping the last row and column of $\hat\Sigma$, the estimated asymptotic
covariance matrix of $\hat p^I$.
Although X2W provides an asymptotically correct chi-square test, the compu-
tation of Σ∗−1 is complicated. Instead of looking for an asymptotically correct
test, we consider a simple correction of the standard Pearson’s chi-square statis-
tic by matching the first order moment (Rao and Scott, 1981). When there is
no nonresponse, under H0, the Pearson’s statistic is asymptotically distributed
as a chi-square random variable with ab− 1 degrees of freedom:
X²P = n Σij (p̂ij − pij)²/pij →d χ²_{ab−1}. (2.7)
Therefore, E(X²P) ≈ ab − 1. However, under conditional imputation, it follows
from Theorem 3 that
E(X²P) ≈ (1/πC)(ab + πA²a + πB²b − 2πAa − 2πBb + 2πAπB + 2πAπBδ) − πCab + (ab − 1).
If we let
λ = (1/(πC(ab − 1)))(ab + πA²a + πB²b − 2πAa − 2πBb + 2πAπB + 2πAπBδ) − πCab/(ab − 1) + 1, (2.8)
it follows that E(X²P/λ) ≈ ab − 1. Thus, we propose the corrected Pearson statistic X²C = X²P/λ. The performance of this corrected chi-square test, the naive chi-square test, and the Wald test is evaluated by a simulation study in Chapter 3.
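The correction factor (2.8) is straightforward to compute once estimates of πA, πB, πC, and δ are plugged in. A minimal sketch (illustrative only, with hypothetical argument names):

```python
# Sketch of the moment-matching correction factor (2.8). Illustrative only:
# in practice the arguments would be replaced by their estimates before
# forming the corrected statistic X2_C = X2_P / lambda.
def lambda_correction(a, b, pi_A, pi_B, pi_C, delta):
    ab = a * b
    core = (ab + pi_A**2 * a + pi_B**2 * b
            - 2 * pi_A * a - 2 * pi_B * b
            + 2 * pi_A * pi_B + 2 * pi_A * pi_B * delta)
    return core / (pi_C * (ab - 1)) - pi_C * ab / (ab - 1) + 1
```

With no nonresponse (πC = 1, πA = πB = 0), the factor reduces to 1, so the naive statistic is left unchanged, as expected.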
2.6 Testing for Independence
An application of Theorem 1 is testing the independence of A and B. When πC = 1 (i.e., there is no nonresponse), the chi-square statistic is given by
X²I ≐ n Σij (p̂ij − p̂i·p̂·j)²/(p̂i·p̂·j) →d χ²_{(a−1)(b−1)}.
The following theorem establishes the asymptotic behavior of X2I under marginal
imputation and conditional imputation when πC > 0.
Theorem 4 Assume that πC > 0 and that A and B are independent.
(a) When marginal imputation is applied to impute nonrespondents,
X²I →d χ²_{(a−1)(b−1)}.
(b) When conditional imputation is applied to impute nonrespondents,
X²I/(πC⁻¹ + 1 − πC) →d χ²_{(a−1)(b−1)}.
Proof:
(a) After marginal imputation, the test statistic is given by
X²I = n Σij (pIij − pIi·pI·j)²/(pIi·pI·j) = ||Vn||²,
where Vn is an ab-dimensional vector with
√n(pIij − pIi·pI·j)/√(pIi·pI·j)
as its dth component, d = (i − 1)b + j. By Taylor's expansion and Slutsky's theorem,
√n(pIij − pIi·pI·j)/√(pIi·pI·j) = √n(pIij − pIi·p·j − pi·pI·j + pi·p·j)/√(pi·p·j) + op(1).
Define
U = Ia×b − (pA1′a) ⊗ Ib − Ia ⊗ (pB1′b),
where 1a and 1b are vectors with all elements equal to 1 and dimensions a and b, respectively. Let 1/√pA denote the vector (1/√p1·, ..., 1/√pa·)′, and let 1/√pB be defined similarly. Define S = diag{(1/√pA) ⊗ (1/√pB)}. Note that
Vn = SU(√n(pI − p)) + op(1) →d N(0, SUΣU′S′),
where Σ is given in Theorem 1 and is of the form
κPA ⊗ PB + x(pAp′A) ⊗ PB + yPA ⊗ (pBp′B),
where κ = 1/πC + 1 − πC under conditional imputation and κ = 1 under marginal imputation.
Note that
PA(1p′A) = (pA1′)PA = 0,
PB(1p′B) = (pB1′)PB = 0,
(pAp′A)(1p′A) = (pA1′)(pAp′A) = pAp′A,
(pBp′B)(1p′B) = (pB1′)(pBp′B) = pBp′B.
As a result,
UΣU′ = κPA ⊗ PB.
Since
S = diag{1/√pi·} ⊗ diag{1/√p·j},
it follows that
SUΣU′S′ = κP∗A ⊗ P∗B,
where
P∗A = diag{1/√pA} · PA · diag{1/√pA} = diag{1/√pA} · (diag{pA} − pAp′A) · diag{1/√pA} = Ia − √pA√p′A.
Similarly,
P∗B = diag{1/√pB} · PB · diag{1/√pB} = Ib − √pB√p′B.
Because both P∗A and P∗B are projection matrices, with ranks a − 1 and b − 1, respectively, P∗A ⊗ P∗B is also a projection matrix, with rank (a − 1)(b − 1). Since
Vn →d N(0, SUΣU′S′) = N(0, κP∗A ⊗ P∗B),
it follows that
(1/κ)X²I = ||(1/√κ)Vn||² →d χ²_{(a−1)(b−1)}.
The proof is completed by recalling that under marginal imputation κ = 1 and under conditional imputation κ = 1/πC + 1 − πC.
Chapter 3
Simulation Study Under Simple
Random Sampling
3.1 Introduction
All the results obtained in the previous chapter are based on large sample the-
ory. In this chapter, extensive simulations are carried out to evaluate the finite
sample performances of the proposed estimators and tests.
In Section 3.2, we study the asymptotic normality through chi-square scores, which Johnson and Wichern (1998) use as a tool to assess multivariate normality. In Section 3.3, the relative efficiencies of pI and pC are compared using the WMSE. In Section 3.4, two test statistics (Wald type and Rao type) for goodness-of-fit are compared in terms of size (type I error probability). In Section 3.5, testing independence under marginal imputation, conditional imputation, and the re-weighting method is studied. Their relative efficiencies are compared in terms of power.
3.2 Asymptotic Normality
Let X be a random vector and Σ be a positive definite matrix. The chi-square
score of X with respect to Σ is given by
s ≐ X′Σ⁻¹X. (3.1)
Under the assumption that X is normally distributed with mean 0 and covari-
ance matrix Σ, s is a chi-square random variable with d degrees of freedom,
where d is the dimension of X. Therefore, chi-square scores can be used as a
tool to evaluate the normality of a multivariate random vector.
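The diagnostic can be sketched as follows. This is an illustrative simulation (not from the dissertation) in which X is exactly normal, so the empirical upper 5% tail of the scores should be close to 0.05; the dimension, covariance, and seed are arbitrary choices:

```python
# Illustrative simulation (not from the dissertation): for X ~ N(0, Sigma),
# the chi-square score s = X' Sigma^{-1} X has a chi-square distribution with
# d degrees of freedom, so about 5% of scores should exceed the upper 5%
# quantile. The dimension, covariance, and seed are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)                 # a positive definite covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=10_000)
Sinv = np.linalg.inv(Sigma)
scores = np.einsum('ij,jk,ik->i', X, Sinv, X)   # s = X' Sigma^{-1} X, per row
CRIT = 7.8147                                   # upper 5% chi-square quantile, 3 df
tail = (scores > CRIT).mean()
print(tail)  # close to 0.05
```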
Since pI has a degenerate covariance matrix, instead of studying pI , we study
p∗, which is obtained by dropping the last component of pI . In each simulation,
the total sample size n is fixed to be 1,000. Two types of contingency tables (2×2
and 5× 5) are considered. For a given type of contingency table, it is assumed
that pij ≡ 1/(ab). Thirty-two different missing patterns (i.e., (πA, πB, πC)) are
considered for simulation. For each missing pattern, 10,000 data sets are gener-
ated based on the given parameters (i.e., n, a, b, pij, πA, πB, πC). For each data
set, conditional imputation is performed and pI is calculated. The asymptotic
covariance matrix is also calculated according to Theorem 2. The chi-square
score is obtained according to (3.1) based on p∗ and Σ∗. Asymptotically, the
scores are distributed as chi-square random variables with ab − 1 degrees of
freedom. Therefore, each of the 10,000 chi-square scores is compared with χ20.05
and χ20.95, where χ2
p is the pth upper quantile of a chi-square random variable
34
with appropriate degrees of freedom, i.e., P (χ2 > χ2p) = p. The empirical up-
per tail probabilities, i.e., P (s > χ20.05) and P (s > χ2
0.95), are estimated by the
proportion of chi-square scores which are larger than χ20.05 or χ2
0.95. The results
are summarized in Table 1.
In order to provide a better understanding of the true distribution of the chi-square scores after conditional imputation, the density of the chi-square scores is estimated in R for selected cases and compared with the true chi-square densities with the appropriate degrees of freedom. The results, given in Figure 1 and Figure 2, show that the true distributions of the chi-square scores are well approximated by their asymptotic counterparts.
3.3 Weighted Mean Squared Error
In this section, a simulation study is performed to compare the efficiency of pI
and pC in terms of the WMSE as defined in Section 2.4.
Two distributions for 2 × 2 contingency tables are considered here. They
are (0.25, 0.25; 0.25, 0.25) and (0.01, 0.49; 0.49, 0.01), respectively. The noncen-
trality parameters of these two distributions are given by 0.0000 and 0.9216,
respectively. Based on the same simulation scheme as described in Section 3.2, 10,000 data sets are generated for each parameter setting (i.e., n = 1,000, a, b, pij, πA, πB, πC). For each data set, the WMSEs of pI and pC are calculated. In order to compare the efficiency of the imputation and re-weighting methods, the difference of the WMSEs is considered and magnified by the total sample size n.
This difference is estimated by the average of 10,000 data sets and is denoted
by ∆. The results are summarized in Table 2 with negative values indicating
better performance of pI .
It can be seen that when the missing probability is large, conditional impu-
tation can improve the efficiency of point estimators.
3.4 Testing for Goodness-of-Fit
Two methods are proposed in Chapter 2 for testing goodness-of-fit. One is
a Wald type statistic and the other is a Rao type statistic. Under the null
hypothesis of testing goodness-of-fit, the Wald type statistic is essentially the
chi-square score constructed in Section 3.2. Therefore, in this section we only
study the Rao type statistic.
Based on the same simulation scheme as described in Section 3.2, 10,000
data sets are generated for each parameter setting. For each data set the Rao
score is calculated and compared with appropriate chi-square quantiles. The
empirical upper tail probabilities are estimated by the proportion of the Rao
scores which are larger than the quantiles. The results are summarized in Table
3.
The density of the Rao type statistic is also estimated in R and compared with the reference chi-square density. The results, given in Figures 3 and 4, show that the performance of the Rao type test is comparable to that of the Wald type test in terms of type I error.
3.5 Testing for Independence
3.5.1 Marginal Imputation
Based on the same simulation scheme described in Section 3.2, 10,000 data sets
are generated for each parameter setting. For each data set marginal imputation
is applied. Then the naive Pearson chi-square score is calculated. According
to our result, the naive Pearson chi-square score should be approximately chi-
square distributed with (a − 1)(b − 1) degrees of freedom. Therefore, it is
compared with the standard chi-square upper quantiles (5% and 95%) with
(a− 1)(b− 1) degrees of freedom. The empirical upper tail probabilities of the
Pearson chi-square scores are estimated. The results are summarized in Table
4.
On the other hand, the densities of the naive Pearson scores are also esti-
mated by R and compared with the densities of standard chi-square random
variables with appropriate degrees of freedom. The results are given in Figure
5 and 6.
3.5.2 Conditional Imputation
Based on the same simulation scheme described in Section 3.2, 10,000 data
sets are generated for each parameter setting. For each data set conditional
imputation is applied. The naive Pearson chi-square score is calculated and
corrected by an appropriate constant, which is given in Theorem 4. According to
our result, the density of the corrected Pearson scores should be approximately
chi-square distributed with (a − 1)(b − 1) degrees of freedom. Therefore, the
corrected Pearson scores are compared with the standard chi-square quantiles
(5% and 95%) with appropriate degrees of freedom. The empirical upper tail
probabilities are estimated. The results are reported in Table 5.
The density of the corrected Pearson scores is also estimated in R and compared with the densities of standard chi-square random variables with the appropriate degrees of freedom. The results are given in Figures 7 and 8.
3.5.3 Relative Efficiency
Let X^{2,∗}_I, X^{2,c}_I, and X^{2,m}_I be the chi-square statistics for testing the independence of A and B based on the completers, conditional imputation, and marginal imputation, respectively. According to our asymptotic theory, they define three asymptotically correct tests with rejection regions given by
I{X^{2,∗}_I > χ²_{1−α,(a−1)(b−1)}},
I{X^{2,c}_I/κ̂ > χ²_{1−α,(a−1)(b−1)}},
and
I{X^{2,m}_I > χ²_{1−α,(a−1)(b−1)}},
respectively, where κ̂ = n/nC + 1 − nC/n. Under the null hypothesis that A and B are independent, all three tests have asymptotic size α. Therefore, the relative efficiency of the three tests becomes a problem of interest.
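The corrected test under conditional imputation can be sketched as follows (illustrative Python, not the author's code; the imputed table of counts and the completer count `n_C` are hypothetical inputs, and the 1-degree-of-freedom critical value is hard-coded):

```python
# Sketch of the corrected independence test under conditional imputation
# (illustrative, not the author's code). The naive Pearson statistic from the
# imputed table is divided by kappa_hat = n/n_C + 1 - n_C/n and compared with
# the chi-square critical value. `counts` (an imputed table of cell counts)
# and `n_C` (the number of completers) are hypothetical inputs.
import numpy as np

def pearson_stat(counts):
    n = counts.sum()
    p = counts / n
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))
    return n * ((p - expected) ** 2 / expected).sum()

def reject_independence(counts, n_C, crit):
    n = counts.sum()
    kappa_hat = n / n_C + 1 - n_C / n
    return pearson_stat(counts) / kappa_hat > crit

counts = np.array([[280, 220], [220, 280]])   # hypothetical imputed counts
CRIT_1DF = 3.8415                             # upper 5% chi-square quantile, 1 df
print(reject_independence(counts, n_C=500, crit=CRIT_1DF))  # True
```

Here the uncorrected Pearson statistic is 14.4, and dividing by κ̂ = 2.5 still leaves it above the critical value.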
In this section, a simulation study is performed to study the relative effi-
ciency of the above three tests in terms of power. The simulation is performed
based on a 2 × 2 contingency table with distribution (0.28,0.22; 0.22,0.28).
Thirty-two different missing patterns are considered. For each parameter set-
ting, 10,000 data sets are generated and three different chi-square scores are
calculated. One is based on the re-weighting method, one is Pearson’s test after
marginal imputation, and the other is the corrected Pearson test after condi-
tional imputation. The power is estimated by the proportion of the scores which
correctly reject the null hypothesis. The results are summarized in Table 6.
In order to have a better understanding of how the power of the three tests
changes as a function of πC , we perform a simulation based on a 2 × 2 contin-
gency table with distribution (0.28, 0.22; 0.22, 0.28). For a given probability of
completeness (πC), we set πA = πB = (1 − πC)/2. Fifty different values of πC ,
which are evenly distributed between 0.1 and 1.0, are studied. For each given
parameter setting, 10,000 iterations are carried out. The estimated empirical
power is plotted versus πC. The results are given in Figure 9.
We also study the power of the three tests as a function of the non-centrality parameter δ. In this case, we fix the missing pattern to be
(πC , πA, πB) = (0.5, 0.3, 0.2).
Note that a 2×2 contingency table of the type
w = (0.25, 0.25; 0.25, 0.25) + (√δ/16)(1, −1; −1, 1)
has non-centrality parameter equal to δ/16, which is proportional to δ. Therefore, simulations are performed based on this type of 2×2 contingency table.
Fifty equally spaced δ values from 0.01 to 0.50 are studied. The estimated
empirical power is plotted versus δ and is given in Figure 10.
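The claimed non-centrality parameter of the perturbed table can be verified numerically; the following sketch is not part of the original text:

```python
# Numerical check (not from the original text) that the perturbed table
# w = (0.25, 0.25; 0.25, 0.25) + (sqrt(delta)/16)(1, -1; -1, 1)
# has non-centrality parameter delta/16.
import numpy as np

def noncentrality(p):
    E = np.outer(p.sum(axis=1), p.sum(axis=0))
    return ((p - E) ** 2 / E).sum()

delta = 0.32
w = np.full((2, 2), 0.25) + (np.sqrt(delta) / 16) * np.array([[1.0, -1.0], [-1.0, 1.0]])
print(noncentrality(w), delta / 16)  # the two values agree
```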
The results in this section suggest that when testing independence, the greatest power is achieved by using the complete units only, whereas the chi-square test under marginal imputation has the smallest power. An intuitive explanation is that marginal imputation makes the two categorical responses of the incompleters independent of each other conditional on the completers, which weakens the dependence between the two components. The effect is even more pronounced when the proportion of incompleters is large. As a result, marginal imputation yields the lowest power for testing independence. On the other hand, since conditional imputation successfully captures the dependence structure of the two responses, its power is significantly higher than that of marginal imputation and comparable to, but not as good as, that of the re-weighting method, because imputation creates additional noise.
However, the merit of marginal imputation is that the naive Pearson test statistic remains valid, which means that a marginally imputed data set can be processed by standard software without modification. This is useful when the proportion of nonrespondents is not too large. If the proportion of incompleters is relatively large, conditional imputation is recommended; in that case, the naive Pearson statistic should be corrected by a constant which depends only on the proportion of complete units.
3.6 Conclusion
For the selected sample sizes and parameters, our simulation results show that
the empirical distribution of all the Wald type statistics can be well approxi-
mated by the derived asymptotic distributions.
In addition, the simulation demonstrates that the empirical distributions of
all the Rao type statistics can be well approximated by standard chi-square
distributions with appropriate degrees of freedom.
With regard to testing independence by the different methods (re-weighting, marginal imputation, or conditional imputation), the simulation results indicate that the greatest power is achieved by using the complete units only, while the chi-square test under marginal imputation has the smallest power.
Table 1: Empirical upper tail probability of Wald type statistic

                         2×2              5×5
πC   πA   πB      p0.05  p0.95     p0.05  p0.95
0.1  0.0  0.9     0.042  0.565     0.003  0.218
0.1  0.1  0.8     0.042  0.660     0.008  0.382
0.1  0.2  0.7     0.044  0.780     0.017  0.634
0.1  0.3  0.6     0.047  0.886     0.031  0.840
0.1  0.4  0.5     0.049  0.938     0.046  0.921
0.2  0.0  0.8     0.046  0.711     0.010  0.483
0.2  0.1  0.7     0.047  0.795     0.019  0.675
0.2  0.2  0.6     0.046  0.874     0.030  0.834
0.2  0.3  0.5     0.049  0.930     0.038  0.921
0.2  0.4  0.4     0.048  0.947     0.047  0.949
0.3  0.0  0.7     0.046  0.792     0.019  0.659
0.3  0.1  0.6     0.050  0.860     0.028  0.808
0.3  0.2  0.5     0.050  0.916     0.038  0.900
0.3  0.3  0.4     0.053  0.948     0.047  0.943
0.3  0.4  0.3     0.052  0.948     0.047  0.943
0.4  0.0  0.6     0.047  0.840     0.023  0.770
0.4  0.1  0.5     0.048  0.896     0.036  0.880
0.4  0.2  0.4     0.056  0.939     0.043  0.929
0.4  0.3  0.3     0.048  0.946     0.051  0.948
0.5  0.0  0.5     0.049  0.878     0.033  0.850
0.5  0.1  0.4     0.046  0.920     0.039  0.914
0.5  0.2  0.3     0.045  0.948     0.045  0.946
0.6  0.0  0.4     0.055  0.902     0.036  0.889
0.6  0.1  0.3     0.045  0.938     0.045  0.936
0.6  0.2  0.2     0.052  0.955     0.050  0.946
0.7  0.0  0.3     0.048  0.922     0.042  0.920
0.7  0.1  0.2     0.051  0.950     0.050  0.947
0.7  0.2  0.1     0.049  0.948     0.049  0.948
0.8  0.0  0.2     0.050  0.938     0.045  0.938
0.8  0.1  0.1     0.052  0.946     0.052  0.948
0.9  0.0  0.1     0.051  0.952     0.048  0.952
1.0  0.0  0.0     0.053  0.949     0.046  0.950

Number of iterations: 10,000; sample size: 1,000; pij = 1/(ab); p0.05 and p0.95: 5% and 95% empirical upper tail probabilities, respectively.
Table 2: Efficiency comparison by WMSE

πC   πA   πB    ∆ (δ = 0.0000)   ∆ (δ = 0.9216)
0.1  0.0  0.9       −7.13            −7.09
0.1  0.1  0.8       −8.69            −8.94
0.1  0.2  0.7       −9.93           −10.17
0.1  0.3  0.6      −10.64           −10.74
0.1  0.4  0.5      −11.24           −11.24
0.2  0.0  0.8       −2.51            −2.49
0.2  0.1  0.7       −3.13            −3.01
0.2  0.2  0.6       −3.48            −3.70
0.2  0.3  0.5       −3.77            −3.80
0.2  0.4  0.4       −4.10            −4.12
0.3  0.0  0.7       −0.84            −0.94
0.3  0.1  0.6       −1.30            −1.35
0.3  0.2  0.5       −1.57            −1.58
0.3  0.3  0.4       −1.79            −1.78
0.3  0.4  0.3       −1.73            −1.75
0.4  0.0  0.6       −0.25            −0.27
0.4  0.1  0.5       −0.52            −0.52
0.4  0.2  0.4       −0.71            −0.69
0.4  0.3  0.3       −0.69            −0.70
0.5  0.0  0.5        0.00            −0.03
0.5  0.1  0.4       −0.12            −0.14
0.5  0.2  0.3       −0.24            −0.26
0.6  0.0  0.4        0.19             0.15
0.6  0.1  0.3        0.03            −0.01
0.6  0.2  0.2       −0.05             0.03
0.7  0.0  0.3        0.15             0.18
0.7  0.1  0.2        0.10             0.10
0.7  0.2  0.1        0.16             0.09
0.8  0.0  0.2        0.14             0.16
0.8  0.1  0.1        0.12             0.13
0.9  0.0  0.1        0.10             0.08
1.0  0.0  0.0        0.00             0.00

Number of iterations: 10,000; sample size: 1,000; δ = 0 comes from the distribution p = (0.25, 0.25; 0.25, 0.25) and δ = 0.9216 from p = (0.01, 0.49; 0.49, 0.01).
Table 3: Empirical upper tail probability of Rao type statistic

                         2×2              5×5
πC   πA   πB      p0.05  p0.95     p0.05  p0.95
0.1  0.0  0.9     0.083  0.928     0.054  0.910
0.1  0.1  0.8     0.068  0.930     0.054  0.918
0.1  0.2  0.7     0.065  0.932     0.052  0.922
0.1  0.3  0.6     0.064  0.935     0.058  0.922
0.1  0.4  0.5     0.059  0.934     0.055  0.923
0.2  0.0  0.8     0.068  0.935     0.057  0.930
0.2  0.1  0.7     0.066  0.937     0.056  0.932
0.2  0.2  0.6     0.060  0.939     0.054  0.940
0.2  0.3  0.5     0.054  0.941     0.054  0.936
0.2  0.4  0.4     0.052  0.938     0.059  0.936
0.3  0.0  0.7     0.061  0.938     0.054  0.938
0.3  0.1  0.6     0.058  0.937     0.054  0.941
0.3  0.2  0.5     0.055  0.946     0.055  0.943
0.3  0.3  0.4     0.050  0.947     0.053  0.940
0.3  0.4  0.3     0.053  0.947     0.050  0.944
0.4  0.0  0.6     0.056  0.943     0.052  0.943
0.4  0.1  0.5     0.054  0.946     0.051  0.946
0.4  0.2  0.4     0.052  0.943     0.056  0.946
0.4  0.3  0.3     0.055  0.950     0.051  0.946
0.5  0.0  0.5     0.053  0.937     0.052  0.948
0.5  0.1  0.4     0.053  0.942     0.051  0.946
0.5  0.2  0.3     0.051  0.948     0.053  0.948
0.6  0.0  0.4     0.054  0.944     0.055  0.944
0.6  0.1  0.3     0.049  0.948     0.050  0.949
0.6  0.2  0.2     0.051  0.951     0.051  0.944
0.7  0.0  0.3     0.050  0.950     0.052  0.948
0.7  0.1  0.2     0.052  0.947     0.052  0.946
0.7  0.2  0.1     0.047  0.951     0.051  0.949
0.8  0.0  0.2     0.050  0.949     0.051  0.951
0.8  0.1  0.1     0.049  0.950     0.050  0.944
0.9  0.0  0.1     0.051  0.946     0.050  0.947
1.0  0.0  0.0     0.052  0.949     0.046  0.950

Number of iterations: 10,000; sample size: 1,000; pij = 1/(ab); p0.05 and p0.95: 5% and 95% empirical upper tail probabilities, respectively.
Table 4: Testing independence under marginal imputation

                         2×2              5×5
πC   πA   πB      p0.05  p0.95     p0.05  p0.95
0.1  0.0  0.9     0.051  0.952     0.048  0.948
0.1  0.1  0.8     0.051  0.950     0.051  0.950
0.1  0.2  0.7     0.049  0.952     0.050  0.950
0.1  0.3  0.6     0.048  0.952     0.052  0.950
0.1  0.4  0.5     0.050  0.950     0.047  0.950
0.2  0.0  0.8     0.048  0.951     0.048  0.951
0.2  0.1  0.7     0.050  0.948     0.046  0.948
0.2  0.2  0.6     0.050  0.950     0.048  0.951
0.2  0.3  0.5     0.051  0.947     0.050  0.950
0.2  0.4  0.4     0.052  0.951     0.051  0.949
0.3  0.0  0.7     0.049  0.949     0.047  0.949
0.3  0.1  0.6     0.050  0.948     0.050  0.947
0.3  0.2  0.5     0.047  0.953     0.049  0.947
0.3  0.3  0.4     0.049  0.954     0.048  0.949
0.3  0.4  0.3     0.048  0.948     0.045  0.954
0.4  0.0  0.6     0.049  0.949     0.051  0.951
0.4  0.1  0.5     0.045  0.953     0.051  0.949
0.4  0.2  0.4     0.049  0.950     0.050  0.950
0.4  0.3  0.3     0.047  0.949     0.045  0.948
0.5  0.0  0.5     0.050  0.952     0.048  0.950
0.5  0.1  0.4     0.046  0.956     0.049  0.948
0.5  0.2  0.3     0.049  0.952     0.052  0.947
0.6  0.0  0.4     0.046  0.949     0.048  0.952
0.6  0.1  0.3     0.049  0.948     0.046  0.949
0.6  0.2  0.2     0.051  0.947     0.049  0.952
0.7  0.0  0.3     0.052  0.946     0.049  0.947
0.7  0.1  0.2     0.049  0.948     0.046  0.954
0.7  0.2  0.1     0.048  0.954     0.052  0.952
0.8  0.0  0.2     0.049  0.952     0.051  0.950
0.8  0.1  0.1     0.050  0.949     0.047  0.945
0.9  0.0  0.1     0.051  0.949     0.048  0.948
1.0  0.0  0.0     0.047  0.950     0.052  0.956

Number of iterations: 10,000; sample size: 1,000; pij = 1/(ab); p0.05 and p0.95: 5% and 95% empirical upper tail probabilities, respectively.
Table 5: Testing independence under conditional imputation

                         2×2              5×5
πC   πA   πB      p0.05  p0.95     p0.05  p0.95
0.1  0.0  0.9     0.054  0.944     0.040  0.928
0.1  0.1  0.8     0.049  0.950     0.038  0.932
0.1  0.2  0.7     0.047  0.946     0.038  0.933
0.1  0.3  0.6     0.047  0.947     0.039  0.928
0.1  0.4  0.5     0.053  0.946     0.036  0.930
0.2  0.0  0.8     0.049  0.951     0.044  0.944
0.2  0.1  0.7     0.050  0.948     0.042  0.941
0.2  0.2  0.6     0.048  0.949     0.044  0.945
0.2  0.3  0.5     0.049  0.949     0.044  0.942
0.2  0.4  0.4     0.047  0.943     0.046  0.943
0.3  0.0  0.7     0.050  0.947     0.047  0.945
0.3  0.1  0.6     0.052  0.945     0.043  0.949
0.3  0.2  0.5     0.050  0.952     0.046  0.947
0.3  0.3  0.4     0.054  0.951     0.046  0.948
0.3  0.4  0.3     0.052  0.949     0.045  0.950
0.4  0.0  0.6     0.052  0.947     0.045  0.948
0.4  0.1  0.5     0.047  0.947     0.046  0.951
0.4  0.2  0.4     0.046  0.950     0.047  0.948
0.4  0.3  0.3     0.050  0.951     0.046  0.949
0.5  0.0  0.5     0.048  0.948     0.047  0.950
0.5  0.1  0.4     0.052  0.946     0.046  0.951
0.5  0.2  0.3     0.044  0.950     0.048  0.951
0.6  0.0  0.4     0.048  0.951     0.047  0.950
0.6  0.1  0.3     0.050  0.950     0.046  0.952
0.6  0.2  0.2     0.052  0.950     0.049  0.946
0.7  0.0  0.3     0.052  0.949     0.050  0.951
0.7  0.1  0.2     0.051  0.946     0.048  0.951
0.7  0.2  0.1     0.053  0.951     0.046  0.950
0.8  0.0  0.2     0.052  0.953     0.046  0.947
0.8  0.1  0.1     0.052  0.949     0.052  0.948
0.9  0.0  0.1     0.054  0.948     0.049  0.950
1.0  0.0  0.0     0.045  0.948     0.047  0.950

Number of iterations: 10,000; sample size: 1,000; pij = 1/(ab); p0.05 and p0.95: 5% and 95% empirical upper tail probabilities, respectively.
Table 6: Power comparison for testing independence

                            δ = 0.0000                        δ = 0.0144
πC   πA   πB    X^{2,∗}_I  X^{2,m}_I  X^{2,c}_I    X^{2,∗}_I  X^{2,m}_I  X^{2,c}_I
0.1  0.0  0.9     0.055      0.050      0.053        0.219      0.069      0.207
0.1  0.1  0.8     0.049      0.050      0.055        0.224      0.064      0.208
0.1  0.2  0.7     0.053      0.050      0.053        0.227      0.065      0.218
0.1  0.3  0.6     0.050      0.046      0.052        0.221      0.070      0.213
0.1  0.4  0.5     0.054      0.050      0.056        0.228      0.071      0.211
0.2  0.0  0.8     0.048      0.048      0.049        0.401      0.116      0.351
0.2  0.1  0.7     0.049      0.050      0.050        0.390      0.116      0.345
0.2  0.2  0.6     0.050      0.054      0.052        0.398      0.114      0.357
0.2  0.3  0.5     0.052      0.050      0.050        0.403      0.124      0.356
0.2  0.4  0.4     0.053      0.054      0.052        0.398      0.117      0.358
0.3  0.0  0.7     0.050      0.051      0.048        0.552      0.210      0.481
0.3  0.1  0.6     0.048      0.054      0.049        0.546      0.205      0.468
0.3  0.2  0.5     0.046      0.050      0.048        0.542      0.202      0.476
0.3  0.3  0.4     0.050      0.055      0.049        0.544      0.202      0.470
0.3  0.4  0.3     0.049      0.051      0.048        0.561      0.208      0.485
0.4  0.0  0.6     0.047      0.050      0.048        0.672      0.330      0.581
0.4  0.1  0.5     0.049      0.050      0.052        0.670      0.328      0.573
0.4  0.2  0.4     0.050      0.047      0.052        0.664      0.328      0.580
0.4  0.3  0.3     0.050      0.053      0.048        0.667      0.321      0.578
0.5  0.0  0.5     0.048      0.052      0.048        0.763      0.478      0.670
0.5  0.1  0.4     0.052      0.050      0.053        0.766      0.476      0.675
0.5  0.2  0.3     0.050      0.056      0.049        0.763      0.475      0.670
0.6  0.0  0.4     0.049      0.048      0.052        0.837      0.632      0.754
0.6  0.1  0.3     0.047      0.050      0.049        0.837      0.620      0.751
0.6  0.2  0.2     0.051      0.052      0.051        0.834      0.623      0.752
0.7  0.0  0.3     0.052      0.051      0.054        0.889      0.758      0.823
0.7  0.1  0.2     0.053      0.050      0.049        0.893      0.760      0.827
0.7  0.2  0.1     0.047      0.048      0.048        0.883      0.757      0.824
0.8  0.0  0.2     0.055      0.052      0.053        0.920      0.859      0.882
0.8  0.1  0.1     0.051      0.049      0.051        0.922      0.861      0.888
0.9  0.0  0.1     0.048      0.047      0.048        0.948      0.925      0.933
1.0  0.0  0.0     0.046      0.046      0.046        0.963      0.963      0.963

Number of iterations: 10,000; sample size: 1,000; p = (0.28, 0.22; 0.22, 0.28).
Figure 1: Wald’s empirical density for 2× 2 tables
[Four panels comparing the theoretical and empirical densities of the χ²-scores for r = (0.1, 0.5, 0.4), (0.8, 0.1, 0.1), (0.2, 0.3, 0.5), and (0.2, 0.4, 0.4).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.25.
Figure 2: Wald’s empirical density for 5× 5 tables
[Four panels comparing the theoretical and empirical densities of the χ²-scores for r = (0.1, 0.5, 0.4), (0.8, 0.1, 0.1), (0.2, 0.3, 0.5), and (0.2, 0.4, 0.4).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.04.
Figure 3: Rao scores empirical density for 2× 2 tables
[Four panels comparing the reference χ² density with the empirical density of Rao's score for r = (0.1, 0.5, 0.4), (0.3, 0.4, 0.3), (0.6, 0.1, 0.3), and (0.8, 0.1, 0.1).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.25.
Figure 4: Rao scores empirical density for 5× 5 tables
[Four panels comparing the reference χ² density with the empirical density of Rao's score for r = (0.1, 0.5, 0.4), (0.3, 0.4, 0.3), (0.6, 0.1, 0.3), and (0.8, 0.1, 0.1).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.04.
Figure 5: Empirical density under marginal imputation for 2× 2 tables
[Four panels comparing the empirical and theoretical χ²-score densities for r = (0.3, 0.3, 0.5), (0.4, 0.3, 0.3), (0.8, 0.1, 0.1), and (0.7, 0.1, 0.2).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.25.
Figure 6: Empirical density under marginal imputation for 5× 5 tables
[Four panels comparing the empirical and theoretical χ²-score densities for r = (0.3, 0.3, 0.5), (0.4, 0.3, 0.3), (0.8, 0.1, 0.1), and (0.7, 0.1, 0.2).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.04.
Figure 7: Empirical density under conditional imputation for 2× 2 tables
[Four panels comparing the empirical and theoretical χ²-score densities for r = (0.3, 0.3, 0.5), (0.4, 0.3, 0.3), (0.8, 0.1, 0.1), and (0.7, 0.1, 0.2).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.25.
Figure 8: Empirical density under conditional imputation for 5× 5 tables
[Four panels comparing the empirical and theoretical χ²-score densities for r = (0.3, 0.3, 0.5), (0.4, 0.3, 0.3), (0.8, 0.1, 0.1), and (0.7, 0.1, 0.2).]
r = (πC, πA, πB); Number of iterations: 10,000; sample size: 1,000; pij = 0.04.
Figure 9: Power of testing independence vs. πC .
[Power curves for the re-weighting method, marginal imputation, and conditional imputation plotted against πC.]
πA = πB = (1 − πC)/2; Number of iterations: 10,000; sample size: 1,000; p = (0.28, 0.22; 0.22, 0.28).
Figure 10: Power of testing independence vs. δ.
[Power curves for the re-weighting method, marginal imputation, and conditional imputation plotted against δ.]
(πC, πA, πB) = (0.5, 0.3, 0.2); Number of iterations: 10,000; sample size: 1,000; p = (0.25, 0.25; 0.25, 0.25) + (√δ/16)(1, −1; −1, 1).
Chapter 4
Imputation Under Stratified
Sampling
4.1 Introduction
In the previous chapters, we have studied various imputation methods under
simple random sampling. In sample survey problems, however, stratified sam-
pling is more frequently considered. Therefore, in this chapter, we study statis-
tical properties of conditional imputation under stratified sampling.
For stratified sampling, an additional index for the stratum is added to various quantities. For example, under simple random sampling, nC is the number of completers, while under stratified sampling, nCh denotes the number of completers within the hth stratum. The quantities nAh, nBh, ph, ph,ij, ph,i·, and ph,·j are defined similarly. Within the hth stratum, we assume that a simple random sample of size nh = nAh + nBh + nCh is obtained, and samples across strata are obtained independently. The total sample size is given by n = Σ_{h=1}^H nh, where H is the number of strata. The overall probability distribution vector is p = Σ_{h=1}^H wh ph and its estimator is pI = Σ_h wh pIh, where wh is the known stratum weight.
In practice, one usually encounters one of the following two situations:
1. H is fixed and nh/n → ρh > 0, h = 1, ..., H.
2. H is large and {nh : h = 1, 2, ...} is bounded.
For the first situation, two types of conditional imputation methods are studied.
One is the method of imputation within stratum and the other is the method
of imputation across strata. For the second situation, since nh is small, we
may observe some strata without any completer. Therefore, the method of
imputation within stratum may not be applicable to this situation. Instead, the
method of imputation across strata is applied. For different imputation methods
under different situations, the asymptotic distributions of point estimators are
derived and appropriate tests are studied.
The rest of this chapter is arranged as follows. In Section 4.2, the method
of imputation within stratum is studied under situation 1. In Sections 4.3 and
4.4, the method of imputation across strata is studied under situation 1 and
situation 2, respectively.
4.2 Imputation Within Each Stratum
4.2.1 Asymptotic Distribution
We first consider the simple case where H is fixed and nh is large. In this
case, strata are often used as imputation classes. Even if strata are not used
as imputation classes (i.e., each imputation class may contain several strata),
imputation within each stratum is often applied when nh is large.
A direct generalization of Theorem 2 leads to the following theorem.
Theorem 5 Assume that conditional imputation is carried out within each stratum under situation 1. Then
√n(pI − p) →d N(0, Σ∗)
as n → ∞, where
Σ∗ = Σ_{h=1}^H (w²h/ρh)Σh, (4.1)
and Σh is the Σ in Theorem 2 but within the hth stratum.
Proof:
It can be noted that
√n(pI − p) = √n Σh wh(pIh − ph)
= Σh wh (√n/√nh) √nh(pIh − ph)
= Σh (wh/√ρh) √nh(pIh − ph) + op(1).
According to Theorem 2,
√nh(pIh − ph) →d N(0, Σh),
where Σh is the asymptotic covariance matrix of √nh(pIh − ph), which can be calculated according to Theorem 2. Therefore,
√n(pI − p) →d Σh (wh/√ρh) N(0, Σh) = N(0, Σ_h (w²h/ρh)Σh).
Based on (4.1), a Wald type test can be constructed for a general hypothesis
H0 : R(p) = 0 vs. HA : R(p) ≠ 0,
where R is a first order differentiable function from R^{a×b} to R^d for some d. Let
C(p) = ∂R(p)/∂p.
Assume C(p) has full rank. It follows that
√n[R(pI)−R(p)] = C(p)
√n(pI − p) + op(1)
→d N(0, C(p)Σ∗[C(p)]′),
where Σ∗ is given in (4.1). Therefore, a Wald type statistic can be obtained by
W = n[R(pI)−R(p)]′[C(p)Σ∗C ′(p)]−1[R(pI)−R(p)],
which under the null hypothesis is asymptotically distributed as a χ2 random
variable with d degrees of freedom.
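The construction can be sketched as follows (illustrative Python, not the author's code; the linear contrast R(p) = Cp, the stratum weights, and the multinomial stand-in covariances are hypothetical choices made only to keep the example self-contained):

```python
# Sketch of the Wald test for a general hypothesis H0: R(p) = 0 under
# stratified sampling (illustrative). Sigma* is assembled per (4.1); here
# R is a hypothetical linear contrast R(p) = Cp, so its Jacobian C(p) is
# the constant matrix C, and multinomial covariances stand in for Sigma_h.
import numpy as np

def sigma_star(weights, rhos, sigmas):
    # combine per-stratum asymptotic covariances as in (4.1)
    return sum(w**2 / r * S for w, r, S in zip(weights, rhos, sigmas))

def wald_general(pI, C, Sstar, n):
    R = C @ pI                                  # R(p) = Cp
    return n * R @ np.linalg.solve(C @ Sstar @ C.T, R)

p = np.full(4, 0.25)                            # 2x2 table, row-major
Sig = np.diag(p) - np.outer(p, p)               # multinomial stand-in
Sstar = sigma_star([0.6, 0.4], [0.6, 0.4], [Sig, Sig])
C = np.array([[1.0, -1.0, 0.0, 0.0]])           # hypothetical H0: p11 = p12
W = wald_general(np.array([0.27, 0.23, 0.25, 0.25]), C, Sstar, n=1000)
```

Under the null hypothesis, W is compared with the upper quantile of χ²_d (here d = 1).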
4.2.2 Rao’s Test for Goodness-of-Fit
Although the Wald type test is asymptotically valid, it is difficult to implement
due to the complexity of the covariance matrix. Therefore, the Rao type test,
which is proposed in Section 2.5, is studied.
The naive Pearson statistic is given by
X²P = n Σij (pIij − pij)²/pij.
Since the asymptotic covariance of $\hat p_I$ is given by $\Sigma^*$, the first order moment of $X_P^2$ can be approximated by
$$\lambda = \lambda(\pi_{h,A}, \pi_{h,B}, \pi_{h,C}, p_h) = \mathrm{tr}\big(\mathrm{diag}\{1/\sqrt{p}\}\,\Sigma^*\,\mathrm{diag}\{1/\sqrt{p}\}\big), \tag{4.2}$$
where
$$1/\sqrt{p} = \big(1/\sqrt{p_{11}}, \cdots, 1/\sqrt{p_{1b}}, \cdots, 1/\sqrt{p_{a1}}, \cdots, 1/\sqrt{p_{ab}}\big)'.$$
However, unlike the situation under simple random sampling, a closed-form formula for $\lambda$ does not exist due to the complexity of $\Sigma^*$. Instead, we propose the following bootstrap procedure to estimate $\lambda$.
Let $S_h$ denote the data set from the $h$th stratum. Our proposed bootstrap procedure is outlined below:
(1) Perform conditional imputation within each $S_h$;
(2) Compute the estimators $\hat\pi_{h,A}$, $\hat\pi_{h,B}$, $\hat\pi_{h,C}$, and $\hat p_h^I$;
(3) Generate the bootstrap data set $S_h^*$ by treating $\hat\pi_{h,A}$, $\hat\pi_{h,B}$, $\hat\pi_{h,C}$, and $\hat p_h^I$ as the true probabilities;
(4) Perform conditional imputation within each $S_h^*$;
(5) Compute the overall probability estimator $\hat p^{I*}$ based on the imputed bootstrap data set;
(6) Calculate the naive Pearson statistic based on the bootstrap data set, i.e.,
$$X_P^{2*} = n\sum_{i,j}\frac{(\hat p_{ij}^{I*} - \hat p_{ij}^{I})^2}{\hat p_{ij}^{I}};$$
(7) Repeat steps (3)-(6) $B$ times and average the $B$ naive Pearson statistics $X_P^{2*}$, which gives $\hat\lambda$ (an estimator of $\lambda$);
(8) Finally, the Rao statistic is given by
$$\frac{ab-1}{\hat\lambda}\,n\sum_{i,j}\frac{(\hat p_{ij}^{I} - p_{ij})^2}{p_{ij}}.$$
It can be seen that, conditional on $S_h$, $h = 1, \cdots, H$, each $S_h^*$ has the same type of distribution as $S_h$ but with different parameters. In other words, the distribution of $S_h$ is determined by $\{\pi_{h,A}, \pi_{h,B}, \pi_{h,C}, p_h\}$, while the distribution of $S_h^*$ is determined by $\{\hat\pi_{h,A}, \hat\pi_{h,B}, \hat\pi_{h,C}, \hat p_h^I\}$. Therefore, conditional on $S_h$, $\hat\lambda$ provides a consistent estimator of $\lambda(\hat\pi_{h,A}, \hat\pi_{h,B}, \hat\pi_{h,C}, \hat p_h^I)$. On the other hand, $\lambda$ is a continuous function of $\{\pi_{h,A}, \pi_{h,B}, \pi_{h,C}, p_h\}$, and $(\hat\pi_{h,A}, \hat\pi_{h,B}, \hat\pi_{h,C}) \to_{a.s.} (\pi_{h,A}, \pi_{h,B}, \pi_{h,C})$, $\hat p_h^I \to_{a.s.} p_h$. Therefore, $\hat\lambda$ is a consistent estimator of $\lambda$.
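The bootstrap estimate of $\lambda$ can be sketched as follows. This is a deliberately simplified single-stratum version with no nonresponse, so each bootstrap set is a plain multinomial redraw and $\hat\lambda$ should come out near $ab - 1$; the procedure above additionally regenerates the nonresponse pattern and re-imputes each bootstrap data set. Function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_pearson(p_hat, p_ref, n):
    """Naive Pearson statistic n * sum (p_hat - p_ref)^2 / p_ref."""
    return n * np.sum((p_hat - p_ref) ** 2 / p_ref)

def bootstrap_lambda(p_hat_I, n, B=200):
    """Parametric bootstrap for lambda = E[X^2_P]: treat the estimated cell
    probabilities as the truth, redraw B samples, and average the naive
    Pearson statistics (steps (3)-(7) above, minus the re-imputation)."""
    stats = [
        naive_pearson(rng.multinomial(n, p_hat_I) / n, p_hat_I, n)
        for _ in range(B)
    ]
    return float(np.mean(stats))
```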
4.3 Imputation Across Strata with Small H
In this section, we study the method of imputation across strata with small H.
For a sampled incomplete unit with, e.g., $(A,B) = (i, *)$, the missing value is imputed by $j$ according to the conditional probability
$$p_{ij|A} \doteq P(B = j \mid A = i \text{ and } B \text{ is missing}) = \frac{P\big((A,B) = (i,j) \text{ and } B \text{ is missing}\big)}{P\big(A = i \text{ and } B \text{ is missing}\big)}$$
$$= \frac{\sum_h P\big((A,B)\text{ from stratum }h,\ A = i,\ B = j,\ B\text{ missing}\big)}{\sum_h P\big((A,B)\text{ from stratum }h,\ A = i,\ B\text{ missing}\big)} = \frac{\sum_h w_h\pi_{h,A}p_{h,ij}}{\sum_h w_h\pi_{h,A}p_{h,i\cdot}}.$$
Note that in this situation, since we have a large sample size in each stratum, we do not need to assume that $(\pi_{h,A}, \pi_{h,B}, \pi_{h,C})$ are the same across strata. Similarly, for a sampled unit with $(A,B) = (*, j)$, the missing value is imputed by $i$ according to the conditional probability
$$p_{ij|B} \doteq \frac{\sum_h w_h\pi_{h,B}p_{h,ij}}{\sum_h w_h\pi_{h,B}p_{h,\cdot j}}.$$
To carry out imputation, the parameters $\pi_{h,A}$, $\pi_{h,B}$, $p_{h,ij}$, $p_{h,i\cdot}$, and $p_{h,\cdot j}$ should be replaced by their estimators, e.g., $n_h^A/n_h$ for $\pi_{h,A}$ and $n_{h,ij}^C/n_h^C$ for $p_{h,ij}$.
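A sketch of the resulting plug-in estimator of $p_{ij|A}$ (the function name and array layout are ours; `comp_counts[h, i, j]` holds the completer counts $n^C_{h,ij}$):

```python
import numpy as np

def cond_prob_A(w, nA, n, comp_counts):
    """Plug-in estimate of p_{ij|A} across strata:
       sum_h w_h (n^A_h/n_h) phat_{h,ij}  /  sum_h w_h (n^A_h/n_h) phat_{h,i.}

    w, nA, n: (H,) arrays of weights, A-only counts, stratum sizes;
    comp_counts: (H, a, b) array of completer cell counts n^C_{h,ij}.
    """
    # phat_{h,ij} = n^C_{h,ij} / n^C_h, the completer-based cell estimates
    p_h = comp_counts / comp_counts.sum(axis=(1, 2), keepdims=True)
    num = np.einsum('h,hij->ij', w * nA / n, p_h)
    # the denominator sum_h w_h (n^A_h/n_h) phat_{h,i.} is the row sum of num
    return num / num.sum(axis=1, keepdims=True)
```

Each row of the returned matrix is a probability distribution over $j$, as required for imputing $B$ given $A = i$.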
4.3.1 Asymptotic Distribution
Theorem 6 Assume that H is fixed and $n_h/n \to \rho_h > 0$, $h = 1, \ldots, H$. Based on conditional imputation across strata, we have
$$\sqrt{n}(\hat p_I - p) \to_d N(0, \Sigma_1 + \Sigma_2),$$
where
$$\Sigma_1 = \sum_h \frac{w_h^2}{\rho_h}M_h P_h M_h'$$
$$\Sigma_2 = \sum_h \Big[\frac{w_h^2}{\rho_h}\big(\pi_{h,A}\Sigma_h^A + \pi_{h,B}\Sigma_h^B\big)\Big]$$
$$M_h = \frac{1}{\sqrt{\pi_{h,C}}}I - \frac{\pi_{h,A}}{\sqrt{\pi_{h,C}}}N_h^A(I_a \otimes U_b) - \frac{\pi_{h,B}}{\sqrt{\pi_{h,C}}}N_h^B(U_a \otimes I_b)$$
$$P_h = \mathrm{diag}(p_h) - p_h p_h'$$
$$N_h^A = \mathrm{diag}\big(p_{11|A}, \cdots, p_{1b|A}, \cdots, p_{a1|A}, \cdots, p_{ab|A}\big)$$
$$N_h^B = \mathrm{diag}\big(p_{11|B}, \cdots, p_{1b|B}, \cdots, p_{a1|B}, \cdots, p_{ab|B}\big)$$
$$\Sigma_h^A = \mathrm{diag}(a_h) - a_h a_h'$$
$$\Sigma_h^B = \mathrm{diag}(b_h) - b_h b_h'$$
$$a_h = \big(p_{h,1\cdot}p_{11|A}, \cdots, p_{h,1\cdot}p_{1b|A}, \cdots, p_{h,a\cdot}p_{a1|A}, \cdots, p_{h,a\cdot}p_{ab|A}\big)'$$
$$b_h = \big(p_{11|B}p_{h,\cdot 1}, \cdots, p_{1b|B}p_{h,\cdot b}, \cdots, p_{a1|B}p_{h,\cdot 1}, \cdots, p_{ab|B}p_{h,\cdot b}\big)'.$$
Proof:
The overall cell probability estimator after conditional imputation across strata can be obtained as
$$\hat p_I = \sum_h w_h\frac{n_h^A\hat p_h^A + n_h^B\hat p_h^B + n_h^C\hat p_h^C}{n_h}.$$
Let $\sigma(C)$ denote the $\sigma$-field generated by all completers and $\{n_h^A, n_h^B, n_h^C\}$.
The asymptotic normality can be established by considering
$$\sqrt{n}(\hat p_I - p) = \sqrt{n}\sum_h w_h\frac{n_h^A(\hat p_h^A - p_h) + n_h^B(\hat p_h^B - p_h) + n_h^C(\hat p_h^C - p_h)}{n_h}$$
$$= \sqrt{n}\Big[\sum_h\frac{w_h}{n_h}\big(n_h^A(a_h - p_h) + n_h^B(b_h - p_h) + n_h^C(\hat p_h^C - p_h)\big) + \sum_h\frac{w_h}{n_h}\big(n_h^A(\hat p_h^A - a_h) + n_h^B(\hat p_h^B - b_h)\big)\Big],$$
where
$$a_h = E(\hat p_h^A \mid \sigma(C)) = \big(p_{11|A}p_{h,1\cdot}, \cdots, p_{1b|A}p_{h,1\cdot}, \cdots, p_{a1|A}p_{h,a\cdot}, \cdots, p_{ab|A}p_{h,a\cdot}\big)'$$
$$b_h = E(\hat p_h^B \mid \sigma(C)) = \big(p_{11|B}p_{h,\cdot 1}, \cdots, p_{1b|B}p_{h,\cdot b}, \cdots, p_{a1|B}p_{h,\cdot 1}, \cdots, p_{ab|B}p_{h,\cdot b}\big)'.$$
Let
$$U_n = \sqrt{n}\sum_h\frac{w_h}{n_h}\big[n_h^A(a_h - p_h) + n_h^B(b_h - p_h) + n_h^C(\hat p_h^C - p_h)\big]$$
$$W_n = \sqrt{n}\sum_h\frac{w_h}{n_h}\big(n_h^A(\hat p_h^A - a_h) + n_h^B(\hat p_h^B - b_h)\big). \tag{4.3}$$
As in the simple random sampling case, it can be shown that, conditional on $\sigma(C)$,
$$W_n = \sqrt{n}\Big[\sum_h w_h\frac{n_h^A(\hat p_h^A - a_h) + n_h^B(\hat p_h^B - b_h)}{n_h}\Big]$$
$$= \sum_h w_h\frac{n_h^A}{n_h}\sqrt{\frac{n}{n_h^A}}\,\sqrt{n_h^A}(\hat p_h^A - a_h) + \sum_h w_h\frac{n_h^B}{n_h}\sqrt{\frac{n}{n_h^B}}\,\sqrt{n_h^B}(\hat p_h^B - b_h)$$
$$\to_d N(0, \Sigma_2),$$
where
$$\Sigma_2 = \sum_h\Big[\frac{w_h^2}{\rho_h}\big(\pi_{h,A}\Sigma_h^A + \pi_{h,B}\Sigma_h^B\big)\Big],$$
and $\Sigma_h^A$, $\Sigma_h^B$ are given in Theorem 6. On the other hand,
$$\sqrt{n}\sum_h w_h\frac{n_h^A}{n_h}\big(\hat p_{ij|A}\,p_{h,i\cdot} - p_{h,ij}\big) = \sqrt{n}\Bigg[\Big(\sum_h w_h\frac{n_h^A}{n_h}p_{h,i\cdot}\Big)\frac{\sum_h w_h\frac{n_h^A}{n_h}\hat p_{h,ij}^C}{\sum_h w_h\frac{n_h^A}{n_h}\hat p_{h,i\cdot}^C} - \sum_h w_h\frac{n_h^A}{n_h}p_{h,ij}\Bigg]$$
$$= \sqrt{n}\Big[\sum_h w_h\frac{n_h^A}{n_h}\big(\hat p_{h,ij}^C - p_{h,ij}\big) - p_{ij|A}\sum_h w_h\frac{n_h^A}{n_h}\big(\hat p_{h,i\cdot}^C - p_{h,i\cdot}\big)\Big] + o_p(1).$$
Let
$$N_h^A = \mathrm{diag}\big\{(p_{11|A}, \cdots, p_{1b|A}, \cdots, p_{a1|A}, \cdots, p_{ab|A})\big\}.$$
It follows that
$$\sqrt{n}\sum_h w_h\frac{n_h^A}{n_h}(a_h - p_h) = \sqrt{n}\sum_h w_h\frac{n_h^A}{n_h}\big(I - N_h^A(I_a \otimes U_b)\big)(\hat p_h^C - p_h) + o_p(1).$$
Similarly, let
$$N_h^B = \mathrm{diag}\big\{(p_{11|B}, \cdots, p_{1b|B}, \cdots, p_{a1|B}, \cdots, p_{ab|B})\big\}.$$
It follows that
$$\sqrt{n}\sum_h w_h\frac{n_h^B}{n_h}(b_h - p_h) = \sqrt{n}\sum_h w_h\frac{n_h^B}{n_h}\big(I - N_h^B(U_a \otimes I_b)\big)(\hat p_h^C - p_h) + o_p(1).$$
Consequently,
$$U_n = \sqrt{n}\Big[\sum_h w_h\Big(I - \frac{n_h^A}{n_h}N_h^A(I_a \otimes U_b) - \frac{n_h^B}{n_h}N_h^B(U_a \otimes I_b)\Big)(\hat p_h^C - p_h)\Big] + o_p(1)$$
$$= \sum_h w_h\frac{\sqrt{n}}{\sqrt{n_h^C}}\Big[I - \frac{n_h^A}{n_h}N_h^A(I_a \otimes U_b) - \frac{n_h^B}{n_h}N_h^B(U_a \otimes I_b)\Big]\sqrt{n_h^C}(\hat p_h^C - p_h) + o_p(1)$$
$$= \sum_h\frac{w_h}{\sqrt{\rho_h\pi_{h,C}}}\Big[I - \pi_{h,A}N_h^A(I_a \otimes U_b) - \pi_{h,B}N_h^B(U_a \otimes I_b)\Big]\sqrt{n_h^C}(\hat p_h^C - p_h) + o_p(1)$$
$$\to_d N(0, \Sigma_1),$$
where
$$\Sigma_1 = \sum_h\frac{w_h^2}{\rho_h}M_h P_h M_h', \qquad M_h = \frac{1}{\sqrt{\pi_{h,C}}}I - \frac{\pi_{h,A}}{\sqrt{\pi_{h,C}}}N_h^A(I_a \otimes U_b) - \frac{\pi_{h,B}}{\sqrt{\pi_{h,C}}}N_h^B(U_a \otimes I_b),$$
$$P_h = \mathrm{diag}\{p_h\} - p_h p_h'.$$
As a result,
$$\sqrt{n}(\hat p_I - p) = W_n + U_n \to_d N(0, \Sigma_1 + \Sigma_2).$$
4.3.2 Rao’s Test for Goodness-of-Fit
In this section, we also propose a Rao type test statistic based on a bootstrap procedure. Define
$$\lambda = E\Big[n\sum_{i,j}\frac{(\hat p_{ij}^I - p_{ij})^2}{p_{ij}}\Big].$$
Let $S_h$ denote the data from the $h$th stratum. Our proposed bootstrap procedure can be carried out as follows:
(1) Perform conditional imputation within each $S_h$;
(2) Compute the estimators $\hat\pi_{h,A}$, $\hat\pi_{h,B}$, $\hat\pi_{h,C}$, and $\hat p_h^I$;
(3) Generate the bootstrap data set $S_h^*$ according to $\hat\pi_{h,A}$, $\hat\pi_{h,B}$, $\hat\pi_{h,C}$, and $\hat p_h^I$;
(4) Perform the method of conditional imputation across strata for the bootstrap data sets $\{S_h^*\}$;
(5) Compute the overall probability estimator $\hat p^{I*}$ based on the imputed bootstrap data sets;
(6) Calculate the naive Pearson statistic based on the bootstrap data set, i.e.,
$$X_P^{2*} = n\sum_{i,j}\frac{(\hat p_{ij}^{I*} - \hat p_{ij}^I)^2}{\hat p_{ij}^I};$$
(7) Repeat steps (3)-(6) $B$ times and average the $B$ naive Pearson statistics $X_P^{2*}$, which gives $\hat\lambda$ (an estimator of $\lambda$);
(8) Finally, the Rao statistic is given by
$$\frac{ab-1}{\hat\lambda}\,n\sum_{i,j}\frac{(\hat p_{ij}^I - p_{ij})^2}{p_{ij}}.$$
As in the case of imputation within each stratum, $\lambda$ is a continuous function of $(\pi_{h,A}, \pi_{h,B}, \pi_{h,C}, p_h)$. Therefore, $\hat\lambda$ is a consistent estimator of $\lambda$.
4.4 Imputation Across Strata with Large H
In this section, we study the method of imputation across strata with large H. The entire discussion is based on Assumption B, given below.
Assumption B:
(1) For all $h$ and some $(\pi_A, \pi_B, \pi_C)$, $(\pi_{h,A}, \pi_{h,B}, \pi_{h,C}) = (\pi_A, \pi_B, \pi_C)$;
(2) $2 \le n_h \le m$ for all $h$ and some $m$;
(3) $\varepsilon_1 \le Hw_h \le \varepsilon_2$ for all $h$ and all $H$;
(4) For all $h$ and $i \in \{1, \cdots, a\}$, $j \in \{1, \cdots, b\}$, $\delta_1 \le p_{h,ij} \le \delta_2$.
Consequently, $p_{ij|A}$ and $p_{ij|B}$, which are defined in Section 4.3, simplify to
$$p_{ij|A} = \frac{\sum_h w_h p_{h,ij}}{\sum_h w_h p_{h,i\cdot}} \quad \text{and} \quad p_{ij|B} = \frac{\sum_h w_h p_{h,ij}}{\sum_h w_h p_{h,\cdot j}}.$$
Their estimators can be obtained as
$$\hat p_{ij|A} = \frac{\sum_h w_h\,n_{h,ij}^C/n_h}{\sum_h w_h\,n_{h,i\cdot}^C/n_h} \quad \text{and} \quad \hat p_{ij|B} = \frac{\sum_h w_h\,n_{h,ij}^C/n_h}{\sum_h w_h\,n_{h,\cdot j}^C/n_h}.$$
70
The method of conditional imputation across strata will be based on pij|A and
pij|B. Here Un and Wn are the same as defined in (4.3). In this section, we use
σ(C) to denote all the observed responses, which includes not only all completers
and {nAh , nB
h , nCh }, but also the observed respondents from the incompleters.
4.4.1 Asymptotic Distribution
Lemma 2 Conditional on $\sigma(C)$, $W_n$ is asymptotically normally distributed.
Proof:
Since, conditional on the observed units,
$$\sqrt{n}\sum_h w_h\frac{n_h^A\big(\hat p_h^A - E(\hat p_h^A\mid\sigma(C))\big)}{n_h} \quad \text{and} \quad \sqrt{n}\sum_h w_h\frac{n_h^B\big(\hat p_h^B - E(\hat p_h^B\mid\sigma(C))\big)}{n_h}$$
are independent, it suffices to show the result for either of them. Without loss of generality, consider
$$V = \sqrt{n}\sum_h w_h\frac{n_h^A\big(\hat p_h^A - E(\hat p_h^A\mid\sigma(C))\big)}{n_h},$$
and assume $\pi_A > 0$ (otherwise $V = 0$). Let $l$ be an arbitrary $ab$-dimensional vector with $l_{ij}$ as its $[(i-1)b+j]$th component. Without loss of generality, assume
$$l_{11} = \max\{l_{ij}\} \quad \text{and} \quad l_{ab} = \min\{l_{ij}\}.$$
Assume $l_{11} > l_{ab}$; otherwise $l'V =_{a.s.} 0$.
In order to establish asymptotic normality of $V$, it suffices to establish it for $l'V$. This can be done by checking Lyapunov's condition. Let
$$\sigma_n^2 = \mathrm{var}[l'V] = \mathrm{var}\Big[\sqrt{n}\sum_h w_h\frac{n_h^A}{n_h}\,l'\big(\hat p_h^A - E(\hat p_h^A\mid\sigma(C))\big)\Big] = \mathrm{var}\Big[\sqrt{n}\sum_h w_h\frac{n_h^A}{n_h}\sum_{i,j}l_{ij}\,\hat p_{h,i\cdot}^A\,\frac{n_{h,ij}^A}{n_{h,i\cdot}^A}\Big].$$
Thus
$$\sigma_n^2 = n\sum_h w_h^2\Big(\frac{n_h^A}{n_h}\Big)^2\sum_i\Big(\frac{\hat p_{h,i\cdot}^A}{n_{h,i\cdot}^A}\Big)^2\mathrm{var}\Big(\sum_j l_{ij}\,n_{h,ij}^A\Big)$$
$$= n\sum_h w_h^2\Big(\frac{n_h^A}{n_h}\Big)^2\sum_i\Big(\frac{\hat p_{h,i\cdot}^A}{n_{h,i\cdot}^A}\Big)^2 n_{h,i\cdot}^A\Big(\sum_j\hat p_{ij|A}\big(l_{ij} - \sum_{j'}l_{ij'}\hat p_{ij'|A}\big)^2\Big)$$
$$\ge \frac{n}{H^2}\sum_h(Hw_h)^2\frac{(n_h^A)^2}{n_h^2\,n_{h,i\cdot}^A}\,(\hat p_{h,i\cdot}^A)^2\,\hat p_{11|A}\,\hat p_{ab|A}^2\,(l_{11}-l_{ab})^2$$
$$\ge \frac{1}{mH}\sum_h\varepsilon_1^2\Big(\frac{n_h^A}{n_h}\Big)^2\delta_1^2\,\hat p_{11|A}\,\hat p_{ab|A}^2\,(l_{11}-l_{ab})^2 = \varepsilon_1^2\delta_1^2(l_{11}-l_{ab})^2\,\hat p_{11|A}\,\hat p_{ab|A}^2\,\frac{1}{mH}\sum_h\frac{(n_h^A)^2}{n_h^2}.$$
Note that
$$\hat p_{ij|A} \to_{a.s.} p_{ij|A} = \frac{\sum_h w_h p_{h,ij}}{\sum_h w_h p_{h,i\cdot}} \ge \frac{\sum_h w_h\delta_1}{\sum_h w_h b\delta_2} = \frac{\delta_1}{b\delta_2} > 0,$$
and
$$\frac{1}{H}\sum_h\frac{(n_h^A)^2}{n_h^2} \ge \Big(\frac{1}{H}\sum_h\frac{n_h^A}{n_h}\Big)^2 \to_{a.s.} \pi_A^2 > 0.$$
Therefore, $\sigma_n^2$ is bounded away from zero in probability.
On the other hand, for any $\delta > 0$,
$$\sum_h E\Big|\sqrt{n}\,w_h\frac{n_h^A}{n_h}\,l'\big(\hat p_h^A - E(\hat p_h^A\mid\sigma(C))\big)\Big|^{2+\delta} \le \sum_h E\Big|\frac{\sqrt{n}}{H}(Hw_h)\,l'\big(\hat p_h^A + E(\hat p_h^A\mid\sigma(C))\big)\Big|^{2+\delta}$$
$$\le \sum_h\frac{n^{1+\delta/2}}{H^{2+\delta}}\varepsilon_2^{2+\delta}\,E\big|l'\hat p_h^A + l'E(\hat p_h^A\mid\sigma(C))\big|^{2+\delta} \le \sum_h\frac{n^{1+\delta/2}}{H^{2+\delta}}\varepsilon_2^{2+\delta}(2ab|l_{11}|)^{2+\delta}$$
$$= \frac{n^{1+\delta/2}}{H^{1+\delta}}\varepsilon_2^{2+\delta}(2ab|l_{11}|)^{2+\delta} \le n^{-\delta/2}m^{1+\delta}(2ab\varepsilon_2|l_{11}|)^{2+\delta} = O(n^{-\delta/2}) = o(\sigma_n^{2+\delta}).$$
Therefore, Lyapunov's condition is satisfied.
Lemma 3 Under Assumption B, $\sqrt{n}\sum_h w_h\big(\frac{n_h^C}{n_h}\hat p_h^C - \pi_C p_h\big)$ is asymptotically normally distributed.
Proof:
Let
$$V = \sqrt{n}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_h^C - \pi_C p_h\Big),$$
and let $l$ be as in Lemma 2. Then
$$\sigma_n^2 = \mathrm{var}[l'V] = n\,\mathrm{var}\Big[\sum_h w_h\frac{\sum_{i,j}l_{ij}\,n_{h,ij}^C}{n_h}\Big] = n\sum_h\frac{w_h^2}{n_h}\Big(\sum_{i,j}p_{h,ij}\big(l_{ij} - \sum_{i',j'}p_{h,i'j'}l_{i'j'}\big)^2\Big)$$
$$\ge \frac{n}{H^2}\sum_h\frac{(Hw_h)^2}{n_h}\,p_{h,11}\big(l_{11} - \sum_{i',j'}p_{h,i'j'}l_{i'j'}\big)^2 \ge \frac{n}{H^2}\sum_h\frac{\varepsilon_1^2}{n_h}\,\delta_1(l_{11}-l_{ab})^2\delta_1^2$$
$$\ge \frac{1}{H}\sum_h\frac{\varepsilon_1^2}{m}(l_{11}-l_{ab})^2\delta_1^3 = m^{-1}\varepsilon_1^2\delta_1^3(l_{11}-l_{ab})^2,$$
which is bounded away from zero.
On the other hand, ∀δ > 0,
∑E|√nwh(
nCh
nh
l′pCh − πC l′ph)|2+δ
≤∑
h
n1+δ/2w2+δh E|l′n
Ch
nh
pCh + πC l′ph|2+δ
≤∑
h
n1+δ/2
H2+δ(Hwh)
2+δ(2ab|l11|)2+δ
=n1+δ/2
H1+δ(2ε2ab|l11|)2+δ
≤ n1+δ/2
(n/m)1+δ(2ε2ab|l11|)2+δ
= n−δ/2m−(1+δ)(2ε2ab|l11|)2+δ = O(n−δ/2) = o(σ2+δn ).
Therefore, Liapunouv’s condition is satisfied and
√n
∑wh(
nCh
nh
pCh − πCph)
74
is asymptotically normal.
Theorem 7 Under Assumption B, $\sqrt{n}(\hat p_I - p)$ is asymptotically normal.
Proof:
Note that $\sqrt{n}(\hat p_I - p) = U_n + W_n$, where $U_n$ and $W_n$ are defined in (4.3). According to Lemma 2, conditional on the observed units, $W_n$ is asymptotically normally distributed. In order to establish the asymptotic normality of $\sqrt{n}(\hat p_I - p)$, it suffices to show it for $U_n$ by Lemma ??.
For all $i, j$, the $[(i-1)b+j]$th component of $U_n$ is given by
$$V = \sqrt{n}\sum_h w_h\frac{n_h^C(\hat p_{h,ij}^C - p_{h,ij}) + n_h^A(\hat p_{ij|A}\hat p_{h,i\cdot}^A - p_{h,ij}) + n_h^B(\hat p_{ij|B}\hat p_{h,\cdot j}^B - p_{h,ij})}{n_h}$$
$$= \sqrt{n}\Big[\sum_h w_h\frac{n_h^C}{n_h}\hat p_{h,ij}^C + \hat p_{ij|A}\sum_h w_h\frac{n_h^A}{n_h}\hat p_{h,i\cdot}^A + \hat p_{ij|B}\sum_h w_h\frac{n_h^B}{n_h}\hat p_{h,\cdot j}^B - \sum_h w_h p_{h,ij}\Big]$$
$$= \sqrt{n}\Bigg[\sum_h w_h\frac{n_h^C}{n_h}\hat p_{h,ij}^C + \frac{\sum_h w_h\frac{n_h^C}{n_h}\hat p_{h,ij}^C}{\sum_h w_h\frac{n_h^C}{n_h}\hat p_{h,i\cdot}^C}\Big(\sum_h w_h\frac{n_h^A}{n_h}\hat p_{h,i\cdot}^A\Big) + \frac{\sum_h w_h\frac{n_h^C}{n_h}\hat p_{h,ij}^C}{\sum_h w_h\frac{n_h^C}{n_h}\hat p_{h,\cdot j}^C}\Big(\sum_h w_h\frac{n_h^B}{n_h}\hat p_{h,\cdot j}^B\Big) - \sum_h w_h p_{h,ij}\Bigg].$$
Then, by Taylor expansion,
$$V = \sqrt{n}\Big[\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,ij}^C - \pi_C p_{h,ij}\Big) + \frac{\pi_A}{\pi_C}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,ij}^C - \pi_C p_{h,ij}\Big)$$
$$- \frac{\pi_A}{\pi_C}p_{ij|A}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,i\cdot}^C - \pi_C p_{h,i\cdot}\Big) + p_{ij|A}\sum_h w_h\Big(\frac{n_h^A}{n_h}\hat p_{h,i\cdot}^A - \pi_A p_{h,i\cdot}\Big)$$
$$+ \frac{\pi_B}{\pi_C}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,ij}^C - \pi_C p_{h,ij}\Big) - \frac{\pi_B}{\pi_C}p_{ij|B}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,\cdot j}^C - \pi_C p_{h,\cdot j}\Big)$$
$$+ p_{ij|B}\sum_h w_h\Big(\frac{n_h^B}{n_h}\hat p_{h,\cdot j}^B - \pi_B p_{h,\cdot j}\Big)\Big] + o_p(1)$$
$$= \sqrt{n}\Big[\frac{1}{\pi_C}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,ij}^C - \pi_C p_{h,ij}\Big) - \frac{\pi_A}{\pi_C}p_{ij|A}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,i\cdot}^C - \pi_C p_{h,i\cdot}\Big)$$
$$- \frac{\pi_B}{\pi_C}p_{ij|B}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_{h,\cdot j}^C - \pi_C p_{h,\cdot j}\Big) + p_{ij|A}\sum_h w_h\Big(\frac{n_h^A}{n_h}\hat p_{h,i\cdot}^A - \pi_A p_{h,i\cdot}\Big)$$
$$+ p_{ij|B}\sum_h w_h\Big(\frac{n_h^B}{n_h}\hat p_{h,\cdot j}^B - \pi_B p_{h,\cdot j}\Big)\Big] + o_p(1). \tag{4.4}$$
Therefore, $U_n$ can be written as
$$U_n = \sqrt{n}\sum_h w_h\Big[\frac{n_h^C}{n_h}\Big(\frac{1}{\pi_C}I - \frac{\pi_A}{\pi_C}N_h^A(I_a\otimes U_b) - \frac{\pi_B}{\pi_C}N_h^B(U_a\otimes I_b)\Big)\hat p_h^C$$
$$+ \frac{n_h^A}{n_h}N_h^A(I_a\otimes U_b)\hat p_h^A + \frac{n_h^B}{n_h}N_h^B(U_a\otimes I_b)\hat p_h^B - p_h\Big] + o_p(1)$$
$$= \Big(\frac{1}{\pi_C}I - \frac{\pi_A}{\pi_C}N_h^A(I_a\otimes U_b) - \frac{\pi_B}{\pi_C}N_h^B(U_a\otimes I_b)\Big)\sqrt{n}\sum_h w_h\Big(\frac{n_h^C}{n_h}\hat p_h^C - \pi_C p_h\Big)$$
$$+ N_h^A(I_a\otimes U_b)\sqrt{n}\sum_h w_h\Big(\frac{n_h^A}{n_h}\hat p_h^A - \pi_A p_h\Big) + N_h^B(U_a\otimes I_b)\sqrt{n}\sum_h w_h\Big(\frac{n_h^B}{n_h}\hat p_h^B - \pi_B p_h\Big) + o_p(1).$$
According to Lemma 3, $\sqrt{n}\sum_h w_h\big(\frac{n_h^C}{n_h}\hat p_h^C - \pi_C p_h\big)$ is asymptotically normal. Similarly, $\sqrt{n}\sum_h w_h\big(\frac{n_h^A}{n_h}\hat p_h^A - \pi_A p_h\big)$ and $\sqrt{n}\sum_h w_h\big(\frac{n_h^B}{n_h}\hat p_h^B - \pi_B p_h\big)$ are also asymptotically normal. By the delta method, $U_n$ is also asymptotically normal.
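A small numerical illustration of the Lemma 3 type normality with many small strata. All concrete choices here (uniform 2 x 2 table, equal stratum weights, $\pi_C = 0.8$, $n_h = 4$) are our own, and only the (1,1) component of the vector is tracked:

```python
import numpy as np

rng = np.random.default_rng(1)

# Many small strata: H = 500, n_h = 4, equal weights w_h = 1/H.
# Each unit is a completer w.p. piC, and a completer lands in cell (1,1)
# w.p. p11, so n^C_h * phat^C_{h,11} = nC11 ~ Binomial(n_h, piC * p11).
H, nh, piC, p11 = 500, 4, 0.8, 0.25
n = H * nh

def one_replicate():
    """One draw of the (1,1) component of sqrt(n) sum_h w_h (n^C_h/n_h phat^C_h - piC p_h)."""
    nC11 = rng.binomial(nh, piC * p11, size=H)
    return np.sqrt(n) * np.sum((1.0 / H) * (nC11 / nh - piC * p11))

vals = np.array([one_replicate() for _ in range(500)])
# Theoretical variance of each replicate: piC*p11*(1 - piC*p11) = 0.16
```

The replicate mean should be near 0 and the replicate variance near 0.16, consistent with the limiting normal distribution.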
4.4.2 Asymptotic Covariance and Estimation
In this section, we derive the explicit form of the covariance matrix of $\sqrt{n}(\hat p_I - p)$ under conditional imputation across strata with large H. An appropriate estimator for this covariance matrix is also proposed based on the principle of substitution.
For convenience, we first define a set of indicators:
$$\delta_{i\cdot}(h,k) = I\{A_{h,k} = i\}, \quad \delta_{\cdot j}(h,k) = I\{B_{h,k} = j\},$$
$$\delta_A(h,k) = I\{A_{h,k}\text{ is observed}\}, \quad \delta_B(h,k) = I\{B_{h,k}\text{ is observed}\}.$$
According to our discussion, the asymptotic covariance matrix consists of two components: the covariance of $U_n$ and the expectation of the conditional variance of $W_n$, where $U_n$ and $W_n$ are defined in (4.3). Our first task is to obtain $\mathrm{var}(U_n)$. According to (4.4), the $d$th component of $U_n$ (where $d = (i-1)b+j$) is given by
$$V = \sqrt{n}\sum_h\frac{w_h}{n_h}\sum_{k=1}^{n_h}\Big(\frac{1}{\pi_C}\delta_{i\cdot}(h,k)\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k) - \frac{\pi_A}{\pi_C}p_{ij|A}\delta_{i\cdot}(h,k)\delta_A(h,k)\delta_B(h,k)$$
$$- \frac{\pi_B}{\pi_C}p_{ij|B}\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k) + p_{ij|B}\delta_{\cdot j}(h,k)\delta_B(h,k)(1-\delta_A(h,k)) + p_{ij|A}\delta_{i\cdot}(h,k)\delta_A(h,k)(1-\delta_B(h,k))\Big) + o_p(1).$$
To see this, we only need to note that
$$n_h^C\hat p_{h,ij}^C = \sum_{k=1}^{n_h}\delta_{i\cdot}(h,k)\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k)$$
$$n_h^C\hat p_{h,\cdot j}^C = \sum_{k=1}^{n_h}\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k)$$
$$n_h^C\hat p_{h,i\cdot}^C = \sum_{k=1}^{n_h}\delta_{i\cdot}(h,k)\delta_A(h,k)\delta_B(h,k)$$
$$n_h^A\hat p_{h,i\cdot}^A = \sum_{k=1}^{n_h}\delta_{i\cdot}(h,k)\delta_A(h,k)(1-\delta_B(h,k))$$
$$n_h^B\hat p_{h,\cdot j}^B = \sum_{k=1}^{n_h}\delta_{\cdot j}(h,k)\delta_B(h,k)(1-\delta_A(h,k)).$$
Therefore, if we define an $ab$-dimensional vector
$$K(h,k) = \big(K_{11}(h,k), \cdots, K_{1b}(h,k), \cdots, K_{a1}(h,k), \cdots, K_{ab}(h,k)\big)',$$
where
$$K_{ij}(h,k) = \frac{1}{\pi_C}\delta_{i\cdot}(h,k)\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k) - \frac{\pi_A}{\pi_C}p_{ij|A}\delta_{i\cdot}(h,k)\delta_A(h,k)\delta_B(h,k)$$
$$- \frac{\pi_B}{\pi_C}p_{ij|B}\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k) + p_{ij|B}\delta_{\cdot j}(h,k)\delta_B(h,k)(1-\delta_A(h,k)) + p_{ij|A}\delta_{i\cdot}(h,k)\delta_A(h,k)(1-\delta_B(h,k)),$$
it follows that
$$U_n = \sqrt{n}\sum_h\frac{w_h}{n_h}\sum_k K(h,k),$$
and, for a given $h$, $K(h,k)$, $k = 1, \cdots, n_h$, are i.i.d. random vectors. Consequently,
$$\mathrm{var}(U_n) = n\sum_h\frac{w_h^2}{n_h}\Sigma_h \doteq \Sigma_1,$$
where $\Sigma_h = \mathrm{cov}(K(h,k))$. This immediately suggests a consistent estimator for $\Sigma_1$, given by
$$\hat\Sigma_1 = n\sum_h\frac{w_h^2}{n_h}\hat\Sigma_h,$$
where $\hat\Sigma_h$ is the sample covariance matrix of
$$\hat K(h,k) = \big(\hat K_{11}(h,k), \cdots, \hat K_{1b}(h,k), \cdots, \hat K_{a1}(h,k), \cdots, \hat K_{ab}(h,k)\big)',$$
with $\hat K_{ij}(h,k)$ defined as
$$\hat K_{ij}(h,k) = \frac{1}{\hat\pi_C}\delta_{i\cdot}(h,k)\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k) - \frac{\hat\pi_A}{\hat\pi_C}\hat p_{ij|A}\delta_{i\cdot}(h,k)\delta_A(h,k)\delta_B(h,k)$$
$$- \frac{\hat\pi_B}{\hat\pi_C}\hat p_{ij|B}\delta_{\cdot j}(h,k)\delta_A(h,k)\delta_B(h,k) + \hat p_{ij|B}\delta_{\cdot j}(h,k)\delta_B(h,k)(1-\delta_A(h,k)) + \hat p_{ij|A}\delta_{i\cdot}(h,k)\delta_A(h,k)(1-\delta_B(h,k)),$$
where
$$\hat\pi_A = \frac{\sum_h n_h^A}{\sum_h n_h}, \quad \hat\pi_B = \frac{\sum_h n_h^B}{\sum_h n_h}, \quad \hat\pi_C = \frac{\sum_h n_h^C}{\sum_h n_h}.$$
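The estimator $\hat\Sigma_1$ only requires evaluating $\hat K(h,k)$ unit by unit. A sketch for a single unit (function name and argument layout are ours; `pA` and `pB` hold the estimated conditional probabilities):

```python
import numpy as np

def K_unit(i, j, obs_A, obs_B, piA, piB, piC, pA, pB):
    """The vector K(h,k) for one sampled unit with categories (i, j) and
    response indicators obs_A = delta_A, obs_B = delta_B.
    pA[i, j] = p_{ij|A} and pB[i, j] = p_{ij|B} are assumed precomputed."""
    a, b = pA.shape
    K = np.zeros((a, b))
    if obs_A and obs_B:                    # completer
        K[i, j] += 1.0 / piC
        K[i, :] -= (piA / piC) * pA[i, :]  # delta_i. * delta_A * delta_B term
        K[:, j] -= (piB / piC) * pB[:, j]  # delta_.j * delta_A * delta_B term
    elif obs_B:                            # A missing, B = j observed
        K[:, j] += pB[:, j]
    elif obs_A:                            # B missing, A = i observed
        K[i, :] += pA[i, :]
    return K.ravel()                       # [(i-1)b + j] ordering
```

Stacking these vectors per stratum and taking sample covariances gives $\hat\Sigma_h$, hence $\hat\Sigma_1$.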
The explicit theoretical form of $\Sigma_h$ can also be obtained. For given $(i_1,j_1)$ and $(i_2,j_2)$, a direct calculation shows that the $((i_1-1)b+j_1, (i_2-1)b+j_2)$th component of $\Sigma_h$ is given by
$$\sigma^2_{(i_1,j_1),(i_2,j_2)} = \frac{1}{\pi_C}p_{h,ij} + \frac{\pi_A^2}{\pi_C}p_{ij|A}^2 p_{h,i\cdot} + \frac{\pi_B^2}{\pi_C}p_{ij|B}^2 p_{h,\cdot j} - \frac{2\pi_A p_{ij|A}}{\pi_C}p_{h,ij} - \frac{2\pi_B p_{ij|B}}{\pi_C}p_{h,ij}$$
$$+ \frac{2\pi_A\pi_B}{\pi_C}p_{ij|B}p_{ij|A}p_{h,ij} + p_{ij|A}^2\pi_A p_{h,i\cdot} + p_{ij|B}^2\pi_B p_{h,\cdot j} - p_{h,ij}^2$$
if $(i_1,j_1) = (i_2,j_2) = (i,j)$;
$$= \frac{\pi_A^2}{\pi_C}p_{ij_1|A}p_{ij_2|A}p_{h,i\cdot} - \frac{\pi_A}{\pi_C}p_{ij_2|A}p_{h,ij_1} - \frac{\pi_A}{\pi_C}p_{ij_1|A}p_{h,ij_2} + \frac{\pi_A\pi_B}{\pi_C}p_{ij_1|B}p_{ij_2|A}p_{h,ij_1}$$
$$+ \frac{\pi_A\pi_B}{\pi_C}p_{ij_1|A}p_{ij_2|B}p_{h,ij_2} + \pi_A p_{h,i\cdot}p_{ij_1|A}p_{ij_2|A} - p_{h,ij_1}p_{h,ij_2}$$
if $i_1 = i_2 = i$ and $j_1 \neq j_2$;
$$= \frac{\pi_B^2}{\pi_C}p_{i_1j|B}p_{i_2j|B}p_{h,\cdot j} - \frac{\pi_B}{\pi_C}p_{i_2j|B}p_{h,i_1j} - \frac{\pi_B}{\pi_C}p_{i_1j|B}p_{h,i_2j} + \frac{\pi_A\pi_B}{\pi_C}p_{i_1j|B}p_{i_2j|A}p_{h,i_2j}$$
$$+ \frac{\pi_A\pi_B}{\pi_C}p_{i_2j|B}p_{i_1j|A}p_{h,i_1j} + \pi_B p_{h,\cdot j}p_{i_1j|B}p_{i_2j|B} - p_{h,i_1j}p_{h,i_2j}$$
if $i_1 \neq i_2$ and $j_1 = j_2 = j$;
$$= -p_{h,i_1j_1}p_{h,i_2j_2}$$
if $i_1 \neq i_2$ and $j_1 \neq j_2$.
Next we consider the expectation of the conditional variance of $W_n$. Recalling the definition of $W_n$, it can be seen that
$$E\big(\mathrm{var}(W_n \mid \sigma(C))\big) = \Sigma^A + \Sigma^B,$$
where
$$\Sigma^A = E\Big(\mathrm{var}\Big(\sqrt{n}\sum_h w_h\frac{n_h^A\hat p_h^A}{n_h}\,\Big|\,\sigma(C)\Big)\Big), \qquad \Sigma^B = E\Big(\mathrm{var}\Big(\sqrt{n}\sum_h w_h\frac{n_h^B\hat p_h^B}{n_h}\,\Big|\,\sigma(C)\Big)\Big).$$
Then, for given $(i_1,j_1)$ and $(i_2,j_2)$, the $[(i_1-1)b+j_1]$th and $[(i_2-1)b+j_2]$th components of
$$\sqrt{n}\sum_h w_h\frac{n_h^A\hat p_h^A}{n_h}$$
are given by
$$\sqrt{n}\sum_h w_h\frac{n_{h,i_1\cdot}^A\big(n_{h,i_1j_1}^A/n_{h,i_1\cdot}^A\big)}{n_h} \quad \text{and} \quad \sqrt{n}\sum_h w_h\frac{n_{h,i_2\cdot}^A\big(n_{h,i_2j_2}^A/n_{h,i_2\cdot}^A\big)}{n_h}.$$
Note that, given $\sigma(C)$ (the observed units), $n_{h,i_1\cdot}^A$ and $n_{h,i_2\cdot}^A$ are fixed, while $n_{h,i_1j_1}^A/n_{h,i_1\cdot}^A$ and $n_{h,i_2j_2}^A/n_{h,i_2\cdot}^A$ have variances $(p_{i_1j_1|A} - p_{i_1j_1|A}^2)/n_{h,i_1\cdot}^A$ and $(p_{i_2j_2|A} - p_{i_2j_2|A}^2)/n_{h,i_2\cdot}^A$, respectively. Also note that their covariance is given by $-p_{ij_1|A}p_{ij_2|A}/n_{h,i\cdot}^A$ if $i_1 = i_2 = i$, and 0 if $i_1 \neq i_2$. Therefore, the
$((i_1-1)b+j_1, (i_2-1)b+j_2)$th component of $\Sigma^A$ is given by
$$\sigma^2_{(i_1,j_1),(i_2,j_2)} = E\Big(n\sum_h\frac{w_h^2 n_{h,i\cdot}^A}{n_h^2}\big(p_{ij|A} - p_{ij|A}^2\big)\Big) = n\big(p_{ij|A} - p_{ij|A}^2\big)\sum_h\frac{w_h^2\pi_A p_{h,i\cdot}}{n_h}$$
if $(i_1,j_1) = (i_2,j_2) = (i,j)$;
$$= -E\Big(n\sum_h\frac{w_h^2 n_{h,i\cdot}^A}{n_h^2}p_{ij_1|A}p_{ij_2|A}\Big) = -n\,p_{ij_1|A}p_{ij_2|A}\sum_h\frac{w_h^2\pi_A p_{h,i\cdot}}{n_h}$$
if $i_1 = i_2 = i$ and $j_1 \neq j_2$;
$$= 0$$
if $i_1 \neq i_2$.
Similarly, the corresponding component of $\Sigma^B$ is given by
$$\sigma^2_{(i_1,j_1),(i_2,j_2)} = E\Big(n\sum_h\frac{w_h^2 n_{h,\cdot j}^B}{n_h^2}\big(p_{ij|B} - p_{ij|B}^2\big)\Big) = n\big(p_{ij|B} - p_{ij|B}^2\big)\sum_h\frac{w_h^2\pi_B p_{h,\cdot j}}{n_h}$$
if $(i_1,j_1) = (i_2,j_2) = (i,j)$;
$$= -E\Big(n\sum_h\frac{w_h^2 n_{h,\cdot j}^B}{n_h^2}p_{i_1j|B}p_{i_2j|B}\Big) = -n\,p_{i_1j|B}p_{i_2j|B}\sum_h\frac{w_h^2\pi_B p_{h,\cdot j}}{n_h}$$
if $i_1 \neq i_2$ and $j_1 = j_2 = j$;
$$= 0$$
if $j_1 \neq j_2$.
Consistent estimators for $\Sigma^A$ and $\Sigma^B$ can be obtained by replacing $\pi_A p_{h,i\cdot}$ and $\pi_B p_{h,\cdot j}$ by their unbiased estimators $n_{h,i\cdot}^A/n_h$ and $n_{h,\cdot j}^B/n_h$, respectively.
Chapter 5
Simulation Study Under Stratified Sampling
5.1 Introduction
In the previous chapter, we studied the asymptotic properties of cell probability estimators under various conditional imputation methods for stratified sampling. In this chapter, extensive simulations are carried out to evaluate the finite sample performance of the test statistics obtained there.
In Section 5.2, we study the test statistics obtained under the method of imputation within each stratum. In Section 5.3, we study the test statistics obtained under the method of imputation across strata when H is small. In Section 5.4, we study the asymptotic results obtained under the method of imputation across strata when H is large. In each section, the finite sample performances of the Wald type test and the Rao type test for goodness-of-fit are evaluated in terms of their empirical sizes.
5.2 Imputation Within Each Stratum
5.2.1 Wald’s Test for Goodness-of-Fit
For simplicity, we only consider 3 strata with $n_h = 1{,}000$ for each of them. The weights for the three strata are 0.3, 0.3, and 0.4, respectively. Two types of contingency tables ($2\times 2$ and $5\times 5$) are considered: $p_{h,ij}$ is chosen to be 0.25 for the $2\times 2$ table and 0.04 for the $5\times 5$ table. $\pi_C$ is allowed to differ across the strata, but within each stratum it is assumed that $\pi_A = \pi_B = (1-\pi_C)/2$.
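A sketch of this data-generating scheme and of conditional imputation within one stratum (function names are ours; the imputation draws the missing coordinate from the completers' estimated conditional distribution):

```python
import numpy as np

rng = np.random.default_rng(2)

def gen_stratum(nh, p, piA, piB, piC):
    """One stratum of (A, B) data with a unit-level nonresponse pattern:
    complete w.p. piC, only A observed w.p. piA, only B observed w.p. piB."""
    a, b = p.shape
    cells = rng.choice(a * b, size=nh, p=p.ravel())
    A, B = cells // b, cells % b
    pat = rng.choice(3, size=nh, p=[piC, piA, piB])  # 0: complete, 1: A only, 2: B only
    return A, B, pat

def conditional_impute(A, B, pat, a, b):
    """Fill in the missing coordinate from the completers' conditional distribution."""
    A, B = A.copy(), B.copy()
    comp = pat == 0
    counts = np.zeros((a, b))
    np.add.at(counts, (A[comp], B[comp]), 1)
    for k in np.where(pat == 1)[0]:          # B missing: draw j ~ phat(B=j | A=i)
        row = counts[A[k]] / counts[A[k]].sum()
        B[k] = rng.choice(b, p=row)
    for k in np.where(pat == 2)[0]:          # A missing: draw i ~ phat(A=i | B=j)
        col = counts[:, B[k]] / counts[:, B[k]].sum()
        A[k] = rng.choice(a, p=col)
    return A, B
```

With a uniform $2\times 2$ table, the imputed cell proportions should stay close to 0.25 per cell.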
For each parameter setting, 10,000 data sets are generated and conditionally imputed within each stratum. For each data set, the Wald type test statistic for goodness-of-fit is calculated and compared with the upper 5% and 95% quantiles of a standard chi-square random variable with the appropriate degrees of freedom. The empirical upper tail probability is estimated by the proportion of the chi-square scores that exceed the corresponding quantile. The results are reported in Table 7.
The estimated densities based on the 10,000 chi-square scores are plotted for selected cases. For comparison, the density of a standard chi-square distribution with the appropriate degrees of freedom is also plotted in the same figure. They are given in Figure 11 and Figure 12.
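The empirical tail probabilities reported in Table 7 can be computed as below (the function name is ours; scipy is used only for the chi-square quantile):

```python
import numpy as np
from scipy.stats import chi2

def empirical_tail_probs(scores, df, tail_probs=(0.95, 0.05)):
    """Proportion of simulated statistics exceeding the upper-alpha quantile
    of the chi-square reference distribution with df degrees of freedom.
    For a well-calibrated test the result should be close to alpha itself."""
    scores = np.asarray(scores)
    return {a: float(np.mean(scores > chi2.ppf(1 - a, df))) for a in tail_probs}
```

Applied to scores that really are chi-square with the stated degrees of freedom, the output should be near 0.95 and 0.05, matching the pattern in Tables 7 and 9.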
5.2.2 Rao’s Test for Goodness-of-Fit
In this section, simulation is performed to evaluate the performance of the Rao type test for goodness-of-fit. We use the same parameter settings as in the previous section. B, the number of bootstrap samples, is set to 100. Due to the computational intensity, the number of iterations is decreased to 500. The results for the empirical upper tail probability are reported in Table 8.
The estimated densities based on the 500 corrected chi-square scores are plotted for selected cases. For comparison, the density of a standard chi-square distribution with the appropriate degrees of freedom is also plotted in the same figure. They are given in Figure 13 and Figure 14.
5.3 Imputation Across Strata with Small H
5.3.1 Wald’s Test for Goodness-of-Fit
In this section, simulation is used to study the finite sample performance of the Wald test based on the method of imputation across strata. We continue to use the same parameter settings as in the previous section. The simulation is based on 10,000 iterations, and the empirical upper tail probabilities are reported in Table 9.
The estimated densities based on the 10,000 chi-square scores are plotted for selected cases. For comparison, the density of a standard chi-square distribution with the appropriate degrees of freedom is also plotted in the same figure. They are given in Figure 15 and Figure 16.
5.3.2 Rao’s Test for Goodness-of-Fit
In this section, the simulations concern the finite sample performance of the Rao test based on the method of imputation across strata. We continue to use the same parameter settings as in the previous section. B, the number of bootstrap samples, is set to 100. The simulation is based on 500 iterations, and the empirical upper tail probabilities are reported in Table 10.
The estimated densities based on the 500 corrected chi-square scores are plotted for selected cases. For comparison, the density of the appropriate standard chi-square distribution is also plotted in the same figure. They are given in Figure 17 and Figure 18.
5.4 Imputation Across Strata with Large H
In this section, simulation is carried out to study the finite sample performance of the Wald test for goodness-of-fit with large H. In this simulation study, H is set to 1,000 with $n_h = 4$ and $w_h = 0.001$. $p_{h,ij}$ is set to $1/(ab)$, where $(a,b) = (2,2)$ or $(5,5)$. 10,000 iterations are performed for each parameter setting. The results are reported in Table 11.
The estimated densities based on the 10,000 scores are plotted for selected cases. For comparison, the density of a standard chi-square distribution with the appropriate degrees of freedom is also plotted in the same figure. They are given in Figure 19 and Figure 20.
5.5 Conclusion
For the selected sample sizes and parameters, our simulation results show that the empirical distributions of all the Wald type statistics are well approximated by the derived asymptotic distributions.
In addition, the simulations demonstrate that the empirical distributions of all the Rao type statistics are well approximated by standard chi-square distributions with the appropriate degrees of freedom.
Table 7: Wald's test for imputation within stratum with small H

                         2 x 2             5 x 5
π1,C  π2,C  π3,C     p0.95   p0.05     p0.95   p0.05
0.2   0.2   0.2      0.9504  0.0518    0.9523  0.0564
0.2   0.2   0.4      0.9469  0.0488    0.9530  0.0498
0.2   0.2   0.6      0.9512  0.0504    0.9548  0.0558
0.2   0.2   0.8      0.9450  0.0477    0.9512  0.0537
0.2   0.2   1.0      0.9494  0.0475    0.9583  0.0540
0.2   0.4   0.4      0.9480  0.0494    0.9562  0.0534
0.2   0.4   0.6      0.9483  0.0475    0.9531  0.0578
0.2   0.4   0.8      0.9463  0.0530    0.9511  0.0500
0.2   0.4   1.0      0.9491  0.0489    0.9527  0.0512
0.2   0.6   0.6      0.9486  0.0497    0.9541  0.0550
0.2   0.6   0.8      0.9541  0.0543    0.9520  0.0537
0.2   0.6   1.0      0.9509  0.0480    0.9500  0.0517
0.2   0.8   0.8      0.9498  0.0512    0.9524  0.0555
0.2   0.8   1.0      0.9492  0.0478    0.9497  0.0508
0.2   1.0   1.0      0.9505  0.0506    0.9547  0.0579
0.4   0.4   0.4      0.9525  0.0508    0.9536  0.0545
0.4   0.4   0.6      0.9500  0.0515    0.9510  0.0521
0.4   0.4   0.8      0.9492  0.0512    0.9492  0.0531
0.4   0.4   1.0      0.9453  0.0539    0.9516  0.0512
0.4   0.6   0.6      0.9491  0.0470    0.9505  0.0508
0.4   0.6   0.8      0.9475  0.0504    0.9520  0.0472
0.4   0.6   1.0      0.9488  0.0493    0.9546  0.0515
0.4   0.8   0.8      0.9485  0.0490    0.9483  0.0477
0.4   0.8   1.0      0.9536  0.0525    0.9493  0.0519
0.4   1.0   1.0      0.9517  0.0532    0.9495  0.0510
0.6   0.6   0.6      0.9494  0.0509    0.9484  0.0525
0.6   0.6   0.8      0.9485  0.0510    0.9518  0.0481
0.6   0.6   1.0      0.9505  0.0497    0.9516  0.0494
0.6   0.8   0.8      0.9480  0.0491    0.9504  0.0492
0.6   0.8   1.0      0.9497  0.0532    0.9534  0.0474
0.6   1.0   1.0      0.9499  0.0519    0.9494  0.0484
0.8   0.8   0.8      0.9497  0.0481    0.9472  0.0507
0.8   0.8   1.0      0.9516  0.0479    0.9511  0.0495
0.8   1.0   1.0      0.9447  0.0507    0.9515  0.0500
1.0   1.0   1.0      0.9507  0.0502    0.9500  0.0515

Number of iterations: 10,000; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 1/(ab); πh,A = πh,B = (1 − πh,C)/2.
Table 8: Rao's test for imputation within stratum with small H

                         2 x 2           5 x 5
π1,C  π2,C  π3,C     p0.95  p0.05    p0.95  p0.05
0.2   0.2   0.2      0.944  0.074    0.944  0.038
0.2   0.2   0.4      0.940  0.038    0.938  0.060
0.2   0.2   0.6      0.950  0.066    0.972  0.068
0.2   0.2   0.8      0.924  0.060    0.952  0.056
0.2   0.2   1.0      0.946  0.060    0.956  0.074
0.2   0.4   0.4      0.944  0.050    0.954  0.052
0.2   0.4   0.6      0.944  0.048    0.946  0.066
0.2   0.4   0.8      0.958  0.062    0.956  0.072
0.2   0.4   1.0      0.918  0.058    0.964  0.050
0.2   0.6   0.6      0.950  0.048    0.942  0.058
0.2   0.6   0.8      0.946  0.062    0.932  0.086
0.2   0.6   1.0      0.928  0.050    0.936  0.078
0.2   0.8   0.8      0.962  0.066    0.952  0.040
0.2   0.8   1.0      0.940  0.056    0.950  0.068
0.2   1.0   1.0      0.944  0.082    0.958  0.068
0.4   0.4   0.4      0.940  0.064    0.940  0.058
0.4   0.4   0.6      0.928  0.052    0.940  0.060
0.4   0.4   0.8      0.942  0.072    0.960  0.060
0.4   0.4   1.0      0.954  0.064    0.956  0.064
0.4   0.6   0.6      0.956  0.050    0.952  0.044
0.4   0.6   0.8      0.928  0.044    0.952  0.074
0.4   0.6   1.0      0.964  0.048    0.950  0.052
0.4   0.8   0.8      0.942  0.060    0.956  0.054
0.4   0.8   1.0      0.954  0.038    0.932  0.044
0.4   1.0   1.0      0.956  0.056    0.956  0.046
0.6   0.6   0.6      0.950  0.072    0.942  0.052
0.6   0.6   0.8      0.944  0.062    0.938  0.066
0.6   0.6   1.0      0.950  0.052    0.956  0.054
0.6   0.8   0.8      0.962  0.054    0.928  0.044
0.6   0.8   1.0      0.948  0.038    0.958  0.060
0.6   1.0   1.0      0.950  0.052    0.958  0.040
0.8   0.8   0.8      0.952  0.056    0.936  0.058
0.8   0.8   1.0      0.944  0.046    0.942  0.066
0.8   1.0   1.0      0.950  0.066    0.970  0.054
1.0   1.0   1.0      0.958  0.060    0.962  0.054

Number of iterations: 500; number of bootstrap samples: 100; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 1/(ab); πh,A = πh,B = (1 − πh,C)/2.
Table 9: Wald's test for imputation across strata with small H

                         2 x 2             5 x 5
π1,C  π2,C  π3,C     p0.95   p0.05     p0.95   p0.05
0.2   0.2   0.2      0.9468  0.0493    0.9512  0.0501
0.2   0.2   0.4      0.9474  0.0522    0.9524  0.0498
0.2   0.2   0.6      0.9470  0.0507    0.9524  0.0499
0.2   0.2   0.8      0.9500  0.0477    0.9512  0.0505
0.2   0.2   1.0      0.9493  0.0456    0.9520  0.0550
0.2   0.4   0.4      0.9551  0.0510    0.9509  0.0476
0.2   0.4   0.6      0.9541  0.0496    0.9521  0.0499
0.2   0.4   0.8      0.9476  0.0513    0.9493  0.0516
0.2   0.4   1.0      0.9480  0.0528    0.9498  0.0512
0.2   0.6   0.6      0.9514  0.0517    0.9515  0.0525
0.2   0.6   0.8      0.9508  0.0487    0.9503  0.0523
0.2   0.6   1.0      0.9505  0.0508    0.9503  0.0504
0.2   0.8   0.8      0.9494  0.0529    0.9531  0.0482
0.2   0.8   1.0      0.9471  0.0472    0.9511  0.0501
0.2   1.0   1.0      0.9498  0.0529    0.9513  0.0590
0.4   0.4   0.4      0.9484  0.0506    0.9519  0.0510
0.4   0.4   0.6      0.9473  0.0488    0.9488  0.0499
0.4   0.4   0.8      0.9504  0.0522    0.9521  0.0486
0.4   0.4   1.0      0.9502  0.0479    0.9517  0.0515
0.4   0.6   0.6      0.9501  0.0495    0.9488  0.0514
0.4   0.6   0.8      0.9529  0.0455    0.9454  0.0463
0.4   0.6   1.0      0.9486  0.0516    0.9532  0.0510
0.4   0.8   0.8      0.9519  0.0478    0.9513  0.0491
0.4   0.8   1.0      0.9471  0.0494    0.9476  0.0471
0.4   1.0   1.0      0.9481  0.0496    0.9503  0.0522
0.6   0.6   0.6      0.9501  0.0492    0.9520  0.0490
0.6   0.6   0.8      0.9556  0.0506    0.9496  0.0543
0.6   0.6   1.0      0.9492  0.0481    0.9565  0.0517
0.6   0.8   0.8      0.9498  0.0512    0.9509  0.0477
0.6   0.8   1.0      0.9493  0.0539    0.9483  0.0484
0.6   1.0   1.0      0.9498  0.0498    0.9472  0.0427
0.8   0.8   0.8      0.9497  0.0498    0.9510  0.0490
0.8   0.8   1.0      0.9513  0.0489    0.9503  0.0513
0.8   1.0   1.0      0.9503  0.0500    0.9499  0.0490
1.0   1.0   1.0      0.9514  0.0494    0.9501  0.0505

Number of iterations: 10,000; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 1/(ab); πh,A = πh,B = (1 − πh,C)/2.
Table 10: Rao's test for imputation across strata with small H

                         2 x 2           5 x 5
π1,C  π2,C  π3,C     p0.95  p0.05    p0.95  p0.05
0.2   0.2   0.2      0.934  0.068    0.946  0.038
0.2   0.2   0.4      0.944  0.038    0.940  0.064
0.2   0.2   0.6      0.944  0.068    0.956  0.072
0.2   0.2   0.8      0.946  0.058    0.950  0.044
0.2   0.2   1.0      0.948  0.056    0.940  0.054
0.2   0.4   0.4      0.932  0.062    0.946  0.050
0.2   0.4   0.6      0.936  0.094    0.964  0.056
0.2   0.4   0.8      0.956  0.040    0.920  0.070
0.2   0.4   1.0      0.944  0.056    0.954  0.066
0.2   0.6   0.6      0.950  0.044    0.938  0.050
0.2   0.6   0.8      0.948  0.056    0.946  0.038
0.2   0.6   1.0      0.954  0.038    0.948  0.066
0.2   0.8   0.8      0.948  0.072    0.944  0.064
0.2   0.8   1.0      0.938  0.072    0.958  0.060
0.2   1.0   1.0      0.956  0.058    0.936  0.056
0.4   0.4   0.4      0.962  0.058    0.938  0.056
0.4   0.4   0.6      0.944  0.048    0.944  0.058
0.4   0.4   0.8      0.948  0.066    0.958  0.052
0.4   0.4   1.0      0.958  0.044    0.968  0.054
0.4   0.6   0.6      0.954  0.050    0.944  0.060
0.4   0.6   0.8      0.960  0.042    0.936  0.064
0.4   0.6   1.0      0.958  0.050    0.934  0.050
0.4   0.8   0.8      0.966  0.068    0.952  0.048
0.4   0.8   1.0      0.934  0.046    0.934  0.050
0.4   1.0   1.0      0.952  0.062    0.950  0.034
0.6   0.6   0.6      0.944  0.050    0.946  0.060
0.6   0.6   0.8      0.942  0.052    0.946  0.066
0.6   0.6   1.0      0.948  0.054    0.948  0.054
0.6   0.8   0.8      0.960  0.044    0.950  0.034
0.6   0.8   1.0      0.968  0.074    0.944  0.058
0.6   1.0   1.0      0.956  0.062    0.952  0.052
0.8   0.8   0.8      0.960  0.066    0.948  0.044
0.8   0.8   1.0      0.942  0.054    0.948  0.070
0.8   1.0   1.0      0.944  0.052    0.932  0.054
1.0   1.0   1.0      0.946  0.066    0.930  0.068

Number of iterations: 500; number of bootstrap samples: 100; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 1/(ab); πh,A = πh,B = (1 − πh,C)/2.
Table 11: Wald's test for imputation across strata with large H

                      2 x 2           5 x 5
πC    πA    πB     p0.05  p0.95    p0.05  p0.95
1.0   0.0   0.0    0.050  0.952    0.054  0.955
0.9   0.0   0.1    0.048  0.948    0.050  0.946
0.8   0.0   0.2    0.048  0.951    0.051  0.947
0.8   0.1   0.1    0.049  0.949    0.042  0.918
0.7   0.0   0.3    0.046  0.945    0.048  0.935
0.7   0.1   0.2    0.049  0.945    0.042  0.915
0.7   0.2   0.1    0.046  0.947    0.042  0.928
0.6   0.0   0.4    0.045  0.946    0.053  0.946
0.6   0.1   0.3    0.047  0.946    0.043  0.913
0.6   0.2   0.2    0.045  0.948    0.043  0.926
0.5   0.0   0.5    0.043  0.946    0.054  0.935
0.5   0.1   0.4    0.045  0.950    0.048  0.907
0.5   0.2   0.3    0.048  0.949    0.042  0.923
0.4   0.0   0.6    0.045  0.946    0.059  0.940
0.4   0.1   0.5    0.048  0.951    0.050  0.906
0.4   0.2   0.4    0.052  0.948    0.049  0.922
0.4   0.3   0.3    0.052  0.952    0.043  0.921
0.3   0.0   0.7    0.050  0.951    0.062  0.945
0.3   0.1   0.6    0.048  0.948    0.055  0.909
0.3   0.2   0.5    0.053  0.953    0.050  0.930
0.3   0.3   0.4    0.058  0.953    0.047  0.926
0.3   0.4   0.3    0.057  0.952    0.044  0.920
0.2   0.0   0.8    0.045  0.945    0.078  0.948
0.2   0.1   0.7    0.050  0.949    0.062  0.924
0.2   0.2   0.6    0.059  0.951    0.056  0.926
0.2   0.3   0.5    0.064  0.953    0.057  0.929
0.2   0.4   0.4    0.067  0.953    0.055  0.923
0.1   0.0   0.9    0.054  0.947    0.132  0.959
0.1   0.1   0.8    0.060  0.950    0.102  0.936
0.1   0.2   0.7    0.084  0.955    0.102  0.934
0.1   0.3   0.6    0.106  0.959    0.103  0.931
0.1   0.4   0.5    0.115  0.960    0.093  0.930

Number of iterations: 10,000; nh = 4; H = 1,000; wh = 0.001; ph,ij = 1/(ab).
Figure 11: Empirical density for Wald's statistic based on 2 x 2 tables
[Figure: four panels of estimated densities of the chi-square scores (legend: empirical, theoretical) for (π1,C, π2,C, π3,C) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 10,000; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.25; πh,A = πh,B = (1 − πh,C)/2.
Figure 12: Empirical density for Wald's statistic based on 5 x 5 tables
[Figure: four panels of estimated densities of the chi-square scores (legend: empirical, theoretical) for (π1,C, π2,C, π3,C) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 10,000; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.04; πh,A = πh,B = (1 − πh,C)/2.
Figure 13: Empirical density for Rao's statistic based on 2 x 2 tables
[Figure: four panels of estimated densities of the corrected chi-square scores (legend: empirical, theoretical) for (π1,C, π2,C, π3,C) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 500; number of bootstrap samples: 100; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.25; πh,A = πh,B = (1 − πh,C)/2.
Figure 14: Empirical density for Rao’s statistic based on 5 × 5 tables
[Four panels: empirical vs. theoretical density plotted against the χ²-score for (π1,c, π2,c, π3,c) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 500; number of bootstrap samples: 100; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.04; πh,A = πh,B = (1 − πh,C)/2.
Figure 15: Empirical density for Wald’s statistic based on 2 × 2 tables with small H
[Four panels: empirical vs. theoretical density plotted against the χ²-score for (π1,c, π2,c, π3,c) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 10,000; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.25; πh,A = πh,B = (1 − πh,C)/2.
Figure 16: Empirical density for Wald’s statistic based on 5 × 5 tables with small H
[Four panels: empirical vs. theoretical density plotted against the χ²-score for (π1,c, π2,c, π3,c) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 10,000; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.04; πh,A = πh,B = (1 − πh,C)/2.
Figure 17: Empirical density for Rao’s statistic based on 2 × 2 tables with small H
[Four panels: empirical vs. theoretical density plotted against the χ²-score for (π1,c, π2,c, π3,c) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 500; number of bootstrap samples: 100; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.25; πh,A = πh,B = (1 − πh,C)/2.
Figure 18: Empirical density for Rao’s statistic based on 5 × 5 tables with small H
[Four panels: empirical vs. theoretical density plotted against the χ²-score for (π1,c, π2,c, π3,c) = (0.2, 0.4, 0.6), (0.3, 0.5, 0.3), (0.4, 0.3, 0.5), and (0.5, 0.4, 0.4).]
Number of iterations: 500; number of bootstrap samples: 100; sample size: (n1, n2, n3) = (1,000, 1,000, 1,000); (w1, w2, w3) = (0.3, 0.3, 0.4); ph,ij = 0.04; πh,A = πh,B = (1 − πh,C)/2.
Figure 19: Empirical density for Wald’s statistic based on 2 × 2 tables with large H
[Four panels: empirical vs. theoretical density plotted against the χ²-score for (πA, πB, πC) = (0.4, 0.3, 0.3), (0.2, 0.4, 0.4), (0.1, 0.1, 0.8), and (0.1, 0.2, 0.7).]
Number of iterations: 10,000; nh = 4; H = 1,000; wh = 0.001; ph,ij = 0.25.
Figure 20: Empirical density for Wald’s statistic based on 5 × 5 tables with large H
[Four panels: empirical vs. theoretical density plotted against the χ²-score for (πA, πB, πC) = (0.4, 0.3, 0.3), (0.2, 0.4, 0.4), (0.1, 0.1, 0.8), and (0.1, 0.2, 0.7).]
Number of iterations: 10,000; nh = 4; H = 1,000; wh = 0.001; ph,ij = 0.04.
Chapter 6
Real Data Study
6.1 The Beaver Dam Eye Study
The first example concerns data from the Beaver Dam Eye Study (BDES), an ongoing population-based cohort study of age-related eye disease, cataract, and maculopathy. A detailed description of the target population and study can be found in Klein, Klein, and Linton (1992). Five-year follow-up data have been collected, and the ten-year follow-up of the cohort is in progress.
Briefly, a private census of the population of Beaver Dam, Wisconsin (99%
white), was performed from September 15, 1987 to May 4, 1988, to identify all
residents in the city or township of Beaver Dam who were 43 to 84 years of
age. Afterward, the population was examined over a 30-month period. Of 5,925 eligible people, 4,926 (83.1%) participated in the study. Photographs of each eye were taken and graded, and an examination and a standardized questionnaire were administered.
One objective of this study is to examine the relationship between macular edema and proteinuria, two common complications in people with diabetes. Macular edema (absent = 0, present = 1) is a condition in which fluid builds up in the retina in the macular area at the back of the eye; the resulting swelling can seriously impair vision. Proteinuria (absent = 0, present = 1) is the presence of excessive protein in the urine, usually a symptom of kidney malfunction. Data on macular edema are available for both eyes. However, because of the high correlation between the left and right eyes, our analysis is presented for right eyes only.
Besides the two primary end points, some auxiliary variables with no missing values are also available, e.g., sex (male = 1, female = 2). Accordingly, two imputation classes are created according to sex = 1 or sex = 2; we refer to them as class 1 and class 2, respectively. In this example, the imputation classes can also be viewed as strata, so we are essentially performing imputation within each stratum. Subjects with both primary end points (macular edema and proteinuria) missing are simply discarded, producing a reduced data set. Summary statistics for this reduced data set are provided in Table 12.
Conditional imputation is then performed within each imputation class. The
summary statistics after imputation are reported in Table 13.
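The conditional imputation step can be sketched as follows. This is a minimal illustration under assumed conventions, not the code used in the dissertation: a unit missing one of the two categorical variables receives a value drawn at random from the complete respondents, in the same imputation class, that share its observed category. The record format and the function name are hypothetical.

```python
import random

def conditional_impute(records, rng=None):
    """Conditionally impute a two-variable categorical record set.

    `records` is a list of [a, b] pairs with None marking a missing
    value (hypothetical format). A missing component is replaced by a
    random draw from complete respondents with the same observed
    component; units with both components missing are left untouched,
    mirroring how such units are dropped in the text.
    """
    rng = rng or random.Random(0)
    complete = [r for r in records if r[0] is not None and r[1] is not None]
    for r in records:
        if r[1] is None and r[0] is not None:
            donors = [c[1] for c in complete if c[0] == r[0]]
            r[1] = rng.choice(donors)
        elif r[0] is None and r[1] is not None:
            donors = [c[0] for c in complete if c[1] == r[1]]
            r[0] = rng.choice(donors)
    return records
```

Run once per imputation class (here, once for each sex), so that donors always come from the same class as the recipient.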
After conditional imputation, cell probability estimates are obtained for each stratum. For comparison, the same estimates are also obtained by re-weighting, which uses the complete units only. Since sampling weights are not available in the BDES, we assume an equal sampling weight (0.5) for the two strata (female and male); that is, we assume that in the whole population the number of females is approximately equal to the number of males. The overall cell probability estimates are then obtained and the test of independence is carried out. The results, reported in Table 14, indicate a strong relationship between macular edema and proteinuria.
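The aggregation step can be sketched as follows (an assumed illustration, not the dissertation's code): overall cell probabilities are the weighted average of the stratum estimates, p_ij = Σ_h w_h p_h,ij, and a naive chi-square statistic for independence is computed from them. Recall that the dissertation shows this naive statistic must be multiplied by an appropriate adjustment factor to be valid under conditional imputation; the sketch computes only the unadjusted value.

```python
def overall_cells(stratum_probs, weights):
    """Weighted average of stratum-level cell probability tables:
    p[i][j] = sum_h w_h * p_h[i][j]."""
    rows, cols = len(stratum_probs[0]), len(stratum_probs[0][0])
    return [[sum(w * p[i][j] for w, p in zip(weights, stratum_probs))
             for j in range(cols)] for i in range(rows)]

def naive_chisq_independence(p, n):
    """Unadjusted chi-square statistic for independence:
    n * sum_ij (p_ij - p_i. * p_.j)^2 / (p_i. * p_.j)."""
    row = [sum(r) for r in p]
    col = [sum(p[i][j] for i in range(len(p))) for j in range(len(p[0]))]
    return n * sum((p[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
                   for i in range(len(p)) for j in range(len(p[0])))
```

With the stratum estimates of Table 14 and weights (0.5, 0.5), `overall_cells` reproduces the reported overall cells up to rounding (e.g., 0.5 × 0.731 + 0.5 × 0.788 ≈ 0.760).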
6.2 Victimization Incidents Study
The second example comes from the victimization incidents survey conducted by the U.S. Department of Justice in 1989. The original data set contains information on sampling weight, sex, violence type, injury, medical care, reporting of the incident, and the number of offenders involved in the crime. One problem of interest is to test whether sex (female or male) is related to the number of offenders. The original data contain 2,219 observations. They are divided into two imputation classes according to violence type (violent and nonviolent), which we refer to as class 1 and class 2. Within each imputation class, 50 strata are created artificially by combining units with similar sampling weights. For example, class 1 contains a total of 720 observations. These observations are sorted by sampling weight from smallest to largest, and every 14 (≈ 720/50) consecutive observations are grouped into one stratum, with the last stratum containing 34 observations. A simple random sample of 4 observations is then taken from each stratum to form our subsample. A similar procedure is carried out for class 2. Consequently, our subsample contains two imputation classes (class 1 and class 2), each with 50 strata of stratum size 4. The sampling weights are adjusted appropriately.
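The artificial stratification described above can be sketched as follows (the function names and interfaces are our own, illustrating the grouping arithmetic rather than reproducing the original code): units are sorted by sampling weight, every 14 consecutive units form a stratum, and the last stratum absorbs the remainder.

```python
import random

def make_strata(weights, n_strata=50):
    """Group units into strata of similar sampling weight.

    Units are sorted by weight; the first n_strata - 1 strata each take
    len(weights) // n_strata consecutive units, and the last stratum
    takes whatever remains (34 units when 720 units meet 50 strata).
    Returns a list of lists of unit indices.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    size = len(weights) // n_strata
    strata = [order[h * size:(h + 1) * size] for h in range(n_strata - 1)]
    strata.append(order[(n_strata - 1) * size:])
    return strata

def subsample(strata, rng=None, k=4):
    """Draw a simple random sample of k units from each stratum."""
    rng = rng or random.Random(0)
    return [rng.sample(s, k) for s in strata]
```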
There are two primary variables of interest. One is sex, with 1 denoting male and 2 denoting female. The other is the number of offenders involved in the crime (NUM), with 1 denoting a single offender and 2 denoting more than one. Sex is observed for every unit, but NUM suffers an appreciable number of missing responses (11% missing in the original sample).
Summary statistics for the selected subsample are provided in Table 15. The method of conditional imputation across strata is performed on the subsample, and summary statistics after imputation are given in Table 16. Cell probabilities based on the complete units only and on the whole data set after imputation are reported in Table 17. The results indicate no relationship between sex and the number of offenders. Covariance matrix estimates for the cell probability estimates are reported in Table 18, which shows that the standard deviations of the cell probability estimates after conditional imputation are comparable to those of the re-weighting method, and that in some cases conditional imputation improves efficiency.
Table 12: Summary statistics of BDES data

                                 Proteinuria
Stratum   Macular edema   Absent   Present   Missing
1         Absent             337        51        15
          Present             14        17         5
          Missing             24        40         0
2         Absent             332        50        16
          Present             15        10         1
          Missing             30        26         0
Table 13: Summary statistics of BDES data after imputation

                                 Proteinuria
Stratum   Macular edema   Absent   Present
1         Absent             369        89
          Present             19        26
2         Absent             378        71
          Present             16        15
Table 14: Statistical analysis of BDES data after imputation

                                      Proteinuria
                          Conditional Imputation       Re-Weighting
Stratum   Macular edema     Absent     Present      Absent     Present
1         Absent             0.731       0.177       0.804       0.122
          Present            0.038       0.054       0.033       0.041
2         Absent             0.788       0.148       0.815       0.123
          Present            0.033       0.031       0.037       0.025
Overall   Absent             0.760       0.163       0.810       0.123
          Present            0.036       0.043       0.035       0.033
Test of independence:    χ² = 36.01, p < 0.01     χ² = 39.34, p < 0.01
Table 15: Summary statistics for the selected subsample

                       NUM
Class   Sex     1      2   Missing
1       1      44     15        20
        2      69     21        31
2       1      83     32         2
        2      64     18         1
Table 16: Summary statistics after imputation

                  NUM
Class   Sex     1      2
1       1      56     23
        2      92     29
2       1      85     32
        2      65     18
Table 17: Statistical analysis for criminal survey data

                   Conditional Imputation        Re-Weighting
Class    Sex        NUM=1      NUM=2          NUM=1      NUM=2
1        1          0.304      0.099          0.304      0.099
         2          0.466      0.131          0.460      0.137
2        1          0.427      0.170          0.425      0.172
         2          0.324      0.079          0.323      0.080
Overall  1          0.385      0.146          0.384      0.148
         2          0.372      0.097          0.369      0.099
Test of independence:  χ²(1) = 2.996, p = 0.083    χ²(1) = 1.493, p = 0.22
Table 18: Covariance matrix estimates

                           Imputation (×10−4)                    Re-Weighting (×10−4)
Class   (Sex,NUM)   (1,1)   (1,2)   (2,1)   (2,2)      (1,1)   (1,2)   (2,1)   (2,2)
1       (1,1)        15.5   −1.66   −7.80   −1.86       12.3   −3.84   −6.83   −1.62
        (1,2)       −1.66    5.94   −2.74   −0.27      −3.84    6.66   −2.59   −0.23
        (2,1)       −7.80   −2.74    20.5   −2.72      −6.83   −2.59    15.5   −6.10
        (2,2)       −1.86   −0.27   −2.72    7.16      −1.62   −0.23   −6.10    7.95
2       (1,1)        13.9   −5.43   −6.74   −1.59       13.8   −5.60   −6.66   −1.55
        (1,2)       −5.43    8.58   −2.37   −0.78      −5.60    8.70   −2.33   −0.77
        (2,1)       −6.74   −2.37    10.8   −1.38      −6.66   −2.33    10.5   −1.49
        (2,2)       −1.59   −0.78   −1.38    3.75      −1.55   −0.77   −1.49    3.80
Overall (1,1)        7.91   −2.58   −3.85   −0.91       7.47   −2.90   −3.70   −0.87
        (1,2)       −2.58    4.45   −1.35   −0.37      −2.90    4.59   −1.32   −0.36
        (2,1)       −3.85   −1.35    7.08   −0.91      −3.70   −1.32    6.37   −1.34
        (2,2)       −0.91   −0.37   −0.91    2.46      −0.87   −0.36   −1.34    2.57
Bibliography
[1] Chen, J. and Shao, J. (2000). Nearest-neighbor imputation for survey data. Journal of Official Statistics, 16, 583-599.
[2] Chen, J. and Shao, J. (2001). Jackknife variance estimation for nearest-neighbor imputation. Journal of the American Statistical Association, 96, 260-269.
[3] Durrett, R. (1995). Probability: Theory and Examples. Duxbury Press, New York.
[4] Lehmann, E.L. (1997). Testing Statistical Hypotheses. Springer, New York.
[5] Lehmann, E.L. (1997). Theory of Point Estimation. Springer, New York.
[6] Kalton, G. and Kasprzyk, D. (1986). The treatment of missing data. Survey Methodology, 12, 1-16.
[7] Little, R.J. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. Wiley, New York.
[8] Rao, J.N.K. and Scott, A.J. (1987). On simple adjustments to chi-square tests with sample survey data. Annals of Statistics, 15, 1-12.
[9] Rao, J.N.K. and Shao, J. (1992). Jackknife variance estimation with survey data under hot deck imputation. Biometrika, 79, 811-822.
[10] Johnson, R. and Wichern, D. (1998). Applied Multivariate Statistical Analysis. Prentice Hall, Englewood Cliffs, NJ.
[11] Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
[12] Shao, J. (1998). Mathematical Statistics. Springer, New York.
[13] Shao, J. and Wang, H. (2001). Sample correlation coefficients based on survey data under regression imputation. To appear in Journal of the American Statistical Association.
[14] Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. Chapman and Hall, London.
[15] Schenker, N. and Welsh, A.H. (1988). Asymptotic results for multiple imputation. Annals of Statistics, 16, 1550-1566.
[16] Srivastava, M.S. and Carter, E.M. (1986). The maximum likelihood method for non-response in sample surveys. Survey Methodology, 12, 61-72.
[17] van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge University Press, New York.
[18] Venables, W.N. and Ripley, B.D. Modern Applied Statistics with S-PLUS. Springer, New York.