![Page 1: The Performance of Multiple Imputation for Likert-type ...plaza.ufl.edu/leitewl/Presentation_AERA2004... · zThe Fisher’s Zs for the complete data (before missingness was introduced)](https://reader033.vdocuments.us/reader033/viewer/2022060501/5f1b4d6c17cf1062b30c3be6/html5/thumbnails/1.jpg)
The Performance of Multiple Imputation for Likert-type Items with Missing Data
Walter L. Leite UNIVERSITY OF FLORIDA
S. Natasha Beretvas THE UNIVERSITY OF TEXAS AT AUSTIN
Copies of the paper can be obtained from: [email protected]
Types of missing data
Data missing completely at random (MCAR);
Data missing at random (MAR);
Data missing not at random (MNAR)
This classification is based on the relationships between the missing values, the incomplete variable and the other variables in the design.
[Figure: columns for Variable X and Variable Y, with question marks marking the missing values.]
Common Methods to Deal with Missing Data
Listwise deletion;
Pairwise deletion;
Mean substitution;
Regression-based single imputation.
Modern Methods to Deal with Missing Data
Maximum-likelihood missing data methods;
Expectation Maximization Algorithm;
Multiple imputation.
Advantages of Multiple Imputation
Provides unbiased parameter estimates even when the data are not missing completely at random, provided they are missing at random (MAR)
Preserves the variability of each variable
Preserves the variability of the sample covariance matrix
Combining Parameter Estimates:
$$\bar{q} = \frac{1}{m}\sum_{i=1}^{m}\hat{q}_i$$
Calculating the total variance of each parameter:
Within-imputation variance:
$$\bar{u} = \frac{1}{m}\sum_{i=1}^{m}\hat{u}_i$$
Between-imputation variance:
$$B = \frac{1}{m-1}\sum_{i=1}^{m}(\hat{q}_i - \bar{q})^2$$
Total variance:
$$T = \bar{u} + \left(1 + \frac{1}{m}\right)B$$
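The combining rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pool_rubin` and the three example estimates and variances are hypothetical and do not come from the study.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their sampling variances across imputations."""
    q_hat = np.asarray(estimates, dtype=float)
    u_hat = np.asarray(variances, dtype=float)
    m = q_hat.shape[0]
    q_bar = q_hat.mean(axis=0)         # combined parameter estimate
    u_bar = u_hat.mean(axis=0)         # within-imputation variance
    b = q_hat.var(axis=0, ddof=1)      # between-imputation variance (divides by m - 1)
    t = u_bar + (1.0 + 1.0 / m) * b    # total variance
    return q_bar, u_bar, b, t

# Hypothetical values for m = 3 imputations (not taken from the study).
q_bar, u_bar, b, t = pool_rubin([0.50, 0.54, 0.52], [0.010, 0.012, 0.011])
print(q_bar, u_bar, b, t)
```

Passing arrays of shape (m, ...) instead of lists pools every parameter at once, for example a whole matrix of estimates per imputation.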
Main Research Questions
How does MI perform with Likert-scale data under the assumption of multivariate normality?
How does the magnitude of the variables’ inter-correlations affect the performance of MI?
How does MI perform with non-normally distributed data under the assumption of multivariate normality?
Conditions manipulated in this study
The underlying distribution of the item responses (normal versus non-normal);
The magnitude of the variables’ inter-correlations (ρ = 0.3, ρ = 0.8);
The coarseness of the categorization of the data into discrete item scores (three, five, or seven scale points);
The missing data mechanism (MCAR, MAR and MNAR);
The proportion of missing data.
The Simulation Method
Simulation of item data:
Defined correlation matrices with ρ = 0.8 or ρ = 0.3.
Generated 1000 samples with 10 multivariate normal variables and 400 cases.
Introduced skewness and kurtosis into the variables using the transformation designed by Vale and Maurelli (1983).
Categorized each variable in the dataset into Likert scales with 3, 5, or 7 points.
Computed the correlation matrices for the categorized data.
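A rough Python sketch of the normal-data branch of these steps follows. Several details are assumptions rather than facts from the slides: every off-diagonal correlation is set to the same ρ, the cut points are equally spaced between -2 and 2, and the function name `simulate_likert` is hypothetical. The Vale and Maurelli transformation is not implemented here.

```python
import numpy as np

def simulate_likert(n_cases=400, n_vars=10, rho=0.8, k=5, seed=0):
    """Draw one sample of multivariate normal data and cut it into k-point items."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: all pairs of variables share the same correlation rho.
    corr = np.full((n_vars, n_vars), rho)
    np.fill_diagonal(corr, 1.0)
    x = rng.multivariate_normal(np.zeros(n_vars), corr, size=n_cases)
    # ASSUMPTION: k - 1 equally spaced thresholds; the slides do not state the cut points.
    cuts = np.linspace(-2.0, 2.0, k + 1)[1:-1]
    items = np.digitize(x, cuts) + 1   # item scores 1..k
    return x, items

x, items = simulate_likert()
r_categorized = np.corrcoef(items, rowvar=False)   # correlations of the categorized data
```

Repeating the call with different seeds would give the replicated samples; the study used 1000 of them per condition.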
Distribution of categorized items
[Figure: two bar charts of response proportions (0.0 to 1.0) across item scores 1 to 5, one panel for a non-normally distributed variable and one for a normally distributed variable.]
The Simulation Method
Simulation of missing values:
Created MCAR missing data by randomly deleting values;
Deleted values according to a predictor variable to create MAR missing data;
Deleted values in each variable according to its own distribution of values, to create the MNAR missing data.
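The three deletion schemes can be illustrated with a short Python sketch. The helper names `delete_mcar` and `delete_by_score`, the random 5-point items, and the deletion probabilities are all hypothetical; they only show how the three mechanisms differ in what drives the deletion.

```python
import numpy as np

def delete_mcar(items, prop, rng):
    """MCAR: every value has the same chance of being deleted."""
    out = items.astype(float)
    out[rng.random(items.shape) < prop] = np.nan
    return out

def delete_by_score(target, driver, probs, rng):
    """Delete target values with probability probs[score - 1], where score comes from driver."""
    p = np.asarray(probs)[driver.astype(int) - 1]
    out = target.astype(float)
    out[rng.random(target.shape) < p] = np.nan
    return out

rng = np.random.default_rng(1)
scores = rng.integers(1, 6, size=(400, 2))   # hypothetical 5-point items
probs = [0.02, 0.06, 0.10, 0.14, 0.18]       # hypothetical probabilities, rising with the score

y_mcar = delete_mcar(scores[:, 0], 0.10, rng)
y_mar = delete_by_score(scores[:, 0], scores[:, 1], probs, rng)    # driven by a predictor item
y_mnar = delete_by_score(scores[:, 0], scores[:, 0], probs, rng)   # driven by the item's own value
```

The only difference between the MAR and MNAR calls is the `driver` argument, which mirrors the distinction drawn in the list above.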
Missing Data Conditions Simulated
7-point Likert scale
MAR-Linear/MNAR: proportions of missingness
Scale point I II III
1 .02 .20 .30
2 .04 .24 .35
3 .08 .28 .40
4 .12 .32 .45
5 .16 .36 .50
6 .20 .40 .55
7 .24 .44 .60
MAR-Convex: proportions of missingness
Scale point I II III
1 .15 .30 .50
2 .10 .20 .45
3 .05 .10 .40
4 .02 .05 .35
5 .05 .10 .40
6 .10 .20 .45
7 .15 .30 .50
Simulated Proportions of Missing Data
Normally distributed data
Likert Scale
Type Level k=3 k=5 k=7
MAR-linear I .10 .14 .11
MAR-linear II .18 .27 .29
MAR-linear III .45 .45 .40
MAR-convex I .06 .07 .05
MAR-convex II .15 .15 .10
MAR-convex III .42 .42 .36
MNAR I .11 .15 .12
MNAR II .20 .30 .32
MNAR III .48 .50 .45
Non-normally distributed data
Likert Scale
Type Level k=3 k=5 k=7
MAR-linear I .05 .06 .04
MAR-linear II .11 .20 .21
MAR-linear III .38 .38 .30
MAR-convex I .08 .12 .11
MAR-convex II .25 .24 .22
MAR-convex III .52 .51 .42
MNAR I .06 .07 .04
MNAR II .12 .22 .23
MNAR III .41 .42 .34
Missing Data Analysis
Values for the missing data were imputed with S-PLUS 6.0.
The multivariate normal model was assumed.
Ten imputations were created for each dataset.
The correlation between each pair of variables was calculated for each imputed data set.
The correlations were transformed to Fisher’s Z:
$$Z_r = \frac{1}{2}\ln\left[\frac{1+r}{1-r}\right]$$
The ten transformed correlation matrices were combined using Rubin’s (1987) rule:
$$\bar{q} = \frac{1}{m}\sum_{i=1}^{m}\hat{q}_i$$
The between-imputations variance, B, of the transformed correlation estimates was calculated:
$$B = \frac{1}{m-1}\sum_{i=1}^{m}(\hat{q}_i - \bar{q})^2$$
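As a hedged illustration of this analysis step, the sketch below transforms the correlation matrix of each imputed dataset to Fisher's Z and then averages across imputations. It does not reproduce the S-PLUS imputation itself: the ten "imputed" datasets are random stand-ins, and the names `fisher_z` and `pool_fisher_z` are hypothetical.

```python
import numpy as np

def fisher_z(r):
    """Fisher's Z transform, equal to 0.5 * ln((1 + r) / (1 - r))."""
    return np.arctanh(r)

def pool_fisher_z(imputed_datasets):
    """Return the pooled Z matrix and the between-imputation variance per variable pair."""
    z = []
    for data in imputed_datasets:
        r = np.corrcoef(data, rowvar=False)
        np.fill_diagonal(r, 0.0)   # the diagonal (r = 1) has no finite Z, so it is zeroed out
        z.append(fisher_z(r))
    z = np.stack(z)                # shape: (m, n_vars, n_vars)
    q_bar = z.mean(axis=0)         # combined estimate
    b = z.var(axis=0, ddof=1)      # between-imputation variance B
    return q_bar, b

# Stand-in for ten imputed datasets: random data, not real imputations.
rng = np.random.default_rng(2)
base = rng.normal(size=(400, 3))
imputed = [base + rng.normal(scale=0.1, size=base.shape) for _ in range(10)]
q_bar, b = pool_fisher_z(imputed)
```

Only the off-diagonal entries of `q_bar` and `b` are meaningful, since the diagonal was set to zero before the transform.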
Analysis of the Performance of MI
The Fisher’s Zs for the complete data (before missingness was introduced) were compared with the MI estimates.
The comparisons were performed using relative bias averaged across replications.
The relative bias is considered acceptable if its magnitude is less than .05 (Hoogland & Boomsma, 1998).
$$B(\hat{Z}_r) = \frac{\hat{Z}_r - \zeta_\rho}{\zeta_\rho}$$
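A minimal Python sketch of the relative bias criterion, assuming the bias is computed per replication and then averaged. The function name `relative_bias` and the three example values are hypothetical, not study results.

```python
import numpy as np

def relative_bias(z_mi, z_complete):
    """Mean over replications of (Z_mi - Z_complete) / Z_complete."""
    z_mi = np.asarray(z_mi, dtype=float)
    z_complete = np.asarray(z_complete, dtype=float)
    return np.mean((z_mi - z_complete) / z_complete)

# Hypothetical Fisher's Z values for three replications (not study results).
z_complete = np.array([1.10, 1.08, 1.12])   # complete data, before missingness
z_mi = np.array([1.06, 1.05, 1.07])         # pooled MI estimates

bias = relative_bias(z_mi, z_complete)
acceptable = abs(bias) < 0.05               # Hoogland & Boomsma (1998) criterion
```

A negative `bias` means the MI estimate falls below the complete-data value.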
Analysis of the Performance of MI
The variance associated with the multiply imputed parameter estimate is a function of the average within-imputation variance and the between-imputation variance.
Because the parameter estimate of interest is the transformed correlation (Fisher’s Z), its within-imputation variance is solely a function of sample size:
$$\hat{u} = \frac{1}{n-3}$$
The between-imputations variance associated with Z did vary across conditions. For this reason, the efficiency of the Z-transformed correlations was summarized by calculating the average between-imputation variances by condition.
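To make the relative size of the two variance components concrete, here is a small arithmetic sketch in Python. It uses n = 400 and m = 10 from the study design and one between-imputation variance reported later in the slides (.0246, for MNAR III, k=5, ρ = .8, normally distributed data). The slides do not report the total variance T, so the result is only an illustration of the total variance formula given earlier.

```python
n, m = 400, 10
u_hat = 1.0 / (n - 3)               # within-imputation variance of Fisher's Z
b = 0.0246                          # reported B for MNAR III, k=5, rho=.8, normal data
t = u_hat + (1.0 + 1.0 / m) * b     # total variance from the combining rules
share_between = (1.0 + 1.0 / m) * b / t

print(round(u_hat, 5), round(t, 4), round(share_between, 2))
```

Under these inputs the between-imputation term accounts for most of the total variance, which is why the slides summarize efficiency through B alone.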
Results – Fisher’s Z
The MI estimates were unbiased when the data were missing completely at random (MCAR) for both levels of missingness (10% and 30%).
Normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MCAR I -.004 -.004 -.005 -.003 -.004 -.004
MCAR II -.032 -.036 -.039 -.035 -.034 -.033
Non-normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MCAR I -.004 -.007 -.005 -.008 -.006 -.004
MCAR II -.035 -.040 -.040 -.046 -.038 -.033
Results – Fisher’s Z
Under MAR, MI was robust to skewness and categorization at the two lowest levels of missingness (I and II), for both the MAR-linear and MAR-convex conditions.
Normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MAR-linear I -.007 -.010 -.009 -.014 -.014 -.012
MAR-linear II -.017 -.037 -.047 -.024 -.041 -.045
MAR-linear III -.121 -.126 -.102 -.139 -.132 -.099
MAR-convex I -.004 -.005 -.005 -.012 -.012 -.010
MAR-convex II -.011 -.014 -.010 -.022 -.020 -.014
MAR-convex III -.105 -.113 -.079 -.126 -.115 -.076
Results – Fisher’s Z
Non-Normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MAR-linear I .004 .008 .008 -.003 -.001 .003
MAR-linear II .005 -.011 -.011 -.008 -.019 -.016
MAR-linear III -.074 -.080 -.039 -.114 -.083 -.042
MAR-convex I -.014 -.019 -.021 -.012 -.018 -.017
MAR-convex II -.064 -.051 -.053 -.046 -.043 -.040
MAR-convex III -.218 -.205 -.128 -.224 -.196 -.121
Results – Fisher’s Z
Acceptable bias was found for MNAR, with the exception of the conditions where the highest proportion of missing data had been introduced.
Normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MNAR I -.010 -.009 -.010 -.030 -.018 -.016
MNAR II -.017 -.038 -.049 -.029 -.044 -.051
MNAR III -.094 -.134 -.109 -.093 -.146 -.112
Non-normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MNAR I -.006 .002 .003 -.048 -.038 -.029
MNAR II -.006 -.019 -.017 -.077 -.068 -.062
MNAR III -.079 -.093 -.051 -.184 -.155 -.116
Results – Between-Imputations Variance
The between-imputation variance accounts for the extra error introduced by the imputation process.
As the overall proportion of missingness increased, so did the between-imputation variance.
In the conditions with a high percentage of missing data, the between-imputation variances increased as the correlation between variables increased from ρ = .3 to ρ = .8.
Between-imputation variances - Normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MCAR I .0004 .0004 .0004 .0005 .0005 .0005
MCAR II .0033 .0050 .0063 .0021 .0021 .0021
MAR-linear I .0004 .0008 .0006 .0005 .0007 .0005
MAR-linear II .0012 .0051 .0079 .0010 .0019 .0021
MAR-linear III .0137 .0225 .0212 .0046 .0050 .0042
MAR-convex I .0002 .0003 .0002 .0003 .0004 .0002
MAR-convex II .0011 .0012 .0006 .0009 .0008 .0005
MAR-convex III .0148 .0206 .0161 .0043 .0046 .0034
MNAR I .0005 .0009 .0007 .0005 .0008 .0006
MNAR II .0013 .0053 .0082 .0012 .0021 .0024
MNAR III .0135 .0246 .0228 .0050 .0058 .0048
Between-imputation variances – Non-Normally distributed data
ρ = .8 ρ = .3
Type Level k=3 k=5 k=7 k=3 k=5 k=7
MCAR I .0004 .0004 .0005 .0005 .0005 .0005
MCAR II .0043 .0059 .0071 .0022 .0022 .0022
MAR-linear I .0003 .0005 .0003 .0004 .0004 .0003
MAR-linear II .0010 .0038 .0049 .0007 .0014 .0015
MAR-linear III .0153 .0208 .0149 .0039 .0041 .0029
MAR-convex I .0003 .0006 .0005 .0004 .0005 .0005
MAR-convex II .0021 .0030 .0025 .0014 .0014 .0011
MAR-convex III .0174 .0267 .0217 .0052 .0058 .0043
MNAR I .0004 .0005 .0003 .0004 .0004 .0003
MNAR II .0011 .0038 .0049 .0008 .0016 .0016
MNAR III .0153 .0219 .0150 .0042 .0045 .0031
Discussion
The results indicate that multiple imputation is robust to violations of both continuity and normality.
The biases of the parameter estimates resulting from using MI were found to be consistently negative across all conditions.
The statistical tests performed after MI will tend to be less powerful.
Multiple imputation can be safely used to estimate parameters if the overall proportion of missing data in the dataset does not exceed about 30%.
Limitations
The datasets used in this study contained ten variables inter-correlated with each other. The results of this study may have been different if uncorrelated variables were used.
The proportions of missing data were not consistent across conditions, which makes comparisons across conditions somewhat harder.
This simulation used only a sample size of 400, which is relatively large. Different results could be obtained if smaller sample sizes were used.
Future Research Questions
What is the maximum amount of missing data for which MI still functions adequately?
How much can the inclusion of predictors in the dataset help MI when the proportion of missing data is large?
How does sample size affect the performance of MI?