IAOS 2014 Conference – Meeting the Demands of a Changing World
Da Nang, Vietnam, 8-10 October 2014
Diagnosing the Imputation of Missing Values in Official Economic Statistics via Multiple Imputation:
Unveiling the Invisible Missing Values
National Statistics Center (Japan)
Masayoshi Takahashi
Notes: The views and opinions expressed in this presentation are the author’s own, not necessarily those of the institution.
Outline
1. Problems of Missing Values and Imputation
2. Theory of MI and the EMB Algorithm
3. Mechanism Behind the Diagnostic Algorithm
4. Data and Missing Mechanism
5. Assessment of the Diagnostic Algorithm
6. Conclusions and Future Work
Problems of Missing Values
- Prevalence of missing values
- Effects of missing values
  - Reduction in efficiency
  - Introduction of bias
- Assumptions and solution
  - Missing At Random (MAR)
  - Imputation
Problematic Nature of Single Imputation (SI)
Deterministic SI
$\hat{Y}_{ij} = Y_{i,-j}\hat{\beta}$
Stochastic SI
$\hat{Y}_{ij} = Y_{i,-j}\hat{\beta} + \hat{\epsilon}_i$, where $\hat{\epsilon}_i$ is random noise
^ = OLS estimate
There is only one set of regression coefficients.
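
The contrast between the two single-imputation variants can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the helper names `fit_ols` and `single_impute`, the toy data, the single predictor, and the choice of normally distributed noise scaled by the residual standard deviation are all assumptions made for the example.

```python
import random
import statistics


def fit_ols(x, y):
    """Simple linear regression: return (intercept, slope, residual SD)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (len(x) - 2)) ** 0.5


def single_impute(x, y, stochastic, rng):
    """Fill the None entries of y using one set of OLS coefficients."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, s = fit_ols([p[0] for p in observed], [p[1] for p in observed])
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            # Assumed noise model: normal with the residual SD of the fit.
            noise = rng.gauss(0.0, s) if stochastic else 0.0
            yi = a + b * xi + noise
        filled.append(yi)
    return filled


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # hypothetical covariate
y = [2.1, 3.9, None, 8.2, None, 12.1]       # hypothetical target with gaps
deterministic = single_impute(x, y, stochastic=False, rng=rng)
stochastic = single_impute(x, y, stochastic=True, rng=rng)
print(deterministic)
print(stochastic)
```

In both variants the coefficients are estimated once, so neither output carries any information about how uncertain those coefficients are.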
Multiple Imputation (MI) Comes to the Rescue
$\tilde{Y}_{ij} = Y_{i,-j}\tilde{\beta} + \tilde{\epsilon}_i$
~ = random sampling from a posterior distribution
- Multiple sets of regression coefficients
- Need multiple values of the model parameters
Likelihood of Observed Data
$L(\mu, \Sigma \mid Y_{obs}) \propto \prod_{i=1}^{n} N(Y_{i,obs} \mid \mu_{i,obs}, \Sigma_{i,obs})$
- Random sampling from the observed-data likelihood: not easy!!
- Solution: various computational algorithms
Computational Algorithms
- EMB algorithm
  - Expectation-Maximization Bootstrapping
  - Most computationally efficient
- Other MI algorithms
  - MCMC
  - FCS
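
The role of bootstrapping in generating multiple sets of coefficients can be illustrated with a heavily simplified Python sketch. It is not the EMB algorithm itself: here the EM step is replaced by an OLS fit on each bootstrap resample of the complete cases, which is only workable because a single variable has gaps in this toy setting. The function names, data, and normal noise model are assumptions for illustration.

```python
import random
import statistics


def fit_ols(x, y):
    """Simple linear regression: return (intercept, slope, residual SD)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (len(x) - 2)) ** 0.5


def bootstrap_mi(x, y, m, rng):
    """Return m completed copies of y, each from its own bootstrap fit."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    datasets = []
    while len(datasets) < m:
        # Resample the complete cases with replacement.
        boot = [rng.choice(observed) for _ in observed]
        bx = [p[0] for p in boot]
        if len(set(bx)) < 3:
            continue  # too degenerate to estimate a line and a residual SD
        # Stand-in for the EM step: a fresh set of coefficients per resample.
        a, b, s = fit_ols(bx, [p[1] for p in boot])
        datasets.append([
            yi if yi is not None else a + b * xi + rng.gauss(0.0, s)
            for xi, yi in zip(x, y)
        ])
    return datasets


rng = random.Random(1)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]           # hypothetical covariate
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.3, 15.9]       # hypothetical target
for completed in bootstrap_mi(x, y, m=5, rng=rng):
    print([round(v, 2) for v in completed])
```

Each completed dataset keeps the observed values untouched and differs only in the imputed cells, which is the variation that multiple imputation exploits.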
Graphical Presentation of the EMB Algorithm
Paradox in Imputation
- Imputed values
  - Estimates, not true values
  - Diagnosis
- True values
  - Always missing
  - Cannot compare the imputed values with the truth
- How do we go about imputation diagnostics?
Solution to the Paradox
- Indirect diagnostics of imputation
  - Abayomi, Gelman, and Levy (2008)
  - Honaker and King (2010)
- MI
  - Within-imputation variance
  - Between-imputation variance
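
The two variance components named here are the ones combined by Rubin's rules (Rubin 1987). A minimal Python sketch follows, with a hypothetical helper name `pool` and made-up estimates from three imputed datasets.

```python
import statistics


def pool(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, within, between, total


# Hypothetical estimates of a mean and their squared standard errors.
q_bar, within, between, total = pool([10.0, 12.0, 11.0], [4.0, 5.0, 3.0])
print(q_bar, within, between, round(total, 4))
```

A between-imputation variance close to zero means the imputed datasets agree with one another, which is the signal the diagnostic idea in this section relies on.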
Disadvantages of Multiple Imputation
- Dozens of imputed datasets
  - Computational burden
- Multiple values for one cell
  - Unrealistic to directly use in official statistics
Proposal in this Research
- Two-step procedure (New!!)
  - Imputation step: Stochastic SI
  - Diagnostic step: MI
- Advantage
  - Can have only one imputed value (advantage of SI)
  - Can know the confidence about each imputed value (advantage of MI)
Multiple Imputation as a Diagnostic Tool
- Variation among M imputed datasets
  - Estimation uncertainty in imputation
- Our diagnostic algorithm
  - Utilizes this variability
  - Can examine the stability & confidence of imputation models
- What does this mean?
  - See the next slide for illustration
Illustration: Two Cases of Variation in Imputations
Mathematical Representation
Imputation Step: Stochastic SI
$\hat{Y}_{ij} = Y_{i,-j}\hat{\beta} + \hat{\epsilon}_i$
Diagnostic Step: MI
$\tilde{Y}_{ij} = Y_{i,-j}\tilde{\beta} + \tilde{\epsilon}_i$
If $sd(\tilde{Y}_{ij}) = 0$, then no uncertainties
What we actually check is whether $\hat{\beta} \approx \tilde{\beta}$
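
The quantity $sd(\tilde{Y}_{ij})$ can be computed cell by cell once the M completed datasets exist. The Python sketch below assumes a hypothetical helper `cellwise_sd` and four made-up completed datasets; it shows one cell where the imputations agree closely and one where they do not.

```python
import statistics


def cellwise_sd(datasets, missing_idx):
    """SD of the M imputed values for each originally missing cell."""
    return {i: statistics.stdev([d[i] for d in datasets]) for i in missing_idx}


# Four hypothetical completed datasets; cell 0 was observed,
# cells 1 and 2 were missing and have been imputed M = 4 times.
datasets = [
    [1.0, 5.0, 9.0],
    [1.0, 5.2, 3.0],
    [1.0, 4.8, 15.0],
    [1.0, 5.0, 9.0],
]
spread = cellwise_sd(datasets, missing_idx=[1, 2])
for cell, sd in spread.items():
    print(cell, round(sd, 3))
```

Under this reading, a small SD for a cell suggests the single stochastic imputation for that cell is stable, while a large SD marks it for closer inspection.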
Data
- Multivariate log-normal distribution
  - Mean vector & variance-covariance matrix
- Simulated dataset
  - Manufacturing Sector
  - 2012 Japanese Economic Census
- Number of observations: 1,000
- Variables: turnover, capital, worker
Missing Mechanism
- Target variable: turnover
- Missing rate: 20%
- Missing mechanism: MAR
  - A logistic regression to estimate the probability of missingness according to the values of explanatory variables (capital and worker)
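
A setup of this kind can be sketched in Python as follows. Everything numeric here is an assumption made for illustration: the means, loadings, and logistic coefficients are invented and are not the values derived from the Economic Census, and the function names are hypothetical. The intercept is calibrated by bisection so that the average missingness probability is 20%.

```python
import math
import random


def simulate_firms(n, rng):
    """Draw n firms whose logged variables are jointly normal (log-normal data)."""
    firms = []
    for _ in range(n):
        size = rng.gauss(0.0, 1.0)  # shared factor that correlates the three variables
        firms.append({
            "turnover": math.exp(6.0 + 1.2 * size + 0.5 * rng.gauss(0.0, 1.0)),
            "capital": math.exp(5.0 + 1.0 * size + 0.6 * rng.gauss(0.0, 1.0)),
            "worker": math.exp(3.0 + 0.8 * size + 0.4 * rng.gauss(0.0, 1.0)),
        })
    return firms


def missing_prob(firm, intercept):
    """Logistic model using only the always-observed covariates (MAR)."""
    z = intercept - 0.5 * math.log(firm["capital"]) - 0.5 * math.log(firm["worker"])
    return 1.0 / (1.0 + math.exp(-z))


def calibrate_intercept(firms, target_rate):
    """Bisect on the intercept until the mean probability matches the target."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        rate = sum(missing_prob(f, mid) for f in firms) / len(firms)
        if rate < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rng = random.Random(2014)
firms = simulate_firms(1000, rng)
intercept = calibrate_intercept(firms, 0.20)
mean_prob = sum(missing_prob(f, intercept) for f in firms) / len(firms)
for firm in firms:
    if rng.random() < missing_prob(firm, intercept):
        firm["turnover"] = None  # only the target variable is ever deleted

missing_rate = sum(f["turnover"] is None for f in firms) / len(firms)
print(round(missing_rate, 3))
```

Because the probability of deletion depends only on capital and worker, which are never missing, the mechanism is MAR by construction.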
R-Function diagimpute
- New function developed in R
  - Graphical detection of problematic imputations as outliers
  - Graphical presentation of the stability of imputation via control chart
- Not yet publicly available
  - A work in progress
  - Once finalized, planning to make it publicly available
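
Since diagimpute is not public, its internals are unknown. The Python sketch below only illustrates the general idea of flagging unstable imputations with a control chart, using a standard individuals chart whose sigma is estimated from the average moving range. The charted quantity (a per-cell SD of imputed values), the data, and the function name are assumptions.

```python
import statistics


def individuals_chart(values):
    """Center line and 3-sigma limits for an individuals control chart."""
    center = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 1.128 is the d2 constant for ranges of two consecutive points.
    sigma = statistics.fmean(moving_ranges) / 1.128
    lower = max(0.0, center - 3.0 * sigma)  # the charted SDs cannot be negative
    upper = center + 3.0 * sigma
    return center, lower, upper


# Hypothetical per-cell SDs of imputed values across the M datasets.
cell_sds = [0.10, 0.12, 0.11, 0.09, 0.10, 0.95, 0.11, 0.10, 0.12, 0.09]
center, lower, upper = individuals_chart(cell_sds)
flagged = [i for i, v in enumerate(cell_sds) if v < lower or v > upper]
print(flagged)
```

Estimating sigma from moving ranges rather than from the overall sample SD keeps a single extreme value from inflating the limits so much that it hides itself.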
Preliminary Result 1
Preliminary Result 2
Conclusions
- MI as a diagnostic tool
  - A novel way
- Diagnostic algorithm
  - Still a work in progress
  - A preliminary assessment given
  - Useful to detect problematic imputations
- Helps us strengthen the validity of official economic statistics.
Future Work
- Intend to further refine the algorithm
  - Test it against a variety of real datasets
  - Use several imputation models
References 1
1. Abayomi, Kobi, Andrew Gelman, and Marc Levy. (2008). “Diagnostics for Multivariate Imputations,” Applied Statistics vol.57, no.3, pp.273-291.
2. Allison, Paul D. (2002). Missing Data. CA: Sage Publications.
3. Congdon, Peter. (2006). Bayesian Statistical Modelling, Second Edition. West Sussex: John Wiley & Sons Ltd.
4. de Waal, Ton, Jeroen Pannekoek, and Sander Scholtus. (2011). Handbook of Statistical Data Editing and Imputation. Hoboken, NJ: John Wiley & Sons.
5. Honaker, James and Gary King. (2010). “What to do About Missing Values in Time Series Cross-Section Data,” American Journal of Political Science vol.54, no.2, pp.561–581.
6. Honaker, James, Gary King, and Matthew Blackwell. (2011). “Amelia II: A Program for Missing Data,” Journal of Statistical Software vol.45, no.7.
7. King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve. (2001). “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation,” American Political Science Review vol.95, no.1, pp.49-69.
8. Little, Roderick J. A. and Donald B. Rubin. (2002). Statistical Analysis with Missing Data, Second Edition. New Jersey: John Wiley & Sons.
References 2
9. Oakland, John S. and Roy F. Followell. (1990). Statistical Process Control: A Practical Guide. Oxford: Heinemann Newnes.
10. Rubin, Donald B. (1978). “Multiple Imputations in Sample Surveys — A Phenomenological Bayesian Approach to Nonresponse,” Proceedings of the Survey Research Methods Section, American Statistical Association, pp.20-34.
11. Rubin, Donald B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.
12. Schafer, Joseph L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman & Hall/CRC.
13. Scrucca, Luca. (2014). “Package qcc: Quality Control Charts,” http://cran.r-project.org/web/packages/qcc/qcc.pdf.
14. Statistics Bureau of Japan. (2012). “Economic Census for Business Activity,” http://www.stat.go.jp/english/data/e-census/2012/index.htm.
15. Takahashi, Masayoshi and Takayuki Ito. (2012). “Multiple Imputation of Turnover in EDINET Data: Toward the Improvement of Imputation for the Economic Census,” Work Session on Statistical Data Editing, UNECE, Oslo, Norway, September 24-26, 2012.
References 3
16. Takahashi, Masayoshi and Takayuki Ito. (2013). “Multiple Imputation of Missing Values in Economic Surveys: Comparison of Competing Algorithms,” Proceedings of the 59th World Statistics Congress of the International Statistical Institute, Hong Kong, China, August 25-30, 2013, pp.3240-3245.
17. Takahashi, Masayoshi. (2014a). “An Assessment of Automatic Editing via the Contamination Model and Multiple Imputation,” Work Session on Statistical Data Editing, United Nations Economic Commission for Europe, Paris, France, April 28-30, 2014.
18. Takahashi, Masayoshi. (2014b). “Keiryouchi Data no Kanrizu (Control Chart for Continuous Data),” Excel de Hajimeru Keizai Toukei Data no Bunseki (Statistical Data Analysis for Economists Using Excel), 3rd edition. Tokyo: Zaidan Houjin Nihon Toukei Kyoukai.
19. van Buuren, Stef. (2012). Flexible Imputation of Missing Data. London: Chapman & Hall/CRC.
Thank you