

CENTER FOR STATISTICAL RESEARCH AND METHODOLOGY FY 2013 THIRD QUARTER REPORT

– April through June 2013 –

COLLABORATION

DECENNIAL DIRECTORATE
Decennial Management Division/Decennial Statistical Studies Division/American Community Survey Office (Sponsors)

Project Number  Title  FTEs
5610302  Statistical Design and Estimation ..... 2.10
6410301  Automating Field Activities (Infra & Opers) ..... 2.21
6410307  Non-ID Processing ..... .25
  A. Decennial Record Linkage
  B. Synthetic Decennial Microdata File
  C. Coverage Measurement Research
  D. Accuracy of Coverage Measurement
  E. Record Linkage Error-Rate Estimation Methods
  F. Modeling Successive Enumerator Contacts for Nonresponse Follow-up (NRFU)
  G. Master Address File (MAF) Error Model and Quality Assessment
  H. Supplementing and Supporting Non-Response with Administrative Records
  I. Local Update of Census Addresses (LUCA) Program Improvement
  J. Identifying “Good” Administrative Records for 2020 Census NRFU Curtailment Targeting
6510301  Coding, Editing, and Imputation Study ..... 1.00
  A. Software Development (Tea)
  B. Software Analysis and Evaluation
6810304  Privacy and Confidentiality Study ..... .70
  A. Privacy and Confidentiality for the 2020 Census
  B. Social Media Monitoring of Privacy and Confidentiality Concerns
6810305  Matching Process Improvement ..... 1.50
  A. 2020 Unduplication Research
5385260  American Community Survey (ACS) ..... 2.48
  A. ACS Applications for Time Series Methods
  B. ACS Imputation Research and Development
  C. Data Analysis of ACS CATI-CAPI Contact History
  D. Assessing Uncertainty in ACS Ranking Tables

DEMOGRAPHIC DIRECTORATE
Demographic Statistical Methods Division (Sponsor)

Project Number  Title  FTEs
TBA  Tobacco Use Supplement (NCI) Small Domain Models ..... .08
TBA  Special Project on Weighting and Estimation ..... TBA


Demographic Surveys Division (Sponsor)

Project Number  Title  FTEs
0906/1442  Demographic Surveys Division Special Projects (CPS) ..... .64
  A. Data Integration
7523013  National Crime Victimization Survey ..... .13
7523014  National Crime Victimization Survey ..... .12
  A. Analyzing the Effects of Sample Reinstatement
  B. Analysis of Refresher Training Experiment
  C. Process Monitoring and Fitness for Use

Population Division (Sponsor)

Project Number  Title  FTEs
TBA  Population Division Projects ..... TBA
  A. Population Projections

Social, Economic, and Housing Statistics Division (Sponsor)

Project Number  Title  FTEs
1465444  Survey of Income and Program Participation Improvements Research ..... 1.20
  A. Model-based Imputation for the Demographic Directorate
7165013  Social, Economic, and Housing Statistics Division Small Area Estimation Projects ..... 2.20
  A. Research for Small Area Income and Poverty Estimates (SAIPE)
  B. Small Area Health Insurance Estimates (SAHIE)
189115  Improving Poverty Measures/IOE ..... 2.64
  A. Tract Level Estimates of Poverty from Multi-year ACS Data
  B. Small Area Estimates of Disability

ECONOMIC DIRECTORATE

Project Number  Title  FTEs
2320354  Editing Methods Development ..... .25
  A. Investigation of Selective Editing Procedures for Foreign Trade Programs
2320352  Time Series Research ..... 2.39
  A. Seasonal Adjustment Support
  B. Seasonal Adjustment Software Development and Evaluation
  C. Research on Seasonal Time Series - Modeling and Adjustment Issues
  D. Identifying Edits in the Quarterly Financial Report
  E. Supporting Documentation and Software for X-13ARIMA-SEATS
TBA  Governments Division Project on Decision-Based Estimation ..... TBA
TBA  Use of Big Data for Retail Sales Estimates ..... TBA
TBA  Statistical Consulting for LEHD Audit ..... TBA

CENSUS BUREAU

Project Number  Title  FTEs
0381000  Program Division Overhead ..... 11.33
  A. Center Leadership and Support
  B. Research Computing


GENERAL RESEARCH AND SUPPORT

Project Number  Title  FTEs
0351000  General Research and Support ..... 15.09
1871000  General Research ..... 7.29

MISSING DATA, EDIT, AND IMPUTATION
  A. Editing
  B. Editing and Imputation
  C. Missing Data and Imputation: Multiple Imputation Feasibility Study

RECORD LINKAGE
  A. Disclosure Avoidance for Microdata
  B. Noise Multiplication for Statistical Disclosure Control
  C. Record Linkage and Analytic Uses of Administrative Lists
  D. Modeling, Analysis and Quality of Data

SMALL AREA ESTIMATION
  A. Small Area Estimation
  B. Small Area Methods with Misspecification
  C. Visualization of Small Area Estimates

SURVEY SAMPLING-ESTIMATION AND MODELING
  A. Survey Productivity and Cost Analysis
  B. Household Survey Design and Estimation
  C. Sampling and Estimation Methodology: Economic Surveys
  D. The Ranking Project: Methodology Development and Evaluation
  E. Statistical Design for 2020 Planning, Experimentation, and Evaluations
  F. Sampling and Apportionment
  G. Interviewer-Respondent Interactions: Gaining Cooperation

STATISTICAL COMPUTING AND SOFTWARE
  A. R Users Group
  B. Web Scraping Feasibility Investigation

TIME SERIES AND SEASONAL ADJUSTMENT
  A. Seasonal Adjustment
  B. Time Series Analysis

EXPERIMENTATION, SIMULATION, AND MODELING
  A. Synthetic Survey and Processing Experiments
  B. Improved Nonparametric Tolerance Intervals
  C. Ratio Edits Based on Statistical Tolerance Intervals
  D. Data Visualization Study Group

SUMMER AT CENSUS

RESEARCH SUPPORT AND ASSISTANCE

PUBLICATIONS
  - Journal Articles, Publications
  - Books/Book Chapters
  - Proceedings Papers
  - Center for Statistical Research & Methodology Research Reports
  - Center for Statistical Research & Methodology Studies

TALKS AND PRESENTATIONS

CENTER FOR STATISTICAL RESEARCH & METHODOLOGY SEMINAR SERIES

PERSONNEL ITEMS
  - Honors/Awards/Special Recognition
  - Significant Service to Profession
  - Personnel Notes

1. COLLABORATION

1.1 STATISTICAL DESIGN AND ESTIMATION

(Decennial Project 5610302)

1.2 AUTOMATING FIELD ACTIVITIES (INFRA & OPERS)

(Decennial Project 6410301)

1.3 NON-ID PROCESSING

(Decennial Project 6410307)

A. Decennial Record Linkage
Description: Under this project, staff will provide advice, develop computer matching systems, and develop and perform analytic methods for adjusting statistical analyses for computer matching error, with a decennial focus.
Highlights: No significant progress this quarter. [See also project 1.3 (H), Supplementing and Supporting Non-Response with Administrative Records.]
Staff: William Winkler (x34729), William Yancey, Joshua Tokle

B. Synthetic Decennial Microdata File
Description: In some cases, data users have an interest in the full microdata file, whose disclosure is prohibited by law. We seek to produce synthetic individual records (microdata files) so that tables produced from them match the comparable publicly available tables produced from the original individual records. The synthetic individual records should be close to the underlying microdata while protecting confidentiality. The goal of this project is to produce synthetic microdata files from the decennial short form (now the American Community Survey) variables for block-level geography. We approach the problem by using iterative proportional fitting and log-linear models fit to fully cross-classified tables of short-form variables, and then creating synthetic microdata records by randomly sampling records using the estimated parameters.
Highlights: No significant progress this quarter.
Staff: Martin Klein (x37856)
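For orientation, the iterative proportional fitting step named above can be sketched in a few lines. This is a minimal illustration with an invented two-way table standing in for the fully cross-classified short-form tables, not the project's production code:

```python
import numpy as np

def ipf(seed, row_margins, col_margins, tol=1e-8, max_iter=1000):
    """Iterative proportional fitting: rescale a seed table until its
    row and column sums match the target margins."""
    table = seed.astype(float).copy()
    for _ in range(max_iter):
        table *= (row_margins / table.sum(axis=1))[:, None]   # fit rows
        table *= (col_margins / table.sum(axis=0))[None, :]   # fit columns
        if np.allclose(table.sum(axis=1), row_margins, atol=tol):
            break
    return table

# Toy 2x2 seed table and margins standing in for published table totals.
seed = np.array([[10.0, 20.0], [30.0, 40.0]])
fitted = ipf(seed,
             row_margins=np.array([40.0, 60.0]),
             col_margins=np.array([35.0, 65.0]))

# Synthetic records can then be drawn in proportion to the fitted cells.
rng = np.random.default_rng(0)
cells = rng.choice(fitted.size, size=100, p=(fitted / fitted.sum()).ravel())
print(fitted.round(2), cells[:10])
```

Sampling cell indices in proportion to the fitted table is what makes the synthetic records reproduce the published margins in expectation.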

C. Coverage Measurement Research
Description: Staff members conduct research on model-based small area estimation of census coverage, and they consult and collaborate on modeling census coverage measurement (CCM).
Highlights: In Q3 of FY 2013, staff continued to use simulation testing to study three small-area models that were used for CCM housing unit estimation in FY 2012. Staff tested different models in various combinations of generation and analysis, using simulated parameter estimates, and settled on an ideal parametrization for each of the generation models. Staff used preliminary data analysis to compare the coverage properties of the methods and to assess which method was most useful.
Staff: Jerry Maples (x32873), Aaron Gilary, Ryan Janicki, Eric Slud

D. Accuracy of Coverage Measurement
Description: 2010 Census Coverage Measurement (CCM) Research conducts the research necessary to develop methodology for evaluating the coverage of the 2010 Census, including new research on the feasibility of triple-system estimation methods. This includes planning, designing, and conducting the research, as well as analyzing and synthesizing the results to evaluate their accuracy and quality. The focus of this research is on the design of the CCM survey and the estimation of components of coverage error, with a secondary emphasis on the estimation of net coverage error. Overcount and undercount estimation has not been done separately for previous censuses because of the difficulty of obtaining adequate data for unbiased estimates.
Highlights: In Q3 of FY 2013, staff collaborated with staff in the Center for Survey Measurement (CSM) on models for recall error in reported move dates in surveys. U.S. censuses and census coverage measurement surveys ask respondents to recall where they lived on Census Day, April 1; some evaluation interviews take place up to eleven months later. Respondents may be asked when they moved to their current address. The assumption has been that respondents who move around April 1 are able to give correct answers. Error in recalling a move date may cause respondents to be enumerated at the wrong location in the census. To study the recall error, staff used data from a Census-sponsored study that examined the accuracy of respondents' memory of move dates by comparing the self-reported move month and year provided by the National Longitudinal Survey of Youth, 1997 (NLSY97) cohort to records in a commercial database. The NLSY97 interviews the cohort every year, but the study focused on responses the sample members gave in 2008 and 2009, when they were ages 23 to 29. The commercial database was not an ideal "gold standard," so both sources had their own error structure that presented challenges for a matching study.



However, using regression models, the study found some evidence of memory error surrounding move dates. Results were presented at the 2013 International Total Survey Error Workshop.
Staff: Mary Mulry (x31759), Eric Slud, Ryan Janicki

E. Record Linkage Error-Rate Estimation Methods
Description: This project develops methods for estimating false-match and false-nonmatch rates without training data and with exceptionally small amounts of judiciously chosen training data. It also develops methods and software for adjusting statistical analyses of merged files when there is linkage error.
Highlights: In Q3 of FY 2013, staff received full access to Decennial Statistical Studies Division (DSSD) Census Coverage Measurement (CCM) files, preliminary documentation, software, and operational terminals in this center's secure terminal room. We are still resolving anomalies in the documentation and the data files prior to doing research on error-rate estimation. We chose the 2010 DSSD CCM files because they are reportedly very similar to, but of higher quality than, earlier files that we had used for error-rate estimation research.
Staff: William E. Winkler (x34729), William E. Yancey, Joshua Tokle, Tom Mule (DSSD), Lynn Imel (DSSD), Mary Layne (CARRA)

F. Modeling Successive Enumerator Contacts for Nonresponse Followup (NRFU)
Description: One facet of the NRFU operations analysis is the number of contact attempts made and their impact on data quality. Current summaries of these data include the distribution of the number of contacts and the distribution of the final mode of contact. This project aims to provide additional assessments of the data, including the distribution of waiting times between successive contacts as well as the effect of these times on the eventual outcome. The intent of any additional analysis is to provide guidance regarding possible designed experiments in anticipation of the 2020 Decennial Census.
Highlights: No significant progress this quarter.
Staff: Derek Young (x36347)

G. Master Address File (MAF) Error Model and Quality Assessment
Description: The MAF is an inventory of addresses for all known living quarters in the U.S. and Puerto Rico. This project will develop a statistical model for MAF errors for housing units (HUs), group quarters (GQs), and transitory locations (TLs). This model, as well as an independent team, will be used to conduct independent quality checks on updates to the MAF and to ensure that these quality levels meet the 2020 Census requirements.
Highlights: In Q3 of FY 2013, staff determined that zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models provide good frameworks for modeling the numbers of block-level adds and block-level deletes as functions of various housing unit, geographical, and demographic variables. We determined a reduced set of candidate predictors based on both their statistical and practical significance in the models. Predictions from these models will be used for the multinomial cell proportions in the sample design being built for Field Test #22. In Q3, we also increased our collaboration with the Targeted Address Canvassing Research, Model, and Area Classification (TRMAC) group. A statistical modeling effort under discussion is the use of Bayesian spatial scanning to identify clusters of blocks with a high likelihood of change.
Staff: Derek Young (x36347), Pete Davis (DSSD), Nancy Johnson (DSSD), Kathleen Kephart (DSSD)
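A hedged sketch of a zero-inflated Poisson regression of the kind adopted above, using statsmodels; the counts and covariates are simulated stand-ins for the block-level adds/deletes and housing-unit variables, not MAF data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))           # stand-in covariates
structural_zero = rng.random(n) < 0.3                  # excess-zero process
counts = np.where(structural_zero, 0,
                  rng.poisson(np.exp(0.5 + X[:, 1])))  # count process

# ZIP model: a Poisson count component plus a logit component for the
# excess zeros (here intercept-only inflation).
model = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1)),
                            inflation='logit')
result = model.fit(disp=False)
print(result.params)
```

Swapping in `ZeroInflatedNegativeBinomialP` from the same module gives the ZINB variant when the counts are overdispersed.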



H. Supplementing and Supporting Non-Response with Administrative Records
Description: This project researches how to use administrative records in the planning, preparation, and implementation of nonresponse followup to significantly reduce decennial census cost while maintaining quality. The project is coordinated by one of the 2020 Census Integrated Project Teams.
Highlights: In Q3 of FY 2013, staff continued to participate in the planning and preparation process for the research, especially the development of a Hard-To-Follow-up (HTF) index. Staff received a second tract-level file, which included data on the average number of nonresponse followup (NRFU) contacts and on NRFU conversion rates, and merged it with the tract-level file received earlier that was created from the block-group-level planning database. Staff later added several variables from the new tract-level planning database. Staff examined correlations among potential covariates and between potential covariates and outcome variables; for both Pearson and Kendall tau-b correlations, the covariates generally had a correlation with the average number of NRFU contacts that was opposite in sign and roughly equal in absolute magnitude to their correlation with the cumulative NRFU conversion rate after the second contact (a small correlation sketch appears at the end of this section). Staff ran multiple linear regressions for three outcome variables (cumulative NRFU conversion rate after the second contact, average number of NRFU contacts, and average number of person NRFU contacts for occupied housing units with household respondents) and multiple logistic regressions for the cumulative conversion rate after the second contact. Staff produced a draft initial project report summarizing the initial results. Log of population density was the most important explanatory variable in the current set of covariates for the three outcome variables mentioned above and appears to provide most of the explanatory power available from the current set of covariates.
Staff: Michael Ikeda (x31756), Mary Mulry

I. Local Update of Census Addresses (LUCA) Program Improvement
Description: The purpose of this project is to assess all facets of the LUCA program in order to identify cost-effective changes, improve the quality of the Master Address File, and optimize the benefits derived by the Census Bureau and the participants. The project is coordinated by one of the 2020 Census Integrated Project Teams.
Highlights: No significant progress this quarter.
Staff: Michael Ikeda (x31756), Ned Porter

J. Identifying "Good" Administrative Records for 2020 Census NRFU Curtailment Targeting
Description: As part of the Census 2020 Administrative Records Modeling team, staff are researching scenarios of NRFU contact strategies and utilization of administrative records data. We want to identify scenarios that reduce NRFU workloads while still maintaining good census coverage. We are researching the identification of "good" administrative records, via models of the match between Census and administrative-records person/address assignments, for use in deciding which NRFU households to continue to contact and which to primary allocate. We are exploring various models, methods, and classification rules to determine a targeting strategy that obtains good census coverage, and good characteristic enumeration, with the use of administrative records.
Highlights: In Q3 of FY 2013, staff continued to develop and assess classification methods for determining which administrative records are suitable for Census 2020 NRFU purposes. Staff shared research with the National Academy of Sciences advisory panel and with the Census 2020 communications team.
Staff: Yves Thibaudeau (x31706), Darcy Steeg Morris
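The Pearson/Kendall tau-b screening described under (H) looks like the following in miniature; the two simulated variables are hypothetical stand-ins for a tract-level covariate and an NRFU outcome:

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

rng = np.random.default_rng(2)
n = 200
covariate = rng.normal(size=n)                  # e.g., log population density
outcome = 0.4 * covariate + rng.normal(size=n)  # e.g., avg. NRFU contacts

r, r_p = pearsonr(covariate, outcome)
tau, tau_p = kendalltau(covariate, outcome)     # tau-b, which handles ties
print(f"Pearson r = {r:.3f} (p = {r_p:.3g})")
print(f"Kendall tau-b = {tau:.3f} (p = {tau_p:.3g})")
```

Comparing the two coefficients is a cheap robustness check: Pearson measures linear association, while tau-b depends only on rank order.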

1.4 CODING, EDITING, AND IMPUTATION STUDY

(Decennial Project 6510301)

A. Software Development (Tea)
Description: Here we report applications of Tea software to the coming 2020 Census. For the broader project, see General Research 0351000 and 1871000, Statistical Computing and Software (A) Tea Software Development.

Highlights: No significant progress this quarter.
Staff: Ben Klemens (x36864), Rolando Rodriguez, Yves Thibaudeau

B. Software Analysis and Evaluation
Description: This project will compare competing imputation methods for the 2020 Decennial Census. Staff will establish testing procedures for the comparison and will produce statistical and graphical output to inform any production-level decisions. The current donor-based imputation method will be tested along with numerous other methods, both from in-house software and from external sources (where feasible). Coordination with production divisions will help ensure that the procedures meet all the necessary production criteria.
Highlights: No significant progress this quarter.
Staff: Rolando Rodriguez (x31816), Ben Klemens, Yves Thibaudeau

1.5 PRIVACY AND CONFIDENTIALITY STUDY

(Decennial Project 6810304)

A. Privacy and Confidentiality for the 2020 Census
Description: This project undertakes research to understand privacy and confidentiality concerns related to methods of contact, response, and administrative records use which are under consideration for the 2020 Census. Methods of contact and response under consideration include internet alternatives such as social networking, email, and text messages. The project objectives are to determine privacy and confidentiality concerns related to these methods, and to identify a strategy to address the concerns.
Highlights: In Q3 of FY 2013, staff reviewed and provided recommendations on the experimental design for a planned upcoming national study of internet contact strategies.
Staff: Martin Klein (x37856)

B. Social Media Monitoring of Privacy and Confidentiality Concerns
Description: The purpose of this study is to investigate public perception of topics related to privacy and confidentiality. Using a social media listening and text-mining tool, staff will access public conversations on social networking sites such as Twitter to identify topics, themes, and sentiments relating to privacy and confidentiality. This study will enable staff to monitor changes over time in preparation for, and during implementation of, advertising campaigns and other communications strategies for the 2020 Census.
Highlights: In Q3 of FY 2013, staff met regularly with the Social Media Subteam to assist in the development of research plans for the analysis of public discussions in social media. In these plans, staff outlined research questions and methodology in preparation for upcoming training sessions with representatives from Sysomos, the social media software company whose tool will be used to conduct the study.
Staff: Taniecea Arceneaux (x33440)

1.6 MATCHING PROCESS IMPROVEMENT

(Decennial Project 6810305)

A. 2020 Unduplication Research
Description: The goal of this project is to conduct research to guide the development and assessment of methods for conducting nationwide matching and unduplication in the 2020 Decennial Census, future censuses, and other matching projects. Our staff will also develop and test new methodologies for unduplication. The project is coordinated by one of the 2020 Census Integrated Project Teams.
Highlights: In Q3 of FY 2013, staff continued to participate in the planning and preparation process for the research. The study plan for the Project Team and other related documents have been revised. Staff provided input into an overview the Project Team put together for the Acting Director; the overviews from various Project Teams are being used as part of the Acting Director's preparation for a Congressional hearing. Staff have been assigned to a subgroup that will investigate modifying the existing Decennial area matching systems to match IRS persons to Census persons. Staff also continued to modify the matching system used in previous research for use on the 2010 Census Unedited File (CUF) and continued to examine results of preliminary matching runs of the CUF against itself. Preliminary results suggest that it may be useful to include the geographic distance between linked units when evaluating links.
Staff: Michael Ikeda (x31756), Ned Porter, Bill Winkler, Bill Yancey, Joshua Tokle

1.7 AMERICAN COMMUNITY SURVEY (ACS)

(Decennial Project 5385260)

A. ACS Applications for Time Series Methods

Description: This project undertakes research and studies on applying time series methodology in support of the American Community Survey (ACS).
Highlights: In Q3 of FY 2013, staff received referee comments on a paper describing how to interpret multi-year estimates and made substantial revisions.
Staff: Tucker McElroy (x33227)

B. ACS Imputation Research and Development
Description: The American Community Survey process of editing and post-edit data review is currently time- and labor-intensive. It involves repeatedly submitting an entire collection year of microdata to an edit-enforcement program (SAS software). After each pass through the edit-enforcement program, a labor-intensive review process is conducted by a staff of analysts to identify inconsistencies and quality problems remaining in the microdata. Before the data are ready for public release, they make at least three passes through the edit-enforcement program and three review processes by the analysts, taking upward of three months. The objective of this project is to experiment with a different strategy for editing, while keeping the same edit rules, and to assess whether the new strategy can reduce the number of passes through the edit process and the duration of the review process.
Highlights: In Q3 of FY 2013, staff presented a proposed methodology based on decision-theoretic concepts to assess the best data collection action. The methodology involves "real-time imputation" to weigh the risk of an imputation against further data collection effort. Staff will present preliminary results at the 2013 Joint Statistical Meetings.
Staff: Yves Thibaudeau (x31706), Chandra Erdman, Darcy Steeg Morris

C. Data Analysis of ACS CATI-CAPI Contact History
Description: The aim of this project is to reanalyze data on Computer Assisted Telephone Interview (CATI) and Computer Assisted Personal Interview (CAPI) contact histories in order to inform policy decisions on altering the control parameters governing termination of CATI contact attempts, with a view to minimizing perceived harassment of households sampled for the American Community Survey (ACS) without incurring large costs in lost CATI interviews or increased CAPI workload.
Highlights: During Q3 of FY 2013, staff gave a Brown Bag seminar talk, joint with Decennial Statistical Studies Division (DSSD) and American Community Survey Office (ACSO) staff working on this project, on the first phase of the data analysis of CATI response in terms of contact history and reluctance. Staff did additional analyses of the completeness of ACS questionnaire responses as a function of subgroups defined by contact history, finding that completeness depended little on contact history apart from whether the final response was recorded as a "sufficient partial" questionnaire. In addition, staff began discussions concerning a follow-on project that will study personal-interview (CAPI) responses to the ACS in terms of subgroups defined dynamically by contact history.
Staff: Eric Slud (x34991), Darcy Morris, Josh Tokle, Jerzy Wieczorek, Tom Petkunas, Debbie Griffin (ACSO), Chandra Erdman

D. Assessing Uncertainty in ACS Ranking Tables
Description: This project presents results from applying statistical methods that provide statements of how good the rankings in the ACS Ranking Tables are (see The Ranking Project: Methodology Development and Evaluation under Projects 0351000 and 1871000).
Highlights: In Q3 of FY 2013, staff focused on visualizations of results in ACS Ranking Tables.
Staff: Tommy Wright (x31702), Martin Klein, Jerzy Wieczorek, Derrick Simmons

1.8 DEMOGRAPHIC STATISTICAL METHODS DIVISION SPECIAL PROJECTS

(Demographic Project TBA)

A. Tobacco Use Supplement (NCI) Small Domain Models
Description: In the first quarter of FY 2013, staff began working with the Demographic Statistical Methods Division (DSMD) on a project for the National Cancer Institute (NCI), studying the relationship between smoking status and a range of geographic/demographic covariates. Using the Tobacco Use Supplement to the Current Population Survey (TUS-CPS), staff is assisting NCI toward making estimates of smoking-related behavior using county-level or state-level dependent variables (e.g., percent male, percent Hispanic, percent below poverty level). The goal is to identify where anti-smoking funds could best be directed.
Highlights: In Q3 of FY 2013, staff met to discuss modeling strategies for the six smoking covariates and tested the models, previously fit on the arcsine-transformed scale, directly on the probability scale. Staff provided data to NCI for benchmarking estimates to the state level and for further testing of models with the arcsine transformation removed. Staff also imported more recent TUS-CPS data and began using it for testing.

Staff: Aaron Gilary (x39660), Partha Lahiri (University of Maryland), Benmei Liu (NIH)
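The arcsine transformation referred to above is the standard variance stabilizer for estimated proportions: for a binomial proportion based on n draws, arcsin(sqrt(p)) has variance roughly 1/(4n) regardless of p. A small sketch with invented rates:

```python
import numpy as np

def arcsine(p):
    """Variance-stabilizing transform for a proportion p in [0, 1]."""
    return np.arcsin(np.sqrt(p))

def inv_arcsine(z):
    """Back-transform a modeled value to the probability scale."""
    return np.sin(z) ** 2

p = np.array([0.12, 0.25, 0.40])   # hypothetical smoking proportions
z = arcsine(p)                     # model on the stabilized scale ...
print(inv_arcsine(z))              # ... then report on the probability scale
```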

B. Special Project on Weighting and Estimation
Description: This project involves regular consulting with Current Population Survey (CPS) Branch staff on design, weighting, and estimation issues regarding the CPS. Issues discussed include design strategy for systematic sampling intervals and for rotating panels, composite estimation, variance estimation, and the possibility of altering CPS weighting procedures to allow a single simultaneous stage of weight adjustment for nonresponse and population controls.
Highlights: In Q3 of FY 2013, staff continued regular discussions and wrote a Joint Statistical Meetings paper describing an optimization-based single-stage weight-adjustment methodology for the CPS that allows the usual calibration to population controls while also accomplishing approximate calibration to other population totals formerly used in multi-stage weight adjustments. Staff also began specifying the variables to use in the penalty terms for approximate calibration, and did preliminary calculations exploring the extent to which current CPS methodology allows the balance equations used for early-stage weight adjustments to become only approximate in later stages.
Staff: Eric Slud (x34991), Reid Rottach (DSMD), Christopher Grieves (DSMD), Yang Cheng (DSMD)
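For readers unfamiliar with calibration, here is a generic linear (GREG-type) calibration sketch: given base weights and known population totals, solve for adjusted weights whose weighted covariate totals hit the controls exactly. This is not the optimization-based single-stage method of the staff paper, and all data are simulated:

```python
import numpy as np

def calibrate(d, X, totals):
    """Linear calibration: find w = d * (1 + X @ lam) such that
    X' w equals the population control totals."""
    # Solve (X' diag(d) X) lam = totals - X' d for the multipliers.
    A = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(A, totals - X.T @ d)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # intercept + group
d = np.full(n, 50.0)                                      # base weights
totals = np.array([52000.0, 26500.0])                     # known controls
w = calibrate(d, X, totals)
print(X.T @ w)   # reproduces the control totals
```

Penalty-based "approximate calibration," as discussed above, relaxes some of these equality constraints into soft penalty terms instead.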

1.9 DEMOGRAPHIC SURVEYS DIVISION (DSD) SPECIAL PROJECTS

(Demographic Project 0906/1442)

A. Data Integration
Description: The purpose of this research is to identify microdata records at risk of disclosure due to publicly available databases. Microdata from all Census Bureau sample surveys and censuses will be examined. Potentially linkable data files will be identified. Disclosure avoidance procedures will be developed and applied to protect any records at risk of disclosure.

Highlights: No significant progress this quarter.
Staff: Ned Porter (x31798), Lisa Singh (CDAR), Rolando Rodríguez

1.10 NATIONAL CRIME VICTIMIZATION SURVEY

(Demographic Project 7523013/7523014)

A. Analyzing the Effects of Sample Reinstatement, Refresher Training Experiment, and Process Monitoring and Fitness for Use



Analyzing the Effects of Sample Reinstatement
Description: During 2010 and 2011, the National Crime Victimization Survey (NCVS) sample size was restored (increased) to previous levels. This, in conjunction with the realignment imposed by the closing of six Regional Offices, brought changes to interviewer workloads, with possible impact on victimization measures for households and persons. Through analysis of survey outcomes and paradata, we seek to quantify the effects of reinstatement and realignment on victimization rates.

Highlights: In Q3 of FY 2013, staff applied the models to two key paradata variables (household response rate and average screener time) and to two key survey outcomes (the rate of household property crimes and the rate of personal crimes). Models were refined and revised in response to input from the Bureau of Justice Statistics. Findings were summarized in a written report that was distributed to members of the NCVS Data Review Panel and presented orally to the Panel in July.

Analysis of Refresher Training Experiment
Description: In 2011, an experiment was embedded within the NCVS. Teams of interviewers were randomly assigned to two cohorts: the first cohort received specialized training designed to improve the quality of the interview process, and the second cohort received the same training six months later. Through modeling of survey outcomes and paradata, we seek to quantify the effects of the so-called Refresher Training program on victimization rates.
Highlights: In Q3 of FY 2013, a technical report was expanded to include estimated treatment effects on household response rates and average screener times. Additional revisions were made in response to input from the Bureau of Justice Statistics (BJS) and internal reviewers from the Decennial Management Division (DMD), the Demographic Statistical Methods Division (DSMD), and our center.

Process Monitoring and Fitness for Use
Description: Information gathered from NCVS field operations is synthesized into variables that serve as indicators of data quality. In this project, we are developing classes of flexible models and graphical tools for describing how these variables evolve over time. These techniques are intended to help Census Bureau and Bureau of Justice Statistics staff monitor the performance of field staff, describe the effects of interventions on the data collection process, quickly alert survey management to unexpected developments that may require remedial action, and assess the overall quality of NCVS data and their fitness for use.
Highlights: Our book was published and is available.
Staff: Joe Schafer (x31823)
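As one concrete illustration of monitoring an indicator over time (a sketch only; the project's models are richer, and the series, smoothing constant, and alert rule here are all invented):

```python
import numpy as np

def ewma(series, alpha=0.2):
    """Exponentially weighted moving average of a quality indicator."""
    out = np.empty(len(series))
    out[0] = series[0]
    for t in range(1, len(series)):
        out[t] = alpha * series[t] + (1 - alpha) * out[t - 1]
    return out

rng = np.random.default_rng(4)
rates = 0.85 + 0.02 * rng.normal(size=36)  # 36 months of response rates
rates[-6:] -= 0.05                         # a late drop the chart should catch
smoothed = ewma(rates)

# Flag months where the smoothed rate falls well below a first-year baseline.
baseline = smoothed[:12].mean()
spread = rates[:12].std(ddof=1)
print(np.nonzero(smoothed < baseline - 3 * spread)[0])
```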

1.11 POPULATION DIVISION PROJECTS

(Demographic Project TBA)

A. Population Projections
Description: This project provides methodology and software to generate long-term forecasts of fertility, mortality, and migration using vector time series techniques.
Highlights: In Q3 of FY 2013, staff revised its research report describing methods and code for projecting fertility and mortality data by age and race group over a 50-year time horizon.
Staff: Tucker McElroy (x33227), Osbert Pang, William Bell (R&M)
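A hedged sketch of vector time series forecasting with statsmodels' VAR class; the two simulated series stand in for demographic rates, and the lag order and 50-step horizon are illustrative only:

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(5)
n = 120
data = np.zeros((n, 2))
shocks = rng.normal(size=(n, 2))
for t in range(1, n):                 # simple VAR(1) data generator
    data[t] = 0.6 * data[t - 1] + shocks[t]

model = VAR(data)
results = model.fit(2)                            # fixed lag order of 2
forecast = results.forecast(data[-2:], steps=50)  # e.g., a 50-year horizon
print(forecast[:3])
```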

1.12 SURVEY OF INCOME AND PROGRAM PARTICIPATION IMPROVEMENTS RESEARCH

(Demographic Project 1465444)

A. Model-Based Imputation for the Demographic Directorate
Description: Staff has been asked to review and ultimately improve all of the imputation methodology in demographic surveys, beginning with the Survey of Income and Program Participation and the Current Population Survey.

Highlights: In Q3 of FY 2013, staff was asked to review and ultimately improve all of the imputation methodology in demographic surveys, beginning with the Survey of Income and Program Participation (SIPP) and the Current Population Survey (CPS).
Staff: Maria Garcia (x31703), Chandra Erdman, Ben Klemens, Yves Thibaudeau

1.13 SOCIAL, ECONOMIC, AND HOUSING STATISTICS DIVISION SMALL AREA ESTIMATION PROJECTS

(Demographic Project 7165013)

A. Research for Small Area Income and Poverty Estimates (SAIPE)
Description: The purpose of this research is to develop, in collaboration with the Small Area Estimates Branch in the Social, Economic, and Housing Statistics Division (SEHSD), methods to produce "reliable" income and poverty estimates for small geographic areas and/or small demographic domains (e.g., poor children ages 5-17 for counties). The methods should also produce realistic measures of the accuracy of the estimates (standard errors). The investigation will include assessment of the value of various auxiliary data (from administrative records or surveys) in producing the desired estimates. Also included will be an evaluation of the techniques developed, along with documentation of the methodology.
Highlights: In Q3 of FY 2013, staff continued research on using multiple years of American Community Survey (ACS) data to examine whether better (i.e., lower-variance) county-level poverty rate estimates can be obtained (a simplified univariate sketch of this model family appears at the end of this section). The bivariate model under study has two variations; the first version uses the previous 5-year combined ACS dataset for the second equation, taking the place of the decennial long-form data. Staff is investigating this model with two different pairs of data series (single-year ACS 2011 with 5-year ACS 2006-2010, and single-year ACS 2010 with 5-year ACS 2005-2009). Staff is preparing this work for presentation at the 2013 Joint Statistical Meetings.
Staff: Jerry Maples (x32873), Jerzy Wieczorek, William Bell (R&M)

B. Small Area Health Insurance Estimates (SAHIE)
Description: At the request of staff from the Social, Economic, and Housing Statistics Division (SEHSD), our staff will review current methodology for making small area estimates of health insurance coverage by state and poverty level. Staff will work on selected topics of SAHIE estimation methodology, in conjunction with SEHSD.
Highlights: No significant progress this quarter.
Staff: Ryan Janicki (x35725)
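The county-level models described in (A) are in the Fay-Herriot family. Below is a minimal univariate Fay-Herriot fit by moment estimation, a deliberate simplification of the bivariate model above; every input is simulated:

```python
import numpy as np

def fay_herriot(y, X, D, n_iter=100):
    """Univariate Fay-Herriot: y_i = x_i'beta + u_i + e_i, with
    u_i ~ N(0, A) and e_i ~ N(0, D_i), D_i known.
    Estimate A by moment iteration, then return EBLUP estimates."""
    m, p = X.shape
    A = np.var(y)                       # crude starting value
    for _ in range(n_iter):
        V_inv = 1.0 / (A + D)
        W = X * V_inv[:, None]
        beta = np.linalg.solve(X.T @ W, W.T @ y)   # weighted least squares
        resid = y - X @ beta
        # Moment equation: E[sum resid^2 / (A + D_i)] = m - p.
        A = max(0.0, A + (np.sum(resid**2 * V_inv) - (m - p)) / np.sum(V_inv))
    gamma = A / (A + D)                            # shrinkage factor
    return X @ beta + gamma * (y - X @ beta)       # EBLUP small-area estimates

rng = np.random.default_rng(6)
m = 50
X = np.column_stack([np.ones(m), rng.normal(size=m)])  # admin covariate
D = rng.uniform(0.5, 2.0, m)                           # known sampling variances
truth = X @ np.array([1.0, 0.5]) + rng.normal(scale=np.sqrt(0.7), size=m)
y = truth + rng.normal(scale=np.sqrt(D))               # direct survey estimates
print(fay_herriot(y, X, D)[:5])
```

The shrinkage factor gamma pulls noisy direct estimates toward the regression prediction, more strongly where the sampling variance D_i is large.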

1.14 IMPROVING POVERTY MEASURES/IOE

(Demographic Project 189115)

A. Tract Level Estimates of Poverty from Multi-year ACS Data
Description: This project, from a Development Case Proposal, aims to improve the estimates of poverty-related outcomes from the American Community Survey (ACS) at the tract level. Various modeling techniques, both model-based and model-assisted, will be used to improve on the design-based multi-year estimates currently produced by the ACS. The goal is to produce more accurate estimates of poverty and income at the tract level and to develop a model framework that can be extended to outcomes beyond poverty and income.
Highlights: In Q3 of FY 2013, staff improved the artificial population system's ACS-like sampling process by including additional raking and weighting adjustments, storing housing-unit-level samples, and computing additional summary statistics on each sample. Staff also began investigating a more sophisticated nonresponse model to replace the initial simple version. Finally, staff ported the system to the "sae" server for use by Small Area Estimates Branch staff, who began work on developing "pseudotracts," groups of Census tracts whose ACS samples can be pooled to create tract-sized groups in the artificial population.
Staff: Jerry Maples (x32873), Jerzy Wieczorek, Ryan Janicki, Aaron Gilary, William Bell (R&M), Carolina Franco

B. Small Area Estimates of Disability
Description: This project, from a Development Case proposal, aims to create subnational estimates of specific disability characteristics (e.g., the number of people with autism). These detailed data are collected in a supplement to the Survey of Income and Program Participation (SIPP); however, the SIPP is designed only for national-level estimates. This project explores small area models that combine SIPP with the large sample size of the American Community Survey to produce state- and county-level estimates of reasonable quality.
Highlights: In Q3 of FY 2013, staff implemented a method introduced in Kim and Rao (2012, Biometrika) to produce state-level estimates of disability as defined by SIPP. Staff found two factors to be highly predictive of disability status: age and the number of ACS disability questions answered "yes." Other demographic factors improved the fit of the model through main effects and interactions. Staff also implemented both methods of variance estimation; however, the expected variance reduction was not realized and is currently being investigated. Staff plans to present this work at the 2013 Joint Statistical Meetings.
Staff: Jerry Maples (x32873)

1.15 EDITING METHODS DEVELOPMENT

(Economic Project 2320354)

A. Investigation of Selective Editing Procedures for Foreign Trade Programs
Description: The purpose of this project is to develop selective editing strategies for the U.S. Census Bureau foreign trade statistics program. The Foreign Trade Division (FTD) processes more than five million transaction records every month using a parameter file called the Edit Master. In this project, we investigate the feasibility of using selective editing to identify the most erroneous records without the use of parameters.
Highlights: Previously, staff had developed score functions for selective editing of our foreign trade data. Our score functions include measures of how suspicious a record is and of the potential impact that errors in suspicious records have on the estimated totals. In Q3 of FY 2013, we found that the measure of a record's potential impact on estimated totals does not work as well when using the full data set: the domains for computing totals are then very large except at the lowest levels of aggregation, so the impact of any given record on the estimated totals within its domain becomes very small as the estimated totals grow. We therefore developed a new score function that does not include a measure of potential impact; rather, we compute a measure of the expected error in the variables and consider a global score function based on how suspicious a record is and the size of the anticipated error. Staff completed an evaluation study using four consecutive months of reported and edited exports data, along with the FTD's editing and imputation parameter file, to simulate an application of selective editing to this larger test data file (more than 3.25 million records). Our results show rapidly decreasing pseudo-biases as the percentage of records flagged for review increases. We also found the proportion of records ranked as highly suspicious that are true rejects to be large, an indication that selective editing is performing well in terms of correctly tracking erroneous records. However, we also found that the proportion of false rejects is too large. We conclude that selective editing is well suited to prioritizing manual review of rejects but not suitable for ranking suspicious records earlier in the editing process.
Staff: Maria Garcia (x31703), Yves Thibaudeau, Andreana Able (FTD)
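A sketch of a global score with the general shape described above, suspicion combined with the size of the anticipated error; the records, predictions, and spreads are invented:

```python
import numpy as np

def global_score(reported, predicted, sigma):
    """Selective-editing score: how suspicious a value is (standardized
    distance from its predicted value) times the anticipated error size."""
    suspicion = np.abs(reported - predicted) / sigma
    anticipated_error = np.abs(reported - predicted)
    return suspicion * anticipated_error

reported = np.array([120.0, 5800.0, 33.0, 410.0])
predicted = np.array([110.0, 1200.0, 30.0, 400.0])  # e.g., from past months
sigma = np.array([15.0, 900.0, 5.0, 60.0])          # typical variation
scores = global_score(reported, predicted, sigma)
review_order = np.argsort(scores)[::-1]   # route highest scores to analysts
print(review_order)
```

Ranking records by such a score concentrates analyst review on the records most likely to carry large errors.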

1.16 TIME SERIES RESEARCH

(Economic Project 2320352)

A. Seasonal Adjustment Support
Description: This is an amalgamation of projects whose composition varies from year to year but always includes maintenance of the seasonal adjustment and benchmarking software used by the Economic Directorate.
Highlights: In Q3 of FY 2013, staff provided seasonal adjustment and X-13ARIMA-SEATS support to the following: Hispanic America, PayPal, Statistics New Zealand, Instituto Nacional de Estadística (Spain), Federal Office of Statistics (Switzerland), Reserve Bank of India, Bundesbank, Central American Monetary Council, Bureau of Labor Statistics, James Cook University (Australia), and Ohio University. Staff from our center's Time Series Research Group and the Office of Statistical Methods and Research for Economic Programs (OSMREP) Time Series Methods Staff met with economists from the Longitudinal Employer-Household Dynamics (LEHD) staff to discuss the seasonal adjustment of quarterly LEHD series. Staff from the Time Series Research Group later met again with the LEHD staff to discuss recent research in multivariate seasonal adjustment, as well as other issues related to seasonal adjustment of multiple series.
Staff: Brian Monsell (x31721), David Findley (Consultant)

B. Seasonal Adjustment Software Development and Evaluation
Description: The goal of this project is a multi-platform computer program for seasonal adjustment, trend estimation, and calendar effect estimation that goes beyond the adjustment capabilities of the Census X-11 and Statistics Canada X-11-ARIMA programs and provides more effective diagnostics. This fiscal year's goals include: (1) continuing to develop a version of the X-13ARIMA-SEATS program with accessible output and updated source code so that, when appropriate, SEATS adjustments can be produced by the Economic Directorate; (2) developing a software system, called iMetrica, that provides a simulation environment for X-13ARIMA-SEATS seasonal adjustments; and (3) incorporating further improvements to the X-13ARIMA-SEATS user interface, output, and documentation. In coordination and collaboration with the Time Series Methods Staff of the Office of Statistical Methods and Research for Economic Programs (OSMREP), the staff will provide internal and/or external training in the use of X-13ARIMA-SEATS and associated programs, such as X-13-Graph, when appropriate.
Highlights: In Q3 of FY 2013, staff continued development and testing of an updated version of X-13ARIMA-SEATS, Version 1.1, for eventual public release. Updates to the software include revised HTML output; corrections to a file containing a summary of SEATS results when 25 or more series are run with X-13ARIMA-SEATS; a new runtime option that allows XHTML output to be produced; and a new data format option (format=x13save, which is the same as format=x12save; both will be acceptable). In addition, defects in the estimation of irregular regression models were corrected. Staff compiled an updated version of the TRAMO time series modeling software and developed a simulation interface to enable a study of TRAMO's model identification procedure. Staff updated iMetrica in two ways: we developed a data interface with FRED (Federal Reserve Economic Data) and developed a new multivariate direct filter approach (MDFA) construction tool for better statistical inference in signal extraction.



Staff: Brian Monsell (x31721), Christopher Blakely

C. Research on Seasonal Time Series - Modeling and Adjustment Issues
Description: The main goal of this research is to discover new ways in which time series models can be used to improve seasonal and calendar effect adjustments. An important secondary goal is the development or improvement of modeling and adjustment diagnostics. This fiscal year's projects include: (1) continuing research on goodness-of-fit diagnostics (including signal extraction diagnostics and Ljung-Box statistics) to better assess time series models used in seasonal adjustment; (2) studying the effects of model-based seasonal adjustment filters; (3) studying multiple testing problems arising from applying several statistics at once; (4) determining whether information from the direct seasonally adjusted series of a composite seasonal adjustment can be used to modify the components of an indirect seasonal adjustment, and more generally investigating benchmarking and reconciliation for multiple time series; (5) studying alternative models of seasonality, such as Bayesian, long-memory, or heteroskedastic models, to determine whether improvements to seasonal adjustment methodology can be obtained; (6) studying the modeling of stock holiday and trading day effects in Census Bureau time series; (7) studying methods of seasonal adjustment when the data are no longer univariate or discrete (e.g., multiple frequencies or multiple series); (8) studying alternative seasonal adjustment methods that may reduce revisions or have alternative properties; and (9) studying nonparametric methods for estimating regression effects, and their behavior under long-range dependence and/or extreme values.
Highlights: In Q3 of FY 2013, staff did the following research: (a) continued empirical studies and examples for non-nested model comparison research; (b) met with external researchers to identify additional projects related to the visual significance graphical tool, and derived a general result on asymptotic properties of discrete Fourier transforms at non-Fourier frequencies; (c) continued work on spectral density estimation using lag windows with a fixed bandwidth proportion, deriving higher-order approximations and developing the asymptotic critical values; and (d) derived signal extraction recursions useful for efficient computation of matrix formulas.
Staff: Tucker McElroy (x33227), Christopher Blakely, Brian Monsell, Osbert Pang, William Bell (Research and Methodology Directorate), David Findley (Consultant)
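Item (c) concerns lag-window spectral estimation with a fixed bandwidth proportion. The sketch below implements a generic Bartlett-window estimator of that shape, up to normalization; the AR(1) series and the 10% proportion are illustrative, not the values studied:

```python
import numpy as np

def lag_window_spectrum(x, freqs, prop=0.1):
    """Lag-window spectral density estimate (up to normalization):
    sample autocovariances tapered by a Bartlett window whose width
    is a fixed proportion of the series length."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    m = max(1, int(prop * n))                   # bandwidth = prop * n lags
    acov = np.array([x[:n - h] @ x[h:] / n for h in range(m + 1)])
    w = 1.0 - np.arange(m + 1) / (m + 1)        # Bartlett (triangular) taper
    h = np.arange(1, m + 1)
    return np.array([w[0] * acov[0] +
                     2.0 * np.sum(w[1:] * acov[1:] * np.cos(2 * np.pi * f * h))
                     for f in freqs])

rng = np.random.default_rng(7)
n = 400
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):        # AR(1) series with a visible low-frequency peak
    x[t] = 0.7 * x[t - 1] + e[t]
freqs = np.linspace(0.0, 0.5, 51)
print(lag_window_spectrum(x, freqs)[:5])
```

Keeping the bandwidth a fixed proportion of n, rather than letting it grow slowly, is exactly what changes the estimator's asymptotics and motivates the special critical values mentioned above.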

D. Identifying Edits in the Quarterly Financial Report
Description: The Quarterly Financial Report is frequently revised due to reporting error. This project uses statistical analysis of the revision time series across vintages to identify potential edit mistakes.
Highlights: In Q3 of FY 2013, staff tested some preliminary methods, which were later abandoned. Staff found that grouping across variables produced better estimates of revision error distributions. Staff continued investigating parametric and nonparametric approaches.
Staff: Tucker McElroy (x33227), Osbert Pang

E. Supporting Documentation and Software for X-13ARIMA-SEATS
Description: The purpose of this project is to develop supplementary documentation and utilities for X-13ARIMA-SEATS that enable both inexperienced seasonal adjustors and experts to use the program as effectively as their backgrounds permit. This fiscal year's goals include improving the X-13ARIMA-SEATS documentation, further developing the iMetrica software and its documentation, and exploring the use of component and Java software developed at the National Bank of Belgium.
Highlights: In Q3 of FY 2013, staff updated and improved the X-13ARIMA-SEATS reference manual to include information on new options and diagnostics, and updated the quick reference for the program as well. A "getting started" guide to X-13ARIMA-SEATS has been revised in preparation for its eventual release. Staff continued developing documentation for the different modules of the iMetrica software. Maintenance of the X-12-ARIMA and X-13ARIMA-SEATS websites continued, to ensure that they follow standards established by the Census Bureau.
Staff: Brian Monsell (x31721), Chris Blakely, David Findley, William Bell (Research and Methodology Directorate)
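For readers who want to try X-13ARIMA-SEATS from Python, statsmodels ships a thin wrapper around the executable. The sketch assumes a local install of the X-13ARIMA-SEATS binary (the path below is hypothetical), and the toy monthly series is arbitrary:

```python
import pandas as pd
from statsmodels.tsa.x13 import x13_arima_analysis

# A toy monthly series with a summer bump; any seasonal pd.Series with a
# DatetimeIndex will do.
idx = pd.date_range("2005-01-01", periods=96, freq="MS")
y = pd.Series([100 + 10 * (i % 12 in (5, 6, 7)) + 0.2 * i
               for i in range(96)], index=idx)

# x12path must point at a local X-13ARIMA-SEATS install (hypothetical path).
result = x13_arima_analysis(y, x12path="/usr/local/bin", outlier=True)
print(result.seasadj.head())   # the seasonally adjusted series
```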

1.17 GOVERNMENTS DIVISION PROJECT ON DECISION-BASED ESTIMATION

(Economic Project TBA)

Description: This project involves providing consultative work for the Governments Division on point and variance estimation for total government employment and payrolls in the Survey of Public Employment and Payroll, within a framework of stratumwise GREG estimation, after possibly collapsing substrata of small versus large units according to the results of hypothesis tests on the equality of regression slopes. Further design issues and small area estimation of totals within government-function subtypes are also discussed.



Highlights: In Q3 of FY 2013, staff continued occasional discussions with Governments Division staff on regression-based, model-assisted estimation in the Annual Survey of Public Employment & Payroll (ASPEP) and on small area estimation relating to domain estimates restricted to state-by-government-function subtypes.
Staff: Eric Slud (x34991), Gauri Datta, Bac Tran (GOVS)
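A sketch of the GREG estimator at the core of this consulting, restricted to a single stratum with simulated data; the production work is stratumwise and adds collapsing decisions driven by slope-equality tests:

```python
import numpy as np

def greg_total(y, x, w, x_totals):
    """GREG estimate of a population total: the weighted sample total of y
    plus a regression adjustment for the gap between the known population
    totals of the auxiliaries and their weighted sample estimates."""
    X = np.column_stack([np.ones_like(x), x])
    # Weighted least squares coefficients of y on (1, x).
    B = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    ht_y = np.sum(w * y)
    ht_x = np.array([np.sum(w), np.sum(w * x)])
    return ht_y + (np.asarray(x_totals) - ht_x) @ B

rng = np.random.default_rng(8)
n, N = 80, 4000
x = rng.lognormal(3.0, 0.5, n)        # e.g., prior-year employment
y = 1.1 * x + rng.normal(0, 5.0, n)   # current employment
w = np.full(n, N / n)                 # design weights
# "Known" controls: population size and the simulated expectation of x.
print(greg_total(y, x, w, x_totals=[N, N * np.exp(3.125)]))
```

The regression adjustment is what makes GREG more efficient than the plain weighted total whenever the auxiliary is strongly related to the study variable, which is the rationale for testing slope equality before collapsing substrata.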

1.18 USE OF BIG DATA FOR RETAIL SALES ESTIMATES

(Economic Project TBA)

Description: In this project, we are investigating the use of "Big Data" to fill gaps in the retail sales estimates currently produced by the Census Bureau. First Data, a global payment processing company, collects data on all electronic payment transactions (e.g., credit card and debit card transactions) from its merchant locations as a byproduct of the services it offers. We are interested in exploring possibilities of (1) using First Data tabulations to improve or enhance Census Bureau estimates of monthly retail sales, for example through validation and calibration, and (2) combining First Data tabulations with other Census Bureau data to potentially produce a new product, for example estimates at smaller geographies.
Highlights: No significant progress this quarter.
Staff: Darcy Steeg Morris (x33989), Osbert Pang, Tommy Wright, Scott Scheleur (SSSD), Bill Davie Jr. (SSSD)

1.19 STATISTICAL CONSULTING FOR LEHD AUDIT

(Economic Project TBA)

Description: Our center staff will assist Economic Directorate staff with an internal audit of the Longitudinal Employer-Household Dynamics (LEHD) program, as needed, when specific statistical knowledge outside of the auditors’ area of expertise is required to determine compliance with Office of Management and Budget (OMB) and Census Bureau standards.
Highlights: In Q3 of FY 2013, staff assisted Economic Directorate staff in auditing LEHD methodological documentation related to imputation and synthetic data. Staff contributed to the final audit report and attended the final audit meeting.
Staff: Darcy Steeg Morris (x33989), Andrea Chamberlain (Economic Directorate)

1.20 PROGRAM DIVISION OVERHEAD (Census Bureau Project 0381000)

A. Center Leadership and Support
Description: This staff provides ongoing leadership and support for the overall collaborative consulting, research, and administrative operation of the center.
Staff: Tommy Wright (x31702), Alisha Armas, Michael Hawkins, Michael Leibert, Erica Magruder, Joe Schafer, Eric Slud, Kelly Taylor, Sarah Wilson

B. Research Computing
Description: This ongoing project is devoted to ensuring that Census Bureau researchers have the computers and software tools they need to develop new statistical methods and analyze Census Bureau data.
Highlights: During Q3 of FY 2013, the IT High Performance Computing (HPC) project team completed its evaluation of MRG, PBSPro, and Lustre. Based on the test results, which are still being compiled, the team will choose which products proceed to pre-production testing in the fourth quarter.
Staff: Chad Russell (x33215)


2. RESEARCH

2.1 GENERAL RESEARCH AND SUPPORT (Census Bureau Project 0351000)

2.2 GENERAL RESEARCH

(Census Bureau Project 1871000)

Missing Data, Edit, and Imputation
Motivation: Missing data problems are endemic to the conduct of statistical experiments and data collection projects. Investigators almost never observe all the outcomes they had set out to record. In sample surveys and censuses, this means that individuals or entities in the survey fail to respond or give only part of the information they are asked to provide. In addition, the information provided may be logically inconsistent, which is tantamount to missing. To compute official statistics, agencies need to compensate for missing data. Available techniques for compensation include cell adjustments, imputation, and editing. All of these techniques involve mathematical modeling along with subject matter experience.
Research Problems: Compensating for missing data typically involves explicit or implicit modeling. Explicit methods include Bayesian multiple imputation and propensity score matching. Implicit methods revolve around donor-based techniques such as hot-deck imputation and predictive mean matching. All of these techniques are subject to edit rules to ensure the logical consistency of the remedial product. Research on integrating statistical validity and logical requirements into the imputation process continues to be challenging. Another important problem is correctly quantifying the reliability of predictors that have been produced in part through imputation, as their variance can be substantially greater than that computed nominally.
Potential Applications: Research on missing data leads to improved overall data quality and predictor accuracy for any census or sample survey with a substantial frequency of missing data. It also leads to methods to adjust the variance to reflect the additional uncertainty created by the missing data. Given the ever-rising cost of conducting censuses and sample surveys, imputation and other missing-data compensation methods may come to replace actual data collection in situations where collection is prohibitively expensive.
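As a concrete illustration of the donor-based techniques mentioned above, the sketch below carries out a simple random hot-deck imputation within adjustment cells. The data and cell structure are hypothetical; production systems add edit constraints and more refined donor selection:

```python
# Minimal random hot-deck: impute each missing value from a randomly chosen
# donor in the same adjustment cell. Illustrative data only.
import random

random.seed(2013)

# Each record: (adjustment cell, reported income or None if missing)
records = [
    ("cell1", 52000), ("cell1", None), ("cell1", 48500),
    ("cell2", 91000), ("cell2", 87500), ("cell2", None),
]

# Pool of donors (responding records) by cell
donors = {}
for cell, value in records:
    if value is not None:
        donors.setdefault(cell, []).append(value)

# Replace each missing value with a random donor value from its own cell
imputed = [
    (cell, value if value is not None else random.choice(donors[cell]))
    for cell, value in records
]
print(imputed)
```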

A. Editing
Description: This project covers development of methods for statistical data editing. Good methods allow us to produce efficient and accurate estimates and higher quality microdata for analyses.
Highlights: No significant progress this quarter.
Staff: Maria Garcia (x31703)

B. Editing and Imputation
Description: Under this project, our staff provides advice, develops computer edit/imputation systems in support of demographic and economic projects, implements prototype production systems, and investigates edit/imputation methods.
Highlights: In Q3 of FY 2013, staff began to write specifications for a “truth deck” to evaluate the quality of the edits and imputations of household characteristics under several edit-imputation methods being evaluated for the 2020 Census.
Staff: Yves Thibaudeau (x31706), Maria Garcia, Martin Klein, Darcy Steeg Morris

C. Missing Data and Imputation: Multiple Imputation Feasibility Study
Description: Methods for imputing missing data are closely related to methods used for synthesizing sensitive items for disclosure limitation. One method currently applied to both issues is multiple imputation. Although the two issues may be addressed separately, techniques have been developed that allow data users to analyze data in which both missing data imputation and disclosure limitation synthesis have been accomplished via multiple imputation techniques (e.g., synthetic data). This project ascertains the effectiveness of applying multiple imputation to both missing data and disclosure limitation in the American Community Survey (ACS) group quarters data. Statistical models are used to generate several synthetic data sets for use within the multiple-imputation framework.
Highlights: No significant progress this quarter.
Staff: Rolando Rodriguez (x31816), Ben Klemens, Yves Thibaudeau

Record Linkage
Motivation: Record linkage is intrinsic to efficient, modern survey operations. It is used for unduplicating and updating name and address lists. It is used for applications such as matching and inserting addresses for geocoding, coverage measurement, the Primary Selection Algorithm during decennial processing, Business Register unduplication and updating, re-identification experiments verifying the confidentiality of public-use microdata files, and new applications with groups of administrative lists.



Significant theoretical and algorithmic progress (Winkler 2004ab, 2006ab, 2008, 2009a; Yancey 2005, 2006, 2007, 2011) demonstrates the potential for this research. For cleaning up administrative records files that need to be linked, theoretical and extreme computational results (Winkler 2010, 2011b) yield methods for editing, missing data, and even producing synthetic data with valid analytic properties and reduced or eliminated re-identification risk. Easy means of constructing synthetic data make it straightforward to pass files among groups.
Research Problems: The research problems fall into three major categories. First, we need to develop effective ways of further automating our major record linkage operations. The software needs improvements for matching large sets of files with hundreds of millions of records against other large sets of files. Second, a key open research question is how to effectively and automatically estimate matching error rates. Third, we need to investigate how to develop effective statistical analysis tools for analyzing data from groups of administrative records when unique identifiers are not available. These methods need to show how to do correct demographic, economic, and statistical analyses in the presence of matching error.
Potential Applications: Presently, the Census Bureau is contemplating or working on many projects involving record linkage. The projects encompass the Demographic, Economic, and Decennial areas.
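As background on how matching operations of this kind score candidate pairs, the sketch below computes classical Fellegi-Sunter agreement weights from assumed m- and u-probabilities. The probabilities are hypothetical, and real matchers also require blocking, string comparators, and parameter estimation (e.g., via EM):

```python
# Fellegi-Sunter match score: sum of log2(m/u) over agreeing fields and
# log2((1-m)/(1-u)) over disagreeing fields. Probabilities are hypothetical.
from math import log2

# m = P(field agrees | true match); u = P(field agrees | true non-match)
m_probs = {"last_name": 0.95, "first_name": 0.90, "birth_year": 0.85}
u_probs = {"last_name": 0.01, "first_name": 0.02, "birth_year": 0.10}

def match_score(agreements):
    """agreements: dict of field -> True if the candidate pair agrees."""
    score = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        score += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return score

# A pair agreeing on last name and birth year but not first name:
print(match_score({"last_name": True, "first_name": False, "birth_year": True}))
# Pairs above an upper threshold are designated links, those below a lower
# threshold non-links, and those in between go to clerical review.
```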

A. Disclosure Avoidance for Microdata
Description: Our staff investigates methods of microdata masking that preserve the analytic properties of public-use microdata while avoiding disclosure.
Highlights: In Q3 of FY 2013, staff reviewed additional literature on differential privacy and one paper on the use of estimating equations for doing analyses on masked or synthetic data.
Staff: William Winkler (x34729), William Yancey, Joshua Tokle

B. Noise Multiplication for Statistical Disclosure Control
Description: When survey organizations release data to the public, a major concern is the protection of individual records from disclosure while maintaining the quality and utility of the data. Procedures that deliberately alter data prior to their release fall under the general heading of statistical disclosure control. This project develops and studies data analysis under noise perturbation, in which data are multiplied by randomly drawn noise variables prior to release. Major goals include (1) developing procedures for drawing inference on population parameters based on noise multiplied data, and (2) comparing these procedures with those based on synthetic data obtained by multiple imputation.
Highlights: Work completed in Q3 of FY 2013 is as follows. Staff completed revising a paper entitled “Statistical Analysis of Noise Multiplied Data Using Multiple Imputation,” which proposes a new method for analyzing noise multiplied data using multiple imputation based techniques. The revised paper includes the following new material based on our recent work: (a) an empirical study of the amount of privacy protection provided by the new method, (b) a comparison of the proposed method with the synthetic data method, and (c) an outline of how to extend the method to multivariate data. The revised paper was re-submitted to the Journal of Official Statistics and was accepted for publication.
Staff continued work on the development of a likelihood based method to analyze univariate data coming from a normal, log-normal, or exponential population, where each original observation is perturbed by multiplicative noise for the purpose of statistical disclosure control. The method has been developed, and the accuracy of the resulting inference has been assessed through simulation. A manuscript describing this work, entitled “Likelihood Based Inference Under Noise Multiplication,” was prepared and submitted to a journal for publication.
Staff also continued work on a likelihood based method to analyze log-normally distributed data where any large value (exceeding a fixed threshold) is perturbed by multiplicative noise. For income data it is common for the large values to require privacy protection, and because income data are often log-normally distributed, we have focused on the log-normal distribution in the present study. A likelihood based method was developed to analyze the privacy protected data under two types of data releases: (i) each released value includes an indicator of whether or not it has been noise multiplied, and (ii) no such indicator is provided. The accuracy of inference under the proposed method was assessed through simulation. Because top coding and synthetic data methods are already available as disclosure control strategies for extreme values, some comparisons with the proposed method were made through a simulation study. Results under the proposed method were illustrated using an example from 2000 U.S. Current Population Survey data. Staff began writing a manuscript entitled “Noise Multiplication for Statistical Disclosure Control of Extreme Values in Log-normal Samples” to document this work.
Staff: Martin Klein (x37856), Bimal Sinha (CDAR), Thomas Mathew
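The papers above develop multiple imputation and likelihood based analyses of noise multiplied data. As a far simpler illustration of why such data remain analytically useful, the following sketch recovers the mean and variance of the unreleased values through a moment correction that uses the known noise distribution; this is an illustration only, not the methods in the manuscripts:

```python
# Noise multiplication sketch: release y = x * r, with noise r drawn from a
# known distribution with E[r] = 1. Moments of x can be recovered from
# moments of y because x and r are independent. Simulated data only.
import numpy as np

rng = np.random.default_rng(2013)
x = rng.lognormal(mean=10.0, sigma=0.8, size=50_000)  # confidential values
r = rng.uniform(0.5, 1.5, size=x.size)                # noise, E[r] = 1
y = x * r                                             # released data

E_r = 1.0
E_r2 = (0.5**2 + 0.5 * 1.5 + 1.5**2) / 3              # E[r^2] for U(0.5, 1.5)

mean_hat = y.mean() / E_r                             # since E[y] = E[x] E[r]
ex2_hat = (y**2).mean() / E_r2                        # since E[y^2] = E[x^2] E[r^2]
var_hat = ex2_hat - mean_hat**2

print(f"true mean {x.mean():.0f} vs estimate {mean_hat:.0f}")
print(f"true var  {x.var():.3e} vs estimate {var_hat:.3e}")
```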


C. Record Linkage and Analytic Uses of Administrative Lists
Description: Under this project, staff will provide advice, develop computer matching systems, and develop and perform analytic methods for adjusting statistical analyses for computer matching error.
Highlights: No significant progress this quarter.
Staff: William Winkler (x34729), William Yancey, Ned Porter

D. Modeling, Analysis, and Quality of Data
Description: Our staff investigates the quality of microdata, primarily via modeling methods and new software techniques that accurately describe one or two of the analytic properties of the microdata.
Highlights: In Q3 of FY 2013, staff continued research on methods for estimating record linkage error rates both with and without training data, and on methods for adjusting statistical analyses for linkage error. Staff received full access to Decennial Statistical Studies Division (DSSD) Census Coverage Measurement (CCM) data at the beginning of May but are still having difficulty using the data due to certain anomalies in the files and in the documentation.
Staff: William Winkler (x34729), William Yancey, Joshua Tokle, Ned Porter, Maria Garcia

Small Area Estimation
Motivation: Small area estimation is important in light of a continual demand by data users for finer geographic detail of published statistics. Traditional demographic surveys designed for national estimates do not provide large enough samples to produce reliable direct estimates for small areas such as counties and even most states. The use of valid statistical models can provide small area estimates with greater precision; however, bias due to an incorrect model or failure to account for informative sampling can result.
Research Problems:
• Development/evaluation of multilevel random effects models for capture/recapture models.
• Development of small area models to assess bias in synthetic estimates.
• Development of expertise using nonparametric modeling methods as an adjunct to small area estimation models.
• Development/evaluation of Bayesian methods to combine multiple models.
• Development of models to improve design-based sampling variance estimates.
• Extension of current univariate small-area models to handle multivariate outcomes.
Potential Applications:
• Development/evaluation of binary, random effects models for small area estimation, in the presence of informative sampling, cuts across many small area issues at the Census Bureau.
• Using nonparametric techniques may help determine fixed effects and ascertain the distributional form for random effects.
• Improving the estimated design-based sampling variances leads to better small area models, which assume these sampling error variances are known.
• For practical reasons, separate models are often developed for counties, states, etc. There is a need to coordinate the resulting estimates so smaller levels sum to larger ones in a way that correctly accounts for accuracy.
• Extension of small area models to estimators of design-based variance.

A. Small Area Estimation
Description: Methods will be investigated to provide estimates for geographic areas or subpopulations when sample sizes from these domains are inadequate.
Highlights:
A Weighted Likelihood Approach to Model-based Small Area Estimation with Unit-level Data. Small area estimation uses area-level or unit-level models. Area-level models apply to direct survey estimates that are typically design consistent, leading to design consistent model predictions. Unit-level models, however, typically apply to survey microdata ignoring the sampling weights, and so do not lead to design consistent model predictions. Kott, Rao, and others developed methods to incorporate sampling weights in the unit-level normal nested error regression model to achieve design consistency. Staff is considering several pseudo-likelihoods incorporating sampling weights that apply to normal or non-normal data, including binary and count data. Using a Bayesian approach, staff intends to apply the method to American Community Survey data. Staff presented preliminary results of the proposed method at the Annual Meeting of the Statistical Society of Canada. Based on comments from the meeting, staff will revise the method and apply the result to poverty data from the ACS.
A Finite Mixture Model to Accommodate Outliers in Area-level Small Area Models. Staff is considering a modification of the Fay-Herriot model, using a two-component mixture of normal distributions for the random small area effects, to account for outliers in the data. Staff proposes to carry out a non-informative hierarchical Bayesian approach and an empirical best linear unbiased estimation of small area means.


Our method will provide a robust yet simple alternative to the widely used Fay-Herriot model in various Census Bureau small area estimation projects. Staff has recently obtained some methodological results and plans to apply this new method to the Small Area Income and Poverty Estimates (SAIPE) project.
A Bootstrap Approach to Small Area Estimation with Random Clusters. Staff is considering a bootstrap approach to partitioning a group of small areas with possibly non-identical random small area effects, in an attempt to build a flexible small area estimation method. In problems with a large number of small areas, such as county-level or school-district-level applications, the standard Fay-Herriot model may not provide a good fit. It is believed that the small areas can be partitioned into several subgroups or clusters within which the small area effects are exchangeable. Staff plans to adapt multiple testing ideas to identify the clusters and subsequently develop robust estimators of small area means, conducting the estimation after identification of clusters via the bootstrap approach.
A Measurement Error Approach to Small Area Estimation. Staff prepared a manuscript modifying the Fay-Herriot method of small area estimation for the case when covariates are obtained from a related or a different survey. Such covariates are subject to large sampling variation, leading to non-ignorable measurement error. Staff will present this work at the 2013 Joint Statistical Meetings. Staff plans to apply this model in the Small Area Estimates of Disabilities Project, which uses data from the ACS and the SIPP. By treating the ACS disability estimates as covariates related to the SIPP estimates but subject to measurement (sampling) error, the formulation in the manuscript can be used for small area estimation of disability rates based on the SIPP measures of disability. Under certain scenarios, the new measurement error model becomes equivalent to a bivariate Fay-Herriot model.
Small Area Estimation of Payroll and Employment Characteristics. In the context of the Annual Survey of Public Employment and Payroll (ASPEP), Governments Division is interested in accurate estimation of salaries and hours of full-time and part-time employees at the state level for different government types and function codes of the employing government units. Staff found in an earlier data set that, within each state and government type, significant reduction of the mean squared error of regression estimates is realized if some of the other variables from the previous census are also used. Staff anticipates that further reduction of the MSE may be possible by including some of the other variables from ASPEP. Staff intends to build a two-stage nested error regression model to investigate the extent of improvement over a standard univariate nested error regression model, and to explore various decompositions of the small area random effects when the small areas are defined by cross-classification of state and function type. Staff plans to consider both a hierarchical Bayes approach and an empirical best linear unbiased prediction approach.
Staff: Jerry Maples (x32873), Aaron Gilary, Ryan Janicki, Jerzy Wieczorek, Gauri Datta
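Since several of the efforts above extend the Fay-Herriot model, a bare-bones empirical Bayes version of that baseline model may help fix ideas. The sketch below uses simulated areas and a Prasad-Rao-type moment estimate of the model variance; it illustrates the standard model, not the staff's proposed extensions:

```python
# Baseline Fay-Herriot model: y_i = x_i'beta + u_i + e_i, with u_i ~ N(0, A)
# and e_i ~ N(0, D_i), D_i known. The empirical Bayes predictor shrinks the
# direct estimate toward the regression fit. Simulated data for illustration.
import numpy as np

rng = np.random.default_rng(2013)
m = 40                                          # number of small areas
X = np.column_stack([np.ones(m), rng.normal(0, 1, m)])
beta_true, A_true = np.array([5.0, 2.0]), 1.0
D = rng.uniform(0.5, 3.0, m)                    # known sampling variances
theta = X @ beta_true + rng.normal(0, np.sqrt(A_true), m)
y = theta + rng.normal(0, np.sqrt(D))           # direct survey estimates

# Prasad-Rao-type moment estimate of the model variance A
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols
p = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii
A_hat = max(0.0, (resid @ resid - np.sum(D * (1 - h))) / (m - p))

# Weighted least squares for beta, then empirical Bayes (shrinkage) predictions
w = 1.0 / (A_hat + D)
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
gamma = A_hat / (A_hat + D)                     # shrinkage weights
theta_eb = gamma * y + (1 - gamma) * (X @ beta_wls)

print(f"A_hat = {A_hat:.2f}; MSE vs truth: EB "
      f"{np.mean((theta_eb - theta)**2):.2f}, direct {np.mean((y - theta)**2):.2f}")
```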

B. Small Area Methods with Misspecification
Description: In this project, we undertake research on area-level methods with misspecified models, primarily directed at the development of diagnostics for misspecification (using robust sandwich-formula variances, cross-validation, and other tools) and at Bayesian estimation of model parameters within two-component Fay-Herriot models.
Highlights: In Q3 of FY 2013, staff continued discussions and did preliminary theoretical work related to testing for the need to accommodate (at least) two different error or mean structures within Fay-Herriot models, and to Bayesian estimation of model parameters within such a multi-component structure. Staff prepared a paper and talk for the Joint Statistical Meetings on this research.
Staff: Eric Slud (x34991), Gauri Datta

C. Visualization of Small Area Estimates
Description: Methods are needed to display estimates for a large number of small areas or domains. Displays should accurately convey the level of statistical uncertainty and should guide readers in making comparisons appropriately between domains or over time.
Highlights: In Q3 of FY 2013, staff developed and began testing a preliminary R package for dissemination of these display methods.
Staff: Jerzy Wieczorek (x35725), Derrick Simmons


Survey Sampling-Estimation and Modeling
Motivation: The demographic sample surveys of the Census Bureau cover a wide range of topics but use similar statistical methods to calculate estimation weights. It is desirable to carry out a continuing program of research to improve the accuracy and efficiency of the estimates of characteristics of persons and households. Among the methods of interest are sample designs, adjustments for non-response, proper use of population estimates as weighting controls, small area estimation, and the effects of imputation on variances.
The Economic Directorate of the Census Bureau encounters a number of issues in sampling and estimation in which changes might increase the accuracy or efficiency of the survey estimates. These include, but are not restricted to, (a) estimates of low-valued exports and imports not currently reported, (b) influential values in the retail trade survey, and (c) surveys of government employment.
The Decennial Census is such a massive undertaking that careful planning requires testing proposed methodologies to achieve the best practical design possible. Also, the U.S. Census occurs only every ten years and is the optimal opportunity to conduct evaluations and experiments with methodologies that might improve the next census. Sampling and estimation are necessary components of the census testing, evaluations, and experiments. The scale and variety of census operations require an ongoing research program to achieve improvements in methodologies. Among the methods of interest are coverage measurement sampling and estimation, coverage measurement evaluation, evaluation of census operations, uses of administrative records in census operations, improvements in census processing, and analyses that aid in increasing census response.
Research Problems:
• How can methods making additional use of administrative records, such as model-assisted and balanced sampling, be used to increase the efficiency of household surveys?
• Can non-traditional design methods such as adaptive sampling be used to improve estimation for rare characteristics and populations?
• How can time series and spatial methods be used to improve ACS estimates or explain patterns in the data?
• Can generalized weighting methods be implemented via optimization procedures that allow better understanding of how the various steps relate to each other?
• Some unusual outlying responses in the surveys of retail trade and government employment are confirmed to be accurate, but can have an undesired large effect on the estimates, especially estimates of change. Procedures for detecting and addressing these influential values are being extended and examined through simulation to measure their effect on the estimates, and to determine how any such adjustment best conforms with the overall system of estimation (monthly and annual) and benchmarking.
• What models aid in assessing the combined effect of all the sources of estimable sampling and nonsampling error on the estimates of population size?
• How can administrative records improve census coverage measurement, and how can census coverage measurement data improve applications of administrative records?

• What analyses will inform the development of census communications to encourage census response?
• How should a national computer matching system for the Decennial Census be designed in order to find the best balance between the conflicting goals of maximizing the detection of true duplicates and minimizing coincidental matches? How does the balance between these goals shift when modifying the system for use in other applications?
• What can we say about the additional information that could have been obtained if deleted census persons and housing units had been part of the Census Coverage Measurement (CCM) Survey?
Potential Applications:
• Improve estimates and reduce costs for household surveys via the introduction of additional design and estimation procedures.
• Produce improved ACS small area estimates through the use of time series and spatial methods.
• Apply the same weighting software to various surveys.
• New procedures for identifying and addressing influential values in the monthly trade surveys could provide statistical support for making changes to weights or reported values that produce more accurate estimates of month-to-month change and monthly level. The same is true for influential values in surveys of government employment.
• Provide a synthesis of the effect of nonsampling errors on estimates of net census coverage error, erroneous enumerations, and omissions, and identify the types of nonsampling errors that have the greatest effects.
• Describe the uncertainty in estimates of foreign-born immigration based on the American Community Survey (ACS) used by Demographic Analysis (DA) and the Postcensal Estimates Program (PEP) to form estimates of population size.
• Improve the estimates of census coverage error.
• Improve the mail response rate in censuses and thereby reduce the cost.
• Help reduce census errors by aiding in the detection and removal of census duplicates.
• Provide information useful for the evaluation of census quality.
• Provide a computer matching system that can be used, with appropriate modifications, for both the Decennial Census and several Decennial-related evaluations.

A. Survey Productivity and Cost Analysis
Description: The Survey Productivity and Cost Analysis (SPCA) Group has been established as a cross-directorate analytic team to conduct methodological research toward the goal of continuous improvement in survey operational efficiency. The group will both initiate and respond to issues related to survey performance indicators, including cost, data quality, and data collection progress, as they relate to survey design. Our Center is represented on this team along with other staff from


the Research and Methodology Directorate, the Demographic Programs Directorate, the Decennial Directorate, the Center for Economic Studies (CES), the Field Directorate, and the Center for Survey Measurement (CSM).
Highlights: During Q3 of FY 2013, members of the SPCA group wrote a draft paper entitled “An Analysis of Adaptive Sampling Procedures in the National Health Interview Survey.”
Staff: Chandra Erdman (x31235)

B. Household Survey Design and Estimation
[See Project 5385260, Decennial Directorate – American Community Survey (ACS)]

C. Sampling and Estimation Methodology: Economic Surveys
Description: The Economic Directorate of the Census Bureau encounters a number of issues in sampling and estimation in which changes might increase the accuracy or efficiency of the survey estimates. These include estimates of low-valued exports not currently reported, alternative estimation for the Quarterly Financial Report, and procedures to address nonresponse and reduce respondent burden in the surveys. Further, general simulation software might be created and structured to consolidate various individual research efforts. An observation is considered influential if the estimate of total monthly revenue is dominated by its weighted contribution. The goal of the research is to find methodology that uses the observation, but in a manner that ensures its contribution does not dominate the estimated total or the estimates of period-to-period change.
Highlights: In Q3 of FY 2013, staff continued collaborating with a team in the Economic Directorate on research to find methodology for detecting and treating influential values in economic surveys. Recent research has shown M-estimation to be suitable for application in the Census Bureau’s economic surveys. However, the algorithm for implementing the method requires setting initial values for several parameters, which affect the performance of the algorithm. The team conducted empirical analysis with 25 consecutive months of data from the Monthly Wholesale Trade Survey to identify the best way of setting the initial tuning constant, the most important parameter. The challenge has been to find a method flexible enough for a large number of industries in the monthly economic surveys, some of which are volatile while others are not; seasonal effects are also present in many industries. The M-estimation method has proven flexible enough to lend itself to a strategy for setting the parameters that is systematic and conservative. Staff started documenting the method for setting parameters so that it can be sent for internal review and ultimately side-by-side testing.
Staff: Mary Mulry (x31759)
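For intuition about the tuning constant discussed above, the following sketch computes a generic Huber-type M-estimate of level by iteratively reweighted least squares on simulated data with one influential value. It illustrates M-estimation in general, not the team's survey-weighted procedure:

```python
# Generic Huber M-estimation of a location parameter via iteratively
# reweighted least squares. The tuning constant c controls how aggressively
# large observations are downweighted. Illustration only.
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    mu = np.median(x)
    scale = 1.4826 * np.median(np.abs(x - mu))            # MAD scale estimate
    for _ in range(max_iter):
        u = (x - mu) / scale
        w = np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))  # Huber weights
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(2013)
x = np.append(rng.gamma(5.0, 10.0, 200), [2500.0])        # one influential value
print(f"mean {x.mean():.1f} vs Huber M-estimate {huber_location(x):.1f}")
# A smaller c downweights the outlier more; a larger c approaches the mean.
# That trade-off is what choosing the initial tuning constant is about.
```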

D. The Ranking Project: Methodology Development and Evaluation
Description: This project undertakes research into the development and evaluation of statistical procedures for using sample survey data to rank several populations with respect to a characteristic of interest. The research includes an investigation of methods for quantifying and presenting the uncertainty in an estimated ranking of populations. As an example, a series of ranking tables are released from the American Community Survey in which the fifty states and the District of Columbia are ordered based on estimates of certain characteristics of interest.
Highlights: In Q3 of FY 2013, most work focused on preparation for an upcoming talk at the International Statistical Institute meeting. Staff discussed the topic of visualizing rankings with a SUMMER AT CENSUS scholar.
Staff: Tommy Wright (x31702), Martin Klein, Jerzy Wieczorek, Derrick Simmons

E. Statistical Design for 2020 Planning, Experimentation, and Evaluations
Description: The purpose of this project is to investigate the use of social network methodology, tools, and software in the planning, preparation, and implementation of the decennial census, while reducing costs and maintaining quality.
Highlights: No significant progress this quarter.
Staff: Taniecea Arceneaux (x33440)

F. Sampling and Apportionment
Description: This short-term effort demonstrated the equivalence of two well-known problems: the optimal allocation of a fixed overall sample size among L strata under stratified random sampling, and the optimal allocation of the H = 435 seats among the 50 states for the apportionment of the U.S. House of Representatives following each decennial census.
Highlights: In Q3 of FY 2013, work continued on the optimal allocation of samples given cost constraints.
Staff: Tommy Wright (x31702), Pat Hunley
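The equivalence rests on the fact that exact Neyman allocation can be computed with the same equal-proportions (Huntington-Hill) priority-value algorithm used for House apportionment, with the stratum measure N_h * S_h playing the role of a state's population. A sketch under that reading, with illustrative figures only, follows:

```python
# Equal-proportions (Huntington-Hill) allocation: give each stratum one unit,
# then award remaining units by the priority values N_h * S_h / sqrt(k(k+1)).
# With state populations in place of N_h * S_h, this is exactly the House
# apportionment algorithm. Figures are illustrative.
import heapq
from math import sqrt

def equal_proportions(measures, total):
    """measures: dict of stratum -> N_h * S_h; total: overall sample size n."""
    alloc = {h: 1 for h in measures}               # minimum of 1 per stratum
    heap = [(-m / sqrt(1 * 2), h) for h, m in measures.items()]
    heapq.heapify(heap)
    for _ in range(total - len(measures)):
        _, h = heapq.heappop(heap)                 # stratum with top priority
        alloc[h] += 1
        k = alloc[h]
        heapq.heappush(heap, (-measures[h] / sqrt(k * (k + 1)), h))
    return alloc

# Three strata with N_h * S_h "sizes"; allocate n = 100 sample units.
print(equal_proportions({"A": 5000.0, "B": 2000.0, "C": 800.0}, 100))
```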


G. Interviewer-Respondent Interactions: Gaining Cooperation
Description: Survey nonresponse rates have been increasing, leading to concerns about the accuracy of (demographic) sample survey estimates. For example, from 1990 to 2004, initial contact nonresponse rates approximately doubled for selected household sample surveys, including the Current Population Survey (CPS) (from 5.7 percent to 10.1 percent). While mailout/mailback is a relatively inexpensive data collection methodology, decreases in mailback rates to censuses and sample surveys mean increased use of methodologies that bring respondents into direct contact with Census Bureau interviewers (e.g., field representatives) using CATI (computer assisted telephone interviewing) or CAPI (computer assisted personal interviewing). CAPI can include face-to-face or telephone contact. Unsuccessful interviewer-respondent interactions can lead to increased costs due to the need for additional follow-up and can also decrease data quality, so they should be minimized. This project will analyze data from 512 field representatives (interviewers) as part of an exploratory study, examining their beliefs regarding what works in gaining respondents’ cooperation and investigating associations with field representatives’ performance in terms of completed interview rates. We will also study associations between field representatives’ beliefs and what they say they do.
Highlights: In Q3 of FY 2013, some limited analyses were performed to detect differences in responses among five different questionnaire types and to detect differences among the twelve regional offices. In general, few significant differences were found, and we will likely combine data for future analyses.
Staff: Tommy Wright (x31702), Tom Petkunas

Statistical Computing and Software
Motivation: Modern statistics and computing go hand in hand, and new statistical methods need to be implemented in software to be broadly adopted. The focus of this research area is to develop general purpose software, using sound statistical methods, that can be used in a variety of Census Bureau applications. These application areas include: survey processing (editing, imputation, non-response adjustment, calibration, and estimation); record linkage; disclosure methods; time series and seasonal adjustment; variance estimation; small-area estimation; and data visualization, exploratory data analysis, and graphics. See the other sections in this document for more detail on some of these topics.
Research Problems:
• Investigate the current best and new statistical methods for each application.
• Investigate alternative algorithms for statistical methods.
• Determine how best to implement the statistical algorithms in software.

Potential Applications:
• Anywhere in the Census Bureau where statistical software is used.

A. R Users Group
Description: The initial objective of the R Users Group is to identify the areas of the Census Bureau where R software is developed and the other areas that could benefit from such development. The scope of topics is broad and includes estimation, missing data methods, statistical modeling, and Monte Carlo and resampling methods. The ultimate goal is to move toward integrated R tools for statistical functionality at the Census Bureau. Initially the group will review basic skills in R and provide remedial instruction as needed. The first topic for deeper investigation is complex-survey infrastructure utilities, in particular an evaluation of the “survey” package and its relevance at the Census Bureau in the context of weighting, replication, variance estimation, and other structural issues.
Highlights: No significant progress this quarter.
Staff: Yves Thibaudeau (x31706), Chad Russell

B. Web Scraping Feasibility Investigation
Description: The goal of this project is to investigate the feasibility of developing and implementing a Web scraping tool. This tool would collect publicly available information posted by businesses. Knowledge of this auxiliary information may be useful in improving estimates with economic data.
Highlights: No significant progress this quarter.
Staff: Chris Blakely (x31722)
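A feasibility probe could start as small as the sketch below, which fetches one page and extracts text from a chosen tag. The URL and markup are placeholders, not a real data source; a real tool would also need robots.txt compliance, rate limiting, and site-specific parsing:

```python
# Minimal web-scraping probe: fetch a page and pull text out of a chosen
# HTML tag. URL and tag structure are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/store-locations"      # hypothetical page
response = requests.get(url, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
for item in soup.find_all("li", class_="location"):  # hypothetical markup
    print(item.get_text(strip=True))
```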


Time Series and Seasonal Adjustment
Motivation: Seasonal adjustment is vital to the effective presentation of data collected from monthly and quarterly economic surveys by the Census Bureau and by other statistical agencies around the world. As the developer of the X-12-ARIMA Seasonal Adjustment Program, which has become a world standard, it is important for the Census Bureau to maintain an ongoing program of research related to seasonal adjustment methods and diagnostics, in order to keep X-12-ARIMA up-to-date and to improve how seasonal adjustment is done at the Census Bureau.
Research Problems:
• All contemporary seasonal adjustment programs of interest depend heavily on time series models for trading day and calendar effect estimation, for modeling abrupt changes in the trend, for providing required forecasts, and, in some cases, for the seasonal adjustment calculations. Better methods are needed for automatic model selection, for detection of inadequate models, and for assessing the uncertainty in modeling results due to model selection, outlier identification, and non-normality. Also, new models are needed for complex holiday and calendar effects.
• Better diagnostics and measures of estimation and adjustment quality are needed, especially for model-based seasonal adjustment.
• For the seasonal, trading day, and holiday adjustment of short time series (series of length five years or less), more research into the properties of methods usually used for longer series, and perhaps into new methods, is needed.
Potential Applications:
• The effective presentation of data collected from monthly and quarterly economic surveys by the Census Bureau and by other statistical agencies around the world.

A. Seasonal Adjustment
Description: This research is concerned with improvements to the general understanding of seasonal adjustment and signal extraction, with the goal of maintaining, expanding, and nurturing expertise in this topic at the Census Bureau.
Highlights: In Q3 of FY 2013, staff (a) continued implementation of multivariate seasonal adjustment software, refining the R code to produce mean squared errors, and tested the method on ten retail series; (b) continued work on signal extraction revisions minimization, conducting numerous synthetic and empirical studies to assess the model fitting capabilities; (c) continued revising two papers on the direct filter approach to signal extraction and seasonal adjustment; and (d) made revisions to a paper on seasonal adjustment for mixed frequency time series.
Staff: Tucker McElroy (x33227)
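X-13ARIMA-SEATS is the production tool for this work. Purely to illustrate the trend/seasonal/irregular split that seasonal adjustment targets, the sketch below applies a classical moving-average decomposition (from recent versions of statsmodels) to a simulated monthly series; it is not the multivariate or direct-filter methods described above:

```python
# Toy seasonal adjustment: classical moving-average decomposition of a
# simulated monthly series, then subtraction of the seasonal component.
# Production adjustment at the Census Bureau uses X-13ARIMA-SEATS.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(2013)
idx = pd.date_range("2005-01-31", periods=96, freq="M")
trend = np.linspace(100, 140, idx.size)
seasonal = 10 * np.sin(2 * np.pi * np.arange(idx.size) / 12)
series = pd.Series(trend + seasonal + rng.normal(0, 2, idx.size), index=idx)

result = seasonal_decompose(series, model="additive", period=12)
adjusted = series - result.seasonal      # seasonally adjusted series
print(adjusted.head())
```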

B. Time Series Analysis
Description: This research is concerned with broad contributions to the theory and understanding of discrete and continuous time series, for univariate or multivariate series. The goal is to maintain and expand expertise in this topic at the Census Bureau.
Highlights: In Q3 of FY 2013, staff (a) continued work on stable parametrizations of vector autoregressive time series models and performed numerical studies of performance; (b) derived recurrence relations among autocovariances and cross-covariances of data and volatility series for GARCH processes; and (c) continued numerical studies and revisions to a paper on fitting vector autoregressive models with constrained parameters.
Staff: Tucker McElroy (x33227), David Findley (Consultant), Anindya Roy

Experimentation, Simulation, and Modeling
Motivation: Experiments at the Census Bureau are used to answer many research questions, especially those related to testing, evaluating, and advancing survey methods. A properly designed experiment provides a valid, cost-effective framework that ensures the right type of data is collected and that sufficient sample sizes and power are attained to address the questions of interest. The use of valid statistical models is vital both to the analysis of results from designed experiments and to characterizing relationships between variables in the vast data sources available to the Census Bureau. Statistical modeling is an essential component for wisely integrating data from previous sources (e.g., censuses, sample surveys, and administrative records) in order to maximize the information that they can provide. Monte Carlo simulation techniques aid in the design of complicated experiments as well as the evaluation of complex statistical models.
Research Problems:
• Develop models for the analysis of measurement errors in demographic sample surveys (e.g., the Current Population Survey or the Survey of Income and Program Participation).
• Develop methods for designed experiments embedded in sample surveys. Simulation studies can provide further insight into (as well as validate) any proposed methods.
• Assess the feasibility of established design methods (e.g., factorial designs) in Census Bureau experimental tests.
• Identify and develop statistical models (e.g., loglinear models, mixture models, and mixed-effects models) to characterize relationships between variables measured in censuses, sample surveys, and administrative records.
• Assess the applicability of post hoc methods (e.g., multiple comparisons and tolerance intervals) with future designed experiments and when reviewing previous data analyses.
Potential Applications:
• Modeling approaches with administrative records can help enhance the information obtained from various sample surveys.
• Experimental design can help guide and validate testing procedures proposed for the 2020 Census.
• Expanding the collection of experimental design procedures currently utilized with the ACS.


A. Synthetic Survey and Processing Experiments
Description: To improve operational efficiencies and reduce the costs of survey processing, this project will simulate a survey, in which an artificial team of interviewers seeks out an artificial set of respondents, to test alternative methods of allocating resources in the field and to test alternatives for the post-processing of the gathered survey data. When calibrated with survey paradata, the model may also serve as a test bed for new methods of missing data imputation.
Highlights: No significant progress this quarter.
Staff: Ben Klemens (x36864)

B. Improved Nonparametric Tolerance Intervals
Description: Nonparametric tolerance intervals can be used for a set of univariate data where no reasonable distributional assumption is made. In the nonparametric setup, tolerance intervals are typically constructed from the order statistics of an independent and identically distributed sample. However, two primary issues with this approach are that (i) the tolerance interval is typically conservative, thus resulting in wider intervals, and (ii) for a fixed sample size, order statistics satisfying the conditions of a tolerance interval may not exist. Interpolation and extrapolation procedures are proposed to handle these issues. For planning purposes and cost evaluations, various projects conducting test surveys (e.g., the American Community Survey) could benefit from calculating these improved nonparametric tolerance intervals for projecting statistical bounds on various characteristics measured by the survey (e.g., household income).
Highlights: No significant progress this quarter.
Staff: Derek Young (x36347), Thomas Mathew
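The conservatism and existence issues noted above can be seen directly in the textbook order-statistic construction, sketched here without the interpolation and extrapolation refinements under study:

```python
# Two-sided nonparametric tolerance interval from order statistics: the
# confidence that (x_(r), x_(n-r+1)) covers at least a proportion p of the
# population equals binom.cdf(n - 2r, n, p). Textbook construction only.
import numpy as np
from scipy.stats import binom

def np_tolerance_interval(x, p=0.90, conf=0.95):
    x = np.sort(np.asarray(x))
    n = x.size
    # Largest symmetric trimming r whose achieved confidence still meets conf
    for r in range(n // 2, 0, -1):
        achieved = binom.cdf(n - 2 * r, n, p)
        if achieved >= conf:
            return x[r - 1], x[n - r], achieved
    return None   # no order statistics satisfy the requirement: issue (ii)

rng = np.random.default_rng(2013)
sample = rng.lognormal(11, 0.5, 300)   # e.g., household-income-like data
print(np_tolerance_interval(sample, p=0.90, conf=0.95))
# The achieved confidence typically exceeds the nominal level, which is the
# conservatism in issue (i) that interpolation procedures aim to reduce.
```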

C. Ratio Edits Based on Statistical Tolerance Intervals
Description: Ratio edit tolerances are bounds used for identifying errors in the data obtained by Economic Census programs so that they can be flagged for further review. The tolerances represent upper and lower bounds on the ratio of two highly correlated items, and the bounds are used for outlier detection, i.e., to identify units that are inconsistent with the rest of the data. A number of outlier detection methods are available in the literature and can be used for developing ratio edit tolerances; however, statistical tolerance intervals have not been employed for this purpose. This project is focused on the application of statistical tolerance intervals for setting ratio edit tolerances.
Highlights: In Q3 of FY 2013, staff developed an approach to properly apply statistical tolerance intervals when setting ratio edit tolerances. Staff focused on the setting of normal-based tolerance intervals when errors are believed to be in both tails of the ratio distribution, and Weibull-based tolerance intervals when the errors are believed to be in only one tail. Staff also applied the approach to data from the Annual Survey of Manufactures. A manuscript has been prepared and is currently undergoing internal review. This research will also be presented at the Joint Statistical Meetings in August 2013.
Staff: Derek Young (x36347), Thomas Mathew
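For the two-sided normal case, a common starting point is Howe's approximation to the tolerance factor k. The sketch below turns that factor into illustrative ratio edit bounds on simulated log-ratios; it is a generic construction, not the approach in the staff's manuscript:

```python
# Two-sided normal tolerance interval via Howe's (1969) approximate factor:
# k = z_{(1+p)/2} * sqrt((n - 1) * (1 + 1/n) / chi2_{1-conf, n-1}).
# Applied to log-ratios to obtain illustrative ratio edit bounds.
import numpy as np
from scipy.stats import norm, chi2

def howe_k(n, p=0.99, conf=0.95):
    z = norm.ppf((1 + p) / 2)
    return z * np.sqrt((n - 1) * (1 + 1 / n) / chi2.ppf(1 - conf, n - 1))

rng = np.random.default_rng(2013)
log_ratio = rng.normal(0.05, 0.20, 250)    # simulated log(item1 / item2)
k = howe_k(log_ratio.size)
mean, sd = log_ratio.mean(), log_ratio.std(ddof=1)
lo, hi = mean - k * sd, mean + k * sd
print(f"ratio edit bounds: ({np.exp(lo):.3f}, {np.exp(hi):.3f})")
# Reported ratios outside these bounds would be flagged for analyst review.
```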


D. Data Visualization Study Group
Description: This group meets to keep up to date with data visualization classes and events around the Census Bureau, to give each other feedback on ongoing projects, and to share advice on navigating the approvals process for visualization products that do not fall under standard classifications such as report, poster, etc.
Highlights: In Q3 of FY 2013, staff continued disseminating lessons from visualization seminars and courses, including lectures by Hadley Wickham in April and Alberto Cairo in May. We used the Center for Applied Technology (CAT) Lab for a demonstration of the simplified map-making/visualization tool Esri Maps for Office and shared our experiences with related alternatives (Esri StoryMaps, JMP, etc.) that are approved for Census Bureau staff use.
Staff: Jerzy Wieczorek (x32248), Tiffany Julian (SEHSD), Tom Petkunas, Chandra Erdman

Summer at Census
Description: Recognized scholars in the following and related fields applicable to censuses and large-scale sample surveys are invited for short-term visits (one to ten days), primarily between May and September: statistics, survey methodology, demography, economics, geography, social and behavioral sciences, and computer science. Scholars present a seminar based on their research and engage in collaborative research with Census Bureau researchers and staff. Scholars are identified through an annual Census Bureau-wide solicitation by the Center for Statistical Research and Methodology.
Highlights: In Q3 of FY 2013, we worked with Census Bureau staff members to identify and host over twenty-five 2013 SUMMER AT CENSUS scholars on a variety of topics. Some are listed in Section 5 of this report.
Staff: Tommy Wright (x31702), Michael Leibert

Research Support and Assistance
This staff provides substantive support in the conduct of research, research assistance, technical assistance, and secretarial support for the various research efforts.
Staff: Alisha Armas, Erica Magruder, Kelly Taylor


3. PUBLICATIONS

3.1 JOURNAL ARTICLES, PUBLICATIONS

Klein, M. and Sinha, B. (In Press). “Statistical Analysis of Noise Multiplied Data Using Multiple Imputation,” Journal of Official Statistics.

Klein, M. and Linton, P. (In Press). “On a Comparison of Tests of Homogeneity of Binomial Proportions,” Journal of Statistical Theory and Applications.

Lorenc, B., Loosveldt, G., Mulry, M. H., and Wright, D. (2013). “Understanding and Improving the External Survey Environment of Official Statistics,” Survey Methods: Insights from the Field, Swiss Foundation for Research in Social Sciences, Lausanne, Switzerland. Retrieved from <http://surveyinsights.org/?p=161>.

Mathew, T. and Young, D. (2013). “Fiducial-Based Tolerance Intervals for Some Discrete Distributions,” Computational Statistics and Data Analysis, 61, 38-49.

3.2 BOOKS/BOOK CHAPTERS

3.3 PROCEEDINGS PAPERS

World Congress of Statistics, International Statistical Institute, Hong Kong, August 25 - August 31, 2013.
• Monsell, B. C. and Blakely, C. D., “X-13ARIMA-SEATS and iMetrica.”
• Wright, T., Klein, M., and Wieczorek, J., “An Overview of Some Concepts for Potential Use in Ranking Populations Based on Sample Survey Data.”

3.4 CENTER FOR STATISTICAL RESEARCH & METHODOLOGY RESEARCH REPORTS
<http://www.census.gov/srd/www/byyear.html>

3.5 OTHER REPORTS


4. TALKS AND PRESENTATIONS

2013 International Total Survey Error Workshop, Ames, IA, June 2-4, 2013.
• Mulry, Mary H., Elizabeth M. Nichols, Jennifer Hunter Childs, and Parvati Krishnamurty, “Evaluating Recall Error in Survey Reports of Move Dates through a Comparison with Records in a Commercial Database.” Invited.

Washington Statistical Society, Rockville, MD, June 4, 2013.

• Winkler, William E., “Background and Research in Methods for Adjusting Statistical Analyses for Record Linkage Error.”

2013 Joint Conference by the International Chinese Statistical Association (ICSA) and the International Society for Biopharmaceutical Statistics (ISBS), Bethesda, MD, June 9-12, 2013.

• Klein, Martin, “Imputation for Nonmonotone Nonresponse in the Survey of Industrial Research and Development.”


5. CENTER FOR STATISTICAL RESEARCH AND METHODOLOGY SEMINAR SERIES

Roderick Little, University of Michigan & U.S. Census Bureau, “Partially-Missing at Random and Ignorability for Inferences about Subsets of Parameters with Missing Data,” April 24, 2013.
Tommy Wright, CSRM, U.S. Census Bureau, “The Equivalence of Neyman Optimum Allocation for Sampling and Equal Proportions for Apportioning the U.S. House of Representatives,” April 30, 2013.
Patrick Zimmerman (U.S. Census Bureau Dissertation Fellow), University of Minnesota, “Finite Population Sampling and Multiple Stratifications,” May 21, 2013.
Paul Ohm, University of Colorado Law School/Federal Trade Commission, SUMMER AT CENSUS, “How Law and Policy Have Responded (and Should Yet Respond) to the Failure of Anonymization,” May 30, 2013.
Rebecca Steorts, Carnegie Mellon University, SUMMER AT CENSUS, “Will the Real Steve Fienberg Please Stand Up: Getting to Know a Population From Multiple Incomplete Files,” June 4, 2013.
Tucker McElroy, CSRM, U.S. Census Bureau, “A Multivariate Seasonal Adjustment of Regional Housing Starts,” June 4, 2013.
Xiaofeng Shao, University of Illinois at Urbana-Champaign, SUMMER AT CENSUS, “Self-normalization,” June 12, 2013.
Barry Graubard, National Cancer Institute, SUMMER AT CENSUS, “Conditional Logistic Regression with Survey Data,” June 13, 2013.
Andrew Gelman, Columbia University, SUMMER AT CENSUS, “Choices in Statistical Graphics: My Stories,” June 13, 2013.
Andrew Gelman, Columbia University, SUMMER AT CENSUS, “Weakly Informative Priors,” June 14, 2013.
Omer Ozturk, The Ohio State University, SUMMER AT CENSUS, “Estimation of Population Mean and Total in a Finite Population Setting Using Multiple Auxiliary Variables,” June 18, 2013.
Zhiqiang Tan, Rutgers University, SUMMER AT CENSUS, “Improved Shrinkage Estimation with Applications,” June 20, 2013.
Glen Meeden, University of Minnesota, SUMMER AT CENSUS, “Objective Stepwise Bayes Weights in Survey Sampling,” June 25, 2013.
Yong Ming Jeffrey Woo (U.S. Census Bureau Dissertation Fellow), The Pennsylvania State University, “Optimization and Statistical Estimation for the Post Randomization Method,” June 26, 2013.


6. PERSONNEL ITEMS

6.1 HONORS/AWARDS/SPECIAL RECOGNITION

6.2 SIGNIFICANT SERVICE TO PROFESSION

Taniecea Arceneaux
• Refereed a paper for Social Science Computer Review.

Martin Klein
• Refereed a paper for Current Bioinformatics.
• Member, Ph.D. Dissertation in Statistics Committee, University of Maryland, Baltimore County.

Mary H. Mulry
• Vice President, American Statistical Association.
• Associate Editor, Journal of Official Statistics.
• Participant, NAS Experts Meeting to review NASS methodology.

Eric Slud
• Associate Editor, Journal of the Royal Statistical Society, Series B.
• Associate Editor, Lifetime Data Analysis.
• Associate Editor, Journal of Survey Statistics and Methodology.
• Associate Editor, Biometrika.
• Member, Washington Statistical Society Morris Hansen Lecture Selection Committee.

William Winkler
• Associate Editor, Journal of Privacy and Confidentiality.
• Associate Editor, Transactions on Data Privacy.
• Member, group under the auspices of the Royal Academy advising the UK government on “Data Linkage and Anonymisation” in the social sciences.
• Member, Ph.D. Dissertation in Statistics Committee, Carnegie-Mellon University.

Tommy Wright
• Associate Editor, The American Statistician.
• Member, Advisory Board, Department of Mathematics and Statistics, Georgetown University.
• Participant, INGenIOuS Workshop (on The Nation’s Mathematical Sciences Workforce), National Science Foundation, Mathematical Association of America, American Statistical Association, American Mathematical Society, and Society for Industrial and Applied Mathematics.

Derek Young
• Refereed a paper for the Journal of Educational and Behavioral Statistics.

6.3 PERSONNEL NOTES
