The Interplay Between Research Innovation and Federal Statistical Practice


Post on 21-Jan-2016


Page 1: The Interplay Between Research Innovation and Federal Statistical Practice

Stephen E. Fienberg

Department of Statistics

Center for Automated Learning & Discovery

Carnegie Mellon University

Pittsburgh PA

Page 2: Some Themes

• Methodological innovation comes from many places and in different colors and flavors.

• Importance of cross-fertilization.

• Cycle of innovation.

• Examples: Bayesian inference, Log-linear models, Missing data and imputation, X-11-ARIMA.

Page 3: Some Themes (cont.)

• Statistical thinking for non-statistical problems:

– e.g., disclosure limitation.

• Putting several innovations to work simultaneously:

– New proposal for census adjustment.

• New challenges

Page 4: Methodological Innovation in Statistics

• Comes from many places and in different colors and flavors:

– universities:

• statisticians.

• other methodologists.

– government agencies.

– private industry.

– nonprofits, e.g., NISS.

– committees at the NRC.

Page 5: Role of Statistical Models

• “All models are wrong, but some models are useful.” John Tukey? (The aphorism is usually credited to George Box.)

• Censuses and surveys attempt to describe and measure stochastic phenomena:

– 1977 conference story.

– U.S. census coverage assessment:

• “Enumeration”.

• Accuracy and Coverage Evaluation (ACE) survey.

• Demographic Analysis.

Page 6: Importance of Cross-Fertilization

• Between government and academia.

• Among fields of application:

– e.g., agriculture, psychology, computer science

• Between methodological domains, e.g., sample surveys and experimental design: Jan van den Brakel, Ph.D. thesis, EUR

• Using other fields to change statistics, e.g., cognitive aspects of survey design: Krosnick and Silver; Conrad and Schober

Page 7: Cycle of Innovation

• Problem need

• Formal formulation and abstraction

– implementation on original problem

• Generalizations across domains

• General statistical theory

• Reapplication to problem and development of extensions

– reformulation

Page 8: Example 1: Bayesian Inference

• Laplace

– role of inverse probability.

– ratio-estimation for population estimation.

• Empirical Bayes and shrinkage (borrowing strength) - hierarchical models.

• NBC election night prediction model.

• Small area estimation.

Elliott and Little on census adjustment

Singh, Folsom, and Vaish
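The "borrowing strength" idea behind empirical Bayes shrinkage can be sketched in a few lines. This is only an illustration under simple assumptions that are not on the slide: every area estimate has the same known sampling variance, and the between-area variance is estimated by the method of moments. The function name `shrink` and the numbers are hypothetical.

```python
from statistics import mean, variance


def shrink(estimates, sampling_var):
    """Pull each direct estimate toward the grand mean (sampling_var must be > 0)."""
    grand = mean(estimates)
    # Method-of-moments estimate of the between-area variance, floored at zero.
    tau2 = max(0.0, variance(estimates) - sampling_var)
    # Shrinkage weight: the share of each estimate that moves to the grand mean.
    weight = sampling_var / (sampling_var + tau2)
    return [weight * grand + (1 - weight) * y for y in estimates]


# Hypothetical direct estimates for three small areas.
shrunk = shrink([10.0, 20.0, 30.0], sampling_var=25.0)
```

Noisier direct estimates (larger `sampling_var`) give a larger weight and therefore more shrinkage, which is the sense in which small areas borrow strength from the others.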

Page 9: Example 2: Log-linear Models

• Deming-Stephan iterative proportional fitting algorithm (raking).

• Log-linear models and MLEs:

– margins are minimal sufficient statistics.

– implications for statistical reporting.

• Some generalizations: generalized IPF, graphical models, exponential families.

• Model-based inference for raking.
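The raking idea named above can be sketched for a two-way table as follows. This is a minimal illustration, not a production routine: it assumes strictly positive seed counts and target margins with equal totals, and the function name `ipf` and the example numbers are hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rake a two-way table of positive seed counts to the target margins."""
    table = [[float(x) for x in row] for row in seed]
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target row total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [x * target / total for x in table[i]]
        # Step 2: scale each column so that it sums to its target column total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        worst = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if worst < tol:
            break
    return table


# Hypothetical sample counts raked to hypothetical population margins.
raked = ipf([[40, 30], [35, 45]], row_targets=[80, 70], col_targets=[60, 90])
```

Because each step only rescales whole rows or whole columns, the raked table keeps the odds ratio of the seed table, which is the link to the log-linear model view of raking.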

Page 10: Graphical & Decomposable Log-linear Models

• Graphical models defined in terms of simultaneous conditional independence relationships or absence of edges in graph.

• Decomposable models correspond to triangulated graphs.

Example: [Figure: graph with numbered vertices]

Page 11: Example 3: Missing Data and Imputation

• Missing data problems in surveys and censuses.

– solutions: cold deck and hot deck imputation

• Rubin 1974 framework representing missingness as random variable.

• 1983 CNSTAT Report, Madow et al. (3 vol.).

• Rubin (1997) - Bayesian approach of multiple imputation.

– NCHS and other agency implementations

Session on imputation and missing data
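Multiple imputation ends with a combining step that the slide does not spell out. A minimal sketch of the standard combining rules is given here, assuming each of the m completed data sets has already produced a point estimate and its variance; the function name `combine_imputations` and the numbers are hypothetical.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Combine m completed-data estimates into one estimate and total variance."""
    m = len(estimates)
    q_bar = mean(estimates)          # combined point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from m = 3 completed data sets.
q_bar, total_var = combine_imputations([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

The between-imputation term is what a single hot-deck or cold-deck fill cannot supply: it carries the extra uncertainty due to the values being missing.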

Page 12: Example 4: X-11-ARIMA

• BLS seasonal adjustment methods:

– X-11 and seasonal factor method (Shiskin et al., 1967)

• ARIMA methods a la Box-Jenkins (1970):

– Cleveland and Tiao (1976).

– X-11 + ARIMA = X-11/ARIMA (Dagum, 1978).

• Levitan Commission Report (1979).

• X-12-ARIMA.

Burck and Salama

Page 13: Statistical Thinking for Non-Statistical Problems

• Disclosure limitation was long thought of as a non-statistical problem.

• Duncan and Lambert (1986, 1989) and others explained its statistical basis.

• New methods for bounds for table entries as well as new perturbation methods are intimately linked to log-linear models.

NISS Disclosure Limitation Project - Karr and Sanil

Page 14: Query System Illustration (k=3)

Challenge: Scaling up approach for large k.

Page 15: Bounds for Entries in k-Way Contingency Table

• Explicit formulas for sharp bounds if margins correspond to cliques in decomposable graph (Dobra and Fienberg, 2000):

– Upper bound: minimum of relevant margins.

– Lower bound: maximum of zero and the sum of relevant margins minus separators.

• Example:

– cliques are {14}, {234}, {345}, {456}, {567}

– separators are {4}, {34}, {45}, {56}

[Figure: graph on vertices 1 to 7 with these cliques]
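The bound formulas can be tried on the smallest decomposable case with a non-empty separator. The sketch below assumes a three-way table for which only the {1,2} and {2,3} margins are released, so the cliques are {12} and {23} and the separator is {2}. The function name `bounds_two_cliques` and the counts are hypothetical.

```python
def bounds_two_cliques(n12, n23, i, j, k):
    """Bounds for cell (i, j, k) given margins n12[i][j] and n23[j][k]."""
    # Separator margin n_{+j+}, obtained by summing the {1,2} margin over i.
    sep = sum(row[j] for row in n12)
    upper = min(n12[i][j], n23[j][k])
    lower = max(0, n12[i][j] + n23[j][k] - sep)
    return lower, upper


# Hypothetical 2x2x2 table, used here only to build consistent margins.
table = [[[5, 1], [2, 7]], [[3, 4], [6, 2]]]
n12 = [[sum(table[i][j]) for j in range(2)] for i in range(2)]
n23 = [[table[0][j][k] + table[1][j][k] for k in range(2)] for j in range(2)]

lower, upper = bounds_two_cliques(n12, n23, 0, 0, 0)
```

A narrow interval between `lower` and `upper` signals that releasing these margins comes close to disclosing the cell itself.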

Page 16: More Bound Results

• Log-linear model related extensions for

– reducible graphical case.

– if all (k-1) dimensional margins fixed.

• General “tree-structured” algorithm. Dobra (2000)

– Applied to 16-way table in a recent test.

• Relationship between bounds results and methods for perturbing tables.

• Measures of risk and their role.

Page 17: NISS Table Server: 8-Way Table

Page 18: Census Estimation

• 2000 census adjustment model uses variant of dual systems estimation, combining census and ACE data, to correct for

– omissions.

– erroneous enumerations (subtracted out before DSE is applied).

• DSE is based on log-linear model of independence.

UK and US papers on DSE adjustment methods

Page 19: Dual Systems Setup

                    Sample
                In      Out       Total

Census  In      a       b         n1

        Out     c       d?        N - n1

        Total   n2      N - n2    N?

$\hat{N} = n_1 n_2 / a$
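The dual systems estimator from the setup above can be written out directly. This sketch uses the slide's symbols (a, n1, n2) and assumes independence of the two lists. The function name `dual_system_estimate` and the counts are hypothetical.

```python
def dual_system_estimate(a, n1, n2):
    """Return (N_hat, d_hat) for a = in both lists, n1 = census, n2 = sample."""
    if a <= 0:
        raise ValueError("need at least one case matched in both lists")
    n_hat = n1 * n2 / a
    b = n1 - a           # in census, missed by sample
    c = n2 - a           # in sample, missed by census
    d_hat = b * c / a    # estimated count missed by both lists
    return n_hat, d_hat


# Hypothetical block: 100 census records, 90 sample records, 80 matched.
n_hat, d_hat = dual_system_estimate(a=80, n1=100, n2=90)
```

The two outputs are consistent by construction: a + b + c + d_hat equals N_hat, since (a + b)(a + c) / a = n1 n2 / a.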

Page 20: Putting Several Innovations to Work Simultaneously

• Third system for each block based on administrative records - StARS/AREX

• Log-linear models allow us to model dependence between census and ACE:

– in all K blocks have census and AREX data.

– in sample of k blocks also have ACE data.

– missing data on ACE in remaining K-k blocks.

Page 21: Triple System Approach

• Example of small minority stratum from 1988 St. Louis census dress rehearsal.

                Administrative List
                P               A
                Sample          Sample
                P       A       P       A

Census  P       58      12      69      41

        A       11      43      34      -

n = 268
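One way to see how three lists yield an estimate of the unobserved cell is the log-linear model with no three-factor interaction, which gives a closed-form estimate for a 2x2x2 table with one missing cell. That model choice is an assumption made here for illustration; the slide does not say which model was fitted. The cell labels below read (census, administrative list, sample), and the function name `missing_cell_no_3way` is hypothetical.

```python
def missing_cell_no_3way(n):
    """Estimate the 'AAA' cell assuming no three-factor interaction."""
    # Cells missed by exactly two lists go in the numerator together with
    # the cell caught by all three; cells missed by one list go below.
    numerator = n["PPP"] * n["PAA"] * n["APA"] * n["AAP"]
    denominator = n["PPA"] * n["PAP"] * n["APP"]
    return numerator / denominator


# Observed counts from the table above (P = first category, A = second).
counts = {"PPP": 58, "PPA": 12, "PAP": 69, "PAA": 41,
          "APP": 11, "APA": 43, "AAP": 34}

missed_by_all = missing_cell_no_3way(counts)
population_estimate = sum(counts.values()) + missed_by_all
```

The formula depends only on how many lists missed each cell, so reading the administrative-list and sample headings in the other order gives the same estimate.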

Page 22: New Bayesian Model

• Bayesian triple-systems model:

– correction for EEs in census and administrative records.

– missing ACE data in K-k blocks.

– dependencies (heterogeneity) among lists.

– smoothing across blocks and/or strata using something like logistic regression model.

• Revised block estimates to borrow strength across blocks and from new source.

Page 23: New Challenges

Page 24: Challenges of Multiple-Media Survey Data

• Example: Wright-Patterson A.F. Base Sample Survey

– Demographic data PLUS three 3-dimensional full-body laser surface scans per respondent.

– Interest in “functionals” measured from scans.

• Ultimately will have 50 MB of data for each of 10,000 individuals in U.S. and The Netherlands/Italy.

Page 25: Some Methodological Issues With Image Data

• Data storage and retrieval.

• Data reduction, coding, and functionals:

– design vs. model-based inference.

• Disclosure limitation methodology.

Page 26: Standing Statistics On Its Head