Workshop Overview: Transparency and Inference for Big Data. Micah Altman, Director of Research, MIT Libraries. Prepared for Census-MIT Big Data Workshop Series, MIT, December 2015.

Workshop Overview: Transparency and Inference for Big Data

Micah Altman, Director of Research

MIT Libraries

Prepared for

Census-MIT Big Data Workshop Series

MIT, December 2015

Roadmap

Workshop series: Challenges of big data for official statistics

What to expect today and tomorrow

Big Data Challenges
  Acquisition
  Access
  Governance
  Protection
  Analysis

Credits & Disclaimers

DISCLAIMER

These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.

Secondary disclaimer:

“It’s tough to make predictions, especially about the future!”

-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.

Collaborators & Co-Conspirators

Workshop Series Organizers
  US Census: Cavan Capps, Ron Prevost
  MIT: Micah Altman

Workshop Co-Organizers (US Census)
  Peter Miller
  Benjamin Reist
  Michael Thieme

Research Support
  Supported by the U.S. Census Bureau

Related Work

Main Project: Census-MIT Big Data Workshop Series
  projects.informatics.mit.edu/bigdataworkshops

Related publications (reprints available from informatics.mit.edu):

Altman M, Capps C, Prevost R. Using New Forms of Information for Official Economic Statistics -- Examining the Commodity Flow Survey: Executive Summary from the 1st Workshop in the MIT Big Data Workshop Series. SSRN: Social Science Research Network [Internet]. Working Paper.

Altman, M., D. O’Brien, S. Vadhan, A. Wood. 2014. “Big Data Study: Request for Information.”

Altman M, Wood A, O'Brien D, Gasser M, Vadhan S. Towards a Modern Approach to Privacy-Aware Government Data Releases. Berkeley Journal of Technology Law. Forthcoming.

Altman M, McDonald MP. 2014. Public Participation GIS: The Case of Redistricting. Proceedings of the 47th Annual Hawaii International Conference on Systems Science.

Workshop Series: Big Data and Official Statistics

Trends and Challenges

Trends
  Increasingly data-driven economy
  Individuals are increasingly mobile
  Technology changes data uses
  Stakeholder expectations are changing
  Agency budgets and staffing remain flat

The next generation of official statistics
  Utilize broad sources of information
  Increase granularity, detail, and timeliness
  Reduce cost & burden
  Maintain confidentiality and security

Multi-disciplinary challenges: Computation, Statistics, Informatics, Social Science, Policy

Workshops and Outcomes

Acquisition Challenges
  Using New Forms of Information for Official Economic Statistics [August 3-4]

Privacy Challenges
  Location Confidentiality and Official Surveys [November 30-December 1]

Inference Challenges
  Transparency and Inference [December 7-8]

Expected outcomes:
  Workshop reports (September, January)
  Integrated white paper (February)
  Identifying new opportunities for statistical agencies
  Inform the Census Big Data Research Program

Themes from Workshop 1: Big Data Sources

Broad new sources of information have the potential to enhance official statistics
  increased granularity & detail
  increased timeliness
  reduced burdens

Incorporating big data creates challenges
  Acquisition challenges
  Management, confidentiality and governance challenges
  Analytic challenges

Incorporating big data into statistical agencies will require adaptation:
  Agencies will need to broaden from data collection to information provisioning.
  Agencies will require different sources of data to support different types of decisions.
  Agencies will need to develop more extensive relationships with business stakeholders.
  Agencies have the potential to take on new roles with respect to big data sources, as…
    standards leaders
    certification authorities
    clearinghouses
    infrastructure for durable, trusted access

Themes from Workshop 2: Big Data Privacy

Value of Census Reputation
  Reputation is a primary concern to Census
  Reputation affects willingness to participate and the cost of participation
  Reliability & transparency are needed for official statistics to serve their policy purpose
    To ensure accountability of process and programs
    To create a public data good, where results can be accepted across multiple sectors
    To support reliable inferences for a range of purposes

Consider data needs in terms of computations
  Sources of big data may not be willing to distribute data directly
  Sources of big data may not be able to distribute all data directly; data are typically internally distributed and reaggregated

Access through computation
  Custom / private APIs could provide the analytics needed
  Where privacy and security are challenges, Secure Multi-Party Computing methods could be used in place of trusted systems

Characterizing risks and harms
  Official statistics reflect an implicit harm/benefit balance, although it is not legally framed explicitly
  Need to move from binary measures (identification) to formal measures
  Census could be a leader -- many countries/industries/states use aggregation or suppression with no formal risk/harm characterization

What to Expect Today, Tomorrow, & Beyond

Workshop Schedule

Monday
  12:00 Lunch and Introductions
  1:00 Workshop Overview
  1:15 Overview of SIPP
  2:15 Overview of Census Needs for Reliable and Transparent Inference
  3:00 Coffee
  3:30 Preliminary Discussion of Workshop Questions
  4:00 Challenges in Extracting Information from Big Data
  4:45 Transparency Challenges
  5:15 Discussion & Provocations
  6:00 Transportation to Hotel
  7-10 Hosted Dinner

Tuesday
  8:30 Breakfast
  9:00 Recap / Review of Days
  9:15 Overview of Census Uses – Implications for Inference
  10:15 Discussion: Key Challenges and Opportunities
  11:30 Lunch
  1:00 Emerging Approaches to Using Big Data in Official Statistics
  2:00 Discussion: Potential approaches to reliable, transparent & reproducible inference with Big Data
  3:00 Coffee
  3:30 Synthesis and next steps
  4:30 Taxis leave for airport
  5:00 (Optional) Beer/snacks and informal chat for those staying over in Boston

Workshop Questions

What are the errors and biases in the collection, cleaning, editing, assembly, linking and other operations that affect Big Data utility?

How can bias, construct validity, and reliability be measured and evaluated?

What methods are most promising for discovering relationships that are substantively interesting, statistically reliable, and causally plausible?

What are methods for ensuring transparency and replicability with big data sources?

How do we detect dependencies among data sources?

How can the integrity and authenticity of official statistics be maintained when integrating big data from outside sources?

How should we assess the quality of Big Data information for different official statistics uses?

Use Cases

Survey of Income and Program Participation

Use cases may focus discussion – they should not limit discussion

What will be Shared

Chatham House Rule
  When a meeting, or part thereof, is held under the Chatham House Rule, participants are free to use the information received, but neither the identity nor the affiliation of the speakers, nor that of any other participant, may be revealed.
  Please do not name individuals or companies in social media, etc.

What’s Public
  Ideas/information shared (we will be taking notes and recording, but only for summary reports)
  Formal presentations
  Attendance & Participant List (unless opted-out)
  Attribution, when requested/verified (opt-in)

Future Outputs
  Draft summary report from workshop [December]
    Circulated to participants for comments
  Public Summary of Report [January]
    Including corrections and attribution where requested
  White Paper: Series Summary & Synthesis [February]
    To appear on project site

Suggested Readings

Reimsbach-Kounatze, C. (2015), “The Proliferation of ‘Big Data’ and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis”, OECD Digital Economy Papers, No. 245, OECD Publishing.

Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (14 March): 1203-1205. Copy at http://j.mp/1ii4ETo

Kreuter, Frauke, Marcus Berg, Paul Biemer, Paul Decker, Cliff Lampe, Julia Lane, Cathy O'Neil, and Abe Usher. AAPOR Report on Big Data. No. 4eb9b798fd5b42a8b53a9249c7661dd8. Mathematica Policy Research, 2015.

NRC, 2013, Frontiers in Massive Data Analysis, National Academies Press.

Questions?

E-mail: [email protected]

Web: informatics.mit.edu

Creative Commons License

This work, by Micah Altman (http://redistricting.info), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.