Making the Case for Metadata at SRS-NSF

National Science Foundation

Division of Science Resources Statistics

Jeri Mulrow, Geetha Srinivasarao, and John Gawalt

FedCASIC Workshops, BLS

March 17, 2010

www.nsf.gov/statistics/

1984

1,984

1

1 9

1 9 8

1 9 8 4

Today’s Talk

• A bit about SRS

• Historical perspective of data and metadata dissemination

• Metadata users and their metadata needs

• Standardization efforts

• Challenges and future vision

A bit about the Division of Science Resources Statistics (SRS)

• Federal statistical agency within NSF

• 11 periodic data collections on the U.S. science and engineering enterprise

• Data dating back to the 1950s

Historical Perspective of SRS data and metadata dissemination

• 1950s – early 1990s: paper only

• Detailed statistical tables with minimum metadata as footnotes

• Publications included:
    • Highlights about the survey
    • Scope and method of survey
    • Questionnaire
    • Cover letters

Example -- 1950s publication

1990s through 2000s

• 1992 – electronic format

• Detailed statistical tables in spreadsheets with minimum metadata as footnotes

• Kept paper, added electronic text: survey methodology, limitations to the data, definitions, historical revisions, list of tables

• PDF added: questionnaire, cover letters, instructions

Example -- 1993 PDF

Example – 1991 Electronic spreadsheet

Example – 1991 text

Today

• Source data tables in Excel with footnotes

• HTML / PDF
    • Highlights of the survey
    • Links to references
    • Survey description

• PDF
    • Survey questionnaire
    • Instructions
    • Definitions

Example – 2007 Excel spreadsheet

Example -- 2007 SIRD1

Example – 2007 HTML

Example – 2007 PDF

BUT THAT’S NOT ALL

• Electronic databases: create and download your own customized aggregate tables

• Public use files: access to some microdata series

Metadata in WebCASPAR ….

Metadata in WebCASPAR

• Variable-specific metadata available under the Info link

• Metadata not tightly integrated with the data itself – does not get downloaded with the data
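One way to picture the gap described above is a download that carries its variable-level metadata along with the values. The following is only a minimal sketch and does not describe how WebCASPAR works; the variable names, labels, units, sample values, and the JSON layout are all hypothetical.

```python
import csv
import io
import json

# Hypothetical variable-level metadata; names, labels, and units are illustrative only.
VARIABLE_METADATA = {
    "year": {"label": "Survey cycle year", "type": "integer"},
    "rd_expenditure": {
        "label": "R&D expenditures",
        "type": "number",
        "unit": "thousands of current dollars",
    },
}


def bundle_download(rows, metadata):
    """Return one JSON document holding both the data (as CSV text) and its metadata."""
    fieldnames = list(metadata)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return json.dumps({"metadata": metadata, "data_csv": buffer.getvalue()}, indent=2)


# Made-up sample rows, not real survey values.
rows = [
    {"year": 2006, "rd_expenditure": 1984},
    {"year": 2007, "rd_expenditure": 2010},
]
package = json.loads(bundle_download(rows, VARIABLE_METADATA))
print(package["metadata"]["rd_expenditure"]["unit"])
```

With the metadata in the same package, a bare value such as 1984 arrives together with its label and unit instead of depending on a separate Info page.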

WebCASPAR Taxonomy

• Survey-specific taxonomies

• NCES IPEDS Classification of Instructional Programs (CIP) codes

• Integrated taxonomy for querying across surveys

http://webcaspar.nsf.gov/

Metadata in SESTAT

• Metadata Explorer is separate from the data
    • Individual variable information: description, question, domain/availability – history, valid response categories, keywords

• Metadata is not tightly integrated with the data itself – it does not get downloaded with the data

https://sestat.nsf.gov/sestat/sestat.html

Example -- Public Use file

Example -- Public Use file

Summary – Where are we?

• Different surveys have evolved differently
    • Varying levels of detail/metadata

• Not in a standardized structure

Hodge-podge

Metadata Users & Their Metadata Needs

• Not a one-to-one relationship, but many-to-many

• They occur at all stages of the survey process
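The many-to-many relationship can be sketched as a small mapping from survey stage to the users and metadata items involved. The stage names, users, and items below are a subset of those listed in this talk; the data structure, and the simplification that every user at a stage touches every item at that stage, are assumptions made only for illustration.

```python
from collections import defaultdict

# Subset of the stage -> (users, metadata) pairings listed in this talk.
STAGES = {
    "Define Scope": {
        "users": ["Data User", "Survey Manager", "Statistician"],
        "metadata": ["Population of interest", "Frame options"],
    },
    "Collect Data": {
        "users": ["Data User", "Statistician", "Database Administrators"],
        "metadata": ["Variable names and formats", "Paradata"],
    },
    "Process Data": {
        "users": ["Data User", "Statistician", "Software Developers"],
        "metadata": ["Item response rates", "Edit specifications"],
    },
}


def metadata_by_user(stages):
    """Invert the stage mapping: which metadata items does each user touch?"""
    needs = defaultdict(set)
    for stage in stages.values():
        for user in stage["users"]:
            needs[user].update(stage["metadata"])
    return needs


def users_by_metadata(stages):
    """The other direction: which users rely on each metadata item?"""
    users = defaultdict(set)
    for stage in stages.values():
        for item in stage["metadata"]:
            users[item].update(stage["users"])
    return users


needs = metadata_by_user(STAGES)
users = users_by_metadata(STAGES)
print(len(needs["Statistician"]))  # one user, several metadata items
print(sorted(users["Paradata"]))   # one metadata item, several users
```

Reading the mapping in both directions shows why a single flat list of "metadata for user X" does not capture the situation.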

Survey Process

Steps shown in the diagram:

• Define research objectives
• Choose mode of collection; choose sampling frame
• Construct and pretest questionnaire; design and select sample
• Recruit and measure sample
• Code and edit data
• Make postsurvey adjustments
• Perform analysis

Stage labels shown on the diagram: Define Scope, Develop Survey Instrument, Develop Sample Design, Collect Data, Process Data, Disseminate Data

Source: Survey Methodology (2009), Groves, Fowler, Couper, Lepkowski, Singer & Tourangeau.

Define Scope

Users:
• Data User
• Survey Manager
• Subject Matter Expert
• Statistician
• Survey Methodologist
• Respondent

Metadata:
• General
    • Topic
    • Population of interest
    • Other data sources
• Specific
    • Frame options
    • Sample design options
    • Historical info/data
    • User needs
    • Federal Register notices

Develop Survey Instrument

Users:
• Data User
• Survey Manager
• Subject Matter Expert
• Statistician
• Survey Methodologist
• Respondent

Metadata:
• Questions
• Answer choices
• Definition of terms
• Instructions
• Logic flow of questions
• Cognitive work
• Validity assessments
• Reliability assessments
• Functionality testing
• Alternative questions
• Instrument design specs – paper, web, CATI

Develop Sample Design

Users:
• Data User
• Survey Manager
• Subject Matter Expert
• Statistician

Metadata:
• Population of interest
• Sampling frame / Universe specs
• Update schedule
• Sample design specs
• Desired criteria
• Sample selection techniques
• Historical information on performance of designs
• Estimation methods

Collect Data

Users:
• Data User
• Survey Manager
• Subject Matter Expert
• Statistician
• Database Administrators
• Software Developers

Metadata:
• Variable names and formats
• Variable data types
• Physical storage
• Tables and relationships
• Mapping of questions to variables and definitions
• Logic flow of questions
• Response rates over time
• Paradata
• Cover letter

Process Data

Users:
• Data User
• Survey Manager
• Subject Matter Expert
• Statistician
• Database Administrators
• Software Developers

Metadata:
• Item response rates
• Zero vs. null vs. missing
• Edit specifications
• Imputation specifications
• Recode specifications
• Data table specifications
• Changes across survey cycles
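The "zero vs. null vs. missing" distinction is easy to lose once data are exported, and it directly affects item response rates. A minimal sketch, assuming a hypothetical three-way status code; the status names and sample values are illustrative and are not taken from SRS files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    REPORTED = "reported"              # respondent gave a value (may be a true zero)
    NOT_APPLICABLE = "not_applicable"  # question did not apply (null)
    MISSING = "missing"                # question applied but was not answered


@dataclass(frozen=True)
class Response:
    value: Optional[float]
    status: Status


def total_reported(responses):
    """Sum only reported values; a true zero counts, null and missing do not."""
    return sum(r.value for r in responses if r.status is Status.REPORTED)


def item_response_rate(responses):
    """Share of applicable items that were answered (0.0 if none apply)."""
    applicable = [r for r in responses if r.status is not Status.NOT_APPLICABLE]
    if not applicable:
        return 0.0
    answered = [r for r in applicable if r.status is Status.REPORTED]
    return len(answered) / len(applicable)


# Made-up responses to a single item.
responses = [
    Response(0.0, Status.REPORTED),
    Response(125.0, Status.REPORTED),
    Response(None, Status.NOT_APPLICABLE),
    Response(None, Status.MISSING),
]
print(total_reported(responses))
print(item_response_rate(responses))
```

If all three cases were stored as a blank or as 0, the reported zero would be indistinguishable from nonresponse, and the response rate could not be recomputed from the file alone.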

Data Dissemination and Publication

Users:
• Data User
• Survey Manager
• Subject Matter Expert
• Statistician
• Database Administrators
• Software Developers
• Archivist

Metadata:
• History of changes
• Methodology report
• Public use files with documentation
• Author/contact source
• Who can access what
• Type of product
• Content format
• URL; Keywords
• Relationships
• Metadata schema

Who are the Metadata Users?

• Data users
    • Basic & advanced
    • Analysts
    • General public

• Respondent

• Survey Manager

• Survey Methodologist

• Statistician

• Subject Matter Expert

• Software Developer

• Database Administrator

• Archivist

Need for Standardization of Metadata

is Apparent

is Critical

Standardization Efforts

• Dublin Core

• SDMX (aggregate level)

• DDI 3.0 (record level)
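To give a feel for the simplest of these standards, here is a minimal sketch of a Dublin Core style description built with Python's standard library. The element names (title, creator, publisher, date, format, subject) belong to the Dublin Core element set; the wrapping `record` element and every value shown are placeholders chosen for illustration, not an actual SRS record.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat XML record with one dc:* element per (name, value) pair."""
    record = ET.Element("record")  # wrapper element name is an arbitrary choice
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Hypothetical description of a data table; all values are placeholders.
fields = [
    ("title", "Example detailed statistical table"),
    ("creator", "Division of Science Resources Statistics"),
    ("publisher", "National Science Foundation"),
    ("date", "2010-03-17"),
    ("format", "application/vnd.ms-excel"),
    ("subject", "research and development"),
]
record = dublin_core_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record like this describes a product as a whole. Describing aggregate tables or individual records in detail is where SDMX and DDI, listed above, come in.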

Recent SRS Efforts

• Data Repository (Oracle)

• Inclusion of some metadata

• SAS/ACCESS User Interface for internal users

• Evaluating external user interfaces

SRS Efforts -- Working with Commercial Contractors

• Requirements for Data / Metadata delivery

• Examples document

• Standard contracting language

• Checklist

SRS Adopted Basic Operating Procedures

• Using Oracle to store microdata and metadata

• Collecting metadata in whatever format

• Keeping it all organized

Challenges

• Getting all the players on the same page
    • Many different users
    • Many different uses
    • Many different providers
    • Many different products
    • Many different formats

• Cost

• Keeping it all straight

Near Future Vision

SRS Data Repository

Data and Metadata

Taxonomy Efforts

Data & Metadata

Dissemination

Analytic tools

DDI 3.0, SDMX…

Paradata

1984

Thank you!
