Making the Case for Metadata at SRS-NSF

National Science Foundation

Division of Science Resources Statistics

Jeri Mulrow, Geetha Srinivasarao, and John Gawalt

FedCASIC Workshops, BLS, March 17, 2010

www.nsf.gov/statistics/

1984

1,984

1

1 9

1 9 8

1 9 8 4

Today’s Talk

• A bit about SRS

• Historical perspective of data and metadata dissemination

• Metadata users and their metadata needs

• Standardization efforts

• Challenges and future vision

A bit about the Division of Science Resources Statistics (SRS)

• Federal statistical agency within NSF

• 11 periodic data collections on the U.S. Science and Engineering enterprise

• Data dating back to the 1950s

Historical Perspective of SRS data and metadata dissemination

• 1950s – early 1990s paper only

• Detailed statistical tables with minimum metadata as footnotes

• Publications included: Highlights about the survey, Scope and method of survey, Questionnaire, Cover letters

Example -- 1950s publication

1990s through 2000s

• 1992 – electronic format

• Detailed statistical tables in spreadsheets with minimum metadata as footnotes

• Kept paper, added electronic text: Survey Methodology, Limitations to the data, Definitions, Historical revisions, List of tables

• PDF added: Questionnaire, Cover letters, Instructions

Example -- 1993 PDF

Example – 1991 Electronic spreadsheet

Example – 1991 text

Today

• Source data tables in Excel with footnotes

• HTML / PDF: Highlights of the survey, Links to references, Survey description

• PDF: Survey Questionnaire, Instructions, Definitions

Example – 2007 Excel spreadsheet

Example -- 2007 SIRD1

Example – 2007 HTML

Example – 2007 PDF

BUT THAT’S NOT ALL

• Electronic databases: create and download your own customized aggregate tables

• Public use files: access to some microdata series

Metadata in WebCASPAR ….

Metadata in WebCASPAR

• Variable specific metadata available under Info link

• Metadata not tightly integrated with the data itself – does not get downloaded with the data
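
One way to picture tighter integration is a download that carries its variable-level metadata with it. The sketch below is purely illustrative and does not describe how WebCASPAR works: the variable names, descriptions, units, and the `bundle_download` helper are all hypothetical. It packages the data rows and the matching metadata into a single JSON document so that the two travel together.

```python
import json

# Hypothetical variable-level metadata, of the kind shown under an "Info" link.
VARIABLE_INFO = {
    "year": {"description": "Survey reference year", "type": "integer"},
    "rd_expenditures": {
        "description": "R&D expenditures",
        "type": "number",
        "unit": "thousands of U.S. dollars",
    },
}


def bundle_download(rows, variable_info):
    """Return one JSON string holding the rows plus metadata for every column used."""
    columns = sorted({name for row in rows for name in row})
    missing = [name for name in columns if name not in variable_info]
    if missing:
        # Refuse to ship a column that has no documentation.
        raise KeyError(f"no metadata for: {', '.join(missing)}")
    return json.dumps(
        {
            "metadata": {name: variable_info[name] for name in columns},
            "data": rows,
        },
        indent=2,
    )


if __name__ == "__main__":
    # The same number, 1984, means two different things in the two columns.
    rows = [{"year": 1984, "rd_expenditures": 1984}]
    print(bundle_download(rows, VARIABLE_INFO))
```

With the metadata in the same file, a reader of the download can tell a year from a dollar amount without returning to the website.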

WebCASPAR Taxonomy

• Survey specific taxonomies

• NCES IPEDS Classification of Instructional Programs (CIP) codes

• Integrated taxonomy for querying across surveys

http://webcaspar.nsf.gov/

Metadata in SESTAT

• Metadata Explorer is separate from the data

• Individual variable information: Description, Question, Domain/Availability – history, Valid response categories, Keywords

• Metadata is not tightly integrated with the data itself – it does not get downloaded with the data

https://sestat.nsf.gov/sestat/sestat.html

Example -- Public Use file

Example -- Public Use file

Summary – Where are we?

• Different surveys have evolved differently: varying levels of detail/metadata

• Not in a standardized structure

Hodge-podge

Metadata Users & Their Metadata Needs

• Not a one-to-one relationship, but many-to-many

• They occur at all stages of the survey process

Survey Process

[Figure: survey process flowchart. Steps: Define research objectives; Choose mode of collection; Choose sampling frame; Construct and pretest questionnaire; Design and select sample; Recruit and measure sample; Code and edit data; Make postsurvey adjustments; Perform analysis. Stage labels: Define Scope; Develop Survey Instrument; Develop Sample Design; Collect Data; Process Data; Disseminate Data.]

Source: Survey Methodology (2009), Groves, Fowler, Couper, Lepkowski, Singer & Tourangeau.

Define Scope

Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Survey Methodologist; Respondent

Metadata, general: Topic; Population of interest; Other data sources

Metadata, specific: Frame options; Sample design options; Historical info/data; User needs; Federal Register notices

Develop Survey Instrument

Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Survey Methodologist; Respondent

Metadata: Questions; Answer choices; Definition of terms; Instructions; Logic flow of questions; Cognitive work; Validity assessments; Reliability assessments; Functionality testing; Alternative questions; Instrument design specs – paper, web, CATI

Develop Sample Design

Users: Data User; Survey Manager; Subject Matter Expert; Statistician

Metadata: Population of interest; Sampling frame / Universe specs; Update schedule; Sample design specs; Desired criteria; Sample selection techniques; Historical information on performance of designs; Estimation methods

Collect Data

Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Database Administrators; Software Developers

Metadata: Variable names and formats; Variable data types; Physical storage; Tables and relationships; Mapping of questions to variables and definitions; Logic flow of questions; Response rates over time; Paradata; Cover letter

Process Data

Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Database Administrators; Software Developers

Metadata: Item response rates; Zero vs. null vs. missing; Edit specifications; Imputation specifications; Recode specifications; Data table specifications; Changes across survey cycles
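
The distinction between zero, null, and missing is easy to lose once values land in a spreadsheet cell. The sketch below shows one way to keep the three apart. The status names and the reading of each case (a reported zero, a question that did not apply, and an applicable question left unanswered) are assumptions made for illustration, not SRS definitions, and the response rate shown is a simplified, unweighted one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    REPORTED = "reported"        # respondent gave a value, possibly 0
    NOT_APPLICABLE = "null"      # question did not apply (assumed meaning)
    MISSING = "missing"          # question applied but was not answered


@dataclass(frozen=True)
class Item:
    value: Optional[float]
    status: Status


def item_response_rate(items):
    """Share of applicable items that were answered; None if no item applied."""
    applicable = [i for i in items if i.status is not Status.NOT_APPLICABLE]
    if not applicable:
        return None
    answered = sum(1 for i in applicable if i.status is Status.REPORTED)
    return answered / len(applicable)


if __name__ == "__main__":
    items = [
        Item(0.0, Status.REPORTED),         # a true zero still counts as a response
        Item(None, Status.NOT_APPLICABLE),  # excluded from the denominator
        Item(None, Status.MISSING),         # counts against the response rate
        Item(12.5, Status.REPORTED),
    ]
    print(item_response_rate(items))
```

Storing the status next to the value means a blank cell is never left to stand for two different situations.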

Data Dissemination and Publication

Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Database Administrators; Software Developers; Archivist

Metadata: History of changes; Methodology report; Public use files with documentation; Author/contact source; Who can access what; Type of product; Content format; URL; Keywords; Relationships; Metadata schema

Who are the Metadata Users?

• Data users: Basic & advanced, Analysts, General public

• Respondent

• Survey Manager

• Survey Methodologist

• Statistician

• Subject Matter Expert

• Software Developer

• Database Administrator

• Archivist

Need for Standardization of Metadata

is Apparent

is Critical

Standardization Efforts

• Dublin Core

• SDMX (aggregate level)

• DDI 3.0 (record level)
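
To make the first of these standards concrete, here is a minimal sketch of a Dublin Core description for a single data product. It uses element names from the Dublin Core element set (title, publisher, date, format, type, identifier) and its published namespace. The `metadata` wrapper element, the `dublin_core_record` helper, and the example title are assumptions for illustration; this is not an SRS record.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (element, value) pair."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


if __name__ == "__main__":
    record = dublin_core_record([
        ("title", "Example detailed statistical table"),  # hypothetical title
        ("publisher", "National Science Foundation, Division of Science Resources Statistics"),
        ("date", "2007"),
        ("format", "application/vnd.ms-excel"),
        ("type", "Dataset"),
        ("identifier", "www.nsf.gov/statistics/"),
    ])
    print(ET.tostring(record, encoding="unicode"))
```

Dublin Core describes the product as a whole. Descriptions at the aggregate level or the record level would fall to SDMX and DDI, as the bullets indicate.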

Recent SRS Efforts

• Data Repository (Oracle)

• Inclusion of some metadata

• SAS/ACCESS User Interface for internal users

• Evaluating external user interfaces

SRS Efforts -- Working with Commercial Contractors

• Requirements for Data / Metadata delivery

• Examples document

• Standard contracting language

• Checklist

SRS Adopted Basic Operating Procedures

• Using Oracle to store microdata and metadata

• Collecting metadata in whatever format

• Keeping it all organized

Challenges

• Getting all the players on the same page: many different users, many different uses, many different providers, many different products, many different formats

• Cost

• Keeping it all straight

Near Future Vision

SRS Data Repository

Data and Metadata

Taxonomy Efforts

Data & Metadata

Dissemination

Analytic tools

DDI 3.0, SDMX…

Paradata

1984

Thank you!
