«Doing research with secondary data»

Eva Deuchert

Bern, November 7, 2014

Relevance

• Social sciences (such as economics, sociology etc.) are empirical sciences (mostly quantitative).

• Up-to-date empirical methods usually require data of high quality (in terms of content and number of observations).

• Collecting primary data is extremely time-consuming and costly.

As a result, the social sciences rely heavily on secondary data analysis.

Seite 3

Secondary data (used in Microeconometrics)

• Survey data
  - Cross-sectional data (Swiss Health Survey, Census, PISA, …)
  - Panel data (Swiss Household Survey, SHARE, SOEP, …)

• Administrative data
  - Contributions to social security
  - Process data from social insurances
  - …

• Linked data
  - Survey data/administrative data (SAKE, SHARE, …)
  - Follow-up studies (PISA/TREE)
  - “Big” data (internet/social media, geographic information, etc.)

Advantages of secondary data

• Stimulates research
  - Reduces overall costs.
  - Reduces costs of “failing” (but: publication bias).
  - Generates new research ideas (but: data mining).
  - Allows independent research (benefits particularly young researchers without access to financial resources).

• Results can be replicated
  - Flaws can be detected.
  - Sets incentives to publish research only when sufficiently robust.
  - Powerful tool for teaching.

Challenges of using secondary data

• Data documentation.

• Data access.

• Data quality:
  - Representativeness of the data.
  - Missing information.
  - Misreported information.
  - Sample sizes.

Best practice: data documentation

Data can only be used when researchers understand what information is provided: make data searchable!

• Provides interactive online tools to search variables (incl. documentation) and helps generate software syntax to analyze data.

• Provides an online tool (COMPASS) to explore data without installing statistical software.

• Provides access to information at the variable level.

Best practice: data access

Data can only be used when researchers can easily access data: Provide access to data under transparent rules!

(“Perceived data protection is often much stronger than legal situation”)

• Provides access to (at least core) data free of charge and easy to use via the internet or on CD-ROMs (Public Usage Files).

• Provides access to restricted data via the NCHS Research Data Center (on-site, via remote access, or with paid assistance).

• The Research Data Centre provides Scientific Use Files free of charge under transparent terms and rules.

Open issue: Quality of data

Empirical research can be only as good as the underlying data.

• How representative is available survey data?
  - Is the population from which the survey was drawn representative?
  - Is nonresponse random?
  - Are provided survey weights doing any good?

It is possible to control for sample selection, but this requires access to:
  - Information on non-respondents (linking surveys with administrative data).
  - “Instruments” for response (randomized timing of calls, randomized incentives to participate, randomized interviewers, ...).
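
To illustrate why information on non-respondents matters, here is a minimal sketch of a weighting-class adjustment. It is a simpler technique than a full sample selection model and is not taken from the slides. It assumes that a group variable is known for every sampled person (for instance from linked administrative records) and that nonresponse is random within each group. All names and numbers below are hypothetical.

```python
from collections import defaultdict

def nonresponse_weights(frame):
    """Weighting-class adjustment: weight = sampled / responded within each group.

    `frame` lists every sampled person, respondent or not. The 'group' field
    is assumed to be known for everyone, e.g. from linked administrative data.
    """
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for person in frame:
        sampled[person["group"]] += 1
        if person["responded"]:
            responded[person["group"]] += 1
    # Groups without any respondent cannot be reweighted and are skipped.
    return {g: sampled[g] / responded[g] for g in sampled if responded[g] > 0}

def weighted_respondent_mean(frame, weights, outcome):
    """Mean of `outcome` over respondents, weighted by the group weights."""
    numerator = 0.0
    denominator = 0.0
    for person in frame:
        if person["responded"]:
            w = weights[person["group"]]
            numerator += w * person[outcome]
            denominator += w
    return numerator / denominator

# Hypothetical sample: group A responds half as often as group B.
frame = (
    [{"group": "A", "responded": True, "income": 10.0} for _ in range(2)]
    + [{"group": "A", "responded": False, "income": None} for _ in range(2)]
    + [{"group": "B", "responded": True, "income": 20.0} for _ in range(4)]
)

weights = nonresponse_weights(frame)
respondents = [p["income"] for p in frame if p["responded"]]
naive_mean = sum(respondents) / len(respondents)
adjusted_mean = weighted_respondent_mean(frame, weights, "income")
```

In this toy sample the unweighted respondent mean overstates income because the low-income group responds less often; the adjusted mean gives each group its sampled share back. The correction is only as good as the assumption that nonresponse is random within groups, which is where randomized "instruments" for response become useful.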

• Missing information:
  - Relevant information is not provided.
  - Information cannot be used (filters, retrospective data unavailable, etc.)

Allow researchers to implement their own questionnaires:
  - SOEP Innovation Sample.
  - RAND American Life Panel.

Link survey and administrative data:
  - SAKE/SESAM.
  - SHARE-RV.

• How reliable is the provided information?
  - Cultural differences in how respondents answer questions.
  - People do not want to or simply cannot disclose the truth.
  - Misreporting is unlikely to be random.

Link survey and administrative data.

Use innovative survey designs (vignettes, list experiments, …).
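
As an illustration of one such design, the sketch below shows the standard difference-in-means estimator for a list experiment (item count technique). Respondents are randomly split: the control group reports how many of J non-sensitive items apply to them, the treatment group gets the same list plus one sensitive item. Nobody reveals their answer to the sensitive item directly, yet the difference in mean counts estimates its prevalence. The data and function names are hypothetical, and the sketch assumes random assignment and that adding the sensitive item does not change answers to the other items.

```python
from math import sqrt
from statistics import mean, variance

def list_experiment_estimate(control_counts, treatment_counts):
    """Estimated prevalence of the sensitive item: difference in mean item counts."""
    return mean(treatment_counts) - mean(control_counts)

def list_experiment_se(control_counts, treatment_counts):
    """Standard error of the difference in means (two independent samples)."""
    return sqrt(
        variance(treatment_counts) / len(treatment_counts)
        + variance(control_counts) / len(control_counts)
    )

# Hypothetical item counts reported by respondents.
control = [1, 2, 2, 3]    # list with J non-sensitive items
treatment = [2, 2, 3, 3]  # same list plus the sensitive item

prevalence = list_experiment_estimate(control, treatment)
standard_error = list_experiment_se(control, treatment)
```

With only four respondents per arm the standard error is as large as the estimate itself, which shows the price of this design: protecting respondents adds noise, so list experiments need comparatively large samples.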

• Up-to-date empirical methods often require a high number of observations.

• For policy evaluation we are often interested in subpopulations (people with disabilities, the elderly, young women with children, ... ).

Provide better access to administrative data.

Oversample “policy-relevant” groups.

Design special surveys for “policy-relevant” groups.
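
Oversampling has a consequence for analysis: the raw sample no longer mirrors the population, so population-level estimates need design weights. The following minimal sketch assumes a stratified sample with known population counts per stratum; the strata, counts, and outcome are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def design_weights(population_sizes, sample_sizes):
    """Design weight per stratum: number of population members each sampled unit represents."""
    return {s: population_sizes[s] / sample_sizes[s] for s in sample_sizes}

def design_weighted_mean(sample, weights):
    """Weighted mean over (stratum, value) pairs."""
    numerator = sum(weights[stratum] * value for stratum, value in sample)
    denominator = sum(weights[stratum] for stratum, _ in sample)
    return numerator / denominator

# Hypothetical population: 100 people in the policy-relevant group, 900 others.
population_sizes = {"target": 100, "other": 900}

# Oversampled design: 4 interviews in each stratum.
# The value is 1.0 if the person receives a benefit, else 0.0.
sample = [("target", 1.0)] * 4 + [("other", 0.0)] * 4
sample_sizes = {"target": 4, "other": 4}

weights = design_weights(population_sizes, sample_sizes)
unweighted_share = sum(value for _, value in sample) / len(sample)
weighted_share = design_weighted_mean(sample, weights)
```

The unweighted share (one half) reflects the sampling design rather than the population; the weighted share recovers the population value (one tenth) while the oversampled stratum still supplies enough observations for subgroup analysis.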

Conclusion

• Empirical sciences require access to high-quality secondary data; state of the art (in Microeconomics):
  - Large panel data sets.
  - Administrative data.
  - Combination of the two.

• Data paid for with taxpayers’ money (including administrative data!) should be made available for research under transparent rules, in an easy-to-use form, and free of charge.

• Research needs must be considered when designing data collection.

Warning sign: We know far more about the consequences of reforms in Austria than in Switzerland.
