Data Discovery
The reference interview
• Always begin by clarifying the distinction between statistics and data with your patron. Never assume that the patron clearly knows this distinction.
• Ask a question that will help you understand what they might be seeking using our frameworks from yesterday.
• Asking them if they want statistics or data isn’t a good starting question, though.
Frameworks
• Statistics: in print or online (e-publications, e-tables, databases)
• Data: aggregate data or microdata
Statistical Information Table Dimensions:
• Geography
• Time
• Subject content
The reference interview
• What does the patron intend or need to do with the numbers? What is their objective?
– Does the patron need them for a report or for data analysis?
• What geographic area is needed?
– Smallest geographic area to be described
• What time period is needed?
• What subject matter (variables) expressed in numbers is needed?
The reference interview
If you determine the patron does need data:
• Population (unit of observation) to be described
• Do they need aggregate data, microdata, or spatial data?
• What software does the patron intend to use?
• How would the patron like the data delivered?
Level of service
• How much you do depends on the level of service you are offering.
– Finding a resource
– Retrieving a resource from an online service
– Tailoring a product for the patron
– Creating a product for a patron (e.g., postal code conversion linkage)
Does the person want one number? Are they pursuing a fact or figure? Do they want to know “how many?”
YES → Are the statistics in print or in a ready-reference electronic source?
YES → Go to the print or ready-reference electronic source.
NO → Are the data accessible in computer-readable form?
YES → Go to the computer-readable source.
Extract relevant data from the computer-readable source and compile statistics using appropriate software.
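The decision flow above can be sketched as a small function. This is only an illustration: the name `next_step` and its three boolean parameters are hypothetical, and the sketch assumes that a NO at either of the first two questions leads to the question about computer-readable data, which the flattened flowchart does not state unambiguously. The flowchart shows no branch for data that are not computer-readable, so the sketch returns `None` in that case.

```python
from typing import Optional


def next_step(wants_one_number: bool,
              in_print_or_ready_ref: bool,
              computer_readable: bool) -> Optional[str]:
    """Return the next action suggested by the flowchart, or None when the
    flowchart does not cover the situation (hypothetical helper)."""
    if wants_one_number and in_print_or_ready_ref:
        return "Go to print or ready-reference electronic source."
    # Assumption: any NO so far leads to the computer-readable question.
    if computer_readable:
        return ("Go to computer-readable source, extract the relevant data, "
                "and compile statistics using appropriate software.")
    # The flowchart gives no branch for this case.
    return None


print(next_step(True, True, False))
print(next_step(True, False, True))
```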
To Use Data You Need 3 Things
• Datafile (the raw numbers)
• “Codebook” (where the numbers are and what they mean)
• Statistical Software (for reading the datafile and analyzing the data)
The Statistics
Field California Poll (newsletter), September 24, 1996, as reproduced on microfiche in the collection American Public Opinion Data.
The Data
3001101 1999503 1
3001102122322288181818 112999999999999 999911111199999911111999993311182818
3001103182818 89214888211111111111111199999999999999 122883 2299821948
30011046601893249242331 111 212190100 9000311
300110500000000010000000000000000000000
3001106 1.1951 1.1345 1.1474 1.1585
3001107 1.1559 1.0007 1.0461 1.1416
3001201 2329503 2
3001202238543388881288 112999999999999 999999999911881199999111113231282882
3001203222882 18828822229999999999999911231221221212 322814 8103011942
30012043209492892242314 221 282071000 9470711
300120510010000000000000000000000000000
3001206 1.0056 0.8949 0.9050 0.8557
3001207 1.0988 0.9358 0.8786 0.8586
3001301 5349503 1
3001302358332888111888 117999999999999 999988881199999933333999992221181822
3001303181822 18848223112121112111241499999999999999 212884 3399811948
30013046405399393111511 211 212121000 9550311
300130510000000000000000000000000000000
3001306 1.1951 0.8094 0.6256 0.8518
3001307 1.1559 0.5942 0.4393 0.8840
3001401 1029503 2
3001402342342218111111 111128888888122 100199999922888299999822882212121828
3001403118821 11122223119999999999999912112182221122 212213 2202538148
30014044805399119381311 211 131491000 9540311
300140500000000010000000000000000000010
3001406 0.7594 0.6758 0.7376 0.7498
3001407 0.7829 0.6668 0.7040 0.7600
The Codebook
From the codebook for the data:
The Field (California) Poll #96-04
THE FIELD INSTITUTE
INTERVIEWING PERIODS: AUGUST 29 - SEPTEMBER 7, 1996
NUMBER OF CASES: 1023
VARIABLE 15 RATE PERFORMANCE-BARBARA BOXER DECK 2/17
Q7. WHAT KIND OF JOB DO YOU THINK BARBARA BOXER IS DOING AS U.S. SENATOR - A VERY GOOD, GOOD, FAIR, POOR OR VERY POOR JOB?
N OF CASES  VALUE  VALUE LABEL
        33      1  VERY GOOD
       130      2  GOOD
       134      3  FAIR
        63      4  POOR
        43      5  VERY POOR
       107      8  NO OPINION
       513      9  NOT APPLICABLE (NOT FORM B)
      ____
      1023         TOTAL
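The codebook is what turns the numeric codes in a datafile into something readable. As a minimal sketch, the value labels and case counts for Variable 15 (Q7) can be held in a small lookup table and used to work out what share of the respondents who were asked the question gave each answer. The dictionary layout and the function name `percent_of_asked` are assumptions made for illustration; only the labels and counts come from the codebook entry, and these are the counts as listed there, so they need not match the published figures.

```python
# Value labels and case counts copied from the codebook entry for
# Variable 15 (Q7). The dictionary layout itself is an assumption.
Q7_CODEBOOK = {
    1: ("VERY GOOD", 33),
    2: ("GOOD", 130),
    3: ("FAIR", 134),
    4: ("POOR", 63),
    5: ("VERY POOR", 43),
    8: ("NO OPINION", 107),
    9: ("NOT APPLICABLE (NOT FORM B)", 513),
}

NOT_APPLICABLE = 9  # respondents who were not asked Q7


def percent_of_asked(codebook):
    """Percentage of each answer among respondents who were asked Q7."""
    asked = sum(n for code, (_, n) in codebook.items()
                if code != NOT_APPLICABLE)
    return {
        label: 100.0 * n / asked
        for code, (label, n) in codebook.items()
        if code != NOT_APPLICABLE
    }


total_cases = sum(n for _, n in Q7_CODEBOOK.values())
print("Total cases:", total_cases)
for label, pct in percent_of_asked(Q7_CODEBOOK).items():
    print(f"{label:<12}{pct:5.1f}%")
```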
Statistical Software
• Designed to read large files of raw numeric data
• Not a spreadsheet!
– Can handle many more variables and cases.
– Can do more elaborate and accurate statistics.
– Designed to handle data (cases, observations, variables, weights), not unstructured “cells.”
GAUSS, JMP, MiniTab, S-Plus, SAS, SPSS, Stata, Systat
SPSS
• Inputs: the datafile (the raw records shown under The Data) and the codebook
• Describe the data layout
• Write commands to analyze the data
RESPONDENTS SEX * recoded question 7 Crosstabulation

                                  recoded question 7
RESPONDENTS SEX                   Very Good/Good  Fair    Poor/Very Poor  no opinion  Total
MALE    Count                     64              70      65              50          249
        % within RESPONDENTS SEX  25.7%           28.1%   26.1%           20.1%       100.0%
        % of Total                12.6%           13.8%   12.8%           9.8%        48.9%
FEMALE  Count                     96              64      42              58          260
        % within RESPONDENTS SEX  36.9%           24.6%   16.2%           22.3%       100.0%
        % of Total                18.9%           12.6%   8.3%            11.4%       51.1%
Total   Count                     160             134     107             108         509
        % within RESPONDENTS SEX  31.4%           26.3%   21.0%           21.2%       100.0%
        % of Total                31.4%           26.3%   21.0%           21.2%       100.0%
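To show where the two kinds of percentages in this crosstabulation come from, here is a short sketch that recomputes them from the cell counts. "% within RESPONDENTS SEX" divides each count by its row total, and "% of Total" divides it by the grand total. The variable and function names are hypothetical; only the counts are taken from the table.

```python
# Cell counts copied from the crosstabulation of sex by recoded question 7.
CATEGORIES = ["Very Good/Good", "Fair", "Poor/Very Poor", "no opinion"]
COUNTS = {
    "MALE": [64, 70, 65, 50],
    "FEMALE": [96, 64, 42, 58],
}

grand_total = sum(sum(row) for row in COUNTS.values())


def row_percentages(row):
    """'% within RESPONDENTS SEX': each cell as a share of its row total."""
    row_total = sum(row)
    return [100.0 * n / row_total for n in row]


def total_percentages(row):
    """'% of Total': each cell as a share of all respondents in the table."""
    return [100.0 * n / grand_total for n in row]


for sex, row in COUNTS.items():
    within = ", ".join(f"{p:.1f}%" for p in row_percentages(row))
    of_total = ", ".join(f"{p:.1f}%" for p in total_percentages(row))
    print(f"{sex}: within sex = [{within}]; of total = [{of_total}]")
```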
RESPONDENTS SEX * RATE PERFORMANCE-BARBARA BOXER Crosstabulation

                                  RATE PERFORMANCE-BARBARA BOXER
RESPONDENTS SEX                   VERY GOOD  GOOD    FAIR    POOR    VERY POOR  Total
MALE    % within RESPONDENTS SEX  7.0%       25.0%   35.0%   18.0%   15.0%      100.0%
        % of Total                3.5%       12.4%   17.4%   9.0%    7.5%       49.8%
FEMALE  % within RESPONDENTS SEX  8.9%       38.6%   31.7%   12.9%   7.9%       100.0%
        % of Total                4.5%       19.4%   15.9%   6.5%    4.0%       50.2%
Total   % within RESPONDENTS SEX  8.0%       31.8%   33.3%   15.4%   11.4%      100.0%
        % of Total                8.0%       31.8%   33.3%   15.4%   11.4%      100.0%
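One of the two crosstabulations uses a "recoded question 7" whose categories collapse the original answer codes. The slides do not show the recode itself, so the mapping below is an assumption inferred from the column labels ("Very Good/Good", "Fair", "Poor/Very Poor", "no opinion") and the codebook values. The function name `recode_q7` and the sample answers are hypothetical.

```python
# Assumed recode, inferred from the column labels of the recoded table
# and the value codes in the codebook (not stated in the slides).
RECODE = {
    1: "Very Good/Good",
    2: "Very Good/Good",
    3: "Fair",
    4: "Poor/Very Poor",
    5: "Poor/Very Poor",
    8: "no opinion",
}


def recode_q7(code):
    """Map an original Q7 code to its collapsed category.

    Returns None for codes outside the mapping, such as 9 (not applicable).
    """
    return RECODE.get(code)


answers = [1, 2, 3, 5, 8, 9]  # hypothetical responses, not from the datafile
print([recode_q7(a) for a in answers])
```

In a statistical package this step would normally be a recode command written against the variable described in the codebook.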
Reference strategies
• Government publications approach
– What agency would produce such a statistic?
  • Does the mandate or goals include the scope of content?
  • Who are the members of the agency, if the agency is a membership organization?
– What jurisdiction is responsible for this content?
– Is this likely an official or non-official statistic?
– What publication titles are related to this content?
– What is the availability of statistics from the agency?
• Data librarian approach
– What data source would be used to produce such a statistic?
– Who would collect such data?
– What unit of observation would be needed to produce such a statistic?
– What would the structure of the table look like given time, geography and attributes of the unit of observation?
– Would the source be in the realm of official or non-official statistics?
– Use the literature trail and its indexes (non-official vs. official publications)
The data reference interview process
• The information-seeking context is as important in statistics and data reference interviews as it is in other reference interviews.
• How is the data reference interview similar to general reference interviews?
• How is the data reference interview different?
Research on the data reference interview process
• A colleague is developing a model from which comparisons can be made between the general and data reference interviews.
• One aspect of the model, namely the discovery and clarification of concepts and language, is being investigated using items from a specialist discussion list and a blog.
http://blogs.library.ualberta.ca/digrs/