Survey Documentation and Analysis (SDA)
Post on 12-Jan-2016
Survey Documentation and Analysis (SDA)
Workshop Agenda
Overview
What is online analysis?
Available SDA data sets
Statistical procedures (Frequencies, Crosstabs, Regression)
Recoding, subsetting, downloading
Teaching resources for SDA and developing instructional materials
SSRIC
Social Science Research & Instructional Council
http://www.ssric.org
The Council
Oldest CSU discipline council
Founded in 1972
Representatives from CSU campuses meet three times per year
Negotiates with data providers for access to data
Promotes use of data analysis in research and teaching
The Council
Annual student research conference
  at CSU Long Beach in 2008
  at CSU Sacramento in 2009
Sponsors travel to ICPSR summer workshops in Ann Arbor, Michigan
  http://www.ssric.org/participate/icpsr_summer
Works with Field Research
  Question credits to California Field Poll
  Selects faculty fellow
What is Online Analysis?
"Online data analysis" refers to the ability to perform statistical analysis using special Web-based software as an alternative to downloading data into a standalone statistical package on your computer.
The software we’re using is called Survey Documentation and Analysis (SDA), which was developed at the University of California, Berkeley.
Alternative Statistical Packages
You can get a complete list of available online statistical packages at http://statpages.org/
Some of these include:
  OpenStat
  ViSta
  Statext
  SISA
Advantages
Many, like SDA, are free to use and don’t require a site license
Only require a computer with an internet connection
Some, like SDA, are easy to learn
Can show students how to use some of them in 30 minutes or less
Disadvantages
Some online statistical packages (certainly not all) are limited in what they can do statistically
Documentation is not very good for some
Some (like SDA) can only be used with data sets that have already been created in a format that can be read by that package
Available SDA Data Sets
SDA Data Sets
While SDA is an extremely easy statistical package to learn to use, it’s difficult to create SDA data sets.
You have to purchase an SDA site license to create a data set and then learn how to use it.
So we typically use SDA data sets that have been created for us.
Sources for SDA Data Sets
SDA Archive located at UC Berkeley (http://sda.berkeley.edu/archive.htm)
ICPSR Topical Archives (http://www.icpsr.org/cocoon/ICPSR/all/archives.xml)
Field data located at UC Berkeley (http://ucdata.berkeley.edu/data_record.php?recid=3#analyze)
List of SDA data sets at CSU Long Beach (http://www.csulb.edu/library/eref/datasets.html)
University of Denver’s IDEA project (http://www.du.edu/idea/data.htm)
SDA Archive at UC Berkeley
(http://sda.berkeley.edu/archive.htm)
GSS Cumulative Datafile (1972-2008; 2008 is a preliminary version).
ANES Cumulative Datafile (1948-2000) and ANES datafiles for 1996, 2000, and 2004.
Census microdata including 2000-2003 American Community Surveys and 1990 and 2000 U.S. 1% PUMS with separate files for 2000 and 1990 California PUMS.
ICPSR
National Archive of Computerized Data on Aging (http://www.icpsr.umich.edu/NACDA/)
National Archive of Criminal Justice Data (http://www.icpsr.umich.edu/NACJD/)
Substance Abuse and Mental Health Data Archive (http://www.icpsr.umich.edu/SAMHDA/)
International Archive of Education Data (http://www.icpsr.umich.edu/IAED/)
Field Data
http://ucdata.berkeley.edu/data_record.php?recid=3#analyze
Field Polls from 1956 through 2006 are available as publicly accessible SDA data sets
More recent Field Polls are available as SPSS data sets (through FTP) for CSU faculty, staff, and students.
Other Sources of SDA Data Sets at ICPSR
Voting Behavior: The 2004 Election by Charles Prysby and Carmine Scavo (http://www.icpsr.umich.edu/SETUPS/)
Investigating Community and Social Capital by Lori Weber (http://www.icpsr.umich.edu/ICSC/index.htm)
Statistical Procedures
Available Statistical Procedures
Frequencies and crosstabulation (discussed in this workshop)
Comparison of means
Correlation matrix
Comparison of correlations
Multiple regression (discussed in this workshop)
Logit/Probit regression
Using SDA
Select the data set
Look at the codebook
Decide what statistical procedure to use
Fill in what you want to do
Run it
Data Set
We’re going to use the GSS 1972-2008 Cumulative Data File (2008 is preliminary data)
http://sda.berkeley.edu/archive.htm
We’re going to use three variables
  SEX
  RELITEN
  PORNLAW
Frequencies
List the variables you want to use
  ROW: SEX,RELITEN,PORNLAW
Click on “Run the Table”
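What a frequencies run tabulates can be sketched in a few lines of Python. This is only an illustration of the idea, not how SDA is implemented: the records are made-up toy values (not GSS data), the 1/2 coding of SEX is an assumption, and the function name `frequencies` is hypothetical.

```python
from collections import Counter

# Toy records, invented for illustration; NOT actual GSS responses.
# Assumed coding: SEX 1 = male, 2 = female.
sex_codes = [1, 2, 2, 1, 2, 2, 1, 2]

def frequencies(values):
    """Return {code: (count, percent)} with codes in ascending order."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        code: (n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    }

table = frequencies(sex_codes)
for code, (n, pct) in table.items():
    print(f"{code}: N={n}  {pct}%")
```

In SDA itself none of this is typed; listing the variable in the ROW box and running the table gives the same kind of count-and-percent display.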
Crosstabs
Now let’s use RELITEN as our independent variable and PORNLAW as our dependent variable to create a bivariate crosstabulation.
List the variables
  ROW: PORNLAW
  COLUMN: RELITEN
Crosstabulation Continued
Options
  Percentaging: column
  Statistics
  Question text
  Color coding
Run the Table
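Column percentaging means each cell is divided by the total of its column, so the categories of the independent variable (the column variable) can be compared directly. A minimal sketch of that arithmetic follows. The data are invented toy pairs rather than GSS results, and the category labels and the function name `column_percents` are hypothetical.

```python
from collections import Counter

# Toy (row, column) pairs, invented for illustration; NOT GSS results.
# Each pair is (dependent-variable value, independent-variable value).
pairs = [
    ("illegal", "strong"), ("illegal", "strong"), ("legal", "strong"),
    ("illegal", "weak"), ("legal", "weak"), ("legal", "weak"), ("legal", "weak"),
]

def column_percents(pairs):
    """Percentage each cell within its column (independent-variable category)."""
    cell_counts = Counter(pairs)
    col_totals = Counter(col for _, col in pairs)
    return {
        (row, col): round(100 * n / col_totals[col], 1)
        for (row, col), n in cell_counts.items()
    }

pct = column_percents(pairs)
for (row, col), value in sorted(pct.items()):
    print(f"{row:8s} | {col:6s} | {value}%")
```

Reading across a row of the resulting table (for example, the percent "illegal" among "strong" versus "weak") is the comparison that column percentaging is meant to support.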
Your Turn
Let’s run two more bivariate crosstabs
  Independent variable: SEX
  Dependent variables: RELITEN and PORNLAW
Go ahead and run these crosstabs
What Did We Discover?
RELITEN is strongly related to PORNLAW.
SEX is also related to both RELITEN and PORNLAW.
Could the relationship between RELITEN and PORNLAW be spurious? SEX is related to both RELITEN and PORNLAW and could be creating the relationship between RELITEN and PORNLAW.
How do we test this possibility? Let’s run a three-variable crosstabulation with RELITEN as our independent variable, PORNLAW as our dependent variable, and SEX as our control variable.
Multivariate Crosstabulation
List the variables
  ROW: PORNLAW
  COLUMN: RELITEN
  CONTROL: SEX
Options
  Percentaging: column
  Statistics
  Question text
  Color coding
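Adding a control variable amounts to splitting the cases by the control and building a separate column-percentaged table for each category. The sketch below shows that logic on invented toy triples (not GSS results). The labels and the function names `column_percents` and `crosstab_by_control` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy (row, column, control) triples, invented for illustration; NOT GSS results.
triples = [
    ("illegal", "strong", "male"), ("legal", "strong", "male"),
    ("legal", "weak", "male"), ("legal", "weak", "male"),
    ("illegal", "strong", "female"), ("illegal", "strong", "female"),
    ("illegal", "weak", "female"), ("legal", "weak", "female"),
]

def column_percents(pairs):
    """Percentage each (row, column) cell within its column."""
    cell_counts = Counter(pairs)
    col_totals = Counter(col for _, col in pairs)
    return {
        cell: round(100 * n / col_totals[cell[1]], 1)
        for cell, n in cell_counts.items()
    }

def crosstab_by_control(triples):
    """Build one column-percentaged table per control-variable category."""
    by_control = defaultdict(list)
    for row, col, control in triples:
        by_control[control].append((row, col))
    return {control: column_percents(pairs) for control, pairs in by_control.items()}

tables = crosstab_by_control(triples)
for control, table in tables.items():
    print(control, dict(sorted(table.items())))
```

If the column differences persist inside each control category, the control variable is not producing the original relationship; if they vanish in every category, that points toward spuriousness.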
Spuriousness
Was the relationship between RELITEN and PORNLAW spurious due to SEX?
How do you know?
Does that mean that the relationship can never be spurious?
Regression
Crosstabulation is used when all the variables are categorical.
What do we do when our variables are continuous (i.e., interval and/or ratio)?
Regression is the answer.
Bivariate Regression
Let’s look at the relationship between the respondent’s socioeconomic status (SEI) and the amount of television the respondent watches (TVHOURS).
List the variables
  Dependent: TVHOURS
  Independent: SEI
Options
  T-Tests
  Correlation matrix
  Color coding
  Question Text
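The slope, intercept, and correlation that a bivariate regression reports come from a few sums of squares and cross-products. The sketch below computes them by hand on invented toy numbers (not GSS values) to show the arithmetic. The function name `simple_ols` is hypothetical, and SDA's own output includes more (such as the t-tests selected above).

```python
# Toy data, invented for illustration; NOT GSS values.
sei = [20.0, 30.0, 40.0, 50.0, 60.0]      # independent variable
tvhours = [5.0, 4.0, 4.0, 2.0, 2.0]       # dependent variable

def simple_ols(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

slope, intercept, r = simple_ols(sei, tvhours)
print(f"TVHOURS = {intercept:.2f} + ({slope:.3f}) * SEI,  r = {r:.3f}")
```

In these toy numbers the slope is negative, meaning predicted TVHOURS falls as SEI rises; whether the GSS shows the same pattern is what the SDA run is for.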
Multivariate Regression
Now let’s add in another variable: SEX
But sex is not a continuous variable. How do we enter a variable like SEX into the regression analysis? Answer: create a dummy variable.
Dummy variables take on the values of 1 and 0.
Creating a Dummy Variable
SEX (d:1)
SEX is the name of the variable you want to make into a dummy variable
d indicates that you want to create a dummy variable
1 indicates that cases coded 1 on SEX will be assigned the value 1 on the dummy variable. All other values will be assigned the value 0.
Run the table
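The dummy coding described above can be sketched as follows. The data are invented toy values (not GSS values), the 1/2 coding of SEX is an assumption, the helper names are hypothetical, and the sketch ignores missing-data codes. It also illustrates why a dummy is easy to interpret: with a single 0/1 predictor, the least-squares slope equals the difference between the two group means.

```python
# Toy data, invented for illustration; NOT GSS values.
sex = [1, 2, 2, 1, 2, 1]                     # assumed coding: 1 = male, 2 = female
tvhours = [2.0, 4.0, 3.0, 1.0, 5.0, 3.0]

def make_dummy(values, target):
    """Return 1 where the value equals `target` and 0 otherwise."""
    return [1 if v == target else 0 for v in values]

def group_mean(y, dummy, flag):
    """Mean of y over the cases whose dummy value equals `flag`."""
    selected = [yi for yi, d in zip(y, dummy) if d == flag]
    return sum(selected) / len(selected)

male = make_dummy(sex, 1)

# With one 0/1 predictor, the OLS slope is the difference in group means.
slope = group_mean(tvhours, male, 1) - group_mean(tvhours, male, 0)
print(male, slope)
```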
Recoding, Subsetting, Downloading
Recoding Existing Variables
Example (from GSS Cumulative File): ATTEND (How often Respondent attends religious services)
ATTEND
  0 Never
  1 Less than once a year
  2 Once a year
  3 Several times a year
  4 Once a month
  5 2 to 3 times a month
  6 Nearly Every Wk
  7 Every week
  8 More than once a week
  9 DK/NA (Missing)
ATTENDR
  1 Seldom (0 to 3)
  2 Sometimes (4 to 5)
  3 Often (6 to 8)
  9 Missing (9)
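The ATTEND to ATTENDR recode above is a mapping from ranges of old codes to new codes. A minimal Python sketch of the same mapping is shown here. The function name `recode_attend` is hypothetical, and sending any unexpected code to 9 is an assumption made for this sketch, not something the slide specifies.

```python
def recode_attend(code):
    """Collapse an ATTEND code (0-8, 9 = missing) into an ATTENDR category."""
    if 0 <= code <= 3:
        return 1  # Seldom
    if 4 <= code <= 5:
        return 2  # Sometimes
    if 6 <= code <= 8:
        return 3  # Often
    return 9      # Missing (code 9; by assumption, any other code too)

attend = [0, 3, 4, 5, 6, 8, 9]
attendr = [recode_attend(code) for code in attend]
print(attendr)
```

Testing the boundary codes of each range (0, 3, 4, 5, 6, 8), as the sample list does, is a quick way to confirm that a recode was specified as intended.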
Your Turn
Recode AGE into the following categories:
  1 = 18-29
  2 = 30-64
  3 = 65 and older
Obtain FREQUENCIES for the result
For More Information, See:
http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#recode
Compute a New Variable
Example (from GSS Cumulative File): Alienation Index
Create measure of ALIENATION from these variables asked in 1978 only (all coded as 1=agree, 2=disagree, other = missing data)
  ALIENAT1 PEOPLE RUNNING COUNTRY DONT CARE
  ALIENAT2 RICH GET RICHER, POOR POORER
  ALIENAT3 WHAT YOU THINK DOESNT COUNT
  ALIENAT4 YOU'RE LEFT OUT OF THINGS
  ALIENAT5 POWERFUL PEOPLE TAKE ADVANTAGE OF YOU
  ALIENAT6 PEOPLE IN WASH D.C. ARE OUT OF TOUCH
Your Turn
Create an index of parental education: (MAEDUC + PAEDUC)/2
For More Information, See:
http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#compute
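The parental education index is the mean of the two parents' years of schooling, computed case by case. The sketch below shows that calculation in Python on invented toy values. The function name `parental_education` is hypothetical, and treating the index as missing whenever either input is missing is an assumption of this sketch; check the SDA help page above for how SDA handles missing data in computed variables.

```python
def parental_education(maeduc, paeduc):
    """Return (MAEDUC + PAEDUC) / 2, or None if either value is missing."""
    if maeduc is None or paeduc is None:
        return None  # assumption: a missing input makes the index missing
    return (maeduc + paeduc) / 2

# Toy (MAEDUC, PAEDUC) pairs, invented for illustration; None marks missing data.
rows = [(12, 16), (8, 10), (None, 12)]
index = [parental_education(m, p) for m, p in rows]
print(index)
```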
Subsetting and Downloading
Example: create and download a subset of the GSS cumulative file, selecting only cases from 2008, all Case Identification variables and some Personal and Family Information variables (MARITAL, AGEWED, DIVORCE, WIDOWED).
At the end of each intermediate step, click on the “Continue” button.
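Subsetting does two things at once: it filters cases (here, by year) and it keeps only the chosen variables. A minimal sketch of that operation follows. The records are invented toy values, YEAR and ID are assumed names standing in for the Case Identification variables, and the function name `subset` is hypothetical. SDA performs the real extraction through its web forms.

```python
# Toy records, invented for illustration; NOT GSS data.
# YEAR and ID are assumed names for the Case Identification variables.
records = [
    {"YEAR": 2006, "ID": 1, "MARITAL": 1, "AGEWED": 24, "DIVORCE": 2, "WIDOWED": 2, "SEX": 1},
    {"YEAR": 2008, "ID": 1, "MARITAL": 3, "AGEWED": 21, "DIVORCE": 1, "WIDOWED": 2, "SEX": 2},
    {"YEAR": 2008, "ID": 2, "MARITAL": 1, "AGEWED": 30, "DIVORCE": 2, "WIDOWED": 2, "SEX": 1},
]
KEEP = ["YEAR", "ID", "MARITAL", "AGEWED", "DIVORCE", "WIDOWED"]

def subset(records, year, keep):
    """Keep only cases from `year` and only the variables listed in `keep`."""
    return [
        {name: rec[name] for name in keep}
        for rec in records
        if rec["YEAR"] == year
    ]

extract = subset(records, 2008, KEEP)
print(len(extract), "cases; variables:", ", ".join(extract[0]))
```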
SPSS Syntax File
Creating an SPSS system file
Run SPSS (syntax) file against data (ASCII) file.
For more information, see
  http://www.ssric.org/data/icpsr_direct (scroll down)
  http://www.ssric.org/data/icpsr_direct (scroll to “Syntax Files”)
  http://www.icpsr.com/cocoon/ICPSR/FAQ/0062.xml
  http://web.pdx.edu/~stipakb/download/Data/SDA_data_to_SPSS.pdf (portions outdated)
File Directory
Your Turn
Subset and download your own custom GSS SPSS system file.
Sample Instructional Applications: Crosstabs With a Control Variable
Example 1
GSS Cumulative File (selecting 2002 and 2004 only):
1. Crosstab Voting in 2000 election (VOTE00) by computer usage (COMPUSE).
2. Repeat, but with a control for respondent’s education level (DEGREE).
Example 2
ANES 2004 Study:
Instructor’s note: In addition to using this example in teaching use of control variables, I also use it in teaching about reactivity in interviewing.
1. Run frequency distribution for V5205 (Working mother can have warm relationship with kids).
2. Crosstab V5205 with V1109a (Respondent gender). Weight by Post-election weight
3. Repeat, but use V4103 (Interviewer gender) as independent variable
4. Run frequency distribution for V4103
5. Repeat #1 with a control for V4103
6. Repeat #2 with a control for V1109a
Teaching Resources for SDA and Developing Instructional Materials
ICPSR Web-Based Instructional Materials
http://www.icpsr.umich.edu/ICPSR/training/index.html#instructional
Investigating Community & Social Capital
http://www.icpsr.umich.edu/ICSC/index.html
Voting Behavior: the 2004 Election
http://www.icpsr.umich.edu/SETUPS/index.html