introduction to research design statlab workshop, fall 2010 jeremy green nancy hite
TRANSCRIPT
Introduction to Research Design
Statlab Workshop, Fall 2010
Jeremy Green
Nancy Hite
Outline of a paper
  Introduction
  Theory
  Data Description
  Analysis
  Conclusion
Identifying a Question
  Tradeoff between work in and results
    Easy to do, trivial results
    Result is interesting, but difficulty is high
  New tools open up new questions
    New statistical or computational tools make formerly difficult questions approachable
  New theory opens up new questions
Introduction
  Topic
    Most general level
  Question
    What is the question you want to answer?
    Be specific
    Ask only what you can answer
  Review the Literature
  “Stay the course”
Theory
  Categorize your theory
    Descriptive vs. causal
  Write down your theory
    In paragraph form
    Using a statistical model
Hypothesis
  Identify testable hypotheses.
  - How well does the hypothesis test the theory?
  - What is the counterfactual argument?
  - What is the scope of the hypothesis test?
  - Spurious factors, contamination, endogenous factors
Do you need statistics after all?
Methodological Concerns: CONSORT Checklist
Variables
  Dependent Variable (response, outcome, criterion)
  Independent Variables (explanatory or predictor variables)
    Treatment Variable
    Covariates / Confounding Variables
  Categorical and Continuous Variables
  Remember: the types of variables we choose determine the statistics we use
You need Data
  Think about analyses early!
  Collecting your own data
    Retrospective, prospective, experimental & observational methods
  Can find most data you’ll need on-line!
    Statlab Webpage (http://statlab.stat.yale.edu)
    Advisors
    Yale StatCat (http://ssrs.yale.edu/statcat/)
    ICPSR (http://www.icpsr.umich.edu)
    Reference Librarian (Julie Linden)
So, you want to make a survey
  Extensive on-line resources and software
  Question types determine analyses
    Open vs. closed-ended questions, Likert scales, rank order data
    Assumptions of normality
  Validity
    Internal & External validity
  Pilot testing
    You need variance to analyze!
  Sample size
    It depends: power, effect size, cost (UCLA power calculator)
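The link between power, effect size and sample size can be sketched with a short calculation. This is only a rough illustration, not the method used by the UCLA power calculator: it assumes a two-sided comparison of two group means, a standardized effect size (Cohen's d), and the normal approximation. The function name n_per_group and its default values are hypothetical choices made for this example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d)^2,
    where d is the standardized effect size. An exact t-based
    calculation usually gives a slightly larger number.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: about {n_per_group(d)} cases per group")
```

Smaller expected effects require many more cases, which is why cost enters the decision alongside power and effect size.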
Once You’ve Found or Collected your data
  Download the data and documentation
    StatTransfer (Statlab)
  Determine data file type
    Probably a text file (.txt, .dat, .raw)
    Converting text & delimited files
  Choose a statistical software program
    SPSS, Stata, SAS, Matlab, Excel, R, C++
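For the step of converting text and delimited files, a small script is sometimes enough when a dedicated conversion tool is not at hand. The sketch below assumes the raw file is tab-delimited and UTF-8 encoded, which may not hold for a given data set; the function name convert_delimited and the file names in the usage note are hypothetical.

```python
import csv


def convert_delimited(src_path, dst_path, delimiter="\t"):
    """Rewrite a delimited text file as comma-separated values.

    Returns the number of rows written so the result can be checked
    against the row count given in the data documentation.
    """
    rows_written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

A call such as convert_delimited("survey.dat", "survey.csv") would leave the original file untouched and produce a copy that most statistical packages can import.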
Managing your data
  Back up all Master Data Files
    CDR/CDRW, USB Key
  Codebook
    All codes
    Adding variables, cases, computing new variables
  Keep a roadmap
    Keep a log of all analyses with what you have done
    Save syntax files
Data Entry - Codebook
  Always create a codebook that contains:
    Instructions for entering data
    Instructions for making decisions when data are ambiguous
    Instructions for handling missing observations
    Numerical codes you will use for categorical data
    General troubleshooting information
  Treat it as a working document
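One way to keep the numerical codes and the missing-data rules from the codebook usable during analysis is to store them in a machine-readable form as well. The sketch below is an illustration only: the variables, codes and the missing-value code 9 are invented for the example and do not come from any real codebook.

```python
# Hypothetical codebook: each variable lists its valid codes and the
# codes that stand for a missing observation.
CODEBOOK = {
    "gender": {"codes": {1: "female", 2: "male"}, "missing": {9}},
    "employed": {"codes": {0: "no", 1: "yes"}, "missing": {9}},
}


def decode(variable, value):
    """Translate a numerical code into its label.

    Returns None for codes marked as missing and raises ValueError for
    codes the codebook does not define, so entry errors surface early.
    """
    entry = CODEBOOK[variable]
    if value in entry["missing"]:
        return None
    if value not in entry["codes"]:
        raise ValueError(f"undefined code {value!r} for {variable!r}")
    return entry["codes"][value]
```

Because the codebook is a working document, a new code would be added to the dictionary and to the written codebook at the same time.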
Cleaning your data
  In order to minimize errors while manually entering data, you can set ranges in Excel so that if a value outside the range is entered, the cell will change color.
  To do this, go to Format - Conditional Formatting and specify the ranges for which a different format should show up.
  Also, you can use the data validation options. Go to Data - Validation
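The same range check can be run outside Excel once the data have been entered. The following sketch assumes each case is a dictionary of variable names and numeric values, and that the allowed ranges are supplied by the analyst; the function name out_of_range and the example limits are hypothetical.

```python
def out_of_range(rows, limits):
    """List every value that falls outside its allowed range.

    rows   -- list of dicts, one per case
    limits -- dict mapping variable name to (lowest, highest) allowed
    Returns (row_index, variable, value) tuples; None values are
    skipped here because missing data are handled separately.
    """
    problems = []
    for index, row in enumerate(rows):
        for variable, (lowest, highest) in limits.items():
            value = row.get(variable)
            if value is None:
                continue
            if not lowest <= value <= highest:
                problems.append((index, variable, value))
    return problems


if __name__ == "__main__":
    cases = [{"age": 34, "rating": 4}, {"age": 340, "rating": 6}]
    for problem in out_of_range(cases, {"age": (18, 99), "rating": (1, 5)}):
        print(problem)
```

Flagged values still need to be checked against the original forms, since a value outside the range may be a typing error or a genuine but unusual observation.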
Keeping Track of Data Sets
  Every time you make changes to your data, save it with the current date
  Keep a document with a list of the major changes with each version
  A good idea is to keep a folder with the original data sets and create different subfolders as you make changes to the data set. Sometimes it is also a good idea to keep a working directory for currently active files
  Always make backup copies
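Saving a dated copy after each round of changes can be automated. This is a minimal sketch under assumed conventions: the date is appended to the file name in YYYY-MM-DD form, and the copies go into a separate versions folder. The function name save_dated_copy and the folder layout are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path


def save_dated_copy(data_file, versions_dir, today=None):
    """Copy data_file into versions_dir with the date in its name.

    For example, survey.csv becomes survey_2010-10-15.csv. The original
    file is left in place. A second copy made on the same day
    overwrites the first.
    """
    today = today or date.today()
    source = Path(data_file)
    target_dir = Path(versions_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{source.stem}_{today.isoformat()}{source.suffix}"
    shutil.copy2(source, target)
    return target
```

The list of major changes for each version still has to be written by hand; the script only takes care of the naming.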
Keeping Track of Syntaxes and Outputs
  Save all the syntax you write
  Save all the output you produce and try to annotate it as much as possible
  Save your syntax and output with the data file name, a brief description of the analyses, and the current date
  Save syntax and output in a separate folder from your data
So, how do I analyze my data?
  Correlation
    Correlation allows you to quantify relationships between variables (r, r-squared)
    Regression allows prediction of dependent variable based on one or more independent variables
  Group differences
    t-test & ANOVA
    Chi-square for categorical and frequency data
  Significance v. effect size
  More Complex Models
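The correlation and regression ideas above can be illustrated with a short calculation. The sketch assumes one independent and one dependent variable with no missing values; the function names pearson_r and fit_line and the example numbers are invented for illustration, and a statistical package would normally be used instead.

```python
import math


def _sums(x, y):
    """Return the means and the sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient r between x and y."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    hours = [1, 2, 3, 4, 5]
    score = [52, 55, 61, 64, 70]
    r = pearson_r(hours, score)
    slope, intercept = fit_line(hours, score)
    print(f"r = {r:.3f}, r-squared = {r * r:.3f}")
    print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
```

Here r-squared is the share of variance in the dependent variable accounted for by the fitted line, which is one way of reporting effect size alongside a significance test.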
Take Away Messages
1) Determine your question, methods and statistics before you start
2) Keep a codebook of everything
3) Keep a log of all commands issued
4) Save data at every step
5) Ask for help
6) Don’t get in over your head