STATA for S-052
M. Shane Tutwiler
Your Friendly S-040 Lecturer
William Johnston
IT Services
Harvard Graduate School of Education
Getting the files
The do-file used in this workshop and all data files are in the Stata Help tab of the course iSite.
– Download SATdata.csv, auto.dta and Stata for S-052.do and save them to a new folder called Stata_Workshop on your desktop or on a USB drive.
Contact Information
• Office: Gutman 324
• Email: – [email protected]
• Want to set up a consultation? – hgse.service-now.com/ess/research.do
• Want to learn more on your own? – itservices.gse.harvard.edu/its/services/research-online-resources/stata
Agenda: Overview
I. Overview of Stata
II. Getting Started
III. ‘Do’ files
IV. Basic data cleaning
V. Basic data management
VI. Beginning analysis
VII. Questions
Getting Help in Stata
• Many pathways to getting help in Stata:
. help command
. search command
. findit command
• Use the help menu
• Look online with a web browser
• Set up an appointment ([email protected])!
Some notes
• A word about programming in and using Stata
• Stata is case sensitive, so Myvar is different from myvar
• All commands in Stata are lower-case
• and = "&", or = "|", not = "!"
• Assignment is "=", value equivalency is "=="
• Numeric missing values are shown as . and are treated as larger than any number; missing strings are blank ("")
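The notes above can be tried out in a short sketch. It uses the auto data that ships with Stata rather than the workshop file, and the variable name cheap_import is made up for illustration.

```stata
* Sketch using the auto data that ships with Stata (assumption: sysuse is available)
sysuse auto, clear

* Names are case sensitive: Price is not the same variable as price
capture confirm variable Price
display _rc        // nonzero: no variable called Price exists

* "=" assigns, "==" tests equality, "&" is and, "|" is or, "!" is not
* cheap_import is a hypothetical name
generate cheap_import = (price < 5000 & foreign == 1)

* Numeric missing (.) counts as larger than every number,
* so rep78 > 4 also matches rows where rep78 is missing
count if rep78 > 4
count if rep78 > 4 & !missing(rep78)
```

The two counts differ by the number of cars with a missing rep78, which is why conditions using > usually need an explicit !missing() check.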
How to Begin a Session?
• Specify your directory
– cd "_______"
• Begin using a log file
– log using "______.log"
• Open your data and look at it
– insheet using "SATdata.csv", comma
– browse
– describe
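A minimal session might look like the sketch below. It assumes the current folder is writable, uses a hypothetical log name (session_sketch.log), and loads the built-in auto data in place of SATdata.csv so it runs anywhere. In Stata 13 and later, import delimited is the documented replacement for insheet, which still runs.

```stata
* Minimal session sketch. Assumptions: the current folder is writable, and
* session_sketch.log is a hypothetical log name; auto stands in for SATdata.csv.
capture log close                 // close any log left open by an earlier run
log using "session_sketch.log", replace text

sysuse auto, clear                // load the example data shipped with Stata
describe                          // variable names, types, and labels
list make price mpg in 1/5        // a non-interactive alternative to browse

log close                         // the log is only complete once it is closed
```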
Anatomy of a Stata Command
• Stata commands follow a pattern:
• [prefix:] command [varlist] [if] [in] [weight] [, options]
• For example:
• bysort region: summarize expense, detail
• mean csat if income >= 30000 & region != .
• list state in 1/10, nolabel
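The same pattern can be tried on the auto data that ships with Stata. The variables below (foreign, price, mpg, rep78, make) belong to that dataset and stand in for the SATdata variables on the slide.

```stata
sysuse auto, clear

* prefix: command varlist, options
bysort foreign: summarize price, detail

* command varlist if-qualifier
mean mpg if price >= 5000 & rep78 != .

* command varlist in-range, options
list make in 1/10, nolabel
```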
Getting Started
• Opening Data
• Stata formatted data (.dta): use "file name"
• Comma-separated values: insheet using "file name", comma
• Tab-delimited values: insheet using "file name", tab
• Web-based data files: webuse "file name" for Stata's online example datasets, or use "URL" for a .dta file at any web address
• Flat files: create a dictionary (beyond the scope of this workshop)
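One way to see the import commands side by side is a round trip: write the built-in auto data out in two formats, then read each back. The file names auto_copy.dta and auto_copy.csv are hypothetical, and the sketch assumes the current folder is writable.

```stata
* Round-trip sketch. Assumption: the current folder is writable; the file names
* auto_copy.dta and auto_copy.csv are hypothetical.
sysuse auto, clear
save "auto_copy.dta", replace
outsheet using "auto_copy.csv", comma replace

use "auto_copy.dta", clear                      // Stata-format data
insheet using "auto_copy.csv", comma clear      // comma-separated text

* webuse needs an internet connection, so it is left commented out here:
* webuse auto, clear
```

The clear option matters: without it, Stata refuses to load new data over unsaved changes in memory.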
Looking at Data
• Look at your data – did our data import correctly?
• How are our data measured?
• What kinds of variables do we have?
• Editor
. edit
• Browser
. browse
• Other commands
. codebook
. describe
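A quick non-interactive check, sketched on the built-in auto data, could look like this. edit and browse open windows, so list is used here instead to print a few rows.

```stata
sysuse auto, clear
describe                     // storage types, formats, and labels for every variable
codebook mpg rep78           // ranges, missing counts, and example values
list make mpg rep78 in 1/5   // print the first five rows
```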
Examining Data
• There are several ways to look at our data in Stata
• How would we describe the distribution of our data?
• Graphs of distribution
• Histograms
. histogram
• Scatterplots
. scatter
• Charts/Tables of frequency and distribution
• Frequency tables
. table
• Cross-tabs
. tabulate
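Each of these commands can be tried on the built-in auto data. This is a sketch only; the variable choices are illustrative and are not part of the workshop exercises.

```stata
sysuse auto, clear
histogram mpg, frequency           // distribution of one continuous variable
scatter mpg weight                 // relationship between two continuous variables
table foreign                      // frequency table
tabulate foreign rep78, missing    // cross-tab, with missing rep78 as its own column
```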
Basic Data Operations, part 1
• Generating a new variable
gen newvarname=expression
• Subsetting
. keep varlist
. drop varlist
. if
• Joining Two Datasets
. merge
• Note: this is covered in detail in the Data Management Workshop!
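The operations above can be combined in one short sketch on the built-in auto data. The names price_k and prices are hypothetical, and the merge here simply rejoins a column that was split off earlier, which is only meant to show the syntax.

```stata
sysuse auto, clear

* New variable from an expression (price_k is a hypothetical name)
generate price_k = price / 1000

* Save a small lookup file to merge back in (tempfiles vanish when the do-file ends)
preserve
keep make price_k
tempfile prices
save `prices'
restore

* Subset: drop a column, then keep rows that satisfy a condition
drop price_k
keep if foreign == 1

* One-to-one merge on make; _merge records where each row came from
merge 1:1 make using `prices'
tabulate _merge
```

Rows with _merge equal to 3 were found in both files; rows with _merge equal to 2 came only from the lookup file because the matching cars were dropped by keep if.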
Basic Data Operations part 2
• Labeling
• To label a variable: label variable varname "label text"
• To label values:
. label define labelname 1 "high" 0 "low"
. label values varname labelname
• Renaming
. rename varname1 varname2
• Replacing values of an already generated variable
. replace newvarname=expression
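Labeling, renaming, and replacing can be sketched together on the built-in auto data. The names high_mpg, efficient, and mpglbl, and the cutoffs 20 and 4000, are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Variable label: free text describing the variable
label variable mpg "Miles per gallon"

* Value labels: define the mapping once, then attach it to a variable
generate byte high_mpg = (mpg > 20) if !missing(mpg)
label define mpglbl 0 "low" 1 "high"
label values high_mpg mpglbl

* Rename, then overwrite values of an existing variable
rename high_mpg efficient
replace efficient = 0 if weight > 4000
tabulate efficient
```

Note that label text goes in double quotes; single quotes have a different meaning in Stata and will cause an error here.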
Apply Your Knowledge
• Use the SATdata dataset
• Generate a dichotomous variable called hi_score from the csat variable, where 1 indicates a score greater than 922 and 0 indicates a score less than or equal to 922.
• Label it as 0=low and 1=high.
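One possible approach is sketched below on a tiny toy dataset so that it runs without the workshop file. The five csat values are hypothetical, and the label name hilow is made up. With SATdata loaded, only the generate and label lines would be needed.

```stata
* Toy data: these five csat values are hypothetical, chosen only to exercise
* the cutoff and a missing value. Replace this block with your loaded SATdata.
clear
input csat
900
922
923
1000
.
end

* 1 if csat > 922, 0 if csat <= 922, and missing stays missing
generate byte hi_score = (csat > 922) if !missing(csat)

* hilow is a hypothetical label name
label define hilow 0 "low" 1 "high"
label values hi_score hilow
tabulate hi_score, missing
```

The if !missing(csat) qualifier is a design choice: without it, a missing csat would be coded as 1, because Stata treats missing as larger than any number.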
Beginning Analysis
• Useful commands
• Looking at Distributions
. table, histogram, summarize
• Testing the Normality Assumption
. sktest, ladder, gladder
• Beginning to Look at Relationships
. tabulate, pwcorr, ttest, anova
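A few of these commands are sketched below on the built-in auto data. The variable choices are illustrative and are not the workshop exercise.

```stata
sysuse auto, clear
summarize mpg, detail          // percentiles, skewness, kurtosis
sktest mpg                     // skewness/kurtosis test of normality
ladder mpg                     // compares power transformations of mpg
pwcorr mpg weight, sig         // pairwise correlation with p-value
ttest mpg, by(foreign)         // two-group mean comparison
```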
Apply Your Knowledge
• Generate a histogram of the expense variable.
• Generate a two-way table to see if distributions are the same or different for the values of expense by the different values of your newly created hi_score variable.
• If you have time, see if there is a significant correlation between scores on SATs and the average amount of money that each state spends on education (expense).
Building Regression Models
• Regression models
• Linear regression
. regress depvar indepvar1 indepvar2 …
• Logistic Regression
. logit depvar indepvar1 indepvar2 …
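Both model types can be tried on the built-in auto data. In this sketch price is a continuous outcome and foreign is a 0/1 outcome; the predictors are arbitrary.

```stata
sysuse auto, clear
regress price mpg weight       // linear regression for a continuous outcome
logit foreign mpg weight       // logistic regression for a 0/1 outcome
```

In both commands the dependent variable comes first, followed by the predictors.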
Apply Your Knowledge
• Generate two scatterplots: one to look at the relationship between expense and csat, and one to look at expense and hi_score.
• Depending on your estimation of the relationship (linear or not), run the appropriate regression to test for the relative effect of expense on either csat or hi_score.
Saving data, code, and output
• Saving your newly transformed data
. save "pathname\filename.dta"
. outsheet using "pathname\filename"
• Saving your code
• SAVE YOUR DO-FILE!!!!!
• Saving your output
• Create a log file
. log using "pathname\filename"
. log close (!!!!) Not closing = not saving!
• Saving graphs
. graph save
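The saving steps can be sketched end to end as follows. The file names (save_sketch.log, auto_workshop.dta, auto_workshop.csv, price_hist.gph) and the variable price_k are hypothetical, and the current folder is assumed to be writable. Forward slashes in paths work on every platform in Stata, so they are a safe alternative to backslashes.

```stata
* Assumptions: writable current folder; all file names here are hypothetical.
sysuse auto, clear
capture log close
log using "save_sketch.log", replace text

generate price_k = price / 1000               // a small change worth saving
save "auto_workshop.dta", replace             // Stata-format copy
outsheet using "auto_workshop.csv", comma replace   // comma-separated copy

histogram price_k
graph save "price_hist.gph", replace          // saves the graph currently in memory

log close                                     // finish writing the log
```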
Thanks!
Questions?
Gutman Library, room 323a
http://itservices.gse.harvard.edu/its/services/research