Research Methods Lecture 5: Advanced STATA
Ian Walker, Module Leader, S2.109, [email protected]
Housekeeping announcement
• Stephen Nickell (MPC and LSE)
– British Academy Keynes Lecture in Economics
– "Practical Issues in UK Monetary Policy 2000-2005"
– Wednesday 2nd November
– Arts Centre Conference Room at 5.30pm
– http://www2.warwick.ac.uk/fac/soc/economics/forums/deptsems/keynes_lecture/
Stat-Transfer
• Use Stat-Transfer to convert data from one file format to another.
• Click on the Stat Tran 6.lnk shortcut to start it.
• Stat-Transfer is “point and click”.
• Just tell it the input file name and format, and the format you want it in.
• Click “Transfer”.
Stat-Transfer options
• Useful options for creating a manageable dataset from a large one:
– Keep or drop variables
– Change variable format
• E.g. float to integer
– Select observations
• E.g. “where (income + benefits)/famsize < 4500”
• Can be used for reading a large STATA dataset and writing a smaller one
• Avoids doing this in STATA itself
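For comparison, when the file does fit in memory, the same kind of subsetting can be sketched in Stata itself. The example uses the bundled auto dataset as a stand-in for a large file; the price threshold and the output file name small_auto are illustrative assumptions, not part of the lecture.

```stata
* Stand-in for a large dataset: Stata's bundled auto data
sysuse auto, clear

* Keep only the variables that are needed
keep make price mpg weight foreign

* Select observations (the threshold is an arbitrary illustration)
keep if price < 4500

* Store each variable in the smallest type that holds it without loss
compress

* Write the smaller dataset (hypothetical file name)
save small_auto, replace
```

This only works once the full file has been loaded, which is exactly the step Stat-Transfer lets you skip for very large files.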
Practising
• You can import some of Stata’s own demo files using the sysuse command
– E.g. . sysuse auto
• Many datasets are available at specific websites
– E.g. STATA’s own site has all the demo data used in the manual examples
• You can use the webuse command to load the files directly into Stata without copying them locally
. webuse auto /* gets the data from STATA’s own site */
• Or point webuse at another site first (webuse set takes the URL of the directory, not of the file):
. webuse set http://www2.warwick.ac.uk/fac/soc/economics/pg/modules/rm/notes
. webuse auto
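A minimal practice session along these lines might look as follows. The choice of variables to summarize is an assumption made only for illustration.

```stata
* Load a demo dataset shipped with Stata
sysuse auto, clear
describe
summarize price mpg weight

* Show the URL prefix that webuse currently reads from
webuse query

* Reset webuse to Stata's default location
webuse set
```

Resetting with a bare webuse set is worth doing after pointing webuse at another site, since the changed prefix otherwise stays in force for the rest of the session.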
More help
• You can search the whole of STATA’s online help using . search xxx
• Michigan’s web-based guide to STATA (for SA)
• UCLA resources to help you learn and use STATA
– including movies and “web-books”
• Consult other user-written guides and tutorials
– Chevalier1, Chevalier2; Princeton; Illinois; Gruhn
• ESDS’s “Stata for LFS”
• Stata’s own resources for learning STATA
– Stata website, journal, library, archive
– http://www.stata.com/links/resources1.html
Web resources
• STATA is web-aware
– E.g. . update all /* updates from www.stata.com */
• Statalist is an email listserv discussion group
• The Stata Journal is a refereed journal
– Replaces the old Stata Technical Bulletin (STB)
• SSC Boston College STATA Archive
– Extensive library of programs by Stata users
– Files can be downloaded in Stata using . ssc
• E.g. . ssc install outreg
• Installs the outreg ado file that makes tables pretty
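As a rough sketch, the web-aware commands mentioned above can be combined like this. Using outreg as the search keyword is only an illustration.

```stata
* Check whether official updates are available (does not install anything)
update query

* Search Stata's keyword database for a topic
search outreg

* Describe a user-written package on SSC before installing it
ssc describe outreg

* Install it, replacing any older copy already on this machine
ssc install outreg, replace
```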
Always (whatever the software)
• Use lowercase
• Open a log file
• Label your data
• Use the do file editor
• Organise your files
– Separate directories for separate projects
– Archive (zip) data, do and results files when you’re finished
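A do-file that follows these habits could be laid out as in the sketch below. The file names lecture5.do and lecture5.log, the label name origin_lbl and the label text are hypothetical choices made for illustration.

```stata
* lecture5.do: hypothetical do-file skeleton
capture log close                      // close any log left open by an earlier run
log using lecture5.log, replace text   // plain-text log of everything below

sysuse auto, clear

* Label the dataset, a variable, and the values of a categorical variable
label data "1978 automobile data (demo)"
label variable mpg "Mileage (miles per gallon)"
label define origin_lbl 0 "Domestic" 1 "Foreign"
label values foreign origin_lbl

summarize price mpg
tabulate foreign

log close
```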
Customising STATA
• profile.do runs automatically when STATA starts
• Edit it to include commands you want to invoke every time
. set mem 200m
. log using justincase.log, replace
• Define preferences for STATA’s look and feel
– Click on Prefs in menu
• Colours, graph scheme, etc.
• Save window positioning
Regression models - I
• Linear regression and related models when the outcome variable is continuous
– OLS, 2SLS, 3SLS, IV, quantile reg, Box-Cox …
• Binary outcome data
– the outcome variable is 0 or 1 (or y/n)
• probit, logit, nested logit, ...
• Multiple outcome data
– the outcome variable is 1, 2, ...
• conditional logit, ordered probit
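The first few of these model families can be tried on the bundled auto data. The regressors below are arbitrary choices for illustration, not specifications taken from the lecture.

```stata
sysuse auto, clear

* Continuous outcome: OLS
regress price mpg weight

* Binary outcome (foreign is 0/1): probit and logit
probit foreign mpg weight
logit foreign mpg weight

* Ordered outcome (rep78 takes the values 1 to 5): ordered probit
oprobit rep78 mpg weight
```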
Regression models - II
• Count data
– the outcome variable is 0, 1, 2, ... occurrences
• Poisson regression, negative binomial
• Choice models
– multinomial choice
– A, B or C
• Multinomial logit, random utility model, unordered probit, nested logit, etc.
• Selection models
– Truncated, censored
• Tobit, Heckman selection models
• linear regression or probit with selection
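A hedged sketch of one command from each of these families, using Stata demo datasets. The datasets and variable lists are assumptions chosen for illustration, and the Poisson example uses rep78 only because it is a small non-negative integer.

```stata
* Count outcome: Poisson regression (illustrative only)
sysuse auto, clear
poisson rep78 mpg foreign

* Multinomial choice: multinomial logit on the insurance-choice demo data
webuse sysdsn1, clear
mlogit insure age male nonwhite

* Censored outcome: tobit, treating mpg values at or below 17 as left-censored
sysuse auto, clear
generate wgt = weight/1000
tobit mpg wgt, ll(17)

* Sample selection: Heckman model on the womenwk demo data
webuse womenwk, clear
heckman wage educ age, select(married children educ age)
```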
Regression models - III
• STATA supports several special data types.
• Once the type is defined, special commands work
• Time series
– Estimate ARIMA and ARCH models
– Estimators for autocorrelation and heteroscedasticity
– Estimate MA and other smoothers
– Tests for autocorrelation, heteroscedasticity and unit roots: h, d, LM, Q, ADF, P-P …
– TS graphs
. sysuse tsline2, clear
. tsset day
. tsline calories, ttick(28nov2002 25dec2002, tpos(in)) ttext(3470 28nov2002 "Thanks" 3470 25dec2002 "Xmas", orient(vert))
…gives
[Figure: time-series line plot of calories consumed (y-axis, 3400 to 4400) against date (x-axis, 01jan2002 to 01jan2003), with vertical text labels “thanks” and “x-mas”]
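Beyond graphs, the estimation and testing commands listed above follow the same pattern: declare the time variable, then run the ts commands. The sketch below assumes the wpi1 demo dataset (a quarterly price index with time variable t); the lag length and the ARIMA order are illustrative choices.

```stata
* Quarterly wholesale price index demo data
webuse wpi1, clear
tsset t

* Plot the series
tsline wpi

* Unit-root tests: augmented Dickey-Fuller and Phillips-Perron
dfuller wpi, lags(4)
pperron wpi

* ARIMA(1,1,1) model
arima wpi, arima(1,1,1)
```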
Special data types: survey
• Non-random sampling means plain OLS can give misleading estimates and standard errors
• STATA can handle non-random survey data
– see the “svy***” commands
– Example (stratified sample of medical cases):
. webuse nhanes2f, clear
. svyset psuid [pweight=finalwgt], strata(stratid)
. svy: reg zinc age age2 weight female black orace rural

Number of strata   =        31                  Number of obs      =      9189
Number of PSUs     =        62                  Population size    = 1.042e+08
                                                Design df          =        31
                                                F(   7,     25)    =     62.50
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0698
------------------------------------------------------------------------------
             |             Linearized
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1701161   .0844192    -2.02   0.053    -.3422901     .002058
        age2 |   .0008744   .0008655     1.01   0.320    -.0008907    .0026396
      weight |   .0535225   .0139115     3.85   0.001     .0251499    .0818951
      female |  -6.134161   .4403625   -13.93   0.000    -7.032286   -5.236035
       black |  -2.881813   1.075958    -2.68   0.012    -5.076244    -.687381
       orace |  -4.118051   1.621121    -2.54   0.016    -7.424349   -.8117528
       rural |  -.5386327   .6171836    -0.87   0.390    -1.797387    .7201216
       _cons |   92.47495   2.228263    41.50   0.000     87.93038    97.01952
------------------------------------------------------------------------------

. regress zinc age age2 weight female black orace rural

      Source |       SS       df       MS              Number of obs =    9189
-------------+------------------------------           F(  7,  9181) =   79.72
       Model |  110417.827     7  15773.9753           Prob > F      =  0.0000
    Residual |   1816535.3  9181   197.85811           R-squared     =  0.0573
-------------+------------------------------           Adj R-squared =  0.0566
       Total |  1926953.13  9188  209.724982           Root MSE      =  14.066
------------------------------------------------------------------------------
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.090298   .0638452    -1.41   0.157    -.2154488    .0348528
        age2 |  -.0000324   .0006788    -0.05   0.962    -.0013631    .0012983
      weight |   .0606481   .0105986     5.72   0.000     .0398725    .0814237
      female |  -5.021949   .3194705   -15.72   0.000    -5.648182   -4.395716
       black |  -2.311753   .5073536    -4.56   0.000    -3.306279   -1.317227
       orace |  -3.390879   1.060981    -3.20   0.001    -5.470637   -1.311121
       rural |  -.0966462   .3098948    -0.31   0.755    -.7041089    .5108166
       _cons |   89.49465   1.477528    60.57   0.000     86.59836    92.39093
------------------------------------------------------------------------------
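Before running a survey regression it can help to inspect the declared design and see how much it matters for a simple statistic. The sketch below reuses the same dataset and svyset declaration; the choice of the mean of zinc as the statistic is an illustrative assumption.

```stata
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)

* Summarise the design: strata, PSUs and observations per stratum
svydescribe

* Design-based estimate of mean zinc
svy: mean zinc

* Design effects relative to simple random sampling
estat effects
```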
Special data types: duration
• Survival time data
– See the “st***” commands
. stset failtime /* sets the var that defines duration */
• Estimates a wide variety of models to explain duration
– E.g. Weibull “hazard” model
Weibull example ….
• ST regression supports Weibull, Cox PH and other options
. streg load bearings, distribution(weibull)
• After streg you can plot the estimated cumulative hazard with . stcurve, cumhaz (or the hazard itself with . stcurve, hazard)
• STATA allows functions to be plotted by specifying the function:
twoway (function y = .5*x^(-.5), range(0 5) yvarlab("a=.5")) ///
       (function y = 1.5*x^(.5), range(0 5) yvarlab("a=1.5")) ///
       (function y = 1*x^(0), range(0 5) yvarlab("a=1")) ///
       (function y = 2*x, range(0 2) yvarlab("a=2")), ///
       saving(weib1, replace) ///
       title("Weibull hazard: lambda=1, alpha varying") ///
       ytitle(hazard) xtitle(t)
gives…
[Figure: “Weibull hazard: lambda=1, alpha varying”; hazard (y-axis, 0 to 4) against t (x-axis, 0 to 5), with curves for a=.5, a=1.5, a=1 and a=2]
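Put together, the duration commands above form a short workflow. The sketch assumes the kva demo dataset, which contains the failtime, load and bearings variables used in the commands above; the Cox model at the end is added only as a point of comparison.

```stata
* Generator failure-time demo data
webuse kva, clear

* Declare the duration variable
stset failtime

* Weibull regression
streg load bearings, distribution(weibull)

* Plot the fitted hazard, evaluated at the means of the covariates
stcurve, hazard

* Cox proportional hazards model on the same data
stcox load bearings
```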
Special data types: Panel data
• STATA can handle “panel” data easily
– see the “xt***” commands
• Common commands are
. xtdes     Describe pattern of xt data
. xtsum     Summarize xt data
. xttab     Tabulate xt data
. xtline    Line plots with xt data
. xtreg     Fixed and random effects
Panel data
• An xt dataset looks like this:
pid    yr_visit   fev    age   sex   height   smokes
----------------------------------------------------
1071   1991       1.21   25    1     69       0
1071   1992       1.52   26    1     69       0
1071   1993       1.32   28    1     68       0
1072   1991       1.33   18    1     71       1
1072   1992       1.18   20    1     71       1
1072   1993       1.19   21    1     71       0
• xt*** commands need to know the variables that identify person and “wave”:
. iis pid
. tis yr_visit
• Or use the tsset command
. tsset pid yr_visit, yearly
Panel regression
• Once STATA has been told how to read the data it can perform regressions quite quickly:
. xtreg y x, fe
. xtreg y x, re
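A fuller sketch of the panel workflow, assuming the nlswork demo dataset (person identifier idcode, wave variable year). The regressors are illustrative, and the Hausman comparison of the two estimators goes beyond what the slides cover. Later Stata releases use xtset in place of iis and tis.

```stata
webuse nlswork, clear

* Declare person and wave identifiers
tsset idcode year

* Describe the panel pattern and summarise within and between variation
xtdes
xtsum ln_wage

* Fixed effects and random effects estimates of the same model
xtreg ln_wage age tenure, fe
estimates store fe
xtreg ln_wage age tenure, re
estimates store re

* Hausman test comparing the two sets of estimates
hausman fe re
```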
Further advice
• See Stephen Jenkins’ excellent course on duration modelling in STATA
• See Steve Pudney’s excellent course on panel data modelling in STATA
– Beware: the dataset is 30MB+