stat a guide

Page 1: Stat a Guide

An Introduction to Stata

Hemanshu Kumar

August 2015

1 The Stata Layout

The Stata screen comprises four windows:

• Results: the biggest window is where Stata shows the commands you enter and the results they generate.

• Command: the window found at the bottom of the screen by default. This is where you type in commands.

• Review: this window displays the history of commands entered in the command window.

• Variables: if there is data loaded, then this window displays the variables in the dataset.

These windows can be “unpinned”, moved around, etc. Feel free to experiment!

In addition, note the toolbar on top. In particular, the Data, Graphics and Statistics menus contain a wide variety of commands which are very useful for most data summary and description purposes.

2 Some Preliminary Notes

• Stata can be used interactively, by entering commands one at a time in the command window, or using the menus on the toolbar. However, for larger tasks, it is best to put all the commands together in a file and run them together. This has the advantage that you can easily replicate your work later, as well as make changes easily if you make mistakes. The file which stores your commands is a simple text file, and is given the extension .do. More on this later.

• Stata works by loading a (single) dataset into the computer’s memory. All the commands you give work with the data in memory, and the data in the file is left untouched. The file is only changed when you explicitly ask for the data to be saved to file. This minimizes the chance that any mistakes you make permanently destroy your data.

• Stata variables are actually entire vectors, with one value for each observation. If you want to store a single number or string, it is more appropriate to use scalars. Stata also provides a simple way to store matrices of numbers. Scalars and matrices live in memory alongside the dataset, but they are not part of the dataset itself and are not written to disk when the data is saved.

• If you need to give Stata a command that involves a filename and/or directory name that contains spaces, you must enclose the file/directory name in a pair of double quotation marks ("").

• Stata commands and variable names are case-sensitive.

• In any command, Stata is generally insensitive to the number of consecutive spaces.

3 Command Syntax

The essential syntax of a Stata command looks as follows:

.[prefix cmd:] command [varlist] [if] [in] [, options]

where the portions enclosed in square brackets are optional. We shall see various examples of Stata commands ahead; you should return here to see how they fit into this schema, and as a guide to know how to make your own modifications to the commands.
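To make the schema concrete, here is a minimal sketch that fills in its slots one at a time. It assumes the auto example dataset that ships with Stata (loaded with sysuse, which this guide does not otherwise use); the variables price, mpg and foreign belong to that dataset.

```stata
* Load an example dataset shipped with Stata (an assumption, not part of the guide)
sysuse auto, clear

* command + varlist
summarize price mpg

* command + varlist + if + in + options
summarize price mpg if foreign == 1 in 1/60, detail

* prefix + command + varlist: repeat the command for each value of foreign
bysort foreign: summarize price
```

Note that the in qualifier cannot be combined with the by prefix, which is why the last command uses neither if nor in.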

4 Some Preliminary Steps

For the bulk of this document, we will assume that you are working interactively in Stata, by typing commands in the command window.

4.1 Getting Started

Once we have launched Stata from Windows, we take immediate note of two things on the screen:

• the status bar at the bottom mentions the current working directory of Stata on its left corner. Say we want to use and save datasets in the directory C:\My Documents\C003. For this, just type

.cd "C:\My Documents\C003"

in the command window, without the initial “.”. I will include the dot at the beginning of each command, since this is the way the command is displayed in the Results window. However, it is not typed in by the user. In this command, cd is short for “change directory”. The double quotes in the command we typed are only necessary when the name of the directory contains spaces.

• the Results window typically states that Stata has allocated 1.0MB of memory for data. In addition, in Intercooled Stata, the default matsize of 200 caps the size of the matrices Stata can build, and hence the number of variables that can enter a single estimation command (it is not a limit on the number of variables in the dataset). While these defaults are enough for small problems, we might occasionally need more. To change the memory allocation to 100MB, for example, and matsize to 300, we can type

.set memory 100m

.set matsize 300

It is important to execute set memory before creating any variables or loading any dataset into Stata’s memory. Once a dataset has been loaded and/or variables created, Stata does not let you fiddle with memory because it would destroy the dataset in memory.

Note: The set memory command is obsolete starting with Stata 12, since Stata now manages memory automatically.

4.2 Getting Help

Stata has an extensive help system. You can get help on Stata commands at any time by typing

.help commandname

As an example, try asking for help on matsize. In addition, you can search Stata’s help system for any word(s) of your choice. For example, try

.search memory

In the help on a command, you will notice a portion of the command underlined. This tells you the extent to which you can abbreviate that command in Stata. For example, in setting memory allocation above, we could have typed just

.set mem 100m

.set mat 300

4.3 Logging your Work

Especially when using Stata in interactive mode, it is a good idea to keep a log of your work: both the commands you entered and the results Stata gave. To begin a log, either go to File > Log > Begin... and specify the log filename and location, or in the command window, type

.log using mylog, fileformat

where of course you can replace mylog with any filename of your choice, and fileformat stands for one of the format options described next. Stata can create logs in one of two formats: SMCL, which has rich formatting but can only be read by Stata, and text files, which have almost no formatting, but can be read by any editor such as Notepad, Wordpad, Word, etc. By default, Stata uses SMCL logs. However, we can specify an option in the above command to use the text format. If we merely typed in

.log using mylog

Stata would start a log file called mylog.smcl in the SMCL format, while if we typed

.log using mylog, text

it would start a log file called mylog.log, which is a simple text file. This also highlights one aspect of the standard syntax of Stata commands: the “options” with a command are specified by entering a “,” after the main command.

To suspend recording to a log file at any time, type

.log off

To resume a suspended log, just type

.log on

And to close a log file completely, type

.log close
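Put together, a complete logging session might look like the sketch below. It is meant to be run from a do file (the // comments are not accepted in the command window), and the replace option, which overwrites an existing log of the same name, is an addition not discussed above.

```stata
log using mylog, text replace          // start mylog.log in text format
display "this line is recorded"
log off                                // suspend logging
display "this line is not recorded"
log on                                 // resume logging
display "this line is recorded again"
log close                              // close the log file
```

Afterwards, mylog.log can be opened in any text editor to check what was captured.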

5 Using Stata in Batch Mode

5.1 Working with Do Files

Arguably the most professional way to use Stata is to do all your work with do files. A do file is nothing but a text file that contains a series of commands. When the file is run in Stata, the commands are processed together in a batch. And since the do file is a simple text file, it can be opened by any text editor, such as Windows’ native Notepad or Wordpad programs. I personally prefer to use a program called WinEdt.

However, Stata has its own do file editor as well, and to pull it up, you can just type

.doedit

in the command window, or alternatively click on the envelope icon in the toolbar near the top of Stata’s main window. As an example of our first do file, you could type into the editor

* this is my first ever do file

clear

Use the Ctrl + S keyboard shortcut to save this file with your chosen name. Notice that the default file extension is .do. To execute this file, you can either click the icon for “Do current file” in the toolbar of the do file editor, or in Stata’s command window, you can type

.do filename

where Stata assumes the .do extension to filename if it is not specified.

5.2 Comments

It is good programming practice to include extensive comments. This allows others to understand your code, and lets you make sense of it yourself at a later date. In Stata, single-line comments can be included by starting the line with an asterisk (*), as you can see in the do file above. Long comments that span multiple lines are also possible; in this case, the comment should begin with /* and end with */ . In addition, you can also include a comment at the end of a line which contains a Stata command. Such a comment must be separated from the command by // as in the following example:

.set obs 40 // this changes the number of observations to 40
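The three comment styles can be seen side by side in this short do file sketch (the commands themselves are only placeholders):

```stata
* A single-line comment starts with an asterisk

/* A longer comment
   can span several lines */

clear
set obs 40   // a comment at the end of a command line
```

The // style needs at least one space before it and is intended for do files rather than for commands typed interactively.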

5.3 Backward Compatibility

Since new versions of Stata are constantly coming out, it is quite conceivable that a do file you write today may not work a few months or years down the line. The solution is to specify the Stata version you created the file in, at the top of the file, using the version command (for example, version 12 for a file written in Stata 12). Stata then interprets the commands that follow as that version would have.

5.4 Clearing Memory

The clear command is a very useful command that also often finds a place at the beginning of do files. On its own, it clears the memory of all variables and observations, along with value labels. To also remove other Stata structures such as scalars, matrices, programs, and so on, use clear all.
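A typical opening for a do file, combining the ideas of this section, might be sketched as follows. The file name, the version number and the set more off line are illustrative assumptions; substitute the version you are actually writing in.

```stata
* mywork.do (hypothetical file name)
version 12      // assumption: the Stata version this file was written in
clear all       // drop data, value labels, scalars, matrices, programs, ...
set more off    // optional: do not pause when output fills the screen

set obs 40      // start an empty dataset with 40 observations
generate id = _n
```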

6 Generating Data

In this guide, we will consider a situation where you are interested in creating a dataset from scratch, rather than using a pre-existing one.

Stata thinks of its dataset as being comprised of several variables, all of which have the same number of observations. In a spreadsheet or matrix representation, the variables comprise the columns and the observations sit on individual rows. The first thing to do when generating a dataset is to tell Stata how many observations the variables will have.

.set obs #

where # should be replaced with the requisite positive integer. This command is usually given when there are no pre-existing variables in memory.[1]
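As an illustration, the sketch below builds a small artificial dataset from scratch. The variable names, the seed and the formula for y are arbitrary choices made for this example; runiform() and rnormal() are Stata's built-in random number functions (available from Stata 10.1 onwards).

```stata
clear
set obs 100
set seed 12345               // arbitrary seed, so the random draws are reproducible
generate id = _n             // observation number: 1, 2, ..., 100
generate x = runiform()      // uniform draws between 0 and 1
generate e = rnormal()       // standard normal draws
generate y = 1 + 2*x + e     // a made-up linear relationship
summarize
```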

7 Saving your Data

Remember that Stata works with data in memory. Until you explicitly ask Stata to save the data, no change will be made to any file on disk. Saving your dataset in Stata’s own proprietary format is simplicity itself. Suppose you want to save to a file called mywork.dta in the working directory. You need to type:

.save mywork

Notice that we did not need to specify the .dta extension; Stata adds it automatically. If a file called mywork.dta already exists, Stata will promptly give you an error. If you are sure you want to overwrite the existing file with the dataset in memory, you should add the replace option to the command:

.save mywork, replace
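A quick way to convince yourself that the data really reached the disk is to save, clear the memory, and load the file back. The sketch below does this with a tiny made-up dataset; the use command, which loads a .dta file into memory, is not covered elsewhere in this guide.

```stata
clear
set obs 10
generate id = _n

save mywork, replace   // writes mywork.dta in the working directory
clear                  // memory is now empty
use mywork             // load the saved file back into memory
describe
```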

8 Taking a First Look at your Data

8.1 Browse command

Having loaded in your dataset, you will find the Variables window populated by the various variable names that were found in your data. If you imported a text file into Stata, Stata would have converted the variable names to small letters even if they were originally not so. By convention, variable names in Stata are written purely in small letters (remember that names are case-sensitive), and no Stata variable name can begin with a number.

Perhaps the first thing to do is to just look at the spreadsheet of your data. This is achieved by typing

.browse

As an aside, note that the browse command can be abbreviated to as little as br.

[1] If there are variables in memory, then this command can be used to increase the number of observations in the dataset. In this case, the new observations will all have missing values.


The Data Browser pane that opens up allows you to look at your data, but not edit it. Also, while the Data Browser is open, no other commands can be executed by Stata. These are for your protection! Stata strongly discourages direct editing of data; you should use commands, so that you have a better track of what changes are made, and are forced to change data in a consistent manner. If you are sure you want to manually change the data, you can always use the edit command.

If you have missing observations/cells in your data, these are recorded as a single dot (“.”).

8.2 Conditions, Ranges and Variable Lists

Sometimes, you may wish to look at only part of your data. For example, you might have a variable country, which stores names of various countries, and you may wish to see only those observations for which country takes on the value “India”. To do this, type

.br if country == "India"

Most commands in Stata accept the if argument. Notice that this is not an “option”: it does not need to be preceded by a comma. if executes a command for those observations for which the succeeding logical expression holds true. You should note that in a logical expression for equality, we must use a double = sign. See help operator for more.

Instead of performing a command (such as browse) for a set of observations which satisfy a condition, if we want to execute it over some specific range of observation numbers, we can do the following

.br in 50/l

(where l is the lowercase L). This would browse the observation numbers from 50 to the last. (f, for the first observation, is also available as a special character.) Notice the use of the forward slash (“/”) to give an observation range.

We could also choose to browse only a subset of the variables in our data. Suppose our dataset had five variables, country, year, gdp, gnp, exrate, listed in that order in our Variables window. Then

.br country year gdp gnp if year>1990

would show us the specified variables for the data from after 1990. Notice that in a list of variables, the individual variables are separated by spaces.

Stata also allows us to abbreviate variable names as long as it can uniquely identify a variable from its abbreviation. In addition, wildcards such as * and ? are permitted. Thus, the same result as above could be obtained by typing

.br c y g* if y>1990

You can also use “-” to shorten a list of variables, using the order in the Variables window. Thus, the same result as above could also be obtained by

.br c-gnp if y>1990
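The sketch below tries these qualifiers on a tiny dataset typed in with the input command. All the numbers are invented purely for illustration, and list is used in place of browse so that the output appears in the Results window; both commands accept the same if, in and variable-list syntax.

```stata
clear
input str10 country year gdp gnp exrate
"India"  1989 100  98 17
"India"  1991 110 107 23
"Brazil" 1989 200 195  3
"Brazil" 1992 230 224  5
end

list if country == "India"                 // condition on a string variable
list in 2/l                                // observations 2 through the last
list country year gdp gnp if year > 1990   // explicit variable list
list c y g* if y > 1990                    // abbreviations and a wildcard
list c-gnp if y > 1990                     // a range of variables in dataset order
```

The last three commands should produce identical listings, since they select the same variables and observations.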

8.3 Other data description commands

Typing just

.describe

gives you a basic summary of your data, including the source dataset, the number of observations and variables, the amount of memory in use, and a list of all variables with their respective storage types, display formats and labels (more later about labels). To describe only specific variables, the syntax is

.describe varlist

where varlist is a list of variables.

For numeric variables, the inspect command provides a useful first pass at the nature of the data: it gives a small histogram, tells you the number of unique and missing values, and the number of values which are positive/zero/negative and integer or not.

For categorical (nominal) variables, we can quickly obtain a frequency distribution of the data using the tabulate command.

For any variable, its values (if desired, for a specified range of observations) in the dataset can be obtained using the command list.

Basic descriptive statistics for cardinal variables can be obtained with the summarize command. For example,

.sum gnp gdp if year<=1990

(where sum is the abbreviated version of summarize) provides the number of observations, mean, standard deviation, and minimum and maximum values of gnp and gdp for years up to and including 1990. Adding the detail option to the command provides a larger set of statistics, including quantiles, skewness and kurtosis.
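To see what each of these commands prints, you could run them on the auto example dataset that ships with Stata, as in the sketch below. The dataset and its variables (make, price, mpg, foreign) are assumptions of this example and do not appear elsewhere in the guide.

```stata
sysuse auto, clear

describe                               // overview of the whole dataset
describe price mpg                     // only the listed variables
inspect price                          // mini histogram, unique and missing values
tabulate foreign                       // frequency table of a categorical variable
list make price in 1/5                 // values for the first five observations
summarize price mpg if foreign == 0    // basic statistics for a subset
summarize price, detail                // adds quantiles, skewness and kurtosis
```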

9 Creating and Deleting Variables

9.1 Creating Variables

We often need to create new variables in a dataset which operate on the existing variables to give us our quantity of interest. Suppose for example that we are not interested in GDP itself, but in its logarithm. We could then create a new variable, say called lgdp, by using the generate command:

.generate lgdp = ln(gdp)

where the function ln() gives the natural log. Suppose we change our mind and decide we want the log to the base 10 instead. We can then replace our variable as follows:

.replace lgdp = log10(gdp)

If we had used the generate command instead, Stata would have given an error, since it is not sure whether we realize that the variable lgdp already exists, and would get overwritten.

As another example, suppose we want to create a variable to store the growth rate of GDP. We could then do

.generate growth = (gdp[_n]-gdp[_n-1])/gdp[_n-1]

To understand what we are doing, we need to realize that Stata generates each observation of the new variable growth, one at a time. At any given time during that process, Stata uses a temporary variable _n to store the current observation number. [Note again, Stata is case sensitive. In particular, _N is an entirely different variable, one which stores the total number of observations. Further, the leading underscore is a common feature of a lot of Stata’s internal hidden variables.]

To access a specific observation, we need to enclose the observation number in square brackets. Thus the nth observation of the variable growth is generated by differencing gdp[_n] and gdp[_n-1] (its value in the last period), and dividing by the latter. This presumes that the observations are sorted in time order.

As you might guess, the first observation of our new variable growth should be “missing”. You should browse to satisfy yourself that this is indeed the case.

Also use help functions to see the range of operations on offer.
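The following sketch runs these commands on three invented observations, so that the results are easy to check by hand. The gdp figures are made up, and the sort command (not covered in this guide) is included only to guarantee that the rows are in time order before gdp[_n-1] is used.

```stata
clear
input year gdp
2001 100
2002 110
2003 121
end

sort year                            // make sure rows are in time order
generate lgdp = ln(gdp)              // natural log
replace lgdp = log10(gdp)            // overwrite with log to the base 10
generate growth = (gdp[_n] - gdp[_n-1]) / gdp[_n-1]
list
```

With these numbers, growth is missing in the first row (there is no previous period) and roughly 0.1 in the other two.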

9.2 Deleting Variables

We can use drop varlist to drop a specific list of variables, or use keep varlist to drop all variables except those in the specified list.
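A brief sketch of both commands, again assuming the auto example dataset shipped with Stata (its variable names are not part of this guide):

```stata
sysuse auto, clear

keep make price mpg weight foreign   // every other variable is dropped
drop weight                          // now drop one more variable
describe                             // four variables remain
```

Both commands act only on the data in memory; the file on disk is unchanged unless you save over it.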

9.3 Macros

Macros come in two types, global and local. Global macros, once defined, are available anywhere in Stata. Local macros exist solely within the program or do-file in which they are defined. If that program or do-file calls another program or do-file, the local macros previously defined temporarily cease to exist, and their existence is reestablished when the calling program regains control. When a program or do-file ends, its local macros are permanently deleted.

For example, to create a local containing a value, you could type:

.local x = 4

Locals can be numeric or text strings. For example, you could type

.local y = "Hello"

The list of macros in memory at any time, and their values, can be obtained using

.macro list

The content of a local can be displayed by enclosing its name between a left single quote (`) and a right single quote ('). A local holding text must additionally be wrapped in double quotes, since otherwise Stata would try to interpret its content as a variable name:

.disp `x' "`y'"

To delete any local from memory (for example y), we can type

.macro drop _y

where the leading underscore is needed because Stata stores local macros internally under names that begin with an underscore.
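The sketch below collects the macro commands into a single do file. It should be run as one file, since locals defined in one do file are not visible in another; the global macro greeting is a hypothetical name introduced only for this example.

```stata
local x = 4
local y = "Hello"

display `x' + 1           // the local is substituted first, so this shows 5
display "`y', world"      // shows Hello, world

global greeting "Hi"      // a global macro (hypothetical name)
display "$greeting"       // globals are referenced with a dollar sign

macro list                // locals appear with a leading underscore
macro drop _y             // delete the local y
macro drop greeting       // delete the global
```

Macros work by text substitution: Stata replaces the quoted name with the macro's content before the command is executed.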

10 Using Stata as an Advanced Calculator

10.1 Simple Math

We can do simple math on the command line by using the display command. display will simply output the result of the computation for us. For example, we can ask Stata to

.display 5 + ln((3-1)/2)

and Stata will just output 5.
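A few more calculations in the same spirit are sketched below, written for a do file so that the // comments are accepted. The %9.2f prefix is a display format asking for two decimal places; it is an addition not discussed in this guide.

```stata
display 5 + ln((3-1)/2)     // 5, since ln(1) = 0
display sqrt(16) + 2^3      // 12
display %9.2f exp(1)        // 2.72
```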

10.2 Using Scalars

For more complex mathematics, however, we would ideally like to store results in some variables. However, Stata’s variables are entire vectors, containing one value for each observation in the dataset. The purpose is served by using scalars.

To create a scalar containing the same result as above, you can type, for example:

.scalar myfirst = 5 + ln((3-1)/2)

Scalars can be numeric or text strings. For example, you could type

.scalar hk = "Hemanshu Kumar"

The list of scalars in memory at any time, and their values, can be obtained using

.scalar list

To delete any scalar from memory (for example hk), we can type

.scalar drop hk
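The scalar commands can be tried together as in the sketch below. The scalar() wrapper used in one line is an optional way of telling Stata explicitly that a name refers to a scalar rather than to a variable of the same name; it is not mentioned in the guide itself.

```stata
scalar myfirst = 5 + ln((3-1)/2)
scalar hk = "Hemanshu Kumar"

display myfirst                 // 5
display hk                      // the stored string
display scalar(myfirst) * 2     // 10

scalar list                     // both scalars are listed
scalar drop hk
scalar list                     // only myfirst remains
```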
