
An Introduction into Stata I

Prof. Dr. Herbert Brücker

University of Bamberg

Seminar “Migration and the Labour Market”

Session 3, June 9, 2011

Contents

1 Introduction into the workplan
2 Introduction into the dataset
3 Introduction into STATA I
•Overview on working with STATA
•Menus and editors
  • General editor
  • Data editor
  • Do File editor
•The Grammar of STATA
  • commands
  • loading data
  • describing data
  • graphs
•Working with Do-Files

1 Workplan

•Forming four teams of 4-5 students
•Introduction and outline of research question
•Review of literature on labour market effects of migration (3-5 pages)
•Description of the dataset
  • Data sources and caveats
  • Descriptive statistics and graphs
•Presenting the empirical model
•Presenting and discussing the regression results
•Conclusions
•Presenting the papers in class

2 The dataset: general information

•The IAB employment sample (IABS)
•2% random sample of all employees obliged to pay social security contributions and recipients of unemployment benefits (e.g. SGB II and III)
•Precise information on wages and unemployment spells
•Information on education and work experience
•Period: 1974-2004 (meanwhile extended to 2008)
•Here we use 1980-2004, since information at the beginning of the sample period is less reliable
•Focus on Western Germany excl. (West-)Berlin due to unification

2 The dataset: Caveats I

•Identification of foreigners by nationality
  • We use the nationality of the first spell to control for naturalisations
•Problem of identifying immigration of ethnic Germans (Spätaussiedler)
  • We try to identify them via programme participation
•No civil servants (“Beamte”) and self-employed
  • There is nothing we can do about this.
•Wages are censored at the legal pension threshold level (66,000 Euros)
  • We impute wages above the threshold level

2 The dataset: Caveats II

•Missing education information (17%, about 35 per cent of foreigners)
  • We impute education information
•We have only daily wages (not hourly wages)
  • We exclude all part-time workers
•See Brücker/Jahn (2011), Data Section, and the FDZ at the IAB for a description of the data set

2 The dataset: Organisation

•We distinguish 25 years (1980-2004)
•We distinguish 64 labour market cells by education (4), work experience (8) and nationality (2)
  • 4 x 8 x 2 = 64
•We use the following indexes:
  • h = native (German)
  • f = foreigner
  • q = education
  • k = work experience
  • t = time
  • Note that the dataset also contains aggregates (e.g. wt, wqt, wqkt, and not only whqkt, wfqkt)

General overview of STATA

The desktop of STATA is divided into four parts:

1. Review window: lists your previously executed commands
2. Results window: shows the output of your commands
3. Variables window: shows the variables in the current dataset
4. Command window: here you type your commands

STATA has the following menus/editors you can work with:

1. The desktop menu: you can run all commands here
2. The data editor: here you can edit the data you have loaded
3. The data browser: here you can browse the data you have loaded, but not edit them
4. The do file editor: the do file is a file where you can edit and execute all types of commands. Very useful for replication and for keeping a record of what you have done. We come back to this.

The Data Editor. You can change each cell by hand.

The Data Browser looks similar, but you can't edit the data.

The Do File Editor. You can type and execute your commands there.

(Lines that begin with a star are treated as comments, not as commands, e.g. * Note that …)

The Grammar of STATA

General Structure of STATA

[prefix :] command [varlist] [if] [in] [weight] [, options]

We will concentrate on the command, the [varlist] and the [if] condition.

The command: what do you want to do?

First step: how to load data

> use "Filename", clear

Practice:

> use "C:\EigeneDateien\Stata\data1.dta", clear

Another way to load data: File -> Open -> choose your data file
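A minimal sketch of the load step that runs without the course file. The dataset below is made up for illustration (only the variable names ed, ex, year, whqkt and wfqkt are taken from the slides), and the temporary file stands in for data1.dta. It is meant to be run as a do-file.

```stata
* Toy data: values are invented, only the variable names follow the slides
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
end

* Save to a temporary file (a stand-in for data1.dta), then load it again
tempfile toy
save "`toy'"
use "`toy'", clear

* How many observations were loaded?
count
```

The option clear tells STATA to drop whatever data are currently in memory before loading the file; without it, use refuses to overwrite unsaved changes.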

There are two types of variables (data):

numerical variables, e.g.: 0, 1, 501, 0.5, -12 etc.

string variables, e.g.: no voc train, male, female etc.

How to deal with the data types:

Numerical variables: you can do all mathematical operations, e.g. var1 + var2, var1/var2, var1*var2 etc.

String variables: you have to put string values in straight double quotation marks, e.g.

> replace var1 = 1 if sex == "female"

In the Data Editor, numerical variables are shown in black and string variables in red.

Now that you have loaded the data: how to get an overview of your data?

> describe

“describe” gives general information about the data, such as the number of observations, the number of variables, and the names and labels of the variables.

> list

lists the values of every observation (e.g. persons, groups, classes) in the data set.

Attention: your data might be really large! “--more--” indicates that more output is available; press any key to continue or “q” to quit.

The [varlist]: which variables are concerned?

[varlist] stands for either a single variable or a list of variables to which the command is applied.

[varlist] is set in square brackets because it is optional; if no [varlist] is specified, STATA will usually execute the command for all variables.

Practice:

In order to get information only about education and wages in the data set:

> list ed whqkt
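The effect of a [varlist] can be tried on a small made-up dataset. This is only a sketch: the values are invented, and the [in] range used to shorten the listing is a standard part of the syntax diagram that the slides do not discuss further.

```stata
* Toy data: values are invented, only the variable names follow the slides
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
end

* Without a varlist: all variables are described
describe

* With a varlist: only education and native wages
describe ed whqkt
list ed whqkt

* [in] restricts the listing to observations 1 to 3
list ed whqkt in 1/3
```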

Further commands to describe the data set I:

> tabstat varlist

gives a table with the mean of the variable(s) by default

> codebook

shows how each variable is coded, with information on the data type, range, units, unique values, missing values, mean, standard deviation and percentiles

In practice:

> tabstat whqkt wfqkt

> codebook

> tabstat whqkt

Further commands to describe the data set II:

> summarize

gives the number of observations, the mean, the standard deviation, the minimum and the maximum of a variable

> tabulate varname

produces a table with the absolute and relative frequencies of a variable

In practice:

> sum whqkt wfqkt

> tab ed

Practice:

- how many observations
- mean earnings or unemployment rate
- standard deviation of earnings and unemployment rate
- range of observations (minimum and maximum wage and unemployment rate)

Note that descriptive statistics already provide interesting information about the data. They help to check for outliers and measurement error, and they help with the interpretation of regression results (most results refer to the sample mean).
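The practice questions above could be answered along the following lines. This is a sketch on invented toy data; only the wage variables from the slides are used, since the name of the unemployment rate variable is not given.

```stata
* Toy data: values are invented, only the variable names follow the slides
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
end

* Observations, mean, standard deviation, minimum and maximum
summarize whqkt wfqkt

* summarize leaves its results behind in r(), e.g. for the last variable run
summarize whqkt
display "N = " r(N) ", mean = " r(mean) ", min = " r(min) ", max = " r(max)

* tabstat shows only the mean unless other statistics are requested
tabstat whqkt wfqkt, statistics(n mean sd min max)

* Frequencies of a categorical variable
tabulate ed
```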

The [if] condition: under which condition?

With [if] you can set a condition or restrict the sample,

e.g. in order to get only the average income of migrants with the lowest education (no vocational training):

> summarize wfqkt if ed == "no voc train"

"no voc train" is a value of the string variable ed (therefore the quotation marks) and indicates that an individual has no vocational training.
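A short sketch of [if] conditions on invented toy data. Note the double equals sign for comparisons; conditions can be combined with & (and) and | (or).

```stata
* Toy data: values are invented, only the variable names follow the slides
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
end

* String condition: the value goes in double quotation marks
summarize wfqkt if ed == "no voc train"

* Numerical condition: no quotation marks
summarize wfqkt if year >= 1981

* Combined condition
count if ed == "no voc train" & year == 1980
```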

How to create dummies?

What is a dummy variable? A dummy variable takes the value 0 or 1.

With STATA you can also create new variables from the data.

To do so you need the commands “generate” and “replace”:

> gen ed1 = 0

> replace ed1 = 1 if ed == "no voc train"

Another example:

> gen ex1 = 0

> replace ex1 = 1 if ex == 1
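Two common shortcuts give the same dummies, sketched here on invented toy data. The names ed1_alt and the stub ed_ are hypothetical and chosen only for this illustration.

```stata
* Toy data: values are invented, only the variable names follow the slides
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
end

* Two-step version from the slides
gen ed1 = 0
replace ed1 = 1 if ed == "no voc train"

* One-step version: a logical expression evaluates to 1 (true) or 0 (false)
gen ed1_alt = (ed == "no voc train")

* One dummy per category: creates ed_1, ed_2, ... in sorted order of ed
tabulate ed, generate(ed_)

list ed ed1 ed1_alt ed_1 ed_2
```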

How to calculate and transform numerical variables

> generate newvar = var1 - var2

STATA supports the usual arithmetic operators and functions (+, -, *, /, ln() etc.)

Practice: Create the log wage:

> generate ln_whqkt = ln(whqkt)

How to modify variables/dummies?

> replace var = (var1 - var2)/2

Practice: Replace the wage by the log wage only for the low skilled

> replace ln_whqkt = ln(whqkt) if ed == "no voc train"
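A sketch of generate and replace on invented toy data. The variable names wgap and ln_low are hypothetical. The example also shows that generate combined with [if] leaves all other observations missing.

```stata
* Toy data: values are invented, only the variable names follow the slides
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
end

* Log of the native wage
generate ln_whqkt = ln(whqkt)

* Difference between native and foreign wages (hypothetical name wgap)
generate wgap = whqkt - wfqkt

* With [if], observations that fail the condition are set to missing (.)
generate ln_low = ln(whqkt) if ed == "no voc train"

* replace changes an existing variable, here only where ln_low is missing
replace ln_low = 0 if missing(ln_low)

list ed whqkt ln_whqkt wgap ln_low
```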

How to create graphics?

> graph twoway line var1 year [if] [in]

STATA produces two-dimensional graphs with lines, bars, dots, scatter plots etc. with the “graph twoway” command; the type of graph is specified after that, e.g. “line”.

Practice:

Graph the development of native and foreign wages over the years in our sample for a given education and experience group.

> graph twoway line whqkt wfqkt year if ed == "no voc train" & ex == 1

> graph twoway scatter whqkt wfqkt if ed == "no voc train" & ex == 1
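A sketch of the same line graph with a few standard twoway options added (sort, titles, legend labels). The data are invented and the label texts are assumptions made for this illustration. The /// line continuation works in a do-file, not in the Command window.

```stata
* Toy data: values are invented, only the variable names follow the slides
clear
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"no voc train" 1 1982 64 53
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
"voc train"    1 1982 84 74
end

* Native and foreign wages over time for one education-experience group.
* sort orders the points by year before they are connected.
graph twoway line whqkt wfqkt year if ed == "no voc train" & ex == 1, ///
    sort                                                              ///
    title("Wages, no vocational training")                            ///
    xtitle("Year") ytitle("Daily wage")                               ///
    legend(label(1 "Natives") label(2 "Foreigners"))
```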

The do-file

STATA also provides a do-file editor (= text editor), in which commands can be written.

- the do-file editor can be opened with the command “doedit”, by pressing “Ctrl + 8”, or by clicking the do-file button in the toolbar.

How to execute commands in a do-file?

- you write the commands into the text editor, then mark the text and press “Ctrl + d”
- if no text is marked, the whole do-file will be executed. That can cause trouble if there is a mistake somewhere in your list of commands. (That happens in most cases.)

Reasons to use a do-file:

- your work is documented and reproducible!

- you can include comments in your work by putting a “*” at the very beginning of the line (they automatically get a green color), e.g.

> *load data
> use "C:\User\...data1.dta", clear
> *get an overview
> describe

- you can save your do-file: File -> Save
- you can also open do-files: File -> Open
- do-files have the extension “.do”

This is an example of a do-file.

First I “set more off” and load the data.

Second I use a command for panel regressions.

Third I generate some variables.

The remarks in stars explain what I'm doing.

Now I mark the lines with the commands I want to execute.

Then I press the execute button.
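A do-file along these lines might look roughly like the sketch below. It is not the do-file shown on the slide: the data are invented toy values typed in directly instead of being loaded from a file, and the panel regression step is left out because the command used is not given here.

```stata
* sketch.do: hypothetical example, toy data only

clear all
set more off            // do not pause the output at --more--

* Step 1: data (invented values; in practice: use "...", clear)
input str20 ed ex year whqkt wfqkt
"no voc train" 1 1980 60 50
"no voc train" 1 1981 62 51
"voc train"    1 1980 80 70
"voc train"    1 1981 82 71
end

* Step 2: generate some variables
generate ln_whqkt = ln(whqkt)
generate ed1 = (ed == "no voc train")

/* Step 3: descriptive check.
   Block comments like this one can span several lines. */
summarize ln_whqkt ed1
```

Besides the star at the beginning of a line, do-files also accept // for comments at the end of a line and /* ... */ for comments that span several lines.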

Next Meeting:

June 30, Room RZ 1.03!
