econometrics i chapter 1: the nature of regression analysis textbook: damodar n. gujarati (2004)...

ECONOMETRICS I

CHAPTER 1: THE NATURE OF REGRESSION ANALYSIS

Textbook: Damodar N. Gujarati (2004) Basic Econometrics, 4th edition, The McGraw-Hill Companies

HISTORICAL ORIGIN OF THE TERM REGRESSION

• The term regression was introduced by Francis Galton.

• He found that, although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children born of parents of a given height tended to move or "regress" toward the average height in the population as a whole. This tendency is called Galton's law of universal regression.

THE MODERN INTERPRETATION OF REGRESSION

• Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
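The core idea, estimating the average value of Y for each fixed value of X, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the father/son heights below are made-up numbers (not the data behind Figure 1.1), and `conditional_means` is a hypothetical helper that simply averages Y within each group of equal X.

```python
from collections import defaultdict

# Hypothetical (father_height, son_height) pairs in inches. These numbers are
# made up for illustration; they are not the data behind Figure 1.1.
pairs = [
    (64, 65), (64, 66), (64, 67),
    (68, 67), (68, 68), (68, 69), (68, 70),
    (72, 69), (72, 71), (72, 73),
]


def conditional_means(data):
    """Return {x: average y} for each fixed x, a sample analogue of E(Y | X = x)."""
    groups = defaultdict(list)
    for x, y in data:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}


means = conditional_means(pairs)
for father, avg_son in means.items():
    print(f"father {father} in -> average son {avg_son:.1f} in")
```

For these invented numbers the averages come out to 66.0, 68.5 and 71.0 inches. The overall average son height is 68.5 inches, so sons of the short fathers average taller than their fathers and sons of the tall fathers average shorter, which is the "regression" toward the mean that Galton described.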

Examples of Regression Analysis

1. Reconsider Galton’s law of universal regression.

We want to find out how the average height of sons changes, given the father’s height.

Look at the scatter diagram, or scattergram, in Figure 1.1.

Figure 1.1 Hypothetical distribution of sons’ heights corresponding to given heights of fathers.


2. Consider the heights of boys measured at fixed ages.

Notice that corresponding to any given age we have a range of heights. Therefore, knowing the age, we may be able to predict the average height corresponding to that age.

Figure 1.2 Hypothetical distribution of heights corresponding to selected ages.


5. A labor economist may want to study the rate of change of money wages in relation to the unemployment rate.

Figure 1.3 Rate of change of money wages in relation to the unemployment rate


6. From monetary economics it is known that, other things remaining the same, the higher the rate of inflation π, the lower the proportion k of their income that people would want to hold in the form of money, as depicted in Figure 1.4.

A quantitative analysis of this relationship will enable the monetary economist to predict the amount of money, as a proportion of their income, that people would want to hold at various rates of inflation.

Figure 1.4 Money holding in relation to the inflation rate π

STATISTICAL AND DETERMINISTIC RELATIONSHIPS

• In regression analysis we are concerned with what is known as statistical, not functional or deterministic, dependence among variables. Deterministic dependence is the kind found in classical physics, where a given value of one variable fixes the value of the other exactly.

• In statistical relationships among variables we essentially deal with random or stochastic variables, that is, variables that have probability distributions.
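The contrast between a deterministic and a statistical relationship can be made concrete with a small simulation. This is a sketch under assumptions: the line 2 + 0.5x and the normally distributed error with standard deviation 1 are arbitrary illustrative choices, not taken from the textbook.

```python
import random


def deterministic(x):
    # Functional dependence: a given x always yields exactly the same y.
    return 2.0 + 0.5 * x


def stochastic(x, rng):
    # Statistical dependence: the same systematic part plus a random error
    # term. The error's distribution (normal, sd = 1) is an assumption.
    return 2.0 + 0.5 * x + rng.gauss(0.0, 1.0)


rng = random.Random(42)
x = 10
det_draws = [deterministic(x) for _ in range(5)]
sto_draws = [stochastic(x, rng) for _ in range(5)]
print(det_draws)                          # [7.0, 7.0, 7.0, 7.0, 7.0]
print([round(v, 2) for v in sto_draws])   # five different values

# Averaged over many draws, y at x = 10 settles near 2 + 0.5 * 10 = 7.
many = [stochastic(x, rng) for _ in range(10_000)]
avg_many = sum(many) / len(many)
print(round(avg_many, 2))
```

For the same x, the deterministic function returns one value, while the stochastic variable returns a distribution of values whose mean is what regression analysis tries to estimate.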

REGRESSION VERSUS CAUSATION

• Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation.

• A statistical relationship per se cannot logically imply causation.
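A simulation shows why a regression relationship alone cannot establish causation. In this hypothetical setup (all names and parameters are illustrative assumptions), an unobserved common factor z drives both x and y, and y is generated without using x at all; the regression of y on x still yields a clearly nonzero slope.

```python
import random

rng = random.Random(0)
n = 5_000

# Hypothetical data-generating process: a common factor z drives both x and y.
# By construction x has NO causal effect on y; y never uses x.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + rng.gauss(0.0, 0.5) for zi in z]
y = [zi + rng.gauss(0.0, 0.5) for zi in z]


def ols_slope(dep, expl):
    """OLS slope from regressing `dep` on the single variable `expl`."""
    md = sum(dep) / len(dep)
    me = sum(expl) / len(expl)
    sxy = sum((e - me) * (d - md) for d, e in zip(dep, expl))
    sxx = sum((e - me) ** 2 for e in expl)
    return sxy / sxx


slope_yx = ols_slope(y, x)
print(round(slope_yx, 2))  # clearly nonzero (about 0.8 in theory), yet x does not cause y
```

The slope is near Cov(x, y) / Var(x) = 1 / 1.25 = 0.8 in this setup, even though changing x would leave y untouched. Any causal reading has to come from outside the statistics, for example from economic theory.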

REGRESSION VERSUS CORRELATION

• In correlation analysis we try to measure the strength or degree of linear association between two variables. The correlation coefficient measures this strength of (linear) association.

• In regression analysis we try to estimate the average value of one variable on the basis of the fixed values of other variables.


• In correlation analysis we treat any two variables symmetrically: there is no distinction between dependent and explanatory variables, and both variables are considered random.

• Most of regression theory is based on the assumption that the dependent variable is stochastic but the explanatory variables are fixed or nonstochastic.
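The symmetry of correlation and the asymmetry of regression can be checked numerically. The five (x, y) pairs below are hypothetical, chosen only for illustration; `corr` and `slope` are small helper functions written for this sketch.

```python
import math


def mean(v):
    return sum(v) / len(v)


def corr(a, b):
    """Sample correlation coefficient between a and b."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def slope(dep, expl):
    """OLS slope from regressing `dep` on `expl`."""
    md, me = mean(dep), mean(expl)
    num = sum((e - me) * (d - md) for d, e in zip(dep, expl))
    den = sum((e - me) ** 2 for e in expl)
    return num / den


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy, r_yx = corr(x, y), corr(y, x)
b_yx, b_xy = slope(y, x), slope(x, y)
print(r_xy, r_yx)   # equal: correlation treats x and y symmetrically
print(b_yx, b_xy)   # 0.6 vs 1.0: regressing y on x is not the same as x on y
```

Swapping the roles of the variables leaves the correlation coefficient unchanged but gives a different regression slope. The two slopes are linked only through their product, which equals the squared correlation coefficient (here 0.6).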

TERMINOLOGY

Terms used for the dependent variable (left) and for the explanatory variable (right):

Dependent variable       Explanatory variable
Explained variable       Independent variable
Predictand               Predictor
Regressand               Regressor
Response                 Stimulus
Endogenous               Exogenous
Outcome                  Covariate
Controlled variable      Control variable


• In a simple (two-variable) regression analysis we study the dependence of a variable on only a single explanatory variable, such as that of consumption expenditure on real income.

• In a multiple regression analysis we study the dependence of one variable on more than one explanatory variable, such as that of money demand on interest rates, income, and inflation.
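The difference between simple and multiple regression can be sketched with two explanatory variables. This is an illustration under assumptions: the data are hypothetical and noise-free, built as y = 1 + 2·x1 + 3·x2, and the two-regressor estimates come from the least-squares normal equations written in deviations from the means.

```python
def mean(v):
    return sum(v) / len(v)


def simple_ols(y, x):
    """Intercept and slope from regressing y on a single explanatory variable x."""
    my, mx = mean(y), mean(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_regressor_ols(y, x1, x2):
    """Intercept and slopes from regressing y on x1 and x2 together,
    using the normal equations written in deviations from the means."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    dy = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    sy1 = sum(a * b for a, b in zip(dy, d1))
    sy2 = sum(a * b for a, b in zip(dy, d2))
    den = s11 * s22 - s12 ** 2          # zero if x1 and x2 are perfectly collinear
    b1 = (sy1 * s22 - sy2 * s12) / den
    b2 = (sy2 * s11 - sy1 * s12) / den
    return my - b1 * m1 - b2 * m2, b1, b2


# Hypothetical, noise-free data built as y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

print(simple_ols(y, x1))             # slope on x1 also picks up part of x2's effect
print(two_regressor_ols(y, x1, x2))  # recovers approximately (1, 2, 3)
```

Because x1 and x2 move together in this data, the simple regression of y on x1 alone gives a slope of about 4.49 rather than 2; including x2 as a second explanatory variable separates the two effects.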


• The term random is a synonym for the term stochastic. A random (stochastic) variable is a variable that can take on any set of values, positive or negative, with a given probability.

NOTATION

• Y: dependent variable
• X1, X2, …, Xk: explanatory variables
• Xk: kth explanatory variable
• Xki: ith observation on variable Xk (cross-sectional data)
• Xkt: tth observation on variable Xk (time series data)
• N (or T): the total number of observations or values in the population (T for time series data)
• n (or t): the total number of observations in the sample (t for time series data)

TYPES OF DATA

• There are mainly three types of data for empirical analysis:
1. Time series data
2. Cross-sectional data
3. Pooled data

Time series data

• A time series is a set of observations on the values that a variable takes at different times.

Cross-sectional data

• Cross-sectional data are data on one or more variables collected at the same point in time.

GPA    Study hours/week
3.5    10
2.7    8
1.9    9
2.3    5
2.0    8
2.2    6
2.5    3
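As a sketch of a simple two-variable regression on cross-sectional data, the seven observations in the table above can be used to fit GPA = b0 + b1 · (study hours/week) by ordinary least squares. With only seven students the estimates are purely illustrative, and, as noted earlier, a fitted slope says nothing about causation.

```python
# Cross-sectional observations from the table above: (study hours/week, GPA).
hours = [10, 8, 9, 5, 8, 6, 3]
gpa = [3.5, 2.7, 1.9, 2.3, 2.0, 2.2, 2.5]

n = len(hours)
mean_h = sum(hours) / n
mean_g = sum(gpa) / n

# OLS slope and intercept for GPA = b0 + b1 * hours + error.
b1 = sum((h - mean_h) * (g - mean_g) for h, g in zip(hours, gpa)) / sum(
    (h - mean_h) ** 2 for h in hours
)
b0 = mean_g - b1 * mean_h

# Fitted values are estimated average GPAs at each fixed number of study hours.
fitted = [b0 + b1 * h for h in hours]
residuals = [g - f for g, f in zip(gpa, fitted)]

print(f"GPA_hat = {b0:.3f} + {b1:.4f} * hours")
```

For these numbers the fitted line is roughly GPA = 2.015 + 0.0611 · hours. By construction the OLS residuals sum to zero and are uncorrelated with the explanatory variable.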

Pooled data

• Pooled data contain elements of both time series and cross-sectional data.

Year    GPA    Study hours/week
2000    2.5    9
2000    2.7    8
2000    2.3    6
2005    1.9    5
2005    3.1    12
2010    2.4    7
2010    2.0    5
2010    3.9    11
2010    1.2    2

• Panel data is a special type of pooled data in which the same cross-sectional unit is surveyed over time.

Person    Year    GPA    Study hours/week
1         2010    2.5    9
1         2011    2.7    7
1         2012    2.3    6
2         2010    1.9    8
2         2011    3.1    12
2         2012    2.4    6
3         2010    2.0    5
3         2011    3.9    11
3         2012    1.2    2
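Because the panel table follows the same persons over time, slicing it by year gives a cross-section and slicing it by person gives a time series. The helpers below are hypothetical names written for this sketch; `is_balanced` assumes "balanced" means every person appears exactly once in every year, which is something the pooled table above cannot be checked for, since it carries no person identifier.

```python
# Panel data from the table above: (person, year, GPA, study hours/week).
panel = [
    (1, 2010, 2.5, 9), (1, 2011, 2.7, 7), (1, 2012, 2.3, 6),
    (2, 2010, 1.9, 8), (2, 2011, 3.1, 12), (2, 2012, 2.4, 6),
    (3, 2010, 2.0, 5), (3, 2011, 3.9, 11), (3, 2012, 1.2, 2),
]


def cross_section(records, year):
    """All persons observed in one year: a cross-sectional slice."""
    return [(p, g, h) for p, t, g, h in records if t == year]


def time_series(records, person):
    """One person observed over the years: a time series slice."""
    return [(t, g, h) for p, t, g, h in records if p == person]


def is_balanced(records):
    """True if every (person, year) combination appears exactly once."""
    units = {p for p, *_ in records}
    periods = {t for _, t, *_ in records}
    keys = [(p, t) for p, t, *_ in records]
    return len(keys) == len(set(keys)) == len(units) * len(periods)


print(cross_section(panel, 2011))
print(time_series(panel, 2))
print(is_balanced(panel))
```

Dropping any single row makes `is_balanced` return False, since one person would then be missing from one year.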

Sources of Data

• Government agencies (Department of Commerce...)

• International agencies (World Bank...)

• Surveys

In the social sciences the data that one generally obtains are nonexperimental in nature, that is, not subject to the control of the researcher.

The quality of the data used in economics is often poor, for several reasons:

1. Observational errors are possible.
2. Data are subject to approximations and roundoffs.
3. Nonresponse to surveys may cause selectivity bias.
4. The sampling methods used in obtaining the data may vary so widely that it can be very difficult to compare the results of different samples.
5. Economic data are generally available at a highly aggregate level. Such highly aggregated data may not tell us much about individual or micro-level units (GNP...).
6. Because of confidentiality, certain data can be published only in highly aggregate form (health data...).

The researcher should always keep in mind that the results of research are only as good as the quality of the data.
