
VYSOKÁ ŠKOLA BÁŇSKÁ-TECHNICKÁ UNIVERZITA OSTRAVA

Fakulta metalurgie a materiálového inženýrství

PROJECT IRP

„Creation of English Study Supports for Selected Subjects of the Follow-up Master Study in the Quality Management Study Field“

IRP/2015/104

ECONOMETRICS

Study supports

Filip Tošenovský


Filip Tošenovský Econometrics


Language review: Mgr. Gabriela Chudašová
Title: Econometrics
Author: Ing. Filip Tošenovský, Ph.D.
Edition: First, 2015
Number of pages: 80

Study materials for the course Econometrics, the Faculty of Metallurgy and Material Engineering. Intended for the IRP project Creation of English Study Supports for Selected Subjects of the Follow-up Master Study in the Quality Management Study Field, No. IRP/2015/104.

Execution: VŠB – Technical University of Ostrava
The project is co-financed by the Ministry of Education, Youth and Sports of the Czech Republic.
ISBN 978-80-248-3847-2


STUDY INSTRUCTIONS

You have received a study text for the course Econometrics, which is intended for students of the study programmes Quality Management and Economics and Management in Industry at the Faculty of Metallurgy and Material Engineering. The aim of the course is to introduce students to the foundations of econometric theory, which is widely used in industry, finance and other scientific or applied fields for the purposes of modelling relations among phenomena of interest. The study text is divided into parts and chapters which logically divide the subject matter but are not equally comprehensive. The estimated study time of the chapters may vary considerably, which is why large chapters are further divided into numbered sub-chapters. The division corresponds to the structure described below.

The subject matter is presented to students in the corresponding lectures and practised in seminars, where students work with the presented topics hands-on.

After studying the course, students should be able to:

- Define and estimate models among many variables, using regression techniques.
- Analyse model suitability for practical purposes.
- Detect problems in modelling and remove them if they are present.
- Work with single-equation and multi-equation models.


STRUCTURE OF THE CHAPTERS

Goal

At the beginning of each chapter, the objectives of that chapter are given, so that students know what is to be achieved after reading the chapter in question.

Time to learning

After a goal is set, the time necessary to study the subject matter is provided. The time is approximate and serves as a rough guide for planning the study.

Example

The text provides the reader with concrete examples which describe the practical aspects of working with the presented theory. The illustrative problems are solved step by step so that the ideas behind the procedures used are clearer.

Summary of terms

At the end of each chapter, the most important terms are listed for convenience. These terms represent the part of the theory students should focus on. If a term has not been fully understood, the student should go back in the text, and read the corresponding explanatory part again.

Questions

There are also several theoretical or practical questions to verify that the student has fully mastered the subject matter of the chapter.

Answers to questions

The practical questions are answered in the follow-up "Answers to questions" section.

The author wishes readers a successful study of this textbook.

Ing. Filip Tošenovský, Ph.D.


CONTENTS

STUDY INSTRUCTIONS ........................................................................................................3

STRUCTURE OF THE CHAPTERS ......................................................................................4

INTRODUCTION .....................................................................................................................7

1 CLASSICAL LINEAR REGRESSION .......................................................................... 11

1.1 Regression line............................................................................................................ 12

1.1.1 Model formulation ............................................................................................... 12

1.1.2 Estimation of parameters of regression line .......................................................... 14

1.1.3 Evaluation of model quality.....................................................................................16

1.1.4 Properties of estimates ......................................................................................... 18

1.1.5 Statistical properties of regression line ................................................................. 20

1.1.6 Statistical inference .............................................................................................. 21

1.2 Multivariate regression ................................................................................................ 23

1.2.1 Model formulation ............................................................................................... 23

1.2.2 The least squares estimation of model parameters ................................................ 25

1.2.3 Statistical properties of least squares .................................................................... 26

1.2.4 Statistical inference .............................................................................................. 27

1.2.5 Adjusted coefficient of determination................................................................... 31

1.2.6 Test of model ....................................................................................................... 32

2 PROBLEMS IN CLASSICAL REGRESSION .............................................................. 37

2.1. Multicollinearity ......................................................................................................... 37

2.2. Number of regressors .................................................................................................. 41

2.3. Measurement errors..................................................................................................... 41

2.3.1. Errors in y ............................................................................................................ 42

2.3.2. Errors in x ............................................................................................................ 42

2.4. Heteroscedasticity ....................................................................................................... 44

2.4.1. Formulation of the model and its consequences .................................................... 44

2.4.2. The generalized least squares ............................................................................... 45

2.4.3. Goldfeld-Quandt test ............................................................................................ 48

2.5. Autocorrelation ........................................................................................................... 52


2.5.1. Formulation of the model and consequences ........................................................ 52

2.5.2. Autocorrelation and the generalized least squares method .................................... 54

2.5.3. Durbin-Watson test .............................................................................................. 55

3. SETS OF SIMULTANEOUS EQUATIONS .................................................................. 65

3.1 Identification ............................................................................................................ 66

3.2 Estimation of simultaneous equations ....................................................................... 69

3.2.1 The method of indirect least squares ........................................................................ 69

3.2.2 Two – stage least squares......................................................................................... 72

TABLES .................................................................................................................................. 76

REFERENCES ......................................................................................................................... 79


INTRODUCTION

Many economic agents need a deeper knowledge of the relations among diverse variables because of their own industrial or economic activity. One of the reasons why companies demand this knowledge is that the result of their work depends directly on the development of these variables. To give an example, banks monitor the progress of interest rates because their financial profits are affected by the rates. Thus, knowing the relation between the rates and financial profits, a change in the rates can indicate the extent to which the financial profits will be altered. Another example involves industrial product makers who need to cut down their costs to stay competitive. To achieve this objective, they need to learn how the different factors entering their production processes are related to each other and how they influence the firms' industrial output. With that understanding, the levels of the factors can be set so that the output is realized at lower costs. Other examples could be given to illustrate that different relations among different variables are of interest to both private and public companies. If relations are of interest, they should be described as accurately as possible, so that sufficiently precise predictions about the development of one set of variables can be formulated, based on the setting of other variables. This allows companies to correct their behaviour in advance, so that the outcome of their activities satisfies both their own and their customers' needs. Econometrics is one of the disciplines that provide the theory and tools to describe the relations among variables. However, it is used not only to discover new relations but also to verify empirically relations that have already been formulated. There are many areas of human activity where econometrics can be used, but its economic and industrial applications probably dominate.
At this stage we can say that the essential objective of econometrics is to describe and measure dependencies among variables. When relations among variables are of interest, they are usually described through a certain mathematical model that they are a part of. In the sciences, mathematical models are mostly deterministic, meaning that the relations hold exactly. In this context, the word deterministic signifies that once a set of variables takes on specific values, another set of variables, related to the former through a function, attains its own levels uniquely. A law in physics may be an example of this unique deterministic relationship. However, deterministic relations often do not hold, in practice or in theory, even though some theories assume the deterministic nature of what they work with. This assumption might only be a simplification of the true state of things. If this is the case, only the most optimistic analyst working with the deterministic model can presume the model corresponds to its real counterpart. More often than not, however, this will not be true. To give an example, let us look at production models. No matter how perfect such models might be, they will never encompass unexpected events, such as the technical breakdown of a machine that can temporarily stop production altogether. For these purposes, it is better if the formerly deterministic model is extended by a stochastic element, which leads to the concept of a stochastic model. In a stochastic model, the behaviour of the variable we are interested in, such as the production level, depends not only on other, deterministically defined variables, but also on variables whose effect on the variable of interest is hard or impossible to identify or measure. The overall effect of these unidentified variables is contained in a random element which is inserted into the model as well.
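The distinction between a deterministic and a stochastic model can be illustrated with a small simulation. This is a hypothetical sketch: the linear production relationship and the noise level below are invented for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

# Deterministic model: output is a fixed function of a production factor.
def output_deterministic(factor):
    return 5.0 + 2.0 * factor  # hypothetical production relationship

factor = np.array([1.0, 2.0, 3.0, 4.0])

# Stochastic model: the same relationship plus a random element that
# absorbs the overall effect of unidentified influences (breakdowns, etc.).
noise = rng.normal(loc=0.0, scale=0.5, size=factor.size)
output_stochastic = output_deterministic(factor) + noise

# The deterministic part is reproducible; the stochastic part varies
# from one realization of the noise to the next.
print(output_deterministic(factor))
print(output_stochastic)
```

Rerunning the last four lines with a fresh noise draw changes `output_stochastic` but never `output_deterministic(factor)`, which is exactly the difference between the two model types.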
When working with a stochastic model, it is no longer possible to predict exactly the value of one variable from the known values of other variables, since the model contains a random element the


value of which remains unknown until it is realized. This distinguishes stochastic models from deterministic ones. Stochastic models are less precise than deterministic models in the sense that a unique value of a variable no longer corresponds exactly to the values of the other variables. On the other hand, stochastic models often describe reality more precisely than deterministic models. This precision, however, usually comes at the cost of more complicated work with the models. The validity of a deterministic model can be refuted with a single empirical observation, whereas in the case of stochastic models, this would require many observations which are "predominantly" not in accordance with the model. Trying to approach reality better means working with stochastic models. When building a stochastic model, economic and business theory is utilized, because the theory may require certain conditions to be met by the parameters entering the model. The theory may also define which variables can enter the model at all from the logical point of view. Mathematical tools must be used as well, together with the theory of probability or its application in the form of mathematical statistics, since a stochastic model is dealt with. Furthermore, the model must be in line with what real data suggest, and so empirical data have to be at hand as well, to underscore the validity of the built-up model. To sum up, econometrics is a complex scientific discipline in which the following three disciplines meet: general mathematics, mathematical statistics and (business) economics. We can mention the first issue of the journal Econometrica, in which the Econometric Society wrote that the main objective of the society would be to support studies that try to unify the quantitative-theoretical approach with the quantitative-empirical approach, and that rely on constructive and rigorous thinking similar to that of the natural sciences.
However, a quantitative approach has several aspects, none of which should be linked to econometrics alone. Econometrics is not the same as economic statistics, nor is it general economic theory, although a large part of that theory is quantitative in nature. Nor should econometrics be a synonym for the mathematics applied in economics. Experience has shown that each of the three views, i.e. mathematics, economics and statistics, is necessary, and none of them can serve these purposes by itself. It is their unification that turns them into a powerful tool. And this unification represents econometrics. How does an econometric model originate? At first, a model linking several variables is formulated. This model can draw on other theories whose validity has already been verified; the Cobb-Douglas production function may serve as an example. The formulation might also depend on the experience of the analyst who tries to build the model, the type of available information based on which the model is defined, and the character of the problem the model is built for. There may be more forms of the model, although none of them should contradict the theories. To give an example of the procedure, the Cobb-Douglas function can be taken as a starting point for modelling purposes, the function serving as an instance of a theory which has not been disputed, and the function may be further examined quantitatively in the case of a specific company, since it contains company-specific unknown parameters. In the next step, the formulated model is converted into an econometric model which represents an algebraic expression of the relations among the variables of interest, using the means of mathematics. This model contains the aforementioned random element and unknown parameters which reflect the intensity and direction of the mutual effects of the variables appearing in the model. The conversion requires that the variables to be included in the model are listed, and the


explanatory (independent) and explained (dependent) variables are determined. Thus, the direction of causal relations must be defined. Also, it must be decided whether one or more equations will make up the entire model, and what analytical form the model will take (linear vs nonlinear, for instance). If it can be assumed that the resulting econometric model contains all important variables and its random element has reasonable statistical properties, the unknown parameters of the model may be estimated using proper statistical techniques. Under suitable conditions, the estimated parameters will have statistical properties that are optimal in a certain sense. To estimate the parameters means to determine a particular form of the model, based on concrete data, since at the beginning the model was defined only generally. The theory of econometrics strives to search for methods and conditions under which the estimated model has optimal statistical properties. It also analyses what happens to the model properties when the optimal conditions are violated, and looks for ways to solve these problems. A suitable estimation of the unknown parameters in the model is based on data gathered by the analyst who formulated the model. This is an integral part of econometricians' work because the core of econometrics lies in a quantitative approach. In economic reality, sources of the necessary data are official reports published by the country's statistical office, the results of polls run by marketing agencies, the information releases of central banks, companies' office records, the reports from chambers of commerce, etc. This information represents the basis for modelling the relations between economic and industrial variables. After a specific model is built, it is imperative to verify its validity. Firstly, it must be verified whether the estimated parameters are in accordance with general economic and business theory.
For instance, some model parameters should be positive according to economic theory. If such a parameter turns out to be negative, the model cannot be correct and must be modified. In this context, we talk about an economic verification of the model. Secondly, the quality of the model from the purely statistical point of view must be checked as well. This requires that various statistical tests be run and model quality characteristics calculated. Different criteria for checking model quality exist, and they are either a direct result of the theory of statistics, or they are artificially constructed, having a specific interpretation. In this case, a statistical verification of the model is performed. An econometric verification of the model represents yet another feedback about the model and its estimation. This type of verification is very important, since it checks whether all conditions necessary for estimating the model and for its statistical verification have been met. As it turns out, building an econometric model is not a straightforward and easy task. It can actually be quite difficult, because the quality of the mathematical description of relations among economic variables depends on several aspects, including the extent of the economic reality to be described, the character of the variables to be used in the model, the mathematical form of the model, and the stochastic character of the random component used in the model. The theory of econometrics tries to overcome these obstacles so that the models formulated are reasonably precise and can be suitably used in real-life situations.


Econometrics is a valuable tool for studying economic and business phenomena, and as a theory it is advanced. On the other hand, its power should not be overestimated. We must realize that econometricians often observe economic events under conditions which do not allow them to run a controlled experiment. They cannot always monitor the reaction of a variable of interest to other influential variables or factors whose values they could control at will. In these cases, they cannot expect to find deep relations through widespread and controlled data manipulation. What econometricians may have at hand are data of a non-experimental nature, because they are only passive observers of events. This contributes to the fact that stochastic rather than deterministic models are worked with. The analyst can only hope that a data sample on realized economic processes will be available under conditions which econometrics can handle, and that the result of his or her efforts will be a narrower class of reasonably good models usable in practice. One might even say that the theory of econometrics is a good organizer of available economic and business data. It must be stressed that any model will always be a simplification of reality, and that it will usually not be possible to create a model which would be better than other models in every respect. This means that the objective of an econometrician should rather be to narrow down the set of all usable models to a subset of better ones which might serve as better candidates for the description of reality. As statisticians say, every model is poor, but some models are better than others. And it is the better model that econometricians should try to detect, based on all the data they can get. Econometrics originated some 80 years ago, in the 1930s, at the time when the Econometric Society was founded.
As is the case with other theories, econometrics is built up in successive steps, starting with models corresponding to simpler situations and ending with models reflecting more complex economic settings. We shall follow this concept in the following chapters of this text.


1 CLASSICAL LINEAR REGRESSION

Goal:

This chapter introduces the reader to fundamental concepts of regression modelling in its simplest form: multivariate classical linear regression and its special case, the regression line. The chapter covers estimation procedures for finding the parameters of regression models, checking model quality, and using the models for statistical inference.

Time to learning:

10 hours.

We shall start the explanation of the theory of econometrics in the area of classical linear regression, which represents a starting point for more advanced econometric procedures. The term "regression" is already known from other fields, such as psychoanalysis, and it can be loosely translated as a reversed procedure. In econometrics, we do not analyse data based on already known functional relations among them. We rather proceed the opposite way, hence the term "reversed". We try to find a functional relation or model, having gathered some empirical data in advance. This model should explain the form of the available data. In econometrics, the regression principles and techniques are utilized to a great extent. Regarding the modifier "linear", the term relates to the analytical form of the models we are going to work with. Linearity is known from general mathematics, although within the econometric framework, it is perceived in a broader sense. We shall define the term linear regression model as the model of the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon, \qquad (1.1)$$

where $Y$ is the so-called explained or dependent variable, $X_j$, $j = 1, 2, \dots, k$, are explanatory or independent variables (yet another term is regressors), $\beta_j$, $j = 0, 1, \dots, k$, are unknown model parameters and $\varepsilon$ represents the random component of the model. To stress the meaning of the word "linear", we shall write the relation 1-1 in a more general form

$$Y = \beta_0 + \beta_1 X_1^* + \beta_2 X_2^* + \dots + \beta_k X_k^* + \varepsilon, \qquad (1.2)$$

where $X_j^* = f_j(X_1, X_2, \dots, X_k)$ for $j = 1, 2, \dots, k$. Thus, when we talk about a linear model, we will have in mind the expression 1-2, which is linear in terms of how the unknown parameters appear analytically in the model. The explanatory variables can be nonlinear functions of their


arguments. For example, a model of the form $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$ is still linear, while the model $Y = \beta_0 + X^{\beta_1} + \varepsilon$ is not. The word "classical" is related to a set of properties required of the random model component and the regressors. The properties are formulated in a way which ensures that the estimation of the unknown model parameters will result in an econometric model of a reasonable quality. One of these conditions requires that the regressors are non-random variables (or more precisely, that they are degenerate random variables). This practically means that an econometrician working with the model defines the values of the regressors to be included in the model himself or herself. Having set the i-th value of the j-th variable $X_j$, which shall be denoted $x_{ij}$, the econometrician then measures or obtains a value of the dependent variable corresponding to the values of the regressors set in advance. If the subscript i represented a point in time (in that case, a more common subscript is t instead of i), if the model contained only one explanatory variable (the amount of a company's savings), and if the dependent variable Y stood for a company's investments, then for i = 1 the econometrician would define a company's savings of $x_1$ and look for the investments of companies whose savings are valued at $x_1$. If n values are defined for the explanatory variables in advance, the resulting data sample to be analysed will be of the form $(y_i, x_{i1}, x_{i2}, \dots, x_{ik})$, $i = 1, 2, \dots, n$. This sample shall be used to build an econometric model which would describe the relation(s) between the variable $Y$ and the variables $X_1, X_2, \dots, X_k$. The obtained values $y_i$ are realizations of the random variable $Y$. As was outlined in the introduction, the construction of a model can be divided into three major steps. In the first step, a general mathematical form of the model is defined, and statistical properties are set for the random component of the model.
In the second step, the unknown parameters of the model are estimated, so that the general form of the model from the first step takes on a specific empirical form. For this, concrete empirical data must be available. In the final step, the overall quality of the model is assessed, and the feasibility of the assumptions about the properties of the random component is discussed. If the assumptions are met, the model will have a reasonable quality and can be used for the purposes of statistical inference. If the assumptions are not met, the approach to building the model must first be altered before the model is used for statistical inference. We shall follow this concept in the next chapters. The theory of classical regression will now be divided into two parts. In part one, we shall discuss the concept of the regression line. This will serve us well in demonstrating some fundamental principles of econometrics [1]. In the second part, these principles will be generalized to the case when the regression model contains more independent variables. Thus, we will switch to model 1-1 in which k may be greater than one.

1.1 REGRESSION LINE

1.1.1 MODEL FORMULATION

The model of the regression line is of the form


$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \dots, n. \qquad (1.3)$$

Compared to 1-1, the model contains the subscript i, which simply says that we work with n specific forms/values of the regressor $X$. Expression 1-3 is a special case of 1-1 when k = 1 and the equation is examined for n specific values of $X$. As we shall see, concrete sets of values $(x_i, y_i)$, $i = 1, 2, \dots, n$, will be utilized when building the model. These values are available before the unknown coefficients in the model are estimated. The values $y_i$ are concrete realizations of the random variables $Y_i$. Since 1-3 represents a set of n equations with unknown parameters $\beta_0$, $\beta_1$, it is possible to rewrite it in the matrix form

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad (1.4)$$

where

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$
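As a concrete numerical illustration of the matrix form 1-4, the vectors and matrices above can be built directly; the sample size, regressor values and parameter values below are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regressor values, set in advance by the analyst (n = 5).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = x.size

# Design matrix X: a column of ones (for beta_0) next to the regressor values.
X = np.column_stack([np.ones(n), x])

beta = np.array([2.0, 0.5])          # hypothetical beta_0, beta_1
eps = rng.normal(0.0, 1.0, size=n)   # random components: N(0, sigma^2), uncorrelated

# The matrix form (1.4): Y = X beta + eps, one equation per row.
Y = X @ beta + eps

print(X.shape)  # (5, 2): n rows, one column per unknown parameter
```

The single matrix product `X @ beta` reproduces all n equations of 1-3 at once, which is why the matrix notation is used throughout the rest of the theory.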

In the classical regression, the random components are required to meet the following conditions: $\varepsilon_i \sim N(0, \sigma^2)$ for $i = 1, 2, \dots, n$, and $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. In other words, the random elements are to be normally distributed with a zero mean and a constant variance. They should also be mutually or serially uncorrelated. When the condition of constant variance is met, we talk about homoscedasticity. In the opposite case, when the condition is violated, we talk about heteroscedasticity. If the condition of zero correlation is satisfied, we say that there is no autocorrelation in the model. More precisely, the two conditions are usually written in the matrix form, which implies that the random components are not only uncorrelated, but also statistically independent:

\( \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}), \)   (1.5)

where N denotes a multivariate normal distribution (of dimension n in this case), \( \mathbf{0} \) is the vector of expected values \( (0, 0, \ldots, 0)' \) and \( \sigma^2 \mathbf{I} \), with \( \mathbf{I} \) being the identity matrix, is the so-called covariance matrix. The matrix contains the variances of the random components of the model on its diagonal – the diagonal elements are all equal in the classical model – and the covariances between the random components represent the off-diagonal elements of the matrix. In general, the element of the matrix lying in the i-th row and j-th column is \( \sigma_{ij} = \mathrm{cov}(\varepsilon_i, \varepsilon_j) \), \( i, j = 1, \ldots, n \). The classical case also has requirements for the matrix of regressors X. It is demanded that

a) the elements of \( \mathbf{X} \) are determined in advance, and thus are not realizations of nondegenerate random variables; they are not assigned to the analyst by chance;

b) the columns of \( \mathbf{X} \), which is of \( n \times k \) type, i.e. it has n rows and k columns, need to be linearly independent. In other words, the rank of X is

\( h(\mathbf{X}) = k. \)   (1.6)

1-6 implies that \( n \geq k \). We have six conditions altogether in the case of the classical regression model. Four of them concern the random components of the model (normality, zero expected values, constant variance and zero covariances, or, put equivalently, zero correlations), while the remaining two conditions are related to the matrix of regressors. We add that in the general formulation of classical regression, the condition of normality is absent, since many important properties of the resulting estimated model may still be proved without this requirement. However, normality simplifies the subsequent statistical inference we shall do with the model, and therefore we will assume the condition is met.

1.1.2 ESTIMATION OF PARAMETERS OF REGRESSION LINE

We will present the basic and most frequently used method of estimation of the unknown parameters appearing in a regression line. The method is called the least squares method, and it can be used for more complex models as well. Computationally, the method is simple, and it produces estimates with reasonable statistical properties. Its principle is that the estimates of the two unknown parameters \( \beta_0, \beta_1 \), denoted \( b_0, b_1 \), are the values which minimize the expression

\( S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2. \)   (1.7)

Formula 1-7 can be understood as a function of two variables \( b_0 \) and \( b_1 \), and so to find the estimates means to find the minimum of the function S, or, generally speaking, to find the extreme of a function of two variables. To find the extreme, we differentiate function 1-7 with respect to \( b_0 \) and then with respect to \( b_1 \), and set the two partial derivatives equal to zero. This way, we arrive at the equations

\[
\frac{\partial S(b_0, b_1)}{\partial b_0} = 2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)(-1) = 0,
\]
\[
\frac{\partial S(b_0, b_1)}{\partial b_1} = 2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)(-x_i) = 0.
\]

This represents the set of normal equations. The set can be further adjusted as

\[
n b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad
b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i.
\]

This leads to the solution

\[
b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2},
\qquad
b_0 = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}.
\]   (1.8)

Expressions 1-8 can be simplified to

\[
b_1 = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}, \qquad b_0 = \bar{y} - b_1 \bar{x},
\]   (1.9)

where the symbol “cov” means the covariance of the variables x and y: \( \mathrm{cov}(x, y) = (1/n) \sum_{i=1}^{n} x_i y_i - \left( (1/n) \sum_{i=1}^{n} x_i \right) \cdot \left( (1/n) \sum_{i=1}^{n} y_i \right) \), and the symbol “var” stands for the variance of x calculated according to the formula \( (1/n) \sum_{i=1}^{n} (x_i - \bar{x})^2 \). The symbols \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \) are often replaced with the symbols \( b_0 \) and \( b_1 \) because this complies with the notation principles used in statistics: Greek letters are reserved for unknown parameters, whereas their estimates are denoted with letters of the Latin alphabet. The model \( E(Y) = \beta_0 + \beta_1 x \), where the symbol “E” denotes the expected value of Y as usual, is sometimes called the theoretical regression function; its estimate \( \hat{y} = b_0 + b_1 x \) is called the sample regression function. The values \( \hat{y}_i \) are called fitted values. When the model is estimated, it is possible to estimate the i-th value of the random component \( \varepsilon_i = Y_i - \beta_0 - \beta_1 x_i \) with \( \hat{\varepsilon}_i = y_i - b_0 - b_1 x_i \), which is more often denoted as \( e_i \). The estimate of the random component is called a residual. Expression 1-7 may now be estimated by the residual sum of squares \( S_R = \sum_{i=1}^{n} e_i^2 \).

EXAMPLE 1

Using the least squares method, find the estimates \( b_0 \) and \( b_1 \) of the unknown coefficients in the model \( Y = \beta_0 + \beta_1 x + \varepsilon \). The following data sample is available.

Table 1: Data sample for Example 1

x | 2 | 4 | 6 | 8 | 10 | 12 | 14
y | 3 | 4 | 6 | 8 | 11 | 15 | 17

Source: own

Solution: We shall use expressions 1-8. To do so, let us extend Table 1 to Table 2.


Table 2

x  | y  | xy  | x²
2  | 3  | 6   | 4
4  | 4  | 16  | 16
6  | 6  | 36  | 36
8  | 8  | 64  | 64
10 | 11 | 110 | 100
12 | 15 | 180 | 144
14 | 17 | 238 | 196

Summing the columns of the table, we get

\[
b_1 = \frac{7 \cdot 650 - 56 \cdot 64}{7 \cdot 560 - 56 \cdot 56} = 1.232, \qquad
b_0 = \frac{64 \cdot 560 - 56 \cdot 650}{7 \cdot 560 - 56 \cdot 56} = -0.714.
\]

Therefore, the sample regression line is of the form \( \hat{y} = -0.714 + 1.232 x \). The same result can be obtained using formulas 1-9. In our case, \( \mathrm{cov}(x, y) = 19.714 \) and \( \mathrm{var}(x) = 16 \). Further, \( \bar{y} = 9.142 \) and \( \bar{x} = 8 \), so that \( b_1 = 19.714 / 16 = 1.232 \) and \( b_0 = 9.142 - 1.232 \cdot 8 = -0.714 \). In the next step, the residuals can be calculated, as well as the residual sum of squares. The results of these calculations are in Table 3.

Table 3: Calculation of residuals in Example 1

x  | y  | fitted y | e      | e²
2  | 3  | 1.750    | 1.250  | 1.5625
4  | 4  | 4.214    | -0.214 | 0.045796
6  | 6  | 6.678    | -0.678 | 0.459684
8  | 8  | 9.142    | -1.142 | 1.304164
10 | 11 | 11.606   | -0.606 | 0.367236
12 | 15 | 14.070   | 0.930  | 0.8649
14 | 17 | 16.534   | 0.466  | 0.217156
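The calculations of Table 3 can be reproduced with a short script (a sketch in pure Python, no libraries assumed):

```python
x = [2, 4, 6, 8, 10, 12, 14]
y = [3, 4, 6, 8, 11, 15, 17]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)) / \
     (n * sum(a * a for a in x) - sum(x) ** 2)           # formula 1-8
b0 = y_bar - b1 * x_bar                                  # formula 1-9

fitted = [b0 + b1 * a for a in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
ssr = sum(e * e for e in residuals)                      # residual sum of squares

print(round(b0, 3), round(b1, 3), round(ssr, 2))         # -0.714 1.232 4.82
```

The small differences against the table (e.g. 0.045918 versus 0.045796) come from the rounding of the coefficients to three decimals in the manual computation.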

The residual sum of squares is \( S_R = 4.82 \) (this is the sum of the last column of Table 3). The value represents the minimal value of function 1-7.

1.1.3 EVALUATION OF MODEL QUALITY

After the model is estimated, it makes sense to make a judgement about its quality. We shall focus on the statistical verification of the model at this stage and on the general characteristics of model quality. Since we have not assigned a particular economic meaning to the model, we will


not perform the economic verification. As far as the econometric verification is concerned, it will be dealt with in greater detail in later chapters, where we will be concerned with the causes and consequences of violations of the classical regression conditions for a general model. If we want to make a judgement about the quality of a model, we could theoretically use the criterion \( S_R \), because its value tells us how well the model at hand runs through the measured values \( y_i \). The smaller the value of the residual sum of squares, the more optimistic we might be in relation to the model found. However, this criterion is not suitable, and the reason behind this statement is simple: \( S_R \) depends on the physical units of Y, and so a change in the units changes \( S_R \). Therefore, various dimensionless criteria of model quality are constructed, reflecting in a sense to what extent the model is suitable for use. One of these criteria is the coefficient of determination. Let us demonstrate its principle. Let \( S_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 \) and \( S_E = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \). The expression \( S_T \) is called the total sum of squares, the expression \( S_E \) is called the explained sum of squares, and it describes the variability of the fitted values. We may write

\[
\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\]

since the cross-product term vanishes as a consequence of the normal equations. Thus, we have

\( S_T = S_R + S_E. \)   (1.10)

Dividing both sides of 1-10 by \( S_T \), we get

\( 1 = S_R / S_T + S_E / S_T. \)   (1.11)

The ratio \( S_E / S_T \) is called the coefficient of determination, denoted \( R^2 \). The ratio describes what part of the total variability of the \( y_i \)'s is explained by the variability of the fitted values, i.e. what part of the total variability is explained by the model. It follows from 1-11 that the coefficient can take on values from the interval [0, 1]. The higher the value of the coefficient, the more suitable the model seems to be. In the case of our regression line, \( R^2 = 170 / 174.85 = 0.972 \). This value suggests that the model is suitable for the description of the relation between x and y. It must be stressed, however, that the coefficient is an elementary characteristic of model quality, and it has its drawbacks. We shall talk about it in the chapter on multivariate regression, where more regressors will be added to the model and a suitable modification of the coefficient of determination will be introduced. The coefficient of determination evaluates the suitability of the analytical form of the estimated model, but it does not say anything about the statistical properties of the estimated model


parameters or the entire model. Let us focus on these properties, which are related to conditions 1-5 and 1-6 of the classical regression case.

1.1.4 PROPERTIES OF ESTIMATES

To assess the statistical quality of a model and its estimated parameters, suitable criteria must be defined, against which the quality will be compared. The theory of econometrics works particularly with the following criteria.

Unbiased estimate. We say that b is an unbiased estimate of \( \beta \) if the expected value of the estimate equals the unknown parameter, that is, if \( E(b) = \beta \). Usually, a vector of unknown parameters \( \boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_k)' \), k > 1, is being estimated, and in that case a vector \( \mathbf{b} = (b_1, b_2, \ldots, b_k)' \) is said to be an unbiased estimate of \( \boldsymbol{\beta} \) if \( E(b_i) = \beta_i \), \( i = 1, \ldots, k \).

The best unbiased estimate. An unbiased estimate b is said to be the best unbiased estimate of the unknown parameter \( \beta \) if

\( \mathrm{var}(b^*) \geq \mathrm{var}(b), \)

where \( b^* \) is any unbiased estimate of \( \beta \). This means no unbiased estimate has a smaller variance. If the entire vector \( \boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_k)' \), k > 1, is estimated, then an estimate \( \mathbf{b} \) is said to be the best unbiased estimate if every linear combination \( \mathbf{c}'\mathbf{b} = c_1 b_1 + \cdots + c_k b_k \), where the \( c_i \)'s are arbitrary real numbers, is the best unbiased estimate of \( \mathbf{c}'\boldsymbol{\beta} \).

Asymptotically unbiased estimate. We say that \( b_n \), where the subscript denotes the data sample size we work with, is an asymptotically unbiased estimate of \( \beta \) if its expected value converges to the unknown parameter:

\( \lim_{n \to \infty} E(b_n) = \beta. \)

This means that for large enough samples, the estimate is approximately an unbiased estimate of the unknown parameter. An analogous definition holds for a vector of estimates: the limit of the expected value is applied to each vector component separately. One may also define the best asymptotically unbiased estimate in a manner similar to that of the best unbiased estimate, but in the limit.

Consistent estimate. An estimate \( b_n \), where the subscript denotes the data sample size we work with, is a consistent estimate of the unknown parameter \( \beta \) if it converges to \( \beta \) in probability, i.e. if

\( \lim_{n \to \infty} P(\{ |b_n - \beta| > \epsilon \}) = 0 \)


for any fixed \( \epsilon > 0 \). A vector of estimates is said to be a consistent estimate of the vector of unknown parameters if each component of the vector of estimates is a consistent estimate of the corresponding component of the vector of unknown parameters.

Convergence in distribution and asymptotic distribution of an estimate. In many situations it is convenient to know the probability distribution of the estimated parameters of a regression model, because it allows us to perform statistical inference. It may happen, however, that the exact distribution is unknown when only a data sample of finite size is available, which is always the case in practice. What might be known, however, is the limiting distribution of such an estimate. It is then possible to say that the known distribution describes the stochastic behaviour of the estimate with a precision that grows as the data sample size increases. The term asymptotic distribution of an estimate may then be introduced in this context. Before doing so, however, the term limiting distribution must first be defined, which is related to what is called convergence in distribution. In our case, all these terms will concern, in particular, a normal probability distribution.

Convergence in distribution. Let \( \{X_n\} \) be a sequence of random variables the behaviour of which is described by distribution functions \( F_n(x) \), and let X be a random variable described by a distribution function \( F(x) \). If \( \lim_{n \to \infty} F_n(x) = F(x) \) in the points of continuity of F, then \( X_n \) is said to converge to X in distribution. Convergence in distribution is defined analogously for random vectors – all that must be done in the definition just presented is that \( \{X_n\} \) needs to be understood as a sequence of random vectors, X is taken as a random vector, and the distribution functions discussed must be viewed as distribution functions of the corresponding random vectors. If univariate random variables converge in distribution to a normal variable, the notation \( X_n \xrightarrow{d} N(\cdot, \cdot) \) is used to describe this fact. A similar notation is used for vectors converging to a multivariate normal distribution: \( \mathbf{X}_n \xrightarrow{d} N(\cdot, \cdot) \).

Asymptotic distribution. We say that the distribution of an estimate b is asymptotically normal with an expected value \( \beta \) and a variance \( v/n \) if \( \sqrt{n}(b - \beta) \xrightarrow{d} N(0, v) \). We say that a vector of estimates \( \mathbf{b} \) is asymptotically normally distributed with a vector of expected values \( \boldsymbol{\beta} \) and a covariance matrix \( \mathbf{V}/n \) if \( \sqrt{n}(\mathbf{b} - \boldsymbol{\beta}) \xrightarrow{d} N(\mathbf{0}, \mathbf{V}) \). Here, we deal with a multivariate normal distribution N and the covariance matrix of a random vector. Regarding the covariance matrix, see the beginning of this chapter, where this term has been described for the vector of random components of a regression model, which is a random vector, as well. In this general definition, however, V need not be a diagonal matrix. Some other statistical properties are defined in the context of econometrics, as well. These are the efficiency and asymptotic efficiency of an estimate [2]. The precise definitions of these terms are rather technical, dealing with regular probability distributions. We shall not present these definitions in this text, and will focus mainly on (the best) unbiasedness, (the best) linear


unbiasedness to be yet introduced, and on the consistency and distribution of an estimate. These terms are especially important for both practical and theoretical purposes.

1.1.5 STATISTICAL PROPERTIES OF REGRESSION LINE

Let us look at some of the statistical properties of the estimated parameters in a regression line, provided the conditions for classical regression are satisfied. We shall be interested in the most important properties for the case of finite data samples, and the properties that do not have a straightforward proof will only be mentioned without presenting the proof. A more comprehensive coverage of the properties will be given in the more general case of multivariate regression models in the next chapter. Regarding the estimate \( b_1 \) in the model \( Y = \beta_0 + \beta_1 x + \varepsilon \), we may write

\[
b_1 = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}
    = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
    = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
    = \sum_{i=1}^{n} c_i y_i,
\]   (1.12)

where

\[
c_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}.
\]

Similarly as in 1-12, we can write \( b_0 = \sum_{i=1}^{n} (1/n - c_i \bar{x}) y_i \). Thus, both estimates can be expressed analytically in the form \( \sum_{i=1}^{n} h_i y_i \). Since this expression is a linear function of y, the estimates are called linear estimates. Taking \( Y_i \) as a random variable now, not its specific realization, thus using \( Y_i \) instead of \( y_i \) in 1-12, and substituting for \( Y_i \) with the equation \( Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \) in 1-12, we have

\[
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1 + \sum_{i=1}^{n} c_i \varepsilon_i,
\]   (1.13)

since \( \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^{n} (x_i - \bar{x}) Y_i \) and \( \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \). Therefore, \( E(b_1) = E(\beta_1 + \sum c_i \varepsilon_i) = \beta_1 + \sum c_i E(\varepsilon_i) = \beta_1 \) because of the condition \( E(\varepsilon_i) = 0 \). Further, since \( E(\bar{Y}) = E\left( (1/n) \sum Y_i \right) = (1/n) \sum (\beta_0 + \beta_1 x_i) = \beta_0 + \beta_1 \bar{x} \), it follows that \( E(b_0) = E(\bar{Y} - b_1 \bar{x}) = \beta_0 + \beta_1 \bar{x} - \bar{x} E(b_1) = \beta_0 \). We can say that the least squares method leads to unbiased estimates of the unknown parameters \( \beta_0 \) and \( \beta_1 \). Let us calculate the variances of the estimated parameters now, as an exercise. Given 1-13 and given the fact that the random components of the model are uncorrelated, we may write

\[
\mathrm{var}(b_1) = \mathrm{var}\!\left( \beta_1 + \sum_{i=1}^{n} c_i \varepsilon_i \right)
= \sum_{i=1}^{n} c_i^2 \, \mathrm{var}(\varepsilon_i)
= \sigma^2 \sum_{i=1}^{n} c_i^2
= \sigma^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2,
\]


because, generally, \( \mathrm{var}\left( \sum_i a_i Z_i \right) = \sum_i a_i^2 \, \mathrm{var}(Z_i) + 2 \sum_{i < j} a_i a_j \, \mathrm{cov}(Z_i, Z_j) \).
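Both the unbiasedness of \( b_1 \) and the variance formula just derived can be illustrated by simulation (a sketch; the true values \( \beta_0 = 1 \), \( \beta_1 = 2 \), \( \sigma = 1 \) and the seed are our arbitrary choices):

```python
import random

random.seed(1)

x = [2, 4, 6, 8, 10, 12, 14]            # fixed regressor values (as in Example 1)
beta0, beta1, sigma = 1.0, 2.0, 1.0     # assumed true parameters (our choice)
n = len(x)
x_bar = sum(x) / n
sxx = sum((a - x_bar) ** 2 for a in x)  # equals 112 for these x values
c = [(a - x_bar) / sxx for a in x]      # weights c_i from formula 1-12

estimates = []
for _ in range(20000):
    y = [beta0 + beta1 * a + random.gauss(0.0, sigma) for a in x]
    estimates.append(sum(ci * yi for ci, yi in zip(c, y)))   # b1 = sum c_i y_i

mean_b1 = sum(estimates) / len(estimates)
var_b1 = sum((b - mean_b1) ** 2 for b in estimates) / len(estimates)

# E(b1) = beta1 and var(b1) = sigma^2 / sxx, up to simulation noise
assert abs(mean_b1 - beta1) < 0.01
assert abs(var_b1 - sigma ** 2 / sxx) < 0.001
```

Over the 20 000 replications, the average of the simulated slopes stays within a fraction of a per cent of \( \beta_1 \), and their empirical variance matches \( \sigma^2 / \sum (x_i - \bar{x})^2 \).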

As for the parameter \( b_0 \), it can be adjusted, using 1-12, as

\[
b_0 = \bar{Y} - b_1 \bar{x} = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i + \varepsilon_i) - b_1 \bar{x}
    = \beta_0 + \bar{x}(\beta_1 - b_1) + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i
    = \beta_0 + \sum_{i=1}^{n} d_i \varepsilon_i,
\]

where \( d_i = (1/n - c_i \bar{x}) \) [3]. Therefore, \( \mathrm{var}(b_0) = \sigma^2 \sum_{i=1}^{n} d_i^2 \). We are now interested in how good these estimates are. Theoretically, the data at hand could have been handled another way, which would have led to other estimates of the model parameters. For instance, other estimates can result from using a different criterion than 1-7. It turns out, however, that such efforts would have been fruitless, because in the class of all linear unbiased estimates, we would not have found estimates with smaller variances. If the conditions for classical regression hold true, the least squares method leads to the best linear unbiased estimates of the unknown parameters \( \beta_0 \) and \( \beta_1 \). The class of linear estimates could be too narrow, however, and the question arises what the best unbiased estimate looks like, whether it is linear or not. This question has an answer in our classical case, because if 1-5 is satisfied, the least squares method gives estimates which are the best among all unbiased estimates, whatever their analytical form is.

1.1.6 STATISTICAL INFERENCE

The main objective of building a model, which includes estimating its parameters among other things, is to use it for statistical inference. Statistical inference concerns testing statistical hypotheses and constructing confidence intervals for unknown model coefficients. This will be our focus at the very end of this chapter on the regression line. To perform statistical inference, the variances of the estimated coefficients must be known. These were derived in the previous paragraphs, but the resulting equations contain the parameter \( \sigma^2 \), which is almost always unknown. Therefore, we must start with finding its appropriate estimate \( s^2 = \hat{\sigma}^2 \). Since \( \sigma^2 = \mathrm{var}(\varepsilon_i) = E(\varepsilon_i^2) - [E(\varepsilon_i)]^2 = E(\varepsilon_i^2) \) and \( E(\varepsilon_i) = 0 \), it looks like a reasonable candidate might be the expression \( \sum_{i=1}^{n} e_i^2 / n \). And really, we are not far from the truth. We shall, however, alter the expression slightly to get a variance estimate with proper statistical properties. Let us start with

\( Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n. \)   (1.14)

Averaging both sides of 1-14, we get

\( \bar{\varepsilon} = \bar{Y} - \beta_0 - \beta_1 \bar{x}. \)   (1.15)

Also, using 1-12, 1-14 and 1-15, we have


\[
e_i = y_i - b_0 - b_1 x_i = (\beta_0 + \beta_1 x_i + \varepsilon_i) - b_0 - b_1 x_i
\]
\[
= (\beta_0 + \beta_1 x_i + \varepsilon_i) - (\bar{y} - b_1 \bar{x}) - b_1 x_i
= (\varepsilon_i - \bar{\varepsilon}) + (\beta_1 - b_1)(x_i - \bar{x}).
\]

Therefore, using 1-13,

\( e_i = (\varepsilon_i - \bar{\varepsilon}) - (x_i - \bar{x}) \sum_{j=1}^{n} c_j \varepsilon_j. \)   (1.16)

This equality implies that \( E\left( \sum_{i=1}^{n} e_i^2 \right) = (n - 2)\sigma^2 \). In other words, \( s^2 = \sum_{i=1}^{n} e_i^2 / (n - 2) \) is an unbiased estimate of \( \sigma^2 \).
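The role of the denominator n − 2 can be illustrated by simulation (a sketch; the true parameter values and the seed are our arbitrary choices):

```python
import random

random.seed(2)

x = [2, 4, 6, 8, 10, 12, 14]
beta0, beta1, sigma = 1.0, 2.0, 1.0      # assumed true values
n = len(x)
x_bar = sum(x) / n
sxx = sum((a - x_bar) ** 2 for a in x)

sums = []
for _ in range(20000):
    y = [beta0 + beta1 * a + random.gauss(0.0, sigma) for a in x]
    y_bar = sum(y) / n
    b1 = sum((a - x_bar) * yi for a, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sums.append(sum((yi - b0 - b1 * a) ** 2 for a, yi in zip(x, y)))

mean_sse = sum(sums) / len(sums)
# E(sum of e_i^2) = (n - 2) * sigma^2 = 5 here, so dividing by n - 2 is unbiased
assert abs(mean_sse / (n - 2) - sigma ** 2) < 0.05
assert abs(mean_sse / n - sigma ** 2) > 0.2    # dividing by n underestimates
```

The average of \( \sum e_i^2 \) over many simulated samples settles near \( (n-2)\sigma^2 \), so dividing by n would systematically underestimate \( \sigma^2 \).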

We now have all the information we need to proceed with statistical inference. Since we know the unbiased estimate of \( \sigma^2 \), we know the unbiased estimates of the variances of the two estimated parameters in the regression line: \( \widehat{\mathrm{var}}(b_0) = s^2 \sum_{i=1}^{n} d_i^2 \); \( \widehat{\mathrm{var}}(b_1) = s^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2 \).
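Continuing with Example 1, the estimate \( s^2 \) and the standard error of the slope can be computed as follows, together with a t statistic and a 95% confidence interval for the slope (a sketch in pure Python; the constant 2.571 is the tabulated critical value \( t_{5,\,0.025} \)):

```python
x = [2, 4, 6, 8, 10, 12, 14]
y = [3, 4, 6, 8, 11, 15, 17]
n = len(x)

# least squares fit (formulas 1-9)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * b for a, b in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

residuals = [yi - b0 - b1 * a for a, yi in zip(x, y)]
s2 = sum(e * e for e in residuals) / (n - 2)    # unbiased estimate of sigma^2
s_b1 = (s2 / sxx) ** 0.5                        # estimated standard deviation of b1

t_stat = b1 / s_b1                              # test of H0: beta1 = 0
t_crit = 2.571                                  # t quantile for 5 df, alpha = 0.05
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # 95% confidence interval

print(round(s2, 3), round(t_stat, 1), round(ci[0], 2), round(ci[1], 2))
# 0.964 13.3 0.99 1.47
```

The t statistic of about 13.3 far exceeds 2.571, so the slope of Example 1 is statistically significant at the 5% level, and the interval [0.99, 1.47] does not cover zero.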

The symbols \( s^2(b_0) \) and \( s^2(b_1) \) are used more frequently instead of \( \widehat{\mathrm{var}}(b_0) \) and \( \widehat{\mathrm{var}}(b_1) \). We also know that the estimated coefficients are unbiased estimates. Last but not least, when a random variable, such as an estimate of a coefficient, is a linear transformation of another normally distributed random variable, the estimate itself is normally distributed. Therefore, \( b_i \sim N(\beta_i, \mathrm{var}(b_i)) \). It follows then that \( (b_i - \beta_i)/\sigma(b_i) \sim N(0, 1) \). When the unknown \( \sigma(b_i) \) is replaced with its estimate \( s(b_i) \), it can be proved that the variable obtained has the Student's t distribution with \( n - 2 \) degrees of freedom:

\( (b_i - \beta_i)/s(b_i) \sim t_{n-2}. \)   (1.17)

The construction of 1-17 is similar to that of the one-sample t-test criterion. The result 1-17 allows us to construct confidence intervals for the model coefficients, and to test hypotheses about these coefficients. We can test the hypothesis

\( H_0\colon \beta_i = k, \)

where k is a given number, against the alternative hypothesis \( H_1\colon \beta_i \neq k \). The null hypothesis will be accepted at the significance level \( \alpha \) of the test if


\( |(b_i - k)/s(b_i)| < t_{n-2}(\alpha), \)

where the term on the right-hand side represents the corresponding critical value of the t-distribution, i.e. the value \( t_{n-2}(\alpha) \) such that for a random variable \( T \sim t_{n-2} \),

\( P(\{ |T| \geq t_{n-2}(\alpha) \}) = \alpha. \)

In the opposite case, when

\( |(b_i - k)/s(b_i)| \geq t_{n-2}(\alpha), \)

the null hypothesis is rejected. Usually, k = 0, and in that case the statistical significance of \( \beta_i \) is tested. Put in other words, it is tested whether there is any sense in adding the i-th regressor to the model. We may also construct confidence intervals for the parameters of the model. For the parameter

\( \beta_i \), the 100(1 − α) per cent confidence interval is, according to 1-17, of the form

\( b_i - t_{n-2,\,\alpha/2} \cdot s(b_i) \leq \beta_i \leq b_i + t_{n-2,\,\alpha/2} \cdot s(b_i), \)   (1.18)

where \( t_{n-2,\,\alpha/2} \) is the \( (1 - \alpha/2) \)-quantile of the t-distribution, i.e. \( P(T < t_{n-2,\,\alpha/2}) = 1 - \alpha/2 \).

1.2 MULTIVARIATE REGRESSION

A regression line may be sufficient for the description of elementary relations. An example is the capital market line in the theory of investments. But for most purposes, this is too simple a model, since the explained variable Y usually depends on more than one regressor. In these cases, we talk about multivariate regression, which is the content of this chapter. We shall again explain the subject matter in three basic steps. Step one will concern the formulation of the model, its unknown coefficients will be estimated in step two, and in the final step three, the statistical properties of the estimates will be discussed. Since multivariate regression is a generalization of the regression line, it is not surprising that many theoretical conclusions formulated in this chapter resemble those presented in the previous chapter.

1.2.1 MODEL FORMULATION

The multivariate linear regression model is of the form

\( Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n. \)   (1.19)

For k = 1, we get the case of the regression line. Model 1-19 may be written in the matrix form \( \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \), where the symbols have the same meaning as before, in the case of the line, except for the


matrix of regressors X appearing in this equation. In the case of multivariate regression, the matrix satisfies

\[
\mathbf{X} = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1k} \\
1 & x_{21} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{pmatrix}.
\]

The model is linear in terms of its parameters, but not necessarily in terms of its regressors (see chapter one). An example of a multivariate linear regression model is therefore also the equation \( Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \). We shall still assume that the classical regression case is valid here, i.e. no theoretical problems occur when estimating the parameters of the model. Let us generalize the classical regression conditions for the case of multivariate regression. We shall do so using the matrix notation, since this form of expressing ideas will be helpful in subsequent chapters. We assume that

1) \( E(\varepsilon_i) = 0 \), \( i = 1, 2, \ldots, n \), which can also be expressed as

\[
E(\boldsymbol{\varepsilon}) =
\begin{pmatrix} E(\varepsilon_1) \\ E(\varepsilon_2) \\ \vdots \\ E(\varepsilon_n) \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \mathbf{0}.
\]

Considering 1-19, condition 1) implies that \( E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta} \).

2)

\[
\mathrm{var}(\boldsymbol{\varepsilon}) = E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') =
\begin{pmatrix}
E(\varepsilon_1^2) & E(\varepsilon_1 \varepsilon_2) & \cdots & E(\varepsilon_1 \varepsilon_n) \\
E(\varepsilon_2 \varepsilon_1) & E(\varepsilon_2^2) & \cdots & E(\varepsilon_2 \varepsilon_n) \\
\vdots & \vdots & & \vdots \\
E(\varepsilon_n \varepsilon_1) & E(\varepsilon_n \varepsilon_2) & \cdots & E(\varepsilon_n^2)
\end{pmatrix}
=
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}.
\]

This condition is related to the covariance matrix of the vector of random components \( \boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)' \). The element in the i-th row and j-th column represents the covariance between the i-th and j-th component of the vector \( \boldsymbol{\varepsilon} \). The second condition says that all the random components have the same variance and are mutually uncorrelated: \( \mathrm{cov}(\varepsilon_i, \varepsilon_j) = E(\varepsilon_i \varepsilon_j) - E(\varepsilon_i)E(\varepsilon_j) = 0 \) for \( i \neq j \). Condition 2) may also be written as \( \mathrm{var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I} \), where I is the identity matrix.

3) The matrix of regressors \( \mathbf{X} = (x_{ij}) \), where \( x_{ij} \) is the i-th value of the j-th variable appearing in the model, is nonrandom and its columns are linearly independent. Thus, the values in X are determined in advance by the statistician, the matrix is of \( n \times k \) type, and the rank of the matrix is k. For the rank condition to be satisfied, \( n \geq k \) must necessarily hold, i.e. if \( n < k \), then the rank of the matrix cannot be equal to k.


The third condition is important in relation to the rank of the matrix. The rank plays a significant role when the least squares estimation of the multivariate model coefficients is performed. Looking at the case of the regression line, if the rank condition were not satisfied, the second column of X would be a multiple of the first column of ones. This would mean the second column would be a column of constants, and in that case we would have \( \sum_{i=1}^{n} (x_i - \bar{x})^2 = 0 \), and we would be unable to use formula 1-9 due to division by zero. We would not be able to estimate the parameters of the regression line. As we shall see, a similar result would occur if we tried to estimate the parameters of the multivariate linear regression model provided the matrix X had linearly dependent columns.

4) The vector of random components \( \boldsymbol{\varepsilon} \) appearing in 1-19 has a multivariate normal distribution.

Summarizing the conditions related to the vector \( \boldsymbol{\varepsilon} \), we may write: \( \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}) \).

1.2.2 THE LEAST SQUARES ESTIMATION OF MODEL PARAMETERS

Let us now derive the formula that estimates the unknown parameters of model 1-19. The estimates are obtained using the least squares method, which means that the objective is to find the values \( b_0, b_1, \ldots, b_k \) which minimize the expression

\( S(\mathbf{b}) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \cdots - b_k x_{ik})^2. \)   (1.20)

The expression is understood as a function of the variables \( b_0, b_1, \ldots, b_k \). To find their proper values means, as in the case of the regression line, to solve an optimization problem. To do so, the partial derivatives of 1-20 are calculated and set equal to zero. Setting \( e_i = y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \cdots - b_k x_{ik} \) for the residuals of the model, we get

\( \frac{\partial S(\mathbf{b})}{\partial b_j} = 2 \sum_{i=1}^{n} e_i \cdot (-x_{ij}) = 0, \quad j = 0, 1, \ldots, k, \)   (1.21)

where \( x_{i0} = 1 \). Writing \( \mathbf{y} = (y_1, y_2, \ldots, y_n)' \) and \( \mathbf{b} = (b_0, b_1, \ldots, b_k)' \), \( \mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{b} \), 1-21 may be rewritten as

\[
\frac{\partial S(\mathbf{b})}{\partial \mathbf{b}} = -2\mathbf{X}'\mathbf{e} = -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0},
\]

and therefore the solution for \( \mathbf{b} \) satisfies

\( \mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y}. \)   (1.22)

Expression 1-22 represents the set of the so-called normal equations, which we have already solved in the case of the regression line. If the inverse \( (\mathbf{X}'\mathbf{X})^{-1} \) exists, the set of equations has the unique solution

\( \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \)   (1.23)
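Formula 1-23 can be checked numerically on the data of Example 1 from the previous chapter (a sketch in pure Python; for the regression line, the 2×2 inverse is written out explicitly), including what happens when the columns of X are linearly dependent:

```python
x = [2, 4, 6, 8, 10, 12, 14]
y = [3, 4, 6, 8, 11, 15, 17]
n = len(x)

# X'X and X'y for the regression line (columns of X: ones and x)
sx, sxx_raw = sum(x), sum(a * a for a in x)
xtx = [[n, sx], [sx, sxx_raw]]
xty = [sum(y), sum(a * b for a, b in zip(x, y))]

det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]   # nonzero iff X has full rank
b0 = (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det  # first row of (X'X)^{-1} X'y
b1 = (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det  # second row

print(round(b0, 3), round(b1, 3))    # -0.714 1.232, as obtained from 1-9

# With linearly dependent columns (a constant second regressor), X'X is singular:
xc = [5, 5, 5, 5, 5, 5, 5]
det_c = n * sum(a * a for a in xc) - sum(xc) ** 2
print(det_c)    # 0 -> (X'X)^{-1} does not exist
```

The first part reproduces the estimates of Example 1; the second part shows the determinant of X′X dropping to zero when the rank condition is violated.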

At \( \mathbf{b} \), the sum of squares \( S(\mathbf{b}) \) could theoretically be either minimized or maximized, or there might be no extreme of the function at all. It can be shown that the function is minimized in this


case, and therefore 1-23 is the solution we were looking for. Formulas 1-9 and 1-23 give the same result; put another way, 1-9 is a special case of 1-23. The vector of coefficients can be obtained uniquely only if the inverse \( (\mathbf{X}'\mathbf{X})^{-1} \) exists, which brings us back to the conditions required in the case of classical regression. If the matrix X does not have full rank, i.e. its columns are linearly dependent, the matrix \( \mathbf{X}'\mathbf{X} \) does not have an inverse, and 1-23 cannot be applied. It is now clearly seen why the classical regression conditions include the condition on the rank of the matrix of regressors. Once the model is estimated, the question about the statistical quality of the estimates arises again.

1.2.3 STATISTICAL PROPERTIES OF LEAST SQUARES

We shall derive some basic properties of the estimates; the other, more complex results will only be stated without proof. The vector of estimates \( \mathbf{b} \), viewed as a random vector, satisfies

\( \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}, \)   (1.24)

and so its expected value satisfies

\( E(\mathbf{b}) = E(\boldsymbol{\beta}) + E\{(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\boldsymbol{\varepsilon}) = \boldsymbol{\beta} + \mathbf{0} = \boldsymbol{\beta}. \)   (1.25)

When calculating the expected value of a vector, recall that this means that the expected value of each of its components is calculated. Expression 1-25 holds because the vector of unknown parameters \( \boldsymbol{\beta} \) is a vector of constants and conditions 1) and 3) of classical regression apply. This means that the vector \( \mathbf{b} \) is an unbiased estimate of the vector \( \boldsymbol{\beta} \). This is a generalization of the result obtained for regression lines. For regression lines, we have also derived the variances of the estimated coefficients. We shall generalize this procedure in the case of the multivariate model, as well. When working with a random vector, however, we talk, as we already know, about its covariance matrix \( \mathbf{V} = (v_{ij}) \), where \( v_{ij} \) denotes the element in the i-th row and j-th column of the matrix, or the covariance between the i-th and j-th component of the random vector. For the vector \( \mathbf{b} \), we can write \( v_{ij} = \mathrm{cov}(b_i, b_j) = E\left[ (b_i - E(b_i)) \cdot (b_j - E(b_j)) \right] \). The entire covariance matrix of the vector \( \mathbf{b} \) is then \( \mathrm{var}(\mathbf{b}) = E\{(\mathbf{b} - E(\mathbf{b}))(\mathbf{b} - E(\mathbf{b}))'\} \),

or, in a more readable form according to 1-25,

\( \mathrm{var}(\mathbf{b}) = E\{(\mathbf{b} - \boldsymbol{\beta})(\mathbf{b} - \boldsymbol{\beta})'\}. \)

Since condition 3) of classical regression holds, as well as 1-25, we may use the following three matrix rules: \( (\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}' \), \( (\mathbf{A}^{-1})' = (\mathbf{A}')^{-1} \), and \( \mathrm{var}(\mathbf{A}\mathbf{x}) = \mathbf{A}\,\mathrm{var}(\mathbf{x})\,\mathbf{A}' \) for a random vector x. Thus,


\[
\mathrm{var}(\mathbf{b}) = E\{(\mathbf{b} - \boldsymbol{\beta})(\mathbf{b} - \boldsymbol{\beta})'\}
= E\{(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\,((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon})'\}
= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}')\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}.
\]   (1.26)

What matters most is the resulting formula, which tells us how to find the covariance matrix of the vector of estimates. In fact, for a regression line we can also construct the covariance matrix of its estimated coefficients, since a line contains two parameters, or a vector of two parameters. The diagonal of such a matrix, a special case of 1-26, would contain the variances of \( b_0 \) and \( b_1 \), which have been derived, and the two matrix elements off its main diagonal would represent the covariances \( \mathrm{cov}(b_0, b_1) \) and \( \mathrm{cov}(b_1, b_0) \), with \( \mathrm{cov}(b_0, b_1) = \mathrm{cov}(b_1, b_0) \). Matrix 1-26 is always symmetric, given the definition of covariance. Using the matrix \( \mathrm{var}(\mathbf{b}) \), it can be shown that the estimate \( c_0 b_0 + c_1 b_1 + \cdots + c_k b_k \) is the best unbiased estimate of \( c_0 \beta_0 + c_1 \beta_1 + \cdots + c_k \beta_k \), provided the conditions of classical regression are satisfied and the coefficients were estimated by the least squares method according to 1-23. As far as the probability distribution of \( \mathbf{b} \) is concerned, we assume that the random vector \( \boldsymbol{\varepsilon} \) has a multivariate normal distribution. Therefore, the vector Y, a linear transformation of \( \boldsymbol{\varepsilon} \), and the vector \( \mathbf{b} \), a linear transformation of Y, are also normally distributed. Summarizing the results so far, we have

\( \mathbf{b} \sim N(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}). \)   (1.27)

For the sake of completeness, let us also say that under additional and very general conditions imposed on the behaviour of X, \( \mathbf{b} \) is also a consistent estimate of \( \boldsymbol{\beta} \) [3]. Relation 1-27 may be used for statistical inference.

1.2.4 STATISTICAL INFERENCE

We shall test hypotheses about the components of \( \boldsymbol{\beta} \) and construct confidence intervals for them, in analogy to the case of the regression line. The starting point is expression 1-27. For the i-th component of \( \mathbf{b} \), we have \( b_i \sim N(\beta_i, \sigma^2 v_{ii}) \), where \( v_{ii} \) is the i-th diagonal element of the matrix \( (\mathbf{X}'\mathbf{X})^{-1} \). This implies that

\( (b_i - \beta_i)/(\sigma \sqrt{v_{ii}}) \sim N(0, 1). \)   (1.28)

If we knew the parameter \( \sigma \), we could use 1-28 for statistical inference. However, the parameter is rarely known, and so we have to resort to its reasonable estimate, as in the case of the regression line. The use of an estimate will change the probability distribution of 1-28. It can be shown that an unbiased estimate of \( \sigma^2 \) is \( s^2 = \sum_{i=1}^{n} e_i^2 / (n - p) \), where p denotes the number of model parameters. For our generally defined multivariate model, \( p = k + 1 \). This result is a generalization of the case of the line, where we had \( p = 2 \). Using the estimate,


(bi − βi)/(s√aii) ~ t(n−p),   (1.29)

where t(n−p) represents the Student's t-distribution with n − p degrees of freedom. Expression 1-29 can be used to test hypotheses about the parameters βi and to construct their confidence intervals. Therefore, formulas 1-17 and 1-18 hold with different degrees of freedom of the t-distribution. Let us demonstrate the procedures of classical regression with an example.

EXAMPLE 2

The following table contains data on a variable Y which is assumed to depend on variables X1 and X2, the relation being of the form Y = β0 + β1X1 + β2X2 + β3X1X2 + ε. The goal is to estimate the unknown parameters βi, test their significance and construct 95% confidence intervals for the parameters.

Table 4: Data for Example 2

y    1    2    4    7    8    10   12   9
x1   0    1    1    2    2    3    3    4
x2   2    0    2    1    0.5  1    0.5  0

Source: own

Solution: The model can be written as Y = β0 + β1X1 + β2X2 + β3X3 + ε, where X3 = X1X2, so we are working with three explanatory variables. For this purpose, let us expand the original table to Table 5.

Table 5

y    1    2    4    7    8    10   12   9
x1   0    1    1    2    2    3    3    4
x2   2    0    2    1    0.5  1    0.5  0
x3 = x1·x2   0    0    2    2    1    3    1.5  0

The matrix of regressors is

Page 29: ECONOMETRICS Study supports - vsb.czkatedry.fmmi.vsb.cz/Opory_FMMI_ENG/QM/Econometrics.pdfECONOMETRICS Study supports ... T ime to learning ... Econometrics 7 INTRODUCTION Many economic

Filip Tošenovský Econometrics

29

X =
( 1   0   2    0   )
( 1   1   0    0   )
( 1   1   2    2   )
( 1   2   1    2   )
( 1   2   0.5  1   )
( 1   3   1    3   )
( 1   3   0.5  1.5 )
( 1   4   0    0   ).

Hence

X'X =
( 8     16     7      9.5   )
( 16    44     9.5    21.5  )
( 7     9.5    10.5   10.25 )
( 9.5   21.5   10.25  20.25 ),

(X'X)⁻¹ ≈
( 1.615    -0.5      -0.797    0.18   )
( -0.5     0.205     0.26      -0.113 )
( -0.797   0.26      0.585     -0.199 )
( 0.18     -0.113    -0.199    0.186  ).

The vector of estimates is

b = (X'X)⁻¹X'y = ( 0.71, 2.4, -0.14, 1.04 )'.

The model is of the form Ŷ = 0.71 + 2.4X1 − 0.14X2 + 1.04X3. Inserting the i-th values of the regressors (x1i, x2i, x3i) in the resulting model, we can calculate the fitted values ŷi. This further allows us to evaluate the residuals ei = yi − ŷi. Table 6 shows the calculations (i = 1, 2, …, 8) together with the squared residuals.

Table 6: Calculations in Example 2

y     ŷ       e       e²
1     0.43    0.57    0.3249
2     3.11    -1.11   1.2321
4     4.91    -0.91   0.8281
7     7.45    -0.45   0.2025
8     6.48    1.52    2.3104
10    10.89   -0.89   0.7921
12    9.4     2.6     6.76
9     10.31   -1.31   1.7161

The residual sum of squares is RSS = Σei² = 14.166. The parameter σ² is therefore estimated as s² = 14.166/(8 − 4) ≈ 3.54. The variances of the estimated coefficients are estimated as follows:

Table 7: Coefficients and their estimated variances

coefficient   diagonal element aii   estimated variance s²·aii
b0            1.615                  5.7171
b1            0.205                  0.7257
b2            0.585                  2.0709
b3            0.186                  0.65844
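The computations of Example 2 up to this point can be verified with a short script. This is an illustrative sketch in plain Python; the `solve` routine is a small Gaussian-elimination helper written for this purpose, not a library function.

```python
# Re-computation of Example 2: least squares estimates, residual sum of
# squares and the estimate s^2 of the error variance.

def solve(A, c):
    """Solve the linear system A x = c by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for j in range(n):
        p = max(range(j, n), key=lambda r: abs(M[r][j]))  # partial pivoting
        M[j], M[p] = M[p], M[j]
        for r in range(j + 1, n):
            f = M[r][j] / M[j][j]
            for k in range(j, n + 1):
                M[r][k] -= f * M[j][k]
    x = [0.0] * n
    for j in range(n - 1, -1, -1):
        x[j] = (M[j][n] - sum(M[j][k] * x[k] for k in range(j + 1, n))) / M[j][j]
    return x

y  = [1, 2, 4, 7, 8, 10, 12, 9]
x1 = [0, 1, 1, 2, 2, 3, 3, 4]
x2 = [2, 0, 2, 1, 0.5, 1, 0.5, 0]
X  = [[1, a, c, a * c] for a, c in zip(x1, x2)]  # columns: 1, x1, x2, x3 = x1*x2

# Normal equations: (X'X) b = X'y
XtX = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(4)]
b = solve(XtX, Xty)

fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in X]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
s2 = rss / (len(y) - 4)                          # s^2 = RSS/(n - p), p = 4

print([round(v, 2) for v in b])      # [0.71, 2.4, -0.14, 1.04]
print(round(rss, 3), round(s2, 2))   # 14.166 3.54
```

The estimates agree with the text to the displayed precision; the tabulated fitted values and residuals were produced with the rounded coefficients.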

Knowing the estimates of the variances, we can now construct 95% confidence intervals for the model coefficients. For this purpose, we use formula 1-18, where n − 2 degrees of freedom are replaced with n − 4 degrees of freedom, since there are p = 4 parameters in the model. For α = 0.05, we have t4(0.05) = 2.776, which provides the following confidence intervals:

0.71 − 2.776·√5.7171 ≤ β0 ≤ 0.71 + 2.776·√5.7171, i.e. β0 ∈ (−5.93, 7.35),

2.4 − 2.776·√0.73 ≤ β1 ≤ 2.4 + 2.776·√0.73, i.e. β1 ∈ (0.028, 4.77),

−0.14 − 2.776·√2.0709 ≤ β2 ≤ −0.14 + 2.776·√2.0709, i.e. β2 ∈ (−4.13, 3.86),

1.04 − 2.776·√0.658 ≤ β3 ≤ 1.04 + 2.776·√0.658, i.e. β3 ∈ (−1.21, 3.29).

In the final step, we are going to test the significance of the model coefficients. To test the significance of the i-th coefficient means to examine the validity of the null hypothesis H0: βi = 0. We can use the statistic

T = bi / s(bi),

just like in the case of the regression line. If the null hypothesis is true, the statistic has the t-distribution with n − p degrees of freedom. For each of the unknown coefficients, we have


Table 8: Test of coefficient significance

Coefficient   T
b0            0.3
b1            2.82
b2            -0.094
b3            1.28
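The statistics in Table 8 can be recomputed from the coefficient estimates and the variance estimates of Table 7 (a sketch; small deviations from the tabulated values, e.g. for b2, come from rounding the inputs):

```python
# Recomputing the T statistics of Table 8: T = b_i / s(b_i),
# where s(b_i) is the square root of the estimated variance of b_i.
from math import sqrt

coeffs    = [0.71, 2.4, -0.14, 1.04]           # least squares estimates
variances = [5.7171, 0.7257, 2.0709, 0.65844]  # s^2 * a_ii from Table 7

T = [bi / sqrt(vi) for bi, vi in zip(coeffs, variances)]
print([round(t, 2) for t in T])  # roughly [0.3, 2.82, -0.1, 1.28]

critical = 2.776                 # t_4(0.05), taken from the text
significant = [abs(t) >= critical for t in T]
print(significant)               # only b1 exceeds the critical value
```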

A coefficient is considered significant if |T| ≥ t(n−p)(α), because this is when the null hypothesis is rejected. In the example above, only b1 is significant, since t4(0.05) = 2.776. Nonetheless, let us note that the absolute term is usually kept in regression models for various reasons, even if it appears insignificant by the statistical test. The conclusion for our example could be that we might be satisfied with modelling the behaviour of the variable Y with a regression line, reflecting the effect of a single regressor. The other regressors seem to have no effect on Y.

1.2.5 ADJUSTED COEFFICIENT OF DETERMINATION

Let us return to the coefficient of determination, which was previously used to assess the suitability of a regression line. In the multivariate case, the coefficient is calculated the same way. For the last example, the coefficient is R² = 93.84/107.87 = 0.87.

If the insignificant variables were excluded from the model (except for the absolute term), we would work with a regression line. A change in the number of regressors will generally have an effect on the estimates of β0 and β1 in the new model. The least squares method results in the line

Ŷ = 1.291 + 2.66X1. The coefficient of determination for the line is 0.79. As can be seen, adding new variables to the line model increased the coefficient of determination. This is a general property of the coefficient: with more regressors in the model, the coefficient will not decrease. This property is a weakness of the criterion. As will be seen later, "enriching" a simpler model with an insignificant regressor worsens the precision of the parameter estimates in the sense that their variances rise. The coefficient of determination, however, does not capture this deterioration; on the contrary, it signals a model improvement. Therefore it is necessary to have another criterion available which penalizes a model that resulted from extending a simpler model with insignificant regressors. The so-called adjusted coefficient of determination ranks among the criteria that have this property (to an extent). It is defined as

R²(adj) = 1 − ((n − 1)/(n − p))·(1 − R²).   (1.30)

When another variable is added to the model, R² will not decrease, but for a low enough t-test statistic of this variable, R²(adj) will decrease. It is convenient to use the adjusted version of the


criterion when the quality of two models is compared, one of the models being an extension of the other. The higher the criterion 1-30, the better the model. The adjusted coefficient can also take on negative values; when this happens, it is set to zero by definition.

1.2.6 TEST OF MODEL

The end of the chapter on classical regression is devoted to a statistical test often used in relation to regression: the F-test of model significance. The test examines the validity of the hypothesis

H0: β1 = β2 = ⋯ = βk = 0.

Thus, the test asks whether any of the regressors is significant, whereas the t-tests of significance concerned each regressor alone. Although it might seem proper to test the significance of the regressors by applying a series of t-tests, such a procedure, as is known in statistics, is not statistically adequate. If the regression model is of the form 1-19, the F-test criterion can be written as

F = (Σ(ŷi − ȳ)²/k) / (Σei²/(n − k − 1)),   (1.31)

which has the Fisher distribution with k and n − k − 1 degrees of freedom if the null hypothesis is true. If the criterion is greater than (or equal to) the critical value of the distribution, F(k, n−k−1)(α), the null hypothesis is rejected at the significance level α. Expression 1-31 can also be written as

F = (R²/k) / ((1 − R²)/(n − k − 1)).   (1.32)

Returning to Example 2, we can test the significance of the original model with all three explanatory variables. Using 1-32, we have

F = (0.87/3) / ((1 − 0.87)/(8 − 3 − 1)) = 8.92,

which is greater than the critical value F(3,4)(0.05) = 6.59. The hypothesis that the model is insignificant is rejected. At the very end, it is important to note that it can happen that all regressors of a model are insignificant, as suggested by t-tests, but the model as a whole is significant by the F-test. It may even be said that this situation occurs fairly often. The contradiction suggests there is strong multicollinearity in the model, a problem we will deal with in the subsequent sections of this text. As a final but important note, let us say that many of the results concerning statistical properties of the parameter estimates can be attained without the assumption of normality. Some of the results are weaker – for instance, the estimates are not the best unbiased but only the best linear unbiased, and some statistical properties hold only asymptotically.
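Formulas 1-30 and 1-32 can be checked numerically for Example 2. The helper functions below are illustrative; R² = 0.87 for the full model and R² = 0.79 for the fitted line are the values reported in the text, with n = 8.

```python
# Checking the adjusted coefficient of determination (1-30) and the
# F statistic (1-32) for Example 2.

def adjusted_r2(r2, n, p):
    """Formula 1-30: penalized R^2 for a model with p parameters."""
    return 1 - (n - 1) / (n - p) * (1 - r2)

def f_statistic(r2, n, k):
    """Formula 1-32: overall F-test with k regressors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(adjusted_r2(0.87, 8, 4), 4))  # full model, p = 4
print(round(adjusted_r2(0.79, 8, 2), 4))  # line, p = 2
print(round(f_statistic(0.87, 8, 3), 2))  # 8.92, compare with F_{3,4}(0.05) = 6.59
```

Note that for this particular data set the adjusted coefficient still favours the full model; the penalty of 1-30 only guarantees a decrease when the added regressor's t-statistic is low enough.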


Summary of terms:

- Regression line
- Multivariate regression
- Linear regression model
- Model coefficients
- Random component of the model
- Regressor
- Explained and explanatory variable
- Classical regression
- The least squares estimation
- Normal equations
- Statistical inference in regression
- T-test of coefficient significance
- F-test of model significance
- Confidence interval for a model coefficient
- Expected value of a coefficient
- Variance of a coefficient
- Covariance matrix of vector of coefficients
- Linear estimate
- Unbiased and best unbiased estimate
- Best linear unbiased estimate
- Consistent estimate
- Asymptotic distribution
- Fitted value
- Residual, residual sum of squares
- Coefficient of determination
- Adjusted coefficient of determination
- Statistical, economic and econometric verification of a model

Questions

1. Given the data in the following table, estimate the regression line (= Model 1) describing the dependence of the variable y on the variable x, and calculate the fitted values of y, using the model. Make a judgement on the quality of the model, using the coefficient of determination.

x   6   7   6   8   11   12   7   8   9    14   10   10   13
y   3   2   4   1   -3   -4   2   0   -1   -6   -2   -3   -5

Source: own


2. The table below contains data on three variables. Find estimates of the unknown parameters in Model 2: y = β0 + β1x1 + β2x2 + ε, and evaluate the quality of the model with the coefficient of determination.

x1   6     7     6     8      11     12     7     8      9     14     10     10     13
x2   1.3   4.5   7.8   12.2   -2.3   -5.4   4.3   12.1   -1.1  -6.5   -2.1   -2.6   -4
y    3     2     4     1      -3     -4     2     0      -1    -6     -2     -3     -5

Source: own

3. Compare Models 1 and 2 with the coefficient of determination and its adjusted version. Discuss which of the models seems more suitable.

4. Estimate the variances of the coefficients b0, b1 from Models 1 and 2, and compare the variances. For which of the two models are the variances greater?

5. Test the significance of the regression coefficients in Models 1 and 2 (alpha = 0.05).

6. Construct a 95% confidence interval for the parameter β1 of Models 1 and 2. Compare the intervals. Which of the intervals is broader? Explain why.

7. What is the difference between the random component of the model and its residual?

8. Explain the advantage of the adjusted coefficient of determination, as compared to its unadjusted version.

9. Explain the main idea behind the least squares estimation.

10. Explain the idea of a consistent estimate.

Answers to questions

1) b = (X'X)⁻¹X'y, where

X'X = ( 13    121  )      X'y = ( -12  )
      ( 121   1209 ),            ( -211 ),

which gives b0 = 10.2444, b1 = -1.1998. The fitted values satisfy ŷi = b0 + b1·xi, and so

x   6      7      6      8      11     12     7      8      9      14     10     10     13
ŷ   3.05   1.85   3.05   0.65   -2.95  -4.15  1.85   0.65   -0.55  -6.55  -1.75  -1.75  -5.35

The coefficient of determination is


R² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = 119.15077 / 122.923 = 0.9693.

2) b = (X'X)⁻¹X'y, where

X'X = ( 13     121    18.2  )      X'y = ( -12   )
      ( 121    1209   20.6  )            ( -211  )
      ( 18.2   20.6   501.6 ),           ( 165.5 ),

which gives b0 = 9.917, b1 = -1.167, b2 = 0.018. The coefficient of determination is

R² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = 119.22 / 122.923 = 0.9698.

3) The coefficients of determination are similar for both models. Thus, from the practical point of view, the two models are more or less of the same quality. The problem is that the coefficient will automatically be greater for the second model because there are more regressors in it. Since the models have a different number of regressors, it is more suitable to compare their quality with the adjusted coefficient of determination:

Model 1: R²(adj) = 1 − ((13 − 1)/(13 − 2))·(1 − R²) = 0.9665.

Model 2: R²(adj) = 0.9638.

The two coefficients are again similar, though not the same. The coefficient is a bit higher for the first model, signalling that the second regressor x2 does not have a significant influence on the variable y.

4) For Model 1,

(X'X)⁻¹ = ( 1.1236     -0.11245 )
          ( -0.11245   0.012    )

and the residual sum of squares is Σ(yi − ŷi)² = 3.77. Therefore, the variance of b0 is estimated as (3.77/(13 − 2))·1.1236 = 0.385. For b1, we have (3.77/(13 − 2))·0.012 = 0.004. For Model 2,

(X'X)⁻¹ = ( 2.699     -0.2687   -0.087  )
          ( -0.2687   0.0276    0.0086  )
          ( -0.087    0.0086    0.0048  )


and the residual sum of squares is Σ(yi − ŷi)² = 3.704. Thus, the variance estimate for b0 is (3.704/(13 − 3))·2.699 ≈ 1. For b1, the estimate is (3.704/(13 − 3))·0.0276 ≈ 0.01. In Model 2, both coefficients have more than twice as large variances as those in Model 1.

5) The test criterion is T = bj/s(bj), where s(bj) is the estimate of the standard deviation of bj, or its standard error. The criterion is compared to the critical value of the Student's t-distribution with 13 − 2 = 11 degrees of freedom (the case of Model 1) or 13 − 3 = 10 degrees of freedom (the case of Model 2).

Model 1:

Coefficient   T          Crit. value
b(0)          16.5034    2.201
b(1)          -18.97     2.201

Both parameters can be regarded as significant, since the absolute value of T exceeds the critical value in both cases.

Model 2:

Coefficient   T          Crit. value
b(0)          9.9166     2.2281
b(1)          -11.6735   2.2281
b(2)          0.4286     2.2281

The test confirmed that the variable x2 does not have a significant effect on y, as was also indicated by the adjusted coefficients of determination for Models 1 and 2. It is not reasonable to include the variable in the model (i.e., to extend Model 1 to Model 2) because it is probably the primary cause of the higher variances of the coefficients in Model 2. We shall discuss this phenomenon in the second chapter of the text.

6) The general form of the confidence interval is (b1 − s(b1)·t(n−p), b1 + s(b1)·t(n−p)). For Model 1, the interval is (-1.34, -1.06); for Model 2, the interval is (-1.39, -0.94). The second interval is roughly 1.6 times wider than the first one, which is a result of the higher variance of the estimated coefficient.

7) The residual is an observable estimate of the unobservable random component of the model.

8) See section 1.2.5.

9) See section 1.2.2.

10) See section 1.1.4.
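The intervals in answer 6 can be checked numerically (a sketch; the inputs are the rounded estimates from answers 1–2, the variance estimates from answer 4 and the critical values from answer 5):

```python
# Numerical check of the confidence intervals for beta_1 in answer 6.
from math import sqrt

# Model 1: b1 = -1.1998, var(b1) = 0.004, t_11(0.05) = 2.201
lo1 = -1.1998 - 2.201 * sqrt(0.004)
hi1 = -1.1998 + 2.201 * sqrt(0.004)

# Model 2: b1 = -1.167, var(b1) = 0.01, t_10(0.05) = 2.228
lo2 = -1.167 - 2.228 * sqrt(0.01)
hi2 = -1.167 + 2.228 * sqrt(0.01)

print(round(lo1, 2), round(hi1, 2))         # -1.34 -1.06
print(round(lo2, 2), round(hi2, 2))         # -1.39 -0.94
print(round((hi2 - lo2) / (hi1 - lo1), 1))  # width ratio, about 1.6
```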


2 PROBLEMS IN CLASSICAL REGRESSION

Goal:

This chapter covers various problems that may appear in classical regression. These problems concern measurement errors in data, the question of how many regressors should be included in the model, multicollinearity, heteroscedasticity and autocorrelation.

Time to learning:

11 hours.

So far, we have assumed that all the conditions of classical regression are satisfied. This means we have worked under ideal conditions. Starting in this chapter, the theory will begin to divert from the ideal state, since it does not happen very often in reality that all the conditions are met. Violation of the conditions has different negative effects on the model which is analysed. Some of the problems are related to how the model is formulated, others are linked to imperfections in the data used for building the model and are not related to the model itself. The explanation that follows will begin with the problem of multicollinearity, the question of how many regressors should be included in the model, and the problem of measurement errors that might occur when gathering the data. The frequently occurring problems of heteroscedasticity and autocorrelation will then follow.

2.1. MULTICOLLINEARITY

The problem of multicollinearity arises when the condition of the linear independence of the columns of matrix X is either violated or almost violated. It has been explained that when applying the least squares method, the linear dependence of the columns prohibits using 1-23 to get estimates of the unknown coefficients. This is the case of so-called perfect multicollinearity. When this happens, some explanatory variables (regressors) can be expressed as linear combinations of other explanatory variables. Multicollinearity rarely occurs in its perfect form, and when it does, it rather suggests that the model was ill-constructed (for instance, too many regressors were included in it). More often, the condition of the linear independence of the columns of X is satisfied, i.e. the matrix has full rank, equal to the number of its columns, but an approximate linear dependence among the regressors exists to a certain degree. In that case, we talk about the problem of imperfect multicollinearity.
This type of approximate dependence substantially increases the variances of the coefficient estimates, which means that the precision of the estimates is reduced. As a result, the estimates will vary substantially from one data sample to another. These consequences are shown in Example 3 below. Another consequence can be seen in the construction of the t-test criterion, used to test the significance of


model coefficients. A stronger linear dependence among regressors distorts the conclusions based on the t-tests – the null hypothesis is accepted more often than it should be, due to the increased variance of the coefficient estimates. Thus, some regressors will appear to be insignificant only because their values were not designed properly. The increased variances of the coefficient estimates will, of course, also affect the width of the confidence intervals constructed for the model coefficients. To summarize, multicollinearity reduces the precision of the model as well as of the conclusions made on the basis of statistical inference. On the positive side, multicollinearity does not introduce any bias into the estimated coefficients – they remain unbiased. This is evident from the derivation of unbiasedness, in which the specific form of the matrix of regressors played no role. Multicollinearity originates for many reasons. It can be the case that the regressors entering the model are, by their natural character, simply related to each other. This is often the case in the economic sphere, where one variable develops in accordance with another. Another source are lagged variables used in the model. Such models are called dynamic, since they capture the dynamics of time – not only is the present value of a regressor used in the model, but also its values from the past: X(t−1), X(t−2), …. The model may then look like this: Y(t) = β0 + β1X(t) + β2X(t−1) + ε(t). Such regressors are usually strongly correlated due to a certain inertia embedded in their development over time. When discrete regressors (variables whose range of values is a finite set) are used in modelling, multicollinearity may even be perfect if the model is wrongly specified. These situations are usually mentioned in remarks related to modelling seasonality, which serve as a warning against perfect multicollinearity. The following example portrays the consequences of multicollinearity.

EXAMPLE 3

Using the data from Table 9, estimate the coefficients in the model Y = β0 + β1X1 + β2X2 + ε and calculate their variances.

Table 9: Data for Example 3

y    10   8    6    4    2    0    -1   -3
x1   0    1    0    2    1    3    2    4
x2   0    1    2    -1   -3   -4   2    2

Source: own

Solution: The least squares method gives estimates b0 = 7.73, b1 = -2.76, b2 = -0.03, and the residual sum of squares is RSS = 40.01. Thus, s² = 40.01/(8 − 3) = 8. Using s² and the inverse


(X'X)⁻¹ = ( 0.316    -0.118   -0.004 )
          ( -0.118   0.073    0.004  )
          ( -0.004   0.004    0.026  ),

we get the estimated variances of the estimated coefficients: var(b0) = 2.5, var(b1) = 0.58 and var(b2) = 0.2. The two regressors are not highly correlated: their correlation coefficient equals 0.1.

Let us assume now that the values of the regressors are as depicted in Table 10. The correlation between the variables is -0.98.

Table 10

x1   5   4   3   2   0   -1   -2   -4
x2   1   2   3   5   6   7    8    9
y    .   .   .   .   .   .    .    .

We have not inserted the values of Y, since they would differ for the different values of the regressors. Let the values be such that the estimate s² remains the same. This allows us to show the effect of the different regressor values on the variance estimates. Using the inverse (X'X)⁻¹,

(X'X)⁻¹ = ( 23.9    -3.67   -4    )
          ( -3.67   0.58    0.62  )
          ( -4      0.62    0.68  ),

we now have: var(b0) = 191.2, var(b1) = 4.64 and var(b2) = 5.44. What a difference! The variance of the absolute term, for instance, rose more than 76 times! When analysing multicollinearity, we are mainly interested in detecting its approximate form. Several techniques exist for this purpose. A simple approach lies in constructing the sample correlation matrix, which contains the pairwise correlation coefficients for individual pairs of model regressors. If any of the matrix elements is greater than 0.8 in absolute value, it suggests the linear dependence is harmful. This approach is simple, but it has a major disadvantage: it measures the dependence among the regressors only by pairwise correlation. More complex linear relations among them are not taken into account. Therefore it is recommended that multiple correlation coefficients be calculated. Such a coefficient measures the strength of the linear dependence of a selected regressor on all the other regressors. If any of the coefficients is high, multicollinearity is present. It is a known fact that the multiple correlation coefficient is related to the coefficient of determination from the auxiliary regression, and so the coefficient of determination itself may serve as an indicator of multicollinearity; the square root of the coefficient of determination gives the multiple correlation coefficient. There are various rules of thumb, based on experience from empirical studies, which suggest how high the coefficient of determination must be for the multicollinearity to be harmful. These rules say that if the coefficient is higher than 0.8, one should be concerned about multicollinearity.
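The variance inflation of Example 3 can be reproduced by computing s²·diag((X'X)⁻¹) for both regressor designs, with s² = 8 held fixed as in the text. The `inverse` routine is a small Gauss-Jordan helper written for this sketch, not a library function.

```python
# Comparing variance estimates for the two designs of Example 3.

def inverse(A):
    """Invert a small matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for j in range(n):
        p = max(range(j, n), key=lambda r: abs(M[r][j]))
        M[j], M[p] = M[p], M[j]
        piv = M[j][j]
        M[j] = [v / piv for v in M[j]]
        for r in range(n):
            if r != j:
                f = M[r][j]
                M[r] = [a - f * b for a, b in zip(M[r], M[j])]
    return [row[n:] for row in M]

def design_variances(x1, x2, s2):
    """Estimated coefficient variances s^2 * [(X'X)^-1]_ii for a design."""
    X = [[1, a, b] for a, b in zip(x1, x2)]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    inv = inverse(XtX)
    return [s2 * inv[i][i] for i in range(3)]

# Design of Table 9 (weak correlation between x1 and x2)
v_weak = design_variances([0, 1, 0, 2, 1, 3, 2, 4],
                          [0, 1, 2, -1, -3, -4, 2, 2], 8)
# Design of Table 10 (correlation about -0.98)
v_strong = design_variances([5, 4, 3, 2, 0, -1, -2, -4],
                            [1, 2, 3, 5, 6, 7, 8, 9], 8)

print([round(v, 2) for v in v_weak])    # roughly [2.53, 0.58, 0.21]
print([round(v, 2) for v in v_strong])  # roughly [191.12, 4.63, 5.42]
```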


Sometimes, a less strict criterion is used, advising caution when the coefficient is higher than 0.9. Last but not least, the test of significance of Rj² is recommended, as well, i.e. the auxiliary regressions

Xj = α0 + Σ(i≠j) αiXi + ε,   j = 1, 2, …, k,   (2.1)

are considered and the F-tests of model significance are run for these regressions. For Model 2-1, the test criterion is

F = (Rj²/(k − 1)) / ((1 − Rj²)/(n − k)).

If any of the regressions is confirmed as significant, multicollinearity is present. Sometimes, it is also a good habit to compare Rj² with the coefficient of determination R² calculated for the original regression model describing the dependence of Y on the regressors. If any of the coefficients Rj² is greater than R², the multicollinearity is again considered harmful. Since the multiple correlation coefficient is closely related to Rj², it can also be used as an indicator of a too strong linear relation among the regressors.

Having detected too strong a multicollinearity, the question is how to solve the problem. There are several ways of lessening the problem, although none of them necessarily leads to a completely satisfactory result. One possibility is to expand the size of the data sample being processed. There are two problems with this approach: gathered data is usually a precious item, and a later expansion of the experiment may be unavailable, for financial reasons, for instance; also, this simple technique assumes that multicollinearity is milder in the corresponding wider population. If this is not true, one cannot expect any improvement in the character of the expanded data sample. Even a controlled data expansion, leading to a muted linear dependence, requires a careful setting of the additional values of the regressors by the analyst, and this may be an overly tricky thing to do.

Another possibility is to try to find other information about the model itself, not in the form of additional data. This information may take the form of an equality, an inequality or another mathematical constraint that should be incorporated into the model. Such information reduces the variances of the estimated coefficients, although it must be accurate or, more generally, not too inaccurate. If there is any amount of inaccuracy in the information, the estimated coefficients will have a lower variance, but they will no longer be unbiased, which complicates statistical inference. Yet another possibility is to exclude from the model the regressor which generates the unwanted linearity among the regressors. In the better case, such a regressor may actually have no effect on the modelled variable Y (see chapter 2.2.1). In the worse case, when it has a nonzero effect on Y, its exclusion may, as we shall learn, bring more harm than benefit.

Other special techniques for dealing with multicollinearity exist, as well. They include the so-called ridge regression and the technique of principal components. It must be stressed that these techniques are not always welcome, since they may reduce the variances of the estimated coefficients but render them biased, generally speaking. Biasedness is not a desired property because it complicates the statistical inference to be done with the resulting model. If the amount of bias were known, inference could still be performed, but with these techniques it depends on unknown parameters. The method of principal components reduces the number of variables used so that the loss of information incurred is minimal in a sense, but the technique is sensitive to the physical units used for the regressors, and the interpretation of the estimated coefficients is also complicated, since the resulting estimates are mixtures of the original estimates.

Selecting a reasonable approach to solving the problem with multicollinearity depends on the character of the situation, the experience of the analyst, and also on the purpose for which the model is to be utilized. The best way to avoid multicollinearity is prevention, of course, which means a suitable design of the matrix of regressors if it is possible.

2.2. NUMBER OF REGRESSORS

The conclusions based on regression analysis depend on which regressors were used in the model. An optimal scenario is one where the model works only with significant regressors, having put aside the insignificant ones. Of course, it is usually not known whether this is the case or not. However, an analysis can be done as to what happens to the properties of the estimated coefficients if the model contains one or more insignificant variables or if it, on the contrary, lacks one or more significant regressors. Let us assume that the correct form of the model is Y = Xβ + ε, whereas a model of the form Y = Xβ + Zγ + ε* is worked with. Here, Z represents a matrix of insignificant regressors. This is the situation when there are too many regressors in the model. In the same way as in 1-25, it is straightforward to show that the coefficients estimated by the least squares method will remain unbiased. There is, however, a problem with their variance (their efficiency). Only the greatest optimist can expect the additional insignificant regressors not to have any linear relation, at least to an extent, to the significant regressors enclosed in X. Usually there will be, at least approximately, a certain linear dependence between the values of the significant and the insignificant regressors. This will mean, as was seen in the case of multicollinearity, that the variances of the estimated coefficients will generally rise. In other words, the estimates will be less efficient. However, since unbiasedness is maintained, statistical inference can still be made (tests and intervals). An opposite situation arises when some important regressors are missing in the model. Mathematically speaking, we work with the model Y = Xβ + ε* instead of the true model Y = Xβ + Zγ + ε, where Z is the matrix of significant regressors omitted from the working model. In this case, the consequences are far worse. The least squares method yields biased and also inconsistent estimates of the regression coefficients. It is true that a lower variance of the estimates accompanies this procedure, but the bias inhibits statistical inference. The inference is also hampered by the bias of the estimate s² = RSS/(n − p) in this case.

2.3. MEASUREMENT ERRORS

Another type of problem that can be encountered in modelling has to do with measurement errors contained in the model variables. To keep the problem simple, we shall restrict the analysis to the case of the regression line. The character of the errors must be described mathematically so that their consequences become more obvious. What is usually assumed in this context is

Y* = Y + u,   u ~ N(0, σu²),
X* = X + v,   v ~ N(0, σv²).

Here, Y and X are variables not encumbered with measurement errors, whereas Y* and X* are the variables containing errors; u and v are normally distributed random variables with zero mean and their own specific variances. It is also assumed that u and v are uncorrelated, and the same is true about the pairs u, ε and v, ε. What are the consequences of this situation? Let us look separately at errors contained in Y and errors contained in X.

2.3.1. ERRORS IN Y

Let us assume the correct form of the model is Y = Xβ + ε, whereas we work with a model Y* = Xβ + ε*, ε* = ε + v. Then

b = (XᵀX)⁻¹XᵀY* = (XᵀX)⁻¹Xᵀ(Xβ + ε*) = β + (XᵀX)⁻¹Xᵀ(ε + v).

Thus E(b) = β + (XᵀX)⁻¹Xᵀ·E(ε + v) = β. Therefore, the vector of estimates remains unbiased. As far as the variance of the estimate is concerned, i.e. its covariance matrix, we get, similarly as in 1-26, var(b) = (σ² + σᵥ²)(XᵀX)⁻¹. Since σᵥ² > 0, the estimate is no longer efficient. We can sum up that when the explained variable is measured with errors, the consequences for the estimate of the unknown coefficients are the same as when the model contains too many regressors.
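This conclusion can be checked numerically with a small simulated sketch (hypothetical data; NumPy assumed): the measurement error in Y leaves the slope estimate unbiased and only inflates its variance.

```python
import numpy as np

# Simulation sketch (hypothetical data; NumPy assumed): measurement errors
# in Y leave the least squares slope unbiased but make it less efficient.
rng = np.random.default_rng(5)
n, reps = 50, 3000
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])

slopes_clean, slopes_noisy = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)   # Y without measurement error
    y_star = y + rng.normal(0.0, 2.0, n)          # Y* = Y + v, v ~ N(0, 4)
    slopes_clean.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    slopes_noisy.append(np.linalg.lstsq(X, y_star, rcond=None)[0][1])

print(np.mean(slopes_noisy))                        # stays close to 2.0
print(np.var(slopes_noisy) / np.var(slopes_clean))  # clearly above 1
```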

2.3.2. ERRORS IN X

In this case, a model of the form Y = X*β + ε*, ε* = ε − uβ, is analysed, which is formally the same as in 1.4.1. The expected value satisfies E(b) = β + E{(X*ᵀX*)⁻¹X*ᵀε*} = β + E{(X*ᵀX*)⁻¹X*ᵀ(ε − uβ)}. Here, the far-right term is not equal to a zero vector because the matrix X* contains the random component u, and so the vector of regressors and the random component ε* are not generally uncorrelated. Because of that, E{(X*ᵀX*)⁻¹X*ᵀ(ε − uβ)} ≠ 0.


The result is that the estimate of the unknown coefficients is generally biased, preventing statistical inference. Moreover, it can be shown that it is also generally inconsistent, so we cannot rely on large-sample properties of least-squares estimates either. The consequences of measurement errors contained in regressors are therefore far more serious than in the case of measurement errors contained in the dependent variable. The same results are obtained for multivariate regression where, however, the situation is even worse in the sense that it suffices to have a measurement error in only one regressor, and all the other estimated coefficients will be contaminated by the error, as well. As for measurement errors, the fundamental principle is prevention. This means that one should, for instance, draw on data that are officially published. It is not always possible to use this strategy, however. The technique of so-called instrumental variables might then be utilized as a method that solves problems with measurement errors. The technique provides biased estimates, but under fairly mild conditions the estimates are consistent, asymptotically unbiased and asymptotically normally distributed. Let us demonstrate the idea behind the technique in the case of the regression line Yᵢ = α + βxᵢ + εᵢ, i = 1, 2, …, n. The equation implies that Ȳ = α + βx̄ + ε̄, so that we can write

(Yᵢ − Ȳ) = β(xᵢ − x̄) + (εᵢ − ε̄),   i = 1, 2, …, n.   (2.2)

Equation 2-2 can be rewritten in the form Yᵢ* = βxᵢ* + εᵢ*, i = 1, 2, …, n, where the symbols with an asterisk denote the corresponding deviations from the sample means. Multiplying the equation by the i-th value of a variable Z*, which will be specified in a moment, and summing both sides of the equation over i = 1, 2, …, n, we have

∑ zᵢ*Yᵢ* = β ∑ zᵢ*xᵢ* + ∑ zᵢ*εᵢ*.

Let us note that if this equation were multiplied by 1/n, the left-hand side of the multiplied equation would be an estimate of the covariance between Z and Y, and covariances would also be estimated on the right-hand side of that equation. If Z, the so-called instrumental variable, is selected in such a way that it is uncorrelated with ε but correlated with x, this will be reflected in the sample, and the bigger the sample, the better. We may then, using the sample values of the variables, calculate the coefficient estimates as

b = ∑ zᵢ*Yᵢ* / ∑ zᵢ*xᵢ*,   a = Ȳ − b·x̄.

The procedure can also be generalized and formulated for the case of multivariate regression. It must be stressed, however, that it is not always straightforward to find variables with the aforementioned statistical properties, i.e. variables uncorrelated with the random component of the model but correlated with the regressors. As the number of regressors contaminated with measurement errors increases, the difficulty of finding instrumental variables increases, as well. Again, it is much more convenient to pay a lot of attention to prevention, when it comes to


measurement errors. Sometimes, however, the nature of a regressor is such that errors are unavoidable in its case.
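A minimal numerical sketch of the instrumental-variable estimator for the regression line (hypothetical data; NumPy assumed): the instrument z is constructed to be correlated with the true x but uncorrelated with both error terms, so the IV estimate removes the attenuation that the measurement error causes in the OLS slope.

```python
import numpy as np

# Instrumental-variable sketch for a regression line with a regressor
# measured with error (hypothetical data; NumPy assumed).
rng = np.random.default_rng(1)
n = 5000
x      = rng.normal(0.0, 2.0, n)          # true regressor (unobserved)
z      = x + rng.normal(0.0, 1.0, n)      # instrumental variable
x_star = x + rng.normal(0.0, 2.0, n)      # regressor observed with error
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

xd, yd, zd = x_star - x_star.mean(), y - y.mean(), z - z.mean()

b1_ols = (xd @ yd) / (xd @ xd)    # attenuated towards zero by the error
b1_iv  = (zd @ yd) / (zd @ xd)    # sum z*Y* / sum z*x*, as in the text
b0_iv  = y.mean() - b1_iv * x_star.mean()
print(b1_ols, b1_iv, b0_iv)       # roughly 1.0, 2.0 and 1.0
```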

2.4. HETEROSCEDASTICITY

In the previous chapters, we dealt with problems we can influence to a certain extent. To give an example, the problem of multicollinearity can be alleviated by selecting proper values of the regressors, provided that measurements of Y can be obtained for the modified values of the regressors. We can also affect the number of regressors, which, as was seen, may lead to some problems, as well, and the same is true, to an extent, about measurement errors. Starting with this chapter, our attention will be turned to problems we cannot affect because they are connected to the probability distributions that govern the behaviour of the random component ε. If this is the case, then we can either reconcile ourselves to this fact, or we can try to modify the regression procedures to lessen the extent of the problem. We shall start with the problem of heteroscedasticity. From now on, only the case of multivariate regression will be the subject of the discussion, so that the conclusions are general enough, regardless of how many regressors there are in the model.

2.4.1. FORMULATION OF THE MODEL AND ITS CONSEQUENCES

The classical multivariate regression model is altered when heteroscedasticity is present – it is assumed that var(εᵢ) = σᵢ², i = 1, 2, …, n. In other words, the random components do not have the same variance any more. We shall assume that all other classical regression conditions are still valid. The covariance matrix of the vector of random components is now of the form

var(ε) =
⎛ σ₁²   0     ⋯   0   ⎞
⎜ 0     σ₂²   ⋯   0   ⎟
⎜ ⋮     ⋮     ⋱   ⋮   ⎟
⎝ 0     0     …   σₙ² ⎠.   (2.3)

Expression 2-3 is usually written as σ²Ω, which means that the elements on the main diagonal of the matrix are rescaled by a common factor σ². It will become clear later why this form of the covariance matrix is preferred. Heteroscedasticity is typically encountered in cross-sectional data. These are data that usually describe relations among subjects at one point in time. The progress of the relations over time, if such a development exists at all, is of no primary interest in cross-sectional data analysis. On the contrary, in time series analyses, where the time development of one or more variables is observed in the first place, heteroscedasticity is not typically present. As in the case of multicollinearity, heteroscedasticity originates for many reasons. It may be the case that an important explanatory variable is missing in the model. Heteroscedasticity can also be caused by measurement errors. This happens when the errors accumulate with an increase in one of the explanatory variables; the accumulation increases the variability of the random component of the model. As shall be seen, nonconstant variances also appear when the analyst stops working with the original data and starts to work with their averaged values. This is, generally speaking, the problem of aggregated data. Many other reasons explaining the origin of heteroscedasticity could be presented.


Regarding the consequences of heteroscedasticity when the least squares method of parameter estimation is used, the expected value of the estimate is

E(b) = β + (XᵀX)⁻¹Xᵀ·E(ε) = β.   (2.4)

The estimate is still unbiased. This is obvious because the variability of the random components does not enter relation 2-4 in any way. As far as the covariance matrix of the estimate is concerned, the situation is different. Since var(Av) = A·var(v)·Aᵀ for any nonstochastic matrix A and any random vector v, we have

var(b) = var((XᵀX)⁻¹XᵀY) = (XᵀX)⁻¹Xᵀ·var(ε)·X(XᵀX)⁻¹ = σ²(XᵀX)⁻¹XᵀΩX(XᵀX)⁻¹.   (2.5)

If heteroscedasticity is not present, matrix 2-5 collapses to the matrix known from classical regression, since then Ω = I. It can hardly be expected that the expression on the right-hand side of 2-5 is generally equal to σ²(XᵀX)⁻¹, the term obtained in classical regression. Additionally, if we realize that the vector of estimates is efficient when the classical regression conditions are satisfied, it is not surprising that the estimate whose covariance matrix is 2-5 cannot be more efficient than the one obtained in classical regression. Truly, its efficiency is lower, and it is not, generally speaking, the best linear unbiased estimate any more. The question of how much lower its efficiency is cannot be answered in general, however, as it differs from case to case. The lower efficiency of b is not the only consequence of heteroscedasticity. Another consequence is that the estimate s² = eᵀe/(n − p) of σ² is biased, so any statistical inference based on the matrix s²(XᵀX)⁻¹ is inappropriate, let alone the fact that this matrix has a different form compared to 2-5. The t-test of an individual model coefficient is based on this matrix, and the F-test of model significance is also designed under the condition of homoscedasticity. Thus, the tests are invalid.

2.4.2. THE GENERALIZED LEAST SQUARES

As we have seen, heteroscedasticity causes some problems – the parameters estimated by the ordinary least squares cease to be precise enough, and similar consequences concern the subsequent statistical inference, if the inference can be done at all. The question is how to proceed when heteroscedasticity is present. One of the methods that try to remove, or at least diminish, the problem is the so-called generalized least squares method. The main idea of this procedure is to transform the data, or the model, so that it satisfies the condition of homoscedasticity. The method then applies the ordinary least squares to the transformed model. To demonstrate the technique, let us assume that a regular matrix P can be found so that PᵀP = Ω⁻¹. Multiplying the model Y = Xβ + ε by P from the left, we have

PY = PXβ + Pε.   (2.6)

Now, the transformation implies var(Pε) = P·var(ε)·Pᵀ = σ²PΩPᵀ,


= σ²P(PᵀP)⁻¹Pᵀ = σ²I. Thus, rewriting 2-6 to

Y* = X*β + ε*,   (2.7)

where Y* = PY, X* = PX, ε* = Pε, we can say that the random component ε* now satisfies all the conditions of classical linear regression. Therefore, the proper estimate of the unknown coefficients satisfies

b_G = (X*ᵀX*)⁻¹X*ᵀY* = (XᵀPᵀPX)⁻¹XᵀPᵀPY = (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹Y.   (2.8)

The estimate was obtained under the conditions of classical regression, and so it is the best linear unbiased estimate, or even the best unbiased estimate under normality. Its covariance matrix is

var(b_G) = σ²(X*ᵀX*)⁻¹ = σ²(XᵀPᵀPX)⁻¹ = σ²(XᵀΩ⁻¹X)⁻¹.   (2.9)

Let us use a simple example to demonstrate what the matrix P could look like. Assuming that the variance of the i-th random component of the model is σᵢ² = σ²xᵢₖ², i.e. it rises as the value of the k-th regressor increases, the covariance matrix of the vector of random components is

var(ε) =
⎛ σ²x₁ₖ²   0        ⋯   0       ⎞
⎜ 0        σ²x₂ₖ²   ⋯   0       ⎟
⎜ ⋮        ⋮        ⋱   ⋮       ⎟
⎝ 0        0        …   σ²xₙₖ²  ⎠
 = σ²
⎛ x₁ₖ²   0      ⋯   0     ⎞
⎜ 0      x₂ₖ²   ⋯   0     ⎟
⎜ ⋮      ⋮      ⋱   ⋮     ⎟
⎝ 0      0      …   xₙₖ²  ⎠ = σ²Ω.

Since PᵀP = Ω⁻¹, where

Ω⁻¹ =
⎛ 1/x₁ₖ²   0        ⋯   0        ⎞
⎜ 0        1/x₂ₖ²   ⋯   0        ⎟
⎜ ⋮        ⋮        ⋱   ⋮        ⎟
⎝ 0        0        …   1/xₙₖ²   ⎠,

we are required to find the matrix P so that PᵀP equals this inverse. It is easy to see that

P =
⎛ 1/x₁ₖ   0       ⋯   0       ⎞
⎜ 0       1/x₂ₖ   ⋯   0       ⎟
⎜ ⋮       ⋮       ⋱   ⋮       ⎟
⎝ 0       0       …   1/xₙₖ   ⎠.   (2.10)


Expression 2-6 further implies that to write the general form of the transformed regression model, i.e. to multiply equation 2-6 by matrix 2-10 from the left, means to perform the following division:

Yᵢ/xᵢₖ = β₀/xᵢₖ + β₁·xᵢ₁/xᵢₖ + β₂·xᵢ₂/xᵢₖ + ⋯ + βₖ + εᵢ/xᵢₖ,   i = 1, 2, …, n.

Strictly speaking, |xᵢₖ| should be the divisor, but the simplification we have made by skipping the absolute value does not matter here, since the simplified transformation also leads to random components which satisfy the conditions of classical regression, that is, the condition of homoscedasticity, among other conditions. This procedure can be used in a similar manner when group averages, or aggregated data, are used instead of original data. In this case, the original n observations are divided into G groups, and an average observation is calculated within each group. This way, the original data are represented by group averages. This data organization spurs heteroscedasticity, since the variance of the random component of the model shifts as the size of the j-th group changes. This is the problem of aggregated data outlined earlier. Let us describe this situation formally. Instead of the original classical regression model Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ⋯ + βₖxᵢₖ + εᵢ, i = 1, 2, …, n, we work with a model

Ȳⱼ = β₀ + β₁x̄ⱼ₁ + β₂x̄ⱼ₂ + ⋯ + βₖx̄ⱼₖ + ε̄ⱼ,   (2.11)

where Ȳⱼ = (1/nⱼ)∑Yᵢ, x̄ⱼₘ = (1/nⱼ)∑xᵢₘ, ε̄ⱼ = (1/nⱼ)∑εᵢ,

for j = 1, 2, …, G, m = 1, 2, …, k, the sums running over the nⱼ data in the j-th group. There are G groups now. Since

var(ε̄ⱼ) = (1/nⱼ²)·∑var(εᵢ) = σ²/nⱼ,

the variance of ε̄ⱼ depends on the size of the j-th group. Thus, heteroscedasticity is present. A better way to estimate the parameters of 2-11 is therefore the technique of generalized least squares, or the corresponding transformation of 2-11 to 2-7, this being done by multiplying the j-th equation of 2-11 by √nⱼ. This is because the covariance matrix Ω is in this case of the form

Ω =
⎛ 1/n₁   0      ⋯   0     ⎞
⎜ 0      1/n₂   ⋯   0     ⎟
⎜ ⋮      ⋮      ⋱   ⋮     ⎟
⎝ 0      0      …   1/n_G ⎠,

which implies that


P =
⎛ √n₁   0     ⋯   0    ⎞
⎜ 0     √n₂   ⋯   0    ⎟
⎜ ⋮     ⋮     ⋱   ⋮    ⎟
⎝ 0     0     …   √n_G ⎠.

We would like to make one final note concerning the group averages. To find the estimate of 2-11 means, as has just been presented, to find the parameters that minimize the expression ∑ nⱼ(Ȳⱼ − β₀ − β₁x̄ⱼ₁ − ⋯ − βₖx̄ⱼₖ)². The element nⱼ may now be regarded as a weight, which is why the procedure is also labelled the method of weighted least squares. We know how heteroscedasticity can come into existence, what effect it has on the least squares estimates and how it can be removed, at least theoretically. What remains is finding out how the problem can be detected. There are statistical tests for these purposes, and one of the most fundamental tests is the Goldfeld-Quandt test.
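The weighted least squares idea can be sketched for a regression line with var(εᵢ) proportional to xᵢ², the simplest case discussed in this chapter (hypothetical data; NumPy assumed): each equation is divided by xᵢ, and the ordinary least squares is run on the transformed values.

```python
import numpy as np

# Weighted least squares sketch on hypothetical data (NumPy assumed):
# var(eps_i) = sigma^2 * x_i^2, so each equation of the model
# Y_i = b0 + b1*x_i + eps_i is divided by x_i.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1.0, 10.0, n)
y = 3.0 + 0.5 * x + x * rng.normal(0.0, 1.0, n)   # heteroscedastic errors

# ordinary least squares on the raw data (unbiased, but not efficient)
b_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)

# transformed model: Y/x = b0*(1/x) + b1*1 + eps/x  (homoscedastic)
Xw = np.column_stack([1.0 / x, np.ones(n)])
b0_wls, b1_wls = np.linalg.lstsq(Xw, y / x, rcond=None)[0]
print(b_ols, (b0_wls, b1_wls))   # both estimates are near (3.0, 0.5)
```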

2.4.3. GOLDFELD-QUANDT TEST

Let the model of interest be Y = Xβ + ε with p unknown parameters and n observations. It is assumed that the potential heteroscedasticity is of the form σᵢ² = σ²xᵢₖ². The idea of the test lies in dividing the n observations into two groups and running a separate least-squares regression in each group. A middle part of the original observations is left out and not used in the regressions. Using the two estimated regression models, residuals are calculated separately for each group, as well as the corresponding two residual sums of squares. A test criterion based on the sums is then constructed. There is often a discussion on how to create the two groups, because the conclusion of the test depends on the way the original data are divided. A general rule is to select the first and second group so that not more than one third of all the data are left out. The recommendation for the data division is such that the values of xₖ are sorted in ascending order, and if there are 30 observations altogether, the first regression is applied to the first 11 observations of y corresponding to the 11 lowest sorted values of xₖ, whereas the second regression is run using the 11 values of y related to the 11 highest values of xₖ. Generally speaking, for n observations, it is recommended to skip the middle 8n/30 values. The number must be adjusted slightly if it is not an integer (see the example below). The test examines the validity of the null hypothesis H₀: the model satisfies the condition of homoscedasticity, against the alternative hypothesis H₁: σᵢ² = σ²xᵢₖ². We would like to draw attention to the formulation of the alternative hypothesis. If the form of heteroscedasticity is more complex, the test should not be used, and a different technique is


required to detect the problem in the model. The test also assumes multivariate normality of the vector of random components of the model. The test criterion is of the form

T = S(1)/S(2),

where S(1) and S(2) are the residual sums of squares calculated separately for each of the two groups, the numerator of T being the bigger of the two sums. The criterion is compared with the critical value of the Fisher distribution F_{M1−p, M2−p}(α) for the level of significance α, where p is the number of parameters in the model, M1 is the number of observations in the group corresponding to S(1) and M2 is the number of observations in the group corresponding to S(2). If T ≥ F_{M1−p, M2−p}(α), the null hypothesis is rejected. If the opposite inequality holds, the null hypothesis is accepted.

EXAMPLE 4

Table 11 represents the result of a poll run among 17 companies, which differ from one another in the time of their existence (variable x₁, in years) and in their annual net profit (variable x₂, in millions of crowns). The dependence of the companies' average monthly investments in information technology (variable y, in thousands of crowns) on the two regressors is examined. We shall perform the Goldfeld-Quandt test with respect to the variable x₁, and if the test turns out to be significant, we will use the generalized least squares method to adjust the model.

Table 11: Data of Example 4

y  | 10  12  15  14  17  24  35  22  37  45  50  57  60  70  80  92  85
x1 |  2   2   2   2   2   5   5   5   5   8   8   8   8  10  10  10  10
x2 |  5   5  10  10  10  15  15  20  20  40  40  15  50 100 100 500 500

Source: own

Solution: There are 17 observations, so we shall leave out the middle (8/30)·17 ≈ 5 values. Thus, the first group will contain 6 observations, and the same number of observations will be in the second group. We shall now comment on the results contained in the tables that follow. The two tables below are related to the test applied with respect to x₁. The first six observations, utilized for the first regression, are in Table 12. In this case, the regression leads to a model ŷ = 3.77 + 1.44x₁ + 0.86x₂. The last six observations, used for the second regression, are in Table 13. In the latter case, the regression results in a model ŷ = 0.72 + 7.08x₁ + 0.034x₂. The last columns of the tables contain the residuals of the related models.


The test criterion is

T = S(1)/S(2) = 76.14/6.67 = 11.42.

The significance level of the test is set at five per cent, which means that the critical value equals F₃,₃(0.05) = 9.276. Since the criterion exceeds the critical value, the hypothesis of homoscedasticity is rejected.

Table 12: 1st group

y    x1   x2    e
10    2    5   -1
12    2    5    1
15    2   10   -0.3333
14    2   10   -1.3333
17    2   10    1.6666
24    5   15    0

Table 13: 2nd group

y    x1   x2    e
57    8   15   -0.905
60    8   50    0.9059
70   10  100   -4.960
80   10  100    5.0396
92   10  500    3.4603
85   10  500   -3.539
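The computations of Example 4 can be reproduced with a short numerical sketch (NumPy assumed):

```python
import numpy as np

# Data of Table 11; the observations are already sorted by x1, the middle
# (8/30)*17 ~ 5 observations are left out, and 6 remain in each group.
y  = np.array([10, 12, 15, 14, 17, 24, 35, 22, 37, 45, 50, 57, 60, 70, 80, 92, 85], float)
x1 = np.array([ 2,  2,  2,  2,  2,  5,  5,  5,  5,  8,  8,  8,  8, 10, 10, 10, 10], float)
x2 = np.array([ 5,  5, 10, 10, 10, 15, 15, 20, 20, 40, 40, 15, 50, 100, 100, 500, 500], float)
X  = np.column_stack([np.ones_like(y), x1, x2])

def group_rss(Xg, yg):
    # residual sum of squares of a separate OLS regression in one group
    b, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
    e = yg - Xg @ b
    return e @ e

rss_low  = group_rss(X[:6],  y[:6])    # group with the lowest values of x1
rss_high = group_rss(X[-6:], y[-6:])   # group with the highest values of x1
T = max(rss_low, rss_high) / min(rss_low, rss_high)
print(rss_low, rss_high, T)   # about 6.67, 76.15 and 11.42, as in the text
```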

Similarly, the test could be applied to the model with respect to the second regressor. The test has shown that the first regressor causes heteroscedasticity of the form σᵢ² = σ²xᵢ₁². Therefore, we shall improve the estimates of the coefficients in the original model by using the generalized least squares method. This means that each variable of the model will be multiplied by 1/xᵢ₁, and the ordinary least squares will be applied to these transformed values. The transformed data are in Table 14.

Table 14: Transformed data

y  | 5    6    7.5  7    8.5  4.8  7    4.4  7.4  5.6  6.3  7.1  7.5  7    8    9.2  8.5
x1 | 1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
x2 | 2.5  2.5  5    5    5    3    3    4    4    5    5    1.9  6.3  10   10   50   50

Source: own

Instead of the model Y = β₀ + β₁x₁ + β₂x₂ + ε, the model Y* = β₀x₀* + β₁ + β₂x₂* + ε* is used, where Y* = Y/x₁, x₂* = x₂/x₁ and x₀* = 1/x₁. The ordinary least squares method, applied to the transformed model, gives the estimate (6.13, 0.056, 0.664)ᵀ, whose components correspond to β₁, β₂ and β₀, respectively. The covariance matrix is in this case

var(b_G) =
⎛  0.291   −0.006   −0.706 ⎞
⎜ −0.006    0.0003   0.01  ⎟
⎝ −0.706    0.01     2.439 ⎠,


where the first element on the diagonal is related to the parameter β₁, the second element on the diagonal is related to β₂, and the last element, lying in the 3rd row and 3rd column of the matrix, concerns the parameter β₀. If the ordinary least squares were applied to the original, untransformed model, the result would have been b = (b₀, b₁, b₂)ᵀ = (−2.18, 6.813, 0.046)ᵀ with the covariance matrix (see chapter 2.4.1.)

var(b) = σ²(XᵀX)⁻¹XᵀΩX(XᵀX)⁻¹ =
⎛  5.52   −1.41     0.02   ⎞
⎜ −1.41    0.45    −0.0078 ⎟
⎝  0.02   −0.0078   0.0003 ⎠.

Comparing the results, it can be seen that the transformation led to an approximately twice as low variance for the first two coefficients (according to the estimated variances, at least). The variances are compared in Table 15.

Table 15: Comparison of the variances for the least squares and generalized least squares

      Least sq.   Generalized least sq.
b0    5.52        2.439
b1    0.45        0.291
b2    0.0003      0.0003

We shall finish this chapter by noting that the relation σᵢ² = σ²xᵢₖ² represents only the simplest model that describes heteroscedasticity. Other, more complex models exist, as well. For instance, a statistical test can show that a regression of the form σᵢ = α₀ + α₁xᵢₖ is more suitable for the description of σᵢ. Such a relation will not be detected by the Goldfeld-Quandt test. In this model, σᵢ is a random variable and is often expressed as a linear function of the explanatory variables appearing in the original regression, i.e. σᵢ = α₀ + α₁xᵢ₁ + ⋯ + αₖxᵢₖ, or σᵢ² = (α₀ + α₁xᵢ₁ + ⋯ + αₖxᵢₖ)². To apply the generalized least squares method in this case means to estimate the parameters in the model for σᵢ in the first place, so that the matrix P can be constructed. Following this procedure, however, leads to the estimate

b_G = (XᵀΩ̂⁻¹X)⁻¹XᵀΩ̂⁻¹Y,   (2.12)

where the symbol Ω̂ represents the estimate of Ω and contains the estimates σ̂ᵢ. The vector b_G has different statistical properties than the estimate (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹Y that should have been used. For small data samples, the properties of 2-12 are unknown, but its asymptotic properties may help. For the definition of asymptotic properties, see the first chapter of this text. If the estimates σ̂ᵢ are consistent, the generalized least squares method calculated according to 2-12 gives an estimate which has the same asymptotic properties as the estimate (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹Y. These are the properties valid in the case of classical regression. This is the reason why the first attempts are usually aimed at finding consistent estimates σ̂ᵢ, using various models for σᵢ, when a more complicated form of heteroscedasticity occurs [4].


2.5. AUTOCORRELATION

Another problem that deserves extra attention is autocorrelation. Whereas heteroscedasticity typically occurs in cross-sectional data, and not in time series data, the opposite is true in the case of autocorrelated random components. Time series are the domain of autocorrelation, cross-sectional data less frequently so. We shall therefore replace the subscript i used in regression models with the subscript t, which denotes a point in time. The general regression model takes the form Yₜ = β₀ + β₁xₜ₁ + β₂xₜ₂ + ⋯ + βₖxₜₖ + εₜ, t = 1, 2, …, T. The subject matter of this chapter will be presented similarly as in the previous chapter. Firstly, we shall be concerned with the general formulation of the model and the consequences of autocorrelation. Later, the removal of problems caused by autocorrelation will be discussed, and the final parts of the chapter will deal with the detection of autocorrelation.

2.5.1. FORMULATION OF THE MODEL AND CONSEQUENCES

When autocorrelation is present, it is assumed that all the conditions of classical regression hold, save for the assumption of zero covariance between the random components, which now satisfy the condition

cov(εᵢ, εⱼ) = σᵢⱼ ≠ 0,   i, j = 1, 2, …, T, i ≠ j.

The condition of homoscedasticity, var(εₜ) = σₜ² = σ² > 0, is still satisfied. Under these conditions, the covariance matrix of the vector of random components of the model can be written as

var(ε) =
⎛ σ²    σ₁₂   …   σ₁T ⎞
⎜ σ₂₁   σ²    …   σ₂T ⎟
⎜ …     …     …   …   ⎟
⎝ σT₁   σT₂   …   σ²  ⎠

or

var(ε) = σ²
⎛ 1     ρ₁₂   …   ρ₁T ⎞
⎜ ρ₂₁   1     …   ρ₂T ⎟
⎜ …     …     …   …   ⎟
⎝ ρT₁   ρT₂   …   1   ⎠ = σ²Ω,   (2.13)

where ρᵢⱼ denotes the serial correlation between the i-th and j-th random components. Autocorrelation originates for many reasons. It typically results from the natural association of the modelled economic variables. It is a distinctive feature that the past values of these variables affect, to an extent, their future values, as well. The household consumption of electronics may serve as an example – it is logical that the more households supplied themselves with electronic devices in the past, the less they will buy them in the near future. Another reason behind autocorrelation is the situation when an important explanatory variable has not been included in the regression model, i.e. the model is incorrectly specified. Also, autocorrelation is a typical feature of dynamic models, which contain lagged variables. Autoregressive models are such an example – they contain not only explanatory variables, but also lagged explained variables. Last


but not least, autocorrelation may also result from a poor linear approximation of a nonlinear function. The roughness of the approximation can install a correlated structure within the model. As for the consequences of autocorrelation, their general formulation is more complex than in the case of heteroscedasticity because of the great variety of forms of autocorrelation. For instance, only random components that are placed next to one another on the time axis may be correlated, whereas all other components farther away from each other in time may not. The variety of forms of autocorrelation is suggested, after all, by matrix 2-13.

Nonetheless, only a few of the many models that can be used to describe the behaviour of random components are actually exploited in practice. This chapter will focus on one such model, the so-called autoregressive model of the first order, or AR(1). This model is used most often because it is simple and works surprisingly well in many situations. The model assumes that the random components satisfy

εₜ = ρεₜ₋₁ + νₜ,

where νₜ is another random component which meets all the conditions of classical regression, together with a new condition that it is uncorrelated with εₜ₋₁. The component νₜ is also called white noise. This model implies that the correlation between εₜ and εₜ₋₁ equals ρ, regardless of t, the correlation between εₜ and εₜ₋₂ is equal to ρ², regardless of t, and so on. Generally,

corr(εₜ, εₜ₋ₛ) = ρˢ,   s = 0, 1, 2, …,

regardless of t. Let us return now to the consequences of autocorrelation that would have to be confronted if the ordinary least squares method were used to estimate the unknown parameters of the regression model. First of all, when looking at 2-4, it can be seen that the conclusions are the same as in the case of heteroscedasticity, as the derivation of the conclusions would again be based on working with the matrix Ω, regardless of what its elements look like. This means that even if autocorrelation is present, the estimated regression coefficients remain unbiased. When it comes to their variances, let us first rewrite the relation εₜ = ρεₜ₋₁ + νₜ

to εₜ = ρεₜ₋₁ + νₜ = ρ(ρεₜ₋₂ + νₜ₋₁) + νₜ = ⋯ = ρᵗε₀ + ∑ ρʲνₜ₋ⱼ,   the sum running over j = 0, 1, …, t − 1.   (2.14)

If |ρ| < 1, the first term on the right-hand side of 2-14 converges, in a certain well-defined sense [5], to zero, which implies that 2-14 can be rewritten as εₜ = ∑ ρʲνₜ₋ⱼ, with the sum over j = 0, 1, 2, ….

Further, since the operation of “summing” and the operation of “calculating the variance” can be interchanged here, the last equation and the uncorrelated nature of the νₜ's result in

var(εₜ) = σᵥ²/(1 − ρ²).


This means that the vector of coefficients estimated by the ordinary least squares has the variance

var(b) = σ²(XᵀX)⁻¹XᵀΩX(XᵀX)⁻¹,   σ² = σᵥ²/(1 − ρ²),

where σᵥ² is the variance of the white noise. This has a clear consequence: if there is an AR(1) autocorrelation in the model, the least squares estimates cease to be efficient. Similar results are enforced by other forms of autocorrelation, and so it can generally be said that autocorrelation reduces the precision of the coefficients estimated by the ordinary least squares method. Another problem is the expression eᵀe/(T − p), which is used to estimate the unknown variance σ². This estimate is not unbiased any more, which means, among other things, that the statistical inference based on the standard F-test and t-tests is not ideal. The consequences of autocorrelation are therefore analogous to those provoked by heteroscedasticity.

2.5.2. AUTOCORRELATION AND THE GENERALIZED LEAST SQUARES METHOD

Since the consequences of autocorrelation are similar to those of heteroscedasticity, there is no reason to alter the approach to solving the problem of autocorrelation, and the procedure used in the case of heteroscedasticity can be applied, as well. The generalized least squares method leads to the same conclusions under autocorrelation, the only difference being the form of the transformation matrix P. Since we are working with the AR(1) model and the matrix Ω satisfies 2-13, we can replace the element ρᵢⱼ of the matrix with ρ^|i−j|, getting

Ω =
⎛ 1         ρ         ρ²        …   ρ^(T−1) ⎞
⎜ ρ         1         ρ         …   ρ^(T−2) ⎟
⎜ ρ²        ρ         1         …   ρ^(T−3) ⎟
⎜ …         …         …         …   …       ⎟
⎝ ρ^(T−1)   ρ^(T−2)   ρ^(T−3)   …   1       ⎠.   (2.15)

The inverse of matrix 2-15 is

Ω⁻¹ = 1/(1 − ρ²) ·
⎛ 1     −ρ      0      …    0      0  ⎞
⎜ −ρ    1+ρ²   −ρ      …    0      0  ⎟
⎜ 0     −ρ     1+ρ²    …    0      0  ⎟
⎜ …     …      …       …    …      …  ⎟
⎜ 0     0      0       …    1+ρ²   −ρ ⎟
⎝ 0     0      0       …    −ρ     1  ⎠   (2.16)

and the matrix P for which PᵀP = (1 − ρ²)Ω⁻¹ (the constant factor does not change the resulting generalized least squares estimate) is

P =
⎛ √(1−ρ²)   0     0    …    0    0 ⎞
⎜ −ρ        1     0    …    0    0 ⎟
⎜ 0         −ρ    1    …    0    0 ⎟
⎜ …         …     …    …    …    … ⎟
⎝ 0         0     0    …    −ρ   1 ⎠.   (2.17)

As in the case of heteroscedasticity, we may again ask what this matrix represents when it is used to transform the original model. Multiplying the model Y = Xβ + ε by P from the left, we get the set of equations that we would obtain if we multiplied the first equation of the set by the term √(1 − ρ²), i.e.

√(1 − ρ²)·Y₁ = β₀√(1 − ρ²) + β₁√(1 − ρ²)·x₁₁ + ⋯ + βₖ√(1 − ρ²)·x₁ₖ + √(1 − ρ²)·ε₁,   (2.18)

and the other equations (t = 2, …, T) were transformed into the equations

Yₜ − ρYₜ₋₁ = β₀(1 − ρ) + β₁(xₜ₁ − ρx_{t−1,1}) + ⋯ + βₖ(xₜₖ − ρx_{t−1,k}) + (εₜ − ρεₜ₋₁),   t = 2, …, T.   (2.19)

Formula 2-19 suggests that the equation describing the relation in the previous time period be multiplied by ρ and subtracted from the equation related to the current time t. Sometimes, only transformation 2-19 is used, without applying 2-18 to the first equation of the model. The estimate obtained by applying the ordinary least squares method to the model transformed by 2-19 is called the Cochrane-Orcutt estimate. A more efficient estimate, however, is the one obtained by transforming the first equation, as well. If all the data are used, the transformations restore the case of classical regression, and so the resulting estimate has all the desired statistical properties.

In order to apply the generalized least squares method given by 2-18 and 2-19, a suitable estimate of the unknown correlation coefficient ρ must be available. Such an estimate is

$$
r=\frac{\sum_{t=2}^{T}e_te_{t-1}}{\sum_{t=2}^{T}e_t^{2}},\tag{2.20}
$$

where the $e_t$'s are the residuals obtained by the ordinary least squares method applied to the original model. The estimate r is a consistent estimate of ρ.
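Putting 2-19 and 2-20 together gives the Cochrane-Orcutt procedure. A minimal sketch on simulated data (the data-generating model below is an assumption, used only to exercise the formulas):

```python
import numpy as np

rng = np.random.default_rng(1)

# assumed data-generating model: y_t = 2 + 0.5 t + eps_t, eps AR(1) with rho = 0.7
T, rho = 200, 0.7
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + rng.normal(scale=0.5)
time = np.arange(1, T + 1)
y = 2 + 0.5 * time + eps

# step 1: ordinary least squares on the original model
X = np.column_stack([np.ones(T), time])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ols

# step 2: estimate of the correlation coefficient, formula 2-20
r = np.sum(e[1:] * e[:-1]) / np.sum(e[1:] ** 2)

# step 3: Cochrane-Orcutt transformation 2-19 (the first observation is dropped)
y_star = y[1:] - r * y[:-1]
X_star = X[1:] - r * X[:-1]
b_co = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print(r, b_co)  # r near 0.7, b_co near (2, 0.5)
```

Note that the transformed intercept column contains the constant 1 − r, so the ordinary least squares applied to the transformed data return the original coefficients directly.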

2.5.3. DURBIN-WATSON TEST

To verify whether an AR(1) model can be assumed for the relation between the random components of a regression model, the Durbin-Watson test can be applied. The test scrutinizes the validity of the null hypothesis that there is no autocorrelation in the model. The alternative hypothesis assumes there is an AR(1) autocorrelation in the model. The test is carried out in several steps. First of all, the least squares estimates of the unknown coefficients of the original model are found, and the estimates are used to calculate the residuals of the model. Secondly, the test criterion


$$
d=\frac{\sum_{t=2}^{T}(e_t-e_{t-1})^2}{\sum_{t=2}^{T}e_t^2}\tag{2.21}
$$

is obtained. Special statistical tables with critical values (see the end of this text) are then utilized to evaluate the criterion. The tables list the critical values for a given number of observations T, significance level α and number of model parameters k, excluding the absolute term. Two values are found in the tables – a lower limit $d_L$ and an upper limit $d_U$. If the sample correlation r is positive, the null hypothesis is accepted if the test criterion d is greater than $d_U$; it is rejected if d is smaller than $d_L$. If the correlation is negative, an alternative criterion $d^*=4-d$ is calculated and evaluated the same way with the same critical values $d_L$ and $d_U$. If the relevant test criterion is greater than $d_L$ and smaller than $d_U$, the test is inconclusive. Nevertheless, if this is the case, it is recommended as a precaution to assume that autocorrelation is present in the model, because autocorrelation is highly probable in the time series under scrutiny.
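Criterion 2-21 – and the modified criterion 2-22 discussed below for models with a lagged dependent variable – can be computed directly from the residuals; a small sketch with assumed residual values:

```python
import numpy as np

def durbin_watson(e):
    """Test criterion 2-21 computed from a residual series."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e[1:] ** 2)

def durbin_h(d, T, var_b):
    """Modified criterion 2-22; defined only when var(b) < 1/T."""
    if T * var_b >= 1:
        raise ValueError("h test not applicable: var(b) >= 1/T")
    return (1 - 0.5 * d) * np.sqrt(T / (1 - T * var_b))

# strongly positively autocorrelated residuals give d close to 0;
# for independent residuals d is close to 2 (assumed illustrative series)
e_pos = np.array([1.0, 1.1, 1.2, 1.1, 1.0, 0.9, 0.8, 0.9, 1.0, 1.1])
print(durbin_watson(e_pos))   # far below 2
print(durbin_h(d=1.21, T=17, var_b=0.01))
```

The numeric arguments of `durbin_h` here are only illustrative; in practice d comes from 2-21 and var(b) from the estimated covariance matrix of the coefficients.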

The test is suitable for the detection of an AR(1) autocorrelation, and it is a powerful test. It can also be used for the detection of autocorrelations of the form $\varepsilon_t=\rho\varepsilon_{t-p}+u_t$, $p>1$. The testing procedure is the same, with the exception that $e_{t-p}$ appears in the numerator of 2-21 instead of $e_{t-1}$.

The Durbin-Watson test is subject to certain requirements. One of the conditions requires that there be no lagged dependent variable among the regressors of the original model. If the condition is not met, the so-called modified Durbin-Watson test (Durbin’s h test) must be applied instead. The null and alternative hypotheses are the same; the test criterion is of the form

$$
h=\left(1-0.5\,d\right)\sqrt{\frac{T}{1-T\,\widehat{\operatorname{var}}(b)}},\tag{2.22}
$$

where d is the test criterion 2-21 and $\widehat{\operatorname{var}}(b)$ is the variance estimate for the coefficient of the lagged dependent variable. Test criterion 2-22 has approximately the standard normal distribution, which means that if its absolute value exceeds the corresponding quantile, or critical value, of N(0,1), the null hypothesis is rejected. The disadvantage of the test is that it cannot be used if $\widehat{\operatorname{var}}(b)\ge 1/T$, since the expression under the square root is then not defined. As in the case of heteroscedasticity, different and more complicated forms of autocorrelation also exist. Different forms then change the transformation matrix P, the properties of the generalized least squares estimates are proved differently as well, and the detection of these autocorrelations requires other testing procedures. Example 5 shows the mechanism of working with the simple form of autocorrelation.

EXAMPLE 5

The following Table 16 contains fictitious data on monthly household expenses on food in the Moravian region (in millions of Czech crowns). The data reflect the period from January 2000 (t = 1) to March 2001 (t = 15). In this example, $x_t=t$.


Table 16: Entry data for Example 5

t    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
yt   141  145  142  147  146  154  150  158  157  165  164  170  167  174  175
xt   1    2    3    4    5    6    7    8    9    10   11   12   13   14   15

Source: own

Given the character of the time series (see Fig. 1), a linear regression model with a single explanatory variable $x_t=t$ shall be used to describe the data. The parameters of the model will be estimated by the ordinary least squares method, and the model will then be tested for the presence of autocorrelation with the Durbin-Watson test. If it turns out the model is burdened with a significant autocorrelation, its coefficients will be re-estimated with the generalized least squares method. It is assumed that autocorrelation might be the only problem in the model.

Figure 1: Time series yt

The ordinary least squares method leads to the estimates

$$
\mathbf b=\begin{pmatrix}b_0\\b_1\end{pmatrix}=(X'X)^{-1}X'\mathbf y=\begin{pmatrix}0.295&-0.028\\-0.028&0.0036\end{pmatrix}\begin{pmatrix}2355\\19552\end{pmatrix}=\begin{pmatrix}136.66\\2.543\end{pmatrix}.
$$

Inserting the estimates back in the model, we get the fitted values $\hat y_t=b_0+b_1t$ for the various values of t, and also the residuals $e_t=y_t-\hat y_t$ and their lagged values $e_{t-1}$. All the calculations are in Table 17, together with the other quantities used in the test of autocorrelation.

Table 17: Residuals and other calculations

t     e(t)     e(t−1)    (e(t)−e(t−1))²   e(t)²     e(t)·e(t−1)
1     1.80     –         –                3.24      –
2     3.25     1.80      2.1025           10.5625   5.85
3     −2.28    3.25      30.5809          5.1984    −7.41
4     0.17     −2.28     6.0025           0.0289    −0.3876
5     −3.37    0.17      12.5316          11.3569   −0.5729
6     2.08     −3.37     29.7025          4.3264    −7.0096
7     −4.45    2.08      42.6409          19.8025   −9.256
8     1.00     −4.45     29.7025          1.0000    −4.45
9     −2.54    1.00      12.5316          6.4516    −2.54
10    2.91     −2.54     29.7025          8.4681    −7.3914
11    −0.62    2.91      12.4609          0.3844    −1.8042
12    2.82     −0.62     11.8336          7.9524    −1.7484
13    −2.71    2.82      30.5809          7.3441    −7.6422
14    1.74     −2.71     19.8025          3.0276    −4.7154
15    0.20     1.74      2.3716           0.0400    0.348

Sum (t = 2, …, 15):      272.547          85.9438   −48.7297

The Durbin-Watson test criterion is d = 272.547/85.944 ≈ 3.17. The estimated correlation between the random components of the model is, according to 2-20, r = −48.73/85.944 ≈ −0.56. The Durbin-Watson table of critical values shows that, given T = 15, the number of regressors without the absolute term k = 1 and the significance level α = 0.05, the lower limit is $d_L=1.077$ and the upper limit $d_U=1.361$. Since the correlation is negative, the alternative test criterion is calculated, 4 − d ≈ 0.83, and compared with the limits. The alternative criterion is smaller than the lower limit, suggesting that there is an AR(1) autocorrelation in the model. Thus, the model will be re-estimated after transforming its terms according to 2-18 and 2-19: the first observation is multiplied by $\sqrt{1-r^2}\approx 0.83$, and each of the other observations is reduced by the r-multiple of the corresponding observation from the preceding time period. Table 18 depicts the transformed data.

Table 18: Transformed data

t     yt*      absolute term*   t*
1     117.03   0.83             0.83
2     222.55   1.55             2.55
3     221.75   1.55             4.1
4     225.1    1.55             5.65
5     226.85   1.55             7.2
6     234.3    1.55             8.75
7     234.7    1.55             10.3
8     240.5    1.55             11.85
9     243.9    1.55             13.4
10    251.35   1.55             14.95
11    254.75   1.55             16.5
12    260.2    1.55             18.05
13    260.5    1.55             19.6
14    265.85   1.55             21.15
15    270.7    1.55             22.7

Now, applying the ordinary least squares method to the transformed data, we get a new estimate


$$
\mathbf b^*=(X^{*\prime}X^{*})^{-1}X^{*\prime}\mathbf y^{*}=\begin{pmatrix}0.139&-0.0137\\-0.0137&0.0017\end{pmatrix}\begin{pmatrix}5387.285\\44595.75\end{pmatrix}=\begin{pmatrix}136.46\\2.56\end{pmatrix}.
$$

The residuals are now as described by Table 19.

Table 19: Residuals from the transformed model

t     e(t)     e(t)²    e(t−1)    (e(t)−e(t−1))²   e(t)·e(t−1)
1     1.64     2.69     –         –                –
2     4.50     20.25    1.64      8.18             7.38
3     −0.26    0.07     4.50      22.66            −1.17
4     −0.88    0.77     −0.26     0.38             0.23
5     −3.10    9.61     −0.88     4.93             2.73
6     0.37     0.14     −3.10     12.04            −1.15
7     −3.19    10.18    0.37      12.67            −1.18
8     −1.36    1.85     −3.19     3.35             4.34
9     −1.93    3.72     −1.36     0.32             2.62
10    1.54     2.37     −1.93     12.04            −2.97
11    0.97     0.94     1.54      0.32             1.49
12    2.45     6.00     0.97      2.19             2.38
13    −1.21    1.46     2.45      13.40            −2.96
14    0.17     0.03     −1.21     1.90             −0.21
15    1.05     1.10     0.17      0.77             0.18

For the new model, d = 95.17/61.19 = 1.55 and r = 11.71/61.19 = 0.19. Since the correlation is positive, we work with the statistic d = 1.55. The table at the end of this text now provides the lower limit 0.946 and the upper limit 1.54. The statistic d is higher than the upper limit, suggesting that we have managed to reduce the autocorrelation to a bearable level. This fact is also reflected by the low sample correlation.
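The whole of Example 5 can be replayed in a few lines; the sketch below reproduces the computations (small deviations from the rounded table values are to be expected):

```python
import numpy as np

y = np.array([141, 145, 142, 147, 146, 154, 150, 158, 157,
              165, 164, 170, 167, 174, 175], dtype=float)
t = np.arange(1, 16, dtype=float)

def dw_and_r(e):
    d = np.sum(np.diff(e) ** 2) / np.sum(e[1:] ** 2)   # criterion 2-21
    r = np.sum(e[1:] * e[:-1]) / np.sum(e[1:] ** 2)    # estimate 2-20
    return d, r

# OLS on the original model y_t = b0 + b1 t
X = np.column_stack([np.ones_like(t), t])
b = np.linalg.lstsq(X, y, rcond=None)[0]          # near (136.66, 2.543)
d1, r = dw_and_r(y - X @ b)                       # d1 near 3.17, r near -0.56

# generalized least squares via 2-18 and 2-19
y_s = np.concatenate([[np.sqrt(1 - r ** 2) * y[0]], y[1:] - r * y[:-1]])
X_s = np.vstack([np.sqrt(1 - r ** 2) * X[0], X[1:] - r * X[:-1]])
b_s = np.linalg.lstsq(X_s, y_s, rcond=None)[0]    # near (136.46, 2.56)
d2, _ = dw_and_r(y_s - X_s @ b_s)                 # autocorrelation reduced
print(b, round(d1, 2), round(r, 2), b_s, round(d2, 2))
```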

Summary of terms:

- Multicollinearity – perfect and imperfect
- Measurement errors – errors in Y, errors in X
- Method of instrumental variables
- Heteroscedasticity and its consequences
- Goldfeld-Quandt test
- The generalized least squares in the case of Goldfeld-Quandt heteroscedasticity
- The method of weighted least squares
- Autocorrelation and its consequences
- The autoregressive process of order 1 and autocorrelation
- Autocorrelation AR(1) and the generalized least squares
- Cochrane-Orcutt estimate
- Durbin-Watson test, modified Durbin-Watson test

Questions

1. Using the table below, estimate the parameters of the model $Y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\varepsilon_i$. A possible heteroscedasticity of the form $\operatorname{var}(\varepsilon_i)=\sigma^2x_i^2$, generated by one of the regressors x, is assumed. Find out with the Goldfeld-Quandt test whether any of the regressors generates the heteroscedasticity. The significance level is 10%. (When creating the groups for the test, leave out the middle three observations.)

x1   21   24   21   21   24   25   26   26   26   27   31   31   31
x2   16   18   18   20   22   24   26   26   30   32   34   33   36
y    1    3    5    5    6    8    13   14   14   14   44   45   41

2. Let us assume, regardless of the result in 1), that the model from 1) involves the heteroscedasticity discussed, caused by $x_2$. (In fact, the result given by the test may not be credible due to the small size of the data set.) Estimate the model parameters using the generalized least squares (for the case of heteroscedasticity).

3. What would the estimated variances of the coefficients of the model in 1) look like if the ordinary least squares were applied to the original data from the table above, in spite of the presence of the heteroscedasticity?

4. Estimate the variances of the coefficients acquired through the generalized least squares, and

compare them with the estimated variances of the coefficients obtained by the ordinary least squares applied to the original data.

5. Data from the two tables below are available. Find the parameters of the model $Y_t=\beta_0+\beta_1x_{t1}+\beta_2x_{t2}+\varepsilon_t$, and verify with the Durbin-Watson test whether an autocorrelation is present in the model. The significance level is 5%.

t     1      2      3     4      5      6      7      8
xt1   3.3    3.4    3.5   3.5    3.4    3.3    3.4    3.2
xt2   5.9    6      6.2   6.3    6.3    5.9    5.9    5.8
yt    25.3   23.02  19.9  20.95  18.59  16.15  15.22  17.26

t     9      10     11     12     13     14     15     16     17
xt1   3.2    3.1    3.1    3.1    3.2    3.1    3.1    3      3
xt2   5.5    5.4    5.2    4.8    4.8    4.7    4.6    4.5    4.5
yt    18.98  20.09  18.65  17.79  20.84  16.69  18.33  16.79  16.48


6. Assume an AR(1) autocorrelation in the model from 5). Estimate the model parameters with the generalized least squares.

7. How do the estimated coefficients of the model from 5) change if they are calculated by the Cochrane-Orcutt procedure? Compare the results also with the estimates obtained by the ordinary least squares.

Answers to questions

1) The ordinary least squares give the estimates $b_0=-90.1$, $b_1=4.2$, $b_2=-0.08$. There are 13 values, thus the number of values to be left out in the subsequent calculations involved in the Goldfeld-Quandt test is (8/30) × 13 = 3.46; we decide to skip the middle three observations. One regression is performed for the observations related to the five lowest values of the selected explanatory variable, another regression for the observations related to the five highest values of this variable. The test criterion is compared with the 10% critical value of the distribution $F_{2,2}$, which equals 9. To test the heteroscedasticity caused by $x_1$, the two regressions lead to the residual sums of squares $S_{r1}=2.916$ and $S_{r2}=9.786$, and so

$$
T=\frac{S_{r2}}{S_{r1}}=3.35,
$$

which is smaller than 9. Therefore, the first explanatory variable does not seem to cause any heteroscedasticity. As for the variable $x_2$, one works in this case, exceptionally, with exactly the same data after sorting the values of the explanatory variable in ascending order, and so the conclusion will be exactly the same: neither of the regressors causes any heteroscedasticity.

2) The covariance matrix Ω of the vector of random components of the model contains zero elements off its main diagonal, the i-th diagonal element of the matrix being $\sigma^2x_{i2}^2$. The transformation matrix is therefore diagonal as well, with the i-th diagonal element $1/x_{i2}$. The transformed model is

$$
Y_i^*=\beta_0^*+\beta_1^*x_{i1}^*+\beta_2^*x_{i2}^*+\varepsilon_i^*,
$$

where

$$
Y_i^*=Y_i/x_{i2},\qquad x_{i1}^*=1/x_{i2},\qquad x_{i2}^*=x_{i1}/x_{i2}
$$

and

$$
\beta_0^*=\beta_2,\qquad \beta_1^*=\beta_0,\qquad \beta_2^*=\beta_1.
$$

We work with new data now:

x1*   0.06   0.056  0.05   0.05   0.04   0.04   0.03   0.03   0.03   0.03   0.02   0.03   0.02
x2*   1.31   1.16   1.05   1.33   1.09   1.04   1      1      0.86   0.84   0.91   0.93   0.86
y*    0.06   0.277  0.25   0.16   0.27   0.33   0.5    0.54   0.46   0.43   1.29   1.36   1.14

The ordinary least squares applied to these data give the estimates $b_0^*=0.69=b_2$, $b_1^*=-59.8=b_0$, $b_2^*=2.26=b_1$, and the resulting model is

$$
\hat y=-59.8+2.26x_1+0.69x_2.
$$

3) Using the ordinary least squares regardless of the heteroscedasticity leads to the following covariance matrix of the vector of estimated coefficients:

$$
\operatorname{var}(\mathbf b)=(X'X)^{-1}X'\Omega X(X'X)^{-1}.
$$

Since

$$
(X'X)^{-1}=\begin{pmatrix}9.47&-0.64&0.277\\-0.64&0.05&-0.025\\0.277&-0.025&0.01\end{pmatrix},\qquad
\frac{1}{\sigma^2}X'\Omega X=\begin{pmatrix}9{,}181&250{,}343&265{,}049\\250{,}343&6{,}929{,}337&7{,}397{,}103\\265{,}049&7{,}397{,}103&7{,}975{,}921\end{pmatrix},
$$

we have

$$
\operatorname{var}(\mathbf b)=\sigma^2\begin{pmatrix}7{,}061.6&-479.247&201.64\\-479.247&36.4&-17.67\\201.64&-17.67&9.94\end{pmatrix}.
$$

4) The covariance matrix of the vector of estimated coefficients of the transformed model is $(X'\Omega^{-1}X)^{-1}$. Therefore

$$
(X'\Omega^{-1}X)^{-1}=\sigma^2\begin{pmatrix}5{,}456.9&-366.4&151.6\\-366.4&27.8&-13.5\\151.6&-13.5&7.74\end{pmatrix}.
$$

We may now compare the estimated variances for the ordinary and the generalized least squares estimates, after the variances are divided by the unknown parameter σ²:

              Ordinary least squares   Generalized least squares
var(b0)/σ²    7,061.6                  5,456.9
var(b1)/σ²    36.4                     27.8
var(b2)/σ²    9.94                     7.74

5) The least squares method yields the estimates $b_0=3.5$, $b_1=3.88$, $b_2=0.52$. The residuals are

t      1      2      3       4      5       6       7       8       9
e(t)   5.898  3.173  −0.439  0.566  −1.407  −3.256  −4.568  −1.703  0.17

t      10     11     12      13     14      15      16      17
e(t)   1.727  0.389  −0.266  2.401  −1.313  0.379   −0.719  −1.03

The sample correlation coefficient is positive:

$$
r=\frac{\sum_{t=2}^{17}e_te_{t-1}}{\sum_{t=2}^{17}e_t^2}=\frac{40.3}{59.37}=0.67.
$$

The Durbin-Watson statistic equals

$$
d=\frac{\sum_{t=2}^{17}(e_t-e_{t-1})^2}{\sum_{t=2}^{17}e_t^2}=\frac{71.84}{59.37}=1.21.
$$

Since there are two parameters in the model (excluding the absolute term), k = 2, and the number of observations is T = 17. The lower and upper limits for the test, given the parameters k and T, are $d_L=1.015$ and $d_U=1.536$. The test criterion lies between the two limits, meaning that it is not possible to determine whether an autocorrelation is or is not present.

6) To apply the generalized least squares method to the model $Y_t=\beta_0x_{t0}+\beta_1x_{t1}+\beta_2x_{t2}+\varepsilon_t$, where $x_{t0}=1$ for t = 1, 2, …, 17, means to perform the following data transformation:

for t = 1:

$$
\tilde y_1=\sqrt{1-r^2}\,y_1,\quad \tilde x_{10}=\sqrt{1-r^2}\,x_{10},\quad \tilde x_{11}=\sqrt{1-r^2}\,x_{11},\quad \tilde x_{12}=\sqrt{1-r^2}\,x_{12};
$$

for t = 2, 3, …, 17:

$$
\tilde y_t=y_t-r\,y_{t-1},\quad \tilde x_{t0}=x_{t0}-r\,x_{t-1,0},\quad \tilde x_{t1}=x_{t1}-r\,x_{t-1,1},\quad \tilde x_{t2}=x_{t2}-r\,x_{t-1,2}.
$$

The parameter r is the usual estimate of the unknown correlation coefficient ρ:

$$
r=\frac{\sum_{t=2}^{17}e_te_{t-1}}{\sum_{t=2}^{17}e_t^2}=0.67.
$$

After the transformation, the model $Y_t^*=\beta_0x_{t0}^*+\beta_1x_{t1}^*+\beta_2x_{t2}^*+\varepsilon_t^*$ is estimated. Here,

t     1     2     3     4     5     6     7     8     9     10
x0*   0.74  0.33  0.33  0.33  0.33  0.33  0.33  0.33  0.33  0.33
x1*   2.45  1.19  1.22  1.16  1.06  1.02  1.19  0.92  1.06  0.96
x2*   4.38  2.05  2.18  2.15  2.08  1.68  1.95  1.85  1.61  1.72
y*    18.8  6.07  4.48  7.62  4.55  3.69  4.4   7.06  7.42  7.37

t     11    12    13    14    15    16    17
x0*   0.33  0.33  0.33  0.33  0.33  0.33  0.33
x1*   1.02  1.02  1.12  0.96  1.02  0.92  0.99
x2*   1.58  1.32  1.58  1.48  1.45  1.42  1.49
y*    5.19  5.29  8.92  2.73  7.15  4.51  5.23

The least squares method applied to the transformed data gives the estimates $b_0=7.31$, $b_1=0.96$, $b_2=1.65$.

7) The procedure is similar to that in 6). The difference is that only the transformation of the original observations for t = 2, …, 17 is done:

$$
\tilde y_t=y_t-r\,y_{t-1},\quad \tilde x_{t0}=x_{t0}-r\,x_{t-1,0},\quad \tilde x_{t1}=x_{t1}-r\,x_{t-1,1},\quad \tilde x_{t2}=x_{t2}-r\,x_{t-1,2}.
$$

Thus, the column for t = 1 in the table from 6) is left out, and the ordinary least squares method is then applied to the remaining transformed data. This leads to the estimates $b_0=10.98$, $b_1=2.53$, $b_2=-0.32$. The ordinary least squares applied to the original, untransformed data gives the estimates $b_0=3.516$, $b_1=3.88$, $b_2=0.52$.


3. SETS OF SIMULTANEOUS EQUATIONS

Goal:

This chapter introduces the reader to sets of simultaneous equations which have their own specific features that require altering the standard procedures used to estimate the unknown parameters of a model in a classical regression case. The chapter covers the problem of identification and presents frequently used methods of estimating unknown model coefficients. Time to learning:

4 hours.

The presentation of fundamental regression concepts has focused so far on a model given by a single equation. In many cases, however, it is necessary to introduce more equations to describe properly all the relations among the variables of interest. This means that sets of regression equations enter the analysis. Sets of equations are applied to a great extent in economics, for instance, which often deals with situations that are supposed to describe a balance in the market. These situations require a description of the supply side with one equation, the demand side with another equation, and a third equation to describe the balance of supply and demand.

Generally speaking, different kinds of sets of regression equations exist, depending on what theoretical requirements the equations are supposed to comply with. The various kinds of sets differ in the extent to which the equations of a set are intertwined, and the interconnection depends on what variables appear in the equations in the first place. Since the classification of variables is important here, let us extend the vocabulary we have used to describe the variables of a regression model. Looking at a particular regression equation, we see that the explained variable is generated by the system defined by the equation – it is a product of that equation. We call such variables endogenous. On the contrary, variables which come into existence outside the system described by the equation, or are not really a product of that equation, are called exogenous. So far, we have worked with a single regression equation; in it, the variable Y was endogenous and the regressors were exogenous. There is also another category of variables, called lagged endogenous. These are, as the name clearly suggests, endogenous variables from previous time periods. To give an example, the equation

$$
Y_t=\beta_0+\beta_1x_t+\beta_2Y_{t-1}+\beta_3Y_{t-2}+\varepsilon_t
$$

contains two lagged endogenous variables, $Y_{t-1}$ and $Y_{t-2}$. All exogenous variables together with lagged endogenous variables are called predetermined variables.

When it comes to different kinds of regression systems, one type is represented by M equations

$$
\mathbf{Y}_j=X_j\boldsymbol\beta_j+\boldsymbol\varepsilon_j,\qquad j=1,2,\dots,M,
$$

where the equations may contain different regressors and different coefficients as well, but only exogenous variables appear on the


right-hand side of the equations. Also, the random components from different equations may be correlated. These systems are called seemingly unrelated equations. Although the systems look like something quite new, their parameters can often be estimated using the theory that has been presented, because a model consisting of one equation is actually a set of equations, each of the equations being related to a particular observation. We shall not be preoccupied with these systems, however, and will focus on sets of equations whose mutual relations are much tighter, because variables which are explanatory variables in one equation may be explained variables in another equation. These are the systems that are called simultaneous equations. An example of such a system is the set of two equations

$$
Y_{t1}=\alpha_0+\alpha_1Y_{t2}+\alpha_2x_{t1}+\varepsilon_{t1},
$$
$$
Y_{t2}=\beta_0+\beta_1Y_{t1}+\beta_2x_{t2}+\varepsilon_{t2}.
$$

As we can see in this example, the variable $Y_{t2}$ is an explanatory variable in the first equation, whereas it is the explained variable in the second equation, and the analogy holds true for the variable $Y_{t1}$. The two equations are outside the framework of classical regression because some of the regressors, being stochastic in the first place, are correlated with the respective random components of the model. To see this, we observe that $Y_{t2}$ depends on $Y_{t1}$, which depends on $\varepsilon_{t1}$. Thus, $Y_{t2}$ depends on $\varepsilon_{t1}$, and we cannot assume that there is no correlation between the two variables. This causes a problem: if a nonzero correlation exists between a regressor and the corresponding random component, the ordinary least squares method gives biased and inconsistent estimates, which implies we cannot use this method to estimate the parameters of each equation separately. And the situation would be complicated even further by a potential autocorrelation among the random components themselves. Methods exist, however, which provide estimates of reasonable statistical properties. It can be shown that all of them, in fact, belong to the category of instrumental-variable techniques, some formally more complicated than others. In this chapter, two basic methods of estimation will be presented – the method of indirect least squares and the two-stage least squares method. Before doing so, however, the so-called identification problem must be resolved first for any set of simultaneous equations.

3.1 IDENTIFICATION

The problem of identification will be described for the following typical example of simultaneous equations:

$$
Q^s_t=\alpha_0+\alpha_1P_t+\varepsilon_{t1},\qquad Q^d_t=\beta_0+\beta_1P_t+\varepsilon_{t2},\qquad Q^s_t=Q^d_t.
$$

The first equation describes the dependence of supply on the price level, the second equation the dependence of demand on the price level. The third, final equation says that there is a balance in the market – the amount of supplied products is the same as the amount of products demanded by the market. Let us imagine now that we are analysts who want to estimate the unknown regression parameters appearing in the equations. If we do that for the second equation,


we will work with two sets of data when performing the estimation: a data array for $Q_t=Q^s_t=Q^d_t$ and a data array for $P_t$. When we are done with the estimation, we may ask ourselves what has actually been estimated. Is it the first equation or the second equation (because $Q^s_t=Q^d_t$)? We can even multiply the first equation by a number a and the second equation by a number b, getting

$$
aQ^s_t+bQ^d_t=(a\alpha_0+b\beta_0)+(a\alpha_1+b\beta_1)P_t+(a\varepsilon_{t1}+b\varepsilon_{t2})
$$

or

$$
Q_t=\frac{a\alpha_0+b\beta_0}{a+b}+\frac{a\alpha_1+b\beta_1}{a+b}\,P_t+\frac{a\varepsilon_{t1}+b\varepsilon_{t2}}{a+b}=\gamma_0+\gamma_1P_t+\varepsilon_t.
$$

As the last equation suggests, we might even have estimated a linear combination of the supply and demand equations. To sum up, the set of simultaneous equations is defined in such a way that we do not know what we estimate. In other words, we are confronted with the so-called problem of identification. Thus, before performing any estimation, conditions should be formulated which ensure that there is no problem with identification.

The problem of identification also manifests itself in more than one way. Another way, perhaps more natural, is based on the relation between the so-called structural and reduced forms of the set of equations. Let us write the set of simultaneous equations for the times t = 1, 2, …, T in the structural form

$$
\begin{aligned}
\beta_{11}y_{t1}+\beta_{21}y_{t2}+\cdots+\beta_{M1}y_{tM}+\gamma_{11}x_{t1}+\gamma_{21}x_{t2}+\cdots+\gamma_{K1}x_{tK}&=\varepsilon_{t1},\\
\beta_{12}y_{t1}+\beta_{22}y_{t2}+\cdots+\beta_{M2}y_{tM}+\gamma_{12}x_{t1}+\gamma_{22}x_{t2}+\cdots+\gamma_{K2}x_{tK}&=\varepsilon_{t2},\\
&\ \ \vdots\\
\beta_{1M}y_{t1}+\beta_{2M}y_{t2}+\cdots+\beta_{MM}y_{tM}+\gamma_{1M}x_{t1}+\gamma_{2M}x_{t2}+\cdots+\gamma_{KM}x_{tK}&=\varepsilon_{tM}.
\end{aligned}\tag{3.1}
$$

Here, the y’s are endogenous variables, and there are as many of them as there are equations. The x’s represent predetermined variables. In matrix form, the equations can be written as

$$
\mathbf{y}_t'B+\mathbf{x}_t'\Gamma=\boldsymbol\varepsilon_t',\tag{3.2}
$$

where $\mathbf{y}_t'=(y_{t1},y_{t2},\dots,y_{tM})$, $\mathbf{x}_t'=(x_{t1},x_{t2},\dots,x_{tK})$, $\boldsymbol\varepsilon_t'=(\varepsilon_{t1},\varepsilon_{t2},\dots,\varepsilon_{tM})$ and

$$
B=\begin{pmatrix}\beta_{11}&\beta_{12}&\cdots&\beta_{1M}\\ \beta_{21}&\beta_{22}&\cdots&\beta_{2M}\\ \vdots&\vdots&\ddots&\vdots\\ \beta_{M1}&\beta_{M2}&\cdots&\beta_{MM}\end{pmatrix},\qquad
\Gamma=\begin{pmatrix}\gamma_{11}&\gamma_{12}&\cdots&\gamma_{1M}\\ \gamma_{21}&\gamma_{22}&\cdots&\gamma_{2M}\\ \vdots&\vdots&\ddots&\vdots\\ \gamma_{K1}&\gamma_{K2}&\cdots&\gamma_{KM}\end{pmatrix}.
$$

The set can also be written in the reduced form


$$
\mathbf{y}_t'=-\mathbf{x}_t'\Gamma B^{-1}+\boldsymbol\varepsilon_t'B^{-1}=\mathbf{x}_t'\Pi+\mathbf{v}_t'.\tag{3.3}
$$

The question is whether the original coefficients of the set, contained in the matrices and , can be obtained from the coefficients of the matrix because this could be the way of estimating the parameters of 3-2. We might estimate the reduced coefficients of 3-3 and then reconstruct the original or structural coefficients, knowing that = − . If the original coefficients can be obtained uniquely from the reduced coefficients, the set 3-2 is said to be exactly identified and there is no problem with identification. If there are more equations in = − than is necessary to obtain the structural coefficients, the set 3-2 is said to be overidentified. In this case, there is no problem with identification either, but the question is how to utilize the abundant information on the coefficients as effectively as possible. If there are too few equations in = − to reconstruct the structural coefficients, the set 3-2 is said to be underidentified, and there is no way for us to “identify” the original coefficients. In order for the entire system of equations to be identified, each of its equations must be identified, which means that each equation must be either exactly identified or overidentified. Conditions exist that tell us which of the aforementioned cases occur, when working with a set of simultaneous equations. There are two conditions: the order condition and the rank condition. The former is necessary for identification, but not sufficient. Together with the latter, however, we get a sufficient condition of identification. Working with m equations, and endogenous and predetermined variables, the order condition for an equation is: the number of variables (endogenous and predetermined) excluded from the equation must be equal or greater than the number of endogenous variables in the system less one. the rank condition for an equation is: the rank of the matrix of parameters of all the variables (endogenous and predetermined) excluded from the equation is equal to m – 1. 
The question is whether the original coefficients of the set, contained in the matrices B and Γ, can be obtained from the coefficients of the matrix Π, because this could be a way of estimating the parameters of 3-2: we might estimate the reduced coefficients of 3-3 and then reconstruct the original, structural coefficients, knowing that $\Pi=-\Gamma B^{-1}$, i.e. $\Pi B=-\Gamma$. If the original coefficients can be obtained uniquely from the reduced coefficients, the set 3-2 is said to be exactly identified, and there is no problem with identification. If there are more equations in $\Pi B=-\Gamma$ than is necessary to obtain the structural coefficients, the set 3-2 is said to be overidentified. In this case, there is no problem with identification either, but the question is how to utilize the abundant information on the coefficients as effectively as possible. If there are too few equations in $\Pi B=-\Gamma$ to reconstruct the structural coefficients, the set 3-2 is said to be underidentified, and there is no way for us to “identify” the original coefficients.

In order for the entire system of equations to be identified, each of its equations must be identified, which means that each equation must be either exactly identified or overidentified. Conditions exist that tell us which of the aforementioned cases occurs when working with a set of simultaneous equations. There are two conditions, the order condition and the rank condition. The former is necessary for identification, but not sufficient; together with the latter, however, we get a sufficient condition of identification. Working with M equations, and with endogenous and predetermined variables, the conditions for an equation read:

- the order condition: the number of variables (endogenous and predetermined) excluded from the equation must be equal to or greater than the number of endogenous variables in the system less one;
- the rank condition: the rank of the matrix formed by the coefficients which the variables excluded from the equation have in the other equations of the system equals M − 1.

If the rank condition for an equation is met and the order condition is met in the form of equality, that equation is exactly identified. If the rank condition for an equation is met and the order condition is met in the form of inequality, that equation is overidentified. If all the equations are identified exactly, the whole system is exactly identified. If at least one equation is overidentified, the whole system is overidentified.

EXAMPLE 6

Determine the identification form of the system

$$
y_{t1}=\alpha_0+\alpha_1y_{t2}+\alpha_2x_{t1}+\varepsilon_{t1},\qquad y_{t2}=\beta_0+\beta_1y_{t1}+\beta_2x_{t2}+\varepsilon_{t2},
$$

where the y’s are endogenous and the x’s are predetermined.


Solution: As for the first equation, one variable is missing – the variable $x_{t2}$ – whereas the number of endogenous variables in the system equals two. Thus, the order condition is met exactly. Analogously, only one variable is missing in the second equation ($x_{t1}$), and so the order condition is met exactly for the second equation as well. Regarding the rank condition, let us rewrite the system as

$$
y_{t1}-\alpha_0-\alpha_1y_{t2}-\alpha_2x_{t1}-\varepsilon_{t1}=0,
$$
$$
y_{t2}-\beta_0-\beta_1y_{t1}-\beta_2x_{t2}-\varepsilon_{t2}=0.
$$

Tabulating the coefficients of the endogenous and predetermined variables, we have

Table 20: Coefficients in the structural form of equations

equation   y(t1)   y(t2)   abs. term   x(t1)   x(t2)
first      1       −α1     −α0         −α2     0
second     −β1     1       −β0         0       −β2

The rank condition now reads as follows: for the first equation, the corresponding matrix has a single element, $-\beta_2$, so unless this parameter is zero, the rank condition is satisfied. For the second equation, the matrix of interest has the single element $-\alpha_2$, and unless this element is zero, the rank condition is satisfied for this equation as well. To sum up, unless the two parameters are zero, both equations are exactly identified.
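The order and rank conditions can be checked mechanically. A sketch for the system of Example 6, with assumed numeric values standing in for the unknown structural parameters:

```python
import numpy as np

# Structural coefficients of the system from Example 6, rewritten as
#   y1 - a0 - a1*y2 - a2*x1          = eps1
#   y2 - b0 - b1*y1          - b2*x2 = eps2,
# with assumed nonzero numeric values standing in for the unknown parameters.
a0, a1, a2 = 1.0, 0.5, 0.8
b0, b1, b2 = 2.0, 0.3, 0.6

# columns: y1, y2, absolute term, x1, x2 ; one row per equation
A = np.array([[1.0, -a1, -a0, -a2, 0.0],
              [-b1, 1.0, -b0, 0.0, -b2]])
M = A.shape[0]  # number of equations (= endogenous variables)

results = []
for i in range(M):
    excluded = np.where(A[i] == 0.0)[0]        # variables absent from equation i
    others = [j for j in range(M) if j != i]
    order_ok = len(excluded) >= M - 1          # order condition
    rank = np.linalg.matrix_rank(A[np.ix_(others, excluded)])
    rank_ok = rank == M - 1                    # rank condition
    results.append(order_ok and rank_ok and len(excluded) == M - 1)
print(results)  # [True, True]: both equations exactly identified
```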

3.2 ESTIMATION OF SIMULTANEOUS EQUATIONS

Let us focus now on how the unknown coefficients of the system can be estimated. We shall assume that the equations are either exactly identified or overidentified, because there is no procedure for estimating underidentified systems. The methods of estimation can be divided into two categories: the first is represented by the methods that are applied to each equation of the system separately; the second by the methods that are applied to the whole system at once. We shall deal with the first category, which is represented particularly by the method of indirect least squares and by the method of two-stage least squares.

3.2.1 THE METHOD OF INDIRECT LEAST SQUARES

The principle of this method is simple and was outlined when the problem of identification was described. Let us assume a set of simultaneous equations in the structural form 3-2. This system can be written in the reduced form

$$
\mathbf{y}_t'=\mathbf{x}_t'(-\Gamma B^{-1})+\boldsymbol\varepsilon_t'B^{-1}=\mathbf{x}_t'\Pi+\mathbf{v}_t',
$$

in which all the endogenous variables are functions of all the predetermined variables. We can apply the ordinary least squares method to each reduced-form equation separately, thereby acquiring estimates of the parameters contained in the matrix Π. Knowing that, we can get estimates of the original parameters by solving the set of equations $\Pi B=-\Gamma$. If the system


3-2 is exactly identified, the equations = − have a unique solution for the estimates of ’s and ’s. This procedure is the method of indirect least squares, and is applicable only to

exactly identified systems of equations.

The method provides consistent, asymptotically unbiased and asymptotically normal estimates under relatively mild conditions. For the purpose of statistical inference, the estimated covariance matrix of the vector of estimated coefficients b for a given equation is the matrix

V̂(b) = s²(Z′Z)⁻¹, where s² = (T − p)⁻¹ Σ_t ê_t²,

the ê_t's being the components of the residual vector (Γ̂y + B̂x) corresponding to the given equation, T is the number of observations, p is the number of parameters in the equation, and Γ̂, B̂ are the matrices with estimated coefficients. Further, Z is the matrix the columns of which are values of the predetermined variables appearing in the equation and fitted values of the explanatory endogenous variables appearing in that equation. The fitted values are obtained by regressing each explanatory endogenous variable on all predetermined variables of the system. Thus, the matrix Z looks similar to that used in classical regression; the only difference is that the values of the endogenous variables appearing on the right-hand side of the equation are replaced with their respective fitted values. In other words, approximately b ~ N(δ, V̂(b)), where δ is the vector of unknown coefficients in the given equation.
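The covariance computation described above can be sketched in a few lines of numpy. The matrix Z and the residual vector e below are made-up toy inputs (T = 3 observations, p = 2 parameters), chosen only to keep the arithmetic visible.

```python
import numpy as np

# Hypothetical inputs: Z holds the predetermined variables and the fitted
# endogenous values; e holds the structural residuals of the equation.
Z = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
e = np.array([1.0, -1.0, 0.0])

T, p = Z.shape                    # number of observations and parameters
s2 = e @ e / (T - p)              # s^2 = (T - p)^{-1} * sum of squared residuals
V = s2 * np.linalg.inv(Z.T @ Z)   # estimated covariance matrix of b
print(V)
```

The square roots of the diagonal of V then serve as standard errors for approximate normal-based inference on the coefficients.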

EXAMPLE 7

Let us demonstrate the method for the system of equations

y_t1 = α1 + γ1 y_t2 + β1 x_t1 + ε_t1,
y_t2 = α2 + γ2 y_t1 + β2 x_t2 + ε_t2,

which, as we have seen in Example 6, is exactly identified. To apply the method, the observations of Table 21 will be used.

Table 21: Data for Example 7

t     1     2     3     4     5     6     7
yt1   2.3   2.1   2.4   2.3   2.5   2.7   2.7
yt2   3.4   3.3   3.4   3.4   3.5   3.6   3.6
xt1   13    14    14.6  14.6  15.6  16.8  17.1
xt2   23    22.5  23.7  23.7  23.9  24.4  24.5

t     8     9     10    11    12    13    14
yt1   3.1   3.2   3.2   3.1   3.3   3.2   3.4
yt2   3.8   3.9   3.9   4     4.1   4.1   4.2
xt1   17.2  17.3  17    16.6  16.5  16.6  16.5
xt2   24.7  24.8  24.9  24.6  24.9  24.9  24.9

Source: own


Solution: To simplify the notation, we shall not use the subscript t in what follows. Substituting one equation into the other, we have

y1 = (α1 + γ1α2)/(1 − γ1γ2) + β1/(1 − γ1γ2) x1 + γ1β2/(1 − γ1γ2) x2 + (ε1 + γ1ε2)/(1 − γ1γ2)
   = π10 + π11 x1 + π12 x2 + v1,

y2 = (α2 + γ2α1)/(1 − γ1γ2) + γ2β1/(1 − γ1γ2) x1 + β2/(1 − γ1γ2) x2 + (ε2 + γ2ε1)/(1 − γ1γ2)
   = π20 + π21 x1 + π22 x2 + v2.

The ordinary least squares applied separately to each equation above result in

ŷ1 = −11.4245 − 0.06098 x1 + 0.6277 x2,
ŷ2 = −6.8494 − 0.0964 x1 + 0.4997 x2.

Since

π10 = (α1 + γ1α2)/(1 − γ1γ2), π11 = β1/(1 − γ1γ2), π12 = γ1β2/(1 − γ1γ2)

and

π20 = (α2 + γ2α1)/(1 − γ1γ2), π21 = γ2β1/(1 − γ1γ2), π22 = β2/(1 − γ1γ2),

we have

γ̂2 = π̂21/π̂11 = 1.5808, γ̂1 = π̂12/π̂22 = 1.256,

β̂1 = π̂11(1 − γ̂1γ̂2) = 0.0601, β̂2 = π̂22(1 − γ̂1γ̂2) = −0.4926,

and because

α1 = π10(1 − γ1γ2) − γ1α2, α2 = π20(1 − γ1γ2) − γ2α1,

it also follows that α̂1 = −2.821, α̂2 = 11.211.
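The whole indirect least squares calculation for this example can be sketched in numpy. This is a minimal sketch with variable names of our own choosing: the reduced-form coefficients are estimated by OLS, and the structural coefficients are then recovered from them.

```python
import numpy as np

# Table 21 data
y1 = np.array([2.3, 2.1, 2.4, 2.3, 2.5, 2.7, 2.7, 3.1, 3.2, 3.2, 3.1, 3.3, 3.2, 3.4])
y2 = np.array([3.4, 3.3, 3.4, 3.4, 3.5, 3.6, 3.6, 3.8, 3.9, 3.9, 4.0, 4.1, 4.1, 4.2])
x1 = np.array([13, 14, 14.6, 14.6, 15.6, 16.8, 17.1, 17.2, 17.3, 17, 16.6, 16.5, 16.6, 16.5])
x2 = np.array([23, 22.5, 23.7, 23.7, 23.9, 24.4, 24.5, 24.7, 24.8, 24.9, 24.6, 24.9, 24.9, 24.9])

# OLS on each reduced-form equation: y_i = pi_i0 + pi_i1*x1 + pi_i2*x2 + v_i
X = np.column_stack([np.ones_like(x1), x1, x2])
p1, *_ = np.linalg.lstsq(X, y1, rcond=None)   # pi10, pi11, pi12
p2, *_ = np.linalg.lstsq(X, y2, rcond=None)   # pi20, pi21, pi22

# Back-solve the structural coefficients (exactly identified system)
g1 = p1[2] / p2[2]          # gamma1 = pi12 / pi22
g2 = p2[1] / p1[1]          # gamma2 = pi21 / pi11
D = 1.0 - g1 * g2
b1 = p1[1] * D              # beta1 = pi11 * (1 - gamma1*gamma2)
b2 = p2[2] * D              # beta2 = pi22 * (1 - gamma1*gamma2)
# alpha1 + gamma1*alpha2 = pi10*D  and  gamma2*alpha1 + alpha2 = pi20*D
a1, a2 = np.linalg.solve([[1.0, g1], [g2, 1.0]], [p1[0] * D, p2[0] * D])

print(a1, g1, b1)
print(a2, g2, b2)
```

A useful sanity check is that solving the recovered structural system for (y1, y2) at each observation reproduces the reduced-form fitted values exactly.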

3.2.2 TWO-STAGE LEAST SQUARES

As opposed to the indirect least squares, the method of two-stage least squares is applicable to sets of both exactly identified and overidentified equations. In the first step of the method, instrumental variables are sought which serve as a suitable replacement for the endogenous variables appearing as explanatory variables in the equations. Using these instruments, the ordinary least squares are then applied to each equation separately in the second step of the method, the instruments serving as a replacement for the endogenous explanatory variables. The most suitable instruments turn out to be the fitted values of the explanatory endogenous variables in question, obtained by regressing each explanatory endogenous variable on all the predetermined variables of the system. The resulting estimator has the same statistical properties as in the case of indirect least squares (consistency, asymptotic unbiasedness and asymptotic normality).

EXAMPLE 8

Estimate the unknown coefficients of the system below with two-stage least squares:

y_t1 = α1 + γ1 y_t2 + β1 x_t1 + ε_t1,
y_t2 = α2 + γ2 y_t1 + β2 x_t2 + ε_t2.

The necessary observations are in Table 22.

Table 22: Time series data for Example 8

t     1    2    3    4    5    6    7    8    9    10   11   12   13   14
yt1   5.1  6.3  6.2  6.7  6.8  5.8  6.1  6.5  6.6  6.6  7.2  7.5  8    9.3
yt2   12   12   13   13   13   13   14   15   14   15   15   13   13   14
xt1   21   22   23   22   23   24   25   24   26   27   31   33   34   37
xt2   7    8    7    6    6.4  6.5  7.1  7.9  8.1  8.1  8.2  8.2  8.3  8.4

Source: own

Solution: This is the same set of equations as in the previous example; therefore, the equations are exactly identified and we may proceed with the method. In the first step, we perform the regression

y_t2 = π20 + π21 x_t1 + π22 x_t2 + v_t2


with the ordinary least-squares method. This gives the estimate ŷ_t2 = 10.49 + 0.025 x_t1 + 0.3 x_t2. The fitted values are in Table 23.

Table 23: Fitted values from the equation ŷt2 = 10.49 + 0.025 xt1 + 0.3 xt2

ŷt2   13.1  13.4  13.2  12.8  13  13  13.3  13.5  13.6  13.6  13.7  13.8  13.8  13.9

In the second step, the ordinary least squares are applied to the equation

y_t1 = α1 + γ1 ŷ_t2 + β1 x_t1 + u_t1,

which gives the result ŷ_t1 = 9.07 − 0.59 ŷ_t2 + 0.211 x_t1.

Similarly for the second equation, the first step of the method – the ordinary least squares applied to the equation

y_t1 = π10 + π11 x_t1 + π12 x_t2 + v_t1 –

results in the estimate ŷ_t1 = 2.868 + 0.197 x_t1 − 0.178 x_t2, with the fitted values in Table 24.

Table 24: Fitted values from the regression ŷt1 = 2.868 + 0.197 xt1 − 0.178 xt2

ŷt1   5.76  5.8  6.15  6.1  6.3  6.44  6.5  6.19  6.6  6.7  7.5  7.9  8.1  8.7

The second step – application of the ordinary least squares to the equation

y_t2 = α2 + γ2 ŷ_t1 + β2 x_t2 + u_t2 –

provides the result

ŷ_t2 = 10.13 + 0.126 ŷ_t1 + 0.324 x_t2.

To sum up, the two-equation system is estimated as

ŷ_t1 = 9.07 − 0.59 y_t2 + 0.211 x_t1,
ŷ_t2 = 10.13 + 0.126 y_t1 + 0.324 x_t2.
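The two stages of this example can be sketched in numpy as follows. The helper `two_sls` and its argument layout are our own; it handles the special case of one explanatory endogenous variable and one exogenous regressor per equation, which is all this example needs.

```python
import numpy as np

# Table 22 data
y1 = np.array([5.1, 6.3, 6.2, 6.7, 6.8, 5.8, 6.1, 6.5, 6.6, 6.6, 7.2, 7.5, 8.0, 9.3])
y2 = np.array([12, 12, 13, 13, 13, 13, 14, 15, 14, 15, 15, 13, 13, 14.0])
x1 = np.array([21, 22, 23, 22, 23, 24, 25, 24, 26, 27, 31, 33, 34, 37.0])
x2 = np.array([7, 8, 7, 6, 6.4, 6.5, 7.1, 7.9, 8.1, 8.1, 8.2, 8.2, 8.3, 8.4])

W = np.column_stack([np.ones_like(x1), x1, x2])   # all predetermined variables

def two_sls(y, y_endo, x_exo):
    """Two-stage least squares for one equation with a single explanatory
    endogenous variable y_endo and a single exogenous regressor x_exo."""
    # Stage 1: fitted values of the endogenous regressor from all
    # predetermined variables of the system
    fitted = W @ np.linalg.lstsq(W, y_endo, rcond=None)[0]
    # Stage 2: OLS with the fitted values replacing the endogenous regressor
    Z = np.column_stack([np.ones_like(x_exo), fitted, x_exo])
    return np.linalg.lstsq(Z, y, rcond=None)[0]   # (alpha, gamma, beta)

a1, g1, b1 = two_sls(y1, y2, x1)   # first structural equation
a2, g2, b2 = two_sls(y2, y1, x2)   # second structural equation
print(a1, g1, b1)
print(a2, g2, b2)
```

Because each equation here is exactly identified, the structural residuals of the fitted equations are orthogonal to all the predetermined variables, which provides a convenient check on the computation.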

Summary of terms:

- Simultaneous equations
- Structural and reduced form of sets of equations
- Endogenous, exogenous variables; explained, explanatory, predetermined variables
- Identification – exactly identified, underidentified, overidentified systems
- Order and rank conditions for identification
- Indirect least squares
- Two-stage least squares


Questions

1. Estimate the following model with the two-stage least squares method

y_t1 = α0 + α1 x_t1 + α2 y_{t−1,1} + α3 y_t2 + ε_t1,
y_t2 = β0 + β1 x_t2 + β2 y_t1 + β3 y_{t−1,2} + ε_t2,

using the data:

t     0     1     2     3     4     5     6     7
yt1   6     6.3   6.2   6.4   6.6   6.7   6.6   6.8
yt2   1.8   2.1   2.3   2.5   2.6   3.1   3.5   3.4
xt1   –     12.1  12    12    12.1  12    11.6  11
xt2   –     1.1   1.2   0.8   0.7   1.2   1.4   1.5

t     8     9     10    11    12    13    14    15
yt1   6.5   6.9   7     7.1   7     6.9   7.3   7.5
yt2   3.7   3.8   4.2   4.2   4.5   4.6   4.7   4.8
xt1   11    11.4  11    11    10.7  11    10.5  10
xt2   1.5   1.6   1.5   1.7   2     2.1   2.2   2.2

(The values of xt1 and xt2 at t = 0 are not needed; the observation t = 0 only supplies the initial values of the lagged endogenous variables.)

2. Estimate the system from problem 1 with the indirect least squares and compare the result with the two-stage least squares.

3. Explain the principle of the two-stage least squares.

4. Explain the principle of the indirect least squares.

5. Formulate the order and rank condition for an equation.

6. Explain what “predetermined variable” means, and explain the difference between “exogenous” and “endogenous”.

Answers to questions

1. First-step results:

ŷ_t1 = 5.72 + 0.03 x_t1 + 0.14 x_t2 + 0.057 y_{t−1,1} + 0.37 y_{t−1,2},
ŷ_t2 = 3.3 + 0.03 x_t1 + 0.18 x_t2 + 0.6 y_{t−1,1} + 0.67 y_{t−1,2}.

Second-step regressions:

y_t1 = α0 + α1 x_t1 + α2 y_{t−1,1} + α3 ŷ_t2 + u_t1,
y_t2 = β0 + β1 x_t2 + β2 ŷ_t1 + β3 y_{t−1,2} + u_t2.

The resulting estimates are:

α̂0 = 5, α̂1 = 0.1, α̂2 = 0.187, α̂3 = 0.5,
β̂0 = 43.5, β̂1 = 1.18, β̂2 = 7.8, β̂3 = 2.3.

2. The final estimates (for the structural-form system) will be the same as in 1.

3. See Section 3.2.2 again.

4. See Section 3.2.1 again.

5. See Section 3.1 again.

6. See the beginning of Chapter 3 again.


TABLES

Tables for the Durbin–Watson test: significance level = 1%, dL = lower critical value, dU = upper critical value, n = data sample size, k = number of regressors excluding the intercept (taken from [3])


Tables for the Durbin–Watson test: significance level = 5%, dL = lower critical value, dU = upper critical value, n = data sample size, k = number of regressors excluding the intercept (taken from [3])


References

[1] KOUTSOYIANNIS, A. Theory of Econometrics. Palgrave Macmillan, 2001, 681 p. ISBN 0-333-77822-7.

[2] AMEMIYA, T. Advanced Econometrics. Harvard University Press, 1985, 521 p. ISBN 978-0674005600.

[3] GREENE, W. H. Econometric Analysis. Prentice Hall, 2012, 1198 p. ISBN 978-0-13-139538-1.

[4] HUŠEK, R. Základy ekonometrické analýzy (Foundations of Econometric Analysis). VŠE Praha, 1996, 225 p. ISBN 80-707-9102-0.

[5] BROCKWELL, P. J., DAVIS, R. A. Time Series: Theory and Methods. Springer Science & Business Media, 2009, 577 p. ISBN 978-1-4419-0319-8.

Additional resources

[1] WOOLDRIDGE, J. M. Introductory Econometrics. Macmillan Publishing, 2009, 868 p. ISBN 978-0-324-66054-8.

[2] KENNEDY, P. A Guide to Econometrics. MIT Press, 2003, 623 p. ISBN 978-0262611831.

[3] HAYASHI, F. Econometrics. Princeton University Press, 2000, 712 p. ISBN 0-691-01018-8.


Department, institute: Department of Quality Management, Faculty of Metallurgy and Materials
Engineering, VŠB-TUO
Title: Econometrics
Author: Filip Tošenovský
Place, year, edition: Ostrava, 2015, 1st edition
Number of pages: 80
Published by: VŠB – Technical University of Ostrava
Not for sale
ISBN 978-80-248-3847-2