Gov 2000: 10. Multiple Regression in Matrix Form
Matthew Blackwell, Fall 2016
1. Matrix algebra review
2. Matrix Operations
3. Linear model in matrix form
4. OLS in matrix form
5. OLS inference in matrix form
Where are we? Where are we going?

• Last few weeks: regression estimation and inference with one and two independent variables, varying effects
• This week: the general regression model with arbitrary covariates
• Next week: what happens when assumptions are wrong
Nunn & Wantchekon
⢠Are there long-term, persistent effects of slave trade onAfricans today?
⢠Basic idea: compare levels of interpersonal trust (ðð) acrossdifferent levels of historical slave exports for a respondentâsethnic group
⢠Problem: ethnic groups and respondents might differ in theirinterpersonal trust in ways that correlate with the severity ofslave exports
⢠One solution: try to control for relevant differences betweengroups via multiple regression
Nunn & Wantchekon
⢠Whaaaaa? Bold letter, quotation marks, what is this?⢠Todayâs goal is to decipher this type of writing
Multiple Regression in R

nunn <- foreign::read.dta("../data/Nunn_Wantchekon_AER_2011.dta")
mod <- lm(trust_neighbors ~ exports + age + male + urban_dum
          + malaria_ecology, data = nunn)
summary(mod)

##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)      1.5030370  0.0218325   68.84   <2e-16 ***
## exports         -0.0010208  0.0000409  -24.94   <2e-16 ***
## age              0.0050447  0.0004724   10.68   <2e-16 ***
## male             0.0278369  0.0138163    2.01    0.044 *
## urban_dum       -0.2738719  0.0143549  -19.08   <2e-16 ***
## malaria_ecology  0.0194106  0.0008712   22.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.978 on 20319 degrees of freedom
##   (1497 observations deleted due to missingness)
## Multiple R-squared:  0.0604, Adjusted R-squared:  0.0602
## F-statistic: 261 on 5 and 20319 DF,  p-value: <2e-16
Why matrices and vectors?
Why matrices and vectors?
⢠Hereâs one way to write the full multiple regression model:
ðŠð = ðœ0 + ð¥ð1ðœ1 + ð¥ð2ðœ2 + ⯠+ ð¥ðððœð + ð¢ð
⢠Notation is going to get needlessly messy as we add variables.⢠Matrices are clean, but they are like a foreign language.⢠You need to build intuitions over a long period of time.
Quick note about interpretation
$$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ik}\beta_k + u_i$$

• In this model, $\beta_1$ is the effect of a one-unit change in $x_{i1}$ conditional on all other $x_{ij}$.
• Jargon: "partial effect," "ceteris paribus," "all else equal," "conditional on the covariates," etc.
• Notation change: lower-case letters here are random variables.
1/ Matrix algebra review
Vectors
⢠A vector is just list of numbers (or random variables).⢠A 1 à ð row vector has these numbers arranged in a row:
ð = [ ð1 ð2 ð3 ⯠ðð ]
⢠A ð à 1 column vector arranges the numbers in a column:
ð =â¡â¢â¢â¢â£
ð1ð2â®
ðð
â€â¥â¥â¥âŠ
⢠Convention weâll assume that a vector is column vector andvectors will be written with lowercase bold lettering (ð)
Vector examples
⢠Vector of all covariates for a particular unit ð:
ð±ð =â¡â¢â¢â¢â¢â¢â£
1ð¥ð1ð¥ð2â®
ð¥ðð
â€â¥â¥â¥â¥â¥âŠ
⢠For the Nunn-Wantchekon data, we might have:
ð±ð =â¡â¢â¢â¢â£
1exportsð
ageðmaleð
â€â¥â¥â¥âŠ
Matrices
⢠A matrix is just a rectangular array of numbers.⢠We say that a matrix is ð à ð (âð by ðâ) if it has ð rows and ð
columns.⢠Uppercase bold denotes a matrix:
ð =â¡â¢â¢â¢â£
ð11 ð12 ⯠ð1ðð21 ð22 ⯠ð2ðâ® â® â± â®
ðð1 ðð2 ⯠ððð
â€â¥â¥â¥âŠ
⢠Generic entry: ððð where this is the entry in row ð and column ð
Examples of matrices
⢠One example of a matrix that weâll use a lot is the designmatrix, which has a column of ones, and then each of thesubsequent columns is each independent variable in theregression.
ð =â¡â¢â¢â¢â£
1 exports1 age1 male11 exports2 age2 male2â® â® â® â®1 exportsð ageð maleð
â€â¥â¥â¥âŠ
Design matrix in R
head(model.matrix(mod), 8)
##   (Intercept) exports age male urban_dum malaria_ecology
## 1           1     855  40    0         0           28.15
## 2           1     855  25    1         0           28.15
## 3           1     855  38    1         1           28.15
## 4           1     855  37    0         1           28.15
## 5           1     855  31    1         0           28.15
## 6           1     855  45    0         0           28.15
## 7           1     855  20    1         0           28.15
## 8           1     855  31    0         0           28.15
dim(model.matrix(mod))
## [1] 20325 6
2/ Matrix Operations
Transpose
⢠The transpose of a matrix ð is the matrix created byswitching the rows and columns of the data and is denoted ðâ².
⢠ðth column of ð becomes the ðth row of ðâ²:
ð = â¡â¢â¢â£
ð11 ð12ð21 ð22ð31 ð32
â€â¥â¥âŠ
ðâ² = [ ð11 ð21 ð31ð12 ð22 ð32
]
⢠If ð is ð à ð, then ðâ² will be ð à ð.⢠Also written ðð
Transposing vectors
⢠Transposing will turn a ð à 1 column vector into a 1 à ð rowvector and vice versa:
ð±ð =â¡â¢â¢â¢â¢â¢â£
1ð¥ð1ð¥ð2â®
ð¥ðð
â€â¥â¥â¥â¥â¥âŠ
ð±â²ð = [ 1 ð¥ð1 ð¥ð2 ⯠ð¥ðð ]
Transposing in R
a <- matrix(1:6, ncol = 3, nrow = 2)
a

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

t(a)

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
Write matrices as vectors

• A matrix is just a collection of vectors (row or column)
• As a row vector:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \end{bmatrix}$$

with row vectors

$$\mathbf{a}_1' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix} \qquad \mathbf{a}_2' = \begin{bmatrix} a_{21} & a_{22} & a_{23} \end{bmatrix}$$

• Or we can define it in terms of column vectors:

$$\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 \end{bmatrix}$$

where $\mathbf{b}_1$ and $\mathbf{b}_2$ represent the columns of $\mathbf{B}$.
• $j$ subscripts the columns of a matrix: $\mathbf{x}_j$
• $i$ and $t$ will be used for rows: $\mathbf{x}_i'$
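• This row/column view corresponds to matrix indexing in R; a quick sketch (the matrix B is invented for illustration):

B <- matrix(1:6, nrow = 3, ncol = 2)
B[1, ]  ## first row of B (the row vector b'_1, returned as a plain vector)
B[, 2]  ## second column of B (the column vector b_2)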
Design matrix
⢠Design matrix as a series of row vectors:
ð =â¡â¢â¢â¢â£
1 exports1 age1 male11 exports2 age2 male2â® â® â® â®1 exportsð ageð maleð
â€â¥â¥â¥âŠ
=â¡â¢â¢â¢â£
ð±â²1
ð±â²2â®
ð±â²ð
â€â¥â¥â¥âŠ
⢠Design matrix as a series of column vectors:
ð = [ ð ð±1 ð±2 ⯠ð±ð ]
Addition and subtraction
⢠How do we add or subtract matrices and vectors?⢠First, the matrices/vectors need to be comformable, meaning
that the dimensions have to be the same.⢠Let ð and ð both be 2 à 2 matrices. Then, let ð = ð + ð,
where we add each cell together:
ð + ð = [ ð11 ð12ð21 ð22
] + [ ð11 ð12ð21 ð22
]
= [ ð11 + ð11 ð12 + ð12ð21 + ð21 ð22 + ð22
]
= [ ð11 ð12ð21 ð22
]
= ð
Scalar multiplication
⢠A scalar is just a single number: you can think of it sort oflike a 1 by 1 matrix.
⢠When we multiply a scalar by a matrix, we just multiply eachelement/cell by that scalar:
ðð = ð [ ð11 ð12ð21 ð22
] = [ ð Ã ð11 ð Ã ð12ð Ã ð21 ð Ã ð22
]
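• Both operations work elementwise in R; a minimal sketch with made-up 2 x 2 matrices:

A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)
A + B  ## conformable matrices: cell-by-cell addition
3 * A  ## scalar multiplication: every cell times 3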
3/ Linear model in matrix form
The linear model with new notation
⢠Remember that we wrote the linear model as the following forall ð â {1, ⊠, ð}:
ðŠð = ðœ0 + ð¥ððœ1 + ð§ððœ2 + ð¢ð
⢠Imagine we had an ð of 4. We could write out each formula:
ðŠ1 = ðœ0 + ð¥1ðœ1 + ð§1ðœ2 + ð¢1 (unit 1)ðŠ2 = ðœ0 + ð¥2ðœ1 + ð§2ðœ2 + ð¢2 (unit 2)ðŠ3 = ðœ0 + ð¥3ðœ1 + ð§3ðœ2 + ð¢3 (unit 3)ðŠ4 = ðœ0 + ð¥4ðœ1 + ð§4ðœ2 + ð¢4 (unit 4)
The linear model with new notation

$$y_1 = \beta_0 + x_1\beta_1 + z_1\beta_2 + u_1 \quad \text{(unit 1)}$$
$$y_2 = \beta_0 + x_2\beta_1 + z_2\beta_2 + u_2 \quad \text{(unit 2)}$$
$$y_3 = \beta_0 + x_3\beta_1 + z_3\beta_2 + u_3 \quad \text{(unit 3)}$$
$$y_4 = \beta_0 + x_4\beta_1 + z_4\beta_2 + u_4 \quad \text{(unit 4)}$$

• We can write this as:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}\beta_0 + \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\beta_1 + \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix}\beta_2 + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}$$

• Outcome is a linear combination of the $\mathbf{x}$, $\mathbf{z}$, and $\mathbf{u}$ vectors
Grouping things into matrices
⢠Can we write this in a more compact form? Yes! Let ð and ð·be the following:
ð(4Ã3)
=â¡â¢â¢â¢â£
1 ð¥1 ð§11 ð¥2 ð§21 ð¥3 ð§31 ð¥4 ð§4
â€â¥â¥â¥âŠ
ð·(3Ã1)
= â¡â¢â¢â£
ðœ0ðœ1ðœ2
â€â¥â¥âŠ
Matrix multiplication by a vector
⢠We can write this more compactly as a matrix(post-)multiplied by a vector:
â¡â¢â¢â¢â£
1111
â€â¥â¥â¥âŠ
ðœ0 +â¡â¢â¢â¢â£
ð¥1ð¥2ð¥3ð¥4
â€â¥â¥â¥âŠ
ðœ1 +â¡â¢â¢â¢â£
ð§1ð§2ð§3ð§4
â€â¥â¥â¥âŠ
ðœ2 = ðð·
⢠Multiplication of a matrix by a vector is just the linearcombination of the columns of the matrix with the vectorelements as weights/coefficients.
⢠And the left-hand side here only uses scalars times vectors,which is easy!
General matrix by vector multiplication
⢠ð is a ð à ð matrix⢠ð is a ð à 1 column vector⢠Columns of ð have to match rows of ð⢠Let ðð be the ðth column of ðŽ. Then we can write:
ð(ðÃ1)
= ðð = ð1ð1 + ð2ð2 + ⯠+ ðððð
⢠ð is linear combination of the columns of ð
Back to regression
⢠ð is the ð à (ð + 1) design matrix of independent variables⢠ð· be the (ð + 1) à 1 column vector of coefficients.⢠ðð· will be ð à 1:
ðð· = ðœ0 + ðœ1ð±1 + ðœ2ð±2 + ⯠+ ðœðð±ð
⢠Thus, we can compactly write the linear model as thefollowing:
ð²(ðÃ1)
= ðð·(ðÃ1)
+ ð®(ðÃ1)
Inner product
⢠The inner (or dot) product of a two column vectors ð and ð(of equal dimension, ð à 1):
âšð, ðâ© = ðâ²ð = ð1ð1 + ð2ð2 + ⯠+ ðððð
⢠If ðâ²ð = 0 we say that the two vectors are orthogonal.⢠With ð = ðð, we can write the entries of ð as inner products:
ðð = ðâ²ðð
⢠If ð±â²ð is the ðth row of ð, then we write the linear model as:
ðŠð = ð±â²ðð· + ð¢ð
= ðœ0 + ð¥ð1ðœ1 + ð¥ð2ðœ2 + ⯠+ ð¥ðððœð + ð¢ð
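• In R, t(a) %*% b (or sum(a * b)) computes the inner product; a quick sketch with made-up vectors:

a <- c(1, 2, 3)
b <- c(4, -2, 0)
t(a) %*% b  ## 1*4 + 2*(-2) + 3*0 = 0
sum(a * b)  ## same inner product, returned as a scalar
## since a'b = 0, these two vectors are orthogonal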
4/ OLS in matrix form
Matrix multiplication
⢠What if, instead of a column vector ð, we have a matrix ðwith dimensions ð à ð.
⢠How do we do multiplication like so ð = ðð?⢠Each column of the new matrix is just matrix by vector
multiplication:
ð = [ð1 ð2 ⯠ðð] ðð = ððð
⢠Thus, each column of ð is a linear combination of thecolumns of ð.
Properties of matrix multiplication
⢠Matrix multiplication is not commutative: ðð â ðð⢠It is associative and distributive:
ð(ðð) = (ðð)ðð(ð + ð) = ðð + ðð
⢠The transpose: (ðð)â² = ðâ²ðâ²
Square matrices and the diagonal
⢠A square matrix has equal numbers of rows and columns.⢠The identity matrix, ðð is a ð à ð square matrix, with 1s along
the diagonal and 0s everywhere else.
ð3 = â¡â¢â¢â£
1 0 00 1 00 0 1
â€â¥â¥âŠ
⢠The ð à ð identity matrix multiplied by any ð à ð matrixreturns the matrix:
ððð = ð
Identity matrix

• To get the diagonal of a matrix in R, use the diag() function:

b <- matrix(1:4, nrow = 2, ncol = 2)
b

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

diag(b)

## [1] 1 4

• diag() also creates identity matrices in R:

diag(3)

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
Multiple linear regression in matrix form
⢠Let ð· be the matrix of estimated regression coefficients and ᅵᅵbe the vector of fitted values:
ð· =â¡â¢â¢â¢â£
ðœ0ðœ1â®
ðœð
â€â¥â¥â¥âŠ
ᅵᅵ = ðð·
⢠It might be helpful to see this again more written out:
ᅵᅵ =â¡â¢â¢â¢â£
ðŠ1ðŠ2â®ðŠð
â€â¥â¥â¥âŠ
= ðð· =â¡â¢â¢â¢â£
1ðœ0 + ð¥11ðœ1 + ð¥12ðœ2 + ⯠+ ð¥1ððœð1ðœ0 + ð¥21ðœ1 + ð¥22ðœ2 + ⯠+ ð¥2ððœð
â®1ðœ0 + ð¥ð1ðœ1 + ð¥ð2ðœ2 + ⯠+ ð¥ðððœð
â€â¥â¥â¥âŠ
Residuals
⢠We can easily write the residuals in matrix form:
ᅵᅵ = ð² â ðð·
⢠The norm or length of a vector generalizes Euclidean distanceand is just the square root of the squared entries,
âðâ = âð21 + ð2
2 + ⯠+ ð2ð
⢠We can write the norm in terms of inner product: âðâ2 = ðâ²ð⢠Thus we can compactly write the sum of the squared residuals
as:âᅵᅵâ2 = ᅵᅵâ²ï¿œï¿œ
=ð
âð=1
ð¢2ð
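• A small check of the inner-product form in R, assuming mod is the Nunn-Wantchekon fit from earlier:

e <- residuals(mod)  ## the vector of residuals u-hat
sum(e^2)             ## sum of squared residuals
t(e) %*% e           ## the same number, via the inner product e'e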
OLS estimator in matrix form
⢠OLS still minimizes sum of the squared residuals
arg minðââð+1
âᅵᅵâ2 = arg minðââð+1
âð² â ððâ2
⢠Take (matrix) derivatives, set equal to 0⢠Resulting first order conditions:
ðâ²(ð² â ðð·) = 0
⢠Rearranging:ðâ²ðð· = ðâ²ð²
⢠In order to isolate ð·, we need to move the ðâ²ð term to theother side of the equals sign.
⢠Weâve learned about matrix multiplication, but what aboutmatrix âdivisionâ?
Scalar inverses
⢠What is division in its simplest form? 1ð is the value such thatð1ð = 1:
⢠For some algebraic expression: ðð¢ = ð, letâs solve for ð¢:
1ððð¢ = 1
ðð
ð¢ = ðð
⢠Need a matrix version of this: 1ð .
Matrix inverses
⢠Definition If it exists, the inverse of square matrix ð, denotedðâ1, is the matrix such that ðâ1ð = ð.
⢠We can use the inverse to solve (systems of) equations:
ðð® = ððâððð® = ðâðð
ðð® = ðâððð® = ðâðð
⢠If the inverse exists, we say that ð is invertible or nonsingular.
Back to OLS
⢠Letâs assume, for now, that the inverse of ðâ²ð exists (weâllcome back to this)
⢠Then we can write the OLS estimator as the following:
ð· = (ðâ²ð)â1ðâ²ð²
⢠Memorize this: âex prime ex inverse ex prime yâ sear it intoyour soul.
Understanding check
⢠Suppose ð² is ð à 1 and ð is ð à (ð + 1).⢠What are the dimensions of ðâ²ð?⢠True/False: ðâ²ð is symmetric.
ⶠNote: A square matrix is symmetric if ð = ðâ².
⢠What are the dimensions of (ðâ²ð)â1?⢠What are the dimensions of ðâ²ð²?⢠What are the dimensions of ð·?
Implications of OLS
⢠We can generalize some mechanical results about OLS.⢠The independent variables are orthogonal to the residuals:
ðâ²ï¿œï¿œ = ðâ²(ð² â ðð·) = 0
⢠The fitted values are orthogonal to the residuals:
ᅵᅵâ²ï¿œï¿œ = (ðð·)â²ï¿œï¿œ = ð·â²ðâ²ï¿œï¿œ = 0
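• We can verify both orthogonality results numerically (up to floating-point error), again assuming mod is the fit from earlier:

e <- residuals(mod)
crossprod(model.matrix(mod), e)  ## X'u-hat: every entry is essentially 0
crossprod(fitted(mod), e)        ## y-hat'u-hat: also essentially 0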
OLS by hand in R
$$\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
⢠First we need to get the design matrix and the response:
X <- model.matrix(trust_neighbors ~ exports + age + male+ urban_dum + malaria_ecology, data = nunn)
dim(X)
## [1] 20325 6
## model.frame always puts the response in the first columny <- model.frame(trust_neighbors ~ exports + age + male
+ urban_dum + malaria_ecology, data = nunn)[,1]length(y)
## [1] 20325
OLS by hand in R
$$\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

• Use solve() for inverses and %*% for matrix multiplication:

solve(t(X) %*% X) %*% t(X) %*% y

##      (Intercept)   exports      age    male urban_dum
## [1,]       1.503 -0.001021 0.005045 0.02784   -0.2739
##      malaria_ecology
## [1,]         0.01941

coef(mod)

##     (Intercept)         exports             age            male
##        1.503037       -0.001021        0.005045        0.027837
##       urban_dum malaria_ecology
##       -0.273872        0.019411
Intuition for the OLS in matrix form
$$\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

• What's the intuition here?
• "Numerator" $\mathbf{X}'\mathbf{y}$: roughly composed of the covariances between the columns of $\mathbf{X}$ and $\mathbf{y}$
• "Denominator" $\mathbf{X}'\mathbf{X}$: roughly composed of the sample variances and covariances of the variables within $\mathbf{X}$
• Thus, we have something like:

$$\widehat{\boldsymbol{\beta}} \approx (\text{variance of } \mathbf{X})^{-1}(\text{covariance of } \mathbf{X} \text{ and } \mathbf{y})$$

• This is a rough sketch and isn't strictly true, but it can provide intuition.
5/ OLS inference in matrix form
Random vectors
⢠A random vector is a vector of random variables:
ð±ð = [ ð¥ð1ð¥ð2
]
⢠Here, ð±ð is a random vector and ð¥ð1 and ð¥ð2 are randomvariables.
⢠When we talk about the distribution of ð±ð, we are talkingabout the joint distribution of ð¥ð1 and ð¥ð2.
Distribution of random vectors
⢠Expectation of random vectors:
ðŒ[ð±ð] = [ ðŒ[ð¥ð1]ðŒ[ð¥ð2] ]
⢠Variance of random vectors:
ð[ð±ð] = [ ð[ð¥ð1] Cov[ð¥ð1, ð¥ð2]Cov[ð¥ð1, ð¥ð2] ð[ð¥ð2] ]
⢠Properties of this variance-covariance matrix:ⶠif ð is constant, then ð[ðâ²ð±ð] = ðâ²ð[ð±ð]ð.ⶠif matrix ð and vector ð are constant, then
ð[ðð±ð + ð] = ðð[ð±ð]ðâ²
Most general OLS assumptions
1. Linearity: $y_i = \mathbf{x}_i'\boldsymbol{\beta} + u_i$
2. Random/iid sample: $(y_i, \mathbf{x}_i')$ are an iid sample from the population
3. No perfect collinearity: $\mathbf{X}$ is an $n \times (k+1)$ matrix with rank $k+1$
4. Zero conditional mean: $\mathbb{E}[u_i \mid \mathbf{x}_i] = 0$
5. Homoskedasticity: $\mathbb{V}[u_i \mid \mathbf{x}_i] = \sigma_u^2$
6. Normality: $u_i \mid \mathbf{x}_i \sim N(0, \sigma_u^2)$
Matrix rank

• Definition: The rank of a matrix is the maximum number of linearly independent columns.
• Definition: The columns of a matrix $\mathbf{X}$ are linearly independent if $\mathbf{X}\mathbf{c} = 0$ if and only if $\mathbf{c} = 0$:

$$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_k\mathbf{x}_k = 0$$

• Example violation: one column is a linear function of the others (see the R sketch below).
  ▶ 3 covariates with $\mathbf{x}_1 = \mathbf{x}_2 + \mathbf{x}_3$:

$$0 = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = c_1(\mathbf{x}_2 + \mathbf{x}_3) + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = (c_1 + c_2)\mathbf{x}_2 + (c_1 + c_3)\mathbf{x}_3$$

• …which equals 0 when $c_1 = -c_2 = -c_3$, so the columns are not linearly independent!
Rank and matrix inversion
⢠If ð is ð à (ð + 1) has rank ð + 1, then all of its columns arelinearly independent
ⶠGeneralization of no perfect collinearity to arbitrary ð.
⢠ð has rank ð + 1â (ðâ²ð) has rank ð + 1⢠If a square (ð + 1) à (ð + 1) matrix has rank ð + 1, then it is
invertible.⢠ð has rank ð + 1â (ðâ²ð)â1 exists and is unique.
Zero conditional mean error
⢠Combining zero mean conditional error and iid we have:
ðŒ[ð¢ð|ð] = ðŒ[ð¢ð|ð±ð] = 0
⢠Stacking these into the vector of errors:
ðŒ[ð®|ð] =â¡â¢â¢â¢â£
ðŒ[ð¢1|ð]ðŒ[ð¢2|ð]
â®ðŒ[ð¢ð|ð]
â€â¥â¥â¥âŠ
=â¡â¢â¢â¢â£
00â®0
â€â¥â¥â¥âŠ
Expectation of OLS

• Useful to write OLS as:

$$\widehat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \mathbf{u}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}$$

• Under assumptions 1-4, OLS is conditionally unbiased for $\boldsymbol{\beta}$:

$$\mathbb{E}[\widehat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbb{E}[\mathbf{u} \mid \mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{0} = \boldsymbol{\beta}$$

• Implies that OLS is unconditionally unbiased: $\mathbb{E}[\widehat{\boldsymbol{\beta}}] = \boldsymbol{\beta}$
Variance of OLS
⢠What about ð[ð·|ð]?⢠Using some facts about variances and matrices, can derive:
ð[ð·|ð] = (ðâ²ð)â1 ðâ²ð[ð®|ð]ð (ðâ²ð)â1
⢠What the covariance matrix of the errors, ð[ð®|ð]?
ð[ð®|ð] =â¡â¢â¢â¢â£
ð[ð¢1|ð] cov[ð¢1, ð¢2|ð] ⊠cov[ð¢1, ð¢ð|ð]cov[ð¢2, ð¢1|ð] ð[ð¢2|ð] ⊠cov[ð¢2, ð¢ð|ð]
â® â±cov[ð¢ð, ð¢1|ð] cov[ð¢ð, ð¢2|ð] ⊠ð[ð¢ð|ð]
â€â¥â¥â¥âŠ
⢠This matrix is symmetric since cov(ð¢ð, ð¢ð) = cov(ð¢ð, ð¢ð)
Homoskedasticity

• By homoskedasticity and iid, for any units $s$ and $t$:
  ▶ $\mathbb{V}[u_s \mid \mathbf{X}] = \mathbb{V}[u_s \mid \mathbf{x}_s] = \sigma_u^2$ (constant variance)
  ▶ $\text{Cov}[u_s, u_t \mid \mathbf{X}] = 0$ for $s \neq t$ (uncorrelated errors)
• Then, the covariance matrix of the errors is simply:

$$\mathbb{V}[\mathbf{u} \mid \mathbf{X}] = \sigma_u^2\mathbf{I}_n = \begin{bmatrix} \sigma_u^2 & 0 & \ldots & 0 \\ 0 & \sigma_u^2 & \ldots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & \sigma_u^2 \end{bmatrix}$$

• Thus, we have the following:

$$\mathbb{V}[\widehat{\boldsymbol{\beta}} \mid \mathbf{X}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbb{V}[\mathbf{u} \mid \mathbf{X}]\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma_u^2\mathbf{I}_n)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma_u^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma_u^2(\mathbf{X}'\mathbf{X})^{-1}$$
Sampling variance for OLS estimates

• Under assumptions 1-5, the sampling variance of the OLS estimator can be written in matrix form as the following:

$$\mathbb{V}[\widehat{\boldsymbol{\beta}} \mid \mathbf{X}] = \sigma_u^2(\mathbf{X}'\mathbf{X})^{-1}$$

• This symmetric matrix looks like this:

$$\begin{bmatrix} \mathbb{V}[\widehat{\beta}_0 \mid \mathbf{X}] & \text{Cov}[\widehat{\beta}_0, \widehat{\beta}_1 \mid \mathbf{X}] & \cdots & \text{Cov}[\widehat{\beta}_0, \widehat{\beta}_k \mid \mathbf{X}] \\ \text{Cov}[\widehat{\beta}_0, \widehat{\beta}_1 \mid \mathbf{X}] & \mathbb{V}[\widehat{\beta}_1 \mid \mathbf{X}] & \cdots & \text{Cov}[\widehat{\beta}_1, \widehat{\beta}_k \mid \mathbf{X}] \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}[\widehat{\beta}_0, \widehat{\beta}_k \mid \mathbf{X}] & \text{Cov}[\widehat{\beta}_k, \widehat{\beta}_1 \mid \mathbf{X}] & \cdots & \mathbb{V}[\widehat{\beta}_k \mid \mathbf{X}] \end{bmatrix}$$
Inference in the general setting

• Under assumptions 1-5, in large samples:

$$\frac{\widehat{\beta}_j - \beta_j}{\widehat{\text{se}}[\widehat{\beta}_j]} \sim N(0, 1)$$

• In small samples, under assumptions 1-6:

$$\frac{\widehat{\beta}_j - \beta_j}{\widehat{\text{se}}[\widehat{\beta}_j]} \sim t_{n-(k+1)}$$

• Thus, under the null of $H_0: \beta_j = 0$, we know that

$$\frac{\widehat{\beta}_j}{\widehat{\text{se}}[\widehat{\beta}_j]} \sim t_{n-(k+1)}$$

• Here, the estimated SEs come from:

$$\widehat{\mathbb{V}}[\widehat{\boldsymbol{\beta}}] = \widehat{\sigma}_u^2(\mathbf{X}'\mathbf{X})^{-1} \qquad \widehat{\sigma}_u^2 = \frac{\widehat{\mathbf{u}}'\widehat{\mathbf{u}}}{n - (k+1)}$$
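• Continuing the by-hand computation, a sketch of these SE formulas in R (X and y as constructed on the "OLS by hand" slides):

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
e <- y - X %*% beta_hat                       ## residuals u-hat
sigma2_hat <- sum(e^2) / (nrow(X) - ncol(X))  ## u-hat'u-hat / (n - (k+1))
V_hat <- sigma2_hat * solve(t(X) %*% X)       ## estimated covariance matrix
sqrt(diag(V_hat))                             ## should match the SEs below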
Covariance matrix in R

• We can access this estimated covariance matrix, $\widehat{\sigma}_u^2(\mathbf{X}'\mathbf{X})^{-1}$, in R:

vcov(mod)

##                   (Intercept)    exports        age       male
## (Intercept)      0.0004766593  1.164e-07 -7.956e-06 -6.676e-05
## exports          0.0000001164  1.676e-09 -3.659e-10  7.283e-09
## age             -0.0000079562 -3.659e-10  2.231e-07 -7.765e-07
## male            -0.0000667572  7.283e-09 -7.765e-07  1.909e-04
## urban_dum       -0.0000965843 -4.861e-08  7.108e-07 -1.711e-06
## malaria_ecology -0.0000069094 -2.124e-08  2.324e-10 -1.017e-07
##                   urban_dum malaria_ecology
## (Intercept)      -9.658e-05      -6.909e-06
## exports          -4.861e-08      -2.124e-08
## age               7.108e-07       2.324e-10
## male             -1.711e-06      -1.017e-07
## urban_dum         2.061e-04       2.724e-09
## malaria_ecology   2.724e-09       7.590e-07
Standard errors from the covariance matrix

• Note that the diagonal entries are the variances, so the square root of the diagonal gives the standard errors:

sqrt(diag(vcov(mod)))

##     (Intercept)         exports             age            male
##      0.02183253      0.00004094      0.00047237      0.01381627
##       urban_dum malaria_ecology
##      0.01435491      0.00087123

coef(summary(mod))[, "Std. Error"]

##     (Intercept)         exports             age            male
##      0.02183253      0.00004094      0.00047237      0.01381627
##       urban_dum malaria_ecology
##      0.01435491      0.00087123
Nunn & Wantchekon
Wrapping up
⢠You have the full power of matrices.⢠Key to writing the OLS estimator and discussing higher level
concepts in regression and beyond.⢠Next week: diagnosing and fixing problems with the linear
model.