18.650 – Fundamentals of Statistics
6. Linear Regression
Goals
Consider two random variables X and Y. For example,

1. X is the amount of $ spent on Facebook ads and Y is the total conversion rate
2. X is the age of the person and Y is the number of clicks

Given two random variables (X, Y), we can ask the following questions:

- How do we predict Y from X?
- What are the error bars around this prediction?
- How many more conversions Y do we get for an additional dollar?
- Does the number of clicks even depend on age?
- What if X is a random vector? For example, X = (X1, X2), where X1 is the amount of $ spent on Facebook ads and X2 is the duration in days of the campaign.
Conversions vs. amount spent
Clicks vs. age
Modeling assumptions
(Xi, Yi), i = 1, . . . , n are i.i.d. from some unknown joint distribution IP.

IP can be described entirely by (assuming all of these exist):

- either the joint PDF h(x, y),
- or the marginal density of X, h(x) = ∫ h(x, y) dy, together with the conditional density

  h(y|x) = h(x, y) / h(x).

h(y|x) answers all our questions: it contains all the information about Y given X.
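As a quick numerical sanity check (a sketch using NumPy, with an assumed bivariate Gaussian joint density), each slice h(·|x) of the conditional density is itself a genuine probability density in y:

```python
import numpy as np

# Assumed example: standard bivariate normal with correlation rho,
# discretized on a grid so we can integrate numerically.
rho = 0.6
xs = np.linspace(-5, 5, 401)
ys = np.linspace(-5, 5, 401)
dy = ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys, indexing="ij")

# Joint PDF h(x, y)
h_joint = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))
h_joint /= 2 * np.pi * np.sqrt(1 - rho**2)

# Marginal h(x) = integral of h(x, y) over y (Riemann sum)
h_x = h_joint.sum(axis=1) * dy

# Conditional h(y|x) = h(x, y) / h(x): each x-slice integrates to 1 over y
h_cond = h_joint / h_x[:, None]
slice_mass = h_cond[200].sum() * dy         # at x = 0: total mass of h(.|0)
cond_mean = (ys * h_cond[300]).sum() * dy   # at x = 2.5: IE[Y | X = 2.5]
```

For this Gaussian example the conditional mean IE[Y|X = x] works out to ρx, which the numerical slice recovers.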
Partial modeling
We can also describe the distribution only partially, e.g., using

- The expectation of Y: IE[Y]
- The conditional expectation of Y given X = x: IE[Y|X = x]. The function

  x ↦ f(x) := IE[Y|X = x] = ∫ y h(y|x) dy

  is called the regression function.
- Other possibilities:
  - The conditional median: m(x) such that ∫_{−∞}^{m(x)} h(y|x) dy = 1/2
  - Conditional quantiles
  - Conditional variance (not informative about location)
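These partial summaries can be estimated directly from data, e.g., by binning X. A minimal sketch (NumPy; the linear data-generating process is assumed purely for illustration) estimating both the regression function and the conditional median:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.uniform(0.0, 10.0, n)
Y = 2.0 + 1.5 * X + rng.normal(0.0, 1.0, n)   # assumed model, illustration only

# Bin X and summarize Y within each bin: bin averages estimate the
# regression function f(x) = IE[Y | X = x]; bin medians estimate m(x).
bins = np.linspace(0.0, 10.0, 21)
centers = (bins[:-1] + bins[1:]) / 2
idx = np.digitize(X, bins) - 1
idx = np.clip(idx, 0, 19)                     # guard the right edge
f_hat = np.array([Y[idx == k].mean() for k in range(20)])
m_hat = np.array([np.median(Y[idx == k]) for k in range(20)])
```

With symmetric noise, the bin means and bin medians both track the line 2 + 1.5x, as the theory predicts (mean and median of a symmetric conditional distribution coincide).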
Conditional expectation and standard deviation
Conditional expectation
Conditional density and conditional quantiles
Conditional distribution: boxplots
Linear regression
We first focus on modeling the regression function f(x) = IE[Y|X = x].

- There are too many possible regression functions f (a nonparametric problem)
- It is useful to restrict to simple functions that are described by a few parameters
- Simplest choice: linear (or affine) functions

  f(x) = a + bx

Under this assumption, we talk about linear regression.
Nonparametric regression
Linear regression
Probabilistic analysis
- Let X and Y be two real random variables (not necessarily independent) with two moments and such that var(X) > 0.
- The theoretical linear regression of Y on X is the line x ↦ a∗ + b∗x, where

  (a∗, b∗) = argmin_{(a,b) ∈ IR²} IE[(Y − a − bX)²].

- Setting the partial derivatives to zero gives

  b∗ = cov(X, Y) / var(X),
  a∗ = IE[Y] − b∗ IE[X] = IE[Y] − (cov(X, Y) / var(X)) IE[X].
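A quick Monte Carlo sanity check (a sketch with hypothetical parameters): replacing the moments by sample averages over a large sample recovers a∗ and b∗, and agrees exactly with a direct least-squares line fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.normal(2.0, 1.5, n)
Y = 3.0 + 0.8 * X + rng.normal(0.0, 1.0, n)   # hypothetical linear truth

# Theoretical formulas with IE, cov, var replaced by their sample versions
b_star = np.cov(X, Y, ddof=0)[0, 1] / np.var(X)
a_star = Y.mean() - b_star * X.mean()

# The same line from a direct least-squares fit on the same sample
b_fit, a_fit = np.polyfit(X, Y, deg=1)
```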
Noise
Clearly the points are not exactly on the line x ↦ a∗ + b∗x if var(Y|X = x) > 0. The random variable ε = Y − (a∗ + b∗X) is called the noise and satisfies

  Y = a∗ + b∗X + ε,

with

- IE[ε] = 0 and
- cov(X, ε) = 0.
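Both properties follow in one line from the definitions of a∗ and b∗ (the first-order conditions of the minimization above):

```latex
\mathbb{E}[\varepsilon] = \mathbb{E}[Y] - a^* - b^*\,\mathbb{E}[X] = 0
  \quad \text{since } a^* = \mathbb{E}[Y] - b^*\,\mathbb{E}[X],
\\[4pt]
\operatorname{cov}(X, \varepsilon) = \operatorname{cov}(X, Y) - b^*\operatorname{var}(X) = 0
  \quad \text{since } b^* = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}.
```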
Statistical problem
In practice, a∗ and b∗ need to be estimated from data.

- Assume that we observe n i.i.d. random pairs (X1, Y1), . . . , (Xn, Yn) with the same distribution as (X, Y):

  Yi = a∗ + b∗Xi + εi.

- We want to estimate a∗ and b∗.
Least squares
Definition. The least squares estimator (LSE) of (a∗, b∗) is the minimizer of the sum of squared errors:

  ∑_{i=1}^n (Yi − a − bXi)².

The minimizer (â, b̂) is given by

  b̂ = ( (1/n) ∑ XiYi − X̄ Ȳ ) / ( (1/n) ∑ Xi² − X̄² ),
  â = Ȳ − b̂ X̄,

where X̄ and Ȳ denote the sample averages of the Xi and the Yi.
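A minimal numerical check of these formulas (a sketch; the data-generating line is assumed for illustration), compared against NumPy's built-in least-squares polynomial fit:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.uniform(0.0, 5.0, n)
Y = 1.0 - 2.0 * X + rng.normal(0.0, 0.5, n)   # assumed data-generating line

# LSE via the sample-average formulas
b_hat = ((X * Y).mean() - X.mean() * Y.mean()) / ((X**2).mean() - X.mean() ** 2)
a_hat = Y.mean() - b_hat * X.mean()

# Reference: numpy's degree-1 polynomial least-squares fit
b_ref, a_ref = np.polyfit(X, Y, deg=1)
```

The two computations solve the same minimization, so they agree up to floating-point error.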
Residuals
Multivariate regression
Yi = Xi⊤β∗ + εi, i = 1, . . . , n.

- Vector of explanatory variables or covariates: Xi ∈ IR^p (wlog, assume its first coordinate is 1).
- Response / dependent variable: Yi.
- β∗ = (a∗, b∗⊤)⊤; β∗₁ (= a∗) is called the intercept.
- {εi}_{i=1,...,n}: noise terms satisfying cov(Xi, εi) = 0.

Definition. The least squares estimator (LSE) of β∗ is the minimizer of the sum of squared errors:

  β̂ = argmin_{β ∈ IR^p} ∑_{i=1}^n (Yi − Xi⊤β)².
LSE in matrix form
- Let Y = (Y1, . . . , Yn)⊤ ∈ IR^n.
- Let X be the n × p matrix whose rows are X1⊤, . . . , Xn⊤ (X is called the design matrix).
- Let ε = (ε1, . . . , εn)⊤ ∈ IR^n (unobserved noise), so that

  Y = Xβ∗ + ε, with β∗ unknown.

- The LSE β̂ satisfies:

  β̂ = argmin_{β ∈ IR^p} ‖Y − Xβ‖₂².
Closed form solution
- Assume that rank(X) = p.
- Analytic computation of the LSE:

  β̂ = (X⊤X)⁻¹X⊤Y.

- Geometric interpretation of the LSE: Xβ̂ is the orthogonal projection of Y onto the subspace spanned by the columns of X:

  Xβ̂ = PY, where P = X(X⊤X)⁻¹X⊤.
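A short NumPy sketch (with hypothetical coefficients) illustrating the closed form and the projection interpretation; numerically, `np.linalg.lstsq` is preferred to forming (X⊤X)⁻¹ explicitly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # 1st column = 1
beta_true = np.array([1.0, 2.0, -0.5])                          # hypothetical
Y = X @ beta_true + rng.normal(0.0, 0.3, n)

# Closed-form LSE (solving the normal equations rather than inverting)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer from the numerically preferred routine
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Projection matrix P = X (X^T X)^{-1} X^T: symmetric, idempotent, and
# X beta_hat equals the orthogonal projection P Y of Y onto span(X)
P = X @ np.linalg.inv(X.T @ X) @ X.T
```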
Statistical inference
To make inference (confidence regions, tests) we need more assumptions.

Assumptions:

- The design matrix X is deterministic and rank(X) = p.
- The model is homoscedastic: ε1, . . . , εn are i.i.d.
- The noise vector ε is Gaussian:

  ε ∼ Nn(0, σ²In)

for some known or unknown σ² > 0.
Properties of the LSE

- LSE = MLE (under the Gaussian noise assumption).
- Distribution of β̂: β̂ ∼ Np(β∗, σ²(X⊤X)⁻¹).
- Quadratic risk of β̂: IE[‖β̂ − β∗‖₂²] = σ² tr((X⊤X)⁻¹).
- Prediction error: IE[‖Y − Xβ̂‖₂²] = σ²(n − p).
- Unbiased estimator of σ²: σ̂² = ‖Y − Xβ̂‖₂² / (n − p).

Theorem

- (n − p) σ̂²/σ² ∼ χ²_{n−p}.
- β̂ ⊥⊥ σ̂².
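These distributional claims are easy to check by simulation. A sketch (design and σ² are hypothetical) verifying that σ̂² is unbiased for σ², with fluctuations matching the scaled χ²_{n−p} law:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma2 = 50, 4, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed design
beta = np.array([1.0, -1.0, 0.5, 2.0])                          # hypothetical

sigma2_hats = []
for _ in range(2000):
    Y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = np.sum((Y - X @ beta_hat) ** 2)
    sigma2_hats.append(rss / (n - p))   # sigma2_hat for this replication

sigma2_hats = np.array(sigma2_hats)
# mean of sigma2_hats ~ sigma2; var ~ 2 sigma2^2 / (n - p), as for
# a sigma2/(n-p) * chi-squared_{n-p} random variable
```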
Significance tests

- Test whether the j-th explanatory variable is significant in the linear regression (1 ≤ j ≤ p).
- H0: β∗j = 0 vs. H1: β∗j ≠ 0.
- If γj is the j-th diagonal coefficient of (X⊤X)⁻¹ (γj > 0), then

  (β̂j − β∗j) / √(σ̂² γj) ∼ t_{n−p}.

- Let T_n^(j) = β̂j / √(σ̂² γj).
- Test with nonasymptotic level α ∈ (0, 1):

  R_{j,α} = { |T_n^(j)| > q_{α/2}(t_{n−p}) },

  where q_{α/2}(t_{n−p}) is the (1 − α/2)-quantile of t_{n−p}.
- We can also compute p-values.
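A sketch of these tests with NumPy/SciPy (the design and coefficients are assumed; the last coefficient is set to zero, so its test should usually fail to reject):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, 0.0])    # hypothetical; last covariate is null
Y = X @ beta_true + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
sigma2_hat = np.sum((Y - X @ beta_hat) ** 2) / (n - p)
gamma = np.diag(XtX_inv)                 # gamma_j: diagonal of (X^T X)^{-1}

T = beta_hat / np.sqrt(sigma2_hat * gamma)        # test statistics T_n^(j)
p_values = 2 * stats.t.sf(np.abs(T), df=n - p)    # two-sided p-values
alpha = 0.05
reject = np.abs(T) > stats.t.ppf(1 - alpha / 2, df=n - p)
```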
Bonferroni’s test
- Test whether a group of explanatory variables is significant in the linear regression.
- H0: β∗j = 0 for all j ∈ S vs. H1: β∗j ≠ 0 for some j ∈ S, where S ⊆ {1, . . . , p}.
- Bonferroni's test: R_{S,α} = ∪_{j ∈ S} R_{j, α/k}, where k = |S|.
- This test has nonasymptotic level at most α.
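A sketch of the Bonferroni rejection rule (the t-statistics here are assumed values, just to show how the individual threshold moves from level α to level α/k):

```python
import numpy as np
from scipy import stats

alpha, n, p = 0.05, 100, 6
S = [2, 3, 4]                         # hypothetical group of covariate indices
k = len(S)
T_S = np.array([0.5, -2.2, 2.9])      # assumed statistics T_n^(j), j in S

# Each individual test runs at level alpha / k, so the two-sided threshold
# is the (1 - alpha/(2k))-quantile of t_{n-p}; reject H0 if ANY test rejects.
threshold = stats.t.ppf(1 - alpha / (2 * k), df=n - p)
reject_group = bool(np.any(np.abs(T_S) > threshold))
```

Note how the per-coordinate threshold is strictly larger than the unadjusted q_{α/2}(t_{n−p}): the union bound pays for controlling the level of the whole group.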
Remarks
- Linear regression exhibits correlations, NOT causality.
- Normality of the noise: one can use goodness-of-fit tests to check whether the residuals ε̂i = Yi − Xi⊤β̂ are Gaussian.
- Deterministic design: if X is not deterministic, all of the above can be understood conditionally on X, provided the noise is assumed to be Gaussian conditionally on X.