Lecture 1a: Linear regression with one predictor variable
732G21/732A35/732G28 1
Lecture 1a:
Linear regression with one predictor variable
732G21 Sambandsmodeller, http://www.ida.liu.se/~732G21
One semester = Regression analysis + analysis of variance (teacher: Lotta Hallberg)
732G28 Regression methods, http://www.ida.liu.se/~732G28
Half a semester = Regression analysis
732A35 Linear statistical models, http://www.ida.liu.se/~732A22
Almost one semester = Regression analysis + analysis of variance (teacher: Lotta Hallberg)
Course structure
Course language: English, but you may use Swedish
We use It’s learning (accessed via Student portal) (show…)
9 Lectures
8 computer labs. Deadlines are around 5 days after each lab ends
8 lessons = I solve problems on the whiteboard + lab discussion
One written final exam
Course book: Kutner, M.H., Nachtsheim, C.J., Neter, J. and Li, W. Applied Linear Statistical Models with Student Data CD, 5th Edition, ISBN 0073108742.
Course structure (regression part)
Linear statistical models are widely used in
◦ Business
◦ Economics
◦ Engineering
◦ Social, biological sciences
◦ Etc.
Example: A database contains the prices of houses sold in Linköping in 2009, together with their age, size and other parameters.
◦ Given the parameters of a new house, determine its approximate market price
◦ Determine reasonable price bounds
Regression analysis
Analysis of databases
Observations (records, cases) in rows
Variables in columns
◦ Explanatory variables (predictors, inputs) Xi
◦ Response Y, we assume Y=f(X1,…,Xn)
In this lecture, models with only one explanatory variable
What we analyse
No   Area (X1)   Age (X2)   Price (Y)
1    320         14         2,530,000
2    210         1          1,800,000
…    …           …          …
Real data can seldom be described exactly by a functional relation such as Y=βX (because of observation errors, missing inputs, etc.)
Statistical relation and functional relation
Example: Age and salary for a sample of eight persons from a company.
Age   Salary
21    17
32    30
40    27
56    35
61    44
55    38
39    36
33    25
[Figure: scatterplot of Salary (y) against Age (x)]
The relation shown is almost linear
Linear regression analysis: find a linear function that is as close as possible to the data
Statistical relation and functional relation
[Figure: scatterplot of Salary (y) against Age (x) with the fitted line y = 0.5471x + 8.4545]
For each X, there is a probability distribution P(Y=y|X=x) of Y
The aim is to find a regression function E(Y|X=x)
Regression models
Construction of regression models
◦ Selection of predictor variables (variance reduction)
◦ Functional form (from theory, approximation)
◦ Domain of the model
Software
◦ MINITAB
◦ SAS
◦ SPSS
◦ Matlab
◦ Excel
Regression models
Formal statement
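$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
◦ $Y_i$ is the i-th response value
◦ $\beta_0$, $\beta_1$ are the model parameters (regression parameters: intercept and slope)
◦ $X_i$ is the i-th predictor value
◦ $\varepsilon_i$ are i.i.d. random variables with expectation zero and variance $\sigma^2$
Simple linear model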
Features (show…)
◦ $E(Y_i) = \beta_0 + \beta_1 X_i$
◦ $\sigma^2(Y_i) = \sigma^2$
◦ All $Y_i$ and $Y_j$, $i \neq j$, are uncorrelated
Meaning of regression parameters
◦ $\beta_0$: expected response at X=0
◦ $\beta_1$: change in $E(Y)$ per unit increase in X
Simple linear model
Given data set $S = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$
Method of least squares:
◦ Observed response: $Y_i$
◦ Estimated response: $\beta_0 + \beta_1 X_i$
◦ Deviation: $Y_i - \beta_0 - \beta_1 X_i$
The regression fit is good when all deviations are small (see picture), so we minimize the sum of squared deviations
$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
Estimation of regression function
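How to find the minimum of Q?
$$\frac{\partial Q}{\partial \beta_0} = 0, \qquad \frac{\partial Q}{\partial \beta_1} = 0$$
Estimators of $\beta_0$ and $\beta_1$:
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$b_0 = \bar{Y} - b_1 \bar{X}$$
Estimation of regression function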
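The step from the two conditions on Q to the estimators is not spelled out on the slide. One way to fill it in is sketched below, writing $b_0$, $b_1$ for the values of $\beta_0$, $\beta_1$ that minimize Q. This is the standard textbook argument, not necessarily the route taken in the lecture.

```latex
\begin{align*}
\frac{\partial Q}{\partial \beta_0} &= -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0, \\
\frac{\partial Q}{\partial \beta_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0.
\end{align*}
% Normal equations for the minimizing values b_0, b_1:
\begin{align*}
\sum_{i=1}^{n} Y_i &= n b_0 + b_1 \sum_{i=1}^{n} X_i, \\
\sum_{i=1}^{n} X_i Y_i &= b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2.
\end{align*}
% The first equation gives b_0; substituting it into the second gives b_1:
\begin{align*}
b_0 &= \bar{Y} - b_1 \bar{X}, \\
b_1 &= \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}
     = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.
\end{align*}
```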
Exercise (for salary data, MINITAB):
1. Make a scatterplot (Scatterplot…, with and without the regression line)
2. Perform regression using ”Regression…”
3. Perform regression using ”Fitted line plot..”
4. Calculate the coefficients by hand
Estimation of regression function
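The lecture uses MINITAB for this exercise. As a cross-check of the "by hand" step, here is a minimal Python sketch that applies the least squares formulas for b0 and b1 to the eight age and salary pairs from the example. The function name `least_squares` and the variable names are illustrative choices, not part of the course material.

```python
# Salary data from the lecture example (eight persons).
ages = [21, 32, 40, 56, 61, 55, 39, 33]
salaries = [17, 30, 27, 35, 44, 38, 36, 25]


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products and sum of squares around the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = least_squares(ages, salaries)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Up to rounding, the result should agree with the fitted line y = 0.5471x + 8.4545 shown in the lecture plots.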
Estimation of regression function
[Figure: scatterplot of Salary (y) against Age (x) with the fitted line y = 0.5471x + 8.4545]
Gauss-Markov theorem
Estimators b0 and b1 are unbiased and have minimum variance among all unbiased linear estimators
Unbiased: bias = E(b0) - β0 = 0, i.e. E(b0) = β0. Analogously, E(b1) = β1
Show illustration…
Estimation of regression function
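Unbiasedness can be illustrated by simulation. The sketch below assumes "true" parameter values (BETA0, BETA1, SIGMA) and normally distributed errors. These are illustrative choices and do not come from the lecture. It repeatedly generates responses from the simple linear model for fixed predictor values, fits the line each time, and averages the estimates. It illustrates only the unbiasedness part of the theorem, not the minimum variance part.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumed "true" model for the simulation (illustrative values, not from the lecture).
BETA0, BETA1, SIGMA = 8.0, 0.5, 4.0
x = [21, 32, 40, 56, 61, 55, 39, 33]  # predictor values are kept fixed


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


n_rep = 2000
b0_draws, b1_draws = [], []
for _ in range(n_rep):
    # New responses from Y_i = beta0 + beta1 * X_i + eps_i, with normal errors (assumption).
    y = [BETA0 + BETA1 * xi + random.gauss(0.0, SIGMA) for xi in x]
    b0, b1 = least_squares(x, y)
    b0_draws.append(b0)
    b1_draws.append(b1)

mean_b0 = sum(b0_draws) / n_rep
mean_b1 = sum(b1_draws) / n_rep
print(f"average b0 = {mean_b0:.3f} (true value {BETA0})")
print(f"average b1 = {mean_b1:.3f} (true value {BETA1})")
```

Individual estimates vary from sample to sample, but their averages should lie close to the assumed true values.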
Mean (expected response)
$$E(Y) = \beta_0 + \beta_1 X$$
Point estimator of mean response (fitted value)
$$\hat{Y} = b_0 + b_1 X$$
Residuals
$$e_i = Y_i - \hat{Y}_i$$
Estimation of regression function
Plot of residuals (obtain it with MINITAB)
Estimation of regression function
[Figure: residuals plotted against Age]
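Properties of residuals
1. $\sum_{i=1}^{n} e_i = 0$ (because $\partial Q / \partial \beta_0 = 0$)
2. $\sum_{i=1}^{n} e_i^2$ is minimum possible
3. $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i$ (because of 1)
4. $\sum_{i=1}^{n} X_i e_i = 0$, $\sum_{i=1}^{n} \hat{Y}_i e_i = 0$ (can be shown)
5. Regression line always goes through $(\bar{X}, \bar{Y})$
Estimation of regression function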
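These properties can be checked numerically on the salary data. The following is a minimal sketch in plain Python. The helper names (`least_squares`, `sse_at`) and the perturbation size 0.1 are illustrative choices. Because of floating-point rounding, the sums come out as very small numbers rather than exact zeros.

```python
ages = [21, 32, 40, 56, 61, 55, 39, 33]
salaries = [17, 30, 27, 35, 44, 38, 36, 25]
n = len(ages)


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def sse_at(c0, c1):
    """Sum of squared deviations Q for an arbitrary line c0 + c1 * x."""
    return sum((yi - c0 - c1 * xi) ** 2 for xi, yi in zip(ages, salaries))


b0, b1 = least_squares(ages, salaries)
fitted = [b0 + b1 * xi for xi in ages]
residuals = [yi - fi for yi, fi in zip(salaries, fitted)]

checks = {
    "1. sum of e_i": sum(residuals),
    "3. sum of Y_i minus sum of fitted values": sum(salaries) - sum(fitted),
    "4a. sum of X_i * e_i": sum(xi * ei for xi, ei in zip(ages, residuals)),
    "4b. sum of fitted_i * e_i": sum(fi * ei for fi, ei in zip(fitted, residuals)),
    "5. Ybar minus line at Xbar": sum(salaries) / n - (b0 + b1 * sum(ages) / n),
}
for name, value in checks.items():
    print(f"{name}: {value:.2e}")

# Property 2: moving the line away from (b0, b1) increases the sum of squares.
print(sse_at(b0, b1) < sse_at(b0 + 0.1, b1), sse_at(b0, b1) < sse_at(b0, b1 + 0.1))
```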
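Estimate of the variance of a single population (sample variance):
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
In regression, we compute $s^2$ using the residuals (look at the residual plot):
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
$$s^2 = MSE = \frac{SSE}{n-2}$$
Estimation of error term variance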
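Why divide by n-2? Because then $E(MSE) = \sigma^2$, i.e. MSE is an unbiased estimator of $\sigma^2$
Important: in general, the unbiased estimator is
$$s^2 = MSE = \frac{SSE}{n-d}$$
where d is the number of estimated model parameters (the degrees of freedom used up by the fit), so SSE has n-d degrees of freedom
Example: compute the residuals, SSE and MSE, and find them in the MINITAB output
Estimation of error term variance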
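For the example, the residuals, SSE and MSE can also be computed outside MINITAB. The sketch below does this for the salary data in plain Python, with d = 2 parameters (intercept and slope). The variable names are illustrative, and the numbers are meant to be compared with the MINITAB output rather than to replace it.

```python
ages = [21, 32, 40, 56, 61, 55, 39, 33]
salaries = [17, 30, 27, 35, 44, 38, 36, 25]
n = len(ages)
d = 2  # number of estimated parameters: intercept b0 and slope b1

# Least squares estimates.
x_bar = sum(ages) / n
y_bar = sum(salaries) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, salaries))
      / sum((x - x_bar) ** 2 for x in ages))
b0 = y_bar - b1 * x_bar

# Residuals e_i = Y_i - fitted value, then SSE and MSE.
residuals = [y - (b0 + b1 * x) for x, y in zip(ages, salaries)]
sse = sum(e ** 2 for e in residuals)
mse = sse / (n - d)   # estimate of the error variance sigma^2
s = mse ** 0.5        # estimate of sigma

for x, e in zip(ages, residuals):
    print(f"age {x}: residual {e:+.3f}")
print(f"SSE = {sse:.3f}, MSE = {mse:.3f}, s = {s:.3f}")
```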
Minitab
◦ Graph → Scatterplot
◦ Stat → Regression
◦ Stat → Fitted Line Plot
Simple regression using software
Course book, Ch. 1 up to page 27.
Reading