Every day is a new beginning in life. Every moment is a time for self-vigilance.
1
Every day is a new beginning in life. Every moment is a time for self-vigilance.
2
Simple Linear Regression
• Regression model
• Goodness of fit
• Model diagnosis
3
Hmm… How long is my armspan?
Goal: to predict the length of Armspan for a given Height
4
Armspan Data
HEIGHT  ARMSPAN
68.75   64.25
75.75   70.25
45.75   43.00
66.75   66.25
66.50   66.75
72.25   71.25
48.25   47.25
…
75.50   70.00
75.00   77.25
64.00   65.25
68.50   67.50
5
6
Review: Math Equation for a Line
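Y: the response variable
X: the explanatory variable
$$Y = \alpha + \beta X$$
[Figure: a straight line with X on the horizontal axis and Y on the vertical axis; braces mark a one-unit step in X and the corresponding change in Y]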
7
Regression Model
The regression line models the relationship between X and Y on average.
– Population regression line
– Least squares regression line
The math equation of a regression line is called the regression equation.
8
The Predicted Y Value
We use the regression line to estimate the average Y value for a specified X value and use this Y value to predict what Y value we might observe at this X value in the near future.
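This predicted Y value, denoted $\hat{Y}$ and pronounced "y hat," is the Y value on the regression line. So,
$$\hat{Y} = \hat{\alpha} + \hat{\beta} X \qquad \text{(regression equation)}$$
where $\hat{\alpha}$ and $\hat{\beta}$ are the estimated intercept and slope.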
9
The Usage of Regression Equation
Predict the value of Y for a given X value.
E.g., we wish to predict a lady's weight from her height.
** What is X? Y?
** Suppose the intercept $\alpha$ and slope $\beta$ are estimated as -205 and 5: $\hat{Y} = -205 + 5X$
** For ladies with HT of 60", their WT will be predicted as -205 + 5×60 = 95 pounds, the (estimated) average WT of all ladies with HT of 60".
10
Examples of the Predicted Y
• The predicted WT for a given HT: $\hat{Y} = -205 + 5X$
• The predicted armspan for a given height: $\hat{Y} = -3.73 + 1.04X$
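The two prediction equations can be evaluated directly. The sketch below is only an illustration: the helper name `predict` is hypothetical, and the armspan coefficients are the rounded values -3.73 and 1.04 quoted in the slides.

```python
def predict(intercept, slope, x):
    """Return the predicted Y ("y hat") on the line intercept + slope * x."""
    return intercept + slope * x


# Weight (pounds) from height (inches), using the estimates -205 and 5.
weight_at_60 = predict(-205, 5, 60)

# Armspan from height, using the rounded coefficients -3.73 and 1.04.
# The height 68.75 is the first row of the armspan data.
armspan_at_68_75 = predict(-3.73, 1.04, 68.75)

print(weight_at_60)
print(round(armspan_at_68_75, 2))
```

The first value reproduces the 95 pounds worked out for a height of 60".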
11
The Limitation of the Regression Equation
The regression equation cannot be used to predict the Y value for X values that are (far) beyond the range in which the data were observed.
E.g., given HT of 40", the regression equation will give us WT of -205 + 5×40 = -5 pounds!
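One way to guard against this kind of extrapolation is to refuse predictions outside the observed X range. The sketch below assumes a hypothetical observed height range of 55 to 75 inches for the weight example; the slides do not give the actual range, and the function name is also hypothetical.

```python
def predict_within_range(intercept, slope, x, x_min, x_max):
    """Predict Y only when x lies inside the observed range [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x={x} is outside the observed range [{x_min}, {x_max}]")
    return intercept + slope * x


# Hypothetical observed height range (inches); NOT taken from the slides.
HT_MIN, HT_MAX = 55, 75

# A height inside the range is predicted as usual.
print(predict_within_range(-205, 5, 60, HT_MIN, HT_MAX))

# A height of 40" is far outside the range, so the prediction is refused
# instead of returning the meaningless -5 pounds.
try:
    predict_within_range(-205, 5, 40, HT_MIN, HT_MAX)
except ValueError as err:
    print("refused:", err)
```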
12
The Unpredicted Part
The value $Y - \hat{Y}$ is the part the regression equation (model) cannot capture. It is called the "residual," denoted e, and is an estimate of the "error" at this observation.
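As a small worked case, the residual for the first row of the armspan data (height 68.75, armspan 64.25) can be computed with the rounded armspan equation -3.73 + 1.04 × height. This is only a sketch, and the function name `residual` is hypothetical.

```python
def residual(y_observed, y_predicted):
    """Residual e = observed Y minus predicted Y (y hat)."""
    return y_observed - y_predicted


# First row of the armspan data.
height, armspan = 68.75, 64.25

# Predicted armspan from the rounded regression equation in the slides.
armspan_hat = -3.73 + 1.04 * height

e = residual(armspan, armspan_hat)
print(round(armspan_hat, 2), round(e, 2))
```

A negative residual means the observed armspan lies below the regression line at that height.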
13
[Figure: a brace marking the residual]
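Least Squares Method
The regression line is the line that minimizes the sum of squared residuals (SSE), so the formulas for the slope $\hat{\beta}$ and intercept $\hat{\alpha}$ of the regression line are:
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$
14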
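The least squares formulas for the slope and intercept translate almost line by line into code. The sketch below applies them to the eleven armspan rows shown in the slides. The full data set has more rows (elided by "…"), so these estimates are not expected to match the Minitab output exactly, and the function name `least_squares` is hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Only the rows visible in the slides; the elided rows are not included.
heights = [68.75, 75.75, 45.75, 66.75, 66.50, 72.25, 48.25, 75.50, 75.00, 64.00, 68.50]
armspans = [64.25, 70.25, 43.00, 66.25, 66.75, 71.25, 47.25, 70.00, 77.25, 65.25, 67.50]

intercept, slope = least_squares(heights, armspans)
print(round(intercept, 3), round(slope, 3))

# A property of the least squares line: its residuals sum to zero.
residuals = [y - (intercept + slope * x) for x, y in zip(heights, armspans)]
print(abs(sum(residuals)) < 1e-9)
```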
Inference for Regression Slope
Standard error of the slope estimate $\hat{\beta}$
Confidence interval
Hypothesis test
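For reference, the usual textbook forms of these three quantities in simple linear regression are sketched below. The symbols $s$, $t^{*}_{n-2}$, and $n$ are assumptions of this sketch and may differ from the notation used on the slide.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}, \qquad SE(\hat{\beta}) = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

$$\text{Confidence interval: } \hat{\beta} \pm t^{*}_{n-2}\, SE(\hat{\beta})$$

$$\text{Hypothesis test: } H_0\colon \beta = 0, \qquad t = \frac{\hat{\beta}}{SE(\hat{\beta})} \text{ with } n - 2 \text{ degrees of freedom}$$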
15
Goodness of Fit
For each observation: residuals
For the whole data set: the coefficient of determination $R^2$, which measures the proportion of variability in Y explained by the model (the linear regression of Y on X)
For simple linear regression (only one predictor): $R^2 = r^2$, where r is the correlation coefficient between X and Y
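The identity between the coefficient of determination and the squared correlation can be checked numerically. The sketch below uses only the eleven armspan rows visible in the slides, so its value will differ from the 94.4% reported for the full data set. It computes the coefficient of determination as 1 - SSE/SST, which is one standard way to define it.

```python
from math import sqrt

# Only the rows visible in the slides; the elided rows are not included.
heights = [68.75, 75.75, 45.75, 66.75, 66.50, 72.25, 48.25, 75.50, 75.00, 64.00, 68.50]
armspans = [64.25, 70.25, 43.00, 66.25, 66.75, 71.25, 47.25, 70.00, 77.25, 65.25, 67.50]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(armspans) / n
sxx = sum((x - x_bar) ** 2 for x in heights)
syy = sum((y - y_bar) ** 2 for y in armspans)  # total sum of squares (SST)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, armspans))

# Least squares fit.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Coefficient of determination: 1 - SSE / SST.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(heights, armspans))
r_squared = 1 - sse / syy

# Sample correlation coefficient between X and Y.
r = sxy / sqrt(sxx * syy)

print(round(r_squared, 4), round(r ** 2, 4))
```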
16
Model Assumptions and Diagnosis
1. Independent observations
2. Y|X=x follows a normal distribution with a common standard deviation $\sigma$, independent of the x value
Diagnosis: Residual plot, residuals vs. fitted values
17
Residual Plot: Is the spread of the residuals more or less the same across the fitted values?
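The points of a residual plot are just (fitted value, residual) pairs, so they can be listed without any plotting software. The sketch below fits the eleven armspan rows visible in the slides and then compares the residual spread in the lower and upper halves of the fitted values. That comparison is a crude numeric stand-in for eyeballing the plot, not a formal test, and with so few rows it is only illustrative.

```python
from statistics import pstdev

# Only the rows visible in the slides; the elided rows are not included.
heights = [68.75, 75.75, 45.75, 66.75, 66.50, 72.25, 48.25, 75.50, 75.00, 64.00, 68.50]
armspans = [64.25, 70.25, 43.00, 66.25, 66.75, 71.25, 47.25, 70.00, 77.25, 65.25, 67.50]

# Least squares fit.
n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(armspans) / n
sxx = sum((x - x_bar) ** 2 for x in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, armspans))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# (fitted value, residual) pairs, sorted by fitted value: the points of the plot.
pairs = sorted(
    (intercept + slope * x, y - (intercept + slope * x))
    for x, y in zip(heights, armspans)
)
for fitted, e in pairs:
    print(f"fitted={fitted:6.2f}  residual={e:6.2f}")

# Compare the residual spread for low vs. high fitted values.
half = len(pairs) // 2
low_spread = pstdev([e for _, e in pairs[:half]])
high_spread = pstdev([e for _, e in pairs[half:]])
print(round(low_spread, 2), round(high_spread, 2))
```

Roughly equal spreads are consistent with the common standard deviation assumption; a clear widening or narrowing would call it into question.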
18
Minitab: Stat >> Regression >> Regression …
Select the response and predictors accordingly
Click “graphs” for residual plots
19
Residual Plots
20
Click “residuals versus fits”
Minitab Output
21
Regression Analysis: ARMSPAN versus HEIGHT
The regression equation is
ARMSPAN = - 3.73 + 1.04 HEIGHT

Predictor     Coef  SE Coef      T      P
Constant    -3.728    2.660  -1.40  0.169
HEIGHT     1.03655  0.04082  25.39  0.000

S = 2.12905   R-Sq = 94.4%   R-Sq(adj) = 94.3%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  2922.8  2922.8  644.81  0.000
Residual Error  38   172.2     4.5
Total           39  3095.1
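Several numbers in this output can be recomputed from the others, which is a useful way to see how the pieces fit together. The sketch below uses only values printed in the output; the variable names are hypothetical, and small discrepancies are expected because the printed sums of squares are rounded.

```python
from math import sqrt

# Values copied from the Minitab output.
coef_const, se_const = -3.728, 2.660
coef_height, se_height = 1.03655, 0.04082
ss_regression, ss_error, ss_total = 2922.8, 172.2, 3095.1
df_error, df_total = 38, 39

# T = Coef / SE Coef for each predictor.
t_const = coef_const / se_const
t_height = coef_height / se_height

# R-Sq = SS(Regression) / SS(Total).
r_sq = ss_regression / ss_total

# R-Sq(adj) = 1 - MS(Residual Error) / (SS(Total) / DF(Total)).
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# S = square root of MS(Residual Error).
s = sqrt(ss_error / df_error)

print(round(t_const, 2), round(t_height, 2))
print(round(100 * r_sq, 1), round(100 * r_sq_adj, 1))
print(round(s, 2))
```

These reproduce T = -1.40 and 25.39, R-Sq = 94.4%, R-Sq(adj) = 94.3%, and S ≈ 2.13, in agreement with the output up to rounding.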