data mining regression
MLR Application Situation
• Predicted variable is a continuous variable
• Predictors are continuous, binary or ordinal class variables
MLR Theoretical Foundation - 1
This procedure performs linear regression on the selected dataset. It fits a linear model of the form
Y = b0 + b1X1 + b2X2 + ... + bkXk + e
where Y is the dependent variable (response), X1, X2, ..., Xk are the independent variables (predictors), and e is the random error. b0, b1, b2, ..., bk are the regression coefficients, which have to be estimated from the data.
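As a minimal sketch of how the coefficients b0, ..., bk could be estimated, the following solves the ordinary least squares normal equations directly. The function name `fit_linear_regression` and the toy data are illustrative assumptions; this is not XLMiner's implementation.

```python
def fit_linear_regression(rows, y):
    """Return [b0, b1, ..., bk] estimated by ordinary least squares."""
    # Design matrix: a leading 1 in each row carries the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = []
    for r in range(p):
        row = [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        row.append(sum(X[i][r] * y[i] for i in range(n)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot_row][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear predictors)")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] for r in range(p)]


# Made-up data generated exactly from Y = 2 + 3*X1 - 1*X2 (no error term).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [3, 7, 6, 11, 9]
coefficients = fit_linear_regression(rows, y)
print(coefficients)  # approximately [2.0, 3.0, -1.0]
```

Solving the normal equations by hand keeps the sketch dependency-free; for real data a numerical library's least squares routine is usually the more stable choice.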
Regression Model
[Figure: scatter plot of Sales against Advertising Expenditure, with a fitted linear trend line, Linear (Sales)]
Validity of MLR - Intuitive
The multiple linear regression algorithm in XLMiner™ chooses the regression coefficients so as to minimize the difference between predicted and actual values, measured as the sum of squared residuals (ordinary least squares).
[Figure: Predicted and Actual Values. Chart of Predicted Value, Actual Value and Residual by record number]
[Figure: Residual. Chart of the residual by record number]
Validation Data
Predicted Value   Actual Value   Residual
30.19297657       34.7           4.507023434
18.75084324       15             -3.75084324
19.83233146       20.4           0.567668545
19.54158638       18.2           -1.34158638
19.6134897        19.9           0.286510301
16.45707912       20.2           3.742920884
18.76402488       18.2           -0.56402488
15.45156408       15.2           -0.25156408
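The residual column is the actual value minus the predicted value. A short sketch that recomputes it from the eight rows shown, with variable names chosen here for illustration:

```python
# (predicted, actual) pairs copied from the validation data sample.
validation_rows = [
    (30.19297657, 34.7),
    (18.75084324, 15.0),
    (19.83233146, 20.4),
    (19.54158638, 18.2),
    (19.6134897, 19.9),
    (16.45707912, 20.2),
    (18.76402488, 18.2),
    (15.45156408, 15.2),
]

# Residual = actual value - predicted value.
residuals = [actual - predicted for predicted, actual in validation_rows]

for (predicted, actual), residual in zip(validation_rows, residuals):
    print(f"{predicted:12.8f} {actual:6.1f} {residual:12.8f}")
```

A positive residual means the model under-predicted that record; a negative one means it over-predicted.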
Validity of MLR - Measured
Validation Data Prediction Error Report (these errors should be at a minimum)
Total sum of squared errors   RMS Error     Average Error
3166.458039                   4.564204288   -0.09095719
Regression
Residual df          239
Multiple R-squared   0.7101965 (should be on the higher side, towards 1.0)
Std. Dev. Estimate   5.11450529
Residual SS          6251.80127
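The three prediction error measures can be sketched from a list of residuals. This assumes RMS error is defined as the square root of the total squared error divided by the number of records; the function name `prediction_error_report` is hypothetical, and the sample below uses only the eight residuals listed in the validation data sample, not the full validation set.

```python
import math


def prediction_error_report(residuals):
    """Summarize residuals (actual - predicted) into three error measures."""
    n = len(residuals)
    total_sse = sum(r * r for r in residuals)
    return {
        "total_sse": total_sse,                 # total sum of squared errors
        "rms_error": math.sqrt(total_sse / n),  # root mean squared error
        "average_error": sum(residuals) / n,    # mean signed error (bias)
    }


# Eight sample residuals only; the slide's report covers all validation rows.
sample_residuals = [4.507023434, -3.75084324, 0.567668545, -1.34158638,
                    0.286510301, 3.742920884, -0.56402488, -0.25156408]
report = prediction_error_report(sample_residuals)
print(report)
```

An average error close to zero indicates little systematic over- or under-prediction, while the RMS error reflects the typical size of a miss.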
Validity of MLR – Test of Significance 1
• Is the result significant, as reflected by the p-value being < .05?
• If yes, the null hypothesis (H0: β1 = 0 and β2 = 0 …) is rejected at the 95% confidence level
• That is, there is at least one predictor whose coefficient is not 0
ANOVA (is the p-value lower than .05?)
Source       df    SS            MS            p-value
Regression   13    15320.7503    1178.519254   8.75607E-57
Error        239   6251.80127    26.15816431
Total        252   21572.55157
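The ANOVA figures are linked by simple arithmetic, which can be checked with a short sketch. The variable names are chosen here for illustration, and the p-value itself is not recomputed because it needs the F distribution, which the Python standard library does not provide.

```python
import math

# Values copied from the ANOVA table.
df_regression, ss_regression = 13, 15320.7503
df_error, ss_error = 239, 6251.80127
ss_total = 21572.55157

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F statistic for H0: all slope coefficients are 0.
f_statistic = ms_regression / ms_error

# Multiple R-squared: share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Std. Dev. Estimate: square root of the error mean square.
std_dev_estimate = math.sqrt(ms_error)

print(ms_regression, ms_error, f_statistic, r_squared, std_dev_estimate)
```

The recomputed mean squares, R-squared and standard deviation estimate agree with the values reported on the slides, which is a quick consistency check on the extracted table.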
Validity of MLR – Test of Significance 2
• Which variables are valid predictors, that is, which have a p-value less than .05, so that the null hypothesis (H0: βk = 0) is rejected for that variable?
Predictor (Indep. Var.)   p-value
Constant                  0
ZN                        0.00100423
INDUS                     0.84421581
CHAS                      0.18414988
NOX                       0.00680984
RM                        0.00000333
AGE                       0.55937093
DIS                       0.00000109
RAD                       0.00013665
CRIM                      0.00235436
TAX                       0.00271487
PTRATIO                   0.00000157
B                         0.03476212
LSTAT                     0
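The per-variable test can be applied mechanically by comparing each p-value with .05. A small sketch using the p-values from the table, with the constant term left out because it is not a predictor; the names `p_values` and `ALPHA` are illustrative:

```python
ALPHA = 0.05  # significance threshold used on the slides

# p-values copied from the predictor table (Constant omitted).
p_values = {
    "ZN": 0.00100423, "INDUS": 0.84421581, "CHAS": 0.18414988,
    "NOX": 0.00680984, "RM": 0.00000333, "AGE": 0.55937093,
    "DIS": 0.00000109, "RAD": 0.00013665, "CRIM": 0.00235436,
    "TAX": 0.00271487, "PTRATIO": 0.00000157, "B": 0.03476212,
    "LSTAT": 0.0,
}

# H0: coefficient = 0 is rejected when the p-value is below ALPHA.
significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name, p in p_values.items() if p >= ALPHA]

print("Significant:", significant)
print("Not significant:", not_significant)
```

On these values INDUS, CHAS and AGE fail the test, which makes them natural candidates for removal when choosing a smaller subset of predictors.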
MLR Subset Selection
• This procedure gives us the subset of variables that are the best predictors.
• We can find the subset through the selection of any of the following procedures:
– Backward elimination
– Exhaustive search
– Stepwise selection
– Forward selection
– Sequential replacement
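To illustrate one of these procedures, here is a rough sketch of forward selection as a greedy loop: start with no predictors and repeatedly add the one that improves a model score the most, stopping when nothing improves it. The `score` callback, the per-variable gains and the penalty are all made-up placeholders; in practice the score would come from refitting the regression on each candidate subset (for example an adjusted R-squared), and XLMiner's own stopping rules may differ.

```python
def forward_selection(candidates, score):
    """Greedily add the variable that most improves score(subset)."""
    selected = []
    best_score = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current subset.
        trials = [(score(selected + [name]), name) for name in remaining]
        top_score, top_name = max(trials)
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_name)
        remaining.remove(top_name)
        best_score = top_score
    return selected


# Hypothetical stand-in for a real model score: made-up gain per variable,
# minus a small penalty for each variable included.
made_up_gain = {"X1": 0.30, "X2": 0.25, "X3": 0.001}


def toy_score(subset):
    return sum(made_up_gain[name] for name in subset) - 0.01 * len(subset)


chosen = forward_selection(["X1", "X2", "X3"], toy_score)
print(chosen)  # ['X1', 'X2']
```

Backward elimination is the mirror image: start from all predictors and repeatedly drop the one whose removal hurts the score least.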
LR Application Situation
• Predicted variable is a binary class variable – ‘1’ for success and ‘0’ for failure
• The results indicate the probability of success
• Predictors are continuous, binary or ordinal class variables
LR Theoretical Foundation - 1
• Logistic regression is a variation of ordinary regression
• Logistic regression forms a predictor variable, log(p/(1-p)), which is a linear combination of the explanatory variables
• P(Y=1) = Exp(β0 + β1X1 + ... + βkXk) / (1 + Exp(β0 + β1X1 + ... + βkXk))
• 1 – P(Y=1) = 1 / (1 + Exp(β0 + β1X1 + ... + βkXk))
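The relationship between the linear combination, the probability of success and the log odds can be sketched as follows. The function names and the coefficient values are made up for illustration; the probability is computed in an algebraically equivalent form that avoids overflow for large positive or negative inputs.

```python
import math


def success_probability(beta0, betas, xs):
    """P(Y=1) = exp(z) / (1 + exp(z)), with z = beta0 + sum(beta_i * x_i)."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))  # same value, no overflow
    return math.exp(z) / (1.0 + math.exp(z))


def log_odds(p):
    """log(p / (1 - p)), which recovers the linear combination z."""
    return math.log(p / (1.0 - p))


# Made-up coefficients and inputs: z = -1.5 + 0.8*2.0 + 0.4*1.0 = 0.5
p = success_probability(-1.5, [0.8, 0.4], [2.0, 1.0])
print(p, 1.0 - p, log_odds(p))
```

Taking the log odds of the predicted probability returns the linear combination, which is why the model is linear in the explanatory variables on the log odds scale.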