MATH 3359 Introduction to Mathematical
Modeling
Project
Multiple Linear Regression
Multiple Logistic Regression
Project
Dataset: any field you are interested in, with a large sample size
Methods: simple/multiple linear regression,
simple/multiple logistic regression
Due on April 23rd
Outline
Multiple Linear Regression
- Introduction
- Make scatter plots of the data
- Fit the multiple linear regression model
- Prediction
Multiple Logistic Regression
- Introduction
- Fit the multiple logistic regression model
- Exercise
Recall: Simple Linear Regression
Given a data set {y_i, x_i, i = 1, …, n} of n observations,
where y_i is the dependent variable and x_i is the independent variable,
the simple linear regression model is
y_i = β0 + β1 x_i + ε_i,  i = 1, …, n,
where β0 is the intercept, β1 is the slope, and ε_i is the error term.
Multiple Linear Regression
Given a data set {y_i, x_i1, …, x_ip, i = 1, …, n} of n observations,
where y_i is the dependent variable and
x_i1, …, x_ip are the independent variables,
the multiple linear regression model is
y_i = β0 + β1 x_i1 + β2 x_i2 + … + βp x_ip + ε_i.
Generally, we can transform the x_i's before plugging them into the model, and they need not be independent of each other.
1. Transformations: e.g., use x², √x, or log(x) as a predictor.
2. Dependent case: the predictors may be correlated with one another.
3. Cross-product terms: e.g., include x1 * x2 to model an interaction between two predictors.
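The three cases above can be sketched in R's formula syntax (x1, x2 and mydata are hypothetical names, not from the slides):

```r
# 1. Transformation: wrap arithmetic on a predictor in I()
reg1 = lm(y ~ x1 + I(x1^2), data = mydata)

# 2. Correlated predictors enter the formula the same way as independent ones
reg2 = lm(y ~ x1 + x2, data = mydata)

# 3. Cross-product (interaction) term: x1:x2 adds only the interaction;
#    x1*x2 is shorthand for x1 + x2 + x1:x2
reg3 = lm(y ~ x1 + x2 + x1:x2, data = mydata)
```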
Example
The data include the selling price at auction of 32 antique grandfather clocks. The age of each clock and the number of people who made a bid are also recorded in this dataset.
Age  Bidders  Price
127   13      1235
115   12      1080
127    7       845
150    9      1522
156    6      1047
Recall: Scatter Plots — Function ‘plot’
plot(auction$Age, auction$Price, main = 'Relationship between Price and Age')
plot(auction$Bidders, auction$Price, main = 'Relationship between Price and Number of bidders')
plot(auction)
Fit Multiple Linear Regression Model
— Function ‘lm’ in R
reg = lm(formula, data)
summary(reg)
In our example,
reg = lm(Price ~ Age + Bidders, data = auction)
> summary(reg)
Call:
lm(formula = Price ~ Age + Bidders, data = auction)

Residuals:
   Min     1Q Median     3Q    Max
-207.2 -117.8   16.5  102.7  213.5

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1336.7221   173.3561  -7.711 1.67e-08 ***
Age            12.7362     0.9024  14.114 1.60e-14 ***
Bidders        85.8151     8.7058   9.857 9.14e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Hence, the function of best fit is
Price = -1336.7221 + 12.7362 * Age + 85.8151 * Bidders
Prediction — Function ‘predict’ in R
Predict the average price of a clock with Age = 150, Bidders = 10:
predict(reg, data.frame(Age = 150, Bidders = 10))
Predict the average prices of clocks with Age = 150, Bidders = 10 and Age = 160, Bidders = 5:
predict(reg, data.frame(Age = c(150, 160), Bidders = c(10, 5)))
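As a sanity check, the first prediction can be reproduced by hand from the fitted coefficients shown in the summary:

```r
# Predicted price for Age = 150, Bidders = 10, using the rounded coefficients
price = -1336.7221 + 12.7362 * 150 + 85.8151 * 10
price   # about 1431.86
```

This should agree (up to rounding of the printed coefficients) with predict(reg, data.frame(Age = 150, Bidders = 10)), which uses the full-precision estimates stored in reg.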
Exercise
1. Download data: http://www.statsci.org/data/multiple.html (‘Mass and Physical Measurements for Male Subjects’)
2. Import the txt file in R
3. Use ‘Mass’ as the response and ‘Fore’, ‘Waist’, ‘Height’ and ‘Thigh’ as independent variables
4. Make a scatter plot of the response against each of the independent variables
5. Fit the multiple linear regression
6. Predict ‘Mass’ with Fore = 30, Waist = 180, Height = 38, Thigh = 58 and with Fore = 29, Waist = 179, Height = 39, Thigh = 57
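The steps above can be sketched as follows (the file name ‘physical.txt’ is an assumption; the column names are those listed in the exercise):

```r
# 2. Import the downloaded tab-delimited txt file (path is an assumption)
physical = read.table('physical.txt', header = TRUE)

# 4. Scatter plots of the response against each independent variable
plot(physical$Fore,   physical$Mass)
plot(physical$Waist,  physical$Mass)
plot(physical$Height, physical$Mass)
plot(physical$Thigh,  physical$Mass)

# 5. Fit the multiple linear regression
reg = lm(Mass ~ Fore + Waist + Height + Thigh, data = physical)
summary(reg)

# 6. Predict Mass for the two new cases
predict(reg, data.frame(Fore = c(30, 29), Waist = c(180, 179),
                        Height = c(38, 39), Thigh = c(58, 57)))
```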
Recall: Simple Logistic Regression
Let p = P(Y = 1).
Odds: p / (1 - p)
Log-odds: log( p / (1 - p) )
Logistic regression models the log-odds as a linear function of the independent variable:
log( p / (1 - p) ) = β0 + β1 x
Solving for p gives p = exp(β0 + β1 x) / (1 + exp(β0 + β1 x)),
which is not a linear function of x.
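This non-linearity in x can be visualized by plotting the logistic curve; the coefficients β0 = 0 and β1 = 1 below are arbitrary illustrative choices:

```r
# Logistic (inverse log-odds) function with illustrative coefficients
x = seq(-6, 6, by = 0.1)
p = exp(0 + 1 * x) / (1 + exp(0 + 1 * x))
plot(x, p, type = 'l', main = 'Logistic curve: p is not linear in x')
```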
Multiple Logistic Regression
Example
am: transmission, 0: auto, 1: manualhp: gross horsepowerwt: weight (lb/1000)
Multiple Logistic Regression
— Function ‘glm’ in R
logreg = glm(formula, family = 'binomial', data = binary)
glm: generalized linear model
family: the error distribution; ‘binomial’ gives logistic regression
data: name of the dataset
In the example,
logreg = glm(am ~ hp + wt, family = 'binomial', data = mtcars)
> summary(logreg)

Call:
glm(formula = am ~ hp + wt, family = "binomial", data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 18.86630    7.44356   2.535  0.01126 *
hp           0.03626    0.01773   2.044  0.04091 *
wt          -8.08348    2.96519  -2.726  0.00641 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Final Model:
log( p / (1 - p) ) = 18.86630 + 0.03626 * hp - 8.08348 * wt, where p = P(am = 1)
For every one-unit increase in hp,
the log-odds of manual (versus automatic) increases by 0.03626,
so the odds of manual are multiplied by exp(0.03626) ≈ 1.0369.
For every one-unit increase in wt (i.e., 1000 lbs),
the log-odds of manual (versus automatic) decreases by 8.08348,
so the odds of manual are multiplied by exp(-8.08348) ≈ 0.0003.
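The odds-ratio interpretation can be read directly off the exponentiated coefficients (mtcars is built into R, so this sketch is self-contained):

```r
# Refit the model and exponentiate the coefficients:
# each value is the multiplicative change in the odds of manual
# transmission per one-unit increase in that predictor
logreg = glm(am ~ hp + wt, family = 'binomial', data = mtcars)
exp(coef(logreg))
```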
Exercise
1. Import data from web:
http://www.ats.ucla.edu/stat/data/binary.csv
2. Fit the logistic regression with admit as the response and gre, gpa and rank as independent variables.
What is the final logistic model?
Are all three independent variables significant?
glm(formula, family = 'binomial', data = )
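A sketch of this exercise, assuming the URL still serves the csv and that its columns are named admit, gre, gpa and rank (rank is treated as numeric here, as the slides do):

```r
# 1. Import the csv directly from the web
binary = read.csv('http://www.ats.ucla.edu/stat/data/binary.csv')

# 2. Fit the logistic regression; check the Pr(>|z|) column
#    of the summary to judge significance of each predictor
logreg = glm(admit ~ gre + gpa + rank, family = 'binomial', data = binary)
summary(logreg)
```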