MATH 3359 Introduction to Mathematical Modeling Project Multiple Linear Regression Multiple Logistic Regression

Upload: hannah-nichols

Post on 11-Jan-2016


Page 1

MATH 3359 Introduction to Mathematical Modeling

Project

Multiple Linear Regression

Multiple Logistic Regression

Page 2

Project

Dataset: any field you are interested in, with a large sample size

Methods: simple/multiple linear regression, simple/multiple logistic regression

Due on April 23rd

Page 3

Outline

Multiple Linear Regression
  Introduction
  Make scatter plots of the data
  Fit multiple linear regression model
  Prediction

Multiple Logistic Regression
  Introduction
  Fit multiple logistic regression model
  Exercise

Page 4

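Recall: Simple Linear Regression

Given a data set $\{y_i, x_i,\ i = 1, \dots, n\}$ of n observations, where $y_i$ is the dependent variable and $x_i$ is the independent variable, the linear regression model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n,$$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the random error.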

Page 5

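Multiple Linear Regression

Given a data set $\{y_i, x_{i1}, \dots, x_{ip},\ i = 1, \dots, n\}$ of n observations, where $y_i$ is the dependent variable and $x_{i1}, \dots, x_{ip}$ are the independent variables, the linear regression model is

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \dots, n.$$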
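
To make the model concrete, here is a minimal sketch that simulates data with two predictors and known coefficients, then checks that `lm` recovers them. The sample size, coefficient values, noise level, and the names `sim` and `fit` are all assumptions chosen for illustration; they do not come from the course data.

```r
# Simulate n observations from y = 1 + 2*x1 - 3*x2 + error (illustrative values).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)

sim <- data.frame(y, x1, x2)

# Fit the multiple linear regression and inspect the estimated coefficients.
fit <- lm(y ~ x1 + x2, data = sim)
coef(fit)  # estimates should be close to 1, 2 and -3
```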

Page 6

Page 7

In general, the xi's can be transformed before they enter the model, and they need not be independent of each other.

1. Transformations:

2. Dependent case:

3. Cross-Product Terms:
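
The three cases can be sketched with R model formulas. The slide's own examples did not survive, so the variable choices below are assumptions: the built-in `mtcars` data stands in for a real dataset, and a squared term is used as one possible reading of the "dependent case".

```r
# 1. Transformation: use log(hp) instead of hp as a predictor.
fit_log <- lm(mpg ~ log(hp) + wt, data = mtcars)

# 2. Dependent predictors: wt^2 is a function of wt (I() protects the arithmetic).
fit_poly <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# 3. Cross-product term: hp:wt adds the product hp * wt to the model.
fit_cross <- lm(mpg ~ hp + wt + hp:wt, data = mtcars)

names(coef(fit_cross))
```

Each fit is still a linear regression, because the model stays linear in the coefficients even when the predictors are transformed or multiplied together.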

Page 8

Example

The data include the selling price at auction of 32 antique grandfather clocks. The age of each clock and the number of people who made a bid are also recorded in this dataset.

Age  Bidders  Price
127  13       1235
115  12       1080
127  7        845
150  9        1522
156  6        1047
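
To try the commands on the following slides without the original file, the five rows shown here can be typed into a data frame. This is only a sketch: the full dataset has 32 clocks, so plots and fits based on these five rows will not reproduce the results reported later.

```r
# Five sample rows from the slide; the real dataset has 32 rows.
auction <- data.frame(
  Age     = c(127, 115, 127, 150, 156),
  Bidders = c(13, 12, 7, 9, 6),
  Price   = c(1235, 1080, 845, 1522, 1047)
)

str(auction)
```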

Page 9

Recall: Scatter Plots — Function ‘plot’

plot(auction$Age , auction$Price , main='Relationship between Price and Age')

Page 10

plot(auction$Bidders , auction$Price , main='Relationship between Price and Number of bidders')

Page 11

plot ( auction )

Page 12

Fit Multiple Linear Regression Model

Function ‘lm’ in R:

reg = lm(formula, data)

summary(reg)

In our example,

reg = lm(Price ~ Age + Bidders, data = auction)

Page 13

> summary(reg)

Call:
lm(formula = Price ~ Age + Bidders, data = auction)

Residuals:
   Min     1Q Median     3Q    Max
-207.2 -117.8   16.5  102.7  213.5

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1336.7221   173.3561  -7.711 1.67e-08 ***
Age            12.7362     0.9024  14.114 1.60e-14 ***
Bidders        85.8151     8.7058   9.857 9.14e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Hence, the function of best fit is

Price = 12.7362 * Age + 85.8151 * Bidders - 1336.7221
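
The columns of the coefficient table are related: each t value is the estimate divided by its standard error, and the p-value is a two-sided tail probability of a t distribution. The sketch below recomputes them from the printed numbers, assuming all 32 clocks were used in the fit, so the residual degrees of freedom are 32 - 3 = 29.

```r
# Estimates and standard errors copied from the summary output.
est <- c(Intercept = -1336.7221, Age = 12.7362, Bidders = 85.8151)
se  <- c(Intercept = 173.3561,   Age = 0.9024,  Bidders = 8.7058)

# t value = estimate / standard error.
t_value <- est / se

# Two-sided p-value; df = 32 observations - 3 coefficients (assumed).
df <- 32 - 3
p_value <- 2 * pt(-abs(t_value), df)

round(t_value, 3)
signif(p_value, 3)
```

Because the inputs are rounded, the recomputed values agree with the table only up to rounding.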

Page 14

Prediction — Function ‘predict’ in R

Predict the average price of a clock with Age=150 and Bidders=10:

predict(reg, data.frame(Age = 150, Bidders = 10))

Predict the average prices of clocks with Age=150, Bidders=10 and with Age=160, Bidders=5:

predict(reg, data.frame(Age = c(150, 160), Bidders = c(10, 5)))
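
What `predict` does here can be checked by hand: it plugs the new values into the fitted equation. A minimal sketch using the rounded coefficients from the summary output (the names `b`, `new_clocks`, and `pred` are illustrative):

```r
# Rounded coefficients from the fitted model.
b <- c(Intercept = -1336.7221, Age = 12.7362, Bidders = 85.8151)

# The two clocks from the prediction examples.
new_clocks <- data.frame(Age = c(150, 160), Bidders = c(10, 5))

# Price = Intercept + b_Age * Age + b_Bidders * Bidders
pred <- b[["Intercept"]] +
  b[["Age"]] * new_clocks$Age +
  b[["Bidders"]] * new_clocks$Bidders

pred  # about 1431.86 and 1130.15
```

The values returned by `predict(reg, ...)` may differ slightly in the last digits, since R uses the unrounded coefficients.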

Page 15

Exercise

1. Download data: http://www.statsci.org/data/multiple.html, ‘Mass and Physical Measurements for Male Subjects’

2. Import the txt file into R

3. Use ‘Mass’ as the response and ‘Fore’, ‘Waist’, ‘Height’ and ‘Thigh’ as independent variables

4. Make a scatter plot of the response against each of the independent variables

5. Fit the multiple linear regression

6. Predict ‘Mass’ with Fore=30, Waist=180, Height=38 and Thigh=58, and with Fore=29, Waist=179, Height=39 and Thigh=57

Page 16

Recall: Simple Logistic Regression

Odds: $\dfrac{p}{1-p}$, where p is the probability of the event

Log-odds: $\log\left(\dfrac{p}{1-p}\right)$

Page 17

Recall: Simple Logistic Regression

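Logistic regression models the log-odds of the event probability p as a linear function of the independent variable:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$

The probability p itself is not a linear function of X:

$$p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$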
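
The relationship between probability, odds, and log-odds can be sketched in a few lines of R. The helper names `odds`, `log_odds`, and `inv_logit` are illustrative; base R already provides the last two transformations as `qlogis` and `plogis`.

```r
# p is a probability strictly between 0 and 1.
odds     <- function(p) p / (1 - p)
log_odds <- function(p) log(p / (1 - p))

# Inverse of the log-odds: maps any real number back to a probability.
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

p <- c(0.1, 0.5, 0.8)
odds(p)                 # p = 0.8 corresponds to odds of 4 to 1
log_odds(p)             # zero at p = 0.5
inv_logit(log_odds(p))  # recovers p

# Same transformations using base R.
all.equal(log_odds(p), qlogis(p))
all.equal(inv_logit(log_odds(p)), plogis(qlogis(p)))
```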

Page 18

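Multiple Logistic Regression

With k independent variables $X_1, \dots, X_k$, the log-odds are modeled as

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$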

Page 19

Example

am: transmission (0: auto, 1: manual)

hp: gross horsepower

wt: weight (lb/1000)

Page 20

Multiple Logistic Regression

Function ‘glm’ in R:

logreg = glm(formula, family = 'binomial', data = binary)

glm: generalized linear model

family: distribution of the response variable (binomial for a 0/1 response)

data: name of the dataset

In the example,

logreg = glm(am ~ hp + wt, family = 'binomial', data = mtcars)

Page 21

> summary(logreg)

Call:
glm(formula = am ~ hp + wt, family = "binomial", data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 18.86630    7.44356   2.535  0.01126 *
hp           0.03626    0.01773   2.045  0.04091 *
wt          -8.08348    3.06868  -2.634  0.00843 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Page 22

Final Model:

$$\log\left(\frac{p}{1-p}\right) = 18.86630 + 0.03626\,\mathrm{hp} - 8.08348\,\mathrm{wt}$$

where p is the probability that the transmission is manual (am = 1).

For every one-unit increase in hp, holding wt fixed,

the log odds of manual (versus auto) increase by 0.03626,

the odds of manual (versus auto) are multiplied by exp(0.03626) = 1.0369.

For every one-unit increase in wt, holding hp fixed,

the log odds of manual (versus auto) decrease by 8.08348,

the odds of manual (versus auto) are multiplied by exp(-8.08348), which is about 0.00031.
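
The example can be run end to end on the `mtcars` data that ships with R. The sketch below fits the model with `glm` and `family = "binomial"`, converts the coefficients to odds ratios, and predicts a probability for one car. The object names and the values hp = 120, wt = 2.8 are hypothetical, chosen only for illustration.

```r
# Logistic regression of transmission type on horsepower and weight.
logreg <- glm(am ~ hp + wt, family = "binomial", data = mtcars)

# Coefficients are on the log-odds scale.
coef(logreg)

# Exponentiating gives odds ratios: the factor by which the odds of a
# manual transmission change per one-unit increase in each predictor.
exp(coef(logreg))

# Predicted probability of a manual transmission for a hypothetical car.
new_car <- data.frame(hp = 120, wt = 2.8)
prob <- predict(logreg, new_car, type = "response")
prob
```

With `type = "response"` the prediction is returned as a probability; the default for `glm` objects is the log-odds scale.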

Page 23

Exercise

1. Import data from web:

http://www.ats.ucla.edu/stat/data/binary.csv

2. Fit the logistic regression of admit (as response) on gre, rank and gpa (as independent variables).

What is the final logistic model?

Are the three independent variables significant?

glm(formula, family = 'binomial', data = )