Linear Regression in R
Post on 25-Jan-2017
www.edureka.co/r-for-analytics
What will you learn today?
What is Linear Regression?
How to Design a Linear Regression Model?
How to Compare Regression Models?
Hands-On: Linear Regression in R
Problem
Let's assume you own a restaurant where tips are part of a waiter's pay. The amount of the tip depends on the amount of the total bill.
Let's see how we can predict the amount of the tip from the bill using linear regression.
Predicting the Tip
Suppose you don't have the bill amounts, so the only data you have is the tip amount for each order.
For the first meal order the waiter got a $5 tip, for the second meal order a $17 tip, and so on. The six tips are $5, $17, $11, $8, $14 and $5.
How to predict the next tip?
Since the only data we have is the tip amount, all we can do is take the mean of the tip amounts.
Conclusion
So the best estimate we can make for the next tip from the data we have is $10, which is the mean of all the tip amounts:
Mean = ($5 + $17 + $11 + $8 + $14 + $5) / 6 = $60 / 6 = $10
Note that when you have only one variable and no other information, the best prediction you can make is the mean of the sample data itself.
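The same calculation can be sketched in R. The vector name tips and the variable predicted_tip are illustrative choices, not names from the slides; the values are the six tips listed above.

```r
# Tip amounts (in dollars) for the six meal orders
tips <- c(5, 17, 11, 8, 14, 5)

# With no other variable available, the sample mean is the prediction
predicted_tip <- mean(tips)
print(predicted_tip)  # 10
```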
Residuals (Errors)
The deviation between an actual value and its estimated value is called a residual or error.
Residuals (Errors)
Note that the residuals around the mean always sum to zero: if you add up all the positive and negative deviations, you get zero. In other words, the total positive deviation always equals the total negative deviation.
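A quick check in R, as a minimal sketch using the six tips from the example (the name residuals_mean is an illustrative choice):

```r
tips <- c(5, 17, 11, 8, 14, 5)

# Residual = actual tip minus the estimate (here, the mean of 10)
residuals_mean <- tips - mean(tips)
print(residuals_mean)       # -5  7  1 -2  4 -5

# Positive and negative deviations cancel out
print(sum(residuals_mean))  # 0
```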
Sum of Squared Residuals (Errors)
Note that the sum of squared errors (SSE) is 120: (-5)² + 7² + 1² + (-2)² + 4² + (-5)² = 25 + 49 + 1 + 4 + 16 + 25 = 120
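In R the SSE of the mean-only estimate could be computed in one line. This is a sketch on the example's six tips; sse_mean is an illustrative name.

```r
tips <- c(5, 17, 11, 8, 14, 5)

# Square each residual around the mean, then add them up
sse_mean <- sum((tips - mean(tips))^2)
print(sse_mean)  # 120
```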
Why Square the Residuals?
What do we get from squaring the residuals?
Key Points
By squaring the residuals (errors) we achieve the following:
It makes every deviation positive, so they no longer cancel out, and it emphasizes the larger deviations
It gives a single number (SSE) for comparing different models
The goal of linear regression is to find the linear model that minimizes the sum of squared residuals (SSE)
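To see what "minimizes the SSE" means, here is a small sketch that searches for the single constant guess with the lowest SSE on the six tips. It uses R's built-in optimize(); the helper sse() and the search interval are illustrative assumptions, not part of the slides.

```r
tips <- c(5, 17, 11, 8, 14, 5)

# SSE obtained when every tip is predicted by the same constant guess
sse <- function(guess) sum((tips - guess)^2)

# One-dimensional search between the smallest and largest tip
best <- optimize(sse, interval = range(tips))

print(best$minimum)    # close to 10, the mean
print(best$objective)  # close to 120, the SSE at the mean
```

The search lands on the mean, which is why the mean is the best estimate when no other variable is available. Linear regression applies the same idea to a line instead of a constant.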
Improving the Current Model
The tip of the waiter depends on the amount of the bill.
Until now we used only the previous tips to estimate the next tip.
Next we will design a linear regression model that estimates the tip from the bill amount.
Let's Visualize the Data That We Have
Note that the tip amount is the dependent variable, because it depends on the bill amount, and the bill amount is the independent variable.
Linear Regression
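Note that in linear regression the predicted value of the dependent variable (e.g. tip amount) is the mean of the values expected at a given value of the independent variable (e.g. bill amount), not a single observed value.
Linear Regression Equation: ŷ = b0 + b1 × x, where ŷ is the predicted tip, x is the bill amount, b0 is the Y intercept and b1 is the slope.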
Linear Regression Types
A linear regression model whose data points are narrowly distributed around the regression line fits much better than one whose points are broadly distributed.
[Figure: two scatter plots, "Narrow Distribution" and "Broad Distribution"]
Linear Regression – a closer look
To draw a linear regression line we need the value of the slope (b1) and the value of the Y intercept (b0).
Linear Regression – Calculating Slope
The slope (b1) is 0.1462, calculated as b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², where x is the bill amount and y is the tip amount.
Linear Regression – Calculating Y Intercept
The Y intercept (b0) is -0.8188, calculated as b0 = ȳ - b1 × x̄, where ȳ = 10 is the mean tip and x̄ is the mean bill amount.
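The slope and intercept can be computed by hand in R with the standard least-squares formulas. Note the assumption: the bill amounts did not survive in this transcript, so the bills vector below is illustrative data, chosen because it gives a slope that rounds to the slide's 0.1462. Only the tips come from the slides.

```r
# ASSUMPTION: illustrative bill amounts, not taken from the transcript
bills <- c(34, 108, 64, 88, 99, 51)
# Tip amounts from the slides
tips  <- c(5, 17, 11, 8, 14, 5)

# Deviations of each variable from its own mean
x_dev <- bills - mean(bills)
y_dev <- tips - mean(tips)

# Least-squares slope and intercept
b1 <- sum(x_dev * y_dev) / sum(x_dev^2)
b0 <- mean(tips) - b1 * mean(bills)

print(round(b1, 4))
print(round(b0, 4))
```

With these assumed bills the unrounded slope gives an intercept near -0.8203. Plugging the rounded slope 0.1462 into b0 = 10 - b1 × 74 gives -0.8188, which would be consistent with the value on the slide.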
Linear Regression – Putting the values
Let's put the values of the slope and Y intercept into the linear regression equation:
Predicted tip = -0.8188 + 0.1462 × bill amount
Linear Regression – Predicting Tip amount
Let's calculate the predicted tip amount for each bill.
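As a minimal sketch, the fitted equation can be wrapped in a small R function. The coefficients are the ones from the slides; the function name predict_tip and the $100 bill are hypothetical, used only to show the arithmetic.

```r
# Coefficients from the slides
b0 <- -0.8188  # Y intercept
b1 <- 0.1462   # slope

# Predicted tip for one bill amount or a vector of bill amounts
predict_tip <- function(bill) b0 + b1 * bill

# Hypothetical bill of $100: -0.8188 + 0.1462 * 100
print(predict_tip(100))  # 13.8012
```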
Linear Regression – Calculating Residuals
Let's calculate the residuals (errors), i.e. the actual tip minus the predicted tip for each order.
Linear Regression – Squaring the residuals (errors)
Let's calculate the sum of squared residuals.
Summing it up - Comparison
The second approach (regression on the bill amount) provides a better estimate than the mean alone, because it lowers the sum of squared errors (SSE) below the 120 obtained from the mean.
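The comparison can be reproduced hands-on with R's lm(). As before, the bills vector is an assumption (illustrative data, not from the transcript), so the regression SSE printed here is not necessarily the figure shown on the original slide. The model names are also illustrative.

```r
# ASSUMPTION: illustrative bill amounts, not taken from the transcript
bills <- c(34, 108, 64, 88, 99, 51)
# Tip amounts from the slides
tips  <- c(5, 17, 11, 8, 14, 5)

# First approach: intercept-only model, which predicts the mean tip
mean_model <- lm(tips ~ 1)
# Second approach: simple linear regression of tip on bill
bill_model <- lm(tips ~ bills)

# SSE of each model = sum of its squared residuals
sse_mean <- sum(resid(mean_model)^2)
sse_bill <- sum(resid(bill_model)^2)

print(c(mean_only = sse_mean, with_bill = sse_bill))
print(coef(bill_model))
```

The model with the lower SSE fits the observed tips more closely, which is the comparison rule used in this walkthrough.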