regression analysis simple regression. y = mx + b y = a + bx

Regression Analysis

Simple Regression

y = mx + b

y = a + bx

where:

y = dependent variable (value depends on x)

a = y-intercept (value of y when x = 0)

b = slope (rate of change: delta y divided by delta x)

x = independent variable

Assumptions

Linearity

Independence of Error

Homoscedasticity

Normality

Linearity

The most fundamental assumption is that the model fits the situation [i.e., the Y variable is linearly related to the value of the X variable].

Independence of Error

The error (residual) is independent for each value of X.

[Residual = observed - predicted]

Homoscedasticity

The variation around the line of regression is constant for all values of X.

Normality

The values of Y are normally distributed at each value of X.

Diagnostic Checking

Linearity: Examine scatter plot of residuals versus fitted [Yhat] for evidence of nonlinearity.

Independence: Plot residuals in time order and look for patterns.

Diagnostic Checking

Homoscedasticity: Examine scatter plots of residuals versus fitted [Yhat] and residuals versus time order and look for changing scatter.

Normality: Examine histogram of residuals. Look for departures from the normal curve.

Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the independent (explanatory) variable(s).

Simple Regression

A statistical model that uses one quantitative independent variable “X” to predict the quantitative dependent variable “Y.”

Mini-Case

Since a new housing complex is being developed in Carmichael, management is under pressure to open a new pie restaurant. Assuming that population and annual sales are related, a study was conducted to predict expected sales.

Mini-Case (Descartes Pie Restaurants)

Restaurant   Population (1000)   Annual Sales ($1000)
1            2                   58
2            6                   105
3            8                   88
:::          :::                 :::
9            22                  149
10           26                  202

Mini-Case

What preliminary conclusions can management draw from the data?

What could management expect sales to be if population of the new complex is approximately 18,000 people?

Scatter Diagrams

The values are plotted on a two-dimensional graph called a “scatter diagram.”

Each value is plotted at its X and Y coordinates.

Scatter Plot of Pieshop

[Figure: Scatter plot of PIESHOP. X-axis: Population (1000’s); Y-axis: Sales ($1000’s)]

Types of Models

No relationship between X and Y

Positive linear relationship

Negative linear relationship

Method of Least Squares

The straight line that best fits the data.

Determine the straight line for which the differences between the actual values (Y) and the values that would be predicted from the fitted line of regression (Y-hat) are as small as possible.
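The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own calculation: it uses only the five restaurants visible in the Mini-Case table (the elided rows are not reconstructed), so its estimates will differ from results based on all ten restaurants. The function name `fit_line` and the variable names are hypothetical.

```python
# Only the five restaurants shown in the Mini-Case table are used here.
# The elided rows are NOT reconstructed, so these numbers are illustrative
# and will not match results computed from the full ten-restaurant data.
population = [2, 6, 8, 22, 26]      # X, in thousands of people
sales = [58, 105, 88, 149, 202]     # Y, in thousands of dollars


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared X deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(population, sales)

# Residual = observed - predicted, one per restaurant.
residuals = [y - (a + b * x) for x, y in zip(population, sales)]

print(f"a = {a:.3f}, b = {b:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
# Point forecast for a population of about 18,000 (x = 18), subset fit only.
print(f"predicted sales at x = 18: {a + b * 18:.1f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which makes a quick sanity check on any fit.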

Measures of Variation

Explained

Unexplained

Explained Variation

Sum of Squares due to Regression: SSR = Σ (Yhat - Ybar)²

Unexplained Variation

Sum of Squares due to Error: SSE = Σ (Yobs - Yhat)²

Total Variation

Total Sum of Squares: SST = Σ (Yobs - Ybar)²
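For a least-squares line with an intercept, the total variation splits into the explained and unexplained parts, SST = SSR + SSE. The sketch below checks this numerically. It assumes only the five restaurants visible in the Mini-Case table (the elided rows are left out), so the sums are illustrative and are not the values in the Pieshop ANOVA table; all variable names are hypothetical.

```python
# Five visible Mini-Case rows only; elided rows are not reconstructed.
population = [2, 6, 8, 22, 26]      # X, in thousands of people
sales = [58, 105, 88, 149, 202]     # Y, in thousands of dollars

n = len(population)
x_bar = sum(population) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept for y = a + bx.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(population, sales))
     / sum((x - x_bar) ** 2 for x in population))
a = y_bar - b * x_bar
fitted = [a + b * x for x in population]

# Explained, unexplained, and total variation.
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, fitted))
sst = sum((y - y_bar) ** 2 for y in sales)

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```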

There is no linear relationship between the dependent variable and the explanatory variable

Hypotheses

H0: β = 0

H0: No relationship exists

H1: A relationship exists

Analysis of Variance for Regression

Source of Variation   Sum of Squares   d.f.    Mean Square
[Regression] Model    SSR              k - 1   SSR/dfn
[Residual] Error      SSE              n - k   SSE/dfd
Total                 SST              n - 1

test: p ≤ 0.05

Standard Error of the Estimate

- the measure of variability around the line of regression

Relationship

When the null hypothesis is rejected, a relationship between the Y and X variables exists.

Coefficient of Determination

R2 measures the proportion of variation that is explained by the independent variable in the regression model.

R2 = SSR / SST

Confidence interval estimates

»True mean

»Individual
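The two interval estimates are commonly written as below. This is a sketch in standard textbook notation rather than a formula taken from the slides: x_0 (the X value of interest), \hat{y}_0 (the fitted value at x_0), and t_{\alpha/2,\,n-2} (the critical t value for simple regression) are symbols assumed here. The first interval is for the true mean of Y at x_0; the second, wider one is for an individual Y value.

```latex
% Confidence interval for the true mean of Y at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s_{y.x}
  \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}

% Prediction interval for an individual Y at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s_{y.x}
  \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
```

Both intervals are narrowest at x_0 = \bar{x} and widen as x_0 moves away from the mean of X.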

Pieshop Forecasting

[Figure: PIESHOP Forecasts. X-axis: Population (1000’s); Y-axis: Sales ($1000’s)]

Coefficient of Sanity

Diagnostic Checking

H0 retain or reject

{Reject if p-value ≤ 0.05}

R2 (larger is “better”)

sy.x (smaller is “better”)

Analysis of Variance for Regression for Pieshop

Source   Sum of Squares   d.f.   Mean Square
Model    14,200.0         1      14,200.0
Error    1,530.0          8      191.25
Total    15,730.0         9

test: p = 0.00003
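The mean squares in the Pieshop table follow directly from the sums of squares and degrees of freedom. The short sketch below redoes that arithmetic and also forms the usual F ratio (model mean square over error mean square). The F ratio itself is not printed on the slide, and k = 2 is inferred from the d.f. column rather than stated, so treat both as assumptions; the variable names are hypothetical.

```python
# Values copied from the Pieshop ANOVA table.
ssr = 14200.0   # sum of squares, Model (regression)
sse = 1530.0    # sum of squares, Error (residual)
n = 10          # restaurants (Total d.f. = n - 1 = 9)
k = 2           # assumed: parameters a and b, giving d.f. of 1 and 8

df_model = k - 1
df_error = n - k

sst = ssr + sse               # total sum of squares
msr = ssr / df_model          # mean square, Model
mse = sse / df_error          # mean square, Error
f_stat = msr / mse            # F ratio used for the ANOVA test

print(f"SST = {sst:,.1f}")
print(f"MSR = {msr:,.2f}, MSE = {mse:,.2f}")
print(f"F = {f_stat:.2f} on ({df_model}, {df_error}) d.f.")
```

Turning the F ratio into a p-value needs the F distribution, which is outside the Python standard library, so that step is left out here.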

Coefficient of Determination

R2 = SSR / SST = 90.27 %

Thus, 90.27 percent of the variation in annual sales is explained by the population.
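The 90.27 percent figure can be reproduced from the Pieshop ANOVA table. The worked arithmetic below uses only the sums of squares from that table; the second line is the equivalent form based on the error sum of squares.

```latex
R^2 = \frac{SSR}{SST} = \frac{14{,}200.0}{15{,}730.0} \approx 0.9027

R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{1{,}530.0}{15{,}730.0} \approx 0.9027
```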

Standard Error of the Estimate

sy.x = 13.8293

SSE = 1,530.0
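The standard error of the estimate is the square root of the error mean square. The worked step below assumes the error degrees of freedom n - k = 8 from the Pieshop ANOVA table (n = 10 restaurants, k = 2 estimated parameters, the latter inferred from the table rather than stated).

```latex
s_{y.x} = \sqrt{\frac{SSE}{n - k}}
        = \sqrt{\frac{1{,}530.0}{8}}
        = \sqrt{191.25}
        \approx 13.8293
```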

Regression Analysis[Simple Regression]

*** End of Presentation ***

Questions?
