Chapters 8, 9, 10: Least Squares Regression Line, Fitting a Line to Bivariate Data
TRANSCRIPT
- Slide 1
- Slide 2
- Chapters 8, 9, 10 Least Squares Regression Line Fitting a Line to Bivariate Data
- Slide 3
- Suppose there is a relationship between two numerical variables. Data: (x1, y1), (x2, y2), ..., (xn, yn). Let x be the amount spent on advertising and y be the amount of sales for the product during a given period. You might want to predict product sales for a month (y) when the amount spent on advertising is $10,000 (x). The letter y is used to denote the variable you want to predict, called the response variable. The other variable, denoted by x, is the explanatory variable.
- Slide 4
- Simplest Relationship: the simplest equation that describes the dependence of variable y on variable x is the linear equation y = b0 + b1x. The slope b1 is the amount by which y changes when x increases by 1 unit. The y-intercept b0 is where the line crosses the y-axis; that is, the value of y when x = 0.
- Slide 5
- [Figure: graph of the line y = b0 + b1x, showing the y-intercept b0 and slope b1 = rise/run.]
- Slide 6
- How do you find an appropriate line for describing a bivariate data set? Two candidate lines: y = 10 + 2x and y = 4 + 2.5x. Let's look at only the blue line. To assess the fit of a line, we look at how the points deviate vertically from the line. What is the meaning of a negative deviation? The point (15, 44) has a deviation of +4. To assess the fit of a line, we need a way to combine the n deviations into a single measure of fit.
- Slide 7
- The deviations are referred to as residuals and denoted e i.
- Slide 8
- Residuals: graphically
- Slide 9
- The Least Squares (Regression) Line: a good line is one that minimizes the sum of squared differences between the points and the line.
- Slide 10
- The Least Squares (Regression) Line: let us compare two lines on the four points (1, 2), (2, 4), (3, 1.5), (4, 3.2). The first line predicts 1, 2, 3, 4 at these x values; its sum of squared differences = (2 - 1)^2 + (4 - 2)^2 + (1.5 - 3)^2 + (3.2 - 4)^2 = 7.89. The second line is horizontal at y = 2.5; its sum of squared differences = (2 - 2.5)^2 + (4 - 2.5)^2 + (1.5 - 2.5)^2 + (3.2 - 2.5)^2 = 3.99. The smaller the sum of squared differences, the better the fit of the line to the data.
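The comparison on this slide can be checked with a short computation. This is a minimal sketch; the helper name `sse` is mine, and taking the first line to be y = x is an assumption read off from its predicted values 1, 2, 3, 4.

```python
# Sum of squared vertical deviations for two candidate lines,
# evaluated on the slide's four points.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sse(points, predict):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - predict(x)) ** 2 for x, y in points)

sse_line1 = sse(points, lambda x: x)     # first line, assumed y = x
sse_line2 = sse(points, lambda x: 2.5)   # second line, horizontal y = 2.5
# sse_line1 ≈ 7.89, sse_line2 ≈ 3.99: the horizontal line fits better here
```

The smaller sum of squared deviations identifies the better-fitting of the two lines; the least squares method searches over all lines for the smallest such sum.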
- Slide 11
- Criterion for choosing what line to draw: the method of least squares. The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible. This line has slope b1 and intercept b0 that minimize SSE = Σ ei^2 = Σ (yi - (b0 + b1 xi))^2.
- Slide 12
- Least Squares Line y = b0 + b1x: slope b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2 and intercept b0 = ȳ - b1 x̄.
- Slide 13
- Scatterplot with least squares prediction line. Data (xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9).
- Slide 14
- Observed y, predicted y: the predicted y when x = 2.7 is b0 + b1x = b0 + b1(2.7).
- Slide 15
- Car Weight, Fuel Consumption Example, cont. Data (xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9).
- Slide 16
- Column-by-column calculations (x̄ = 29/10 = 2.9, ȳ = 43.9/10 = 4.39):

  Wt (x)  Fuel (y)  x - x̄  (x - x̄)²  y - ȳ   (y - ȳ)²  (x - x̄)(y - ȳ)
  3.4     5.5         .5      .25      1.11    1.2321      .555
  3.8     5.9         .9      .81      1.51    2.2801     1.359
  4.1     6.5        1.2     1.44      2.11    4.4521     2.532
  2.2     3.3        -.7      .49     -1.09    1.1881      .763
  2.6     3.6        -.3      .09      -.79     .6241      .237
  2.9     4.6         0       0        .21     .0441      0
  2.0     2.9        -.9      .81     -1.49    2.2201     1.341
  2.7     3.6        -.2      .04      -.79     .6241      .158
  1.9     3.1       -1       1        -1.29    1.6641     1.29
  3.4     4.9         .5      .25      .51      .2601      .255
  --------------------------------------------------------------
  29      43.9        0      5.18      0      14.589      8.49    (col. sum)
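The column sums and the resulting coefficients can be reproduced directly from the ten data points. This is a minimal sketch; the variable names (`Sxx`, `Syy`, `Sxy`) are mine.

```python
# Reproduce the table's column sums and the least squares coefficients
# for the car weight (x, in 1000s of lbs) / fuel consumption (y) data.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(xs)
x_bar = sum(xs) / n   # 2.9
y_bar = sum(ys) / n   # 4.39

Sxx = sum((x - x_bar) ** 2 for x in xs)                        # Σ(x - x̄)²  = 5.18
Syy = sum((y - y_bar) ** 2 for y in ys)                        # Σ(y - ȳ)²  = 14.589
Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))   # Σ(x - x̄)(y - ȳ) = 8.49

b1 = Sxy / Sxx           # slope ≈ 1.639
b0 = y_bar - b1 * x_bar  # intercept ≈ -0.363
```

Libraries such as NumPy or `statistics.linear_regression` would give the same coefficients, but the column-sum route mirrors the hand calculation on the slide.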
- Slide 17
- Calculations: b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2 = 8.49 / 5.18 ≈ 1.639, and b0 = ȳ - b1 x̄ = 4.39 - (1.639)(2.9) ≈ -0.363. Least squares line: ŷ = -0.363 + 1.639x.
- Slide 18
- Scatterplot with least squares prediction line
- Slide 19
- The least squares line always goes through (x̄, ȳ) = (2.9, 4.39).
- Slide 20
- Using the least squares line for prediction. Fuel consumption of 3,000 lb car? (x=3)
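Plugging x = 3 into the fitted line answers the slide's question. A sketch using the column sums from the worked table (Sxy = 8.49, Sxx = 5.18):

```python
# Predicted fuel consumption for a 3,000 lb car (x = 3, since x is
# in 1000s of lbs), using the fitted least squares line.
b1 = 8.49 / 5.18         # slope ≈ 1.639
b0 = 4.39 - b1 * 2.9     # intercept; the line passes through (x̄, ȳ)
y_hat = b0 + b1 * 3      # predicted fuel consumption ≈ 4.55
```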
- Slide 21
- Be careful! Fuel consumption of a 500 lb car? (x = .5) x = .5 is outside the range of the x-data that we used to determine the least squares line; predicting there is extrapolation, and the fitted line may not apply.
- Slide 22
- Avoid GIGO! Evaluating the least squares line: 1. Create a scatterplot. Is the pattern approximately linear? 2. Calculate r^2, the square of the correlation coefficient. 3. Examine the residual plot.
- Slide 23
- r 2 : The Variation Accounted For The square of the correlation coefficient r gives important information about the usefulness of the least squares line
- Slide 24
- r^2: important information for evaluating the usefulness of the least squares line. The square of the correlation coefficient, r^2, is the fraction of the variation in y that is explained by the least squares regression of y on x, that is, by differences in x. Since -1 ≤ r ≤ 1, it follows that 0 ≤ r^2 ≤ 1.
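For the car weight / fuel consumption example, r^2 can be computed from the same sums of squares used for the slope. A sketch, assuming the standard identity r^2 = Sxy^2 / (Sxx · Syy):

```python
# r² from the column sums of the worked example: the fraction of the
# variation in y explained by the least squares regression of y on x.
Sxx = 5.18      # Σ(x - x̄)²
Syy = 14.589    # Σ(y - ȳ)²
Sxy = 8.49      # Σ(x - x̄)(y - ȳ)
r_squared = Sxy ** 2 / (Sxx * Syy)   # ≈ 0.954
```

An r^2 this close to 1 says that weight differences account for almost all of the observed variation in fuel consumption for these ten cars.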
- Slide 25
- March Madness: S(k) Sagarin rating of k th seeded team; Y ij =Vegas point spread between seed i and seed j, i