Every day is a new beginning in life. Every moment is a time for self-vigilance.

Upload: colin-nichols

Post on 02-Jan-2016


1

Every day is a new beginning in life. Every moment is a time for self-vigilance.

2

Simple Linear Regression

Scatterplot

Regression equation

Correlation

3

Example: Computer Repair

A company markets and repairs small computers. How fast (Time) an electronic component (Computer Unit) can be repaired is very important to the efficiency of the company. The variables in this example are Time and Units.

4

Hmm…

How long will it take me to repair this unit?

Goal: to predict the length of repair Time for a given number of computer Units

5

Computer Repair Data

Units  Minutes    Units  Minutes
  1      23         6       97
  2      29         7      109
  3      49         8      119
  4      64         9      149
  4      74         9      145
  5      87        10      154
  6      96        10      166

6

Graphical Summary of Two Quantitative Variables

Scatterplot of the response variable against the explanatory variable

What is the overall (average) pattern?

What is the direction of the pattern?

How much do data points vary from the overall (average) pattern?

Any potential outliers?

7

Summary for Computer Repair Data

Scatterplot (Time vs Units): Some Simple Conclusions

Time is linearly related to computer Units.

(The length of) Time increases as (the number of) Units increases.

Data points are close to the line.

No potential outliers.

8

Numerical Summary of Two Quantitative Variables

Regression equation

Correlation

9

Review: Math Equation for a Line

Y: the response variable; X: the explanatory variable

$Y = b_0 + b_1 X$

$b_0$: the intercept, the value of Y when X = 0

$b_1$: the slope, the change in Y when X increases by 1

10

Regression Equation

The regression line models the relationship between X and Y on average.

The math equation of a regression line is called the regression equation.

11

The Predicted Y Value

We use the regression line to estimate the average Y value for a specified X value, and we use this value to predict the Y value we might observe at this X value in the near future.

This predicted Y value, denoted $\hat{Y}$ and pronounced "y hat," is the Y value on the regression line. So the regression equation is

$\hat{Y} = b_0 + b_1 X$

12

The Usage of Regression Equation

Predict the value of Y for a given X value.

Eg. Wish to predict a lady's weight (WT) from her height (HT).

** What is X? What is Y?

** Suppose b0 = -205 and b1 = 5:

** For ladies with HT of 60", the predicted WT is b0 + b1 x 60 = -205 + 5 x 60 = 95 pounds, the (estimated) average WT of all ladies with HT of 60".

13

The Usage of Regression Equation

Eg. How long will it take to repair 3 computer units?

** Suppose b0= 4.16 and b1=15.51:

** the predicted time = 4.16+15.51x3 = 50.69

** It will take about 50.69 minutes.
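The slides quote b0 = 4.16 and b1 = 15.51 without showing where they come from. A minimal sketch, assuming the line is fitted by ordinary least squares (the fitting method is an assumption, and `fit_line` is a name made up for illustration), recovers values of that size from the Computer Repair Data:

```python
# Repair data from the "Computer Repair Data" slide, as (units, minutes) pairs.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line (assumed fitting method)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(units, minutes)
predicted = b0 + b1 * 3  # predicted repair time for 3 units
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, predicted time = {predicted:.2f} minutes")
```

Rounded to two decimals, the fitted coefficients and the prediction for 3 units agree with the figures on this slide.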

14

Examples of the Predicted Y

• The predicted WT for a given HT: $\hat{Y} = -205 + 5X$

• The predicted repair time for a given number of units: $\hat{Y} = 4.16 + 15.51X$

15

The Limitation of the Regression Equation

The regression equation cannot be used to predict Y value for the X values which are (far) beyond the range in which data are observed.

Eg. Given HT of 40”, the regression equation will give us WT of -205+5x40 = -5 pounds!!
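One way to respect this limitation in practice is to refuse predictions outside the observed X range. The sketch below uses the height and weight coefficients from the slides; the function name and the range of 55 to 75 inches are hypothetical placeholders, since the slides do not give the observed heights.

```python
def predict_weight(height_in, b0=-205.0, b1=5.0, observed_range=(55.0, 75.0)):
    """Predict weight (pounds) from height (inches), refusing to extrapolate.

    observed_range is a made-up placeholder: the slides do not state
    the range of heights in the data.
    """
    low, high = observed_range
    if not low <= height_in <= high:
        raise ValueError(
            f"height {height_in} is outside the observed range {low}-{high}"
        )
    return b0 + b1 * height_in


print(predict_weight(60))  # 95.0, as in the slide example

try:
    predict_weight(40)  # would give -5 pounds if computed blindly
except ValueError as err:
    print("refused:", err)
```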

16

The Unpredicted Part

The value $Y - \hat{Y}$ is the part the regression equation (model) cannot capture, and it is called the "residual."
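As an illustration, the residuals for the Computer Repair Data can be listed directly. This sketch plugs in the rounded coefficients quoted in the slides (b0 = 4.16, b1 = 15.51), so the residuals are approximate.

```python
# Repair data from the "Computer Repair Data" slide.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

b0, b1 = 4.16, 15.51  # rounded coefficients quoted in the slides

# Predicted value on the line, then observed minus predicted.
fitted = [b0 + b1 * x for x in units]
residuals = [y - y_hat for y, y_hat in zip(minutes, fitted)]

for x, y, y_hat, e in zip(units, minutes, fitted, residuals):
    print(f"units={x:2d}  observed={y:3d}  predicted={y_hat:6.2f}  residual={e:6.2f}")
```

Positive residuals are points above the line and negative residuals are points below it.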

17

[Figure: the residual marked as the gap between a data point and the regression line]

18

Correlation between X and Y

X and Y might be related to each other in many ways: linear or curved.

19

Examples of Different Levels of Correlation

[Figure: two scatterplots of y against x]

r = .98: strong linearity

r = .71: medium linearity

20

Examples of Different Levels of Correlation

[Figure: two scatterplots of y against x]

r = -.09: nearly uncorrelated

r = .00: nearly curved

21

Correlation Coefficient of X and Y

A measurement of the strength of the "LINEAR" association between X and Y.

$s_x$: the standard deviation of the data values in X; $s_y$: the standard deviation of the data values in Y.

The correlation coefficient of X and Y is:

$r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{(n-1)\, s_x s_y}$
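The formula translates almost line for line into code. In the sketch below the function name and the toy data are made up for illustration; the data are chosen to lie exactly on a line so the result is easy to check.

```python
from math import sqrt


def correlation(x, y):
    """Sample correlation coefficient r, following the slide's formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations use n - 1 in the denominator.
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    cross = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical toy data lying exactly on a line.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls with x

print(correlation(x, y_up))    # very close to 1
print(correlation(x, y_down))  # very close to -1
```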

22

Correlation Coefficient of X and Y

$-1 \le r \le 1$

The magnitude of r measures the strength of the linear association of X and Y, which is the overall closeness of the points to a line.

The sign of r indicates the direction of the association:

"-" negative association

"+" positive association

** Visit the previous 4 plots again.

23

Correlation Coefficient

The value of r is almost 0:

the best line to fit the data points is nearly horizontal

the value of X won't change our prediction of Y

The value of r is almost 1:

a line fits the data points almost perfectly.
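Why r near 0 goes with a flat line can be seen from a standard identity, stated here under the assumption that the regression line is the least-squares line (the slides do not name the fitting method):

```latex
b_1 = r \, \frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}
```

When r is close to 0, the slope $b_1$ is close to 0 as well, so $\hat{Y} = b_0 + b_1 X \approx \bar{y}$ for every X, and knowing X barely changes the prediction.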

24

Correlation does not Prove Causation

Four ways to interpret an observed association:

• Causation

• There might be causation, but other variables contribute as well

• The association is explained by how other variables affect X and Y

• Y is causing a change in X

25

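Table for Computing Mean, St. Deviation, and Corr. Coef.

i     | $y_i$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $(y_i - \bar{y})(x_i - \bar{x})$
1     | $y_1$ | $y_1 - \bar{y}$ | $(y_1 - \bar{y})^2$ | $x_1$ | $x_1 - \bar{x}$ | $(x_1 - \bar{x})^2$ | $(y_1 - \bar{y})(x_1 - \bar{x})$
2     | $y_2$ | $y_2 - \bar{y}$ | $(y_2 - \bar{y})^2$ | $x_2$ | $x_2 - \bar{x}$ | $(x_2 - \bar{x})^2$ | $(y_2 - \bar{y})(x_2 - \bar{x})$
…     | …     | …               | …                   | …     | …               | …                   | …
n     | $y_n$ | $y_n - \bar{y}$ | $(y_n - \bar{y})^2$ | $x_n$ | $x_n - \bar{x}$ | $(x_n - \bar{x})^2$ | $(y_n - \bar{y})(x_n - \bar{x})$
Total | $\sum_{i=1}^{n} y_i$ | * | $\sum_{i=1}^{n} (y_i - \bar{y})^2$ | $\sum_{i=1}^{n} x_i$ | * | $\sum_{i=1}^{n} (x_i - \bar{x})^2$ | $\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})$

From the totals, compute $\bar{y}$, $s_y$, $\bar{x}$, $s_x$, and $r$.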

26

Example: Computer Repair Time

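$n = 14$, $\sum_{i=1}^{n} y_i = 1361$, $\bar{y} = 1361/14 = 97.21$

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 27768.36$, $\mathrm{Var}(Y) = 27768.36/(14-1) = 2136$, $s_y = 46.22$

$\sum_{i=1}^{n} x_i = 84$, $\bar{x} = 84/14 = 6$

$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 114$, $\mathrm{Var}(X) = 114/(14-1) = 8.769$, $s_x = 2.96$

$\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = 1768$, $\mathrm{Cov}(X,Y) = 1768/(14-1) = 136$

$\mathrm{Cor}(X,Y) = \mathrm{Cov}(X,Y)/(s_x s_y) = 136/(46.217 \times 2.96) = .994$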
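The column totals in this example can be checked with a few lines of code. This is a sketch using only the Computer Repair Data from the earlier slide; the variable names are chosen for illustration.

```python
from math import sqrt

# Repair data from the "Computer Repair Data" slide.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]                      # X
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]  # Y

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n

# Column totals of the computing table.
ss_x = sum((x - x_bar) ** 2 for x in units)
ss_y = sum((y - y_bar) ** 2 for y in minutes)
s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes))

# Variances, covariance, standard deviations, and correlation.
var_x = ss_x / (n - 1)
var_y = ss_y / (n - 1)
cov_xy = s_xy / (n - 1)
s_x = sqrt(var_x)
s_y = sqrt(var_y)
r = cov_xy / (s_x * s_y)

print(f"n = {n}, x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}")
print(f"sum of squares: X = {ss_x:.2f}, Y = {ss_y:.2f}, cross = {s_xy:.2f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}, Cov = {cov_xy:.2f}, r = {r:.3f}")
```

The printed values can be compared line by line with the hand computation on this slide.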

27

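Exercise

(1) Fill in the following table, then compute the mean and st. deviation of Y and X.

(2) Compute the corr. coef. of Y and X.

(3) Draw a scatterplot.

i     | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $y_i$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | $(y_i - \bar{y})(x_i - \bar{x})$
1     | -.3   | -.3             | .09                 | .1    | -.9             | .81                 | .27
2     | -.2   | -.2             | .04                 | .4    | -.6             | .36                 | .12
3     | -.1   | __              | .01                 | .7    | __              | __                  | __
4     | .1    | .1              | .01                 | 1.2   | .2              | __                  | __
5     | .2    | __              | .04                 | 1.6   | .6              | __                  | __
6     | .3    | .3              | .09                 | 2.0   | __              | __                  | __
Total | 0     | *               | __                  | 6.0   | *               | __                  | __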

28

The Influence of Outliers

[Figure: scatterplot of Y3 against X3]

The slope becomes larger (toward the outlier).

The size of r becomes smaller.

29

The Influence of Outliers

[Figure: scatterplot of y vs x]

The slope becomes clear (toward the outlier).

The size of r becomes larger (more linear: 0.159 → 0.935).
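Both outlier effects can be reproduced on small made-up data sets. The numbers below are hypothetical and are not the data behind the two plots; the sketch only illustrates the direction of each effect.

```python
from math import sqrt


def correlation(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Case 1 (hypothetical): points exactly on a line, then one point off the line.
line_x = [1, 2, 3, 4, 5, 6, 7, 8]
line_y = [2, 4, 6, 8, 10, 12, 14, 16]
r_line = correlation(line_x, line_y)
r_line_outlier = correlation(line_x + [9], line_y + [40])

# Case 2 (hypothetical): a patternless cluster, then one far-away point.
cluster_x = [1, 2, 3, 1, 2, 3, 2]
cluster_y = [1, 1, 1, 3, 3, 3, 2]
r_cluster = correlation(cluster_x, cluster_y)
r_cluster_outlier = correlation(cluster_x + [10], cluster_y + [10])

print(f"line:    r = {r_line:.3f} -> {r_line_outlier:.3f} with the outlier")
print(f"cluster: r = {r_cluster:.3f} -> {r_cluster_outlier:.3f} with the outlier")
```

In the first case the outlier pulls points away from a common line, so the size of r drops. In the second case a single distant point creates an apparent line through an otherwise patternless cluster, so the size of r rises.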