Stor 155, Section 2, Last Time 2-way Tables Sliced populations in 2 different ways Look for independence of factors Chi Square Hypothesis test Simpson’s Paradox Aggregating can give opposite impression Inference for Regression Sampling Distributions – TDIST & TINV

Upload: derrick-dennis

Post on 22-Dec-2015

Stor 155, Section 2, Last Time

• 2-way Tables

– Sliced populations in 2 different ways

– Look for independence of factors

– Chi Square Hypothesis test

• Simpson’s Paradox

– Aggregating can give opposite impression

• Inference for Regression

– Sampling Distributions – TDIST & TINV

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 634-667 & Review

Approximate Reading for Next Class:

Pages 634-667 & Review

Inference for Regression

Chapter 10

Recall:

• Scatterplots

• Fitting Lines to Data

Now study statistical inference associated with fit lines

E.g. When is slope statistically significant?

Recall Scatterplot

For data (x,y)

View by plot:

(1,2)

(3,1)

(-1,0)

(2,-1)

[Figure: Toy Scatterplot, Separate Points (x on the horizontal axis, y on the vertical axis)]

Recall Linear Regression

Idea:

Fit a line to data in a scatterplot

• To learn about “basic structure”

• To “model data”

• To provide “prediction of new values”

Recall Linear Regression

Given a line, $y = bx + a$, “indexed” by $b$ & $a$

Define “residuals” = “data Y” – “Y on line”

= $y_i - (b x_i + a)$

Now choose $b$ & $a$ to make these “small”

[Figure: data points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ and a candidate line]

Recall Linear Regression

Make Residuals ≥ 0, by squaring

Least Squares: adjust $b$ & $a$ to

Minimize the “Sum of Squared Errors”

$SSE = \sum_{i=1}^{n} (y_i - (b x_i + a))^2$

Least Squares in Excel

Computation:

1. INTERCEPT (computes y-intercept a)

2. SLOPE (computes slope b)

Revisit Class Example 14

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg14.xls
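As a rough cross-check outside Excel, the same least squares computation can be sketched in Python. This is only an illustration: it uses the four toy points from the scatterplot slide, and the function name least_squares is an assumption made here, not something from the course materials. It returns the values that INTERCEPT and SLOPE would be expected to give for the same data.

```python
# Toy data from the scatterplot slide: (x, y) pairs.
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]


def least_squares(points):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = least_squares(points)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

The closed-form solution is used here instead of a numerical minimizer because, for a straight line, minimizing SSE has an exact answer.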

Inference for Regression

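Idea: do statistical inference on:

– Slope a

– Intercept b

(Note: from here on a is the slope and b is the intercept, the reverse of the labels on the “Least Squares in Excel” slide.)

Model: $Y_i = a X_i + b + e_i$

Assume: $e_i$ are random, independent

and $e_i \sim N(0, \sigma_e)$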

Inference for Regression

Viewpoint: Data generated as:

y = ax + b

$Y_i$ chosen from the $N(a X_i + b, \sigma_e)$ distribution, centered on the line at $X_i$

Note: a and b are “parameters”

Inference for Regression

Parameters $a$ and $b$ determine the

underlying model (distribution)

Estimate with the Least Squares Estimates:

$\hat{a}$ and $\hat{b}$

(Using SLOPE and INTERCEPT in Excel,

based on data)

Inference for Regression

Distributions of $\hat{a}$ and $\hat{b}$?

Under the above assumptions, the sampling

distributions are:

$\hat{a} \sim N(a, \sigma_a)$

$\hat{b} \sim N(b, \sigma_b)$

• Centerpoints are right (unbiased)

• Spreads are more complicated

Inference for Regression

Formula for SD of $\hat{a}$:

$\sigma_a = SD(\hat{a}) = \frac{\sigma_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Big (small) for $\sigma_e$ big (small, resp.)

– Accurate data ⇒ Accurate est. of slope

• Small for x’s more spread out

– Data more spread ⇒ More accurate

• Small for more data

– More data ⇒ More accuracy
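One way to get a feel for the sampling distribution of the estimated slope is a small simulation. The sketch below is only an illustration: the true slope, intercept, error SD, and x values are hypothetical numbers chosen here, not taken from the course. It generates many data sets from the model $Y_i = a X_i + b + e_i$, refits the slope each time, and compares the spread of the fitted slopes with $\sigma_e / \sqrt{\sum (x_i - \bar{x})^2}$.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical values chosen only for illustration (not from the slides).
a_true, b_true, sigma_e = 2.0, 1.0, 0.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

x_bar = sum(xs) / len(xs)
s_xx = sum((x - x_bar) ** 2 for x in xs)


def fitted_slope(ys):
    """Least squares slope for responses ys at the fixed x values xs."""
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx


# Repeatedly generate data from the model and refit the slope.
slopes = []
for _ in range(2000):
    ys = [a_true * x + b_true + random.gauss(0.0, sigma_e) for x in xs]
    slopes.append(fitted_slope(ys))

sd_formula = sigma_e / math.sqrt(s_xx)
sd_simulated = statistics.stdev(slopes)
mean_simulated = statistics.mean(slopes)

print(f"mean of fitted slopes: {mean_simulated:.4f} (true slope {a_true})")
print(f"SD of fitted slopes:   {sd_simulated:.4f} (formula {sd_formula:.4f})")
```

With these settings the simulated mean should land close to the true slope (unbiasedness) and the simulated SD close to the formula value, up to simulation noise.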

Inference for Regression

Formula for SD of $\hat{b}$:

$\sigma_b = SD(\hat{b}) = \sigma_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Big (small) for $\sigma_e$ big (small, resp.)

– Accurate data ⇒ Accurate est. of intercept

• Smaller for $\bar{x} \approx 0$

– Centered data ⇒ More accurate intercept

• Smaller for more data

– More data ⇒ More accuracy

Inference for Regression

One more detail:

Need to estimate $\sigma_e$ using data

For this use:

$s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{a} x_i - \hat{b})^2}{n - 2}}$

• Similar to earlier sd estimate, $s$

• Except variation is about fit line

• $n - 2$ is similar to $n - 1$ from before

Inference for Regression

Now for Probability Distributions,

Since we are estimating $\sigma_e$ by $s_e$

Use TDIST and TINV

With degrees of freedom = $n - 2$

Inference for Regression

Convenient Packaged Analysis in Excel:

Tools → Data Analysis → Regression

Illustrate application using:

Class Example 32,

Old Text Problem 10.12

Inference for Regression

Class Example 32,

Old Text Problem 10.12

Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:

Inference for Regression

Data for October through June are:

Month   X = Deg. Days   Y = Gas Cons’n
Oct     15.6            5.2
Nov     26.8            6.1
Dec     37.8            8.7
Jan     36.4            8.5
Feb     35.5            8.8
Mar     18.6            4.9
Apr     15.3            4.5
May     7.9             2.5
Jun     0               1.1
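For readers without the Excel file, the main numbers of the regression output can be approximated directly from the formulas on the earlier slides. The sketch below is a hand computation under those formulas, not a reproduction of the Excel sheet; the variable names (a_hat, b_hat, s_e, se_a, se_b, t_a) are chosen here for illustration, with a as the slope and b as the intercept.

```python
import math

# Data from the table above: heating degree days (x), gas consumption (y).
x = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
y = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of slope a and intercept b.
a_hat = s_xy / s_xx
b_hat = y_bar - a_hat * x_bar

# Estimate of sigma_e: variation about the fit line, with n - 2 in the denominator.
sse = sum((yi - a_hat * xi - b_hat) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

# Standard errors: the SD formulas with sigma_e replaced by s_e.
se_a = s_e / math.sqrt(s_xx)
se_b = s_e * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# T statistic for testing slope = 0; it has n - 2 = 7 degrees of freedom.
t_a = a_hat / se_a

print(f"slope a_hat     = {a_hat:.4f}  (SE {se_a:.4f}, t = {t_a:.2f})")
print(f"intercept b_hat = {b_hat:.4f}  (SE {se_b:.4f})")
print(f"s_e             = {s_e:.4f}")
```

The t statistic would then be converted to a P-value with TDIST in Excel, using 7 degrees of freedom.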

Inference for Regression

Class Example 32,

Old Text Problem 10.12

Excel Analysis:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg32.xls

Good News:

Lots of things done automatically

Bad News:

Different language,

so need careful interpretation

Inference for Regression

Excel Glossary:

Excel          Stor 155
R2             $r^2$ = Prop’n of Sum of Squares Explained by Line
intercept      Intercept b
X Variable     Slope a
Coefficient    Estimates $\hat{a}$ & $\hat{b}$

Inference for Regression

Excel Glossary:

Excel              Stor 155
Standard Errors    Estimates of $\sigma_a$ & $\sigma_b$ (recall from Sampling Dist’ns)
T – Stat.          (Est. – mean) / SE, i.e. put on scale of T – distribution
P-value            For 2-sided test of: $H_0: a = 0$ vs. $H_A: a \neq 0$ (slope row), $H_0: b = 0$ vs. $H_A: b \neq 0$ (intercept row)

Inference for Regression

Excel Glossary:

Excel                   Stor 155
Lower 95%, Upper 95%    Ends of 95% Confidence Interval for a and b (since chose 0.95 for Confidence level)
Predicted $Y$           Points on line at $X_i$, i.e. $\hat{Y}_i = \hat{a} X_i + \hat{b}$

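Inference for Regression

Excel Glossary:

Excel                  Stor 155
Residual               $Y_i - \hat{a} X_i - \hat{b}$ for $X_i$. Recall: gave useful information about quality of fit (useful to plot)
Standard Residuals     $(Y_i - \hat{a} X_i - \hat{b}) / \sigma_e$, i.e. residuals on standardized scale (with $\sigma_e$ estimated from the data)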

Inference for Regression

Some useful variations:

Class Example 33,

Text Problems 10.23 - 10.25

Excel Analysis:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls

Inference for Regression

Class Example 33, (10.23 – 10.25)

Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters (so a lean of 642 means 2.9642 meters). The data are:

Inference for Regression

Class Example 33,

(10.23 – 10.25)

The data are:

Year   Lean
75     642
76     644
77     656
78     667
79     673
80     688
81     696
82     698
83     713
84     717
85     725
86     742
87     757

Inference for Regression

Class Example 33, (10.23 – 10.25):

(a) Plot the data, does the trend in lean over time appear to be linear?

(b) What is the equation of the least squares fit line?

(c) Give a 95% confidence interval for the average rate of change of the lean.

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls
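Parts (b) and (c) can also be checked by hand. The sketch below fits the line to the lean data and forms a 95% confidence interval for the slope as estimate ± t* × SE. It is an illustration, not the Excel sheet linked above; the critical value 2.201 is the standard two-sided 95% t-table value for 13 − 2 = 11 degrees of freedom (what TINV(0.05, 11) should return), hard-coded here as an assumption instead of being computed.

```python
import math

# Lean data from the table: year (last two digits) and coded lean.
years = list(range(75, 88))  # 75, 76, ..., 87
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]
n = len(years)

x_bar = sum(years) / n
y_bar = sum(lean) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean))

# (b) Least squares line: lean = a_hat * year + b_hat.
a_hat = s_xy / s_xx
b_hat = y_bar - a_hat * x_bar

# Estimate of sigma_e and the standard error of the slope.
sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(years, lean))
s_e = math.sqrt(sse / (n - 2))
se_a = s_e / math.sqrt(s_xx)

# (c) 95% CI for the slope (average rate of change of the lean per year).
T_STAR = 2.201  # assumed t-table value, two-sided 95%, 11 degrees of freedom
ci_low = a_hat - T_STAR * se_a
ci_high = a_hat + T_STAR * se_a

print(f"fit line: lean = {a_hat:.4f} * year + ({b_hat:.2f})")
print(f"95% CI for slope: [{ci_low:.2f}, {ci_high:.2f}]")
```

The interval is in the coded units of the table, i.e. change in lean per year.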

Inference for Regression

HW:

10.17 b,c

10.26 (using log base 10, for part c:

Est’d slope: 0.194

Est'd intercept: -379

95% CI for slope: [0.186, 0.202])

And Now for Something Completely Different

Graphical Displays:

• Important Topic in Statistics

• Has large impact

• Need to think carefully to do this

• Watch for attempts to fool you

And Now for Something Completely Different

Graphical Displays: Interesting Article:

“How to Display Data Badly”

Howard Wainer

The American Statistician, 38, 137-147.

Internet Available:

http://links.jstor.org

And Now for Something Completely Different

Main Idea:

• Point out 12 types of bad displays

• With reasons behind

• Here are some favorites…

And Now for Something Completely Different

Hiding the data in the scale

And Now for Something Completely Different

The eye perceives

areas as “size”:

And Now for Something Completely Different

Change of Scales in Mid-Axis

Really trust the Post???

Review Slippery Issues

Major Confusion:

Population Quantities

Vs.

Sample Quantities

Review Slippery Issues

Population Quantities:

• Parameters

• Will never know

• But can think about

Sample Quantities:

• Estimates (of parameters)

• Numbers we work with

• Contain info about parameters

Review Slippery Issues

Population Mathematical Notation:

$\mu, \sigma, p$

(fixed & unknown)

Sample Mathematical Notation:

$\bar{X}, s, \hat{p}$

(summaries of data, have numbers)

Review Slippery Issues

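Sampling Distributions:

Measurement Error:

$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

Counting / Proportions:

$\hat{p} \sim \frac{Bi(n, p)}{n} \approx N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$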

Review Slippery Issues

Confidence Intervals: Based on margin of error $m$:

Measurement Error:

$[\bar{X} - m, \bar{X} + m]$ brackets $\mu$ 95% of time

Counting / Proportions:

$[\hat{p} - m, \hat{p} + m]$ brackets $p$ 95% of time

Review Slippery Issues

Hypothesis Testing:

Statement of Hypotheses: $H_0$, $H_A$

Actual Test:

P-value = P{What saw or m.c. | Bdry}

(m.c. = more conclusive; Bdry = boundary between $H_0$ and $H_A$)

Hypothesis Testing from 3/22

Other views of hypothesis testing:

View 2: Z-scores

Idea: instead of reporting p-value (to assess statistical significance)

Report the Z-score

A different way of measuring significance

Hypothesis Testing – Z scores

E.g. Fast Food Menus:

Test $H_0: \mu \le \$20{,}000$ vs. $H_A: \mu > \$20{,}000$

Using $\bar{X} = \$21{,}000$, $s = \$2{,}400$, $n = 10$

P-value = P{what saw or m.c. | H0 & HA bd’ry}

Hypothesis Testing – Z scores

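P-value = P{what saw or m.c. | H0 & HA bd’ry}

$= P\{\bar{X} \ge \$21{,}000 \mid \text{bd’ry}\}$

$= P\{\bar{X} \ge \$21{,}000 \mid \mu = \$20{,}000\}$

$= P\left\{\frac{\bar{X} - \mu}{s / \sqrt{n}} \ge \frac{\$21{,}000 - \$20{,}000}{\$2400 / \sqrt{10}}\right\}$

$= P\{Z \ge 1.317\}$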
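The arithmetic behind the Z-score can be checked in a few lines. The sketch below follows the slide in treating the standardized mean as standard normal (a t distribution with 9 degrees of freedom would be the alternative when s is estimated from only 10 observations). The variable names are chosen here for illustration, and the normal tail area is computed from the complementary error function.

```python
import math

# Values from the Fast Food Menus example.
x_bar = 21_000.0  # sample mean
mu_0 = 20_000.0   # boundary value between H0 and HA
s = 2_400.0       # sample standard deviation
n = 10            # sample size

# Z-score: distance of the sample mean from the boundary, in standard error units.
z = (x_bar - mu_0) / (s / math.sqrt(n))

# Upper-tail area P{Z >= z} for a standard normal, via the complementary error function.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"Z-score = {z:.3f}")
print(f"P-value = {p_value:.4f}")
```

The computed Z-score is about 1.32, agreeing with the 1.317 on the slide up to rounding.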