an informal introduction to statistics in 2hcs.brown.edu/courses/cs195w/slides/introtostats.pdf ·...

An Informal Introduction to Statistics in 2h Tim Kraska


Post on 21-Apr-2020


Page 1: An Informal Introduction to Statistics in 2hcs.brown.edu/courses/cs195w/slides/introtostats.pdf · Statistics in 2h Tim Kraska. Goal of this Lecture • This is not a replacement

An Informal Introduction to Statistics in 2h

Tim Kraska

Goal of this Lecture

• This is not a replacement for a proper introduction to probability and statistics
• Instead, it only tries to convey the very basic intuition behind some of the ideas
• The risk of this lecture: half knowledge can be dangerous
• Most slides are based on CS 155 (big thanks to Eli)


The Very Basic


Statistics ≠ Probability

Probability: mathematical theory that describes uncertainty.

Statistics: set of techniques for extracting useful information from data.


Probability Space


Probability Function

Tossing a (Fair) Coin

$\Omega = \{H, T\}$, $\mathcal{F} = 2^{\Omega}$, $|\mathcal{F}| = 2^{2} = 4$ events

$\mathcal{F} = \{\emptyset, \{H\}, \{T\}, \{H, T\}\}$

$\Pr(\emptyset) = 0$

$\Pr(\{H\}) = 0.5$

$\Pr(\{T\}) = 0.5$

$\Pr(\{H, T\}) = 1$

Rolling a Die

$\Omega = \{1, 2, 3, 4, 5, 6\}$, $\mathcal{F} = 2^{\Omega}$, $|\mathcal{F}| = 2^{6} = 64$ events

$\Pr(\emptyset) = 0$

$\Pr(\{1\}) = \Pr(\{2\}) = \Pr(\{3\}) = \Pr(\{4\}) = \Pr(\{5\}) = \Pr(\{6\}) = \frac{1}{6}$

$\Pr(\{1,2\}) = \Pr(\{1,3\}) = \Pr(\{1,4\}) = \Pr(\{1,5\}) = \Pr(\{1,6\}) = \frac{2}{6}$

...
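The event space of the coin and die examples can be enumerated directly. The following is a minimal sketch, assuming all outcomes are equally likely; the helper names `power_set` and `pr` are illustrative and not part of the original slides.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """Return all subsets (events) of the sample space omega."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def pr(event, omega):
    """Probability of an event, assuming equally likely outcomes."""
    return Fraction(len(event), len(omega))


coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

coin_events = power_set(coin)  # 2^2 = 4 events
die_events = power_set(die)    # 2^6 = 64 events

print(len(coin_events), len(die_events))
print(pr({1, 2}, die))  # probability of rolling a 1 or a 2
```

Exact fractions are used so that the results match the hand-computed values on the slides without rounding.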


Independent Events

Tossing a (Fair) Coin Twice

$\Omega = \{HH, HT, TH, TT\}$, $\mathcal{F} = 2^{\Omega}$, $|\mathcal{F}| = 2^{4} = 16$ events

$\Pr(\emptyset) = 0$

$\Pr(\{HH\}) = \Pr(H)\Pr(H) = 0.5 \times 0.5 = 0.25$

$\Pr(\{HT\}) = \Pr(\{TH\}) = \Pr(\{TT\}) = 0.25$

$\Pr(\{HT, TT\}) = \Pr(\{HH, TH\}) = 0.5$

$\Pr(\{HH, HT\}) = \Pr(\{TH, TT\}) = 0.5$

...
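The product rule used for Pr({HH}) can be checked by enumeration. This is a small sketch under the assumption of a fair coin, so the four outcomes are equally likely; the event names `first_heads` and `second_heads` are illustrative choices, not names from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses: HH, HT, TH, TT
omega = ["".join(t) for t in product("HT", repeat=2)]


def pr(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(set(event) & set(omega)), len(omega))


first_heads = {o for o in omega if o[0] == "H"}   # {HH, HT}
second_heads = {o for o in omega if o[1] == "H"}  # {HH, TH}
both_heads = first_heads & second_heads           # {HH}

# Two events are independent when Pr(A and B) = Pr(A) * Pr(B)
independent = pr(both_heads) == pr(first_heads) * pr(second_heads)
print(pr(both_heads), independent)
```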


Conditional Probability


Computing Conditional Probabilities


Example - a posteriori probability


Law of Total Probability

In Class Exercises

1. A fair coin was tossed 10 times and always landed heads. What is the probability that it will land tails next?

2. Stan has two kids. One of his kids is a boy. What is the probability that the other one is also a boy?
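Exercise 2 depends on how the statement is read. The sketch below enumerates one common reading, under these assumptions (none of which are stated on the slide): "one of his kids is a boy" means "at least one is a boy", each child is equally likely to be a boy or a girl, and the two births are independent.

```python
from fractions import Fraction
from itertools import product

# All (older, younger) combinations, assumed equally likely
families = list(product("BG", repeat=2))

# Condition on the event "at least one child is a boy"
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]

p_other_is_boy = Fraction(len(both_boys), len(at_least_one_boy))
print(p_other_is_boy)
```

Under a different reading (for example, "the older child is a boy") the conditioning event changes, and so does the answer.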


Bayesian Statistics


Bayes’ Law

Bayes Theorem

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$

Prior $P(H)$: The probability of the hypothesis being true before collecting data

Marginal $P(D)$: What is the probability of collecting this data under all possible hypotheses?

Likelihood $P(D \mid H)$: Probability of collecting this data when our hypothesis is true

Posterior $P(H \mid D)$: The probability of our hypothesis being true given the data collected

Deriving Bayes’ Law

[Figure: 2×2 table with columns A, ~A and rows B, ~B; the cell for A and B holds $P(A \cap B)$]

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$


Application: Finding a Biased Coin


Class Example: Drug Test

• 0.4% of the Rhode Island population use Marijuana*
• Drug Test: The test will produce 99% true positive results for drug users and 99% true negative results for non-drug users.

If a randomly selected individual tests positive, what is the probability he or she is a user?

$$P(\text{User} \mid +) = \frac{P(+ \mid \text{User})\,P(\text{User})}{P(+)}$$

$$= \frac{P(+ \mid \text{User})\,P(\text{User})}{P(+ \mid \text{User})\,P(\text{User}) + P(+ \mid \neg\text{User})\,P(\neg\text{User})}$$

$$= \frac{0.99 \times 0.004}{0.99 \times 0.004 + 0.01 \times 0.996} = 28.4\%$$

* http://medicalmarijuana.procon.org/view.answers.php?questionID=001199
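The drug-test calculation can be reproduced in a few lines. This is a minimal sketch; the function name `posterior` and the parameter names `sensitivity` and `specificity` are illustrative labels for the "true positive" and "true negative" rates quoted on the slide.

```python
def posterior(prior, sensitivity, specificity):
    """P(user | positive test) via Bayes' law and the law of total probability."""
    p_pos_given_user = sensitivity
    p_pos_given_nonuser = 1.0 - specificity  # false positive rate
    # Law of total probability: P(+) over users and non-users
    p_pos = p_pos_given_user * prior + p_pos_given_nonuser * (1.0 - prior)
    return p_pos_given_user * prior / p_pos


p_user_given_pos = posterior(prior=0.004, sensitivity=0.99, specificity=0.99)
print(f"{p_user_given_pos:.1%}")  # prints 28.4%
```

Even with a test that is right 99% of the time, the low prior (0.4%) means most positive results come from non-users.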

Spam Filtering with Naïve Bayes

9/12/13 Bill Howe, UW

$$P(\text{spam} \mid \text{words}) = \frac{P(\text{spam})\,P(\text{words} \mid \text{spam})}{P(\text{words})}$$

$$P(\text{spam} \mid \text{viagra}, \text{rich}, \ldots, \text{friend}) = \frac{P(\text{spam})\,P(\text{viagra}, \text{rich}, \ldots, \text{friend} \mid \text{spam})}{P(\text{viagra}, \text{rich}, \ldots, \text{friend})}$$

$$P(\text{spam} \mid \text{words}) \approx \frac{P(\text{spam})\,P(\text{viagra} \mid \text{spam})\,P(\text{rich} \mid \text{spam}) \cdots P(\text{friend} \mid \text{spam})}{P(\text{viagra}, \text{rich}, \ldots, \text{friend})}$$
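The naïve Bayes approximation above can be sketched as follows. All probabilities in this example are made-up illustration values, not estimates from any real corpus, and the function name `naive_bayes_posterior` is hypothetical. The denominator P(words) is obtained here by normalizing over the two classes (spam and non-spam), itself computed under the same independence assumption.

```python
from math import prod


def naive_bayes_posterior(words, prior_spam, p_word_given_spam, p_word_given_ham):
    """P(spam | words), treating words as independent given the class."""
    spam_score = prior_spam * prod(p_word_given_spam[w] for w in words)
    ham_score = (1 - prior_spam) * prod(p_word_given_ham[w] for w in words)
    # Normalizing over both classes plays the role of dividing by P(words)
    return spam_score / (spam_score + ham_score)


# Hypothetical per-word probabilities, for illustration only
p_word_given_spam = {"viagra": 0.3, "rich": 0.2, "friend": 0.1}
p_word_given_ham = {"viagra": 0.01, "rich": 0.05, "friend": 0.2}

p = naive_bayes_posterior(
    ["viagra", "friend"], 0.4, p_word_given_spam, p_word_given_ham
)
print(round(p, 3))
```

In practice the per-word probabilities would be estimated from labeled mail, usually with smoothing so that unseen words do not force a probability of zero.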

Bayesian Inference

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

$$P(\Theta \mid E \cap \alpha) = \frac{P(E \mid \Theta \cap \alpha)\,P(\Theta \mid \alpha)}{P(E \mid \alpha)}$$

$H$: Hypothesis

$P(H)$: Prior Probability

$P(H \mid E)$: Posterior Probability

$P(E \mid H)$: Probability of observing E given H, likelihood

$P(E)$: Model Evidence (marginal likelihood)


Random Variables


How to Model A Simple Game

I get $5 from you

You get $10 from me


Random Variables


Independence


Expectation

µ


Linearity of Expectation


How to Model A Simple Game

I get $5 from you

You get $10 from me

Would you play this game?


Variance


Variance

So far we knew the distribution. What if we do not?


Population N

Red/Blue/Green Lottery

Empirical Probability

$$f_i = \frac{n_i}{N} = \frac{n_i}{\sum_j n_j}$$

Population N

$f_{\text{blue}} = \frac{10}{20}$, $f_{\text{green}} = \frac{4}{20}$, $f_{\text{red}} = \frac{6}{20}$
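The empirical probabilities of the lottery example can be computed from raw counts. This is a small sketch; the list `population` simply rebuilds the 20 balls from the counts given on the slide.

```python
from collections import Counter
from fractions import Fraction

# Rebuild the population from the slide's counts: 10 blue, 4 green, 6 red
population = ["blue"] * 10 + ["green"] * 4 + ["red"] * 6

counts = Counter(population)
N = sum(counts.values())

# f_i = n_i / N for every color i
freq = {color: Fraction(n, N) for color, n in counts.items()}
print(freq)
```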

Population

Mean: $\mu = \frac{\sum_i x_i}{N}$

Variance: $\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$


Population N

Red/Blue/Green Lottery

Sample n

Population vs. Sample

         | Population (parameter)                      | Sample (statistic) → estimates
Mean     | $\mu = \frac{\sum_i x_i}{N}$                | $\bar{x} = \frac{\sum_i x_i}{n}$
Variance | $\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$ | $S_N^2 = \frac{\sum_i (x_i - \bar{x})^2}{n}$ (biased estimate)
         |                                             | $S_{N-1}^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$ (unbiased estimate)

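Big Data: How to calculate the Variance in 1-Pass

$$S_{N-1}^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2\right]$$

$$= \frac{1}{n - 1}\left[\sum_i (x_i - \bar{x})^2 - \frac{1}{n}\left(\sum_i (x_i - \bar{x})\right)^2\right]$$

The second form needs only the running sums $\sum_i x_i$ and $\sum_i x_i^2$, so it can be computed in a single pass. The last form shifts every value by $\bar{x}$; the identity holds for any constant shift.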
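The sum-of-squares form of the sample variance lends itself to a single pass over a stream. The following is a minimal sketch of that formula; the function name `one_pass_variance` and the sample data are illustrative. Note that this form can lose floating-point precision when the mean is large compared with the spread of the data, which is one reason the shifted form is sometimes preferred.

```python
import statistics


def one_pass_variance(stream):
    """Unbiased sample variance from running sums, reading the data once."""
    n = 0
    total = 0.0     # running sum of x_i
    total_sq = 0.0  # running sum of x_i squared
    for x in stream:
        n += 1
        total += x
        total_sq += x * x
    if n < 2:
        raise ValueError("need at least two observations")
    return (total_sq - total * total / n) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
one_pass = one_pass_variance(iter(data))  # consumes an iterator, not a list
two_pass = statistics.variance(data)      # reference value from the stdlib
print(one_pass, two_pass)
```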


Law of Large Numbers


Law of Large Numbers

• Draw independent observations at random from any population with finite mean μ.

• As the number of observations increases, the sample mean approaches the mean μ of the population.

• The more variation in the outcomes, the more trials are needed to ensure that the sample mean is close to μ.

Weak law of large numbers

$\bar{X}_n \to \mu$ (in probability, as $n \to \infty$)

Strong law of large numbers

$\Pr\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$
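The law of large numbers can be observed with a quick simulation. This is a sketch only, assuming a fair coin simulated with Python's pseudo-random generator; the function name `running_means` and the fixed seed are illustrative choices made for reproducibility.

```python
import random


def running_means(n_tosses, seed=0):
    """Fraction of heads after each toss of a simulated fair coin."""
    rng = random.Random(seed)
    heads = 0
    means = []
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1
        means.append(heads / i)
    return means


means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, means[n - 1])
```

The early running means can be far from 0.5, while the later ones settle close to it, which is the behavior the slides describe.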


Central Limit Theorem


Law of Large Numbers (Coin)

Convolution

Tossing 2 Dice

Die 1: X, Die 2: Y, Dice 1+2: Z = X + Y

$$P(Z = z) = \sum_{k=-\infty}^{\infty} P(X = k)\,P(Y = z - k)$$
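The convolution formula can be applied directly to discrete distributions stored as dictionaries. This is a minimal sketch, assuming fair six-sided dice and independent rolls; the function name `convolve` is illustrative.

```python
from fractions import Fraction


def convolve(px, py):
    """Distribution of X + Y for independent discrete X and Y."""
    pz = {}
    for x, p in px.items():
        for y, q in py.items():
            pz[x + y] = pz.get(x + y, 0) + p * q
    return pz


die = {k: Fraction(1, 6) for k in range(1, 7)}
s2 = convolve(die, die)  # sum of 2 dice
s4 = convolve(s2, s2)    # sum of 4 dice

for z in sorted(s2):
    print(z, s2[z])
```

Repeating the convolution gives the distributions of sums of 8, 16, and 32 dice, which take on an increasingly bell-shaped form.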

[Figures: three series of bar charts (x-axis: value, y-axis: probability). Each series shows the distribution of a single draw X1 and of the sums S2, S4, S8, S16, and S32.]

Series 1 (X1: Die 1 or Die 2, values 1 to 6): Distribution of X1, S2 (2 Dice), S4, S8, S16, S32

Series 2 (X1 with values 1 to 3): Distribution of X1, S2, S4, S8, S16, S32

Series 3 (X1 with values 0 to 5): Distribution of X1, S2, S4, S8, S16, S32

Normal Distribution

$\mathcal{N}(\mu, \sigma^2)$

Probability Density Function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$

[Figures: Probability Density Function (PDF) and Cumulative Distribution Function (CDF)]

The Central Limit Theorem

1. The distribution of means will be approximately a normal distribution for larger sample sizes

2. The mean of the distribution of means approaches the population mean, μ, for large sample sizes

3. The standard deviation of the distribution of means approaches $\sigma/\sqrt{n}$ for large sample sizes, where σ is the standard deviation of the population and n is the sample size
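The three statements above can be checked empirically. The sketch below simulates sample means of fair die rolls; the sample size n = 30, the number of trials, the seed, and the function name `sample_means` are all illustrative assumptions rather than values from the slides.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Means of `trials` independent samples, each of n fair die rolls."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n))
        for _ in range(trials)
    ]


n = 30
means = sample_means(n, trials=5_000)

sigma = statistics.pstdev(range(1, 7))  # population std of one fair die
predicted = sigma / n ** 0.5            # sigma / sqrt(n)
observed = statistics.stdev(means)      # spread of the simulated means

print(statistics.fmean(means), observed, predicted)
```

The mean of the simulated means should land near 3.5, and their standard deviation near the predicted value; a histogram of `means` would show the bell shape.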

The Central Limit Theorem Side Notes

1. For practical purposes, the distribution of means will be nearly normal if the sample size is larger than 30

2. If the original population is normally distributed, then the sample means are normally distributed for any sample size n, and their distribution becomes narrower as n grows

3. The original variable can have any distribution (with finite variance); it does not have to be a normal distribution


Shapes of Distributions as Sample Size Increases


Testing

Hypothesis Testing

The FDA or “science” needs to decide on a new theory, drug, treatment…

• H0: The null hypothesis - the current theory, drug, treatment, is as good or better
• H1: The alternative hypothesis - the new theory, drug, treatment, should replace the old one

Researchers do not know which hypothesis is true. They must make a decision on the basis of the evidence presented.

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.

What is a Hypothesis?

• A hypothesis is a claim (assumption) about a population parameter:

– population mean
  Example: The mean monthly cell phone bill of this city is μ = $42

– population proportion
  Example: The proportion of adults in this city with cell phones is p = .68

The Null Hypothesis, H0

$H_0: \mu = 3$

$H_0: \mu = 3$    $H_0: \bar{X} = 3$

Hypothesis Testing Process

Population. Claim: the population mean age is 50. (Null Hypothesis: $H_0: \mu = 50$)

Now select a random sample.

Sample. Suppose the sample mean age is 20: $\bar{X} = 20$

Is $\bar{X} = 20$ likely if μ = 50? If not likely, REJECT the Null Hypothesis.

Reason for Rejecting H0

Outcomes and Probabilities

Possible Hypothesis Test Outcomes

Decision         | Actual Situation: H0 True | Actual Situation: H0 False
Do Not Reject H0 | No error (1 - α)          | Type II Error (β)
Reject H0        | Type I Error (α)          | No Error (1 - β)

Key: Outcome (Probability)

Level of Significance and the Rejection Region

Level of significance = α. The rejection region is shaded; its boundary represents the critical value.

Lower-tail test: H0: μ ≥ 3, H1: μ < 3 (rejection region of area α in the lower tail)

Upper-tail test: H0: μ ≤ 3, H1: μ > 3 (rejection region of area α in the upper tail)

Two-tail test: H0: μ = 3, H1: μ ≠ 3 (rejection regions of area α/2 in each tail)

p-Value Approach to Testing

• p-value: Probability of obtaining a test statistic at least as extreme ( ≤ or ≥ ) as the observed sample value, given that H0 is true

– Also called observed level of significance

– Smallest value of α for which H0 can be rejected
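A p-value can be computed exactly for a simple upper-tail test. The following sketch uses a hypothetical experiment that is not from the slides: a coin is tossed 100 times and lands heads 60 times, with H0: p ≤ 0.5 against H1: p > 0.5 and α = 0.05. The function name `upper_tail_p_value` is illustrative.

```python
from math import comb


def upper_tail_p_value(k_observed, n, p0=0.5):
    """P(X >= k_observed) for X ~ Binomial(n, p0), i.e. assuming H0 holds."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(k_observed, n + 1)
    )


alpha = 0.05
p_value = upper_tail_p_value(60, 100)  # hypothetical data: 60 heads in 100 tosses
reject_h0 = p_value < alpha

print(p_value, reject_h0)
```

Because the p-value is the smallest α at which H0 can be rejected, the same data would not lead to rejection at a stricter level such as α = 0.01.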

Calculate the p-value and compare to α

[Figure: two-tail test with α = .10, showing the “Reject H0” regions in both tails, the “Do not reject H0” region around 0, and the p-value marked]

Jonah Lehrer, 2010, The New Yorker: The Truth Wears Off
http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer

• Anders Pape Møller, 1991: “female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers” “Between 1992 and 1997, the average effect size shrank by eighty per cent.”

• Joseph Rhine, 1930s, coiner of the term extrasensory perception: Tested individuals with card-guessing experiments. A few students achieved multiple low-probability streaks. But there was a “decline effect” – their performance became worse over time.

• John Davis, University of Illinois: “Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.”

• Jonathan Schooler, 1990: “subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it.” The effect became increasingly difficult to measure.

Reason 1: Publication Bias

“In the last few years, several meta-analyses have reappraised the efficacy and safety of antidepressants and concluded that the therapeutic value of these drugs may have been significantly overestimated.”

Ridha Joober, Norbert Schmitz, Lawrence Annable, and Patricia Boksa. Publication bias: What are the challenges and can they be overcome? J Psychiatry Neurosci. 2012 May; 37(3): 149–152. doi: 10.1503/jpn.120065

“Although publication bias has been documented in the literature for decades and its origins and consequences debated extensively, there is evidence suggesting that this bias is increasing.”

“A case in point is the field of biomedical research in autism spectrum disorder (ASD), which suggests that in some areas negative results are completely absent” (emphasis mine)

“… a highly significant correlation (R² = 0.13, p < 0.001) between impact factor and overestimation of effect sizes has been reported.”


Publication Bias


“decline effect”


“decline effect” = publication bias!

Background: Effect Size

• Expressed in relevant units
• Not just “significant” – how significant?
• Used prolifically in meta-analysis to combine results from multiple studies
– But be careful – averaging results from different experiments can produce nonsense

$$\text{Effect size} = \frac{[\text{Mean of experimental group}] - [\text{Mean of control group}]}{\text{standard deviation}}$$

Robert Coe, 2002, Annual Conference of the British Educational Research Association. It's the Effect Size, Stupid: What effect size is and why it is important.

Caveat: Other definitions of effect size exist: odds-ratio, correlation coefficient
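The standardized mean difference can be sketched in code. This example assumes one particular choice for the "standard deviation" in the denominator, namely the pooled standard deviation of the two groups (often called Cohen's d); the slides note that other choices exist. The function name `cohens_d` and the measurements are made up for illustration.

```python
import statistics


def cohens_d(experimental, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(experimental), len(control)
    v1 = statistics.variance(experimental)  # unbiased sample variances
    v2 = statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    mean_diff = statistics.fmean(experimental) - statistics.fmean(control)
    return mean_diff / pooled_sd


# Hypothetical measurements, for illustration only
experimental = [5.1, 5.5, 6.0, 6.2, 5.8]
control = [4.9, 5.0, 5.4, 5.2, 5.1]

d = cohens_d(experimental, control)
print(round(d, 2))
```

Dividing by a standard deviation makes the result unit-free, which is what allows effect sizes from different studies to be compared.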


Effect Size

• Standardized Mean Difference


Lots of ways to estimate the pooled standard deviation

e.g., Hartung et al., 2008

Glass, 1976

Effect size: Cohen’s Heuristic

• Standardized mean difference effect size
– small = 0.20
– medium = 0.50
– large = 0.80


Reason 3: Multiple Hypothesis Testing

• If you perform experiments over and over, you’re bound to find something

• This is a bit different than the publication bias problem: Same sample, different hypotheses

• Significance level must be adjusted down when performing multiple hypothesis tests


“Familywise Error Rate” (α = 0.05, k independent experiments)

P(detecting an effect when there is none) = α = 0.05

P(not detecting an effect when there is none) = 1 – α

P(not detecting an effect when there is none, on every experiment) = $(1 - \alpha)^k$

P(detecting an effect when there is none on at least one experiment) = $1 - (1 - \alpha)^k$
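The growth of the familywise error rate with the number of tests is easy to tabulate. This is a minimal sketch, assuming the k tests are independent and each uses the same α; the function name `familywise_error_rate` is illustrative.

```python
def familywise_error_rate(alpha, k):
    """P(at least one false positive) across k independent tests."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20, 100):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

With α = 0.05, the chance of at least one spurious "effect" already exceeds one half at 20 tests.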

Familywise Error Rate Corrections

• Bonferroni Correction
– Just divide α by the number of hypotheses

• Šidák Correction
– Asserts independence
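Both corrections replace α with a smaller per-test significance level. The sketch below uses the standard textbook forms, which the transcript does not spell out: Bonferroni uses α/k, and Šidák uses 1 − (1 − α)^(1/k), which holds the familywise error rate at exactly α when the k tests are independent. The function names are illustrative.

```python
def bonferroni(alpha, k):
    """Per-test level: divide alpha by the number of hypotheses."""
    return alpha / k


def sidak(alpha, k):
    """Per-test level that gives familywise rate alpha for independent tests."""
    return 1 - (1 - alpha) ** (1 / k)


alpha, k = 0.05, 10
print(bonferroni(alpha, k), sidak(alpha, k))

# Familywise error rate that results from applying the Šidák level to all k tests
print(1 - (1 - sidak(alpha, k)) ** k)
```

Bonferroni is slightly more conservative than Šidák and needs no independence assumption.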

Summary

• Stochastic Variables
• Basics in Statistics
• Bayes’ Law
• Central Limit Theorem
• Law of Large Numbers
• Testing