
Statistical Methods in Computer Science

Hypothesis Testing I: Treatment Experiment Designs

Ido Dagan

Empirical Methods in Computer Science © 2006-now Gal Kaminka/Ido Dagan


Hypothesis Testing: Intro

We have looked at setting up experiments
Goal: to prove falsifying hypotheses
If this goal fails => the falsifying hypothesis is (likely) not true => our theory survives

The falsifying hypothesis is called the null hypothesis, denoted H0

We want to show that the likelihood of H0 being true is low.


Comparison Hypothesis Testing

A very simple design: the treatment experiment
Also known as a lesion study / ablation test

treatment Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1

control Ex1 & Ex2 & .... & Exn ==> Dep2

Treatment condition: Categorical independent variable

What are possible hypotheses?


Hypotheses for a Treatment Experiment

H1: Treatment has an effect
H0: Treatment has no effect

Any effect is due to chance

But how do we measure effect?

We know of different ways to characterize data:
Central tendency: mean, median, mode, ...
Dispersion measures: variance, interquartile range, std. dev.
Shape (e.g., kurtosis)


Hypotheses for a Treatment Experiment

H1: Treatment has an effect
H0: Treatment has no effect

Any effect is due to chance

Transformed into:

H1: Treatment changes the mean of the population
H0: Treatment does not change the mean of the population

Any effect is due to chance


Hypotheses for a Treatment Experiment

H1: Treatment has an effect
H0: Treatment has no effect

Any effect is due to chance

Transformed into:

H1: Treatment changes the variance of the population
H0: Treatment does not change the variance of the population
Any effect is due to chance


Hypotheses for a Treatment Experiment

H1: Treatment has an effect
H0: Treatment has no effect

Any effect is due to chance

Transformed into:

H1: Treatment changes the shape of the population
H0: Treatment does not change the shape of the population

Any effect is due to chance


Chance Results

The problem: Suppose we sample the treatment and control groups
We find:
mean treatment results = 0.7
mean control = 0.5
How do we know there is a real difference? It could be due to chance!
In other words: What is the probability of getting 0.7 given H0?
If low, then we can reject H0


Testing Errors

The decision to reject the null hypothesis H0 may lead to errors:
Type I error: Rejecting H0 though it is true (false positive)
Type II error: Failing to reject H0 though it is false (false negative)

Classification perspective of false/true-positive/negative

We are worried about the probability of these errors (upper bounds)

Normally, alpha is set to 0.05 or 0.01. This is our rejection criterion for H0 (usually the focus of significance tests)

1-beta is the power of the test (its sensitivity)

$\Pr(\text{Type I error}) = \alpha$
$\Pr(\text{Type II error}) = \beta$
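As an aside, the relationship between alpha, beta, and power is easy to compute for a one-tailed z-test. A minimal sketch (using scipy, which is not part of the original slides; the effect size of 3 standard errors is an arbitrary choice for illustration):

```python
from scipy.stats import norm

alpha = 0.05                      # upper bound on Pr(Type I error)
z_crit = norm.ppf(1 - alpha)      # one-tailed rejection threshold, ~1.645

# Hypothetical true effect, expressed in standard-error units (assumption):
effect = 3.0

beta = norm.cdf(z_crit - effect)  # Pr(Type II error): failing to reject H0
power = 1 - beta                  # the test's sensitivity
print(z_crit, beta, power)        # ~1.645, ~0.088, ~0.912
```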


Two designs for treatment experiments

One-sample: Compare a sample to a known population
e.g., compare to a specification

Two-sample: Compare two samples, establish whether they are produced from the same underlying distribution


One-sample testing: Basics

We begin with a simple case: we are given a known control population P
For example: life expectancy for patients (without treatment)
Known parameters (e.g., known mean)
Recall the terminology: population vs. sample
Now we sample the treatment population: mean = Mt
Was the mean Mt drawn by chance from the known control population?
To answer this, we must know: what is the sampling distribution of the mean of P?


Sampling Distributions

Suppose that, given P, we repeat the following:
Draw N sample points, calculate mean M1
Draw N sample points, calculate mean M2
...
Draw N sample points, calculate mean Mn
The collection of means forms a distribution, too:
the sampling distribution of the mean


Central Limit Theorem

The sampling distribution of the mean of samples of size N, drawn from a population with mean M and std. dev. S:
1. Approaches a normal distribution as N increases, for which:
2. Mean = M
3. Standard deviation = $S / \sqrt{N}$
This is called the standard error of the sample mean
This holds regardless of the shape of the underlying population
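A minimal simulation sketch of the theorem (using numpy, not part of the original slides; the exponential population and the sample size are arbitrary choices): the means of repeated samples cluster around M with spread close to S/√N, even though the population itself is skewed.

```python
import numpy as np

rng = np.random.default_rng(0)

N, reps = 25, 10_000   # sample size and number of repeated samples
# Population: exponential with scale 1 (mean M = 1, std. dev. S = 1),
# deliberately non-normal to illustrate the theorem.
means = rng.exponential(scale=1.0, size=(reps, N)).mean(axis=1)

print(means.mean())       # ~1.0  (the population mean M)
print(means.std(ddof=0))  # ~0.2  (the standard error S/sqrt(N) = 1/5)
```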


So? Why should we care?

We can now examine the likelihood of obtaining the observed sample mean for the known population

If it is “too unlikely”, then we can reject the null hypothesis
e.g., if the likelihood that the mean is due to chance is less than 5%
The process:
We are given a control population C with mean Mc and standard deviation Sc
We have a sample of the treatment population: sample size N, mean Mt, standard deviation St
If Mt is sufficiently different from Mc, then we can reject the null hypothesis


Z-test by example
We are given:
Control mean Mc = 1, std. dev. Sc = 0.948
Treatment sample: N = 25, Mt = 2.8
We compute:
Standard error = 0.948 / √25 = 0.19
Z score of Mt = (2.8 − population mean given H0) / 0.19 = (2.8 − 1) / 0.19 = 9.47
Now we compute the percentile rank of 9.47

This gives the probability of obtaining an Mt of 2.8 or higher by chance,
under the assumption that the real mean is 1.
Notice: the z-score has a standard normal distribution:
the sample mean is normally distributed, and we subtract/divide by constants, so Z has mean = 0, std. dev. = 1.
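The same computation in code (a sketch using scipy, not part of the original slides):

```python
from scipy.stats import norm

Mc, Sc = 1.0, 0.948        # known control mean and std. dev.
N, Mt = 25, 2.8            # treatment sample size and sample mean

se = Sc / N ** 0.5         # standard error: 0.948 / 5 ~ 0.19
z = (Mt - Mc) / se         # ~9.47
p = norm.sf(z)             # one-tailed: Pr(mean >= 2.8 | H0), survival function
print(z, p)                # z ~ 9.47, p effectively 0 => reject H0
```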


One- and two-tailed hypotheses
The Z-test computes the percentile rank of the sample mean
Assumption: it is drawn from the sampling distribution of the control population
What kinds of null hypotheses are rejected?
One-tailed hypothesis testing:
H0: Mt = Mc
H1: Mt > Mc
If we obtain Z >= 1.645, reject H0.

[Figure: standard normal curve; Z = 0 is the 50th percentile, Z = 1.645 is the 95th percentile, and 95% of the population lies below it.]


One- and two-tailed hypotheses
The Z-test computes the percentile rank of the sample mean
Assumption: it is drawn from the sampling distribution of the control population
What kinds of null hypotheses are rejected?
Two-tailed hypothesis testing:
H0: Mt = Mc
H1: Mt != Mc
If we obtain Z >= 1.96, reject H0. If we obtain Z <= -1.96, reject H0.

[Figure: standard normal curve; Z = -1.96 is the 2.5th percentile, Z = 0 the 50th, Z = 1.96 the 97.5th; 95% of the population lies between -1.96 and 1.96.]
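The critical values above come straight from the inverse CDF of the standard normal; a quick check (scipy, not part of the original slides):

```python
from scipy.stats import norm

# One-tailed at alpha = 0.05: reject H0 when Z >= 1.645 (the 95th percentile)
print(norm.ppf(0.95))    # 1.6448...

# Two-tailed at alpha = 0.05: reject H0 when |Z| >= 1.96 (2.5% in each tail)
print(norm.ppf(0.975))   # 1.9599...
```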


Two-sample Z-test

Up until now, we assumed the population mean is known
But what about cases where it is unknown?
This is called a two-sample case: we have samples of two populations,
treatment & control
For now, assume we know the std of both populations
We want to compare the estimated (sample) means


Two-sample Z-test (assume std known)

Compare the difference of two population means
When samples are independent (e.g., two patient groups)

H0: M1-M2 = d0

H1: M1-M2 != d0 (this is the two-tailed version)

var(X-Y) = var(X) + var(Y) for independent variables

When we test for equality, d0 = 0

$z = \dfrac{(M_1 - M_2) - d_0}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$
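A sketch of this test (numpy/scipy, not part of the original slides; the stds and sample sizes below are made up for illustration, reusing the 0.7 vs. 0.5 means from the earlier example):

```python
import numpy as np
from scipy.stats import norm

def two_sample_z(m1, m2, sigma1, sigma2, n1, n2, d0=0.0):
    # var(X - Y) = var(X) + var(Y) for independent samples
    se = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (m1 - m2 - d0) / se

# Hypothetical numbers: means from the earlier example, assumed stds/sizes.
z = two_sample_z(m1=0.7, m2=0.5, sigma1=0.3, sigma2=0.3, n1=40, n2=40)
p = 2 * norm.sf(abs(z))   # two-tailed p-value
print(z, p)               # z ~ 2.98, p ~ 0.003 => reject H0 at alpha = 0.05
```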


Mean comparison when std unknown

Up until now, we assumed the population std is known
But what about cases where the std is unknown? => It has to be approximated
When N is sufficiently large (e.g., N > 30) and the population std is unknown: use the sample std

Population std: $\sigma_X = \sqrt{\dfrac{\sum_i (X_i - \bar{X})^2}{N}}$

Sample std: $S_X = \sqrt{\dfrac{\sum_i (X_i - \bar{X})^2}{N - 1}}$
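In code, the difference is just the divisor (numpy's ddof parameter; not part of the original slides, with made-up data):

```python
import numpy as np

x = np.array([2.1, 2.5, 1.9, 2.8, 2.3])

print(x.std(ddof=0))  # population std: divide the sum of squares by N
print(x.std(ddof=1))  # sample std: divide by N-1 (used when approximating)
```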


The Student's t-test
The Z-test works well with relatively large N (e.g., N > 30)
But it is less accurate when the population std is unknown
In this case, and for small N, the t-test is used
The t-distribution approaches the normal distribution for large N
t-test: performed like the z-test, but with the sample std, and compared against the t-distribution
The t-score is not normally distributed (its denominator is itself a random variable)
Assumes the sample mean is normally distributed
Requires the sample size: N-1 degrees of freedom, a different distribution for each value

[Figure: t-distribution vs. standard normal; t = 0 is the 50th percentile, and the t-distribution has thicker tails.]
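A one-sample t-test sketch (scipy, not part of the original slides; the sample values and the hypothesized mean of 1.0 are made up):

```python
import numpy as np
from scipy.stats import ttest_1samp

x = np.array([2.3, 3.1, 2.8, 3.4, 2.6, 3.0])  # small treatment sample (N=6)
t, p = ttest_1samp(x, popmean=1.0)            # H0: population mean is 1.0
print(t, p)   # two-tailed p-value from the t-distribution with N-1 = 5 df
```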


t-test variations

Available in Excel and in statistical software packages:
Two-sample and one-sample t-test
Two-tailed and one-tailed t-test
t-test assuming equal or unequal variances
Paired t-test: same inputs (e.g., before/after treatment), not independent

The t-test is common for testing hypotheses about means
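The variations map directly onto library calls; a sketch (scipy, not part of the original slides, with synthetic data):

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(1)
a = rng.normal(0.5, 1.0, size=30)   # control sample (synthetic)
b = rng.normal(0.7, 1.0, size=30)   # treatment sample (synthetic)

print(ttest_ind(a, b))                   # two-sample, equal variances assumed
print(ttest_ind(a, b, equal_var=False))  # unequal variances (Welch's t-test)
print(ttest_rel(a, b))                   # paired: same inputs before/after
```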


Testing variance hypotheses
F-test: compares variances of populations
(Z-test, t-test: compare means of populations)
The testing procedure is similar

H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \neq \sigma_2^2$ OR $\sigma_1^2 > \sigma_2^2$ OR $\sigma_1^2 < \sigma_2^2$

Now calculate $f = s_1^2 / s_2^2$, where $s_x$ is the sample std of x
When f is far from 1, the variances are likely different
To determine the likelihood (how far), compare to the F distribution
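scipy exposes the F distribution but not a ready-made variance F-test, so here is a hand-rolled sketch (not part of the original slides; the data is synthetic):

```python
import numpy as np
from scipy.stats import f

def f_test(x, y):
    """Two-tailed F-test for equality of variances (assumes normal data)."""
    s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)  # sample variances
    F = s1 / s2
    df1, df2 = len(x) - 1, len(y) - 1              # degrees of freedom
    tail = f.sf(F, df1, df2) if F > 1 else f.cdf(F, df1, df2)
    return F, 2 * tail                             # two-tailed p-value

rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=30)   # std 1
y = rng.normal(0, 2, size=30)   # std 2: variances differ by a factor of 4
print(f_test(x, y))             # F far from 1, small p => variances differ
```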


The F distribution

F is based on the ratio of population and sample variances

According to H0, the two standard deviations are equal

The F-distribution has two parameters: numerator and denominator degrees of freedom
Degrees of freedom (here): N-1 of each sample
Assumes both variables are normal

$F = \dfrac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$


Other tests for two-sample testing

There exist multiple other tests for two-sample testing,
each with its own assumptions and associated power
For instance, the Kolmogorov-Smirnov (KS) test:
a non-parametric estimate of the difference between two distributions

Turn to your friendly statistics book for help
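For example, the KS test compares the full empirical CDFs of the two samples; a sketch (scipy, not in the original slides, with synthetic data):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=200)   # synthetic sample 1
b = rng.normal(0.3, 1.0, size=200)   # synthetic sample 2, shifted mean

# Statistic = maximum distance between the two empirical CDFs;
# no assumption about the shape of the underlying distributions.
print(ks_2samp(a, b))
```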


Testing correlation hypotheses

We now examine the significance of r
To do this, we have to examine the sampling distribution of r:
what distribution of r values will we get from the different samples?
The sampling distribution of r is not easy to work with

Fisher's r-to-z transform:

$z_r = 0.5 \ln\dfrac{1 + r}{1 - r}$

where the standard error of the $z_r$ sampling distribution is:

$\sigma_{z_r} = \dfrac{1}{\sqrt{n - 3}}$


Testing correlation hypotheses

We now plug in these values and do a Z-test
For example: let the correlation coefficient r for variables x, y be 0.14
Suppose n = 30
H0: r = 0
H1: r != 0

$z_r = 0.5 \ln\dfrac{1 + 0.14}{1 - 0.14} = 0.141$

$z = \dfrac{0.141 - 0}{1 / \sqrt{30 - 3}} \approx 0.73 < 1.96$

Cannot reject H0
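The same calculation in code (numpy/scipy, not part of the original slides):

```python
import numpy as np
from scipy.stats import norm

r, n = 0.14, 30

z_r = 0.5 * np.log((1 + r) / (1 - r))  # Fisher's r-to-z transform, ~0.141
se = 1 / np.sqrt(n - 3)                # standard error, ~0.192
z = z_r / se                           # ~0.73
p = 2 * norm.sf(abs(z))                # two-tailed p-value, ~0.46
print(z, p)                            # |z| < 1.96 => cannot reject H0
```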


Treatment Experiments (single-factor experiments)

Allow comparison of multiple treatment conditions

treatment1 Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1

treatment2 Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2

control Ex1 & Ex2 & .... & Exn ==> Dep3

Compare performance of algorithm A to B to C ...
Control condition: optional (e.g., to establish a baseline)
Cannot use the tests we learned: Why?