Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Upload: georgina-leonard

Post on 21-Jan-2016


Page 1: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Pharmaceutical Statistics

Lecture 12: Hypothesis Testing: Introduction

Page 2: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Hypothesis testing

• A hypothesis may be defined as a statement about one or more populations.
• The hypothesis is frequently concerned with the parameters of the populations about which the statement is made (we use samples, but we care about the parent population).
• Hypothesis testing: determining whether or not such statements are compatible with the available data.
• Hypothesis testing proceeds through a sequence of logical steps:

Page 3: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Null & Alternative Hypothesis

– Null hypothesis, or the hypothesis to be tested (H0)
• A claim that there is NO difference between the population mean and the hypothesized value.
• The complement of the conclusion that the researcher is seeking to reach becomes the statement of the null hypothesis.
• In the testing process the null hypothesis is either rejected or not rejected (we do not say: accepted!).

– Alternative hypothesis (HA)
• A statement of what we will believe is true if our sample data cause us to reject the null hypothesis.
• Usually the alternative hypothesis is the research hypothesis (the conclusion that the researcher is seeking to reach).

H0 | HA
Gabapentin has no pharmacological effect | Gabapentin has a pharmacological effect
The mean for population A is 20 (H0: μ = 20) | The mean for population A is not 20 (HA: μ ≠ 20)
The mean for population A is less than or equal to 20 (H0: μ ≤ 20) | The mean for population A is larger than 20 (HA: μ > 20)
The mean for population A is larger than or equal to 20 (H0: μ ≥ 20) | The mean for population A is less than 20 (HA: μ < 20)

Page 4: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Null & Alternative Hypothesis

• An indication of equality (=, ≥, ≤) must appear in the null hypothesis.
• Rules of thumb for deciding which statement goes in the null hypothesis and which goes in the alternative hypothesis:
– The null hypothesis should contain a statement of equality.
– The null hypothesis is the hypothesis to be tested.
– What you hope or expect to be able to conclude as a result of the test usually should be placed in the alternative hypothesis.
– The null and alternative hypotheses are complementary. Together they exhaust all possibilities regarding the value that the hypothesized parameter can assume.
• Rejecting or failing to reject a hypothesis is not a proof of the hypothesis! (The null hypothesis can be true or false; we can only reject it or fail to reject it.)

Page 5: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Example: How do things flow?

Researchers postulate that drug A201bc may reduce blood glucose level.
• Null hypothesis (H0): Drug A201bc does not reduce blood glucose level.
• Alternative hypothesis (HA): Drug A201bc does reduce blood glucose level (note that this is what the researchers hope to show).

We need two groups of patients (one receives drug A201bc; the second receives placebo).

We measure blood glucose level for the two groups (treated & control).

If the blood glucose level data show a significant difference, we will reject H0.

Page 6: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Concept of Hypothesis Testing

What is the average weight of the second-year pharmacy students in Jordan?

I think it is 60 kg…

So you hypothesize that the mean = 60, right?

Yes, and we can test this hypothesis. We learned this in a previous statistics course.

Really! How do you test this hypothesis?

First, we need to gather a sample of students (n = 30). Then we need to measure the weight of each student and compute the mean of their weights and the standard error of the mean.

It sounds easy. Let us go to the UJ School of Pharmacy and do this…

After measuring the weights of the student sample:

O.k., we got a mean of 58 kg and an SE of 0.8 kg.

Hmm, what can we conclude now? 58 is not 60, so do we reject your hypothesis?

Page 7: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Hmm, 58 kg is different from 60 kg, but the important point here is: how big or how significant is this difference? In other words: is the difference due to a real difference between my hypothesized mean value and the true mean, or is it due to sampling error only?

How can we determine if this difference is big/significant?

Since 58 kg is the sample mean, we can use the sampling distribution of the sample mean and the z-score to find how far 58 kg is from the hypothesized mean value (60 kg).

As you see below, the z-score of the mean 58 kg (-2.5) is far to the left (in the left gray area below). This means that a sample mean of 58 kg is rare and unlikely if the hypothesized population mean is 60 kg, which indicates that the population from which the sample was taken may have a mean different from 60 kg. Maybe the true population mean is 59 kg; the z-score of the sample mean would then be -1.25, a value that falls in the white area (likely to occur and not rare anymore).

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{58 - 60}{0.8} = -2.5 $$

This is interesting, but the gray area is part of the distribution of the sample mean, and values in this area are possible. What if the sample we picked is just odd and the true mean is 60 kg?

You are right, there is always a possibility of error here. The true mean can be 60 kg and our sample was just extreme. This error (rejecting a true null hypothesis) is called a type I error, and its probability is α. The good thing here is that we choose α to be small for more certainty (5%, 1%).

Page 8: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

I start to understand now. You mean that we have no idea about the population mean, so we hypothesize a value (60 kg) and then draw a sample from the population. If this sample was drawn from a population that has a mean of 60 kg (as we think), the mean of this sample should fall in the white area [58.4-61.6 kg] with a probability of 95%. Since our sample mean fell in the gray area, we conclude that the population mean may not be the hypothesized 60 kg. The probability that this conclusion is wrong when the true mean really is 60 kg is only 5%, right?

Exactly. The white area is where we do not reject our hypothesis, whereas the gray area is where we do. Remember that it is always possible to reject a true null hypothesis (probability α, type I error) and to fail to reject a false null hypothesis (probability β, type II error). It is all about probability, certainty, confidence, and significance.

That was a good brain exercise. What else can we use hypothesis testing for? (topics for coming lectures)

Well, we can use hypothesis testing to test hypothesized values for a single population mean, the difference between two population means, a single population proportion, and the difference between two population proportions.

Thanks for this info.

You are welcome.

Note: we used the z-statistic since we assumed that the parent population is normal and the sample size is 30 (large).
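The weight example can be checked numerically. The following is a minimal Python sketch that assumes only the numbers quoted in the dialogue (hypothesized mean 60 kg, sample mean 58 kg, SE 0.8 kg, α = 0.05); the variable names are illustrative and not part of the lecture.

```python
from statistics import NormalDist

mu0 = 60.0    # hypothesized population mean (kg)
xbar = 58.0   # observed sample mean (kg)
se = 0.8      # standard error of the mean (kg)
alpha = 0.05  # significance level (two-tailed)

# z-score of the sample mean under H0
z = (xbar - mu0) / se

# Two-tailed critical z value and the matching "white area" for the sample mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower = mu0 - z_crit * se
upper = mu0 + z_crit * se

reject = abs(z) >= z_crit
print(f"z = {z:.2f}, nonrejection region = [{lower:.1f}, {upper:.1f}] kg, reject H0: {reject}")
```

The interval printed here corresponds to the white area [58.4-61.6 kg] mentioned in the dialogue.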

© Alkilany 2012

Page 9: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Test Statistic & Decision Rules

• All possible values of the test statistic are points on the horizontal axis of the graph of its distribution. These points are divided into a rejection region and a nonrejection region.
• The values of the test statistic forming the rejection region are those values that are less likely to occur if the null hypothesis is true, while the values forming the nonrejection (acceptance) region are more likely to occur if the null hypothesis is true.
• The decision rule tells us to reject the null hypothesis if the value of the test statistic that we compute from our sample is one of the values in the rejection region, and to not reject the null hypothesis if the computed value is one of the values in the nonrejection region.
• The decision of which values go into the rejection region and which go into the nonrejection region is based on the desired level of significance α.
• The level of significance α is the probability of rejecting a true null hypothesis. Since rejecting a true null hypothesis is an error, we should make the level of significance small.
• The most frequently encountered α values are 0.01, 0.05 and 0.1.

[Figure: distribution of the test statistic; the critical values separate the nonrejection region from the rejection regions, which hold the rare, unlikely values]

Note: whether to use the z- or the t-statistic depends on the assumptions (normality, known variance, sample size, equality of variances, etc.).

Page 10: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Example on the Concept of Hypothesis Testing

• We postulate that the average lead concentration in human blood is 18.5 ppb. We want to investigate our postulation. H0: μ[Pb in blood] = 18.5 ppb
• We will take a random sample of blood specimens, measure the lead levels, and then analyze the collected data:

[Figure: t distribution under H0 with α/2 = 0.025 in each tail and critical values t = ±2.052]

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{23.25 - 18.5}{1.794} = 2.65 $$

The sample mean (x̄ = 23.25, t = +2.65) is unlikely to occur if H0 is true (it falls in the rejection region). Thus we reject the null hypothesis and conclude that the population mean for lead levels in human blood may not equal 18.5 ppb.

Quantity | Value
Hypothesized mean (μ0) | 18.5
Sample mean (x̄) | 23.25
n | 28
D.F. | 27
s | 9.49
SE (standard error) | 1.794
Confidence level (1-α) | 95%
Significance (α) | 5%
Critical t-value | ±2.052
Lower critical value of x̄ | 14.82
Upper critical value of x̄ | 22.18

Note: we used the t-statistic since we assumed that the parent population is normal with unknown variance and the sample size is small.
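The lead example can be reproduced from its summary statistics. This is a sketch under stated assumptions: the critical value 2.052 (27 degrees of freedom, α = 0.05, two-tailed) is taken from the slide rather than computed, and the variable names are illustrative.

```python
import math

mu0 = 18.5      # hypothesized mean lead level (ppb)
xbar = 23.25    # sample mean (ppb)
s = 9.49        # sample standard deviation (ppb)
n = 28          # sample size, so df = n - 1 = 27
t_crit = 2.052  # two-tailed critical t for df = 27, alpha = 0.05 (value quoted on the slide)

se = s / math.sqrt(n)    # standard error of the mean, about 1.79
t = (xbar - mu0) / se    # computed test statistic

# Critical values expressed on the scale of the sample mean
lower = mu0 - t_crit * se
upper = mu0 + t_crit * se

reject = abs(t) >= t_crit
print(f"t = {t:.2f}, critical sample means = ({lower:.2f}, {upper:.2f}), reject H0: {reject}")
```

Expressing the critical values on the scale of the sample mean gives the same decision as comparing t with ±2.052; it is only a change of units.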

Page 11: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Summary

1. State the null and the alternative hypotheses.
2. Choose a significance level (usually 0.05).
3. Determine the critical region.
4. Compute the test statistic (based on the assumptions!). This is the calculated value that we compare with the critical value.
5. Reject the null hypothesis when the test statistic falls in the rejection region (otherwise do not reject the null hypothesis).
6. State the appropriate conclusions.

Test statistic (t version):

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$
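Steps 4 and 5 of the summary can be sketched as two small helpers. The function names `compute_statistic` and `reject_h0` are hypothetical, and the critical value is assumed to be looked up from a z or t table (step 3) and passed in by the caller.

```python
import math

def compute_statistic(xbar, mu0, sd, n):
    """Step 4: (sample mean - hypothesized mean) / standard error."""
    return (xbar - mu0) / (sd / math.sqrt(n))

def reject_h0(stat, critical, tail="two"):
    """Step 5: compare the computed statistic with the critical value.

    critical is the positive table value; tail is "two", "left" or "right".
    """
    critical = abs(critical)
    if tail == "two":
        return abs(stat) >= critical
    if tail == "left":
        return stat <= -critical
    if tail == "right":
        return stat >= critical
    raise ValueError("tail must be 'two', 'left' or 'right'")

# Usage with illustrative numbers: sample mean 58, hypothesized mean 60,
# standard deviation 4.38 and n = 30 (so SE is about 0.8), critical z = 1.96.
stat = compute_statistic(58, 60, 4.38, 30)
print(round(stat, 2), reject_h0(stat, 1.96, tail="two"))
```

Keeping the critical value as an input makes the same decision rule usable for both z and t statistics.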

Page 12: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

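Types of Hypothesis Tests

• Two-tailed test (HA: not equal to): H0 is rejected when the test statistic is too small or too large.
• Left-tailed test (HA: less than; H0: greater than or equal to): H0 is rejected when the test statistic is too small.
• Right-tailed test (HA: greater than; H0: less than or equal to): H0 is rejected when the test statistic is too large.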

Page 13: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

P-Value

• It is a number that tells us how unusual our sample results are, given that the null hypothesis is true.
• It is the probability of obtaining a value of the test statistic as extreme as or more extreme than the computed one.
• If P-value ≤ α, we reject the null hypothesis. If P-value > α, we do not reject it.
• Researchers prefer to report the P-value as an indication of the significance of their test statistic.

With ZT denoting the magnitude of the computed test statistic:
• Two-tailed test: P-value = probability of observing Z ≥ +ZT or Z ≤ -ZT
• Left-tailed test: P-value = probability of observing Z ≤ -ZT
• Right-tailed test: P-value = probability of observing Z ≥ +ZT

These probabilities are computed using software packages or from tables.

[Figure: shaded tail area corresponding to a P-value < 0.05]
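As one way to compute these tail probabilities in software, here is a small sketch using the standard normal distribution from Python's standard library. The function name `p_value` is hypothetical; it takes the signed computed z statistic and assumes a z-based test.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """P-value of a computed z statistic under the standard normal distribution."""
    std = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std.cdf(z)            # P(Z <= z)
    if tail == "right":
        return std.cdf(-z)           # P(Z >= z), by symmetry
    if tail == "two":
        return 2 * std.cdf(-abs(z))  # both tails beyond |z|
    raise ValueError("tail must be 'two', 'left' or 'right'")

print(round(p_value(-1.96), 3))              # two-tailed, close to alpha = 0.05
print(round(p_value(1.645, tail="right"), 3))
```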

Page 14: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Types of errors (α & β)

Type I error (α): Rejecting a true null hypothesis (H0). The true population is the one described by H0, but sampling error resulted in an odd test statistic value in the blue area (the rejection area for H0).

Type II error (β): Failing to reject (accepting) a false null hypothesis. The true population is the one described by HA, but sampling error resulted in an odd test statistic value in the brown area (the acceptance area for H0).
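To make α and β concrete, the sketch below reuses the weight example (H0: μ = 60 kg, SE = 0.8 kg, α = 0.05) and assumes, purely for illustration, that the true mean is 59 kg. That true value is an assumption; in practice it is unknown, so β can only be computed for a supposed alternative.

```python
from statistics import NormalDist

mu0 = 60.0      # mean under H0 (kg)
mu_true = 59.0  # ASSUMED true mean under HA (kg), for illustration only
se = 0.8        # standard error of the sample mean (kg)
alpha = 0.05    # chosen significance level (two-tailed)

# Nonrejection region for the sample mean under H0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se

under_h0 = NormalDist(mu0, se)        # sampling distribution if H0 is true
under_true = NormalDist(mu_true, se)  # sampling distribution if the true mean is 59

# Type I error: the sample mean lands outside the region although H0 is true
alpha_check = under_h0.cdf(lower) + (1 - under_h0.cdf(upper))

# Type II error: the sample mean lands inside the region although H0 is false
beta = under_true.cdf(upper) - under_true.cdf(lower)

print(f"alpha = {alpha_check:.3f}, beta = {beta:.3f}")
```

With a true mean this close to the hypothesized one, β is large: a small real difference is hard to detect with this standard error.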

Page 15: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Pharmaceutical Statistics

Lecture 13: Hypothesis Testing: A Single Population Mean

Page 16: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case I: A Single Normal Population Mean with Known Variance

Researchers are interested in the mean age of a certain population. They are wondering if the mean age is 30 years. Assume that the population is normally distributed with variance equal to 20. A simple random sample of 10 individuals is drawn from the population of interest. From this sample, a mean of 27 is calculated. Construct the proper hypotheses, test them, and then state the proper conclusion.

• Hypotheses:
The null hypothesis is H0: μ_age = 30
The alternative hypothesis is HA: μ_age ≠ 30
To reduce the risk of rejecting a true null hypothesis we choose a small value, α = 0.05 (confidence level = 95%).

• Test statistic:
– Since the population is normally distributed and the population variance is known, we use the z-statistic.

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

Page 17: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case I: A Single Normal Population Mean with Known Variance

• Distribution of the test statistic:
– When H0 is true, the test statistic that we have chosen (the z-score of the sample mean) is normally distributed with a mean of 0 and a variance of 1 (standard normal distribution).
• Decision rule:
– The decision rule tells us to reject H0 if the computed value falls in the rejection region and to fail to reject H0 if it falls in the nonrejection region.
– So either sufficiently small values or sufficiently large values will cause rejection of the null hypothesis (two-tailed test). But how large or how small?

Page 18: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case I: A Single Normal Population Mean with Known Variance

• Decision rule (contd.):
– Since our rejection region consists of two parts, part of α will have to be associated with the large values and part with the small values.
– So we divide α equally and let α/2 = 0.025 be associated with large values and α/2 = 0.025 with small values.
– The critical values of z then become the value to the right of which lies 0.025 of the area under the standard normal distribution (SND) and the value to the left of which lies 0.025 of the area under the SND.
– z = 1.96 and z = -1.96 (we determined the critical values here).
– Reject H0 if the computed value of the z-score is either ≥ 1.96 or ≤ -1.96.

• Calculation of the test statistic:

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = \frac{-3}{1.4142} = -2.12 $$

• Statistical decision:
– We are able to reject the null hypothesis since -2.12 is in the rejection region.
– We can say that the computed value is significant at the 0.05 level.

Page 19: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case I: A Single Normal Population Mean with Known Variance

• Conclusion:
– We can conclude that μ is not equal to 30 (confidence level 95%; significance = 5%).

• p value:
– The p value is the probability of observing z ≥ 2.12 or z ≤ -2.12 (two-tailed test, both areas in green).
– p value = P(z ≥ 2.12) + P(z ≤ -2.12) = 0.017 + 0.017 = 0.034 [AUC obtained from z-tables]
– The p value of a hypothesis test is the probability of obtaining, when H0 is true, a value of the test statistic as extreme as or more extreme than (in the direction of supporting HA) the one actually computed.
– General rule: if the p value is less than or equal to α, we reject the null hypothesis; if it is greater than α, we do not reject the null hypothesis.
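The Case I numbers (hypothesized mean 30, sample mean 27, variance 20, n = 10) can be run through the same steps in a few lines. This is a sketch with illustrative variable names, using the standard library's normal distribution in place of the z-table.

```python
import math
from statistics import NormalDist

mu0 = 30.0       # hypothesized mean age
xbar = 27.0      # sample mean
variance = 20.0  # known population variance
n = 10           # sample size
alpha = 0.05     # significance level (two-tailed)

se = math.sqrt(variance / n)  # sigma / sqrt(n)
z = (xbar - mu0) / se

std = NormalDist()
z_crit = std.inv_cdf(1 - alpha / 2)  # about 1.96
p_two = 2 * std.cdf(-abs(z))         # area in both tails beyond |z|

reject = p_two <= alpha  # same decision as abs(z) >= z_crit
print(f"z = {z:.2f}, critical z = ±{z_crit:.2f}, p = {p_two:.3f}, reject H0: {reject}")
```

The critical-value rule and the p-value rule are two views of the same comparison, so they always lead to the same decision.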

Page 20: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Note: Testing H0 by Means of a Confidence Interval

• Treating the previous example using the confidence interval approach:

$$ \bar{x} \pm z_{(1-\alpha/2)} \frac{\sigma}{\sqrt{n}} $$

$$ 27 \pm 1.96\sqrt{20/10} $$

$$ 27 \pm 1.96(1.4142) $$

$$ 27 \pm 2.7718 $$

$$ (24.2282,\ 29.7718) $$

• Since the interval does not include 30, we can say that 30 is not a candidate for the mean we are estimating; therefore μ is not equal to 30 and H0 is rejected.
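The confidence interval approach for the same example can be sketched as follows (same assumed inputs: sample mean 27, variance 20, n = 10, α = 0.05; variable names are illustrative).

```python
import math
from statistics import NormalDist

xbar = 27.0      # sample mean
variance = 20.0  # known population variance
n = 10           # sample size
alpha = 0.05     # 100(1 - alpha)% = 95% confidence interval
mu0 = 30.0       # hypothesized mean to check against the interval

z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
margin = z * math.sqrt(variance / n)     # about 2.77

ci_lower, ci_upper = xbar - margin, xbar + margin
reject = not (ci_lower <= mu0 <= ci_upper)  # reject H0 if mu0 lies outside the interval

print(f"95% CI = ({ci_lower:.4f}, {ci_upper:.4f}), reject H0: {reject}")
```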

Page 21: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Testing H0 by Means of A Confidence Interval

• When testing a null hypothesis by means of a two-sided confidence interval, we reject H0 at the α level of significance if the hypothesized parameter is not contained within the 100(1-α) percent confidence interval.

• If the hypothesized parameter is contained within the interval, H0 cannot be rejected at the α level of significance.

Page 22: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

One-Sided Hypothesis Test

• In this case either significantly “small” values only or significantly “large” values only will cause rejection of the null hypothesis.
• In the previous example, suppose the researchers are asking: can we conclude that μ < 30 (left-tailed test)?

The null hypothesis is H0: μ ≥ 30
The alternative hypothesis is HA: μ < 30

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

[Figure: left-tailed test (H0: greater than or equal to); H0 is rejected when the test statistic is too small]

Page 23: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

One-Sided Hypothesis Test

• The critical z value is the one to the left of which lies 0.05 of the area under the SND (-1.645).
• Our decision rule tells us to reject H0 if the computed value of the test statistic is less than or equal to -1.645.
• Test statistic:

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = \frac{-3}{1.4142} = -2.12 $$

• We reject the null hypothesis since -2.12 < -1.645.
• We conclude that the population mean may be smaller than 30.
• p value = P(z ≤ -2.12) = 0.0170 (just one tail)
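For the left-tailed version of the age example, all of α sits in one tail, so both the critical value and the p value change. A minimal sketch with the same assumed inputs (mean 27, hypothesized 30, variance 20, n = 10):

```python
import math
from statistics import NormalDist

mu0, xbar, variance, n, alpha = 30.0, 27.0, 20.0, 10, 0.05

z = (xbar - mu0) / math.sqrt(variance / n)

std = NormalDist()
z_crit_left = std.inv_cdf(alpha)  # all of alpha in the left tail, about -1.645
p_left = std.cdf(z)               # P(Z <= computed z), one tail only

reject = z <= z_crit_left
print(f"z = {z:.2f}, critical z = {z_crit_left:.3f}, p = {p_left:.4f}, reject H0: {reject}")
```

The one-tailed p value is half of the two-tailed value for the same statistic, which is why the one-sided test rejects more easily in the hypothesized direction.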

Page 24: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case II: A Single Normal Population Mean with Unknown Variance

The t Distribution (remember this slide from the previous lecture)

• In the previous example, the population standard deviation σ was known, and we learned how to test a hypothesis on a population mean based on the sample mean.
• However, usually the population standard deviation is unknown, as is the population mean (what do we need to do in this case?).
• We may use the sample standard deviation s to replace σ. The z-statistic then becomes a t-statistic:

z-distribution if σ is known:

$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$

t-distribution if σ is unknown [population variance is unknown + small n]:

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$

[Figure: Student's t distribution compared with the standard normal distribution]

Note:
- When we have small samples, it becomes necessary for us to use the t-distribution in hypothesis testing.
- When the sample size is large (>30), our faith in s as an approximation of σ is usually substantial, and we may be justified in using standard normal distribution theory to test the hypothesized value for the population mean.

Page 25: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case II: A Single Normal Population Mean with Unknown Variance

• The body mass index (BMI) of a group of 14 healthy adult males has a mean of 30.5 and a standard deviation of 10.6392. Can we conclude that the mean BMI of the population is equal to 35?

The null hypothesis is H0: μ = 35

The alternative hypothesis is HA: μ ≠ 35

α = 0.05

Test statistic (t-distribution, since σ is unknown [population variance is unknown + small n]):

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

Page 26: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case II: A Single Normal Population Mean with unknown variance

• Decision rule: this is a two-tailed test, so we put α/2 (0.025) in each tail. The t values to the right and left of which 0.025 of the area lies (13 degrees of freedom) are 2.1604 and -2.1604.
• Calculation of the test statistic:

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{30.5 - 35}{10.6392/\sqrt{14}} = \frac{-4.5}{2.8434} = -1.58 $$

• We do not reject H0 since -1.58 falls in the nonrejection region.
• We can conclude that the mean of the population from which the sample came may be 35.
• The exact p value is not available directly.
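Although the slide notes that P is not available directly, software can approximate it. The sketch below reproduces the BMI statistic and then integrates the Student's t density numerically with Simpson's rule; the helper names `t_pdf` and `t_two_sided_p` are hypothetical, and a statistics package would normally be used instead of hand-rolled integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-tailed p value: 1 - 2 * P(0 <= T <= |t|), via Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central

mu0, xbar, s, n = 35.0, 30.5, 10.6392, 14
t_crit = 2.1604  # two-tailed critical t for df = 13, alpha = 0.05 (value quoted on the slide)

se = s / math.sqrt(n)
t = (xbar - mu0) / se
reject = abs(t) >= t_crit
p = t_two_sided_p(t, n - 1)

print(f"t = {t:.2f}, reject H0: {reject}, approximate two-tailed p = {p:.3f}")
```

The approximate p value is well above 0.05, which agrees with the decision not to reject H0.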

Page 27: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case III: A Single Non-normal Population Mean with Unknown Variance and Large Sample

We may, if our sample is large (greater than or equal to 30), take advantage of the central limit theorem and use the z-statistic as the test statistic.

• The mean maximum oxygen uptake (VO2max) for a sample of 242 women was 33.3 with a standard deviation of 12.14. On the basis of these data, can we conclude that the mean score for a population of such women is greater than 30?

The null hypothesis is H0: μ ≤ 30
The alternative hypothesis is HA: μ > 30

α = 0.05

Test statistic:

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

We use the sample's s since we do not know σ. This approximation is valid due to the large sample size. Also, we have learned that when n is large the t-distribution is close to the z-distribution, so here we can use the z-distribution as a valid approximation.

Page 28: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Case III: A Single Non-normal Population Mean with Unknown Variance and Large Sample

• We are unwilling to assume that scores are normally distributed in such a population; however, because of the central limit theorem, the test statistic is at worst approximately normally distributed with mean 0.
• By letting α = 0.05, the critical value of the test statistic is 1.645. Reject H0 if the computed z ≥ 1.645.
• Calculation of the test statistic:

$$ z = \frac{33.3 - 30}{12.14/\sqrt{242}} = \frac{3.3}{0.7804} = 4.23 $$

• Reject H0 since 4.23 > 1.645.
• Conclude that the mean VO2max score for the sampled population is greater than 30.
• The p value for this test is less than 0.00005, since 4.23 is greater than 3.9.
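The VO2max example is a right-tailed large-sample z test with s standing in for σ. A short sketch using the numbers from the slide (variable names are illustrative):

```python
import math
from statistics import NormalDist

mu0 = 30.0    # hypothesized mean VO2max under H0 (mu <= 30)
xbar = 33.3   # sample mean
s = 12.14     # sample standard deviation, used in place of the unknown sigma
n = 242       # large sample, so the central limit theorem applies
alpha = 0.05  # significance level (right-tailed)

se = s / math.sqrt(n)
z = (xbar - mu0) / se

std = NormalDist()
z_crit = std.inv_cdf(1 - alpha)  # about 1.645, all of alpha in the right tail
p_right = std.cdf(-z)            # P(Z >= z), using symmetry for numerical accuracy

reject = z >= z_crit
print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p = {p_right:.2e}, reject H0: {reject}")
```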

Page 29: Pharmaceutical Statistics Lecture 12 Hypothesis Testing: Introduction

Conventions for interpreting P values

P Value | Interpretation
P > .05 | Result is not significant; usually indicated by no asterisk
P < .05 | Result is significant; usually indicated by *
P < .01 | Result is highly significant; usually indicated by **
P < .001 | Result is very highly significant; usually indicated by ***