biostatistics unit 5 samples needs to be completed. 12/24/13 1


Upload: maximo-ortega

Post on 29-Mar-2015


Biostatistics

Unit 5

Samples

Needs to be completed.

12/24/13

Sampling distributions

• Sampling distributions are important in the understanding of statistical inference. 

• Probability distributions permit us to answer questions about sampling and they provide the foundation for statistical inference procedures.


Definition

• The sampling distribution of a statistic is the distribution of all possible values of the statistic, computed from samples of the same size randomly drawn from the same population. 

• When sampling a discrete, finite population, a sampling distribution can be constructed. 

• Note that this construction is difficult with a large population and impossible with an infinite population.


Construction of sampling distributions

1. From a population of size N, randomly draw all possible samples of size n.

2. Compute the statistic of interest for each sample.

3. Create a frequency distribution of the statistic.


Properties of sampling distributions

We are interested in the

• mean,

• standard deviation, and

• appearance of the graph (functional form) of a sampling distribution.


Types of sampling distributions

We will study the following types of sampling distributions.

A) Distribution of the sample mean

B) Distribution of the difference between two means

C) Distribution of the sample proportion

D) Distribution of the difference between two proportions


(A) Sampling distribution of $\bar{x}$

Consider a finite population with mean ($\mu$) and variance ($\sigma^2$). When sampling from a normally distributed population, it can be shown that the distribution of the sample mean will have the following properties.


Properties of the sampling distribution

1. The distribution of $\bar{x}$ will be normal.

2. The mean, $\mu_{\bar{x}}$, of the distribution of the values of $\bar{x}$ will be the same as the mean of the population from which the samples were drawn; $\mu_{\bar{x}} = \mu$.

3. The variance, $\sigma_{\bar{x}}^2$, of the distribution of $\bar{x}$ will be equal to the variance of the population divided by the sample size; $\sigma_{\bar{x}}^2 = \sigma^2/n$.


Standard error

The square root of the variance of the sampling distribution is called the standard error of the mean, or simply the standard error: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.


Nonnormally distributed populations

When the sampling is done from a nonnormally distributed population, the central limit theorem is used.


The central limit theorem

Given a population of any nonnormal functional form with mean ($\mu$) and variance ($\sigma^2$), the sampling distribution of $\bar{x}$, computed from samples of size n from this population, will have mean $\mu$ and variance $\sigma^2/n$, and will be approximately normally distributed when the sample size is large (30 or higher).
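The statement above can be checked with a small simulation. The following is a minimal sketch, not part of the original slides: the skewed population, the number of trials, and the helper name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(population, n, trials, seed=0):
    """Draw `trials` random samples of size n (with replacement) and
    return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]


# Hypothetical, clearly nonnormal (right-skewed) population.
population = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 30  # sample size at the usual rule-of-thumb threshold
means = simulate_sample_means(population, n, trials=20000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)
standard_error = sigma / n ** 0.5  # value predicted by the theorem

print("population mean:", mu, "mean of sample means:", mean_of_means)
print("predicted standard error:", standard_error, "observed:", sd_of_means)
```

The two printed pairs should agree closely; a histogram of `means` would also look roughly bell-shaped even though the population itself is skewed.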


The central limit theorem

Note that the standard deviation of the sampling distribution is used in calculations of z scores and is equal to:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$


Sampling distribution of the mean and Central Limit Theorem

We will do this in class together.


Data

• A small apartment building has 3 apartments.

• How many people live in each apartment?

Apartment People

A

B

C


Find $\mu$ and $\sigma$

• Use the TI to obtain the values for the population.

The values are:


Form samples of size 2

• We need to form all samples of size 2, using replacement since the population is very small.

• Then we find the sample mean for each sample of 2 apartments.

Samples Sample mean

A, A

A, B

A, C

B, A

B, B

B, C

C, A

C, B

C, C

Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$

• Use the TI to obtain the values for the means of the samples.

The values are:


Results: Mean of the sample means

• The mean of the population equals the mean of the sample means: $\mu_{\bar{x}} = \mu$


Results: Standard deviation of the sample means

• The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$


Distribution of the sample means

• If the population is normally distributed, then the sample means will be normally distributed.

• If the population is not normally distributed, then the sample means will be approximately normally distributed if the sample size is at least 30.


Important Consequence

• If we take samples of size n from some population, under the previous conditions, then we can determine the probability of the sample means fulfilling some condition. We use:

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$


Example #1

• The heights of kindergarten children are approximately normally distributed with a mean of 39 inches and a standard deviation of 2 inches. If one child is randomly selected, what is the probability that the child is taller than 41 inches?

• This is 1 child – Not the Central Limit Theorem!


Example #2

• Suppose we have a class of 30 kindergarten children. What is the probability that the mean height of these children exceeds 41 inches?

• This is the Central Limit Theorem as it is asking about the probability of a sample mean!


Conclusion

• It is not unusual for one child, selected at random from a kindergarten class, to be taller than 41 inches.

• It is highly unlikely that the mean height for 30 kindergarten students exceeds 41 inches.
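The two kindergarten probabilities can be computed directly. This is a minimal sketch using Python's standard `statistics.NormalDist` in place of a z table; the variable names are illustrative, and the inputs are the mean of 39, standard deviation of 2, threshold of 41, and class size of 30 given in Examples #1 and #2.

```python
from statistics import NormalDist

mu, sigma = 39.0, 2.0  # population mean and standard deviation (inches)
threshold = 41.0       # height of interest (inches)
n = 30                 # class size in Example #2

# Example #1: a single child, so use the population distribution itself.
p_one_child = 1 - NormalDist(mu, sigma).cdf(threshold)

# Example #2: a sample mean, so use the standard error sigma / sqrt(n).
standard_error = sigma / n ** 0.5
p_class_mean = 1 - NormalDist(mu, standard_error).cdf(threshold)

print("P(one child > 41 in):", round(p_one_child, 4))  # about 0.16
print("P(class mean > 41 in):", p_class_mean)          # extremely small
```

The only difference between the two calculations is the standard deviation used: $\sigma$ for one child, $\sigma/\sqrt{n}$ for the mean of 30 children.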


An analogy

• It would not be unusual for a student to get an A on a statistics test.

• It would be unusual if the class average for a statistics class was an A!


Demonstration that Central Limit Theorem Really Works (1)

We start with a dwelling that has 3 apartments. Here is the list of occupancies.

Apt A = 3

Apt B = 4

Apt C = 2

This is the entire population. It is entered into a list on the TI-83.


Demonstration that Central Limit Theorem Really Works (2)

We calculate 1-Var Stats to obtain the population parameters for this population.

Mean: $\mu = 3$

Standard deviation: $\sigma = .8164965809$

Note: we do not use s = 1 because this is the entire population, not a sample.


Demonstration that Central Limit Theorem Really Works (3)

• Knowing the population parameters $\mu$ and $\sigma$, we now determine them using a sampling distribution.

• We can find the population parameters because it is a very small population.

• Normally, populations are too large to determine $\mu$ and $\sigma$ directly from the population.


Demonstration that Central Limit Theorem Really Works (4)

• We need to form all samples of size 2, using replacement since the population is very small.

• Then we find the sample mean for each sample of 2 apartments.

Samples    Sample mean
A, A       3.0
A, B       3.5
A, C       2.5
B, A       3.5
B, B       4.0
B, C       3.0
C, A       2.5
C, B       3.0
C, C       2.0


Demonstration that Central Limit Theorem Really Works (5)

We calculate 1-Var Stats to obtain the parameters of the sampling distribution.

Mean: $\mu_{\bar{x}} = 3$

Standard deviation: $\sigma_{\bar{x}} = .5773502692$
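The apartment demonstration can be reproduced without a calculator. The sketch below is an illustration only: it uses Python's `itertools.product` to enumerate every sample of size 2 with replacement and the `statistics` module in place of the TI's 1-Var Stats; the variable names are assumptions.

```python
from itertools import product
from statistics import fmean, pstdev

# The entire population: number of people living in each apartment.
occupancy = {"A": 3, "B": 4, "C": 2}
values = list(occupancy.values())
n = 2  # sample size

mu = fmean(values)      # population mean
sigma = pstdev(values)  # population standard deviation (divide by N, not N - 1)

# All ordered samples of size 2, drawn with replacement: (A, A), (A, B), ...
samples = list(product(occupancy, repeat=n))
sample_means = [fmean(occupancy[apt] for apt in sample) for sample in samples]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

for sample, mean in zip(samples, sample_means):
    print(", ".join(sample), mean)
print("mu =", mu, " mean of sample means =", mean_of_means)
print("sigma / sqrt(n) =", sigma / n ** 0.5, " SD of sample means =", sd_of_means)
```

The mean of the nine sample means matches the population mean, and their standard deviation matches $\sigma/\sqrt{n}$, which is what the results slides state.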


Demonstration that Central Limit Theorem Really Works (6)


Example

Given the information below, what is the probability that $\bar{x}$ is greater than 53?

(1) Write the given information.

$\mu = 50$, $\sigma = 16$, $n = 64$, $\bar{x} = 53$


Example

(2) Sketch a normal curve.

 


Example

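(3) Convert $\bar{x}$ to a z score.

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{53 - 50}{16/\sqrt{64}} = \dfrac{3}{2} = 1.5$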


Example

(4) Find the appropriate value(s) in the table.

A value of z = 1.5 gives an area of .9332. 

This is subtracted from 1 to give the probability P(z > 1.5) = .0668


Example

(5) Complete the answer.

The probability that $\bar{x}$ is greater than 53 is .0668.
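The five steps of this example can be mirrored in a few lines of code. This is a minimal sketch that assumes the given values are a population mean of 50, a population standard deviation of 16, a sample size of 64, and a sample mean of 53, and that uses `statistics.NormalDist` in place of the printed z table.

```python
from statistics import NormalDist

mu = 50.0     # population mean
sigma = 16.0  # population standard deviation
n = 64        # sample size
x_bar = 53.0  # sample mean of interest

standard_error = sigma / n ** 0.5   # 16 / 8 = 2
z = (x_bar - mu) / standard_error   # (53 - 50) / 2 = 1.5

area_below = NormalDist().cdf(z)    # table area to the left of z
p_greater = 1 - area_below          # upper-tail probability

print("z =", z)
print("P(sample mean > 53) =", round(p_greater, 4))
```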


(B) Distribution of the difference between two means

• It often becomes important to compare two population means. 

• Knowledge of the sampling distribution of the difference between two means is useful in studies of this type. 

• It is generally assumed that the two populations are normally distributed.


Sampling distribution of $\bar{x}_1 - \bar{x}_2$

Plotting mean sample differences against frequency gives a normal distribution with mean equal to $\mu_1 - \mu_2$, which is the difference between the two population means.


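Variance

The variance of the distribution of the sample differences is equal to

$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

Therefore, the standard error of the differences between two means would be equal to

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$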


Converting to a z score

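To convert to the standard normal distribution, we use the formula

$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

We find the z score by assuming that there is no difference between the population means, that is, $\mu_1 - \mu_2 = 0$.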


Sampling from normal populations

This procedure is valid even when the population variances are different or when the sample sizes are different. Consider two normally distributed populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.

(continued)


Sampling from normal populations

The sampling distribution of the difference, $\bar{x}_1 - \bar{x}_2$, between the means of independent samples of size $n_1$ and $n_2$ drawn from these populations is normally distributed with mean $\mu_1 - \mu_2$ and variance $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.


Example

In a study of annual family expenditures for general health care, two populations were surveyed with the following results:

Population 1: $n_1 = 40$, $\bar{x}_1 = \$346$

Population 2: $n_2 = 35$, $\bar{x}_2 = \$300$


Example

If the variances of the populations are $\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the probability of obtaining sample results as large as those shown if there is no difference in the means of the two populations?


Solution

(1) Write the given information

$n_1 = 40$, $\bar{x}_1 = \$346$, $\sigma_1^2 = 2800$

$n_2 = 35$, $\bar{x}_2 = \$300$, $\sigma_2^2 = 3250$


Solution

(2) Sketch a normal curve


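Solution

(3) Find the z score

$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(346 - 300) - 0}{\sqrt{\dfrac{2800}{40} + \dfrac{3250}{35}}} = \dfrac{46}{12.76} \approx 3.6$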


Solution

(4) Find the appropriate value(s) in the table

A value of z = 3.6 gives an area of .9998. This is subtracted from 1 to give the probability P(z > 3.6) = .0002


Solution

(5) Complete the answer

The probability that $\bar{x}_1 - \bar{x}_2$ is as large as given is .0002.
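The health-care expenditure example can be verified numerically. This is a minimal sketch under the example's stated values (sample sizes 40 and 35, sample means 346 and 300, population variances 2800 and 3250, and no difference between the population means); the function name `z_two_means` is hypothetical, and `statistics.NormalDist` stands in for the z table.

```python
from statistics import NormalDist


def z_two_means(x_bar1, x_bar2, var1, var2, n1, n2, mu_diff=0.0):
    """z score for a difference between two sample means, given the
    population variances and the hypothesized difference mu1 - mu2."""
    standard_error = (var1 / n1 + var2 / n2) ** 0.5
    return ((x_bar1 - x_bar2) - mu_diff) / standard_error


z = z_two_means(x_bar1=346, x_bar2=300, var1=2800, var2=3250, n1=40, n2=35)
p_as_large = 1 - NormalDist().cdf(z)  # upper-tail probability

print("z =", round(z, 2))
print("P(difference this large) =", round(p_as_large, 4))
```

The unrounded z score is slightly above 3.6, so the exact tail area is a little smaller than the table value for z = 3.6, but it rounds to the same .0002.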


(C) Distribution of the sample proportion ($\hat{p}$)

While statistics such as the sample mean are derived from measured variables, the sample proportion is derived from counts or frequency data.


Properties of the sample proportion

Construction of the sampling distribution of the sample proportion is done in a manner similar to that of the mean and the difference between two means.  When the sample size is large, the distribution of the sample proportion is approximately normally distributed because of the central limit theorem.


Mean and variance

The mean of the distribution, $\mu_{\hat{p}}$, will be equal to the true population proportion, p, and the variance of the distribution, $\sigma_{\hat{p}}^2$, will be equal to p(1-p)/n.


The z-score

The z-score for the sample proportion is

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$


Example

In the mid-seventies, according to a report by the National Center for Health Statistics, 19.4 percent of the adult U.S. male population was obese. What is the probability that in a simple random sample of size 150 from this population fewer than 15 percent will be obese?


Solution

(1) Write the given information

$n = 150$, $p = .194$

Find $P(\hat{p} < .15)$


Solution

(2) Sketch a normal curve

 


Solution

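(3) Find the z score

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \dfrac{.15 - .194}{\sqrt{\dfrac{(.194)(.806)}{150}}} = \dfrac{-.044}{.0323} \approx -1.36$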


Solution

(4) Find the appropriate value(s) in the table

A value of z = -1.36 gives an area of .0869, which is the probability P(z < -1.36) = .0869


Solution

(5) Complete the answer

The probability that $\hat{p}$ < .15 is .0869.
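The sample-proportion example can be checked the same way. This is a minimal sketch using the example's values (population proportion .194, sample size 150, cutoff .15); the variable names are illustrative, and `statistics.NormalDist` replaces the z table.

```python
from statistics import NormalDist

p = 0.194     # population proportion
n = 150       # sample size
p_hat = 0.15  # sample proportion of interest

standard_error = (p * (1 - p) / n) ** 0.5  # sqrt(p(1 - p)/n)
z = (p_hat - p) / standard_error
p_below = NormalDist().cdf(z)              # lower-tail probability

print("z =", round(z, 2))
print("P(sample proportion < .15) =", round(p_below, 4))
```

Because the code keeps the unrounded z score, the computed probability differs from the table value for z = -1.36 in the third or fourth decimal place.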


(D) Distribution of the difference between two proportions

This is for situations with two population proportions.  We assess the probability associated with a difference in proportions computed from samples drawn from each of these populations.  The appropriate distribution is the distribution of the difference between two sample proportions.


Sampling distribution of $\hat{p}_1 - \hat{p}_2$

The sampling distribution of the difference between two sample proportions is constructed in a manner similar to the difference between two means. 

(continued)


Sampling distribution of $\hat{p}_1 - \hat{p}_2$

Independent random samples of size $n_1$ and $n_2$ are drawn from two populations of dichotomous variables, where the proportions of observations with the characteristic of interest in the two populations are $p_1$ and $p_2$, respectively.


Mean and variance

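The distribution of the difference between two sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately normal. The mean is

$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$

The variance is

$\sigma_{\hat{p}_1 - \hat{p}_2}^2 = \dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}$

These are true when $n_1$ and $n_2$ are large.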


The z score

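The z score for the difference between two proportions is given by the formula

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$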


Example

In a certain area of a large city it is hypothesized that 40 percent of the houses are in a dilapidated condition. A random sample of 75 houses from this section and 90 houses from another section yielded a difference, $\hat{p}_1 - \hat{p}_2$, of .09. If there is no difference between the two areas in the proportion of dilapidated houses, what is the probability of observing a difference this large or larger?


Solution

(1) Write the given information

$n_1 = 75$, $p_1 = .40$

$n_2 = 90$, $p_2 = .40$

$\hat{p}_1 - \hat{p}_2 = .09$

Find $P(\hat{p}_1 - \hat{p}_2 \geq .09)$


Solution

(2) Sketch a normal curve


Solution

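(3) Find the z score

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} = \dfrac{.09 - 0}{\sqrt{\dfrac{(.40)(.60)}{75} + \dfrac{(.40)(.60)}{90}}} = \dfrac{.09}{.0766} \approx 1.17$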


Solution

(4) Find the appropriate value(s) in the table

A value of z = 1.17 gives an area of .8790, which is subtracted from 1 to give the probability P(z > 1.17) = .121


Solution

(5) Complete the answer

The probability of observing $\hat{p}_1 - \hat{p}_2$ of .09 or greater is .121.
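The housing example can also be verified numerically. This is a minimal sketch using the example's values (both population proportions .40, sample sizes 75 and 90, observed difference .09); the function name `z_two_proportions` is hypothetical, and `statistics.NormalDist` is used in place of the z table.

```python
from statistics import NormalDist


def z_two_proportions(observed_diff, p1, p2, n1, n2):
    """z score for a difference between two sample proportions, using the
    large-sample normal approximation."""
    standard_error = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    return (observed_diff - (p1 - p2)) / standard_error


z = z_two_proportions(observed_diff=0.09, p1=0.40, p2=0.40, n1=75, n2=90)
p_as_large = 1 - NormalDist().cdf(z)  # upper-tail probability

print("z =", round(z, 2))
print("P(difference >= .09) =", round(p_as_large, 3))
```

The unrounded z score is slightly larger than the 1.17 read from the table, so the computed probability is about .120 rather than exactly .121.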


fin
