class8 - california state university, northridgean73773/slidesclass8.pdf · the distribution of the...

3/30/2009


Normal Probability Distribution

Probability Distributions

Chapter 6


Random variable: a variable whose value is the outcome of a procedure determined by chance.

Discrete random variables take on a countable number of values (i.e., there are gaps between values).

Continuous random variables can take an infinite number of values that are densely packed together (i.e., there are no gaps between values).

SPECIAL discrete random variables
• Binomial distribution (Sections 5.3, 5.4)
• Geometric distribution
• Hypergeometric distribution
• Poisson distribution (Section 5.5)

SPECIAL continuous random variables
• Normal distribution
• Exponential distribution
• Uniform distribution

Binomial distribution

• Fixed number of trials
• There are only two possible outcomes: success or failure
• The trials are independent
• The probabilities of success and failure remain the same from trial to trial
• Example: recording the genders of children in 250 families.
• The mean is $\mu = np$
• The standard deviation is $\sigma = \sqrt{np(1-p)} = \sqrt{npq}$, where $q = 1 - p$

TI-83 Binomial Probability

• Press 2nd VARS (DISTR).
• Select the option 0:binompdf(.
• Complete the entry to obtain binompdf(n, p, x), with the appropriate values substituted in.
• Example: What is the probability of getting exactly 2 heads when 4 tosses are made?
• Solution: Using the TI-83 with binompdf(4, 0.5, 2), the probability of getting exactly 2 heads in 4 tosses is 0.375.
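The calculator result can be cross-checked with the binomial probability formula P(X = x) = C(n, x) p^x (1 − p)^(n − x). The sketch below is a minimal Python version of that check; the function names `binom_pmf` and `binom_mean_sd` are illustrative and are not part of the slides or the TI-83.

```python
from math import comb, sqrt

def binom_pmf(n: int, p: float, x: int) -> float:
    """P(X = x) for a binomial variable; same argument order as binompdf(n, p, x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial variable."""
    return n * p, sqrt(n * p * (1 - p))

# Exactly 2 heads in 4 tosses with p = 0.5
print(binom_pmf(4, 0.5, 2))    # 0.375
print(binom_mean_sd(4, 0.5))   # (2.0, 1.0)
```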

Poisson distribution

• The random variable is the number of occurrences of some event over an interval.
• Used for describing the behavior of rare events:
  - Number of industrial accidents per month in a manufacturing plant
  - Number of people arriving at a checkout in a day
  - Number of eagles nesting in a region
  - Number of patients arriving at an emergency room
• The occurrences must be random and independent of each other, and uniformly distributed over the interval.
• The mean is $\mu$, and the standard deviation is $\sigma = \sqrt{\mu}$.

Continuous Random Variables

Continuous sample spaces contain an infinite number of events. They typically are intervals of possible, continuously-distributed outcomes.

• Ex.: Select ANY number between 0 and 1.
  What is the sample space?
  S = { all numbers between 0 and 1 }

• Ex.: Drink ANY volume of water from a 32-ounce bottle.
  What is the sample space?
  S = { all volumes from 0 to 32 ounces }


Continuous Random Variables

• A probability density function for a continuous random variable X is a function with the property that the area below the graph of the function between any two points a and b equals the probability that a ≤ X ≤ b.

• Remember, AREA = PROPORTION = PROBABILITY

Special Continuous Probability Distributions

• Uniform distribution
• Exponential distribution
• Normal distribution

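Uniform Distribution

1. Equally likely outcomes
2. Probability density: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$
3. Mean and standard deviation: $\mu = \frac{a+b}{2}$, $\sigma = \frac{b-a}{\sqrt{12}}$

[Figure: the density f(x) is a horizontal line at height 1/(b − a) between x = a and x = b; the mean and median are marked at the same point.]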

Exponential Distribution

1. Describes time or distance between events
2. Density function: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
3. Parameters: $\mu = \frac{1}{\lambda}$, $\sigma = \frac{1}{\lambda}$

[Figure: density curves f(X) against X for λ = 0.5 and λ = 2.0.]
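To make "area = probability" concrete for the uniform and exponential densities, here is a small Python sketch that approximates the area under each density with a midpoint sum. The specific numbers are assumptions chosen for illustration: a uniform variable on 0 to 32 (echoing the 32-ounce bottle) and an exponential variable with λ = 0.5. The helper names are hypothetical.

```python
from math import exp, sqrt

def uniform_pdf(x, a, b):
    """Uniform density: 1/(b - a) between a and b, 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam):
    """Exponential density: lam * e^(-lam * x) for x >= 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def area(pdf, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the area under pdf between lo and hi."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

a, b, lam = 0.0, 32.0, 0.5  # assumed example values

p_uniform = area(lambda x: uniform_pdf(x, a, b), 8.0, 16.0)    # P(8 <= X <= 16)
p_expo = area(lambda x: exponential_pdf(x, lam), 0.0, 2.0)     # P(0 <= X <= 2)

uniform_mean, uniform_sd = (a + b) / 2, (b - a) / sqrt(12)
expo_mean = expo_sd = 1 / lam

print(round(p_uniform, 4), round(p_expo, 4))
print(uniform_mean, round(uniform_sd, 3), expo_mean, expo_sd)
```

For the uniform case the area is just a rectangle, (16 − 8) × 1/32 = 0.25, which is a handy check on the numerical sum.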

Normal Distribution

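[Figure: three normal density curves f(X) against X, labeled A, B, and C.]

A and B have the same center, but different standard deviations (shape).

A and C have the same standard deviations (shape), but different means (shifted).

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$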

Examples of normal random variables

• testosterone level of male students
• head circumference of adult females
• length of middle finger of Math 225 students
• test scores in Math 225
• height of all kindergarten kids at a school

[Figure: bell-shaped curves. Density against Grades (40 to 100) for two normal distributions: Mean = 70, SD = 5 and Mean = 70, SD = 10.]

Characteristics of the normal distribution

• Symmetric, bell-shaped curve.
• Shape of curve depends on population mean µ and standard deviation σ.
• Center of distribution is µ.
• Spread is determined by σ.
• Most values fall around the mean, but some values are smaller and some are larger.

STANDARD NORMAL DISTRIBUTION: Mean: µ = 0, Standard deviation: σ = 1

Probabilities for Normal Distributions

$P(c \le X \le d) = \int_c^d f(x)\,dx$

Probability is area under the curve!

[Figure: density f(X) against X, with the area under f(x) between c and d marked.]
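The integral of the normal density has no simple closed form, which is why tables and calculators are used. As a rough illustration, the sketch below approximates the area numerically and compares it with Python's built-in `statistics.NormalDist`. The values Mean = 70 and SD = 5 are borrowed from the grades figure; the interval 65 to 75 and the function names are assumptions made for this example.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def area_under_normal(c, d, mu, sigma, steps=10_000):
    """Midpoint-rule approximation of P(c <= X <= d)."""
    width = (d - c) / steps
    total = sum(normal_pdf(c + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return total * width

approx = area_under_normal(65, 75, mu=70, sigma=5)
grades = NormalDist(mu=70, sigma=5)
exact = grades.cdf(75) - grades.cdf(65)

print(round(approx, 4), round(exact, 4))
```

Both numbers should agree to several decimal places; the interval here is one standard deviation either side of the mean.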

Infinite Number of Tables

Normal distributions differ by mean & standard deviation.

Each distribution would require its own table.

Standardize the Normal Distribution

$Z = \frac{X - \mu}{\sigma}$

[Figure: a normal distribution of X with mean µ and standard deviation σ is converted to the standardized normal distribution of Z with µ = 0 and σ = 1.]

One table!

Standardized Normal Distribution

To find a probability, follow these steps:

• Draw the normal distribution and shade the area of interest.
• Find the standardized score (z-score) for the given x: $z = \frac{x - \mu}{\sigma}$
• Find the probability using the z-table or calculator.

TI-83, 84: DISTR → 2:normalcdf(

Probability student scores higher than 75?

P(X > 75)

[Figure: normal density curve, Density against Grades (55 to 85), illustrating P(X > 75).]

upper-tail: normalcdf(z,9999)

lower-tail: normalcdf(-9999,z)

Between part: normalcdf(z1,z2)

[Figure: normal density curve, Density against Grades (55 to 85), illustrating P(X < 65).]

[Figure: normal density curve, Density against Grades (55 to 85), illustrating P(65 < X < 70).]
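The three calculator patterns (upper tail, lower tail, between) can be mirrored in Python. This is only a sketch: the slides do not state the mean and standard deviation behind these figures, so Mean = 70 and SD = 5 are assumed here from the earlier grades plot, and `z_score` and `normalcdf` are hypothetical helpers written to resemble the TI function.

```python
from statistics import NormalDist

MU, SIGMA = 70, 5          # assumed grade distribution, not stated on these slides
STANDARD = NormalDist()    # standard normal: mean 0, standard deviation 1

def z_score(x, mu=MU, sigma=SIGMA):
    """Standardize x: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def normalcdf(lower, upper):
    """Area under the standard normal curve between two z-scores."""
    return STANDARD.cdf(upper) - STANDARD.cdf(lower)

p_above_75 = normalcdf(z_score(75), 9999)          # upper tail
p_below_65 = normalcdf(-9999, z_score(65))         # lower tail
p_65_to_70 = normalcdf(z_score(65), z_score(70))   # between

print(round(p_above_75, 4), round(p_below_65, 4), round(p_65_to_70, 4))
```

Under these assumed values, 75 and 65 sit one standard deviation above and below the mean, so the two tail areas match by symmetry.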

To find x from a given area, follow these steps:

• Draw and shade.
• Find the LOWER tail probability INSIDE the table, and read off the corresponding z-score. OR: use DISTR → 3:invNorm(
• To find x use the formula: $x = z \cdot \sigma + \mu$
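Working backwards from an area to x can be sketched the same way. In the example below, the question "which grade marks the top 10%?" and the values Mean = 70, SD = 5 are assumptions made for illustration; `NormalDist.inv_cdf` plays the role of invNorm on the standard normal.

```python
from statistics import NormalDist

MU, SIGMA = 70, 5           # assumed grade distribution
upper_tail = 0.10           # assumed area of interest: top 10% of grades

lower_tail = 1 - upper_tail              # the table and invNorm use the LOWER tail
z = NormalDist().inv_cdf(lower_tail)     # z-score with 90% of the area below it
x = z * SIGMA + MU                       # convert back: x = z * sigma + mu

print(round(z, 4), round(x, 2))
```

Converting the upper-tail area to a lower-tail area first is the step most easily forgotten when using the table.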

Parameter versus statistic

• Sample: the part of the population we actually examine and for which we do have data.
• A statistic is a number describing a characteristic of a sample. We often use a statistic to estimate an unknown population parameter.
• Population: the entire group of individuals in which we are interested but can’t usually assess directly.
• A parameter is a number describing a characteristic of the population. Parameters are usually unknown.

Example

• The Environmental Protection Agency took soil samples at 20 locations near a former industrial waste dump and checked each for evidence of toxic chemicals. They found no elevated levels of any harmful substances.
• Population: ALL the soil near the waste dump
• Sample: the 20 soil samples
• Parameter: mean level of toxic chemicals in the ground around the waste dump
• Statistic: the mean level of toxic chemicals in the 20 soil samples

Notation

Variable of interest: Categorical
• Then we are interested in PROPORTION
Notation:
• Population parameter: $p$
• Sample statistic: $\hat{p}$

Variable of interest: Quantitative
• Then we are interested in MEAN
Notation:
• Population parameter: $\mu$
• Sample statistic: $\bar{x}$

Sampling Variability

• When we take many samples, the statistics from the samples are usually different from the population figures, and also different from what we got in the first sample.
• This very intuitive idea, that sample results change from sample to sample, is called sampling variability.

Comments

1. Parameters are usually unknown, because it is impractical or impossible to know exactly what values a variable takes for every member of the population.
2. Statistics are computed from the sample, and vary from sample to sample due to sampling variability.

Sampling Distributions

• The sampling distribution is the distribution of a sample statistic over an infinite number of samples.

Sampling distribution of the sample mean, $\bar{x}$

[Figure: histogram of some sample averages $\bar{x}$.]

OK, we have the sampling distribution of the sample means. Then what?

Sampling distributions, like data distributions, are best described by shape, center, and spread.


Shape, Center, and Spread

• Shape: Many, but not all, sampling distributions are approximately normal.
• Center: The mean will be denoted by $\mu$ with a subscript to indicate which sampling distribution is being discussed. For example, the mean of the sampling distribution of the mean is represented by the symbol $\mu_{\bar{x}}$. (The mean of the sample means.)
• Spread: The standard deviation of the sampling distribution of the sample means, also called the standard error, is denoted by $\sigma_{\bar{x}}$.

Mean and standard error of the sampling distribution of the sample means

• Suppose that $\bar{x}$ is the mean of an SRS of size n drawn from a large population with mean μ and standard deviation σ. Then the sampling distribution of $\bar{x}$ has mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

Sampling distribution of $\bar{x}$

[Figure: sampling distribution of $\bar{x}$, centered at µ with standard deviation σ/√n.]

For any population with mean µ and standard deviation σ:

• The mean, or center of the sampling distribution of $\bar{x}$, is equal to the population mean µ.
• The standard deviation of the sampling distribution is σ/√n, where n is the sample size.

Mean of a sampling distribution of $\bar{x}$

There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus, the sample mean $\bar{x}$ is an unbiased estimator of the population mean μ: it will be “correct on average” in many samples.

Standard error of a sampling distribution of $\bar{x}$

The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n.

• Averages are less variable than individual observations.

Generating Sampling Distributions

1. Take a random sample of a fixed size n from a population.

2. Compute the summary statistics (mean, proportion).

3. Repeat steps 1 and 2 many times.

4. Display the distribution of the summary statistics.


Example

• Extensive studies have found that the DMS odor threshold of adults follows a roughly normal distribution with mean µ = 25 micrograms per liter and standard deviation σ = 7 micrograms per liter. With this information, we can simulate many runs of our study with different subjects drawn at random from the population. We take 1000 samples of size 10, find the 1000 sample mean thresholds $\bar{x}$, and make a histogram of these 1000 values.

The results from the 1000 samples

• 1st SRS of size 10: $\bar{x} = 36$, $s = 3.2$
• 2nd SRS of size 10: $\bar{x} = 22.8$, $s = 2.7$
• 3rd SRS of size 10: $\bar{x} = 30.4$, $s = 4.1$
  ⋮
• 1000th SRS of size 10: $\bar{x} = 28.9$, $s = 2.1$

[Figure: histogram of the 1000 sample means (column C1); Frequency (0 to 100) against values from 20 to 35.]

The sampling distribution of the statistic $\bar{x}$.

Shape: looks normal.

Center: the mean of the 1000 $\bar{x}$'s is $\mu_{\bar{x}} = 25.073$. The distribution is centered very close to the population mean µ = 25.

Spread: the standard error of the 1000 $\bar{x}$'s is 2.191, notably smaller than the standard deviation of the population, σ = 7.
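The simulation described in this example can be sketched in a few lines of Python, following the four steps for generating a sampling distribution. The population values µ = 25 and σ = 7, the sample size 10, and the 1000 repetitions come from the example; the seed is arbitrary, so the numbers printed will not match the slide's 25.073 and 2.191 exactly.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA = 25, 7                    # population mean and SD from the example
N_SAMPLES, SAMPLE_SIZE = 1000, 10    # 1000 samples of size 10

rng = random.Random(2009)            # arbitrary seed so the run is repeatable

# Steps 1-3: draw a sample, compute its mean, repeat many times
sample_means = [
    mean([rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

# Step 4 (summarized numerically instead of as a histogram)
center = mean(sample_means)
spread = stdev(sample_means)
theoretical_spread = SIGMA / sqrt(SAMPLE_SIZE)

print(round(center, 3), round(spread, 3), round(theoretical_spread, 3))
```

The simulated spread should land near σ/√n = 7/√10 ≈ 2.21, well below the population standard deviation of 7.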

For normally distributed populations

When a variable in a population is normally distributed, the sampling distribution of $\bar{x}$ for all possible samples of size n is also normally distributed.

If the population is N(µ, σ), then the sample means distribution is N(µ, σ/√n).

[Figure: distribution of the population compared with the distribution of the sample means $\bar{x}$.]

IQ scores: population vs. sample

In a large population of adults, the mean IQ is 112 with standard deviation 16. Suppose 100 adults are randomly selected for a market research campaign.

• The distribution of the sample mean IQ is

A) exactly normal, mean 112, standard deviation 16.
B) approximately normal, mean 112, standard deviation 16.
C) approximately normal, mean 112, standard deviation 1.6.
D) approximately normal, mean 112, standard deviation 4.

Answer: C) approximately normal, mean 112, standard deviation 1.6.

Population distribution: N(µ = 112; σ = 16)

Sampling distribution for n = 100 is N(µ = 112; σ/√n = 16/√100 = 1.6)

Application

Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/dl. Let’s assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(µ = 3.8, σ = 0.2).

If only one measurement is made, what's the probability that this patient will be misdiagnosed hypokalemic?

$z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5$, $P(z < -1.5) = 0.0668 \approx 7\%$

If instead measurements are taken on four separate days and averaged, what is the probability of such a misdiagnosis?

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.5 - 3.8}{0.2/\sqrt{4}} = -3$, $P(z < -3) = 0.0013 \approx 0.1\%$

Note: Make sure to standardize (z) using the standard deviation for the sampling distribution.
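The two potassium calculations differ only in which standard deviation goes in the denominator. A minimal Python sketch of both, using the values from the example (µ = 3.8, σ = 0.2, cutoff 3.5, n = 4); the variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, CUTOFF = 3.8, 0.2, 3.5
STANDARD = NormalDist()   # standard normal distribution

# One measurement: standardize with the population standard deviation
z_single = (CUTOFF - MU) / SIGMA
p_single = STANDARD.cdf(z_single)

# Mean of n = 4 measurements: standardize with sigma / sqrt(n)
n = 4
z_mean = (CUTOFF - MU) / (SIGMA / sqrt(n))
p_mean = STANDARD.cdf(z_mean)

print(round(z_single, 2), round(p_single, 4))   # -1.5 0.0668
print(round(z_mean, 2), round(p_mean, 4))       # -3.0 0.0013
```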

But…

• Not all variables are normally distributed.
• Income is typically strongly skewed, for example.
• Is $\bar{x}$ still a good estimator of µ then?
• The Central Limit Theorem will rescue us!

The Central Limit Theorem (VERY IMPORTANT!!!)

When randomly sampling from any population with mean µ and standard deviation σ, when n is large enough, the sampling distribution of $\bar{x}$ is approximately normal: N(µ, σ/√n).

Central Limit Theorem

• The Central Limit Theorem guarantees that the distribution of the sample mean is approximately normal as long as the sample size is large enough.
• We will depend on the Central Limit Theorem again and again in order to take advantage of normal probability calculations when we use the sample mean to draw conclusions about the population mean, even if the population distribution is not normal.

Comments

• There is no requirement on the shape of the population distribution. This is where the strength of the Central Limit Theorem lies. It tells us that regardless of the shape of the population distribution, averages that are based on a large enough sample will have an approximately normal distribution.

The central limit theorem

[Figure: four panels. Population with strongly skewed distribution; sampling distribution of $\bar{x}$ for n = 2 observations; sampling distribution of $\bar{x}$ for n = 10 observations; sampling distribution of $\bar{x}$ for n = 25 observations.]

http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
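The effect in the figure can also be reproduced offline with a short simulation. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here for illustration; the slides do not say which population the figure uses) and the sample sizes 2, 10, and 25 from the panels. Skewness near 0 indicates a roughly symmetric, more normal-looking shape.

```python
import random
from math import sqrt
from statistics import mean, stdev

def skewness(values):
    """Simple sample skewness: average cubed z-score (0 for a symmetric shape)."""
    m, s = mean(values), stdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, reps=5000, seed=1):
    """Means of `reps` samples of size n from an exponential population (assumed)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

results = {}
for n in (2, 10, 25):
    means = simulate_sample_means(n)
    results[n] = (mean(means), stdev(means), skewness(means))
    center, spread, skew = results[n]
    print(n, round(center, 3), round(spread, 3), round(skew, 3),
          "sigma/sqrt(n) =", round(1 / sqrt(n), 3))
```

As n grows, the center stays near the population mean, the spread tracks σ/√n, and the skewness shrinks toward 0.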

Assessing Normality

• A normal probability plot is a graph with the original set of data on the x-axis, and the corresponding z scores expected from a standard normal distribution on the y-axis.

• If the points appear to lie reasonably close to a straight line and there does not appear to be a systematic pattern that is not a straight line, it is reasonable to conclude that the data came from a normally distributed population.
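The coordinates of a normal probability plot can be computed without a graphing tool. The sketch below is one possible construction, not necessarily the one the calculator uses: it assumes the plotting positions (i + 0.5)/n for the sorted data, and it summarizes "close to a straight line" with the correlation between the data and the expected z scores. The function names and the two data sets are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each sorted data value with its expected standard normal z score."""
    xs = sorted(data)
    n = len(xs)
    # Assumed plotting positions: (i + 0.5) / n for i = 0, ..., n - 1
    zs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(xs, zs))

def straightness(points):
    """Correlation between data values and z scores (1.0 means a perfect line)."""
    xs, zs = zip(*points)
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

# Hypothetical data: one set built to look normal, one strongly right-skewed
normal_like = [70 + 5 * NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
right_skewed = [2**i for i in range(20)]

r_normal = straightness(normal_plot_points(normal_like))
r_skewed = straightness(normal_plot_points(right_skewed))
print(round(r_normal, 3), round(r_skewed, 3))
```

A correlation close to 1 supports normality, but the plot itself should still be inspected for curved patterns such as those in the skewed and long-tailed examples.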

[Figure: example plots for data from a right-skewed distribution, a left-skewed distribution, a short-tailed distribution, a long-tailed distribution, and a normal distribution.]
