class8 - california state university, northridgean73773/slidesclass8.pdf · the distribution of the...
TRANSCRIPT
3/30/2009
1
Normal Probability Distributions
Chapter 6
Random variable: the outcome of each procedure is determined by chance.
Discrete random variables: take on a countable number of values (i.e. there are gaps between values).
Continuous random variables: take on an infinite number of values that are densely packed together (i.e. there are no gaps between values).
SPECIAL Discrete Random variables
• Binomial distribution (Sections 5.3, 5.4)
• Geometric distribution
• Hypergeometric distribution
• Poisson distribution (Section 5.5)
SPECIAL Continuous Random variables
• Normal distribution
• Exponential distribution
• Uniform distribution
Binomial distribution
• Fixed number of trials
• There are only two possible outcomes: success or failure
• The trials are independent
• The probabilities of success and failure remain the same from trial to trial
• Example: recording the genders of children in 250 families.
• The mean is $\mu = np$
• The standard deviation is $\sigma = \sqrt{np(1-p)} = \sqrt{npq}$, where $q = 1 - p$
TI-83 Binomial Probability
• Press 2nd VARS.
• Select the option 0:binompdf(.
• Complete the entry to obtain binompdf(n, p, x), with the appropriate values substituted in.
• Example: What is the probability of getting exactly 2 heads when 4 tosses are made?
• Solution: Using the TI-83 with binompdf(4, 0.5, 2), it follows that the probability of getting 2 heads in 4 tosses is 0.375.
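The calculator result can be cross-checked by hand. The following is a minimal Python sketch, not the TI-83's own implementation; the function names `binom_pmf` and `binom_mean_sd` are made up for this illustration. It uses the standard binomial probability formula together with the mean np and standard deviation √(np(1 − p)) from the binomial slide.

```python
from math import comb, sqrt


def binom_pmf(n: int, p: float, x: int) -> float:
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial variable."""
    return n * p, sqrt(n * p * (1 - p))


# Same inputs as binompdf(4, 0.5, 2): exactly 2 heads in 4 tosses.
prob_two_heads = binom_pmf(4, 0.5, 2)
mean, sd = binom_mean_sd(4, 0.5)
print(prob_two_heads, mean, sd)
```

For the coin example this gives a probability of 0.375, with mean 2 and standard deviation 1.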
Poisson distribution
• The random variable is the number of occurrences of some event over an interval.
• Used for describing the behavior of rare events:
  • Number of industrial accidents per month in a manufacturing plant
  • Number of people arriving at a checkout in a day
  • Number of eagles nesting in a region
  • Number of patients arriving at an emergency room
• The occurrences must be random and independent of each other, and uniformly distributed over the interval.
• The mean is µ, and the standard deviation is σ = √µ.
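As a rough illustration of the Poisson mean and standard deviation, the sketch below uses the standard Poisson probability formula P(X = k) = e^(−µ) µ^k / k!, which is not written on the slide. The mean of 2 accidents per month is a hypothetical value chosen only for the example.

```python
from math import exp, factorial, sqrt


def poisson_pmf(mu: float, k: int) -> float:
    """P(X = k) for a Poisson variable with mean mu (standard formula)."""
    return exp(-mu) * mu**k / factorial(k)


mu = 2.0  # hypothetical: an average of 2 industrial accidents per month
probs = [poisson_pmf(mu, k) for k in range(50)]  # the tail beyond k = 49 is negligible

# Recover the mean and standard deviation numerically from the probabilities.
mean_from_probs = sum(k * p for k, p in enumerate(probs))
var_from_probs = sum((k - mean_from_probs) ** 2 * p for k, p in enumerate(probs))
print(mean_from_probs, sqrt(var_from_probs), sqrt(mu))
```

The numerically recovered standard deviation should agree with √µ up to rounding.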
Continuous Random Variables
Continuous sample spaces contain an infinite number of events. They typically are intervals of possible, continuously-distributed outcomes.
• Ex.: Select ANY number between 0 and 1.
What is the sample space?
S = {all numbers between 0 and 1}
• Ex.: Drink ANY volume of water from a 32-ounce bottle.
What is the sample space?
S = {all volumes from 0 to 32 ounces}
Continuous Random Variables
• A continuous probability distribution function for a random variable X is a continuous function with the property that the area below the graph of the function between any two points a and b equals the probability that a ≤ X ≤ b.
• Remember, AREA = PROPORTION = PROBABILITY
Special Continuous Probability Distributions
Uniform distribution
Exponential distribution
Normal distribution
Uniform Distribution
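1. Equally Likely Outcomes
2. Probability Density: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$
3. Mean & Standard Deviation: $\mu = \frac{a + b}{2}$, $\sigma = \frac{b - a}{\sqrt{12}}$
[Figure: rectangular density f(x) of height 1/(b − a) between a and b; the mean and median coincide at the center.]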
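The uniform formulas can be tried on a concrete interval. This is a small sketch under an assumption: the amount drunk from the 32-ounce bottle is treated as uniform on [0, 32], which the slides do not claim. The helper names are hypothetical.

```python
from math import sqrt


def uniform_summary(a: float, b: float) -> tuple[float, float, float]:
    """Density height 1/(b - a), mean (a + b)/2 and standard deviation (b - a)/sqrt(12)."""
    return 1 / (b - a), (a + b) / 2, (b - a) / sqrt(12)


def uniform_prob(a: float, b: float, c: float, d: float) -> float:
    """P(c <= X <= d) as the area of a rectangle, clipped to the interval [a, b]."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)


# Assumed example: volume drunk is uniform between 0 and 32 ounces.
height, mean, sd = uniform_summary(0, 32)
p_8_to_16 = uniform_prob(0, 32, 8, 16)
print(height, mean, round(sd, 3), p_8_to_16)
```

Because the density is flat, every probability is simply width times height, here 8 × 1/32 = 0.25.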
Exponential Distribution
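1. Describes Time or Distance Between Events
2. Density Function: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
3. Parameters: $\mu = \frac{1}{\lambda}$, $\sigma = \frac{1}{\lambda}$
[Figure: exponential density curves f(X) versus X for λ = 0.5 and λ = 2.0.]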
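For the two rates shown in the figure, the exponential density and parameters can be evaluated directly. The sketch below is illustrative only; the cumulative probability P(X ≤ x) = 1 − e^(−λx) is a standard result that does not appear on the slide, and the function names are made up.

```python
from math import exp


def exp_density(x: float, lam: float) -> float:
    """Density f(x) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


def exp_mean_sd(lam: float) -> tuple[float, float]:
    """The mean and the standard deviation are both 1/lam."""
    return 1 / lam, 1 / lam


def exp_prob_at_most(x: float, lam: float) -> float:
    """P(X <= x) = 1 - exp(-lam * x), the area under the density from 0 to x."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


for lam in (0.5, 2.0):  # the two rates from the figure
    mean, sd = exp_mean_sd(lam)
    print(lam, exp_density(0, lam), mean, sd, round(exp_prob_at_most(mean, lam), 4))
```

A larger λ gives a taller, faster-decaying curve and a shorter average wait between events.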
Normal Distribution
[Figure: three normal density curves A, B, and C, f(X) versus X.]
A and B have the same center, but different standard deviations (shape).
A and C have the same standard deviations (shape), but different means (shifted).
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
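To see how σ controls the shape, the normal density can be evaluated at its center for two different standard deviations. This is a hedged sketch: the values µ = 70 with σ = 5 and σ = 10 are illustrative choices, and `normal_density` is a hypothetical helper that is cross-checked against Python's `statistics.NormalDist`.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_density(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


# Illustrative values: same center, different spreads.
peak_sd5 = normal_density(70, 70, 5)
peak_sd10 = normal_density(70, 70, 10)
library_value = NormalDist(70, 5).pdf(64)  # library cross-check at x = 64
print(round(peak_sd5, 4), round(peak_sd10, 4))
```

Doubling σ halves the height of the peak, because the total area under the curve must stay equal to 1.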
Examples of normal random variables
• testosterone level of male students
• head circumference of adult females
• length of middle finger of Math 225 students
• test scores in Math 225
• height of all kindergarten kids at a school
[Figure: bell-shaped curves, density versus grades (40 to 100), for Mean = 70, SD = 5 and Mean = 70, SD = 10.]
Characteristics of the normal distribution
• Symmetric, bell-shaped curve.
• Shape of curve depends on population mean µ and standard deviation σ.
• Center of distribution is µ.
• Spread is determined by σ.
• Most values fall around the mean, but some values are smaller and some are larger.
STANDARD NORMAL DISTRIBUTION: Mean: µ = 0, Standard deviation: σ = 1
Probabilities for Normal Distributions
$P(c \le x \le d) = \int_c^d f(x)\,dx = \ ?$
Probability is area under the curve!
[Figure: normal density f(x) with the region between c and d marked.]
Infinite Number of Tables
Normal distributions differ by mean & standard deviation.
Each distribution would require its own table.
Standardize the Normal Distribution
$Z = \frac{X - \mu}{\sigma}$
[Figure: a normal distribution of X with mean µ and standard deviation σ is converted to the standardized normal distribution of Z with µ = 0 and σ = 1.]
One table!
To find probability follow these steps:
• Draw the normal distribution and shade the area of interest
• Find the standardized score (z-score) for the given x: $z = \frac{x - \mu}{\sigma}$
• Find the probability using the z-table or calculator
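The last two steps (standardize, then look up the area) can be sketched in Python, with `statistics.NormalDist` standing in for the z-table. The grade distribution with µ = 70 and σ = 5 is an assumed example, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardized score z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Assumed example: grades with mean 70 and standard deviation 5.
z = z_score(75, 70, 5)
lower_tail = NormalDist().cdf(z)  # area to the left of z under the standard normal curve
print(z, round(lower_tail, 4))
```

A grade of 75 sits one standard deviation above the mean, and roughly 84% of the area lies to its left.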
TI-83, 84: DISTR → 2:normalcdf(
[Figure: normal density curve, density versus grades (55 to 85).]
Probability student scores higher than 75?
P(X > 75)
upper-tail: normalcdf(z,9999)
lower-tail: normalcdf(-9999,z)
Between part: normalcdf(z1,z2)
[Figure: normal density curve, density versus grades (55 to 85).]
P(X < 65)
[Figure: normal density curve, density versus grades (55 to 85).]
P(65 < X < 70)
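The three kinds of area (upper tail, lower tail, between) can be reproduced without a calculator. The sketch below assumes the grades follow a normal distribution with mean 70 and standard deviation 5, which matches the axis range of the plots but is not stated on this slide. It works directly in grade units rather than z-scores.

```python
from statistics import NormalDist

grades = NormalDist(mu=70, sigma=5)  # assumed parameters for the grade distribution

p_above_75 = 1 - grades.cdf(75)               # upper tail: P(X > 75)
p_below_65 = grades.cdf(65)                   # lower tail: P(X < 65)
p_65_to_70 = grades.cdf(70) - grades.cdf(65)  # between: P(65 < X < 70)

print(round(p_above_75, 4), round(p_below_65, 4), round(p_65_to_70, 4))
```

Under this assumption the two tails are equal by symmetry (about 0.1587 each), and the area between 65 and 70 is about 0.3413.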
To find x from given area follow these steps
• Draw and shade
• Find the LOWER tail probability INSIDE the table, and read off the corresponding z-score. OR: use DISTR → 3:invNorm(
• To find x use the formula: $x = z \cdot \sigma + \mu$
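Going from an area back to x can be sketched the same way. Here `NormalDist().inv_cdf` stands in for the table lookup. The 90th percentile and the grade parameters µ = 70, σ = 5 are assumptions for the example, and `x_from_lower_tail` is a hypothetical name.

```python
from statistics import NormalDist


def x_from_lower_tail(area: float, mu: float, sigma: float) -> tuple[float, float]:
    """Return (z, x): the z-score with the given lower-tail area, and x = z * sigma + mu."""
    z = NormalDist().inv_cdf(area)
    return z, z * sigma + mu


# Assumed example: the grade below which 90% of students fall.
z, x = x_from_lower_tail(0.90, 70, 5)
print(round(z, 4), round(x, 2))
```

The result agrees with asking for the percentile in grade units directly, `NormalDist(70, 5).inv_cdf(0.90)`.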
Parameter versus statistic
• Sample: the part of the population we actually examine and for which we do have data.
• A statistic is a number describing a characteristic of a sample. We often use a statistic to estimate an unknown population parameter.
• Population: the entire group of individuals in which we are interested but can’t usually assess directly.
• A parameter is a number describing a characteristic of the population. Parameters are usually unknown.
Example
• The Environmental Protection Agency took soil samples at 20 locations near a former industrial waste dump and checked each for evidence of toxic chemicals. They found no elevated levels of any harmful substances.
• Population: ALL the soil near the waste dump
• Sample: the 20 soil samples
• Parameter: mean level of toxic chemicals in the ground around the waste dump
• Statistic: the mean level of toxic chemicals in the 20 soil samples
Notation
Variable of interest: Categorical
• Then we are interested in PROPORTION
Notation:
• Population parameter: p
• Sample statistic: $\hat{p}$
Variable of interest: Quantitative
• Then we are interested in MEAN
Notation:
• Population parameter: µ
• Sample statistic: $\bar{x}$
Sampling Variability
• When we take many samples, the statistics from the samples are usually different from the population figures, and also different from what we got in the first sample.
• This very intuitive idea, that sample results change from sample to sample, is called sampling variability.
Comments
1. Parameters are usually unknown, because it is impractical or impossible to know exactly what values a variable takes for every member of the population.
2. Statistics are computed from the sample, and vary from sample to sample due to sampling variability.
Sampling Distributions
• The sampling distribution is the distribution of a sample statistic across an infinite number of samples.
Sampling distribution of the sample mean, $\bar{x}$
[Figure: histogram of some sample averages $\bar{x}$.]
OK, we have the sampling distribution of the sample means. Then what?
Sampling distributions, like data distributions, are best described by shape, center, and spread.
Shape, Center, and Spread
• Shape: Many, but not all, sampling distributions are approximately normal.
• Center: The mean will be denoted by µ with a subscript to indicate which sampling distribution is being discussed. For example, the mean of the sampling distribution of the mean is represented by the symbol $\mu_{\bar{x}}$. (The mean of the sample means.)
• Spread: The standard deviation of the sampling distribution of the sample means is called the standard error and is denoted $\sigma_{\bar{x}}$.
Mean and standard error of the sampling distribution of the sample means
• Suppose that $\bar{x}$ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then the sampling distribution of $\bar{x}$ has mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
Sampling distribution of $\bar{x}$
[Figure: sampling distribution of $\bar{x}$, centered at µ with standard deviation σ/√n.]
For any population with mean µ and standard deviation σ:
• The mean, or center, of the sampling distribution of $\bar{x}$ is equal to the population mean µ.
• The standard deviation of the sampling distribution of $\bar{x}$ is σ/√n, where n is the sample size.
Mean of a sampling distribution of $\bar{x}$
There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus the sample mean $\bar{x}$ is an unbiased estimator of the population mean µ: it will be “correct on average” in many samples.
Standard error of a sampling distribution of $\bar{x}$
The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n.
• Averages are less variable than individual observations.
Generating Sampling Distributions
1. Take a random sample of a fixed size n from a population.
2. Compute the summary statistics (mean, proportion).
3. Repeat steps 1 and 2 many times.
4. Display the distribution of the summary statistics.
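The four steps above can be sketched as a short simulation. This is only an illustration under stated assumptions: the population is a fair six-sided die (mean 3.5, standard deviation about 1.71), and the sample size, repetition count and helper names are hypothetical choices.

```python
import random
from math import floor
from statistics import mean, stdev


def simulate_sample_means(draw, n, repetitions, seed=0):
    """Steps 1 to 3: draw a sample of size n, record its mean, and repeat."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        sample = [draw(rng) for _ in range(n)]  # step 1: random sample of size n
        results.append(mean(sample))            # step 2: summary statistic
    return results                              # step 3: one statistic per repetition


def text_histogram(values, bin_width=0.25):
    """Step 4: crude text display, one '#' per 10 values in each bin."""
    counts = {}
    for v in values:
        left = floor(v / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    for left in sorted(counts):
        print(f"{left:5.2f} | " + "#" * (counts[left] // 10))


# Assumed population: rolls of a fair die.
sample_means = simulate_sample_means(lambda rng: rng.randint(1, 6), n=10, repetitions=1000)
text_histogram(sample_means)
print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
```

The sample means should cluster around 3.5, with a spread near 1.71/√10 ≈ 0.54, well below the spread of individual rolls.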
Example
• Extensive studies have found that the DMS odor threshold of adults follows a roughly normal distribution with mean µ = 25 micrograms per liter and standard deviation σ = 7 micrograms per liter. With this information, we can simulate many runs of our study with different subjects drawn at random from the population. We take 1000 samples of size 10, find the 1000 sample mean thresholds $\bar{x}$, and make a histogram of these 1000 values.
The results from the 1000 samples
• 1st SRS of size 10: $\bar{x} = 36$, $s = 3.2$
• 2nd SRS of size 10: $\bar{x} = 22.8$, $s = 2.7$
• 3rd SRS of size 10: $\bar{x} = 30.4$, $s = 4.1$
⋮
• 1000th SRS of size 10: $\bar{x} = 28.9$, $s = 2.1$
[Figure: histogram of the 1000 sample means (column C1), frequency (0 to 100) versus sample mean (20 to 35).]
The sampling distribution of the statistic $\bar{x}$.
Shape: looks normal.
Center: the mean of the 1000 $\bar{x}$'s is $\mu_{\bar{x}} = 25.073$. The distribution is centered very close to the population mean µ = 25.
Spread: the standard error of the 1000 $\bar{x}$'s is 2.191, notably smaller than the standard deviation of the population, σ = 7.
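As a quick check on the spread, the standard error formula σ/√n from the earlier slide can be applied to this example (σ = 7, n = 10):

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{10}} \approx 2.21
```

The simulated value of 2.191 is close to this theoretical figure. The small gap is what one would expect from using 1000 samples rather than all possible samples.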
For normally distributed populations
When a variable in a population is normally distributed, the sampling distribution of $\bar{x}$ for all possible samples of size n is also normally distributed.
If the population is N(µ, σ), then the distribution of the sample means is N(µ, σ/√n).
[Figure: density curves of the population and of the sample means.]
IQ scores: population vs. sample
In a large population of adults, the mean IQ is 112 with standard deviation 16. Suppose 100 adults are randomly selected for a market research campaign.
• The distribution of the sample mean IQ is
A) exactly normal, mean 112, standard deviation 16.
B) approximately normal, mean 112, standard deviation 16.
C) approximately normal, mean 112, standard deviation 1.6.
D) approximately normal, mean 112, standard deviation 4.
Answer: C) approximately normal, mean 112, standard deviation 1.6.
Population distribution: N(µ = 112; σ = 16)
Sampling distribution for n = 100 is N(µ = 112; σ/√n = 16/√100 = 1.6)
Application
Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/dl. Let’s assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(µ = 3.8, σ = 0.2).
If only one measurement is made, what's the probability that this patient will be misdiagnosed as hypokalemic?
$z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5$, P(z < −1.5) = 0.0668 ≈ 7%
If instead measurements are taken on four separate days and averaged, what is the probability of such a misdiagnosis?
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.5 - 3.8}{0.2/\sqrt{4}} = -3$, P(z < −3) = 0.0013 ≈ 0.1%
Note:
Make sure to standardize (z) using the standard deviation for the sampling distribution.
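Both calculations follow the same pattern, with only the standard deviation changing from σ to σ/√n. A minimal Python sketch follows, using `statistics.NormalDist` in place of the z-table. The function name `prob_mean_below` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_below(threshold: float, mu: float, sigma: float, n: int = 1) -> tuple[float, float]:
    """Return (z, P(mean of n measurements < threshold)) for a N(mu, sigma) population."""
    standard_error = sigma / sqrt(n)       # standard deviation of the sampling distribution
    z = (threshold - mu) / standard_error  # standardize with the standard error, not sigma
    return z, NormalDist().cdf(z)


z1, p1 = prob_mean_below(3.5, 3.8, 0.2, n=1)  # a single measurement
z4, p4 = prob_mean_below(3.5, 3.8, 0.2, n=4)  # average of four measurements
print(round(z1, 2), round(p1, 4), round(z4, 2), round(p4, 4))
```

Averaging four measurements halves the standard error, which moves the cutoff from 1.5 to 3 standard errors below the mean and makes a misdiagnosis far less likely.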
But…
• Not all variables are normally distributed.
• Income, for example, is typically strongly skewed.
• Is $\bar{x}$ still a good estimator of µ then?
• The Central Limit Theorem will rescue us!
The Central Limit Theorem (VERY IMPORTANT!!!)
When randomly sampling from any population with mean µ and standard deviation σ, when n is large enough, the sampling distribution of $\bar{x}$ is approximately normal: N(µ, σ/√n).
Central Limit Theorem
• The Central Limit Theorem guarantees that the distribution of the sample mean is approximately normal as long as the sample size is large enough.
• We will depend on the Central Limit Theorem again and again in order to take advantage of normal probability calculations when we use a sample mean to draw conclusions about a population mean, even if the population distribution is not normal.
Comments
• There is no requirement on the shape of the population distribution. This is where the strength of the Central Limit Theorem lies. It tells us that regardless of the shape of the population distribution, averages that are based on a large enough sample will have an approximately normal distribution.
The central limit theorem
[Figure: four panels showing a population with a strongly skewed distribution, and the sampling distribution of $\bar{x}$ for n = 2, n = 10, and n = 25 observations.]
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
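The same effect can be seen in a small simulation. This sketch makes several assumptions: the skewed population is taken to be exponential with rate 1, skewness is measured by the average cubed standardized value, and the repetition count and function names are arbitrary choices for the illustration.

```python
import random


def skewness(values):
    """Average cubed standardized value; 0 for a perfectly symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    sd = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / sd) ** 3 for v in values) / n


def sample_mean_skewness(n, repetitions=20000, seed=1):
    """Skewness of the simulated sampling distribution of the mean for sample size n."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(repetitions)]
    return skewness(means)


skew_by_n = {n: sample_mean_skewness(n) for n in (2, 10, 25)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness of the sample means shrinks toward 0 as n grows, which is the numerical counterpart of the panels looking more and more bell-shaped.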
Assessing Normality
• A normal probability plot is a graph with the original set of data on the x-axis, and the corresponding z scores for each data value on the y-axis.
• If the points appear to lie reasonably close to a straight line and there does not appear to be a systematic pattern that is not a straight line, it is reasonable to conclude that the data came from an approximately normally distributed population.
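The coordinates of a normal probability plot can be computed without plotting software. The sketch below rests on assumptions: the z score paired with the i-th smallest of n values is taken as the standard normal quantile at (i − 0.375)/(n + 0.25), which is one common convention among several, and both data sets are invented. The correlation between data and z scores is used here as a rough numerical stand-in for "close to a straight line".

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted data value (x-axis) with its expected z score (y-axis)."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.375) / (n + 0.25): an assumed convention.
    z_scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(ordered, z_scores))


def straightness(points):
    """Pearson correlation between the data values and their z scores."""
    xs = [x for x, _ in points]
    zs = [z for _, z in points]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)


# Invented data sets for illustration only.
symmetric = [61, 64, 66, 68, 70, 70, 72, 74, 76, 79]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]
r_symmetric = straightness(normal_plot_points(symmetric))
r_skewed = straightness(normal_plot_points(right_skewed))
print(round(r_symmetric, 3), round(r_skewed, 3))
```

A correlation near 1 corresponds to points hugging a straight line. The skewed data give a visibly lower value, matching the curved pattern such data show on the plot.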
[Figure: normal probability plots for data from a right-skewed distribution, a left-skewed distribution, a short-tailed distribution, a long-tailed distribution, and a normal distribution.]