normal distribution
DESCRIPTION
STAT 101 Dr. Kari Lock Morgan. Normal Distribution. Chapter 5 Normal distribution Central limit theorem Normal distribution for confidence intervals Normal distribution for p-values Standard normal. Re-grade Requests. 4e potential grading mistake: 0.025 is correct - PowerPoint PPT PresentationTRANSCRIPT
Statistics: Unlocking the Power of Data Lock5
Normal Distribution
STAT 101
Dr. Kari Lock Morgan
Chapter 5• Normal distribution• Central limit theorem• Normal distribution for confidence intervals• Normal distribution for p-values• Standard normal
Statistics: Unlocking the Power of Data Lock5
Re-grade Requests4e potential grading mistake: 0.025 is correct
Requests for a re-grade must be submitted in writing by class on Wednesday, March 5th
Partial credit will NOT be adjusted
Valid re-grade requests: You got points off but believe your answer is correct Points were added incorrectly
Warning: scores may go up or down
Statistics: Unlocking the Power of Data Lock5
slope (thousandths)-60 -40 -20 0 20 40 60
Measures from Scrambled RestaurantTips Dot Plot
r-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6
Measures from Scrambled Collection 1 Dot Plot
Nullxbar98.2 98.3 98.4 98.5 98.6 98.7 98.8 98.9 99.0
Measures from Sample of BodyTemp50 Dot Plot
Diff-4 -3 -2 -1 0 1 2 3 4
Measures from Scrambled CaffeineTaps Dot Plot
xbar26 27 28 29 30 31 32
Measures from Sample of CommuteAtlanta Dot Plot
Slope :Restaurant tips
Correlation: Malevolent uniforms
Mean :Body Temperatures
Diff means: Finger taps
Mean : Atlanta commutes
phat0.3 0.4 0.5 0.6 0.7 0.8
Measures from Sample of Collection 1 Dot PlotProportion : Owners/dogs
What do you notice?
Bootstrap and Randomization Distributions
Statistics: Unlocking the Power of Data Lock5
• The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution
Normal Distribution
Freq
uenc
y
-3 -2 -1 0 1 2 3
050
010
0015
00
Statistics: Unlocking the Power of Data Lock5
Central Limit Theorem!
For a sufficiently large sample size, the distribution of sample
statistics for a mean or a proportion is normal
www.lock5stat.com/StatKey
Statistics: Unlocking the Power of Data Lock5
Distribution of 100n30n10n1n
0.5p
0.0 0.5 1.0
50n
0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0
0.7p
0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0
0.0 0.5 1.0
0.1p
0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0
Statistics: Unlocking the Power of Data Lock5
CLT for a MeanPopulation Distribution of
Sample DataDistribution of Sample Means
n = 10
n = 30
n = 50
Freq
uenc
y
0 1 2 3 4 5 6
0.0
1.5
3.0
1.0 2.0 3.0
Freq
uenc
y
0 1 2 3 4 5
04
8
1.5 2.0 2.5 3.0
Freq
uenc
y
0 2 4 6 8 12
010
25
1.4 1.8 2.2 2.6
x
0 2 4 6 8 10
Statistics: Unlocking the Power of Data Lock5
Central Limit Theorem
• The central limit theorem holds for ANY original distribution, although “sufficiently large sample size” varies
• The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work
• For small samples, it is more important that the data itself is approximately normal
Statistics: Unlocking the Power of Data Lock5
Central Limit Theorem
• For distributions of a quantitative variable that are not very skewed and without large outliers, n ≥ 30 is usually sufficient to use the CLT
• For distributions of a categorical variable, counts of at least 10 within each category is usually sufficient to use the CLT
Statistics: Unlocking the Power of Data Lock5
Accuracy• The accuracy of intervals and p-values generated using simulation methods (bootstrapping and randomization) depends on the number of simulations (more simulations = more accurate)
• The accuracy of intervals and p-values generated using formulas and the normal distribution depends on the sample size (larger sample size = more accurate)
• If the distribution of the statistic is truly normal and you have generated many simulated randomizations, the p-values should be very close
Statistics: Unlocking the Power of Data Lock5
• The normal distribution is fully characterized by it’s mean and standard deviation
Normal Distribution
mean,standard deviationN
Statistics: Unlocking the Power of Data Lock5
Bootstrap DistributionsIf a bootstrap distribution is approximately normally distributed, we can write it as
a) N(parameter, sd)b) N(statistic, sd)c) N(parameter, se)d) N(statistic, se)sd = standard deviation of variablese = standard error = standard deviation of statistic
Statistics: Unlocking the Power of Data Lock5
Hearing Loss• In a random sample of 1771 Americans aged 12 to 19, 19.5% had some hearing loss (this is a dramatic increase from a decade ago!)
• What proportion of Americans aged 12 to 19 have some hearing loss? Give a 95% CI.
Rabin, R. “Childhood: Hearing Loss Grows Among Teenagers,” www.nytimes.com, 8/23/10.
Statistics: Unlocking the Power of Data Lock5
Hearing Loss
(0.177, 0.214)
Statistics: Unlocking the Power of Data Lock5
Hearing Loss
N(0.195, 0.0095)
Statistics: Unlocking the Power of Data Lock5
Confidence Intervals
If the bootstrap distribution is normal:
To find a P% confidence interval , we just need to find the middle P% of the distribution
N(statistic, SE)
Statistics: Unlocking the Power of Data Lock5
Area under a Curve• The area under the curve of a normal distribution is equal to the proportion of the distribution falling within that range
• Knowing just the mean and standard deviation of a normal distribution allows you to calculate areas in the tails and percentiles
www.lock5stat.com/statkey
Statistics: Unlocking the Power of Data Lock5
Hearing Loss
(0.176, 0.214)
www.lock5stat.com/statkey
Statistics: Unlocking the Power of Data Lock5
Standardized DataOften, we standardize the data to have mean 0
and standard deviation 1
This is done with z-scores
From x to z : From z to x:
Places everything on a common scale
x meanzsd
Statistics: Unlocking the Power of Data Lock5
Standard Normal• The standard normal distribution is the normal distribution with mean 0 and standard deviation 1
0,1N
Distribution of Statistic Assuming Null
Statistic
-3 -2 -1 0 1 2 3
Statistics: Unlocking the Power of Data Lock5
Standardized DataConfidence Interval (bootstrap distribution):
mean = sample statistic, sd = SE
From z to x: (CI)
x statist Eic Sz
Statistics: Unlocking the Power of Data Lock5
z*-z*
P%
P% Confidence Interval2. Return to
original scale with statistic z* SE
1. Find z-scores (–z* and z*) that capture
the middle P% of the standard normal
Statistics: Unlocking the Power of Data Lock5
Confidence Interval using N(0,1)
If a statistic is normally distributed, we find a confidence interval for the parameter using
statistic z* SE
where the area between –z* and +z* in the standard normal distribution is the desired
level of confidence.
Statistics: Unlocking the Power of Data Lock5
Confidence IntervalsFind z* for a 99% confidence interval.
www.lock5stat.com/statkey
z* = 2.575
Statistics: Unlocking the Power of Data Lock5
z*Why use the standard normal?
Common confidence levels:
95%: z* = 1.96 (but 2 is close enough)
90%: z* = 1.645
99%: z* = 2.576
Statistics: Unlocking the Power of Data Lock5
In March 2011, a random sample of 1000 US adults were asked
“Do you favor or oppose ‘sin taxes’ on soda and junk food?”
320 adults responded in favor of sin taxes.
Give a 99% CI for the proportion of all US adults that favor these sin taxes.
From a bootstrap distribution, we find SE = 0.015
Sin Taxes
Statistics: Unlocking the Power of Data Lock5
Sin Taxes
Statistics: Unlocking the Power of Data Lock5
Sin Taxes
Statistics: Unlocking the Power of Data Lock5
Randomization DistributionsIf a randomization distribution is approximately normally distributed, we can write it as
a) N(null value, se)b) N(statistic, se)c) N(parameter, se)
Statistics: Unlocking the Power of Data Lock5
p-valuesIf the randomization distribution is normal:
To calculate a p-value, we just need to find the area in the appropriate tail(s) beyond the observed statistic of the distribution
Statistics: Unlocking the Power of Data Lock5
First Born Children• Are first born children actually smarter?
• Explanatory variable: first born or not• Response variable: combined SAT score
• Based on a sample of college students, we find
• From a randomization distribution, we find SE = 37
Statistics: Unlocking the Power of Data Lock5
First Born Children
SE = 37
What normal distribution should we use to find the p-value?
a) N(30.26, 37)b) N(37, 30.26)c) N(0, 37)d) N(0, 30.26)
Statistics: Unlocking the Power of Data Lock5
Hypothesis TestingDistribution of Statistic Assuming Null
Statistic
-3 -2 -1 0 1 2 3
Observed Statistic
Distribution of Statistic Assuming Null
Statistic
-3 -2 -1 0 1 2 3
Distribution of Statistic Assuming Null
Statistic
-3 -2 -1 0 1 2 3
Observed Statistic
p-value
Statistics: Unlocking the Power of Data Lock5
First Born ChildrenN(0, 37)
www.lock5stat.com/statkey
p-value = 0.207
Statistics: Unlocking the Power of Data Lock5
Standardized DataHypothesis test (randomization distribution):
mean = null value, sd = SE
From x to z (test) :
x meanzsd
Statistics: Unlocking the Power of Data Lock5
p-value using N(0,1)
If a statistic is normally distributed under H0, the p-value is the probability a standard normal is beyond
Statistics: Unlocking the Power of Data Lock5
First Born Children
SE = 37
1) Find the standardized test statistic
2) Compute the p-value
Statistics: Unlocking the Power of Data Lock5
First Born Children
Statistics: Unlocking the Power of Data Lock5
z-statistic
If z = –3, using = 0.05 we would
(a) Reject the null(b) Not reject the null(c) Impossible to tell(d) I have no idea
Statistics: Unlocking the Power of Data Lock5
z-statistic
• Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale
Statistics: Unlocking the Power of Data Lock5
Confidence Interval Formula
*sample statistic z SE
From original data
From bootstrap
distribution
From N(0,1)
IF SAMPLE SIZES ARE LARGE…
Statistics: Unlocking the Power of Data Lock5
Formula for p-values
From randomization
distribution
From H0
sample statistic null valueSE
z
From original data
Compare z to N(0,1) for p-value
IF SAMPLE SIZES ARE LARGE…
Statistics: Unlocking the Power of Data Lock5
Standard Error
• Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?
• We can!!!
• Or at least we’ll be able to next class…
Statistics: Unlocking the Power of Data Lock5
• For quantitative data, we use a t-distribution instead of the normal distribution
•The t distribution is very similar to the standard normal, but with slightly fatter tails (to reflect the uncertainty in the sample standard deviations)
t-distribution
Statistics: Unlocking the Power of Data Lock5
• The t-distribution is characterized by its degrees of freedom (df)
• Degrees of freedom are based on sample size• Single mean: df = n – 1 • Difference in means: df = min(n1, n2) – 1• Correlation: df = n – 2
• The higher the degrees of freedom, the closer the t-distribution is to the standard normal
Degrees of Freedom
Statistics: Unlocking the Power of Data Lock5
t-distribution
Statistics: Unlocking the Power of Data Lock5
Aside: William Sealy Gosset
Statistics: Unlocking the Power of Data Lock5
The Pygmalion Effect
Source: Rosenthal, R. and Jacobsen, L. (1968). “Pygmalion in the Classroom: Teacher Expectation and Pupils’ Intellectual Development.” Holt, Rinehart and Winston, Inc.
Teachers were told that certain children (chosen randomly) were expected to be intellectual “growth spurters,” based on the Harvard Test of Inflected Acquisition (a test that didn’t actually exist). These children were selected randomly.
The response variable is change in IQ over the course of one year.
Statistics: Unlocking the Power of Data Lock5
The Pygmalion Effect
n sControl Students 255 8.42 12.0“Growth Spurters” 65 12.22 13.3
X
Can this provide evidence that merely expecting a child to do well actually causes the child to do better?
If so, how much better?
*s1 and s2 were not given, so I set them to give the correct p-value
SE = 1.8
Statistics: Unlocking the Power of Data Lock5
Pygmalion Effect
Statistics: Unlocking the Power of Data Lock5
Pygmalion EffectFrom the paper: “The difference in gains could be ascribed
to chance about 2 in 100 times”
Statistics: Unlocking the Power of Data Lock5
Pygmalion Effect
Statistics: Unlocking the Power of Data Lock5
To DoDo Project 1 (due 3/7)
Read Chapter 5