Inferential Statistics: Doing Stats with Confidence


Upload: erik-dickerson

Post on 18-Jan-2018


DESCRIPTION

Population: the total set of items that we are concerned about. Parameter: a measure used to summarize a population (could include mean, median, standard deviation). Sample: a subset of the population (assumed to be sampled randomly, where every object has an equal likelihood of being selected). Statistic: a measure used to summarize a sample (mean, median, SD).

TRANSCRIPT

Page 1: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

INFERENTIAL STATISTICS

DOING STATS WITH CONFIDENCE

Page 2: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

GREEK SYMBOLS EVERYWHERE

Page 3: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

GREEK SYMBOLS EVERYWHERE

• Population: the total set of items that we are concerned about
• Parameter: a measure used to summarize a population (could include mean, median, standard deviation)
• Sample: a subset of the population (assumed to be sampled randomly, where every object has an equal likelihood of being selected)
• Statistic: a measure used to summarize a sample (mean, median, SD)

Page 4: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

CALCULATING PARAMETERS: POPULATIONS

- Mean is always the same
- Standard deviation
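The formulas on this slide were images and are missing from the transcript. Assuming the slide showed the standard definitions, for a population of N values x_1, ..., x_N (the index notation is an assumption, not taken from the slides) they would read:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
```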

Page 5: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

CALCULATING STATISTICS: SAMPLES

- Mean is always the same
- Standard deviation: the difference is that here you divide by (n-1)

• When you only divide by n, you get consistently low estimates of the population SD, sigma
• For this reason, the estimate is always made with (n-1)
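The "consistently low" claim can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the exponential population, the sample size of 5, and all variable names are assumptions chosen for illustration. It uses Python's standard library, where `statistics.pstdev` divides by n and `statistics.stdev` divides by n - 1.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values from a skewed (exponential) distribution.
population = [random.expovariate(1.0) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population SD: divide by N

n = 5           # small samples make the gap easy to see
trials = 5_000  # number of samples to draw

sd_divide_by_n = []
sd_divide_by_n_minus_1 = []
for _ in range(trials):
    sample = random.choices(population, k=n)  # random sample, drawn with replacement
    sd_divide_by_n.append(statistics.pstdev(sample))          # divides by n
    sd_divide_by_n_minus_1.append(statistics.stdev(sample))   # divides by n - 1

avg_n = statistics.fmean(sd_divide_by_n)
avg_n_minus_1 = statistics.fmean(sd_divide_by_n_minus_1)

print(f"population sigma:            {sigma:.3f}")
print(f"average SD dividing by n:    {avg_n:.3f}")
print(f"average SD dividing by n-1:  {avg_n_minus_1:.3f}")
```

In a run like this, the divide-by-n average should land clearly below sigma, and the n - 1 version should land closer to it. With very small samples the n - 1 estimate of the SD can still come out a little low, even though it corrects the variance.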

Page 6: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

CENTRAL LIMIT THEOREM

THE MOST IMPORTANT THING IN STATS

Page 7: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

INTUITIVE EXPLANATION

- We can start out with any distribution (continuous or discrete), even one that looks nothing like a normal distribution, as long as it has a mean and a standard deviation.

Page 8: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

NOW, WE TAKE A SAMPLE FROM THIS DIST.

• You take a sample from this distribution (1, 2, 5, 5)
• Say the first time, you take a sample size of 4 (n=4) [one sample made up of 4 observations!]
• The SAMPLE refers to the SET of 4 numbers, and the sample size or "n" tells you how many you took in your sample

Page 9: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

NEXT WE AVERAGE THIS SAMPLE & PLOT IT

• If we average out 1, 2, 5, and 5 we get 3.25 (then plot it)
• NOW, repeat this again and again and again (i.e. increase your number of trials) and plot the mean of every single sample you take
• You continue to take samples of size 4, average them, and plot the frequency of the averages
• Say you do this 10,000 times
• Your plot will begin to look like the normal distribution
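The procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' own code: the starting distribution (values 1 to 6 with lopsided weights) and the text histogram are made up for the example.

```python
import random
import statistics
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical starting distribution, deliberately non-normal:
# values 1..6 with most of the weight on the two extremes.
values = [1, 2, 3, 4, 5, 6]
weights = [5, 1, 1, 1, 1, 5]

def sample_mean(n):
    """Draw one random sample of size n and return its average."""
    return statistics.fmean(random.choices(values, weights=weights, k=n))

n = 4            # sample size
trials = 10_000  # how many samples we take
means = [sample_mean(n) for _ in range(trials)]

# Crude text histogram: round each sample mean to the nearest 0.5 and count.
bins = Counter(round(m * 2) / 2 for m in means)
for b in sorted(bins):
    print(f"{b:4.1f} {'#' * (bins[b] // 50)}")

print("mean of the sample means:", round(statistics.fmean(means), 2))
```

The population here is U-shaped, yet the histogram of sample means should pile up in the middle, around the population mean of 3.5. Raising `n` should make the bell shape more pronounced.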

Page 10: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

SIMULATE THIS

- Use this simulation: http://onlinestatbook.com/stat_sim/sampling_dist/

Page 11: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

THE SIMULATION AND WHAT WE NOTICE ABOUT THE CLT

- Going from n=5 to n=10, the sampling distribution takes on a much more normal shape, tighter around the mean
- The mean is the same between the population and the sampling distribution

Page 12: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

SAMPLING DISTRIBUTION OF THE SAMPLE MEAN

• This is the name of what we just made
• To recap, you make it using the following steps:
• Take a sample of size n
• Average your values
• Plot the average
• Do it over and over
• Plot each one
• Watch as your plot begins to approximate the normal curve

Page 13: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

STANDARD ERROR OF THE MEAN

• When you take a sample, then take another sample, the means will be different.
• When you take many samples again and again, then calculate the mean for each sample, you can plot these means to form a distribution (the sampling distribution of the sample mean)
• And then you can calculate the standard deviation of the distribution of these means

This is the standard error of the mean
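To make the definition concrete, the standard error can be measured directly by brute force: take many samples, average each one, and compute the SD of those averages. The sketch below assumes a hypothetical uniform population between 0 and 10; the function names are illustrative only.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def draw_sample(n):
    """One random sample of size n from a hypothetical uniform(0, 10) population."""
    return [random.uniform(0, 10) for _ in range(n)]

def empirical_standard_error(n, trials=5_000):
    """SD of the means of many samples of size n."""
    means = [statistics.fmean(draw_sample(n)) for _ in range(trials)]
    return statistics.stdev(means)

for n in (4, 16, 64):
    se = empirical_standard_error(n)
    print(f"n = {n:2d}: standard error of the mean is about {se:.3f}")
```

Each time the sample size is quadrupled, the measured standard error should roughly halve, which matches the simulation's observation that larger samples give a distribution that is tighter around the mean.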

Page 14: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

THERE IS A SIMPLE WAY TO TAKE THE STANDARD ERROR

- You don’t even need to take 10,000 samples

• s.e. is the standard error of the mean
• sigma is the SD of the population
• n is the sample size
• But we rarely know the SD of the population
• We can use the second formula above to estimate the standard deviation
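The two formulas this slide refers to are missing from the transcript. A likely reconstruction, assuming the standard definitions (s, the sample standard deviation, is a symbol not defined on the slide itself), is the exact form followed by the estimate used when sigma is unknown:

```latex
\text{s.e.} = \frac{\sigma}{\sqrt{n}}
\qquad
\text{s.e.} \approx \frac{s}{\sqrt{n}}
```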

Page 15: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

PRACTICE PROBLEMS

Page 16: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

EXAMPLE 1

The average woman drinks 2 L of water when active outdoors for a day (with a standard deviation of 0.7 L). You’re planning a trip for 50 women and you bring 110 L of water. What is the probability that you will run out of water?

Page 17: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

DISTRIBUTION OF THE POPULATION

This data is an estimate of the population parameters. We are not told the distribution, but can guess at a drawing to ground our thinking.

Page 18: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

TRANSLATE THE PROBLEM INTO PROBABILITY

What we are looking for is the probability that the women in our group drink more than 2.2 liters of water each on average (the 110 L we brought divided by 50 women), i.e. that the sample mean is above 2.2 L

Page 19: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

ANOTHER WAY OF SAYING THIS…

• If we were to take an infinite number of samples (n=50), what is the probability that the women in a sample drink more than 2.2 L on average?
• This sets us up to use the sampling distribution of the sample mean
• We can take the sampling distribution of the sample mean when n=50
• Remember, the mean would remain at 2 L

Page 20: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

CALCULATE THE STANDARD ERROR

• We already got the mean, now we need to get the standard error

• This is the same thing as the standard deviation of the sample mean: s/sqrt(n), or 0.7/sqrt(50)

• Standard error = 0.099 (almost 0.1), a very narrow SD

Page 21: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

DRAW YOUR PICTURE TO CONCEPTUALIZE WHAT YOU JUST FOUND

Page 22: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

NEXT STEP

• Go back to the question: we are looking for the probability that our sample will have an average above 2.2 L
• Our distribution is the plot of all possible sample means. We will run out of water if our sample mean falls above 2.2

Page 23: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

NEXT STEP

• We are finding the probability given by the area under the curve highlighted in green hatching
• We can use a z table to figure out what the green area is
• When we are at 2.2 L, we are 0.2 above the mean.
• If we want that in terms of standard deviations, use the formula for the z score: (x bar - mu) / sigma, where sigma here is the standard error, or (2.2-2)/0.099 = 0.2/0.099 = 2.02
• A sample mean of 2.2 L is 2.02 SD above the mean, so the probability of exceeding 2.2 L is the same as the probability of being more than 2.02 SD above the mean
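Written out as a single worked equation, with the standard error 0.7/sqrt(50) taken from the previous slide, the z score calculation is:

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
  = \frac{2.2 - 2}{0.7 / \sqrt{50}}
  \approx \frac{0.2}{0.099}
  \approx 2.02
```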

Page 24: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

LOOK IT UP IN THE Z TABLE

Be sure to consult your picture so you know exactly what your z score is telling you

Page 25: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

TRANSLATE THE Z SCORE INTO A PROBABILITY

Page 26: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

FINAL ANSWER

Final answer: there is a 2.17% chance we will run out of water (i.e. get a sample of 50 people who consume more than 2.2 L each on average)
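The whole of Example 1 can be checked without a z table. This is a minimal sketch using `statistics.NormalDist` from Python's standard library in place of the table lookup; the variable names are illustrative, and the normal approximation is the same one the slides rely on.

```python
from math import sqrt
from statistics import NormalDist

mu = 2.0       # mean litres per woman per day (from the problem)
sigma = 0.7    # standard deviation in litres (from the problem)
n = 50         # number of women on the trip
water = 110.0  # litres brought

threshold = water / n                   # 2.2 L per woman on average
standard_error = sigma / sqrt(n)        # about 0.099
z = (threshold - mu) / standard_error   # about 2.02

# Upper-tail area of the standard normal curve beyond z.
p_run_out = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error:.3f}")
print(f"z = {z:.2f}")
print(f"P(run out of water) = {p_run_out:.4f}")
```

The result should agree with the table-based answer of about 2.17%, up to rounding of the z score.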

Page 27: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

EXAMPLE 2

You sample 36 apples from your farm’s harvest of 200,000 apples. The mean weight is 112 grams (with a 40 g SD). What is the probability that the mean weight of all 200,000 apples is between 100 and 124 grams?

Page 28: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

THINK ABOUT WHAT THE PROBLEM WANTS

• This is asking you to conceptualize the sampling distribution of the sample mean

• We know that if we took a sample of size 36 over and over, a distribution would form where the mean of the sample means would equal the population mean mu, and the SD of the distribution can be found with the formula for standard error

Page 29: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

START OUT WITH A PICTURE

Page 30: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

CALCULATE MEAN AND S.E.

Always start by figuring out what you can:

• The mean is 112
• The standard error is: 40/sqrt(36)=6.67

Page 31: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

FORMULATE WHAT THE PROBLEM WANTS IN TERMS OF PROBABILITY

• Go back to the original question: this is asking us for the probability that the population mean is within 12 of the sample mean (x bar)

• i.e. the sample mean is within 12 of the actual mean

• You know you’re being asked for a confidence interval because of the range

Page 32: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

FIND THE Z SCORE FOR 12 ABOVE OR BELOW THE MEAN:

• Use the z score formula: (x bar minus mu) / sigma

Get the z score: (112-100)/6.67=1.8

Go back to the question: this is like asking, what is the probability that the mean of our sample of 36 apples is within 1.8 standard deviations of the population mean?

Page 33: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

DRAW IT AGAIN WITH THE NEW INFORMATION

Page 34: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

FIND THE AREA UNDER THE HATCHING

• Use the z table

Page 35: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

INTERPRET THE PROBABILITY FROM THE Z TABLE

• Given that this z chart shows the area from the mean to z, you need to double it to get 1.8 SD in either direction

=0.46407*2=0.92814

Page 36: INFERENTIAL STATISTICS DOING STATS WITH CONFIDENCE

PUT IT BACK INTO PLAIN WORDS

• Put everything back in English: there is a 92.8% chance that the sample mean is within 1.8 SD of the actual mean, or
• there is a 92.8% chance that the actual population mean is within 12 grams of our sample mean (between 100 & 124)
• Also we are 92.8% confident that the mean is between 100 and 124 g
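Example 2 can be checked the same way. The sketch below follows the slides' approach of treating the sample SD of 40 g as if it were the population SD and using the normal curve; `statistics.NormalDist` stands in for the z table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 112.0        # grams, from the 36 sampled apples
sd = 40.0                  # grams, sample SD used in place of sigma
n = 36                     # sample size
low, high = 100.0, 124.0   # range asked about in the problem

standard_error = sd / sqrt(n)               # 40 / 6, about 6.67
z = (high - sample_mean) / standard_error   # 12 / 6.67, about 1.8

# Area of the sampling distribution between 100 g and 124 g.
sampling_dist = NormalDist(mu=sample_mean, sigma=standard_error)
probability = sampling_dist.cdf(high) - sampling_dist.cdf(low)

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"P(within 12 g) = {probability:.4f}")
```

This should reproduce the doubled table value of about 0.928.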