Last lecture summary: Which measures of variability do you know? What are their advantages and disadvantages? Empirical rule

Upload: ada-lindsey

Post on 16-Dec-2015


Last lecture summary
• Which measures of variability do you know?
• What are their advantages and disadvantages?
• Empirical rule

Statistical jargon

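Population – parameter: mean (μ), standard deviation (σ)

Sample – statistic: mean (x̄), standard deviation (s)

Czech terms: výběr – statistika ("sample – statistic"), výběrový průměr ("sample mean"), výběrová směrodatná odchylka ("sample standard deviation")

• population (census) vs. sample
• parameter (population) vs. statistic (sample)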

Statistical inference
• A statistic is a value calculated from our observed data (sample).

• A parameter is a value that describes the population.

• We want to be able to generalize what we observe in our data to the whole population. To do this, the sample needs to be representative.

• How to select a representative sample? Use randomization.

New stuff

Random sampling
• Simple random sampling (SRS) – each possible sample from the population is equally likely to be selected.
• Stratified sampling – simple random samples are taken from subgroups (strata) of the population
  • subgroups: gender, age groups, …
• Cluster sampling – divide the population into non-overlapping groups (clusters); the sample consists of randomly chosen clusters
  • example: the population is all students in an area; randomly select schools and build the sample from the students of the selected schools
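The three schemes above can be sketched in Python with the standard `random` module. This is a minimal illustration, not a survey-grade implementation: the function names, the 20 hypothetical students, their genders and the three hypothetical schools are all made up for the example, and the stratified version assumes the same number of units is taken from every stratum (proportional allocation is another common choice).

```python
import random

def simple_random_sample(population, n, rng=random):
    # SRS: every subset of n units is equally likely (drawn without replacement).
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng=random):
    # Split the population into strata (e.g. by gender or age group) ...
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # ... then draw a simple random sample of the same size from every stratum.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng=random):
    # Pick whole clusters at random (e.g. schools) and keep every unit in them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: 20 students (id, gender) spread over 3 schools.
rng = random.Random(1)
students = [(i, "F" if i % 2 else "M") for i in range(20)]
schools = {"A": students[:5], "B": students[5:12], "C": students[12:]}

print(simple_random_sample(students, 4, rng))
print(stratified_sample(students, lambda s: s[1], 2, rng))
print(cluster_sample(schools, 1, rng))
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible; the default uses the module-level generator.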

Simple random sampling
• sampling with replacement (WR)
  • Czech: výběr s navrácením
  • generates independent samples
  • two sample values are independent if what we get on the first one doesn't affect what we get on the second
• sampling without replacement (WOR)
  • Czech: výběr bez navrácení
  • deliberately avoids choosing any member of the population more than once
  • this type of sampling is not independent, but it is more common
  • the error from treating such a sample as independent is small as long as
    1. the sample is large
    2. the sample size is no more than 10% of the population size
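A small enumeration makes the difference between WR and WOR concrete. The sketch below assumes a hypothetical population of three values (0, 2 and 4) and ordered samples of size 2; the helper names `second_draw_options` and `ten_percent_condition` are made up for illustration, and the latter encodes only the 10% part of the rule of thumb.

```python
from itertools import permutations, product

# Hypothetical tiny population, chosen only for illustration.
population = [0, 2, 4]

# With replacement: every draw sees the full population again -> 3 * 3 = 9 ordered samples.
with_replacement = list(product(population, repeat=2))

# Without replacement: a unit drawn first cannot be drawn again -> 3 * 2 = 6 ordered samples.
without_replacement = list(permutations(population, 2))

def second_draw_options(samples, first):
    # Values the second draw can take once the first draw is known.
    return sorted({b for a, b in samples if a == first})

def ten_percent_condition(n, population_size):
    # Rule of thumb: a WOR sample of size n behaves almost like an
    # independent one if n is no more than 10% of the population.
    return n <= 0.10 * population_size

print(second_draw_options(with_replacement, 0))     # [0, 2, 4]: same whatever came first
print(second_draw_options(without_replacement, 0))  # [2, 4]: depends on the first draw
```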

Bias
• If a sample is not representative, it can introduce bias into our results.
• bias (Czech: zkreslení, odchylka)
• A sample is biased if it differs from the population in a systematic way.

• The Literary Digest poll, 1936 U.S. presidential election
  • surveyed 10 million people (subscribers)
  • 2.3 million responded, predicting (3:2) that the Republican candidate would win
  • the Democratic candidate won
  • What went wrong?
    • the people surveyed were disproportionately wealthy (selection bias)
    • the survey relied on voluntary response (nonresponse bias): those who answered tended to be angry people or people who wanted a change

Bessel’s correction

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Source: www.udacity.com – Statistics
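As a sketch, the formula translates directly into Python; `sample_sd` is a hypothetical helper and the data are made up. The standard library's `statistics.stdev` also uses the n − 1 denominator, so it serves as a cross-check.

```python
import math
import statistics

def sample_sd(xs):
    # s = sqrt( sum_i (x_i - x_bar)^2 / (n - 1) ): Bessel's correction uses n - 1.
    n = len(xs)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two values")
    x_bar = sum(xs) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical data; both values printed below should agree.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data), statistics.stdev(data))
```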

Sample vs. population SD
• We use the sample standard deviation to approximate the population parameter.
• But don't confuse this with the actual standard deviation of a small dataset that is itself the whole population of interest.
• For example, take this dataset: 5 2 1 0 7. Do you divide by n or by n − 1?
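A worked version of the example, assuming the answer depends on what the five numbers represent: if they are the whole population of interest, divide by n; if they are a sample used to estimate a larger population, divide by n − 1. In Python's `statistics` module, `pvariance`/`pstdev` divide by n and `variance`/`stdev` divide by n − 1. Here the mean is 3 and the sum of squared deviations is 34.

```python
import statistics

data = [5, 2, 1, 0, 7]

# The five numbers ARE the population of interest: divide by n.
population_var = statistics.pvariance(data)   # 34 / 5 = 6.8
population_sd = statistics.pstdev(data)       # sqrt(6.8), about 2.61

# The five numbers are a sample from a larger population: divide by n - 1.
sample_var = statistics.variance(data)        # 34 / 4 = 8.5
sample_sd = statistics.stdev(data)            # sqrt(8.5), about 2.92
```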

Bessel's game

Population of cards in a bag: $\mu = 2$

Bessel's game
• An important property of a sample statistic that estimates a population parameter: if you evaluate the sample statistic for every possible sample and average the results, that average should equal the population parameter.
• We want: average of the sample statistic over all possible samples = population parameter. A statistic with this property is called unbiased.

Bessel’s game
1. List all possible samples of 2 cards.
2. Calculate the sample averages.

Table (to be filled in): Sample | Sample average

Figure: population of all cards in a bag (cards 0, 2 and 4)

Bessel’s game
1. List all possible samples of 2 cards.
2. Calculate the sample averages.
3. Now, half of you calculate the sample variance using /n, and half of you using /(n-1).
4. Then average all the sample variances.

Sample | Sample average | Sample variance
0, 2 | 1 |
0, 4 | 2 |
2, 0 | 1 |
2, 4 | 3 |
4, 0 | 2 |
4, 2 | 3 |
0, 0 | 0 |
2, 2 | 2 |
4, 4 | 4 |

Figure: population of all cards in a bag

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n \ \text{or}\ n-1}$$

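Bessel’s game

Sample | Sample average | Sample variance (n-1) | Sample variance (n)
0, 2 | 1 | 2 | 1
0, 4 | 2 | 8 | 4
2, 0 | 1 | 2 | 1
2, 4 | 3 | 2 | 1
4, 0 | 2 | 8 | 4
4, 2 | 3 | 2 | 1
0, 0 | 0 | 0 | 0
2, 2 | 2 | 0 | 0
4, 4 | 4 | 0 | 0
average | 2 | 8/3 | 4/3

Population: $\mu = 2,\ \sigma^2 = 8/3$

The average of the sample means equals μ, and the average of the sample variances computed with n − 1 equals σ² = 8/3, so that version is unbiased; dividing by n gives 4/3 and underestimates σ².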
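The game can be replayed exhaustively in Python. This sketch assumes, as the table does, that the bag holds the cards 0, 2 and 4 and that ordered samples of 2 cards are drawn with replacement, giving 9 samples; `Fraction` keeps the averages exact, and the `variance` helper with its `ddof` parameter is a hypothetical name chosen for the example.

```python
from fractions import Fraction
from itertools import product

cards = [0, 2, 4]   # population of cards in the bag
n = 2               # sample size

def variance(xs, ddof):
    # Sum of squared deviations divided by (len - ddof): ddof=0 -> /n, ddof=1 -> /(n-1).
    m = Fraction(sum(xs), len(xs))
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - ddof)

samples = list(product(cards, repeat=n))   # all 9 ordered samples, with replacement

avg_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)
avg_var_n1 = sum(variance(s, 1) for s in samples) / len(samples)
avg_var_n = sum(variance(s, 0) for s in samples) / len(samples)

sigma2 = variance(cards, 0)   # population variance: divide by N

print(avg_mean, avg_var_n1, avg_var_n, sigma2)   # 2 8/3 4/3 8/3
```

Changing `cards` or `n` lets you check that the n − 1 version stays unbiased for other small populations as well.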

Median absolute deviation (MAD)
• The standard deviation is not robust.
• The IQR is robust.
• The median absolute deviation (MAD) is a robust equivalent of the standard deviation.
• Take your data, find the median, calculate the absolute deviation of each value from the median, and find the median of these absolute deviations.

Median absolute deviation (MAD)

Data | Deviation from median | Absolute deviation
5 | |
10 | |
30 | |
20 | |
30 | |
5 | |
15 | |
10 | |
15 | |

Median:

MAD:
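One way to check the exercise is a short Python sketch built on `statistics.median`; the helper name is hypothetical. It also swaps one value for a made-up outlier (3000) to show that the MAD does not move while the standard deviation is dragged far upward.

```python
import statistics

def median_absolute_deviation(xs):
    med = statistics.median(xs)              # 1. median of the data
    abs_dev = [abs(x - med) for x in xs]     # 2. absolute deviations from the median
    return statistics.median(abs_dev)        # 3. median of those deviations

data = [5, 10, 30, 20, 30, 5, 15, 10, 15]
print(statistics.median(data), median_absolute_deviation(data))   # 15 5

# Robustness check: replace one value with a hypothetical outlier.
with_outlier = [5, 10, 3000, 20, 30, 5, 15, 10, 15]
print(median_absolute_deviation(with_outlier))                     # still 5
print(statistics.stdev(data), statistics.stdev(with_outlier))      # SD grows enormously
```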

NORMAL DISTRIBUTION

Playing chess
• Pretend I am a chess player.
• Which of the following tells you the most about how good I am?
  1. My rating is 1800.
  2. I am in 8110th place among competitive chess players worldwide.
  3. I am ranked higher than 88% of competitive chess players.
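A percentile rank like option 3 compares one value with the whole distribution, while a place like option 2 only becomes comparable once the total number of players is known. A minimal sketch, with hypothetical helper names, ten made-up ratings, and the assumption that there are no ties:

```python
def percentile_rank(my_rating, ratings):
    # Percentage of players whose rating is strictly below mine.
    below = sum(1 for r in ratings if r < my_rating)
    return 100 * below / len(ratings)

def percent_below_from_place(place, total_players):
    # Place 1 is the best, so (total - place) players are ranked below;
    # without total_players the place alone cannot be converted.
    return 100 * (total_players - place) / total_players

# Hypothetical ratings of ten players, for illustration only.
ratings = [1200, 1450, 1500, 1620, 1700, 1750, 1800, 1850, 1950, 2100]
print(percentile_rank(1800, ratings))      # 60.0
print(percent_below_from_place(3, 10))     # 70.0
```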

Distribution

Distribution of scores in one particular year

We should use relative frequencies and convert all absolute frequencies to proportions.
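Converting counts to proportions is one division per bin. The sketch below uses hypothetical rating bins and counts for two years with different numbers of players: the absolute frequencies differ, but the relative frequencies are identical, which is what makes the distributions comparable across years.

```python
def relative_frequencies(counts):
    # Divide every absolute frequency by the total number of observations.
    total = sum(counts.values())
    return {bin_label: c / total for bin_label, c in counts.items()}

# Hypothetical counts for two years with very different numbers of players.
year_a = {"1400-1599": 30, "1600-1799": 50, "1800-1999": 20}
year_b = {"1400-1599": 120, "1600-1799": 200, "1800-1999": 80}

print(relative_frequencies(year_a))   # proportions 0.3, 0.5, 0.2
print(relative_frequencies(year_b))   # the same proportions
```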

Height data – absolute frequencies

http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

Height data – relative frequencies

Height data – relative frequencies

What proportion of values is between 170 cm and 173.75 cm?

30%

Height data – relative frequencies

What proportion of values is between 170 cm and 175 cm?

We can’t tell for certain.

• How should we modify the data or the histogram to get more detail?
  1. Add more values to the dataset.
  2. Increase the bin size.
  3. Use a smaller bin size.

Height data – relative frequencies

What proportion of values is between 170 cm and 175 cm?

36%
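The bin-size effect can be sketched in Python. The heights below are ten made-up values in cm (not the SOCR data), and the bin widths of 3.75 cm and 1.25 cm are assumptions chosen so that 170 and 173.75 are edges of the coarse histogram while 175 is an edge only of the finer one. Under this sketch, a proportion can be read off a histogram only when both ends of the interval coincide with bin edges; the helper names are hypothetical.

```python
def relative_histogram(values, start, width, n_bins):
    # Returns (left edge, right edge, proportion) for bins [left, right).
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    total = len(values)
    return [(start + i * width, start + (i + 1) * width, c / total)
            for i, c in enumerate(counts)]

def proportion_between(hist, lo, hi):
    # Answerable from the histogram only if lo and hi are both bin edges.
    edges = {left for left, _, _ in hist} | {right for _, right, _ in hist}
    if lo not in edges or hi not in edges:
        return None  # "We can't tell for certain."
    return sum(p for left, right, p in hist if lo <= left and right <= hi)

# Ten made-up heights in cm.
heights = [165.0, 168.2, 170.5, 171.0, 172.2, 173.0, 174.1, 174.9, 176.0, 178.3]

coarse = relative_histogram(heights, start=162.5, width=3.75, n_bins=5)
fine = relative_histogram(heights, start=162.5, width=1.25, n_bins=15)

print(proportion_between(coarse, 170, 173.75))  # 0.4
print(proportion_between(coarse, 170, 175))     # None: 175 falls inside a bin
print(proportion_between(fine, 170, 175))       # about 0.6
```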

Height data – relative frequencies

Height data – relative frequencies

Normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$

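Recall the empirical rule (68-95-99.7): for a normal distribution, about 68% of values lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three.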
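A short Python sketch of the density and of the empirical rule, with hypothetical helper names. For a normal variable, P(μ − kσ < X < μ + kσ) = erf(k/√2) for every μ and σ, so the standard library's `math.erf` reproduces the 68-95-99.7 figures; the unrounded values are about 68.3%, 95.4% and 99.7%.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_within(k):
    # P(mu - k*sigma < X < mu + k*sigma) for a normal X, equal to erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```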