summary


Uploaded by: peers

Posted on 23-Feb-2016


Page 1: Summary

Summary
• Five-number summary, percentiles, mean
• Box plot, modified box plot
• Robust statistics – mean, median, trimmed mean
• Outliers
• Measures of variability
  • range, IQR, average absolute deviation, variance and standard deviation
• The average (signed) deviation of the data values from the mean is always zero.
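A quick numeric check of the last two bullets, as a minimal Python sketch using the standard `statistics` module on a small made-up data set (the values are an assumption chosen only for illustration). Quartile conventions differ between tools, so the IQR from `statistics.quantiles` may not match other software exactly.

```python
import statistics

data = [2, 4, 4, 5, 7, 8]   # hypothetical data set, for illustration only

mean = statistics.mean(data)
deviations = [x - mean for x in data]
print(sum(deviations) / len(data))    # average signed deviation: 0.0

# Measures of variability
data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # default "exclusive" quartile method
iqr = q3 - q1
mad = sum(abs(d) for d in deviations) / len(data)   # average absolute deviation
variance = statistics.pvariance(data)              # divides by n
sd = statistics.pstdev(data)                        # square root of the variance
print(data_range, iqr, round(mad, 3), variance, sd)
```

The signed deviations cancel exactly, which is why the absolute deviation or the squared deviation is used to measure spread.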

Page 2: Summary

Standard deviation – empirical rule

Page 3: Summary

Standard deviation – empirical rule

Page 4: Summary

Standard deviation – empirical rule

Page 5: Summary

Population - parameter: mean μ, standard deviation σ

Sample - statistic: sample mean x̄, sample standard deviation s

population (census) vs. sample

parameter (population) vs. statistic (sample)

Page 6: Summary

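Bias, sampling
• Sampling – how do we construct a sample from the population?
• Bias – a sample is biased if it differs from the population in a systematic way.
• Unbiased estimate – divide by n − 1 instead of n (Bessel's correction). Strictly, this makes s² an unbiased estimate of σ²; s itself remains slightly biased.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \approx \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$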
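A minimal sketch comparing the two divisors on a hypothetical sample (the values are assumptions for illustration). In Python's standard library, `statistics.stdev` divides by n − 1 and `statistics.pstdev` divides by n.

```python
import math
import statistics

sample = [0, 2, 4, 4]   # hypothetical sample, for illustration only

n = len(sample)
x_bar = sum(sample) / n
ss = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations from x̄

s = math.sqrt(ss / (n - 1))      # divide by n - 1 (Bessel's correction)
s_n = math.sqrt(ss / n)          # divide by n: tends to underestimate σ

# The standard library provides both conventions
assert math.isclose(s, statistics.stdev(sample))
assert math.isclose(s_n, statistics.pstdev(sample))
print(round(s, 3), round(s_n, 3))   # 1.915 1.658
```

Dividing by the smaller number n − 1 compensates for measuring deviations from x̄, which always sits closer to the sample values than the unknown μ does.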

Page 7: Summary

SRS (simple random sample)
• Sampling with replacement
  • Generates independent samples.
  • Two sample values are independent if what we get on the first one doesn't affect what we get on the second.
• Sampling without replacement
  • Deliberately avoids choosing any member of the population more than once.
  • This type of sampling is not independent; however, it is more common.
  • The error from treating it as independent is small as long as
    1. the population is large, and
    2. the sample size is no more than 10% of the population size.
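A minimal sketch of both schemes with Python's `random` module, assuming a hypothetical population of 100 numbered members: `choices` draws with replacement, `sample` draws without replacement.

```python
import random

population = list(range(1, 101))   # hypothetical population of 100 numbered members
rng = random.Random(42)            # fixed seed so the run is reproducible

with_replacement = rng.choices(population, k=10)     # independent draws; repeats possible
without_replacement = rng.sample(population, k=10)   # no member is chosen twice

print(len(set(without_replacement)) == 10)   # True
# A sample of 10 from 100 is 10% of the population: the edge of the 10% guideline.
```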

Page 8: Summary

• Suppose you have a bag with 3 cards in it. The cards are numbered 0, 2 and 4.

• Population mean = 2
• Population variance = 8/3

• An important property of a sample statistic that estimates a population parameter is that if you evaluate the sample statistic for every possible sample and average them all, the average of the sample statistic should equal the population parameter.

We want: average of the sample statistic over all possible samples = population parameter (for example, average of the sample variances = σ² = 8/3).
• A statistic with this property is called unbiased.

Page 9: Summary

Bessel’s game

Sample   Sample average   Sample variance (n − 1)   Sample variance (n)
0,2      1                2                         1
0,4      2                8                         4
2,0      1                2                         1
2,4      3                2                         1
4,0      2                8                         4
4,2      3                2                         1
0,0      0                0                         0
2,2      2                0                         0
4,4      4                0                         0
Average  2                8/3                       4/3

μ = 2, σ² = 8/3
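The table can be reproduced by enumerating all nine ordered samples of size 2 drawn with replacement from the cards 0, 2 and 4. This Python sketch uses exact fractions so the averages come out as 2, 8/3 and 4/3 rather than rounded decimals; `variance` is a hypothetical helper written for this example.

```python
from fractions import Fraction
from itertools import product

cards = [0, 2, 4]
mu = Fraction(sum(cards), len(cards))
sigma2 = sum((x - mu) ** 2 for x in cards) / len(cards)   # population variance

def variance(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum((x - m) ** 2 for x in sample) / (n - ddof)

samples = list(product(cards, repeat=2))   # all 9 ordered samples of size 2, with replacement
avg_mean = sum(Fraction(sum(s), 2) for s in samples) / len(samples)
avg_var_n1 = sum(variance(s, 1) for s in samples) / len(samples)   # divide by n - 1
avg_var_n = sum(variance(s, 0) for s in samples) / len(samples)    # divide by n

print(mu, sigma2)                          # 2 8/3
print(avg_mean, avg_var_n1, avg_var_n)     # 2 8/3 4/3
```

Only the n − 1 version averages to σ² = 8/3; dividing by n gives 4/3, a systematic underestimate.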

Page 10: Summary

Histogram revision
• Distribution – the pattern of values in the data
• Histogram – visualizing the distribution
• We can see
  • whether the data tend to be close to a particular value
  • whether the data vary a lot or a little about the most common values
  • whether that variation tends to be more above or below the common values
  • whether there are unusually large or small values in the data

Page 11: Summary

Life expectancy data – histogram
• Use the interactive histogram applet to generate a histogram with a bin size of 10, starting at 40.

[Figure: histogram of the life expectancy data; x-axis: life expectancy, y-axis: frequency]
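The binning described on this slide (width 10, starting at 40) can be sketched as follows. The ages are made up for illustration and are not the slide's data set; bins are assumed to be half-open, [40, 50), [50, 60) and so on, which is one common convention; `bin_counts` is a hypothetical helper.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    Values below `start` are ignored; empty bins up to the last occupied one get 0.
    """
    counts = Counter(int((v - start) // width) for v in values if v >= start)
    return {start + k * width: counts[k] for k in range(max(counts) + 1)}

# Hypothetical life-expectancy values in years (not the slide's data set)
ages = [47.8, 52.1, 61.0, 64.7, 68.3, 72.5, 73.2, 75.0, 76.7, 79.9, 83.4]
print(bin_counts(ages, start=40, width=10))
# {40: 1, 50: 1, 60: 3, 70: 5, 80: 1}
```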

Page 12: Summary

Life expectancy data – histogram

[Figure: histogram of the life expectancy data; x-axis: life expectancy, y-axis: frequency]

Page 13: Summary

Making conclusions from a histogram
• What can you tell about the life expectancy data?
  • How many modes?
  • Where is the mode?
  • Is it symmetric, left-skewed or right-skewed?
  • Outliers: yes or no?

[Figure: histogram of the life expectancy data; x-axis: life expectancy, y-axis: frequency]

Page 14: Summary

Making conclusions from a histogram
• Where are the mode, the median and the mean?

[Figure: histogram of the life expectancy data; x-axis: life expectancy, y-axis: frequency]

Page 15: Summary

Five-number summary

Min.    Q1      Median   Q3      Max.
47.79   64.67   73.24    76.65   83.39

Median − Q1 = 8.57 > Q3 − Median = 3.41

Median − Min. = 25.45 > Max. − Median = 10.15

What is the position of the mean and the median?

Mean = 69.9, below the median of 73.24
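The comparisons on this slide can be checked directly from the five-number summary and the mean. A longer lower half and a mean below the median are the usual signs of a left-skewed distribution; this is a heuristic, not a formal test.

```python
# Five-number summary and mean of the life-expectancy data, as given on the slide
minimum, q1, median, q3, maximum = 47.79, 64.67, 73.24, 76.65, 83.39
mean = 69.9

lower_half = median - minimum   # about 25.45
upper_half = maximum - median   # about 10.15
lower_box = median - q1         # about 8.57
upper_box = q3 - median         # about 3.41

# Longer lower spreads and a mean below the median suggest a left skew
print(lower_box > upper_box, lower_half > upper_half, mean < median)   # True True True
```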

Page 16: Summary
Page 17: Summary

Symmetric, left-skewed or right-skewed?

Page 18: Summary

STANDARDIZING

Page 19: Summary

Playing chess
• Pretend I am a chess player.
• Which of the following tells you the most about how good I am?
  1. My rating is 1800.
  2. 8110th place among world competitive chess players.
  3. Ranked higher than 88% of competitive chess players.

Page 20: Summary

Distribution

Distribution of scores in one particular year

We should use relative frequencies and convert all absolute frequencies to proportions.
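A minimal sketch of the conversion, with hypothetical height bins and counts: two groups of very different sizes have the same shape, and once each count is divided by its group total, the relative frequencies match. `relative_frequencies` is a hypothetical helper name.

```python
def relative_frequencies(counts):
    """Convert absolute frequencies (bin -> count) into proportions that sum to 1."""
    total = sum(counts.values())
    return {bin_label: count / total for bin_label, count in counts.items()}

# Hypothetical height bins (cm) for two groups of very different sizes
small_group = {"150-160": 2, "160-170": 6, "170-180": 2}
large_group = {"150-160": 20, "160-170": 60, "170-180": 20}

print(relative_frequencies(small_group))   # {'150-160': 0.2, '160-170': 0.6, '170-180': 0.2}
print(relative_frequencies(small_group) == relative_frequencies(large_group))   # True
```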

Page 21: Summary

Height data – absolute frequencies

http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

Page 22: Summary

Height data – relative frequencies

Page 23: Summary
Page 24: Summary

Height data – relative frequencies
What proportion of values is between 170 cm and 173.75 cm?

30%

Page 25: Summary

Height data – relative frequencies
What proportion of values is between 170 cm and 175 cm?

We can’t tell for certain.

Page 26: Summary

• How should we modify the data/histogram to get more detail?
  1. Adding more values to the dataset
  2. Increasing the bin size
  3. Using a smaller bin size

Page 27: Summary

Height data – relative frequencies
What proportion of values is between 170 cm and 175 cm?

36%

Page 28: Summary

Height data – relative frequencies

Page 29: Summary

Decreasing bin size
• Check out what happens with the smallest bin size for Physics Test Scores from http://quarknet.fnal.gov/cosmics/histo.shtml.

Page 30: Summary

Height

Page 31: Summary

Height data – relative frequencies

Page 32: Summary

Normal distribution

recall the empirical rule

68-95-99.7
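The 68-95-99.7 percentages can be checked with `statistics.NormalDist` (Python 3.8+) by taking the probability between −k and +k standard deviations of the standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within ±{k} sd: {share:.1%}")
# within ±1 sd: 68.3%
# within ±2 sd: 95.4%
# within ±3 sd: 99.7%
```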

𝑥=3

Page 33: Summary

Empirical rule

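[Figure: normal curve with the axis marked from −3 to +3 standard deviations, corresponding to the values 0 to 6 (mean 3, standard deviation 1)]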

Page 34: Summary

Z

Z – number of standard deviations away from the mean

If the Z-value is 1, what percentage of values is less than that value?

Approximately 84 %
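The 84 % answer follows from the empirical rule (50 % below the mean plus half of the 68 % within one standard deviation); the exact value can be read from the standard normal CDF, here via `statistics.NormalDist`.

```python
from statistics import NormalDist

share_below = NormalDist().cdf(1)        # P(Z < 1) for the standard normal distribution
print(f"{share_below:.1%}")              # 84.1%

# Empirical-rule approximation: 50% below the mean + half of the 68% within ±1 sd
print(f"{0.50 + 0.68 / 2:.0%}")          # 84%
```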


Page 35: Summary

Who is more popular?
Let’s demonstrate the importance of Z-scores with the following example.

Page 36: Summary

Who is more popular

s.d. = 36

s.d. = 60

Z = -3.53

Z = -2.57

Page 37: Summary

Standardizing

Page 38: Summary

Formula
• What formula describes what we did?
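The standard formula behind a Z-score is Z = (x − μ) / σ. A minimal sketch; `z_score` is a hypothetical helper, and the numbers in the usage lines are made up, since the raw values and means of the popularity example are not in the transcript.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical values, not the numbers from the popularity example
print(z_score(130, 100, 15))   # 2.0
print(z_score(85, 100, 15))    # -1.0
```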

Page 39: Summary

Quiz
• What does a negative Z-score mean?

1. The original value is negative.
2. The original value is less than the mean.
3. The original value is less than 0.
4. The original value minus the mean is negative.

Page 40: Summary

Quiz II
• If we standardize a distribution by converting every value to a Z-score, what will be the new mean of this standardized distribution?

• If we standardize a distribution by converting every value to a Z-score, what will be the new standard deviation of this standardized distribution?
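A minimal sketch that standardizes a hypothetical data set (mean 5, standard deviation 2) and then recomputes the mean and standard deviation of the resulting Z-scores, using the population formulas (dividing by n); `standardize` is a hypothetical helper.

```python
import statistics

def standardize(values):
    """Convert every value to its Z-score using the population mean and sd."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, sd 2
z = standardize(data)
print(z)                                           # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.fmean(z), statistics.pstdev(z))   # 0.0 1.0
```

Subtracting the mean shifts the center to 0, and dividing by σ rescales the spread to 1, whatever the original units were.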

Page 41: Summary

Standard normal distribution

N(μ, σ)

N(0, 1)

Page 42: Summary

Standard normal distribution

Page 43: Summary

Meaning of relative frequencies

5 2 3 2 4

1 3 4 3 3

Sorted: 1 2 2 3 3 3 3 4 4 5
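Reading the two rows above as the ten data values (an assumption; the sorted list matches them), the relative frequency of each value is its count divided by 10. It is also the probability of getting that value when one data point is picked at random.

```python
from collections import Counter

values = [5, 2, 3, 2, 4, 1, 3, 4, 3, 3]   # the ten values listed on the slide
counts = Counter(values)
n = len(values)
rel = {v: counts[v] / n for v in sorted(counts)}
print(rel)   # {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.2, 5: 0.1}
```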

Page 44: Summary

Histogram of these data

Page 45: Summary

Probability density function

Probability density function (PDF), or simply the probability density

Page 46: Summary

Standard normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$
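A direct transcription of the density formula into Python, checked against `statistics.NormalDist.pdf`; `normal_pdf` is a hypothetical helper name. With μ = 0 and σ = 1 it gives the standard normal curve.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(round(normal_pdf(0), 4))   # 0.3989, the peak height of the standard normal curve
# Agrees with the standard library's implementation for any mu and sigma
print(math.isclose(normal_pdf(1.3, mu=1, sigma=2), NormalDist(1, 2).pdf(1.3)))   # True
```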