statistics university of manchester
CHEN10011 Tutorial Problems
Tutorial Problems
N.J. Goddard
Exercise 1
• You will be given a set of observations
  – The age (in months) of an MSc class
• Calculate:
  – The mean, mode and median age
  – The standard deviation of the age
• Determine if these ages are normally distributed at the 95% confidence level
• Determine if there are any outliers in this set of ages at the 95% confidence level using one of the three methods in the notes
Exercise 1 Dataset
264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274
Exercise 1
Statistic            Value
Mean                 306.0909
Rounded mean         306
Mode                 264
Median               276
Standard deviation   66.80188
Rounded St. Dev.     67
MAD                  24
σ (MAD/0.6745)       35.58191
Rounded σ            36

Age   Sorted Age   Absolute dev.   Sorted Abs. dev.
264   258          30              12
276   263          25              12
300   264          24              12
258   264          24              14
263   274          14              24
348   276          12              24
276   276          12              25
474   300          12              30
264   348          60              60
370   370          82              82
274   474          186             186
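The summary statistics for the dataset can be checked with a short script. This is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative and not part of the original slides. Note that 264 and 276 each occur twice, so `multimode` reports both values, whereas the slide lists 264 only.

```python
import statistics

# Ages in months, in the order given on the dataset slide.
ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # middle value of the sorted ages
modes = statistics.multimode(ages)    # every value tied for most frequent
stdev_age = statistics.stdev(ages)    # sample standard deviation (n - 1 divisor)

print(f"mean               = {mean_age:.4f}")
print(f"median             = {median_age}")
print(f"mode(s)            = {modes}")
print(f"standard deviation = {stdev_age:.5f}")
```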
Exercise 1
• Are these ages normally distributed?
• Are there any reasons to suppose that your ages are normally distributed?
Exercise 1
Cumulative Frequency Plot
[Figure: observed and expected cumulative frequency (axis 0.0 to 1.2) plotted against age in months (axis 200 to 550)]
Exercise 1
• Null hypothesis is that your ages are normally distributed, tested at the 95% confidence level
Exercise 1
Age   Frequency   Cumulative Freq.   SNV            Frac. Cum. Freq.   Expected Cum. Freq.   |Expected-Actual|
258   1           1                  -0.719903549   0.083333333        0.235792191           0.152458858
263   1           2                  -0.645055354   0.166666667        0.259445657           0.092778991
264   2           4                  -0.630085715   0.333333333        0.264319253           0.069014081
274   1           5                  -0.480389325   0.416666667        0.315475292           0.101191375
276   2           7                  -0.450450047   0.583333333        0.326192983           0.257140351
300   1           8                  -0.091178711   0.666666667        0.463675296           0.202991371
348   1           9                  0.627363962    0.75               0.734789658           0.015210342
370   1           10                 0.956696021    0.833333333        0.830639646           0.002693687
474   1           11                 2.513538479    0.916666667        0.994023663           0.077356996

Max 0.257140351
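The columns of this table can be reproduced as follows. This is a sketch under two assumptions inferred from the tabulated numbers rather than stated on the slide: the fractional cumulative frequency is the cumulative count divided by n + 1, and the expected cumulative frequency is the standard normal CDF evaluated at the standard normal variate (SNV). The critical value 0.2506 is the tabulated Lilliefors value for n = 11 at α = 0.05.

```python
import math
import statistics
from collections import Counter

ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
n = len(ages)
mean_age = statistics.mean(ages)
stdev_age = statistics.stdev(ages)  # sample standard deviation


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


counts = Counter(ages)
cumulative = 0
differences = {}
for age in sorted(counts):
    cumulative += counts[age]
    snv = (age - mean_age) / stdev_age   # standard normal variate
    observed = cumulative / (n + 1)      # fractional cumulative frequency (assumed n + 1)
    expected = normal_cdf(snv)           # expected cumulative frequency
    differences[age] = abs(expected - observed)

worst_age = max(differences, key=differences.get)
max_difference = differences[worst_age]

critical_value = 0.2506  # Lilliefors, n = 11, alpha = 0.05 (from the table)
reject_normality = max_difference > critical_value

print(f"max difference = {max_difference:.4f} at age {worst_age}")
print(f"reject normality: {reject_normality}")
```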
Lilliefors Critical Values

n    α=0.20   α=0.15   α=0.10   α=0.05   α=0.01
4    0.3027   0.3216   0.3456   0.3754   0.4129
5    0.2893   0.3027   0.3188   0.3427   0.3959
6    0.2694   0.2816   0.2982   0.3245   0.3728
7    0.2521   0.2641   0.2802   0.3041   0.3504
8    0.2387   0.2502   0.2649   0.2875   0.3331
9    0.2273   0.2382   0.2522   0.2744   0.3162
10   0.2171   0.2273   0.2410   0.2616   0.3037
11   0.2080   0.2179   0.2306   0.2506   0.2905
12   0.2004   0.2101   0.2228   0.2426   0.2812
13   0.1932   0.2025   0.2147   0.2337   0.2714
14   0.1869   0.1959   0.2077   0.2257   0.2627
15   0.1811   0.1899   0.2016   0.2196   0.2545
16   0.1758   0.1843   0.1956   0.2128   0.2477
17   0.1711   0.1794   0.1902   0.2071   0.2408
18   0.1666   0.1747   0.1852   0.2018   0.2345
19   0.1624   0.1700   0.1803   0.1965   0.2285
20   0.1589   0.1666   0.1764   0.1920   0.2226
Exercise 1
• The maximum difference (0.2571) exceeds the Lilliefors’ critical value for 11 observations at the 95% confidence level (0.2506), so we can reject the null hypothesis
• We accept the alternative hypothesis that your ages are not normally distributed
• We would expect this result as the ages are actually drawn from three populations (3rd year UG, 4th year UG and PG Masters level)
Criteria for the Rejection of Data
• Grubbs' test
• ISO recommended method
• The suspect value is that furthest away from the mean
• Null hypothesis is that all measurements are from the same population
• We calculate:

G = \frac{|\text{suspect value} - \bar{x}|}{S}
Criteria for the Rejection of Data
n Gcrit n Gcrit n Gcrit n Gcrit
3 1.15 15 2.55 27 2.86 39 3.03
4 1.48 16 2.59 28 2.88 40 3.04
5 1.71 17 2.62 29 2.89 50 3.13
6 1.89 18 2.65 30 2.91 60 3.20
7 2.02 19 2.68 31 2.92 70 3.26
8 2.13 20 2.71 32 2.94 80 3.31
9 2.21 21 2.73 33 2.95 90 3.35
10 2.29 22 2.76 34 2.97 100 3.38
11 2.34 23 2.78 35 2.98 110 3.42
12 2.41 24 2.80 36 2.99 120 3.44
13 2.46 25 2.82 37 3.00 130 3.47
14 2.51 26 2.84 38 3.01 140 3.49
Critical values for Grubbs' test (95% confidence level)
Criteria for the Rejection of Data
• Our null hypothesis is that all of our data are described by a single distribution
• The critical value for 11 observations at the 95% confidence level is 2.34
• We have one value that lies above this critical value (age 474 months, standard normal value 2.514)
• We can reject the null hypothesis and say this value is from a different distribution
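The Grubbs' statistic for this dataset can be computed as below. This is a minimal sketch: the variable names are illustrative, and the critical value 2.34 is simply copied from the table for n = 11 rather than calculated.

```python
import statistics

ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
mean_age = statistics.mean(ages)
stdev_age = statistics.stdev(ages)  # sample standard deviation

# The suspect value is the observation furthest from the mean.
suspect = max(ages, key=lambda x: abs(x - mean_age))

# G = |suspect value - mean| / S
g = abs(suspect - mean_age) / stdev_age

g_crit = 2.34  # tabulated critical value for n = 11 at 95% confidence
is_outlier = g > g_crit

print(f"suspect = {suspect}, G = {g:.3f}, outlier: {is_outlier}")
```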
Criteria for the Rejection of Data
• Chauvenet's criterion
• Null hypothesis is that all measurements are from the same population
• Remove the presumed outlier
• Recalculate mean and standard deviation
• Calculate the confidence limits at the required confidence level
• If the observation lies outside the new confidence limits, it can be rejected
Criteria for the Rejection of Data
• Chauvenet's criterion

Age (presumed outlier removed): 264, 276, 300, 258, 263, 348, 276, 264, 370, 274

Statistic            Value
Mean                 289.3
Standard deviation   38.8903
t (95%, 2 tailed)    2.262
E                    27.81851
Upper conf. limit    317.1185
Lower conf. limit    261.4815

Confidence limits:

\bar{x} \pm t_{P,\,n-1} \frac{S}{\sqrt{n}}
Criteria for the Rejection of Data
• Chauvenet's criterion
• The presumed outlier (474 months) lies well outside the upper confidence limit (317.12 months)
• We can therefore reject the null hypothesis
• The observation is an outlier at the 95% confidence level
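The procedure described on these slides can be sketched as follows. The t value of 2.262 (95%, two-tailed, 9 degrees of freedom) is taken directly from the slide rather than computed, and the variable names are illustrative.

```python
import math
import statistics

ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
suspect = 474  # presumed outlier

# Remove one occurrence of the presumed outlier and recalculate.
remaining = list(ages)
remaining.remove(suspect)
n = len(remaining)
mean_r = statistics.mean(remaining)
stdev_r = statistics.stdev(remaining)  # sample standard deviation

t_value = 2.262  # from the slide: 95%, two-tailed, n - 1 = 9 degrees of freedom
half_width = t_value * stdev_r / math.sqrt(n)  # E on the slide

lower_limit = mean_r - half_width
upper_limit = mean_r + half_width
is_outlier = not (lower_limit <= suspect <= upper_limit)

print(f"limits = {lower_limit:.4f} to {upper_limit:.4f}, outlier: {is_outlier}")
```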
Criteria for the Rejection of Data
Rank Difference Ratio (Q statistic)

n    α = 0.10   α = 0.05   α = 0.01
3    0.886      0.941      0.988
4    0.679      0.765      0.889
5    0.557      0.642      0.780
6    0.482      0.560      0.698
7    0.434      0.507      0.637
8    0.650      0.710      0.829
9    0.594      0.657      0.776
10   0.551      0.612      0.726
11   0.517      0.576      0.679
12   0.490      0.546      0.642
13   0.467      0.521      0.615
14   0.448      0.501      0.593
15   0.472      0.525      0.616
16   0.454      0.507      0.595
17   0.438      0.490      0.577
18   0.424      0.475      0.561
19   0.412      0.462      0.547
20   0.401      0.450      0.535

Q = \frac{x_2 - x_1}{x_n - x_1} \quad\text{or}\quad \frac{x_n - x_{n-1}}{x_n - x_1}

Q = \frac{x_3 - x_1}{x_{n-1} - x_1} \quad\text{or}\quad \frac{x_n - x_{n-2}}{x_n - x_2}

Q = \frac{x_3 - x_1}{x_{n-2} - x_1} \quad\text{or}\quad \frac{x_n - x_{n-2}}{x_n - x_3}
Criteria for the Rejection of Data
• Null hypothesis is that all measurements are from the same population
• For 11 observations and a high outlier we use:
Q = \frac{x_n - x_{n-2}}{x_n - x_2} = \frac{474 - 348}{474 - 263} = \frac{126}{211} = 0.597
Criteria for the Rejection of Data
• The critical Q value for 11 observations and 95% confidence level is 0.576
• Our calculated Q value (0.597) is above the critical value
• We therefore reject the null hypothesis and conclude that 474 months is an outlier
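The Q calculation for the high outlier can be checked with a few lines. This is a sketch for this dataset only: it uses the ratio applied on the slide for 11 observations, with the sorted observations held in a zero-based Python list (so `x[-1]` is the largest value, `x[-3]` the third largest and `x[1]` the second smallest). The critical value 0.576 is copied from the table.

```python
ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
x = sorted(ages)

# High-outlier ratio used on the slide for n = 11:
# Q = (x_n - x_{n-2}) / (x_n - x_2)
q = (x[-1] - x[-3]) / (x[-1] - x[1])

q_crit = 0.576  # tabulated critical value for n = 11, alpha = 0.05
is_outlier = q > q_crit

print(f"Q = {q:.3f}, critical = {q_crit}, outlier: {is_outlier}")
```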