
Lecture Five

General Statistics (STA 114)

Measures of Partition: Measures of Relative Position – Quartiles, Interquartile Range, Deciles and Percentiles. Z-Score and Detection of Outliers.

Prof. A.A. Sodipo

Department of Statistics

University of Ibadan

May 23, 2018

Prof. A.A. Sodipo (Department of Statistics, University of Ibadan). Lecture Five, General Statistics (STA 114). Measures of Partition: Measures of Relative Position – Quartiles, Interquartile Range, Deciles and Percentiles. Z-Score and Detection of Outliers. May 23, 2018.

Lecture overview

In the previous class, we considered measures of variation. In this lecture, we shall consider measures of relative position (partition). At the end of this lecture, you should be able to:
• understand the concept of measures of relative position;
• apply your knowledge of these concepts to answer some questions;
• compute a Z-score and interpret it appropriately;
• understand the concept of outlier detection.

Measures of Partition

Quantiles

• Quantiles are natural extensions of the median. Instead of dividing the observations into two parts, those below the median and those above it, we may divide them into three, four or more parts.
• If we divide them into three parts, there will be two values such that one third of the observations lie below one of them and two thirds lie below the other.

• Similarly, if the observations are divided into four parts, we will have three values. One quarter of the observations will lie below one of them, half will lie below another one, which we have called the median, and three quarters will lie below the last one.
• We can think of dividing n observations, arranged in order, into k parts. We will then have (k − 1) values such that the fractions of the observations that lie below them are respectively 1/k, 2/k, up to (k − 1)/k. These (k − 1) values are generally called quantiles.

• To give due prominence to k, the number of parts, it is better to use the term k-quantiles in place of quantiles.
• However, when no particular k is under consideration, we shall talk of quantiles in general. As defined, it follows that (100/k) per cent of the observations are below the first k-quantile, (200/k) per cent are below the second k-quantile, etc.

• The formula for calculating the median generalizes easily to the calculation of k-quantiles.
• Thus, for grouped data with n observations, if the appropriate k-quantile falls in the class which begins at a and ends at b, then, using precisely the same notation as for the median, the i-th k-quantile is:

$$X_{\text{Quantile}} = a + \frac{b - a}{f}\left(\frac{i \times n}{k} - F_a\right) \qquad (1)$$

where f is the frequency of that class, F_a is the cumulative frequency before the class, and i identifies which quantile is to be calculated (i = 1, 2, ..., k − 1). An example for quartiles is shown in Figure 1.

Quartiles

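For example, i = 1 if we are to calculate the 1st quartile; i = 3 if we are to calculate the 3rd quartile. Here Q = 4, the number of parts.
• The lower or first quartile will be the (1 × n / 4)th value:

$$\text{Lower quartile } (Q_1) = a + \frac{b - a}{f}\left(\frac{1 \times n}{Q} - F_a\right) \qquad (2)$$

• The upper or third quartile will be the (3 × n / 4)th value:

$$\text{Upper quartile } (Q_3) = a + \frac{b - a}{f}\left(\frac{3 \times n}{Q} - F_a\right) \qquad (3)$$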

Figure 1: Distribution of Quartiles

Example for measures of partitions

Example: Table 1

Class Interval    Frequency    Cumulative Frequency
52-56             3            3
56-60             6            9
60-64             10           19
64-68             4            23
68-72             8            31
72-76             2            33
76-80             1            34

• Obtain the 1st, 2nd and 3rd quartiles.

• To calculate the 1st quartile, we need to identify the class where the 1st quartile falls.
• The lower or first quartile will be the (1 × 34 / 4)th value, that is, the 8.5th value.
• We locate where this value falls in the cumulative frequency column.
• Based on this, the 1st quartile falls in the class 56-60. This class has a frequency of 6, and the cumulative frequency before the class is 3.
• Therefore:

$$\text{Lower quartile } (Q_1) = 56 + \frac{60 - 56}{6}\left(\frac{1 \times 34}{4} - 3\right) = 59.667$$

• To calculate the 2nd quartile (the same as the median), we need to identify the class where the 2nd quartile falls.
• The median or second quartile will be the (2 × 34 / 4)th value, that is, the 17th value.
• We locate where this value falls in the cumulative frequency column.
• Based on this, the 2nd quartile falls in the class 60-64. This class has a frequency of 10, and the cumulative frequency before the class is 9.
• Therefore:

$$\text{Median } (Q_2) = 60 + \frac{64 - 60}{10}\left(\frac{2 \times 34}{4} - 9\right) = 63.2$$

• To calculate the 3rd quartile, we need to identify the class where the 3rd quartile falls.
• The upper or third quartile will be the (3 × 34 / 4)th value, that is, the 25.5th value.
• We locate where this value falls in the cumulative frequency column.
• Based on this, the 3rd quartile falls in the class 68-72. This class has a frequency of 8, and the cumulative frequency before the class is 23.
• Therefore:

$$\text{Upper quartile } (Q_3) = 68 + \frac{72 - 68}{8}\left(\frac{3 \times 34}{4} - 23\right) = 69.25$$

• Therefore:
Lower quartile (Q1) = 59.667
Median/second quartile (Q2) = 63.2
Upper/third quartile (Q3) = 69.25
• Observe that:
• Each value falls within the class identified for its quartile.
• If you obtain a value outside the class identified for any of the quartiles, re-check your computation, because it is wrong.
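The quartile computations above can be sketched in Python as a single function that applies equation (1) to Table 1. The names `TABLE_1` and `grouped_quantile` are illustrative and not from the lecture. Placing a position that falls exactly on a cumulative total in the lower class is an assumption of this sketch.

```python
from typing import List, Tuple

# Table 1 as (class start a, class end b, frequency f).
TABLE_1: List[Tuple[float, float, int]] = [
    (52, 56, 3), (56, 60, 6), (60, 64, 10), (64, 68, 4),
    (68, 72, 8), (72, 76, 2), (76, 80, 1),
]

def grouped_quantile(classes: List[Tuple[float, float, int]], i: int, k: int) -> float:
    """Return the i-th k-quantile of grouped data using equation (1)."""
    n = sum(f for _, _, f in classes)
    position = i * n / k      # the (i * n / k)th value
    cumulative = 0            # F_a: cumulative frequency before the current class
    for a, b, f in classes:
        # Assumption: a position equal to a cumulative total stays in the lower class.
        if f > 0 and position <= cumulative + f:
            return a + (b - a) / f * (position - cumulative)
        cumulative += f
    raise ValueError("the requested position lies outside the data")

q1 = grouped_quantile(TABLE_1, 1, 4)
q2 = grouped_quantile(TABLE_1, 2, 4)
q3 = grouped_quantile(TABLE_1, 3, 4)
print(round(q1, 3), round(q2, 3), round(q3, 3))  # 59.667 63.2 69.25
```

Because the function takes k as a parameter, the same call with k = 10 or k = 100 would give deciles or percentiles.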

Interquartile Range (IQR)

• The interquartile range (IQR), also called the midspread, the middle 50% or, technically, the H-spread, is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
• Interquartile Range (IQR): the difference between the 3rd quartile and the 1st quartile.
• That is:

$$IQR = Q_3 - Q_1$$

• The Semi-Interquartile Range (SIQR) is half the difference between the 3rd quartile and the 1st quartile.
• That is:

$$SIQR = \frac{Q_3 - Q_1}{2}$$
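As a worked illustration, the two definitions can be applied to the quartiles obtained from Table 1 (Q1 = 59.667 and Q3 = 69.25); the results are rounded to three decimal places:

$$IQR = Q_3 - Q_1 = 69.25 - 59.667 = 9.583$$

$$SIQR = \frac{Q_3 - Q_1}{2} = \frac{9.583}{2} \approx 4.792$$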

Deciles

• In a similar way, the deciles of a distribution are the nine values that split the data set into ten equal parts.
• You should not try to calculate deciles from small data sets. For instance, a single class of marks is too small to give useful values.
• However, deciles can be useful descriptions of larger data sets, such as national distributions of marks from standard tests such as WAEC, UTME, etc., as shown in Figure 2.

Figure 2: Distribution of Deciles

• When applied to a distribution (a large group of marks), there are nine deciles, each of which is a mark.
• A student whose mark is below the first decile is said to be in decile 1.
• Similarly, a student whose mark is between the first and second deciles is in decile 2.
• ... and a student whose mark is above the ninth decile is in decile 10.
• When applied to individual students, the term 'decile' is therefore a number between 1 and 10.

For example, i = 1 if we are to calculate the 1st decile; i = 9 if we are to calculate the 9th decile.
• The lower or first decile will be the (1 × n / 10)th value:

$$\text{Lower decile } (D_1) = a + \frac{b - a}{f}\left(\frac{1 \times n}{D} - F_a\right) \qquad (4)$$

• The upper or ninth decile will be the (9 × n / 10)th value:

$$\text{Upper decile } (D_9) = a + \frac{b - a}{f}\left(\frac{9 \times n}{D} - F_a\right) \qquad (5)$$

• The word decile refers to division into 10 parts.
• Just as Q is the number of parts for quartiles, D is the number of parts for deciles, so D = 10.
• The procedure for deciles is exactly the same as for quartiles, except that D = 10, as in equations (4) and (5).
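Purely to illustrate the arithmetic of equation (4), it can be applied to Table 1, bearing in mind that a data set of 34 observations is far too small for its deciles to be useful. The first decile is the (1 × 34 / 10)th = 3.4th value, which falls in the class 56-60 (frequency 6, cumulative frequency before the class 3):

$$D_1 = 56 + \frac{60 - 56}{6}\left(\frac{1 \times 34}{10} - 3\right) = 56 + \frac{4}{6}(0.4) \approx 56.267$$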

• For example, the histogram below shows the distribution of marks in a test (out of 60) that was attempted by 600 students. Each student's mark is represented by a square in the histogram.

• The first decile is 17.5. Hence, the weakest tenth of the students had a mark below 17.5. This decile therefore summarises the performance of the weakest students.
• As such, students with marks below 17.5 are said to be in decile 1. Those with marks between 17.5 and 26.5 are in decile 2, and so on, up to students with marks higher than 54.5, who are in decile 10.
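The assignment of students to decile groups can be sketched as a lookup against the nine decile values. Only the first, second and ninth deciles (17.5, 26.5 and 54.5) are given in the lecture. The values used here for the third to eighth deciles are hypothetical placeholders, and the name `decile_group` is illustrative. Sending a mark that equals a decile to the higher group is also an assumption, since the lecture does not say how ties are handled.

```python
from bisect import bisect_right
from typing import Sequence

# D1, D2 and D9 are from the lecture; D3..D8 are HYPOTHETICAL placeholders.
DECILES = [17.5, 26.5, 31.0, 35.0, 38.5, 42.0, 45.5, 49.5, 54.5]

def decile_group(mark: float, deciles: Sequence[float] = DECILES) -> int:
    """Return the decile group (1 to 10) for a mark, given the nine sorted deciles."""
    # Count how many deciles are at or below the mark, then shift to a 1-based group.
    return bisect_right(deciles, mark) + 1

print(decile_group(10))  # 1  (below the first decile)
print(decile_group(20))  # 2  (between the first and second deciles)
print(decile_group(56))  # 10 (above the ninth decile)
```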

Percentiles

• In a similar way, the percentiles of a distribution are the 99 values that split the data set into a hundred equal parts.
• These percentiles can be used to categorise the individuals into percentile 1, ..., percentile 100.
• As with deciles, a very large data set is required before the extreme percentiles can be estimated with any accuracy.

For example, i = 1 if we are to calculate the 1st percentile; i = 99 if we are to calculate the 99th percentile.
• The lower or first percentile will be the (1 × n / 100)th value:

$$\text{Lower percentile } (P_1) = a + \frac{b - a}{f}\left(\frac{1 \times n}{P} - F_a\right) \qquad (6)$$

• The upper or ninety-ninth percentile will be the (99 × n / 100)th value:

$$\text{Upper percentile } (P_{99}) = a + \frac{b - a}{f}\left(\frac{99 \times n}{P} - F_a\right) \qquad (7)$$

• The word percentile refers to division into 100 parts.
• Just as Q and D are the numbers of parts for quartiles and deciles, P is the number of parts for percentiles, so P = 100.
• The procedure for percentiles is exactly the same as for quartiles and deciles, except that P = 100, as in equations (6) and (7).
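Again only to illustrate the arithmetic of equation (7), it can be applied to Table 1, although a data set of 34 observations is far too small for an extreme percentile to be estimated with any accuracy. The 99th percentile is the (99 × 34 / 100)th = 33.66th value, which falls in the class 76-80 (frequency 1, cumulative frequency before the class 33):

$$P_{99} = 76 + \frac{80 - 76}{1}\left(\frac{99 \times 34}{100} - 33\right) = 76 + 4(0.66) = 78.64$$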

Hint: Summary of measures of relative positions.

Z-Score

• A z-score is the number of standard deviations by which a data point lies from the mean.
• More technically, it is a measure of how many standard deviations below or above the population mean a raw score is.
• A z-score is also known as a standard score, and it can be placed on a normal distribution curve.
• Z-scores typically range from −3 standard deviations (which would fall to the far left of the normal distribution curve) up to +3 standard deviations (which would fall to the far right of the normal distribution curve).

• In order to use a z-score, you need to know the mean µ and also the population standard deviation σ.
• Z-scores are a way to compare results from a test to a "normal" population.

Z-Score Formula

• The basic z-score formula for a population is:

$$Z = \frac{x - \mu}{\sigma}$$

• The basic z-score formula for a sample is:

$$Z = \frac{x - \bar{x}}{s}$$

Example of Z-Score

• For example, let's say you have a test score of 200. The test has a mean (µ) of 190 and a standard deviation (σ) of 20. Assuming a normal distribution, your z-score would be:

$$Z = \frac{200 - 190}{20} = 0.5$$

Interpretation:

• In this example, your score is 0.5 standard deviations above the mean, because the z-score obtained is positive.
• If the z-score is negative, the score is below the mean.
• That is, the raw score is x = µ + Z × σ:
When Z is positive (Z = 0.5): x = 190 + 0.5(20) = 200, i.e., 0.5(20) = 10 points above the mean.
When Z is negative (Z = −0.5): x = 190 − 0.5(20) = 180, i.e., 0.5(20) = 10 points below the mean.
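The formula and its interpretation can be sketched in Python using the numbers from the example (µ = 190, σ = 20). The function names `z_score` and `raw_score` are illustrative, not from the lecture.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / sd

def raw_score(z: float, mean: float, sd: float) -> float:
    """Invert the z-score: x = mean + z * sd."""
    return mean + z * sd

print(z_score(200, 190, 20))     # 0.5   (above the mean)
print(raw_score(-0.5, 190, 20))  # 180.0 (10 points below the mean)
```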

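Z-Score Formula: Standard Error of the Mean

• When you have multiple samples and want to describe the standard deviation of those sample means (the standard error), you would use the following z-score formula, where x̄ is the sample mean and n is the sample size.
• When the population standard deviation is known:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

• When the population standard deviation is unknown, the sample standard deviation s is used in its place:

$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$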
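A minimal Python sketch of the standard-error version follows. The population values µ = 190 and σ = 20 are reused from the earlier test-score example. The sample mean of 195 and the sample size of 16 are hypothetical, chosen only to make the arithmetic easy to follow. The function name is illustrative.

```python
import math

def z_for_sample_mean(sample_mean: float, mu: float, sd: float, n: int) -> float:
    """z-score of a sample mean, using the standard error sd / sqrt(n)."""
    if n <= 0 or sd <= 0:
        raise ValueError("n and sd must be positive")
    standard_error = sd / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Hypothetical sample: mean 195 from n = 16 observations.
# Standard error = 20 / 4 = 5, so z = (195 - 190) / 5.
print(z_for_sample_mean(195, 190, 20, 16))  # 1.0
```

Passing the sample standard deviation s as `sd` gives the version for an unknown population standard deviation.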

Outliers

• An outlier is an observation that appears to deviate markedly from other observations in the sample.
Identification of potential outliers is important for the following reasons.
• An outlier may indicate bad data. For example, the data may have been coded incorrectly or an experiment may not have been run correctly. If it can be determined that an outlying point is in fact erroneous, then the outlying value should be deleted from the analysis (or corrected if possible).

• In some cases, it may not be possible to determine whether an outlying point is bad data. Outliers may be due to random variation or may indicate something scientifically interesting.
• In any event, we typically do not want to simply delete the outlying observation. However, if the data contain significant outliers, we may need to consider the use of robust statistical techniques.

Detection of Outliers

• There are two ways to determine whether an observation is an outlier:
• One method (the z-score) applies only to data sets whose frequency distributions are mound-shaped and symmetric.
• The other method uses a box plot.

Calculate the Z-Score

• In this procedure we calculate the z-score for each observation. Any z-score greater than 3 or less than −3 is considered to be an outlier.
• This rule of thumb is based on the empirical rule. From this rule we see that almost all of the data (99.7%) should be within three standard deviations of the mean.
• By calculating the z-score we are standardizing the observation, meaning the standard deviation is now 1. Thus, from the empirical rule, we expect 99.7% of the z-scores to be between −3 and 3.
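The z-score procedure can be sketched in Python with the standard library. The data set below is hypothetical, made up only to show one value exceeding the threshold. The function name is illustrative, and using the sample standard deviation s (rather than σ) is an assumption of this sketch.

```python
import statistics
from typing import List

def z_score_outliers(data: List[float], threshold: float = 3.0) -> List[float]:
    """Return the observations whose z-score is above +threshold or below -threshold."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation s (assumption)
    return [x for x in data if abs((x - mean) / sd) > threshold]

# Hypothetical marks: nineteen values near 50 and one unusually large value.
marks = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
         51, 52, 48, 50, 49, 51, 50, 52, 48, 120]
print(z_score_outliers(marks))  # [120]
```

Note that in very small samples no observation can reach a z-score of 3, so the rule is only informative for reasonably large data sets.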

Box Plot

• A box plot is constructed by drawing a box between the upper and lower quartiles, with a solid line drawn across the box to locate the median.
• The following quantities (called fences), where IQR = Q3 − Q1 is the interquartile range, are needed for identifying extreme values in the tails of the distribution:
• lower inner fence: Q1 − 1.5 × IQR
• upper inner fence: Q3 + 1.5 × IQR
• lower outer fence: Q1 − 3 × IQR
• upper outer fence: Q3 + 3 × IQR

• A point beyond an inner fence on either side is considered a mild outlier.
• A point beyond an outer fence is considered an extreme outlier.
See Figure 4 below for a pictorial representation of a typical outlier case.
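The fence rules can be sketched in Python. The quartiles used are the ones computed from Table 1 (Q1 = 59.667, Q3 = 69.25). The observations being classified are hypothetical, and the function names are illustrative.

```python
from typing import Dict

def fences(q1: float, q3: float) -> Dict[str, float]:
    """Inner and outer fences computed from the quartiles."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }

def classify(x: float, q1: float, q3: float) -> str:
    """Label x as an extreme outlier, a mild outlier, or not an outlier."""
    f = fences(q1, q3)
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "extreme outlier"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "mild outlier"
    return "not an outlier"

Q1, Q3 = 59.667, 69.25  # quartiles from Table 1
for value in (70, 90, 100):  # hypothetical observations
    print(value, classify(value, Q1, Q3))
```

With these quartiles the upper inner fence is about 83.6 and the upper outer fence about 98.0, so 70 is not an outlier, 90 is a mild outlier and 100 is an extreme outlier.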

Pictorial Outlier Illustration

Figure 4: A typical example of Outlier in a data set.

Q & A
