Statistics in Education - Made Simple


Upload: satheesh

Post on 27-Apr-2015


DESCRIPTION

Exclusively for B.Ed. Students

TRANSCRIPT

Page 1: Statistics in Education- Made Simple


STATISTICS IN EDUCATION

Introduction

We are living in an information age, which is invariably bound up with the notion of counting and measurement. It is no exaggeration to say that counting and measurement are close to our daily lives, and to the field of education as well. In fact, one can easily establish that counting and measurement have been with us ever since the human race stepped towards civilization. It is the sheer importance and range of applications of counting that led to the emergence of the discipline of ‘STATISTICS’.

The term ‘statistics’ seems to have been derived from the Latin word ‘status’, the Italian word ‘statista’ or the German word ‘statistik’, each of which means ‘political state’. Statistics was born as the ‘Science of Kings’. It had its origin in the need of ruling chiefs in olden days to collect data on vital matters such as population, manpower, and wealth in the form of land, buildings and other assets, with a view to framing their military and fiscal policies.

Statistics – Definition

“Statistics is the science which deals with collection, classification and tabulation of numerical facts as a basis for explanation, description and comparison of phenomena” - Lovitt

“Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data” - Croxton and Cowden

Basic Terms in Statistics

Discrete Variable: A variable which can assume only distinct or particular values is called a discrete variable. Its values are exact or finite and are not normally fractions.

Continuous Variable: A variable which can take any numerical value is known as a continuous variable.

Series: A series, as used statistically, may be defined as things or attributes of things arranged according to some logical order.

Discrete Series: Any series represented by a discrete variable is called a discrete series.

Continuous Series: Any series represented by a continuous variable is called a continuous series.

Raw Data: A mass of statistical data in its original form is called raw data or ungrouped data.

Class: A class is a defined group of magnitudes. Eg. 0 – 10, 10 – 20 etc.

Open-end Class: A lowest class lacking a lower limit, or a highest class lacking an upper limit, is called an open-end class. Eg.

Below 5 (open-end class)
5 – 10
10 – 15
15 – 20
20 and above (open-end class)

Inclusive type classes (working classes): Classes of the form 1 – 5, 6 – 10, 11 – 15, ... are called inclusive type classes. Here both limits (the lower limit and the upper limit) are included in the class itself.

Exclusive type classes (actual classes): Classes in which the upper limit is not included in the class. 0 – 10, 10 – 20, 20 – 30, ... are exclusive type classes.

Class limits: The class limits are the lowest and highest values of the variable that can be included in that class.

Class boundaries: The class limits of the exclusive type classes (actual classes) are called actual limits or class boundaries.

Prepared by KSK


Mid-point of the class (class mark): The mid-point of a class is the average of the upper and lower limits of the class.

Class interval: The class interval or class width is the difference between the actual upper limit and the actual lower limit of the class.

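Conversion of Inclusive type classes into Exclusive type classes:

1. Note the difference between the upper limit of one class and the lower limit of the next class.
2. Divide the difference by 2.
3. Subtract that value from the lower limit and add the same value to the upper limit.
4. Do the same for all classes.

Eg. For the classes 1 – 5, 6 – 10, 11 – 15 the difference is 1 and half of it is 0.5, so the exclusive type classes are 0.5 – 5.5, 5.5 – 10.5, 10.5 – 15.5.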
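The conversion steps can be sketched in a few lines of Python. This is only an illustration: the function name `to_exclusive` and the sample classes are assumptions made here, and the gap between neighbouring classes is assumed to be the same throughout.

```python
def to_exclusive(classes):
    """Convert inclusive classes [(lower, upper), ...] into exclusive (actual) classes."""
    # Step 1: difference between one upper limit and the next lower limit
    # (assumed to be the same for every pair of neighbouring classes).
    gap = classes[1][0] - classes[0][1]
    # Step 2: divide the difference by 2.
    half = gap / 2
    # Steps 3 and 4: subtract from every lower limit, add to every upper limit.
    return [(lower - half, upper + half) for lower, upper in classes]


inclusive = [(1, 5), (6, 10), (11, 15)]
exclusive = to_exclusive(inclusive)
print(exclusive)
```

Each upper boundary now coincides with the next lower boundary, which is what makes the classes continuous.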

Frequency: The number of times a given value appears in a set of observations is its frequency.

Class frequency: The number of values falling in each of the quantitative classes is called the class frequency.

Total frequency: The sum of all the frequencies is known as the total frequency.

SCORING AND TABULATION OF SCORES

Frequency Distribution

Frequency distribution is an important method of condensing and presenting data. This representation is also called a ‘frequency table’. A frequency distribution may be discrete or continuous (grouped).

Discrete frequency distribution

It is a frequency distribution in which we make an array by listing all the values occurring in the series and noting the number of times each value occurs.

Steps

1. Note the different values in the series.
2. Arrange three columns headed scores, tally marks and frequency.
3. Go through the series and put tally marks against the respective scores.
4. Write the sum of the tally marks of each score in the frequency column.
5. Note that the sum of the frequencies of all scores should be equal to the total number of observations.

Eg. The following data give the number of children per family in each of 25 families. Construct a discrete frequency distribution.

1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1, 0, 2, 3, 0, 3, 2, 1, 2, 2, 1, 4, 2

Number of Children    Tally Marks    No. of Families
0                     III            3
1                     IIIII I        6
2                     IIIII IIIII    10
3                     IIII           4
4                     II             2
Total                                25
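The tallying steps can be checked with a short Python sketch. It uses `collections.Counter` from the standard library; the variable names are chosen here for illustration only.

```python
from collections import Counter

children = [1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1,
            0, 2, 3, 0, 3, 2, 1, 2, 2, 1, 4, 2]

# Count how many times each distinct value occurs in the series.
frequency = Counter(children)

# Print one row per distinct score: score, tally marks, frequency.
for value in sorted(frequency):
    print(value, "I" * frequency[value], frequency[value])

# The frequencies must add up to the total number of observations.
total = sum(frequency.values())
print("Total", total)
```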

Continuous (Grouped) Frequency Distribution


A continuous (grouped) frequency distribution is a table in which the data are grouped into different classes and the number of observations falling in each class is noted.

Eg. Construct a continuous frequency distribution for the following set of observations.

70 45 33 64 50 25 65 75 30 20
55 60 65 58 52 36 45 42 35 40
51 47 39 61 53 59 49 41 20 55
42 53 78 65 45 49 64 52 48 46

Classes    Tally Marks       Frequency
20 – 29    III               3
30 – 39    IIIII             5
40 – 49    IIIII IIIII II    12
50 – 59    IIIII IIIII       10
60 – 69    IIIII II          7
70 – 79    III               3
Total                        40

Class Construction – Points to remember

- The class interval should be uniform throughout.
- As far as possible, the class interval should be a multiple of 5.
- As far as possible, the number of classes should be between 4 and 20. (Sturges' rule gives a guide to the number of classes: $k = 1 + 3.322 \log_{10} N$, where k denotes the number of classes and N is the total number of observations.)
- The class limits should be chosen so as to give mid-points which are representative of the observations in the class.
- Classes should be mutually exclusive.
- As far as possible, open-end classes should be avoided.
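As an illustration of the grouped frequency example and of Sturges' rule, the sketch below counts the 40 observations into the inclusive classes 20 – 29, ..., 70 – 79 and evaluates the rule for N = 40. The helper names `sturges` and `grouped_frequency` are made up for this sketch.

```python
import math

scores = [70, 45, 33, 64, 50, 25, 65, 75, 30, 20,
          55, 60, 65, 58, 52, 36, 45, 42, 35, 40,
          51, 47, 39, 61, 53, 59, 49, 41, 20, 55,
          42, 53, 78, 65, 45, 49, 64, 52, 48, 46]


def sturges(n):
    """Suggested number of classes for n observations (Sturges' rule)."""
    return 1 + 3.322 * math.log10(n)


def grouped_frequency(data, start, width, n_classes):
    """Count whole-number scores into inclusive classes start .. start+width-1, and so on."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1
        count = sum(1 for x in data if lower <= x <= upper)
        table.append(((lower, upper), count))
    return table


k = sturges(len(scores))
table = grouped_frequency(scores, start=20, width=10, n_classes=6)

print("Suggested number of classes:", round(k, 2))
for (lower, upper), count in table:
    print(f"{lower} - {upper}", count)
```

For N = 40 the rule suggests about six classes, which agrees with the six classes used in the example.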

Cumulative Frequency Distribution

A cumulative frequency distribution is a table which shows how many observations lie below or above a particular value. It is of two types: the less than cumulative frequency distribution and the greater than cumulative frequency distribution.

Less than Cumulative frequency distribution

A less than cumulative frequency distribution is a table which gives the number of observations falling below the upper limit of each class.

Eg. Construct the less than cumulative frequency distribution for the following frequency distribution.

Class      Frequency
0 – 5      4
5 – 10     7
10 – 15    12
15 – 20    5
20 – 25    2

Less than cumulative frequency distribution

Class      Frequency    <CF
0 – 5      4            4
5 – 10     7            (4 + 7) 11
10 – 15    12           (4 + 7 + 12) 23
15 – 20    5            (4 + 7 + 12 + 5) 28
20 – 25    2            (4 + 7 + 12 + 5 + 2) 30
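Cumulative frequencies are running totals, so they are easy to compute mechanically. The sketch below uses `itertools.accumulate` to build the less than column for the example. For comparison it also builds the column that counts observations lying above each lower limit, which is the same running total taken from the highest class downwards. Variable names are illustrative only.

```python
from itertools import accumulate

classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
frequencies = [4, 7, 12, 5, 2]

# Less than CF: running total from the lowest class upwards.
less_than_cf = list(accumulate(frequencies))

# Greater than CF: running total from the highest class downwards,
# then put back into the original class order.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), f, lcf, gcf in zip(classes, frequencies, less_than_cf, greater_than_cf):
    print(f"{lower} - {upper}", f, lcf, gcf)
```

The last less than value and the first greater than value both equal the total frequency, which is a quick check on the table.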

Greater than Cumulative frequency distribution

A greater than cumulative frequency distribution is a table which gives the number of observations lying above the lower limit of each class.

Eg. Construct the greater than cumulative frequency distribution for the frequency distribution used in the less than example.

Class      Frequency    >CF
0 – 5      4            (4 + 7 + 12 + 5 + 2) 30
5 – 10     7            (7 + 12 + 5 + 2) 26
10 – 15    12           (12 + 5 + 2) 19
15 – 20    5            (5 + 2) 7
20 – 25    2            2

Graphical and Diagrammatic representation of data

Apart from tabulation, data can also be presented through diagrams and graphs. Graphs and diagrams are visual aids for the presentation of data. They are among the most convincing and appealing methods by which statistical data can be presented.

The following are the commonly used graphs and diagrams.

1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Cumulative Frequency Curve (Ogive)
   a. Less than Cumulative Frequency Curve (Less than Ogive)
   b. Greater than Cumulative Frequency Curve (Greater than Ogive)
5. Pie Diagram (Sector Diagram)
6. Bar Diagram

1. Histogram

- It is a graphical representation of a continuous frequency distribution.
- It is a graph made up of vertical rectangles with no space between the rectangles.
- The class intervals are taken along the horizontal axis (X-axis) and the respective class frequencies along the vertical axis (Y-axis), using a suitable scale for each axis.
- For each class a rectangle is drawn with the width of the class as base and the class frequency as height.
- The area of each rectangle is proportional to or equal to the frequency of the class.
- The total area of the histogram is proportional to or equal to the total frequency of the distribution.

Construct a histogram for the following frequency distribution.


Class Frequency

0 – 10 4

10 – 20 10

20 – 30 21

30 – 40 9

40 – 50 4

50 – 60 2

Total 50

2. Frequency Polygon

- It is a graphical representation of a continuous frequency distribution.
- It can be constructed either by first drawing a histogram or by plotting the points directly.
- To draw a frequency polygon from a histogram, join the mid-points of the tops of the rectangles of the histogram with straight lines.
- A frequency polygon can also be drawn by plotting the mid-points of the classes on the X-axis against the corresponding frequencies on the Y-axis and joining the consecutive points with straight lines.
- The polygon is extended at each end to meet the X-axis.
- The total area under the frequency polygon is (numerically) equal to or proportional to the total frequency of the given distribution.

Construct Frequency Polygon for the following frequency distribution

Class Frequency

0 – 10 4

10 – 20 10

20 – 30 21

30 – 40 9

40 – 50 4

50 – 60 2

Total 50
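The points of the frequency polygon for this distribution can be worked out as below. The sketch assumes, as a common drawing convention, that the polygon is closed by adding a zero-frequency point one class width before the first mid-point and one class width after the last. It then checks the area statement numerically with the trapezium rule. All names are illustrative.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 10, 21, 9, 4, 2]
width = 10  # common class width

# Mid-point of each class paired with its frequency.
points = [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, frequencies)]

# Extend the polygon to the X-axis with a zero-frequency point at each end.
first_mid, last_mid = points[0][0], points[-1][0]
polygon = [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

# Area under the polygon, one trapezium per pair of consecutive points.
area = sum((x2 - x1) * (y1 + y2) / 2
           for (x1, y1), (x2, y2) in zip(polygon, polygon[1:]))

# Area of the histogram: one rectangle (width x frequency) per class.
histogram_area = sum(f * width for f in frequencies)

print(polygon)
print(area, histogram_area)
```

With equal class widths the two areas agree, and both are the class width times the total frequency.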

[Figure: Frequency Polygon, first method. X-axis: classes; Y-axis: frequency.]

3. Frequency Curve

- It is a graphical representation of a continuous frequency distribution.
- It can be constructed either by first drawing a histogram or by plotting the points directly.
- To draw a frequency curve from a histogram, join the mid-points of the tops of the rectangles of the histogram with a smooth freehand curve.
- A frequency curve can also be drawn by plotting the mid-points of the classes on the X-axis against the corresponding frequencies on the Y-axis and joining the consecutive points with a smooth curve.
- The curve is extended at each end to meet the X-axis.
- The total area under the frequency curve is (numerically) equal to or proportional to the total frequency of the given distribution.

Construct Frequency Curve for the following frequency distribution

Class Frequency

0 – 10 4

10 – 20 10

20 – 30 21

30 – 40 9

40 – 50 4

50 – 60 2

Total 50

[Figures: Frequency Polygon and Frequency Curve drawn by the first, second and third methods. X-axis: classes (scale 1 cm = 10 units); Y-axis: frequency (scale 1 cm = 5 units).]

4. Cumulative Frequency Curve (Ogive)

It is the graphical representation of a cumulative frequency distribution. There are two types.

a). Less than Cumulative Frequency Curve (Less than Ogive)

- It is the graphical representation of the less than cumulative frequency distribution.
- It is drawn by plotting the upper limits of the actual classes against their less than cumulative frequencies and joining the points with a smooth curve.

Construct the Less than Cumulative Frequency Curve for the following frequency distribution.

Class Frequency <CF

0 – 10 5 5

10 – 20 12 17

20 – 30 28 45

30 – 40 40 85

40 – 50 21 106

50 – 60 10 116

60 - 70 4 120

[Figure: Less than Ogive. X-axis: upper limit of classes (scale 1 cm = 10 units); Y-axis: less than cumulative frequency (scale 1 cm = 20 units).]

b). Greater than Cumulative Frequency Curve (Greater than Ogive)

- It is the graphical representation of the greater than cumulative frequency distribution.
- It is drawn by plotting the lower limits of the actual classes against their greater than cumulative frequencies and joining the points with a smooth curve.

Construct the Greater than Cumulative Frequency Curve for the following frequency distribution.

Class      Frequency    >CF
0 – 10     5            120
10 – 20    12           115
20 – 30    28           103
30 – 40    40           75
40 – 50    21           35
50 – 60    10           14
60 – 70    4            4

[Figure: Greater than Ogive. X-axis: lower limit of classes (scale 1 cm = 10 units); Y-axis: greater than cumulative frequency (scale 1 cm = 20 units).]

5. Pie Diagram

- A pie diagram consists of a circle whose area is proportional to the magnitude of the variable it represents.
- The component parts of the variable are represented by sectors of the circle.
- The area of each sector (and hence its angle at the centre) is proportional to the frequency of the component part of the variable.
- If A1 and A2 are the total magnitudes of two variables, then to represent the data by pie diagrams draw two circles with radii r1 and r2 given by $\frac{r_1}{r_2} = \sqrt{\frac{A_1}{A_2}}$, so that the areas of the circles are in the ratio $A_1 : A_2$.

Draw a pie diagram for the following data. The angle of each sector is (No. of students / Total) × 360°.

Category        No. of Students    Angle of the Sector
Distinction     20                 40°
First class     40                 80°
Second class    50                 100°
Third class     45                 90°
Failure         25                 50°
Total           180                360°
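The sector angles for the pie diagram follow from simple proportion: each category gets the same share of 360° as its share of the total. A short sketch, with variable names chosen here for illustration:

```python
students = {
    "Distinction": 20,
    "First class": 40,
    "Second class": 50,
    "Third class": 45,
    "Failure": 25,
}

total = sum(students.values())

# Angle of each sector = (frequency / total) x 360 degrees.
angles = {category: count * 360 / total for category, count in students.items()}

for category, angle in angles.items():
    print(category, students[category], angle)
print("Total", total, sum(angles.values()))
```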

6. Bar Diagram (Simple Bar Diagram)

- The bar diagram is the simplest diagrammatic representation of data.
- Bar diagrams are also called one-dimensional diagrams.
- They are generally drawn in the shape of horizontal or vertical bars.
- The bars should be of equal breadth, and the height of each bar should be proportional to the magnitude of the quantity it represents.
- Leave equal space between the bars.

Draw simple bar diagram

Category No. of Students

Distinction 20

First class 40

Second class 50

Third class 45

Failure 25

Total 180

Diagrammatic and Graphic representation – Merits

- It permits easy visualization.
- It makes the nature of the data easy to understand.
- Comparative study of different aspects of a given set of data is possible.
- It helps in the analysis of the data.
- It helps to interpret the data and draw conclusions.
- Diagrams and graphs are interesting, attractive and impressive.
- They are the simplest method of presenting data.
- They have universal validity; they are used to supply information to the common man.
- They give a bird's-eye view of the entire data.
- They have a great memorizing effect.

Diagrammatic and Graphic representation – Limitations

- It is difficult to show minor differences with their help.
- A diagram can be used to show only a limited amount of information.
- Diagrams show only approximate values.
- It is subjective in character; its interpretation varies from person to person.
- Diagrams and graphs can be misused very easily.
- Diagrams and graphs are not a substitute for the original data.

MEASURES OF CENTRAL TENDENCY

When we collect data from a sample, the majority of the scores tend to lie close to the average. This phenomenon is called ‘central tendency’.

The value of the point around which the scores tend to cluster is called a ‘measure of central tendency’. A measure of central tendency may be defined as a single value representing all the scores of the given data.

The commonly used measures of central tendency are

1. Mean
2. Median
3. Mode

1. MEAN (ARITHMETIC MEAN)

Case – I: Ungrouped Data (Discrete data)

If x1, x2, x3, ..., xn are N observations, then

A.M. = Sum of the observations / Total number of observations

$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{N} = \frac{\sum x}{N}$

x – Observations (Scores)
N – Total number of observations

Eg. Calculate the A.M. of the observations: 12, 18, 14, 15, 16

$\bar{x} = \frac{12 + 18 + 14 + 15 + 16}{5} = \frac{75}{5} = 15$

Case – II: Ungrouped Frequency Distribution (Discrete Frequency Distribution)

If x1, x2, x3, ..., xn are the observations and f1, f2, f3, ..., fn are their respective frequencies, then the A.M. is given by

$\bar{x} = \frac{\sum f x}{N}$

x – Observations (Scores)
f – Frequency
N – Total frequency

Eg. Calculate the A.M. of the following data.

Observations (x)    Frequency (f)    fx
5                   3                15
6                   8                48
7                   12               84
8                   10               80
9                   7                63
                    N = 40           ∑fx = 290

$\bar{x} = \frac{\sum f x}{N} = \frac{290}{40} = 7.25$

Case – III: Grouped Frequency Distribution (Continuous Frequency Distribution)

Direct Method

$\bar{x} = \frac{\sum f x}{N}$

x – Mid-value of classes
f – Frequency
N – Total frequency

Assumed Mean Method

$\bar{x} = A + \frac{\sum f d}{N} \times c$

A – Assumed Mean
d – deviations, $d = \frac{x - A}{c}$
x – Mid-value of classes
f – Frequency
N – Total frequency
c – class width

Question

Class      f
0 – 10     3
10 – 20    12
20 – 30    20
30 – 40    10
40 – 50    5

Answer – Direct Method

Class      f         mid-value (x)    fx
0 – 10     3         5                15
10 – 20    12        15               180
20 – 30    20        25               500
30 – 40    10        35               350
40 – 50    5         45               225
           N = 50                     ∑fx = 1270

$\bar{x} = \frac{\sum f x}{N} = \frac{1270}{50} = 25.4$

Answer – Assumed Mean Method (A = 25, c = 10)

Class      f         mid-value (x)    d     fd
0 – 10     3         5                -2    -6
10 – 20    12        15               -1    -12
20 – 30    20        25 (A)           0     0
30 – 40    10        35               1     10
40 – 50    5         45               2     10
           N = 50                           ∑fd = 2

$\bar{x} = A + \frac{\sum f d}{N} \times c = 25 + \frac{2}{50} \times 10 = 25.4$
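The two methods can be compared with a small Python sketch. The function names `mean_direct` and `mean_assumed` are invented for this illustration; the data are those of the worked examples.

```python
import math


def mean_direct(values, frequencies):
    """A.M. = sum(f * x) / N, where N is the total frequency."""
    n = sum(frequencies)
    return sum(f * x for x, f in zip(values, frequencies)) / n


def mean_assumed(mid_values, frequencies, assumed, width):
    """A.M. = A + (sum(f * d) / N) * c, with d = (x - A) / c."""
    n = sum(frequencies)
    deviations = [(x - assumed) / width for x in mid_values]
    return assumed + sum(f * d for f, d in zip(frequencies, deviations)) / n * width


# Case II: discrete frequency distribution.
discrete_mean = mean_direct([5, 6, 7, 8, 9], [3, 8, 12, 10, 7])

# Case III: grouped distribution, using the mid-values of the classes.
mid_values = [5, 15, 25, 35, 45]
frequencies = [3, 12, 20, 10, 5]
direct = mean_direct(mid_values, frequencies)
assumed = mean_assumed(mid_values, frequencies, assumed=25, width=10)

print(discrete_mean, direct, assumed)
print(math.isclose(direct, assumed))
```

Any mid-value may be taken as the assumed mean; the result is the same, but a value near the centre keeps the deviations small.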

Arithmetic Mean – Merits

- It is rigidly defined.
- It is easy to understand.
- It is simple to calculate.
- It is based on all observations.
- It is capable of further algebraic treatment.

Arithmetic Mean – Demerits

- The A.M. is affected by extreme values.
- The A.M. may lead to wrong conclusions if the figures from which it is computed are not known.
- The A.M. cannot be calculated for a distribution having open-end classes.

2. MEDIAN

Median is defined as the middlemost observation when the observations are arranged in ascending or descending order of magnitude.

CALCULATION OF MEDIAN

1. Discrete Data & Discrete Frequency Distribution

Let N be the total number of observations.

Case I: N is odd

Median = $\left(\frac{N + 1}{2}\right)$th observation when the data are arranged in ascending or descending order of magnitude.

Eg.1 Calculate the Median: 8, 12, 16, 10, 9, 6, 17, 20, 25

Data in ascending order of magnitude: 6, 8, 9, 10, 12, 16, 17, 20, 25

Here N = 9. Then Median = $\left(\frac{9 + 1}{2}\right)$th observation = 5th observation = 12

Case II: N is even

Median = Average of the $\left(\frac{N}{2}\right)$th observation and the $\left(\frac{N}{2} + 1\right)$th observation when the data are arranged in ascending or descending order of magnitude.

Eg.2 Calculate the Median: 30, 26, 42, 28, 35, 20, 32, 50

Data in ascending order of magnitude: 20, 26, 28, 30, 32, 35, 42, 50

Here N = 8. Median = average of the 4th and 5th observations = $\frac{30 + 32}{2} = 31$

Eg.3 Calculate the Median

Observation    Frequency
5              3
6              8
7              12
8              10
9              8
Total          41

Here N = 41.

Median = $\left(\frac{41 + 1}{2}\right)$th observation = 21st observation

The cumulative frequencies are 3, 11, 23, 33, 41, so the 21st observation is 7. Median = 7

2. Grouped (Continuous) Frequency Distribution

$\text{Median} = l_m + \left(\frac{\frac{N}{2} - cf_m}{f_m}\right) \times c$

l_m – Actual lower limit of the Median Class (Median Class – the class in which the (N/2)th observation falls)
N – Total frequency
cf_m – Cumulative frequency up to the Median Class (that is, of the class just before it)
f_m – Frequency of the Median Class
c – Class interval

Eg. Calculate the Median

Class      Frequency    <CF
0 – 5      5            5
5 – 10     10           15
10 – 15    15           30     (Median Class)
15 – 20    12           42
20 – 25    8            50
           N = 50

Answer: Here N/2 = 25, l_m = 10, cf_m = 15, f_m = 15, c = 5.

$\text{Median} = 10 + \left(\frac{25 - 15}{15}\right) \times 5 = 10 + \frac{10}{15} \times 5 = 13.33$

Graphical determination of Median

I Method

Steps:
1. Draw the Less than or the Greater than Ogive.
2. Locate N/2 on the Y-axis.
3. At N/2 draw a perpendicular to the Y-axis and extend it to meet the Ogive.
4. From that point of intersection draw a perpendicular to the X-axis.
5. The point at which the perpendicular meets the X-axis is the Median.

II Method

Steps:
1. Draw the Less than and Greater than Ogives on the same axes.
2. Draw a perpendicular from their point of intersection to the X-axis.
3. The point at which the perpendicular meets the X-axis is the Median.
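The median rules above can be checked with a short Python sketch. The function names are invented here, and `median_grouped` assumes exclusive classes of equal width given in ascending order.

```python
def median_ungrouped(values):
    """Middle observation (N odd) or average of the two middle observations (N even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]            # ((N + 1) / 2)th observation
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def median_grouped(classes, frequencies):
    """Median = l_m + ((N/2 - cf_m) / f_m) * c for exclusive classes of equal width."""
    n = sum(frequencies)
    cumulative = 0                                   # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= n / 2:                  # this is the median class
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f


odd_case = median_ungrouped([8, 12, 16, 10, 9, 6, 17, 20, 25])
even_case = median_ungrouped([30, 26, 42, 28, 35, 20, 32, 50])

# Discrete frequency distribution: expand each observation by its frequency.
expanded = [5] * 3 + [6] * 8 + [7] * 12 + [8] * 10 + [9] * 8
table_case = median_ungrouped(expanded)

grouped_case = median_grouped(
    [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)],
    [5, 10, 15, 12, 8],
)

print(odd_case, even_case, table_case, round(grouped_case, 2))
```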

Median – Merits

- It is rigidly defined.
- It is easy to understand.
- It is simple to calculate.
- It can be located by mere inspection.
- It is not affected by extreme values.
- It can be calculated for a distribution having open-end classes.
- It can be determined graphically.

Median – Demerits

- It is not based on all observations.
- Median is a non-algebraic measure and hence is not suitable for further algebraic treatment.
- It cannot be used for computing other statistical measures such as Standard Deviation, Coefficient of Correlation etc.
- When there are wide variations between the values of different scores, the Median may not be representative of the distribution.

3. MODE

Mode is the value of the variable which occurs most frequently. In certain cases an exact mode may not exist, or there may be two or three modes in a distribution. When there are two modes the distribution is called a bi-modal distribution; when there are three modes it is called a tri-modal distribution.

Calculation of Mode

1. Discrete Distribution

Eg: Calculate the Mode

Observation    Frequency
5              3
6              8
7              12
8              10
9              8
Total          41

The observation 7 has the highest frequency (12), so Mode = 7.

2. Continuous Distribution

$\text{Mode} = l_m + \left(\frac{f_2}{f_1 + f_2}\right) \times c$

l_m – Actual lower limit of the Modal Class (Modal Class – the class having maximum frequency)
f1 – Frequency of the class just below the Modal Class
f2 – Frequency of the class just above the Modal Class
c – Class interval

Eg: Find the Mode

Class      Frequency
80 – 84    4
75 – 79    8
70 – 74    8      (f2)
65 – 69    12     (Modal Class)
60 – 64    9      (f1)
55 – 59    7
50 – 54    5
45 – 49    3

Answer: Here l_m = 64.5, f1 = 9, f2 = 8, c = 5.

$\text{Mode} = 64.5 + \left(\frac{8}{9 + 8}\right) \times 5 = 64.5 + 2.35 = 66.85 \approx 66.9$
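A sketch of both mode calculations in Python follows. The grouped formula is the one used in the worked example, Mode = l_m + (f2 / (f1 + f2)) × c. The function name `mode_grouped` is invented here, and the classes are listed in ascending order of their actual lower limits.

```python
def mode_grouped(lower_limits, frequencies, width):
    """Mode = l_m + f2 / (f1 + f2) * c, classes given in ascending order.

    f1 is the frequency of the class just below the modal class and f2 that of
    the class just above it (taken as 0 if there is no such class).
    Assumes f1 + f2 is not zero.
    """
    m = frequencies.index(max(frequencies))          # position of the modal class
    f1 = frequencies[m - 1] if m > 0 else 0
    f2 = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    return lower_limits[m] + f2 / (f1 + f2) * width


# Discrete distribution: the observation with the highest frequency.
observations = {5: 3, 6: 8, 7: 12, 8: 10, 9: 8}
discrete_mode = max(observations, key=observations.get)

# Continuous distribution: actual lower limits of 45 - 49, 50 - 54, ..., 80 - 84.
lower_limits = [44.5, 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5]
frequencies = [3, 5, 7, 9, 12, 8, 8, 4]
grouped_mode = mode_grouped(lower_limits, frequencies, width=5)

print(discrete_mode, round(grouped_mode, 1))
```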

Mode – Merits

- It is easy to locate.
- It is not affected by extreme values.
- The Mode can be calculated for a distribution having open-end classes, provided the open-end classes have low frequencies.
- It is useful in business matters.

Mode – Demerits

- It is not based on all observations.
- It is not capable of further algebraic treatment.
- A slight change in the distribution may extensively disturb the Mode.
- In ungrouped data, if no score is repeated, it may lead to the wrong conclusion that the distribution has no mode.
- As there may be two or three modal values, it can become impossible to fix a definite value for the Mode.

EMPIRICAL RELATION

In a large distribution that is almost Normal, the Mode can be calculated by using the relation

Mean – Mode = 3 (Mean – Median)

that is, Mode = 3 Median – 2 Mean

MEASURES OF DISPERSION (MEASURES OF VARIABILITY)

Measures of central tendency need not give an exact picture of the distribution. If we compare two groups merely on the basis of the average, there is a possibility of being misled into an incorrect judgment.

Eg: Consider the marks of two groups.

Group 1: 2, 8, 20, 28, 42
Group 2: 18, 19, 20, 21, 22

When we calculate the Mean for both groups, we get Mean = 20. But when we examine the scores, we find that Group 1 is a heterogeneous group and Group 2 is a homogeneous group.

The statistical measures used to determine the nature and extent of the dispersion of the scores are known as Measures of Dispersion or Measures of Variability. They measure the spread of the observations about the central value of the distribution.

Commonly used Measures of Dispersion

1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation
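The point of the two-group example can be seen numerically. The sketch below uses `statistics.mean` from the Python standard library and the simplest measure of spread, the difference between the highest and lowest marks.

```python
from statistics import mean

group1 = [2, 8, 20, 28, 42]
group2 = [18, 19, 20, 21, 22]

# Same average, very different spread.
for name, marks in (("Group 1", group1), ("Group 2", group2)):
    spread = max(marks) - min(marks)
    print(name, "mean =", mean(marks), "highest - lowest =", spread)
```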

1. RANGE

Range is the difference between the highest and lowest scores in a distribution.

Range (R) = H – L
H – Highest Value
L – Lowest Value

Eg: Find the Range: 53, 51, 70, 45, 60, 62, 40, 53, 71, 55

Range (R) = H – L = 71 – 40 = 31

In a discrete frequency distribution, Range is the difference between the highest and lowest observations.

Eg:

Observation    Frequency
5              3
6              8
7              12
8              10
9              8
Total          41

Range (R) = H – L = 9 – 5 = 4

In a continuous distribution, Range is the difference between the upper limit of the highest class and the lower limit of the lowest class.

Eg:

Class      f
10 – 20    12
20 – 30    20
30 – 40    10
40 – 50    5

Range (R) = H – L = 50 – 10 = 40

Range – Merits

- It is the simplest measure of dispersion.
- It is easy to calculate and easy to understand.

Range – Demerits

- It is not based on all observations.
- It is very much affected by extreme values.
- It is influenced by fluctuations of sampling.
- For a distribution with open-end classes, calculation of the Range is impossible.

2. QUARTILE DEVIATION (SEMI-INTERQUARTILE RANGE)

The quartile deviation is half the difference between the upper and lower quartiles of a distribution.

It is a measure of the spread through the middle half of a distribution.

It can be useful because it is not influenced by extremely high or extremely low scores.

Quartile: One of the four divisions of the observations when they have been grouped into four equal-sized sets based on their statistical rank.

Lower Quartile (First Quartile) Q1: The first point of division of observations which have been grouped into four equal-sized sets based on their statistical rank.

Second Quartile Q2: The second point of division of observations which have been grouped into four equal-sized sets based on their statistical rank. The Second Quartile is the Median.

Upper Quartile (Third Quartile) Q3: The third point of division of observations which have been grouped into four equal-sized sets based on their statistical rank.

Calculation of Quartile Deviation

$\text{Quartile Deviation } (Q) = \frac{Q_3 - Q_1}{2}$

Q1 – Lower (First) Quartile
Q3 – Upper (Third) Quartile

For a continuous distribution,

$Q_1 = l_1 + \left(\frac{\frac{N}{4} - cf_1}{f_1}\right) \times c$

$Q_3 = l_3 + \left(\frac{\frac{3N}{4} - cf_3}{f_3}\right) \times c$
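The quartile formulas for a continuous distribution can be sketched as a single function that locates the class containing the (kN/4)th observation and interpolates within it. The function name `quartile_grouped` is invented here, and the data are those of the continuous example in this section (classes 30 – 35 to 60 – 65, N = 100).

```python
def quartile_grouped(classes, frequencies, k):
    """k-th quartile (k = 1, 2 or 3) of exclusive classes given in ascending order.

    Q_k = l + ((k * N / 4 - cf) / f) * c, where l, f and c belong to the class in
    which the (k * N / 4)th observation falls and cf is the cumulative frequency
    before that class.
    """
    n = sum(frequencies)
    target = k * n / 4
    cumulative = 0
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f


classes = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60), (60, 65)]
frequencies = [10, 16, 18, 27, 18, 8, 3]

q1 = quartile_grouped(classes, frequencies, 1)
q3 = quartile_grouped(classes, frequencies, 3)
quartile_deviation = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
```

With k = 2 the same function gives the median, since the second quartile is the median.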

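1. Discrete Data

Eg: Find the Quartile Deviation: 2, 13, 17, 20, 25, 28, 30, 33, 37, 40, 41

Answer

Here N = 11 and the data are already in ascending order.

Q1 = $\left(\frac{N + 1}{4}\right)$th observation = 3rd observation = 17
Q2 = $\left(\frac{N + 1}{2}\right)$th observation = 6th observation = 28
Q3 = $\left(\frac{3(N + 1)}{4}\right)$th observation = 9th observation = 37

$\text{Quartile Deviation } (Q) = \frac{Q_3 - Q_1}{2} = \frac{37 - 17}{2} = 10$

2. Continuous Distribution

$Q_1 = l_1 + \left(\frac{\frac{N}{4} - cf_1}{f_1}\right) \times c$, $Q_3 = l_3 + \left(\frac{\frac{3N}{4} - cf_3}{f_3}\right) \times c$

l1 – Actual lower limit of the Q1 Class (Q1 Class – the class in which the (N/4)th observation falls)
N – Total frequency
cf1 – Cumulative frequency up to the Q1 Class (that is, of the class just before it)
f1 – Frequency of the Q1 Class
c – Class interval
l3, cf3, f3 – the corresponding values for the Q3 Class (the class in which the (3N/4)th observation falls)

Class      Frequency    <CF
30 – 35    10           10
35 – 40    16           26     (Q1 Class)
40 – 45    18           44
45 – 50    27           71
50 – 55    18           89     (Q3 Class)
55 – 60    8            97
60 – 65    3            100

Answer

For Q1: l1 = 35, N = 100, cf1 = 10, f1 = 16, c = 5

$Q_1 = 35 + \left(\frac{25 - 10}{16}\right) \times 5 = 39.69$

For Q3: l3 = 50, N = 100, cf3 = 71, f3 = 18, c = 5

$Q_3 = 50 + \left(\frac{75 - 71}{18}\right) \times 5 = 51.11$

$\text{Quartile Deviation } (Q) = \frac{Q_3 - Q_1}{2} = \frac{51.11 - 39.69}{2} = 5.71$

3. MEAN DEVIATION (AVERAGE DEVIATION)

- Mean Deviation is the average of the absolute deviations of the scores taken from the Mean.
- It is calculated by taking the deviation of each score from the mean and finding the average of these deviations.
- Deviations may be negative or positive, so the absolute values of the deviations are taken.

Discrete Data

$\text{Mean Deviation} = \frac{\sum |x - \bar{x}|}{N}$

x – Scores, $\bar{x}$ – Arithmetic Mean, N – Total number of scores

Discrete Distribution

$\text{Mean Deviation} = \frac{\sum f |x - \bar{x}|}{N}$

x – Scores, $\bar{x}$ – Arithmetic Mean, f – Frequency, N – Total frequency

Continuous Distribution

$\text{Mean Deviation} = \frac{\sum f |x - \bar{x}|}{N}$

x – Mid-value, $\bar{x}$ – Arithmetic Mean, f – Frequency, N – Total frequency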

Calculation of Mean Deviation

1. Discrete Series

Calculate the Mean Deviation: 8, 10, 12, 14, 16, 18, 20, 22

Answer:

$\bar{x} = \frac{8 + 10 + 12 + 14 + 16 + 18 + 20 + 22}{8} = \frac{120}{8} = 15$

Score (x)    |x - x̄|
8            7
10           5
12           3
14           1
16           1
18           3
20           5
22           7
             ∑|x - x̄| = 32

$\text{Mean Deviation} = \frac{\sum |x - \bar{x}|}{N} = \frac{32}{8} = 4$

2. Discrete Distribution

Eg:

Score (x)    f
22           5
27           10
32           25
37           30
42           20
47           10

Answer:

Score (x)    f          fx            |x - x̄|    f|x - x̄|
22           5          110           14          70
27           10         270           9           90
32           25         800           4           100
37           30         1110          1           30
42           20         840           6           120
47           10         470           11          110
             N = 100    ∑fx = 3600                ∑f|x - x̄| = 520

$\bar{x} = \frac{3600}{100} = 36$

$\text{Mean Deviation} = \frac{\sum f |x - \bar{x}|}{N} = \frac{520}{100} = 5.2$

3. Continuous Distribution

Eg:

Class      f
20 – 24    5
25 – 29    10
30 – 34    25
35 – 39    30
40 – 44    20
45 – 49    10

Answer:

Class      Mid-value (x)    f          fx            |x - x̄|    f|x - x̄|
20 – 24    22               5          110           14          70
25 – 29    27               10         270           9           90
30 – 34    32               25         800           4           100
35 – 39    37               30         1110          1           30
40 – 44    42               20         840           6           120
45 – 49    47               10         470           11          110
                            N = 100    ∑fx = 3600                ∑f|x - x̄| = 520

$\bar{x} = \frac{3600}{100} = 36$

$\text{Mean Deviation} = \frac{\sum f |x - \bar{x}|}{N} = \frac{520}{100} = 5.2$
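The three cases of mean deviation differ only in whether frequencies are attached to the scores, so one small function covers them all. A sketch, with the function name `mean_deviation` invented for this illustration:

```python
import math


def mean_deviation(values, frequencies=None):
    """Average absolute deviation from the arithmetic mean.

    values are scores (or mid-values of classes); frequencies default to 1 each.
    """
    if frequencies is None:
        frequencies = [1] * len(values)
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, frequencies)) / n


# 1. Discrete series.
series_md = mean_deviation([8, 10, 12, 14, 16, 18, 20, 22])

# 2 and 3. Discrete distribution, or the mid-values of the classes 20 - 24, ..., 45 - 49.
scores = [22, 27, 32, 37, 42, 47]
frequencies = [5, 10, 25, 30, 20, 10]
distribution_md = mean_deviation(scores, frequencies)

print(series_md, distribution_md)
print(math.isclose(distribution_md, 5.2))
```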

4. STANDARD DEVIATION

Standard Deviation is the square root of the average of the squares of the deviations of the scores taken from the mean. SD denoted by the symbol σ (sigma).

The Arithmetic Mean (Average) of the squares of deviations is known as Variance. Standard Deviation is the square root of the Variance.

Calculation of Standard Deviation – Steps1. Find the Arithmetic Mean of the given data.2. Find the deviations from Arithmetic Mean of scores.3. Find the average of squares of deviations taken from the Mean.4. Find the square root of the average of squares of deviations.

Calculation of Standard Deviation

1. Discrete Series

Find Standard Deviation: 35, 49, 32, 45, 39

2. Discrete Frequency Distribution (Ungrouped Distribution)

Find Standard Distribution

Prepared by KSK

Discrete DataStandard Deviation = x - Scores - Arithmetic MeanN – Total Number of scores

Discrete DistributionStandard Deviation = x - Scores - Arithmetic Meanf - FrequencyN – Total frequency

Continuous DistributionStandard Deviation = x – Mid-value - Arithmetic Meanf - FrequencyN – Total frequency

Score

35 -5 25

49 9 81

32 -8 64

45 5 25

40 1 1

Answer

=

= 40

Standard Deviation =

Score Frequency

22 5

27 10

32 25

37 30

42 20

47 10

N=100

Answer

Score Frequency fx d=(x - ) (x - )2 f(x - )2

22 5 110 -14 196 980

27 10 270 -9 81 810

32 25 800 -4 16 400

37 30 1110 1 1 30

42 20 840 6 36 720

47 10 470 11 121 1210

N=100 ∑fx=3600 ∑fd2=4150

Page 19: Statistics in Education- Made Simple

8

3. Continuous Frequency Distribution (Grouped Distribution)

Calculate Standard Deviation

Score      Frequency
20 - 24    5
25 - 29    10
30 - 34    25
35 - 39    30
40 - 44    20
45 - 49    10
           N = 100

Answer

Score      x    Frequency (f)   fx     (x - x̄)   (x - x̄)²   f(x - x̄)²
20 - 24    22   5               110    -14        196         980
25 - 29    27   10              270    -9         81          810
30 - 34    32   25              800    -4         16          400
35 - 39    37   30              1110   1          1           30
40 - 44    42   20              840    6          36          720
45 - 49    47   10              470    11         121         1210
                N = 100         ∑fx = 3600                    ∑f(x - x̄)² = 4150

A.M ($\bar{x}$) = $\dfrac{\sum fx}{N} = \dfrac{3600}{100} = 36$

Standard Deviation = $\sqrt{\dfrac{\sum f(x - \bar{x})^2}{N}} = \sqrt{\dfrac{4150}{100}} = 6.44$

For a large distribution, the Short-cut Method (Assumed Mean Method) can be used to calculate the Standard Deviation.

SD = $c \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$

c - Class interval
f - Frequency
N - Total frequency
d - Deviation, $d = \dfrac{x - A}{c}$
x - Mid-point
A - Assumed Mean


Standard Deviation – Advantages

- It is rigidly defined.
- It is based on all the observations.
- It is capable of further algebraic treatment.
- It is used in many advanced statistical studies.
- It is less affected by fluctuations in sampling.

Standard Deviation – Limitations

- Statistical interpretation is comparatively difficult.
- It gives more weight to extreme scores and less to those near the mean, because the deviations are squared. These squares become very large as the deviations increase.

CORRELATION

Correlation may be defined as the relationship between two variables. There are three types of correlation:

- Positive correlation
- Negative correlation
- Zero correlation

Positive correlation: When an increase or decrease in the first variable is accompanied by an increase or decrease, respectively, in the other variable, the two variables are said to be positively correlated.
Eg: Intelligence and achievement

Negative correlation: When an increase or decrease in the first variable is accompanied by a decrease or increase, respectively, in the other variable, the two variables are said to be negatively correlated.
Eg: Time spent on practice and number of typing errors

Zero correlation: If there is no relationship between two variables, they are said to have zero correlation.
Eg: Body weight and intelligence

COEFFICIENT OF CORRELATION

- The ratio indicating the degree of relationship between two related variables is called the coefficient of correlation.
- For a perfect positive correlation the Coefficient of Correlation is +1, and for a perfect negative correlation it is -1.
- Perfect positive or negative correlation is possible only in Physical Science.
- In a Social Science like Education, the correlation between two variables will lie within the limits +1 and -1.


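Calculate Standard Deviation Using Assumed Mean Method

class      f
45 - 49    2
40 - 44    3
35 - 39    2
30 - 34    6
25 - 29    8
20 - 24    8
15 - 19    7
10 - 14    5
5 - 9      9
           N = 50

Answer (Assumed Mean = 22, the mid-point of the class 20 - 24; class interval c = 5; d = (x - 22)/5)

class      f    x    fx    d    d²   fd    fd²
45 - 49    2    47   94    5    25   10    50
40 - 44    3    42   126   4    16   12    48
35 - 39    2    37   74    3    9    6     18
30 - 34    6    32   192   2    4    12    24
25 - 29    8    27   216   1    1    8     8
20 - 24    8    22   176   0    0    0     0
15 - 19    7    17   119   -1   1    -7    7
10 - 14    5    12   60    -2   4    -10   20
5 - 9      9    7    63    -3   9    -27   81
           N = 50              ∑fd = 4    ∑fd² = 256

SD = $c \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$

SD = $5 \sqrt{\dfrac{256}{50} - \left(\dfrac{4}{50}\right)^2} = 5 \sqrt{5.12 - 0.0064} = 5 \times 2.2613$

= 11.31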
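The assumed mean calculation above can be reproduced with a short script. This is a sketch under stated assumptions: the assumed mean is taken as 22 (the mid-point whose deviation d is 0 in the table) and the class interval as 5; the function and parameter names are illustrative.

```python
import math


def sd_assumed_mean(midpoints, freqs, assumed_mean, class_width):
    """SD = c * sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), with d = (x - A) / c."""
    n = sum(freqs)
    d = [(x - assumed_mean) / class_width for x in midpoints]   # step deviations
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    return class_width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


# Mid-points and frequencies of the nine classes 45-49 down to 5-9
midpoints = [47, 42, 37, 32, 27, 22, 17, 12, 7]
freqs = [2, 3, 2, 6, 8, 8, 7, 5, 9]

sd = sd_assumed_mean(midpoints, freqs, assumed_mean=22, class_width=5)
print(round(sd, 2))  # 11.31
```

Any mid-point may be chosen as the assumed mean; the correction term (∑fd/N)² removes the effect of the choice, so the result is the same as that of the direct method.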


- Positive correlation varies from 0 to +1 and negative correlation varies from 0 to -1.
- Zero correlation indicates that there is no consistent relationship between two variables.

Use of Coefficient of Correlation

- It helps to determine the validity of a test.
- It helps to determine the reliability of a test.
- It can be used to ascertain the degree of objectivity of a test.
- It can answer the validity arguments for or against a statement.
- It indicates the nature of the relationship between two variables.
- It predicts the value of one variable given the value of another related variable.
- It helps to ascertain the traits and capacities of pupils.

Calculation of Correlation Coefficient

There are two important techniques for calculating the correlation coefficient:

- Rank Correlation
- Product Moment Correlation

Rank Correlation: Spearman was the first to measure the extent of correlation between two sets of scores by the method of Rank Difference.

Product Moment Correlation: Karl Pearson devised the formula for the calculation of the Product Moment Correlation coefficient.

Rank Correlation Coefficient: $\rho = 1 - \dfrac{6 \sum D^2}{N(N^2 - 1)}$

D - Rank Difference, $D = |R_1 - R_2|$
N - Total number of scores

Eg: Find Rank Correlation Coefficient

Name of Students   Score in Maths   Score in Physics
Nikhil             45               68
Santhosh           53               76
John               67               70
Jenna              40               64
Gopal              35               54
Mohammed           50               66

Answer

R1 - Rank in Maths, R2 - Rank in Physics, D - Rank Difference

Name of Students   Score in Maths   Score in Physics   R1   R2   D    D²
Nikhil             45               68                 4    3    1    1
Santhosh           53               76                 2    1    1    1
John               67               70                 1    2    1    1
Jenna              40               64                 5    5    0    0
Gopal              35               54                 6    6    0    0
Mohammed           50               66                 3    4    1    1
                                                                 ∑D² = 4

Rank Correlation Coefficient = $1 - \dfrac{6 \times 4}{6(6^2 - 1)} = 1 - \dfrac{24}{210} = 1 - 0.11 = 0.89$

Here the correlation is found to be Positive and High.
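The rank difference method can be sketched in Python using the Maths and Physics scores of the six students. The sketch assumes Spearman's formula ρ = 1 - 6∑D²/(N(N² - 1)) and that no two students have the same score (tied ranks are not handled); the helper names `ranks` and `rank_correlation` are illustrative.

```python
def ranks(scores):
    """Rank scores from highest (rank 1) to lowest; assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


def rank_correlation(xs, ys):
    """Spearman's rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(xs)
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


maths = [45, 53, 67, 40, 35, 50]     # Nikhil, Santhosh, John, Jenna, Gopal, Mohammed
physics = [68, 76, 70, 64, 54, 66]

print(ranks(maths))                                  # [4, 2, 1, 5, 6, 3]
print(round(rank_correlation(maths, physics), 2))    # 0.89
```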

Product Moment Correlation Coefficient: $r = \dfrac{\sum xy}{N \sigma_x \sigma_y}$

x, y - the deviations of the first set of scores and the second set of scores from their respective Means
N - Number of scores in a set
$\sigma_x, \sigma_y$ - Standard deviations of the first set of scores and the second set of scores respectively

Eg: Find Product Moment Correlation coefficient

Height of Father (h1)   Height of Son (h2)
65                      67
66                      68
67                      65
67                      68
68                      72
69                      72
70                      69
72                      71

Answer

x - deviation of h1 from its Mean, y - deviation of h2 from its Mean

Height of Father (h1)   Height of Son (h2)   x    y    x²   y²   xy
65                      67                   -3   -2   9    4    6
66                      68                   -2   -1   4    1    2
67                      65                   -1   -4   1    16   4
67                      68                   -1   -1   1    1    1
68                      72                   0    3    0    9    0
69                      72                   1    3    1    9    3
70                      69                   2    0    4    0    0
72                      71                   4    2    16   4    8
∑h1 = 544               ∑h2 = 552                      ∑x² = 36   ∑y² = 44   ∑xy = 24

AM of h1 = $\dfrac{544}{8} = 68$

AM of h2 = $\dfrac{552}{8} = 69$

SD of h1 ($\sigma_x$) = $\sqrt{\dfrac{36}{8}} = \sqrt{4.5} = 2.12$

SD of h2 ($\sigma_y$) = $\sqrt{\dfrac{44}{8}} = \sqrt{5.5} = 2.35$

$r = \dfrac{24}{8 \times \sqrt{4.5} \times \sqrt{5.5}} = \dfrac{24}{39.80}$

= 0.6
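The deviation form of the product moment calculation can be checked with the father and son heights. A minimal sketch, assuming Pearson's formula r = ∑xy / (N σx σy) with x and y measured as deviations from their means and the standard deviations computed with divisor N; the function name is illustrative.

```python
import math


def product_moment_r(xs, ys):
    """Pearson's r from deviations: sum(xy) / (N * sd_x * sd_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]            # deviations of the first set
    dy = [y - mean_y for y in ys]            # deviations of the second set
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sd_x = math.sqrt(sum(a * a for a in dx) / n)
    sd_y = math.sqrt(sum(b * b for b in dy) / n)
    return sum_xy / (n * sd_x * sd_y)


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

print(round(product_moment_r(fathers, sons), 2))  # 0.6
```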

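Product Moment Correlation coefficient: Short-cut Method

$r = \dfrac{N \sum xy - \sum x \sum y}{\sqrt{\left[N \sum x^2 - (\sum x)^2\right]\left[N \sum y^2 - (\sum y)^2\right]}}$

x, y - the first set of scores and the second set of scores
N - Number of scores in a set

Eg: Find Product Moment Correlation coefficient

Students   Mark Test1 (x)   Mark Test2 (y)
A          8                9
B          6                7
C          4                3
D          7                6
E          3                5
F          6                6
G          5                5
H          4                5
I          5                4
J          6                5

Answer

Students   Mark Test1 (x)   Mark Test2 (y)   x²    y²    xy
A          8                9                64    81    72
B          6                7                36    49    42
C          4                3                16    9     12
D          7                6                49    36    42
E          3                5                9     25    15
F          6                6                36    36    36
G          5                5                25    25    25
H          4                5                16    25    20
I          5                4                25    16    20
J          6                5                36    25    30
           ∑x = 54          ∑y = 55          ∑x² = 312   ∑y² = 327   ∑xy = 314

$r = \dfrac{10 \times 314 - 54 \times 55}{\sqrt{\left[10 \times 312 - 54^2\right]\left[10 \times 327 - 55^2\right]}}$

$r = \dfrac{170}{\sqrt{204 \times 245}} = \dfrac{170}{223.56} = 0.76$

Correlation is Positive and High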
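The short-cut calculation works directly on the raw marks, so it is easy to verify by script. The sketch below assumes the raw-score form r = (N∑xy - ∑x∑y) / √([N∑x² - (∑x)²][N∑y² - (∑y)²]) and recomputes every column total from the marks of students A to J; the function name is illustrative.

```python
import math


def product_moment_r_raw(xs, ys):
    """Pearson's r computed from raw scores (short-cut method)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


test1 = [8, 6, 4, 7, 3, 6, 5, 4, 5, 6]   # marks of students A to J in Test 1
test2 = [9, 7, 3, 6, 5, 6, 5, 5, 4, 5]   # marks of students A to J in Test 2

print(sum(x * x for x in test1), sum(y * y for y in test2))   # 312 327
print(round(product_moment_r_raw(test1, test2), 2))           # 0.76
```

The value 0.76 is obtained with ∑y² = 327, which is the total of the y² column for these marks.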


Meaning and importance of Normal Probability Curve

The normal probability curve is the curve that graphically represents a Normal Distribution. In a Normal Distribution, when the scores are arranged in order of magnitude, those at the centre have the maximum frequency. The frequencies gradually decrease towards the right and left of the score at the centre.

Because of this property, the curve representing a normal distribution is symmetrical on either side of its central axis. Hence it is ‘bell-shaped’.

[Figure: Normal Probability Curve]

These special features of the Normal Distribution are seen in the dispersion of scores for natural phenomena such as intelligence, height and weight in a population.

This characteristic of the Normal Distribution is found to hold to a great extent for the achievement scores of a well-conducted examination, if the number taking the examination is sufficiently large.

Hence the properties of the Normal Distribution and the Normal Distribution curve are of great importance in the study of groups and their characteristics with respect to given variables.

Properties of Normal Probability Curve

- It is bell-shaped. This means that its peak is in the middle.
- It is symmetrical. If a perpendicular is drawn from the peak to the X-axis, it divides the whole area of the curve into two equal parts.
- The majority of scores show a tendency to cluster around the centre. On either side of the central axis the frequencies of scores go on reducing, being least at the two ends.
- All three Measures of Central Tendency, viz. Mean, Median and Mode, of a normal curve coincide; that is, they are all equal.
- The first and third quartiles are equidistant from the median.
- The ordinate at the mean is the highest. The heights of the other ordinates at various sigma distances from the mean are in a fixed relationship with the height of the mean ordinate.
- The curve gradually comes nearer and nearer to the base line, but it never meets the base line. For practical purposes, the curve may be taken to end at the points -3σ and +3σ from the mean, because this region covers almost 100% of the cases.
- If the total area enclosed by the normal probability curve is represented by N, the total number of cases in the group considered, the area between any two points can be found with the help of mathematical formulae.
- The most important relationship in the Normal Probability Curve is the area relationship. In a normal distribution 34.13% of the cases lie between M and a score at a distance of 1σ from M. Thus 68.26% of the cases are included within M ± 1σ, and 99.73%, or almost all the cases, are included within M ± 3σ.
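The area relationship can be checked numerically. The sketch below relies on the standard result that the proportion of a normal distribution lying within k standard deviations of the mean equals erf(k/√2); the function name `area_within` is illustrative and not part of the original notes.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Percentage of cases between M - k*sigma and M + k*sigma
    print(k, round(100 * area_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```

The printed 68.27% differs slightly from the 68.26% obtained by doubling the rounded table value 34.13%; the difference is only rounding.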

Skewness and Kurtosis

Skewness


If the distribution is not perfectly normal or symmetrical, or the frequencies on either side are not even, then the frequency curve deviates from normality. Such curves are said to be skewed.

The lack of symmetry due to an extended tail in a particular direction is known as Skewness. In a skewed distribution the Mean, Median and Mode will not be the same. There are two types of Skewness:

- Negative Skewness
- Positive Skewness

Negative Skewness

If the tail extends to the left (the negative direction of the curve), the distribution is said to be Negatively Skewed.

Positive Skewness

If the tail extends to the right (the positive direction of the curve), the distribution is said to be Positively Skewed.

The distance between the Mean and the Median indicates the extent of skewness.

In a negatively skewed curve the Mean lies to the left of the Median. In a positively skewed curve the Mean lies to the right of the Median. The degree of Skewness of a frequency distribution may be calculated using the formula

Sk = $\dfrac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$, when the Mean, Median and Standard Deviation are given.

When the percentiles are available, the following formula is used to find the skewness:

Sk = $\dfrac{P_{90} + P_{10}}{2} - P_{50}$

(Here $P_{90}$ is the 90th percentile, $P_{50}$ is the 50th percentile, i.e. the Median, and $P_{10}$ is the 10th percentile.) For a Normal curve the skewness is zero.
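Both skewness measures are simple enough to express as one-line functions. This sketch assumes the commonly taught forms Sk = 3(Mean - Median)/SD and Sk = (P90 + P10)/2 - P50; the input values are hypothetical, chosen only to illustrate a negatively skewed distribution, and do not come from the notes.

```python
def skewness_from_mean_median(mean, median, sd):
    """Sk = 3 * (Mean - Median) / SD."""
    return 3 * (mean - median) / sd


def skewness_from_percentiles(p10, p50, p90):
    """Sk = (P90 + P10) / 2 - P50."""
    return (p90 + p10) / 2 - p50


# Hypothetical values: the mean lies to the left of the median
print(skewness_from_mean_median(mean=50, median=52, sd=6))   # -1.0
print(skewness_from_percentiles(p10=30, p50=52, p90=70))     # -2.0
```

Both results are negative, which agrees with the Mean lying to the left of the Median; a symmetrical distribution gives zero from either formula.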

Kurtosis

Kurtosis refers to the peakedness or flatness of the curve of a frequency distribution compared to the Normal curve. The curve of a frequency distribution which is more peaked than the normal curve is said to be Leptokurtic. If the peak is flatter than that of a normal curve, the curve is said to be Platykurtic. The curve of a normal distribution is said to be Mesokurtic.

Ku = $\dfrac{Q}{P_{90} - P_{10}}$ (Q - Quartile Deviation)
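A percentile measure of kurtosis can be sketched as follows. The formula used, Ku = Q / (P90 - P10) with Q the quartile deviation (Q3 - Q1)/2, is the form commonly given in educational statistics texts and is an assumption here, since the formula itself is not legible in the notes. The check evaluates it for a standard normal distribution using `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def kurtosis_from_percentiles(q1, q3, p10, p90):
    """Ku = Q / (P90 - P10), where Q = (Q3 - Q1) / 2 is the quartile deviation."""
    quartile_deviation = (q3 - q1) / 2
    return quartile_deviation / (p90 - p10)


# Percentile points of a standard normal distribution (mean 0, SD 1)
normal = NormalDist(mu=0, sigma=1)
ku_normal = kurtosis_from_percentiles(
    q1=normal.inv_cdf(0.25),
    q3=normal.inv_cdf(0.75),
    p10=normal.inv_cdf(0.10),
    p90=normal.inv_cdf(0.90),
)
print(round(ku_normal, 3))  # 0.263
```

Under this formula the normal (mesokurtic) curve gives Ku = 0.263; values above 0.263 are read as platykurtic and values below it as leptokurtic.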

Standard Scores

The Mean is the most representative score for commenting on the position of other given scores. The distance of a score from the mean is usually expressed in terms of the standard deviation of the scores of the distribution concerned. Scores that indicate how many standard deviations a score lies away from the mean of a given distribution are known as standard scores. The commonly used standard scores are the Z score and the T score.

Z Score

Z score indicates how many standard deviations away from the mean, and in which direction, a given raw score of a distribution lies.

$Z = \dfrac{X - \bar{X}}{\sigma}$

where X - Raw score
$\bar{X}$ - Mean
σ - Standard Deviation

            Example A    Example B
X           76           67
Mean        82           62
σ           4            5

Example A: $z = \dfrac{76 - 82}{4} = -1.50$

Example B: $z = \dfrac{67 - 62}{5} = +1.00$

The raw score of 76 in Example A may be expressed as a z score of -1.50, indicating that 76 is 1.5 standard deviations below the Mean.

The raw score of 67 in Example B may be expressed as a z score of +1.00, indicating that 67 is 1 standard deviation above the Mean.
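The two z score examples can be reproduced with a one-line function. A minimal sketch, assuming z = (X - Mean) / σ; the function name `z_score` is illustrative.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations by which a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


print(z_score(76, 82, 4))  # -1.5  (Example A)
print(z_score(67, 62, 5))  # 1.0   (Example B)
```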

T Score

T score has been devised to avoid the confusion resulting from negative z scores (scores below the mean) and also to eliminate decimal values. To find the T score, multiply the z score by 10 and add 50.

T = 50 + 10z

T scores are always rounded to the nearest whole number.

For example,
In Example A, T = 50 + 10(-1.50) = 50 + (-15.0) = 35
In Example B, T = 50 + 10(1.00) = 50 + 10 = 60
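The conversion to T scores can be sketched in the same way, starting from the raw score, mean and standard deviation of Examples A and B. The function name `t_score` is illustrative. Python's built-in `round` is used for the final rounding; it rounds exact halves to the nearest even number, which may differ from the convention of a particular textbook for values such as 42.5.

```python
def t_score(raw, mean, sd):
    """T = 50 + 10z, rounded to the nearest whole number."""
    z = (raw - mean) / sd          # standard (z) score
    return round(50 + 10 * z)      # note: exact halves round to the even neighbour


print(t_score(76, 82, 4))  # 35  (Example A)
print(t_score(67, 62, 5))  # 60  (Example B)
```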

Prepared by KSK