STATISTICS SIMPLIFIED

A Quantitative Methods practice book for BBA II Semester Students of Bangalore University

Contains Concepts, Formulae, Exercises and Assignments

Compiled by

Lenin Arumanayagam,

Freelance Faculty

Table of Contents

Sl. No. Title

1 Introduction to Business Statistics

2 Classification and Tabulation

3 Diagrammatic Representation

4 Measures of Central Tendency

5 Measures of Variation

6 Measures of Skewness

7 Correlation & Regression

8 Index Numbers

9 Formulae

10 Assignments

11 University Question Papers

Business Statistics | Concepts and Exercises Page | 1

Chapter 1: Introduction to Business Statistics

Murray R. Spiegel: “Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analyzing data as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.”

Characteristics of Statistics

1. Statistics are numerical facts

2. Statistics are aggregate of facts

3. Statistics are affected to a great extent by multiplicity of factors

4. Statistics are either enumerated or estimated with reasonable standard of accuracy

5. Statistics are collected in a systematic manner and for a predetermined purpose

6. Statistics should be capable of being placed in relation to each other

Functions of Statistics

1. Presents facts in simple forms

2. Reduces the complexity of data

3. Facilitates comparison

4. Testing hypothesis

5. Formulation of policies

6. Forecasting and estimating

7. Derives valid inferences

Limitations of Statistics

1. Statistics does not study the qualitative phenomenon

2. Statistics does not study the individual changes

3. Statistics results are true only in general and on an average

4. Statistics can be misused by ignorant and wrongly motivated persons

5. Statistics does not reveal the entire story

6. Statistics is liable to be misused

Scope of Statistics

Statistics and Planning; Statistics and Business; Statistics and Economics; Statistics and Administration;

Statistics and Business Management; Statistics and Research; Statistics and Mathematics; Statistics and

Science. Scope of Statistics in Business: Marketing; Production; Finance; Banking; Investment; Purchase;

Accounting; Control

Data Collection

Data: Facts and figures collected for a specific purpose, processed and used to help decision-making.

Census: The method of collection of data in which every unit of the population is included. This method is

accurate and reliable but expensive, time consuming and involves much labor.

Sample: A sample is a group of units selected from a larger group (the population) for specific investigation.

Primary Data: Data originally collected for the first time directly from the source using surveys are called

primary data. It may be obtained through direct observation, interviews, questionnaires, etc.

Secondary Data: Data already collected by someone other than the user are called secondary data. They may

be obtained from newspapers, agencies, journals, records, reports, etc.


Chapter 2: Classification & Tabulation of Data

Data is a collection of any number of related observations on one or more variables. Raw data is information

that has not been processed to be made presentable or analyzed by statistical methods.

Classification of data is the process of arranging data into sequences and groups or classes according to their attributes and/or characteristics. It refers to the sorting of a heterogeneous mass of data into a number of homogeneous groups and sub-groups.

Tabulation is defined as the orderly or systematic presentation of numerical data in rows and columns,

designed to facilitate comparison between the figures.

Parts of a Table: Table number, Title, Titles of rows, columns, sub-rows and sub-columns, Totals, Footnotes

and Source.

Objectives and Functions of Classification of Data

1. To convert the raw data into organized data

2. To present the complex data into a simple form

3. To facilitate comparison

4. To bring out the uniformity among facts

5. To present data in a condensed form

Types of Classification

Qualitative Classification: a classification in which data are classified according to attributes or qualities.

Generally the qualitative phenomena are not measurable. E.g. Classification based on marital status, gender

etc.

Quantitative Classification: A classification in which data are classified according to quantities that are

measurable such as age, weights, marks, wages, etc.

Other Important Definitions:

Individual Observations/series: Data that are listed as they are observed, collected and recorded. They are in

a raw form and unorganized.

Discrete Classes: Classes in which the data do not progress from one class to the next without a break. In other words, they are classes that represent distinct categories or counts.

Continuous Data: Data that may progress from one class to the next without a break and may be expressed in

whole numbers or decimals.

Frequency: Frequency is the number of times each value of the variable occurs in the series. It is the rate of occurrence of a particular value, thing, or event.

Frequency Distribution: It is the summary of frequency of variables according to their magnitude individually

or in groups.

Cumulative Frequency: It is the total of all the frequencies up to and including the respective class interval

when the class intervals are in the ascending or descending order of values.

Population: A collection of all the elements we are studying and about which we are trying to draw

conclusions.

Sample: A collection of some, but not all, of the elements of the population under study, used to describe the

population.


Two-Way Table: A table which is used to categorize the data based on two attributes at the same time.

Exercise 2.1

1. Draw a blank table to present the following information regarding the students of a college according to:

a. Faculty: Arts, Science and Commerce

b. Sex: Boys and Girls

c. Years: 1993 and 1994

d. Age Group: Below 20 years and above 20 years.

2. The total number of accidents in Southern Railway in 1960 was 3,500 and it decreased by 300 in 1961 and

by 700 in 1962. The total number of accidents in meter gauge section showed a progressive increase from

1960 to 1962. It was 245 in 1960, 346 in 1961 and 428 in 1962. In the meter gauge section, the number of

non-compensated cases were 49 in 1960, 77 in 1961, and 108 in 1962. The number of compensated cases

in the broad gauge section were 2,867, 2,687 and 2,152 in those years in order. Tabulate the data.

3. Present the following information in a suitable form supplying the figures not directly given:

In 1975, out of a total of 4,000 workers in a factory, 3,300 were members of a trade union. The number of

women workers employed was 500 out of which 400 did not belong to any union. In 1974, the number of

workers in the union was 3,450 of which 3,200 were men. The number of non-union workers was 760 of

which 330 were women.

4. Following data gives the number of children in 50 families. Construct a suitable frequency table.

4 2 0 2 3 2 2 1 0 2 3 5

1 1 4 2 1 3 4 2 6 1 2 2

2 1 3 4 1 0 2 4 3 0 1 3

6 1 0 1 1 3 4 1 0 1 2 2

2 5 (Answer: 6, 13, 14, 7, 6, 2, 2)

5. Following are the weights of 50 college students in kg. Construct a frequency table.

42 42 46 54 41 37 54 44 38 45 47 50

58 49 51 42 46 37 42 39 54 39 51 58

47 51 43 48 49 48 49 41 41 40 58 49

49 59 57 52 56 38 45 52 46 40 51 41

51 41 (Answer: 6, 13, 14, 11, 6)

6. Following are figures of income (x) and percentage expenditure on food (y) in 25 families. Construct a

bivariate (two-way) frequency table.

X 550 623 310 420 600 225 310 640 512 690

Y 12 14 18 16 15 25 26 20 18 12

X 680 300 425 555 325 202 255 492 587 643

Y 13 25 16 51 23 29 27 18 21 19

X 689 523 317 384 400

Y 11 12 18 17 19
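The definitions of frequency and cumulative frequency given earlier in this chapter can be sketched in a few lines of Python. This is only an illustration, using the family-size data of Exercise 2.1, question 4; the variable names are chosen here and are not part of the syllabus.

```python
from collections import Counter
from itertools import accumulate

# Number of children in 50 families (data from Exercise 2.1, question 4).
children = [
    4, 2, 0, 2, 3, 2, 2, 1, 0, 2, 3, 5,
    1, 1, 4, 2, 1, 3, 4, 2, 6, 1, 2, 2,
    2, 1, 3, 4, 1, 0, 2, 4, 3, 0, 1, 3,
    6, 1, 0, 1, 1, 3, 4, 1, 0, 1, 2, 2,
    2, 5,
]

counts = Counter(children)                   # value -> number of occurrences
values = sorted(counts)                      # distinct values in ascending order
frequencies = [counts[v] for v in values]    # frequency of each value
cumulative = list(accumulate(frequencies))   # running total of the frequencies

print("x   f   cf")
for x, f, cf in zip(values, frequencies, cumulative):
    print(f"{x:<3} {f:<3} {cf}")
```

The frequency column reproduces the answer given for that question (6, 13, 14, 7, 6, 2, 2), and the last cumulative frequency equals the number of families, 50.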


Chapter 3: Diagrammatic Representation

Diagrams are visual aids of presenting the data in pictures, geometric figures and curves. They present a bird’s

eye view of huge mass of quantitative data in a condensed form attractively.

Uses of Diagrams and Graphs

1. They present a bird’s eye view of huge mass of information

2. They leave a huge impression on the minds of the readers as they are attractive

3. They are easy to understand, and the information can be grasped in less time

4. Entire data is visible at a glance

Limitations of Diagrams and Graphs

1. They are useful to the layman, but their utility to experts is limited

2. They fail to furnish details

3. They present data only in a particular range.

4. They are not subject to further mathematical analysis

Types of Diagrams and Graphs:

Note: For examples and sample diagrams, please refer to your textbook.

Line Diagrams: These diagrams are used when there is a large number of values of variable with variations in

their values within a small range

Simple Bar Diagram: These diagrams are suitable for individual observations and time series. The bars have

the uniform width.

Multiple Bar Diagram: These diagrams are used when two or more phenomena and a number of attributes

are compared with each other. Different shades may be used to identify the various attributes or periods.

Sub-divided (Component) Bar Diagrams: These diagrams are used when two or more components are present

in a single phenomenon

Sub-divided Percentage Bar Diagrams: These are sub-divided diagrams which are used to depict the values of

variable in percentage. All the bars are equal in height representing the value as 100%.

Pie Charts (Circular Diagram): This is a pictorial representation of statistical data with several components in a circular form. A pie chart consists of a circle sub-divided into several sectors by radii, the angle of each sector being proportional to the value of its component.

Pictograms: It is a representation in which pictures are used to represent the data. Each full diagram

represents a certain quantity.

Histograms: Histogram is a device of graphical representation of a frequency distribution. It is constructed by

erecting a set of rectangles on each interval on the horizontal axis. The height of the rectangle represents the

frequency of the class interval.

Frequency Polygon: A line graph connecting the midpoints of each class in a data set, plotted at the height

corresponding to the frequency of the class. It can also be drawn by joining the midpoints of the top of the

vertical bars of a histogram.

Frequency Curve: A frequency polygon with smoothed curve to eliminate the accidental irregularities in the

data.


Ogive Curve: This is a graphical representation of cumulative frequency distribution of a continuous series.

There are two types of ogive curves: (1) the more-than ogive and (2) the less-than ogive.

Exercise 3.1

1. Draw a simple bar diagram from the following data relating to the number of small scale industrial units in

various states in the year 2008

States Karnataka TN Kerala Andhra Maharashtra MP UP

No. of SS Units 10 12 15 15 18 25 22

2. Present the following data of results of BBM students in statistics examination of Bangalore University

held in June 2006, 2007 and 2008 using multiple bar diagram:

Year I Class II Class III Class Failed Total

June 2006 100 300 500 300 1200

June 2007 120 400 600 280 1400

June 2008 100 500 700 300 1600

3. Represent the following data using sub-divided bar diagram and percentage sub-divided bar diagram:

Cost Per Equipment 2006 (₹) 2007 (₹) 2008 (₹)

Raw Material 2,160 2,600 2,700

Labor 540 700 810

Direct Expenses 600 300 350

Factory Expenses 360 200 360

Office Expenses 180 200 270

Total 3,840 4,000 4,490

4. Represent the following figures using line graph:

Year 2003 2004 2005 2006 2007 2008

Exports (lakh ₹) 25 110 80 130 90 150

Imports (lakh ₹) 5 70 110 90 140 130

Balance of Payments +20 +40 -30 +40 -50 +20

5. Draw a pie diagram to represent the following data of investment pattern in the state budget (in ₹ crore):

Agriculture Industry Education Transportation Social Services

600 400 300 450 250
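For a pie diagram, each component is converted into a sector angle: its share of the total multiplied by 360 degrees. A minimal sketch of that conversion for the budget data of question 5 follows; the dictionary and variable names are illustrative only.

```python
# Budget allocation in ₹ crore (data from Exercise 3.1, question 5).
budget = {
    "Agriculture": 600,
    "Industry": 400,
    "Education": 300,
    "Transportation": 450,
    "Social Services": 250,
}

total = sum(budget.values())

# Each sector's central angle is its share of the total, scaled to 360 degrees.
angles = {head: amount / total * 360 for head, amount in budget.items()}

for head, angle in angles.items():
    print(f"{head:<16} {budget[head]:>5} {angle:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check before drawing the circle.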


Chapter 4: Measures of Central Tendency

A statistical average or measure of central tendency is a single number around which the greatest proportion

of the data concentrates.

Characteristics of a Good Measure of Central Tendency

1. It should be well defined.

2. It should be easy to understand and calculate.

3. It should be based on all the observations.

4. It should be capable of further treatment.

5. It should be affected as little as possible by fluctuations of sampling.

6. It should not be affected by extreme values.

Commonly Used Measures of Central Tendency

1. Arithmetic Mean or Simple Mean

2. Median

3. Mode

4. Geometric Mean

5. Harmonic Mean

Arithmetic Mean

A mathematical representation of the typical value of a series of numbers, computed as the sum of all the

numbers divided by the count of all numbers in the series.

Merits of Arithmetic Mean

1. It is simple to understand and easy to compute.

2. All items are used in calculation.

3. Mean is well defined.

4. It is capable of further algebraic treatment.

5. It is least affected by sampling fluctuations.

6. It is the center of gravity.

7. It is a calculated value and not based on position in the series.

Limitations / Demerits of Arithmetic Mean

1. The value of mean is affected by extreme items.

2. In case of open ended classes, the value of mean cannot be calculated without making assumptions

regarding the size of the interval.

3. It may not be a good measure in some cases, for instance, asymmetrical distributions.

Formulae

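Individual Series: $\bar{X} = \dfrac{\sum x}{n}$; $\bar{X} = A + \dfrac{\sum d}{n}$

Discrete Series: $\bar{X} = \dfrac{\sum fx}{N}$; $\bar{X} = A + \dfrac{\sum fd}{N}$

Continuous Series: $\bar{X} = \dfrac{\sum fm}{N}$; $\bar{X} = A + \dfrac{\sum fd}{\sum f}$; $\bar{X} = A + \dfrac{\sum fd'}{N} \times i$

Weighted Arithmetic Mean: $\bar{X} = \dfrac{\sum xw}{\sum w}$

Combined Arithmetic Mean: $\bar{X}_{1,2} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$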
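The three methods for a continuous series (direct, shortcut and step-deviation) always give the same mean. The sketch below checks this on the data of Exercise 4.1, question 6. It assumes that A is an assumed mean chosen from the midpoints, that d = m − A and d′ = (m − A)/i, and that i is the common class width; these symbol meanings are the usual textbook ones and are stated here as assumptions.

```python
# Continuous series from Exercise 4.1, question 6.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [5, 10, 25, 30, 20, 10, 5, 5]

mids = [(lo + hi) / 2 for lo, hi in classes]   # class midpoints m
N = sum(freqs)                                 # total frequency
A = 35                                         # assumed mean (any midpoint works)
i = 10                                         # common class width

# Direct method: mean = sum(f * m) / N
direct = sum(f * m for f, m in zip(freqs, mids)) / N

# Shortcut method: mean = A + sum(f * d) / N, with d = m - A
shortcut = A + sum(f * (m - A) for f, m in zip(freqs, mids)) / N

# Step-deviation method: mean = A + sum(f * d') / N * i, with d' = (m - A) / i
step = A + sum(f * (m - A) / i for f, m in zip(freqs, mids)) / N * i

print(round(direct, 2), round(shortcut, 2), round(step, 2))
```

All three values round to 36.36, the answer given for that question. Choosing a different midpoint for A changes the intermediate totals but not the result.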

Exercise 4.1

1. Find the AM of 5, 8, 10, 15, 24 and 28 (Answer: 15)

2. The wages of 9 workers are: 150, 80, 120, 60, 75, 125, 95, 115, 130. Find the mean wages. (Answer: 105.56)

3. In the city, 30 members were surveyed as to how many domestic appliances they had purchased and the

replies were as under. Prepare a frequency table and find the mean. (Answer: 2.83)

1, 2, 5, 1, 2, 1, 4, 2, 3, 4, 2, 4, 3, 2, 6, 3, 2, 4, 3, 6, 2, 2, 3, 3, 7, 2, 3, 0, 2, 1

4. Find the mean runs scored by a batsman during his career using direct method and shortcut method

(Answer: 46):

x 10 20 30 40 50 60 70 80 90

f 7 18 15 25 30 20 16 7 2

5. Compute the mean of the following data using direct method and shortcut method (Answer: 13.54):

x 9 10 11 12 13 14 15 16 17 18

f 1 2 3 6 10 11 7 3 2 1

6. Calculate the mean from the following data using direct, shortcut and step-deviation methods (Answer:

36.36):

CI 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80

f 5 10 25 30 20 10 5 5

7. Calculate the mean wages from the following data (Answer: 73.44):

Wages 48 – 56 56 – 64 64 – 72 72 – 80 80 – 88 88 – 96 96 – 104

No. of Workers 8 3 11 14 5 7 2

8. Calculate the mean from the following data using direct, shortcut and step-deviation methods (Answer:

49.3):

CI 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89

f 5 9 14 20 25 15 8 4

9. Calculate the mean marks from the following data (Answer : 43.7):



Marks Below 10 20 30 40 50 60 70 80 90 100

No. of students 5 12 25 45 70 80 88 92 96 100

10. Calculate the mean sales from the following data (Answer: 28.73):

Sales less than 10 20 30 40 50 60

Frequency 4 20 35 55 62 67

11. A college wanted to give monthly scholarship to B.Com students securing 50% and above marks in the

following manner:

Percentage of Marks 50 – 55 55 – 60 60 – 65 65 – 70 70 – 75

Scholarship (₹) 25 30 35 40 45

The percentage of marks of 20 students who were eligible for scholarship are given below:

52, 62, 51, 71, 54, 53, 51, 50, 57, 64, 56, 54, 69, 63, 65, 59, 58, 68, 57, 62

Calculate the average monthly scholarship payable to the students. (Answer: 31.5)

12. A limited company wants to pay bonus to the members of its staff as under:

Salary (₹ ‘000) 100 - 120 120 – 140 140 – 160 160 – 180 180 – 200 200 - 220 Above 220

Bonus (₹ ‘000) 50 50 70 80 90 100 110

Actual salaries of the members of the staff are as follows, in rupees: 200, 180, 185, 195, 218, 187, 160,

250, 198, 190, 168, 170, 178, 175, 140, 120, 148, 165, 155, 145, 125, 110, 162, 130, 150

What is the total bonus paid? What is the average bonus paid per staff? (Answer: 78.4)

13. From the following data of calculation of AM, find the missing value. Mean value is 126.3 (Answer: 120):

x 60 80 100 - 160 180 200

f 5 8 12 22 10 7 6

14. The AM of the following frequency distribution is 67.45 inches. Find the missing frequency. (Answer: 126):

Height (Inches) 60 – 62 63 – 65 66 – 68 69 – 71 72 - 74

F 15 54 ? 81 24

15. The mean of the following data is 67.45. Find the missing frequencies (Answer: 42, 27).

CI 60 – 62 63 – 65 66 – 68 69 – 71 72 - 74 Total

F 5 18 - - 8 100

16. The mean of the following data is 25. Find the missing frequencies (Answer: 10, 10).



x 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 Total

f 5 - 15 - 5 45

17. Find the weighted arithmetic average price of coal purchased by an industry (Answer: 50.36):

Month January February March April May June

Price per ton (₹) 42.50 51.25 50.00 52.00 44.25 54.00

No. of tons 25 30 40 50 10 45

18. The mean weight of 25 male workers in a factory is 63 kg, and the mean weight of 35 female workers in

the same factory is 55 kg. Find the combined average weight of the 60 workers in the factory. (Answer:

58.33)

19. The arithmetic mean of a group of 80 boys is 10 years, and that of second group of 20 boys is 15 years.

Find the arithmetic mean of the two groups taken together. (Answer: 11)
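The weighted and combined mean formulae can be checked quickly in code. The following sketch uses the figures of questions 17 and 18 above; the variable names are illustrative.

```python
# Weighted mean: coal prices (₹ per ton) weighted by tons bought (question 17).
prices = [42.50, 51.25, 50.00, 52.00, 44.25, 54.00]
tons = [25, 30, 40, 50, 10, 45]
weighted_mean = sum(x * w for x, w in zip(prices, tons)) / sum(tons)

# Combined mean: two groups of workers (question 18).
n1, mean1 = 25, 63   # male workers, mean weight in kg
n2, mean2 = 35, 55   # female workers, mean weight in kg
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(round(weighted_mean, 2))
print(round(combined_mean, 2))
```

Note that the combined mean is itself a weighted mean, with the group sizes acting as the weights.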

Median

Median is the middle value of the distribution, and therefore it is called the positional average. So, the place of

median in a series is such that, an equal number of items lie on either side of it.

Merits of Median

1. Median is especially useful in case of open ended classes, since it is not necessary that the value of all

items be known.

2. Median is not influenced by extreme values.

3. In a markedly skewed distribution, median is especially useful.

4. The value of median can be determined graphically, whereas the value of mean cannot be graphically

ascertained.

Limitations / Demerits of Median

1. For calculating median it is necessary to arrange the data, whereas, other averages do not need any

arrangement.

2. It is not determined by each and every observation.

3. Median is not capable of further algebraic treatment.

4. It is affected by sampling fluctuations.

5. The median in some cases cannot be computed exactly, as in the case of mean.

Formulae

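Individual Series: $M = \left(\dfrac{n+1}{2}\right)^{\text{th}}$ term when $n$ is odd, and $M = \dfrac{\left(\frac{n}{2}\right)^{\text{th}} \text{term} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{term}}{2}$ when $n$ is even.

Discrete Series: $M = \left(\dfrac{n+1}{2}\right)^{\text{th}}$ term

Continuous Series: $M = L + \dfrac{\frac{N}{2} - c.f.}{f} \times i$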
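The median formulae can be sketched as two small functions. This is a sketch under stated assumptions: L is taken as the lower boundary of the median class, c.f. as the cumulative frequency of the classes before it, f as its frequency and i as the class width; the function names are made up for this illustration. Inclusive classes such as 10 – 19 are converted to exclusive boundaries (9.5 – 19.5) before the formula is applied.

```python
def median_individual(values):
    """Median of raw observations: middle term, or mean of the two middle terms."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # ((n + 1) / 2)th term
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and (n/2 + 1)th terms


def median_continuous(lower_bounds, width, freqs):
    """Median of a grouped series: L + (N/2 - c.f.) / f * i."""
    N = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for L, f in zip(lower_bounds, freqs):
        if cf + f >= N / 2:          # first class whose cumulative frequency reaches N/2
            return L + (N / 2 - cf) / f * width
        cf += f
    raise ValueError("frequencies must not be empty")


# Individual series with an even number of items.
print(median_individual([36, 5, 19, 26, 6, 28, 56, 18, 63, 4]))

# Grouped series with inclusive classes 10-19, 20-29, ..., 80-89,
# entered by their exclusive lower boundaries.
bounds = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
print(median_continuous(bounds, 10, [5, 9, 14, 20, 25, 15, 8, 4]))
```

The sketch assumes equal class widths and at least one non-zero frequency; open-ended or unequal classes would need the width of the median class supplied separately.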



Exercise 4.2

1. Find the median: 43, 62, 15, 80, 56, 72, 34, 8, 25 (Answer: 43)

2. The wages of 9 workers are: 150, 80, 120, 60, 75, 125, 95, 115, 130. Find the median. (Answer: 115)

3. Find the median: 36, 5, 19, 26, 6, 28, 56, 18, 63, 4 (Answer: 22.5)

4. Find the median: 105, 89, 93, 142, 112, 136, 82, 97, 128, 135, 110, 104 (Answer: 107.5)

5. In a class of 15 students, 5 failed in a test. The marks of those who passed were, 9, 6, 7, 8, 9, 6, 5, 4, 7 and

8. Calculate the median marks of the 15 students.

6. Find the median: (Answer: 40)

x 10 20 30 40 50 60 70 80 90 100

f 10 16 18 13 6 3 8 4 6 8

7. Find the median:

Wages 5 10 15 20 25 30

Frequency 7 12 37 25 22 11

8. Find the median (Answer: 37.7):

Age < 20 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 - 50 > 50

No. of Workers 13 29 46 60 112 94 45 21

9. Calculate the median (Answer: 50.3):

CI 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89

f 5 9 14 20 25 15 8 4


10. Calculate the median:

CI 1 – 10 11 – 20 21 – 30 31 – 40 41 – 50 51 – 60 61 – 70 71 – 80 81 – 90

f 3 7 13 19 14 11 9 9 5

11. Calculate the median marks (Answer: 42):

Marks Below 10 20 30 40 50 60 70 80 90 100

No. of students 5 12 25 45 70 80 88 92 96 100


12. Calculate the median:

Value less than 10 20 30 40 50 60 70 80

No. of students 4 16 40 76 96 112 120 125



13. Calculate the median marks (Answer: 30):

Values above 10 20 30 40 50 60

No. of students 50 40 25 16 10 2


14. Calculate the median:

Marks more than 10 20 30 40 50 60 70 80

Frequency 115 103 88 68 43 23 13 3

15. In a group of 1000 wage earners, the monthly wages of 4% are below ₹60 and those of 15% are under

₹62.50. 15% earned ₹95 and over, and 5% got ₹100 and over. Find the median wage (Answer: 78.75).

16. 10% of the workers in a factory employing a total of 1000 workers, earn between ₹5 and 9.99, 30%

between ₹10 and ₹14.99, 250 workers between ₹15 and 19.99 and the rest ₹20 and above. What is the

median wage? (Answer: 17)

17. Compute the median after amending the table (Answer: 14):

X f x f

Less than 5 7 20 – 25 20

Less than 10 20 25 and above 5

5 – 15 38 30 and above 1

15 and above 35


18. Calculate the median:

Mid values 115 125 135 145 155 165 175 185 195

Frequencies 6 25 48 72 116 60 38 22 3


19. Calculate the median:

Mid values 5.5 9.5 13.5 17.5 21.5 25.5

Frequencies 12 23 40 65 17 3

20. Calculate the median using Ogive curve (Answer: 46.6):

Wages 0 – 20 20 – 40 40 – 60 60 – 80 80 – 100

No. of workers 82 112 150 95 48

21. Locate the median using Ogive curve (Answer: 44):

Marks Less Than 20 30 40 50 60 70

No. of Students 5 13 24 39 52 60



22. Marks of 100 students are given below. If median is 33, find the missing frequencies. (Answer: 17, 16)

Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70

No. of Students 12 15 - 20 - 10 10

23. Find the missing frequencies if the value of median is 36.5 and N = 120. (Answer: 30, 11)

Class Interval 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50 50 – 55 55 – 60

Frequencies 8 15 28 - 22 - 4 2

Mode

According to A. M. Tuttle, “Mode is the value which has the greatest frequency in the neighborhood.” Like the median, the mode is a positional average. The most frequent item, that is, the item repeated the maximum number of times in the series, is the mode of the series.

Merits of Mode

1. Mode is not affected by extremely large or small items.

2. Mode can be determined in open-ended classes without assuming the class limits.

3. The value of mode can be determined graphically, whereas the value of mean cannot be ascertained graphically.

Limitations / Demerits of Mode

1. The value of mode cannot always be determined, for instance, in bi-modal and multi-modal series.

2. Mode is not capable of further algebraic treatment.

3. The value of mode is not based on each item.

4. It is not a rigidly defined measure. So it is the most unstable average.

Formulae

Individual Series: The variable that occurs most frequently.

Discrete Series: The value which has the greatest frequency in the neighborhood.

Continuous Series: $Z$ or $M_0 = L + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times i$, where $\Delta_1 = |f_1 - f_0|$ and $\Delta_2 = |f_1 - f_2|$

Bi-modal Class: $Z$ or $M_0 = 3\,\text{median} - 2\,\text{mean}$
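The continuous-series formula can be sketched as follows. It is assumed here that f1 is the frequency of the modal class, f0 that of the class before it and f2 that of the class after it, with L the lower boundary of the modal class and i the class width; the function name is illustrative. The sketch picks the first class with the highest frequency, so it does not handle bi-modal series, where the empirical relation is used instead.

```python
def mode_continuous(lower_bounds, width, freqs):
    """Mode of a grouped series: L + d1 / (d1 + d2) * i for the modal class."""
    k = freqs.index(max(freqs))                       # position of the modal class (first maximum)
    f1 = freqs[k]                                     # frequency of the modal class
    f0 = freqs[k - 1] if k > 0 else 0                 # frequency of the preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0    # frequency of the succeeding class
    d1, d2 = abs(f1 - f0), abs(f1 - f2)
    return lower_bounds[k] + d1 / (d1 + d2) * width


# Classes 0-10, 10-20, ..., 50-60 with frequencies 14, 23, 35, 20, 8, 5.
print(round(mode_continuous([0, 10, 20, 30, 40, 50], 10, [14, 23, 35, 20, 8, 5]), 2))
```

For inclusive classes such as 10 – 19, pass the exclusive lower boundaries (9.5, 19.5, ...) instead of the printed limits.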

Exercise 4.3

1. Find the mode: 3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3

2. Find the mode: 54, 66, 42, 64, 44, 86, 104, 94, 100, 80, 72, 64, 64, 44, 64, 72, 54, 54, 48, 52, 50

3. Find the mode: 122, 234, 638, 420, 512, 234, 270, 420, 900, 195, 360

4. Find the mode (Answer: 4):

x 1 2 3 4 5 6

f 2 8 11 18 9 7



5. Compute the mode (Answer: 32):

x 8 16 24 32 40 48

f 2 4 20 19 10 5

6. Calculate the mode (Answer: 8):

x 2 4 6 8 10 12 14

f 6 8 16 16 12 6 4


7. Calculate the mode:

Wages 48 – 56 56 – 64 64 – 72 72 – 80 80 – 88 88 – 96 96 – 104

No. of Workers 8 3 11 14 5 7 2

8. Compute the mode (Answer: 52.833):

CI 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89

f 5 9 14 20 25 15 8 4

9. Find the mode (Answer: 11.35):

Attendance below 5 10 15 20 25 30 35 40 45

No. of students 29 224 465 582 634 644 650 653 655

10. Twenty percent of the workers in a firm employing a total of 2000 workers earn less than ₹2.00 per hour,

440 earn from ₹2.00 to ₹2.24 per hour, 24% earn from ₹2.25 to ₹2.49 per hour, 370 earn from ₹2.50 to

₹2.74 per hour, 12% earn from ₹2.75 to ₹2.99 per hour and the rest ₹3.00 or more per hour. Set up a

frequency table and calculate the modal wage. (Answer: 2.3117)

11. Compute the mode (Answer: 40):

CI 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80

f 4 12 24 32 32 16 8 2

12. Compute the mode (Answer: 89.5):

CI 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89 90 – 99 100 – 109 110 – 119

f 7 9 10 6 13 10 13 10


13. Find the mode:

Weight (in Kg) 5 10 15 20 25 30 35 40

No. of persons 8 19 27 45 24 45 22 10



14. Find the mode (Answer: 59.62):

Weight (in Kg) 45 48 52 56 60 64 68 72 76 80

No. of persons 110 116 116 100 96 96 96 84 72 62

15. Locate the mode using Histogram, Frequency polygon and smoothed frequency curve (Answer:50.71):

Class Interval 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70

Frequencies 5 30 90 180 250 260 130

16. Locate the mode using Histogram, Frequency polygon and smoothed frequency curve (Answer: 24.44):

Class Interval 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60

Frequencies 14 23 35 20 8 5

17. Find the mean, median and mode (Answer: Mean = 151.29, Median = 149.6, Mode = 146.211):

Mid values 115 125 135 145 155 165 175 185 195

Frequencies 120 116 116 100 96 96 96 84 72

18. Find the mean, median and mode:

90, 78, 86, 51, 96, 104, 51, 78, 50, 72, 49, 77, 90, 74, 69, 70, 68, 69, 104, 80, 79, 54, 79, 73, 58, 91, 78, 67,

50, 84, 76, 110, 53, 74, 40, 60, 42, 82, 41, 76, 84, 76, 42, 65, 60, 77, 61, 75, 115, 81

19. a. Z = 50, and M = 45. X̅ = ?

b. X̅ = 12, Z = 13, M = ?

c. If Mean = 20.2, Median = 22.1, find the mode.
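Question 19 relies on the relation Z = 3 median − 2 mean from the formulae for mode. Rearranging it gives any one of the three averages from the other two. A small sketch of the three rearrangements follows; the function names are invented for this illustration.

```python
def mode_from(mean, median):
    """Empirical mode: Z = 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


def mean_from(mode, median):
    """Rearranged for the mean: mean = (3 * median - Z) / 2."""
    return (3 * median - mode) / 2


def median_from(mean, mode):
    """Rearranged for the median: median = (Z + 2 * mean) / 3."""
    return (mode + 2 * mean) / 3


print(mean_from(mode=50, median=45))                 # part a
print(round(median_from(mean=12, mode=13), 2))       # part b
print(round(mode_from(mean=20.2, median=22.1), 2))   # part c
```

The relation is only approximate and is normally applied to moderately skewed distributions, so the results should be read as estimates.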


Chapter 5: Measures of Variation

Kafka defines measures of variation as, “the measurement of the scatteredness of the mass of figures in a

series about an average.”

Objectives of Measuring Variation

1. To measure exactly the reliability of an average

2. To serve as the basis for the control of variability

3. To compare two or more series with regard to their variability

4. To facilitate the use of other statistical measures

Properties of a Good Measure of Variation

1. It should be simple to understand.

2. It should be easy to compute.

3. It should be well defined.

4. It should be based on each item of the distribution.

5. It should be capable of further algebraic treatment.

6. It should have sampling stability.

7. It should not be affected by extreme values.

Relative and Absolute Measures of Variation

1. Absolute measures of dispersion are expressed in the same statistical unit in which the original data are

given, such as rupees, kg, tons etc. These measures may be used to compare the variation in two

distributions if the variables are expressed in the same units, and are of the same average size.

2. If the two sets of data are expressed in different units, such as quintals of sugar versus tons of

sugarcane, or if the average size is very different, such as the manager’s salary versus worker’s wages,

relative measures should be used. Relative measures of dispersion are also called coefficients of dispersion.

Some important measures of dispersion are discussed below.

Range

Definition

Range is defined as “The difference between the two extreme items of the distribution” or the difference

between the largest and smallest items of the distribution.

Merits of Range

1. Range is simple to understand

2. It is easy to calculate

3. It gives a quick rather than an accurate picture of variability.

Limitations of Range

1. It is not based on each observation


2. It is affected by extreme values in the series

3. It cannot be calculated for open-ended classes

4. It is highly affected by fluctuations of sampling

Uses of Range

1. It is useful in studying the variations in the prices of shares and stock, gold, jewelry etc.

2. In weather forecasts, range is used to determine the difference between the maximum and minimum

temperature.

3. It is used in industries for statistical quality control.

Interquartile Range & Quartile Deviation

Meaning

Inter-quartile range includes the middle 50% of the distribution. In other words, it represents the difference

between the third quartile and the first quartile.

Merits of Quartile Deviation

1. It is based on 50% of the observations

2. QD can be calculated for open ended classes also, because Q1 and Q3 are positional averages.

3. It is not affected by extreme values.

Limitations of Quartile Deviation

1. It ignores 50% of the items.

2. It is not a measure of dispersion as it does not show the scatter around an average.

3. It is not capable of further algebraic treatment.

4. It is affected if the central items are irregular.

5. It is highly affected by sampling fluctuations

6. It is not affected by distribution of items outside the two quartiles.

Formulae

Range: $L - S$ (where $L$ = largest value and $S$ = smallest value of the variable)

Coefficient of Range: $\dfrac{L - S}{L + S}$

Interquartile Range: $IQR = Q_3 - Q_1$

Quartile Deviation: $QD = \dfrac{Q_3 - Q_1}{2}$

Coefficient of Quartile Deviation: $CQD = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
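The range and quartile measures can be sketched for an individual series as below. The quartile positions follow the (n + 1)/4 rule for individual series; when the position is fractional, the sketch interpolates between the two neighbouring terms, which is an assumption made here (it reproduces the answer 19.75 and 0.339 given for the data used). The function and variable names are illustrative.

```python
def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) as the (k * (n + 1) / 4)th term, interpolating between terms."""
    pos = k * (len(ordered) + 1) / 4      # 1-based position, possibly fractional
    lower = int(pos)                      # whole part of the position
    frac = pos - lower                    # fractional part used for interpolation
    if lower >= len(ordered):             # position at or beyond the last term
        return ordered[-1]
    if lower < 1:                         # position before the first term
        return ordered[0]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


# Individual series: 30, 43, 48, 89, 54, 25, 84, 61, 67, 37, 72, 80.
data = sorted([30, 43, 48, 89, 54, 25, 84, 61, 67, 37, 72, 80])

L, S = data[-1], data[0]                  # largest and smallest items
data_range = L - S
coeff_range = (L - S) / (L + S)

q1, q3 = quartile(data, 1), quartile(data, 3)
iqr = q3 - q1
qd = iqr / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(data_range, round(coeff_range, 3))
print(q1, q3, iqr, qd, round(coeff_qd, 3))
```

Statistical software often uses other quartile conventions, so library results may differ slightly from the values obtained with this rule.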

Exercise 5.1

1. Compute the range and coefficient of range of the following series and state which is more dispersed.

a. 13, 14, 15, 16, 17 b. 9, 12, 15, 18, 21 c. 1, 8, 15, 22, 29

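Q1: Individual & Discrete Series: $\left(\dfrac{n+1}{4}\right)^{\text{th}}$ term; Continuous Series: $L + \dfrac{\frac{N}{4} - c.f.}{f} \times i$

Q3: Individual & Discrete Series: $\left(\dfrac{3(n+1)}{4}\right)^{\text{th}}$ term; Continuous Series: $L + \dfrac{\frac{3N}{4} - c.f.}{f} \times i$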



2. Find the range and coefficient of range of the following distribution (Answer: 36, 0.75):

x 6 12 18 24 30 36 42

f 7 18 15 25 30 20 16

3. Compute range and coefficient of range of the following series (Answer: 80, 1):

CI 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80

F 5 10 25 30 20 10 5 5

4. From the following data, calculate the Quartile Deviation and its Coefficient (Answer: 19.75, 0.339)

30, 43, 48, 89, 54, 25, 84, 61, 67, 37, 72, 80

5. Calculate the Quartile Deviation and its Coefficient from the following data (Answer: 1.5, 0.0244):

X 58 59 60 61 62 63 64 65 66

F 15 20 32 35 33 22 20 10 8

6. Compute the Quartile Deviation and its Coefficient from the following data (Answer: 5, 0.25):

Wages 5 10 15 20 25 30

Frequency 7 12 37 25 22 11


7. Compute the Quartile Deviation and its Coefficient from the following data:

Wages (₹) 4 – 8 8 – 12 12 – 16 16 – 20 20 – 24 24 – 28 28 – 32 32 – 36 36 – 40

No. of workers 6 10 18 30 15 12 10 6 2


8. Compute the Quartile Deviation and its Coefficient from the following data:

CI 5 – 7 8 – 10 11 – 13 14 – 16 17 – 19

f 14 24 38 20 4

Mean Deviation

Meaning

It is the average of the absolute differences between the items in a distribution and the mean (or median) of that series.

Merits of Mean Deviation

1. It is simple to understand and easy to compute.

2. It is based on each item of the data.

3. It is less affected by the values of extreme items than the Standard Deviation.



4. Since deviations are taken from a central value, comparison about the formation of different

distributions can easily be made.

Limitations of Mean Deviation

1. Algebraic signs are ignored.

2. It is not capable of further algebraic treatment.

3. It is rarely used in social science studies.

4. It does not give us accurate results.

Formulae

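Mean Deviation: Individual Series: $MD = \dfrac{\sum |D|}{n}$; Discrete Series: $MD = \dfrac{\sum f|D|}{N}$; Continuous Series: $MD = \dfrac{\sum f|D|}{N}$

$|D|$: Individual Series: $|x - \bar{x}|$ or $|x - M|$; Discrete Series: $|x - \bar{x}|$ or $|x - M|$; Continuous Series: $|m - \bar{x}|$ or $|m - M|$

Coefficient of MD (all series): $\dfrac{MD}{\bar{x}}$ or $\dfrac{MD}{M}$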
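For an individual series the mean deviation can be taken about either the mean or the median. A minimal sketch follows, using eight illustrative values (3000, 4300, 4000, 4800, 4200, 5800, 4600, 4500); the helper name `mean_deviation` is invented for this example.

```python
from statistics import mean, median

# Individual series of eight values.
values = [3000, 4300, 4000, 4800, 4200, 5800, 4600, 4500]


def mean_deviation(data, centre):
    """Average of the absolute deviations |x - centre|."""
    return sum(abs(x - centre) for x in data) / len(data)


for name, centre in (("mean", mean(values)), ("median", median(values))):
    md = mean_deviation(values, centre)
    # Coefficient of MD = MD divided by the average it was taken from.
    print(name, centre, md, round(md / centre, 4))
```

For this data the mean and the median coincide, so both lines give the same mean deviation and the same coefficient. In general the two will differ.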

Exercise 5.2

1. Calculate mean deviation & Coefficient of mean deviation using mean and median (Answer: 0.1193):

3000, 4300, 4000, 4800, 4200, 5800, 4600, 4500

2. Calculate mean deviation & Coefficient of mean deviation using mean and median (Answer: 0.38, 0.43):

90, 280, 65, 60, 50, 120, 100, 110, 70, 80, 75

3. Compute the mean deviation and its coefficient using mean and median (Answer: 7.66, 7.6 & 0.38, 0.38):

x 5 10 15 20 25 30 35 40

f 16 32 36 44 28 18 12 14

4. Compute the mean deviation and its coefficient using mean and median (Answer: 1.53, 1.49 & 0.407,

0.372):

No of Home Appliances

0 1 2 3 4 5 6 7

No. of Families 14 21 25 43 51 40 39 12

5. Compute the mean deviation and its coefficient using mean and median (Answer: 11.33 & 0.252):

Marks 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80

No. of Students 4 6 10 20 10 6 4



6. Compute the mean deviation and its coefficient using mean and median (Answer: 7.6, 7.296 & 0.196,

0.194):

Mid Values 22.5 27.5 32.5 37.5 42.5 47.5 52.5 57.5 62.5

Frequency 6 12 17 28 12 10 8 5 2

7. Compute the mean deviation and its coefficient (Answer: 40.417 & 0.425):

Wages below 25 50 80 110 150 200 300

No. of Workers 4 10 20 40 50 56 60

Standard Deviation

Standard Deviation is the square root of the mean of the squared deviations from the arithmetic mean. SD is

also known as Root Mean Square Deviation for this reason. It is the most widely used measure of variation.

Differences between Mean Deviation and Standard Deviation

1. Algebraic signs are ignored while calculating mean deviation, whereas in the calculation of

standard deviation, signs are taken into account.

2. Mean deviation can be computed either from median or mean; standard deviation is always

calculated from mean.

Merits of Standard Deviation

1. It is based on each item of the data.

2. It is amenable to further algebraic treatment. It is possible to calculate the combined SD of two or

more groups.

3. For comparing the variability of two or more groups, coefficient of variation is considered to be the

most appropriate as it is based on mean and standard deviation.

4. Standard deviation is also used in further statistical work. For example, in calculating skewness,

correlation etc., standard deviation is used.

Limitations of Standard Deviation

1. Standard deviation is difficult to compute compared to other measures.

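Formulae

Individual Series

Direct Method: $\sigma = \sqrt{\frac{\sum d^2}{n}}$, where $d = x - \bar{x}$

Short-cut Method: $\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$, where $d = x - A$

Discrete & Continuous Series

Direct Method: $\sigma = \sqrt{\frac{\sum fd^2}{N}}$, where $d = x - \bar{x}$ or $m - \bar{x}$

Short-cut Method: $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$, where $d = x - A$ or $m - A$

Step-Deviation Method: $\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times i$, where $d' = \frac{x - A}{i}$ or $\frac{m - A}{i}$

Variance = $\sigma^2$

Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100$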
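As a rough check that the direct and short-cut methods agree, the individual-series formulas can be sketched in Python. The function names and the assumed mean A = 40 are choices made for this sketch only; the data is from question 2 of Exercise 5.3 below.

```python
from math import sqrt

def sd_direct(values):
    """Direct method: deviations d = x - mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)

def sd_shortcut(values, assumed_mean):
    """Short-cut method: deviations d = x - A, corrected by (sum(d)/n)^2."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

def coefficient_of_variation(values):
    """CV = sigma / mean x 100."""
    return sd_direct(values) / (sum(values) / len(values)) * 100

# Data from Exercise 5.3, question 2
data = [5, 10, 20, 25, 40, 42, 45, 48, 70, 80]
print(round(sd_direct(data), 3))
print(round(sd_shortcut(data, assumed_mean=40), 3))
print(round(coefficient_of_variation(data), 2))
```

Any value of A gives the same standard deviation; a value near the middle of the data simply keeps the deviations small when working by hand.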



Exercise 5.3

1. Calculate the standard deviation of the marks of 11 students (Answer: 60.49):

90, 280, 65, 60, 50, 120, 100, 110, 70, 80, 75

2. Calculate the SD and Coefficient of Variation using direct method and shortcut method (Answer: 23.066 &

59.91%):

5, 10, 20, 25, 40, 42, 45, 48, 70, 80

3. Following are the runs scored by two batsmen X and Y in ten innings. Find who is a better scorer and who

is more consistent (Answer: CV(X) = 84.072%; CV(Y) = 82.707%):

X 100 22 0 36 82 45 7 13 65 14

Y 97 12 40 96 13 8 85 8 56 16

4. Compute the coefficient of variation (Answer: 43.63%):

x 10 20 30 40 50 60

f 8 12 20 10 7 3

5. The following table gives the age distribution of boys and girls in a high school. Find which of the two

groups is more variable in age. (Answer: CV(boys) = 7.85%; CV(girls) = 7.34%)

Age 13 14 15 16 17

No. of boys 12 15 15 5 3

No. of girls 13 10 12 2 1

6. The goals scored by teams A and B in a few football matches are as follows. Which team is more

consistent? (Answer: CV(A) = 124.94%; CV(B) = 108.97%)

Goals 0 1 2 3 4

No. of matches – Team A 27 9 8 4 5

No. of Matches – Team B 17 9 6 5 3

7. Compute the variance (Answer: 311.52):

Marks 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89 90 - 99

No. of Students 5 12 15 20 18 10 6 4

8. Compute the coefficient of variation from the following data (Answer: 152.77%):

Profit/Loss - 4 – -3 -3 – -2 -2 – -1 -1 – 0 0 – 1 1 – 2 2 – 3 3 – 4 4 – 5 5 – 6

No. of shops 4 10 22 28 38 56 40 24 18 10



9. Find which class is more consistent in scoring marks, from the following table (Answer: 24.99 & 23.53):

Marks 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70

Class A 7 10 20 18 7

Class B 5 9 21 15 6

10. The following data relate to the wages of workers in factories A and B. In which factory are wages more

variable? (Answer: CV(A) = 54.14%; CV(B) = 49.89%)

Wages up to (₹) 5 10 15 20 25 30

No. of workers – A 20 38 68 93 113 128

No. of workers – B 15 35 70 100 118 135

11. The number of employees, average wages per employee, and the variance of wages for two factories is

given below. (Answer: 2.5% and 4.71%)

Factory A Factory B

No. of employees 50 100

Average wages ₹120 ₹85

Variance 9 16

In which factory is there greater variation in the distribution of wages/employees? Which factory pays more?

12. Mean and standard deviation of the following continuous series are 31 and 15.9 respectively. The

distribution after taking step deviations is as follows. Determine the class intervals. (Answer: i = 10, CI = 0

– 10, 10 – 20 etc.).

d' -3 -2 -1 0 1 2 3

f 10 15 25 25 10 10 5

13. a. If x̅ = 56 and Variance = 144, find CV. (Answer: 21.43)

b. If Variance = 16, and CV = 50% find x̅. (Answer: 8)

c. If CV = 58% and x̅ = 36.55, find σ. (Answer: 21.2)


Chapter 6: Measures of Skewness

Skewness is a measure of the asymmetry of a statistical distribution. It characterizes the degree of symmetry or

asymmetry of the distribution around its mean.

Absolute and Relative Measures of Skewness

1. Absolute measures of Skewness

Absolute measure of skewness explains the extent of asymmetry and the direction.

2. Relative Measures of Skewness

Relative measure of skewness is useful for comparative study of two or more series

Symmetrical Distribution

A distribution is symmetrical if the Mean = Median = Mode

A distribution is positively skewed if Mean > Median > Mode

A distribution is negatively skewed if Mean < Median < Mode

Interpretation of coefficient of skewness

If skewness is less than -1 or greater than +1 (Skp < -1 or Skp > +1), the distribution is highly skewed.

If skewness is between -1 and -½ or between +½ and +1 (-1 ≤ Skp ≤ -½ or +½ ≤ Skp ≤ +1), the distribution is

moderately skewed.

If skewness is between -½ and +½ (-½ ≤ Skp ≤ +½), the distribution is approximately symmetric.
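The interpretation rule can be written as a small Python helper. This is a sketch only: the function name is hypothetical, and because the ranges above share the boundary values ±½, the sketch assumes that exactly ±½ counts as approximately symmetric.

```python
def describe_skewness(sk):
    """Classify a coefficient of skewness using the ranges given above."""
    magnitude = abs(sk)
    if magnitude > 1:
        return "highly skewed"
    if magnitude > 0.5:          # assumption: exactly 0.5 falls in the symmetric band
        return "moderately skewed"
    return "approximately symmetric"

for value in (-1.4, -0.8761, 0.3453):
    print(value, describe_skewness(value))
```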

Uses of Skewness

1. Skewness is a measure to study whether a distribution is symmetrical or not.

2. Many models assume normal distribution; i.e., data are symmetric about the mean. The normal

distribution has a skewness of zero. But in reality, data points may not be perfectly symmetric. So, an

understanding of the skewness of the dataset indicates whether deviations from the mean are going

to be positive or negative.

Differences between Measures of Variation and Skewness

Dispersion:

1. It is concerned with the amount of dispersion

2. It gives the scatteredness of the observations

3. It does not depend on the skewness

4. It is based on the averages of the first order (Mean, Median and Mode)

Skewness:

1. It tells us about the direction of the variation or departure from the symmetry

2. It indicates to what extent and in what direction the distribution differs from the symmetry

3. It depends on the dispersion to some extent.

4. It is based on the averages of the first order (Mean, Median and Mode) and second order (SD)

Formulae

For unimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \frac{\bar{X} - M_0}{\sigma}$

For bimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \frac{3(\bar{X} - M)}{\sigma}$

Bowley’s Coefficient of Skewness, $S_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
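The two coefficients can be sketched in Python as follows. The function names are made up for this illustration. Bowley's coefficient takes the quartiles and median as inputs, because the positional rule used to locate Q1 and Q3 (interpolating or rounding the position) changes the result; the values 23, 25 and 28 below are an assumption that takes the 3rd and 8th ordered items as the quartiles of the data in question 1 of Exercise 6.1.

```python
from math import sqrt
from statistics import mean, mode

def pearson_skewness_mode(values):
    """Karl Pearson's coefficient for a unimodal series: (mean - mode) / sigma."""
    n = len(values)
    x_bar = mean(values)
    sigma = sqrt(sum((x - x_bar) ** 2 for x in values) / n)
    return (x_bar - mode(values)) / sigma

def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Data from Exercise 6.1, question 1
data = [23, 45, 12, 28, 23, 19, 27, 23, 28, 30]
print(round(pearson_skewness_mode(data), 4))
print(bowley_skewness(q1=23, median=25, q3=28))   # quartile values are assumed, see above
```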

Exercise 6.1

1. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: 0.3453 & 0.2):

23, 45, 12, 28, 23, 19, 27, 23, 28, 30


112, 75, 140, 89, 112, 98, 134, 129, 98, 121, 136

3. Calculate Pearson’s and Bowley’s Coefficients of Skewness (Answer: – 0.2445 & 0):

x 14.5 15.5 16.5 17.5 18.5 19.5 20.5 21.5

f 35 40 48 100 125 87 43 22

4. Compute the two Coefficients of Skewness (Answer: – 0.8761 & -0.2):

x 4 8 12 16 20 24 28 32 36

f 18 21 20 9 7 20 22 17 8

5. Which group is more skewed?

i) Mean = 22; Median = 24, SD = 10 ii) Mean = 22, Median = 25, SD = 12

6. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness. (Answer: –0.0518 & –0.0165)

Class Interval 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80

Frequency 6 12 22 48 56 32 18 6


Marks Above 0 10 20 30 40 50 60 70 80 90

No. of Students 100 98 95 90 80 50 35 23 13 5

8. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: –0.2078 & –0.058):

Mid Value 21 27 33 39 45 51 57

Frequency 18 22 40 50 38 12 4

9. Compute Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: –0.310 & –0.2314):

CI 3 – 7 8 – 12 13 – 17 18 – 22 23 – 27 28 – 32 33 – 37 38 – 42

f 7 9 10 6 13 10 13 10

10. a. In a distribution, Mean = 65, Median = 70, Skp = – 0.6. Find i) Mode and ii) CV. (Answers: 80, 38.46%)

b. Skp = – 0.7, σ = 6, M = 12.8. Find the Mean and CV. (Answers: 11.4, 52.63%)


Chapter 7: Correlation and Regression

The statistical tool with the help of which the relationship between two or more variables is studied is called

correlation. The measure of correlation is called the Correlation Coefficient.

Uses of Correlation Coefficient

1. Helps us measure the relationship between the variables.

2. If the variables are closely related, we can estimate the value of one variable, given the value of

another with the help of Regression Analysis

3. Helps in analyzing the economic behavior

4. Helps in the study of social science. For e.g. The relationship between smoking and lung cancer.

Correlation and Causation

1. The correlation may be due to pure chance, especially in a sample. For e.g., relationship between

salary and weight.

2. Both the correlated variables may be influenced by one or more variables. For e.g., a high degree of

correlation between the yield per acre of rice and wheat may be due to heavy rainfall or fertilizers

used.

3. Both the variables may be mutually influencing each other, so that neither can be designated as

cause and other effect. For e.g., demand and price.

4. Nonsense / Illusory Correlation: A correlation between two variables that is not due to any causal

relationship but related to a third variable, or to random sampling fluctuations. E.g. Global warming

and no. of pirates.

Types of Correlation

1. Positive Correlation or Direct Correlation: When the two variables are directly related, i.e., when one

increases the other also increases, it is said to be positive correlation. For e.g., Supply and price.

2. Negative or Indirect Correlation: When the two variables are inversely related, i.e., when one

increases the other decreases, it is said to be negative correlation. For e.g., Demand and supply

3. Partial Correlation: When one variable is independent and the other is dependent on the former, it is

a case of partial correlation

4. Simple Correlation: When only two variables are studied, it is called simple correlation

5. Multiple Correlation: When three or more variables are studied, it is called multiple correlation

6. Linear Correlation: When the two variables change by a fixed proportion, thus forming a straight line,

it is said to be linear correlation

7. Non-linear or Curvilinear Correlation: If the variables, when plotted on a graph do not form a straight

line, it is said to be curvilinear correlation. In other words, the amount of change in one variable does

not bear a constant change in the other variable.

Methods of Determining Correlation

1. Karl Pearson’s Coefficient of Correlation 2. Spearman’s Rank Coefficient of Correlation

3. Concurrent Deviation Method 4. Scatter Diagram method 5. Method of Least Squares



Karl Pearson’s Coefficient of Correlation

This is the most widely used method of measuring correlation. It is popularly known as Pearsonian coefficient

of correlation. It is denoted by the symbol ‘r’.

Assumptions While Using Karl Pearson’s Coefficient of Correlation

While using Karl Pearson’s coefficient of correlation, it is assumed that,

1. The distribution is normal

2. There is a cause and effect relationship between the variables.

3. There is a linear relationship between the variables.

Properties of Karl Pearson’s Coefficient of Correlation

1. The value of r always lies between -1 and +1. Interpretation: ±1 – Perfect correlation; ±0.9 to ±1 –

Very high degree; ±0.75 to ±0.9 – High degree; ±0.60 to ±0.75 – Moderate degree; ±0.30 to ±0.60 –

Low degree; 0 to ±0.30 – Very low degree; 0 – No correlation.

2. It is independent of change of scale and origin of X and Y variables.

3. It is the geometric mean of the two regression coefficients: $r = \sqrt{b_{xy} \times b_{yx}}$

Merits of Karl Pearson’s Coefficient of Correlation

1. This is the most popular among the mathematical methods

2. It summarizes in one value the degree of correlation and its direction – direct or inverse.

Limitations of Karl Pearson’s Coefficient of Correlation

1. It assumes a linear relationship.

2. There are chances of misinterpretation.

3. It is more time consuming compared to other methods.

Probable Error

It is the value that helps determine the reliability of the value of the correlation coefficient in the condition of

random sampling. It helps interpret the correlation coefficient.

Methods of Interpretation

1. If r < 6PE, the value of r may not be significant.

2. If r > 6PE, the value of r is significant or practically certain.

3. Using the limits of population, we get the range within which population correlation lies. ρ = r ± PE

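Formulae

Using Actual Mean: $r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}}$

Using Assumed Mean: $r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}} \times \sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}}$

Probable Error: $P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$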
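The actual-mean formula and the probable error can be sketched in Python. The function names are hypothetical, and the data is from question 1 of Exercise 7.1 below; the last line applies the r > 6PE rule from Methods of Interpretation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r using deviations from the actual means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def probable_error(r, n):
    """P.E. = 0.6745 x (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

# Data from Exercise 7.1, question 1
internal = [25, 30, 22, 12, 19, 24]
external = [56, 68, 40, 24, 28, 60]

r = pearson_r(internal, external)
pe = probable_error(r, len(internal))
print(round(r, 4), round(pe, 4), r > 6 * pe)
```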



Exercise 7.1

1. Compute the coefficient of correlation from the following data: (Ans.: +0.9243)

Internal Marks 25 30 22 12 19 24

External Marks 56 68 40 24 28 60


X 6 8 9 14 17 28 24 31 7

Y 10 12 15 15 18 25 22 26 28


X 45 55 56 58 60 65 68 70 75 80

Y 56 50 48 60 62 64 65 70 74 82

4. Compute the coefficient of correlation from the following data: (Ans.: – 0.7327)

X 43 44 46 40 44 42 45 42 38 40 42 57

Y 29 31 19 18 19 27 27 29 41 30 26 10

5. Calculate the coefficient of correlation between age and playing habits of students: (Ans.: – 0.9895)

Age 15 16 17 18 19 20

No. of Students 250 200 150 120 100 80

Regular Players 250 150 90 48 30 16

6. The following table gives the distribution of the total population and those blind among them. Calculate the

coefficient of correlation and probable error. (Ans.: 0.898)

Age 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80

No. of Persons (‘000) 100 60 40 36 24 11 6 3

Blind Persons 55 40 40 40 36 22 18 15

7. Calculate ‘r’ between age and failure of candidates in the results of B.Com students: (Ans.: 0.7745)

Age 13 – 14 14 – 15 15 – 16 16 – 17 17 – 18 18 – 19 19 – 20 20 – 21 21 – 22 22 – 23

Candidates Appeared 200 300 100 50 150 400 250 150 25 75

Candidates failed 76 120 35 16 51 148 105 69 13 42

8. a. If r = +0.111 and N = 5, find PE (Answer: 0.2984)

b. If r = +0.9668 and PE = 0.01463, find N (Answer: 9) [Hint: Round off answer to the closest whole number]

c. If PE = 0.11857 and N = 8, find r. (Answer: +0.709)



Spearman’s Rank Correlation

Formulae

Unique Ranks: $r_s = 1 - \frac{6 \sum d^2}{N^3 - N}$, where $d = R_1 - R_2$

Tied Ranks: $r_s = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_n^3 - m_n)\right]}{N^3 - N}$, where m = No. of tied ranks
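A minimal Python sketch of both cases follows. The helper names are made up for this illustration, and the sketch assumes that the largest value receives rank 1 and that tied values share the average of the ranks they occupy. With no ties the correction term is zero, so one function covers both formulas. The data is from questions 1 and 5 of Exercise 7.2 below.

```python
from collections import Counter

def average_ranks(values):
    """Rank from largest (rank 1) downward; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def spearman_rank_correlation(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Add (m^3 - m) / 12 for every group of m tied values in either series.
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Exercise 7.2, question 1 (ranks given directly, no ties)
lady_1 = [1, 3, 2, 7, 6, 4, 5]
lady_2 = [2, 1, 4, 6, 7, 3, 5]
print(round(spearman_rank_correlation(lady_1, lady_2), 3))

# Exercise 7.2, question 5 (raw values with ties)
x = [48, 33, 40, 9, 16, 16, 65, 24, 16, 57]
y = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]
print(round(spearman_rank_correlation(x, y), 3))
```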

Exercise 7.2

1. Two ladies ranked seven brands of lipsticks as follows. Find the degree of agreement (Ans.: 0.786):

Lady 1 1 3 2 7 6 4 5

Lady 2 2 1 4 6 7 3 5

2. In a beauty competition, two judges ranked 12 participants as follows. What is the degree of agreement

between them? (Ans.: – 0.4546)

X 3 4 1 5 2 10 6 9 8 7 12 11

Y 6 10 12 3 9 2 5 8 7 4 1 11

3. Compute the rank correlation from the following data (Ans.: 0.8322):

X 60 34 40 50 45 41 22 43 42 66 64 46

Y 75 32 35 40 45 33 12 30 36 72 41 57

4. From the marks scored in accountancy and statistics by 12 students, compute rank correlation (Ans.: 0):

Accountancy 60 15 20 28 12 40 80 20

Statistics 10 40 30 50 30 20 60 30

5. Compute the coefficient of rank correlation (Ans.: 0.733):

X 48 33 40 9 16 16 65 24 16 57

Y 13 13 24 6 15 4 20 9 6 19

6. Compute the rank correlation between the length of service and order of merit (Ans.: 0.7937):

Length of Service 5 2 10 8 6 4 12 2 7 5 9 3

Order of Merit 6 12 1 9 8 5 2 10 3 7 4 11

7. Ten competitors in a voice contest are ranked by three judges in the following order. Find which pair of

judges have the nearest approach to common liking in voice (Ans.: -0.212, -0.297, 0.6364; Judges 1 & 3):

Judge 1 1 6 5 10 3 2 4 9 7 8

Judge 2 3 5 8 4 7 10 2 1 6 9

Judge 3 6 4 9 8 1 2 3 10 5 7



Regression

The statistical tool with the help of which we are in a position to estimate or predict the unknown values of

one variable from known values of another variable is called regression.

Correlation vs. Regression

1. Correlation coefficient is a measure of degree of co-variability between two variables, but regression

analysis helps to predict the value of one variable given the value of the other.

2. The cause and effect relation is clearly indicated more through regression analysis than by

correlation, which is more a tool of ascertaining the degree of relationship between the variables.

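Formulae

Equation X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$

Equation Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$

Formulae to Find the Regression Coefficients:

Using Actual Mean: $b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2}$; $b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2}$

Using Assumed Mean: $b_{xy} = \frac{N \sum d_x d_y - \sum d_x \sum d_y}{N \sum d_y^2 - (\sum d_y)^2}$; $b_{yx} = \frac{N \sum d_x d_y - \sum d_x \sum d_y}{N \sum d_x^2 - (\sum d_x)^2}$

Using Standard Deviation: $b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$; $b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}$

Coefficient of Correlation: $r = \sqrt{b_{xy} \times b_{yx}}$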
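The actual-mean formulas can be sketched in Python to obtain both regression equations. The function and variable names are made up for this sketch, and the data is from question 1 of Exercise 7.3 below. Because the square root alone is never negative, the sketch gives r the sign shared by the two regression coefficients.

```python
from math import copysign, sqrt

def regression_summary(xs, ys):
    """Return the means and the regression coefficients bxy and byx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - x_bar) ** 2 for x in xs)
    sum_dy2 = sum((y - y_bar) ** 2 for y in ys)
    return x_bar, y_bar, sum_dxdy / sum_dy2, sum_dxdy / sum_dx2

# Data from Exercise 7.3, question 1
X = [2, 4, 6, 8, 10]
Y = [5, 7, 9, 8, 11]
x_bar, y_bar, bxy, byx = regression_summary(X, Y)

# Rearranging (X - x_bar) = bxy (Y - y_bar) gives X = bxy * Y + (x_bar - bxy * y_bar).
a_xy = x_bar - bxy * y_bar
a_yx = y_bar - byx * x_bar
r = copysign(sqrt(bxy * byx), bxy)

print(f"X on Y: X = {bxy:.2f}Y {a_xy:+.2f}")
print(f"Y on X: Y = {byx:.2f}X {a_yx:+.2f}")
print(round(r, 4))
```

To estimate Y for a known X, substitute the value into the Y on X equation; use the X on Y equation for the reverse estimate.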

Exercise 7.3

1. Find the Regression Equations (Answer: X = 1.3Y – 4.4 & Y = 0.65X + 4.1):

X 2 4 6 8 10

Y 5 7 9 8 11

2. A panel of judges P & Q graded seven dramatic performances by awarding marks as follows. Obtain the

two Regression Equations: (Answer: X = 0.75Y + 14.5 & Y = 0.75X + 5.75)

Performance 1 2 3 4 5 6 7

Marks by P 46 42 44 40 43 41 45

Marks by Q 40 38 36 35 39 37 41

3. Following Table shows the exports of raw cotton and the imports of manufactured goods into India for

seven years.

Exports 42 44 58 55 89 98 60

Imports 56 49 53 58 67 76 58



Obtain the two Regression Equations and estimate the imports when export in a particular year was ₹ 70

crore. (Answer: 62.03; X = 2.198Y – 67.244 & Y = 0.391X + 34.651)

4. The advertisement expenses and sales data of ABC company are as follows:

Advertisement Expenses (₹ Lakh) 60 62 65 70 73 75 71

Sales (₹ Crore) 10 11 13 15 16 19 14

Find:

a. Sales for advertisement expenses of ₹ 80 lakhs. (Answer: ₹ 20.525 Crore)

b. Advertisement expenses for a sales target of ₹ 25 Crore. (Answer: ₹ 87.786 Lakh)

c. Coefficient of Correlation (Answer: 0.9870)

(The Regression Equations are: X = 1.807Y + 42.619 and Y = 0.539X – 22.613)

5. Following data are available on sales and advertisement:

Sales (₹) Advertisement Expenses (₹)

Mean 70,000 15,000

Standard Deviation 15,000 3,000

Coefficient of correlation is +0.8

Find:

a. The two Regression Equations (Answer: X = 4Y + 10,000 & Y = 0.16X + 3,800)

b. The advertisement budget if the company desires to achieve the target sales of ₹ 1,00,000

(Answer: ₹ 19,800)

6. Coefficient of correlation between the ages of brothers and sisters in a community was found to be 0.8.

Average age of the brothers was 25 and that of sisters 22 years. Their variances were 16 and 25

respectively.

Find:

a. The expected age of the brother when sister’s age is 12 years. (Answer: 18.6 years)

b. The expected age of the sister when brother’s age is 33 years. (Answer: 30 years)

(The Regression Equations are: X = 0.64Y + 10.92 and Y = X – 3)

7. a. If r = 0.42, σy = 16.8 and σx = 10.8, find bxy and byx (Answers: 0.269 & 0.653)

b. If bxy = 0.2, r = 0.533 and σx = 5, find σy (Answer: 13.325)

c. If bxy = 2.1 and byx = 0.456, find r (Answer: 0.978)

d. If bxy = 2 and r = 0.578, find byx (Answer: 0.167)


Chapter 8: Index Numbers

An index number is a specialized average designed to measure the change in the level of a phenomenon with respect to time,

geographic location or other characteristics such as income, price, etc.

Features of Index Numbers

1. Index numbers are specialized averages

An average is not a suitable measure for comparing different groups of data if they are expressed in

different units. But index numbers help compare different groups of data even if they are expressed

in different units. For instance, the spending on food, clothing, house rent etc. can be compared

using index numbers.

2. Index numbers measure the change in the level of phenomenon

For instance, if the index of industrial production is 108 in 2012 compared to 100 in 2011, it means

there is a net increase of 8% in industrial production.

3. Index numbers measure the effect of change over a period of time

For instance, BSE index, introduced in 1986, is used to study the movements in the share prices till

date.

Uses of Index Numbers

1. They help in framing suitable policies: For instance, wages and salaries are adjusted based on

Consumer Price Index.

2. They reveal trends and tendencies: For instance, to study the export trend after economic

liberalization in 1991, the current index can be compared with that of 1991.

3. Useful in deflating: Deflation is the process of adjusting original data for price changes. For instance,

nominal income can be adjusted to real income.

Types of Index Numbers

1. Unweighted Index: The method of constructing index numbers in which weights are not assigned to

the items is called Unweighted Index. It includes Simple Aggregative and Simple Average of relatives.

2. Weighted Index: The method of constructing index numbers in which weights are assigned to the

items is called weighted index. It includes Weighted Aggregative and Weighted Average of Relatives.

Some Important Definitions

1. Base Year: Base year is any reference year earlier than the year for which the indices are calculated.

It is used as the reference point for comparison of changes in the phenomenon.

2. Fixed Base: Refers to the base year, which remains fixed over a period of time. The fixed base year

serves as a common standard of comparison for all prices during the period.

3. Chain Base: Refers to the base year which changes from year to year. Generally the previous year will

be the base year for calculating the index number for the current year.

4. Consumer Price Index or Cost of Living Index: CPI measures the effect of change in prices of

consumer goods, which may include food, clothing, fuel, lighting, house rent etc., on the

working class families or consumers, during any year with respect to some fixed year.

5. Time Reversal Test: A formula for an index number should maintain time consistency by working

both forward and backward with respect to time. This is called time reversal test. It is expressed in

the form of an equation as follows: $P_{01} \times P_{10} = 1$

6. Factor Reversal Test: The index must permit interchanging the prices and quantities without giving

inconsistent results. The two results multiplied together should give a true value ratio. This is given by

the expression: $P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$

Points to be Considered While Selecting the Base Year

1. It should be a normal year

2. It should not be too distant in the past

3. Fixed base or Chain base

Limitations of Index Numbers

1. Sampling errors

2. It is assumed that the quality of the products remains the same

3. Specific index for specific purpose

4. It is assumed that there is no change in tastes, habits and customs

5. No single formula to calculate the index which may be suitable for all situations

6. Unreliable comparisons over longer periods

7. It is difficult to select a normal year as base year

Fisher’s Ideal Index Number

Fisher’s Index Number is called ideal for the following reasons:

1. It is based on geometric mean which is considered to be the best average for constructing index

numbers

2. It takes into account both, current year as well as base year prices and quantities.

3. It satisfies both Time Reversal Test (TRT) and Factor Reversal Test (FRT).

4. It is free from bias.
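The third property can be checked numerically. The sketch below computes Fisher's index as the geometric mean of the base-weighted ratio Σp1q0/Σp0q0 and the current-weighted ratio Σp1q1/Σp0q1, then applies both reversal tests. The function names are made up for this illustration, the index is kept as a ratio (without the factor of 100) for the tests, and the data is from question 14 of Exercise 8.1 below.

```python
from math import sqrt

def sum_pq(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))

def fisher_ratio(p0, q0, p1, q1):
    """Fisher's index as a ratio; multiply by 100 for the usual index form."""
    base_weighted = sum_pq(p1, q0) / sum_pq(p0, q0)
    current_weighted = sum_pq(p1, q1) / sum_pq(p0, q1)
    return sqrt(base_weighted * current_weighted)

# Data from Exercise 8.1, question 14 (2008 = base year, 2009 = current year)
p0 = [16, 4, 2, 4, 2]
q0 = [100, 30, 40, 20, 80]
p1 = [40, 12, 4, 10, 10]
q1 = [120, 20, 50, 16, 60]

p01 = fisher_ratio(p0, q0, p1, q1)
p10 = fisher_ratio(p1, q1, p0, q0)   # time reversal: swap base and current years
q01 = fisher_ratio(q0, p0, q1, p1)   # factor reversal: swap prices and quantities

print(round(p01 * 100, 2))
print(round(p01 * p10, 10))                                  # TRT: should equal 1
print(round(p01 * q01, 6), round(sum_pq(p1, q1) / sum_pq(p0, q0), 6))   # FRT: should match
```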

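Formulae

Simple Aggregative Method: $P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$

Weighted Aggregative Method: $P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

Simple Average of Price Relatives Method: $P_{01} = \frac{\sum I}{n} = \frac{\sum \left(\frac{p_1}{p_0} \times 100\right)}{n}$

CPI/CLI: Aggregate Expenditure Method: $P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

Family Budget Method: $P_{01} = \frac{\sum IW}{\sum W}$, where $I = \frac{p_1}{p_0} \times 100$ and $W = p_0 q_0$

Fisher’s Ideal Index Number: $P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$

Time Reversal Test: $P_{01} \times P_{10} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = 1$

Factor Reversal Test: $P_{01} \times Q_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times \sqrt{\frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$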
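The price-relative and CPI formulas can be sketched in Python as below. The function names are hypothetical. The data is from questions 4 and 5 of Exercise 8.1; with weights W = p0q0 the Family Budget Method reduces algebraically to the Aggregate Expenditure Method, so the two CPI figures should agree.

```python
def simple_average_of_relatives(p0, p1):
    """P01 = sum(I) / n, with I = p1 / p0 x 100."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    return sum(relatives) / len(relatives)

def aggregate_expenditure_index(p0, p1, q0):
    """P01 = sum(p1*q0) / sum(p0*q0) x 100."""
    return (sum(p * q for p, q in zip(p1, q0))
            / sum(p * q for p, q in zip(p0, q0)) * 100)

def family_budget_index(p0, p1, q0):
    """P01 = sum(I*W) / sum(W), with I = p1/p0 x 100 and W = p0*q0."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    weights = [a * q for a, q in zip(p0, q0)]
    return sum(i * w for i, w in zip(relatives, weights)) / sum(weights)

# Exercise 8.1, question 4: prices in 2008 and 2009
print(round(simple_average_of_relatives([4, 6, 2, 5, 8, 10], [5, 6, 3, 7, 9, 11]), 2))

# Exercise 8.1, question 5: 2005 quantities, 2005 and 2010 prices
q0 = [5, 2, 3]
p0 = [8, 9, 16]
p1 = [15, 12, 20]
print(round(aggregate_expenditure_index(p0, p1, q0), 2))
print(round(family_budget_index(p0, p1, q0), 2))
```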



Exercise 8.1

1. Calculate the price index for 2006, 2007 and 2008 using the simple aggregative method on the basis of 1995

(Answers: 124.37, 139.42, 153.20):

Commodity Unit 1995 2006 2007 2008

Rice Kg ₹10.50 ₹12.10 ₹14.30 ₹18.60

Wheat Kg 9.25 11.40 12.70 13.40

Milk L 4.75 7.00 9.00 10.50

Sugar Kg 8.60 14.00 16.00 17.00

Oil Kg 27.50 32.00 35.00 36.50

Pulses Kg 11.20 12.80 13.10 14.00

2. Calculate the weighted aggregative index number for the following commodities for the year 2001 and 2008 taking

the year 1991 as the base year (Answers: 130.45, 156.97):

Commodity   Units Consumed (1991)   Price per unit (₹): 1991   2001   2008

Rice 10 kg 11.00 16.50 18.00

Wheat 5 kg 10.20 12.25 14.00

Grams 3 kg 5.00 7.00 9.00

Milk 30 litres 6.70 9.00 10.50

Oil 4 kg 29.00 32.00 38.00

Sugar 12 kg 8.80 11.30 16.30

3. Calculate the price index numbers for the following data for 2007 and 2008 using simple average of price relative

method (Answers: 147, 196):

Commodity Bricks Timber Board Sand Cement

Prices – 2001 10 20 5 2 7

Prices – 2007 16 21 6 3 14

Prices – 2008 18 22 7 5 21

4. Calculate the index number for the following data using simple average of price relative method (Answer: 122.92)

Commodity A B C D E F

Prices – 2008 4 6 2 5 8 10

Prices – 2009 5 6 3 7 9 11



5. Calculate the Consumer Price Index or Cost of Living Index Number using Aggregative Expenditure Method and Family

Budget Method (Answer: 150):

Item Quantity Price

2005 2005 2010

A 5 8 15

B 2 9 12

C 3 16 20

6. Calculate the CPI using Aggregative Expenditure Method and Family Budget Method (Answer: 118.77):

Item Quantity Price

2008 2008 2009

A 6 quintals 5.75 6.00

B 6 quintals 5.00 8.00

C 1 quintal 6.00 9.00

D 6 quintals 8.00 10.00

E 4 kg 2.00 1.50

F 1 quintal 20.00 15.00

7. An enquiry into the budgets of middle class families in Bangalore gave the following information:

Commodity Food Rent Clothing Fuel Miscellaneous

Expenses – 2007 35% 15% 20% 10% 20%

Price relatives – 2008 116 120 125 125 150

What changes in the cost of living index of 2008 have taken place as compared to 2007? How much dearness allowance

should be given to a worker who was drawing ₹200 as wages in 2007? (Answers: 126.10 & ₹52.20)

8. Following information relating to workers in an industrial town is given:

Item                             Food & Beverages   Clothing   Fuel & Lighting   Housing   Miscellaneous

Group Index – 2009 (Base 2004)   225                185        150               200       180

Proportion of Expenditure        50%                10%        10%               15%       15%

Average wage per month in 2004 is ₹750. What should be the average wage per worker in 2009 in that town so that

the standard of living of the workers does not fall below that of 2004? (Answers: 203 & ₹1,522.50)

9. An enquiry into the budget of the middle class families in a city gave the following information. What changes in the

cost of living figures of 2005 as compared to that of 2002 are seen? (Answer: 102.75)



Item   Percentage Expenses   Price (₹) 2002   Price (₹) 2005

Food 29% 140 147

Rent 15% 30 30

Clothing 25% 75 66

Fuel 10% 25 20

Miscellaneous 21% 40 52

10. The data below show the percentage increase in prices of selected food items and the weights attached to each of

them. Calculate the index number for the food group (Answer: 340, 304.6)

Food Item: Rice Wheat Dal Ghee Oil Spices Milk Fish Vegetables Refreshments

Weights 33 11 8 5 5 3 7 9 9 10

Increase in Price % 180 202 115 212 175 517 260 426 332 279

Using the above food index and information given below, calculate the cost of living index number:

Commodity: Food Clothing Lighting Rent Miscellaneous

Index - 310 220 150 300

Weight 60 - 8 9 18

11. The cost of living index number on a certain date was 200. From the base period, the percentage increase in prices

were Rent – ₹60, clothing – ₹250, Fuel and lighting – ₹150, Miscellaneous – ₹120. The weights of different groups

were Food – 60, Rent – 16, clothing – 12, fuel and lighting – 8, and miscellaneous – 4. What was the percentage

increase in food group? (Answer: 72.67)

12. A textile worker earns ₹350 per month. The cost of living index for that particular month is known to be 136. Using

the data given below, find the amounts spent by him on house rent and clothing (Answer: 42, 49):

Commodity: Food Clothing House Rent Fuel Miscellaneous

Expenditure 140 ? ? 56 63

Group Index 180 150 100 110 80

13. Compute Fisher’s Ideal Index Number and prove that it satisfies the Time Reversal Test and Factor Reversal Test

(Answer: 134.41):

Year Commodity: A B C D E

2008 Price 10 12 18 20 22

Consumption 49 25 10 5 8

2009 Price 12 15 20 40 45

Consumption 50 20 12 2 5



14. Compute Fisher’s Ideal Index Number for the following five items (Answer: 266.615):

Commodity Price (₹) Quantity

2008 2009 2008 2009

A 16 40 100 120

B 4 12 30 20

C 2 4 40 50

D 4 10 20 16

E 2 10 80 60

15. Construct Fisher’s Ideal Index Number and prove that it satisfies TRT & FRT (Answer: 165.71):

Year Item: Rice Sugar Oil

2000 Value 210 100 40

Price 14 20 4

2008 Value 300 108 56

Price 25 27 7

16. Compute Fisher’s Ideal Index Number and prove that it satisfies the TRT and FRT (Answer: 112.10):

Year Item A B C D E

Base Year Price 10 12 20 18 28

Value 200 108 260 144 280

Current Year Value 300 220 250 140 320

Quantity 25 22 10 7 10

17. Compute Fisher’s Ideal Index Number and prove that it satisfies the TRT and FRT (Answer: 219.12):

Year Commodity: A B C D

2013 Price 20 40 10 50

Expenditure 400 160 100 250

2014 Price 50 80 20 100

Expenditure 750 400 240 600


Quantitative Methods – Formulae

Arithmetic Mean


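Direct Method

Individual Series: $\bar{X} = \frac{\sum x}{n}$

Discrete Series: $\bar{X} = \frac{\sum fx}{N}$

Continuous Series: $\bar{X} = \frac{\sum fm}{N}$

Shortcut Method

Individual Series: $\bar{X} = A + \frac{\sum d}{n}$; $d = x - A$

Discrete Series: $\bar{X} = A + \frac{\sum fd}{N}$; $d = x - A$

Continuous Series: $\bar{X} = A + \frac{\sum fd}{N}$; $d = m - A$

Step-Deviation Method

Discrete Series: $\bar{X} = A + \frac{\sum fd'}{N} \times i$; $d' = \frac{x - A}{i}$

Continuous Series: $\bar{X} = A + \frac{\sum fd'}{N} \times i$; $d' = \frac{m - A}{i}$

Weighted Arithmetic Mean: $\bar{X} = \frac{\sum xw}{\sum w}$

Combined Arithmetic Mean: $\bar{X}_{(1,2)} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$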

Median

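Individual Series: $M = \left[\frac{n+1}{2}\right]^{th}$ term when n is odd and $M = \frac{\left(\frac{n}{2}\right)^{th} \text{term} + \left(\frac{n}{2} + 1\right)^{th} \text{term}}{2}$ when n is even.

Discrete Series: $M = \left[\frac{n+1}{2}\right]^{th}$ term

Continuous Series: $M = L + \frac{\frac{N}{2} - c.f.}{f} \times i$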

Mode

Individual Series: The variable that occurs most frequently.

Discrete Series: The value which has the greatest frequency in the neighborhood.

Continuous Series: Z or $M_0 = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$; $\Delta_1 = |f_1 - f_0|$ and $\Delta_2 = |f_1 - f_2|$

Bi-modal Class: Z or M0 = 3 median − 2 mean
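The continuous-series formulas for median and mode can be sketched in Python. This is an illustration under stated assumptions: the function names are made up, classes are exclusive and of equal width i, c.f. is the cumulative frequency of the class preceding the median class, and there is a single modal class whose frequency differs from at least one neighbour. The data is the marks distribution from question 5 of Exercise 5.2.

```python
def grouped_median(lower_limits, freqs, width):
    """M = L + (N/2 - c.f.) / f x i for the class containing the N/2-th item."""
    n = sum(freqs)
    cumulative = 0                      # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

def grouped_mode(lower_limits, freqs, width):
    """Z = L + d1 / (d1 + d2) x i for the class with the highest frequency."""
    k = freqs.index(max(freqs))
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = abs(f1 - f0), abs(f1 - f2)
    return lower_limits[k] + d1 / (d1 + d2) * width

# Marks 10-20, 20-30, ..., 70-80 (Exercise 5.2, question 5)
lower_limits = [10, 20, 30, 40, 50, 60, 70]
students = [4, 6, 10, 20, 10, 6, 4]
print(grouped_median(lower_limits, students, width=10))
print(grouped_mode(lower_limits, students, width=10))
```

The distribution is symmetric, so the median and the mode coincide.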

Mean Deviation


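Individual Series: $MD = \frac{\sum |D|}{n}$, where $|D| = |x - \bar{x}|$ or $|x - M|$

Discrete Series: $MD = \frac{\sum f|D|}{N}$, where $|D| = |x - \bar{x}|$ or $|x - M|$

Continuous Series: $MD = \frac{\sum f|D|}{N}$, where $|D| = |m - \bar{x}|$ or $|m - M|$

Coefficient of MD (all three series): $\frac{MD}{\bar{x}}$ or $\frac{MD}{M}$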


Range

Range: L – S (Where L = Largest variable and S = Smallest variable)

Coefficient of Range: $\frac{L - S}{L + S}$

Quartile Deviation

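Interquartile Range: $IQR = Q_3 - Q_1$

Quartile Deviation: $QD = \frac{Q_3 - Q_1}{2}$

Coefficient of Quartile Deviation: $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

Individual & Discrete Series: $Q_1 = \left[\frac{n+1}{4}\right]^{th}$ term; $Q_3 = \left[\frac{3(n+1)}{4}\right]^{th}$ term

Continuous Series: $Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times i$; $Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times i$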
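For a discrete series, the quartile positions can be located through cumulative frequencies. The Python sketch below is an illustration only: the function names are made up, n is taken as the total frequency, and the item at a fractional position is assumed to be the first value whose cumulative frequency reaches that position. The data is from questions 5 and 6 of Exercise 5.1.

```python
def term_at(values, freqs, position):
    """Value at the given position: first value whose cumulative frequency reaches it."""
    cumulative = 0
    for value, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:
            return value

def quartile_deviation(values, freqs):
    """Return (QD, coefficient of QD) for a discrete series sorted by value."""
    n = sum(freqs)
    q1 = term_at(values, freqs, (n + 1) / 4)
    q3 = term_at(values, freqs, 3 * (n + 1) / 4)
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

# Exercise 5.1, question 6
wages = [5, 10, 15, 20, 25, 30]
frequency = [7, 12, 37, 25, 22, 11]
print(quartile_deviation(wages, frequency))

# Exercise 5.1, question 5
x = [58, 59, 60, 61, 62, 63, 64, 65, 66]
f = [15, 20, 32, 35, 33, 22, 20, 10, 8]
qd, cqd = quartile_deviation(x, f)
print(qd, round(cqd, 4))
```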

Standard Deviation

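Direct Method

Individual Series: $\sigma = \sqrt{\dfrac{\sum d^2}{n}}$; $d = x - \bar{x}$

Discrete & Continuous Series: $\sigma = \sqrt{\dfrac{\sum fd^2}{N}}$; $d = x - \bar{x}$ or $m - \bar{x}$

Short-cut Method

Individual Series: $\sigma = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$; $d = x - A$

Discrete & Continuous Series: $\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$; $d = x - A$ or $m - A$

Step-Deviation Method

Individual Series: not applicable

Discrete & Continuous Series: $\sigma = \sqrt{\dfrac{\sum f{d'}^2}{N} - \left(\dfrac{\sum fd'}{N}\right)^2} \times i$; $d' = \dfrac{x - A}{i}$ or $\dfrac{m - A}{i}$

Variance $= \sigma^2$

Coefficient of Variation: $CV = \dfrac{\sigma}{\bar{x}} \times 100$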
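The direct and short-cut methods for an individual series, together with the coefficient of variation, can be sketched in Python as below. The data are the share prices from Assignment 4, question 3. The function names and the assumed mean A = 50 are illustrative choices.

```python
from math import sqrt


def std_dev(xs):
    """Direct method: sqrt(sum(d^2) / n), with d = x - mean."""
    mean = sum(xs) / len(xs)
    return sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def std_dev_shortcut(xs, A):
    """Short-cut method: sqrt(sum(d^2)/n - (sum(d)/n)^2), with d = x - A."""
    n = len(xs)
    d = [x - A for x in xs]
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


def coeff_variation(xs):
    """CV = sigma / mean * 100."""
    return std_dev(xs) / (sum(xs) / len(xs)) * 100


share_x = [55, 54, 53, 53, 56, 68, 52, 50, 51, 49]
share_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coeff_variation(share_x)
cv_y = coeff_variation(share_y)
print(round(cv_x, 2))                               # 9.37
print("more stable:", "Y" if cv_y < cv_x else "X")  # more stable: Y
```

The share with the smaller coefficient of variation is the more stable one, since its spread is smaller relative to its own average.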

Coefficient of Skewness

For unimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \dfrac{\bar{X} - M_0}{\sigma}$

For bimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \dfrac{3(\bar{X} - M)}{\sigma}$

Bowley’s Coefficient of Skewness, $S_B = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
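The skewness formulae are often used in reverse, to recover a missing quantity. The Python sketch below works through the figures given in Assignment 5, question 8 (mean = 65, median = 70, Skp = –0.6). The function names are hypothetical, and the mode is obtained from the empirical relation mode = 3 median − 2 mean.

```python
def pearson_skew_median(mean, median, sd):
    """Karl Pearson's coefficient using the median: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


def bowley_skew(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Given values from Assignment 5, question 8.
mean, median, skp = 65, 70, -0.6

sd = 3 * (mean - median) / skp  # formula rearranged for the standard deviation
mode = 3 * median - 2 * mean    # empirical relation between mean, median and mode
cv = sd / mean * 100            # coefficient of variation

print(round(sd, 2), mode, round(cv, 2))  # 25.0 80 38.46
```

Bowley's coefficient is zero whenever the median lies midway between the quartiles, for example `bowley_skew(10, 20, 30)`.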

Karl Pearson’s Coefficient of Correlation

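Using Actual Mean: $r = \dfrac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}}$

Using Assumed Mean: $r = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} \times \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$

Probable Error: $P.E. = 0.6745 \times \dfrac{1 - r^2}{\sqrt{N}}$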
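A minimal Python sketch of the actual-mean formula and the probable error, applied to the price and demand data of Assignment 6, question 1. The function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), deviations from actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


price = [10, 28, 49, 50, 70, 75, 98, 100, 110, 120]
demand = [112, 110, 75, 60, 55, 50, 40, 30, 20, 10]

r = pearson_r(price, demand)
pe = probable_error(r, len(price))
print(round(r, 3), round(pe, 4))
```

The result is a strong negative correlation, in line with the answer key value of –0.975, and |r| is far more than six times the probable error.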

Spearman’s Rank Correlation

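Unique Ranks: $r_s = 1 - \dfrac{6 \sum d^2}{N^3 - N}$, where $d = R_1 - R_2$

Tied Ranks: $r_s = 1 - \dfrac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_n^3 - m_n)\right]}{N^3 - N}$, where each m is the number of items sharing a tied rank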
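The unique-ranks formula can be sketched in Python as below, using the three judges' rankings from Assignment 6, question 6. The function name is hypothetical, and the sketch covers untied ranks only; the correction terms for tied ranks are not implemented.

```python
def spearman_unique(r1, r2):
    """1 - 6 * sum(d^2) / (N^3 - N) for two lists of untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


judge_a = [1, 3, 2, 5, 8, 7, 9, 4, 10, 6]
judge_b = [3, 5, 4, 6, 7, 9, 8, 1, 2, 10]
judge_c = [5, 6, 2, 3, 8, 7, 10, 4, 1, 9]

r_ab = spearman_unique(judge_a, judge_b)
r_bc = spearman_unique(judge_b, judge_c)
r_ac = spearman_unique(judge_a, judge_c)
print(round(r_ab, 4), round(r_bc, 4), round(r_ac, 4))  # 0.3455 0.7697 0.2727
```

The pair with the highest coefficient, here judges B and C, has the nearest approach to a common taste.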

Regression

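Equation X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$

Equation Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$

Formulae to Find the Regression Coefficients

Using Actual Mean: $b_{xy} = \dfrac{\sum d_x d_y}{\sum d_y^2}$; $b_{yx} = \dfrac{\sum d_x d_y}{\sum d_x^2}$

Using Assumed Mean: $b_{xy} = \dfrac{N \sum d_x d_y - \sum d_x \sum d_y}{N \sum d_y^2 - (\sum d_y)^2}$; $b_{yx} = \dfrac{N \sum d_x d_y - \sum d_x \sum d_y}{N \sum d_x^2 - (\sum d_x)^2}$

Using Standard Deviation: $b_{xy} = r \cdot \dfrac{\sigma_x}{\sigma_y}$; $b_{yx} = r \cdot \dfrac{\sigma_y}{\sigma_x}$

Coefficient of Correlation: $r = \pm\sqrt{b_{xy} \times b_{yx}}$, where r takes the common sign of the two regression coefficients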
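When only the means, standard deviations and r are known, the regression coefficients come from the standard-deviation form. The Python sketch below applies this to the wheat price figures of Assignment 7, question 5, taking Mysore as X and Bengaluru as Y. The function names and that labelling are assumptions made for this example.

```python
from math import sqrt


def estimate_x_from_y(y, mean_x, mean_y, sd_x, sd_y, r):
    """Regression of X on Y: X = mean_x + bxy * (y - mean_y), bxy = r * sd_x / sd_y."""
    bxy = r * sd_x / sd_y
    return mean_x + bxy * (y - mean_y)


def estimate_y_from_x(x, mean_x, mean_y, sd_x, sd_y, r):
    """Regression of Y on X: Y = mean_y + byx * (x - mean_x), byx = r * sd_y / sd_x."""
    byx = r * sd_y / sd_x
    return mean_y + byx * (x - mean_x)


# X = Mysore, Y = Bengaluru (figures from Assignment 7, question 5).
mean_x, sd_x = 24.63, 3.26
mean_y, sd_y = 27.97, 2.07
r = 0.774

mysore = estimate_x_from_y(23.54, mean_x, mean_y, sd_x, sd_y, r)
bengaluru = estimate_y_from_x(30.5, mean_x, mean_y, sd_x, sd_y, r)
print(round(mysore, 2), round(bengaluru, 3))  # 19.23 30.855

# The product of the two coefficients recovers r squared.
bxy, byx = r * sd_x / sd_y, r * sd_y / sd_x
print(round(sqrt(bxy * byx), 3))  # 0.774
```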

Index Numbers

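Simple Aggregative Method: $P_{01} = \dfrac{\sum p_1}{\sum p_0} \times 100$

Weighted Aggregative Method: $P_{01} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

Simple Average of Price Relatives Method: $P_{01} = \dfrac{\sum I}{n} = \dfrac{\sum \left(\frac{p_1}{p_0} \times 100\right)}{n}$

CPI/CLI: Aggregate Expenditure Method: $P_{01} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

Family Budget Method: $P_{01} = \dfrac{\sum IW}{\sum W}$, where $I = \dfrac{p_1}{p_0} \times 100$ and $W = p_0 q_0$

Fisher’s Ideal Index Number: $P_{01} = \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$

Time Reversal Test: $P_{01} \times P_{10} = \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times \sqrt{\dfrac{\sum p_0 q_1}{\sum p_1 q_1} \times \dfrac{\sum p_0 q_0}{\sum p_1 q_0}} = 1$

Factor Reversal Test: $P_{01} \times Q_{01} = \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times \sqrt{\dfrac{\sum p_0 q_1}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_1 q_0}} = \dfrac{\sum p_1 q_1}{\sum p_0 q_0}$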
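Fisher's ideal index and the two reversal tests can be checked numerically. The Python sketch below uses the 2013 and 2014 prices and expenditures from exercise 17 that precedes this formula sheet; quantities are recovered as expenditure ÷ price. The function names are hypothetical, and the indices are kept as ratios (without the factor 100) so that the tests can be applied directly.

```python
from math import sqrt

# Exercise 17: prices and expenditures for commodities A to D.
p0 = [20, 40, 10, 50]
v0 = [400, 160, 100, 250]
p1 = [50, 80, 20, 100]
v1 = [750, 400, 240, 600]

# Quantity = expenditure / price.
q0 = [v / p for v, p in zip(v0, p0)]
q1 = [v / p for v, p in zip(v1, p1)]


def agg(p, q):
    """Aggregate sum(p * q)."""
    return sum(pi * qi for pi, qi in zip(p, q))


def fisher(pa, qa, pb, qb):
    """Fisher's ideal index of period b on base a, as a ratio (factor 100 omitted)."""
    base_weighted = agg(pb, qa) / agg(pa, qa)     # sum(p1 q0) / sum(p0 q0)
    current_weighted = agg(pb, qb) / agg(pa, qb)  # sum(p1 q1) / sum(p0 q1)
    return sqrt(base_weighted * current_weighted)


p01 = fisher(p0, q0, p1, q1)
p10 = fisher(p1, q1, p0, q0)  # base and current periods interchanged
q01 = fisher(q0, p0, q1, p1)  # quantity index: roles of p and q interchanged

print(round(p01 * 100, 2))                              # 219.12
print(round(p01 * p10, 6))                              # time reversal test: 1.0
print(round(p01 * q01, 6), round(agg(p1, q1) / agg(p0, q0), 6))
```

The last line prints the same number twice, which is the factor reversal test: the product of the price and quantity indices equals the value ratio Σp1q1 / Σp0q0.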


Assignment 1: Classification & Tabulation

1. Prepare a blank table showing the number of persons leaving India for employment in four destinations (USA,

Canada, Australia and the Gulf countries), classified by sex and by the metro of departure

(Mumbai, Kolkata, New Delhi and Chennai).

2. In 2012, the total number of visitors to the Wonder Land, Bangalore, was 25,000. Among them, there

were 8,600 female visitors from India and 6,500 foreign visitors out of which 3,500 were female visitors.

In 2013, the total number of visitors increased by 20% and that of Indian visitors increased by 10%.

Among them, there were 8,000 Indian male visitors and 6,000 foreign female visitors. Tabulate the data.

3. A survey of 370 students from Commerce faculty and 130 students from Science faculty revealed that 180

students were studying for only CA examinations, 140 for only Costing examinations and 80 for both CA

and Costing examinations. The rest opted for Part-time Management courses. Of those studying for

Costing, only 13 were girls and 90 boys belonged to Commerce faculty. Out of 80 studying for both CA and

Costing, 72 were from commerce faculty amongst which 70 were boys. Among those that opted for Part-

time Management courses, 50 boys were from Science faculty, and 30 boys and 10 girls were from

Commerce faculty. In all there were 110 boys in Science faculty. Present the above information in a

tabular form.

4. Prepare a frequency distribution from the following figures relating to bonus paid to workers (₹’000)

67 60 69 70 62 63 69 70 58 56 67 54

55 70 60 60 60 65 70 56 57 58 60 59

61 73 69 67 61 60 59 57

5. The following are the marks of 50 students in Statistics. Construct a suitable frequency table:

28 17 48 57 38 59 28 16 78 46 45 86

21 29 49 61 71 46 49 30 76 37 76 36

37 39 46 27 29 31 21 49 29 8 56 46

5 36 71 42 46 56 16 15 22 35 18 22

46 17

6. 25 values of two variables X and Y are given below. Form a two-way frequency table showing the

relationship between the two:

X 12 24 33 22 44 37 26 36 55 48 27

Y 140 256 360 470 470 380 280 315 420 390 440

X 57 21 51 27 42 43 52 57 44 48 48

Y 390 590 250 550 360 570 290 416 380 392 370

X 42 41 69

Y 312 330 590


Assignment 2: Diagrammatic Representation

1. Represent the following data using a simple bar diagram:

Year 1974 1975 1976 1977 1978 1979 1980 1981

Production (tons) 45 40 44 41 49 42 55 50

2. Present the following data on profit before tax and after tax using multiple bar diagram:

Year 1979 1980 1981 1982 1983

Profit Before Tax (lakh ₹) 190 191 200 109 127

Profit After Tax (lakh ₹) 79 71 90 36 89

3. Represent the cost per scooter using sub-divided bar diagram and percentage sub-divided bar diagram:

Particulars 1979 1980 1981

Raw Material 2,160 2,600 2,700

Labor 540 700 810

Direct Expenses 360 200 360

Factory Expenses 360 300 360

Office Expenses 180 200 270

Total 3,600 4,000 4,500

4. Draw a pie diagram to represent the expenditure (in ₹) of a family:

Food Rent Clothing Education Lighting Miscellaneous Savings

540 180 180 90 40 40 10

5. Present the following data using three variable line graph:

Year 2009 2010 2011 2012 2013

Income (₹ ‘000) 150 180 160 190 170

Expenses (₹ ‘000) 90 100 120 190 200

Profit/loss (₹ ‘000) +60 +80 +40 0 -30


Assignment 3: Measures of Central Tendency

1. Find the mean, median and mode (using the grouping and analysis table) of the following data:

Weight 58 60 61 62 63 64 65 66

No. of Persons 4 12 24 32 32 16 8 2

2. Find the mean using the direct, shortcut and step-deviation methods:

Wages 0 – 20 20 – 40 40 - 60 60 - 80 80 – 100

No. of Workers 82 112 150 95 48

3. Find the mean, median and mode of the following data:

x 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 - 80

f 5 8 7 12 28 20 10 10

4. Find the mean, median and mode of the following data:

CI 4 – 7 8 – 11 12 – 15 16 – 19 20 – 23 24 – 27

Frequency 12 23 40 65 17 3

5. Find the mode of the following data:

Age below 5 10 15 20 25 30 35

No. of persons 24 56 84 100 132 142 150

6. In a firm employing a total of 4000 workers, 20% earn less than ₹4 per hour, 880 earn

from ₹4 to ₹4.24 per hour, 24% earn from ₹4.25 to ₹4.49 per hour, 740 earn from ₹4.50 to ₹4.74 per hour,

12% earn from ₹4.75 to ₹4.99 per hour and the rest earn ₹5 or more per hour. Calculate the median.

7. Find the median and mode of the following data using Ogive curves and Histogram respectively:

Mid values 115 125 135 145 155 165 175 185 195

Frequency 6 25 48 72 116 60 38 22 3

8. Find the missing frequencies, if total frequency is 120 and median is 36.5:

CI 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50 50 – 55 55 – 60

F 8 15 28 - 22 - 4 2

Answers

(1) 62.24, 62, 62 (2) 46.51 (3) 45, 46.43, 46.67 (4) 15.03, 15.81, 16.87 (5) 8.33 (6) 4.33

(7) 153.79, 154.4 (8) 30, 11


Assignment 4: Measures of Variation

1. Find the Interquartile Range, QD, CQD, and MD (using mean and median) from the following data:

Weight 58 59 60 61 62 63 64 65 66

No. of Persons 15 20 32 35 33 22 20 10 8

2. Find the Interquartile Range, QD, CQD, and MD (using mean and median) from the following data:

Value 90 – 99 80 – 89 70 – 79 60 – 69 50 – 59 40 – 49 30 – 39

Frequency 2 12 22 20 14 4 1

3. From the prices of shares X and Y given below, state which share's price is more stable:

X 55 54 53 53 56 68 52 50 51 49

Y 108 107 105 105 106 107 104 103 104 101

4. Find the coefficient of variation from the following data:

Wages up to 60 70 80 90 100 110 120 130

No. of workers 8 24 56 95 136 178 192 200

5. The life of two types of tyres in a sample survey is given below. Which one has a higher average? Based on

consistency, which one would you prefer?

Life (in ‘000 km) 5 – 10 10 – 15 15 – 20 20 – 25 25 – 30 30 – 35

Type A 10 18 32 40 22 18

Type B 18 22 40 32 18 10

6. Given below is the distribution of boys and girls of a school. Find which group is more variable.

Age in Years 13 14 15 16 17

No. of Boys 12 15 15 5 3

No. of Girls 13 10 12 2 1

Answers

(1) QD: 3, 1.5, 0.024, 1.74; MD: 1.74, 0.028, 1.713, 0.028 (2) QD: 18.02, 9.01, 0.132; MD: 10.41, 0.153, 10.437,

0.152 (3) CV(X) = 9.37%, CV(Y) = 1.91% (4) 16.781% (5) CV(A) = 33.35% & CV(B) = 37.13%

(6) CV(X) = 7.855% & CV(Y) = 7.341%


Assignment 5: Measures of Skewness

1. Find Karl Pearson’s and Bowley’s Coefficients of Skewness: 25, 37, 48, 35, 22, 29, 37, 30, 41, 25

2. Find the Pearson’s and Bowley’s Coefficients of Skewness:

Age 12 14 15 18 21 24 26 27 31 33

No. of Persons 8 12 24 20 15 24 18 8 6 4

3. Find the Coefficient of Skewness from the following data using Pearson’s and Bowley’s methods:

Size 7 8 9 10 11 12 13 14

Frequency 2 11 36 64 39 30 22 2

4. Find the Skp and SB from the following data:

Marks Below 80 70 60 50 40 30 20 10

No. of Students 150 136 120 80 70 70 50 10

5. From the data given below, find the coefficient of skewness using both the methods:

X 23 – 27 28 – 32 33 – 37 38 – 42 43 – 47 48 – 52 53 – 57 58 – 62 63 – 67 68 – 72

F 2 6 9 14 32 16 12 6 2 1

6. From the data given below, find Skp and SB:

Marks Above 0 10 20 30 40 50 60 70 80 90

No. of students 100 89 73 64 52 49 32 20 12 5

7. Pearson’s coefficient of skewness is –0.7 and the value of the median and standard deviation are 12.8 and

6 respectively. Determine the value of mean.

8. In a distribution, mean = 65, median = 70 and Skp = – 0.6. Find i) SD, ii) Mode, iii) CV

Answers

(1) 0.155, - 0.1818 (2) – 0.2597, – 0.0909 (3) 0.3665, 1 (4) – 0.7539, –0.3636 (5) 0.0572, 0.0615

(6) – 0.2276, – 0.1858 (7) 11.4 (8) 25, 80, 38.46


Assignment 6: Coefficient of Correlation

1. Calculate Karl Pearson’s Coefficient of Correlation and the probable error for the following data regarding

price and demand of a commodity:

Price 10 28 49 50 70 75 98 100 110 120

Demand 112 110 75 60 55 50 40 30 20 10

2. Find the coefficient of correlation and PE of the following data:

Age 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50 50 – 55 55 – 60 60 – 65

Wages (‘000) 9 10 12 11 16 16 18 17 15

3. From the following data find the coefficient of correlation between average profits and average

advertisement expenditure per shop and interpret the result.

No. of Shops 30 45 14 26 12 16 22 35

Total Profits 60,000 135,000 42,000 52,000 36,000 64,000 66,000 105,000

Advertisement Expenses 3,000 45,000 7,000 13,000 6,000 4,800 8,800 14,000

4. With the following data in 6 cities, calculate the Coefficient of Correlation between the density of

population and death rates.

Cities A B C D E F

Density of Population 200 500 400 700 600 300

Population (‘000) 30 90 40 42 72 24

No. of deaths 300 1440 560 840 1224 312

5. Calculate the Rank Correlation and the Probable Error:

Analyst A 15 18 12 22 15 21 15 27 16 24

Analyst B 16 19 17 21 19 26 12 16 18 20

6. Using Rank Correlation find out which pair of judges have a nearly common taste in fashion design.

Judge A 1 3 2 5 8 7 9 4 10 6

Judge B 3 5 4 6 7 9 8 1 2 10

Judge C 5 6 2 3 8 7 10 4 1 9

Answers

(1) – 0.975 (2) 0.855 (3) 0.141 (4) + 0.988 (5) 0.4182 (6) 0.3455, 0.7697, 0.2727


Assignment 7: Regression

1. The following data relate to the ages of husbands and wives:

Husband’s age 25 28 30 32 35 36 38 39 42 55

Wife’s age 20 26 29 30 25 18 26 35 35 46

Obtain the two regression equations and determine the most likely age of husband when the wife’s age is

25 years.

2. From the following data:

a. Find the two regression equations

b. Estimate the value of X when Y = 20 and the value of Y when X = 30

c. Determine the coefficient of correlation

X 20 24 26 34 36

Y 10 12 14 18 26

3. Find the regression lines for the following data and estimate the value of X when Y = 38.

X 25 28 35 32 36 37 29 39

Y 43 46 49 41 36 32 31 32

4. The heights (in cm) and weights (in kg) of a random sample of 9 adult males are shown below:

Height 177 163 173 182 171 168 174 176 184

Weight 71 67 77 85 69 62 73 78 80

Estimate the height when the weight is 75 and the weight when the height is 180.

5. A study of wheat prices per kg at Mysore and Bengaluru yields the following data:

Mysore Bengaluru

Average Price ₹ 24.63 ₹ 27.97

Standard Deviation ₹ 3.26 ₹ 2.07

Correlation Coefficient: 0.774

Estimate:

a. The price of wheat at Mysore when the price is ₹ 23.54 at Bengaluru.

b. The Price of wheat at Bengaluru when the price is ₹ 30.5 at Mysore.

Answers

(1) 32.6956 years (2) 32; 17.739 (3) 32.839 (4) 175.315 cm; 78.75 kg (5) ₹19.23; ₹30.855


Assignment 8: Index Numbers

1. Calculate the index using Simple Aggregate and Weighted Aggregate Methods:

Commodity  Price 1999 (₹)  Price 2000 (₹)  Quantity 1999

Rice 30 40 10

Wheat 20 30 5

Pulses 40 50 6

Oil 35 40 5

Milk 40 50 10

2. Calculate the price index number for 2002, with 2001 as the base year, from the following data using the simple average of price

relatives method:

Rice Wheat Pulses Oil Milk

Prices – 2001 35 30 25 15 40

Prices – 2002 40 40 35 25 50

3. Calculate the Consumer Price Index or Cost of Living Index Number using Aggregative Index Number and

Family Budget Method:

A B C D E

Quantity – 2004 50 100 60 30 40

Prices – 2004 6 2 4 10 8

Prices – 2009 10 2 6 12 12

4. The group indices and the corresponding weights for the working class cost of living index numbers in an

industrial city for 2009 and 2010 are as follows:

Group  Weight  Group Index 2009  Group Index 2010

Food 71 370 380

Clothing 3 423 504

Fuel 9 469 336

House Rent 7 110 116

Miscellaneous 10 279 283

Compute the cost of living index number for 2009 and 2010. If a worker was getting ₹3000 per month in

2009, should he be given any extra allowance in 2010 so that he can maintain his 2009 standard of living?

Justify your answer.

5. The following table gives the cost of living index numbers for different groups with their respective

weights for the year 1992 (base year 1982). Calculate the overall cost of living index numbers.

If Mr. Bose got ₹550 in 1982, determine how much he should receive in 1992 to maintain the same

standard of living as in 1982.


Food  Clothing  Fuel & Lighting  Housing  Miscellaneous

Cost of Living Index 525 325 240 180 200

Weight 40 16 15 20 9

6. The relative importance (weights) of the following 8 groups of family expenditure is tabulated below. If the corresponding

increases in prices (in %) for February 1992 compared with January 1992 are 25, 1, 22, 18, 14, 13, 20 and 11, calculate

the CPI:

Food Rent Clothing Fuel Household Miscellaneous Services Drinks

348 88 97 65 71 35 70 217

7. Compute Fisher’s Ideal Index Number and show that it satisfies the TRT & FRT:

Commodity  Price 2004 (₹)  Consumption 2004 (kg)  Price 2008 (₹)  Consumption 2008 (kg)

A 8 6 12 4

B 10 8 12 8

C 14 4 18 4

D 4 6 2 10

E 10 10 14 8

8. Compute Fisher’s Ideal Index Number and prove that it satisfies the TRT & FRT:

Commodity A B C D

Price – 2000 2 4 1 5

Price – 2010 5 8 2 10

Value – 2000 40 16 10 25

Value – 2010 75 40 24 60

9. A worker earns ₹750 per month. The cost of living index for January 2009 is known to be 160. Using the data given

below, find the amounts spent by him on food and house rent.

Food Clothing House Rent Fuel & Light Miscellaneous

Expenditure ? 125 ? 100 75

Group Index 190 181 140 118 101

Answers

(1) 127.27 & 127.57 (2) 135.86 (3) 139.71 (4) 353.2 & 351.58; No (5) 352; 1936.00 (6) 117.49

(7) 124.01 (8) 219.12 (9) 300, 150
