STATISTICS SIMPLIFIED
A Quantitative Methods practice book for BBA II Semester Students of Bangalore University
Contains Concepts, Formulae, Exercises and Assignments
Compiled by
Lenin Arumanayagam,
Freelance Faculty
Table of Contents
Sl. No. Title
1 Introduction to Business Statistics
2 Classification and Tabulation
3 Diagrammatic Representation
4 Measures of Central Tendency
5 Measures of Variation
6 Measures of Skewness
7 Correlation & Regression
8 Index Numbers
9 Formulae
10 Assignments
11 University Question Papers
Business Statistics | Concepts and Exercises Page | 1
Chapter 1: Introduction to Business Statistics
Murray R. Spiegel: Statistics is concerned with scientific methods for collecting, organizing, summarizing,
presenting and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the
basis of such analysis.
Characteristics of Statistics
1. Statistics are numerical facts
2. Statistics are aggregate of facts
3. Statistics are affected to a great extent by multiplicity of factors
4. Statistics are either enumerated or estimated with reasonable standard of accuracy
5. Statistics are collected in a systematic manner and for a predetermined purpose
6. Statistics should be capable of being placed in relation to each other
Functions of Statistics
1. Presents facts in simple forms
2. Reduces the complexity of data
3. Facilitates comparison
4. Testing hypothesis
5. Formulation of policies
6. Forecasting and estimating
7. Derives valid inferences
Limitations of Statistics
1. Statistics does not study qualitative phenomena
2. Statistics does not study the individual changes
3. Statistics results are true only in general and on an average
4. Statistics can be misused by ignorant and wrongly motivated persons
5. Statistics does not reveal the entire story
6. Statistics is liable to be misused
Scope of Statistics
Statistics and Planning; Statistics and Business; Statistics and Economics; Statistics and Administration;
Statistics and Business Management; Statistics and Research; Statistics and Mathematics; Statistics and
Science. Scope of Statistics in Business: Marketing; Production; Finance; Banking; Investment; Purchase;
Accounting; Control
Data Collection
Data: Facts and figures collected for a specific purpose, processed and used to help decision-making.
Census: The method of collection of data in which every unit of the population is included. This method is
accurate and reliable but expensive, time consuming and involves much labor.
Sample: A sample is a group of units selected from a larger group (the population) for specific investigation.
Primary Data: Data originally collected for the first time directly from the source using surveys are called
primary data. It may be obtained through direct observation, interviews, questionnaires, etc.
Secondary Data: Data already collected by someone other than the user are called secondary data. They may
be obtained from newspapers, agencies, journals, records, reports, etc.
Chapter 2: Classification & Tabulation of Data
Data is a collection of any number of related observations on one or more variables. Raw data is information
that has not been processed to be made presentable or analyzed by statistical methods.
Classification of data is the process of arranging data into sequences and groups or classes according to their
attributes and/or characteristics. It refers to the sorting of a heterogeneous mass of data into a number of
homogeneous groups and sub-groups.
Tabulation is defined as the orderly or systematic presentation of numerical data in rows and columns,
designed to facilitate comparison between the figures.
Parts of a Table: Table number, Title, Titles of rows, columns, sub-rows and sub-columns, Totals, Footnotes
and Source.
Objectives and Functions of Classification of Data
1. To convert the raw data into organized data
2. To present the complex data into a simple form
3. To facilitate comparison
4. To bring out the uniformity among facts
5. To present data in a condensed form
Types of Classification
Qualitative Classification: a classification in which data are classified according to attributes or qualities.
Generally the qualitative phenomena are not measurable. E.g. Classification based on marital status, gender
etc.
Quantitative Classification: A classification in which data are classified according to quantities that are
measurable such as age, weights, marks, wages, etc.
Other Important Definitions:
Individual Observations/series: Data that are listed as they are observed, collected and recorded. They are in
a raw form and unorganized.
Discrete Classes: Classes in which the data do not progress from one class to the next without a break are called
discrete classes. In other words, they are classes that represent distinct categories or counts.
Continuous Data: Data that may progress from one class to the next without a break and may be expressed in
whole numbers or decimals.
Frequency: Frequency is the number of times each value of the variable occurs in the series. It is the rate of
occurrence of a particular value, thing, or event.
Frequency Distribution: It is the summary of frequency of variables according to their magnitude individually
or in groups.
Cumulative Frequency: It is the total of all the frequencies up to and including the respective class interval
when the class intervals are in the ascending or descending order of values.
Population: A collection of all the elements we are studying and about which we are trying to draw
conclusions.
Sample: A collection of some, but not all, of the elements of the population under study, used to describe the
population.
Two-Way Table: A table which is used to categorize the data based on two attributes.
Exercise 2.1
1. Draw a blank table to present the following information regarding the students of a college according to:
a. Faculty: Arts, Science and Commerce
b. Sex: Boys and Girls
c. Years: 1993 and 1994
d. Age Group: Below 20 years and above 20 years.
2. The total number of accidents in Southern Railway in 1960 was 3,500 and it decreased by 300 in 1961 and
by 700 in 1962. The total number of accidents in meter gauge section showed a progressive increase from
1960 to 1962. It was 245 in 1960, 346 in 1961 and 428 in 1962. In the meter gauge section, the number of
non-compensated cases were 49 in 1960, 77 in 1961, and 108 in 1962. The number of compensated cases
in the broad gauge section were 2,867, 2,687 and 2,152 in those years in order. Tabulate the data.
3. Present the following information in a suitable form supplying the figures not directly given:
In 1975, out of a total of 4,000 workers in a factory, 3,300 were members of a trade union. The number of
women workers employed was 500 out of which 400 did not belong to any union. In 1974, the number of
workers in the union was 3,450 of which 3,200 were men. The number of non-union workers was 760 of
which 330 were women.
4. Following data gives the number of children in 50 families. Construct a suitable frequency table.
4 2 0 2 3 2 2 1 0 2 3 5
1 1 4 2 1 3 4 2 6 1 2 2
2 1 3 4 1 0 2 4 3 0 1 3
6 1 0 1 1 3 4 1 0 1 2 2
2 5 (Answer: 6, 13, 14, 7, 6, 2, 2)
5. Following are the weights of 50 college students in kg. Construct a frequency table.
42 42 46 54 41 37 54 44 38 45 47 50
58 49 51 42 46 37 42 39 54 39 51 58
47 51 43 48 49 48 49 41 41 40 58 49
49 59 57 52 56 38 45 52 46 40 51 41
51 41 (Answer: 6, 13, 14, 11, 6)
6. Following are figures of income (x) and percentage expenditure on food (y) in 25 families. Construct a
bivariate (two-way) frequency table.
X 550 623 310 420 600 225 310 640 512 690
Y 12 14 18 16 15 25 26 20 18 12
X 680 300 425 555 325 202 255 492 587 643
Y 13 25 16 51 23 29 27 18 21 19
X 689 523 317 384 400
Y 11 12 18 17 19
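As an illustration of how a frequency table is built from individual observations, the following minimal Python sketch counts the data of question 4 in Exercise 2.1. The function name frequency_table is an assumption made for this example and is not part of the original text.

```python
from collections import Counter

# Number of children in each of the 50 families (Exercise 2.1, question 4).
children = [
    4, 2, 0, 2, 3, 2, 2, 1, 0, 2, 3, 5,
    1, 1, 4, 2, 1, 3, 4, 2, 6, 1, 2, 2,
    2, 1, 3, 4, 1, 0, 2, 4, 3, 0, 1, 3,
    6, 1, 0, 1, 1, 3, 4, 1, 0, 1, 2, 2,
    2, 5,
]

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value (hypothetical helper)."""
    counts = Counter(values)
    return sorted(counts.items())

table = frequency_table(children)
for value, freq in table:
    print(f"{value} children: {freq} families")
```

The frequencies for 0 to 6 children should agree with the answer printed in the exercise (6, 13, 14, 7, 6, 2, 2).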
Chapter 3: Diagrammatic Representation
Diagrams are visual aids for presenting data in pictures, geometric figures and curves. They present a bird’s
eye view of a huge mass of quantitative data in a condensed and attractive form.
Uses of Diagrams and Graphs
1. They present a bird’s eye view of a huge mass of information
2. They leave a strong impression on the minds of the readers as they are attractive
3. They are easy to understand and take less time to interpret
4. The entire data is visible at a glance
Limitations of Diagrams and Graphs
1. They are useful to laymen, but their utility to experts is limited
2. They fail to furnish details
3. They present data only in a particular range.
4. They are not subject to further mathematical analysis
Types of Diagrams and Graphs:
Note: For examples and sample diagrams, please refer to your textbook.
Line Diagrams: These diagrams are used when there is a large number of values of variable with variations in
their values within a small range
Simple Bar Diagram: These diagrams are suitable for individual observations and time series. The bars have
uniform width.
Multiple Bar Diagram: These diagrams are used when two or more phenomena and a number of attributes
are compared with each other. Different shades may be used to identify the various attributes or periods.
Sub-divided (Component) Bar Diagrams: These diagrams are used when two or more components are present
in a single phenomenon
Sub-divided Percentage Bar Diagrams: These are sub-divided diagrams which are used to depict the values of
variable in percentage. All the bars are equal in height representing the value as 100%.
Pie Charts (Circular Diagram): This is a pictorial representation of statistical data with several components in a
circular form. Pie charts consist of a circle sub-divided into several sectors by radii.
Pictograms: It is a representation in which pictures are used to represent the data. Each full diagram
represents a certain quantity.
Histograms: Histogram is a device of graphical representation of a frequency distribution. It is constructed by
erecting a set of rectangles on each interval on the horizontal axis. The height of the rectangle represents the
frequency of the class interval.
Frequency Polygon: A line graph connecting the midpoints of each class in a data set, plotted at the height
corresponding to the frequency of the class. It can also be drawn by joining the midpoints of the top of the
vertical bars of a histogram.
Frequency Curve: A frequency polygon with smoothed curve to eliminate the accidental irregularities in the
data.
Ogive Curve: This is a graphical representation of cumulative frequency distribution of a continuous series.
There are two types of Ogive Curves: 1. More than Ogive and 2. Less-than Ogive
Exercise 3.1
1. Draw a simple bar diagram from the following data relating to the number of small scale industrial units in
various states in the year 2008
States Karnataka TN Kerala Andhra Maharashtra MP UP
No. of SS Units 10 12 15 15 18 25 22
2. Present the following data of results of BBM students in statistics examination of Bangalore University
held in June 2006, 2007 and 2008 using multiple bar diagram:
Year I Class II Class III Class Failed Total
June 2006 100 300 500 300 1200
June 2007 120 400 600 280 1400
June 2008 100 500 700 300 1600
3. Represent the following data using sub-divided bar diagram and percentage sub-divided bar diagram:
Cost Per Equipment 2006 (₹) 2007 (₹) 2008 (₹)
Raw Material 2,160 2,600 2,700
Labor 540 700 810
Direct Expenses 600 300 350
Factory Expenses 360 200 360
Office Expenses 180 200 270
Total 3,840 4,000 4,490
4. Represent the following figures using line graph:
Year 2003 2004 2005 2006 2007 2008
Exports (lakh ₹) 25 110 80 130 90 150
Imports (lakh ₹) 5 70 110 90 140 130
Balance of Payments +20 +40 -30 +40 -50 +20
5. Draw a pie diagram to represent the following data of investment pattern in the state budget (in ₹ crore):
Agriculture Industry Education Transportation Social Services
600 400 300 450 250
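Before a pie diagram can be drawn, each component has to be converted into a sector angle, taking the total as 360 degrees. The sketch below does this for the budget data of question 5; the function name pie_angles is an assumption made for this example.

```python
def pie_angles(amounts):
    """Convert component values to sector angles in degrees (total = 360)."""
    total = sum(amounts.values())
    # Multiply before dividing so whole-number angles come out exact.
    return {name: value * 360 / total for name, value in amounts.items()}

# Investment pattern in the state budget, in crore rupees (Exercise 3.1, question 5).
budget = {
    "Agriculture": 600,
    "Industry": 400,
    "Education": 300,
    "Transportation": 450,
    "Social Services": 250,
}

angles = pie_angles(budget)
for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

The angles add up to 360, which is a quick check that no component has been left out.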
Chapter 4: Measures of Central Tendency
A statistical average or measure of central tendency is a single number around which the greatest proportion
of the data concentrates.
Characteristics of a Good Measure of Central Tendency
1. It should be well defined.
2. It should be easy to understand and calculate.
3. It should be based on all the observations.
4. It should be capable of further treatment.
5. It should be affected as little as possible by fluctuations of sampling.
6. It should not be affected by extreme values.
Commonly Used Measures of Central Tendency
1. Arithmetic Mean or Simple Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean
Arithmetic Mean
A mathematical representation of the typical value of a series of numbers, computed as the sum of all the
numbers divided by the count of all numbers in the series.
Merits of Arithmetic Mean
1. It is simple to understand and easy to compute.
2. All items are used in calculation.
3. Mean is well defined.
4. It is capable of further algebraic treatment.
5. It is least affected by sampling fluctuations.
6. It is the center of gravity.
7. It is a calculated value and not based on position in the series.
Limitations / Demerits of Arithmetic Mean
1. The value of mean is affected by extreme items.
2. In case of open ended classes, the value of mean cannot be calculated without making assumptions
regarding the size of the interval.
3. It may not be a good measure in some cases, for instance, asymmetrical distributions.
Formulae
Individual Series: $\bar{X} = \frac{\sum x}{n}$; $\bar{X} = A + \frac{\sum d}{n}$
Discrete Series: $\bar{X} = \frac{\sum fx}{N}$; $\bar{X} = A + \frac{\sum fd}{N}$
Continuous Series: $\bar{X} = \frac{\sum fm}{N}$; $\bar{X} = A + \frac{\sum fd}{\sum f}$; $\bar{X} = A + \frac{\sum fd'}{N} \times i$
Weighted Arithmetic Mean: $\bar{X} = \frac{\sum xw}{\sum w}$
Combined Arithmetic Mean: $\bar{X}_{1,2} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
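The direct, shortcut and step-deviation formulae above can be sketched in Python as follows. The function names are assumptions made for this example. The data are taken from questions 4 and 6 of Exercise 4.1, with the assumed mean A chosen arbitrarily, since any value of A gives the same result.

```python
def mean_direct(x, f):
    """Direct method: sum(f*x) / N."""
    n = sum(f)
    return sum(xi * fi for xi, fi in zip(x, f)) / n

def mean_shortcut(x, f, assumed):
    """Shortcut method: A + sum(f*d) / N, where d = x - A."""
    n = sum(f)
    return assumed + sum(fi * (xi - assumed) for xi, fi in zip(x, f)) / n

def mean_step_deviation(mid, f, assumed, width):
    """Step-deviation method: A + (sum(f*d') / N) * i, where d' = (m - A) / i."""
    n = sum(f)
    total = sum(fi * ((mi - assumed) / width) for mi, fi in zip(mid, f))
    return assumed + total / n * width

# Question 4: runs scored (x) and frequency (f).
runs = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freq = [7, 18, 15, 25, 30, 20, 16, 7, 2]
print(mean_direct(runs, freq))
print(mean_shortcut(runs, freq, assumed=50))

# Question 6: class intervals 0-10, ..., 70-80, represented by their mid values.
mid_values = [5, 15, 25, 35, 45, 55, 65, 75]
class_freq = [5, 10, 25, 30, 20, 10, 5, 5]
print(round(mean_step_deviation(mid_values, class_freq, assumed=35, width=10), 2))
```

All three methods are algebraically equivalent; the shortcut and step-deviation forms only reduce the size of the numbers handled in manual calculation.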
Exercise 4.1
1. Find the AM of 5, 8, 10, 15, 24 and 28 (Answer: 15)
2. The wages of 9 workers are: 150, 80, 120, 60, 75, 125, 95, 115, 130. Find the mean wages. (Answer: 105.5)
3. In the city, 30 members were surveyed as to how many domestic appliances they had purchased and the
replies were as under. Prepare a frequency table and find the mean. (Answer: 2.83)
1, 2, 5, 1, 2, 1, 4, 2, 3, 4, 2, 4, 3, 2, 6, 3, 2, 4, 3, 6, 2, 2, 3, 3, 7, 2, 3, 0, 2, 1
4. Find the mean runs scored by a batsman during his career using direct method and shortcut method
(Answer: 46):
x 10 20 30 40 50 60 70 80 90
f 7 18 15 25 30 20 16 7 2
5. Compute the mean of the following data using direct method and shortcut method (Answer: 13.54):
x 9 10 11 12 13 14 15 16 17 18
f 1 2 3 6 10 11 7 3 2 1
6. Calculate the mean from the following data using direct, shortcut and step-deviation methods (Answer:
36.36):
CI 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
f 5 10 25 30 20 10 5 5
7. Calculate the mean wages from the following data (Answer: 73.44):
Wages 48 – 56 56 – 64 64 – 72 72 – 80 80 – 88 88 – 96 96 – 104
No. of Workers 8 3 11 14 5 7 2
8. Calculate the mean from the following data using direct, shortcut and step-deviation methods (Answer:
49.3):
CI 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89
f 5 9 14 20 25 15 8 4
9. Calculate the mean marks from the following data (Answer : 43.7):
Marks Below 10 20 30 40 50 60 70 80 90 100
No. of students 5 12 25 45 70 80 88 92 96 100
10. Calculate the mean sales from the following data (Answer: 28.73):
Sales less than 10 20 30 40 50 60
Frequency 4 20 35 55 62 67
11. A college wanted to give monthly scholarship to B.Com students securing 50% and above marks in the
following manner:
Percentage of Marks 50 – 55 55 – 60 60 – 65 65 – 70 70 – 75
Scholarship (₹) 25 30 35 40 45
The percentage of marks of 20 students who were eligible for scholarship are given below:
52, 62, 51, 71, 54, 53, 51, 50, 57, 64, 56, 54, 69, 63, 65, 59, 58, 68, 57, 62
Calculate the average monthly scholarship payable to the students. (Answer: 31.5)
12. A limited company wants to pay bonus to the members of its staff as under:
Salary (₹ ‘000) 100 - 120 120 – 140 140 – 160 160 – 180 180 – 200 200 - 220 Above 220
Bonus (₹ ‘000) 50 50 70 80 90 100 110
Actual salaries of the members of the staff are as follows, in ₹ ‘000: 200, 180, 185, 195, 218, 187, 160,
250, 198, 190, 168, 170, 178, 175, 140, 120, 148, 165, 155, 145, 125, 110, 162, 130, 150
What is the total bonus paid? What is the average bonus paid per staff? (Answer: 78.4)
13. From the following data of calculation of AM, find the missing value. Mean value is 126.3 (Answer: 120):
x 60 80 100 - 160 180 200
f 5 8 12 22 10 7 6
14. The AM of the following frequency distribution is 67.45 inches. Find the missing frequency. (Answer: 126):
Height (Inches) 60 – 62 63 – 65 66 – 68 69 – 71 72 - 74
F 15 54 ? 81 24
15. The mean of the following data is 67.45. Find the missing frequencies (Answer: 42, 27).
CI 60 – 62 63 – 65 66 – 68 69 – 71 72 - 74 Total
F 5 18 - - 8 100
16. The mean of the following data is 25. Find the missing frequencies (Answer: 10, 10).
x 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 Total
f 5 - 15 - 5 45
17. Find the weighted arithmetic average price of coal purchased by an industry (Answer: 50.36):
Month January February March April May June
Price per ton (₹) 42.50 51.25 50.00 52.00 44.25 54.00
No. of tons 25 30 40 50 10 45
18. The mean weight of 25 male workers in a factory is 63 kg, and the mean weight of 35 female workers in
the same factory is 55 kg. Find the combined average weight of the 60 workers in the factory. (Answer:
58.33)
19. The arithmetic mean of a group of 80 boys is 10 years, and that of second group of 20 boys is 15 years.
Find the arithmetic mean of the two groups taken together. (Answer: 11)
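The weighted and combined arithmetic means used in questions 17 to 19 can be checked with a short Python sketch. The function names are assumptions made for this example; the figures come from the questions themselves.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(x*w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

def combined_mean(groups):
    """Combined mean of several groups given as (size, mean) pairs."""
    total_size = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_size

# Question 17: price per ton weighted by the number of tons purchased.
prices = [42.50, 51.25, 50.00, 52.00, 44.25, 54.00]
tons = [25, 30, 40, 50, 10, 45]
print(round(weighted_mean(prices, tons), 2))

# Question 18: 25 male workers (mean 63 kg) and 35 female workers (mean 55 kg).
print(round(combined_mean([(25, 63), (35, 55)]), 2))

# Question 19: 80 boys (mean 10 years) and 20 boys (mean 15 years).
print(combined_mean([(80, 10), (20, 15)]))
```

The combined mean is simply a weighted mean in which each group mean is weighted by the group size.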
Median
Median is the middle value of the distribution, and therefore it is called the positional average. So, the place of
median in a series is such that, an equal number of items lie on either side of it.
Merits of Median
1. Median is especially useful in case of open ended classes, since it is not necessary that the value of all
items be known.
2. Median is not influenced by extreme values.
3. In a markedly skewed distribution, median is especially useful.
4. The value of median can be determined graphically, whereas the value of mean cannot be graphically
ascertained.
Limitations / Demerits of Median
1. For calculating median it is necessary to arrange the data, whereas, other averages do not need any
arrangement.
2. It is not determined by each and every observation.
3. Median is not capable of further algebraic treatment.
4. It is affected by sampling fluctuations.
5. In some cases the median cannot be computed exactly, whereas the mean can.
Formulae
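Individual Series: $M = \left(\frac{n+1}{2}\right)^{\text{th}}$ term when n is odd, and $M = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{term} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{term}}{2}$ when n is even.
Discrete Series: $M = \left(\frac{n+1}{2}\right)^{\text{th}}$ term
Continuous Series: $M = L + \frac{\frac{N}{2} - c.f.}{f} \times i$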
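The median formulae above can be sketched in Python as follows. The function names are assumptions made for this example, and the continuous-series function assumes exclusive classes of equal width listed in ascending order. The data are taken from questions 3 and 20 of Exercise 4.2.

```python
def median_individual(values):
    """Median of an individual series: middle term, or mean of the two middle terms."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def median_continuous(lower_limits, freqs, width):
    """Median of a continuous series: L + ((N/2 - c.f.) / f) * i."""
    half = sum(freqs) / 2
    cumulative = 0  # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")

# Question 3: an even number of observations.
print(median_individual([36, 5, 19, 26, 6, 28, 56, 18, 63, 4]))

# Question 20: wages 0-20, 20-40, ..., 80-100.
lower = [0, 20, 40, 60, 80]
workers = [82, 112, 150, 95, 48]
print(round(median_continuous(lower, workers, width=20), 1))
```

For inclusive classes such as 10 – 19, the class limits would first have to be converted to class boundaries (9.5 – 19.5) before applying the same formula.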
Exercise 4.2
1. Find the median: 43, 62, 15, 80, 56, 72, 34, 8, 25 (Answer: 43)
2. The wages of 9 workers are: 150, 80, 120, 60, 75, 125, 95, 115, 130. Find the median. (Answer: 115)
3. Find the median: 36, 5, 19, 26, 6, 28, 56, 18, 63, 4 (Answer: 22.5)
4. Find the median: 105, 89, 93, 142, 112, 136, 82, 97, 128, 135, 110, 104 (Answer: 107.5)
5. In a class of 15 students, 5 failed in a test. The marks of those who passed were, 9, 6, 7, 8, 9, 6, 5, 4, 7 and
8. Calculate the median marks of the 15 students.
6. Find the median: (Answer: 40)
x 10 20 30 40 50 60 70 80 90 100
f 10 16 18 13 6 3 8 4 6 8
7. Find the median:
Wages 5 10 15 20 25 30
Frequency 7 12 37 25 22 11
8. Find the median (Answer: 37.7):
Age < 20 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 - 50 > 50
No. of Workers 13 29 46 60 112 94 45 21
9. Calculate the median (Answer: 50.3):
CI 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89
f 5 9 14 20 25 15 8 4
10. Calculate the median (Answer: 42.6):
CI 1 – 10 11 – 20 21 – 30 31 – 40 41 – 50 51 – 60 61 – 70 71 - 80 81 - 90
f 3 7 13 19 14 11 9 9 5
11. Calculate the median marks (Answer: 42):
Marks Below 10 20 30 40 50 60 70 80 90 100
No. of students 5 12 25 45 70 80 88 92 96 100
12. Calculate the median (Answer: 36.25):
Value less than 10 20 30 40 50 60 70 80
No. of students 4 16 40 76 96 112 120 125
13. Calculate the median marks (Answer: 30):
Values above 10 20 30 40 50 60
No. of students 50 40 25 16 10 2
14. Calculate the median (Answer: 44.2):
Marks more than 10 20 30 40 50 60 70 80
Frequency 115 103 88 68 43 23 13 3
15. In a group of 1000 wage earners, the monthly wages of 4% are below ₹60 and those of 15% are under
₹62.50. 15% earned ₹95 and over, and 5% got ₹100 and over. Find the median wage (Answer: 78.75).
16. 10% of the workers in a factory employing a total of 1000 workers, earn between ₹5 and 9.99, 30%
between ₹10 and ₹14.99, 250 workers between ₹15 and 19.99 and the rest ₹20 and above. What is the
median wage? (Answer: 17)
17. Compute the median after amending the table (Answer: 14):
X f x f
Less than 5 7 20 – 25 20
Less than 10 20 25 and above 5
5 – 15 38 30 and above 1
15 and above 35
18. Calculate the median (Answer: 153.79):
Mid values 115 125 135 145 155 165 175 185 195
Frequencies 6 25 48 72 116 60 38 22 3
19. Calculate the median (Answer: 15.81):
Mid values 5.5 9.5 13.5 17.5 21.5 25.5
Frequencies 12 23 40 65 17 3
20. Calculate the median using Ogive curve (Answer: 46.6):
Wages 0 – 20 20 – 40 40 – 60 60 – 80 80 – 100
No. of workers 82 112 150 95 48
21. Locate the median using Ogive curve (Answer: 44):
Marks Less Than 20 30 40 50 60 70
No. of Students 5 13 24 39 52 60
22. Marks of 100 students are given below. If median is 33, find the missing frequencies. (Answer: 17, 16)
Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
No. of Students 12 15 - 20 - 10 10
23. Find the missing frequencies if the value of median is 36.5 and N = 120. (Answer: 30, 11)
Class Interval 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50 50 – 55 55 – 60
Frequencies 8 15 28 - 22 - 4 2
Mode
According to A. M. Tuttle, “Mode is the value which has the greatest frequency in the neighborhood.” Like the
median, the mode is a positional average. The item that is repeated the maximum number of times in the
series is the mode of the series.
Merits of Mode
1. Mode is not affected by extremely large or small items.
2. Mode can be determined in open-ended classes without assuming the class limits.
3. The value of mode can be determined graphically, whereas the value of mean cannot be ascertained graphically.
Limitations / Demerits of Mode
1. The value of mode cannot always be determined. For instance, bi-modal and multi-modal series.
2. Mode is not capable of further algebraic treatment.
3. The value of mode is not based on each item.
4. It is not a rigidly defined measure. So it is the most unstable average.
Formulae
Individual Series: The variable that occurs most frequently.
Discrete Series: The value which has the greatest frequency in the neighborhood.
Continuous Series: Z or $M_0 = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$, where $\Delta_1 = |f_1 - f_0|$ and $\Delta_2 = |f_1 - f_2|$
Bi-modal Class: Z or $M_0 = 3\,\text{Median} - 2\,\text{Mean}$
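The two mode formulae above can be sketched in Python as follows. The function names are assumptions made for this example. The continuous-series function assumes equal class widths and a single class with the highest frequency; it does not handle bi-modal data, for which the second formula is used. The data are taken from questions 7, 16 and 19(c) of Exercise 4.3.

```python
def mode_continuous(lower_limits, freqs, width):
    """Mode of a continuous series: L + (d1 / (d1 + d2)) * i."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # frequency of the preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # frequency of the succeeding class
    d1 = abs(f1 - f0)
    d2 = abs(f1 - f2)
    return lower_limits[k] + d1 / (d1 + d2) * width

def mode_empirical(mean, median):
    """Empirical relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

# Question 7: wages 48-56, 56-64, ..., 96-104.
wage_lower = [48, 56, 64, 72, 80, 88, 96]
wage_freq = [8, 3, 11, 14, 5, 7, 2]
print(mode_continuous(wage_lower, wage_freq, width=8))

# Question 16: class intervals 0-10, ..., 50-60.
print(round(mode_continuous([0, 10, 20, 30, 40, 50], [14, 23, 35, 20, 8, 5], width=10), 2))

# Question 19(c): Mean = 20.2, Median = 22.1.
print(round(mode_empirical(20.2, 22.1), 1))
```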
Exercise 4.3
1. Find the mode: 3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3
2. Find the mode: 54, 66, 42, 64, 44, 86, 104, 94, 100, 80, 72, 64, 64, 44, 64, 72, 54, 54, 48, 52, 50
3. Find the mode: 122, 234, 638, 420, 512, 234, 270, 420, 900, 195, 360
4. Find the mode (Answer: 4):
x 1 2 3 4 5 6
f 2 8 11 18 9 7
5. Compute the mode (Answer: 32):
x 8 16 24 32 40 48
f 2 4 20 19 10 5
6. Calculate the mode (Answer: 8):
x 2 4 6 8 10 12 14
f 6 8 16 16 12 6 4
7. Find the mode (Answer: 74):
Wages 48 – 56 56 – 64 64 – 72 72 – 80 80 – 88 88 – 96 96 – 104
No. of Workers 8 3 11 14 5 7 2
8. Compute the mode (Answer: 52.833):
CI 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89
f 5 9 14 20 25 15 8 4
9. Find the mode (Answer: 11.35):
Attendance below 5 10 15 20 25 30 35 40 45
No. of students 29 224 465 582 634 644 650 653 655
10. Twenty percent of the workers in a firm employing a total of 2000 workers earn less than ₹2.00 per hour,
440 earn from ₹2.00 to ₹2.24 per hour, 24% earn from ₹2.25 to ₹2.49 per hour, 370 earn from ₹2.50 to
₹2.74 per hour, 12% earn from ₹2.75 to ₹2.99 per hour and the rest ₹3.00 or more per hour. Set up a
frequency table and calculate the modal wage. (Answer: 2.3117)
11. Compute the mode (Answer: 40):
CI 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
f 4 12 24 32 32 16 8 2
12. Compute the mode (Answer: 89.5):
CI 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89 90 – 99 100 – 109 110 – 119
f 7 9 10 6 13 10 13 10
13. Find the mode (Answer: 20):
Weight (in Kg) 5 10 15 20 25 30 35 40
No. of persons 8 19 27 45 24 45 22 10
14. Find the mode (Answer: 59.62):
Weight (in Kg) 45 48 52 56 60 64 68 72 76 80
No. of persons 110 116 116 100 96 96 96 84 72 62
15. Locate the mode using Histogram, Frequency polygon and smoothed frequency curve (Answer: 50.71):
Class Interval 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Frequencies 5 30 90 180 250 260 130
16. Locate the mode using Histogram, Frequency polygon and smoothed frequency curve (Answer: 24.44):
Class Interval 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
Frequencies 14 23 35 20 8 5
17. Find the mean, median and mode (Answer: Mean = 151.29, Median = 149.6, Mode = 146.211):
Mid values 115 125 135 145 155 165 175 185 195
Frequencies 120 116 116 100 96 96 96 84 72
18. Find the mean, median and mode:
90, 78, 86, 51, 96, 104, 51, 78, 50, 72, 49, 77, 90, 74, 69, 70, 68, 69, 104, 80, 79, 54, 79, 73, 58, 91, 78, 67,
50, 84, 76, 110, 53, 74, 40, 60, 42, 82, 41, 76, 84, 76, 42, 65, 60, 77, 61, 75, 115, 81
19. a. Z = 50, and M = 45. X̅ = ?
b. X̅ = 12, Z = 13, M = ?
c. If Mean = 20.2, Median = 22.1, find the mode.
Chapter 5: Measures of Variation
Kafka defines measures of variation as, “the measurement of the scatteredness of the mass of figures in a
series about an average.”
Objectives of Measuring Variation
1. To measure exactly the reliability of an average
2. To serve as the basis for the control of variability
3. To compare two or more series with regard to their variability
4. To facilitate the use of other statistical measures
Properties of a Good Measure of Variation
1. It should be simple to understand.
2. It should be easy to compute.
3. It should be well defined.
4. It should be based on each item of the distribution.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should not be affected by extreme values.
Relative and Absolute Measures of Variation
1. Absolute measures of dispersion are expressed in the same statistical unit in which the original data are
given, such as rupees, kg, tons etc. These measures may be used to compare the variation in two
distributions if the variables are expressed in the same units and are of the same average size.
2. If the two sets of data are expressed in different units, such as quintals of sugar versus tons of
sugarcane, or if the average size is very different, such as the manager’s salary versus worker’s wages,
relative measures should be used. A relative measure of dispersion is also called a coefficient of
dispersion.
Some important measures of dispersion are discussed below.
Range
Definition
Range is defined as “The difference between the two extreme items of the distribution” or the difference
between the largest and smallest items of the distribution.
Merits of Range
1. Range is simple to understand
2. It is easy to calculate
3. It gives a quick rather than an accurate picture of variability.
Limitations of Range
1. It is not based on each observation
2. It is affected by extreme values in the series
3. It cannot be calculated for open-ended classes
4. It is highly affected by fluctuations of sampling
Uses of Range
1. It is useful in studying the variations in the prices of shares and stock, gold, jewelry etc.
2. In weather forecasts, range is used to determine the difference between the maximum and minimum
temperature.
3. It is used in industries for statistical quality control.
Interquartile Range & Quartile Deviation
Meaning
Inter-quartile range includes the middle 50% of the distribution. In other words, it represents the difference
between the third quartile and the first quartile.
Merits of Quartile Deviation
1. It is based on 50% of the observations
2. QD can be calculated for open ended classes also, because Q1 and Q3 are positional averages.
3. It is not affected by extreme values.
Limitations of Quartile Deviation
1. It ignores 50% items.
2. It is not a measure of dispersion as it does not show the scatter around an average.
3. It is not capable of further algebraic treatment.
4. It is affected if the central items are irregular.
5. It is highly affected by sampling fluctuations
6. It is not affected by distribution of items outside the two quartiles.
Formulae
Range: L – S (where L = largest value and S = smallest value)
Coefficient of Range: $\frac{L - S}{L + S}$
Interquartile Range: $IQR = Q_3 - Q_1$
Quartile Deviation: $QD = \frac{Q_3 - Q_1}{2}$
Coefficient of Quartile Deviation: $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Quartiles, Individual & Discrete Series: $Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ term; $Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}}$ term
Quartiles, Continuous Series: $Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times i$; $Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times i$
Exercise 5.1
1. Compute the range and coefficient of range of the following series and state which is more dispersed.
a. 13, 14, 15, 16, 17 b. 9, 12, 15, 18, 21 c. 1, 8, 15, 22, 29
2. Find the range and coefficient of range of the following distribution (Answer: 36, 0.75):
x 6 12 18 24 30 36 42
f 7 18 15 25 30 20 16
3. Compute range and coefficient of range of the following series (Answer: 80, 1):
CI 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
F 5 10 25 30 20 10 5 5
4. From the following data, calculate the Quartile Deviation and its Coefficient (Answer: 19.75, 0.339)
30, 43, 48, 89, 54, 25, 84, 61, 67, 37, 72, 80
5. Calculate the Quartile Deviation and its Coefficient from the following data (Answer: 1.5, 0.0244):
X 58 59 60 61 62 63 64 65 66
F 15 20 32 35 33 22 20 10 8
6. Compute the Quartile Deviation and its Coefficient from the following data (Answer: 5, 0.25):
Wages 5 10 15 20 25 30
Frequency 7 12 37 25 22 11
7. Calculate the Quartile Deviation and its Coefficient from the following data (Answer: 5.208, 0.2643):
Wages (₹) 4 – 8 8 – 12 12 – 16 16 – 20 20 – 24 24 – 28 28 – 32 32 – 36 36 – 40
No. of workers 6 10 18 30 15 12 10 6 2
8. Calculate the Quartile Deviation and its Coefficient from the following data (Answer: 2.273, 0.2039):
CI 5 – 7 8 – 10 11 – 13 14 – 16 17 – 19
f 14 24 38 20 4
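The range and quartile deviation calculations for an individual series can be sketched in Python as follows. The function names are assumptions made for this example. Where the quartile position is fractional, the sketch interpolates between the two neighbouring terms; this is an assumption about how fractional positions are handled, although it does reproduce the printed answer to question 4.

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for an individual series."""
    largest, smallest = max(values), min(values)
    return largest - smallest, (largest - smallest) / (largest + smallest)

def quartile(values, k):
    """k-th quartile as the k(n+1)/4-th term, interpolating fractional positions."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / 4  # 1-based position in the ordered series
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def quartile_deviation(values):
    """Return (QD, coefficient of QD)."""
    q1, q3 = quartile(values, 1), quartile(values, 3)
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

# Question 1(b): range and coefficient of range.
print(range_and_coefficient([9, 12, 15, 18, 21]))

# Question 4: quartile deviation and its coefficient.
data = [30, 43, 48, 89, 54, 25, 84, 61, 67, 37, 72, 80]
qd, cqd = quartile_deviation(data)
print(qd, round(cqd, 3))
```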
Mean Deviation
Meaning
It is the average of the absolute differences between the items in a distribution and the mean or median of that series.
Merits of Mean Deviation
1. It is simple to understand and easy to compute.
2. It is based on each item of the data.
3. It is less affected by the values of extreme items than the Standard Deviation.
4. Since deviations are taken from a central value, comparison about the formation of different
distributions can easily be made.
Limitations of Mean Deviation
1. Algebraic signs are ignored.
2. It is not capable of further algebraic treatment.
3. It is rarely used in social science studies.
4. It does not give us accurate results.
Formulae
Mean Deviation: Individual Series: $MD = \frac{\Sigma |D|}{n}$; Discrete and Continuous Series: $MD = \frac{\Sigma f|D|}{N}$
|D|: Individual and Discrete Series: $|x - \bar{x}|$ or $|x - M|$; Continuous Series: $|m - \bar{x}|$ or $|m - M|$
Coefficient of MD (all series): $\frac{MD}{\bar{x}}$ or $\frac{MD}{M}$
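As an illustration of the individual-series formula, the following sketch computes the mean deviation about the mean and about the median for the data of Exercise 5.2, Question 1. The function name `mean_deviation` is an assumption made for this example.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average absolute deviation of the values from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

values = [3000, 4300, 4000, 4800, 4200, 5800, 4600, 4500]
for label, centre in (("mean", mean(values)), ("median", median(values))):
    md = mean_deviation(values, centre)
    # Coefficient of MD = MD divided by the centre it was measured from
    print(label, md, round(md / centre, 4))
```

For this series the mean and the median are both 4400, so both methods give a mean deviation of 525 and a coefficient of 0.1193.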
Exercise 5.2
1. Calculate mean deviation & Coefficient of mean deviation using mean and median (Answer: 0.1193):
3000, 4300, 4000, 4800, 4200, 5800, 4600, 4500
2. Calculate mean deviation & Coefficient of mean deviation using mean and median (Answer: 0.38, 0.43):
90, 280, 65, 60, 50, 120, 100, 110, 70, 80, 75
3. Compute the mean deviation and its coefficient using mean and median (Answer: 7.66, 7.6 & 0.38, 0.38):
x 5 10 15 20 25 30 35 40
f 16 32 36 44 28 18 12 14
4. Compute the mean deviation and its coefficient using mean and median (Answer: 1.53, 1.49 & 0.407,
0.372):
No. of Home Appliances 0 1 2 3 4 5 6 7
No. of Families 14 21 25 43 51 40 39 12
5. Compute the mean deviation and its coefficient using mean and median (Answer: 11.33 & 0.252):
Marks 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
No. of Students 4 6 10 20 10 6 4
6. Compute the mean deviation and its coefficient using mean and median (Answer: 7.6, 7.296 & 0.196,
0.194):
Mid Values 22.5 27.5 32.5 37.5 42.5 47.5 52.5 57.5 62.5
Frequency 6 12 17 28 12 10 8 5 2
7. Compute the mean deviation and its coefficient (Answer: 40.417 & 0.425):
Wages below 25 50 80 110 150 200 300
No. of Workers 4 10 20 40 50 56 60
Standard Deviation
Standard Deviation is the square root of the mean of the squared deviations from the arithmetic mean. For this reason, SD is also known as Root Mean Square Deviation. It is the most widely used measure of variation.
Differences between Mean Deviation and Standard Deviation
1. Algebraic signs are ignored while calculating mean deviation, whereas in the calculation of
standard deviation, signs are taken into account.
2. Mean deviation can be computed either from median or mean; standard deviation is always
calculated from mean.
Merits of Standard Deviation
1. It is based on each item of the data.
2. It is amenable to further algebraic treatment. It is possible to calculate the combined SD of two or
more groups.
3. For comparing the variability of two or more groups, coefficient of variation is considered to be the
most appropriate as it is based on mean and standard deviation.
4. Standard deviation is also used in further statistical work. For example, in calculating skewness,
correlation etc., standard deviation is used.
Limitations of Standard Deviation
1. Standard deviation is difficult to compute compared to other measures.
Formulae
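Direct Method:
Individual Series: $\sigma = \sqrt{\frac{\Sigma d^2}{n}}$, where $d = x - \bar{x}$
Discrete & Continuous Series: $\sigma = \sqrt{\frac{\Sigma f d^2}{N}}$, where $d = x - \bar{x}$ or $m - \bar{x}$
Short-cut Method:
Individual Series: $\sigma = \sqrt{\frac{\Sigma d^2}{n} - \left(\frac{\Sigma d}{n}\right)^2}$, where $d = x - A$
Discrete & Continuous Series: $\sigma = \sqrt{\frac{\Sigma f d^2}{N} - \left(\frac{\Sigma f d}{N}\right)^2}$, where $d = x - A$ or $m - A$
Step-Deviation Method:
Individual Series: -
Discrete & Continuous Series: $\sigma = \sqrt{\frac{\Sigma f d'^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2} \times i$, where $d' = \frac{x - A}{i}$ or $\frac{m - A}{i}$
Variance = $\sigma^2$
Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100$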
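The direct and short-cut methods should give the same standard deviation. A minimal sketch using the data of Exercise 5.3, Question 2 is shown below; the function names and the assumed mean A = 40 are choices made for this example.

```python
from math import sqrt

def sd_direct(values):
    """sigma = sqrt(sum(d^2) / n), with d = x - mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)

def sd_shortcut(values, assumed_mean):
    """sigma = sqrt(sum(d^2)/n - (sum(d)/n)^2), with d = x - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return sqrt(sum(v ** 2 for v in d) / n - (sum(d) / n) ** 2)

values = [5, 10, 20, 25, 40, 42, 45, 48, 70, 80]
sigma = sd_direct(values)
cv = sigma / (sum(values) / len(values)) * 100   # coefficient of variation in %
print(round(sigma, 3), round(sd_shortcut(values, 40), 3), round(cv, 2))  # 23.066 23.066 59.91
```

Any convenient value of A may be used in the short-cut method; the result does not depend on it.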
Exercise 5.3
1. Calculate the standard deviation of the marks of 11 students (Answer: 60.49):
90, 280, 65, 60, 50, 120, 100, 110, 70, 80, 75
2. Calculate the SD and Coefficient of Variation using direct method and shortcut method (Answer: 23.066 &
59.91%):
5, 10, 20, 25, 40, 42, 45, 48, 70, 80
3. Following are the runs scored by two batsmen X and Y in ten innings. Find who is a better scorer and who
is more consistent (Answer: CV(X) = 84.072%; CV(Y) = 82.707%):
X 100 22 0 36 82 45 7 13 65 14
Y 97 12 40 96 13 8 85 8 56 16
4. Compute the coefficient of variation (Answer: 43.63%):
x 10 20 30 40 50 60
f 8 12 20 10 7 3
5. The following table gives the age distribution of boys and girls in a high school. Find which of the two
groups is more variable in age. (Answer: CV(boys) = 7.85%; CV(girls) = 7.34%)
Age 13 14 15 16 17
No. of boys 12 15 15 5 3
No. of girls 13 10 12 2 1
6. The goals scored by teams A and B in a few football matches are as follows. Which team is more
consistent? (Answer: CV(A) = 124.94%; CV(B) = 108.97%)
Goals 0 1 2 3 4
No. of matches – Team A 27 9 8 4 5
No. of Matches – Team B 17 9 6 5 3
7. Compute the variance (Answer: 311.52):
Marks 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79 80 – 89 90 - 99
No. of Students 5 12 15 20 18 10 6 4
8. Compute the coefficient of variation from the following data (Answer: 152.77%):
Profit/Loss - 4 – -3 -3 – -2 -2 – -1 -1 – 0 0 – 1 1 – 2 2 – 3 3 – 4 4 – 5 5 – 6
No. of shops 4 10 22 28 38 56 40 24 18 10
9. Find which class is more consistent in scoring marks, from the following table (Answer: 24.99 & 23.53):
Marks 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Class A 7 10 20 18 7
Class B 5 9 21 15 6
10. The following data relate to the wages of workers in factories A and B. Which factory's wages are more
variable? (Answer: CV(A) = 54.14%; CV(B) = 49.89%)
Wages up to (₹) 5 10 15 20 25 30
No. of workers – A 20 38 68 93 113 128
No. of workers – B 15 35 70 100 118 135
11. The number of employees, average wages per employee, and the variance of wages for two factories are
given below. (Answer: 2.5% and 4.71%)
Factory A Factory B
No. of employees 50 100
Average wages ₹120 ₹85
Variance 9 16
In which factory is there greater variation in the distribution of wages/employees? Which factory pays more?
12. Mean and standard deviation of the following continuous series are 31 and 15.9 respectively. The
distribution after taking step deviations is as follows. Determine the class intervals. (Answer: i = 10, CI = 0
– 10, 10 – 20 etc.).
d' -3 -2 -1 0 1 2 3
f 10 15 25 25 10 10 5
13. a. If x̅ = 56 and Variance = 144, find CV. (Answer: 21.43)
b. If Variance = 16, and CV = 50% find x̅. (Answer: 8)
c. If CV = 58% and x̅ = 36.55, find σ. (Answer: 21.2)
Chapter 6: Measures of Skewness
Skewness is a measure of the asymmetry of a statistical distribution. It characterizes the degree of symmetry or
asymmetry of the distribution around its mean.
Absolute and Relative Measures of Skewness
1. Absolute measures of Skewness
Absolute measure of skewness explains the extent of asymmetry and the direction.
2. Relative Measures of Skewness
Relative measure of skewness is useful for comparative study of two or more series
Symmetrical Distribution
A distribution is symmetrical if the Mean = Median = Mode
A distribution is positively skewed if Mean > Median > Mode
A distribution is negatively skewed if Mean < Median < Mode
Interpretation of coefficient of skewness
If skewness is less than -1 or greater than +1 (Skp < -1 or Skp > +1), the distribution is highly skewed.
If skewness is between -1 and -½ or between +½ and +1 (-1 ≤ Skp ≤ -½ or +½ ≤ Skp ≤ +1), the distribution is
moderately skewed.
If skewness is between -½ and +½ (-½ ≤ Skp ≤ +½), the distribution is approximately symmetric.
Uses of Skewness
1. Skewness is a measure to study whether a distribution is symmetrical or not.
2. Many models assume normal distribution; i.e., data are symmetric about the mean. The normal
distribution has a skewness of zero. But in reality, data points may not be perfectly symmetric. So, an
understanding of the skewness of the dataset indicates whether deviations from the mean are going
to be positive or negative.
Differences between Measures of Variation and Skewness
Dispersion:
1. It is concerned with the amount of dispersion
2. It gives the scatteredness of the observations
3. It does not depend on the skewness
4. It is based on the averages of the first order (Mean, Median and Mode)
Skewness:
1. It tells us about the direction of the variation or departure from the symmetry
2. It indicates to what extent and in what direction the distribution differs from the symmetry
3. It depends on the dispersion to some extent.
4. It is based on the averages of the first order (Mean, Median and Mode) and second order (SD)
Formulae
For unimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \frac{\bar{X} - M_0}{\sigma}$
For bimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \frac{3(\bar{X} - M)}{\sigma}$
Bowley’s Coefficient of Skewness, $S_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
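The three coefficients can be sketched in code as follows. The first call uses the series of Exercise 6.1, Question 1, and the next two use the summary figures of Question 5. The function names are assumptions, and the quartile values passed to `bowley_skewness` are hypothetical figures chosen only to show the call.

```python
from statistics import mean, mode, pstdev

def pearson_skewness_mode(values):
    """Skp = (mean - mode) / sigma, for a series with one clear mode."""
    return (mean(values) - mode(values)) / pstdev(values)

def pearson_skewness_median(mean_value, median_value, sigma):
    """Skp = 3(mean - median) / sigma, used when the mode is ill-defined."""
    return 3 * (mean_value - median_value) / sigma

def bowley_skewness(q1, median_value, q3):
    """SB = (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median_value) / (q3 - q1)

series = [23, 45, 12, 28, 23, 19, 27, 23, 28, 30]
print(round(pearson_skewness_mode(series), 4))   # 0.3453
print(pearson_skewness_median(22, 24, 10))       # -0.6
print(pearson_skewness_median(22, 25, 12))       # -0.75
print(round(bowley_skewness(20, 30, 50), 4))     # hypothetical quartiles: 0.3333
```

In Question 5 the second group has the larger absolute coefficient, so on this measure it is the more skewed of the two.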
Exercise 6.1
1. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: 0.3453 & 0.2):
23, 45, 12, 28, 23, 19, 27, 23, 28, 30
2. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: 0.162 & 0.164):
112, 75, 140, 89, 112, 98, 134, 129, 98, 121, 136
3. Calculate Pearson’s and Bowley’s Coefficients of Skewness (Answer: – 0.2445 & 0):
x 14.5 15.5 16.5 17.5 18.5 19.5 20.5 21.5
f 35 40 48 100 125 87 43 22
4. Compute the two Coefficients of Skewness (Answer: – 0.8761 & -0.2):
x 4 8 12 16 20 24 28 32 36
f 18 21 20 9 7 20 22 17 8
5. Which group is more skewed?
i) Mean = 22; Median = 24, SD = 10 ii) Mean = 22, Median = 25, SD = 12
6. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness. (Answer: –0.0518 & –0.0165)
Class Interval 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
Frequency 6 12 22 48 56 32 18 6
7. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: 0.401 & 0.3750):
Marks Above 0 10 20 30 40 50 60 70 80 90
No. of Students 100 98 95 90 80 50 35 23 13 5
8. Calculate Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: –0.2078 & –0.058):
Mid Value 21 27 33 39 45 51 57
Frequency 18 22 40 50 38 12 4
9. Compute Karl Pearson’s and Bowley’s Coefficients of Skewness (Answer: –0.310 & –0.2314):
CI 3 – 7 8 – 12 13 – 17 18 – 22 23 – 27 28 – 32 33 – 37 38 – 42
f 7 9 10 6 13 10 13 10
10. a. In a distribution, Mean = 65, Median = 70, Skp = – 0.6. Find i) Mode and ii) CV. (Answers: 80, 38.46%)
b. Skp = – 0.7, σ = 6, M = 12.8. Find the Mean and CV. (Answers: 11.4, 52.63%)
Chapter 7: Correlation and Regression
The statistical tool with the help of which the relationship between two or more variables is studied is called
correlation. The measure of correlation is called the Correlation Coefficient.
Uses of Correlation Coefficient
1. Helps us measure the relationship between the variables.
2. If the variables are closely related, we can estimate the value of one variable, given the value of
another with the help of Regression Analysis
3. Helps in analyzing the economic behavior
4. Helps in the study of social science. For example, the relationship between smoking and lung cancer.
Correlation and Causation
1. The correlation may be due to pure chance, especially in a sample. For example, the relationship between
salary and weight.
2. Both the correlated variables may be influenced by one or more other variables. For example, a high degree of
correlation between the yield per acre of rice and wheat may be due to heavy rainfall or fertilizers
used.
3. Both the variables may be mutually influencing each other, so that neither can be designated as
the cause and the other the effect. For example, demand and price.
4. Nonsense / Illusory Correlation: A correlation between two variables that is not due to any causal
relationship but is related to a third variable, or to random sampling fluctuations. For example, global warming
and the number of pirates.
Types of Correlation
1. Positive Correlation or Direct Correlation: When the two variables are directly related, i.e., when one
increases the other also increases, it is said to be positive correlation. For example, supply and price.
2. Negative or Indirect Correlation: When the two variables are inversely related, i.e., when one
increases the other decreases, it is said to be negative correlation. For example, demand and price.
3. Partial Correlation: When three or more variables are involved but the relationship between two of
them is studied while the effect of the others is held constant, it is a case of partial correlation
4. Simple Correlation: When only two variables are studied, it is called simple correlation
5. Multiple Correlation: When three or more variables are studied, it is called multiple correlation
6. Linear Correlation: When the two variables change in a constant ratio, thus forming a straight line,
it is said to be linear correlation
7. Non-linear or Curvilinear Correlation: If the variables, when plotted on a graph, do not form a straight
line, it is said to be curvilinear correlation. In other words, the amount of change in one variable does
not bear a constant ratio to the amount of change in the other variable.
Methods of Determining Correlation
1. Karl Pearson’s Coefficient of Correlation
2. Spearman’s Rank Coefficient of Correlation
3. Concurrent Deviation Method
4. Scatter Diagram method
5. Method of Least Squares
Karl Pearson’s Coefficient of Correlation
This is the most widely used method of measuring correlation. It is popularly known as Pearsonian coefficient
of correlation. It is denoted by the symbol ‘r’.
Assumptions While Using Karl Pearson’s Coefficient of Correlation
While using Karl Pearson’s coefficient of correlation, it is assumed that,
1. The distribution is normal
2. There is cause and effect relationship between the variables.
3. There is a linear relationship between the variables.
Properties of Karl Pearson’s Coefficient of Correlation
1. The value of r always lies between -1 and +1. Interpretation: ±1 – Perfect correlation; ±0.9 to ±1 –
Very high degree; ±0.75 to ±0.9 – High degree; ±0.60 to ±0.75 – Moderate degree; ±0.30 to ±0.60 –
Low degree; 0 to ±0.30 – Very low degree; 0 – No correlation.
2. It is independent of change of scale and origin of X and Y variables.
3. It is the geometric mean of the two regression coefficients: $r = \sqrt{b_{xy} \times b_{yx}}$
Merits of Karl Pearson’s Coefficient of Correlation
1. This is the most popular among the mathematical methods
2. It summarizes in one value the degree of correlation and its direction – direct or inverse.
Limitations of Karl Pearson’s Coefficient of Correlation
1. It assumes a linear relationship.
2. There are chances of misinterpretation.
3. It is more time consuming compared to other methods.
Probable Error
It is the value that helps determine the reliability of the value of the correlation coefficient in the condition of
random sampling. It helps interpret the correlation coefficient.
Methods of Interpretation
1. If r < 6PE, the value of r may not be significant.
2. If r > 6PE, the value of r is significant or practically certain.
3. Using the limits of population, we get the range within which population correlation lies. ρ = r ± PE
Formulae
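Using Actual Mean: $r = \frac{\Sigma d_x d_y}{\sqrt{\Sigma d_x^2 \times \Sigma d_y^2}}$
Using Assumed Mean: $r = \frac{\Sigma d_x d_y - \frac{\Sigma d_x \cdot \Sigma d_y}{N}}{\sqrt{\Sigma d_x^2 - \frac{(\Sigma d_x)^2}{N}} \times \sqrt{\Sigma d_y^2 - \frac{(\Sigma d_y)^2}{N}}}$
Probable Error: $P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$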
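The actual-mean formula and the probable error can be sketched as below, using the internal and external marks of Exercise 7.1, Question 1. The function names `pearson_r` and `probable_error` are assumptions made for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), deviations taken from the actual means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [v - x_bar for v in x]
    dy = [v - y_bar for v in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

internal = [25, 30, 22, 12, 19, 24]
external = [56, 68, 40, 24, 28, 60]
r = pearson_r(internal, external)
pe = probable_error(r, len(internal))
print(round(r, 4))   # 0.9243
print(r > 6 * pe)    # True, so r is treated as significant
```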
Exercise 7.1
1. Compute the coefficient of correlation from the following data: (Ans.: +0.9243)
Internal Marks 25 30 22 12 19 24
External Marks 56 68 40 24 28 60
2. Compute the coefficient of correlation from the following data: (Ans.: +0.6051)
X 6 8 9 14 17 28 24 31 7
Y 10 12 15 15 18 25 22 26 28
3. Compute the coefficient of correlation from the following data: (Ans.: +0.8818)
X 45 55 56 58 60 65 68 70 75 80
Y 56 50 48 60 62 64 65 70 74 82
4. Compute the coefficient of correlation from the following data: (Ans.: – 0.7327)
X 43 44 46 40 44 42 45 42 38 40 42 57
Y 29 31 19 18 19 27 27 29 41 30 26 10
5. Calculate the coefficient of correlation between age and playing habits of students: (Ans.: – 0.9895)
Age 15 16 17 18 19 20
No. of Students 250 200 150 120 100 80
Regular Players 250 150 90 48 30 16
6. The following table gives the distribution of the total population and those blind among them. Calculate the
coefficient of correlation and probable error. (Ans.: 0.898)
Age 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80
No. of Persons (‘000) 100 60 40 36 24 11 6 3
Blind Persons 55 40 40 40 36 22 18 15
7. Calculate ‘r’ between age and failure of candidates in the results of B.Com students: (Ans.: 0.7745)
Age 13 – 14 14 – 15 15 – 16 16 – 17 17 – 18 18 – 19 19 – 20 20 – 21 21 – 22 22 – 23
Candidates Appeared 200 300 100 50 150 400 250 150 25 75
Candidates failed 76 120 35 16 51 148 105 69 13 42
8. a. If r = +0.111 and N = 5, find PE (Answer: 0.2984)
b. If r = +0.9668 and PE = 0.01463, find N (Answer: 9) [Hint: Round off answer to the closest whole number]
c. If PE = 0.11857 and N = 8, find r. (Answer: +0.709)
Spearman’s Rank Correlation
Formulae
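Unique Ranks: $r_s = 1 - \frac{6 \Sigma d^2}{N^3 - N}$, where $d = R_1 - R_2$
Tied Ranks: $r_s = 1 - \frac{6 \left[\Sigma d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots + \frac{1}{12}(m_n^3 - m_n)\right]}{N^3 - N}$, where m = No. of tied ranks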
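Both formulae can be covered by one routine, since the tie correction is zero when all ranks are unique. The sketch below assumes that tied values share the average of the ranks they occupy and that rank 1 goes to the largest value; the function names are hypothetical. It is run on the rankings of Exercise 7.2, Question 1 and on the marks of Question 4.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order; tied values share the average of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Add (m^3 - m) / 12 for every group of m tied values in either series
    correction = sum((m ** 3 - m) / 12
                     for series in (x, y)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

print(round(spearman([1, 3, 2, 7, 6, 4, 5], [2, 1, 4, 6, 7, 3, 5]), 3))   # 0.786
accountancy = [60, 15, 20, 28, 12, 40, 80, 20]
statistics_marks = [10, 40, 30, 50, 30, 20, 60, 30]
print(round(spearman(accountancy, statistics_marks), 3))                  # 0.0
```

When the inputs are already ranks, re-ranking them in descending order reverses every rank in both series, which leaves each squared difference, and therefore the coefficient, unchanged.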
Exercise 7.2
1. Two ladies ranked seven brands of lipsticks as follows. Find the degree of agreement (Ans.: 0.786):
Lady 1 1 3 2 7 6 4 5
Lady 2 2 1 4 6 7 3 5
2. In a beauty competition, two judges ranked 12 participants as follows. What is the degree of agreement
between them? (Ans.: – 0.4546)
X 3 4 1 5 2 10 6 9 8 7 12 11
Y 6 10 12 3 9 2 5 8 7 4 1 11
3. Compute the rank correlation from the following data (Ans.: 0.8322):
X 60 34 40 50 45 41 22 43 42 66 64 46
Y 75 32 35 40 45 33 12 30 36 72 41 57
4. From the marks scored in accountancy and statistics by 8 students, compute rank correlation (Ans.: 0):
Accountancy 60 15 20 28 12 40 80 20
Statistics 10 40 30 50 30 20 60 30
5. Compute the coefficient of rank correlation (Ans.: 0.733):
X 48 33 40 9 16 16 65 24 16 57
Y 13 13 24 6 15 4 20 9 6 19
6. Compute the rank correlation between the length of service and order of merit (Ans.: 0.7937):
Length of Service 5 2 10 8 6 4 12 2 7 5 9 3
Order of Merit 6 12 1 9 8 5 2 10 3 7 4 11
7. Ten competitors in a voice contest are ranked by three judges in the following order. Find which pair of
judges have the nearest approach to common liking in voice (Ans.: -0.212, -0.297, 0.6364; Judges 1 & 3):
Judge 1 1 6 5 10 3 2 4 9 7 8
Judge 2 3 5 8 4 7 10 2 1 6 9
Judge 3 6 4 9 8 1 2 3 10 5 7
Regression
The statistical tool that enables us to estimate or predict the unknown values of one variable from known
values of another variable is called regression.
Correlation vs. Regression
1. Correlation coefficient is a measure of degree of co-variability between two variables, but regression
analysis helps to predict the value of one variable given the value of the other.
2. The cause and effect relation is clearly indicated more through regression analysis than by
correlation, which is more a tool of ascertaining the degree of relationship between the variables.
Formulae
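Equation X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$
Equation Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$
Formulae to Find the Regression Coefficients:
Using Actual Mean: $b_{xy} = \frac{\Sigma d_x d_y}{\Sigma d_y^2}$; $b_{yx} = \frac{\Sigma d_x d_y}{\Sigma d_x^2}$
Using Assumed Mean: $b_{xy} = \frac{N \Sigma d_x d_y - \Sigma d_x \cdot \Sigma d_y}{N \Sigma d_y^2 - (\Sigma d_y)^2}$; $b_{yx} = \frac{N \Sigma d_x d_y - \Sigma d_x \cdot \Sigma d_y}{N \Sigma d_x^2 - (\Sigma d_x)^2}$
Using Standard Deviation: $b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$; $b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}$
Coefficient of Correlation: $r = \sqrt{b_{xy} \times b_{yx}}$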
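A minimal sketch of the actual-mean method, applied to the data of Exercise 7.3, Question 1, is given below. The function name and the returned tuple layout are assumptions made for this example; each equation is rearranged into slope-intercept form.

```python
from math import sqrt

def regression_equations(x, y):
    """Return (bxy, intercept of X on Y, byx, intercept of Y on X)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [v - x_bar for v in x]
    dy = [v - y_bar for v in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    bxy = sum_dxdy / sum(b * b for b in dy)
    byx = sum_dxdy / sum(a * a for a in dx)
    # (X - x_bar) = bxy (Y - y_bar)  becomes  X = bxy*Y + (x_bar - bxy*y_bar), and likewise for Y on X
    return bxy, x_bar - bxy * y_bar, byx, y_bar - byx * x_bar

bxy, a_x, byx, a_y = regression_equations([2, 4, 6, 8, 10], [5, 7, 9, 8, 11])
print(f"X = {bxy:.2f}Y + ({a_x:.2f})")   # X = 1.30Y + (-4.40)
print(f"Y = {byx:.2f}X + ({a_y:.2f})")   # Y = 0.65X + (4.10)
print(round(sqrt(bxy * byx), 4))         # r = 0.9192
```

Taking the positive square root for r is valid here because both regression coefficients are positive; when both are negative, r takes the negative sign.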
Exercise 7.3
1. Find the Regression Equations (Answer: X = 1.3Y – 4.4 & Y = 0.65X + 4.1):
X 2 4 6 8 10
Y 5 7 9 8 11
2. A panel of judges P & Q graded seven dramatic performances by awarding marks as follows. Obtain the
two Regression Equations: (Answer: X = 0.75Y + 14.5 & Y = 0.75X + 5.75)
Performance 1 2 3 4 5 6 7
Marks by P 46 42 44 40 43 41 45
Marks by Q 40 38 36 35 39 37 41
3. Following Table shows the exports of raw cotton and the imports of manufactured goods into India for
seven years.
Exports 42 44 58 55 89 98 60
Imports 56 49 53 58 67 76 58
Obtain the two Regression Equations and estimate the imports when export in a particular year was ₹ 70
crore. (Answer: 62.03; X = 2.198Y – 67.244 & Y = 0.391X + 34.651)
4. The advertisement expenses and sales data of ABC company are as follows:
Advertisement Expenses (₹ Lakh) 60 62 65 70 73 75 71
Sales (₹ Crore) 10 11 13 15 16 19 14
Find:
a. Sales for advertisement expenses of ₹ 80 lakhs. (Answer: ₹ 20.525 Crore)
b. Advertisement expenses for a sales target of ₹ 25 Crore. (Answer: ₹ 87.786 Lakh)
c. Coefficient of Correlation (Answer: 0.9870)
(The Regression Equations are: X = 1.807Y + 42.619 and Y = 0.539X – 22.613)
5. Following data are available on sales and advertisement:
Sales (₹) Advertisement Expenses (₹)
Mean 70,000 15,000
Standard Deviation 15,000 3,000
Coefficient of correlation is +0.8
Find:
a. The two Regression Equations (Answer: X = 4Y + 10,000 & Y = 0.16X + 3,800)
b. The advertisement budget if the company desires to achieve the target sales of ₹ 1,00,000
(Answer: ₹ 19,800)
6. Coefficient of correlation between the ages of brothers and sisters in a community was found to be 0.8.
Average age of the brothers was 25 and that of sisters 22 years. Their variances were 16 and 25
respectively.
Find:
a. The expected age of the brother when sister’s age is 12 years. (Answer: 18.6 years)
b. The expected age of the sister when brother’s age is 33 years. (Answer: 30 years)
(The Regression Equations are: X = 0.64Y + 10.92 and Y = X – 3)
7. a. If r = 0.42, σy = 16.8 and σx = 10.8, find bxy and byx (Answers: 0.269 & 0.653)
b. If bxy = 0.2, r = 0.533 and σx = 5, find σy (Answer: 13.325)
c. If bxy = 2.1 and byx = 0.456, find r (Answer: 0.978)
d. If bxy = 2 and r = 0.578, find byx (Answer: 0.167)
Chapter 8: Index Numbers
An index number is a specialized average designed to measure the change in the level of a phenomenon with
respect to time, geographic location or other characteristics such as income, price, etc.
Features of Index Numbers
1. Index numbers are specialized averages
An average is not a suitable measure for comparing different groups of data if they are expressed in
different units. But index numbers help compare different groups of data even if they are expressed
in different units. For instance, the spending on food, clothing, house rent etc. can be compared
using index numbers.
2. Index numbers measure the change in the level of phenomenon
For instance, if the index of industrial production is 108 in 2012 compared to 100 in 2011, it means
there is a net increase of 8% in industrial production.
3. Index numbers measure the effect of change over a period of time
For instance, BSE index, introduced in 1986, is used to study the movements in the share prices till
date.
Uses of Index Numbers
1. They help in framing suitable policies: For instance, wages and salaries are adjusted based on
Consumer Price Index.
2. They reveal trends and tendencies: For instance, to study the export trend after economic
liberalization in 1991, the current index can be compared with that of 1991.
3. Useful in deflating: Deflation is the process of adjusting original data for price changes. For instance,
nominal income can be adjusted to real income.
Types of Index Numbers
1. Unweighted Index: The method of constructing index numbers in which weights are not assigned to
the items is called Unweighted Index. It includes Simple Aggregative and Simple Average of relatives.
2. Weighted Index: The method of constructing index numbers in which weights are assigned to the
items is called weighted index. It includes Weighted Aggregative and Weighted Average of Relatives.
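The difference between the unweighted and weighted methods can be seen in a short sketch. It assumes the usual definitions: the simple aggregative index is Σp1 / Σp0 × 100, the simple average of relatives is the mean of p1 / p0 × 100, and the weighted aggregative index uses base-year quantities q0 as weights. The function names are hypothetical; the data are taken from Exercise 8.1, Questions 4 and 2 (year 2001).

```python
def simple_aggregative(p0, p1):
    return sum(p1) / sum(p0) * 100

def simple_average_of_relatives(p0, p1):
    relatives = [new / old * 100 for old, new in zip(p0, p1)]
    return sum(relatives) / len(relatives)

def weighted_aggregative(p0, p1, q0):
    """Base-year quantities act as the weights."""
    current = sum(p * q for p, q in zip(p1, q0))
    base = sum(p * q for p, q in zip(p0, q0))
    return current / base * 100

# Question 4: prices in 2008 (base) and 2009
print(round(simple_aggregative([4, 6, 2, 5, 8, 10], [5, 6, 3, 7, 9, 11]), 2))           # 117.14
print(round(simple_average_of_relatives([4, 6, 2, 5, 8, 10], [5, 6, 3, 7, 9, 11]), 2))  # 122.92

# Question 2: prices in 1991 (base) and 2001, quantities consumed in 1991
p_1991 = [11.00, 10.20, 5.00, 6.70, 29.00, 8.80]
p_2001 = [16.50, 12.25, 7.00, 9.00, 32.00, 11.30]
q_1991 = [10, 5, 3, 30, 4, 12]
print(round(weighted_aggregative(p_1991, p_2001, q_1991), 2))                           # 130.45
```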
Some Important Definitions
1. Base Year: The base year is any reference year earlier than the year for which the indices are calculated.
It is used as the reference point for comparing changes in a phenomenon.
2. Fixed Base: Refers to the base year, which remains fixed over a period of time. The fixed base year
serves as a common standard of comparison for all prices during the period.
3. Chain Base: Refers to the base year which changes from year to year. Generally the previous year will
be the base year for calculating the index number for the current year.
4. Consumer Price Index or Cost of Living Index: CPI measures the effect of change in prices of
consumer goods, which may include food, clothing, fuel, lighting, house rent etc., on the
working class families or consumers, during any year with respect to some fixed year.
5. Time Reversal Test: A formula for an index number should maintain time consistency by working
both forward and backward with respect to time. This is called time reversal test. It is expressed in
the form of an equation as follows: P01 x P10 = 1
6. Factor Reversal Test: The index must permit interchanging the prices and quantities without giving
inconsistent results. The two results multiplied together should give a true value ratio. This is given by
the expression: $P_{01} \times Q_{01} = \frac{\Sigma p_1 q_1}{\Sigma p_0 q_0}$
Points to be Considered While Selecting the Base Year
1. It should be a normal year
2. It should not be too distant in the past
3. Fixed base or Chain base
Limitations of Index Numbers
1. Sampling errors
2. It is assumed that the quality of the products remains the same
3. Specific index for specific purpose
4. It is assumed that there is no change in tastes, habits and customs
5. No single formula to calculate the index which may be suitable for all situations
6. Unreliable comparisons over longer periods
7. It is difficult to select a normal year as base year
Fisher’s Ideal Index Number
Fisher’s Index Number is called ideal for the following reasons:
1. It is based on geometric mean which is considered to be the best average for constructing index
numbers
2. It takes into account both, current year as well as base year prices and quantities.
3. It satisfies both Time Reversal Test (TRT) and Factor Reversal Test (FRT).
4. It is free from bias.
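The third point can be checked numerically. The sketch below assumes Fisher's index is the geometric mean of the base-quantity-weighted and current-quantity-weighted aggregative ratios, and tests both reversal properties on the data of Exercise 8.1, Question 13. The function names are hypothetical, and the index is kept as a ratio (not multiplied by 100) so that the tests can be applied directly.

```python
from math import sqrt, isclose

def sum_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def fisher_ratio(p0, q0, p1, q1):
    """Fisher's index as a ratio; multiply by 100 for the index number."""
    base_weighted = sum_product(p1, q0) / sum_product(p0, q0)
    current_weighted = sum_product(p1, q1) / sum_product(p0, q1)
    return sqrt(base_weighted * current_weighted)

p0, q0 = [10, 12, 18, 20, 22], [49, 25, 10, 5, 8]    # 2008 prices and consumption
p1, q1 = [12, 15, 20, 40, 45], [50, 20, 12, 2, 5]    # 2009 prices and consumption

p01 = fisher_ratio(p0, q0, p1, q1)
p10 = fisher_ratio(p1, q1, p0, q0)   # base and current years interchanged
q01 = fisher_ratio(q0, p0, q1, p1)   # prices and quantities interchanged

print(round(p01 * 100, 2))                                            # 134.41
print(isclose(p01 * p10, 1.0))                                        # Time Reversal Test: True
print(isclose(p01 * q01, sum_product(p1, q1) / sum_product(p0, q0)))  # Factor Reversal Test: True
```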
Formulae
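Simple Aggregative Method: $P_{01} = \frac{\Sigma p_1}{\Sigma p_0} \times 100$
Weighted Aggregative Method: $P_{01} = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100$
Simple Average of Price Relatives Method: $P_{01} = \frac{\Sigma I}{n} = \frac{\Sigma \left(\frac{p_1}{p_0} \times 100\right)}{n}$
CPI/CLI: Aggregate Expenditure Method: $P_{01} = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100$
Family Budget Method: $P_{01} = \frac{\Sigma IW}{\Sigma W}$, where $I = \frac{p_1}{p_0} \times 100$ and $W = p_0 q_0$
Fisher’s Ideal Index Number: $P_{01} = \sqrt{\frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1}} \times 100$
Time Reversal Test: $P_{01} \times P_{10} = \sqrt{\frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1}} \times \sqrt{\frac{\Sigma p_0 q_1}{\Sigma p_1 q_1} \times \frac{\Sigma p_0 q_0}{\Sigma p_1 q_0}} = 1$
Factor Reversal Test: $P_{01} \times Q_{01} = \sqrt{\frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1}} \times \sqrt{\frac{\Sigma p_0 q_1}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_1 q_0}} = \frac{\Sigma p_1 q_1}{\Sigma p_0 q_0}$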
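The Family Budget Method can be sketched as follows, using the figures of Exercise 8.1, Question 7 (price relatives weighted by the percentage of expenses) and Question 5 (I and W built from prices and base-year quantities). The function name and the way the dearness allowance is derived from the index, as the rise above 100 applied to the base wage, are assumptions of this sketch.

```python
def family_budget_index(relatives, weights):
    """P01 = sum(I * W) / sum(W)."""
    return sum(i * w for i, w in zip(relatives, weights)) / sum(weights)

# Question 7: price relatives for 2008 and expenditure weights for 2007
relatives = [116, 120, 125, 125, 150]
weights = [35, 15, 20, 10, 20]
cli = family_budget_index(relatives, weights)
allowance = 200 * (cli - 100) / 100          # extra pay needed on a wage of ₹200
print(round(cli, 2), round(allowance, 2))    # 126.1 52.2

# Question 5: I = p1/p0 x 100 and W = p0*q0
p0, p1, q0 = [8, 9, 16], [15, 12, 20], [5, 2, 3]
i_values = [new / old * 100 for old, new in zip(p0, p1)]
w_values = [price * qty for price, qty in zip(p0, q0)]
print(round(family_budget_index(i_values, w_values), 2))   # 150.0
```

With W = p0q0 the product IW reduces to p1q0 × 100, which is why the Family Budget and Aggregate Expenditure methods give the same index.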
Exercise 8.1
1. Calculate the price index for 2006, 2007 and 2008 using the simple aggregative method on the basis of 1995
(Answers: 124.37, 139.42, 153.20):
Commodity Unit 1995 2006 2007 2008
Rice Kg ₹10.50 ₹12.10 ₹14.30 ₹18.60
Wheat Kg 9.25 11.40 12.70 13.40
Milk L 4.75 7.00 9.00 10.50
Sugar Kg 8.60 14.00 16.00 17.00
Oil Kg 27.50 32.00 35.00 36.50
Pulses Kg 11.20 12.80 13.10 14.00
2. Calculate the weighted aggregative index number for the following commodities for the year 2001 and 2008 taking
the year 1991 as the base year (Answers: 130.45, 156.97):
Commodity Units Consumed (1991) Price per unit (₹): 1991 2001 2008
Rice 10 kg 11.00 16.50 18.00
Wheat 5 kg 10.20 12.25 14.00
Grams 3 kg 5.00 7.00 9.00
Milk 30 litres 6.70 9.00 10.50
Oil 4 kg 29.00 32.00 38.00
Sugar 12 kg 8.80 11.30 16.30
3. Calculate the price index numbers for the following data for 2007 and 2008 using simple average of price relative
method (Answers: 147, 196):
Commodity Bricks Timber Board Sand Cement
Prices – 2001 10 20 5 2 7
Prices – 2007 16 21 6 3 14
Prices – 2008 18 22 7 5 21
4. Calculate the index number for the following data using simple average of price relative method (Answer: 122.92)
Commodity A B C D E F
Prices – 2008 4 6 2 5 8 10
Prices – 2009 5 6 3 7 9 11
5. Calculate the Consumer Price Index or Cost of Living Index Number using Aggregative Expenditure Method and Family
Budget Method (Answer: 150):
Item Quantity (2005) Price (2005) Price (2010)
A 5 8 15
B 2 9 12
C 3 16 20
6. Calculate the CPI using Aggregative Expenditure Method and Family Budget Method (Answer: 118.77):
Item Quantity (2008) Price (2008) Price (2009)
A 6 quintals 5.75 6.00
B 6 quintals 5.00 8.00
C 1 quintal 6.00 9.00
D 6 quintals 8.00 10.00
E 4 kg 2.00 1.50
F 1 quintal 20.00 15.00
7. An enquiry into the budgets of middle class families in Bangalore gave the following information:
Commodity Food Rent Clothing Fuel Miscellaneous
Expenses – 2007 35% 15% 20% 10% 20%
Price relatives – 2008 116 120 125 125 150
What changes in the cost of living index of 2008 have taken place as compared to 2007? How much dearness allowance
should be given to a worker who was drawing ₹200 as wages in 2007? (Answers: 126.10 & ₹52.20)
8. Following information relating to workers in an industrial town is given:
Item Food & Beverages Clothing Fuel & Lighting Housing Miscellaneous
Group Index – 2009 (Base 2004) 225 185 150 200 180
Proportion of Expenditure 50% 10% 10% 15% 15%
Average wage per month in 2004 is ₹750. What should be the average wage per worker in 2009 in that town so that
the standard of living of the workers does not fall below that of 2004? (Answers: 203 & ₹1,522.50)
9. An enquiry into the budget of the middle class families in a city gave the following information. What changes in the
cost of living figures of 2005 as compared to that of 2002 are seen? (Answer: 102.75)
Item Percentage Expenses Price (₹) 2002 Price (₹) 2005
Food 29% 140 147
Rent 15% 30 30
Clothing 25% 75 66
Fuel 10% 25 20
Miscellaneous 21% 40 52
10. The data below show the percentage increase in prices of selected food items and the weights attached to each of
them. Calculate the index number for the food group (Answer: 340, 304.6)
Food Item: Rice Wheat Dal Ghee Oil Spices Milk Fish Vegetables Refreshments
Weights 33 11 8 5 5 3 7 9 9 10
Increase in Price % 180 202 115 212 175 517 260 426 332 279
Using the above food index and information given below, calculate the cost of living index number:
Commodity: Food Clothing Lighting Rent Miscellaneous
Index - 310 220 150 300
Weight 60 - 8 9 18
11. The cost of living index number on a certain date was 200. From the base period, the percentage increases in prices
were Rent – 60%, Clothing – 250%, Fuel and lighting – 150%, Miscellaneous – 120%. The weights of different groups
were Food – 60, Rent – 16, Clothing – 12, Fuel and lighting – 8, and Miscellaneous – 4. What was the percentage
increase in the food group? (Answer: 72.67)
12. A textile worker earns ₹350 per month. The cost of living index for that particular month is known to be 136. Using
the data given below, find the amounts spent by him on house rent and clothing (Answer: 42, 49):
Commodity: Food Clothing House Rent Fuel Miscellaneous
Expenditure 140 ? ? 56 63
Group Index 180 150 100 110 80
13. Compute Fisher’s Ideal Index Number and prove that it satisfies the Time Reversal Test and Factor Reversal Test
(Answer: 134.41):
Year Commodity: A B C D E
2008 Price 10 12 18 20 22
Consumption 49 25 10 5 8
2009 Price 12 15 20 40 45
Consumption 50 20 12 2 5
14. Compute Fisher’s Ideal Index Number for the following five items (Answer: 266.615):
Commodity Price (₹) Quantity
2008 2009 2008 2009
A 16 40 100 120
B 4 12 30 20
C 2 4 40 50
D 4 10 20 16
E 2 10 80 60
15. Construct Fisher’s Ideal Index Number and prove that it satisfies TRT & FRT (Answer: 165.71):
Year Item: Rice Sugar Oil
2000 Value 210 100 40
Price 14 20 4
2008 Value 300 108 56
Price 25 27 7
16. Compute Fisher’s Ideal Index Number and prove that it satisfies the TRT and FRT (Answer: 112.10):
Year Item A B C D E
Base Year Price 10 12 20 18 28
Value 200 108 260 144 280
Current Year Value 300 220 250 140 320
Quantity 25 22 10 7 10
17. Compute Fisher’s Ideal Index Number and prove that it satisfies the TRT and FRT (Answer: 219.12):
Year Commodity: A B C D
2013 Price 20 40 10 50
2013 Expenditure 400 160 100 250
2014 Price 50 80 20 100
2014 Expenditure 750 400 240 600
Quantitative Methods – Formulae
Arithmetic Mean
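Direct Method
Individual Series: $\bar{X} = \frac{\Sigma x}{n}$
Discrete Series: $\bar{X} = \frac{\Sigma fx}{N}$
Continuous Series: $\bar{X} = \frac{\Sigma fm}{N}$
Shortcut Method
Individual Series: $\bar{X} = A + \frac{\Sigma d}{n}$; $d = x - A$
Discrete Series: $\bar{X} = A + \frac{\Sigma fd}{N}$; $d = x - A$
Continuous Series: $\bar{X} = A + \frac{\Sigma fd}{N}$; $d = m - A$
Step-Deviation Method
Individual Series: -
Discrete Series: $\bar{X} = A + \frac{\Sigma fd'}{N} \times i$; $d' = \frac{x - A}{i}$
Continuous Series: $\bar{X} = A + \frac{\Sigma fd'}{N} \times i$; $d' = \frac{m - A}{i}$
Weighted Arithmetic Mean: $\bar{X} = \frac{\Sigma xw}{\Sigma w}$
Combined Arithmetic Mean: $\bar{X}_{1,2} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$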
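As a rough illustration of the direct, shortcut and step-deviation methods for a continuous series, the Python sketch below uses the wage data from Assignment 3, Question 2. The assumed mean A = 50 and all variable names are illustrative choices, not part of the original text. All three methods should agree (about 46.51).

```python
# Continuous series from Assignment 3, Question 2 (wages of workers).
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [82, 112, 150, 95, 48]

mids = [(lo + hi) / 2 for lo, hi in classes]  # mid-points m of each class
N = sum(freqs)                                # total frequency

# Direct method: mean = sum(f * m) / N
mean_direct = sum(f * m for f, m in zip(freqs, mids)) / N

# Shortcut method: d = m - A, where A is an assumed mean (illustrative choice)
A = 50
mean_shortcut = A + sum(f * (m - A) for f, m in zip(freqs, mids)) / N

# Step-deviation method: d' = (m - A) / i, where i is the class width
i = 20
mean_step = A + sum(f * (m - A) / i for f, m in zip(freqs, mids)) / N * i

print(round(mean_direct, 2), round(mean_shortcut, 2), round(mean_step, 2))
```

The shortcut and step-deviation forms only simplify hand arithmetic; the choice of A does not change the result.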
Median
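Individual Series: $M = \left(\frac{n+1}{2}\right)^{\text{th}}$ term when n is odd, and $M = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{term} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{term}}{2}$ when n is even.
Discrete Series: $M = \left(\frac{n+1}{2}\right)^{\text{th}}$ term
Continuous Series: $M = L + \frac{\frac{N}{2} - c.f.}{f} \times i$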
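The continuous-series median formula can be sketched in Python as follows, using the data from Assignment 3, Question 3. The function name `grouped_median` is hypothetical, and the sketch assumes the classes are listed in ascending order and are continuous (exclusive) classes.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series: L + ((N/2 - c.f.) / f) * i."""
    N = sum(freqs)
    half = N / 2
    cumulative = 0  # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:
            # This is the median class: L = lower, i = upper - lower
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency distribution")

# Data from Assignment 3, Question 3
classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]

median = grouped_median(classes, freqs)
print(round(median, 2))
```

Here N/2 = 50 falls in the class 40 – 50 (c.f. of preceding classes = 32, f = 28), so the median is 40 + (18/28) × 10.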
Mode
Individual Series: The variable that occurs most frequently.
Discrete Series: The value which has the greatest frequency in the neighborhood.
Continuous Series: $Z$ or $M_0 = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$; $\Delta_1 = |f_1 - f_0|$ and $\Delta_2 = |f_1 - f_2|$
Bi-modal Class: Z or M0 = 3 median − 2 mean
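A minimal Python sketch of the continuous-series mode formula, again using the data from Assignment 3, Question 3. The function name `grouped_mode` is hypothetical; the sketch assumes a single modal class (the class with the highest frequency) and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(classes, freqs):
    """Mode of a continuous series: L + (D1 / (D1 + D2)) * i."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # frequency of preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # frequency of succeeding class
    d1, d2 = abs(f1 - f0), abs(f1 - f2)
    lower, upper = classes[k]
    return lower + d1 / (d1 + d2) * (upper - lower)

# Data from Assignment 3, Question 3
classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]

mode = grouped_mode(classes, freqs)
print(round(mode, 2))
```

If the distribution is bi-modal or ill-defined, the empirical relation 3 median − 2 mean is used instead of this formula.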
Mean Deviation
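Mean Deviation
Individual Series: $MD = \frac{\Sigma |D|}{n}$, where $|D| = |x - \bar{x}|$ or $|x - M|$
Discrete Series: $MD = \frac{\Sigma f|D|}{N}$, where $|D| = |x - \bar{x}|$ or $|x - M|$
Continuous Series: $MD = \frac{\Sigma f|D|}{N}$, where $|D| = |m - \bar{x}|$ or $|m - M|$
Coefficient of MD (all series): $\frac{MD}{\bar{x}}$ or $\frac{MD}{M}$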
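The mean deviation formulas for a discrete series can be sketched in Python as below, using the data from Assignment 4, Question 1. The helper names `discrete_median` and `mean_deviation` are hypothetical, and the values are assumed to be sorted in ascending order.

```python
# Discrete series from Assignment 4, Question 1
values = [58, 59, 60, 61, 62, 63, 64, 65, 66]
freqs = [15, 20, 32, 35, 33, 22, 20, 10, 8]

def discrete_median(values, freqs):
    """Value of the ((N + 1) / 2)-th term, located via cumulative frequencies."""
    position = (sum(freqs) + 1) / 2
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("empty series")

def mean_deviation(values, freqs, centre):
    """Sum of f * |x - centre| divided by N."""
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / sum(freqs)

N = sum(freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / N
median = discrete_median(values, freqs)

md_mean = mean_deviation(values, freqs, mean)      # MD about the mean
md_median = mean_deviation(values, freqs, median)  # MD about the median
coeff_mean = md_mean / mean                        # coefficient of MD (mean)
coeff_median = md_median / median                  # coefficient of MD (median)

print(round(md_mean, 2), round(coeff_mean, 3),
      round(md_median, 3), round(coeff_median, 3))
```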
Range
Range: L – S (Where L = Largest variable and S = Smallest variable)
Coefficient of Range: $\frac{L - S}{L + S}$
Quartile Deviation
Interquartile Range: IQR = Q3 – Q1
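Quartile Deviation: $QD = \frac{Q_3 - Q_1}{2}$
Coefficient of Quartile Deviation: $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Individual & Discrete Series: $Q_1 = \left[\frac{n+1}{4}\right]^{\text{th}}$ term; $Q_3 = \left[\frac{3(n+1)}{4}\right]^{\text{th}}$ term
Continuous Series: $Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times i$; $Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times i$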
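A short Python sketch of the quartile measures for a discrete series, using the data from Assignment 4, Question 1. The helper `term_value` is hypothetical; as a simplifying assumption it returns the first value whose cumulative frequency reaches the required position, which is exact here because both positions are whole numbers.

```python
# Discrete series from Assignment 4, Question 1 (values in ascending order)
values = [58, 59, 60, 61, 62, 63, 64, 65, 66]
freqs = [15, 20, 32, 35, 33, 22, 20, 10, 8]

def term_value(values, freqs, position):
    """Value of the term at the given position, via cumulative frequencies."""
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("position beyond the end of the series")

n = sum(freqs)
q1 = term_value(values, freqs, (n + 1) / 4)      # 49th term
q3 = term_value(values, freqs, 3 * (n + 1) / 4)  # 147th term

iqr = q3 - q1            # interquartile range
qd = iqr / 2             # quartile deviation
cqd = iqr / (q3 + q1)    # coefficient of quartile deviation

print(q1, q3, iqr, qd, round(cqd, 3))
```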
Standard Deviation
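Direct Method
Individual Series: $\sigma = \sqrt{\frac{\Sigma d^2}{n}}$; $d = x - \bar{x}$
Discrete & Continuous Series: $\sigma = \sqrt{\frac{\Sigma fd^2}{N}}$; $d = x - \bar{x}$ or $m - \bar{x}$
Short-cut Method
Individual Series: $\sigma = \sqrt{\frac{\Sigma d^2}{n} - \left(\frac{\Sigma d}{n}\right)^2}$; $d = x - A$
Discrete & Continuous Series: $\sigma = \sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2}$; $d = x - A$ or $m - A$
Step-Deviation Method
Individual Series: -
Discrete & Continuous Series: $\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times i$; $d' = \frac{x - A}{i}$ or $\frac{m - A}{i}$
Variance = $\sigma^2$
Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100$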
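The direct and short-cut methods and the coefficient of variation can be sketched in Python for an individual series, here using the share prices from Assignment 4, Question 3. The function names and the assumed mean A = 50 are illustrative choices. The sketch should give a CV of roughly 9.37% for X and 1.9% for Y, so Y is the more stable share.

```python
from math import sqrt

def sd_direct(values):
    """Direct method: sqrt(sum(d^2) / n) with d = x - mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_shortcut(values, assumed_mean):
    """Short-cut method: sqrt(sum(d^2)/n - (sum(d)/n)^2) with d = x - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

def coefficient_of_variation(values):
    """CV = (sigma / mean) * 100."""
    mean = sum(values) / len(values)
    return sd_direct(values) / mean * 100

# Share prices from Assignment 4, Question 3
share_x = [55, 54, 53, 53, 56, 68, 52, 50, 51, 49]
share_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(share_x)
cv_y = coefficient_of_variation(share_y)
print(round(cv_x, 2), round(cv_y, 2))
```

A lower CV indicates greater consistency, which is why CV rather than the standard deviation alone is used to compare two series with different means.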
Coefficient of Skewness
For unimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \frac{\bar{X} - M_0}{\sigma}$
For bimodal distribution: Karl Pearson’s Coefficient of Skewness, $Sk_p = \frac{3(\bar{X} - M)}{\sigma}$
Bowley’s Coefficient of Skewness, $S_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
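The skewness formulas can also be rearranged to recover a missing quantity. The Python sketch below works through the figures of Assignment 5, Question 8 (mean = 65, median = 70, Skp = −0.6) using the median form of Pearson's coefficient. The function names are hypothetical.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Sk_p = (mean - mode) / sd, for a well-defined mode."""
    return (mean - mode) / sd

def pearson_skewness_median(mean, median, sd):
    """Sk_p = 3 * (mean - median) / sd, used when the mode is ill-defined."""
    return 3 * (mean - median) / sd

def bowley_skewness(q1, median, q3):
    """S_B = (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Figures from Assignment 5, Question 8
mean, median, sk_p = 65, 70, -0.6

sd = 3 * (mean - median) / sk_p   # rearranged from Sk_p = 3(mean - median) / sd
mode = 3 * median - 2 * mean      # empirical relation: mode = 3 median - 2 mean
cv = sd / mean * 100              # coefficient of variation

print(round(sd, 2), mode, round(cv, 2))
```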
Karl Pearson’s Coefficient of Correlation
Using Actual Mean: $r = \frac{\Sigma d_x d_y}{\sqrt{\Sigma d_x^2 \times \Sigma d_y^2}}$
Using Assumed Mean: $r = \frac{\Sigma d_x d_y - \frac{\Sigma d_x \cdot \Sigma d_y}{N}}{\sqrt{\Sigma d_x^2 - \frac{(\Sigma d_x)^2}{N}} \times \sqrt{\Sigma d_y^2 - \frac{(\Sigma d_y)^2}{N}}}$
Probable Error: $P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$
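A minimal Python sketch of the actual-mean formula and the probable error, using the price and demand data from Assignment 6, Question 1. The function names are hypothetical. The result should be close to −0.975, a strong negative correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), deviations from actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

# Data from Assignment 6, Question 1
price = [10, 28, 49, 50, 70, 75, 98, 100, 110, 120]
demand = [112, 110, 75, 60, 55, 50, 40, 30, 20, 10]

r = pearson_r(price, demand)
pe = probable_error(r, len(price))
print(round(r, 3), round(pe, 4))
```

As a common rule of thumb, r is regarded as significant when its magnitude exceeds six times the probable error.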
Spearman’s Rank Correlation
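Unique Ranks: $r_s = 1 - \frac{6\Sigma d^2}{N^3 - N}$, where $d = R_1 - R_2$
Tied Ranks: $r_s = 1 - \frac{6\left[\Sigma d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_n^3 - m_n)\right]}{N^3 - N}$, where m = No. of tied ranks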
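The unique-ranks formula can be sketched in Python using the three judges' rankings from Assignment 6, Question 6. The function name `rank_correlation` is hypothetical, and the sketch covers only untied ranks (no correction term).

```python
def rank_correlation(ranks_1, ranks_2):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (N^3 - N), for ranks without ties."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Rankings from Assignment 6, Question 6
judge_a = [1, 3, 2, 5, 8, 7, 9, 4, 10, 6]
judge_b = [3, 5, 4, 6, 7, 9, 8, 1, 2, 10]
judge_c = [5, 6, 2, 3, 8, 7, 10, 4, 1, 9]

r_ab = rank_correlation(judge_a, judge_b)
r_bc = rank_correlation(judge_b, judge_c)
r_ac = rank_correlation(judge_a, judge_c)
print(round(r_ab, 4), round(r_bc, 4), round(r_ac, 4))
```

The pair with the highest coefficient (here judges B and C) has the nearest approach to a common taste.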
Regression
Equation X on Y: (X − X̅) = bxy (Y − Y̅)
Equation Y on X: (Y − Y̅) = byx (X − X̅)
Formulae to Find the Regression Coefficients
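Using Actual Mean: $b_{xy} = \frac{\Sigma d_x d_y}{\Sigma d_y^2}$; $b_{yx} = \frac{\Sigma d_x d_y}{\Sigma d_x^2}$
Using Assumed Mean: $b_{xy} = \frac{N\Sigma d_x d_y - \Sigma d_x \cdot \Sigma d_y}{N\Sigma d_y^2 - (\Sigma d_y)^2}$; $b_{yx} = \frac{N\Sigma d_x d_y - \Sigma d_x \cdot \Sigma d_y}{N\Sigma d_x^2 - (\Sigma d_x)^2}$
Using Standard Deviation: $b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$; $b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}$
Coefficient of Correlation: $r = \sqrt{b_{xy} \times b_{yx}}$ (r takes the same sign as the two regression coefficients)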
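As a sketch of how the standard-deviation form of the regression coefficients is used for estimation, the Python below works through the wheat price figures of Assignment 7, Question 5, taking X as the Mysore price and Y as the Bengaluru price. The function and variable names are illustrative.

```python
from math import sqrt

# Figures from Assignment 7, Question 5
mean_x, sd_x = 24.63, 3.26   # X: price at Mysore
mean_y, sd_y = 27.97, 2.07   # Y: price at Bengaluru
r = 0.774

b_xy = r * sd_x / sd_y       # regression coefficient of X on Y
b_yx = r * sd_y / sd_x       # regression coefficient of Y on X

def estimate_x(y):
    """Regression equation of X on Y: X - mean_x = b_xy * (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)

def estimate_y(x):
    """Regression equation of Y on X: Y - mean_y = b_yx * (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

print(round(estimate_x(23.54), 2))   # Mysore price when Bengaluru is 23.54
print(round(estimate_y(30.5), 3))    # Bengaluru price when Mysore is 30.5
print(round(sqrt(b_xy * b_yx), 3))   # recovers the magnitude of r
```

Note that X on Y is used to estimate X and Y on X to estimate Y; the two lines coincide only when r = ±1.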
Index Numbers
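Simple Aggregative Method: $P_{01} = \frac{\Sigma p_1}{\Sigma p_0} \times 100$
Weighted Aggregative Method: $P_{01} = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100$
Simple Average of Price Relatives Method: $P_{01} = \frac{\Sigma I}{n} = \frac{\Sigma\left(\frac{p_1}{p_0} \times 100\right)}{n}$
CPI/CLI: Aggregate Expenditure Method: $P_{01} = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100$
Family Budget Method: $P_{01} = \frac{\Sigma IW}{\Sigma W}$, where $I = \frac{p_1}{p_0} \times 100$ and $W = p_0 q_0$
Fisher’s Ideal Index Number: $P_{01} = \sqrt{\frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1}} \times 100$
Time Reversal Test: $P_{01} \times P_{10} = \sqrt{\frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1}} \times \sqrt{\frac{\Sigma p_0 q_1}{\Sigma p_1 q_1} \times \frac{\Sigma p_0 q_0}{\Sigma p_1 q_0}} = 1$
Factor Reversal Test: $P_{01} \times Q_{01} = \sqrt{\frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1}} \times \sqrt{\frac{\Sigma p_0 q_1}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_1 q_0}} = \frac{\Sigma p_1 q_1}{\Sigma p_0 q_0}$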
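Fisher's Ideal Index and the two reversal tests can be checked numerically. The Python sketch below uses the 2008 and 2009 price and consumption figures of five commodities A to E (index of about 134.41). The helper names `total` and `fisher` are hypothetical, and the index is kept as a ratio (not multiplied by 100) while the tests are applied.

```python
from math import sqrt

# Prices and quantities for commodities A to E (base year 2008, current year 2009)
p0 = [10, 12, 18, 20, 22]
q0 = [49, 25, 10, 5, 8]
p1 = [12, 15, 20, 40, 45]
q1 = [50, 20, 12, 2, 5]

def total(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def fisher(pa, qa, pb, qb):
    """Fisher's index of period b relative to period a, as a ratio."""
    base_weighted = total(pb, qa) / total(pa, qa)      # uses period-a weights
    current_weighted = total(pb, qb) / total(pa, qb)   # uses period-b weights
    return sqrt(base_weighted * current_weighted)

p01 = fisher(p0, q0, p1, q1)   # price index, 2009 on 2008
p10 = fisher(p1, q1, p0, q0)   # price index with the periods reversed
q01 = fisher(q0, p0, q1, p1)   # quantity index: roles of p and q swapped

print(round(p01 * 100, 2))
print(round(p01 * p10, 6))                         # Time Reversal Test: equals 1
print(round(p01 * q01, 6),
      round(total(p1, q1) / total(p0, q0), 6))     # Factor Reversal Test: equal
```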
Assignment 1: Classification & Tabulation
1. Prepare a blank table showing the number of persons leaving India to four different countries – USA,
Canada, Australia and to the Gulf countries for employment opportunities, according to sex from the four
metros – Mumbai, Kolkata, New Delhi and Chennai.
2. In 2012, the total number of visitors to the Wonder Land, Bangalore, was 25,000. Among them, there
were 8,600 female visitors from India and 6,500 foreign visitors out of which 3,500 were female visitors.
In 2013, the total number of visitors increased by 20% and that of Indian visitors increased by 10%.
Among them, there were 8,000 Indian male visitors and 6,000 foreign female visitors. Tabulate the data.
3. A survey of 370 students from Commerce faculty and 130 students from Science faculty revealed that 180
students were studying for only CA examinations, 140 for only Costing examinations and 80 for both CA
and Costing examinations. The rest opted for Part-time Management courses. Of those studying for
Costing, only 13 were girls and 90 boys belonged to Commerce faculty. Out of 80 studying for both CA and
Costing, 72 were from commerce faculty amongst which 70 were boys. Among those that opted for Part-
time Management courses, 50 boys were from Science faculty, and 30 boys and 10 girls were from
Commerce faculty. In all there were 110 boys in Science faculty. Present the above information in a
tabular form.
4. Prepare a frequency distribution from the following figures relating to bonus paid to workers (₹’000)
67 60 69 70 62 63 69 70 58 56 67 54
55 70 60 60 60 65 70 56 57 58 60 59
61 73 69 67 61 60 59 57
5. The following are the marks of 50 students in Statistics. Construct a suitable frequency table:
28 17 48 57 38 59 28 16 78 46 45 86
21 29 49 61 71 46 49 30 76 37 76 36
37 39 46 27 29 31 21 49 29 8 56 46
5 36 71 42 46 56 16 15 22 35 18 22
46 17
6. 25 values of two variables X and Y are given below. Form a two-way frequency table showing the
relationship between the two:
X 12 24 33 22 44 37 26 36 55 48 27
Y 140 256 360 470 470 380 280 315 420 390 440
X 57 21 51 27 42 43 52 57 44 48 48
Y 390 590 250 550 360 570 290 416 380 392 370
X 42 41 69
Y 312 330 590
Assignment 2: Diagrammatic Representation
1. Represent the following data using a simple bar diagram:
Year 1974 1975 1976 1977 1978 1979 1980 1981
Production (tons) 45 40 44 41 49 42 55 50
2. Present the following data on profit before tax and after tax using multiple bar diagram:
Year 1979 1980 1981 1982 1983
Profit Before Tax (lakh ₹) 190 191 200 109 127
Profit After Tax (lakh ₹) 79 71 90 36 89
3. Represent the cost per scooter using sub-divided bar diagram and percentage sub-divided bar diagram:
Particulars 1979 1980 1981
Raw Material 2,160 2,600 2,700
Labor 540 700 810
Direct Expenses 360 200 360
Factory Expenses 360 300 360
Office Expenses 180 200 270
Total 3,600 4,000 4,500
4. Draw a pie diagram to represent the expenditure (in ₹) of a family:
Food Rent Clothing Education Lighting Miscellaneous Savings
540 180 180 90 40 40 10
5. Present the following data using three variable line graph:
Year 2009 2010 2011 2012 2013
Income (₹ ‘000) 150 180 160 190 170
Expenses (₹ ‘000) 90 100 120 190 200
Profit/loss (₹ ‘000) +60 +80 +40 0 -30
Assignment 3: Measures of Central Tendency
1. Find the mean, median and mode (Using G & A Table) of the following data:
Weight 58 60 61 62 63 64 65 66
No. of Persons 4 12 24 32 32 16 8 2
2. Find the mean using Direct, shortcut and Step-Deviation methods:
Wages 0 – 20 20 – 40 40 - 60 60 - 80 80 – 100
No. of Workers 82 112 150 95 48
3. Find the mean, median and mode of the following data:
x 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 - 80
f 5 8 7 12 28 20 10 10
4. Find the mean, median and mode of the following data:
CI 4 – 7 8 – 11 12 – 15 16 – 19 20 – 23 24 – 27
Frequency 12 23 40 65 17 3
5. Find the mode of the following data:
Age below 5 10 15 20 25 30 35
No. of persons 24 56 84 100 132 142 150
6. 20% of the workers in a firm employing a total of 4000 workers earn less than ₹4 per hour, 880 earn
from ₹4 to ₹4.24 per hour, 24% earn from ₹4.25 to ₹4.49 per hour, 740 earn from ₹4.50 to ₹4.74 per hour,
12% earn from ₹4.75 to ₹4.99 per hour and the rest earn ₹5 or more per hour. Calculate the median.
7. Find the median and mode of the following data using Ogive curves and Histogram respectively:
Mid values 115 125 135 145 155 165 175 185 195
Frequency 6 25 48 72 116 60 38 22 3
8. Find the missing frequencies, if total frequency is 120 and median is 36.5:
CI 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50 50 – 55 55 – 60
F 8 15 28 - 22 - 4 2
Answers
(1) 62.24, 62, 62 (2) 46.51 (3) 45, 46.43, 46.67 (4) 15.03, 15.81, 16.87 (5) 8.33 (6) 4.33
(7) 153.79, 154.4 (8) 30, 11
Assignment 4: Measures of Variation
1. Find the Interquartile Range, QD, CQD, and MD (using mean and median) from the following data:
Weight 58 59 60 61 62 63 64 65 66
No. of Persons 15 20 32 35 33 22 20 10 8
2. Find the Interquartile Range, QD, CQD, and MD (using mean and median) from the following data:
Value 90 – 99 80 – 89 70 – 79 60 – 69 50 – 59 40 – 49 30 – 39
Frequency 2 12 22 20 14 4 1
3. From the prices of shares X and Y given below, state which share price is more stable:
X 55 54 53 53 56 68 52 50 51 49
Y 108 107 105 105 106 107 104 103 104 101
4. Find the coefficient of variation from the following data:
Wages up to 60 70 80 90 100 110 120 130
No. of workers 8 24 56 95 136 178 192 200
5. The life of two types of tyres in a sample survey is given below. Which one has a higher average? Based on
consistency, which one would you prefer?
Life (in ‘000 km) 5 – 10 10 – 15 15 – 20 20 – 25 25 – 30 30 – 35
Type A 10 18 32 40 22 18
Type B 18 22 40 32 18 10
6. Given below is the distribution of boys and girls of a school. Find which group is more variable.
Age in Years 13 14 15 16 17
No. of Boys 12 15 15 5 3
No. of Girls 13 10 12 2 1
Answers
(1) QD: 3, 1.5, 0.024, 1.74; MD: 1.74, 0.028, 1.713, 0.028 (2) QD: 18.02, 9.01, 0.132; MD: 10.41, 0.153, 10.437,
0.152 (3) CV(X) = 9.37%, CV(Y) = 1.91% (4) 16.781% (5) CV(A) = 33.35% & CV(B) = 37.13%
(6) CV(X) = 7.855% & CV(Y) = 7.341%
Assignment 5: Measures of Skewness
1. Find Karl Pearson’s and Bowley’s Coefficients of Skewness: 25, 37, 48, 35, 22, 29, 37, 30, 41, 25
2. Find the Pearson’s and Bowley’s Coefficients of Skewness:
Age 12 14 15 18 21 24 26 27 31 33
No. of Persons 8 12 24 20 15 24 18 8 6 4
3. Find the Coefficient of Skewness from the following data using Pearson’s and Bowley’s methods:
Size 7 8 9 10 11 12 13 14
Frequency 2 11 36 64 39 30 22 2
4. Find the Skp and SB from the following data:
Marks Below 80 70 60 50 40 30 20 10
No. of Students 150 136 120 80 70 70 50 10
5. From the data given below, find the coefficient of skewness using both the methods:
X 23 – 27 28 – 32 33 – 37 38 – 42 43 – 47 48 – 52 53 – 57 58 – 62 63 – 67 68 – 72
F 2 6 9 14 32 16 12 6 2 1
6. From the data given below, find Skp and SB:
Marks Above 0 10 20 30 40 50 60 70 80 90
No. of students 100 89 73 64 52 49 32 20 12 5
7. Pearson’s coefficient of skewness is –0.7 and the value of the median and standard deviation are 12.8 and
6 respectively. Determine the value of mean.
8. In a distribution, mean = 65, median = 70 and Skp = – 0.6. Find i) SD, ii) Mode, iii) CV
Answers
(1) 0.155, - 0.1818 (2) – 0.2597, – 0.0909 (3) 0.3665, 1 (4) – 0.7539, –0.3636 (5) 0.0572, 0.0615
(6) – 0.2276, – 0.1858 (7) 11.4 (8) 25, 80, 38.46
Assignment 6: Coefficient of Correlation
1. Calculate Karl Pearson’s Coefficient of Correlation and the probable error for the following data regarding
price and demand of a commodity:
Price 10 28 49 50 70 75 98 100 110 120
Demand 112 110 75 60 55 50 40 30 20 10
2. Find the coefficient of correlation and PE of the following data:
Age 20 – 25 25 – 30 30 – 35 35 – 40 40 – 45 45 – 50 50 – 55 55 – 60 60 – 65
Wages (‘000) 9 10 12 11 16 16 18 17 15
3. From the following data find the coefficient of correlation between average profits and average
advertisement expenditure per shop and interpret the result.
No. of Shops 30 45 14 26 12 16 22 35
Total Profits 60,000 135,000 42,000 52,000 36,000 64,000 66,000 105,000
Advertisement Expenses 3,000 45,000 7,000 13,000 6,000 4,800 8,800 14,000
4. With the following data in 6 cities, calculate the Coefficient of Correlation between the density of
population and death rates.
Cities A B C D E F
Density of Population 200 500 400 700 600 300
Population (‘000) 30 90 40 42 72 24
No. of deaths 300 1440 560 840 1224 312
5. Calculate the Rank Correlation and the Probable Error:
Analyst A 15 18 12 22 15 21 15 27 16 24
Analyst B 16 19 17 21 19 26 12 16 18 20
6. Using Rank Correlation find out which pair of judges have a nearly common taste in fashion design.
Judge A 1 3 2 5 8 7 9 4 10 6
Judge B 3 5 4 6 7 9 8 1 2 10
Judge C 5 6 2 3 8 7 10 4 1 9
Answers
(1) – 0.975 (2) 0.855 (3) 0.141 (4) + 0.988 (5) 0.4182 (6) 0.3455, 0.7697, 0.2727
Assignment 7: Regression
1. The following data relate to the ages of husbands and wives:
Husband’s age 25 28 30 32 35 36 38 39 42 55
Wife’s age 20 26 29 30 25 18 26 35 35 46
Obtain the two regression equations and determine the most likely age of husband when the wife’s age is
25 years.
2. From the following data:
a. Find the two regression equations
b. Estimate the value of X when Y = 20 and the value of Y when X = 30
c. Determine the coefficient of correlation
X 20 24 26 34 36
Y 10 12 14 18 26
3. Find the regression lines for the following data and estimate the value of X when Y = 38.
X 25 28 35 32 36 37 29 39
Y 43 46 49 41 36 32 31 32
4. The heights (in cm) and weights (in kg) of a random sample of 9 adult males are shown below:
Height 177 163 173 182 171 168 174 176 184
Weight 71 67 77 85 69 62 73 78 80
Estimate the height when the weight is 75 and the weight when the height is 180.
5. A study of wheat prices per kg at Mysore and Bengaluru yields the following data:
Mysore Bengaluru
Average Price ₹ 24.63 ₹ 27.97
Standard Deviation ₹ 3.26 ₹ 2.07
Correlation Coefficient: 0.774
Estimate:
a. The price of wheat at Mysore when the price is ₹ 23.54 at Bengaluru.
b. The price of wheat at Bengaluru when the price is ₹ 30.5 at Mysore.
Answers
(1) 32.6956 years (2) 32; 17.739 (3) 32.839 (4) 175.315 cm; 78.75 kg (5) ₹19.23; ₹30.855
Assignment 8: Index Numbers
1. Calculate the index using Simple Aggregate and Weighted Aggregate Methods:
Commodity Price 1999 (₹) Price 2000 (₹) Quantity 1999
Rice 30 40 10
Wheat 20 30 5
Pulses 40 50 6
Oil 35 40 5
Milk 40 50 10
2. Calculate the price index numbers for the following data for 2007 and 2008 using simple average of price
relative method:
Rice Wheat Pulses Oil Milk
Prices – 2001 35 30 25 15 40
Prices – 2002 40 40 35 25 50
3. Calculate the Consumer Price Index or Cost of Living Index Number using Aggregative Index Number and
Family Budget Method:
A B C D E
Quantity – 2004 50 100 60 30 40
Prices – 2004 6 2 4 10 8
Prices – 2009 10 2 6 12 12
4. The group indices and the corresponding weights for the working class cost of living index numbers in an
industrial city for 2009 and 2010 are as follows:
Group Weight Group Index 2009 Group Index 2010
Food 71 370 380
Clothing 3 423 504
Fuel 9 469 336
House Rent 7 110 116
Miscellaneous 10 279 283
Compute the cost of living index number for 2009 and 2010. If a worker was getting ₹3000 per month in
2009, should he be given any extra allowance in 2010 so that he can maintain his 2009 standard of living?
Justify your answer.
5. The following table gives the cost of living index numbers for different groups with their respective
weights for the year 1992 (base year 1982). Calculate the overall cost of living index numbers.
If Mr. Bose got ₹550 in 1982, determine how much he should receive in 1992 to maintain the same
standard of living as in 1982.
Group: Food Clothing Fuel & Lighting Housing Miscellaneous
Cost of Living Index 525 325 240 180 200
Weight 40 16 15 20 9
6. The relative importance of the following 8 groups of family expenditure is tabulated below. If the corresponding
increase in prices (in %) for February, 1992 compared to January 1992, are 25, 1, 22, 18, 14, 13, 20 and 11, calculate
the CPI:
Food Rent Clothing Fuel Household Miscellaneous Services Drinks
348 88 97 65 71 35 70 217
7. Compute Fisher’s Ideal Index Number and show that it satisfies the TRT & FRT:
Commodity Price 2004 (₹) Consumption 2004 (kg) Price 2008 (₹) Consumption 2008 (kg)
A 8 6 12 4
B 10 8 12 8
C 14 4 18 4
D 4 6 2 10
E 10 10 14 8
8. Compute Fisher’s Ideal Index Number and prove that it satisfies the TRT & FRT:
Particulars Year A B C D
Price 2000 2 4 1 5
Price 2010 5 8 2 10
Value 2000 40 16 10 25
Value 2010 75 40 24 60
9. A worker earns ₹750 per month. The cost of living index for January, 2009 is known to be 160. Using the data given
below, find the amounts spent by him on food and house rent.
Food Clothing House Rent Fuel & Light Miscellaneous
Expenditure ? 125 ? 100 75
Group Index 190 181 140 118 101
Answers
(1) 127.27 & 127.57 (2) 135.86 (3) 139.71 (4) 353.2 & 351.58; No (5) 352; 1936.00 (6) 117.49
(7) 124.01 (8) 218.046 (9) 300, 150
Business Statistics | University Question Papers Page | 57