statistics digital text book

Statistics Deepu Krishnan R

Uploaded by deepuplr on 20 June 2015



PREFACE

Mathematics forms an integral part of everyday life. We have to teach it with freshness and variety to make it meaningfully applicable to life. Statistics helps you interpret data in your daily life and make good decisions! For example, is it possible to eat too much grapefruit? Is it safer for your brain cells to use a headset when you talk on the phone? Can an online profile help you get a job? What steps can you take during college to increase your future salary? I cannot claim that all the material in this book is my own; I have learned the subject from many excellent books. This textbook is designed to meet the everyday requirements of school students and general readers of mathematics.

Suggestions for improvement are welcome.

The Author

Contents

1. Measures of Central Tendency

1.1 Mean
1.2 Median
1.3 Mode

2. Measures of Dispersion

2.1 Variance
2.2 Standard deviation

3. Central Moments

3.1 Skewness
3.2 Kurtosis

Unit 1

Measures of Central Tendency

Introduction


Measures of central tendency are statistical measures that describe the position of a distribution. They are also called statistics of location, and are the complement of statistics of dispersion, which describe how spread out the observations are. In the univariate context, the mean, median and mode are the most commonly used measures of central tendency. In short, they are values computed from a distribution that describe the behaviour of its centre.

Measures of Central Tendency

The value or figure that represents the whole series is neither the lowest nor the highest value in the series; it lies somewhere between these two extremes. The average represents all the measurements made on a group and gives a concise description of the group as a whole. When two or more groups are measured, the central tendency provides a basis for comparing them.

1.1 Mean

In mathematics, mean has several different definitions depending on the context. In probability and statistics, mean and expected value are used synonymously to refer to one measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution.[1] In the case of a discrete probability distribution of a random variable X, the mean is equal to the sum over every possible value weighted by the probability of that value; that is, it is computed by taking the product of each possible value x of X and its probability P(x), and then adding all these products together, giving

$$\mu = E[X] = \sum_{x} x\,P(x).$$

An analogous formula applies to the case of a continuous probability distribution. Not every probability distribution has a defined mean; see the Cauchy distribution for an example. Moreover, for some distributions the mean is infinite: for example, when the probability of the value $2^n$ is $2^{-n}$ for n = 1, 2, 3, …, every term $2^n \cdot 2^{-n}$ of the sum equals 1, so the sum grows without bound.
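As a small illustration of the discrete formula, the sketch below computes $E[X] = \sum x P(x)$ for a fair six-sided die (an assumed example, not from the text) using exact fractions, and then shows that the partial sums of the infinite-mean example simply count the terms. The function name `expected_value` is hypothetical.

```python
from fractions import Fraction

def expected_value(distribution):
    """Return E[X], the sum of x * P(x), for a dict mapping each value x to P(x)."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())

# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2

# Infinite-mean example: value 2**n with probability 2**-n.
# Each term 2**n * 2**-n equals 1, so the partial sum of the first N terms is N.
for n_terms in (10, 100, 1000):
    partial = sum(Fraction(2**n) * Fraction(1, 2**n) for n in range(1, n_terms + 1))
    print(n_terms, partial)
```

The partial sums print as 10, 100 and 1000, which is why no finite mean exists for that distribution.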

For a data set, the terms arithmetic mean, mathematical expectation, and sometimes average are used synonymously to refer to a central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values. The arithmetic mean of a set of numbers $x_1, x_2, \ldots, x_n$ is typically denoted by $\bar{x}$, pronounced "x bar". If the data set is based on a series of observations obtained by sampling from a statistical population, the arithmetic mean is termed the sample mean (denoted $\bar{x}$) to distinguish it from the population mean (denoted $\mu$ or $\mu_x$).

Types of mean

In mathematics, the three classical Pythagorean means are the arithmetic mean (A), the geometric mean (G), and the harmonic mean (H). They are defined as follows.

Arithmetic mean

The most common type of average is the arithmetic mean. If n numbers are given, each number denoted by $a_i$, where i = 1, …, n, the arithmetic mean is the sum of the $a_i$'s divided by n, or

$$A = \frac{1}{n}\sum_{i=1}^{n} a_i = \frac{a_1 + a_2 + \cdots + a_n}{n}.$$

The arithmetic mean, often simply called the mean, of two numbers such as 2 and 8 is obtained by finding a value A such that 2 + 8 = A + A. One finds that A = (2 + 8)/2 = 5. Switching the order of 2 and 8 to read 8 and 2 does not change the resulting value obtained for A. The mean 5 is neither less than the minimum 2 nor greater than the maximum 8. If we increase the number of terms in the list to three, namely 2, 8 and 11, the arithmetic mean is found by solving for the value of A in the equation 2 + 8 + 11 = A + A + A. One finds that A = (2 + 8 + 11)/3 = 7.
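The worked examples above can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is our own, and the standard library's `statistics.mean` gives the same results.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

print(arithmetic_mean([2, 8]))      # 5.0
print(arithmetic_mean([8, 2]))      # 5.0 (order does not matter)
print(arithmetic_mean([2, 8, 11]))  # 7.0
```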

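Methods of calculating the arithmetic mean (for a frequency distribution with values x and frequencies f):

• Direct method: $\bar{x} = \dfrac{\sum f x}{\sum f}$

• Short-cut (assumed mean) method: $\bar{x} = A + \dfrac{\sum f d}{\sum f}$, where A is an assumed mean and $d = x - A$

• Step deviation method: $\bar{x} = A + \dfrac{\sum f u}{\sum f} \times h$, where h is the common class width (or common factor) and $u = (x - A)/h$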
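The three methods are algebraically equivalent, so they must give the same answer. The sketch below checks this on a small hypothetical frequency table; the data, the assumed mean A = 30 and the class width h = 10 are illustrative choices, not taken from the text.

```python
def direct_mean(xs, fs):
    """Direct method: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def shortcut_mean(xs, fs, a):
    """Short-cut method: A + sum(f * d) / sum(f), where d = x - A."""
    return a + sum(f * (x - a) for x, f in zip(xs, fs)) / sum(fs)

def step_deviation_mean(xs, fs, a, h):
    """Step deviation method: A + (sum(f * u) / sum(f)) * h, where u = (x - A) / h."""
    return a + sum(f * (x - a) / h for x, f in zip(xs, fs)) / sum(fs) * h

# Hypothetical data: values x with frequencies f.
xs = [10, 20, 30, 40, 50]
fs = [2, 3, 5, 4, 1]

print(f"{direct_mean(xs, fs):.4f}")                  # 29.3333
print(f"{shortcut_mean(xs, fs, 30):.4f}")            # 29.3333
print(f"{step_deviation_mean(xs, fs, 30, 10):.4f}")  # 29.3333
```

The short-cut and step deviation methods only make hand arithmetic easier; on a computer the direct method is usually the simplest choice.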

Geometric mean

The geometric mean of n non-negative numbers is obtained by multiplying them all together and then taking the nth root. In algebraic terms, the geometric mean of $a_1, a_2, \ldots, a_n$ is defined as

$$G = \sqrt[n]{a_1 a_2 \cdots a_n}.$$

The geometric mean can be thought of as the antilog of the arithmetic mean of the logs of the numbers.

Example: the geometric mean of 2 and 8 is $\sqrt{2 \times 8} = \sqrt{16} = 4$.
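Both descriptions of the geometric mean, the nth root of the product and the antilog of the mean of the logs, can be compared directly. A minimal sketch (Python 3.8 or later for `math.prod`); the function names are our own, and the standard library also offers `statistics.geometric_mean`.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n non-negative numbers."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_via_logs(values):
    """Antilog of the arithmetic mean of the natural logs (positive numbers only)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geometric_mean([2, 8]))                     # 4.0
print(round(geometric_mean_via_logs([2, 8]), 9))  # 4.0
```

The log form avoids overflow when many large numbers are multiplied, which is one reason it is often preferred in practice.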

Harmonic mean

The harmonic mean of a non-empty collection of numbers $a_1, a_2, \ldots, a_n$, all different from 0, is defined as the reciprocal of the arithmetic mean of the reciprocals of the $a_i$'s:

$$H = \frac{n}{\dfrac{1}{a_1} + \dfrac{1}{a_2} + \cdots + \dfrac{1}{a_n}}.$$

One example where the harmonic mean is useful is when examining the speed for a number of fixed-distance trips. For example, if the speed for going from point A to B was 60 km/h, and the speed for returning from B to A was 40 km/h, then the harmonic mean speed is given by

$$H = \frac{2}{\dfrac{1}{60} + \dfrac{1}{40}} = 48 \text{ km/h}.$$
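The harmonic mean gives the true average speed here because both legs cover the same distance. The sketch below checks this against total distance divided by total time; the 120 km one-way distance is an assumed value, and any positive distance gives the same answer.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (all values non-zero)."""
    return len(values) / sum(1 / v for v in values)

distance = 120                                # km each way (assumed)
total_time = distance / 60 + distance / 40    # 2 h out + 3 h back
average_speed = 2 * distance / total_time     # total distance / total time

print(average_speed)                          # 48.0
print(round(harmonic_mean([60, 40]), 6))      # 48.0
```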

Inequality concerning AM, GM, and HM

A well-known inequality concerning the arithmetic, geometric and harmonic means of any set of positive numbers is

$$A \ge G \ge H,$$

with equality if and only if all the numbers are equal. It is easy to remember by noting that the alphabetical order of the letters A, G and H is preserved in the inequality (see the inequality of arithmetic and geometric means).

Thus for the above harmonic mean example: AM = 50, GM ≈ 49 and HM = 48 km/h.
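Python's standard `statistics` module (3.8 or later) provides all three means, so the inequality can be checked for the speed example. This is an illustration for one data set, not a proof.

```python
import statistics

speeds = [60, 40]  # km/h, from the harmonic mean example
am = statistics.fmean(speeds)
gm = statistics.geometric_mean(speeds)
hm = statistics.harmonic_mean(speeds)

print(round(am, 2), round(gm, 2), round(hm, 2))  # 50.0 48.99 48.0
assert am >= gm >= hm  # AM >= GM >= HM holds for this data
```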

Problems

1. Calculate the AM, GM and HM of the following frequency distribution.

x 15 12 15 23 14 17 18 19 20 16

f 4 3 2 3 5 4 1 2 7 8
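For a frequency distribution each value x is counted f times, which gives the weighted forms $AM = \sum f x / \sum f$, $GM = \operatorname{antilog}\left(\sum f \log x / \sum f\right)$ and $HM = \sum f / \sum (f/x)$. A minimal sketch that applies them to the data of Problem 1 (the value 15 appears twice in the table; the code keeps both entries as given):

```python
import math

x = [15, 12, 15, 23, 14, 17, 18, 19, 20, 16]
f = [4, 3, 2, 3, 5, 4, 1, 2, 7, 8]
n = sum(f)  # total frequency

am = sum(fi * xi for xi, fi in zip(x, f)) / n                      # weighted arithmetic mean
gm = math.exp(sum(fi * math.log(xi) for xi, fi in zip(x, f)) / n)  # antilog of mean log
hm = n / sum(fi / xi for xi, fi in zip(x, f))                      # weighted harmonic mean

print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
```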

1.2 Median

Median is a central value of the distribution, or the value which divides the distribution into two equal parts, each part containing an equal number of items. Thus it is the central value of the variable when the values are arranged in order of magnitude.

Connor defines it as follows: “The median is that value of the variable which divides the group into two equal parts, one part comprising all values greater, and the other all values less than the median.”

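Calculation of median: discrete series

1. Arrange the data in ascending or descending order.

2. Calculate the cumulative frequencies.

3. Apply the formula: Median = size of the $\left(\frac{N+1}{2}\right)$th item, where $N = \sum f$. The median is the value whose cumulative frequency first equals or exceeds $\frac{N+1}{2}$.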
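The cumulative-frequency method can be sketched as below. When N is even, the position (N + 1)/2 falls halfway between two items; this sketch then averages the two middle items, a small refinement of the rule above. The function name and the sample data are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def median_discrete(values, freqs):
    """Median of a discrete frequency distribution using cumulative frequencies."""
    pairs = sorted(zip(values, freqs))           # step 1: ascending order
    xs = [x for x, _ in pairs]
    cf = list(accumulate(f for _, f in pairs))   # step 2: cumulative frequencies
    n = cf[-1]

    def item(k):
        # Value of the k-th item (1-based): the first value whose cf >= k.
        return xs[bisect_left(cf, k)]

    # Step 3: the ((N + 1) / 2)-th item; for even N, average the two middle items.
    return (item((n + 1) // 2) + item((n + 2) // 2)) / 2

print(median_discrete([10, 20, 30, 40], [3, 5, 8, 4]))  # 30.0
print(median_discrete([1, 2], [1, 1]))                  # 1.5
```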

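Calculation of median: continuous series

For calculation of the median in a continuous frequency distribution, the following formula is employed. Algebraically,

$$\text{Median} = L + \frac{\frac{N}{2} - cf}{f} \times h,$$

where L is the lower limit of the median class (the first class whose cumulative frequency equals or exceeds N/2), N is the total frequency, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and h is the width of the median class.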
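A minimal sketch of this interpolation formula, assuming classes of equal width h described by their lower limits. The function name, class intervals and frequencies in the example are hypothetical.

```python
def median_continuous(lower_limits, width, freqs):
    """Median = L + ((N/2 - cf) / f) * h for a grouped (continuous) series."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= half:  # first class whose cumulative frequency reaches N/2
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("median class not found")

# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]
print(round(median_continuous(lower_limits, 10, freqs), 4))  # 25.8333
```

Here N = 40, the median class is 20-30, cf = 13 and f = 12, so the median is 20 + (20 − 13)/12 × 10.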

Advantages of Median:

• The median can be calculated in all distributions.

• The median can be understood even by common people.

• The median is not unduly affected by extreme items.

• It can be located graphically.

• It is most useful when dealing with qualitative data.

Disadvantages of Median:

• It is not based on all the values.


• It is not capable of further mathematical treatment.

• It is affected by fluctuations of sampling.

• When the number of values is even, the median may not be one of the values in the data.

1.3 Mode

Mode is the most frequent value or score in the distribution. It is defined as that value of the item in a series which occurs most frequently.

The median, by contrast, can also be found without a formula: order the list according to its elements' magnitude and then repeatedly remove the pair consisting of the highest and lowest values until either one or two values are left. If exactly one value is left, it is the median; if two values are left, the median is their arithmetic mean. This method takes the list 1, 7, 3, 13 and orders it to read 1, 3, 7, 13. Then the 1 and 13 are removed to obtain the list 3, 7. Since there are two elements in this remaining list, the median is their arithmetic mean, (3 + 7)/2 = 5.
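Both ideas in this section can be sketched in a few lines. The `modes` helper returns every value tied for the highest frequency, since a distribution may have more than one mode (the standard library's `statistics.multimode` behaves the same way); `median_by_pair_removal` follows the pair-removal procedure literally. Both names and the sample list used for the mode are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

def median_by_pair_removal(values):
    """Sort, then drop the lowest and highest values until one or two remain."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("the median of an empty list is undefined")
    while len(ordered) > 2:
        ordered = ordered[1:-1]  # remove the current lowest and highest
    if len(ordered) == 1:
        return ordered[0]
    return (ordered[0] + ordered[1]) / 2

print(modes([3, 7, 7, 2, 9, 7, 3]))           # [7]
print(median_by_pair_removal([1, 7, 3, 13]))  # 5.0
```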

Advantages of Mode:

• Mode is readily comprehensible and easily calculated

• It is often the most typical value of the data.

• It is not at all affected by extreme values.

• The value of mode can also be determined graphically.


• It is usually an actual value of an important part of the series.

Disadvantages of Mode:

• It is not based on all observations.

• It is not capable of further mathematical manipulation.

• Mode is affected to a great extent by sampling fluctuations.

• The choice of grouping has a great influence on the value of the mode.

Unit 2

Measures of Dispersion

Introduction

Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other.

The more similar the scores are to each other, the lower the measure of dispersion will be.

The less similar the scores are to each other, the higher the measure of dispersion will be.

In general, the more spread out a distribution is, the larger the measure of dispersion will be.

There are three main measures of dispersion:

1. The range

2. The semi-interquartile range (SIR)

3. Variance / standard deviation

2.1 Variance

The variance is the average of the squared deviations from the mean (as opposed to the average of the absolute deviations).

The variance is the usual measure of dispersion in statistical theory, but it has a drawback when researchers want to describe the dispersion of data in a practical way: because the deviations are squared, the variance is expressed in squared units (for example, marks squared) rather than in the units of the original data.

To calculate the variance:

1. Find the mean of the data. (Hint: the mean is the average, so add up the values and divide by the number of items.)

2. Subtract the mean from each value; the result is called the deviation from the mean.

3. Square each deviation from the mean.

4. Add up the squared deviations.

5. Divide the total by the number of items.

The variance formula uses Sigma notation, where Σ represents the sum of all the items to the right of it:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n},$$

where x is each value, μ is the mean and n is the number of items.
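The five steps translate directly into code. A minimal sketch on a small hypothetical data set; it divides by n as described above, which matches the standard library's `statistics.pvariance` (`statistics.variance` instead divides by n − 1, the usual sample variance).

```python
def population_variance(values):
    """Variance following the steps above (divides by the number of items, n)."""
    n = len(values)
    mean = sum(values) / n                   # step 1: mean
    deviations = [x - mean for x in values]  # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]   # step 3: square each deviation
    total = sum(squared)                     # step 4: add them up
    return total / n                         # step 5: divide by n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(population_variance(data))  # 4.0
```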

2.2 Standard deviation

Standard deviation shows the variation in data. If the data are close together, the standard deviation will be small; if the data are spread out, the standard deviation will be large. It is the square root of the variance.

Standard deviation is often denoted by the lowercase Greek letter sigma, σ.

The standard deviation formula can be represented using Sigma notation:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

Problem: The math test scores of five students are 92, 88, 80, 68 and 52. Find the variance and standard deviation.
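The problem can be checked with Python's standard `statistics` module; `pvariance` and `pstdev` divide by n, matching the formulas above. By hand, the mean is 76 and the squared deviations (256, 144, 16, 64 and 576) sum to 1056, so the variance is 1056/5.

```python
import math
import statistics

scores = [92, 88, 80, 68, 52]

variance = statistics.pvariance(scores)  # divides by n, as in the formula above
std_dev = statistics.pstdev(scores)      # square root of the population variance

print(variance)                                    # 211.2
print(round(std_dev, 2))                           # 14.53
print(math.isclose(std_dev, math.sqrt(variance)))  # True
```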

Unit 3

Central Moments

Introduction

Central moments: the rth central moment is the average of the deviations of all observations in a data set from the mean of the observations, each raised to the power r:

$$\mu_r = \frac{1}{n}\sum (X - m)^r$$

In the previous equation, n is the number of observations, X is the value of each individual observation, m is the arithmetic mean of the observations, and r is a positive integer.

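The first four central moments are the most commonly used:

The first central moment, r = 1, is the average of the differences of each observation from the sample average (arithmetic mean), which always equals 0.

The second central moment, r = 2, is the variance. The third central moment, r = 3, is the basis of skewness, and the fourth, r = 4, is the basis of kurtosis; dividing them by $\sigma^3$ and $\sigma^4$ respectively gives unit-free coefficients.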
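The definition of the rth central moment can be sketched directly from the formula. The function name and the data are hypothetical; the output confirms that the first central moment is 0 and the second is the variance.

```python
def central_moment(values, r):
    """r-th central moment: the average of (X - m) ** r, where m is the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** r for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean m = 5
for r in range(1, 5):
    print(r, central_moment(data, r))
# 1 0.0
# 2 4.0
# 3 5.25
# 4 44.5
```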

3.1 Skewness

Skewness describes how the sample differs in shape from a symmetrical distribution. A normal distribution has a skewness of 0; a right-skewed distribution has a skewness greater than 0, and a left-skewed distribution has a skewness less than 0.

Negatively skewed distributions, skewed to the left, occur when most of the scores are toward the high end of the distribution. In a normal distribution, where the skewness is 0, the mean, median and mode are equal. In a negatively skewed distribution, typically mode > median > mean.

Positively skewed distributions, skewed to the right, occur when most of the scores are toward the low end of the distribution.

In a positively skewed distribution, typically mode < median < mean.
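One common way to turn the third central moment into a unit-free skewness coefficient is to divide it by the cube of the standard deviation, $\mu_3 / \sigma^3$. A minimal sketch on hypothetical data; the positive result agrees with the ordering mode < median < mean.

```python
import math
import statistics

def central_moment(values, r):
    """Average of (X - m) ** r, where m is the arithmetic mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** r for x in values) / len(values)

def moment_skewness(values):
    """Moment coefficient of skewness: mu_3 / sigma ** 3, with sigma = sqrt(mu_2)."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(moment_skewness(data))      # 0.65625 (positive, so right skewed)
print(statistics.mode(data), statistics.median(data), statistics.mean(data))  # 4 4.5 5
```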


3.2 Kurtosis

Kurtosis is based on the 4th central moment.

It is traditionally described as the “peakedness” of a distribution.

It measures the extent to which the data are distributed in the tails versus the center of the distribution.

There are three types of peakedness:

Leptokurtic – very peaked

Platykurtic – relatively flat

Mesokurtic – in between

Measured as excess kurtosis ($\mu_4/\sigma^4 - 3$, which is 0 for a normal distribution):

Mesokurtic has a kurtosis of 0.

Leptokurtic has a positive kurtosis.

Platykurtic has a negative kurtosis.
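The zero point in this classification corresponds to excess kurtosis, $\mu_4 / \sigma^4 - 3$, which is 0 for a normal distribution. A minimal sketch; the function names, the tolerance used to treat a value as zero, and the data are hypothetical.

```python
def central_moment(values, r):
    """Average of (X - m) ** r, where m is the arithmetic mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** r for x in values) / len(values)

def excess_kurtosis(values):
    """mu_4 / mu_2 ** 2 - 3, which is 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def kurtosis_type(k, tol=1e-9):
    """Classify an excess kurtosis value; tol is an assumed tolerance for 'zero'."""
    if k > tol:
        return "leptokurtic"
    if k < -tol:
        return "platykurtic"
    return "mesokurtic"

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
k = excess_kurtosis(data)
print(k, kurtosis_type(k))        # -0.21875 platykurtic
```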


Reference

Web resources: https://www.google.co.in/?gfe_rd=cr&ei=gJweVIPQK6vM8geL6YDwCg&gws_rd=ssl#q=variance+standard+deviation+ppt
