descriptive

Chapter 3

Upload: swapnil-sinha

Post on 11-Jan-2016

Page 1: Descriptive

Chapter 3

Page 2: Descriptive

Summary Statistics

• To describe characteristics of the data set, we can use single numbers called summary statistics.

• Summary statistics are more exact in nature and provide more meaningful information.

• Summary statistics comprise:
  • Measures of central tendency
  • Measures of dispersion
  • Skewness
  • Kurtosis


Page 3: Descriptive

Central Tendency

• Central tendency is the middle point of a distribution.

• Measures of central tendency are also called measures of location.

• They are used to describe the “bunching up” of the data.


Page 4: Descriptive

Dispersion

• Dispersion is the spread of the data in a distribution, that is, the extent to which the observations are scattered.

• It shows the variability present in the data set.

• Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.


Page 5: Descriptive

Skewness

• Curves representing data points in the data set may be either symmetrical or skewed.

• They are skewed when the values in their frequency distributions are concentrated at either the low end or the high end of the scale.

• If skewness is positive, the data are positively skewed or skewed right, meaning that the right tail of the distribution is longer than the left.

• If skewness is negative, the data are negatively skewed or skewed left, meaning that the left tail is longer.

• If the data are perfectly symmetrical, skewness = 0. The converse need not hold: an asymmetric data set can also have zero skewness.
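
The sign convention for skewness can be checked with a small sketch. It assumes the moment-based (population) coefficient m3 / m2^1.5, which the slides do not name explicitly; the function name `skewness` and the sample values are illustrative only.

```python
def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (an assumed formula, not taken from the slides)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]   # mirror image, so the left tail is longer
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
print(skewness(symmetric))     # 0.0
```

Mirroring the data flips the sign of the result, which matches the "skewed right" versus "skewed left" wording.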


Page 6: Descriptive

Kurtosis

• The height and sharpness of the peak relative to the rest of the data are measured by a number called kurtosis.

• Higher values are conventionally read as a higher, sharper peak and lower values as a lower, less distinct peak, although kurtosis is driven mainly by the tails of the distribution.

• This occurs because higher kurtosis means more of the variability is due to a few extreme differences from the mean, rather than a lot of modest differences from the mean.


Page 7: Descriptive

Kurtosis

• Distributions with zero excess kurtosis are called mesokurtic, e.g. the normal distribution.

• A distribution with positive excess kurtosis is called leptokurtic, e.g. Student's t-distribution (with more than four degrees of freedom), the Poisson distribution and the logistic distribution. The Cauchy distribution is often cited as well, although its kurtosis is strictly undefined.

• A distribution with negative excess kurtosis is called platykurtic, e.g. continuous or discrete uniform distributions.
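
As a rough illustration of the three labels, the sketch below computes excess kurtosis as m4 / m2^2 - 3. This moment-based population formula is an assumption (the slides give no formula), and `excess_kurtosis` and both data sets are made up for the example.

```python
def excess_kurtosis(values):
    """Excess kurtosis m4 / m2**2 - 3 (assumed population formula)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = list(range(1, 11))              # discrete uniform values 1..10
heavy_tailed = [0] * 18 + [-10, 10]    # variability comes from two extreme values

print(excess_kurtosis(flat))          # negative, i.e. platykurtic
print(excess_kurtosis(heavy_tailed))  # positive, i.e. leptokurtic
```

In the second data set all of the variability comes from two extreme observations, which is the situation the slides associate with high kurtosis.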


Page 8: Descriptive

Measure of Central Tendency: Mean

• For a data set, the mean is the sum of the values divided by the number of values.

• The mean is a unique measure of central tendency, because every data set has one and only one mean. It is the most useful measure and is generally referred to as the “average”.

• It is highly influenced by extreme values.

• The mean cannot be calculated for data with an open-ended class interval.

• It can be calculated only for quantitative measurements.
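
A short sketch of the definition and of the sensitivity to extreme values, using the standard library's `statistics.mean`. The salary figures are invented for illustration.

```python
from statistics import mean

salaries = [30, 32, 35, 38, 40]   # hypothetical values
with_outlier = salaries + [400]   # add a single extreme observation

# The mean is the sum of the values divided by the number of values.
print(sum(salaries) / len(salaries))
print(mean(salaries))

# One extreme value pulls the mean far away from the bulk of the data.
print(mean(with_outlier))
```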


Page 9: Descriptive

Measure of Central Tendency: Weighted Mean

• The weighted mean enables us to calculate an average that takes into account the importance of each value to the overall total.

• The weighted mean is calculated when several observations share the same value, so that the distinct values occur with different frequencies.
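
A minimal sketch of the calculation, assuming the usual formula sum(w * x) / sum(w) with frequencies used as weights. The helper `weighted_mean` and the numbers are illustrative, not taken from the slides.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [60, 70, 80]
frequencies = [1, 3, 6]   # how many observations take each score

print(weighted_mean(scores, frequencies))  # 75.0
print(sum(scores) / len(scores))           # 70.0, the unweighted mean ignores the frequencies
```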


Page 10: Descriptive

Measure of Central Tendency: Geometric Mean

• When we are dealing with quantities that change over a period of time, we are interested in knowing the average rate of change.

• In such cases geometric mean is preferred over arithmetic mean.

• Geometric mean is used to show multiplicative effects over time in compound interest and inflation calculations.
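
The sketch below illustrates why the geometric mean suits compound growth. It assumes three hypothetical yearly growth factors and uses `statistics.geometric_mean` (available from Python 3.8).

```python
from statistics import geometric_mean, mean

# Hypothetical yearly growth factors: +10%, +20%, -10%.
factors = [1.10, 1.20, 0.90]

gm = geometric_mean(factors)
am = mean(factors)

# Total growth over the three years.
actual = 1.0
for f in factors:
    actual *= f

# Compounding at the geometric mean reproduces the actual total growth;
# compounding at the arithmetic mean overstates it.
print(actual)
print(gm ** 3)
print(am ** 3)
```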


Page 11: Descriptive

Measure of Central Tendency: Median

• The median is a single value from the data set that measures the central item in the data. It is the middlemost (most central) value.

• The median is not influenced by extreme values.

• It is easy to understand and can be calculated for any kind of data, even for grouped data with open-ended classes.

• It is useful when the data are qualitative descriptions that can be ranked.

• We must arrange the data in order (an array) before we can calculate the median.
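
A minimal sketch of the procedure for ungrouped data: sort first, then take the middle item. Averaging the two middle items for an even count is the common convention and is assumed here; the helper name `median_of` is hypothetical.

```python
def median_of(values):
    """Median of ungrouped data: order the values, then take the central item."""
    ordered = sorted(values)        # the data must be arrayed first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]         # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


print(median_of([30, 32, 35, 38, 40]))    # 35
print(median_of([30, 32, 35, 38, 400]))   # 35, unchanged by the extreme value
print(median_of([7, 1, 3, 5]))            # 4.0
```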


Page 12: Descriptive

Measure of Central Tendency: Mode

• The mode is the value that is repeated most often in the data set.

• The mode is different from the mean but somewhat similar to the median, because it is not actually calculated by a process of arithmetic.

• The mode can be used as a measure of location for quantitative as well as qualitative data.

• The mode is not affected by extreme values and can also be used for open-ended data.

• If a data set has two or more modes, it is difficult to interpret.

• For continuous data, sometimes there is no mode.
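
These properties can be seen with `statistics.multimode` (Python 3.8 and later), which returns every most-frequent value. The data sets are made up for illustration.

```python
from statistics import multimode

# Qualitative data: the mode needs no arithmetic, only counting.
print(multimode(["red", "blue", "red", "green"]))   # ['red']

# Two modes: the result is harder to interpret as a single "typical" value.
print(multimode([2, 3, 3, 5, 5, 8]))                # [3, 5]

# Continuous measurements rarely repeat, so every value ties at a count of one
# and there is no meaningful mode.
readings = [1.27, 3.41, 5.68]
print(multimode(readings))                          # [1.27, 3.41, 5.68]
```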


Page 13: Descriptive

Measure of Dispersion: Range

• The range is the difference between the highest and lowest observed values.

• The range is easy to understand and to find, but its usefulness as a measure of dispersion is limited.

• In open-ended distributions the range cannot be calculated.

• The range depends on only two observations of the data set and fails to take account of all the other observations.

• It is heavily influenced by extreme values but ignores the nature of variation among all the other observations.
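
The limitation can be sketched with two invented data sets that share the same range but are spread very differently. `value_range` is a hypothetical helper name.

```python
def value_range(values):
    """Range: highest observed value minus lowest observed value."""
    return max(values) - min(values)


clustered = [1, 5, 5, 5, 5, 9]    # most values sit in the middle
polarised = [1, 1, 1, 9, 9, 9]    # values sit at the two extremes

# Both ranges are 8, because only the two end values are used.
print(value_range(clustered))
print(value_range(polarised))
```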


Page 14: Descriptive

Measure of Dispersion: Variance

• Variance and Standard Deviation are average deviation measures.

• They are based on the average deviation (distance) from the mean of the distribution.

• The variance is the mean (average) of the squared deviations between the mean and each item in the population.

• It takes into account all the values and gives more weight to large deviations.
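
A minimal sketch of the population variance as described, checked against `statistics.pvariance`. The data values and the helper name `population_variance` are illustrative assumptions.

```python
from statistics import pvariance


def population_variance(values):
    """Mean of the squared deviations from the mean (population form)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5

print(population_variance(data))  # 4.0
print(pvariance(data))            # the standard-library result agrees

# Squaring gives large deviations more weight: the value 9 alone
# contributes 16 of the 32 total squared deviation.
print((9 - 5) ** 2)
```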


Page 15: Descriptive

Measure of Dispersion: Standard Deviation

• The standard deviation is the square root of the variance.

• It is the most useful and popular measure of dispersion.

• It is used to compare distributions and to compute standard scores.

• It cannot be computed from open-ended classes.

• Extreme values in the distribution affect the value of the standard deviation.
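
A short sketch of the standard deviation and of a standard score, assuming the population form (`statistics.pstdev`) and the usual definition z = (x - mean) / standard deviation. The data and the helper `standard_score` are illustrative.

```python
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
sigma = pstdev(data)              # population standard deviation

# The standard deviation is the square root of the variance.
print(sigma, pvariance(data) ** 0.5)


def standard_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


print(standard_score(9, mu, sigma))   # 2.0
print(standard_score(4, mu, sigma))   # -0.5
```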


Page 16: Descriptive

Measure of Relative Dispersion: Coefficient of Variation

• The standard deviation is an absolute measure of dispersion that expresses variation in the same units as the original data.

• The standard deviation cannot be the sole basis for comparing two distributions.

• The coefficient of variation relates the standard deviation and the mean by expressing the standard deviation as a percentage of the mean.

• It is useful in comparing the variability or consistency present in two or more distributions or data sets.
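
The sketch below compares two invented data sets that have the same standard deviation but very different means. It assumes the population standard deviation and a non-zero mean; `coefficient_of_variation` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean (mean must be non-zero)."""
    return pstdev(values) / mean(values) * 100


small_scale = [2, 4, 4, 4, 5, 5, 7, 9]                  # mean 5
large_scale = [102, 104, 104, 104, 105, 105, 107, 109]  # same values shifted by 100

# Identical standard deviations, so the absolute measure cannot separate them.
print(pstdev(small_scale), pstdev(large_scale))

# Relative to its mean, the first data set is far more variable.
print(coefficient_of_variation(small_scale))
print(coefficient_of_variation(large_scale))
```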


Page 17: Descriptive

Exploratory data analysis

• Exploratory Data Analysis (EDA) uses some simple techniques and diagrams to summarize and describe the data.

• Stem and Leaf is one of the most useful techniques of EDA.

• Stem and Leaf displays give the rank order of the items in the data set and the shape of the distribution.

• Stem and Leaf is a histogram-like display that also shows all the original values along with the frequencies.
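
A minimal sketch of a stem and leaf display, assuming non-negative integers with the tens digits as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the data are illustrative only.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted non-negative integers by stem (value // 10) with leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting yields the rank order within each stem
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))


ages = [23, 25, 31, 38, 38, 42, 17, 19]   # hypothetical data
display = stem_and_leaf(ages)

# Each row keeps the original values; the row length plays the role of a histogram bar.
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Reading each row back as stem followed by leaf recovers the original observations, which a plain histogram would not allow.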
