July, 2000 Guang Jin Statistics in Applied Science and Technology Chapter 4 Summarizing Data


July, 2000 Guang Jin

Statistics in Applied Science and Technology

Chapter 4 Summarizing Data


Key Concepts in This Chapter

Mean, Median, Mode, Range, Standard Deviation, Variance, Coefficient of Variation


Measures of Central Tendency

Central tendency - the tendency of a set of data to center around certain values.

The three most common measures of central tendency are the mean, the median, and the mode.


The Mean

The arithmetic mean (or simply, the mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations.

Symbolically, the mean $\bar{x}$ is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_1$ is the first and $x_i$ is the $i$th in a series of observations, and $n$ is the total number of observations.
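The definition of the mean can be sketched in a few lines of Python. The sample values and the function name `arithmetic_mean` are illustrative assumptions, not taken from the chapter:

```python
# Hypothetical sample; the values are made up for illustration.
observations = [2, 4, 4, 5, 10]


def arithmetic_mean(values):
    """Sum the observations and divide by the number of observations."""
    return sum(values) / len(values)


sample_mean = arithmetic_mean(observations)
print(sample_mean)  # 25 / 5 = 5.0
```

Python's standard library offers `statistics.mean` for the same calculation.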


The Mean (Continued)

The arithmetic mean may be considered the balance point, or fulcrum, in a distribution.

The arithmetic mean is the point at which the positive and negative deviations balance, so the deviations from the mean sum to zero.

The mean is affected by the value of every observation in the distribution and may be distorted when extreme values exist.
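Both properties can be checked on a small example. This is a sketch with made-up data; the extreme value 95 is an assumption chosen only to make the shift obvious:

```python
from statistics import mean

# Hypothetical sample, not from the chapter.
observations = [2, 4, 4, 5, 10]
center = mean(observations)  # 5

# Deviations from the mean: negative ones balance positive ones.
deviations = [x - center for x in observations]  # [-3, -1, -1, 0, 5]
total_deviation = sum(deviations)

# Adding a single extreme value pulls the mean far from the bulk of the data.
with_extreme_value = observations + [95]
shifted_mean = mean(with_extreme_value)  # 120 / 6 = 20

print(total_deviation, shifted_mean)
```

Five of the six observations are 10 or less, yet the mean moves to 20 once the extreme value is included.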


The Median

The median is defined as the middle value when the observations are ordered.

It is the value above which there are as many observations as below.

For an even number of observations, the median is the average of the two middlemost values.
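A minimal sketch of the median rule for odd and even sample sizes follows. The function name `middle_value` and the samples are illustrative assumptions:

```python
def middle_value(values):
    """Return the median: the middle ordered value, or the average of the
    two middle values when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 1, 3, 9, 5]        # ordered: 1 3 5 7 9
even_sample = [7, 1, 3, 9, 5, 11]   # ordered: 1 3 5 7 9 11

odd_median = middle_value(odd_sample)    # 5
even_median = middle_value(even_sample)  # (5 + 7) / 2 = 6.0
print(odd_median, even_median)
```

The standard library function `statistics.median` applies the same rule.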


The Mode

The mode is the observation that occurs most frequently.

The mode can be read from a graph as the value on the horizontal axis that corresponds to the peak of the distribution.
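Finding the mode amounts to counting how often each value occurs. The sketch below uses made-up data and a hypothetical helper `most_frequent`; it returns a list because a sample can have more than one peak:

```python
from collections import Counter
from statistics import multimode


def most_frequent(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


single_peak = most_frequent([2, 4, 4, 5, 10])  # 4 occurs twice
two_peaks = most_frequent([1, 1, 2, 2, 3])     # 1 and 2 both occur twice

# The standard library provides the same behaviour.
library_modes = multimode([1, 1, 2, 2, 3])

print(single_peak, two_peaks, library_modes)
```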


Which Average Should You Use for Quantitative Data?

When a distribution of observations is normal or not too skewed, the values of the mode, the median, and the mean are the same or similar, and any of them can be used to describe central tendency.

When a distribution is skewed, there is an appreciable difference between the values of the mean and the median; therefore both the mean and the median should be reported.
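The gap between mean and median can be seen by comparing a symmetric sample with a skewed one. Both samples below are illustrative assumptions:

```python
from statistics import mean, median

# Hypothetical samples, not from the chapter.
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 40]  # one large value stretches the right tail

symmetric_gap = mean(symmetric) - median(symmetric)  # 3 - 3 = 0
skewed_gap = mean(skewed) - median(skewed)           # 10 - 3 = 7

print(symmetric_gap, skewed_gap)
```

In the skewed sample neither number alone describes the data well, which is why reporting both is recommended.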


Measures of Central Tendency for Qualitative Data

The mode can always be used with qualitative data.

The median can be used whenever the qualitative data are ordinal.

The mean is not appropriate for qualitative data.
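As a sketch of how this works in practice, the mode applies directly to category labels, while the median of ordinal data can be found by ranking the categories first. The blood types, the three-level scale, and the responses are all made-up assumptions:

```python
from statistics import median_low, mode

# Nominal data (no order): only the mode is meaningful.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
most_common_type = mode(blood_types)

# Ordinal data: map each category to its rank, then take the median rank.
levels = ["low", "medium", "high"]  # assumed ordering, lowest first
rank = {level: position for position, level in enumerate(levels)}
responses = ["low", "high", "medium", "medium", "low", "high", "medium"]

# median_low always returns an observed rank, so it maps back to a category.
median_level = levels[median_low(rank[r] for r in responses)]

print(most_common_type, median_level)
```

`median_low` is used here instead of `median` so that an even-sized sample never produces a value halfway between two categories.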


Measures of Variation

A measure of variation (or variability) indicates whether observations tend to be quite similar (homogeneous) or vary considerably (heterogeneous).

The three most common measures of variation are the range, the standard deviation, and the variance.


Range

The range is defined as the difference in value between the highest (maximum) and lowest (minimum) observations:

$$\text{Range} = X_{\max} - X_{\min}$$
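For instance, with a made-up sample the range takes one line of Python:

```python
# Hypothetical sample, not from the chapter.
observations = [2, 4, 4, 5, 10]

data_range = max(observations) - min(observations)  # 10 - 2 = 8
print(data_range)
```

Because only the two extreme observations enter the calculation, the range says nothing about how the remaining values are spread.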


Standard Deviation and Variance

By far the most widely used measure of variation is the standard deviation, represented by the symbol $s$.

The standard deviation is the square root of the variance (represented by the symbol $s^2$) of the observations.

The larger the standard deviation and variance, the more heterogeneous the distribution.


Variance

The variance ($s^2$) is computed by squaring each deviation from the mean, adding them up, and dividing their sum by one less than $n$, the sample size:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
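The steps in the variance definition can be followed one at a time. This is a sketch on a made-up sample; the variable names are illustrative:

```python
# Hypothetical sample, not from the chapter.
observations = [2, 4, 4, 5, 10]

n = len(observations)
sample_mean = sum(observations) / n  # 5.0

# Square each deviation from the mean: 9, 1, 1, 0, 25.
squared_deviations = [(x - sample_mean) ** 2 for x in observations]

# Add them up and divide by one less than the sample size.
sample_variance = sum(squared_deviations) / (n - 1)  # 36 / 4 = 9.0
print(sample_variance)
```

`statistics.variance` in the standard library uses the same n - 1 divisor.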


Standard Deviation

The standard deviation ($s$, sometimes represented by SD) is computed by extracting the square root of the variance:

$$s = \sqrt{s^2}$$

The units of the standard deviation are the same as the units of the raw data.
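A short sketch of the square-root step, assuming made-up measurements in centimetres:

```python
import math
from statistics import variance

# Hypothetical measurements in centimetres, not from the chapter.
observations_cm = [2, 4, 4, 5, 10]

s_squared = variance(observations_cm)  # sample variance, in cm squared
s = math.sqrt(s_squared)               # standard deviation, back in cm

print(s_squared, s)
```

Taking the square root returns the measure to the original units, which is one reason the standard deviation is usually reported rather than the variance.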


Important Generalizations

For most frequency distributions, a majority (often as many as 68%) of all observations are within one standard deviation on either side of the mean.

For most frequency distributions, a small minority (often as many as 5%) of all observations lie more than two standard deviations away from the mean, on either side.
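The 68% and 5% figures match what a normal distribution would give. The sketch below assumes an exactly normal distribution, which real data only approximate, and uses `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

# Assumption: a standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0, sigma=1)

# Share of observations within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# Share of observations more than two standard deviations from the mean.
beyond_two_sd = 1 - (standard_normal.cdf(2) - standard_normal.cdf(-2))

print(f"within 1 SD: {within_one_sd:.1%}")  # roughly 68%
print(f"beyond 2 SD: {beyond_two_sd:.1%}")  # roughly 5%
```

For strongly skewed or heavy-tailed data these proportions can differ noticeably, so they are rules of thumb rather than guarantees.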


Variability for Qualitative Data

For qualitative data that cannot be ordered, measures of variability are nonexistent.

For qualitative data that can be ordered, it is appropriate to describe variability by identifying extreme observations.


Coefficient of Variation

The coefficient of variation (represented by CV) is defined as the ratio of the standard deviation to the absolute value of the mean, expressed as a percentage:

$$CV = \frac{s}{|\bar{x}|} \times 100\%$$

The CV expresses the size of the standard deviation relative to the mean and can be used to compare the relative variation of even unrelated quantities.
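As a sketch of such a comparison, the function below computes the CV for two made-up samples measured in different units. The name `coefficient_of_variation` and both data sets are illustrative assumptions:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the absolute mean."""
    center = mean(values)
    if center == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(values) / abs(center)


# Hypothetical samples in unrelated units.
weights_kg = [2, 4, 4, 5, 10]           # mean 5, s = 3
heights_cm = [150, 160, 170, 180, 190]  # mean 170

cv_weights = coefficient_of_variation(weights_kg)  # 100 * 3 / 5 = 60.0
cv_heights = coefficient_of_variation(heights_cm)

print(f"{cv_weights:.1f}% vs {cv_heights:.1f}%")
```

The heights have the larger standard deviation in absolute terms, yet relative to their mean they vary far less than the weights do.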


Equations for Population and Sample Means and Standard Deviation

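Quantity: Sample; Population

Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$; $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$; $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard deviation: $s = \sqrt{s^2}$; $\sigma = \sqrt{\sigma^2}$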
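The difference between the sample and population formulas comes down to the divisor. A minimal sketch using the Python standard library, with made-up values treated first as a sample and then as a whole population:

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical values; the sum of squared deviations from the mean is 36.
values = [2, 4, 4, 5, 10]

sample_variance = variance(values)       # divides by n - 1 = 4
population_variance = pvariance(values)  # divides by N = 5

sample_sd = stdev(values)
population_sd = pstdev(values)

print(sample_variance, population_variance)
print(sample_sd, population_sd)
```

The sample versions are always slightly larger for the same data because they divide by one less than the number of observations.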