Copyright © 2013, 2009, 2005, 2001, 1997 by Saunders, an imprint of Elsevier Inc.
Chapter 22
Using Statistics To Describe Variables
Using Statistics to Describe Variables
Two major classes of statistics:
Descriptive statistics
• To reveal characteristics of the sample data set
Inferential statistics
• To gain information about effects in the population being studied
Using Statistics to Describe
All quantitative research uses descriptive statistics
• For description of the sample
• For initial description of variables
For analysis of the primary research problem
• Descriptive statistics for descriptive research
• Inferential statistics for interventional and correlational research
Using Statistics to Summarize Data
Terms: the number of elements in a sample is the “n” of the sample
Data set: 45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43, 47, 67, 39, 52, 48, 36, 42, 61, 57
n = 20
Descriptive statistics
• Frequency distributions
• Measures of central tendency
• Measures of dispersion
Frequency Distributions
Presented as a table or figure (line graph, pie chart, etc.)
Continuous variable: higher numbers represent more of that variable, and lower numbers represent less of it
Frequency Table
List every possible value in the first column, and the frequency (tally) of each value in the second column
Data set (ages): 45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43, 47, 67, 39, 52, 48, 36, 42, 61, 57
• Sort from lowest to highest values
• Tally each value
Ungrouped Frequency Distribution
List every category of the variable for which there are data, and tally each datum against the list
Age Frequency
26 2
28 1
31 1
32 1
36 1
39 1
42 2
43 1
45 1
47 1
48 1
51 1
52 1
55 1
57 1
59 1
61 1
67 1
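The sort-and-tally procedure can be sketched in a few lines of Python. This is only an illustration using the age data from the slides; the variable names (`ages`, `frequency`, `ungrouped`) are chosen here and do not come from the chapter.

```python
from collections import Counter

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

# Tally each value, then list the values from lowest to highest.
frequency = Counter(ages)
ungrouped = sorted(frequency.items())

print("Age  Frequency")
for age, count in ungrouped:
    print(f"{age:<4} {count}")
```

The printed rows should match the ungrouped table above: 18 distinct ages, with 26 and 42 each occurring twice.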
Grouped Frequency Distribution
Categories are grouped into ranges
Ranges must be exhaustive and mutually exclusive: every value falls into exactly one range
Age Frequency
20 - 29 3
30 - 39 4
40 - 49 6
50 - 59 5
60 - 69 2
Grouped Frequency Distribution with Percentages
Adult Age Range   Frequency (f)   Percentage (%)   Cumulative Percentage (%)
20 – 29           3               15               15
30 – 39           4               20               35
40 – 49           6               30               65
50 – 59           5               25               90
60 – 69           2               10               100
Total             20              100
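As a rough sketch, the grouped table with percentages can be reproduced from the raw ages. The function name `grouped_frequency` and its parameters (`start`, `stop`, `width`) are assumptions made for this illustration; the 10-year ranges starting at 20 follow the table above.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

def grouped_frequency(values, start=20, stop=70, width=10):
    """Return (label, f, percent, cumulative percent) for equal-width ranges."""
    rows = []
    cumulative = 0
    for lower in range(start, stop, width):
        upper = lower + width - 1          # ranges do not overlap: 20-29, 30-39, ...
        count = sum(lower <= v <= upper for v in values)
        cumulative += count
        rows.append((f"{lower} - {upper}", count,
                     100 * count / len(values),
                     100 * cumulative / len(values)))
    return rows

rows = grouped_frequency(ages)
for label, f, pct, cum_pct in rows:
    print(f"{label:<9} {f:>2} {pct:>5.0f} {cum_pct:>5.0f}")
```

Because the ranges are exhaustive and mutually exclusive, the frequencies sum to n and the last cumulative percentage is 100.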
Frequency Distributions Presented in Figures
• Graphs
• Charts
• Histograms
• Frequency polygons
Frequency Table of Smoking Status
Smoking Status Frequency Percent
Current smoker 1 10
Former Smoker 6 60
Never Smoked 3 30
Total 10 100
Histogram of Smoking Status
Measures of Central Tendency
Statistics that provide the center, or hallmark, value of a data set
• Mode
• Median (MD)
• Mean
Mode
The most common value in a data set
• Bimodal: two modes exist
• Multimodal: more than two modes exist
Median (MD)
The middle value in the data set (after sorting values from lowest to highest)
If the “n” is even, the two values in the middle are averaged
The 50th percentile
Mean
Arithmetic average of all of a variable's values
Most commonly reported measure of central tendency
Sum of the scores divided by the number of values in the data set
Formula: $\bar{X} = \dfrac{\sum X}{n}$
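A minimal sketch of the three measures of central tendency, computed for the age data from the slides with Python's standard `statistics` module. The variable names are chosen for this illustration.

```python
import statistics

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

modes = statistics.multimode(ages)  # every value tied for most common
median = statistics.median(ages)    # averages the two middle values when n is even
mean = statistics.mean(ages)        # sum of the scores divided by n

print("modes:", modes)
print("median:", median)
print("mean:", mean)
```

For this data set the distribution is bimodal (26 and 42 each occur twice), the median is the average of the 10th and 11th sorted values (43 and 45), and the mean is 887 / 20.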
When to Use Mean
Mean: normally distributed values measured at the interval or ratio level
Ordinal-level data from a rating scale, if:
• The n is large
• The data are normally distributed
• Small values denote very little of the measured quantity; large ones denote a lot
Mean is sensitive to extreme scores such as outliers
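To illustrate the sensitivity of the mean to outliers, the sketch below introduces a hypothetical data-entry error into the age data (the age 67 recorded as 670). The error is invented for this example and is not part of the chapter.

```python
import statistics

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

# Hypothetical data-entry error: the age 67 is recorded as 670.
with_outlier = [670 if age == 67 else age for age in ages]

mean_clean = statistics.mean(ages)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(ages)
median_outlier = statistics.median(with_outlier)

print("mean:  ", mean_clean, "->", mean_outlier)
print("median:", median_clean, "->", median_outlier)
```

A single extreme score pulls the mean well away from the bulk of the data, while the median does not move, which is one reason the median is preferred for non-normal distributions.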
When to Use Median and Mode
Median: used for non-normal distributions with small n
Mode: used for nominal values
Using Statistics to Explore Deviations in the Data
Using measures of central tendency to describe the nature of a data set obscures the impact of extreme values or deviations in the data
Measures of dispersion provide important insight into the nature of the data
Measures of Dispersion
Quantifications of how tightly the sample is clustered around the mean:
• Tightly clustered = fairly homogeneous
• Widely dispersed = heterogeneous
Measures of dispersion include:
• Range
• Difference score
• Variance
• Standard deviation
Range
Presented in two ways:
• The lowest score and the highest score (2 through 17)
• The difference between the highest and the lowest score (range of 15)
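Both presentations of the range can be sketched for the chapter's age data (not the 2-through-17 scores in the example above). The variable names are chosen for this illustration.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

lowest, highest = min(ages), max(ages)
range_width = highest - lowest  # difference between highest and lowest score

print(f"Range: {lowest} through {highest}")
print(f"Range: {range_width}")
```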
Difference Score
Subtract the mean from each score
Sometimes referred to as a deviation score
The difference score is positive when the score is above the mean, and negative when the score is below the mean
The total of all the difference scores is zero
Formula: $X - \bar{X}$
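A short sketch of difference scores for the age data, including a check that they total zero. Floating-point arithmetic can leave a tiny rounding residue, so the sum is compared with zero within a small tolerance.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

mean = sum(ages) / len(ages)

# Difference (deviation) score: each score minus the mean.
difference_scores = [x - mean for x in ages]
total = sum(difference_scores)

for x, d in zip(ages, difference_scores):
    print(f"{x:>3} {d:+.2f}")
print("total (approximately zero):", round(total, 9))
```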
Mean Deviation
Average difference score, using the absolute values
Example:
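The original worked example did not survive in this transcript. As a stand-in, here is a minimal sketch that computes the mean deviation for the chapter's age data; it is not the slide's own example.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

mean = sum(ages) / len(ages)

# Average of the absolute difference scores.
mean_deviation = sum(abs(x - mean) for x in ages) / len(ages)

print(f"mean = {mean:.2f}, mean deviation = {mean_deviation:.2f}")
```

Taking absolute values is what keeps the result from collapsing to zero, since the signed difference scores always cancel out.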
Variance (s²)
Variance is commonly used
“s²” is used to represent a sample variance
“σ²” is used to represent a population variance
Always a positive value (zero only when every score is identical), has no upper limit
Bigger variances = more spread
Formula: $s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
Standard Deviation (s)
Square root of the variance
Sometimes reported as SD
Most commonly reported measure of dispersion
Formula: $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
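A sketch of the sample variance and standard deviation for the age data, computed by hand with the n − 1 denominator and then cross-checked against Python's `statistics.variance` and `statistics.stdev`, which also use n − 1. The variable names are chosen for this illustration.

```python
import math
import statistics

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean = sum(ages) / n

# Sum of squared difference scores, divided by n - 1 (sample variance).
sum_of_squares = sum((x - mean) ** 2 for x in ages)
variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)  # standard deviation is the square root of the variance

print(f"s^2 = {variance:.2f}, s = {sd:.2f}")
print("matches statistics module:",
      math.isclose(variance, statistics.variance(ages)),
      math.isclose(sd, statistics.stdev(ages)))
```

Unlike the variance, the standard deviation is expressed in the original units (years of age here), which helps explain why it is the more commonly reported of the two.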
The Normal Curve
The Normal Curve (Cont’d)
Represents the frequency distribution of a variable that is perfectly normally distributed
Signifies:
• The mean is the most commonly occurring value
• There are just as many values above the mean as there are below the mean
• When a frequency table is constructed, values are perfectly symmetric
• About 68% of values are within –1 to +1 standard deviations of the mean
• About 95% of values are within –2 to +2 standard deviations of the mean
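The 68% and 95% figures are rounded. As a quick check, the proportions can be computed from the standard normal distribution with `statistics.NormalDist` (Python 3.8 or later); the variable names are chosen for this illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values within 1 and within 2 standard deviations of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"within 1 SD: {within_1_sd:.1%}")
print(f"within 2 SD: {within_2_sd:.1%}")
```

The second figure comes out slightly above 95%; exactly 95% of values lie within about 1.96 standard deviations of the mean.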
z-Score (Cont’d)
Synonymous with a standard deviation unit
A z value of 1.0 represents 1 standard deviation unit above the mean
A z value of –1.0 represents 1 standard deviation unit below the mean
Formula: $z = \dfrac{X - \bar{X}}{s}$
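A minimal sketch of converting raw scores to z-scores, using the age data and the sample standard deviation. The helper name `z_score` is an assumption made for this illustration.

```python
import statistics

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

mean = statistics.mean(ages)
sd = statistics.stdev(ages)  # sample standard deviation (n - 1)

def z_score(x, mean, sd):
    """Number of standard deviation units that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_scores = [z_score(x, mean, sd) for x in ages]

print(f"z for age 67: {z_score(67, mean, sd):+.2f}")
print(f"z for age 26: {z_score(26, mean, sd):+.2f}")
```

After standardizing, the z-scores themselves have a mean of 0 and a standard deviation of 1, so scores from differently scaled variables can be compared directly.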
Sampling Error
Described by the statistic “standard error”
Standard error of the mean is calculated to determine the magnitude of the variability associated with the mean
Formula: $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
where $s_{\bar{X}}$ = standard error of the mean, s = standard deviation, n = sample size
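A short sketch of the standard error of the mean for the age data, dividing the sample standard deviation by the square root of n. The variable names are chosen for this illustration.

```python
import math
import statistics

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
sd = statistics.stdev(ages)           # s, the sample standard deviation
standard_error = sd / math.sqrt(n)    # standard error of the mean

print(f"s = {sd:.3f}, n = {n}, standard error = {standard_error:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.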
Confidence Interval
Determines how closely a sample value approximates a population value
Can be created for many statistics, such as a mean, proportion, and odds ratio
The t-value for the desired interval, usually 95%, is looked up in a table of statistical values
Confidence Interval (Cont’d)
To calculate a 95% confidence interval around a mean, for example:
• Calculate the mean
• Calculate the standard error of the mean
• Calculate the degrees of freedom (df) [df = n – 1]
• Look up the two-tailed t-value for p < 0.05
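The steps can be sketched for the age data as follows. Two assumptions are made here: the t-value of 2.093 is the two-tailed critical value for df = 19 at the 0.05 level, read from a standard t table rather than computed (Python's standard library has no t distribution), and the final step, mean ± t × standard error, is the usual way to form the interval and is not spelled out on this slide.

```python
import math
import statistics

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean = statistics.mean(ages)                          # step 1: the mean
standard_error = statistics.stdev(ages) / math.sqrt(n)  # step 2: standard error
df = n - 1                                            # step 3: degrees of freedom

# Step 4: two-tailed t-value for p < 0.05 with df = 19.
# Assumption: value read from a standard t table; check it against your own table.
t_value = 2.093

# Assumed final step: mean plus or minus t times the standard error.
margin = t_value * standard_error
lower, upper = mean - margin, mean + margin

print(f"df = {df}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

For a different sample size the t-value must be looked up again, because it depends on the degrees of freedom.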
Degrees of Freedom
The number of independent pieces of information that are free to vary
For a confidence interval, the degrees of freedom (df) are n – 1
• This means that there are n – 1 independent observations in the sample that are free to vary (to be any value) to estimate the lower and upper limits of the confidence interval