
Copyright © 2013, 2009, 2005, 2001, 1997 by Saunders, an imprint of Elsevier Inc.

Chapter 22

Using Statistics To Describe Variables


Using Statistics to Describe Variables

Two major classes of statistics:

• Descriptive statistics: to reveal characteristics of the sample data set

• Inferential statistics: to gain information about effects in the population being studied


Using Statistics to Describe

All quantitative research uses descriptive statistics:
• For description of the sample
• For initial description of variables

For analysis of the primary research problem:
• Descriptive statistics for descriptive research
• Inferential statistics for interventional and correlational research


Using Statistics to Summarize Data

Terms: the number of elements in a sample is the “n” of the sample

Data set: 45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43, 47, 67, 39, 52, 48, 36, 42, 61, 57 (n = 20)

Descriptive statistics:
• Frequency distributions
• Measures of central tendency
• Measures of dispersion


Frequency Distributions

Presented as a table or figure (line graph, pie chart, etc.)

Continuous variable: the higher numbers represent more of that variable, and the lower numbers represent less of that variable


Frequency Table

Lists every possible value in the first column of numbers, and the frequency (tally) of each value in the second column of numbers

Data set (ages): 45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43, 47, 67, 39, 52, 48, 36, 42, 61, 57
• Sort from lowest to highest values
• Tally each value


Ungrouped Frequency Distribution

List every category of the variable for which there are data, and tally each datum against the listing

Age Frequency

26 2

28 1

31 1

32 1

36 1

39 1

42 2

43 1

45 1

47 1

48 1

51 1

52 1

55 1

57 1

59 1

61 1

67 1
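The tally above can be reproduced with a short Python sketch. This is an illustration only, not part of the original slides; the variable names `ages`, `tally`, and `frequency_table` are assumptions chosen for this example.

```python
from collections import Counter

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# Tally each value, then list the values from lowest to highest.
tally = Counter(ages)
frequency_table = sorted(tally.items())

print("Age  Frequency")
for age, frequency in frequency_table:
    print(f"{age:>3}  {frequency}")
```

Sorting the `(value, count)` pairs mirrors the two steps named on the Frequency Table slide: sort, then tally.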


Grouped Frequency Distribution

Categories are grouped into ranges

Ranges must be collectively exhaustive and mutually exclusive

Age Frequency

20 - 29 3

30 - 39 4

40 - 49 6

50 - 59 5

60 - 69 2


Grouped Frequency Distribution with Percentages

Adult Age Range | Frequency (f) | Percentage (%) | Cumulative Percentage (%)

20 – 29 3 15 15

30 – 39 4 20 35

40 – 49 6 30 65

50 – 59 5 25 90

60 – 69 2 10 100

Total 20 100
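As a rough sketch of how the grouped table could be computed, the Python below bins the ages into ten-year ranges and accumulates percentages. The bin boundaries follow the table; the names `bins` and `rows` are hypothetical and exist only for this example.

```python
# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]
n = len(ages)

# Ten-year ranges 20-29 ... 60-69: every age falls in exactly one range
# (exhaustive and mutually exclusive).
bins = [(low, low + 9) for low in range(20, 70, 10)]

rows = []
cumulative = 0.0
for low, high in bins:
    f = sum(1 for age in ages if low <= age <= high)
    percent = 100 * f / n
    cumulative += percent
    rows.append((f"{low}-{high}", f, percent, cumulative))

for label, f, percent, cum in rows:
    print(f"{label:<6} {f:>2} {percent:>4.0f} {cum:>4.0f}")
```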


Frequency Distributions Presented in Figures

• Graphs
• Charts
• Histograms
• Frequency polygons


Line Graph


Frequency Table of Smoking Status

Smoking Status Frequency Percent

Current smoker 1 10

Former Smoker 6 60

Never Smoked 3 30

Total 10 100


Histogram of Smoking Status


Measures of Central Tendency

Statistics that provide the center or hallmark value of a data set:
• Mode
• Median (MD)
• Mean


Mode

The most common value in a data set
• Bimodal: two modes exist
• Multimodal: more than two modes
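For illustration, Python's standard `statistics.multimode` returns every value tied for the highest frequency. Applied to the ages data set used earlier in these slides, it shows that the set is bimodal. The variable names are assumptions for this sketch.

```python
from statistics import multimode

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# multimode returns all values that share the highest frequency.
modes = multimode(ages)
print(modes)       # 26 and 42 each occur twice
print(len(modes))  # 2 modes, so the data set is bimodal
```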


Median (MD)

The middle value in the data set (after sorting values from lowest to highest)

If the “n” is even, the two values in the middle are averaged

The 50th percentile
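A small Python sketch of the even-n case, using the ages data set from the earlier slides: after sorting, the two middle values are averaged. The names `ordered` and `middle_pair` are hypothetical.

```python
from statistics import median

# Ages from the slide's data set (n = 20, an even number).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

ordered = sorted(ages)
half = len(ordered) // 2

# With an even n, the median is the average of the two middle values.
middle_pair = (ordered[half - 1], ordered[half])
md_by_hand = sum(middle_pair) / 2

print(middle_pair)   # the 10th and 11th sorted values
print(md_by_hand)    # same result as statistics.median
print(median(ages))
```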


Mean

Arithmetic average of all of a variable's values
• Most commonly reported measure of central tendency
• Sum of the scores divided by the number of values in the data set
• Formula: $\bar{X} = \frac{\sum X}{n}$
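As a worked illustration with the ages data set from the earlier slides, the mean is the sum of the scores (887) divided by n (20). The variable names below are assumptions for this sketch.

```python
# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

total = sum(ages)         # sum of the scores
n = len(ages)             # number of values in the data set
mean_age = total / n      # arithmetic average

print(total, n, mean_age)
```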


When to Use Mean

Mean: normally distributed values measured at the interval or ratio level

Ordinal level data from a rating scale, if:
• The n is large
• The data are normally distributed
• Small values denote very little of the measured quantity; large ones denote a lot

Mean is sensitive to extreme scores such as outliers
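The sensitivity of the mean to outliers can be sketched in Python. The outlier here is invented for illustration: it assumes a hypothetical data-entry error in which the oldest age in the slides' data set, 67, is recorded as 670.

```python
from statistics import mean, median

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# Hypothetical outlier (not in the slides): 67 mistyped as 670.
with_outlier = [670 if age == 67 else age for age in ages]

# The mean shifts substantially; the median does not move.
print(mean(ages), mean(with_outlier))
print(median(ages), median(with_outlier))
```

One extreme score pulls the mean well above every other value in the sample, while the median stays where it was.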


When to Use Median and Mode

Median: used for non-normal distributions with small n

Mode: used for nominal values


Using Statistics to Explore Deviations in the Data

Using measures of central tendency to describe the nature of a data set obscures the impact of extreme values or deviations in the data

Measures of dispersion provide important insight into the nature of the data


Measures of Dispersion

Quantifications of how tightly the sample is clustered around the mean:
• Tightly clustered = fairly homogeneous
• Widely dispersed = heterogeneous

• Range
• Difference score
• Variance
• Standard deviation


Range

Presented in two ways:
• The lowest score and the highest score (2 through 17)
• The difference between the highest and the lowest score (range of 15)
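Both presentations of the range can be sketched in Python. This example uses the ages data set from the earlier slides rather than the 2-through-17 example, and the variable names are assumptions.

```python
# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

lowest, highest = min(ages), max(ages)
age_range = highest - lowest

print(f"{lowest} through {highest}")   # first presentation
print(f"range of {age_range}")         # second presentation
```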


Difference Score

• Subtract the mean from each score
• Sometimes referred to as a deviation score
• The difference score is positive when the score is above the mean, and negative when the score is below the mean
• The total of all the difference scores is zero
• Formula: $X - \bar{X}$
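A brief Python sketch, using the ages data set from the earlier slides, checks that the difference scores total zero. It uses `fractions.Fraction` so the arithmetic is exact; that choice, and the variable names, are assumptions of this example rather than anything prescribed by the slides.

```python
from fractions import Fraction

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# Exact mean (887/20) avoids floating-point rounding in the total.
mean_age = Fraction(sum(ages), len(ages))

# Difference score: each score minus the mean.
difference_scores = [age - mean_age for age in ages]

print(float(difference_scores[0]))   # 45 is above the mean: positive
print(float(difference_scores[1]))   # 26 is below the mean: negative
print(sum(difference_scores))        # total of all difference scores
```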


Mean Deviation

Average difference score, using the absolute values

Example:
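One possible worked example, sketched in Python with the ages data set from the earlier slides (the slides' own example is not reproduced here): take the absolute value of each difference score and average them. The variable names are assumptions.

```python
# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean_age = sum(ages) / n

# Absolute difference scores, so positives and negatives do not cancel.
absolute_differences = [abs(age - mean_age) for age in ages]
mean_deviation = sum(absolute_differences) / n

print(round(mean_deviation, 2))
```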


Variance (s²)

Variance is a commonly used measure of dispersion
• “s²” is used to represent a sample variance
• “σ²” is used to represent a population variance
• Never negative, and has no upper limit
• Bigger variances = more spread
• Formula: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
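As a sketch, the sample variance of the ages data set from the earlier slides can be computed by summing the squared difference scores and dividing by n - 1. The n - 1 divisor is the usual sample-variance convention and is assumed here; the result is cross-checked against `statistics.variance`, which uses the same divisor.

```python
from statistics import variance

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean_age = sum(ages) / n

# Sum of squared difference scores, divided by n - 1 (sample variance).
sum_of_squares = sum((age - mean_age) ** 2 for age in ages)
sample_variance = sum_of_squares / (n - 1)

print(round(sample_variance, 3))
print(round(variance(ages), 3))   # library result for comparison
```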


Standard Deviation (s)

• Square root of the variance
• Sometimes reported as SD
• Most commonly reported measure of dispersion
• Formula: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
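A minimal Python sketch, assuming the sample (n - 1) form of the variance and using the ages data set from the earlier slides, shows that the standard deviation is the square root of the variance.

```python
from math import sqrt
from statistics import stdev, variance

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# Standard deviation as the square root of the sample variance.
s_by_hand = sqrt(variance(ages))
s = stdev(ages)   # library equivalent

print(round(s_by_hand, 3), round(s, 3))
```

Unlike the variance, the standard deviation is in the same units as the scores (here, years of age).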


The Normal Curve


The Normal Curve (Cont’d)

Represents the frequency distribution of a variable that is perfectly normally distributed

Signifies:
• The mean is the most commonly occurring value
• There are just as many values above the mean as there are below the mean
• When a frequency table is constructed, values are perfectly symmetric
• About 68% of values are within –1 to +1 standard deviations of the mean
• About 95% of values are within –2 to +2 standard deviations of the mean
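The 68% and 95% figures can be checked with Python's standard `statistics.NormalDist`. This is a sketch for a perfectly normal variable; the variable names are assumptions.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values within 1 and within 2 standard deviations of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"{within_1_sd:.1%}")   # roughly 68%
print(f"{within_2_sd:.1%}")   # roughly 95%
```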


z-Score


z-Score (Cont’d)

• Synonymous with a standard deviation unit
• A z value of 1.0 represents 1 standard deviation unit above the mean
• A z value of –1.0 represents 1 standard deviation unit below the mean
• Formula: $z = \frac{X - \bar{X}}{s}$
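A short Python sketch of the conversion, using the ages data set from the earlier slides: each score's distance from the mean is expressed in standard deviation units. The helper `z_score` is a hypothetical name introduced for this example.

```python
from statistics import mean, stdev

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

def z_score(x, data):
    """Distance of x from the sample mean, in standard deviation units."""
    return (x - mean(data)) / stdev(data)

z_scores = [z_score(age, ages) for age in ages]

print(round(z_score(67, ages), 2))   # above the mean: positive z
print(round(z_score(26, ages), 2))   # below the mean: negative z
```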


Sampling Error

Described by the statistic “standard error”

Standard error of the mean is calculated to determine the magnitude of the variability associated with the mean

Formula: $s_{\bar{X}} = \frac{s}{\sqrt{n}}$

where $s_{\bar{X}}$ = standard error of the mean, s = standard deviation, n = sample size
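For the ages data set from the earlier slides, the standard error of the mean can be sketched in Python as the standard deviation divided by the square root of n. The name `sem` is an assumption of this example.

```python
from math import sqrt
from statistics import stdev

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

s = stdev(ages)    # sample standard deviation
n = len(ages)      # sample size
sem = s / sqrt(n)  # standard error of the mean

print(round(sem, 3))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.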


Confidence Interval

Determines how closely a sample value approximates a population value

Can be created for many statistics, such as a mean, proportion, and odds ratio

Using a table of statistical values, look up the t-value for the desired interval, usually 95%


Confidence Interval (Cont’d)

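To calculate a 95% confidence interval around a mean, for example:
• Calculate the mean
• Calculate the standard error of the mean
• Calculate the degrees of freedom (df) [df = n – 1]
• Look up the two-tailed t-value for p < 0.05
• Multiply the t-value by the standard error; subtract the product from the mean for the lower limit and add it to the mean for the upper limit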
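These steps can be sketched in Python for the ages data set from the earlier slides. The t-value is hard-coded as 2.093, the tabled two-tailed value for df = 19 at the 0.05 level; that lookup is an assumption of this example, and a different n would need a different table entry.

```python
from math import sqrt
from statistics import mean, stdev

# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean_age = mean(ages)            # step 1: the mean
sem = stdev(ages) / sqrt(n)      # step 2: standard error of the mean
df = n - 1                       # step 3: degrees of freedom (19 here)

# Step 4: two-tailed t-value from a t table, valid only for df = 19.
t_value = 2.093

# Mean plus or minus t times the standard error gives the two limits.
margin = t_value * sem
lower, upper = mean_age - margin, mean_age + margin

print(f"95% CI: {lower:.2f} to {upper:.2f}")
```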


Degrees of Freedom

The number of independent pieces of information that are free to vary

For a confidence interval, the degrees of freedom (df) are n – 1

This means that there are n – 1 independent observations in the sample that are free to vary (to be any value) to estimate the lower and upper limits of the confidence interval
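One common way to picture "free to vary" is sketched below with the ages data set from the earlier slides: once the sample total (and so the mean) is fixed, any n - 1 values can be chosen freely, but the last value is then determined. This illustration and its variable names are assumptions added for clarity.

```python
# Ages from the slide's data set (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
total = sum(ages)          # fixed once the mean is known (mean * n)
df = n - 1                 # degrees of freedom

# The first n - 1 observations are free to vary ...
free_values = ages[:-1]

# ... but the final one is forced by the fixed total.
forced_last_value = total - sum(free_values)

print(df, forced_last_value)
```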
