chapter 022


Upload: stanbridge

Post on 11-Aug-2015


Copyright © 2013, 2009, 2005, 2001, 1997 by Saunders, an imprint of Elsevier Inc.

Chapter 22

Using Statistics To Describe Variables


Using Statistics to Describe Variables

Two major classes of statistics:
• Descriptive statistics
  • To reveal characteristics of the sample dataset
• Inferential statistics
  • To gain information about effects in the population being studied


Using Statistics to Describe

All quantitative research uses descriptive statistics:
• For description of the sample
• For initial description of variables

For analysis of the primary research problem:
• Descriptive statistics for descriptive research
• Inferential statistics for interventional and correlational research


Using Statistics to Summarize Data

Terms: the number of elements in a sample is the “n” of the sample

Data set: 45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43, 47, 67, 39, 52, 48, 36, 42, 61, 57

n = 20

Descriptive statistics:
• Frequency distributions
• Measures of central tendency
• Measures of dispersion


Frequency Distributions

• Table or figure (line graph, pie chart, etc.)
• Continuous variable: the higher numbers represent more of that variable, and the lower numbers represent less of that variable


Frequency Table

Lists every observed value in the first column of numbers and the frequency (tally) of each value in the second column of numbers

Data set (ages): 45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43, 47, 67, 39, 52, 48, 36, 42, 61, 57
• Sort from lowest to highest values
• Tally each value


Ungrouped Frequency Distribution

List every category of the variable for which there are data, and tally each datum against that list

Age   Frequency
26    2
28    1
31    1
32    1
36    1
39    1
42    2
43    1
45    1
47    1
48    1
51    1
52    1
55    1
57    1
59    1
61    1
67    1
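
The sort-and-tally procedure can be sketched in a few lines of Python. This is only an illustration: the variable names (`ages`, `freq`, `table`) are assumptions, and the data are the 20 ages from the data set slide.

```python
from collections import Counter

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# Tally each value, then list the values from lowest to highest.
freq = Counter(ages)
table = sorted(freq.items())

print("Age   Frequency")
for age, count in table:
    print(f"{age:<5} {count}")
```

The printed rows should match the ungrouped frequency table: 18 distinct ages, with 26 and 42 each tallied twice.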


Grouped Frequency Distribution

• Categories are grouped into ranges
• Ranges must be exhaustive (every value falls into some range) and mutually exclusive (no value falls into more than one range)

Age       Frequency
20 - 29   3
30 - 39   4
40 - 49   6
50 - 59   5
60 - 69   2


Grouped Frequency Distribution with Percentages

Adult Age Range   Frequency (f)   Percentage (%)   Cumulative Percentage
20 – 29           3               15               15
30 – 39           4               20               35
40 – 49           6               30               65
50 – 59           5               25               90
60 – 69           2               10               100
Total             20              100
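
The grouped table, including its percentage and cumulative percentage columns, could be produced along the following lines. This is a sketch under stated assumptions: the ten-year ranges are hard-coded to match the table, and the names `bins` and `rows` are illustrative.

```python
# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]
n = len(ages)

# Ten-year ranges; every age falls into exactly one of them.
bins = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

rows = []
cumulative = 0.0
for low, high in bins:
    f = sum(low <= age <= high for age in ages)  # frequency in this range
    pct = 100 * f / n                            # percentage of the sample
    cumulative += pct                            # running total of percentages
    rows.append((f"{low} - {high}", f, pct, cumulative))

for label, f, pct, cum in rows:
    print(f"{label:<8} {f:>2} {pct:>5.0f} {cum:>5.0f}")
```

Because the ranges cover every age exactly once, the frequencies sum to n and the cumulative percentage ends at 100.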


Frequency Distributions Presented in Figures

• Graphs
• Charts
• Histograms
• Frequency polygons


Line Graph


Frequency Table of Smoking Status

Smoking Status   Frequency   Percent
Current smoker   1           10
Former smoker    6           60
Never smoked     3           30
Total            10          100


Histogram of Smoking Status


Measures of Central Tendency

Statistics that provide the center or hallmark value of a data set:
• Mode
• Median (MD)
• Mean


Mode

• The most common value in a data set
• Bimodal: two modes exist
• Multimodal: more than two modes
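
As a quick illustration, assuming the 20 ages from the earlier data set slide, Python's standard `statistics.multimode` returns every value tied for most frequent, which makes a bimodal data set easy to spot.

```python
import statistics

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# multimode returns all values that share the highest frequency.
modes = statistics.multimode(ages)
print(modes)       # [26, 42]
print(len(modes))  # 2, so this data set is bimodal
```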


Median (MD)

The middle value in the data set (after sorting values from lowest to highest)

If the “n” is even, the two values in the middle are averaged

The 50th percentile
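
A minimal sketch of the median calculation, assuming the 20 ages from the earlier data set slide. Because n is even, the two middle values (43 and 45) are averaged; the manual steps are shown next to the standard-library call for comparison.

```python
import statistics

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# Manual calculation: sort, then average the two middle values (n is even).
ordered = sorted(ages)
n = len(ordered)
middle_low, middle_high = ordered[n // 2 - 1], ordered[n // 2]
manual_median = (middle_low + middle_high) / 2

# The standard library applies the same rule.
median = statistics.median(ages)
print(manual_median, median)  # 44.0 44.0
```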


Mean

• Arithmetic average of all of a variable's values
• Most commonly reported measure of central tendency
• Sum of the scores divided by the number of values in the data set

Formula: $\bar{X} = \frac{\sum X}{n}$
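
Applied to the 20 ages from the earlier data set slide (an assumption made for illustration), the mean is the sum of the scores divided by n:

```python
# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

total = sum(ages)   # sum of the scores
n = len(ages)       # number of values in the data set
mean = total / n

print(total, n, mean)  # 887 20 44.35
```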


When to Use Mean

Mean: normally distributed values measured at the interval or ratio level

Ordinal-level data from a rating scale, if:
• The n is large
• The data are normally distributed
• Small values denote very little of the measured quantity; large ones denote a lot

The mean is sensitive to extreme scores such as outliers


When to Use Median and Mode

Median: used for non-normal distributions with small n

Mode: used for nominal values


Using Statistics to Explore Deviations in the Data

Using measures of central tendency to describe the nature of a data set obscures the impact of extreme values or deviations in the data

Measures of dispersion provide important insight into the nature of the data


Measures of Dispersion

Quantifications of how tightly clustered around the mean the sample is:
• Tightly clustered = fairly homogeneous
• Widely dispersed = heterogeneous

Measures:
• Range
• Difference score
• Variance
• Standard deviation


Range

Presented in two ways:
• The lowest score and the highest score (2 through 17)
• The difference between the highest and the lowest score (range of 15)
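
Both ways of reporting the range can be sketched as follows. The example assumes the 20 ages from the earlier data set slide rather than the 2-through-17 scores mentioned above.

```python
# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

lowest, highest = min(ages), max(ages)
value_range = highest - lowest

print(f"{lowest} through {highest}")  # 26 through 67
print(f"range of {value_range}")      # range of 41
```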


Difference Score

• Subtract the mean from each score
• Sometimes referred to as a deviation score
• The difference score is positive when the score is above the mean, and negative when the score is below the mean
• The total of all the difference scores is zero

Formula: $X - \bar{X}$
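
A short sketch, assuming the 20 ages from the earlier data set slide, shows the sign of each difference score and checks that the scores total zero (up to floating-point rounding).

```python
# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

mean = sum(ages) / len(ages)

# Difference (deviation) score: each score minus the mean.
differences = [x - mean for x in ages]

for x, d in zip(ages[:3], differences[:3]):
    side = "above" if d > 0 else "below"
    print(f"score {x}: difference {d:+.2f} ({side} the mean)")

# The total is zero apart from tiny floating-point error.
print(round(sum(differences), 10))
```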


Mean Deviation

Average difference score, using the absolute values

Example:
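
As one possible illustration, assuming the 20 ages from the earlier data set slide are the scores, the mean deviation is the average of the absolute difference scores:

```python
# The 20 ages from the data set slide (an assumed example data set).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

mean = sum(ages) / len(ages)

# Absolute values keep positive and negative differences from cancelling out.
absolute_differences = [abs(x - mean) for x in ages]
mean_deviation = sum(absolute_differences) / len(ages)

print(round(mean_deviation, 2))  # 9.85
```

On average, an age in this sample lies a little under 10 years from the mean of 44.35.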


Variance (s²)

• Variance is commonly used
• “s²” is used to represent a sample variance
• “σ²” is used to represent a population variance
• Never negative (zero only when all values are identical), and has no upper limit
• Bigger variances = more spread

Formula: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
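
A minimal sketch of the sample variance, assuming the 20 ages from the earlier data set slide and the usual n − 1 denominator for a sample. The manual calculation is compared with the standard library's `statistics.variance`, which uses the same denominator.

```python
import statistics

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean = sum(ages) / n

# Sum of squared difference scores, divided by n - 1 (sample variance).
sum_of_squares = sum((x - mean) ** 2 for x in ages)
variance = sum_of_squares / (n - 1)

print(round(variance, 2))                   # about 146.56
print(round(statistics.variance(ages), 2))  # same value from the standard library
```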


Standard Deviation (s)

• Square root of the variance
• Sometimes reported as SD
• Most commonly reported measure of dispersion

Formula: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
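
For illustration, assuming the 20 ages from the earlier data set slide, the standard deviation is the square root of the sample variance:

```python
import math
import statistics

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

# Sample variance (n - 1 denominator), then its square root.
variance = statistics.variance(ages)
sd = math.sqrt(variance)

print(round(sd, 2))                      # about 12.11
print(round(statistics.stdev(ages), 2))  # same value, computed directly
```

Unlike the variance, the standard deviation is expressed in the original units (years of age here), which is one reason it is reported so often.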


The Normal Curve


The Normal Curve (Cont’d)

Represents the frequency distribution of a variable that is perfectly normally distributed

Signifies:
• The mean is the most commonly occurring value
• There are just as many values above the mean as there are below the mean
• When a frequency table is constructed, values are perfectly symmetric
• About 68% of values fall within 1 standard deviation of the mean (–1 to +1)
• About 95% of values fall within 2 standard deviations of the mean (–2 to +2)
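
The 68% and 95% figures can be checked against a standard normal distribution (mean 0, standard deviation 1) using Python's `statistics.NormalDist`. This is only a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
curve = NormalDist(mu=0, sigma=1)

# Proportion of values within 1 and within 2 standard deviations of the mean.
within_1_sd = curve.cdf(1) - curve.cdf(-1)
within_2_sd = curve.cdf(2) - curve.cdf(-2)

# Number of standard deviations that encloses exactly 95% of values.
exact_95 = curve.inv_cdf(0.975)

print(f"{within_1_sd:.4f}")  # about 0.6827
print(f"{within_2_sd:.4f}")  # about 0.9545
print(f"{exact_95:.2f}")     # about 1.96
```

The figures are rounded on the slide: ±2 standard deviations cover slightly more than 95% of values, and exactly 95% lies within about ±1.96.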


z-Score


z-Score (Cont’d)

• Synonymous with a standard deviation unit
• A z value of 1.0 represents 1 standard deviation unit above the mean
• A z value of –1.0 represents 1 standard deviation unit below the mean

Formula: $z = \frac{X - \bar{X}}{s}$
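
A small sketch of converting raw scores to z-scores, assuming the 20 ages from the earlier data set slide and the sample standard deviation. The helper name `z_score` is hypothetical.

```python
import statistics

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

mean = statistics.mean(ages)
sd = statistics.stdev(ages)  # sample standard deviation


def z_score(x: float) -> float:
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd


z_values = [z_score(x) for x in ages]

print(round(z_score(67), 2))  # the oldest participant, roughly 1.87 SD above the mean
print(round(z_score(26), 2))  # the youngest, roughly 1.52 SD below the mean
```

After standardizing, the z-scores themselves have a mean of 0 and a standard deviation of 1.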


Sampling Error

• Described by the statistic “standard error”
• Standard error of the mean is calculated to determine the magnitude of the variability associated with the mean

Formula: $s_{\bar{X}} = \frac{s}{\sqrt{n}}$

where
$s_{\bar{X}}$ = standard error of the mean
$s$ = standard deviation
$n$ = sample size
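
Using the 20 ages from the earlier data set slide as an assumed example, the standard error of the mean is the standard deviation divided by the square root of n:

```python
import math
import statistics

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
s = statistics.stdev(ages)        # sample standard deviation
standard_error = s / math.sqrt(n)

print(round(standard_error, 2))   # about 2.71
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.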


Confidence Interval

Determines how closely a sample value approximates a population value

Can be created for many statistics, such as a mean, proportion, and odds ratio

The t-value for the desired interval (usually 95%) is looked up in a table of statistical values


Confidence Interval (Cont’d)

To calculate a 95% confidence interval around a mean, for example:
• Calculate the mean
• Calculate the standard error of the mean
• Calculate the degrees of freedom (df) [df = n – 1]
• Look up the two-tailed t-value for p < 0.05
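
The steps can be sketched as follows, assuming the 20 ages from the earlier data set slide. Two things here are assumptions rather than part of the steps listed: the final step (mean ± t × standard error), which is the usual way the interval is completed, and the hard-coded t-value of 2.093, taken from a standard t table for df = 19 at the two-tailed 0.05 level.

```python
import math
import statistics

# The 20 ages from the data set slide.
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
        47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean = statistics.mean(ages)                               # step 1: the mean
standard_error = statistics.stdev(ages) / math.sqrt(n)     # step 2: standard error
df = n - 1                                                 # step 3: degrees of freedom

# Step 4: two-tailed t-value for p < 0.05 with df = 19, from a standard t table.
# Hard-coded assumption: this value only applies to df = 19.
t_value = 2.093

# Assumed final step: mean plus or minus t times the standard error.
margin = t_value * standard_error
lower, upper = mean - margin, mean + margin

print(f"95% CI: {lower:.2f} to {upper:.2f}")  # roughly 38.68 to 50.02
```

A different sample size changes df, so the t-value would need to be looked up again rather than reused.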


Degrees of Freedom

The number of independent pieces of information that are free to vary

For a confidence interval, the degrees of freedom (df) are n – 1

This means that there are n – 1 independent observations in the sample that are free to vary (to be any value) to estimate the lower and upper limits of the confidence interval