

Biostatistics for Evidence-Based Practice

Module 1

Sharon L. Kozachik, PhD, RN, FAAN

Johns Hopkins University

School of Nursing

Why do we need statistics?

• Evidence based practice depends on solid statistical evidence

• “Evidence based practice is a problem-solving approach to clinical decision making within a healthcare organization that integrates the best available scientific evidence with the best available experiential (patient and practitioner) evidence.” (pg. 3)

• Considers internal and external influences on nursing practice

Newhouse, R. P., Dearholt, S. L., Poe, S. S., Pugh, L. C., & White, K. M. (2007). Johns Hopkins nursing evidence based practice: Model and guidelines. Indianapolis: Sigma Theta Tau.

Evidence based practice

• Develop an answerable clinical question
  – Example: What is the best practice for managing pain in cancer patients?

• Search for relevant research-based evidence

• Appraise and synthesize the evidence

• Integrate the evidence with other factors

• Assess effectiveness of the change


Types of studies that guide EBP

• Descriptive studies
  – What symptoms emerge during cancer treatment?
  – Use descriptive statistics

• Explanatory studies
  – Among persons with lung cancer, are women more likely to report pain than men?
  – Use inferential statistics

• Prediction and control studies (RCT)
  – Will mindfulness meditation reduce pain to a greater degree than distraction?
  – Use inferential statistics

What is nursing research?

• A systematic inquiry
  – Disciplined methods
  – Answers questions/solves problems of importance to nurses & nursing profession

• Two basic categories
  – Basic research: knowledge production
  – Applied research: knowledge implementation (problem-solving)

Nursing research develops knowledge to:

• Build the scientific foundation for clinical practice

• Prevent disease and disability

• Manage and eliminate symptoms caused by illness

• Enhance end-of-life and palliative care

http://www.ninr.nih.gov/


Two paradigms guide nursing research

1. Positivist: reality exists and there is only one truth; the real world is driven by natural causes that can be quantified and analyzed
  – Answers research questions
  – Tests hypotheses

2. Naturalist: reality is multiple, subjective, and constructed by individuals within their context

This course focuses on the positivist paradigm

What is a Hypothesis?

• A prediction that specifies the expected relationship between variables

• 3 types:
  1. Null (used in statistics)
     – There is no association between sleep and pain
  2. Non-Directional
     – There is an association between sleep and pain
  3. Directional
     – Persons with poor sleep will have greater subsequent day pain

What is a Variable?

• A characteristic that varies
  – From person to person
  – Within a person over time
  – Examples: Hair color, Blood type, BP, Ht, Wt

• What do we call a characteristic that does not vary?

• In research, there are two categories of variables


What is Measurement?

• The assignment of numbers to represent the amount of an attribute present in an object or person, using specific rules (metrics)

• Temperature, BP, ht & wt have rules for measuring

• Advantages:
  – Removes guesswork
  – Provides precise information
  – Less vague than words

Scales for Measurement

• Provides the unit of measurement
  – Level of measurement

• Provides the range and type of possible values
  – Infinite
  – Finite, as few as two

• What if there is only one value?
  – Measurement unit can be continuous
  – Measurement unit can be discrete

Levels of measurement

• Researchers strive to use the highest level of measurement possible, especially for the dependent variable (DV)
  – More information about DV
  – Can use more powerful statistical tests

• Determines what type of data analysis you are able to perform

• There are four levels of measurement in statistics


Nominal

• Also called categorical

• Lowest level of measurement

• Exclusive & exhaustive

• Uses numbers to categorize attributes
  – Examples: Sex, Race, Blood Type, Religion

• Discrete variable

• Each category is assigned a number for the purpose of analyses; the number does not have quantitative importance
  – Sex: Male = 1; Female = 2

Ordinal

• Exclusive, exhaustive & rank ordered

• Ranks object based on its relative standing on an attribute

• Discrete variable

• Does not tell how much greater one level is than another: unequal intervals between rankings
  – Assistance with ADLs or IADLs
  – Patient satisfaction with care

Interval

• Exclusive, exhaustive, ranked, and numerically equal intervals

• Does not have a meaningful/true zero, only defines position on the scale
  – Temperature (Celsius, Fahrenheit)
  – IQ

• Continuous or discrete variable


Ratio

• Highest level of measurement

• Exclusive, exhaustive, ranked, equal distance between intervals and a meaningful zero (point at which the variable is absent)
  – Provides information about the absolute magnitude of the attribute
  – Weight: someone who weighs 200 lbs is twice as heavy as someone who weighs 100 lbs
  – Urine output, bleeding, burn surface area, BP, AR, RR

• Continuous or discrete variable, depending upon how measured

How we use Statistics in Research

1. Describe and summarize data

2. Make predictions about future events based on current evidence

3. Make generalizations about population occurrences based on sample observations

4. Identify associations/relationships or differences between sets of observations

Two types of statistics

• Descriptive statistics
  – Used to describe or characterize sample characteristics by summarizing them

• Inferential statistics
  – A set of statistical techniques that provide predictions about population characteristics based on information obtained from a sample taken from that population


Descriptive statistics - Univariate

• Univariate = 1 variable

• Frequency distributions (counts, percentages)

• Central tendency = where the masses huddle

• Dispersion/variability = spread

Example: Data Table of Descriptive Statistics

Commonly presented descriptive statistics

• Mean, Median, Mode (measures of central tendency)

• Percentage & percentiles

• Count

• Minimum/Maximum

• Range, Standard deviation (sd), Variance, Inter-Quartile Range (measures of variability/dispersion)


Data Organization

• Frequency Distribution
  – Systematic arrangement of data values
  – Imposes order on the data
  – List from lowest to highest
  – Provides a frequency count (f) and the percentage of times each value occurred
  – The sum of all value frequencies = sample size: Σf = n

Example: Frequency distribution

Systematic arrangement of values

1. Lowest to highest (rank-ordered)

2. Indicates the count and percentage of the occurrence of each value in the data set
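The two steps above can be sketched in Python. The education values here are hypothetical (the slide's raw data table did not survive), and `frequency_distribution` is an illustrative helper name, not part of any course software.

```python
from collections import Counter

# Hypothetical raw data: years of education for 10 participants (not from the slides).
education_years = [12, 16, 12, 14, 16, 12, 18, 14, 12, 16]

def frequency_distribution(values):
    """Return rows of (value, f, percent), ordered from lowest to highest value."""
    counts = Counter(values)
    n = len(values)
    return [(value, f, 100 * f / n) for value, f in sorted(counts.items())]

table = frequency_distribution(education_years)
for value, f, pct in table:
    print(f"{value:>5} {f:>3} {pct:6.1f}%")

# The frequencies sum to the sample size: Σf = n
total_f = sum(f for _, f, _ in table)
print("Σf =", total_f, "and n =", len(education_years))
```

Sorting the counted values gives the rank ordering, and the final check confirms that the frequencies add up to the sample size.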

Raw data

Frequency Distribution: Education


Frequency Distributions for Variables with Many Values

• When a variable has many possible values, a regular frequency distribution may be unwieldy
  – For example, weight values (here, in pounds)

Grouped Frequency Distributions

• Forming groups communicates information more conveniently than individual weights
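A minimal sketch of grouping, assuming class intervals 20 pounds wide starting at 100 (both choices are illustrative, not from the slides). The weights are the ten values used in the range example later in this module.

```python
from collections import Counter

# Weights in pounds; the ten values from the range example in this module.
weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

def grouped_frequencies(values, start, width):
    """Count values per class interval [lower, lower + width - 1]."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return [(lower, lower + width - 1, counts[lower]) for lower in sorted(counts)]

# Assumed grouping: intervals of 20 lbs beginning at 100 lbs.
groups = grouped_frequencies(weights, start=100, width=20)
for lower, upper, f in groups:
    print(f"{lower}-{upper}: f = {f}")
```

Intervals that contain no values are simply omitted by this sketch; a published table would usually list them with f = 0.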

Reporting Frequency Information

• Narrative in text (e.g., “83% of study participants were male”)

• Frequency distribution table (multiple variables often presented in a single table)

• Graphically


Graphic displays of frequency distributions

• Bar graphs and pie charts

• Histograms, frequency polygon

• Shapes of distributions

• Modality

• Symmetry and skewness

• Kurtosis

• The Normal distribution

Bar Graphs

• Used for nominal (and many ordinal) level variables

• Horizontal dimension (X axis) that specifies categories (i.e., data values)

• Vertical dimension (Y axis) specifies either frequencies or percentages

• Bars for each category drawn to the height that indicates the frequency or %

Bar Graph: Education

Bars do not touch


Pie Chart

• Nominal (and many ordinal) level variables

• Circle is divided into pie-shaped wedges corresponding to percentages for a given category or data value

• All pieces add up to 100%

• Should place wedges in order, with biggest wedge starting at “12 o’clock”

Pie Chart

Histograms

• Interval- and ratio-level data

• Similar to a bar graph, with an X and Y axis, but adjacent values are on a continuum so bars touch one another

• Data values on X axis arranged from lowest to highest

• Bars drawn to height to show frequency or percentage (Y axis)

• May include a superimposed normal curve


Histogram: Age

Bars touch

Normal curve superimposed

Frequency polygon

• Similar to a histogram

• Resembles a line graph

• Can be used to display a cumulative frequency

• Used in economic research

Frequency Polygon


Measures of Central Tendency

• An indicator of the center of the data
  – Typical / “average” data point
  – Center data point
  – Most frequently occurring data point

• Is it important to know where the center of the data is located?

Measures of Central Tendency

• Mean – the average or typical value
  – Interval or ratio level data
  – Sample mean: $\bar{X} = \frac{\sum X}{n}$

• Median – the value that cuts the data in half, 50th %ile
  – Ordinal level data, or interval or ratio level data when outliers are present

• Mode – the most frequently occurring value
  – Categorical, ordinal, interval or ratio level data

Example: Mean

Using the data values below, what is the mean?

86, 82, 94, 76, 88, 92, 92, 94, 94, 94

1. Rank order the data:

76, 82, 86, 88, 92, 92, 94, 94, 94, 94

2. Sum the values and divide by n (number of values)

Mean = (76 + 82 + 86 + 88 + 92 + 92 + 94 + 94 + 94 + 94) / 10 = 892 / 10

Mean = 89.2
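The same calculation as a short Python sketch; `sample_mean` is an illustrative helper name. Rank ordering is not needed for the arithmetic, so the values are used in their original order.

```python
# Data values from the example above.
scores = [86, 82, 94, 76, 88, 92, 92, 94, 94, 94]

def sample_mean(values):
    """Sum the values and divide by n (the number of values)."""
    return sum(values) / len(values)

mean_score = sample_mean(scores)
print(f"sum = {sum(scores)}, n = {len(scores)}, mean = {mean_score:.1f}")
```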

How might an outlier value affect the mean?


Example: Median

Using the data values below, what is the median?

86, 82, 94, 76, 88, 92, 92, 94, 94, 94

• Steps

1. Rank order the data:

76, 82, 86, 88, 92, 92, 94, 94, 94, 94

2. Find the value that ‘cuts’ the data in half, that is the median (50th percentile)

76, 82, 86, 88, 92, 92, 94, 94, 94, 94

Median for our Data

76, 82, 86, 88, 92, 92, 94, 94, 94, 94 -> 10 values

For even numbers of values, we calculate the average of the (n/2)th and (n/2 + 1)th values. With n = 10, these are the 5th and 6th values.

For our data values: (92 + 92) / 2 = 92

92 splits our data in half:

76, 82, 86, 88, 92, 92, 94, 94, 94, 94
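A minimal Python sketch of the median steps, covering both the even and odd cases. The function name `median_value` is illustrative.

```python
# Data values from the example above.
scores = [86, 82, 94, 76, 88, 92, 92, 94, 94, 94]

def median_value(values):
    """Rank order the data, then return the value that cuts it in half."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value.
        return ordered[mid]
    # Even n: average the (n/2)th and (n/2 + 1)th values (1-based positions).
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_value(scores))
```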

How might an outlier value affect the median?

Mode

Let’s return to our data:

76, 82, 86, 88, 92, 92, 94, 94, 94, 94

Make a frequency count for each value:

76 – 1
82 – 1
86 – 1
88 – 1
92 – 2
94 – 4

94 is the mode of this data array
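The frequency count can be sketched with Python's `collections.Counter`. The list `modes` is used because a data set can have more than one most frequent value.

```python
from collections import Counter

# Data values from the example above.
scores = [76, 82, 86, 88, 92, 92, 94, 94, 94, 94]

# Frequency count for each value.
counts = Counter(scores)
for value, f in sorted(counts.items()):
    print(f"{value} - {f}")

# The mode is the value (or values) with the highest frequency.
highest = max(counts.values())
modes = [value for value, f in sorted(counts.items()) if f == highest]
print("mode(s):", modes)
```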


Central Tendency Comparisons:Normal Distribution

• In a normal distribution, the mean, median, and mode are equal

Symmetry

• Symmetrical distribution: the two halves of the distribution, folded over in the middle, are identical

Kurtosis

Concerned with peakedness relative to the normal distribution


Central Tendency: Skewed Distributions

• In a skewed distribution, the mean is pulled “off center” in the direction of the skew
  – What causes a distribution to skew?
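One way to see the pull on the mean is to introduce a single extreme value. In this sketch the outlier (16 in place of 76) is hypothetical; the other values come from the central tendency examples in this module.

```python
from statistics import mean, median

scores = [86, 82, 94, 76, 88, 92, 92, 94, 94, 94]

# Hypothetical outlier: the lowest score (76) is replaced by 16.
scores_with_outlier = [16 if s == 76 else s for s in scores]

print("mean:  ", mean(scores), "->", mean(scores_with_outlier))
print("median:", median(scores), "->", median(scores_with_outlier))
```

The mean drops toward the extreme low value while the median does not move, which is why the median is often preferred for skewed data.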

Measures of Variability/Dispersion

• The spread of the data in a distribution
  – Two distributions with the same mean could have different dispersion

• Reported through 4 mechanisms
  – Range: highest value (maximum) minus lowest value (minimum)
  – Interquartile range
  – Standard deviation: roughly the average deviation of all scores from the mean, that is, how far a typical score falls from the sample mean
  – Variance: (standard deviation)²

Variability

High variability: (A) heterogeneous distribution

Low variability: (B) homogeneous distribution


Range

• Difference between highest and lowest value in distribution

• Weights (pounds):

110 120 130 140 150 150 160 170 180 190

• The range for these data is 80 (190 – 110)
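As a one-line check in Python:

```python
# Weights in pounds from the example above.
weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

# Range = highest value minus lowest value.
value_range = max(weights) - min(weights)
print(value_range)
```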

Interquartile Range (IQR)

• Reported with median value

• Based on quartiles
  – Lower quartile (Q1): Point below which 25% of scores lie
  – Upper quartile (Q3): Point below which 75% of scores lie

• IQR = Q3 - Q1

• IQR Example: Weights (pounds):

110 120 130 135 140 150 150 165 170 170 180 190

Q3 = 170, Q1 = 130

• IQR = 40.0 (170 – 130 )
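Quartiles can be computed under several conventions, and statistical packages do not all agree. The sketch below assumes a simple nearest-rank rule, which reproduces the Q1 and Q3 shown above; interpolating methods may give a slightly different Q1 for these data. The helper name `nearest_rank_percentile` is illustrative.

```python
import math

# Weights in pounds from the IQR example above.
weights = [110, 120, 130, 135, 140, 150, 150, 165, 170, 170, 180, 190]

def nearest_rank_percentile(values, p):
    """Smallest ordered value with at least p percent of the data at or below it.

    Assumption: nearest-rank convention; other quartile rules interpolate.
    """
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

q1 = nearest_rank_percentile(weights, 25)
q3 = nearest_rank_percentile(weights, 75)
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```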

Standard Deviation

• An index that conveys how much, on average, scores in a distribution vary

• Based on deviation scores, calculated by subtracting the mean from each individual score

$X' = X - \bar{X}$, where $X'$ is the deviation score and $\bar{X}$ is the mean of all scores


Computing Standard Deviation

• $\bar{X}$ = mean of all scores

• X = each individual score

• ∑ = sum (in this case, the sum of the squared differences of each score from the mean)

• n = number of sample values

$$\text{Standard Deviation} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Standard Deviation

• Advantages:– Takes all data into account in describing variability

– Is more stable as a measure of variability than the range or IQR

– Helpful in interpreting individual scores when data are distributed approximately normally

• Disadvantages:– Can be influenced by extreme scores/outliers

– Not as “intuitive” or as easy to interpret as the range

Variance

• An important variability concept in inferential statistics, but not used descriptively

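• The variance = SD²

• Not easily interpreted because it is not in units of original data; it is in units squared

• Formula for sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

• Formula for population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the population size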


Sample Variance: Example

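Weight (X)    Deviation (X - mean)    Squared deviation
110           -40                     1600
120           -30                     900
130           -20                     400
140           -10                     100
150           0                       0
150           0                       0
160           10                      100
170           20                      400
180           30                      900
190           40                      1600
∑ = 1500      ∑ = 0                   ∑ = 6000

Mean = 1500 / 10 = 150

Variance = 6000 / (10 - 1) = 666.67

SD = √666.67 = 25.82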
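The worked example can be reproduced step by step in Python. This sketch follows the sample formula with n - 1 in the denominator; the variable names are illustrative.

```python
import math

# Weights in pounds from the example above.
weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

n = len(weights)
mean_weight = sum(weights) / n

# Deviation scores: each score minus the mean.
deviations = [x - mean_weight for x in weights]

# Sum of squared deviations.
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample variance divides by n - 1; the SD is its square root.
sample_variance = sum_of_squares / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean_weight}, sum of deviations = {sum(deviations)}")
print(f"sum of squares = {sum_of_squares}")
print(f"variance = {sample_variance:.2f}, SD = {sample_sd:.2f}")
```

The deviations always sum to zero, which is why they are squared before being averaged.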

Measurement Scales and Descriptive Statistics

Level of Measurement    Central Tendency Statistic          Variability Statistic
Nominal                 Mode                                --
Ordinal                 Median                              Range, IQR
Interval or Ratio       Mean (what if outliers present?)    Standard deviation, Variance

Normal Distribution

• Bell shaped symmetric curve
  – What do we mean by symmetric curve?

• Mean, median and mode have the same value

• Approximately 68% of values lie within 1 SD of mean

• Approximately 95% of values lie within 2 SD of mean

• > 99% of values lie within 3 SD of mean

• Range is -∞ to ∞
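These percentages can be checked against the standard normal curve. The sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

# Standard normal distribution: mean = 0, SD = 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of values lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.2f}%")
```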


Normal Distribution - sd

You are in a class of 100 students. It is exam day, and the scores are normally distributed.

If you score 1 sd below the mean score, what percentage of the class scored higher than you?

If you score 1 sd above the mean score, what percentage of the class scored higher than you?
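One way to work these questions is to take the area under the standard normal curve above each score. This sketch assumes the scores follow an exactly normal distribution, as the slide states; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area above a score = 1 minus the cumulative area below it.
pct_higher_if_1sd_below = (1 - std_normal.cdf(-1)) * 100
pct_higher_if_1sd_above = (1 - std_normal.cdf(1)) * 100

print(f"1 sd below the mean: about {pct_higher_if_1sd_below:.0f}% scored higher")
print(f"1 sd above the mean: about {pct_higher_if_1sd_above:.0f}% scored higher")
```

In a class of 100, that is roughly 84 students and 16 students respectively, which matches the 50% + 34.13% and 50% - 34.13% areas read off the normal curve.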

Relative Standing

• Central tendency and variability indexes describe a distribution

• There are descriptive statistics that tell us the relative standing or position of a score in a distribution

• Two types:
  1. Standard Score
  2. Percentile Rank

Standard Scores

• An index of relative standing of raw scores/values

• Each value is standardized using mean and SD of the distribution

• Called a z-score

• z-distribution: mean = 0; sd = 1

• If normal distribution is standardized, it is called standardized or standard normal distribution


Standard Scores and Relative Standing: the z score

$$z = \frac{X - \bar{X}}{SD}$$

Z-score example

• Heart rate data from a sample has a mean = 65.21 and a sample SD = 4.50

• What is the z-score for a heart rate score of 70?
  – z = (70 - 65.21) / 4.50 = 1.06

• What is the z-score for a heart rate score of 56?
  – z = (56 - 65.21) / 4.50 = -2.05

• What is the probability that an individual has a heart rate between 56 and 70 bpm, given that heart rate follows a normal distribution with mean = 65.21 and SD = 4.50?
  – Probabilities range from 0 to 1
  – Area below z = 1.06 is .8554; area below z = -2.05 is .0202
  – .8554 - .0202 = .8352

We can roughly estimate by looking at the normal curve: 13.59 + 34.13 + 34.13 = 81.85%. This estimate only accounts for the area between 2 sd below and 1 sd above the mean.
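The heart rate example can be sketched in Python with `statistics.NormalDist` (Python 3.8 or later). The z-scores are rounded to two decimals before the lookup, on the assumption that the slide's areas come from a printed z-table; using unrounded z-scores shifts the result slightly.

```python
from statistics import NormalDist

mean_hr = 65.21  # sample mean heart rate (bpm)
sd_hr = 4.50     # sample SD

def z_score(x, mean, sd):
    """Standardize a raw score: distance from the mean in SD units."""
    return (x - mean) / sd

z_70 = round(z_score(70, mean_hr, sd_hr), 2)
z_56 = round(z_score(56, mean_hr, sd_hr), 2)

# Area under the standard normal curve between the two z-scores.
std_normal = NormalDist(mu=0, sigma=1)
p_between = std_normal.cdf(z_70) - std_normal.cdf(z_56)

print(f"z(70) = {z_70}, z(56) = {z_56}")
print(f"P(56 < heart rate < 70) is about {p_between:.4f}")
```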
