Biostatistics in Practice. Session 2: Summarization of Quantitative Information. Youngju Pak, Biostatistician. http://research.LABioMed.org/Biostat


Post on 06-Jan-2016


Page 1: Biostatistics in Practice

Biostatistics in Practice

Session 2: Summarization of Quantitative Information

Youngju Pak, Biostatistician

http://research.LABioMed.org/Biostat

Page 2: Biostatistics in Practice

Topics for this Session

Experimental Units

Independence of Measurements

Graphs: Summarizing Results

Graphs: Aids for Analysis

Summary Measures

Confidence Intervals

Prediction Intervals

Page 3: Biostatistics in Practice

Experimental Units

Independence of Measurements

Page 4: Biostatistics in Practice

Units and Independence

Experiments may be designed such that each measurement does not give additional independent information.

Many basic statistical methods require that measurements are “independent” for the analysis to be valid.

In mathematics, two events are independent if and only if the occurrence of one event makes it neither more nor less probable that the other occurs.

Page 5: Biostatistics in Practice

Population

Sample

Sample estimate of population parameter

Population parameter

Sampling mechanism: random sample or convenience sample

Confidence interval for population parameter

Page 6: Biostatistics in Practice

Summarizing the Data with Descriptive Statistics


Page 7: Biostatistics in Practice

Experimental Units in Case Study

What is the experimental unit in this study?
1. School
2. Child
3. Parent
4. GHA score (results from three diets)

Are all GHA scores independent (e.g., 153 x 3 groups = 459 GHA scores for 3-4-year-old children)?

The analysis MUST incorporate this possible correlation (clustering) if it exists.

Page 8: Biostatistics in Practice

Common Descriptive Statistics used

Sample Mean and Standard Deviation (SD)

Sample Median and Inter-Quartile Range (IQR)

Sample Correlation

Sample Survival Probability

Sample Risks & Odds

Page 9: Biostatistics in Practice

Mean vs. Median (measures of central tendency)

• Mean
– What most people think of as the “average”
– Easy to calculate
– Easily distorted by extreme values
– Be cautious with SKEWED data
– Calculate: sum of data / number of data points

• Median
– Relatively easy to obtain
– Not affected by extreme values, so it is considered a “ROBUST” statistic
– Calculate:
  • Sort data
  • If odd number of points, the middle is the median
  • Otherwise, the median is the average of the middle two numbers
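As a minimal sketch of how an extreme value distorts the mean but barely moves the median, the snippet below uses Python's standard `statistics` module. The data are hypothetical and are not taken from the case study.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; the last value is an extreme outlier.
stays = [2, 3, 3, 4, 5, 6, 40]

stay_mean = mean(stays)      # sum of data / number of data points
stay_median = median(stays)  # middle value of the sorted data

# Dropping the outlier changes the mean a lot and the median only slightly.
trimmed = stays[:-1]
print(f"with outlier:    mean={stay_mean:.2f}, median={stay_median}")
print(f"without outlier: mean={mean(trimmed):.2f}, median={median(trimmed)}")
```

With an even number of points, as in `trimmed`, `median` averages the two middle values, matching the rule on the slide.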

Page 10: Biostatistics in Practice

Standard Deviation (SD) & Inter-Quartile Range (IQR) (measuring the variability or spread of the data)

• Inter-Quartile Range (IQR) = 75th percentile (Q3) - 25th percentile (Q1), where 25% of the data < Q1 and 75% of the data < Q3
• SD is usually used for normally distributed data (bell-shaped, symmetric around the mean)
• IQR is usually used when the data distribution is skewed.
• Range = Max - Min
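The three spread measures can be computed as sketched below on a small hypothetical data set. One assumption to note: software packages use slightly different percentile conventions, so Q1 and Q3 may differ a little between tools. This sketch uses the default ("exclusive") method of Python's `statistics.quantiles`.

```python
from statistics import quantiles, stdev

# Hypothetical skewed data (one extreme value).
values = [2, 3, 3, 4, 5, 6, 40]

sd = stdev(values)                    # sample SD (divides by n - 1)
q1, q2, q3 = quantiles(values, n=4)   # quartile cut points; q2 is the median
iqr = q3 - q1                         # 75th percentile - 25th percentile
data_range = max(values) - min(values)

print(f"SD={sd:.2f}, IQR={iqr}, Range={data_range}")
```

Here the single extreme value inflates the SD and the range, while the IQR reflects only the middle half of the data.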

Page 11: Biostatistics in Practice

Summarization of the Case Study

How are the outcome measures summarized? e.g., Table 2:


Page 12: Biostatistics in Practice

Summary Statistics: Relative Likelihood of an Event

Compare groups A and B on mortality.

Relative Risk = Prob_A[Death] / Prob_B[Death], where Prob[Death] ≈ Deaths per 100 Persons

Odds Ratio = Odds_A[Death] / Odds_B[Death], where Odds = Prob[Death] / Prob[Survival]

Hazard Ratio ≈ I_A[Death] / I_B[Death], where I = Incidence = Deaths per 100 Person-Days
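To make the three ratios concrete, here is a small sketch with hypothetical counts for groups A and B (none of these numbers come from the case study). The incidence ratio is shown only as the approximation to the hazard ratio that the slide describes.

```python
def risk(deaths, persons):
    """Probability of death: deaths divided by persons at risk."""
    return deaths / persons

def odds(deaths, persons):
    """Odds of death: Prob[Death] / Prob[Survival]."""
    p = risk(deaths, persons)
    return p / (1 - p)

def incidence(deaths, person_days):
    """Incidence: deaths per person-day of follow-up."""
    return deaths / person_days

# Hypothetical data: deaths, persons, and total person-days of follow-up.
group_a = {"deaths": 20, "persons": 100, "person_days": 9000}
group_b = {"deaths": 10, "persons": 100, "person_days": 9500}

relative_risk = (risk(group_a["deaths"], group_a["persons"])
                 / risk(group_b["deaths"], group_b["persons"]))
odds_ratio = (odds(group_a["deaths"], group_a["persons"])
              / odds(group_b["deaths"], group_b["persons"]))
incidence_ratio = (incidence(group_a["deaths"], group_a["person_days"])
                   / incidence(group_b["deaths"], group_b["person_days"]))

print(f"RR={relative_risk:.2f}, OR={odds_ratio:.2f}, "
      f"incidence ratio={incidence_ratio:.2f}")
```

The odds ratio is further from 1 than the relative risk here because death is not rare in these made-up groups; the two get closer as the event becomes rarer.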

Page 13: Biostatistics in Practice

Summarizing the Data with Graphs


Page 14: Biostatistics in Practice

Data Graphical Displays

Many of the following examples are from StatisticalPractice.com

Histogram: Summarized*

Scatter plot: Raw Data

* Raw data version is a stem-leaf plot. We will see one later.

Page 15: Biostatistics in Practice

Data Graphical Displays

Dot Plot: Raw Data

Box Plot: Summarized

Page 16: Biostatistics in Practice

Bar Charts


Page 17: Biostatistics in Practice

Pie Charts


Page 18: Biostatistics in Practice

Data Graphical Displays

Line or Profile Plot

Summarized - bars can represent various types of ranges

Page 19: Biostatistics in Practice

Data Graphical Displays

Kaplan-Meier Plot

Interval (Start-End) | # At Risk at Start of Interval | # Censored During Interval | # At Risk at End of Interval | # Who Died at End of Interval | Proportion Surviving This Interval | Cumulative Survival at End of Interval
0-1 | 7 | 0 | 7 | 1 | 6/7 = 0.86 | 0.86
1-4 | 6 | 2 | 4 | 1 | 3/4 = 0.75 | 0.86 * 0.75 = 0.64
4-10 | 3 | 1 | 2 | 1 | 1/2 = 0.5 | 0.86 * 0.75 * 0.5 = 0.32
10-12 | 1 | 0 | 1 | 0 | 1/1 = 1.0 | 0.86 * 0.75 * 0.5 * 1.0 = 0.32

(Source: www.cancerguide.org)
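The cumulative survival column is a running product of the per-interval survival proportions. The sketch below recomputes it from the counts in the table, using the number at risk and the number who died at the end of each interval. The variable names are illustrative only. Working with unrounded fractions, the final value is 9/28, or about 0.32.

```python
# (interval, number at risk at end of interval, number who died at end of interval)
rows = [
    ("0-1", 7, 1),
    ("1-4", 4, 1),
    ("4-10", 2, 1),
    ("10-12", 1, 0),
]

cumulative = 1.0
survival = []  # (interval, proportion surviving, cumulative survival)
for interval, at_risk, died in rows:
    proportion = (at_risk - died) / at_risk  # e.g. 6/7 for the first interval
    cumulative *= proportion                 # product of all proportions so far
    survival.append((interval, proportion, cumulative))
    print(f"{interval:>5}: proportion={proportion:.2f}, cumulative={cumulative:.2f}")
```

Censored subjects leave the risk set without counting as deaths, which is why the number at risk drops between intervals by more than the number who died.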

Page 20: Biostatistics in Practice

Graphs: Aids for Analysis

Page 21: Biostatistics in Practice

Graphical Aids for Analysis

Most statistical analyses involve modeling.

Parametric methods (t-test, ANOVA, χ²) have stronger requirements than non-parametric (rank-based) methods.

Every method is based on data satisfying certain requirements.

Many of these requirements can be assessed with some useful common graphics.


Page 22: Biostatistics in Practice

Look at the Data for Analysis Requirements

What do we look for?

In Histograms (one variable):
Ideal: Symmetric, bell-shaped.

Potential Problems:
• Skewness.
• Multiple peaks.
• Many values at, say, 0, and bell-shaped otherwise.
• Outliers.

Page 23: Biostatistics in Practice

Example Histogram: OK for Typical* Analyses

• Symmetric.
• One peak.
• Roughly bell-shaped.
• No outliers.

*Typical: mean, SD, confidence intervals, to be discussed in later slides.

Page 24: Biostatistics in Practice

Z-Score = (Measure - Mean)/SD

Standardizes a measure to have mean=0 and SD=1.

Z-scores make different measures comparable.

[Figure: histogram of Time (x-axis, 35 to 95 min.) against Frequency (y-axis), shown on the original scale and on the Z-score scale.]

Original scale: Mean = 60.6 min., SD = 9.6 min.

Z-Score = (Time-60.6)/9.6

Z-scores of -2, 0, 2 correspond to Times of about 41, 61, 79 min.

Z-score scale: Mean = 0, SD = 1
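A short sketch of the standardization, using the mean (60.6 min.) and SD (9.6 min.) quoted for the Time histogram. The function name `z_score` is just an illustrative choice.

```python
MEAN_MIN = 60.6  # mean time in minutes, from the slide
SD_MIN = 9.6     # SD of time in minutes, from the slide

def z_score(time_min, mean=MEAN_MIN, sd=SD_MIN):
    """Number of SDs that a time lies above (+) or below (-) the mean."""
    return (time_min - mean) / sd

# The three times labelled on the figure's axis.
for t in (41, 61, 79):
    print(f"Time {t} min -> Z = {z_score(t):+.2f}")
```

The three labelled times map to roughly -2, 0, and +2, consistent with the paired axis labels on the slide.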

Page 25: Biostatistics in Practice

Outcome Measure in Case Study

GHA = Global Hyperactivity Aggregate

For each child at each time:
Z1 = Z-Score for ADHD from Teachers
Z2 = Z-Score for WWP from Parents
Z3 = Z-Score for ADHD in Classroom
Z4 = Z-Score for Conner on Computer,
where weekly score = changes from T0

All have higher values ↔ more hyperactive.
Z’s make each measure scaled similarly.

GHA = Mean of Z1, Z2, Z3, Z4

Page 26: Biostatistics in Practice

Summary Statistics: Rule of Thumb

For bell-shaped distributions of data (“normally” distributed):

• ~ 68% of values are within mean ±1 SD

• ~ 95% of values are within mean ±2 SD: the “(Normal) Reference Range”

• ~ 99.7% of values are within mean ±3 SD
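These percentages can be checked against the normal distribution itself. The sketch below assumes exactly normal data and uses the identity P(|Z| <= k) = erf(k/√2) for a standard normal Z. `prob_within` is an illustrative helper name.

```python
from math import erf, sqrt

def prob_within(k):
    """Probability that a normal value lies within k SDs of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within mean ± {k} SD: {100 * prob_within(k):.1f}%")
```

The ±2 SD band actually covers about 95.4%. The exact 95% band is ±1.96 SD, which is why 2 is described as a rule of thumb.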

Page 27: Biostatistics in Practice

Histograms: Not OK for Typical Analyses

Skewed

[Figure: histogram of Intensity (x-axis) against Frequency (y-axis).]

Need to transform intensity to another scale, e.g. Log(intensity)

Multi-Peak

[Figure: histogram of Tumor Volume (x-axis) against Frequency (y-axis).]

Need to summarize with percentiles, not mean.

Page 28: Biostatistics in Practice

Look at the Data for Analysis Requirements

What do we look for?

In Scatter Plots (two variables):
Ideal: Football-shaped; ellipse.

Potential Problems:
• Outliers.
• Funnel-shaped.
• Gap with no values for one or both variables.

Page 29: Biostatistics in Practice

Example Scatter Plot: OK for Typical Correlation Analyses


Page 30: Biostatistics in Practice

Summary Statistics: Two Variables (Correlation)

• Always look at the scatterplot.
• Correlation, r, ranges from -1 (perfect inverse relation) to +1 (perfect direct relation). Zero = no linear relation.
• Specific to the ranges of the two variables.
• Typically, cannot extrapolate to populations with other ranges.
• Measures association, not causation.

We will examine details in Session 5.

Page 31: Biostatistics in Practice

Correlation Depends on Range of Data

Graph B contains only the points from graph A that are in the ellipse.

Correlation is reduced in graph B.

Thus: correlation between two quantities may be quite different in different study populations.

[Figure: scatter plots A and B.]
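The effect of restricting the range can be reproduced with a small sketch. The data here are hypothetical and deterministic (a straight line plus an alternating ±3 offset), chosen only so that the result is reproducible. `pearson_r` is an illustrative hand-written implementation of the usual sample correlation formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y follows x with an alternating +3 / -3 offset.
xs = list(range(21))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

# "Graph A": all points. "Graph B": only points with 8 <= x <= 12.
r_all = pearson_r(xs, ys)
subset = [(x, y) for x, y in zip(xs, ys) if 8 <= x <= 12]
r_sub = pearson_r([x for x, _ in subset], [y for _, y in subset])

print(f"r over full range:       {r_all:.2f}")
print(f"r over restricted range: {r_sub:.2f}")
```

The underlying relation between x and y is the same in both cases; only the range of x sampled differs, and the correlation drops accordingly.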

Page 32: Biostatistics in Practice

Correlation and Measurement Precision

A lack of correlation for the subpopulation with 5<x<6 may be due to inability to measure x and y well.

Lack of evidence of association is not evidence of lack of association.

[Figure: scatter plot of y against x, with the subpopulation 5 < x < 6 marked.]

Page 33: Biostatistics in Practice

Confidence Interval (CI)

• How well does your sample mean (m) reflect the true (or population) mean? How confident? 95%?

• A confidence interval (CI) is an inferential statistic that estimates the true, unknown parameter with an interval rather than a single value.

Page 34: Biostatistics in Practice

Confidence Interval for Population Mean

95% Reference range, or “Normal Range”, is

sample mean ± 2(SD)

95% Confidence interval (CI) for the (true, but unknown) mean for the entire population is

sample mean ± 2(SD/√N)

SD/√N is called the “Standard Error of the Mean” (SEM)

Page 35: Biostatistics in Practice

Confidence Interval: Case Study

Table 2

Confidence Interval:

-0.14 ± 1.99(1.04/√73) = -0.14 ± 0.24 → -0.38 to 0.10

Normal Range:

-0.14 ± 1.99(1.04) = -0.14 ± 2.07 → -2.21 to 1.93

The confidence interval is close to the adjusted CI in Table 2 (values shown: 0.13, -0.12, -0.37).
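The arithmetic on this slide can be reproduced as follows. The inputs (mean -0.14, SD 1.04, N = 73, multiplier 1.99) are the values shown on the slide. The multiplier is taken as given rather than derived here, and `interval` is an illustrative helper name.

```python
from math import sqrt

def interval(center, half_width):
    """Return (lower, upper) for center ± half_width."""
    return (center - half_width, center + half_width)

mean_gha = -0.14   # sample mean from the slide
sd_gha = 1.04      # sample SD from the slide
n = 73             # sample size from the slide
multiplier = 1.99  # multiplier used on the slide (about 2)

sem = sd_gha / sqrt(n)  # standard error of the mean, SD/√N

ci = interval(mean_gha, multiplier * sem)               # where the true mean plausibly lies
normal_range = interval(mean_gha, multiplier * sd_gha)  # where ~95% of individuals lie

print(f"SEM = {sem:.3f}")
print(f"95% CI:       {ci[0]:.2f} to {ci[1]:.2f}")
print(f"Normal range: {normal_range[0]:.2f} to {normal_range[1]:.2f}")
```

The only difference between the two intervals is the division by √N: the normal range describes individual values, while the confidence interval describes the uncertainty in the mean and narrows as N grows.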