
Post on 16-Jan-2016


How Statistics Can Empower Your Research? Part II

Xiayu (Stacy) Huang

Bioinformatics Shared Resource

Sanford | Burnham Medical Research Institute

OUTLINE

Summary of Previous Talk

Descriptive & inferential statistics

Student's t test, one-way ANOVA

More common statistical tests and applications

Repeated measures one-way ANOVA

Two-way ANOVA

Power analysis

Common data transformation methods

SUMMARY OF PREVIOUS TALK

• Descriptive statistics
  • Measures of central tendency, dispersion, etc.

• Inferential statistics
  • Hypothesis, errors, p-value, power

• Three statistical tests and their applications
  • Two-sample unpaired t test, paired t test, and one-way ANOVA

Power point presentation at http://bsrweb.burnham.org

ONE-WAY ANOVA EXAMPLE

• Goal: study the effect of mouse genotype on learning skill on the rotarod.

• Dependent variable: number of seconds staying on a rotarod

Group1 Group2 Group3 Group4

170 116 30 114

214 102 60 24

122 120 136 72

44 82 126 42

80 90 56 20

130 54 6 32
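
The F statistic for the rotarod table above can be reproduced by hand. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` is chosen here for illustration, and the talk itself runs this analysis in GraphPad Prism. It partitions the total variation into between-group and within-group sums of squares and divides the two mean squares.

```python
# One-way ANOVA F statistic computed from first principles (standard library only).
# Data: seconds on the rotarod for the four genotype groups in the table above.
groups = {
    "Group1": [170, 214, 122, 44, 80, 130],
    "Group2": [116, 102, 120, 82, 90, 54],
    "Group3": [30, 60, 136, 126, 56, 6],
    "Group4": [114, 24, 72, 42, 20, 32],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-group variation: how far each observation lies from its group mean.
    ss_within = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(groups.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The p-value is not computed here because it requires the F distribution, which the standard library does not provide.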

DECISION TREE

DECISION TREE----ONE-WAY ANOVA

ASSUMPTION CHECK IN GRAPHPAD PRISM

DATA ANALYSIS IN GRAPHPAD PRISM

Variance check

REPEATED MEASURES ONE-WAY ANOVA

Compares the means of 3 or more groups

Repeated measurements on the same group of subjects

Assumptions:

Sampling should be independent and randomized.

Equal sample size per group preferred.

Sphericity (homogeneity of covariance): the variances of the differences between all pairs of repeated measurements are equal.

Data are normally distributed.

APPLICATION OF REPEATED MEASURES ONE-WAY ANOVA IN BIOLOGY

[Figure: measurements plotted against days]

REPEATED MEASURES ONE-WAY ANOVA EXAMPLE

• Goal: study the effect of practice on maze learning in rats.

• Independent variable: days

• Dependent variable: number of errors made each day

Rat ID Day 1 Day 2 Day 3 Day 4

Rat_1 3 1 0 0

Rat_2 3 2 2 1

Rat_3 6 3 1 2
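
To illustrate why the repeated measures design matters for the maze table above, here is a minimal sketch in standard-library Python (the function name `repeated_measures_f` is hypothetical; the talk uses GraphPad Prism for the actual analysis). It removes the between-rat variation from the error term and, for comparison, also computes the ordinary one-way F that leaves that variation in the error term.

```python
# Repeated measures one-way ANOVA for the maze data (errors per day for each rat).
errors = {
    "Rat_1": [3, 1, 0, 0],
    "Rat_2": [3, 2, 2, 1],
    "Rat_3": [6, 3, 1, 2],
}


def repeated_measures_f(table):
    """Return (F_repeated, F_ordinary) for a subjects-by-conditions table."""
    rows = list(table.values())
    n_subjects, n_conditions = len(rows), len(rows[0])
    grand_mean = sum(map(sum, rows)) / (n_subjects * n_conditions)
    condition_means = [sum(r[j] for r in rows) / n_subjects for j in range(n_conditions)]
    subject_means = [sum(r) / n_conditions for r in rows]

    ss_total = sum((x - grand_mean) ** 2 for r in rows for x in r)
    ss_conditions = n_subjects * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = n_conditions * sum((m - grand_mean) ** 2 for m in subject_means)
    # Repeated measures: subject-to-subject variation is removed from the error term.
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    f_repeated = (ss_conditions / df_conditions) / (ss_error / df_error)

    # Ordinary one-way ANOVA: subject variation stays inside the error term.
    df_within = n_conditions * (n_subjects - 1)
    f_ordinary = (ss_conditions / df_conditions) / ((ss_total - ss_conditions) / df_within)
    return f_repeated, f_ordinary


f_repeated, f_ordinary = repeated_measures_f(errors)
print(f"repeated measures F = {f_repeated:.2f}, ordinary one-way F = {f_ordinary:.2f}")
```

On these data the repeated measures F is larger than the ordinary one-way F, because consistent differences between rats no longer inflate the error term.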


DECISION TREE----ONE-WAY REPEATED ANOVA

TABLE FORMAT IN GRAPHPAD PRISM– REPEATED MEASURES ONE-WAY ANOVA

DATA FORMAT AND CHOOSING ANALYSIS METHODS

DATA ANALYSIS IN GRAPHPAD PRISM

ANALYSIS RESULT

ONE-WAY REPEATED ANOVA COMPARED WITH REGULAR ONE-WAY ANOVA

TWO-WAY ANOVA

One dependent variable and two independent variables or factors

Assumptions
• Samples are normally or approximately normally distributed
• The samples from each treatment group must be independent
• The variances of the populations must be equal
• Equal sample size per treatment group preferred

Treatment groups: all possible combinations of the two factors

Treatment Placebo Drug

Gender Female Male Female Male

Main effect: the effect of an individual factor

Interaction effect: the effect of one factor depends on the level of the other

Hypotheses
• The population means of the first factor A are equal
• The population means of the second factor B are equal
• There is no interaction between the two factors

Test
• F test: the mean square for each main effect and for the interaction effect, divided by the within-group variance
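
Written out, the three F tests take the following form. This is a sketch assuming a fixed-effects model with replication; the symbols $a$ and $b$ (number of levels of factors A and B) and $N$ (total number of observations) are introduced here for illustration and do not appear on the slides.

```latex
F_A = \frac{MS_A}{MS_{\text{within}}}, \qquad
F_B = \frac{MS_B}{MS_{\text{within}}}, \qquad
F_{AB} = \frac{MS_{AB}}{MS_{\text{within}}}

df_A = a - 1, \qquad
df_B = b - 1, \qquad
df_{AB} = (a - 1)(b - 1), \qquad
df_{\text{within}} = N - ab
```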

TWO-WAY ANOVA

MAIN EFFECTS

Factor A: time (1st hr, 2nd hr)

Factor B: treatment (aspirin, ibuprofen)

Response: pain score

I. No main effect of time or treatment

II. Main effect of treatment only

III. Main effect of time only

IV. Main effects of both time and treatment

MAIN EFFECT AND INTERACTION EFFECT

V. Interaction effect only

VI. Main effect of time only and interaction effect

VII. Main effect of treatment only and interaction effect

VIII. Main effects of time and treatment, and interaction effect

[Figure panels: pain score plotted against time (1st hr, 2nd hr) for aspirin and ibuprofen]

TWO-WAY ANOVA EXPERIMENTAL DESIGN

Number of replicates in each cell:

I. Balanced design with equal replication (Best)

          Control  Treated
Time 0       4        4
Time 2       4        4
Time 4       4        4
Time 8       4        4

II. Proportional design replication (Acceptable)

          Control  Treated
Time 0       3        4
Time 2       6        8
Time 4       3        4
Time 8       9       12

III. One replication only (Not recommended)

          Control  Treated
Time 0       1        1
Time 2       1        1
Time 4       1        1
Time 8       1        1

IV. Disproportional design (Bad)

          Control  Treated
Time 0       4        3
Time 2       2        2
Time 4       2        2
Time 8       3        4
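
One way to tell a proportional design from a disproportional one is to check that every cell count equals (row total × column total) / grand total. The sketch below applies that check to the four replicate tables listed above. The function name `is_proportional` is hypothetical, and the pairing of each table with a design label is inferred from the counts.

```python
# Check whether the replicate counts in a two-way layout are proportional.
def is_proportional(counts):
    """counts[i][j] = number of replicates at time i, treatment j."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    # Integer form of n_ij == row_i * col_j / total, to avoid rounding issues.
    return all(
        counts[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )


designs = {
    "I. balanced": [[4, 4], [4, 4], [4, 4], [4, 4]],
    "II. proportional": [[3, 4], [6, 8], [3, 4], [9, 12]],
    "III. one replication": [[1, 1], [1, 1], [1, 1], [1, 1]],
    "IV. disproportional": [[4, 3], [2, 2], [2, 2], [3, 4]],
}

for name, counts in designs.items():
    print(name, "->", "proportional" if is_proportional(counts) else "not proportional")
```

Designs I and III pass the check trivially because all cells are equal; design III is still not recommended because a single replicate per cell leaves no within-cell variation for testing the interaction.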

APPLICATION OF TWO-WAY ANOVA IN BIOLOGY

Microarray: time-dose relationship (doses: 0 mM, 50 mM, 75 mM)

TWO-WAY ANOVA WITH REPLICATION EXAMPLE

Study the effect of gender and anti-cancer drugs on tumor growth

Tumor size by drug and gender:

Drug      cisplatin       vinblastine     5-fluorouracil
Gender    Female  Male    Female  Male    Female  Male
            65     50       70     45       55     35
            70     55       65     60       65     40
            60     80       60     85       70     35
            60     65       70     65       55     55
            60     70       65     70       55     35
            55     75       60     70       60     40
            60     75       60     80       50     45
            50     65       50     60       50     40
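
The two main effects and the interaction for the tumor-size table above can be computed directly from cell, row, and column means. This is a minimal sketch for a balanced fixed-effects design in standard-library Python; the function names are hypothetical, and the talk performs this analysis in GraphPad Prism. P-values are omitted because they require the F distribution.

```python
# Two-way ANOVA with replication (balanced, fixed effects) for the tumor-size data.
tumor_size = {
    ("cisplatin", "Female"): [65, 70, 60, 60, 60, 55, 60, 50],
    ("cisplatin", "Male"): [50, 55, 80, 65, 70, 75, 75, 65],
    ("vinblastine", "Female"): [70, 65, 60, 70, 65, 60, 60, 50],
    ("vinblastine", "Male"): [45, 60, 85, 65, 70, 70, 80, 60],
    ("5-fluorouracil", "Female"): [55, 65, 70, 55, 55, 60, 50, 50],
    ("5-fluorouracil", "Male"): [35, 40, 35, 55, 35, 40, 45, 40],
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """cells maps (level_a, level_b) -> replicates; returns {effect: (SS, df, F)}."""
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell, assumed equal
    all_values = [x for xs in cells.values() for x in xs]
    grand = mean(all_values)

    # Main effects: variation of the marginal means around the grand mean.
    ss_a = n * len(levels_b) * sum(
        (mean([x for b in levels_b for x in cells[(a, b)]]) - grand) ** 2
        for a in levels_a
    )
    ss_b = n * len(levels_a) * sum(
        (mean([x for a in levels_a for x in cells[(a, b)]]) - grand) ** 2
        for b in levels_b
    )
    # Interaction: cell-mean variation not explained by the two main effects.
    ss_cells = n * sum((mean(xs) - grand) ** 2 for xs in cells.values())
    ss_ab = ss_cells - ss_a - ss_b
    # Within: variation of the replicates around their own cell mean.
    ss_within = sum((x - mean(xs)) ** 2 for xs in cells.values() for x in xs)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_ab = df_a * df_b
    df_within = len(all_values) - len(cells)
    ms_within = ss_within / df_within
    return {
        "factor_a": (ss_a, df_a, ss_a / df_a / ms_within),
        "factor_b": (ss_b, df_b, ss_b / df_b / ms_within),
        "interaction": (ss_ab, df_ab, ss_ab / df_ab / ms_within),
        "within": (ss_within, df_within, None),
    }


result = two_way_anova(tumor_size)
for effect, (ss, df, f) in result.items():
    f_text = "" if f is None else f", F = {f:.2f}"
    print(f"{effect}: SS = {ss:.2f}, df = {df}{f_text}")
```

Here `factor_a` is the drug and `factor_b` is gender, following the order of the dictionary keys.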

DECISION TREE– FACTORIAL ANOVA

TABLE FORMAT IN PRISM—TWO-WAY ANOVA

DATA FORMAT AND CHOOSING ANALYSIS METHODS

CHOOSING MODEL

ANALYSIS RESULT

TWO-WAY REPEATED MEASURES ANOVA EXAMPLE

Subject Sex Lowcaff Medcaff Highcaff

1 Male 10 15 17

2 Male 9 12 11

3 Male 11 14 15

4 Male 13 11 12

5 Male 11 10 16

6 Male 12 6 12

7 Female 10 14 14

8 Female 12 21 22

9 Female 21 18 23

10 Female 9 18 22

11 Female 12 16 20

12 Female 15 17 26

Goal: investigate the effect of gender and caffeine consumption on memory

Independent variables: gender and caffeine consumption (low, medium, high)

Dependent variable: memory score

DECISION TREE----TWO-WAY REPEATED ANOVA

TABLE FORMAT– TWO-WAY REPEATED MEASURES ANOVA

DATA FORMAT AND ANALYSIS METHODS

CHOOSING MODEL

ANALYSIS RESULT

Matching not effective???

RECONSIDERING REGULAR TWO-WAY ANOVA


POWER ANALYSIS

Power depends on:
• Sample size (n)
• Standard deviation (σ or s)
• Minimal detectable difference (effect size)
• False positive rate (α)

Power analysis includes:
• Sample size required
• Effect size or minimal detectable difference
• Power of the test

POWER ANALYSIS SOFTWARE/PACKAGES

G*Power (free!!!)

Optimal design (free!!!)

SPSS sample power

PASS

SAS proc power, Stata sampsi, etc

Mplus for more advanced/complicated analysis

Many free on-line programs http://www.stat.uiowa.edu/~rlenth/Power/

TWO INDEPENDENT SAMPLES POWER ANALYSIS--INPUT AND OUTPUT PARAMETERS IN G*POWER

Sample size required

Input parameters
• Effect size
• False positive rate (α)
• Minimum power (1 − β)
• Ratio of the two sample sizes

Output parameters
• Noncentrality parameter
• Critical t
• Degrees of freedom
• Sample size for each group
• Total sample size
• Actual power

Effect size

Input parameters
• False positive rate
• Minimum power
• Sample size for each group

Output parameters
• Noncentrality parameter
• Critical t
• Degrees of freedom
• Effect size
• Minimal detectable difference
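
To show how the input parameters combine into a required sample size, here is a rough sketch based on the normal approximation for a two-sided, two-sample comparison with equal group sizes. It is not the exact noncentral-t calculation that G*Power performs, so exact t-based results are typically slightly larger. The function name `sample_size_per_group` and the example effect sizes are assumptions made for illustration.

```python
# Approximate per-group sample size for comparing two independent means.
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group (two-sided test, equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the false positive rate
    z_power = z.inv_cdf(power)          # quantile matching the requested power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


n_medium = sample_size_per_group(0.5)  # standardized difference of 0.5
n_large = sample_size_per_group(0.8)   # standardized difference of 0.8
print(n_medium, n_large)
```

Smaller effect sizes require more subjects per group, which is the main reason to fix the minimal detectable difference before collecting data.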

COMPUTE SAMPLE SIZE– TWO INDEPENDENT SAMPLES

DETERMINING EFFECT SIZE– TWO INDEPENDENT SAMPLES

ANALYSIS RESULTS– TWO INDEPENDENT SAMPLES

COMPUTE EFFECT SIZE– TWO INDEPENDENT SAMPLES

X-Y PLOT FOR A RANGE OF VALUES

FACTORS AFFECTING POWER--TWO INDEPENDENT SAMPLES

Power increases as total sample size increases

Power increases as effect size increases

Power increases as significance level increases
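
The three relationships above can be checked numerically. The sketch below uses a normal approximation to the power of a two-sided, two-sample test with equal group sizes; it is an illustration under that assumption rather than G*Power's exact method, and the function name `approx_power` is hypothetical.

```python
# Approximate power of a two-sided, two-sample test (normal approximation).
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * sqrt(n_per_group / 2)  # expected test statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


baseline = approx_power(0.5, 64, alpha=0.05)
more_samples = approx_power(0.5, 100, alpha=0.05)
bigger_effect = approx_power(0.8, 64, alpha=0.05)
looser_alpha = approx_power(0.5, 64, alpha=0.10)
print(f"{baseline:.2f} {more_samples:.2f} {bigger_effect:.2f} {looser_alpha:.2f}")
```

Each of the three variations raises the power relative to the baseline, though raising the significance level does so at the cost of more false positives.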

ONE-WAY ANOVA POWER ANALYSIS--INPUT AND OUTPUT PARAMETERS IN G*POWER

Sample size required

Input parameters
• Effect size (f)
• False positive rate (α)
• Minimum power (1 − β)
• Number of groups

Output parameters
• Noncentrality parameter
• Critical F
• Degrees of freedom
• Total sample size
• Actual power

Effect size

Input parameters
• False positive rate
• Minimum power
• Total sample size
• Number of groups

Output parameters
• Noncentrality parameter
• Critical F
• Numerator and denominator degrees of freedom
• Effect size
• Minimal detectable difference
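
For one-way ANOVA the effect size f is the standard deviation of the group means divided by the common within-group standard deviation, and the noncentrality parameter is f² times the total sample size. The sketch below assumes equal group sizes; the group means and standard deviation in the first example are hypothetical, while f = 0.424264 and a total sample size of 68 are the values shown in the G*Power plots in this talk.

```python
# Cohen's effect size f and the noncentrality parameter for a one-way ANOVA.
from math import sqrt


def cohens_f(group_means, within_sd):
    """Effect size f for equal group sizes: SD of the group means / within-group SD."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    sd_of_means = sqrt(sum((m - grand_mean) ** 2 for m in group_means) / k)
    return sd_of_means / within_sd


def noncentrality(effect_size_f, total_n):
    """Noncentrality parameter of the F test: lambda = f^2 * N."""
    return effect_size_f ** 2 * total_n


# Hypothetical planning values: four group means and a common SD of 5.
f_example = cohens_f([10, 12, 14, 16], within_sd=5)
lam = noncentrality(0.424264, 68)
print(f"f = {f_example:.4f}, lambda = {lam:.2f}")
```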

COMPUTE SAMPLE SIZE-- ONE-WAY ANOVA

COMPUTE EFFECT SIZE– ONE-WAY ANOVA

FACTORS AFFECTING POWER—ONE-WAY ANOVA

[Figure: three G*Power plots, each titled "F tests - ANOVA: Fixed effects, omnibus, one-way"]
• Power (1-β err prob) vs. total sample size (10 to 100): Number of groups = 4, α err prob = 0.05, Effect size f = 0.424264
• Power (1-β err prob) vs. α err prob (0.05 to 0.4): Number of groups = 4, Total sample size = 68, Effect size f = 0.424264
• Power (1-β err prob) vs. effect size f (0.1 to 0.5): Number of groups = 4, Total sample size = 68, α err prob = 0.05

Power increases as total sample size increases

Power increases as effect size increases

Power increases as significance level increases


DATA TRANSFORMATION

Why?
• Many biological variables do not follow a normal distribution

How?
• Apply a mathematical function to each observation
• Perform statistical tests on the transformed data
• Interpret results using back transformation

Common data transformation methods in biology
• Log transformation
• Square root transformation
• Arcsine transformation
• Reciprocal transformation

LOG TRANSFORMATION

Usage

Convert a positively skewed distribution into a symmetrical one

Applicable when there is heteroscedasticity and standard deviations are proportional to the means

Mathematical function: $x' = \log_2(x + 1)$

Logarithms in any base are satisfactory

Back transformation: $x = 2^{x'} - 1$

SQUARE ROOT TRANSFORMATION

Usage
• Applicable when the group variances are proportional to the means
• Samples taken from a Poisson distribution, such as count data

Mathematical function: $x' = \sqrt{x + 0.5}$

Back transformation: $x = (x')^2 - 0.5$

ARCSINE TRANSFORMATION

Usage
• Applicable when the data (proportions or percentages) were taken from a binomial distribution

Mathematical function: $p' = \arcsin\sqrt{p}$

Back transformation: $p = (\sin p')^2$

Shortcoming
• Not good at the ends of the range (near 0 and 100%)
• Adjustment needed when p is near 0 or 100%
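
The three transformations and their back transformations can be written as small function pairs. This is a sketch that follows the formulas as read from the slides (log base 2 with an offset of 1, square root with an offset of 0.5, and arcsine of the square root of a proportion); the function names are chosen here for illustration.

```python
# Forward and back transformations for the log, square root and arcsine methods.
from math import asin, log2, sin, sqrt


def log_transform(x):
    return log2(x + 1)        # the +1 keeps zero counts defined


def log_back(x_t):
    return 2 ** x_t - 1


def sqrt_transform(x):
    return sqrt(x + 0.5)


def sqrt_back(x_t):
    return x_t ** 2 - 0.5


def arcsine_transform(p):
    return asin(sqrt(p))      # p is a proportion between 0 and 1


def arcsine_back(p_t):
    return sin(p_t) ** 2


# Round trip: transforming and then back transforming recovers the original value.
for forward, back, value in [
    (log_transform, log_back, 7),
    (sqrt_transform, sqrt_back, 12),
    (arcsine_transform, arcsine_back, 0.25),
]:
    print(forward.__name__, value, "->", forward(value), "->", back(forward(value)))
```

In practice a mean computed on the transformed scale is back transformed for reporting, so the reported value is not the arithmetic mean of the raw data.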

CHOOSING TRANSFORMATION BASED ON DATA DISTRIBUTION

Shape                 Figure   Transformation
Reverse J             A        1/X
Severe skew right     B        Log(X)
Moderate skew right   C        sqrt(X)
Moderate skew left    D        1/sqrt(X)
Severe skew left      E        -1/Log(X)
J-shaped              F        -1/X

LOG TRANSFORMATION

Untransformed   Square-root transformed   Log transformed
38              6.164                     1.580
1               1.000                     0.000
13              3.606                     1.114
2               1.414                     0.301
13              3.606                     1.114
20              4.472                     1.301
50              7.071                     1.699
9               3.000                     0.954
28              5.292                     1.447
6               2.449                     0.778
4               2.000                     0.602
43              6.557                     1.633
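
The table above can be reproduced in a few lines. Checking the numbers shows that this example uses the plain square root and the base-10 logarithm, without the 0.5 and 1 offsets given in the transformation formulas earlier in the talk. The variable names below are chosen for illustration.

```python
# Reproduce the square-root and log columns of the table above.
from math import log10, sqrt

untransformed = [38, 1, 13, 2, 13, 20, 50, 9, 28, 6, 4, 43]

rows = [(x, round(sqrt(x), 3), round(log10(x), 3)) for x in untransformed]

print(f"{'x':>4} {'sqrt(x)':>8} {'log10(x)':>9}")
for x, root, logged in rows:
    print(f"{x:>4} {root:>8.3f} {logged:>9.3f}")
```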

SUMMARY

ANOVA
• One-way ANOVA
  • With or without repeated measures
• Two-way ANOVA
  • Regular two-way ANOVA
  • Two-way repeated measures ANOVA

Power analysis
• Two independent samples
• One-way ANOVA

Data transformations
• Log transformation
• Square root transformation
• Arcsine transformation

BASIC STATISTICS TOOLS

Statistics software and packages:

1. GraphPad Prism, SPSS, and Excel add-ins

2. G*Power, Optimal Design, etc.

3. SAS, R, Stata, etc.

Basic statistics books:

1. Intro Stats, SDSU, 2nd edition, De Veaux, Velleman, Bock

2. Choosing and Using Statistics: A Biologist's Guide

3. Biostatistical Analysis, Jerrold H. Zar

4. Biostatistics: The Bare Essentials, Norman and Streiner

5. Handbook of Biological Statistics

Thank You All for Coming!!!

Questions???