How Statistics Can Empower Your Research? Part II
Xiayu (Stacy) Huang
Bioinformatics Shared Resource
Sanford | Burnham Medical Research Institute
OUTLINE
• Summary of Previous Talk
  • Descriptive & inferential statistics
  • Student’s t test, one-way ANOVA
• More common statistical tests and applications
  • Repeated measures one-way ANOVA
  • Two-way ANOVA
• Power analysis
• Common data transformation methods
SUMMARY OF PREVIOUS TALK
• Descriptive statistics
  • Measures of central tendency, dispersion, etc.
• Inferential statistics
  • Hypotheses, errors, p-value, power
• Three statistical tests and their applications
  • Two-sample unpaired t test, paired t test, and one-way ANOVA
PowerPoint presentation available at http://bsrweb.burnham.org
ONE-WAY ANOVA EXAMPLE
• Goal: studying the effect of mouse genotype on learning skill on the rotarod.
• Dependent variable: number of seconds a mouse stays on the rotarod
Group1 Group2 Group3 Group4
170 116 30 114
214 102 60 24
122 120 136 72
44 82 126 42
80 90 56 20
130 54 6 32
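As a minimal sketch of what the one-way ANOVA computes for this table, the F statistic can be obtained from the between-group and within-group sums of squares. The function name `one_way_anova` is illustrative, and the p-value is left to a statistics package, since it needs the F distribution.

```python
# Rotarod times (seconds), one list per genotype group, copied from the table above.
groups = {
    "Group1": [170, 214, 122, 44, 80, 130],
    "Group2": [116, 102, 120, 82, 90, 54],
    "Group3": [30, 60, 136, 126, 56, 6],
    "Group4": [114, 24, 72, 42, 20, 32],
}


def one_way_anova(samples):
    """Return (ss_between, ss_within, df_between, df_within, F) for a list of samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(list(groups.values()))
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

A large F relative to the critical value for (3, 20) degrees of freedom would suggest that at least one genotype differs in mean time on the rotarod.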
REPEATED MEASURES ONE-WAY ANOVA
Compares the means of 3 or more groups
Repeated measurements on the same group of subjects
Assumptions:
Sampling should be independent and randomized.
Equal sample size per group preferred.
Sphericity or homogeneity of covariance
Data is normally distributed.
REPEATED MEASURES ONE-WAY ANOVA EXAMPLE
• Goal: studying the effect of practice on maze learning in rats.
• Independent variable: days
• Dependent variable: number of errors made each day
Rat ID Day 1 Day 2 Day 3 Day 4
Rat_1 3 1 0 0
Rat_2 3 2 2 1
Rat_3 6 3 1 2
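The repeated measures design differs from ordinary one-way ANOVA in that variation between subjects is removed from the error term. A minimal sketch on the rat data above, assuming complete data (every rat measured on every day); the function name `rm_one_way_anova` is illustrative.

```python
# Maze errors per rat across the four days, copied from the table above.
errors = {
    "Rat_1": [3, 1, 0, 0],
    "Rat_2": [3, 2, 2, 1],
    "Rat_3": [6, 3, 1, 2],
}


def rm_one_way_anova(table):
    """Partition total variation into subjects, conditions (days), and error."""
    rows = list(table.values())
    n_subjects, n_conditions = len(rows), len(rows[0])
    grand_mean = sum(sum(r) for r in rows) / (n_subjects * n_conditions)

    ss_total = sum((x - grand_mean) ** 2 for r in rows for x in r)
    # Between-subject variation: removed from the error term.
    ss_subjects = n_conditions * sum((sum(r) / n_conditions - grand_mean) ** 2 for r in rows)
    # Variation between days (the effect of interest).
    day_means = [sum(r[j] for r in rows) / n_subjects for j in range(n_conditions)]
    ss_conditions = n_subjects * sum((m - grand_mean) ** 2 for m in day_means)
    ss_error = ss_total - ss_subjects - ss_conditions

    df_conditions = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    return {
        "ss_total": ss_total, "ss_subjects": ss_subjects,
        "ss_conditions": ss_conditions, "ss_error": ss_error,
        "df_conditions": df_conditions, "df_error": df_error, "F": f_stat,
    }


result = rm_one_way_anova(errors)
print(f"F({result['df_conditions']}, {result['df_error']}) = {result['F']:.2f}")
```

This sketch does not check sphericity; a statistics package would normally report that test and any correction alongside the F value.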
TWO-WAY ANOVA
One dependent variable and two independent variables (factors)
Assumptions
• Samples are normally or approximately normally distributed
• The samples from each treatment group must be independent
• The variances of the populations must be equal
• Equal sample size per treatment group preferred
Treatment groups: all possible combinations of the two factors
Treatment   Placebo          Drug
Gender      Female   Male    Female   Male
Main effect: the effect of an individual factor
Interaction effect: the effect of one factor depends on the level of the other
Hypotheses
• The population means of the first factor A are equal
• The population means of the second factor B are equal
• There is no interaction between the two factors
Test
• F test: the mean square for each main effect and for the interaction effect, divided by the within-group variance
TWO-WAY ANOVA MAIN EFFECTS
Factor A: Time (1st hr, 2nd hr)
Factor B: Treatment (Aspirin, Ibuprofen)
[Figure: four panels plotting pain score against time (1st hr, 2nd hr) for Aspirin and Ibuprofen]
I. No main effect of either time or treatment
II. Main effect of treatment only
III. Main effect of time only
IV. Main effects of time and treatment
MAIN EFFECT AND INTERACTION EFFECT
[Figure: four panels plotting pain score against time (1st hr, 2nd hr) for Aspirin and Ibuprofen]
V. Interaction effect only
VI. Main effect of time only and interaction effect
VII. Main effect of treatment only and interaction effect
VIII. Main effects of time and treatment, and interaction effect
TWO-WAY ANOVA EXPERIMENTAL DESIGN
(Table entries are the number of replicates in each treatment group.)
I. Balanced design with equal replication (Best)
         Control  Treated
Time 0   4        4
Time 2   4        4
Time 4   4        4
Time 8   4        4
II. Proportional design replication (Acceptable)
         Control  Treated
Time 0   3        4
Time 2   6        8
Time 4   3        4
Time 8   9        12
III. One replication only (Not recommended)
         Control  Treated
Time 0   1        1
Time 2   1        1
Time 4   1        1
Time 8   1        1
IV. Disproportional design (Bad)
         Control  Treated
Time 0   4        3
Time 2   2        2
Time 4   2        2
Time 8   3        4
TWO-WAY ANOVA WITH REPLICATION EXAMPLE
Study the effect of gender and anti-cancer drugs on tumor growth
Drug     cisplatin       vinblastine     5-fluorouracil
Gender   Female  Male    Female  Male    Female  Male
Tumor size (one row per replicate, columns in the order of the header above):
65 50 70 45 55 35
70 55 65 60 65 40
60 80 60 85 70 35
60 65 70 65 55 55
60 70 65 70 55 35
55 75 60 70 60 40
60 75 60 80 50 45
50 65 50 60 50 40
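A minimal sketch of the two-way ANOVA with replication on the tumor data above, assuming a balanced design (8 replicates in every drug-by-gender cell). The function name `two_way_anova` and the dictionary layout are illustrative; p-values are left to a statistics package.

```python
# Tumor sizes copied from the table above: tumor[drug][gender] -> 8 replicates.
tumor = {
    "cisplatin": {"Female": [65, 70, 60, 60, 60, 55, 60, 50],
                  "Male": [50, 55, 80, 65, 70, 75, 75, 65]},
    "vinblastine": {"Female": [70, 65, 60, 70, 65, 60, 60, 50],
                    "Male": [45, 60, 85, 65, 70, 70, 80, 60]},
    "5-fluorouracil": {"Female": [55, 65, 70, 55, 55, 60, 50, 50],
                       "Male": [35, 40, 35, 55, 35, 40, 45, 40]},
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Two-way ANOVA for a balanced design: cells[a][b] is a list of n replicates."""
    a_levels = list(cells)
    b_levels = list(cells[a_levels[0]])
    n = len(cells[a_levels[0]][b_levels[0]])
    grand = mean([x for a in a_levels for b in b_levels for x in cells[a][b]])

    # Main effects: variation of the marginal means around the grand mean.
    ss_a = n * len(b_levels) * sum(
        (mean([x for b in b_levels for x in cells[a][b]]) - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum(
        (mean([x for a in a_levels for x in cells[a][b]]) - grand) ** 2 for b in b_levels)
    # Interaction: cell-mean variation not explained by the two main effects.
    ss_cells = n * sum((mean(cells[a][b]) - grand) ** 2 for a in a_levels for b in b_levels)
    ss_ab = ss_cells - ss_a - ss_b
    # Within-cell (error) variation.
    ss_within = sum((x - mean(cells[a][b])) ** 2
                    for a in a_levels for b in b_levels for x in cells[a][b])

    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AB"] = df["A"] * df["B"]
    df["within"] = len(a_levels) * len(b_levels) * (n - 1)
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "within": ss_within}
    ms_within = ss["within"] / df["within"]
    f_stats = {k: (ss[k] / df[k]) / ms_within for k in ("A", "B", "AB")}
    return ss, df, f_stats


ss, df, f_stats = two_way_anova(tumor)
for effect, label in (("A", "drug"), ("B", "gender"), ("AB", "drug x gender")):
    print(f"{label}: F({df[effect]}, {df['within']}) = {f_stats[effect]:.2f}")
```

Each F value is the mean square of a main effect or of the interaction divided by the within-cell mean square, which is the test described for two-way ANOVA earlier in the talk.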
TWO-WAY REPEATED MEASURES ANOVA EXAMPLE
Subject Sex Lowcaff Medcaff Highcaff
1 Male 10 15 17
2 Male 9 12 11
3 Male 11 14 15
4 Male 13 11 12
5 Male 11 10 16
6 Male 12 6 12
7 Female 10 14 14
8 Female 12 21 22
9 Female 21 18 23
10 Female 9 18 22
11 Female 12 16 20
12 Female 15 17 26
Goal: Investigating the effect of gender and caffeine consumption on memory
Independent variables: gender and caffeine consumption
Dependent variable: memory score
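In this design gender varies between subjects while caffeine level is repeated within each subject, so the two kinds of effect are tested against different error terms. The sketch below assumes equal group sizes and complete data; the function name `mixed_anova` and the data layout are illustrative, not taken from the talk.

```python
# Memory scores copied from the table above: (sex, [low, medium, high caffeine]).
scores = [
    ("Male", [10, 15, 17]), ("Male", [9, 12, 11]), ("Male", [11, 14, 15]),
    ("Male", [13, 11, 12]), ("Male", [11, 10, 16]), ("Male", [12, 6, 12]),
    ("Female", [10, 14, 14]), ("Female", [12, 21, 22]), ("Female", [21, 18, 23]),
    ("Female", [9, 18, 22]), ("Female", [12, 16, 20]), ("Female", [15, 17, 26]),
]


def mean(values):
    return sum(values) / len(values)


def mixed_anova(data):
    """One between-subjects factor (sex) and one within-subjects factor (caffeine)."""
    sexes = sorted({sex for sex, _ in data})
    n_levels = len(data[0][1])
    n_subjects = len(data)
    all_values = [x for _, row in data for x in row]
    grand = mean(all_values)
    by_sex = {s: [row for sex, row in data if sex == s] for s in sexes}

    ss_total = sum((x - grand) ** 2 for x in all_values)
    # Between-subjects part: sex effect plus subjects nested within sex.
    ss_subjects = n_levels * sum((mean(row) - grand) ** 2 for _, row in data)
    ss_sex = sum(len(rows) * n_levels * (mean([x for r in rows for x in r]) - grand) ** 2
                 for rows in by_sex.values())
    ss_subj_within = ss_subjects - ss_sex
    # Within-subjects part: caffeine, sex x caffeine, and residual error.
    level_means = [mean([row[j] for _, row in data]) for j in range(n_levels)]
    ss_caff = n_subjects * sum((m - grand) ** 2 for m in level_means)
    ss_cells = sum(len(rows) * (mean([r[j] for r in rows]) - grand) ** 2
                   for rows in by_sex.values() for j in range(n_levels))
    ss_inter = ss_cells - ss_sex - ss_caff
    ss_error = ss_total - ss_subjects - ss_caff - ss_inter

    df = {"sex": len(sexes) - 1, "subj_within": n_subjects - len(sexes),
          "caff": n_levels - 1}
    df["inter"] = df["sex"] * df["caff"]
    df["error"] = df["subj_within"] * df["caff"]
    f_stats = {
        # Sex is tested against variation among subjects of the same sex.
        "sex": (ss_sex / df["sex"]) / (ss_subj_within / df["subj_within"]),
        # Caffeine and the interaction are tested against the within-subject error.
        "caff": (ss_caff / df["caff"]) / (ss_error / df["error"]),
        "inter": (ss_inter / df["inter"]) / (ss_error / df["error"]),
    }
    return df, f_stats, ss_error


df, f_stats, ss_error = mixed_anova(scores)
print(f"sex: F({df['sex']}, {df['subj_within']}) = {f_stats['sex']:.2f}")
print(f"caffeine: F({df['caff']}, {df['error']}) = {f_stats['caff']:.2f}")
print(f"sex x caffeine: F({df['inter']}, {df['error']}) = {f_stats['inter']:.2f}")
```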
POWER ANALYSIS
Power depends on:
• Sample size (n)
• Standard deviation (σ or s)
• Minimal detectable difference
• False positive rate (α)
Power analysis includes:
• Sample size required
• Effect size or minimal detectable difference
• Power of the test
POWER ANALYSIS SOFTWARE/PACKAGES
G*Power (free!!!)
Optimal design (free!!!)
SPSS sample power
PASS
SAS proc power, Stata sampsi, etc
Mplus for more advanced/complicated analysis
Many free on-line programs http://www.stat.uiowa.edu/~rlenth/Power/
TWO INDEPENDENT SAMPLES POWER ANALYSIS--INPUT AND OUTPUT PARAMETERS IN G*POWER
Sample size required
• Input parameters: effect size, false positive rate (α), minimum power (1-β), ratio of the two sample sizes
• Output parameters: noncentrality parameter, critical t, degrees of freedom, sample size for each group, total sample size, actual power
Effect size
• Input parameters: false positive rate, minimum power, sample size for each group
• Output parameters: noncentrality parameter, critical t, degrees of freedom, effect size, minimal detectable difference
FACTORS AFFECTING POWER--TWO INDEPENDENT SAMPLES
Power increases as total sample size increases
Power increases as effect size increases
Power increases as significance level increases
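The three trends above can be checked numerically. The sketch below uses a normal approximation to the two-sided two-sample test with equal group sizes, which is an assumption made here for simplicity: G*Power works with the noncentral t distribution, so its sample sizes are typically slightly larger. The effect size `d` is the standardized mean difference, and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power for effect size d with n_per_group subjects in each group."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    # The far rejection tail is ignored; it is negligible for d > 0.
    return _Z.cdf(d * sqrt(n_per_group / 2) - z_crit)


def required_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate sample size per group needed to reach the requested power."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    z_power = _Z.inv_cdf(power)
    return ceil(2 * (z_crit + z_power) ** 2 / d ** 2)


print(required_n_per_group(0.5))                  # sample size per group for d = 0.5
print(round(power_two_sample(0.5, 20), 3))        # baseline
print(round(power_two_sample(0.5, 40), 3))        # larger sample size
print(round(power_two_sample(0.8, 20), 3))        # larger effect size
print(round(power_two_sample(0.5, 20, 0.10), 3))  # larger significance level
```

Each of the last three calls changes one input relative to the baseline and gives a higher power, matching the three statements above.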
ONE-WAY ANOVA POWER ANALYSIS--INPUT AND OUTPUT PARAMETERS IN G*POWER
Sample size required
• Input parameters: effect size (f), false positive rate (α), minimum power (1-β), number of groups
• Output parameters: noncentrality parameter, critical F, degrees of freedom, total sample size, actual power
Effect size
• Input parameters: false positive rate, minimum power, total sample size, number of groups
• Output parameters: noncentrality parameter, critical F, numerator and denominator degrees of freedom, effect size, minimal detectable difference
FACTORS AFFECTING POWER--ONE-WAY ANOVA
[Figure: three G*Power plots of power (1-β err prob) for "F tests - ANOVA: Fixed effects, omnibus, one-way", number of groups = 4]
• Power vs. total sample size (α err prob = 0.05, effect size f = 0.424264)
• Power vs. α err prob (total sample size = 68, effect size f = 0.424264)
• Power vs. effect size f (total sample size = 68, α err prob = 0.05)
Power increases as total sample size increases
Power increases as effect size increases
Power increases as significance level increases
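For the settings shown in the plot titles (4 groups, total sample size 68, effect size f = 0.424264), the noncentrality parameter and degrees of freedom can be worked out by hand. The sketch assumes the usual definition for the fixed-effects omnibus F test, noncentrality = f² × total sample size; the function name is illustrative, and the power itself needs the noncentral F distribution from a statistics package.

```python
def anova_power_inputs(effect_size_f, total_n, n_groups):
    """Return (noncentrality, numerator df, denominator df) for one-way ANOVA."""
    noncentrality = effect_size_f ** 2 * total_n  # grows with both f and N
    df_numerator = n_groups - 1
    df_denominator = total_n - n_groups
    return noncentrality, df_numerator, df_denominator


lam, df_num, df_den = anova_power_inputs(0.424264, 68, 4)
print(f"noncentrality = {lam:.2f}, df = ({df_num}, {df_den})")
```

Because the noncentrality parameter grows with both f and the total sample size, this is one way to see why power rises with either of them.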
DATA TRANSFORMATION
Why?
• Many biological variables do not follow a normal distribution
How?
• Applying a mathematical function to each observation
• Performing statistical tests using the transformed data
• Interpreting results using back transformation
Common data transformation methods in biology
• Log transformation
• Square root transformation
• Arcsine transformation
• Reciprocal transformation
LOG TRANSFORMATION
Usage
• Convert a positively skewed distribution into a symmetrical one
• Applicable when there is heteroscedasticity and standard deviations are proportional to the means
Mathematical function: $x' = \log_2(x + 1)$
• Logarithms in any base are satisfactory
Back transformation: $x = 2^{x'} - 1$
SQUARE ROOT TRANSFORMATION
Usage
• Applicable when the group variances are proportional to the means
• Samples drawn from a Poisson distribution, such as count data
Mathematical function: $x' = \sqrt{x + 0.5}$
Back transformation: $x = x'^2 - 0.5$
ARCSINE TRANSFORMATION
Usage
• Applicable when the data (proportions or percentages) were taken from a binomial distribution
Mathematical function: $p' = \arcsin\sqrt{p}$
Back transformation: $p = (\sin p')^2$
Shortcoming
• Not good at the ends of the range (near 0 and 100%)
• Adjustment needed when p is near 0 or 100%
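The log, square root, and arcsine transformations and their back transformations can be sketched as small paired functions. The offsets (+1 for the log, +0.5 for the square root) and base 2 follow the formulas on these slides; the function names are illustrative, and proportions are expressed on a 0 to 1 scale.

```python
from math import asin, log2, sin, sqrt


def log_transform(x):
    return log2(x + 1)        # x' = log2(x + 1)


def log_back(x_t):
    return 2 ** x_t - 1       # x = 2^x' - 1


def sqrt_transform(x):
    return sqrt(x + 0.5)      # x' = sqrt(x + 0.5)


def sqrt_back(x_t):
    return x_t ** 2 - 0.5     # x = x'^2 - 0.5


def arcsine_transform(p):
    return asin(sqrt(p))      # p' = arcsin(sqrt(p)), p between 0 and 1


def arcsine_back(p_t):
    return sin(p_t) ** 2      # p = (sin p')^2


# Analyze on the transformed scale, then report on the original scale.
counts = [38, 1, 13, 2]       # a few values from the example table in this talk
mean_log = sum(log_transform(x) for x in counts) / len(counts)
print(f"mean on log scale = {mean_log:.3f}, back-transformed = {log_back(mean_log):.3f}")
```

Note that the back-transformed mean is not the arithmetic mean of the raw values, which is why results should be interpreted with the back transformation in mind.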
CHOOSING TRANSFORMATION BASED ON DATA DISTRIBUTION
Shape                 Figure   Transformation
Reverse J             A        1/X
Severe skew right     B        Log(X)
Moderate skew right   C        sqrt(X)
Moderate skew left    D        1/sqrt(X)
Severe skew left      E        -1/Log(X)
J-shaped              F        -1/X
LOG TRANSFORMATION
Untransformed   Square-root transformed   Log transformed
38 6.164 1.580
1 1.000 0.000
13 3.606 1.114
2 1.414 0.301
13 3.606 1.114
20 4.472 1.301
50 7.071 1.699
9 3.000 0.954
28 5.292 1.447
6 2.449 0.778
4 2.000 0.602
43 6.557 1.633
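The table above can be reproduced in a few lines. The values appear to be the plain square root and the base-10 logarithm of each observation, without the +0.5 or +1 offsets; that reading is an assumption based on matching the printed numbers.

```python
from math import log10, sqrt

# Untransformed values copied from the first column of the table above.
raw = [38, 1, 13, 2, 13, 20, 50, 9, 28, 6, 4, 43]

# (untransformed, square-root transformed, log10 transformed), rounded to 3 decimals.
table = [(x, round(sqrt(x), 3), round(log10(x), 3)) for x in raw]

for x, x_sqrt, x_log in table:
    print(f"{x:>3d}  {x_sqrt:6.3f}  {x_log:6.3f}")
```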
SUMMARY
• ANOVA
  • One-way ANOVA: with or without repeated measures
  • Two-way ANOVA: regular two-way ANOVA, two-way repeated measures ANOVA
• Power analysis
  • Two independent samples
  • One-way ANOVA
• Data transformations
  • Log transformation
  • Square root transformation
  • Arcsine transformation
BASIC STATISTICS TOOLS
Statistics software and packages:
1. GraphPad Prism, SPSS, and Excel add-ins
2. G*Power, Optimal Design, etc.
3. SAS, R, Stata, etc.
Basic statistics books:
1. Intro Stats, SDSU, 2nd edition, De Veaux, Velleman, Bock
2. Choosing and Using Statistics: A Biologist's Guide
3. Biostatistical Analysis, Jerrold H. Zar
4. Biostatistics: The Bare Essentials, Norman and Streiner
5. Handbook of Biological Statistics