nemours biomedical research statistics march 26, 2009 tim bunnell, ph.d. & jobayer hossain,...

37
Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Post on 22-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Statistics

March 26, 2009Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D.

Nemours Bioinformatics Core Facility

Page 2: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Hypothesis Testing (Quantitative variable)

Sign test

One sample t-test

One group sample

Mann-Whitney U test

Two-sample t-test

Independent

Wilcoxon Signed Rank test

Paired t-test

Not Independent

Two-group sample

Kruskal Wallis test

Analysis of variance

More than two groups sample

Hypothesis Testing Procedure

Page 3: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

One vs two sided tests

• One-sided– One direction of effect (e.g. mean efficacy of treatment

group is greater than the mean efficacy of placebo group)

– Greater power to detect difference in expected direction

• Two-sided– Effect could be in either direction (e.g. mean efficacy of

treatment group is not equal to the mean efficacy of placebo

group)

– More conservative

Page 4: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

One vs two sided tests

One group Two groups

One sided A single mean differs from a known value in a specific direction. e.g. mean > 0 or median > 0

Two means differ from one another in a specific direction. e.g., mean2 < mean1

median2 < median1

Two sided A single mean differs from a known value in either direction. e.g., mean ≠ 0median ≠ 0

Two means are not equal. E.g.,mean1 ≠ mean2

median1 ≠ median2

Page 5: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Single group sample - one sample t-test

• Test for value of a single mean

• E.g., to see if mean SBP of all AIDHC employees is

120 mm Hg

• Assumptions

– Parent population is normal

– Sample observations (subjects) are independent

Page 6: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Single group sample-one sample t-test

• FormulaConsider a sample of size n from a normal population with mean µ and variance σ2, then the following statistic is distributed as Student’s t with (n-1) degrees of freedom.

ns

xt

/

μ−=

x is the sample mean and s is the sample standard deviation

Page 7: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

One group sample - Sign Test (Nonparametric)

• Use:(1) Compares the median of a single group with a specified value (instead of single sample t-test).

• Hypothesis: H0:Median = c

Ha:Median c • Test Statistic:

We take the difference of observations from median (xi - c). The number of positive difference follows a Binomial distribution. For large sample size, this distribution follows normal distribution. (Rcmdr does not have a default sign test)

Page 8: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

One group sample: Rcmdr demonstration

• Load data default to Rcmdr and then select-

• Statistics->Means->single sample t-test. From this window, select

variable (e.g.hgt), alternative hypothesis, value for null hypothesis

(e.g. 50). data: data$hgt t = -2.6925, df = 59, p-value = 0.009217alternative hypothesis: true mean is not equal to 50 95 percent confidence interval: 46.50166 49.48460 sample estimates:mean of x 47.99313

Page 9: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (independent) samples - two-sample t-statistic

• Use– Test for equality of two means

• Assumptions– Parent population is normal – Sample observations (subjects) are independent.

Page 10: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (independent) samples - two-sample t-statistic

• Formula (two groups) – Case 1: Equal Population Standard Deviations:

• The following statistic is distributed as t distribution with (n1+n2 -2) d.f.

The pooled standard deviation,

n1 and n2 are the sample sizes and S1 and S2 are the sample standard deviations of two groups.

21

21

11

)(

nnS

xxt

p +

−=

2

)1()1(

21

222

211

−+−+−

=nn

SnSnSp

Page 11: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (independent) samples - two-sample t-statistic

• Formula (two groups) – Case 2: Unequal population standard deviations

• The following statistic follows t distribution.

• The d.f. of this statistic is,

2

22

1

21

2121 )()(

ns

ns

xxt

+

−−−=

μμ

( )

1)/(

1)/(

//

2

22

22

1

21

21

2

2221

21

−+

+=

nns

nns

nsnsv

Page 12: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-sample (independent) t-statistic

• Statistics->Means-> Independent sample t test

• From the small window, select response variable (e.g. PLUC.pre), select group variable (e.g. sex) and select alternative hypothesis type (e.g. two sided), confidence level (e.g. .95), and Assume equal variance (e.g yes)

Two Sample t-testdata: PLUC.pre by sex t = -0.4866, df = 58, p-value = 0.6284alternative hypothesis: true difference in means is

not equal to 0 95 percent confidence interval: -1.4004136 0.8527181 sample estimates:mean in group f mean in group m 9.606148 9.879995

Page 13: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (independent) samples- Wilcoxon Rank-Sum Test (Nonparametric)

• Use: Compares medians of two independent groups.

• Corresponds to t-test for 2 Independent sample means

• Test Statistic:

Consider two samples of sizes m and n. Suppose N=m+n.

Compute the rank of all N observations. Then, the statistic,

Wm= Sum of the ranks of all observations of the first variable (size

m).

Page 14: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (independent) samples- Wilcoxon Rank-Sum Test (Nonparametric)

• Statistics-> Nonparametric tests->Two sample Wilcoxon test ->

select response variable (e.g. PLUC.pre), group variable (e.g.

sex), Alternative hypothesis (e.g. two sided), Type of Test (e.g.

exact).Wilcoxon rank sum test

data: PLUC.pre by sex

W = 439, p-value = 0.8776

alternative hypothesis: true location shift is not equal to 0

Page 15: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (matched) samples - paired t-statistic

• Use: Compares equality of means of two matched or

paired samples (e.g. pretest versus posttest)

• Assumptions:

– Parent population is normal

– Sample observations (subjects) are independent

Page 16: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (matched) samples - paired t-statistic

• Formula – The following statistic follows t distribution with n-1 d.f.

Where, d is the difference of two matched samples and Sd is the standard deviation of the variable d.

ns

dt

d /=

Page 17: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Paired t-test:Rcmdr demo

• Statistics -> Means->paired t-test-> pick First variable (e.g.

PLUC.pre), pick second variable (e.g. PLUC.post), Alternative

Hypothesis (e.g. Two sided), Confidence level (e.g. 0.95). data: data$PLUC.pre and data$PLUC.post

t = -5.4953, df = 59, p-value = 8.739e-07

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-3.275342 -1.526769

sample estimates:

mean of the differences

-2.401056

Page 18: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (matched) samples Wilcoxon Signed-Rank Test (Nonparametric)

• USE:

– Compares medians of two paired samples.

• Test Statistic

– Obtain Differences of two variables, Di = X1i - X2i

– Take Absolute Value of Differences, Di

– Assign Ranks to absolute values (lower to higher), Ri

– Sum up ranks for positive differences (T+) and negative

differences (T-)

• Test Statistic is smaller of T- or T+ (2-tailed)

Page 19: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Two-group (matched) samples Wilcoxon Signed-Rank Test (Nonparametric)

• Statistics -> Nonparametric tests ->Paired sample Wilcoxon

test -> select first variable (e.g. PLUC. Pre) and second

variable (e.g. PLUC.post), Alternative hypothesis (e.g.two

sided), type of test (e.g. normal approximation)median(data$PLUC.pre - data$PLUC.post, na.rm=TRUE) # median difference

[1] -2.283691

> wilcox.test(data$PLUC.pre, data$PLUC.post, alternative='two.sided',

+ paired=TRUE)

Page 20: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Comparing more than two independent samples: F statistic (Parametric)

• Use:– Compare means of more than two groups – Test the equality of variances.

Page 21: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Comparing > 2 independent samples: F statistic (Parametric)

• Let X and Y be two independent Chi-square variables with n1 and n2 d.f. respectively, then the following statistic follows a F distribution with n1 and n2 d.f.

• Let, X and Y are two independent normal variables with sample sizes n1 and n2. Then the following statistic follows a F distribution with n1 and n2 d.f.

Where, sx2 and sy

2 are sample variances of X and Y.

2/

1/21 , nY

nXF nn =

2

2

, 21

y

xnn s

sF =

Page 22: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Comparing > 2 independent samples: F statistic (Parametric)

• Hypotheses:H0: µ1= µ2=…. =µn

Ha: µ1≠ µ2 ≠ …. ≠µn

Comparison will be done using analysis of variance (ANOVA) technique.

ANOVA uses F statistic for this comparison.

The ANOVA technique will be covered in another class session.

Page 23: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

More than two independent samples: One-way analysis of variance (ANOVA)

• Create a new variable: PLUC.chng=PLUC.post – PLUC.pre

(Data-> Manage variable in active dataset-> Compute new

variable -> New variable name (on the left): PLUC.chng =

PLUC.post – PLUC.pre (on the right).

• For ANOVA: Statistics -> means -> One-way ANOVA -> Select

model, response variable (e.g. PLUC.chng), group variable (e.g.

grp), pairwise comparison of means.

Page 24: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

AnovaModel.5 <- aov(PLUC.chng ~ grp, data=data)

Df Sum Sq Mean Sq F value Pr(>F) > summary(AnovaModel.5)grp 2 93.64 46.82 4.5839 0.01426 *Residuals 57 582.17 10.21 ---Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'

0.1 ' ' 1

> numSummary(data$PLUC.chng , groups=data$grp, statistics=c("mean", "sd"))

mean sd n1 1.023051 3.686748 202 2.132612 2.509201 203 4.047504 3.279040 20

More than two independent samples: One-way analysis of variance (ANOVA)

Contd.

Page 25: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

threegrp <- aov(PLUC.chng ~ grp, data=data)

> summary(threegrp)

Df Sum Sq Mean Sq F value Pr(>F)

grp 2 93.64 46.82 4.5839 0.01426 *

Residuals 57 582.17 10.21

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> numSummary(data$PLUC.chng , groups=data$grp, statistics=c("mean", "sd"))

mean sd n

1 1.023051 3.686748 20

2 2.132612 2.509201 20

3 4.047504 3.279040 20

More than two independent samples: One-way analysis of variance (ANOVA)

Contd.

Page 26: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

-1 0 1 2 3 4 5

3 - 2

3 - 1

2 - 1 (

(

(

)

)

)

95% family-wise confidence level

Linear Function

More than two independent samples: One-way analysis of variance (ANOVA)

Page 27: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

More than two groups: Nonparametric Kruskal-Wallis Test

• Compares median of three or more groups or (means of ranks

of three or more groups)

• Rcmdr Demo: Statistics -> Nonparametric tests ->Kruskal-Wallis

test and then select response and group variables etc.

> kruskal.test(PLUC.chng ~ grp, data=data)

Kruskal-Wallis rank sum test

data: PLUC.chng by grp

Kruskal-Wallis chi-squared = 6.5452, df = 2, p-value = 0.03791

Page 28: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Proportion Tests

• Use– Test for equality of two Proportions

• E.g. proportions of subjects in two treatment groups who benefited from treatment.

– Test for the value of a single proportion• E.g., to test if the proportion of smokers in a population

is some specified value (less than 1)

Page 29: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Proportion Tests

• Formula– One Group:

– Two Groups:

npp

ppz

)1(

ˆ

00

0

−−

=

.ˆ where

)11

)(ˆ1(ˆ

ˆˆ

21

21

21

21

nn

xxp

nnpp

ppz

++

=

+−

−=

Page 30: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Proportion Test

• Statistics-> proportions -> single sample (e.g.sex) or two sample-> Select, variable, Alternative hypothesis, null hypothesis, Confidence Level, Type of test.

> .Tablesex f m 30 30

> binom.test(rbind(.Table), alternative='two.sided', p=.5, conf.level=.95)

Exact binomial test

data: rbind(.Table) number of successes = 30, number of trials = 60, p-value = 1alternative hypothesis: true probability of success is not equal to 0.5 95 percent confidence interval: 0.368062 0.631938 sample estimates:probability of success 0.5

Page 31: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Chi-square statistic

• USE – Testing the population variance σ2= σ0

2.

– Testing the goodness of fit. – Testing the independence/ association of attributes

• Assumptions– Sample observations should be independent.– Cell frequencies should be >= 5.– Total observed and expected frequencies are equal

Page 32: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Chi-square statistic

• Formula: If xi (i=1,2,…n) are independent and normally distributed

with mean µ and standard deviation σ, then,

• If we don’t know µ, then we estimate it using a sample mean and

then,

d.f.n on with distributi a is 2

1

2

χσ

μ∑=

⎟⎠

⎞⎜⎝

⎛ −n

i

ix

d.f. 1)-(non with distributi a is 2

1

2

χσ∑

=⎟⎠

⎞⎜⎝

⎛ −n

i

i xx

Page 33: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Chi-square statistic

• For a contingency table, we use the following chi- square

test statistic,

Frequency Expected

Frequency Observed

d.f. 1)-(n with as ddistribute ,)( 2

1

22

==

−=∑

=

i

i

n

i i

ii

EO

EEO χχ

Page 34: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Chi-square statistic

Male

O(E)

Female

O(E)

Total

Group 1 9 (10) 11 (10) 20

Group 2 8 (10) 12 (10) 20

Group 3 11 (10) 9(10) 20

30 30 60

Page 35: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Chi-square statistic – calculation of expected frequency

• To obtain the expected frequency for any cell, use:

• Corresponding (row total X column total) / grand total

• E.g: cell for group 1 and female, substituting: (30 X

20 / 60) = 10

Page 36: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Chi-square statistic:Rcmdr demonstration

• Statistics->Contingency tables -> Two way table -> Pick

row and column variable, select type of cell percentage

you want (e.g. row), hypothesis tests (e.g. chi-square)> rowPercents(.Table) # Row Percentages grpsex 1 2 3 Total Count f 30.0 40.0 30.0 100.0 30 m 36.7 26.7 36.7 100.1 30

> .Test <- chisq.test(.Table, correct=FALSE)

> .Test

Pearson's Chi-squared test

data: .Table X-squared = 1.2, df = 2, p-value = 0.5488

Page 37: Nemours Biomedical Research Statistics March 26, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Thank you