outline%% - compbio.ucdenver.educompbio.ucdenver.edu/77112015/kechris...



Page 1: Outline - compbio.ucdenver.educompbio.ucdenver.edu/77112015/Kechris CPBS7711-IntroStats-9-10-15.pdf

9/9/15

Introduction to Concepts in Statistics
9/10/15

Katerina Kechris
Department of Biostatistics and Informatics

Computational Bioscience Program

Outline

1. Tests of significance
2. Exercises
3. Non-parametric tests
4. Multiple testing
5. Power & sample size

Tests of significance (Hypothesis Testing)

• To evaluate whether your observations are due to chance (null hypothesis) or due to real effects (alternative hypothesis).

• The null hypothesis is not necessarily formulated in the same manner as the scientific hypothesis.

Example:

• Scientific hypothesis: Drug treatment improves blood pressure. (effect)

• Null hypothesis: There is no effect of drug treatment on blood pressure. (variations are due to chance)

Concept: Expected vs observed

• Determine the expected value you would observe by chance and evaluate how extreme your observed value is.

Example

• Observed difference: -0.08 (mean of observations)

• Expected difference: 0 (specified by null hypothesis)


Example

Subject  Gene  Tissue 0  Tissue 1  y_ij = log(y_ij1 / y_ij0)
1        1     y_110     y_111     y_11
2        1     y_210     y_211     y_21
3        1     y_310     y_311     y_31
4        1     y_410     y_411     y_41
5        1     y_510     y_511     y_51
6        1     y_610     y_611     y_61
7        1     y_710     y_711     y_71
8        1     y_810     y_811     y_81
9        1     y_910     y_911     y_91

Subject  y_ij = log(y_ij1 / y_ij0)
1        -1.1
2         0.46
3        -0.34
4         0.29
5         0.82
6        -1.09
7         0.50
8        -1.44
9         1.19

Example

• Observed difference: -0.08 (mean of observations)

• Expected difference: 0 (specified by null hypothesis)

Concept: Standard error

• How extreme is the observation -0.08 from 0?

• We need to scale this distance in units of standard errors (SE).

Concept: Standard Error (SE) vs Standard Deviation (SD)

• SD ($\sigma$) is the measure of spread in the population.

• SE ($\sigma_{\bar{X}}$) is the measure of spread in the sample mean.

Concept: Test statistic

• The t-statistic tells us how many SEs the observation is from the expected value.

    t-statistic = (observed value - expected value) / SE

• The t-statistic is an example of a test statistic.

• Test statistics measure the difference between the data and what is expected under the null hypothesis.
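The scaling described above can be checked on the nine log-ratios from the example table. This is a minimal sketch that assumes the SE of the mean is the sample SD divided by the square root of n; the variable names are illustrative and not from the slides.

```python
import math
import statistics

# Log-ratios y_ij for the nine subjects in the example table.
log_ratios = [-1.1, 0.46, -0.34, 0.29, 0.82, -1.09, 0.50, -1.44, 1.19]

expected = 0.0                        # value specified by the null hypothesis
n = len(log_ratios)
mean = statistics.mean(log_ratios)    # observed difference
sd = statistics.stdev(log_ratios)     # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)                # assumed SE of the sample mean: SD / sqrt(n)
t_stat = (mean - expected) / se       # distance from the expected value, in SE units

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}, t = {t_stat:.2f}")
# mean = -0.08, SD = 0.95, SE = 0.32, t = -0.25
```

The observed mean sits only about a quarter of a standard error below the expected value of 0.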


Concept: Significance Level (p-value)

Example:

• mean = -0.08, SD = 0.95, n = 9, t-statistic = (-0.08 - 0) / (0.95 / √9) = -0.25

• Is -0.25 extreme? That is, what is the chance that we observe a mean value that is -0.25 SEs from the expected value?

• The observed significance level (or p-value) is the chance of obtaining a test statistic as extreme as or more extreme than the observed one.

• Computed on the basis of the null hypothesis.

• A small p-value is evidence against the null hypothesis and indicates that something besides chance (a real effect) is operating to make the difference.

Concept: Computing t-test p-value

Example: t-distribution with 8 degrees of freedom (df)

• Degrees of freedom (df) = # obs - 1 (est. sample mean)

• Chance we observe a mean value that is -0.25 SEs from the expected value?

• Calculate the area under the curve to the left of (& including) -0.25 (standard tables & software do this).

• This is a one-sided test.

• p-value (area under the curve ≤ -0.25) = 0.40
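The area under the curve can also be approximated without tables. The sketch below integrates the Student's t density numerically with Simpson's rule; the helper names `t_pdf` and `t_cdf` are illustrative, and in practice a statistics package would normally supply this calculation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

t_stat, df = -0.25, 8
p_left = t_cdf(t_stat, df)                 # one-sided: area to the left of -0.25
p_two_sided = 2 * min(p_left, 1 - p_left)  # both tails, shown only for comparison

print(f"one-sided p = {p_left:.2f}, two-sided p = {p_two_sided:.2f}")
```

The one-sided value reproduces the 0.40 quoted above; doubling the smaller tail is the usual way to obtain a two-sided p-value for a symmetric distribution.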

Concept: One-sided vs Two-sided Test

• One-sided (or one-tailed) vs. two-sided (or two-tailed)

• Depends on the precise form of the alternative hypothesis.

• Alternative hypothesis 1: Drug treatment improves blood pressure. (one-sided, right)

• Alternative hypothesis 2: Drug treatment affects blood pressure. (two-sided)

Summary

1. Set up the null hypothesis

2. Pick a test statistic

3. Compute the observed significance level


Outline

1. Tests of significance
2. Exercises
3. Non-parametric tests
4. Multiple testing
5. Power & sample size

What test to use?

• Parametric tests: assume data are distributed according to a known family of probability distributions (e.g., normal).
  – If there are deviations from the distribution of interest (e.g., Gaussian), these tests are still appropriate (robust to outliers) if the sample size is large (>30).

• Non-parametric tests make no assumptions about the population distribution (distribution-free tests).
  – Rank-based & permutation tests
  – May be important when there are gross violations of distributional assumptions

Concept: Rank Tests

• Use ranks of data points.

• Wilcoxon Rank Sum Test (Mann-Whitney Test): alternative for the two-sample t-test

• Evaluate if two random samples are from the same distribution or if shifted in location.

Example: Rank Test

Suppose we have two groups and expression values (log2) for m replicate samples:

x1, x2, ..., xm for group 1 and y1, y2, ..., ym for group 2

Is the distribution of the expression values for these groups significantly different (are they shifted)?

[Figure: Group 1 and Group 2 values marked along an axis of expression values]


Example: Rank Test

Sample data

Combine groups and determine the overall ranks:

Example: Rank Test

• Are the ranks for group 1 (or group 2) sufficiently large?

• Use as the test statistic the sum of the ranks for group 1 (SR* = 28).

• What is the null hypothesis? Is this value extreme?

• $\binom{10}{5}$ = 252 possible sets of ranks for each group. All equally likely.

Example: Rank Test

• Calculate SR for each possible sample:

• Recall the definition of a p-value!

    P(SR ≥ SR*) = #(SR ≥ SR*) / 252

• If the sample size is large, can use a Normal approximation.

[Figure: null distribution of SR, with SR from 15 to 40 on the horizontal axis, probability in units of 1/252 on the vertical axis, and the observed SR* marked]
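The count in the numerator can be obtained by listing every possible set of ranks for group 1. The sketch below assumes m = 5 replicates per group and no tied values, which is what the 252 possible samples and the 15 to 40 range of SR imply; the sample data themselves are not used, only the observed rank sum SR* = 28.

```python
from itertools import combinations

m = 5                           # assumed replicates per group, since C(10, 5) = 252
observed_sr = 28                # SR* from the example
ranks = range(1, 2 * m + 1)     # overall ranks 1..10, assuming no ties

# Under the null hypothesis every set of m ranks is equally likely for group 1.
rank_sums = [sum(subset) for subset in combinations(ranks, m)]

n_total = len(rank_sums)
n_extreme = sum(1 for sr in rank_sums if sr >= observed_sr)
p_value = n_extreme / n_total

print(f"SR ranges from {min(rank_sums)} to {max(rank_sums)}")
print(f"P(SR >= {observed_sr}) = {n_extreme}/{n_total} = {p_value:.2f}")
# SR ranges from 15 to 40
# P(SR >= 28) = 126/252 = 0.50
```

Under these assumptions a rank sum of 28 is not extreme: exactly half of the possible rank sets give a value at least as large.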


Concept: Permutation Tests

• To evaluate significance of the observed test statistic.

• Evaluate all possible values of the test statistic on permuted data sets where the labels have been rearranged on the observed data.

• The null distribution is generated from the permutations (do not need to assume a distribution)

Example: Permutation Tests

• Calculate t-statistic on previous example

Two-sample t-statistic t* = 0.22

• Do not assume t-distribution, use permutations

Example: Permutation Tests

• Permuted data set 1

Two-sample t-statistic t_1 = -1.37

• Permuted data set 2

Two-sample t-statistic t_2 = 2.25

• ... repeat many times ...

Example: Permutation Tests

• If the sample is small enough, all permutations (p) can be evaluated. Otherwise, sample randomly (e.g., 10000 times) from all possible permutations.

• Recall the definition of a p-value!

    P(t_p ≥ t*) = #(t_p ≥ t*) / # of permutations

• Possible p-values for both examples are discrete.
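The permutation p-value can be sketched for a small data set by enumerating every relabeling. The expression values below are hypothetical (the data from the slides are not reproduced in this transcript), and the unpooled standard error in `two_sample_t` is an assumption, since the slides do not say which form of the two-sample t-statistic was used.

```python
import math
import statistics
from itertools import combinations

def two_sample_t(x, y):
    """Two-sample t-statistic with an unpooled standard error (an assumption)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

def permutation_p_value(x, y):
    """One-sided p-value P(t_p >= t*) over every relabeling of the pooled data."""
    pooled = list(x) + list(y)
    t_obs = two_sample_t(x, y)
    count = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        perm_x = [pooled[i] for i in idx]
        perm_y = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        if two_sample_t(perm_x, perm_y) >= t_obs - 1e-12:  # tolerance for float ties
            count += 1
    return count / total

# Hypothetical log2 expression values, not the data from the slides.
group1 = [5.0, 6.0, 7.0, 8.0]
group2 = [1.0, 2.0, 3.0, 4.0]
p = permutation_p_value(group1, group2)
print(f"{p:.4f}")
```

With four values per group there are only 70 relabelings, so the smallest attainable one-sided p-value is 1/70, which illustrates why these p-values are discrete. For larger samples the loop would draw random relabelings instead of enumerating all of them.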


Outline

1. Tests of significance
2. Exercises
3. Non-parametric tests
4. Multiple testing
5. Power & sample size

Multiple Testing

• Suppose we are testing ~20,000 genes for differential expression.

• What is the null hypothesis for each gene? Suppose that the null hypothesis is true for each gene.

• If we apply a p-value (or significance level) cutoff of 0.01, how many times do we expect to incorrectly reject the null hypothesis (i.e., observe a p-value ≤ 0.01)?
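The expected count follows from simple arithmetic. A minimal sketch, assuming exactly 20,000 tests whose null hypotheses are all true and a continuous test statistic, so that each p-value falls below the cutoff with probability equal to the cutoff:

```python
m = 20_000        # number of genes tested, all assumed to be truly null
alpha = 0.01      # p-value cutoff applied to every gene

# Under a true null hypothesis (continuous test statistic) the p-value is
# uniform on [0, 1], so each test is rejected with probability alpha.
expected_false_rejections = m * alpha

print(round(expected_false_rejections))  # 200
```

About 200 genes would be declared significant by chance alone, which is the motivation for the corrections that follow.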

Statistical Inference Decision Matrix

Power: probability of rejecting the null hypothesis when it is false. It is the probability of predicting a real effect.

Different error rates:

• The per-comparison error rate (PCER) is the expected proportion of rejected true null hypotheses out of the total number of hypotheses.

• The family-wise error rate (FWER) is the probability of rejecting at least one true null hypothesis.

• The false discovery rate (FDR) is the expected proportion of false predictions among all the predictions (null hypothesis rejections).

Types of Control

• Many different multiple testing corrections exist, based on what type of error rate is controlled (FWER, FDR, etc.).

• The Bonferroni procedure controls the FWER.
  – If m hypotheses are being tested, divide your significance level by m (e.g., 0.05/25000).

• This procedure is very conservative (i.e., real effects may be missed).


Types of Control

• In gene expression studies, controlling the FDR may be more appropriate.

• With FDR control at 5%, if 100 genes are significant, about 95 of them are expected to be truly differentially expressed.

• Compared with FWER control, the power is increased, but the likelihood of type I errors also increases.

• Conceptually the FDR cutoff is not a p-value cutoff!

Example: FDR

The Benjamini and Hochberg (1995) procedure controls the FDR.

Suppose we obtain p-values for all genes:

1. Sort all the p-values p_1 to p_m from smallest to largest

2. Find the largest k so that p_k ≤ q * (k / m)

3. Reject all hypotheses through cutoff value c = p_k

Example: q = 0.1 and m = 10 (tests)

Example: FDR

Example: q = 0.1 and m = 10 (tests)

With cutoff value c = 0.029
3 rejected hypotheses
Expected that q = 10% of tests rejected are false discoveries (null hypothesis true)

Bonferroni correction at significance level 0.1
1 rejected hypothesis [c = 0.10 / 10 = 0.010]
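The three steps can be sketched in a few lines. The ten p-values below are hypothetical: the table from the slide is not reproduced in this transcript, so they were chosen only to give the stated outcome (cutoff 0.029 with 3 rejections for Benjamini-Hochberg, 1 rejection for Bonferroni). The function names are illustrative.

```python
def benjamini_hochberg(p_values, q):
    """Return (number rejected, cutoff c) for the BH step-up procedure."""
    m = len(p_values)
    ordered = sorted(p_values)             # step 1: sort from smallest to largest
    k_max = 0
    for k, p in enumerate(ordered, start=1):
        if p <= q * k / m:                 # step 2: keep the largest k that passes
            k_max = k
    cutoff = ordered[k_max - 1] if k_max else None
    return k_max, cutoff                   # step 3: reject everything up to c = p_k

def bonferroni(p_values, alpha):
    """Return (number rejected, cutoff) for the Bonferroni correction."""
    cutoff = alpha / len(p_values)
    return sum(p <= cutoff for p in p_values), cutoff

# Hypothetical p-values, chosen only so that the outcome matches the slide.
p_values = [0.005, 0.021, 0.029, 0.08, 0.15, 0.30, 0.45, 0.60, 0.80, 0.95]

print(benjamini_hochberg(p_values, q=0.1))   # (3, 0.029)
print(bonferroni(p_values, alpha=0.1))       # (1, 0.01)
```

In this hypothetical list the second p-value (0.021) is above its own threshold of 0.02 but is still rejected, because the procedure rejects everything up to the largest k that passes.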

Outline

1. Tests of significance
2. Exercises
3. Non-parametric tests
4. Multiple testing
5. Power & sample size


Power and Sample Size

• Power analysis/sample size estimation depends on the significance level and effect size.

• The greater the effect size, the greater the power.

• There are many software packages to calculate sample size.

T-test Sample Size

To calculate the estimated sample size you need:

• Effect size
  – difference between means, standard deviation

• Significance level
  – Type I error probability (e.g., 0.05)

• Power of test
  – 1 minus Type II error probability (e.g., 80%)

• Type of t-test
  – one-sample, two-sample or paired-samples

• Alternative hypothesis
  – one-sided (greater or less) or two-sided

Comments

• A pre-determined sample size can be substituted for power above to estimate the power of your test for that given sample size.

• A power level of ≥ 0.80 is considered good power.

• The effect size may be supported by previous work or from the literature.
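How these inputs combine can be sketched with the common normal approximation for a two-sided, two-sample comparison of means. This is an assumption rather than the exact t-based calculation that dedicated software performs, so it tends to give a slightly smaller n; the function names and the inputs (difference 0.5, SD 1) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for the significance level
    z_beta = z.inv_cdf(power)              # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def approx_power(n, delta, sd, alpha=0.05):
    """Approximate power of the same test with n subjects per group."""
    z = NormalDist()
    return z.cdf(abs(delta) / (sd * math.sqrt(2 / n)) - z.inv_cdf(1 - alpha / 2))

# Hypothetical inputs: difference of 0.5 between means, SD of 1.
n = n_per_group(delta=0.5, sd=1.0)
print(n, round(approx_power(n, delta=0.5, sd=1.0), 2))
```

Calling `approx_power` with a pre-determined n corresponds to the first comment above: the sample size is fixed and the power is estimated instead.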


Outline

Take (bio)statistics course(s)!

BIOS 6606 (Statistics for the Basic Sciences)

BIOS 6611/12 (Biostatistical Methods)
BIOS 6631/31 (Statistical Theory)

BIOS 7731 (Mathematical Statistics, Kechris)
BIOS 7659 (Statistical Methods in Genomics, Kechris) – next fall?