adapt: a new class of ordered testing procedures · 2020. 11. 3. · rina foygel barber and...

22
AdaPT: A New Class of Ordered Testing Procedures Lihua Lei and William Fithian Department of Statistics, UC Berkeley JSM 2016, Chicago Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

Upload: others

Post on 01-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • AdaPT: A New Class of Ordered TestingProcedures

    Lihua Lei and William Fithian

    Department of Statistics, UC Berkeley

    JSM 2016, Chicago

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Table of Contents

    1 Setup

    2 Existing Methods

    3 Adaptive P-value Thresholding (AdaPT)

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Table of Contents

    1 Setup

    2 Existing Methods

    3 Adaptive P-value Thresholding (AdaPT)

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Multiple Testing Problem with FDR Control

    General setup: a sequence of hypotheses H1,H2, . . . ,Hn;

    H0 = {i : Hi is true} be the set of null hypotheses;S = {i : Hi is rejected} be the set of discoveries;FDP = VR∨1 be the False Discovery Proportion with V = |S|and R = |S ∩ H0|;FDR = EFDP be the False Discovery Rate, the target that aprocedure should control.

    A procedure that control FDR at level 0.1 produces arejection set S with roughly 90% being the true discoveries.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Ordered Hypothesis Testing

    Domain knowledge might be used to indicate whichhypothesis is more “promising”, i.e. likely to be rejected;

    Heuristically, more focus should be put on “promising”hypotheses;

    Sort H1, . . . ,Hn from most “promising” to least “promising”via the prior knowledge;

    A procedure that takes advantage of the ordering is called anordered hypothesis testing procedure.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Example: GEOquery Data

    GEOquery data1[LB15] consists of gene expressionmeasurements in response to estrogen in breast cancer cells;

    Consists of n = 22283 genes and two groups (a treatmentgroup and a control group) with 5 trials in each;

    Test Hi : F0i = F1i , where F0i and F1i are the distributions ofgene expression of gene i in the control group and thetreatment group, respectively;

    H1, . . . ,Hn are ordered by auxiliary data.

    1http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2324

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Example: GEOquery Data

    Original ordering

    Target FDR level α

    # of

    dis

    cove

    ries

    0.1 0.2 0.3

    040

    080

    0

    ● ●

    ● ● ●

    ● ● ● ●● ● ● ●

    SeqStepAccum. TestForwardStopAdaptive SeqStepBHStoreySABHAAdaPT

    Moderately informative ordering

    Target FDR level α

    # of

    dis

    cove

    ries

    0.1 0.2 0.3

    020

    0040

    00

    ● ● ●

    ●●●

    ● ●

    Highly informative ordering

    Target FDR level α

    # of

    dis

    cove

    ries

    0.1 0.2 0.3

    020

    0040

    00

    ● ● ●

    ●●

    ●●

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Table of Contents

    1 Setup

    2 Existing Methods

    3 Adaptive P-value Thresholding (AdaPT)

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Existing Methods Revisited: Accumulation Test

    F̂DPAT =C +

    ∑ki=1 h(pi )

    k + 1

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    Accumulation Test

    1 k n

    0s

    ● ● ●

    ●●

    ● ●

    ● ●

    h ∈ [0,C ],∫ 10 h(x)dx = 1;

    Find the maximum k suchthat F̂DPAT ≤ q;ForwardStop[GWCT15]:

    h(x) = − log(1− x);

    Seqstep[BC15]:

    h(x) =I (x > λ)

    1− λ;

    HingeExp[LB15]:

    h(x) = − I (x > λ)1− λ

    log(1− x1− λ

    ).

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Existing Methods Revisited: Selective Seqstep

    F̂DPSS =ks

    R(k ; s) ∨ 1·A(k ; s) + 1k(1− s)

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    Selective Seqstep

    1 k n

    0s

    1

    ● ● ●

    ●●

    ● ●

    ● ●

    R(k ; s) = |{i ≤ k : pi ≤ s}|;A(k; s) = |{i ≤ k : pi > s}|;s is pre-fixed;

    Find the maximum k suchthat F̂DPSS ≤ q.Turns out that the blue termshould be an approximationof π0,k ,

    π0,k =|{1, . . . , k} ∩ H0|

    k;

    Too conservative for small s.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Existing Methods Revisited: Adaptive Seqstep

    F̂DPAS =ks

    R(k ; s) ∨ 1·A(k ;λ) + 1k(1− λ)

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    Adaptive Seqstep

    1 k n

    0s

    lam

    1

    ● ● ●

    ●●

    ● ●

    ● ●

    R(k ; s) = |{i ≤ k : pi ≤ s}|;A(k;λ) = |{i ≤ k : pi > λ}|;s and λ are is pre-fixed;

    Find the maximum k suchthat F̂DPAS ≤ q;Much less conservative if alarge λ, say 0.5, is used.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Table of Contents

    1 Setup

    2 Existing Methods

    3 Adaptive P-value Thresholding (AdaPT)

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • AdaPT

    F̂DPAdaPT =A(k;λ) + 1

    R(k ; s) ∨ 1

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    AdaPT (Step 0)

    1 n

    0w

    1−w

    1

    ● ● ●

    ●●

    ● ●

    ● ●

    R(k;w) = |{i : pi ≤ wi}|;A(k ;w) = |{i : pi > 1− wi}|;

    Estimate next w by p̃i :

    p̃i =

    {pi (pi is black)NA (pi is red or blue)

    Repeat until F̂DPAdaPT ≤ q.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • AdaPT

    F̂DPAdaPT =A(k;λ) + 1

    R(k ; s) ∨ 1

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    AdaPT (Step 1)

    1 n

    0w

    1−w

    1

    ● ● ●

    ●●

    ● ●

    ● ●

    R(k;w) = |{i : pi ≤ wi}|;A(k ;w) = |{i : pi > 1− wi}|;Estimate next w by p̃i :

    p̃i =

    {pi (pi is black)NA (pi is red or blue)

    Repeat until F̂DPAdaPT ≤ q.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • AdaPT

    F̂DPAdaPT =A(k;λ) + 1

    R(k ; s) ∨ 1

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    AdaPT (Step 2)

    1 n

    0w

    1−w

    1

    ● ● ●

    ●●

    ● ●

    ● ●

    R(k;w) = |{i : pi ≤ wi}|;A(k ;w) = |{i : pi > 1− wi}|;Estimate next w by p̃i :

    p̃i =

    {pi (pi is black)NA (pi is red or blue)

    Repeat until F̂DPAdaPT ≤ q.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • AdaPT

    F̂DPAdaPT =A(k;λ) + 1

    R(k ; s) ∨ 1

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    AdaPT (Step 3)

    1 n

    0w

    1−w

    1

    ● ● ●

    ●●

    ● ●

    ● ●

    R(k;w) = |{i : pi ≤ wi}|;A(k ;w) = |{i : pi > 1− wi}|;Estimate next w by p̃i :

    p̃i =

    {pi (pi is black)NA (pi is red or blue)

    Repeat until F̂DPAdaPT ≤ q.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • AdaPT

    F̂DPAdaPT =A(k;λ) + 1

    R(k ; s) ∨ 1

    ● ● ●

    ●●

    ● ●

    ● ●

    Index

    pval

    s

    AdaPT (Step 4)

    1 n

    0w

    1−w

    1

    ● ● ●

    ●●

    ● ●

    ● ●

    R(k;w) = |{i : pi ≤ wi}|;A(k ;w) = |{i : pi > 1− wi}|;Estimate next w by p̃i :

    p̃i =

    {pi (pi is black)NA (pi is red or blue)

    Repeat until F̂DPAdaPT ≤ q.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Theorem 1.

    Assume that

    1 {pi : i = 1, . . . , n} are independent;2 {pi : i ∈ H0} are i.i.d. uniformly distributed on U[0, 1].

    Then AdaPT controls FDR at level q.

    Any method to update w guarantees the FDR control!

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Theorem 1.

    Assume that

    1 {pi : i = 1, . . . , n} are independent;2 {pi : i ∈ H0} are i.i.d. uniformly distributed on U[0, 1].

    Then AdaPT controls FDR at level q.

    Any method to update w guarantees the FDR control!

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • Real Example: GEOquery Data

    Original ordering

    Target FDR level α

    # of

    dis

    cove

    ries

    0.1 0.2 0.3

    040

    080

    0

    ● ●

    ● ● ●

    ● ● ● ●● ● ● ●

    SeqStepAccum. TestForwardStopAdaptive SeqStepBHStoreySABHAAdaPT

    Moderately informative ordering

    Target FDR level α

    # of

    dis

    cove

    ries

    0.1 0.2 0.3

    020

    0040

    00

    ● ● ●

    ●●●

    ● ●

    Highly informative ordering

    Target FDR level α

    # of

    dis

    cove

    ries

    0.1 0.2 0.3

    020

    0040

    00

    ● ● ●

    ●●

    ●●

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • References

    Rina Foygel Barber and Emmanuel J Candès.Controlling the false discovery rate via knockoffs.The Annals of Statistics, 43(5):2055–2085, 2015.

    Max Grazier G’Sell, Stefan Wager, Alexandra Chouldechova, andRobert Tibshirani.Sequential selection procedures and false discovery rate control.Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 2015.

    Ang Li and Rina Foygel Barber.Accumulation tests for fdr control in ordered hypothesis testing.arXiv preprint arXiv:1505.07352, 2015.

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

  • THANK YOU!

    Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures

    SetupExisting MethodsAdaptive P-value Thresholding (AdaPT)