False Discovery Rate, Part I: Introduction and Challenges


Page 1: False Discovery Rate Part I : introduction et enjeux

False Discovery Rate
Part I: Introduction and Challenges

E. Roquain

Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France

Point de Vue, 3rd February 2014

E. Roquain FDR : introduction, enjeux et perspectives. Part I. 1 / 33

Page 2: False Discovery Rate Part I : introduction et enjeux

1 Introduction

2 False discovery rate control

3 FDR in other statistical issues


Page 3: False Discovery Rate Part I : introduction et enjeux

1 Introduction

Page 4: False Discovery Rate Part I : introduction et enjeux

A “multiple testing joke" (http://xkcd.com)


Multiplicity problem

P(make at least one false discovery) ≫ P(the i-th is a false discovery)

A correction is needed to assess significance!
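As a rough illustration of the multiplicity problem, the sketch below computes the probability of at least one false discovery when m true null hypotheses are tested, each at level α. The independence of the tests, the function name and the values m = 20, α = 0.05 are assumptions made for the example; they are not taken from the slides.

```python
def prob_at_least_one_false_discovery(m: int, alpha: float) -> float:
    """P(at least one false discovery) for m independent true nulls, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

# Hypothetical values: a single test versus 20 independent tests at level 0.05.
single = prob_at_least_one_false_discovery(1, 0.05)
many = prob_at_least_one_false_discovery(20, 0.05)
print(f"one test: {single:.3f}, twenty tests: {many:.3f}")
```

Under these assumptions the per-test error stays at α while the probability of at least one error grows quickly with m.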

Page 8: False Discovery Rate Part I : introduction et enjeux

Some other examples

Paradoxes due to large scale experiments

Probable facts appear significant


Page 9: False Discovery Rate Part I : introduction et enjeux

Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]

Why Most Published Research Findings Are False
John P. A. Ioannidis

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124. DOI: 10.1371/journal.pmed.0020124

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

"It can be proven that most claimed research findings are false."

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. "Negative" research is also very useful. "Negative" is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of "true relationships" to "no relationships" among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R/(R − βR + α). A research finding is thus …

Abbreviation: PPV, positive predictive value
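The PPV formula quoted in the excerpt can be recovered from the expected 2 × 2 counts. The short derivation below is a sketch: Table 1 is not reproduced on the slide, so it assumes the usual entries, namely that among c probed relationships cR/(R + 1) are true and c/(R + 1) are not.

```latex
\begin{align*}
\text{expected true positives}  &= (1-\beta)\,\frac{cR}{R+1}, \\
\text{expected false positives} &= \alpha\,\frac{c}{R+1}, \\
\mathrm{PPV} &= \frac{(1-\beta)\,cR/(R+1)}{(1-\beta)\,cR/(R+1) + \alpha\,c/(R+1)}
              = \frac{(1-\beta)R}{R - \beta R + \alpha}.
\end{align*}
```

The factor c/(R + 1) cancels, which is why the PPV depends only on R, α and β.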

Page 10: False Discovery Rate Part I : introduction et enjeux

Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]

[Talk by Benjamini, Southampton (2013)]; modeling:

Try many experiments: 1000 pure noise, 30 perfect signal

Publish results with a p-value ≤ 0.05

≈ 50 false discoveries, 30 true discoveries

• Jager and Leek (2013). An estimate of the science-wise false discovery rate and application to the top medical literature: ≈ 14%

• Ioannidis (2014). Discussion: Why "An estimate of the science-wise false discovery rate and application to the top medical literature" is false
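The arithmetic behind this toy model can be checked in a few lines. The sketch below assumes that every "perfect signal" experiment yields a p-value below 0.05 and that pure-noise p-values are uniform on [0, 1]; the variable names are illustrative.

```python
n_noise, n_signal, level = 1000, 30, 0.05

# Pure-noise p-values are uniform on [0, 1], so a fraction `level` falls below the cutoff.
expected_false = n_noise * level
# "Perfect signal" is read here as: every true effect is detected.
expected_true = n_signal

published = expected_false + expected_true
false_share = expected_false / published
print(f"expected false discoveries: {expected_false:.0f}")
print(f"share of false discoveries among published results: {false_share:.3f}")
```

In the notation of the Ioannidis excerpt this corresponds to R = 0.03, β = 0 and α = 0.05, so that PPV = 0.03/0.08 = 0.375, the complement of the share computed above.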


Page 14: False Discovery Rate Part I : introduction et enjeux

Multiplicity in microarray [Hedenfalk et al. (2001)]

[Figure: BRCA1 vs BRCA2; axis label: genes]

• expression level (activity)
• genes differentially activated?
• 1 test for each gene
• thousands of genes
• number of replications ≪ dimension
• correlations

Page 15: False Discovery Rate Part I : introduction et enjeux

Other applications

• Neuroimaging (fMRI): activated regions?

• Econometrics: winning strategies?

• Astronomy: directions with stars?

Page 16: False Discovery Rate Part I : introduction et enjeux

Canonical setting

• $X_i$ = avg group 2 − avg group 1 (rescaled) for gene $i$

• Gaussian model:

\[
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}
= \mu \begin{pmatrix} H_1 \\ H_2 \\ \vdots \\ H_m \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix},
\]

with $\mu > 0$, $H \in \{0,1\}^m$ (fixed) and $\varepsilon \sim N(0, \Gamma)$ ($\Gamma_{i,i} = 1$).

• $\Gamma$ = dependence structure = $I_m$ for now

Question: for each $i$, $H_i = 0$ or $H_i = 1$?
Multiple testing: favors the "0" decision
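To make the canonical setting concrete, here is a minimal simulation sketch of the Gaussian model with Γ = I_m. The function name, the seed and the layout of H (true nulls first) are assumptions of the example; the values m = 100, m_0 = 50, µ = 2 are those of Picture 1.

```python
import random

def simulate_gaussian_model(H, mu, rng):
    """Draw X_i = mu * H_i + eps_i with independent eps_i ~ N(0, 1), i.e. Gamma = I_m."""
    return [mu * h + rng.gauss(0.0, 1.0) for h in H]

# m = 100 hypotheses, the first 50 being true nulls (H_i = 0), and mu = 2.
rng = random.Random(0)  # fixed seed so that the run is reproducible
H = [0] * 50 + [1] * 50
X = simulate_gaussian_model(H, mu=2.0, rng=rng)

null_mean = sum(x for x, h in zip(X, H) if h == 0) / 50
alt_mean = sum(x for x, h in zip(X, H) if h == 1) / 50
print(f"mean of X under H_i = 0: {null_mean:.2f}, under H_i = 1: {alt_mean:.2f}")
```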

Page 17: False Discovery Rate Part I : introduction et enjeux

Individual decision and errors

• Test statistic: $X_i$

• p-value: $p_i = \bar\Phi(X_i)$, with $\bar\Phi(z) = P(Z \ge z)$, $Z \sim N(0,1)$

  $p_i$ such that

  if $H_i = 0$, $p_i \sim U(0,1)$

  if $H_i = 1$, $p_i$ has c.d.f. $\bar\Phi(\bar\Phi^{-1}(\cdot) - \mu)$

• Choose $\hat H_i = 1\{p_i \le t\}$ for some threshold $t$

• Two errors:

                   decision $\hat H_i = 0$    decision $\hat H_i = 1$
  truth $H_i = 0$    true negative              false positive
  truth $H_i = 1$    false negative             true positive

• False positive more annoying
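The individual decision rule can be sketched as follows, using the standard library's `statistics.NormalDist` for the Gaussian upper tail. The helper names, the observations and the threshold t = 0.05 are hypothetical and only serve the illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)

def upper_tail(z: float) -> float:
    """Phi-bar(z) = P(Z >= z)."""
    return 1.0 - STD_NORMAL.cdf(z)

def decide(x: float, t: float) -> int:
    """Individual decision: 1 if the p-value of x is at most t, else 0."""
    return 1 if upper_tail(x) <= t else 0

def outcome(truth: int, decision: int) -> str:
    """Name the cell of the 2 x 2 table for a (truth, decision) pair."""
    names = {(0, 0): "true negative", (0, 1): "false positive",
             (1, 0): "false negative", (1, 1): "true positive"}
    return names[(truth, decision)]

# Hypothetical pairs (truth H_i, statistic X_i), threshold t = 0.05.
for truth, x in [(0, 0.3), (0, 2.1), (1, 1.0), (1, 3.2)]:
    p = upper_tail(x)
    print(f"X = {x:.1f}  p = {p:.4f}  ->  {outcome(truth, decide(x, 0.05))}")
```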

Page 18: False Discovery Rate Part I : introduction et enjeux

Picture 1. $m = 100$; $m_0 = 50$; $\mu = 2$; $\Gamma = I_m$

[Figure: plot with both axes running from 0.0 to 1.0; each of the $m$ points is labeled 0 or 1]

Page 19: False Discovery Rate Part I : introduction et enjeux

Picture 2. $m = 100$; $m_0 = 95$; $\mu = 3$; $\Gamma = I_m$

[Figure: plot with both axes running from 0.0 to 1.0; each of the $m$ points is labeled 0 or 1]

Page 20: False Discovery Rate Part I : introduction et enjeux

Picture 3. $m = 100$; $m_0 = 50$; $\mu = 0.01$; $\Gamma = I_m$

[Figure: plot with both axes running from 0.0 to 1.0; each of the $m$ points is labeled 0 or 1]

Page 21: False Discovery Rate Part I : introduction et enjeux

Picture 4. $m = 100$; $m_0 = 95$; $\mu = 0.01$; $\Gamma = I_m$

[Figure: plot with both axes running from 0.0 to 1.0; each of the $m$ points is labeled 0 or 1]

Page 22: False Discovery Rate Part I : introduction et enjeux

Proceeding as for a single test? $t \equiv \alpha = 0.1$

[Figure: the same kind of plot, points labeled 0 or 1, shown first on the full vertical scale (0.0 to 1.0) and then zoomed to the range 0.00 to 0.20; horizontal axis from 0.0 to 1.0]

Page 25: False Discovery Rate Part I : introduction et enjeux

Union bound (Bonferroni)? $t \equiv \alpha/m = 0.1/100$

[Figure: two zoomed plots (horizontal axis 0.0 to 1.0, vertical axis 0.00 to 0.20), one for a data set with few points labeled 1 and one for a data set with many points labeled 1]
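The contrast between the uncorrected threshold t = α and the Bonferroni threshold t = α/m can be reproduced on simulated data. The sketch below is an illustration under assumptions: it reuses the Gaussian model with Γ = I_m, places the true nulls first, and uses a fixed seed and a dense-signal configuration (m_0 = 50, µ = 2) chosen for the example.

```python
import random
from statistics import NormalDist

def simulate_p_values(m, m0, mu, rng):
    """p-values from the Gaussian model with Gamma = I_m; the first m0 hypotheses are true nulls."""
    H = [0] * m0 + [1] * (m - m0)
    std_normal = NormalDist()
    p = [1.0 - std_normal.cdf(mu * h + rng.gauss(0.0, 1.0)) for h in H]
    return H, p

def count_rejections(H, p, t):
    """Return (false positives, true positives) for the rule: reject when p_i <= t."""
    fp = sum(1 for h, pi in zip(H, p) if h == 0 and pi <= t)
    tp = sum(1 for h, pi in zip(H, p) if h == 1 and pi <= t)
    return fp, tp

rng = random.Random(1)  # fixed seed for reproducibility
m, alpha = 100, 0.1
H, p = simulate_p_values(m, m0=50, mu=2.0, rng=rng)

for name, t in [("uncorrected, t = alpha", alpha), ("Bonferroni, t = alpha/m", alpha / m)]:
    fp, tp = count_rejections(H, p, t)
    print(f"{name}: {fp} false positives, {tp} true positives")
```

By the union bound, the Bonferroni threshold guarantees P(at least one false positive) ≤ m_0 · α/m ≤ α, at the price of fewer true positives.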

Page 27: False Discovery Rate Part I : introduction et enjeux

Do something in between! $t_\ell = \alpha \ell/m = 0.1\,\ell/100$

[Figure: zoomed plots (horizontal axis 0.0 to 1.0, vertical axis 0.00 to 0.20) with points labeled 0 or 1, shown for a data set with many points labeled 1 and for a data set with few points labeled 1]

Page 30: False Discovery Rate Part I : introduction et enjeux

Smart! ... and rigorous?

Page 31: False Discovery Rate Part I : introduction et enjeux

2 False discovery rate control

Page 32: False Discovery Rate Part I : introduction et enjeux

BH procedure

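p-value view:

\[ \hat k = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\}, \qquad \hat t = \alpha \hat k/m \]

c.d.f. view:

\[ \hat t = \max\{t \in [0,1] : \hat G_m(t) \ge t/\alpha\} \]

where $p_{(1)} \le \dots \le p_{(m)}$ are the ordered p-values, $p_{(0)} = 0$, and $\hat G_m$ is the empirical c.d.f. of the p-values.

Decision:

\[ \hat H_i = 1\{p_i \le \hat t\} = 1\{X_i \ge \bar\Phi^{-1}(\hat t)\} \]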
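The p-value view of the BH procedure translates directly into code. The sketch below is a minimal implementation of the step-up rule as stated on the slide; the function names and the example p-values are illustrative.

```python
def bh_threshold(p_values, alpha):
    """Return (k_hat, t_hat) for the BH step-up procedure at level alpha."""
    m = len(p_values)
    ordered = sorted(p_values)
    k_hat = 0  # convention: k_hat = 0 (no rejection) if no index qualifies
    for i, p in enumerate(ordered, start=1):
        if p <= alpha * i / m:
            k_hat = i  # keep the largest qualifying index (step-up)
    return k_hat, alpha * k_hat / m

def bh_decisions(p_values, alpha):
    """Decisions H_hat_i = 1{p_i <= t_hat}."""
    _, t_hat = bh_threshold(p_values, alpha)
    return [1 if p <= t_hat else 0 for p in p_values]

# Hypothetical p-values, alpha = 0.05.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
k_hat, t_hat = bh_threshold(p_values, 0.05)
print(k_hat, t_hat, bh_decisions(p_values, 0.05))
```

Because the rule keeps the largest index i with p_(i) ≤ αi/m, a p-value can be rejected even when smaller ones lie above their own line; for instance, the p-values 0.03, 0.04, 0.045, 0.049 are all rejected at α = 0.05.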

Page 33: False Discovery Rate Part I : introduction et enjeux

False discovery rate control

For a decision $\hat H_i = 1\{p_i \le t\}$ ($\forall i$),

\[ \mathrm{FDP}(t) = \frac{\#\{i : H_i = 0,\ \hat H_i = 1\}}{\#\{i : \hat H_i = 1\}} \qquad \left(\tfrac{0}{0} = 0\right) \]

\[ \mathrm{FDR}(t) = E[\mathrm{FDP}(t)] \]

Theorem [Benjamini and Hochberg (1995)] [Benjamini and Yekutieli (2001)]

If $\Gamma = I_m$ and $\hat t$ is the threshold of the BH procedure, then for all $\mu$, $H$,

\[ \mathrm{FDR}(\hat t) = (m_0/m)\,\alpha \le \alpha \]
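The definition of the false discovery proportion, including the convention 0/0 = 0, can be written as a small function. The function name and the example vectors below are hypothetical.

```python
def fdp(truth, decisions):
    """False discovery proportion, with the convention 0/0 = 0."""
    rejections = sum(decisions)
    if rejections == 0:
        return 0.0
    false_rejections = sum(1 for h, d in zip(truth, decisions) if h == 0 and d == 1)
    return false_rejections / rejections

# Hypothetical example: 5 rejections, one of which is a true null.
truth = [0, 0, 0, 1, 1, 1, 1, 0]
decisions = [0, 1, 0, 1, 1, 1, 1, 0]
print(fdp(truth, decisions))
```

The FDR is the expectation of this quantity over repeated data sets, so it is a property of the procedure rather than of a single run.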

Page 34: False Discovery Rate Part I : introduction et enjeux

Simulations. $m = 50$; $m_0 = 25$; $\mu = 3$; $\Gamma = I_m$

[Figure: ten simulation runs, one per slide; each plot shows points labeled 0 or 1, horizontal axis from 0.0 to 1.0, vertical axis from 0.0 to 0.4]

FDP(BH) over the ten runs: 0, 0.16, 0.0833, 0.08, 0.12, 0.167, 0.0435, 0, 0, 0.04
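The theorem can be checked numerically by averaging FDP(BH) over many simulated data sets. The sketch below is a Monte Carlo illustration under assumptions: the slides do not state the level α used in the simulations, so α = 0.2 is chosen arbitrarily here, and the function names, seed and number of runs are hypothetical.

```python
import random
from statistics import NormalDist, mean

def bh_fdp(H, p, alpha):
    """Run BH at level alpha and return the realised FDP (0/0 = 0)."""
    m = len(p)
    ordered = sorted(p)
    k_hat = max((i for i in range(1, m + 1) if ordered[i - 1] <= alpha * i / m), default=0)
    t_hat = alpha * k_hat / m
    rejected = [h for h, pi in zip(H, p) if pi <= t_hat]
    return rejected.count(0) / len(rejected) if rejected else 0.0

def estimate_fdr(m, m0, mu, alpha, n_runs, seed=0):
    """Monte Carlo average of FDP(BH) in the Gaussian model with Gamma = I_m."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    H = [0] * m0 + [1] * (m - m0)
    fdps = []
    for _ in range(n_runs):
        p = [1.0 - std_normal.cdf(mu * h + rng.gauss(0.0, 1.0)) for h in H]
        fdps.append(bh_fdp(H, p, alpha))
    return mean(fdps)

# FDP values reported on the ten simulation slides.
reported = [0, 0.16, 0.0833, 0.08, 0.12, 0.167, 0.0435, 0, 0, 0.04]
print(f"average of the ten reported FDPs: {mean(reported):.3f}")

# alpha = 0.2 is an assumption made for this sketch only.
alpha = 0.2
fdr_hat = estimate_fdr(m=50, m0=25, mu=3.0, alpha=alpha, n_runs=2000)
print(f"Monte Carlo FDR: {fdr_hat:.3f}  vs  (m0/m) * alpha = {25 / 50 * alpha:.3f}")
```

Individual runs fluctuate, as the ten slides show, while the average settles near (m_0/m)α.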

Page 44: False Discovery Rate Part I : introduction et enjeux

Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing

… procedure such that FDR ≤ α. However, later work has blurred this distinction: there are methods that given a rejection procedure R estimate the unknown quantity E(V/R).

Impact

In many ways, Benjamini and Hochberg (1995) is a very successful paper. Its influence is clear from its 4967 citations (according to the Web of Science at the time of this session), which are still on the rise each year as can be seen in Fig. 1. Although 607 of these are in the area of statistics and probability, the majority of these publications are in the life sciences, from genetics to biochemistry, from oncology to plant sciences, reflecting in large part the use of FDR in microarray-related research. Importantly, citations in other high dimensional application areas, such as neural imaging, are on the rise also, showing its ability to be applied in many diverse types of application. The list of the 10 highest cited papers that cite Benjamini and Hochberg (1995), which is shown in Table 1, is particularly interesting, because it includes six statistical papers, suggesting that further theoretical and methodological developments of the method have had significant influence.

[Figure: number of citations per year, 1996 to 2008; vertical axis from 0 to 1200]

Fig. 1. Rapidly increasing number of citations of Benjamini and Hochberg (1995), suggesting that its influence has not yet reached its peak (note that the figure for 2009 is only partially shown)

Table 1. 10 most cited papers that cite Benjamini and Hochberg (1995)

Rank   Article citing Benjamini and Hochberg (1995)   Number of citations
1      Tusher et al. (2001)                            3723
2      Storey and Tibshirani (2003)                    1412
3      Weisberg et al. (2003)                          1187
4      Genovese et al. (2002)                          1020
5      Storey (2002)                                   726
6      Wilkinson (1999)                                652
7      Benjamini and Yekutieli (2001)                  584
8      Wacholder et al. (2004)                         486
9      Patti et al. (2003)                             479
10     Dudoit et al. (2002)                            459

[Benjamini (2010, JRSSB)]

Now > 20,000 citations on Google Scholar!

Page 45: False Discovery Rate Part I : introduction et enjeux

3 FDR in other statistical issues

Page 46: False Discovery Rate Part I : introduction et enjeux

Why should FDR thresholding be adaptive to sparsity?

[Figure: zoomed plots (horizontal axis 0.0 to 1.0, vertical axis 0.00 to 0.20) with points labeled 0 or 1, shown for a data set with few points labeled 1 and for a data set with many points labeled 1]

Page 49: False Discovery Rate Part I : introduction et enjeux

[Linnemann] - increasing signal strength


Page 54: False Discovery Rate Part I : introduction et enjeux

Adaptation to unknown sparsity

$\hat t$ seems "adaptive" to the "quantity" of signal in the data

• Classification: where is the signal? [Bogdan et al. (2011)], [Neuvial and R. (2012)]

• Detection: is there some signal? [Ingster (2002)], [Donoho and Jin (2004)], etc.

• Estimation: what is the value $EX$ of the signal?

\[ \widehat{EX}_i = X_i \, 1\{|X_i| \ge \hat t\} \qquad \text{(hard thresholding)} \]

  [Abramovich et al. (2006)], [Donoho and Jin (2006)]
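Hard thresholding itself is a one-line operation. In the sketch below the threshold is a fixed number chosen for illustration, whereas on the slide it is the data-driven FDR threshold; the function name and the values are hypothetical.

```python
def hard_threshold(x, t):
    """Keep the coordinates with |x_i| >= t and set the others to zero."""
    return [xi if abs(xi) >= t else 0.0 for xi in x]

# Hypothetical observations and threshold.
print(hard_threshold([0.4, -2.7, 1.1, 3.5, -0.2], t=2.0))
```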

Page 55: False Discovery Rate Part I : introduction et enjeux

Classification

\[ X_i \sim \pi_{0,m}\, N(0,1) + (1 - \pi_{0,m})\, N(\mu_m, 1), \quad 1 \le i \le m, \text{ i.i.d.} \]

but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).

• training set = null distribution known (one-class classification)

• Classification rule $h_m : \mathbb{R} \to \{0,1\}$

• Risk

\[ R_m(h_m) = (1 - \pi_{0,m})^{-1}\, E\left( m^{-1} \sum_{i=1}^{m} 1\{h_m(X_i) \ne H_i\} \right) \]

• Classification boundary in (sparsity, signal) space such that

  Above the boundary, $\exists h_m : R_m(h_m) \to 0$ (perfect classification)

  Under the boundary, $\forall h_m, R_m(h_m) \to 1$ (unclassifiable)
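For a threshold rule h(x) = 1{x ≥ s}, the risk defined above has a closed form: the misclassification probability is π_0 · P(Z ≥ s) + (1 − π_0) · P(Z < s − µ), divided by 1 − π_0. The sketch below evaluates it; the function name and the numerical values are assumptions made for the illustration.

```python
from statistics import NormalDist

def threshold_rule_risk(s, pi0, mu):
    """Risk of h(x) = 1{x >= s} under the two-group model, normalised by 1 - pi0."""
    std_normal = NormalDist()
    false_positive = pi0 * (1.0 - std_normal.cdf(s))        # H_i = 0 but x >= s
    false_negative = (1.0 - pi0) * std_normal.cdf(s - mu)   # H_i = 1 but x < s
    return (false_positive + false_negative) / (1.0 - pi0)

# Hypothetical sparse setting with pi0 = 0.99.
print(f"strong signal, s = 4:  {threshold_rule_risk(4.0, 0.99, 6.0):.4f}")
print(f"weak signal,   s = 4:  {threshold_rule_risk(4.0, 0.99, 1.0):.4f}")
print(f"never reject,  s = 50: {threshold_rule_risk(50.0, 0.99, 6.0):.4f}")
```

The rule that never rejects has risk 1 because of the normalisation by 1 − π_0, which is why a risk tending to 1 is read as "unclassifiable".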

Page 56: False Discovery Rate Part I : introduction et enjeux

Classification boundary

[Figure: classification boundary in the (β, r) plane; horizontal axis β from 0.5 to 1.0, vertical axis r from 0.0 to 1.0; regions labeled "Perfect classification" and "Unclassifiable"]

$\mu_m = \sqrt{2 r \log m}$

$\pi_{0,m} = 1 - m^{-\beta}$

BH: $h^{BH}_m(x) = 1\{x \ge \bar\Phi^{-1}(\hat t)\}$ with $\alpha_m \propto (\log m)^{-1/2}$

• Classification boundary attained by BH.

On the boundary: risk of BH ∼ Bayes risk.

[Bogdan et al. (2011)], [Neuvial and R. (2012)]

Page 57: False Discovery Rate Part I : introduction et enjeux

Detection : is there some signal ?

Same model

\[ X_i \sim \pi_{0,m}\, N(0,1) + (1 - \pi_{0,m})\, N(\mu_m, 1), \quad 1 \le i \le m, \text{ i.i.d.} \]

but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).

• Test $H_0$: "$N(0, I_m)$" against $H_1$: "mixture".

• Risk $R_m(T) = P_{H_0}(T(X) = 1) + P_{H_1}(T(X) = 0)$

• Detection boundary in (sparsity, signal) space such that

  Above the boundary, $\exists T : R_m(T) \to 0$ (perfect detection)

  Under the boundary, $\forall T, R_m(T) \to 1$ (undetectable)

Page 58: False Discovery Rate Part I : introduction et enjeux

Detection boundary

[Figure: detection boundary in the (β, r) plane; horizontal axis β from 0.5 to 1.0, vertical axis r from 0.0 to 1.0; regions labeled "Perfect classification", "Perfect detection" and "Undetectable"]

$\mu_m = \sqrt{2 r \log m}$

$\pi_{0,m} = 1 - m^{-\beta}$

$T^{BH} = 1\{\exists i : p_{(i)} \le \alpha_m i/m\}$ with $\alpha_m \propto (\log m)^{-1/2}$

• Detection boundary attained by BH when $\beta \in (3/4, 1)$

• Better to use "higher criticism":

\[ \max_i \left\{ \sqrt{m}\, \frac{i/m - p_{(i)}}{\sqrt{p_{(i)}(1 - p_{(i)})}} \right\} \]
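The higher criticism statistic written above can be computed directly from the ordered p-values. The sketch below takes the maximum over all indices, as on the slide (published versions often restrict the range of i); the function name, the guard against p-values equal to 0 or 1 and the example data are assumptions of the illustration.

```python
import math

def higher_criticism(p_values):
    """max over i of sqrt(m) * (i/m - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    m = len(p_values)
    ordered = sorted(p_values)
    best = -math.inf
    for i, p in enumerate(ordered, start=1):
        if 0.0 < p < 1.0:  # skip degenerate p-values to avoid dividing by zero
            best = max(best, math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p)))
    return best

# Hypothetical p-values: evenly spread "null-like" values, then the same
# values with the three smallest replaced by very small p-values.
null_like = [(i - 0.5) / 20 for i in range(1, 21)]
with_signal = [1e-4, 2e-4, 5e-4] + null_like[3:]
print(f"{higher_criticism(null_like):.2f}  {higher_criticism(with_signal):.2f}")
```

A handful of very small p-values is enough to inflate the statistic, which is what makes it suited to detecting sparse signal.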

Page 59: False Discovery Rate Part I : introduction et enjeux

LASSO and FDR

Regression with orthogonal design:

\[ X \sim N(\beta, I_m) \]

[Bogdan et al. (2013)]: sorted $\ell_1$ penalized estimator (SLOPE)

\[ \hat\beta = \arg\min_{\beta \in \mathbb{R}^m} \left\{ \tfrac{1}{2} \|X - \beta\|^2 + \sum_{k=1}^{m} \lambda_k |\beta|_{(k)} \right\} \]

where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$; $|\beta|_{(1)} \ge |\beta|_{(2)} \ge \dots \ge |\beta|_{(m)}$

Selection with $\{i : \hat\beta_i \ne 0\}$:

• $\lambda_k = \lambda = \bar\Phi^{-1}(\alpha/(2m)) \simeq \sqrt{2 \log m}$: Bonferroni

• $\lambda_k = \bar\Phi^{-1}(\alpha k/(2m)) \simeq \sqrt{2 \log(m/k)}$: BH!
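The two penalty sequences can be tabulated with the Gaussian quantile function from the standard library. The sketch below only computes the λ_k; it does not solve the SLOPE optimisation problem. The function names and the values m = 1000, α = 0.1 are hypothetical, and the comparison with √(2 log(m/k)) is an asymptotic approximation, so the printed columns are not expected to coincide.

```python
import math
from statistics import NormalDist

def upper_quantile(q):
    """Phi-bar^{-1}(q): the point z with P(Z >= z) = q for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - q)

def bonferroni_lambdas(m, alpha):
    """Constant sequence lambda_k = Phi-bar^{-1}(alpha / (2m))."""
    return [upper_quantile(alpha / (2 * m))] * m

def bh_lambdas(m, alpha):
    """Decreasing sequence lambda_k = Phi-bar^{-1}(alpha * k / (2m)), k = 1..m."""
    return [upper_quantile(alpha * k / (2 * m)) for k in range(1, m + 1)]

# Hypothetical values m = 1000, alpha = 0.1.
m, alpha = 1000, 0.1
lam_bonf, lam_bh = bonferroni_lambdas(m, alpha), bh_lambdas(m, alpha)
for k in (1, 10, 100):
    print(f"k = {k:3d}  lambda_k (BH) = {lam_bh[k - 1]:.3f}  "
          f"sqrt(2 log(m/k)) = {math.sqrt(2 * math.log(m / k)):.3f}")
```

The BH sequence starts at the Bonferroni value for k = 1 and then decreases, so larger coefficients are penalised more heavily than smaller ones.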


Page 61: False Discovery Rate Part I : introduction et enjeux

Outlook

Some conclusions for FDR

⊕ Very simple
⊕ Trade-off type I / power
⊕ Adaptive to sparsity

Some issues

! Sensitive to null hypothesis
! Choosing α
! Calibrating test statistics

Main challenge

What about dependence?
