False Discovery Rate, Part I: introduction and challenges

E. Roquain

Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France

Point de Vue, 3rd February 2014

E. Roquain FDR : introduction, enjeux et perspectives. Part I. 1 / 33

1 Introduction

2 False discovery rate control

3 FDR in other statistical issues


1 Introduction




A "multiple testing joke" (http://xkcd.com)

Multiplicity problem

P(make at least one false discovery) ≫ P(the i-th is a false discovery)

A correction is needed to assess significance!
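To put numbers on this gap, here is a minimal sketch assuming m independent tests of true null hypotheses, each run at level α. Independence and the values α = 0.05 and m = 1, 20, 100 are illustrative assumptions, not taken from the slide. Under these assumptions the probability of at least one false discovery is 1 − (1 − α)^m.

```python
def prob_at_least_one_false_discovery(alpha: float, m: int) -> float:
    """P(at least one false discovery) for m independent true nulls tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

# Illustrative values (assumptions): a single test, 20 tests, 100 tests.
for m in (1, 20, 100):
    print(m, round(prob_at_least_one_false_discovery(0.05, m), 3))
```

With 20 tests the probability is already about 0.64, although each individual test errs with probability 0.05 only.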

Some other examples

Paradoxes due to large scale experiments

Probable facts appear significant


Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]


Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. "Negative" research is also very useful. "Negative" is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of "true relationships" to "no relationships" among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R⁄(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R⁄(R − βR + α). A research finding is thus


Why Most Published Research Findings Are False, by John P. A. Ioannidis

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.


DOI: 10.1371/journal.pmed.0020124

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

"It can be proven that most claimed research findings are false."


Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]

[Talk Benjamini Southampton (2013)]; modeling:

Try many experiments
⇓
1000 pure noise, 30 perfect signal
⇓
publish results with a p-value ≤ 0.05
⇓
≈ 50 false discoveries, 30 true discoveries

• Jager and Leek (2013). An estimate of the science-wise false discovery rate and application to the top medical literature: ≈ 14%

• Ioannidis (2014). Discussion: Why "An estimate of the science-wise false discovery rate and application to the top medical literature" is false
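The arithmetic of this toy model can be sketched in a few lines. The counts and the 0.05 cut-off come from the slide; reading "perfect signal" as "always detected" is an assumption made here, and the variable names are illustrative.

```python
n_noise, n_signal, level = 1000, 30, 0.05   # values from the slide

# A pure-noise p-value is uniform on [0, 1], so it falls below `level` with probability `level`.
expected_false = n_noise * level
# Assumption: a "perfect signal" experiment always yields a p-value below the cut-off.
expected_true = n_signal

fdp = expected_false / (expected_false + expected_true)
print(f"expected false discoveries: {expected_false:.0f}")
print(f"share of false discoveries among published results: {fdp:.1%}")
```

In this model 50 of the 80 published results, that is 62.5%, are expected to be false discoveries.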


Multiplicity in microarray [Hedenfalk et al. (2001)]

[Figure: BRCA1 vs BRCA2 expression data; axis label: genes]

• expression level (activity)
• genes differentially activated?
• 1 test for each gene
• thousands of genes
• number of replications ≪ dimension
• correlations


Other applications

• Neuroimaging (fMRI): activated regions?

• Econometrics: winning strategies?

• Astronomy: directions with stars?


Canonical setting

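• $X_i$ = avg group 2 − avg group 1 (rescaled) for gene $i$

• Gaussian model:

\[
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}
= \mu \begin{pmatrix} H_1 \\ H_2 \\ \vdots \\ H_m \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix},
\]

with $\mu > 0$, $H \in \{0,1\}^m$ (fixed) and $\varepsilon \sim \mathcal{N}(0, \Gamma)$ ($\Gamma_{i,i} = 1$).

• $\Gamma$ = dependence structure = $I_m$ for now

Question: for each $i$, $H_i = 0$ or $H_i = 1$?

Multiple testing: favors the "0" decision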
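A minimal simulation sketch of this model in the independent case Γ = I_m. The function name `simulate_model`, the choice of placing the true nulls first, and the parameter values (borrowed from Picture 1) are illustrative assumptions.

```python
import random

def simulate_model(m, m0, mu, seed=0):
    """Draw (H, X) from X_i = mu * H_i + eps_i with i.i.d. N(0, 1) noise (Gamma = I_m)."""
    rng = random.Random(seed)
    # Fixed configuration (assumption): the first m0 coordinates are true nulls (H_i = 0).
    H = [0] * m0 + [1] * (m - m0)
    X = [mu * h + rng.gauss(0.0, 1.0) for h in H]
    return H, X

H, X = simulate_model(m=100, m0=50, mu=2.0)
null_mean = sum(x for h, x in zip(H, X) if h == 0) / H.count(0)
alt_mean = sum(x for h, x in zip(H, X) if h == 1) / H.count(1)
print(f"mean of X under H_i = 0: {null_mean:.2f}, under H_i = 1: {alt_mean:.2f}")
```

The two group means should sit near 0 and near µ, which is what makes large values of X_i evidence for H_i = 1.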


Individual decision and errors

• Test statistic: $X_i$

• p-value: $p_i = \bar\Phi(X_i)$, with $\bar\Phi(z) = \mathbb{P}(Z \ge z)$, $Z \sim \mathcal{N}(0,1)$, so that
    if $H_i = 0$, $p_i \sim U(0,1)$
    if $H_i = 1$, $p_i$ has c.d.f. $\bar\Phi(\bar\Phi^{-1}(\cdot) - \mu)$

• Choose $\hat H_i = \mathbf{1}\{p_i \le t\}$ for some threshold $t$

• Two errors:

                $\hat H_i = 0$      $\hat H_i = 1$
    $H_i = 0$   true negative       false positive
    $H_i = 1$   false negative      true positive

• False positives are the more annoying error
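The individual decisions and the two error counts can be sketched as follows. The upper-tail function is computed with the standard library's `math.erfc`. The parameter values (m = 100, m0 = 50, µ = 2, t = 0.1), the seed, and the helper name `upper_tail` are illustrative assumptions.

```python
import math
import random

def upper_tail(z):
    """Upper-tail probability P(Z >= z) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

rng = random.Random(1)
m, m0, mu, t = 100, 50, 2.0, 0.1             # illustrative values (assumptions)
H = [0] * m0 + [1] * (m - m0)                # true status of each hypothesis
X = [mu * h + rng.gauss(0.0, 1.0) for h in H]
p = [upper_tail(x) for x in X]               # one-sided p-values
H_hat = [int(p_i <= t) for p_i in p]         # decision: declare a signal when p_i <= t

false_pos = sum(1 for h, d in zip(H, H_hat) if h == 0 and d == 1)
false_neg = sum(1 for h, d in zip(H, H_hat) if h == 1 and d == 0)
print(f"false positives: {false_pos}, false negatives: {false_neg}")
```

Lowering t trades false positives for false negatives, which is the tension the following slides explore.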


Picture 1. $m = 100$; $m_0 = 50$; $\mu = 2$; $\Gamma = I_m$

[Figure: plot of the m p-values, each point labeled by its true status H_i (0 or 1); both axes run from 0 to 1.]


Picture 2. $m = 100$; $m_0 = 95$; $\mu = 3$; $\Gamma = I_m$

[Figure: plot of the m p-values, each point labeled by its true status H_i (0 or 1); both axes run from 0 to 1.]


Picture 3. $m = 100$; $m_0 = 50$; $\mu = 0.01$; $\Gamma = I_m$

[Figure: plot of the m p-values, each point labeled by its true status H_i (0 or 1); both axes run from 0 to 1.]


Picture 4. $m = 100$; $m_0 = 95$; $\mu = 0.01$; $\Gamma = I_m$

[Figure: plot of the m p-values, each point labeled by its true status H_i (0 or 1); both axes run from 0 to 1.]


Proceed as for a single test? $t \equiv \alpha = 0.1$

[Figure: p-value plots, points labeled by true status H_i (0 or 1); the last panel zooms in on the vertical range 0 to 0.20.]


Union bound (Bonferroni)? $t \equiv \alpha/m = 0.1/100$

[Figure: two p-value plots zoomed on the vertical range 0 to 0.20, points labeled by true status H_i (0 or 1); one data set with few signals, one with many.]
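The union bound behind the threshold α/m can be written out in one line. This is the standard derivation, using only that each p-value of a true null is uniform on [0, 1]; $m_0$ denotes the number of true nulls, and no assumption on the dependence between tests is needed.

```latex
\[
\mathbb{P}\bigl(\exists i : H_i = 0,\ p_i \le \alpha/m\bigr)
\;\le\; \sum_{i : H_i = 0} \mathbb{P}(p_i \le \alpha/m)
\;=\; m_0\,\frac{\alpha}{m}
\;\le\; \alpha .
\]
```

The price is a very small threshold: with α = 0.1 and m = 100, only p-values below 0.001 are declared significant, whatever the amount of signal in the data.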


Do something in between! $t_\ell = \alpha \ell / m = 0.1\,\ell/100$

[Figure: p-value plots zoomed on the vertical range 0 to 0.20, points labeled by true status H_i (0 or 1); one data set with many signals, one with few.]


Smart! ... and rigorous?


2 False discovery rate control

BH procedure

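p-value view:

\[
\hat k = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\}, \qquad \hat t = \alpha \hat k/m
\]

c.d.f. view:

\[
\hat t = \max\{t \in [0,1] : \hat G_m(t) \ge t/\alpha\}
\]

Here $p_{(1)} \le \dots \le p_{(m)}$ are the ordered p-values (with $p_{(0)} = 0$) and $\hat G_m$ is the empirical c.d.f. of the p-values.

Decision:

\[
\hat H_i = \mathbf{1}\{p_i \le \hat t\} = \mathbf{1}\{X_i \ge \bar\Phi^{-1}(\hat t)\}
\]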
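The p-value view translates directly into code. The sketch below follows the definition of the largest index i with p_(i) ≤ αi/m; the function name `bh_threshold` and the ten example p-values are illustrative assumptions.

```python
def bh_threshold(pvalues, alpha):
    """Return (k_hat, t_hat) for the BH step-up procedure at level alpha."""
    m = len(pvalues)
    k_hat = 0                                   # convention: k_hat = 0 when no index qualifies
    for i, p_i in enumerate(sorted(pvalues), start=1):
        if p_i <= alpha * i / m:
            k_hat = i                           # keep the largest qualifying index
    return k_hat, alpha * k_hat / m

# Made-up p-values for illustration.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.061, 0.074, 0.205, 0.212, 0.9]
k_hat, t_hat = bh_threshold(pvalues, alpha=0.1)
rejected = [p for p in pvalues if p <= t_hat]
print(k_hat, t_hat, rejected)
```

In this example the procedure stops at the fifth ordered p-value, so the threshold is 0.05 and five hypotheses are rejected. The third and fourth p-values (0.039 and 0.041) exceed their own bounds 0.03 and 0.04 but are still rejected, because the fifth one passes its bound: this is the "step-up" feature that separates BH from a step-by-step comparison.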


False discovery rate control

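For a decision $\hat H_i = \mathbf{1}\{p_i \le t\}$ ($\forall i$),

\[
\mathrm{FDP}(t) = \frac{\#\{i : H_i = 0,\ \hat H_i = 1\}}{\#\{i : \hat H_i = 1\}} \quad \Bigl(\tfrac{0}{0} = 0\Bigr),
\qquad
\mathrm{FDR}(t) = \mathbb{E}[\mathrm{FDP}(t)]
\]

Theorem [Benjamini and Hochberg (1995)] [Benjamini and Yekutieli (2001)]

If $\Gamma = I_m$ and $\hat t$ is the threshold of the BH procedure, then $\forall \mu, H$, with $m_0 = \#\{i : H_i = 0\}$,

\[
\mathrm{FDR}(\hat t) = (m_0/m)\,\alpha \le \alpha
\]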
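The identity FDR = (m0/m)α can be checked numerically. The following Monte Carlo sketch uses the independent Gaussian model with m = 50, m0 = 25 and µ = 3. The level α = 0.2, the number of repetitions, the seed and all function names are assumptions chosen for illustration.

```python
import math
import random

def upper_tail(z):
    """P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bh_fdp(H, pvalues, alpha):
    """False discovery proportion of BH at level alpha, with the convention 0/0 = 0."""
    m = len(pvalues)
    sorted_p = sorted(pvalues)
    k_hat = max((i for i in range(1, m + 1) if sorted_p[i - 1] <= alpha * i / m), default=0)
    t_hat = alpha * k_hat / m
    rejected = [h for h, p in zip(H, pvalues) if p <= t_hat]
    return rejected.count(0) / len(rejected) if rejected else 0.0

def estimate_fdr(m=50, m0=25, mu=3.0, alpha=0.2, n_rep=4000, seed=0):
    """Average the FDP of BH over n_rep independent data sets."""
    rng = random.Random(seed)
    H = [0] * m0 + [1] * (m - m0)
    total = 0.0
    for _ in range(n_rep):
        p = [upper_tail(mu * h + rng.gauss(0.0, 1.0)) for h in H]
        total += bh_fdp(H, p, alpha)
    return total / n_rep

fdr_hat = estimate_fdr()
print(f"estimated FDR: {fdr_hat:.3f}, value given by the theorem: {25 / 50 * 0.2:.3f}")
```

The estimate should land close to (25/50) × 0.2 = 0.1, up to Monte Carlo error, while the FDP of any single data set can be noticeably above or below that value.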


Simulations. $m = 50$; $m_0 = 25$; $\mu = 3$; $\Gamma = I_m$

[Figure: ten simulated p-value plots, points labeled by true status H_i (0 or 1); horizontal axis from 0 to 1, vertical axis from 0 to 0.4. Each panel reports the FDP of the BH procedure.]

FDP(BH) over the ten runs: 0, 0.16, 0.0833, 0.08, 0.12, 0.167, 0.0435, 0, 0, 0.04


Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing

procedure such that FDR ≤ α. However, later work has blurred this distinction: there are methods that, given a rejection procedure R, estimate the unknown quantity E(V/R).

Impact

In many ways, Benjamini and Hochberg (1995) is a very successful paper. Its influence is clear from its 4967 citations (according to the Web of Science at the time of this session), which are still on the rise each year as can be seen in Fig. 1. Although 607 of these are in the area of statistics and probability, the majority of these publications are in the life sciences, from genetics to biochemistry, from oncology to plant sciences, reflecting in large part the use of FDR in microarray-related research. Importantly, citations in other high dimensional application areas, such as neural imaging, are on the rise also, showing its ability to be applied in many diverse types of application. The list of the 10 highest cited papers that cite Benjamini and Hochberg (1995), which is shown in Table 1, is particularly interesting, because it includes six statistical papers, suggesting that further theoretical and methodological developments of the method have had significant influence.

[Figure: number of citations per year, 1996 to 2008; vertical axis from 0 to 1200.]

Fig. 1. Rapidly increasing number of citations of Benjamini and Hochberg (1995), suggesting that its influence has not yet reached its peak (note that the figure for 2009 is only partially shown)

Table 1. 10 most cited papers that cite Benjamini and Hochberg (1995)

Rank   Article citing Benjamini and Hochberg (1995)   Number of citations
1      Tusher et al. (2001)                            3723
2      Storey and Tibshirani (2003)                    1412
3      Weisberg et al. (2003)                          1187
4      Genovese et al. (2002)                          1020
5      Storey (2002)                                   726
6      Wilkinson (1999)                                652
7      Benjamini and Yekutieli (2001)                  584
8      Wacholder et al. (2004)                         486
9      Patti et al. (2003)                             479
10     Dudoit et al. (2002)                            459

[Benjamini (2010, JRSSB)]

Now more than 20,000 citations on Google Scholar!


3 FDR in other statistical issues

Why should FDR thresholding be adaptive to sparsity?

[Figure: three p-value plots (horizontal axis from 0 to 1, vertical axis from 0 to 0.20), points labeled by true status H_i (0 or 1); one data set with few signals, one with many.]


[Linnemann] - increasing signal strength


Adaptation to unknown sparsity

$\hat t$ seems "adaptive" to the "quantity" of signal in the data

• Classification: where is the signal? [Bogdan et al. (2011)], [Neuvial and R. (2012)]

• Detection: is there some signal? [Ingster (2002)], [Donoho and Jin (2004)], etc.

• Estimation: what is the value $\mathbb{E}X$ of the signal?

\[
\widehat{\mathbb{E}X}_i = X_i\,\mathbf{1}\{|X_i| \ge \hat t\} \quad \text{(hard thresholding)}
\]

[Abramovich et al. (2006)], [Donoho and Jin (2006)]
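Hard thresholding itself is a one-line operation. In the sketch below the threshold is passed in as a plain number, whereas on the slide it is data-driven; the function name `hard_threshold` and the example values are illustrative assumptions.

```python
def hard_threshold(xs, threshold):
    """Keep X_i when |X_i| >= threshold, and estimate 0 otherwise."""
    return [x if abs(x) >= threshold else 0.0 for x in xs]

# Made-up observations; only the two large coordinates survive.
print(hard_threshold([0.3, -2.7, 1.1, 4.2, -0.8], threshold=2.0))
```

An FDR-type threshold shrinks when many coordinates carry signal and grows when few do, which is the sense in which the resulting estimator adapts to sparsity.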


Classification

\[
X_i \sim \pi_{0,m}\,\mathcal{N}(0,1) + (1 - \pi_{0,m})\,\mathcal{N}(\mu_m, 1), \quad 1 \le i \le m, \ \text{i.i.d.}
\]

but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).

• training set = null distribution known (one-class classification)

• Classification rule $h_m : \mathbb{R} \to \{0,1\}$

• Risk

\[
R_m(h_m) = (1 - \pi_{0,m})^{-1}\, \mathbb{E}\Bigl( m^{-1} \sum_{i=1}^{m} \mathbf{1}\{h_m(X_i) \ne H_i\} \Bigr)
\]

• Classification boundary in (sparsity, signal) space such that
    above the boundary, $\exists h_m : R_m(h_m) \to 0$ (perfect classification)
    under the boundary, $\forall h_m,\ R_m(h_m) \to 1$ (unclassifiable)
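For a threshold rule $h(x) = \mathbf{1}\{x \ge s\}$ the risk has a closed form. The short derivation below is a sketch under the mixture model above; the cut-off $s$ is a symbol introduced here for illustration, and $\bar\Phi(z) = \mathbb{P}(Z \ge z)$ for $Z \sim \mathcal{N}(0,1)$.

```latex
\[
R_m(h)
= \frac{\pi_{0,m}\,\bar\Phi(s) + (1 - \pi_{0,m})\,\bigl(1 - \bar\Phi(s - \mu_m)\bigr)}{1 - \pi_{0,m}}
= \frac{\pi_{0,m}}{1 - \pi_{0,m}}\,\bar\Phi(s) + 1 - \bar\Phi(s - \mu_m).
\]
```

The first term is the false positive contribution, inflated by sparsity; the second is the false negative rate among true signals. The trivial rule that never declares a signal ($s = +\infty$) has risk exactly 1, which explains why a limit of 1 is read as "unclassifiable".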


Classification boundary

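[Figure: classification boundary in the $(\beta, r)$ plane, with $\beta$ from 0.5 to 1.0 on the horizontal axis and $r$ from 0 to 1.0 on the vertical axis; regions: perfect classification, unclassifiable.]

Parametrization: $\mu_m = \sqrt{2 r \log m}$, $\pi_{0,m} = 1 - m^{-\beta}$

BH rule: $h^{BH}_m(x) = \mathbf{1}\{x \ge \bar\Phi^{-1}(\hat t)\}$, with $\alpha_m \propto (\log m)^{-1/2}$

• Classification boundary attained by BH.

• On the boundary: risk of BH ∼ Bayes risk.

[Bogdan et al. (2011)], [Neuvial and R. (2012)]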


Detection : is there some signal ?

Same model

\[
X_i \sim \pi_{0,m}\,\mathcal{N}(0,1) + (1 - \pi_{0,m})\,\mathcal{N}(\mu_m, 1), \quad 1 \le i \le m, \ \text{i.i.d.}
\]

but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).

• Test $H_0$: "$\mathcal{N}(0, I_m)$" against $H_1$: "mixture".

• Risk $R_m(T) = \mathbb{P}_{H_0}(T(X) = 1) + \mathbb{P}_{H_1}(T(X) = 0)$

• Detection boundary in (sparsity, signal) space such that
    above the boundary, $\exists T : R_m(T) \to 0$ (perfect detection)
    under the boundary, $\forall T,\ R_m(T) \to 1$ (undetectable)


Detection boundary

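[Figure: detection boundary in the $(\beta, r)$ plane, with $\beta$ from 0.5 to 1.0 on the horizontal axis and $r$ from 0 to 1.0 on the vertical axis; regions: perfect classification, perfect detection, undetectable.]

Parametrization: $\mu_m = \sqrt{2 r \log m}$, $\pi_{0,m} = 1 - m^{-\beta}$

$T^{BH} = \mathbf{1}\{\exists i : p_{(i)} \le \alpha_m i/m\}$ with $\alpha_m \propto (\log m)^{-1/2}$

• Detection boundary attained by BH when $\beta \in (3/4, 1)$

• Better to use "higher criticism":

\[
\max_i \left\{ \sqrt{m}\, \frac{i/m - p_{(i)}}{\sqrt{p_{(i)}\,(1 - p_{(i)})}} \right\}
\]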
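The higher criticism statistic can be computed directly from the sorted p-values. The sketch below takes the maximum over all indices, as in the displayed formula; skipping p-values equal to 0 or 1 is a guard added here to avoid dividing by zero, and the function name and example p-values are illustrative assumptions.

```python
import math

def higher_criticism(pvalues):
    """max over i of sqrt(m) * (i/m - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    m = len(pvalues)
    best = -math.inf
    for i, p in enumerate(sorted(pvalues), start=1):
        if 0.0 < p < 1.0:                       # guard (assumption): skip degenerate p-values
            best = max(best, math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p)))
    return best

m = 20
null_like = [(i - 0.5) / m for i in range(1, m + 1)]   # evenly spread p-values
with_signal = [1e-4] * 3 + null_like[3:]               # three very small p-values
print(higher_criticism(null_like), higher_criticism(with_signal))
```

A few very small p-values push the statistic far above its value on evenly spread p-values, which is how it picks up sparse signal.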


LASSO and FDR

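Regression with orthogonal design:

\[
X \sim \mathcal{N}(\beta, I_m)
\]

[Bogdan et al. (2013)]: sorted $\ell_1$ penalized estimator (SLOPE)

\[
\hat\beta = \arg\min_{\beta \in \mathbb{R}^m} \Bigl\{ \tfrac{1}{2}\,\|X - \beta\|^2 + \sum_{k=1}^{m} \lambda_k\,|\beta|_{(k)} \Bigr\}
\]

where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$; $|\beta|_{(1)} \ge |\beta|_{(2)} \ge \dots \ge |\beta|_{(m)}$

Selection with $\{i : \hat\beta_i \ne 0\}$:

• $\lambda_k = \lambda = \bar\Phi^{-1}(\alpha/(2m)) \simeq \sqrt{2 \log m}$: Bonferroni

• $\lambda_k = \bar\Phi^{-1}(\alpha k/(2m)) \simeq \sqrt{2 \log(m/k)}$: BH!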
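The two penalty sequences can be computed with the standard library's `statistics.NormalDist`. This sketch only builds the sequences λ_k and does not solve the SLOPE minimization; the function names and the values m = 1000, α = 0.1 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def upper_quantile(u):
    """The point z with P(Z >= z) = u for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - u)

def bonferroni_lambdas(m, alpha):
    """Constant sequence: lambda_k = upper quantile at alpha / (2m)."""
    return [upper_quantile(alpha / (2 * m))] * m

def bh_lambdas(m, alpha):
    """Decreasing sequence: lambda_k = upper quantile at alpha * k / (2m)."""
    return [upper_quantile(alpha * k / (2 * m)) for k in range(1, m + 1)]

m, alpha = 1000, 0.1
lam_bonf = bonferroni_lambdas(m, alpha)
lam_bh = bh_lambdas(m, alpha)
print(f"Bonferroni lambda: {lam_bonf[0]:.3f}, sqrt(2 log m): {math.sqrt(2 * math.log(m)):.3f}")
print(f"BH lambdas: first {lam_bh[0]:.3f}, last {lam_bh[-1]:.3f}")
```

The BH sequence starts at the Bonferroni value and then decreases, so larger coefficients (in sorted order) are penalized more heavily than smaller ones.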


Outlook

Some conclusions for FDR

⊕ Very simple
⊕ Trade-off type I error / power
⊕ Adaptive to sparsity

Some issues

! Sensitive to null hypothesis
! Choosing α
! Calibrating test statistics

Main challenge

What about dependence?

