false discovery control in spatial multiple testingreich/talks/fdr_slides_reich.pdf · false...

21
Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results False Discovery Control in Spatial Multiple Testing W Sun 1 , B Reich 2 , T Cai 3 , M Guindani 4 , and A. Schwartzman 2 WNAR, June, 2012 1 University of Southern California 2 North Carolina State University 3 University of Pennsylvania 4 MD Anderson

Upload: others

Post on 25-Mar-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

False Discovery Control in Spatial Multiple Testing

W Sun1, B Reich2, T Cai3, M Guindani4, and A. Schwartzman2

WNAR, June, 2012

1 University of Southern California2 North Carolina State University3 University of Pennsylvania4 MD Anderson

Page 2: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Spatial modeling of time trends in tropospheric ozone

The EPA uses a monitoring network to regulate ozone.

Our objective it to identify areas with changing ozone.

Other examples of spatial multiple testing: climate change,disease monitoring, neuroimaging, etc.

−90 −85 −80 −75 −70

2530

3540

45

Longitude

Latitu

de

−3

−2

−1

0

1

2

3

−90 −85 −80 −75 −70

2530

3540

45

Longitude

Latitu

de

−4

−2

0

2

(a) First stage estimates, β̂(s) (b) First stage z-scores, z(s) = β̂(s)/w(s)

Page 3: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

New issues in spatial multiple testing

One only observes data points at a discrete subset of thelocations but needs to make inference everywhere in thespatial domain.

A finite approximation strategy is needed for inference in acontinuous spatial domain – otherwise an uncountable numberof tests needs to be conducted, which is impossible in practice.

It is desirable to aggregate information from nearby locationsto make cluster-wise inference, and to incorporateimportant spatial variables in the decision-making process.

Page 4: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Gaussian random field model

Let XXX = {X (s) : s ∈ S} be a random field on a spatialdomain S :

X (s) = μ(s) + ε(s), (1)

where μ(s) is the unobserved random process and ε(s) is thenoise process.

An important special case is the Gaussian random field modelwhere the signals and errors are Gaussian processes withmeans μ̄ and 0, and covariance functions ρ1 and ρ2,respectively.

Let Θ denote the collection of all parameters in model (1).

Page 5: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Notation

Hypotheses

H0(s) : μ(s) ∈ A versus H1(s) : μ(s) ∈ Ac

A is the indifference region, e.g. A = {μ : μ ≤ μ0}True states

θ(s) = 0 if H0(s) and θ(s) = 1 if H1(s)

Null area: S0 = {s ∈ S : θ(s) = 0}Non-null area S1 = {s ∈ S : θ(s) = 1}

Decisions

δ(s) = 1 if reject and δ(s) = 0 otherwise

Rejection region, R = {s ∈ S : δ(s) = 1}Error regions

False positive area: SFP = {s ∈ S : θ(s) = 0, δ(s) = 1}False negative area: SFN = {s ∈ S : θ(s) = 1, δ(s) = 0}

Page 6: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

False discovery measures

The key quantity to control is the false discovery proportion,

FDP =ν(SFP)

ν(R)I{ν(R) > s0},

where s0 is a small positive value.

FDR: Typically, false discovery rate is controlled so thatFDR = E (FDP) < α.

FDX: We might also want to be reasonably confident theFDP is less than some value, say τ ∈ (0, 1). Therefore, wealso consider the false discovery exceedance,FDXτ = P(FDP > τ) < α.

MDR: The power of a multiple testing procedure issummarized by the missed discovery rate, MDR = E{ν(SFN)}.

Page 7: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Compound decision theory for spatial multiple testing

Under mild conditions, the multiple testing problem isequivalent to a weighted classification problem

L(θθθ,δδδ) = λν(SFP) + ν(SFN).

There is a one-to-one relationship between λ and α.

The optimal rule is to reject if the posterior probability of thenull is smaller than a threshold t, i.e., δ(s) = I [TOR(s) < t],where TOR(s) = PΘ{θ(s) = 0|XXX}.

The threshold t is taken to be as large as possible (to increasepower) while still maintaining FDR(t) ≤ α.

FDR(t) is unknown and must be approximated.

Page 8: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Discrete approximation

This problem boils down to estimating FDR(t), which isdifficult because it is an integral over potentiallyinfinitely-many tests (spatial locations).

Let ∪mi=1Si be a fine partition of S . Take a point si from each

Si .

Compute the probability of the null, TOR(si ).

Let {T (i)OR : i = 1, · · · ,m} be the ordered oracle statistics and

S(i) the region corresponding to T(i)OR .

Page 9: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

FDR control

Procedure

Define Rj = ∪ji=1S(i) and

r = max

{j : ν(Rj)

−1j∑

i=1

T(i)ORν(S(i)) ≤ α

}.

The rejection area is given by R = ∪ri=1S(i).

Therefore, we assume the decision is constant within pixels andapproximate the FDR as the sum of posterior probabilities of thenull over the pixels in the rejection region.

Page 10: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

FDX control

Similar to FDR control, we control FDX by:

Procedure

DefineFDXm

τ,j = PΘ

(I{ν(Rj )>0}

ν(Rj )

∑si∈Rm

j{1− θ(si)}ν(Si ) > τ − ε0|XXX

)and r = max{j : FDXm

τ,j ≤ α}.

Then the rejection region is given by R = ∪ri=1S(i).

Page 11: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

The decision process on a continuous spatial domain can bedescribed, within a small margin of error, by a finite number ofdecisions on a grid of pixels.

Theorem

Under conditions on random field and partition

(a) The FDR level of the FDR procedure satisfiesFDR ≤ α+ o(1) when m → ∞.

(b) The FDX level of the FDX procedure at tolerancelevel τ satisfies FDX ≤ α+ o(1) when m → ∞.

Page 12: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Computational Algorithms

The numerical methods for model fitting and parameterestimation in spatial models have been extensively studied in aBayesian computational framework.

Suppose the MCMC samples are {μμμb : b = 1, · · · ,B}, whereμμμb = [μb(s1), · · · , μb(sm)] is a sample b.

Let θb(si ) = I [μb(si) ∈ Ac ].

TOR(si ) can be estimated by

T̂OR(si ) =1

B

B∑b=1

[1− θb(si)].

Page 13: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Simulation setting

Generate data from the model x(s) = μ(s) + ε(s).

Both the signals and errors are generated as Gaussianprocesses.

The signal process μ has mean μ̄ and exponential covarianceCov[μ(s), μ(s ′)] = σ2

μ exp[−||s − s ′||/ρμ].

The error process ε has mean zero and covarianceCov[ε(s), ε(s ′)] = (1− r)I (s = s ′) + r exp[−||s − s ′||/ρε].

Choose n = 1000, r = 0.9, μ̄ = −1, and σμ = 2. Theexpected proportion of positive observations is 0.31.

Page 14: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Models

We compare seven methods:

Non-spatial approach of Benjamini and Hochberg FDR (BH)

Non-spatial approach of Genovese and Wasserman (GW)

Spatial approach of Pacifico et al (PGVW) FDR and FDX

Oracle (our approach with hyperparameters fixed at truevalues) FDR and FDX

MC (our approach with hyperparameters estimated usingMCMC) FDR and FDX

Uninformative priors:

μ̄ ∼ N(0, 1002)

σ−2μ ∼ Gamma(0.1, 0.1)

r , ρμ, ρε ∼ Uniform(0, 1).

Page 15: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

False discovery rate (target is α = 0.1) by ρμ

0.00 0.05 0.10 0.15 0.20

0.00

0.05

0.10

0.15

0.20

Spatial range

Mea

n FD

PBHGWOracle FDROracle FDX

MC FDRMC FDXPGVW FDRPGVW FDX

Page 16: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Distribution of FDP by ρμ

0.00 0.05 0.10 0.15 0.20

0.00

0.05

0.10

0.15

0.20

0.25

Spatial range

Dist

ribut

ion

of F

DPOracle FDROracle FDXMC FDRMC FDX

Page 17: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Missed discovery rate by ρμ

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Spatial range

MDR

Page 18: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Summary of simulation

The oracle FDR procedure controls the FDR nearly perfectly.

The MC FDR procedure with uninformative priors has goodFDR control.

FDX methods are more conservative than the FDR methods.

The BH, GW, and PGVW are very conservative.

MDR levels of the oracle and MC methods are much lower.

Page 19: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Ozone data analysis

−0.8

−0.6

−0.4

−0.2

0.0

0.2

0.0

0.2

0.4

0.6

0.8

1.0

(a) Posterior mean of μ(s) (b) Posterior prob μ(s) < −0.1

Page 20: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Rejection region (black) using the FDX rule

0.0

0.2

0.4

0.6

0.8

1.0

Page 21: False Discovery Control in Spatial Multiple Testingreich/talks/FDR_slides_Reich.pdf · False Discovery Control in Spatial Multiple Testing WSun1,BReich2,TCai3, M Guindani4, and A

Introduction False discovery measures False discovery controlling procedures Computation & Numerical Results

Summary

Convention (e.g., Benjamini and Yekutieli (2001) and Sarkar(2002)) is that it is safe to apply standard methods as if thetests were independent.

While standard methods control FDR, incorporating theunderlying dependency structure can dramatically improve thepower.

A continuous decision process can be described, within a smallmargin of error, by a finite number of decisions on a grid ofpixels.

FDR and FDX controlling problems can be solved in a unifiedtheoretical and computational framework.

We have also extended this to deal with spatial clusters.