

Statistical Methods in functional MRI

Martin Lindquist Department of Biostatistics Johns Hopkins University

Lecture 7.2: Multiple Comparisons

04/25/13

Issues with FWER

•  Methods that control the FWER (Bonferroni, RFT, permutation tests) provide strong control over the chance of any false positives.

•  While this is appealing, the resulting thresholds often lead to tests that suffer from low power.

•  Power is critical in fMRI applications because the most interesting effects are usually at the edge of detection.

False Discovery Rate

•  The false discovery rate (FDR) is a recent development in multiple comparison problems due to Benjamini and Hochberg (1995).

•  While the FWER controls the probability of any false positives, the FDR controls the proportion of false positives among all rejected tests.

Suppose we perform tests on m voxels.

U, V, T and S are unobservable random variables. R is an observable random variable.

                   Declared Inactive   Declared Active   Total
Truly inactive     U                   V                 m0
Truly active       T                   S                 m − m0
Total              m − R               R                 m

Notation Definitions

•  In this notation, the familywise error rate is

   FWER = P(V ≥ 1)

•  False discovery rate:

   FDR = E(V/R)

•  The FDR is defined to be 0 if R = 0.

2

Properties

•  A procedure controlling the FDR ensures that, on average, the proportion of false positives among rejected tests is no bigger than a pre-specified rate q, which lies between 0 and 1.

•  However, for any given data set the proportion of false positives need not be below the bound.

•  An FDR-controlling technique guarantees control of the FDR only in the sense that FDR = E(V/R) ≤ q.

BH Procedure

1.  Select desired limit q on FDR (e.g., 0.05)

2.  Rank p-values, p(1) ≤ p(2) ≤ ... ≤ p(m)

3.  Let r be the largest i such that p(i) ≤ (i/m) × q

4.  Reject all hypotheses corresponding to p(1), ... , p(r).

[Figure: ranked p-values p(i) plotted against their index, with the BH threshold line (i/m) × q]

The BH procedure is adaptive: the more signal there is, the less stringent the threshold. With little signal the effective cutoff approaches the Bonferroni level q/m; with strong signal it approaches the uncorrected level q.
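The four steps of the BH procedure can be sketched in a few lines of NumPy; the function name and the example p-values below are illustrative, not from the lecture:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask: True for hypotheses rejected while
    controlling the FDR at level q.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                    # step 2: rank the p-values
    thresh = (np.arange(1, m + 1) / m) * q   # the line (i/m) * q
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        r = np.max(np.nonzero(below)[0])     # step 3: largest i with p(i) <= (i/m) q
        reject[order[: r + 1]] = True        # step 4: reject p(1), ..., p(r)
    return reject
```

For example, with p-values (0.01, 0.02, 0.03, 0.5) and q = 0.05, BH rejects the first three hypotheses, whereas a Bonferroni correction (cutoff 0.05/4 = 0.0125) would reject only the first.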

Comments

•  If all null hypotheses are true, the FDR is equivalent to the FWER.

•  Any procedure that controls the FWER also controls the FDR. A procedure that controls the FDR only can be less stringent and lead to a gain in power.

•  Since FDR-controlling procedures work only on the p-values and not on the actual test statistics, they can be applied to any valid statistical test.

Example

[Figure: a simulated activation signal plus a noise image, giving the observed signal-plus-noise data]

α = 0.10, no correction

•  Percentage of false positives: 0.0974, 0.1008, 0.1029, 0.0988, 0.0968, 0.0993, 0.0976, 0.0956, 0.1022, 0.0965

FWER control at 10%

•  [Figure: occurrence of a false positive under FWER control]

FDR control at 10%

•  Percentage of active voxels that are false positives: 0.0871, 0.0952, 0.0790, 0.0908, 0.0761, 0.1090, 0.0851, 0.0894, 0.1020, 0.0992


Uncorrected Thresholds

•  Most published PET and fMRI studies use arbitrary uncorrected thresholds (e.g., p < 0.001).
   –  With available sample sizes, corrected thresholds are so stringent that power is extremely low.

•  Using uncorrected thresholds is problematic when interpreting conclusions from individual studies, as many activated regions may be false positives.

•  Null findings are hard to disseminate, hence it is difficult to refute false positives established in the literature.

Extent Threshold

•  Sometimes an arbitrary extent threshold is used when reporting results.

•  Here a voxel is only deemed truly active if it belongs to a cluster of k contiguous active voxels (e.g., p < 0.001, 10 contiguous voxels).

•  Unfortunately, this does not necessarily correct the problem because imaging data are spatially smooth and therefore false positives may appear in clusters.

Example

•  Activation maps with spatially correlated noise thresholded at three different significance levels (α = 0.10, α = 0.01, α = 0.001; all images smoothed with FWHM = 12 mm). Due to the smoothness, the false-positive activations are contiguous regions of multiple voxels.

Example

•  Similar activation maps using null data (α = 0.10, α = 0.01, α = 0.001; all images smoothed with FWHM = 12 mm).

Lecture 8: Functional Connectivity

04/25/13


Data Processing Pipeline

[Figure: pipeline from Experimental Design and Data Acquisition through Reconstruction; preprocessing (Slice-time Correction; Motion Correction, Co-registration & Normalization; Spatial Smoothing); and Data Analysis (Localizing Brain Activity, Connectivity, Prediction)]

Brain Networks

•  It has become common practice to talk about brain networks, i.e., sets of interconnected brain regions with information transfer among regions.

•  To construct a network:
   –  Define a set of nodes (e.g., ROIs)
   –  Estimate the set of connections, or edges, between the nodes.

[Figure: a three-node network with nodes A, B, C and its adjacency matrix]

        A  B  C
   A    0  1  0
   B    0  0  1
   C    0  1  0

Network Methods

•  A number of methods have been suggested in the neuroimaging literature to quantify the relationship between nodes/regions.

•  Their appropriateness depends upon:
   –  what type of conclusions one is interested in making;
   –  what type of assumptions one is willing to make;
   –  and the level of the analysis and modality.

Brain Connectivity

•  Functional Connectivity
   –  Undirected association between two or more fMRI time series.
   –  Makes statements about the structure of relationships among brain regions.

[Figure: undirected connections among DLPFC, MTG, dACC, and VMPFC]

Brain Connectivity

•  Effective Connectivity
   –  Directed influence of one brain region on the physiological activity recorded in other brain regions.
   –  Makes statements about causal effects among tasks and regions.

[Figure: directed connections among V1, V5, and PPC]

Functional Connectivity

•  Methods:
   –  Seed analysis
   –  Inverse covariance methods
   –  Multivariate decomposition methods
      §  Principal Components Analysis
      §  Independent Components Analysis
      §  Partial Least Squares
   –  Mediation analysis
   –  Psychophysiological interaction (PPI) analysis


Effective Connectivity

•  Methods:
   –  Structural Equation Modeling
   –  Granger Causality
   –  Dynamic Causal Modeling
   –  Bayes Nets
   –  Mediation analysis
   –  Psychophysiological interaction (PPI) analysis

Levels of Analysis

•  Functional connectivity can be applied at different levels of analysis, with different interpretations at each.

•  Connectivity across time can reveal networks that are dynamically activated across time.

•  Connectivity across trials can identify coherent networks of task-related activations.

Levels of Analysis

•  Connectivity across subjects can reveal patterns of coherent individual differences.

•  Connectivity across studies can reveal tendencies for studies to co-activate within sets of regions.

Bivariate Connectivity

•  Simple functional connectivity
   –  Region A is correlated with Region B.
   –  Provides information about relationships among regions.
   –  Can be performed on time series data within a subject, or on individual differences (contrast maps, one per subject).

[Figure: two connected nodes A and B]

Time Series Connectivity

•  Calculate the cross-correlation between time series from two separate brain regions.

[Figure: for Subjects 1 through n, time series from Region 1 and Region 2 are correlated; each within-subject correlation r is transformed to Z and carried forward to a group analysis]

Seed Analysis

•  In seed analysis the cross-correlation is computed between the time course from a predetermined region (the seed region) and all other voxels.

•  This allows researchers to find regions correlated with the activity in the seed region.

•  The seed time course can also be a performance or physiological variable.
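A minimal sketch of a seed analysis, assuming the data sit in a (time × voxels) array; the array names, dimensions, and toy signals are illustrative:

```python
import numpy as np

def seed_correlation_map(data, seed_ts):
    """Correlate a seed time course with every voxel's time series.

    data: (T, N) array, T time points by N voxels.
    seed_ts: length-T seed region time course.
    Returns (r, z): Pearson correlation per voxel and its Fisher
    r-to-z transform, suitable for a second-level group analysis.
    """
    d = data - data.mean(axis=0)
    s = seed_ts - seed_ts.mean()
    r = (d.T @ s) / (np.sqrt((d ** 2).sum(axis=0)) * np.sqrt((s ** 2).sum()))
    z = np.arctanh(r)                      # Fisher r-to-z
    return r, z

# Toy example: voxel 0 tracks the seed, voxel 1 is pure noise.
rng = np.random.default_rng(0)
T = 200
seed = np.sin(np.linspace(0, 8 * np.pi, T))
data = np.column_stack([
    seed + 0.3 * rng.standard_normal(T),   # correlated voxel
    rng.standard_normal(T),                # unrelated voxel
])
r, z = seed_correlation_map(data, seed)
```

In a real analysis the resulting z map would be thresholded, and the per-subject z values would feed the group-level test.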


[Figure: correlations between brain activity and heart rate; average within-subject correlation (r) plotted over time (TRs, 2 s); threshold: p < .005; VMPFC highlighted]

Issues

•  One of the main problems with time series connectivity is the fact that there may be different hemodynamic lags in different regions:
   –  Time series from different regions may not match up, even if neural activity patterns match up.
   –  If lags are estimated from data, the temporal order may reflect vascular (uninteresting) rather than neural (interesting) responses.

Beta Series

•  The beta series approach can be used to minimize issues caused by differences in neurovascular coupling across regions.

•  Procedure:
   –  Fit a GLM to obtain separate parameter estimates for each individual trial.
   –  Compute the correlation between these trial-by-trial estimates across regions or voxels.
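The two-step procedure can be sketched as follows; the trial timing, amplitudes, and noise level are made up for illustration, and a simple boxcar stands in for an HRF-convolved trial regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, T = 30, 300

# Design matrix with one (non-overlapping) boxcar regressor per trial.
D = np.zeros((T, n_trials))
for j in range(n_trials):
    D[10 * j: 10 * j + 5, j] = 1.0

# Two regions sharing trial-to-trial amplitude fluctuations.
amp = rng.standard_normal(n_trials)
y1 = D @ amp + 0.1 * rng.standard_normal(T)          # region 1
y2 = D @ (0.8 * amp) + 0.1 * rng.standard_normal(T)  # region 2

# Step 1: separate parameter estimate per trial (the beta series).
b1, *_ = np.linalg.lstsq(D, y1, rcond=None)
b2, *_ = np.linalg.lstsq(D, y2, rcond=None)

# Step 2: correlate the beta series between the two regions.
r = np.corrcoef(b1, b2)[0, 1]
```

Because the correlation is computed over trial-level amplitudes rather than raw time series, region-specific hemodynamic lag matters less.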

Beta Series

[Figure: for Subjects 1 through n, beta series from Region 1 and Region 2 are correlated; each correlation r is transformed to Z and carried forward to a group analysis]

Individual Differences

[Figure: for Subjects 1 through N, each subject contributes a contrast image and a seed value x1, ..., xN; correlating these across subjects yields the group results]

Partial Correlation

•  Partial Correlation
   –  Correlation between two regions, after the effects of all other regions have been removed.
   –  Helps protect against ‘illusory’ correlations between regions (e.g., A and C are uncorrelated after controlling for B).

[Figure: nodes A and C each connected to B, with no direct A–C edge]


Inverse Covariance Methods

•  For multivariate normal data there exists a duality between the inverse covariance (precision) matrix and the graph representing relationships between regions.
   –  Conditional independence between variables (regions) corresponds to zero entries in the precision matrix.
   –  The graphical lasso (GLASSO) can be used to estimate sparse precision matrices and graphs.

[Figure: a three-node graph A–B–C whose missing edges correspond to zero entries in Σ⁻¹]
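The duality can be illustrated directly: the partial correlation between two regions is, up to sign, a normalized off-diagonal entry of the precision matrix. A sketch with a hypothetical chain A → B → C, so that A and C are conditionally independent given B:

```python
import numpy as np

def partial_correlations(data):
    """Pairwise partial correlations from the inverse covariance matrix.

    data: (T, k) array of k regional time series.
    """
    prec = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)    # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Chain A -> B -> C: A and C are marginally correlated,
# but conditionally independent given B.
rng = np.random.default_rng(2)
T = 5000
a = rng.standard_normal(T)
b = a + 0.5 * rng.standard_normal(T)
c = b + 0.5 * rng.standard_normal(T)
pcorr = partial_correlations(np.column_stack([a, b, c]))
```

The A–C entry of `pcorr` is close to zero even though the ordinary correlation between A and C is large, which is exactly the pattern GLASSO exploits when estimating a sparse graph.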

Mediation

•  Mediation
   –  The relationship between regions A and B is mediated by M.
   –  Can identify functional pathways spanning more than 2 regions.
   –  Can be performed on time series data within a subject, or on individual differences (contrast maps, one per subject).
   –  Also: a test of whether task-related activations in B are mediated, or explained, by M.

[Figure: paths A → M → B, and Task → M → B]

Demonstrating Mediation

[Figure: full model with mediator (x → m → y, paths a and b, direct path c′) and reduced model without mediator (x → y, path c)]

Full model, with mediator:

   m = i_m + a·x + e_m
   y = i_y + b·m + c′·x + e′_y

Reduced model, without mediator:

   y = i_y′ + c·x + e_y

Decomposition of Effects

•  The mediation framework allows us to decompose the total effect of x on y as follows:

   c = c′ + ab
   Total effect = Direct effect + Mediated effect

•  Does m explain some of the x–y relationship?
   –  Test c − c′, which is equivalent to testing the significance of the ab product.
   –  Sobel test or bootstrap test.
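For linear models fit by OLS, the identity c = c′ + ab holds exactly in the estimates, not just in expectation. A quick sketch with simulated data (all coefficient values are made up):

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients for y ~ design (design includes intercept)."""
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 500
x = rng.standard_normal(n)
m = 0.5 * x + rng.standard_normal(n)             # mediator
y = 0.8 * m + 0.3 * x + rng.standard_normal(n)   # outcome

ones = np.ones(n)
c = ols(np.column_stack([ones, x]), y)[1]               # total effect (reduced model)
a = ols(np.column_stack([ones, x]), m)[1]               # path a
b, c_prime = ols(np.column_stack([ones, m, x]), y)[1:]  # paths b and c' (full model)
```

Checking `c - c_prime` against `a * b` recovers the decomposition; in practice the significance of the ab product would be assessed with a Sobel or bootstrap test.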

In the functional mediation model, X = (x_1, ..., x_n) and Y = (y_1, ..., y_n) are scalar, while the mediator M = (m_1(t), ..., m_n(t)) is a function of time. The paths α(t) (X → M) and β(t) (M → Y) are functions, with total effect γ and direct effect γ′:

   m_i(t) = α(t)·x_i + ε_m,i(t)

   y_i = ∫₀ᵀ β(s)·m_i(s) ds + γ′·x_i + ε_y,i

   y_i = γ·x_i + ε_x,i

   γ = γ′ + ∫₀ᵀ α(s)·β(s) ds

Functional Mediation: Pain Data

[Figure: estimated α, β, and αβ pathway functions linking temperature (Temp) to pain rating (Rating) via the brain response]

•  Activation in the right anterior insula mediates the relationship between temperature and pain rating.

•  The key time interval driving the mediation is 14–24 seconds following activation.


Moderation

•  Moderation
   –  The relationship between regions A and B is moderated by M.
   –  Connectivity between A and B depends on the state (level) of M.
   –  Can be performed on time series data within a subject, or on individual differences (contrast maps, one per subject).
   –  M can be task state or another variable.

•  In SPM, on time series data: “Psychophysiological interaction” (PPI).

[Figure: M moderating the A → B connection]

•  In the psychophysiological interaction (PPI) approach, the standard GLM is supplemented with additional regressors that model the interaction between the task and the time course in a seed region:

   Y = β₀ + β₁·X + β₂·R + β₃·(X·R) + ε

   where X is the task regressor, R is the time course from the seed region, and X·R is the interaction term.

PPI

•  PPI can be used to determine whether the correlation between two brain areas is altered by different psychological contexts.

•  The interaction term reflects how the slope of the linear relationship with the seed time course is modulated by the psychological variable used to create the interaction.
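The PPI model can be sketched by building the X·R regressor element-wise and fitting the four-column GLM; the simulated effect sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 400
task = np.repeat([0.0, 1.0], T // 2)     # task off/on regressor
seed = rng.standard_normal(T)            # seed region time course

# Simulated voxel: connectivity with the seed is stronger during the task.
y = (0.2 + 0.5 * task + 0.3 * seed + 0.7 * task * seed
     + 0.1 * rng.standard_normal(T))

# GLM: Y = b0 + b1*X + b2*R + b3*(X*R) + e
design = np.column_stack([np.ones(T), task, seed, task * seed])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
# beta[3] estimates the PPI (interaction) effect
```

A nonzero β₃ says the slope relating the voxel to the seed differs between task conditions, which is the PPI effect of interest.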

Decomposition Methods

•  We often use multivariate decomposition methods to study functional connectivity.
   –  They provide a decomposition of the data into separate components.
   –  They can be used to find coherent brain networks.
   –  They provide information on how different brain regions interact with one another.

•  The most common decomposition methods are principal components analysis and independent components analysis.

Data Organization

[Figure: the T×N data matrix X, with time running down the rows and voxels across the columns]

•  Throughout, we organize the fMRI data in a T×N matrix X.
   –  The row dimension is the number of time points and the column dimension is the number of voxels.

Principal Components Analysis

•  Principal components analysis involves finding spatial modes, or eigenimages, in the data.
   –  These are the patterns that account for most of the variance-covariance structure in the data.
   –  They are ranked in order of the amount of variation they explain.

•  The eigenimages can be obtained using singular value decomposition (SVD), which decomposes the data into two sets of orthogonal vectors that correspond to patterns in space and time.


Using SVD, we can decompose the matrix X as:

   X = U S Vᵀ

where U and V are orthogonal matrices and S is a diagonal matrix of ranked singular values.

Each column of V defines a distributed pattern over voxels that can be displayed as an image (an eigenimage).

Each column of U corresponds to the time-dependent profile associated with each eigenimage.

[Figure: X = U S Vᵀ, with time courses in U and eigenimages in Vᵀ]

Equivalently, X can be written as a sum of rank-one terms:

   X = s₁u₁v₁ᵀ + s₂u₂v₂ᵀ + ... + s_N u_N v_Nᵀ

[Figure: successive rank-one approximations s₁u₁v₁ᵀ + s₂u₂v₂ᵀ + ... (Worsley)]
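The decomposition can be verified numerically; the data here are random, standing in for a real centered T×N fMRI matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 60, 200
X = rng.standard_normal((T, N))
X -= X.mean(axis=0)                  # center each voxel's time series

# X = U S V^T: rows of Vt are eigenimages, columns of U their time courses.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Singular values come back in decreasing order, so components are
# already ranked by the variance they explain.
# Rank-1 approximation: the first term s_1 u_1 v_1^T of the sum.
X_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
```

Summing more rank-one terms reproduces X exactly once all min(T, N) components are included.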

Independent Components Analysis

•  Independent Components Analysis (ICA) is a family of techniques used to extract statistically independent signals from an observed mixture.

•  ICA provides a method to blindly separate the data into spatially independent components.

•  The key assumption is that the data set consists of p spatially independent components, which are linearly mixed and spatially fixed.

Cocktail Party Problem

Two people are talking simultaneously in a room with two microphones.

Speakers: s1(t) and s2(t). Microphones: x1(t) and x2(t).

   x1(t) = a11·s1(t) + a12·s2(t)
   x2(t) = a21·s1(t) + a22·s2(t)

→  X = AS

where A is the mixing matrix and S is the source matrix.
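A sketch of the mixing model: with two made-up non-Gaussian sources and a hypothetical mixing matrix A, the microphones record X = AS. When A is known the sources are recovered exactly by inverting it; ICA's job is the harder case where A is unknown:

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 1000)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))   # square wave (non-Gaussian)
s2 = rng.uniform(-1, 1, t.size)           # uniform noise (non-Gaussian)
S = np.vstack([s1, s2])                   # source matrix

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                # hypothetical mixing matrix
X = A @ S                                 # what the two microphones record

S_recovered = np.linalg.inv(A) @ X        # trivial once A is known
```

The non-Gaussianity of the sources is what makes blind recovery possible when A is not given.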


Assumptions

•  If the mixing matrix is known, the problem is straightforward.

•  However, ICA solves this problem without knowing the mixing parameters.

•  Instead it exploits some key assumptions:

–  Linear mixing of sources.

–  The components si are statistically independent.

–  The components si are non-Gaussian.

ICA Estimation

•  We can find the independent components using a variety of different approaches:
   –  Maximizing non-Gaussianity
   –  Minimizing the mutual information
   –  Maximum likelihood estimation
   –  Projection pursuit

ICA for fMRI

•  It is assumed that the fMRI data can be modeled by identifying sets of voxels whose activity varies together over time and differs from the activity in other sets.

•  Decompose the data set into a set of spatially independent component maps with a set of corresponding time-courses.

ICA for fMRI

[Figure: the fMRI data are assumed to be a linear mixture, via mixing matrix A, of statistically independent sources s = (s1, s2)ᵀ: Source 1 × Time course 1 + Source 2 × Time course 2 (Vince Calhoun)]

ICA for fMRI

•  We seek to decompose X as

   X = AS

   where the matrix S contains statistically independent maps in its rows, each with an internally consistent time course contained in the associated column of the mixing matrix A.

[Figure: the T×N data matrix X factored into the mixing matrix A (time courses) and the component matrix S (spatially independent maps)]

•  Use an ICA algorithm to find A and S.



Comments

•  Unlike PCA, which imposes an orthonormality constraint, ICA assumes statistical independence among a collection of spatial patterns.

•  Independence is a stronger requirement than orthonormality.

•  However, in ICA the spatially independent components are not ranked in order of importance as they are when performing PCA.

Types of ICA

•  An ICA that decomposes the original data into spatially statistically independent components is called spatial ICA (sICA).

•  It is possible to switch the order and instead make the temporal dynamics independent. This is called temporal ICA (tICA).

•  Spatial ICA is more common in fMRI data analysis (McKeown et al.).

Multi-subject Analysis

•  Using ICA to analyze fMRI data from multiple subjects raises several questions:
   –  How should components be combined across subjects?
   –  How should the final results be thresholded and/or presented?

•  There are several approaches:
   –  Stack time courses (forces time courses to be the same)
   –  Stack images and back-reconstruct (allows time courses to vary, allows some flexibility in images)
   –  Stack into a cube (forces images and time courses to be the same)

Group ICA •  Group ICA is based on temporal concatenation.

•  It decomposes the group matrix, and estimates through back-reconstruction the spatial weights for each subject for a component of interest.

•  For each subject the spatial weights at each voxel are treated as random variables, and a t-test is used to test whether that voxel loads significantly on that component in the group.
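The algebra of temporal concatenation and back-reconstruction can be sketched with SVD standing in for the ICA step (the back-reconstruction works the same way); all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n_sub, T, N, k = 3, 40, 150, 5
subjects = [rng.standard_normal((T, N)) for _ in range(n_sub)]

# Temporal concatenation: stack subjects into one (n_sub*T, N) group matrix.
X = np.vstack(subjects)

# Decompose the group matrix (SVD here; group ICA would use an ICA algorithm).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = U[:, :k] * s[:k]                      # aggregate mixing matrix
S_agg = Vt[:k]                            # aggregate component maps

# Back-reconstruction: subject i's maps from its partition of A.
i = 0
A_i = A[i * T:(i + 1) * T]                # subject i's block of the mixing matrix
S_i = np.linalg.pinv(A_i) @ subjects[i]   # subject-specific spatial weights
```

The subject-specific weights in S_i are what the group-level t-test is run on, voxel by voxel.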

Group ICA

[Figure: subject data matrices (Subject 1, ..., Subject N) stacked into a group matrix X, decomposed by ICA as X = A × S_agg; for Subject i, back-reconstruction recovers the subject-specific maps as S_i = A_i⁻¹ × X_i, where A_i is subject i's partition of the aggregate mixing matrix]