


Group Analysis with Four-Way ANOVA in AFNI

Gang Chen, Ziad S. Saad, Robert W. Cox

Scientific and Statistical Computing Core, National Institute of Mental Health

National Institutes of Health, Department of Health and Human Services, USA

Introduction

FMRI experimental designs increasingly require more factors in group analysis, which motivated the creation of a four-way ANOVA program in AFNI.

With a view to later expansion into a program capable of running ANCOVA and unbalanced designs, the four-way ANOVA for AFNI datasets is currently implemented in Matlab by converting factors into dummy variables. QR decomposition is used to solve the normal equations of the general linear system. Five design types (fixed/random and crossed/nested) are built into the program, allowing the user to analyze most typical experiments.

We present a streamlined way of running ANOVA in which information requested for the four-way analysis is straightforward and saved for the user’s records. Runtime for a typical four-way ANOVA is usually about half an hour.

Theory and Numerical Considerations

Four-Way ANOVA Table ($B_F \times C_F \times D_R(A_F)$)

1. Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996), Applied Linear Statistical Models, Fourth Edition, McGraw-Hill.

2. Keppel, G., and Wickens, T. (2004), Design and Analysis: A Researcher's Handbook, Fourth Edition, Prentice Hall.

Group analysis is the stage of FMRI analysis at which the investigator generalizes findings about conditions/stimuli, or comparisons among them, from individual subjects to the population. This step usually involves an analysis of variance (ANOVA) in which stimuli are categorized by several factors and subjects are treated as a random factor. One-, two-, and three-way ANOVAs were previously implemented in AFNI as three separate C programs that calculate the various sums of squares and t/F statistics. Until recently, these programs met users' needs. However, contrast tests among second-order and higher terms were not available in the three-way ANOVA because of their computational complexity. More importantly, as investigations become more complicated and refined, more stimulus categorizations (factors) are involved in group-level analysis, and a four-way ANOVA in AFNI has therefore become highly desirable.

Beyond the number of factors, concomitant variables (covariates), unbalanced designs, and missing data are frequently encountered in FMRI group analysis. With these considerations in mind, a general linear model approach was adopted in which factor levels are coded as values of dummy variables. Unlike the previous ANOVA programs, the numerical computation does not index individual terms; instead, the QR decomposition of the design matrix is used to project each term onto its corresponding subspace and to obtain the sums of squares for all possible terms.

Source of Variation    F Statistic       Distribution
A                      MSA/MSD(A)        F(a-1, a(d-1))
B                      MSB/MSBD(A)       F(b-1, a(b-1)(d-1))
C                      MSC/MSCD(A)       F(c-1, a(c-1)(d-1))
D(A)                   MSD(A)/MSE        F(a(d-1), abcd(n-1))
AB                     MSAB/MSBD(A)      F((a-1)(b-1), a(b-1)(d-1))
AC                     MSAC/MSCD(A)      F((a-1)(c-1), a(c-1)(d-1))
BC                     MSBC/MSBCD(A)     F((b-1)(c-1), a(b-1)(c-1)(d-1))
BD(A)                  MSBD(A)/MSE       F(a(b-1)(d-1), abcd(n-1))
CD(A)                  MSCD(A)/MSE       F(a(c-1)(d-1), abcd(n-1))
ABC                    MSABC/MSBCD(A)    F((a-1)(b-1)(c-1), a(b-1)(c-1)(d-1))
BCD(A)                 MSBCD(A)/MSE      F(a(b-1)(c-1)(d-1), abcd(n-1))
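As a quick check on the table, the degrees of freedom can be tabulated for given level counts. The sketch below is illustrative Python and is not part of the AFNI package. It assumes that a, b, c, and d are the numbers of levels of A, B, C, and D (with d counted within each level of A) and that n is the number of observations per cell; the function name `df_table` and the example sizes are hypothetical.

```python
def df_table(a, b, c, d, n):
    """Numerator and denominator df for each row of the B x C x D(A) table."""
    err = a * b * c * d * (n - 1)            # df of MSE
    d_a = a * (d - 1)                        # df of D(A)
    bd_a = a * (b - 1) * (d - 1)             # df of BD(A)
    cd_a = a * (c - 1) * (d - 1)             # df of CD(A)
    bcd_a = a * (b - 1) * (c - 1) * (d - 1)  # df of BCD(A)
    return {
        "A": (a - 1, d_a),
        "B": (b - 1, bd_a),
        "C": (c - 1, cd_a),
        "D(A)": (d_a, err),
        "AB": ((a - 1) * (b - 1), bd_a),
        "AC": ((a - 1) * (c - 1), cd_a),
        "BC": ((b - 1) * (c - 1), bcd_a),
        "BD(A)": (bd_a, err),
        "CD(A)": (cd_a, err),
        "ABC": ((a - 1) * (b - 1) * (c - 1), bcd_a),
        "BCD(A)": (bcd_a, err),
    }

# Hypothetical sizes: 2 groups, 2 x 2 stimulus categories,
# 12 subjects per group, 2 observations per cell.
a, b, c, d, n = 2, 2, 2, 12, 2
table = df_table(a, b, c, d, n)
for source, (num, den) in table.items():
    print(f"{source:7s} F({num}, {den})")

# The numerator df of all rows plus the error df add up to a*b*c*d*n - 1.
total = sum(num for num, _ in table.values()) + a * b * c * d * (n - 1)
```

Note that with a single observation per cell (n = 1) the error df abcd(n-1) is zero, so the rows tested against MSE have no usable denominator in that case.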

We start with a prototype four-way ANOVA with a basic $A \times B \times C \times D$ design in which all factors are fixed. The corresponding cell means model leads to the general linear model

$$y = X\beta + \varepsilon$$

where $y$ is the $n \times 1$ vector of observations, $X$ is the $n \times m$ cell means design matrix, $\beta$ is the $m \times 1$ vector of regression coefficients, and $\varepsilon$ is the $n \times 1$ random error vector. Ordinary least squares estimation then requires solving the normal equations

$$X'X\beta = X'y$$

Because of the dummy-variable coding, the design matrix $X$ is rank deficient, with $\mathrm{rank}(X) < m$. In addition, a constraints matrix $C$ is defined based on all the factors and their interactions.
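To see why dummy coding produces a rank-deficient matrix, consider a single factor with three levels. The sketch below is illustrative Python with a hypothetical helper `dummy_code`. It assumes an intercept column plus one indicator column per level, which may differ from the coding used in the actual Matlab package.

```python
def dummy_code(levels, n_levels):
    """One row per observation: intercept, then one 0/1 indicator per level."""
    return [[1] + [1 if lev == k else 0 for k in range(1, n_levels + 1)]
            for lev in levels]

# Hypothetical factor with 3 levels and 2 observations per level.
X = dummy_code([1, 1, 2, 2, 3, 3], n_levels=3)
for row in X:
    print(row)

# The intercept column equals the sum of the indicator columns in every row,
# so the m = 4 columns are linearly dependent and rank(X) = 3 < m.
intercept_is_sum = all(row[0] == sum(row[1:]) for row in X)
print(intercept_is_sum)
```

A constraint such as "the level effects sum to zero" removes this redundancy, which is the role the constraints matrix plays in the steps that follow.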

The numerical calculations are done through the following basic steps:

(1) QR decomposition of the constraints matrix $C$

$$C E_c = Q_c R_c$$

where $E_c$ is a permutation matrix chosen so that the diagonal of $R_c$ is decreasing in magnitude.

(2) Projection of the design matrix $X$ onto the null space $Q_{c0}$ of the constraints matrix $C$, which is composed of those columns of $Q_c$ corresponding to the zero diagonal entries of $R_c$

$$X_p = X Q_{c0}$$

(3) QR decomposition of $X_p$

$$X_p E_d = Q_d R_d$$

where $E_d$ is again a permutation matrix chosen so that the diagonal of $R_d$ is decreasing in magnitude.

(4) The degrees of freedom (df) and sum of squares (SS)

$$df = \mathrm{rank}(Q_d), \qquad SS = \|\hat{y}\|^2 = \|Q_d' y\|^2$$

The same steps are used to compute the random error and every ANOVA term (main effects and interactions).
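The four steps can be traced on a toy problem. The following is a minimal sketch in pure Python, not the Matlab implementation. Gram-Schmidt orthogonalization with column pivoting stands in for a library QR routine, the constraints are assumed to be stored as columns of C, and all function names and the one-factor example data are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pivoted_qr_basis(columns, tol=1e-10):
    """Orthonormal basis of span(columns) by Gram-Schmidt with column pivoting.

    At each step the remaining column with the largest norm is chosen, which
    mirrors the permutation that makes diag(R) decrease in magnitude.
    """
    remaining = [list(col) for col in columns]
    basis = []
    while remaining:
        norms = [math.sqrt(dot(col, col)) for col in remaining]
        k = max(range(len(remaining)), key=norms.__getitem__)
        if norms[k] <= tol:              # everything left is numerically zero
            break
        pivot = remaining.pop(k)
        q = [v / norms[k] for v in pivot]
        basis.append(q)
        for col in remaining:            # remove the component along q
            proj = dot(q, col)
            for i in range(len(col)):
                col[i] -= proj * q[i]
    return basis

def null_space_basis(constraint_columns, m, tol=1e-10):
    """Steps (1)-(2): orthonormal basis of vectors orthogonal to all columns of C."""
    q_c = pivoted_qr_basis(constraint_columns, tol)
    residuals = []
    for i in range(m):
        e = [1.0 if j == i else 0.0 for j in range(m)]
        for q in q_c:                    # strip the part of e_i lying in span(C)
            coef = dot(q, e)
            e = [ej - coef * qj for ej, qj in zip(e, q)]
        residuals.append(e)
    return pivoted_qr_basis(residuals, tol)

def df_and_ss(X, constraint_columns, y):
    m = len(X[0])
    q_c0 = null_space_basis(constraint_columns, m)
    xp_columns = [[dot(row, q) for row in X] for q in q_c0]  # step (2): Xp = X Qc0
    q_d = pivoted_qr_basis(xp_columns)                       # step (3)
    df = len(q_d)                                            # step (4)
    ss = sum(dot(q, y) ** 2 for q in q_d)
    return df, ss

# Toy one-factor example: intercept + two level indicators (rank 2 < m = 3),
# with the single constraint "level effects sum to zero" as a column of C.
X = [[1, 1, 0]] * 3 + [[1, 0, 1]] * 3
C = [[0, 1, 1]]
y = [1, 2, 3, 4, 5, 6]
df, ss = df_and_ss(X, C, y)
print(df, round(ss, 6))
```

In this toy case the fitted values are the two group means (2 and 5), so the sum of squares is 3·2² + 3·5² = 87 on 2 degrees of freedom, which is what the sketch returns up to rounding.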

Other design types are based on the same algorithm. As a demonstration, consider a four-way ANOVA with the design $B \times C \times D(A)$, in which A, B, and C are fixed while D is random and nested within A. Following the rules of thumb for writing an ANOVA table (1, 2), we obtain a table with all available sources of variation and their corresponding F statistics. Contrasts and their t statistics are constructed in the same fashion with the relevant variance estimates.
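Written out, this design corresponds to a structural model with one term per row of the ANOVA table plus the error term. The notation below is a conventional sketch rather than something taken from the poster: the subscript $l(i)$ for subjects nested within level $i$ of A, and the use of the table's row labels as effect symbols, are assumptions.

```latex
y_{ijklm} = \mu + A_i + B_j + C_k + D_{l(i)}
          + (AB)_{ij} + (AC)_{ik} + (BC)_{jk}
          + (BD)_{jl(i)} + (CD)_{kl(i)}
          + (ABC)_{ijk} + (BCD)_{jkl(i)}
          + \varepsilon_{m(ijkl)},
\qquad
i = 1,\dots,a;\; j = 1,\dots,b;\; k = 1,\dots,c;\; l = 1,\dots,d;\; m = 1,\dots,n
```

Because D is nested within A, no AD, ABD, ACD, or ABCD terms appear; their variation is absorbed into D(A), BD(A), CD(A), and BCD(A), respectively.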

Sample Dialog: Questions and Answers

How many factors? 4

Choose design type (0, 1, 2, 3, 4, 5, ...): 2

How many slices along the Z axis? 40

Label for No. 1 factor: MD

How many levels does factor A (MD) have? 2

Label for No. 1 level of factor A (MD) is: VI1

Label for No. 2 level of factor A (MD) is: AU

……

Label for No. 4 factor: SJ

How many levels does factor D (SJ) have? 12

Label for No. 1 level of factor D (SJ) is: S1

……

There should be totally 96 input files. Correct? (1 - Yes; 0 - No) 1

(1) factor combination:

factor A (MD) at level 2 (VI1)

factor B (FB) at level 2 (NW)

factor C (CG) at level 1 (AN)

factor D (SJ) at level 12 (S1)

is: ss15.a_sound.irf.mean+tlrc.BRIK

……

How many 2nd-order contrasts? (0 if none) 7

Label for 2nd order contrast No. 1: is: vis_avt

How many terms are involved? 2

Factor index for No. 1 term is (e.g., 0120): 1010

Corresponding coefficient (i.e., 1 or -1): 1

Factor index for No. 2 term is (e.g., 0120): 1020

Corresponding coefficient (i.e., 1 or -1): -1

……

Running ANOVA on slice:

#1... done in 20.748358 seconds

……
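Two pieces of bookkeeping in the dialog can be checked by hand: the expected number of input files and the four-digit factor indices used for contrasts. The sketch below is illustrative Python with a hypothetical helper `decode_index`. It assumes that each digit gives the level of factors A through D in order and that 0 means the factor is not part of the term; the poster does not state this convention explicitly. The level counts for B and C are also assumptions, since their prompts are elided in the dialog.

```python
from math import prod

FACTORS = "ABCD"

def decode_index(index):
    """Map a four-digit factor index to {factor: level}, skipping zeros.

    ASSUMPTION: a 0 digit means the factor is not involved in the term.
    """
    if len(index) != len(FACTORS) or not index.isdigit():
        raise ValueError(f"expected {len(FACTORS)} digits, got {index!r}")
    return {f: int(ch) for f, ch in zip(FACTORS, index) if ch != "0"}

# Levels of A and D come from the dialog; B and C are assumed (prompts elided).
levels = {"A": 2, "B": 2, "C": 2, "D": 12}
n_files = prod(levels.values())
print(n_files)

# The two terms entered for the contrast labeled vis_avt.
contrast = [("1010", 1), ("1020", -1)]
for index, coef in contrast:
    print(coef, decode_index(index))
```

Under these assumptions, the contrast compares two levels of factor C at the first level of factor A, and its coefficients sum to zero.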

References

Five Design Types of Four-Way ANOVA

$A_F \times B_F \times C_F \times D_F$

All factors fixed;

Fully crossed

A,B,C,D=stimulus category, drug treatment, etc.

All combinations of subjects and factors exist;

Multiple subjects: treated as repeated measures;

One subject: longitudinal analysis

$A_F \times B_F \times C_F \times D_R$

Last factor random;

fully crossed

A,B,C=stimulus category, etc.

D=subjects, typically treated as random (more powerful than treating them as repeats)

Good for an experiment where each fixed factor applies to all subjects;

$B_F \times C_F \times D_R(A_F)$

Last factor random, and nested within the first (fixed) factor

A=subject class: genotype, sex, or disease

B,C=stimulus category, etc.

D=subjects nested within A levels

$B_F \times C_R \times D_F(A_F)$

Third factor random; fourth factor fixed and nested within the first (fixed) factor

A=stimulus type (e.g., repetition number)

B=another stimulus category (e.g., animal/tool)

C=subjects

D=stimulus subtype (e.g. perceptual/conceptual)

$C_F \times D_R(A_F \times B_F)$

Doubly nested!

A, B=subject classes: genotype, sex, or disease

C=stimulus category, etc.

D=subjects, random with two distinct factors dividing the subjects into finer sub-groups

Software Implementation

Four-way ANOVA with unbalanced designs (unequal sample sizes) and with covariates (ANCOVA) is currently under development. The package for four-way ANOVA can be downloaded from the AFNI website:

http://afni.nimh.nih.gov/sscc/gangc