Connectome Classification: Statistical Graph Theoretic Methods for Analysis of MR-Connectome Data


Upload: joshua-vogelstein

Post on 10-May-2015


DESCRIPTION

poster at HBM11

TRANSCRIPT


Connectome Classification: Statistical Graph Theoretic Methods for Analysis of MR-Connectome Data

Joshua T. Vogelstein¹, William R. Gray¹,², John A. Bogovic¹, Susan M. Resnick³, Jerry L. Prince¹, Carey E. Priebe¹, R. Jacob Vogelstein¹,²

¹ Johns Hopkins University, Baltimore, Maryland
² Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland
³ National Institutes of Health, Bethesda, Maryland

Abstract
• Methods for high-throughput MR connectome inference are available [1]
• Previous analyses of connectome data relied on classical graph theoretic tools, such as the clustering coefficient
• We develop a statistical graph theoretic framework that applies to generic connectome classification problems
• Applying these tools to 49 senior individuals from the BLSA data set yielded connectome classification accuracy of up to 85%
• Standard graph theoretic measures, such as the clustering coefficient, ignore vertex labels and achieve only 75% accuracy, even with sophisticated multivariate machine learning methods [2]
• Many extensions and further applications are possible

Methods
Connectome Inference
• Connectomes are inferred with the MR Connectome Automated Pipeline (MRCAP) [1]
• Vertices are neuroanatomical gyral regions [3]; edges are tracts estimated using FACT [4]
• 49 subjects from the Baltimore Longitudinal Study of Aging (BLSA): 25 male, 24 female

MRCAP is available at: http://www.nitrc.org/projects/mrcap/

Model
• Joint graph/class model
• Each edge is an independent binary random variable
• A subset of edges comprises the signal subgraph: only these edges have class-conditional probabilities
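As a minimal sketch of the independent-edge model described above, the following Python draws undirected binary graphs in which only a chosen set of signal edges has a class-dependent probability. The function name `sample_graphs`, its parameters, and all numeric values are hypothetical choices for illustration; they are not taken from the poster or from MRCAP.

```python
import numpy as np

def sample_graphs(n_per_class, n_vertices, signal_edges, p_null, p_class, rng):
    """Draw undirected binary graphs from an independent-edge model.

    signal_edges: (u, v) pairs with u > v whose edge probability depends on
    the class label; every other edge uses the class-independent p_null.
    All names and values here are illustrative assumptions.
    """
    graphs, labels = [], []
    rows, cols = np.tril_indices(n_vertices, k=-1)  # lower triangle, no self-loops
    for y in (0, 1):
        # Start from class-independent probabilities, then overwrite signal edges.
        p = np.full((n_vertices, n_vertices), p_null)
        for u, v in signal_edges:
            p[u, v] = p_class[y]
        for _ in range(n_per_class):
            a = np.zeros((n_vertices, n_vertices), dtype=int)
            # Each edge is an independent Bernoulli draw.
            a[rows, cols] = rng.random(rows.size) < p[rows, cols]
            graphs.append(a + a.T)  # symmetrize: the graph is undirected
            labels.append(y)
    return np.array(graphs), np.array(labels)

rng = np.random.default_rng(0)
signal = [(3, 1), (5, 2), (6, 4)]
A, y = sample_graphs(25, 10, signal, p_null=0.3, p_class=(0.1, 0.9), rng=rng)
print(A.shape, y.sum())
```

Only the lower triangle is sampled and then mirrored, which matches the poster's remark that the graphs are undirected and the adjacency matrices symmetric.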

Classifier
• The Bayes plug-in classifier is asymptotically optimal
• Robust estimators have better convergence properties than the MLE

Signal Subgraph Estimator
• The signal subgraph could be all edges, an incoherent subset, or a coherent subset
• We devise a separate estimator for each of the two special cases (incoherent and coherent)
• For each edge, we compute the significance of the difference between the two classes using Fisher's exact test, which is optimal under the model
• The incoherent signal subgraph estimator chooses the s most significant edges
• The coherent signal subgraph estimator chooses the m most significant vertices, and then the s most significant edges incident to those vertices
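The two estimators above can be sketched in Python roughly as follows, using `scipy.stats.fisher_exact` for the per-edge test. This is an illustration under stated assumptions, not the authors' implementation: the poster does not say how vertices are ranked, so the sketch assumes a vertex is scored by the smallest p-value among its incident edges. The function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact

def edge_pvalues(graphs, labels):
    """Fisher's exact test p-value for every edge of a set of binary graphs."""
    n_vertices = graphs.shape[1]
    pvals = np.ones((n_vertices, n_vertices))
    g0, g1 = graphs[labels == 0], graphs[labels == 1]
    for u in range(n_vertices):
        for v in range(u):
            # 2x2 table: class (rows) by edge present / absent (columns).
            k0, k1 = g0[:, u, v].sum(), g1[:, u, v].sum()
            table = [[k0, len(g0) - k0], [k1, len(g1) - k1]]
            _, p = fisher_exact(table)
            pvals[u, v] = pvals[v, u] = p
    return pvals

def incoherent_subgraph(pvals, s):
    """Return the s edges (u, v), u > v, with the smallest p-values."""
    rows, cols = np.tril_indices(pvals.shape[0], k=-1)
    order = np.argsort(pvals[rows, cols], kind="stable")[:s]
    return [(int(rows[i]), int(cols[i])) for i in order]

def coherent_subgraph(pvals, m, s):
    """Pick m vertices, then the s most significant edges incident to them.

    Assumption (not from the poster): a vertex is scored by the smallest
    p-value among its incident edges.
    """
    n = pvals.shape[0]
    vertex_score = pvals.min(axis=1)  # the diagonal is 1.0, so it never wins
    vertices = set(np.argsort(vertex_score, kind="stable")[:m].tolist())
    rows, cols = np.tril_indices(n, k=-1)
    incident = np.array([(u in vertices) or (v in vertices)
                         for u, v in zip(rows.tolist(), cols.tolist())])
    rows, cols = rows[incident], cols[incident]
    order = np.argsort(pvals[rows, cols], kind="stable")[:s]
    return sorted(vertices), [(int(rows[i]), int(cols[i])) for i in order]

# Toy data: 6 graphs per class on 5 vertices (illustrative only).
graphs = np.zeros((12, 5, 5), dtype=int)
labels = np.array([0] * 6 + [1] * 6)
for u, v in [(1, 0), (2, 0)]:          # signal edges: present only in class 1
    graphs[6:, u, v] = graphs[6:, v, u] = 1
graphs[::2, 4, 3] = graphs[::2, 3, 4] = 1  # noise edge, equally common in both classes

pvals = edge_pvalues(graphs, labels)
inc_edges = incoherent_subgraph(pvals, s=2)
coh_vertices, coh_edges = coherent_subgraph(pvals, m=1, s=2)
print(inc_edges, coh_vertices, coh_edges)
```

In this sketch the incoherent estimator ranks edges globally, while the coherent estimator first restricts attention to edges touching the selected vertices, so s and m play the role of the hyper-parameters mentioned on the poster.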

Figure Legend: (Top) Gyral labels and associated numeric indices (adapted from [3]). Connections between these regions, as revealed through the DTI tensor data, are quantified in terms of the mean fractional anisotropy (FA) of the estimated fibers. (Bottom) Adjacency matrices illustrating connections between gyral regions (vertices) in female and male brains. Each entry in these adjacency matrices represents the mean FA of fibers originating in the gyral region indicated by the row index and terminating in the gyral region indicated by the column index, averaged across all subjects from each sex. The significance of the difference (uncorrected, exact p-values) between female and male brains, computed with Fisher’s exact test, is also shown. In all plots, lighter coloration implies higher values. Only the lower triangle is shown because these graphs are undirected and therefore the adjacency matrices are symmetric. Labels 1–35 are assigned to the left hemisphere; 36–70 are assigned to the right hemisphere.

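$$
F_{GY} = F_{G|Y} F_Y = \prod_{(u,v) \in S} \mathrm{Bern}(a_{uv}; p_{uv|y}) \, \pi_y \prod_{(u,v) \in E \setminus S} \mathrm{Bern}(a_{uv}; p_{uv})
$$

$$
\hat{y} = \operatorname*{argmax}_{y} \prod_{(u,v) \in \hat{S}} \mathrm{Bern}(a_{uv}; \hat{p}_{uv|y}) \, \hat{\pi}_y
$$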
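The plug-in classifier can be sketched in Python as below: estimate the class priors and the per-class edge probabilities on the estimated signal edges, then pick the class with the largest posterior score. This is a sketch under assumptions rather than the authors' code. In particular, clipping the estimated probabilities away from 0 and 1 (here by 1/(10n) for a class with n samples) is an assumed stand-in for the robust estimator the poster mentions, and the names `fit_plugin` and `predict_plugin` are hypothetical.

```python
import numpy as np

def fit_plugin(graphs, labels, signal_edges):
    """Estimate log-priors and per-class edge probabilities on the signal edges.

    Assumption: probabilities are clipped to [eps, 1 - eps] with
    eps = 1 / (10 * n_class) so that no edge gets probability exactly 0 or 1.
    """
    rows = np.array([u for u, _ in signal_edges])
    cols = np.array([v for _, v in signal_edges])
    classes = np.unique(labels)
    log_prior, p_hat = [], []
    for c in classes:
        g = graphs[labels == c]
        eps = 1.0 / (10 * len(g))
        p = g[:, rows, cols].mean(axis=0)      # per-edge MLE within the class
        p_hat.append(np.clip(p, eps, 1 - eps))
        log_prior.append(np.log(len(g) / len(graphs)))
    return classes, np.array(log_prior), np.array(p_hat), rows, cols

def predict_plugin(model, graph):
    """Return the class maximizing prior times the product of Bernoulli terms."""
    classes, log_prior, p_hat, rows, cols = model
    a = graph[rows, cols]
    # Work in log-space: a product of many probabilities underflows quickly.
    log_lik = (a * np.log(p_hat) + (1 - a) * np.log(1 - p_hat)).sum(axis=1)
    return classes[np.argmax(log_lik + log_prior)]

# Toy data: 6 graphs per class on 5 vertices (illustrative only).
graphs = np.zeros((12, 5, 5), dtype=int)
labels = np.array([0] * 6 + [1] * 6)
signal_edges = [(1, 0), (2, 0)]
for u, v in signal_edges:               # signal edges: present only in class 1
    graphs[6:, u, v] = graphs[6:, v, u] = 1

model = fit_plugin(graphs, labels, signal_edges)
with_edges = graphs[6]                  # contains both signal edges
empty = np.zeros((5, 5), dtype=int)     # contains neither
pred_with = predict_plugin(model, with_edges)
pred_empty = predict_plugin(model, empty)
print(pred_with, pred_empty)
```

Edges outside the estimated signal subgraph are ignored because, under the model, their likelihood term is the same for every class and therefore does not affect the argmax.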


Results
Gender Classifier
• The coherent and incoherent classifiers both perform better than chance and better than the naive Bayes classifier (the coherent classifier is significant with p-value < 0.0001)
• The best classifier achieves 83% accuracy using 12 signal vertices and 360 signal edges
• Classical graph theoretic tools, such as clustering coefficient, number of triangles, etc., do not use vertex labels, which contain useful classification signal
• State-of-the-art machine learning techniques [2] using classical graph theory yield only 75% accuracy

Figure Legend (above): The top two panels depict the relative performance of the incoherent (left) and coherent (right) classifiers as a function of their hyper-parameters. The middle two panels depict (left) the misclassification rate for a few different choices of the number of signal vertices and (right) a zoomed-in view of the top right panel. The bottom left panel shows the estimated signal subgraph, and the bottom right panel shows the coherogram. Together, these bottom panels suggest that the signal subgraph for these data is neither particularly coherent nor particularly incoherent. (below): The figure below visualizes the twelve signal subgraph vertices. Each subplot shows the signal subgraph induced by one of the 12 signal vertices estimated using the coherent classifier. There are 360 edges in the signal subgraph.

Synthetic Data Analysis
• Simulations made as true to the real data as possible suggest that the model is not wholly unreasonable
• Even under the true model, we expect only about 50% of the identified edges to be true signal edges with fewer than 50 samples
• With only a few more samples, both the misclassification rate and the missed-edge rate drop precipitously

Assumption Checking
• The edge correlation matrix shows significant correlations, suggesting that the independent-edge assumption is poor (data not shown)

Discussion
• MRCAP is an effective tool for high-throughput connectome inference
• Signal subgraph classifiers significantly improve performance over standard classification results in both real and synthetic data
• Synthetic data suggest that a few additional data points could yield vastly improved performance
• Assumption checking suggests that the performance improvements hold despite some model inaccuracies, and that more general models might yield further improvements
• Standard graph theoretic tools are less effective and do not suggest a signal subgraph

References
[1] Gray et al., submitted; available at: http://www.nitrc.org/projects/mrcap/
[2] Drezde et al., 2008.
[3] Desikan et al., 2006.
[4] Mori et al., 1999.

[Figure, six panels: "incoherent estimator" (misclassification rate vs. log size of signal subgraph; annotated L̂_nb = 0.41, L̂_inc = 0.27, L̂_π̂ = 0.5); "coherent estimator" (# signal vertices vs. size of signal subgraph; annotated L̂_coh = 0.16); "some coherent estimators" (misclassification rate vs. log size of signal subgraph); "zoomed in coherent estimator" (# star vertices vs. size of signal subgraph); "coherent signal subgraph estimate" (vertex vs. vertex); "coherogram" (x-axis: threshold).]

[Figure, four panels: "incoherent estimator" (misclassification rate vs. log size of signal subgraph); "coherent estimator" (# star vertices vs. size of signal subgraph); missed-edge rate vs. # training samples; misclassification rate vs. # training samples (legend: coh, inc, nb).]