Master Thesis on Intelligent Interactive Systems
Universitat Pompeu Fabra
Understanding Alzheimer’s disease progression through phenotype discovery using manifold learning techniques
Natalia Karina Pattarone
Supervisor: Gemma Piella
July 2021
Contents
1 Introduction 1
  1.1 Clinical Motivation 1
  1.2 Data-Driven Computational Methods 2
  1.3 Objectives 3
  1.4 Structure of the Report 4
2 Materials and Methods 5
  2.1 Dataset 5
  2.2 MRI acquisition/processing 6
    2.2.1 Feature selection 7
    2.2.2 Dimensionality reduction 7
    2.2.3 Unsupervised clustering 11
  2.3 Phenotype analysis 13
3 Results 14
  3.1 Patient embedding 14
  3.2 Patient phenotyping 18
4 Discussion and Future Work 26
  4.1 Discussion 26
  4.2 Future Work 27
List of Figures 28
List of Tables 30
A First Appendix 31
B Second Appendix 41
Abstract
Alzheimer’s disease (AD) is clinically highly heterogeneous, varying among patients in rate of progression and in cognitive symptoms and test performance, as well as from a neuroimaging perspective. In the datasets provided by The Alzheimer’s Disease Neuroimaging Initiative (ADNI), researchers collect, validate and utilize data, including MRI and PET images, genetics, cognitive tests, and CSF and blood biomarkers, as predictors of the disease. These data allow the discovery of phenotypes that could help to better understand the disease and to provide targeted treatment.

The objective of this thesis is to identify data-driven phenotypes using manifold learning and unsupervised clustering on multimodal longitudinal imaging and non-imaging data. First, we apply a novel approach for dimensionality reduction called PHATE, which captures both local and global nonlinear structure using an information-geometric distance between data points, to facilitate the discovery of possible AD phenotypes. On the PHATE output space, we perform multiple-kernel unsupervised clustering, in which features are weighted to construct kernels, to obtain profiles and describe AD phenotypes. Our results show that our approach can reveal AD progression trajectories in a lower-dimensional space, improving the profiling results: we obtained 4 possible profile subgroups using MRI cross-sectional baseline data and 8 possible profile subgroups using longitudinal data. Furthermore, longitudinal data established a clearer separation among profiles and higher significance for cognitive tests and general cerebral volumetric values than baseline data. Identifying these profiles could be useful for more personalized treatment of such a heterogeneous disease as AD.
Keywords: MRI; Imaging Techniques; Alzheimer; Manifolds; Longitudinal Data; Cross-sectional Data
Chapter 1
Introduction
1.1 Clinical Motivation

Alzheimer’s Disease (AD) is the most common type of dementia. It is a neurodegenerative disease with insidious onset and progressive impairment of behavioral and cognitive functions, including memory, comprehension, language, attention, reasoning, and judgment. There is no cure yet available, and no effective pharmaceutical agents exist to reduce or halt its progression. Individuals with AD experience multiple symptoms that change over the years. These symptoms reflect the degree of damage to neurons in different parts of the brain. The pace at which symptoms advance from mild to moderate to severe varies from person to person. In the severe stages of the disease, individuals become unable to leave bed because of damage to areas of the brain involved in movement. Other brain areas become affected as well, such as those controlling the ability to eat and drink.
The accumulation of the protein fragment amyloid-beta (Aβ) outside neurons (forming amyloid-beta plaques) and the accumulation of an abnormal form of the protein tau inside neurons (forming tau tangles) are two of several brain changes associated with AD [1]. It has been shown that Aβ deposition occurs slowly and in a protracted manner, possibly extending over 20 years [24], so early detection of the disease process is crucial for early therapeutic interventions to delay the onset of clinical symptoms or slow the cognitive decline.
It is a matter of great importance to understand the value of multimodal and longitudinal data. As established above, AD is a multifactorial disease, with distinct progression paths depending on the observed biomarkers [13]. Furthermore, studying this progression at different stages of the disease, as well as for different cognitive impairment levels (Dementia, Mild Cognitive Impairment, etc.), can provide insights by setting a frame to extract meaningful information that groups together factors (e.g., demographic) highlighting subtypes/phenotypes where the specific progressions take place, which is necessary for more personalised treatment. It could help to improve the detection of the disease and provide a better understanding of the interaction between different biological mechanisms and markers.
1.2 Data-Driven Computational Methods

Recently, several data-driven computational approaches have been applied to AD phenotype discovery. Some of them include analysis of longitudinal data to quantify the evolution of the disease, determine temporal trajectories, and detect different paths of degeneration in order to model disease progression. For example, multi-task learning techniques for cognitive prediction [4, 16, 14] focused on combining cognitive score analysis and imaging markers for a more robust model. Another study performed an extensive survey of recent statistical and machine learning applications in AD using longitudinal neuroimaging [17]. Deep learning (DL) techniques, nowadays considered the state of the art within machine learning, are starting to take off: they have been used for cognitive score prediction with convolutional neural networks (CNN) [7, 25, 12] and for computer-aided diagnosis with recurrent neural networks (RNN) [11, 3, 6]. Nevertheless, DL-based algorithms are mostly non-interpretable, and the performance gained by these types of techniques does not compensate for their lack of interpretability; event-based models and manifold learning, among others, are some of the methods that have been used as well (see [17] for an extensive description).
Additionally, a recent study on multimodal phenotyping developed a computational method based on coupled nonnegative matrix factorization (C-NMF) [15] that clusters associated entities from brain regions and cognitive tasks simultaneously (two data modalities), so that the phenotypes reflect both types of information and capture interactions between them. NMF is a dimensionality reduction approach used to find the best approximation of each data modality, so the C-NMF computation has the advantage of respecting the different distributions of the two data modalities.
A related but equally important topic is the fact that AD progression studies face imbalanced outcome distributions in patient data, especially in classifications of Cognitively Normal (CN) vs. Mild Cognitive Impairment (MCI), as such data tend to be available in smaller amounts, probably because these diagnostic groups are sparser in available public and private cohorts, since a long follow-up is needed to correctly assess whether a patient will progress to AD. Facing similar problems, a study aimed at predicting key complication phenotypes among patients with acute diseases developed a novel method using deep learning in combination with RNN sequence embedding to represent disease progression while considering temporal heterogeneities in patient data, including cost-sensitive learning to address the imbalanced outcome distributions [22].
So far, there has been some research on phenotyping using dimensionality reduction approaches such as t-SNE [18], which preserves a range of patterns in (highly dimensional) data but lacks the ability to construct progression tracks that form trajectories, which is highly valuable when studying AD progression. Furthermore, when working with imaging features, the focus is normally on a (cross-sectional) single modality (e.g., MRI or PET). This motivated us to develop a novel phenotype discovery approach using manifold learning techniques, in which the results are extended by considering an underlying geometry consisting of multiple one-dimensional manifolds (i.e., trajectory curves, expecting our data to fit such a geometry given the progressive nature of AD) in combination with a multi-kernel clustering approach to extract and analyze phenotypes. We also focused on baseline and longitudinal progression of neuroimaging because AD is a progressive disease and understanding the neurodegeneration is the main outcome of interest in AD research. Therefore, our objective is to identify data-driven phenotypes using manifold learning on multimodal longitudinal imaging and non-imaging data.
1.3 Objectives

The research and the resulting AD phenotypes cover the following goals:
1. Applying a novel algorithm for dimensionality reduction that assists in the analysis, selection and definition of features that may reveal AD phenotypes.
2. Finding and visualizing any correlations among brain volumetric measures that assist in the differentiation of AD phenotypes, given that structural MRI is considered a clinical predictor of AD [2].
3. Gaining insight into AD progression from a phenotype perspective to support early treatment.
4. Visualizing neuroimaging data in 2D and 3D space to potentially classify thedata points into clusters based on similarities using supervised and unsuper-vised dimensionality reduction techniques.
5. Supporting data stratification.
1.4 Structure of the Report

Chapter 2 introduces the materials and methods used in the thesis, e.g., the dataset descriptions, an overview of the process used to obtain the curated data, and the algorithms utilised. Next, the methods for obtaining results are described, beginning with data processing and followed by the software architecture, software functions, and their usage. Chapter 3 presents the results obtained with the methods of Chapter 2, while Chapter 4 discusses possible areas of future work.
Chapter 2
Materials and Methods
2.1 Dataset

We work with datasets generated by the scripts provided by the TADPOLE (The Alzheimer’s Disease Prediction Of Longitudinal Evolution) Challenge 2017 [9]. The scripts contain detailed processes to parse and unify several Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets. Concretely, we use two of the TADPOLE standard training sets: D1 and D2. The first one (D1) was created from ADNI data to which additional MRI, PET (FDG, AV45 and AV1451), DTI and CSF biomarkers were added. The MRI biomarkers consist of FreeSurfer longitudinally processed ROIs from the UCSFFSL tables. Duplicate rows were removed by retaining those with the most recent RUNDATE and IMAGEUID. Additionally, the following types of PET ROI-based biomarkers were included: FDG, AV45 and AV1451, along with three CSF biomarkers: amyloid-beta, tau and p-tau. The different data sources were unified by matching subjects using the subject ID and visit code, and duplicate rows were removed as well. The second one (D2) includes all currently available longitudinal data for prospective ADNI3 participants (rollovers from ADNI2), that is, active participants with ADNI2 visits.
Informed consent was obtained for all subjects, and the study was approvedby the relevant institutional review board at each data acquisition site (for up-to-date information, see http://adni.loni.usc.edu/wp-content/themes/freshnews-dev-v2/documents/policy/ADNI_Acknowledgement_List%205-29-18.pdf). All meth-ods were performed in accordance with the relevant guidelines and regulations.
2.2 MRI acquisition/processing

We defined an initial benchmark with a set of features plus a diagnostic column (DX, the target) coming only from MRI scans for the baseline measures. We split our work into two different sets of features: cross-sectional data using the baseline visit, and longitudinal data, both obtained using the FreeSurfer Software Suite Version 4.4 [10]. Table A.2 lists every feature used for the MRI cohort. Data pre-processing included:
1. Removal of rows containing NaN values.
2. Normalization of cerebral volumetric measures following [26].
3. Application of undersampling [8] to handle imbalanced data.
4. Standardization (to [0,1]) of the structural volumes.
5. Use of min-max scaling, subtracting the minimum value of each feature and dividing by the difference between the maximum and the minimum. This way, we preserve zero entries and introduce robustness to small standard deviations in the biomarkers.
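The scaling described in steps 4–5 can be sketched per feature with NumPy; the feature matrix below is invented for illustration (rows are subjects, columns are volumetric features):

```python
import numpy as np

# Hypothetical feature matrix: rows = subjects, columns = features.
X = np.array([[1200.0, 3.1],
              [1500.0, 2.4],
              [1350.0, 2.9]])

def min_max_scale(X):
    """Scale each column to [0, 1] by subtracting the column minimum
    and dividing by the column range (assumes no constant column)."""
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    return (X - mins) / ranges

X_scaled = min_max_scale(X)
```

Because the transform only shifts and rescales, zero entries in a nonnegative feature stay at the low end of the range, as the text notes.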
For the MRI cross-sectional baseline data, we ended up with a set of 1350 sub-jects, including 438 cognitive normal (CN), 680 with mild cognitive impairment(MCI) and 231 with AD. We handled unbalanced classes using undersampling tech-niques, obtaining a final set of 231 subjects for each class CN, MCI and AD and atotal of 140 features. Table 2.1 shows the demographic information of the studiedcohort.
CN MCI AD
Nº of subjects 213 213 213
Age (years) 74.04± 5.57 72.43± 7.45 74.11± 7.86
Sex (female) 51.48% 41.61% 45.45%
Education (years) 16.32± 2.71 15.86± 2.90 15.23± 2.89
APOE4 25.74% 39.41% 48.48%
Table 2.1: Demographic information of the studied cohort for MRI baseline cross-sectional data.
For the MRI longitudinal data, we ended up with a set of 5021 subjects, including 1694 cognitively normal (CN), 2434 with mild cognitive impairment (MCI) and 893 with AD. We handled the unbalanced classes using undersampling techniques, obtaining a final set of 893 subjects for each class (CN, MCI and AD) and a total of 152 features. Table 2.2 shows the demographic information of the studied cohort.
CN MCI AD
Nº of subjects 893 893 893
Age (years) 73.98± 5.39 72.65± 7.24 73.79± 7.43
Sex (female) 49.94% 37.62% 46.13%
Education (years) 16.17± 2.76 15.94± 2.93 15.16± 2.90
APOE4 24.30% 37.29% 48.82%
Table 2.2: Demographic information of the studied cohort for MRI longitudinaldata.
2.2.1 Feature selection
To improve the model, we performed feature selection using regularization methods. The original number of features after cleaning the data was 140 for baseline and longitudinal data. We applied l1 (Lasso) regularization with a penalty parameter of C = 0.1 and observed an improvement in the representation of the data (see Section 3 for further details). Concretely, the final cohort using baseline data contains 38 features remaining after feature selection using Linear Support Vector Classification (Linear SVC) with an l1 penalty and regularization parameter equal to 0.1, and the final cohort using longitudinal data contains 87 features remaining after feature selection with the same parametrization.
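A minimal sketch of this l1-penalised Linear SVC selection step with scikit-learn, assuming a synthetic data matrix in place of the cohort (the names `X`, `y` and the signal structure are illustrative only):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))           # 200 subjects, 20 candidate features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels driven by two features only

# l1-penalised linear SVM; C is the regularization parameter from the text.
svc = LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000).fit(X, y)

# Keep only the features with nonzero coefficients.
selector = SelectFromModel(svc, prefit=True)
X_selected = selector.transform(X)
```

The l1 penalty drives most coefficients exactly to zero, so `SelectFromModel` acts as the feature filter; `dual=False` is required by scikit-learn for the l1-penalised formulation.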
2.2.2 Dimensionality reduction
Dimensionality reduction (DR) is widely used in the analysis of high-dimensional data. Given the heterogeneous nature of biological, medical or healthcare data, where hundreds of features are collected, there is a need for methods better suited than classical statistical ones, which are unable to extract the latent underlying representation that remains sparsely submerged in a voluminous high-dimensional space and thus make exhaustive exploration impossible. To this end, we use PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding) [19], a dimensionality reduction method that captures both local and global nonlinear structure using an information-geometric distance between data points. PHATE generates a low-dimensional embedding specifically for visualization that provides an accurate, denoised representation of both local and global structure of a dataset in the required number of dimensions, without forcing any prior assumptions on the structure of the data. It combines techniques from manifold learning, information geometry, and data-driven diffusion geometry. As a result, high-dimensional and nonlinear structures, such as clusters, nonlinear progressions, and branches, become apparent in two or three dimensions and can be extracted for further analysis. Hence, one can expect PHATE to provide an embedding over which cluster/phenotype analysis can be performed and the probability of belonging to each particular cluster computed.
The PHATE algorithm can be summarised in the following steps (see Figure 2.1):
Figure 2.1: PHATE algorithm (see Figure 2 in [19]).
1. Compute the pairwise distances from the data matrix.
A common approach in dimensionality reduction is to embed the raw data matrix in a linear manner to preserve the global structure of the data (e.g., PCA). Nevertheless, in most cases the data are noisy and the global transitions are nonlinear, so such methods are insufficient to capture latent patterns and typically result in a noisy visualization. This is the main motivation of PHATE: to preserve distances between data points that reflect gradual changes along these nonlinear transitions. A typical choice of distance metric is the Euclidean distance; however, as stated before, global Euclidean distances cannot properly reflect transitions in the data, as these are usually nonlinear.
2. Transform the distances into affinities to encode local information.
A common approach to transforming global (e.g., Euclidean) distances into local similarities is to apply a kernel function that quantifies pairwise similarity between points based on their Euclidean distance. A popular choice is the Gaussian kernel, which measures the similarity between data points $x_i$ and $x_j$ as

$$K_\varepsilon(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|_2^2}{2\varepsilon^2}\right)$$

However, embedding local affinities directly can result in a loss of global structure. On the contrary, a faithful structure-preserving embedding (and visualization) needs to go beyond local affinities (or distances) and also consider global relations between parts of the data. To accomplish this, PHATE constructs a diffusion geometry to learn and represent the shape of the data. This construction is based on computing local similarities between data points and then diffusing through the data using a Markovian random-walk diffusion process to infer more global relations. The initial probabilities for the random walk are calculated by normalizing the kernel matrix by its row sums. For the Gaussian kernel described above,

$$\nu_\varepsilon(x_i) = \|K_\varepsilon(x_i, \cdot)\|_1 = \sum_{x_j \in X} K_\varepsilon(x_i, x_j)$$

resulting in an $N \times N$ row-stochastic matrix, referred to as the diffusion operator, whose entries give the probability of moving from $x_i$ to $x_j$ in a single time step:

$$[P_\varepsilon](x_i, x_j) = \frac{K_\varepsilon(x_i, x_j)}{\nu_\varepsilon(x_i)}, \quad x_i, x_j \in X$$
3. Learn global relationships via the diffusion process.
Choosing the bandwidth $\varepsilon$ of $K_\varepsilon(x_i, x_j)$ corresponds to a trade-off between encoding global and local information in the probability matrix $P_\varepsilon$. If the bandwidth is small, single-step transitions of the random walk using $P_\varepsilon$ are confined to the nearest neighbors of each data point. This may result in sparsely sampled regions being entirely excluded, so that the trajectory structure is not encoded in $P_\varepsilon$. Conversely, if the bandwidth is too large, the resulting probability matrix loses local information, as $[P_\varepsilon](x, \cdot)$ becomes more uniform for all $x \in X$, which may result in an inability to resolve different trajectories. A suitable choice is to set $\varepsilon$ to the smallest distance that allows the diffusion process to proceed. Nevertheless, this only covers some specific cases and is hugely affected by outliers and sparse data regions. Furthermore, it relies on a single manifold with constant dimension as the underlying geometry, which is problematic when the data are sampled from trajectories. To overcome this, PHATE uses an adaptive bandwidth, set for each point to its $k$th-nearest-neighbor distance to preserve local information in densely sampled regions, along with an α-decaying kernel that controls the rate of decay of the kernel. The α-decay kernel is defined as follows:
$$K_{k,\alpha}(x_i, x_j) = \frac{1}{2}\exp\left(-\left(\frac{\|x_i - x_j\|_2}{\varepsilon_k(x_i)}\right)^{\alpha}\right) + \frac{1}{2}\exp\left(-\left(\frac{\|x_i - x_j\|_2}{\varepsilon_k(x_j)}\right)^{\alpha}\right)$$

4. Encode the learned relationships using the potential distance.
The algorithm uses a novel diffusion-based informational distance called the potential distance, which increases sensitivity to global relationships and provides more stability at the boundaries of manifolds due to the use of probability distributions. To go from the probability space to the information (or energy) space, a log transformation is applied. This log function renders the distances sensitive to small differences, giving PHATE the ability to preserve both local and manifold-intrinsic global distances optimised for visualization. Mathematically, if $U^t_x = -\log(p^t_x)$ for $x \in X$, then the $t$-step potential distance is defined as

$$V_t(x_i, x_j) = \left\|U^t_{x_i} - U^t_{x_j}\right\|_2, \quad x_i, x_j \in X$$

Through this potential distance, PHATE is sensitive to small differences in the probability distributions that correspond to differences in long-range global structure, allowing it to preserve global manifold relationships.
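A minimal NumPy sketch of the potential distance, assuming an arbitrary row-stochastic matrix stands in for the diffusion operator (in PHATE, $P$ comes from the kernel construction above):

```python
import numpy as np

def potential_distances(P, t):
    """Run the random walk for t steps, map the resulting probabilities
    to potential space with -log, and return pairwise Euclidean
    distances between the potential rows (the matrix V_t)."""
    Pt = np.linalg.matrix_power(P, t)
    U = -np.log(Pt + 1e-12)  # small offset guards against log(0)
    diff = U[:, None, :] - U[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

# Arbitrary row-stochastic matrix as a stand-in diffusion operator.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
P = A / A.sum(axis=1, keepdims=True)
V = potential_distances(P, t=3)
```

By construction the result is a symmetric distance matrix with a zero diagonal, which is what the MDS step that follows expects as input.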
5. Embed the potential distance information into low dimensions for visualization.
The popular approach of using diffusion maps can preserve global structure and denoise the data, but it has a high intrinsic dimensionality not suitable for visualisation. Hence, PHATE compresses the variability into low dimensions using metric multidimensional scaling (MDS [5]), a distance-embedding method. Classical MDS (CMDS) is first used to obtain an initial configuration of the data in a low dimension $m$. However, CMDS assumes that the input distances are low-dimensional Euclidean distances, which is highly restrictive. To relax this assumption, PHATE uses metric MDS, which embeds the data into lower dimensions by minimising a "stress" function over the embedded $m$-dimensional coordinates.
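The CMDS initialization can be sketched via double-centering and eigendecomposition; a toy point set is assumed below, and PHATE would then refine this configuration with metric MDS:

```python
import numpy as np

def classical_mds(D, m=2):
    """Classical MDS: double-centre the squared distance matrix and
    embed using the top-m eigenpairs of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centred points
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:m]         # pick the m largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# For truly Euclidean distances, CMDS recovers the configuration exactly
# (up to rotation), so the embedded distances match the input ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 2))
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
Y = classical_mds(D, m=2)
```

This exact-recovery property holds only for genuinely low-dimensional Euclidean input, which is precisely the restrictive assumption the text says metric MDS relaxes for the potential distances.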
2.2.3 Unsupervised clustering
To find the different profiles, we cluster the patients on the dimensionality reducedspace provided by PHATE, without using diagnosis. To find heterogeneous brainpresentations in each cluster, we analyze the relationships between brain phenotypesand cognitive test information in each cluster.
We used CIMLR to cluster the patients. CIMLR [21] is a method based onmultiple kernel learning that learns a similarity between each pair of patients bycombining different kernels per feature (in our case, PHATE output space coor-dinates). A cluster is then defined as a block structure of the learned similarity,grouping together similar patients, where each subject is positioned with respect tothe whole population according to their degree of similarity. The number of clus-ters C must be specified beforehand. By combining multiple kernels, it integratesthe heterogeneous information, and provides the contribution of each feature in thecomputed low-dimensional representation.
Using PHATE, each patient is embedded in a 3D latent space, where patient $i$ is represented as $y_i \in \mathbb{R}^m$, with $i = 1, \ldots, N$ patients and $m$ features, on which we apply CIMLR. Each dimension $m$ (i.e., feature) is assigned $P$ kernels $K_{mp}$. The method then constructs a set of Gaussian kernels for the dataset by fitting multiple hyperparameters. In total, there are $P$ kernels for each feature $m$, each with different parameters. We define $K_{mp}$ as:
$$K_{mp}(y_i, y_j) = \frac{1}{\varepsilon_{ijm}\sqrt{2\pi}} \exp\left(-\frac{\|y_{im} - y_{jm}\|_2^2}{2\varepsilon_{ijm}^2}\right),$$

$$\mu_{im} = \frac{\sum_{j \in KNN(y_{im})} \|y_{im} - y_{jm}\|_2}{k}, \qquad \varepsilon_{ijm} = \frac{\sigma(\mu_{im} + \mu_{jm})}{2}$$
where $y_{im}$ denotes the $m$-th coordinate of the embedding of patient $i$, $KNN(y_{im})$ represents the $k$ nearest neighbours of subject $i$ with respect to PHATE-space feature $m$, and $\|\cdot\|_2$ is the Euclidean distance. The above procedure is performed for each coordinate independently, using a total of $P = 3$ kernels. Each kernel is determined by a pair of parameters $(\sigma, k)$, where $\sigma$ is a hyperparameter that estimates the variance using the local scales of the distances. Different combinations of $k$ and $\sigma$ provide different kernels.
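A sketch of this per-coordinate kernel construction in NumPy; the $\sigma$ values and $k$ below are illustrative choices, not the ones used by CIMLR:

```python
import numpy as np

def adaptive_gaussian_kernels(y, k=3, sigmas=(1.0, 1.5, 2.0)):
    """Gaussian kernels on one PHATE coordinate with adaptive bandwidth
    eps_ij = sigma * (mu_i + mu_j) / 2, where mu_i is the mean distance
    of point i to its k nearest neighbours. One kernel per sigma."""
    d = np.abs(y[:, None] - y[None, :])          # 1-D coordinate distances
    mu = np.sort(d, axis=1)[:, 1:k + 1].mean(1)  # skip the self-distance
    kernels = []
    for sigma in sigmas:
        eps = sigma * (mu[:, None] + mu[None, :]) / 2
        K = np.exp(-d ** 2 / (2 * eps ** 2)) / (eps * np.sqrt(2 * np.pi))
        kernels.append(K)
    return kernels

rng = np.random.default_rng(0)
Ks = adaptive_gaussian_kernels(rng.normal(size=20))
```

Because $\varepsilon_{ijm}$ is symmetric in $i$ and $j$, each resulting kernel matrix is symmetric, which is what the similarity-learning step that follows requires.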
Then, an optimization problem is solved over the Gaussian kernels to build the pairwise patient similarity matrix. The optimization problem is defined as follows:
$$\min_{S, A, w} \; -\sum_{i,j,p,m} w_{mp} K_{mp}(y_{im}, y_{jm})\, S_{ij} + \gamma\, \mathrm{tr}\!\left(A^T (I_N - S) A\right) + \mu \sum_{m,p} w_{mp} \log w_{mp} + \beta \|S\|_F^2$$

$$\text{subject to } A^T A = I_C, \quad \sum_{m,p} w_{mp} = 1, \quad w_{mp} \geq 0, \quad \sum_j S_{ij} = 1, \quad S_{ij} \geq 0$$
Here, γ, µ, and β are tuning parameters for the various terms of the optimizationfunction, IC represents the identity matrix of size C, ‖.‖F stands for the Frobeniusnorm, and tr denotes the trace of the matrix. The first term of the above equationlinks the learned similarity matrix S with the combination of kernels from all fea-tures: similarity between two samples should be small if their kernel-based distanceis large. The second term enforces S to have C connected components, through theauxiliary matrix A and its associated constraint ATA = IC . The third term imposesa constraint on w so that more than one kernel is selected, and the last term appliesa regularization penalizing the scale of the learned similarities.
We estimate the best number of clusters with the heuristic proposed in [21], and further validate the choice with the elbow method. In addition, we use Normalized Mutual Information (NMI) [14] to evaluate the consistency between the obtained clustering and the true labels of the N patients. Given two clustering results U and V on a set of data points, NMI is defined as $I(U, V)/\max\{H(U), H(V)\}$, where $I(U, V)$ is the mutual information between U and V, and $H(U)$ is the entropy of the clustering U. Specifically, assuming that U has P clusters and V has Q clusters, the mutual information is computed as follows:
$$I(U, V) = \sum_{p=1}^{P} \sum_{q=1}^{Q} \frac{|U_p \cap V_q|}{N} \log \frac{N\, |U_p \cap V_q|}{|U_p| \times |V_q|}$$
where $|U_p|$ and $|V_q|$ denote the cardinality of the p-th cluster in U and the q-th cluster in V, respectively. NMI takes values between 0 and 1, measuring the concordance of two clustering results. Therefore, a higher NMI indicates higher concordance with the ground truth, i.e., a more accurate label assignment of each patient.
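The definition above transcribes directly into NumPy; this is a small sketch (library implementations such as scikit-learn's `normalized_mutual_info_score` exist as well):

```python
import numpy as np

def nmi(u, v):
    """Normalized mutual information I(U,V) / max(H(U), H(V)) between
    two cluster label vectors of equal length."""
    n = len(u)
    I = 0.0
    for p in np.unique(u):
        for q in np.unique(v):
            nij = np.sum((u == p) & (v == q))  # |U_p ∩ V_q|
            if nij > 0:
                I += nij / n * np.log(n * nij / (np.sum(u == p) * np.sum(v == q)))
    def H(labels):
        _, counts = np.unique(labels, return_counts=True)
        f = counts / n
        return -np.sum(f * np.log(f))
    return I / max(H(u), H(v))
```

Note that the score depends only on the partition, not on the label values, so any relabelling of the same clustering still scores 1.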
In addition, CIMLR was evaluated in a supervised setting with 10-fold cross-validation and simple classifiers. In each trial of cross-validation, 9 folds are used as training data and the remaining fold as validation data. The average of the ten validation errors is recorded to show how well the obtained latent features facilitate the classification task. The simplest classifier is the nearest-neighbor classifier, and the corresponding error is called the Nearest Neighbor Error (NNE), whose final value is the average classification error over the ten folds.
2.3 Phenotype analysis

One can assume that subjects belonging to a particular cluster share certain properties, or a profile. PHATE provides a reduced space in which it is easier to find clusters, as similar patients are close to each other; we therefore contrast a CIMLR analysis over the original data space against one over the dimensionality-reduced space provided by PHATE, and compare the NMI and NNE values reported by CIMLR.
Furthermore, we use one-way analysis of variance (ANOVA) tests across all clusters to test whether the population of each cluster has a different mean for the cognitive features and the volumetric data (see Table B.1). In addition, to further validate the significance of the cognitive tests, we performed a pairwise comparison for each cluster pair using Tukey HSD analysis.
We tested for differences in the cognitive tests, brain volumes and cortical thickness of the individuals in each subgroup. We compared each cluster with the rest of the population, without taking diagnostic groups into account, to detect the distinguishing characteristics of each subgroup. We want to know whether cluster membership (independent variable) has significant effects on cognitive tests and cerebral volumetric values (dependent variables).
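Such a per-feature one-way ANOVA can be run with SciPy; the cluster score values below are synthetic stand-ins for a cognitive test, for illustration only:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Hypothetical MMSE-like scores for three clusters (invented numbers).
cluster_a = rng.normal(29, 1.0, size=40)
cluster_b = rng.normal(27, 1.5, size=40)
cluster_c = rng.normal(23, 2.0, size=40)

# One-way ANOVA: does at least one cluster mean differ from the others?
f_stat, p_value = f_oneway(cluster_a, cluster_b, cluster_c)
```

A significant F-statistic only says that some cluster differs, not which one, which is why the Tukey HSD pairwise comparisons mentioned above are used as a follow-up.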
Chapter 3
Results
We applied the proposed method to the described dataset (see Section 2.1). From the TADPOLE (The Alzheimer’s Disease Prediction Of Longitudinal Evolution) Challenge 2017 dataset, we selected a subset including Dementia (AD, Alzheimer’s Disease), MCI and cognitively normal (NL) subjects, using selected features for cross-sectional baseline and longitudinal data produced with the FreeSurfer image analysis suite [10], in addition to cerebral volumetric data coming from merged ADNI data sheets (see Table A.1 for more details). The final cohort using baseline cross-sectional data contains 693 patients with 38 features remaining after feature selection using Linear Support Vector Classification (Linear SVC) with an l1 penalty and regularization parameter equal to 0.1, and the final cohort using longitudinal data contains 2679 patients with 87 features remaining after feature selection with the same parametrization.
3.1 Patient embedding

We applied PHATE as described in Section 2.2.2, parametrized with 3 components (PHATE1, PHATE2 and PHATE3) for the output space. Figure 3.1 shows the output embedding for cross-sectional baseline data. For longitudinal data, each patient at each time point is a separate embedded point, whereas for baseline data each patient is embedded once, at the first visit. We can observe that PHATE was able to separate the different stages of the disease, as expected for baseline data; we can therefore state that PHATE correctly captured the underlying structure of the data. Figure 3.2 shows the effect of adjusting the parameter decay=5, which brings the clusters even closer together. As explained in [20], data coming from sparsely sampled regions might be excluded entirely, losing the trajectory encoding; to overcome this, it is suggested to decrease the parameter decay from its default value of 15 in order to increase connectivity in the graph.
Figure 3.1: PHATE result (in 2D) for baseline MRI data for NL, MCI and De-mentia subjects with parameter n_components=3.
Figure 3.2: PHATE result (in 2D) for baseline (left) and longitudinal (right) MRIdata for NL, MCI and Dementia subjects with parameter n_components=3 anddecay=5.
For comparison, Figure 3.3 shows the embeddings obtained using other dimensionality reduction methods, namely PCA and t-SNE [23], to show the efficacy of PHATE over the most common approaches to dimensionality reduction. PHATE achieves a better and more concise differentiation among the classes than the other approaches. Although one can still observe some class definition with PCA and t-SNE (albeit more sparse), PHATE remains the better visualization of the extracted features for the selected classes.
Figure 3.3: PHATE vs PCA vs t-SNE, using baseline cross-sectional data.
Additionally, in Figure 3.4 one can observe the AD progression trajectory for 3 patients, following the trajectory drawn by PHATE, i.e., from left to right. We can confirm that PHATE correctly captures the trajectories of AD progression for patients with different starting conditions.
Figure 3.4: Different time points for 3 patients along PHATE trajectory. Progres-sion occurs left to right, from NL to Dementia (as observed in previous Figures).Patient A: progresses from NL to MCI, 4 time points. Patient B: progresses fromMCI to Dementia, 4 time points. Patient C: progresses from NL to Dementia, 3time points.
3.2 Patient phenotyping

CIMLR was applied on the PHATE embedding of the patients. Using the heuristic proposed in [18], we found the number of clusters to be C = 4 for cross-sectional baseline data and C = 8 for longitudinal data, which was further validated using the elbow method. We also contrasted a CIMLR analysis over the original data space with one over the reduced space provided by PHATE. From the results in Table 3.1 we observed that the nearest neighbour error (NNE) decreased and the normalized mutual information (NMI) among clusters improved when using the PHATE output space for both data cohorts (cross-sectional baseline and longitudinal). Furthermore, we performed an ANOVA test to validate the results, using the selected covariate features (see Table B.1 for details) on the subjects in each cluster. We found that in the original data space the F-score values are substantially lower than those obtained in PHATE space and, in consequence, the significance is also lower given the higher p-values. In conclusion, CIMLR works better over PHATE space than over the original space, yielding a more robust clustering and a more solid base for AD phenotype discovery.
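CIMLR itself is not part of the standard Python libraries, but the two ingredients of this validation, the elbow heuristic over candidate cluster numbers and the NMI agreement score, can be sketched with scikit-learn using KMeans as a stand-in clustering (all data below is synthetic and illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

# toy stand-in for the 3D PHATE embedding, with four planted subgroups
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0],
                    [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
X, planted = make_blobs(n_samples=200, centers=centers,
                        cluster_std=0.5, random_state=0)

# elbow method: within-cluster sum of squares for C = 1..10
inertia = [KMeans(n_clusters=c, n_init=10, random_state=0).fit(X).inertia_
           for c in range(1, 11)]

# agreement (NMI) between the C = 4 partition and the planted labels
pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
nmi = normalized_mutual_info_score(planted, pred)
print(nmi > 0.9)   # True: the four planted subgroups are recovered
```

In the elbow method, one looks for the value of C after which the inertia curve flattens; here that happens at the planted C = 4.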
                           Nº of Clusters   NMI ↑      NNE ↓
Cross-sectional baseline
  PHATE space              4                0.105098   0.17269
  Original space           10               0.029522   0.21529
Longitudinal
  PHATE space              8                0.115086   0.32789
  Original space           7                0.036932   0.53894

Table 3.1: CIMLR analysis over the original data space vs. the PHATE output space. The values correspond to the heuristic estimating the number of clusters, from 1 to 10 possible clusters in total. NMI: Normalized Mutual Information. NNE: Nearest Neighbor Error, reported at the maximum number of iterations.
FEATURES           Original space            PHATE space
                   F-Score      p-value      F-Score      p-value
ADAS11 10.636454 2.568153e-11 63.279449 3.809737e-36
ADAS13 11.443751 3.209684e-12 75.728769 2.419462e-42
CDRSB 7.964122 2.547480e-08 65.789696 2.045614e-37
MMSE 9.533475 4.425290e-10 75.992485 1.799451e-42
APOE4 1.453479 0.191709 9.180338 0.000006
RAVLT Immediate 9.318811 7.704470e-10 55.39417 4.371492e-32
PTEDUCAT 1.073059 0.377096 8.72747 0.000011
Ventricles 10.759982 1.867709e-11 20.577826 8.862608e-13
Hippocampus 7.521101 7.984990e-08 93.793485 6.595045e-51
Entorhinal 8.206961 1.361176e-08 92.3152 3.177793e-50
Fusiform 8.306421 1.052920e-08 112.794745 2.023828e-59
MidTemp 7.279257 1.488747e-07 139.63699 1.111622e-70
WholeBrain 4.819641 0.000079 148.491416 3.230522e-74
Table 3.2: ANOVA test over all clusters to compare the CIMLR method in the original data space vs. PHATE space for cross-sectional baseline data.
FEATURES           Original space            PHATE space
                   F-Score      p-value      F-Score      p-value
ADAS11 67.167816 4.004632e-80 264.127359 0.0
ADAS13 75.341269 1.026070e-89 307.491943 0.0
CDRSB 62.399786 1.733944e-74 242.756973 1.519370e-302
MMSE 49.928835 1.326469e-59 238.938341 1.865802e-298
APOE4 9.148347 5.670591e-10 50.349837 4.153311e-69
RAVLT Immediate 71.695749 1.896901e-85 208.281372 6.314996e-265
PTEDUCAT 3.34374 0.002741 16.782484 5.513887e-22
Ventricles 137.882663 3.537115e-160 83.699675 4.572521e-114
Hippocampus 82.898148 1.671825e-9 338.011056 0.0
Entorhinal 72.575726 5.351880e-89 314.434008 0.0
Fusiform 74.727329 5.351880e-89 320.921331 0.0
MidTemp 69.771024 3.451619e-83 447.889219 0.0
WholeBrain 62.434583 1.576851e-74 379.896484 0.0
Table 3.3: ANOVA test over all clusters to compare the CIMLR method in the original data space vs. PHATE space for longitudinal data.
The learnt similarity matrix S is shown in Figure 3.5 for cross-sectional baseline data and in Figure 3.6 for longitudinal data (right images). In comparison with a similarity matrix constructed using the Euclidean distance in the original data space (left images), where the clusters are not distinguishable, CIMLR clearly shows the block structure of the different clusters. Nevertheless, for the longitudinal data the clusters are not as clearly separated as in the similarity matrix for the cross-sectional data.
Figure 3.5: Comparison between the Euclidean distance matrix and the learnt similarity matrix S for cross-sectional baseline data.
Figure 3.6: Comparison between the Euclidean distance matrix and the learnt similarity matrix S for longitudinal data.
CIMLR revealed patterns in the single-modality (MRI) imaging features (cross-sectional and longitudinal) that relate to natural subgroups. The method uses all these features to find the clusters but, unlike other methods, it automatically weights each one of them.
Figure 3.7 shows the values of the selected covariate features used to analyze the phenotypes, performing a univariate test that compares each cluster to the rest, for cross-sectional data:
• All cerebral volumetric features show values below p < 0.01 for C4.
• Some cerebral volumetric features show values below p < 0.05 and p < 0.01 for C1, C2 and C3.
• All clusters show p < 0.01 for the Education feature (measured in years).
• C2, C3 and C4 show p < 0.05 for the genetic marker APOE4.
• With the exception of C4 (p < 0.05), none of the other clusters show statistically significant values for the cognitive test RAVLT Immediate.
All the ANOVA tests (over all clusters) for each of the described features (see Table 3.2 for the baseline data, column p-value for PHATE space) reject the null hypothesis with p < 0.001, meaning that the differences found between clusters on those covariate features are statistically significant.
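The F-scores and p-values of Tables 3.2 and 3.3 come from standard one-way ANOVA tests; with SciPy, the per-feature test can be sketched as follows (the group means and sizes below are invented for illustration, not taken from the thesis data):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# synthetic values of one covariate (e.g. a normalized volume) for the
# subjects of four clusters whose means differ
groups = [rng.normal(mu, 1.0, size=50) for mu in (0.0, 0.5, 1.0, 2.0)]

# one-way ANOVA across the four clusters for this feature
f_score, p_value = f_oneway(*groups)
print(p_value < 0.001)   # True: the null hypothesis is rejected
```

Running the same test once per covariate feature, with one sample per cluster, reproduces the structure of the tables above.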
Figure 3.7: Distribution of the selected features for each cluster for cross-sectional baseline data.
On the other hand, Figure 3.8 shows the values for longitudinal data:
• C2, C4 and C7 show values below p < 0.05 for APOE4, and C1 below p < 0.01.
• All clusters but C7 and C8 show values below p < 0.05 for Education, particularly C2 and C4 below p < 0.01.
• C1, C2, C3 and C5 present values below p < 0.01 for Ventricles, and C4 and C6 below p < 0.05.
• Only C7 presents a significant p < 0.05 for Hippocampus.
• Only C4 and C7 present a significant p < 0.05 for Entorhinal.
• C1, C2 and C4 show values below p < 0.05 for Fusiform.
• C4 and C5 show values below p < 0.05 and p < 0.001, respectively, for MidTemp.
• The majority of the clusters, except C1 and C8, show values below p < 0.05 for Whole Brain.
• None of the clusters showed significant values for the selected cognitive tests.
The ANOVA tests (over all clusters) for each of the described features (see Table 3.3 for the longitudinal data, column p-value for PHATE space) reject the null hypothesis with p < 0.001 for the cognitive test features, while the univariate test showed no significance for them. Furthermore, the cerebral volumetric feature Hippocampus was the only one presenting a significant p-value for a single cluster (C7), contrary to what is observed in the univariate test, where the value was p < 0.001 for all the clusters. We further analysed the significance of the cognitive tests for every pairwise cluster comparison with a Tukey HSD analysis. For all the cognitive tests, the null hypothesis can be rejected except for the pairwise comparisons C1-C4, C3-C6 and C7-C8, and additionally C1-C5 for CDRSB only. Appendix B shows the box plots for every cognitive test. For most of the groups we can reject the null hypothesis, confirming that the differences between clusters on the cognitive tests are significant.
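This pairwise analysis can be reproduced with scipy.stats.tukey_hsd, available in recent SciPy releases. Below is a sketch on synthetic scores in which, by construction, two clusters share a mean, mimicking a pair for which the null hypothesis cannot be rejected (the cluster sizes and means are invented for illustration):

```python
from itertools import combinations
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(1)
# synthetic cognitive-test scores for four clusters; C1 and C4 are given
# the same mean on purpose
scores = {"C1": rng.normal(10, 2, 40), "C2": rng.normal(14, 2, 40),
          "C3": rng.normal(18, 2, 40), "C4": rng.normal(10, 2, 40)}

# Tukey's honestly significant difference test over all cluster pairs
res = tukey_hsd(*scores.values())
names = list(scores)
for i, j in combinations(range(4), 2):
    print(f"{names[i]}-{names[j]}: p = {res.pvalue[i, j]:.4f}")
```

Pairs with genuinely different means receive very small p-values, while the C1-C4 pair, which shares a mean, typically does not reach significance.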
Figure 3.8: Distribution of the selected features for each cluster for longitudinal data.
As a side analysis, we also tried including the cerebral volumetric features from the covariate data in the analysis done with PHATE. After feature selection, only the covariate features 'Normalised_Hippocampus', 'Normalised_Entorhinal' and 'Normalised_MidTemp' were retained (from the entire set of cerebral volumetric covariate features). Furthermore, the clustering heuristic found C = 5 (instead of 4 as before) to be the optimal number of clusters, and the NMI and NNE values improved as well: NNE = 0.16606 and NMI = 0.116750. It is interesting to notice that, when adding these more general volumetric measures, the covariate features related to cognitive tests (ADAS11, ADAS13, CDRSB, MMSE and RAVLT Immediate) show statistical significance only on C2, as observed in Figure 3.9.
Figure 3.9: Distribution of the selected features for each cluster, including cerebral volumetric features from the covariate data, for cross-sectional data. The cognitive tests whose significance changed are circled in green.
Chapter 4
Discussion and Future Work
This chapter summarizes the main developments and results achieved in this thesis.In addition, possible future work is proposed.
4.1 Discussion

We have presented a data-driven approach to find AD phenotypes. First, we used PHATE, a manifold-learning-based approach for dimensionality reduction, to model AD trajectories by capturing global and local nonlinear structure between datapoints (patients). The method was originally developed to help visualize problems such as single-cell RNA sequencing differentiation over time, where the number of dimensions is substantially high. This type of situation, often referred to as the curse of dimensionality, is common in biological and medical data, and modeling AD progression is no exception. The feature selection using Lasso with an SVC model addressed this problem for the initial dataset, and PHATE reduced the dimensionality even further. We then applied CIMLR, a weighted multi-kernel unsupervised clustering method, over the reduced space to obtain clusters defining different AD phenotyping profiles from baseline and longitudinal neuroimaging data from ADNI.
We demonstrated that PHATE is a powerful dimensionality reduction approach to portray AD progression trajectories in baseline and longitudinal neuroimaging data. This capability was an advantage when using CIMLR, as it allowed us to construct more accurate profiles for AD phenotyping.
A further contribution is the statistical analysis (univariate and ANOVA) performed over the obtained clusters for cognitive tests and cerebral volume features, two groups of data that are correlated with how thinking abilities (i.e. memory, language, reasoning and perception) deteriorate when developing AD; we found the differences to be significant except in a minority of cases. All the results were assessed with a Tukey HSD post-hoc analysis, which showed high statistical significance in almost all cases.
4.2 Future Work

This work has some limitations worth mentioning. We believe that other imaging modalities, for instance PET scans, could help improve the outcome. Furthermore, our database was somewhat limited, consisting of the datasets provided for the TADPOLE Challenge, which is a reduced dataset rather than a raw selection of data from ADNI. Regarding data pre-processing, handling missing values was reduced to a simple solution, namely removing the rows containing NaN values, and for imbalanced classes we applied an undersampling technique. For the former, a more sophisticated approach could have been applied, such as imputing mean values over missing data; for the latter, other techniques could have been used, such as oversampling combined with k-fold cross-validation.
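Both pre-processing alternatives mentioned above are straightforward to sketch with pandas; the DataFrame and column names below are invented for illustration:

```python
import numpy as np
import pandas as pd

# toy covariate table with missing values and imbalanced diagnosis classes
df = pd.DataFrame({
    "feature": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0, 7.0, 8.0],
    "DX": ["NL", "NL", "NL", "NL", "MCI", "MCI", "MCI", "Dementia"],
})

# mean imputation instead of dropping rows containing NaN values
df["feature"] = df["feature"].fillna(df["feature"].mean())

# random undersampling: trim every class to the size of the smallest one
n_min = int(df["DX"].value_counts().min())
balanced = pd.concat([g.sample(n=n_min, random_state=0)
                      for _, g in df.groupby("DX")])

print(df["feature"].isna().sum(), sorted(balanced["DX"].value_counts().tolist()))
# 0 [1, 1, 1]
```

Oversampling or more sophisticated imputation (e.g. per-class means or model-based imputation) would follow the same pattern, replacing these two steps.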
When performing feature selection, we tried 2 different methods: SVC and Linear Regression, both with a Lasso (l1) penalty. We saw better results when using SVC parametrized with C = 0.1, as its impact on the features was not as radical as that of Linear Regression with the same parametrization. Nevertheless, a full analysis like the one performed in this thesis could provide further insight into the impact of each method.
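A minimal scikit-learn sketch of the two sparse selectors is shown below on synthetic data; only C = 0.1 is taken from the thesis, while the dataset shape and the Lasso alpha are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.svm import LinearSVC

# toy stand-in: 200 subjects, 50 features of which 10 are informative
X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           random_state=0)

# l1-penalized linear SVC with C = 0.1
svc = LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000).fit(X, y)
kept_svc = int((np.abs(svc.coef_) > 1e-8).sum())

# Lasso (l1-penalized linear regression) on the same labels, for comparison
lasso = Lasso(alpha=0.1).fit(X, y)
kept_lasso = int((np.abs(lasso.coef_) > 1e-8).sum())

print(kept_svc, kept_lasso)   # both selectors zero out part of the 50 features
```

Comparing the two surviving feature sets, and how downstream clustering changes with each, would be the full analysis suggested above.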
PHATE helped us visualize AD progression trajectories; however, we had to try several parametrizations to capture a proper trajectory output. In most cases, adjusting the decay parameter to increase connectivity in the graph was sufficient, but we observed that the posterior clustering over the low-dimensional output was highly sensitive to these modifications, resulting in a higher number of clusters and, as a consequence, a negative impact on the metrics.
Unsupervised clustering with CIMLR was done over the output space of PHATE, which generates a low-dimensional space with 3 coordinates. Although the analysis over PHATE space improved with respect to the original space, the weights obtained from the multi-kernel process were not fully exploited: since we only use 3 features (i.e. the 3D coordinates) for clustering, the importance and contribution (weights) of each feature becomes trivial. In that case, a different kind of method could be explored, such as the k-nearest neighbours algorithm (KNN), for which the low-dimensional PHATE space could help, as KNN relies on the distances between neighbours (patients).
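A sketch of this idea with scikit-learn follows; synthetic 3D points stand in for the PHATE coordinates, and the profile labels are assumed to come from a previous clustering step. Note that KNN is a supervised classifier, so here it serves to assign unseen patients to existing profiles by distance in the embedding, rather than to discover the profiles themselves:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# synthetic 3D coordinates standing in for the PHATE embedding, with
# four profile labels assumed to come from a previous clustering
centers = [[0, 0, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]]
X, labels = make_blobs(n_samples=200, centers=centers,
                       cluster_std=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

# distance-based assignment of held-out patients to the existing profiles
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
print(acc > 0.9)   # True on these well-separated synthetic profiles
```

Because KNN operates purely on neighbour distances, it directly benefits from an embedding in which the profiles are well separated, as PHATE provides.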
List of Figures
2.1 PHATE algorithm (see Figure 2 in [19]). . . . . . . . . . . . . . . . . 8
3.1 PHATE result (in 2D) for baseline MRI data for NL, MCI and Dementia subjects with parameter n_components=3. . . . . . . . . . 15
3.2 PHATE result (in 2D) for baseline (left) and longitudinal (right) MRI data for NL, MCI and Dementia subjects with parameters n_components=3 and decay=5. . . . . . . . . . . . . . . . . . . . . 15
3.3 PHATE vs PCA vs t-SNE, using baseline cross-sectional data. . . . . 16
3.4 Different time points for 3 patients along PHATE trajectory. Progression occurs left to right, from NL to Dementia (as observed in previous Figures). Patient A: progresses from NL to MCI, 4 time points. Patient B: progresses from MCI to Dementia, 4 time points. Patient C: progresses from NL to Dementia, 3 time points. . . . . . 17
3.5 Comparison between the Euclidean distance matrix and the learnt similarity matrix S for cross-sectional baseline data. . . . . . . . . . 21
3.6 Comparison between the Euclidean distance matrix and the learnt similarity matrix S for longitudinal data. . . . . . . . . . . . . . . . 21
3.7 Distribution of the selected features for each cluster for cross-sectional baseline data. . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.8 Distribution of the selected features for each cluster for longitudinal data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.9 Distribution of the selected features for each cluster including cerebral volumetric features from the covariate data for cross-sectional data. The cognitive tests whose significance changed are circled in green. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
B.1 Tukey HSD for cognitive tests ADAS11. . . . . . . . . . . . . . . . 42
B.2 Tukey HSD for cognitive tests ADAS13. . . . . . . . . . . . . . . . 42
B.3 Tukey HSD for cognitive tests CDRSB. . . . . . . . . . . . . . . . . 42
B.4 Tukey HSD for cognitive tests MMSE. . . . . . . . . . . . . . . . . 43
B.5 Tukey HSD for cognitive tests RAVLT Immediate. . . . . . . . . . . 43
List of Tables
2.1 Demographic information of the studied cohort for MRI baseline cross-sectional data. . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Demographic information of the studied cohort for MRI longitudinal data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1 CIMLR analysis over original data space vs. PHATE output space. The values are with respect to the heuristic to estimate the number of clusters from 1 to 10 possible clusters in total. NMI: Normalized Mutual Information. NNE: Nearest Neighbor Error reported at the maximum number of iterations. . . . . . . . . . . . . . . . . . . . . 18
3.2 ANOVA test over all clusters to compare CIMLR method in theoriginal data space vs. PHATE space for cross-sectional baseline data. 19
3.3 ANOVA test over all clusters to compare CIMLR method in theoriginal data space vs. PHATE space for longitudinal data. . . . . . . 20
A.1 Columns used from MRI Alzheimer’s Disease Neuroimaging Initiative data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
A.2 Columns used from FreeSurfer MRI cross-sectional data. . . . . . . . 31
B.1 Cognitive tests and cerebral volumetric columns used from ADNI data. 41
Appendix A
First Appendix
List of cerebral volumes columns used from MRI data
COLUMN NAME DESCRIPTION
Ventricles UCSF Ventricles
Hippocampus UCSF Hippocampus
WholeBrain UCSF WholeBrain
Entorhinal UCSF Entorhinal
Fusiform UCSF Fusiform
MidTemp UCSF Med Temp
ICV UCSF ICV
Table A.1: Columns used from MRI Alzheimer’s Disease Neuroimaging Initiative data.
List of cross-sectional columns used from MRI data
Table A.2: Columns used from FreeSurfer MRI cross-sectional data.
COLUMN NAME DESCRIPTION
Normalised_Ventricles  Ventricles volume normalized with ICV
Normalised_Hippocampus  Hippocampus volume normalized with ICV
Normalised_Entorhinal  Entorhinal volume normalized with ICV
Normalised_Fusiform  Fusiform volume normalized with ICV
Normalised_MidTemp  MidTemp volume normalized with ICV
Normalised_WholeBrain  Whole Brain volume normalized with ICV
ST101SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightPallidum
ST102CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParacentral
ST102SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParacentral
ST102TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParacentral
ST102TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParacentral
ST103CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParahippocampal
ST103SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParahippocampal
ST103TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParahippocampal
ST103TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParahippocampal
ST104CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParsOpercularis
ST104SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParsOpercularis
ST104TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParsOpercularis
ST104TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParsOpercularis
ST105CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParsOrbitalis
ST105SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParsOrbitalis
ST105TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParsOrbitalis
ST105TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParsOrbitalis
ST106CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParsTriangularis
ST106SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParsTriangularis
ST106TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParsTriangularis
ST106TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParsTriangularis
ST107CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPericalcarine
ST107SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPericalcarine
ST107TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPericalcarine
ST107TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPericalcarine
ST108CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPostcentral
ST108SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPostcentral
ST108TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPostcentral
ST108TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPostcentral
ST109CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPosteriorCingulate
ST109SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPosteriorCingulate
ST109TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPosteriorCingulate
ST109TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPosteriorCingulate
ST10CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of Icv
ST110CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPrecentral
ST110SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPrecentral
ST110TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPrecentral
ST110TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPrecentral
ST111CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPrecuneus
ST111SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPrecuneus
ST111TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPrecuneus
ST111TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPrecuneus
ST112SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightPutamen
ST113CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightRostralAnteriorCingulate
ST113SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightRostralAnteriorCingulate
ST113TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightRostralAnteriorCingulate
ST113TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightRostralAnteriorCingulate
ST114CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightRostralMiddleFrontal
ST114SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightRostralMiddleFrontal
ST114TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightRostralMiddleFrontal
ST114TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightRostralMiddleFrontal
ST115CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSuperiorFrontal
ST115SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSuperiorFrontal
ST115TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSuperiorFrontal
ST115TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSuperiorFrontal
ST116CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSuperiorParietal
ST116SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSuperiorParietal
ST116TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSuperiorParietal
ST116TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSuperiorParietal
ST117CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSuperiorTemporal
ST117SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSuperiorTemporal
ST117TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSuperiorTemporal
ST117TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSuperiorTemporal
ST118CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSupramarginal
ST118SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSupramarginal
ST118TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSupramarginal
ST118TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSupramarginal
ST119CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightTemporalPole
ST119SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightTemporalPole
ST119TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightTemporalPole
ST119TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightTemporalPole
ST11SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftAccumbensArea
ST120SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightThalamus
ST121CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightTransverseTemporal
ST121SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightTransverseTemporal
ST121TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightTransverseTemporal
ST121TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightTransverseTemporal
ST124SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightVentralDC
ST125SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightVessel
ST127SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of ThirdVentricle
ST128SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of WMHypoIntensities
ST129CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftInsula
ST129SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftInsula
ST129TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftInsula
ST129TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftInsula
ST12SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftAmygdala
ST130CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightInsula
ST130SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightInsula
ST130TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightInsula
ST130TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightInsula
ST13CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftBankssts
ST13SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftBankssts
ST13TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftBankssts
ST13TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftBankssts
ST14CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftCaudalAnteriorCingulate
ST14SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftCaudalAnteriorCingulate
ST14TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftCaudalAnteriorCingulate
ST14TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftCaudalAnteriorCingulate
ST15CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftCaudalMiddleFrontal
ST15SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftCaudalMiddleFrontal
ST15TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftCaudalMiddleFrontal
ST15TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftCaudalMiddleFrontal
ST16SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftCaudate
ST17SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftCerebellumCortex
ST18SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftCerebellumWM
ST1SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of Brainstem
ST21SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftChoroidPlexus
ST23CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftCuneus
ST23SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftCuneus
ST23TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftCuneus
ST23TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftCuneus
ST24CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftEntorhinal
ST24SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftEntorhinal
ST24TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftEntorhinal
ST24TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftEntorhinal
ST25CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftFrontalPole
ST25SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftFrontalPole
ST25TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftFrontalPole
ST25TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftFrontalPole
ST26CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftFusiform
ST26SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftFusiform
ST26TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftFusiform
ST26TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftFusiform
ST29SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftHippocampus
ST2SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of CorpusCallosumAnterior
ST30SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftInferiorLateralVentricle
ST31CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftInferiorParietal
ST31SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftInferiorParietal
ST31TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftInferiorParietal
ST31TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftInferiorParietal
ST32CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftInferiorTemporal
ST32SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftInferiorTemporal
ST32TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftInferiorTemporal
ST32TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftInferiorTemporal
Appendix B
Second Appendix
List of cognitive test features used for the ANOVA test
COLUMN NAME DESCRIPTION
ADAS11 Alzheimer’s Disease Assessment Scale, 11 questions version
ADAS13 Alzheimer’s Disease Assessment Scale, ADAS11 plus test of delayed word recall and a number cancellation or maze task
CDRSB Clinical Dementia Rating Scale–Sum of Boxes
MMSE Mini-Mental State Exam
RAVLT_immediate Rey Auditory Verbal Learning Test Immediate
Ventricles UCSF Ventricles
Hippocampus UCSF Hippocampus
Entorhinal UCSF Entorhinal
MidTemp UCSF Med Temp
WholeBrain UCSF Whole Brain
Table B.1: Cognitive tests and cerebral volumetric columns used from ADNI data.
Tukey HSD confidence interval plots for cognitive tests
Figure B.1: Tukey HSD for cognitive tests ADAS11.
Figure B.2: Tukey HSD for cognitive tests ADAS13.
Figure B.3: Tukey HSD for cognitive tests CDRSB.
Figure B.4: Tukey HSD for cognitive tests MMSE.
Figure B.5: Tukey HSD for cognitive tests RAVLT Immediate.
Bibliography
[1] “2018 Alzheimer’s disease facts and figures”. In: Alzheimer’s Dementia 14.3 (2018),
pp. 367–429. issn: 1552-5260. doi: https://doi.org/10.1016/j.jalz.2018.02.
001. url: http://www.sciencedirect.com/science/article/pii/S1552526018300414.
[2] Stanisław Adaszewski et al. “How early can we predict Alzheimer’s disease us-
ing computational anatomy?” In: Neurobiology of Aging 34.12 (2013), pp. 2815–
2826. issn: 0197-4580. doi: https://doi.org/10.1016/j.neurobiolaging.
2013.06.015. url: https://www.sciencedirect.com/science/article/pii/
S0197458013002704.
[3] Maryamossadat Aghili et al. “Predictive modeling of longitudinal data for Alzheimer’s
Disease Diagnosis Using RNNs”. In: International Workshop on PRedictive Intelli-
gence In MEdicine. Springer. 2018, pp. 112–119.
[4] Leon M. Aksman et al. “Modeling longitudinal imaging biomarkers with paramet-
ric Bayesian multi-task learning”. In: Human Brain Mapping 40.13 (2019), pp. 3982–
4000. doi: https://doi.org/10.1002/hbm.24682. eprint: https://onlinelibrary.
wiley.com/doi/pdf/10.1002/hbm.24682. url: https://onlinelibrary.wiley.
com/doi/abs/10.1002/hbm.24682.
[5] Michael AA Cox and Trevor F Cox. “Multidimensional scaling”. In: Handbook of
data visualization. Springer, 2008, pp. 315–347.
[6] Ruoxuan Cui, Manhua Liu, and Gang Li. “Longitudinal analysis for Alzheimer’s dis-
ease diagnosis using RNN”. In: 2018 IEEE 15th International Symposium on Biomed-
ical Imaging (ISBI 2018). IEEE. 2018, pp. 1398–1401.
[7] Qunxi Dong et al. “Multi-task Dictionary Learning Based on Convolutional Neural
Networks for Longitudinal Clinical Score Predictions in Alzheimer’s Disease”. In:
International Workshop on Human Brain and Artificial Intelligence. Springer. 2019,
pp. 21–35.
[8] Rashmi Dubey et al. “Analysis of sampling techniques for imbalanced data: An n =
648 ADNI study”. eng. In: NeuroImage 87 (Feb. 2014). S1053-8119(13)01016-1[PII],
pp. 220–241. issn: 1095-9572. doi: 10.1016/j.neuroimage.2013.10.005. url:
https://doi.org/10.1016/j.neuroimage.2013.10.005.
[9] EuroPOND-Consortium. The Alzheimer’s Disease Prediction Of Longitudinal Evo-
lution (TADPOLE) Challenge. https://github.com/noxtoby/TADPOLE. 2017.
[10] Bruce Fischl. “FreeSurfer”. In: Neuroimage 62.2 (2012), pp. 774–781.
[11] Mostafa Mehdipour Ghazi et al. “Training recurrent neural networks robust to in-
complete data: application to Alzheimer’s disease progression modeling”. In: Medical
image analysis 53 (2019), pp. 39–46.
[12] Lev E Givon et al. “Cognitive Subscore Trajectory Prediction in Alzheimer’s Dis-
ease”. In: arXiv preprint arXiv:1706.08491 (2017).
[13] Clifford R Jack et al. “Hypothetical model of dynamic biomarkers of the Alzheimer’s
pathological cascade”. In: The Lancet Neurology 9.1 (2010), pp. 119–128. issn: 1474-
4422. doi: https://doi.org/10.1016/S1474-4422(09)70299-6. url: https:
//www.sciencedirect.com/science/article/pii/S1474442209702996.
[14] Biao Jie et al. “Temporally constrained group sparse learning for longitudinal data
analysis in Alzheimer’s disease”. In: IEEE Transactions on Biomedical Engineering
64.1 (2016), pp. 238–249.
[15] Yejin Kim et al. “Multimodal Phenotyping of Alzheimer’s Disease with Longitudinal
Magnetic Resonance Imaging and Cognitive Function Data”. In: Scientific Reports
10.1 (Mar. 2020), p. 5527. issn: 2045-2322. doi: 10.1038/s41598-020-62263-w.
url: https://doi.org/10.1038/s41598-020-62263-w.
[16] Baiying Lei et al. “Longitudinal analysis for disease progression via simultaneous
multi-relational temporal-fused learning”. In: Frontiers in aging neuroscience 9 (2017),
p. 6.
[17] Gerard Martí-Juan, Gerard Sanroma-Guell, and Gemma Piella. “A survey on ma-
chine and statistical learning for longitudinal analysis of neuroimaging data in Alzheimer’s
disease”. In: Computer Methods and Programs in Biomedicine 189 (2020), p. 105348.
issn: 0169-2607. doi: https://doi.org/10.1016/j.cmpb.2020.105348. url:
https://www.sciencedirect.com/science/article/pii/S0169260719316165.
[18] Gerard Martí-Juan et al. “Revealing heterogeneity of brain imaging phenotypes in
Alzheimer’s disease based on unsupervised clustering of blood marker profiles”. In:
PloS one 14.3 (2019), e0211121.
[19] K. R. Moon et al. Visualizing structure and transitions in high-dimensional biological
data. Dec. 2019.
[20] Kevin R. Moon et al. “Visualizing structure and transitions in high-dimensional bio-
logical data”. eng. In: Nature biotechnology 37.12 (Dec. 2019). PMC7073148[pmcid],
pp. 1482–1492. issn: 1546-1696. doi: 10.1038/s41587-019-0336-3. url: https:
//doi.org/10.1038/s41587-019-0336-3.
[21] Daniele Ramazzotti et al. “Multi-omic tumor data reveal diversity of molecular mech-
anisms that correlate with survival”. In: Nature communications 9.1 (2018), pp. 1–
14.
[22] Jessica Qiuhua Sheng et al. “Predictive Analytics for Care and Management of
Patients With Acute Diseases: Deep Learning–Based Method to Predict Crucial
Complication Phenotypes”. In: J Med Internet Res 23.2 (Feb. 2021), e18372. issn:
1438-8871. doi: 10.2196/18372. url: http://www.ncbi.nlm.nih.gov/pubmed/
33576744.
[23] Laurens Van der Maaten and Geoffrey Hinton. “Visualizing data using t-SNE.” In:
Journal of machine learning research 9.11 (2008).
[24] Victor L Villemagne et al. “Amyloid β deposition, neurodegeneration, and cognitive
decline in sporadic Alzheimer’s disease: a prospective cohort study”. In: The Lancet
Neurology 12.4 (2013), pp. 357–367. issn: 1474-4422. doi: https://doi.org/10.
1016/S1474-4422(13)70044-9. url: http://www.sciencedirect.com/science/
article/pii/S1474442213700449.
[25] Tingyan Wang, Robin G Qiu, and Ming Yu. “Predictive modeling of the progression
of Alzheimer’s disease with recurrent neural networks”. In: Scientific reports 8.1
(2018), pp. 1–12.
[26] Jennifer L. Whitwell et al. “Normalization of Cerebral Volumes by Use of Intracra-
nial Volume: Implications for Longitudinal Quantitative MR Imaging”. In: Ameri-
can Journal of Neuroradiology 22.8 (2001), pp. 1483–1489. issn: 0195-6108. eprint:
http://www.ajnr.org/content/22/8/1483.full.pdf. url: http://www.ajnr.
org/content/22/8/1483.