Master Thesis on Intelligent Interactive Systems
Universitat Pompeu Fabra
Understanding Alzheimer’s disease progression through phenotype discovery using manifold learning techniques
Natalia Karina Pattarone
Supervisor: Gemma Piella
July 2021
Contents
1 Introduction 1
  1.1 Clinical Motivation 1
  1.2 Data-Driven Computational Methods 2
  1.3 Objectives 3
  1.4 Structure of the Report 4
2 Materials and Methods 5
  2.1 Dataset 5
  2.2 MRI acquisition/processing 6
    2.2.1 Feature selection 7
    2.2.2 Dimensionality reduction 7
    2.2.3 Unsupervised clustering 11
  2.3 Phenotype analysis 13
3 Results 14
  3.1 Patient embedding 14
  3.2 Patient phenotyping 18
4 Discussion and Future Work 26
  4.1 Discussion 26
  4.2 Future Work 27
List of Figures 28
List of Tables 30
A First Appendix 31
B Second Appendix 41
Abstract
Alzheimer’s disease (AD) is clinically highly heterogeneous, varying among patients in rate of progression and in cognitive symptoms and test performance, as well as from a neuroimaging perspective. In the datasets provided by The Alzheimer’s Disease Neuroimaging Initiative (ADNI), researchers collect, validate and utilize data, including MRI and PET images, genetics, cognitive tests, and CSF and blood biomarkers, as predictors of the disease. These data allow the discovery of phenotypes that could help to better understand the disease and to provide targeted treatment.

The objective of this thesis is to identify data-driven phenotypes using manifold learning and unsupervised clustering on multimodal longitudinal imaging and non-imaging data. First, we apply a novel approach for dimensionality reduction called PHATE, which captures both local and global nonlinear structure using an information-geometric distance between data points, to facilitate the discovery of possible AD phenotypes. On the PHATE output space, we perform multiple-kernel unsupervised clustering, in which features are weighted to construct kernels, to obtain profiles and describe AD phenotypes. Our results show that our approach can reveal AD progression trajectories in a lower-dimensional space, improving the profiling results: we obtained 4 possible profile subgroups using MRI cross-sectional baseline data and 8 possible profile subgroups using longitudinal data. Furthermore, longitudinal data established a clearer separation among profiles and higher significance for cognitive tests and general cerebral volumetric values than baseline data. Identifying these profiles could be useful for more personalized treatment of such a heterogeneous disease as AD.
Keywords: MRI; Imaging Techniques; Alzheimer; Manifolds; Longitudinal Data; Cross-sectional Data
Chapter 1
Introduction
1.1 Clinical Motivation

Alzheimer’s Disease (AD) is the most common type of dementia. It is a neurodegenerative disease with insidious onset and progressive impairment of behavioral and cognitive functions, including memory, comprehension, language, attention, reasoning, and judgment. There is no cure yet available, and no effective pharmaceutical agents exist to reduce or halt its progression. Individuals with AD experience multiple symptoms that change over the years. These symptoms reflect the degree of damage to neurons in different parts of the brain. The pace at which symptoms advance from mild to moderate to severe varies from person to person. In the severe stages of the disease, individuals become unable to leave bed because of damage to areas of the brain involved in movement. Other brain areas become affected as well, such as those controlling the ability to eat and drink.
The accumulation of the protein fragment amyloid-beta (Aβ) outside neurons (forming amyloid-beta plaques) and the accumulation of an abnormal form of the protein tau inside neurons (forming tau tangles) are two of several brain changes associated with AD [1]. It has been shown that Aβ deposition occurs slowly and in a protracted manner, possibly extending over 20 years [24], so early detection of the disease process is crucial for early therapeutic interventions to delay the onset of clinical symptoms or slow the cognitive decline.
It is a matter of great importance to understand the value of multimodal and longitudinal data. As established above, AD is a multifactorial disease, with distinct progression paths depending on the observed biomarkers [13]. Furthermore, studying this progression at different stages of the disease, as well as for different cognitive impairment levels (Dementia, Mild Cognitive Impairment, etc.), can provide insights by setting a frame to extract meaningful information that groups together factors (e.g., demographic) highlighting subtypes/phenotypes where the specific progressions take place, which is necessary for more personalised treatment. It could help to improve the detection of the disease and provide a better understanding of the interaction between different biological mechanisms and markers.
1.2 Data-Driven Computational Methods

Recently, several data-driven computational approaches have been applied to AD phenotype discovery. Some of them include analysis of longitudinal data to quantify the evolution of the disease, determine temporal trajectories, and detect different paths of degeneration in order to model disease progression. For example, multi-task learning techniques for cognitive prediction [4, 16, 14] focused on combining cognitive score analysis and imaging markers for a more robust model. Another study performed an extensive survey of recent statistical and machine learning applications in AD using longitudinal neuroimaging [17]. Deep learning (DL) techniques, nowadays considered the state of the art within machine learning, are starting to take off: they have been used for cognitive score prediction with convolutional neural networks (CNN) [7, 25, 12] and for computer-aided diagnosis with recurrent neural networks (RNN) [11, 3, 6]. Nevertheless, DL-based algorithms are mostly non-interpretable, and the performance gained by these types of techniques does not compensate for their lack of interpretability; event-based models and manifold learning, among others, are some of the methods that have been used as well (see [17] for an extensive description).
Additionally, a recent study on multimodal phenotyping developed a computational method based on coupled nonnegative matrix factorization (C-NMF) [15] that clusters associated entities from brain regions and cognitive tasks simultaneously (two data modalities), so that the phenotypes reflect both types of information and capture interactions between them. NMF is a dimensionality reduction approach used to find the best approximation of each data modality, so the C-NMF computation has the advantage of respecting the different distributions of the two data modalities.
A related but equally important topic is the fact that AD progression studies face imbalanced outcome distributions in patient data, especially in classifications of Cognitively Normal (CN) vs. Mild Cognitive Impairment (MCI), as such data tend to be available in smaller amounts, probably because these diagnostic groups are sparser in available public and private cohorts, since a long follow-up is needed to correctly assess whether a patient will progress to AD. Facing similar problems, a study aimed at predicting key complication phenotypes among patients with acute diseases developed a novel method using deep learning in combination with RNN sequence embedding to represent disease progression while considering temporal heterogeneities in patient data, including cost-sensitive learning to address the imbalanced outcome distributions [22].
So far, there has been some research on phenotyping using dimensionality reduction approaches such as t-SNE [18], which preserves a range of patterns in (highly dimensional) data but lacks the ability to construct progression tracks that form trajectories, which is highly valuable when studying AD progression. Furthermore, when working with imaging features, the focus is normally on a (cross-sectional) single modality (e.g., MRI or PET). This motivated us to develop a novel phenotype discovery approach using manifold learning techniques, in which the results are extended by considering an underlying geometry consisting of multiple one-dimensional manifolds (i.e., trajectory curves, expecting our data to fit such a geometry given the progressive nature of AD) in combination with a multi-kernel clustering approach to extract and analyze phenotypes. We also focused on baseline and longitudinal progression of neuroimaging because AD is a progressive disease and understanding the neurodegeneration is the main outcome of interest in AD research. Therefore, our objective is to identify data-driven phenotypes using manifold learning on multimodal longitudinal imaging and non-imaging data.
1.3 Objectives

The research and the resulting AD phenotypes cover the following goals:
1. Applying a novel algorithm for dimensionality reduction that assists in the analysis, selection and definition of features that may reveal AD phenotypes.
2. Finding and visualizing any correlations among brain volumetric measures that assist in the differentiation of AD phenotypes, given that structural MRI is considered a clinical predictor of AD [2].
3. Gaining insight into AD progression from a phenotype perspective to support early treatment.
4. Visualizing neuroimaging data in 2D and 3D space to potentially classify thedata points into clusters based on similarities using supervised and unsuper-vised dimensionality reduction techniques.
5. Supporting data stratification.
1.4 Structure of the Report

Chapter 2 introduces the materials and methods used in the thesis, e.g., the dataset descriptions, an overview of the process used to obtain the curated data, and the algorithms utilised. Next, the methods for obtaining results are described, beginning with data processing and followed by the software architecture, software functions, and their usage. Chapter 3 presents the results obtained with the methods of Chapter 2, while Chapter 4 discusses possible areas of future work.
Chapter 2
Materials and Methods
2.1 Dataset

We work with datasets generated by the scripts provided by the TADPOLE (The Alzheimer’s Disease Prediction Of Longitudinal Evolution) Challenge 2017 [9]. The scripts contain detailed processes to parse and unify several Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets. Concretely, we use two of the TADPOLE standard training sets: D1 and D2. The first one (D1) was created from ADNI data to which additional MRI, PET (FDG, AV45 and AV1451), DTI and CSF biomarkers were added. The MRI biomarkers consist of FreeSurfer longitudinally processed ROIs from the UCSFFSL tables. Duplicate rows were removed by retaining those with the most recent RUNDATE and IMAGEUID. Additionally, the following types of PET ROI-based biomarkers were included: FDG, AV45 and AV1451, along with three CSF biomarkers: amyloid-beta, tau and p-tau. The different data sources were unified by matching subjects using the subject ID and visit code, and duplicate rows were removed as well. The second one (D2) includes all currently available longitudinal data for prospective ADNI3 participants (rollovers from ADNI2), that is, active participants with ADNI2 visits.
Informed consent was obtained for all subjects, and the study was approvedby the relevant institutional review board at each data acquisition site (for up-to-date information, see http://adni.loni.usc.edu/wp-content/themes/freshnews-dev-v2/documents/policy/ADNI_Acknowledgement_List%205-29-18.pdf). All meth-ods were performed in accordance with the relevant guidelines and regulations.
2.2 MRI acquisition/processing

We defined an initial benchmark with a set of features plus a diagnostic column (DX, the target) coming only from MRI scans for the baseline measures. We split our work into two different sets of features: cross-sectional data using the baseline visit, and longitudinal data, both obtained using the FreeSurfer Software Suite Version 4.4 [10]. Table A.2 lists every feature used for the MRI cohort. Data pre-processing included:
1. Removal of rows containing NaN values.
2. Normalization of cerebral volumetric measures following [26].
3. Application of undersampling [8] to handle imbalanced data.
4. Standardization (to [0,1]) of the structural volumes.
5. Use of min-max scaling, subtracting the minimum value of each feature and dividing by the difference between the maximum and the minimum. This way, we preserve zero entries and introduce robustness to small standard deviations in the biomarkers.
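The scaling described in steps 4–5 can be sketched per feature with NumPy; the feature matrix below is invented for illustration (rows are subjects, columns are volumetric features):

```python
import numpy as np

# Hypothetical feature matrix: rows = subjects, columns = features.
X = np.array([[1200.0, 3.1],
              [1500.0, 2.4],
              [1350.0, 2.9]])

def min_max_scale(X):
    """Scale each column to [0, 1] by subtracting the column minimum
    and dividing by the column range (assumes no constant column)."""
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    return (X - mins) / ranges

X_scaled = min_max_scale(X)
```

Because the transform only shifts and rescales, zero entries in a nonnegative feature stay at the low end of the range, as the text notes.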
For the MRI cross-sectional baseline data, we ended up with a set of 1350 sub-jects, including 438 cognitive normal (CN), 680 with mild cognitive impairment(MCI) and 231 with AD. We handled unbalanced classes using undersampling tech-niques, obtaining a final set of 231 subjects for each class CN, MCI and AD and atotal of 140 features. Table 2.1 shows the demographic information of the studiedcohort.
CN MCI AD
Nº of subjects 213 213 213
Age (years) 74.04± 5.57 72.43± 7.45 74.11± 7.86
Sex (female) 51.48% 41.61% 45.45%
Education (years) 16.32± 2.71 15.86± 2.90 15.23± 2.89
APOE4 25.74% 39.41% 48.48%
Table 2.1: Demographic information of the studied cohort for MRI baseline cross-sectional data.
For the MRI longitudinal data, we ended up with a set of 5021 subjects, including 1694 cognitively normal (CN), 2434 with mild cognitive impairment (MCI) and 893 with AD. We handled the unbalanced classes using undersampling techniques, obtaining a final set of 893 subjects for each class (CN, MCI and AD) and a total of 152 features. Table 2.2 shows the demographic information of the studied cohort.
CN MCI AD
Nº of subjects 893 893 893
Age (years) 73.98± 5.39 72.65± 7.24 73.79± 7.43
Sex (female) 49.94% 37.62% 46.13%
Education (years) 16.17± 2.76 15.94± 2.93 15.16± 2.90
APOE4 24.30% 37.29% 48.82%
Table 2.2: Demographic information of the studied cohort for MRI longitudinaldata.
2.2.1 Feature selection
To improve the model, we performed feature selection using regularization methods. The original number of features after cleaning the data was 140 for baseline and longitudinal data. We applied l1 (Lasso) regularization with a penalty parameter of C = 0.1 and observed an improvement in the representation of the data (see Section 3 for further details). Concretely, the final cohort using baseline data contains 38 features remaining after feature selection using Linear Support Vector Classification (Linear SVC) with an l1 penalty and regularization parameter equal to 0.1, and the final cohort using longitudinal data contains 87 features remaining after feature selection with the same parametrization.
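A minimal sketch of this l1-penalised Linear SVC selection step with scikit-learn, assuming a synthetic data matrix in place of the cohort (the names `X`, `y` and the signal structure are illustrative only):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))           # 200 subjects, 20 candidate features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels driven by two features only

# l1-penalised linear SVM; C is the regularization parameter from the text.
svc = LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000).fit(X, y)

# Keep only the features with nonzero coefficients.
selector = SelectFromModel(svc, prefit=True)
X_selected = selector.transform(X)
```

The l1 penalty drives most coefficients exactly to zero, so `SelectFromModel` acts as the feature filter; `dual=False` is required by scikit-learn for the l1-penalised formulation.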
2.2.2 Dimensionality reduction
Dimensionality reduction (DR) is widely used in the analysis of high-dimensional data. Given the heterogeneous nature of biological, medical or healthcare data, where hundreds of features are collected, there is a need for methods better suited than classical statistical ones, which are unable to extract the latent underlying representation that remains sparsely submerged in a voluminous high-dimensional space and thus make exhaustive exploration impossible. To this end, we use PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding) [19], a dimensionality reduction method that captures both local and global nonlinear structure using an information-geometric distance between data points. PHATE generates a low-dimensional embedding specifically for visualization that provides an accurate, denoised representation of both local and global structure of a dataset in the required number of dimensions, without forcing any prior assumptions on the structure of the data. It combines techniques from manifold learning, information geometry, and data-driven diffusion geometry. As a result, high-dimensional and nonlinear structures, such as clusters, nonlinear progressions, and branches, become apparent in two or three dimensions and can be extracted for further analysis. Hence, one can expect PHATE to provide an embedding over which cluster/phenotype analysis can be performed and the probability of belonging to each particular cluster computed.
The PHATE algorithm can be summarised in the following steps (see Figure 2.1):
Figure 2.1: PHATE algorithm (see Figure 2 in [19]).
1. Compute the pairwise distances from the data matrix.
A common approach in dimensionality reduction is to embed the raw data matrix in a linear manner to preserve the global structure of the data (e.g., PCA). Nevertheless, in most cases the data are noisy and the global transitions are nonlinear, so such methods are insufficient to capture latent patterns and typically result in a noisy visualization. This is the main motivation of PHATE: to preserve distances between data points that reflect gradual changes along these nonlinear transitions. A typical choice of distance metric is the Euclidean distance; however, as stated before, global Euclidean distances cannot properly reflect transitions in the data, as these are usually nonlinear.
2. Transform the distances into affinities to encode local information.
A common approach to transforming global (e.g., Euclidean) distances into local similarities is to apply a kernel function that quantifies pairwise similarity between points based on their Euclidean distance. A popular choice is the Gaussian kernel, which measures the similarity between data points $x_i$ and $x_j$ as

$$K_\varepsilon(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|_2^2}{2\varepsilon^2}\right)$$

However, embedding local affinities directly can result in a loss of global structure. On the contrary, a faithful structure-preserving embedding (and visualization) needs to go beyond local affinities (or distances) and also consider global relations between parts of the data. To accomplish this, PHATE constructs a diffusion geometry to learn and represent the shape of the data. This construction is based on computing local similarities between data points and then diffusing through the data using a Markovian random-walk diffusion process to infer more global relations. The initial probabilities for the random walk are calculated by normalizing the kernel matrix by its row sums. For the Gaussian kernel described above,

$$\nu_\varepsilon(x_i) = \|K_\varepsilon(x_i, \cdot)\|_1 = \sum_{x_j \in X} K_\varepsilon(x_i, x_j)$$

resulting in an $N \times N$ row-stochastic matrix, referred to as the diffusion operator, whose entries give the probability of moving from $x_i$ to $x_j$ in a single time step:

$$[P_\varepsilon](x_i, x_j) = \frac{K_\varepsilon(x_i, x_j)}{\nu_\varepsilon(x_i)}, \quad x_i, x_j \in X$$
3. Learn global relationships via the diffusion process.
Choosing the bandwidth $\varepsilon$ of $K_\varepsilon(x_i, x_j)$ corresponds to a trade-off between encoding global and local information in the probability matrix $P_\varepsilon$. If the bandwidth is small, single-step transitions of the random walk using $P_\varepsilon$ are confined to the nearest neighbors of each data point. This may result in sparsely sampled regions being entirely excluded, so that the trajectory structure is not encoded in $P_\varepsilon$. Conversely, if the bandwidth is too large, the resulting probability matrix loses local information, as $[P_\varepsilon](x, \cdot)$ becomes more uniform for all $x \in X$, which may result in an inability to resolve different trajectories. A suitable choice is to set $\varepsilon$ to the smallest distance that allows the diffusion process to proceed. Nevertheless, this only covers some specific cases and is hugely affected by outliers and sparse data regions. Furthermore, it relies on a single manifold with constant dimension as the underlying geometry, which is problematic when the data are sampled from trajectories. To overcome this, PHATE uses an adaptive bandwidth, set for each point to its $k$th-nearest-neighbor distance to preserve local information in densely sampled regions, along with an α-decaying kernel that controls the rate of decay of the kernel. The α-decay kernel is defined as follows:
$$K_{k,\alpha}(x_i, x_j) = \frac{1}{2}\exp\left(-\left(\frac{\|x_i - x_j\|_2}{\varepsilon_k(x_i)}\right)^{\alpha}\right) + \frac{1}{2}\exp\left(-\left(\frac{\|x_i - x_j\|_2}{\varepsilon_k(x_j)}\right)^{\alpha}\right)$$

4. Encode the learned relationships using the potential distance.
The algorithm uses a novel diffusion-based informational distance called the potential distance, which increases sensitivity to global relationships and provides more stability at the boundaries of manifolds due to the use of probability distributions. To go from the probability space to the information (or energy) space, a log transformation is applied. This log function renders the distances sensitive to small differences, giving PHATE the ability to preserve both local and manifold-intrinsic global distances optimised for visualization. Mathematically, if $U^t_x = -\log(p^t_x)$ for $x \in X$, then the $t$-step potential distance is defined as

$$V_t(x_i, x_j) = \left\|U^t_{x_i} - U^t_{x_j}\right\|_2, \quad x_i, x_j \in X$$

Through this potential distance, PHATE is sensitive to small differences in the probability distributions that correspond to differences in long-range global structure, allowing it to preserve global manifold relationships.
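A minimal NumPy sketch of the potential distance, assuming an arbitrary row-stochastic matrix stands in for the diffusion operator (in PHATE, $P$ comes from the kernel construction above):

```python
import numpy as np

def potential_distances(P, t):
    """Run the random walk for t steps, map the resulting probabilities
    to potential space with -log, and return pairwise Euclidean
    distances between the potential rows (the matrix V_t)."""
    Pt = np.linalg.matrix_power(P, t)
    U = -np.log(Pt + 1e-12)  # small offset guards against log(0)
    diff = U[:, None, :] - U[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

# Arbitrary row-stochastic matrix as a stand-in diffusion operator.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
P = A / A.sum(axis=1, keepdims=True)
V = potential_distances(P, t=3)
```

By construction the result is a symmetric distance matrix with a zero diagonal, which is what the MDS step that follows expects as input.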
5. Embed the potential distance information into low dimensions for visualization.
The popular approach of using diffusion maps can preserve global structure and denoise the data, but it has a high intrinsic dimensionality not suitable for visualisation. Hence, PHATE compresses the variability into low dimensions using metric multidimensional scaling (MDS [5]), a distance-embedding method. Classical MDS (CMDS) is first used to obtain an initial configuration of the data in a low dimension $m$. However, CMDS assumes that the input distances are low-dimensional Euclidean distances, which is highly restrictive. To relax this assumption, PHATE uses metric MDS, which embeds the data into lower dimensions by minimising a "stress" function over the embedded $m$-dimensional coordinates.
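The CMDS initialization can be sketched via double-centering and eigendecomposition; a toy point set is assumed below, and PHATE would then refine this configuration with metric MDS:

```python
import numpy as np

def classical_mds(D, m=2):
    """Classical MDS: double-centre the squared distance matrix and
    embed using the top-m eigenpairs of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centred points
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:m]         # pick the m largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# For truly Euclidean distances, CMDS recovers the configuration exactly
# (up to rotation), so the embedded distances match the input ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 2))
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
Y = classical_mds(D, m=2)
```

This exact-recovery property holds only for genuinely low-dimensional Euclidean input, which is precisely the restrictive assumption the text says metric MDS relaxes for the potential distances.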
2.2.3 Unsupervised clustering
To find the different profiles, we cluster the patients on the dimensionality reducedspace provided by PHATE, without using diagnosis. To find heterogeneous brainpresentations in each cluster, we analyze the relationships between brain phenotypesand cognitive test information in each cluster.
We used CIMLR to cluster the patients. CIMLR [21] is a method based onmultiple kernel learning that learns a similarity between each pair of patients bycombining different kernels per feature (in our case, PHATE output space coor-dinates). A cluster is then defined as a block structure of the learned similarity,grouping together similar patients, where each subject is positioned with respect tothe whole population according to their degree of similarity. The number of clus-ters C must be specified beforehand. By combining multiple kernels, it integratesthe heterogeneous information, and provides the contribution of each feature in thecomputed low-dimensional representation.
Using PHATE, each patient is embedded in a 3D latent space, where patient $i$ is represented as $y_i \in \mathbb{R}^m$, with $i = 1, \ldots, N$ patients and $m$ features, on which we apply CIMLR. Each dimension $m$ (i.e., feature) is assigned $P$ kernels $K_{mp}$. The method then constructs a set of Gaussian kernels for the dataset by fitting multiple hyperparameters. In total, there are $P$ kernels for each feature $m$, each with different parameters. We define $K_{mp}$ as:
$$K_{mp}(y_i, y_j) = \frac{1}{\varepsilon_{ijm}\sqrt{2\pi}} \exp\left(-\frac{\|y_{im} - y_{jm}\|_2^2}{2\varepsilon_{ijm}^2}\right),$$

$$\mu_{im} = \frac{\sum_{j \in KNN(y_{im})} \|y_{im} - y_{jm}\|_2}{k}, \qquad \varepsilon_{ijm} = \frac{\sigma(\mu_{im} + \mu_{jm})}{2}$$
where $y_{im}$ denotes the $m$-th coordinate of the embedding of patient $i$, $KNN(y_{im})$ represents the $k$ nearest neighbours of subject $i$ with respect to PHATE-space feature $m$, and $\|\cdot\|_2$ is the Euclidean distance. The above procedure is performed for each coordinate independently, using a total of $P = 3$ kernels. Each kernel is determined by a pair of parameters $(\sigma, k)$, where $\sigma$ is a hyperparameter that estimates the variance using the local scales of the distances. Different combinations of $k$ and $\sigma$ provide different kernels.
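A sketch of this per-coordinate kernel construction in NumPy; the $\sigma$ values and $k$ below are illustrative choices, not the ones used by CIMLR:

```python
import numpy as np

def adaptive_gaussian_kernels(y, k=3, sigmas=(1.0, 1.5, 2.0)):
    """Gaussian kernels on one PHATE coordinate with adaptive bandwidth
    eps_ij = sigma * (mu_i + mu_j) / 2, where mu_i is the mean distance
    of point i to its k nearest neighbours. One kernel per sigma."""
    d = np.abs(y[:, None] - y[None, :])          # 1-D coordinate distances
    mu = np.sort(d, axis=1)[:, 1:k + 1].mean(1)  # skip the self-distance
    kernels = []
    for sigma in sigmas:
        eps = sigma * (mu[:, None] + mu[None, :]) / 2
        K = np.exp(-d ** 2 / (2 * eps ** 2)) / (eps * np.sqrt(2 * np.pi))
        kernels.append(K)
    return kernels

rng = np.random.default_rng(0)
Ks = adaptive_gaussian_kernels(rng.normal(size=20))
```

Because $\varepsilon_{ijm}$ is symmetric in $i$ and $j$, each resulting kernel matrix is symmetric, which is what the similarity-learning step that follows requires.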
Then, an optimization problem is solved over the Gaussian kernels to build the pairwise patient similarity matrix. The optimization problem is defined as follows:
$$\min_{S, A, w} \; -\sum_{i,j,p,m} w_{mp} K_{mp}(y_{im}, y_{jm})\, S_{ij} + \gamma\, \mathrm{tr}\!\left(A^T (I_N - S) A\right) + \mu \sum_{m,p} w_{mp} \log w_{mp} + \beta \|S\|_F^2$$

$$\text{subject to } A^T A = I_C, \quad \sum_{m,p} w_{mp} = 1, \quad w_{mp} \geq 0, \quad \sum_j S_{ij} = 1, \quad S_{ij} \geq 0$$
Here, γ, µ, and β are tuning parameters for the various terms of the optimizationfunction, IC represents the identity matrix of size C, ‖.‖F stands for the Frobeniusnorm, and tr denotes the trace of the matrix. The first term of the above equationlinks the learned similarity matrix S with the combination of kernels from all fea-tures: similarity between two samples should be small if their kernel-based distanceis large. The second term enforces S to have C connected components, through theauxiliary matrix A and its associated constraint ATA = IC . The third term imposesa constraint on w so that more than one kernel is selected, and the last term appliesa regularization penalizing the scale of the learned similarities.
We estimate the best number of clusters with the heuristic proposed in [21], and further validate the choice with the elbow method. In addition, we use Normalized Mutual Information (NMI) [14] to evaluate the consistency between the obtained clustering and the true labels of the N patients. Given two clustering results U and V on a set of data points, NMI is defined as $I(U, V)/\max\{H(U), H(V)\}$, where $I(U, V)$ is the mutual information between U and V, and $H(U)$ is the entropy of the clustering U. Specifically, assuming that U has P clusters and V has Q clusters, the mutual information is computed as follows:
$$I(U, V) = \sum_{p=1}^{P} \sum_{q=1}^{Q} \frac{|U_p \cap V_q|}{N} \log \frac{N\, |U_p \cap V_q|}{|U_p| \times |V_q|}$$
where $|U_p|$ and $|V_q|$ denote the cardinality of the p-th cluster in U and the q-th cluster in V, respectively. NMI takes values between 0 and 1, measuring the concordance of two clustering results. Therefore, a higher NMI indicates higher concordance with the ground truth, i.e., a more accurate label assignment of each patient.
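The definition above transcribes directly into NumPy; this is a small sketch (library implementations such as scikit-learn's `normalized_mutual_info_score` exist as well):

```python
import numpy as np

def nmi(u, v):
    """Normalized mutual information I(U,V) / max(H(U), H(V)) between
    two cluster label vectors of equal length."""
    n = len(u)
    I = 0.0
    for p in np.unique(u):
        for q in np.unique(v):
            nij = np.sum((u == p) & (v == q))  # |U_p ∩ V_q|
            if nij > 0:
                I += nij / n * np.log(n * nij / (np.sum(u == p) * np.sum(v == q)))
    def H(labels):
        _, counts = np.unique(labels, return_counts=True)
        f = counts / n
        return -np.sum(f * np.log(f))
    return I / max(H(u), H(v))
```

Note that the score depends only on the partition, not on the label values, so any relabelling of the same clustering still scores 1.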
In addition, CIMLR was evaluated in a supervised setting with 10-fold cross-validation and simple classifiers. In each trial of cross-validation, 9 folds are used as training data and the remaining fold as validation data. The average of the ten validation errors is recorded to show how well the obtained latent features facilitate the classification task. The simplest classifier is the nearest-neighbor classifier, and the corresponding error is called the Nearest Neighbor Error (NNE), whose final value is the average classification error over the ten folds.
2.3 Phenotype analysis

One can assume that subjects belonging to a particular cluster share certain properties, or a profile. PHATE provides a reduced space in which it is easier to find clusters, as similar patients are close to each other; we therefore contrast a CIMLR analysis over the original data space against one over the dimensionality-reduced space provided by PHATE, and compare the NMI and NNE values reported by CIMLR.
Furthermore, we use one-way analysis of variance (ANOVA) tests across all clusters to test whether the population of each cluster has a different mean for the cognitive features and the volumetric data (see Table B.1). In addition, to further validate the significance of the cognitive tests, we performed a pairwise comparison for each cluster pair using Tukey HSD analysis.
We tested for differences in the cognitive tests, brain volumes and cortical thickness of the individuals in each subgroup. We compared each cluster with the rest of the population, without taking diagnostic groups into account, to detect the distinguishing characteristics of each subgroup. We want to know whether cluster membership (independent variable) has significant effects on cognitive tests and cerebral volumetric values (dependent variables).
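Such a per-feature one-way ANOVA can be run with SciPy; the cluster score values below are synthetic stand-ins for a cognitive test, for illustration only:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Hypothetical MMSE-like scores for three clusters (invented numbers).
cluster_a = rng.normal(29, 1.0, size=40)
cluster_b = rng.normal(27, 1.5, size=40)
cluster_c = rng.normal(23, 2.0, size=40)

# One-way ANOVA: does at least one cluster mean differ from the others?
f_stat, p_value = f_oneway(cluster_a, cluster_b, cluster_c)
```

A significant F-statistic only says that some cluster differs, not which one, which is why the Tukey HSD pairwise comparisons mentioned above are used as a follow-up.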
Chapter 3
Results
We applied the proposed method to the described dataset (see Section 2.1). From the TADPOLE (The Alzheimer’s Disease Prediction Of Longitudinal Evolution) Challenge 2017 dataset, we selected a subset including Dementia (AD, Alzheimer’s Disease), MCI and cognitively normal (NL) subjects, using selected features for cross-sectional baseline and longitudinal data produced with the FreeSurfer image analysis suite [10], in addition to cerebral volumetric data coming from merged ADNI data sheets (see Table A.1 for more details). The final cohort using baseline cross-sectional data contains 693 patients with 38 features remaining after feature selection using Linear Support Vector Classification (Linear SVC) with an l1 penalty and regularization parameter equal to 0.1, and the final cohort using longitudinal data contains 2679 patients with 87 features remaining after feature selection with the same parametrization.
3.1 Patient embedding

We applied PHATE as described in Section 2.2.2, parametrized with 3 components (PHATE1, PHATE2 and PHATE3) for the output space. Figure 3.1 shows the output embedding for cross-sectional baseline data. For longitudinal data, each patient at each time point is a separate embedded point, whereas for baseline data each patient is embedded once, at the first visit. We can observe that PHATE was able to separate the different stages of the disease, as expected for baseline data; we can therefore state that PHATE correctly captured the underlying structure of the data. Figure 3.2 shows the effect of adjusting the parameter decay=5, which brings the clusters even closer together. As explained in [20], data coming from sparsely sampled regions might be excluded entirely, losing the trajectory encoding; to overcome this, it is suggested to decrease the parameter decay from its default value of 15 in order to increase connectivity in the graph.
Figure 3.1: PHATE result (in 2D) for baseline MRI data for NL, MCI and De-mentia subjects with parameter n_components=3.
Figure 3.2: PHATE result (in 2D) for baseline (left) and longitudinal (right) MRIdata for NL, MCI and Dementia subjects with parameter n_components=3 anddecay=5.
For comparison, Figure 3.3 shows the embeddings obtained using other dimensionality reduction methods, namely PCA and t-SNE [23], to show the efficacy of PHATE over the most common approaches to dimensionality reduction. PHATE achieves a better and more concise differentiation among the classes than the other approaches. Although one can still observe some class definition with PCA and t-SNE (albeit more sparse), PHATE remains the better visualization of the extracted features for the selected classes.
Figure 3.3: PHATE vs PCA vs t-SNE, using baseline cross-sectional data.
Additionally, in Figure 3.4 one can observe the AD progression trajectory for 3 patients, following the trajectory drawn by PHATE, i.e., from left to right. We can confirm that PHATE correctly captures the trajectories of AD progression for patients with different starting conditions.
Figure 3.4: Different time points for 3 patients along PHATE trajectory. Progres-sion occurs left to right, from NL to Dementia (as observed in previous Figures).Patient A: progresses from NL to MCI, 4 time points. Patient B: progresses fromMCI to Dementia, 4 time points. Patient C: progresses from NL to Dementia, 3time points.
3.2 Patient phenotyping

CIMLR was applied on the PHATE embedding of the patients. Using the heuristic proposed in [18], we found the number of clusters to be C = 4 for cross-sectional baseline data and C = 8 for longitudinal data, which was further validated using the elbow method. We also contrasted a CIMLR analysis over the original data space with one over the reduced space provided by PHATE. From the results in Table 3.1 we observed that the nearest neighbour error (NNE) decreased and the normalized mutual information (NMI) among clusters improved when using the PHATE output space for both data cohorts (cross-sectional baseline and longitudinal). Furthermore, we performed an ANOVA test to validate the results, using the selected covariate features (see Table B.1 for details) on the subjects in each cluster. We found that in the original data space the F-score values are substantially lower than those obtained in PHATE space and, in consequence, the significance is also lower given the higher p-values. In conclusion, CIMLR works better over PHATE space than over the original space, yielding a more robust clustering and a more solid base for AD phenotype discovery.
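CIMLR itself is not part of the standard Python libraries, but the two ingredients of this validation, the elbow heuristic over candidate cluster numbers and the NMI agreement score, can be sketched with scikit-learn using KMeans as a stand-in clustering (all data below is synthetic and illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

# toy stand-in for the 3D PHATE embedding, with four planted subgroups
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0],
                    [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
X, planted = make_blobs(n_samples=200, centers=centers,
                        cluster_std=0.5, random_state=0)

# elbow method: within-cluster sum of squares for C = 1..10
inertia = [KMeans(n_clusters=c, n_init=10, random_state=0).fit(X).inertia_
           for c in range(1, 11)]

# agreement (NMI) between the C = 4 partition and the planted labels
pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
nmi = normalized_mutual_info_score(planted, pred)
print(nmi > 0.9)   # True: the four planted subgroups are recovered
```

In the elbow method, one looks for the value of C after which the inertia curve flattens; here that happens at the planted C = 4.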
                           Nº of Clusters   NMI ↑      NNE ↓
Cross-sectional baseline
  PHATE space              4                0.105098   0.17269
  Original space           10               0.029522   0.21529
Longitudinal
  PHATE space              8                0.115086   0.32789
  Original space           7                0.036932   0.53894

Table 3.1: CIMLR analysis over the original data space vs. the PHATE output space. The values correspond to the heuristic estimating the number of clusters, from 1 to 10 possible clusters in total. NMI: Normalized Mutual Information. NNE: Nearest Neighbor Error, reported at the maximum number of iterations.
FEATURES           Original space            PHATE space
                   F-Score      p-value      F-Score      p-value
ADAS11 10.636454 2.568153e-11 63.279449 3.809737e-36
ADAS13 11.443751 3.209684e-12 75.728769 2.419462e-42
CDRSB 7.964122 2.547480e-08 65.789696 2.045614e-37
MMSE 9.533475 4.425290e-10 75.992485 1.799451e-42
APOE4 1.453479 0.191709 9.180338 0.000006
RAVLT Immediate 9.318811 7.704470e-10 55.39417 4.371492e-32
PTEDUCAT 1.073059 0.377096 8.72747 0.000011
Ventricles 10.759982 1.867709e-11 20.577826 8.862608e-13
Hippocampus 7.521101 7.984990e-08 93.793485 6.595045e-51
Entorhinal 8.206961 1.361176e-08 92.3152 3.177793e-50
Fusiform 8.306421 1.052920e-08 112.794745 2.023828e-59
MidTemp 7.279257 1.488747e-07 139.63699 1.111622e-70
WholeBrain 4.819641 0.000079 148.491416 3.230522e-74
Table 3.2: ANOVA test over all clusters to compare the CIMLR method in the original data space vs. PHATE space for cross-sectional baseline data.
FEATURES           Original space            PHATE space
                   F-Score      p-value      F-Score      p-value
ADAS11 67.167816 4.004632e-80 264.127359 0.0
ADAS13 75.341269 1.026070e-89 307.491943 0.0
CDRSB 62.399786 1.733944e-74 242.756973 1.519370e-302
MMSE 49.928835 1.326469e-59 238.938341 1.865802e-298
APOE4 9.148347 5.670591e-10 50.349837 4.153311e-69
RAVLT Immediate 71.695749 1.896901e-85 208.281372 6.314996e-265
PTEDUCAT 3.34374 0.002741 16.782484 5.513887e-22
Ventricles 137.882663 3.537115e-160 83.699675 4.572521e-114
Hippocampus 82.898148 1.671825e-9 338.011056 0.0
Entorhinal 72.575726 5.351880e-89 314.434008 0.0
Fusiform 74.727329 5.351880e-89 320.921331 0.0
MidTemp 69.771024 3.451619e-83 447.889219 0.0
WholeBrain 62.434583 1.576851e-74 379.896484 0.0
Table 3.3: ANOVA test over all clusters to compare the CIMLR method in the original data space vs. PHATE space for longitudinal data.
The learnt similarity matrix S is shown in Figure 3.5 for cross-sectional baseline data and in Figure 3.6 for longitudinal data (right images). In comparison with a similarity matrix constructed using the Euclidean distance in the original data space (left images), where the clusters are not distinguishable, CIMLR clearly shows the block structure of the different clusters. Nevertheless, for the longitudinal data the clusters are not as clearly separated as in the similarity matrix for the cross-sectional data.
Figure 3.5: Comparison between the Euclidean distance matrix and the learnt similarity matrix S for cross-sectional baseline data.
Figure 3.6: Comparison between the Euclidean distance matrix and the learnt similarity matrix S for longitudinal data.
CIMLR revealed patterns in the single-modality (MRI) imaging features (cross-sectional and longitudinal) that relate to natural subgroups. The method uses all these features to find the clusters but, unlike other methods, it automatically weights each one of them.
Figure 3.7 shows the values of the selected covariate features used to analyze the phenotypes, performing a univariate test that compares each cluster to the rest, for cross-sectional data:
• All cerebral volumetric features show values below p < 0.01 for C4.
• Some cerebral volumetric features show values below p < 0.05 and p < 0.01 for C1, C2 and C3.
• All clusters show p < 0.01 for the Education feature (measured in years).
• C2, C3 and C4 show p < 0.05 for the genetic marker APOE4.
• With the exception of C4 (p < 0.05), none of the other clusters show statistically significant values for the cognitive test RAVLT Immediate.
All the ANOVA tests (over all clusters) for each of the described features (see Table 3.2 for the baseline data, column p-value for PHATE space) reject the null hypothesis with p < 0.001, meaning that the differences found between clusters on those covariate features are statistically significant.
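The F-scores and p-values of Tables 3.2 and 3.3 come from standard one-way ANOVA tests; with SciPy, the per-feature test can be sketched as follows (the group means and sizes below are invented for illustration, not taken from the thesis data):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# synthetic values of one covariate (e.g. a normalized volume) for the
# subjects of four clusters whose means differ
groups = [rng.normal(mu, 1.0, size=50) for mu in (0.0, 0.5, 1.0, 2.0)]

# one-way ANOVA across the four clusters for this feature
f_score, p_value = f_oneway(*groups)
print(p_value < 0.001)   # True: the null hypothesis is rejected
```

Running the same test once per covariate feature, with one sample per cluster, reproduces the structure of the tables above.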
Figure 3.7: Distribution of the selected features for each cluster for cross-sectional baseline data.
On the other hand, Figure 3.8 shows the values for longitudinal data:
• C2, C4 and C7 show values below p < 0.05 for APOE4, and C1 below p < 0.01.
• All clusters but C7 and C8 show values below p < 0.05 for Education, particularly C2 and C4 below p < 0.01.
• C1, C2, C3 and C5 present values below p < 0.01 for Ventricles, and C4 and C6 below p < 0.05.
• Only C7 presents a significant p < 0.05 for Hippocampus.
• Only C4 and C7 present a significant p < 0.05 for Entorhinal.
• C1, C2 and C4 show values below p < 0.05 for Fusiform.
• C4 and C5 show values below p < 0.05 and p < 0.001, respectively, for MidTemp.
• The majority of the clusters, except C1 and C8, show values below p < 0.05 for Whole Brain.
• None of the clusters showed significant values for the selected cognitive tests.
The ANOVA tests (over all clusters) for each of the described features (see Table 3.3 for the longitudinal data, column p-value for PHATE space) reject the null hypothesis with p < 0.001 for the cognitive test features, while the univariate test showed no significance for them. Furthermore, the cerebral volumetric feature Hippocampus was the only one presenting a significant p-value for a single cluster (C7), contrary to what is observed in the univariate test, where the value was p < 0.001 for all the clusters. We further analysed the significance of the cognitive tests for every pairwise cluster comparison with a Tukey HSD analysis. For all the cognitive tests, the null hypothesis can be rejected except for the pairwise comparisons C1-C4, C3-C6 and C7-C8, and additionally C1-C5 for CDRSB only. Appendix B shows the box plots for every cognitive test. For most of the groups we can reject the null hypothesis, confirming that the differences between clusters on the cognitive tests are significant.
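This pairwise analysis can be reproduced with scipy.stats.tukey_hsd, available in recent SciPy releases. Below is a sketch on synthetic scores in which, by construction, two clusters share a mean, mimicking a pair for which the null hypothesis cannot be rejected (the cluster sizes and means are invented for illustration):

```python
from itertools import combinations
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(1)
# synthetic cognitive-test scores for four clusters; C1 and C4 are given
# the same mean on purpose
scores = {"C1": rng.normal(10, 2, 40), "C2": rng.normal(14, 2, 40),
          "C3": rng.normal(18, 2, 40), "C4": rng.normal(10, 2, 40)}

# Tukey's honestly significant difference test over all cluster pairs
res = tukey_hsd(*scores.values())
names = list(scores)
for i, j in combinations(range(4), 2):
    print(f"{names[i]}-{names[j]}: p = {res.pvalue[i, j]:.4f}")
```

Pairs with genuinely different means receive very small p-values, while the C1-C4 pair, which shares a mean, typically does not reach significance.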
Figure 3.8: Distribution of the selected features for each cluster for longitudinal data.
As a side analysis, we also tried including the cerebral volumetric features from the covariate data in the analysis done with PHATE. After feature selection, only the covariate features 'Normalised_Hippocampus', 'Normalised_Entorhinal' and 'Normalised_MidTemp' were retained (from the entire set of cerebral volumetric covariate features). Furthermore, the clustering heuristic found C = 5 (instead of 4 as before) to be the optimal number of clusters, and the NMI and NNE values improved as well: NNE = 0.16606 and NMI = 0.116750. It is interesting to notice that, when adding these more general volumetric measures, the covariate features related to cognitive tests (ADAS11, ADAS13, CDRSB, MMSE and RAVLT Immediate) show statistical significance only on C2, as observed in Figure 3.9.
Figure 3.9: Distribution of the selected features for each cluster, including cerebral volumetric features from the covariate data, for cross-sectional data. The cognitive tests whose significance changed are circled in green.
Chapter 4
Discussion and Future Work
This chapter summarizes the main developments and results achieved in this thesis.In addition, possible future work is proposed.
4.1 Discussion

We have presented a data-driven approach to find AD phenotypes. First, we used PHATE, a manifold-learning-based approach for dimensionality reduction, to model AD trajectories by capturing global and local nonlinear structure between datapoints (patients). The method was originally developed to help visualize problems such as single-cell RNA sequencing differentiation over time, where the number of dimensions is substantially high. This type of situation, often referred to as the curse of dimensionality, is common in biological and medical data, and modeling AD progression is no exception. The feature selection using Lasso with an SVC model addressed this problem for the initial dataset, and PHATE reduced the dimensionality even further. We then applied CIMLR, a weighted multi-kernel unsupervised clustering method, over the reduced space to obtain clusters defining different AD phenotyping profiles from baseline and longitudinal neuroimaging data from ADNI.
We demonstrated that PHATE is a powerful dimensionality reduction approach to portray AD progression trajectories in baseline and longitudinal neuroimaging data. This capability was an advantage when using CIMLR, as it allowed us to construct more accurate profiles for AD phenotyping.
A further contribution is the statistical analysis (univariate and ANOVA) performed over the obtained clusters for cognitive tests and cerebral volume features, two groups of data that are correlated with how thinking abilities (i.e. memory, language, reasoning and perception) deteriorate when developing AD; we found the differences to be significant except in a minority of cases. All the results were assessed with a Tukey HSD post-hoc analysis, which showed high statistical significance in almost all cases.
4.2 Future Work

This work has some limitations worth mentioning. We believe that other imaging modalities, for instance PET scans, could help improve the outcome. Furthermore, our database was somewhat limited, consisting of the datasets provided for the TADPOLE Challenge, which is a reduced dataset rather than a raw selection of data from ADNI. Regarding data pre-processing, handling missing values was reduced to a simple solution, namely removing the rows containing NaN values, and for imbalanced classes we applied an undersampling technique. For the former, a more sophisticated approach could have been applied, such as imputing mean values over missing data; for the latter, other techniques could have been used, such as oversampling combined with k-fold cross-validation.
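Both pre-processing alternatives mentioned above are straightforward to sketch with pandas; the DataFrame and column names below are invented for illustration:

```python
import numpy as np
import pandas as pd

# toy covariate table with missing values and imbalanced diagnosis classes
df = pd.DataFrame({
    "feature": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0, 7.0, 8.0],
    "DX": ["NL", "NL", "NL", "NL", "MCI", "MCI", "MCI", "Dementia"],
})

# mean imputation instead of dropping rows containing NaN values
df["feature"] = df["feature"].fillna(df["feature"].mean())

# random undersampling: trim every class to the size of the smallest one
n_min = int(df["DX"].value_counts().min())
balanced = pd.concat([g.sample(n=n_min, random_state=0)
                      for _, g in df.groupby("DX")])

print(df["feature"].isna().sum(), sorted(balanced["DX"].value_counts().tolist()))
# 0 [1, 1, 1]
```

Oversampling or more sophisticated imputation (e.g. per-class means or model-based imputation) would follow the same pattern, replacing these two steps.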
When performing feature selection, we tried 2 different methods: SVC and Linear Regression, both with a Lasso (l1) penalty. We saw better results when using SVC parametrized with C = 0.1, as its impact on the features was not as radical as that of Linear Regression with the same parametrization. Nevertheless, a full analysis like the one performed in this thesis could provide further insight into the impact of each method.
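A minimal scikit-learn sketch of the two sparse selectors is shown below on synthetic data; only C = 0.1 is taken from the thesis, while the dataset shape and the Lasso alpha are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.svm import LinearSVC

# toy stand-in: 200 subjects, 50 features of which 10 are informative
X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           random_state=0)

# l1-penalized linear SVC with C = 0.1
svc = LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000).fit(X, y)
kept_svc = int((np.abs(svc.coef_) > 1e-8).sum())

# Lasso (l1-penalized linear regression) on the same labels, for comparison
lasso = Lasso(alpha=0.1).fit(X, y)
kept_lasso = int((np.abs(lasso.coef_) > 1e-8).sum())

print(kept_svc, kept_lasso)   # both selectors zero out part of the 50 features
```

Comparing the two surviving feature sets, and how downstream clustering changes with each, would be the full analysis suggested above.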
PHATE helped us visualize AD progression trajectories; however, we had to try several parametrizations to capture a proper trajectory output. In most cases, adjusting the decay parameter to increase connectivity in the graph was sufficient, but we observed that the posterior clustering over the low-dimensional output was highly sensitive to these modifications, resulting in a higher number of clusters and, as a consequence, a negative impact on the metrics.
Unsupervised clustering with CIMLR was done over the output space of PHATE, which generates a low-dimensional space with 3 coordinates. Although the analysis over PHATE space improved with respect to the original space, the weights obtained from the multi-kernel process were not fully exploited: since we only use 3 features (i.e. the 3D coordinates) for clustering, the importance and contribution (weights) of each feature becomes trivial. In that case, a different kind of method could be explored, such as the k-nearest neighbours algorithm (KNN), for which the low-dimensional PHATE space could help, as KNN relies on the distances between neighbours (patients).
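A sketch of this idea with scikit-learn follows; synthetic 3D points stand in for the PHATE coordinates, and the profile labels are assumed to come from a previous clustering step. Note that KNN is a supervised classifier, so here it serves to assign unseen patients to existing profiles by distance in the embedding, rather than to discover the profiles themselves:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# synthetic 3D coordinates standing in for the PHATE embedding, with
# four profile labels assumed to come from a previous clustering
centers = [[0, 0, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]]
X, labels = make_blobs(n_samples=200, centers=centers,
                       cluster_std=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

# distance-based assignment of held-out patients to the existing profiles
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
print(acc > 0.9)   # True on these well-separated synthetic profiles
```

Because KNN operates purely on neighbour distances, it directly benefits from an embedding in which the profiles are well separated, as PHATE provides.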
List of Figures
2.1 PHATE algorithm (see Figure 2 in [19]). . . . . . . . . . . . . . . . . 8
3.1 PHATE result (in 2D) for baseline MRI data for NL, MCI and Dementia subjects with parameter n_components=3. . . . . . . . . . 15
3.2 PHATE result (in 2D) for baseline (left) and longitudinal (right) MRI data for NL, MCI and Dementia subjects with parameters n_components=3 and decay=5. . . . . . . . . . . . . . . . . . . . . 15
3.3 PHATE vs PCA vs t-SNE, using baseline cross-sectional data. . . . . 16
3.4 Different time points for 3 patients along PHATE trajectory. Progression occurs left to right, from NL to Dementia (as observed in previous Figures). Patient A: progresses from NL to MCI, 4 time points. Patient B: progresses from MCI to Dementia, 4 time points. Patient C: progresses from NL to Dementia, 3 time points. . . . . . 17
3.5 Comparison between the Euclidean distance matrix and the learnt similarity matrix S for cross-sectional baseline data. . . . . . . . . . 21
3.6 Comparison between the Euclidean distance matrix and the learnt similarity matrix S for longitudinal data. . . . . . . . . . . . . . . . 21
3.7 Distribution of the selected features for each cluster for cross-sectional baseline data. . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.8 Distribution of the selected features for each cluster for longitudinal data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.9 Distribution of the selected features for each cluster including cerebral volumetric features from the covariate data for cross-sectional data. The cognitive tests whose significance changed are circled in green. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
B.1 Tukey HSD for cognitive tests ADAS11. . . . . . . . . . . . . . . . 42
B.2 Tukey HSD for cognitive tests ADAS13. . . . . . . . . . . . . . . . 42
B.3 Tukey HSD for cognitive tests CDRSB. . . . . . . . . . . . . . . . . 42
B.4 Tukey HSD for cognitive tests MMSE. . . . . . . . . . . . . . . . . 43
B.5 Tukey HSD for cognitive tests RAVLT Immediate. . . . . . . . . . . 43
List of Tables
2.1 Demographic information of the studied cohort for MRI baseline cross-sectional data. . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Demographic information of the studied cohort for MRI longitudinal data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1 CIMLR analysis over original data space vs. PHATE output space. The values are with respect to the heuristic to estimate the number of clusters from 1 to 10 possible clusters in total. NMI: Normalized Mutual Information. NNE: Nearest Neighbor Error reported at the maximum number of iterations. . . . . . . . . . . . . . . . . . . . . 18
3.2 ANOVA test over all clusters to compare CIMLR method in theoriginal data space vs. PHATE space for cross-sectional baseline data. 19
3.3 ANOVA test over all clusters to compare CIMLR method in theoriginal data space vs. PHATE space for longitudinal data. . . . . . . 20
A.1 Columns used from MRI Alzheimer’s Disease Neuroimaging Initiative data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
A.2 Columns used from FreeSurfer MRI cross-sectional data. . . . . . . . 31
B.1 Cognitive tests and cerebral volumetric columns used from ADNI data. 41
Appendix A
First Appendix
List of cerebral volumes columns used from MRI data
COLUMN NAME DESCRIPTION
Ventricles UCSF Ventricles
Hippocampus UCSF Hippocampus
WholeBrain UCSF WholeBrain
Entorhinal UCSF Entorhinal
Fusiform UCSF Fusiform
MidTemp UCSF Med Temp
ICV UCSF ICV
Table A.1: Columns used from MRI Alzheimer’s Disease Neuroimaging Initiative data.
List of cross-sectional columns used from MRI data
Table A.2: Columns used from FreeSurfer MRI cross-sectional data.
COLUMN NAME DESCRIPTION
Normalised_Ventricles  Ventricles volume normalized with ICV
Normalised_Hippocampus  Hippocampus volume normalized with ICV
Normalised_Entorhinal  Entorhinal volume normalized with ICV
Normalised_Fusiform  Fusiform volume normalized with ICV
Normalised_MidTemp  MidTemp volume normalized with ICV
Normalised_WholeBrain  Whole Brain volume normalized with ICV
ST101SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightPallidum
ST102CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParacentral
ST102SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParacentral
ST102TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParacentral
ST102TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParacentral
ST103CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParahippocampal
ST103SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParahippocampal
ST103TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParahippocampal
ST103TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParahippocampal
ST104CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParsOpercularis
ST104SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParsOpercularis
ST104TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParsOpercularis
ST104TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParsOpercularis
ST105CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParsOrbitalis
ST105SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParsOrbitalis
ST105TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParsOrbitalis
ST105TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParsOrbitalis
ST106CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightParsTriangularis
ST106SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightParsTriangularis
ST106TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightParsTriangularis
ST106TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightParsTriangularis
ST107CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPericalcarine
ST107SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPericalcarine
ST107TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPericalcarine
ST107TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPericalcarine
ST108CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPostcentral
ST108SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPostcentral
ST108TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPostcentral
ST108TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPostcentral
ST109CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPosteriorCingulate
ST109SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPosteriorCingulate
ST109TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPosteriorCingulate
ST109TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPosteriorCingulate
ST10CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of Icv
ST110CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPrecentral
ST110SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPrecentral
ST110TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPrecentral
ST110TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPrecentral
ST111CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightPrecuneus
ST111SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightPrecuneus
ST111TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightPrecuneus
ST111TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightPrecuneus
ST112SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightPutamen
ST113CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightRostralAnteriorCingulate
ST113SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightRostralAnteriorCingulate
ST113TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightRostralAnteriorCingulate
ST113TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightRostralAnteriorCingulate
ST114CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightRostralMiddleFrontal
ST114SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightRostralMiddleFrontal
ST114TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightRostralMiddleFrontal
ST114TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightRostralMiddleFrontal
ST115CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSuperiorFrontal
ST115SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSuperiorFrontal
ST115TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSuperiorFrontal
ST115TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSuperiorFrontal
ST116CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSuperiorParietal
ST116SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSuperiorParietal
ST116TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSuperiorParietal
ST116TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSuperiorParietal
ST117CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSuperiorTemporal
ST117SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSuperiorTemporal
ST117TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSuperiorTemporal
ST117TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSuperiorTemporal
ST118CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightSupramarginal
ST118SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightSupramarginal
ST118TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightSupramarginal
ST118TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightSupramarginal
ST119CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightTemporalPole
ST119SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightTemporalPole
ST119TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightTemporalPole
ST119TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightTemporalPole
ST11SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftAccumbensArea
ST120SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightThalamus
ST121CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightTransverseTemporal
ST121SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightTransverseTemporal
ST121TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightTransverseTemporal
ST121TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightTransverseTemporal
ST124SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightVentralDC
ST125SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of RightVessel
ST127SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of ThirdVentricle
ST128SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of WMHypoIntensities
ST129CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftInsula
ST129SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftInsula
ST129TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftInsula
ST129TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftInsula
ST12SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftAmygdala
ST130CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of RightInsula
ST130SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of RightInsula
ST130TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of RightInsula
ST130TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of RightInsula
ST13CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftBankssts
ST13SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftBankssts
ST13TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftBankssts
ST13TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftBankssts
ST14CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftCaudalAnteriorCingulate
ST14SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftCaudalAnteriorCingulate
ST14TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftCaudalAnteriorCingulate
ST14TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftCaudalAnteriorCingulate
ST15CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftCaudalMiddleFrontal
ST15SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftCaudalMiddleFrontal
ST15TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftCaudalMiddleFrontal
ST15TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftCaudalMiddleFrontal
ST16SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftCaudate
ST17SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftCerebellumCortex
ST18SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftCerebellumWM
ST1SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of Brainstem
ST21SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftChoroidPlexus
ST23CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftCuneus
ST23SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftCuneus
ST23TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftCuneus
ST23TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftCuneus
ST24CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftEntorhinal
ST24SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftEntorhinal
ST24TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftEntorhinal
ST24TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftEntorhinal
ST25CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftFrontalPole
ST25SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftFrontalPole
ST25TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftFrontalPole
ST25TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftFrontalPole
ST26CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftFusiform
ST26SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftFusiform
ST26TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftFusiform
ST26TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftFusiform
ST29SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftHippocampus
ST2SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of CorpusCallosumAnterior
ST30SV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (WM Parcellation) of LeftInferiorLateralVentricle
ST31CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftInferiorParietal
ST31SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftInferiorParietal
ST31TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftInferiorParietal
ST31TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftInferiorParietal
ST32CV_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Volume (Cortical Parcellation) of LeftInferiorTemporal
ST32SA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Surface Area of LeftInferiorTemporal
ST32TA_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Average of LeftInferiorTemporal
ST32TS_UCSFFSX_11_02_15_UCSFFSX51_08_01_16  Cortical Thickness Standard Deviation of LeftInferiorTemporal
Appendix B
Second Appendix
List of cognitive test features used for the ANOVA test
COLUMN NAME DESCRIPTION
ADAS11 Alzheimer’s Disease Assessment Scale, 11 questions version
ADAS13 Alzheimer’s Disease Assessment Scale, ADAS11 plus test of delayed word recall and a number cancellation or maze task
CDRSB Clinical Dementia Rating Scale–Sum of Boxes
MMSE Mini-Mental State Exam
RAVLT_immediate Rey Auditory Verbal Learning Test Immediate
Ventricles UCSF Ventricles
Hippocampus UCSF Hippocampus
Entorhinal UCSF Entorhinal
MidTemp UCSF Med Temp
WholeBrain UCSF Whole Brain
Table B.1: Cognitive tests and cerebral volumetric columns used from ADNI data.
Tukey HSD confidence interval plots for cognitive tests
Figure B.1: Tukey HSD for cognitive tests ADAS11.
Figure B.2: Tukey HSD for cognitive tests ADAS13.
Figure B.3: Tukey HSD for cognitive tests CDRSB.
Figure B.4: Tukey HSD for cognitive tests MMSE.
Figure B.5: Tukey HSD for cognitive tests RAVLT Immediate.
Bibliography
[1] “2018 Alzheimer’s disease facts and figures”. In: Alzheimer’s Dementia 14.3 (2018),
pp. 367–429. issn: 1552-5260. doi: https://doi.org/10.1016/j.jalz.2018.02.
001. url: http://www.sciencedirect.com/science/article/pii/S1552526018300414.
[2] Stanisław Adaszewski et al. “How early can we predict Alzheimer’s disease us-
ing computational anatomy?” In: Neurobiology of Aging 34.12 (2013), pp. 2815–
2826. issn: 0197-4580. doi: https://doi.org/10.1016/j.neurobiolaging.
2013.06.015. url: https://www.sciencedirect.com/science/article/pii/
S0197458013002704.
[3] Maryamossadat Aghili et al. “Predictive modeling of longitudinal data for Alzheimer’s
Disease Diagnosis Using RNNs”. In: International Workshop on PRedictive Intelli-
gence In MEdicine. Springer. 2018, pp. 112–119.
[4] Leon M. Aksman et al. “Modeling longitudinal imaging biomarkers with paramet-
ric Bayesian multi-task learning”. In: Human Brain Mapping 40.13 (2019), pp. 3982–
4000. doi: https://doi.org/10.1002/hbm.24682. eprint: https://onlinelibrary.
wiley.com/doi/pdf/10.1002/hbm.24682. url: https://onlinelibrary.wiley.
com/doi/abs/10.1002/hbm.24682.
[5] Michael AA Cox and Trevor F Cox. “Multidimensional scaling”. In: Handbook of
data visualization. Springer, 2008, pp. 315–347.
[6] Ruoxuan Cui, Manhua Liu, and Gang Li. “Longitudinal analysis for Alzheimer’s dis-
ease diagnosis using RNN”. In: 2018 IEEE 15th International Symposium on Biomed-
ical Imaging (ISBI 2018). IEEE. 2018, pp. 1398–1401.
[7] Qunxi Dong et al. “Multi-task Dictionary Learning Based on Convolutional Neural
Networks for Longitudinal Clinical Score Predictions in Alzheimer’s Disease”. In:
International Workshop on Human Brain and Artificial Intelligence. Springer. 2019,
pp. 21–35.
[8] Rashmi Dubey et al. “Analysis of sampling techniques for imbalanced data: An n =
648 ADNI study”. eng. In: NeuroImage 87 (Feb. 2014). S1053-8119(13)01016-1[PII],
pp. 220–241. issn: 1095-9572. doi: 10.1016/j.neuroimage.2013.10.005. url:
https://doi.org/10.1016/j.neuroimage.2013.10.005.
[9] EuroPOND-Consortium. The Alzheimer’s Disease Prediction Of Longitudinal Evo-
lution (TADPOLE) Challenge. https://github.com/noxtoby/TADPOLE. 2017.
[10] Bruce Fischl. “FreeSurfer”. In: Neuroimage 62.2 (2012), pp. 774–781.
[11] Mostafa Mehdipour Ghazi et al. “Training recurrent neural networks robust to in-
complete data: application to Alzheimer’s disease progression modeling”. In: Medical
image analysis 53 (2019), pp. 39–46.
[12] Lev E Givon et al. “Cognitive Subscore Trajectory Prediction in Alzheimer’s Dis-
ease”. In: arXiv preprint arXiv:1706.08491 (2017).
[13] Clifford R Jack et al. “Hypothetical model of dynamic biomarkers of the Alzheimer’s
pathological cascade”. In: The Lancet Neurology 9.1 (2010), pp. 119–128. issn: 1474-
4422. doi: https://doi.org/10.1016/S1474-4422(09)70299-6. url: https:
//www.sciencedirect.com/science/article/pii/S1474442209702996.
[14] Biao Jie et al. “Temporally constrained group sparse learning for longitudinal data
analysis in Alzheimer’s disease”. In: IEEE Transactions on Biomedical Engineering
64.1 (2016), pp. 238–249.
[15] Yejin Kim et al. “Multimodal Phenotyping of Alzheimer’s Disease with Longitudinal
Magnetic Resonance Imaging and Cognitive Function Data”. In: Scientific Reports
10.1 (Mar. 2020), p. 5527. issn: 2045-2322. doi: 10.1038/s41598-020-62263-w.
url: https://doi.org/10.1038/s41598-020-62263-w.
[16] Baiying Lei et al. “Longitudinal analysis for disease progression via simultaneous
multi-relational temporal-fused learning”. In: Frontiers in aging neuroscience 9 (2017),
p. 6.
[17] Gerard Martí-Juan, Gerard Sanroma-Guell, and Gemma Piella. “A survey on ma-
chine and statistical learning for longitudinal analysis of neuroimaging data in Alzheimer’s
disease”. In: Computer Methods and Programs in Biomedicine 189 (2020), p. 105348.
issn: 0169-2607. doi: https://doi.org/10.1016/j.cmpb.2020.105348. url:
https://www.sciencedirect.com/science/article/pii/S0169260719316165.
[18] Gerard Martí-Juan et al. “Revealing heterogeneity of brain imaging phenotypes in
Alzheimer’s disease based on unsupervised clustering of blood marker profiles”. In:
PloS one 14.3 (2019), e0211121.
[19] K. R. Moon et al. Visualizing structure and transitions in high-dimensional biological
data. Dec. 2019.
[20] Kevin R. Moon et al. “Visualizing structure and transitions in high-dimensional bio-
logical data”. eng. In: Nature biotechnology 37.12 (Dec. 2019). PMC7073148[pmcid],
pp. 1482–1492. issn: 1546-1696. doi: 10.1038/s41587-019-0336-3. url: https:
//doi.org/10.1038/s41587-019-0336-3.
[21] Daniele Ramazzotti et al. “Multi-omic tumor data reveal diversity of molecular mech-
anisms that correlate with survival”. In: Nature communications 9.1 (2018), pp. 1–
14.
[22] Jessica Qiuhua Sheng et al. “Predictive Analytics for Care and Management of
Patients With Acute Diseases: Deep Learning–Based Method to Predict Crucial
Complication Phenotypes”. In: J Med Internet Res 23.2 (Feb. 2021), e18372. issn:
1438-8871. doi: 10.2196/18372. url: http://www.ncbi.nlm.nih.gov/pubmed/
33576744.
[23] Laurens Van der Maaten and Geoffrey Hinton. “Visualizing data using t-SNE.” In:
Journal of machine learning research 9.11 (2008).
[24] Victor L Villemagne et al. “Amyloid β deposition, neurodegeneration, and cognitive
decline in sporadic Alzheimer’s disease: a prospective cohort study”. In: The Lancet
Neurology 12.4 (2013), pp. 357–367. issn: 1474-4422. doi: https://doi.org/10.
1016/S1474-4422(13)70044-9. url: http://www.sciencedirect.com/science/
article/pii/S1474442213700449.
[25] Tingyan Wang, Robin G Qiu, and Ming Yu. “Predictive modeling of the progression
of Alzheimer’s disease with recurrent neural networks”. In: Scientific reports 8.1
(2018), pp. 1–12.
[26] Jennifer L. Whitwell et al. “Normalization of Cerebral Volumes by Use of Intracra-
nial Volume: Implications for Longitudinal Quantitative MR Imaging”. In: Ameri-
can Journal of Neuroradiology 22.8 (2001), pp. 1483–1489. issn: 0195-6108. eprint:
http://www.ajnr.org/content/22/8/1483.full.pdf. url: http://www.ajnr.
org/content/22/8/1483.