Machine learning for automatic Alzheimer’s disease detection: addressing domain shift issues for building robust models

Alzheimer’s disease (AD) is a type of brain disease that affects a person’s ability to perform daily tasks. Modern neuroimaging techniques have made it possible to detect structural and functional changes in the brain that are linked to AD, and machine learning (ML)-based methods have been extensively developed to help physicians achieve fast and accurate imaging-based AD detection. One critical issue when deploying ML methods in clinical applications is the domain shift that exists between the training and test data, which may significantly attenuate a model’s performance. To resolve this issue, domain adaptation (DA) is needed to narrow the performance gap between data from domains with different distributions. The purpose of this review is to offer insight into the state of ML and DA research in the field of neuroimaging-based AD detection. The limitations of existing studies, as well as opportunities for future studies, are discussed with the hope that more investigations will be conducted in the future to optimize the clinical workflow for AD diagnosis and treatment.


INTRODUCTION
Alzheimer's disease (AD) is the most frequent cause of dementia among the elderly. AD is a neurologic disorder that affects patient memory and cognitive skills [1]. Based on the 2022 World Alzheimer Report, AD is among the 7 major causes of death globally and 12.7 million people ≥ 65 years of age will be diagnosed with AD by 2050 [2]. A recent national cross-sectional survey showed that 15.07 million Chinese people ≥ 60 years of age have dementia (including 9.83 million with AD, 3.92 million with vascular dementia, and 1.32 million with other forms of dementia), making AD the 5th leading cause of death in China. Additionally, according to national research, the annual cost of treating AD patients in China in 2015 was 167.74 billion USD, and the growing treatment expenditures are projected to reach 1.8 trillion USD by 2050 [2,3]. Although there are currently no cures for AD, various therapies have been used to delay the development of some symptoms, such as memory loss, and decrease the cognitive impact on patients [4,5]. Therefore, detecting AD in the early stages is a critical challenge [4,6].
Numerous studies have been conducted to detect AD using neuroimaging methods, including magnetic resonance imaging (MRI) [7-9] and positron emission tomography (PET) [10,11]. Nevertheless, it is both resource-intensive and time-consuming to analyze the acquired brain images due to the amount and complexity of the data. To this end, computer-aided diagnosis systems have been built to help identify biomarkers automatically [6]. The detection of AD has greatly benefited from machine learning (ML), which is particularly helpful when working with complicated and abundant data [12,13]. The performance of classical ML algorithms may be limited by the hand-crafted feature extraction process, which makes it difficult to comprehensively mine the available data [5]. In contrast, deep learning (DL) techniques, such as convolutional neural networks (CNNs), have generated impressive results in AD detection due to the ability to automatically extract massive features from the input data [5].
Although DL performance in neuroimaging-based AD detection has been reported to be promising, obtaining sufficient labeled imaging data to train DL models can be expensive and challenging. Most learning-based methods presume that the training (source domain) and test (target domain) sets have the same distribution such that an algorithm that has been optimized on the training set can be successfully applied to the test set. In practice, this assumption does not always hold because of domain shift, which is caused by various factors, including different scanning machines, different imaging protocols, different sites (hospitals), or even different imaging modalities. Domain shift may result in poor generalization performance when a model that has been successfully trained on the source domain is deployed to an unseen target domain [14]. To resolve these issues, transfer learning (TL) methods can be employed; the aim of TL is to leverage large amounts of data (e.g., ImageNet) to address various challenges with a smaller dataset [6,14,15]. Domain adaptation (DA) is one type of TL. DA refers specifically to a learning technique that aims to improve model performance on a target domain by using the knowledge learned from a source domain. DA has drawn increasing attention as a potential solution for the domain shift issue for medical image datasets because DA attempts to narrow the performance gap between different, but related domains [1,6].
DA has an important role in developing robust and clinically applicable ML-based AD detection models. Therefore, the main objectives of this paper are to provide a comprehensive review of the existing literature on DA-related ML method development for detecting AD and to discuss potential future research directions in this field.

DIFFERENT STAGES OF AD
AD is a neurologic disease that influences brain activity and gradually damages brain cells, leading to memory loss, instability in daily life, and ultimately death [16]. AD is thought to be a complex illness with multiple risk factors, including increased age, genetic factors, head trauma, vascular illnesses, infections, and environmental variables (e.g., heavy and trace metals) [17,18]. According to the National Institute on Aging-Alzheimer's Association (NIA-AA), the clinical phases of AD can be divided into three major stages: preclinical AD; mild cognitive impairment (MCI) due to AD; and dementia due to AD, often known as Alzheimer's dementia. Alzheimer's dementia is further classified into three sub-stages: AD with mild dementia (mild AD); AD with moderate dementia (moderate AD); and AD with severe dementia (advanced AD) [19]. Different symptoms and time durations can be observed during the different AD stages (Figure 1).

CLINICAL TECHNIQUES TO DETECT AD
Prior to the discovery of distinctive plaques and tangles in the brain under a microscope, AD could not be diagnosed until the patient had passed away. Currently, neurologists, neuroradiologists, and researchers can more accurately diagnose AD in a patient while they are still alive [20]. Several different techniques are available for diagnosing AD [21,22].
The clinical symptoms of AD, such as a deterioration in cognitive and functional levels with a prominent disturbance of memory and other mental skills, can be identified through clinical assessment and blood test results. These are general symptoms of different types of dementia, however, which makes it challenging to differentiate between AD and other diagnoses. Therefore, histopathologic verification is still required for a definite identification. Neuroimaging, on the other hand, serves a critical role because the brain can be observed in a non-invasive manner. Neuroimaging can also assist in monitoring the development of disease and the impact of treatment [23,24].
Nevertheless, neuroimaging data are commonly high-dimensional and complex, which poses difficulties during manual inspection. Computer-aided diagnosis systems are highly desired to help physicians achieve fast and accurate neuroimaging data analysis. ML techniques have had great success in different fields in recent decades, including medical and neuroimaging fields [25-27], and the ability and accuracy of large-scale complicated data analyses have been significantly improved due to recent developments in DL techniques [21,28-31]. Essential obstacles, however, still prevent the direct and efficient application of DL algorithms in the clinical setting. Labeled medical datasets are scarce because annotating them is a labor-intensive, costly, and time-consuming procedure that requires neurologists, neuroradiologists, and other experts [5]. In addition, due to privacy issues, sharing data between different hospitals or medical centers is often not feasible [32]. It is thus not surprising that a DL model built on data collected from one center fails to achieve the expected performance when applied to data collected from another center. This phenomenon is known as the domain shift issue, which needs to be addressed to build large-scale clinically applicable DL models. DA is a type of TL that can successfully address the problems of domain distribution disparity and insufficient annotated data [33].

Machine learning for AD detection
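Because building ML models requires large quantities of labeled training data, most existing studies have adopted TL methods to address the data limitation problem [33,34] (Table 1). Hon et al. [35] used the OASIS MRI brain imaging dataset to retrain the VGG-16 and Inception V4 classifiers to identify AD after pretraining on ImageNet, and Acharya et al. [36] selected TL methods to classify MRI images of AD patients into various categories. These models include retrained versions of VGG-16, ResNet-50, and AlexNet. Khan et al. [37] utilized ImageNet to pretrain a VGG model, and then employed layer-wise TL to retrain the top layers after freezing the lower-level layers. Aderghal et al. [38] used weak Gaussian blurring and shift translation as an augmentation technique to increase the sample size, then fine-tuned the model to distinguish AD from MCI and cognitively normal (CN) individuals using diffusion tensor imaging (DTI) after training the model using structural MRI (sMRI) data. Wee et al. [39] proposed a spectral graph CNN (graph-CNN) to detect MCI and AD using the ADNI-2 dataset, and the best-performing models were retrained to distinguish between AD and MCI on the ADNI-1 and Asian datasets. Ebrahimi-Ghahnavieh et al. [40] detected AD in 3D MRI images by first training a 2D CNN using ImageNet, then fine-tuning the CNN with ADNI data to extract discriminatory features from 2D MRI scans. A long short-term memory (LSTM) module then received the features obtained from the 2D CNN and integrated the spatial relationships among the MRI images during the classification process. Ebrahimi et al. [41] pretrained a 2D CNN (ResNet-18) using ImageNet before moving the learnable parameters to a 3D ResNet-18 by repeatedly copying the 2D filters into the third dimension. This practice served as an alternative to slicing 3D MRI data into 2D slices for AD classification. Then, these parameters were optimized as the entire model was trained using the MRI dataset. Furthermore, Zaabi et al. [42] developed a two-stage process in which images were partitioned into distinct blocks to capture the area of the brain containing the hippocampus using regions of interest (ROIs). Then, during the second stage, Zaabi et al. [42] classified the MRI images using two classifiers (randomly initialized CNN and TL models). The TL model leveraged the features obtained from the ImageNet-pretrained AlexNet, and Zaabi et al. [42] reported that the TL model provided more accurate results. Abed et al. [43] used three DL models (VGG-19, Inception v3, and ResNet-50) that were initially trained on ImageNet and subsequently fine-tuned on ADNI data to identify AD, MCI, and CN from sMRI images. Because the hippocampus is the initial region of the brain to be impacted by AD, atrophy of the hippocampus can be recognized using sMRI and DTI by preserving the shape, but reversing the appearance. As a result, Aderghal et al. [44] used sMRI data to train a 2-D+ model [45], which was then fine-tuned using DTI images. Aderghal et al. [44] also used the LeNet model, which was previously trained on Modified National Institute of Standards and Technology (MNIST) data before being applied to sMRI and DTI data. A Deep Transfer Ensemble (DTE) network was established by Tanveer et al. [5] to classify AD; the DTE combines DL, TL, and ensemble learning. The described network can be viewed as a collection of model snapshots that are created by training a model with a random set of hyper-parameters. Ashraf et al. [46] used a number of data augmentation approaches to expand and improve the dataset for feature extraction, and a number of DL models that have already been trained, such as spiking neural networks, DenseNet, MobileNet, SqueezeNet, ResNet, VGG, and GoogLeNet, were retrained to provide diagnoses for AD patients.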
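The step of copying 2D filters into the third dimension, described above for [41], can be illustrated with a minimal sketch. The function name `inflate_2d_kernel`, the nested-list kernel representation, and the optional `rescale` flag are assumptions made for illustration and are not taken from [41]; dividing by the depth is one common choice that keeps the response to a constant input unchanged.

```python
def inflate_2d_kernel(kernel_2d, depth, rescale=True):
    """Turn a 2D kernel (list of rows) into a 3D kernel by repeating it.

    The 2D kernel is copied `depth` times along a new leading axis.
    With rescale=True (an assumption, not stated in the reviewed work)
    each copy is divided by `depth` so the summed response is preserved.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    scale = 1.0 / depth if rescale else 1.0
    return [[[w * scale for w in row] for row in kernel_2d]
            for _ in range(depth)]


# Hypothetical 3x3 edge filter standing in for a pretrained 2D filter.
kernel_2d = [[1.0, 0.0, -1.0],
             [2.0, 0.0, -2.0],
             [1.0, 0.0, -1.0]]

kernel_3d = inflate_2d_kernel(kernel_2d, depth=3)
# Shape of the inflated kernel: depth x rows x columns.
print(len(kernel_3d), len(kernel_3d[0]), len(kernel_3d[0][0]))  # 3 3 3
```

In a real network the same operation would be applied to every filter of every convolutional layer before the 3D model is trained further on the MRI data.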
All of these existing studies reported promising AD detection performance when utilizing different neural networks. Nevertheless, the researchers ignored the issue of domain shift during model testing, which may lead to a deteriorated inference performance.

Domain adaptation for AD detection
DA methods have been developed to solve the domain shift issue (Table 1). In general, DA can be classified into traditional and deep DA. Traditional DA approaches typically rely on conventional ML models and human-engineered imaging characteristics. Moradi et al. [47] developed a hierarchical technique to distinguish AD and MCI participants from healthy controls. Moradi et al. [47] performed sparse logistic regression to extract features from MRI scans, then used these features in a binary classifier trained with a semi-supervised learning strategy. Additionally, Moradi et al. [47] addressed the domain shift between the ADNI source and the CADDementia target domains using an unsupervised DA (UDA) technique. In addition, Hofer et al. [48] described two scenarios related to the domain shift issue: (1) cross-dataset learning; and (2) the use of pretrained classifiers. Hofer et al. [48] employed a support vector machine (SVM) classifier with a Gaussian radial basis function (RBF) kernel.
Instance weighting is a common strategy used in traditional DA to solve the domain shift issue. Different weights are applied to the source domain data depending on how close the source domain data are to the target data, with higher weights typically assigned to source examples that are more closely related to the target samples. Following instance weighting, the domain gap between the source and target domains is reduced by training a learning model on the reweighted source samples [33,34,49]. Wachinger et al. [50] explored the instance weighting technique to overcome the domain gap issue by reweighting the source samples. Wachinger et al. [50] then used multinomial regression with a mixed l1/l2 norm to detect AD, MCI, and CN in MRI data. Furthermore, Zhou et al. [51] used the information gain technique to select features from the MR images that would help in the diagnosis of AD, and used instance weighting to close the gap between the domains. After comparing numerous classifiers, Zhou et al. [51] concluded that the TrAdaBoost procedure achieved the highest accuracy.
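The reweighting idea can be sketched in a few lines. The example below is not the procedure of [50] or [51]; it assumes a simple Gaussian-kernel similarity between each source sample and the target samples, and the function names, the `sigma` bandwidth, and the toy feature vectors are all hypothetical. The effective sample size (Kish's formula) is included as one way to see how many source samples still contribute after reweighting.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def instance_weights(source, target, sigma=1.0):
    """Weight each source sample by its mean Gaussian similarity to the target.

    Weights are normalized to average 1 so that the reweighted source
    set keeps the same nominal size (an illustrative choice).
    """
    raw = []
    for s in source:
        sim = sum(math.exp(-squared_distance(s, t) / (2.0 * sigma ** 2))
                  for t in target)
        raw.append(sim / len(target))
    total = sum(raw)
    if total == 0.0:  # no source sample resembles the target at all
        return [1.0] * len(source)
    return [w * len(source) / total for w in raw]


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    sq = sum(w * w for w in weights)
    if sq == 0.0:
        return 0.0
    total = sum(weights)
    return total * total / sq


# Toy 2D features: two source samples lie near the target, one lies far away.
source = [[0.0, 0.0], [0.5, 0.2], [6.0, 6.0]]
target = [[0.1, 0.0], [0.3, 0.3]]

weights = instance_weights(source, target)
print([round(w, 3) for w in weights])
print(round(effective_sample_size(weights), 3))
```

The weights would then be passed to any learner that accepts per-sample weights when it is trained on the source data.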
Another widely utilized method in traditional DA is feature transformation (alignment), which minimizes the distribution difference between the source and target domains, while preserving the essential structure of the original data, by converting the source and target instances from the corresponding feature representation spaces to a shared feature space. A model can then be applied to the new feature space, which is less influenced by the domain gap [33,49]. To distinguish between MCI-converters and MCI-non-converters, Cheng et al. [52] presented a multimodal manifold-regularized TL (M2TL) technique to minimize the differences between the source and target domains. Examples from both domains were first chosen using M2TL-based sample selection, then the kernel-based maximum mean discrepancy (MMD) distance was used. Finally, an M2TL model was created by combining MMD and a manifold regularization function with the sparse least squares classification algorithm; however, this strategy has the disadvantage of not leveraging feature selection. Cheng et al. [52] further suggested a technique (the domain-transfer learning approach [domain transfer SVM]), which involves domain transfer feature selection (DTFS) and domain transfer sample selection (DTSS). DTFS attempts to choose features (brain areas) from the source and target domains that are related to AD, and DTSS transfers the data samples from the original feature space to the kernel space using the kernel learning method. Then, the multi-task least absolute shrinkage and selection operator (LASSO) is used for sample selection. Finally, they constructed a domain transfer support vector machine (DTSVM) to distinguish MCI-converters from MCI-non-converters [53].
Guerrero et al. [54] attempted to integrate various feature spaces composed of multilevel relevant intensity characteristics acquired from images at 1.5T and 3T by presuming that images located in different feature subspaces adhere to the overall manifold framework, and these various spaces were joined via manifold alignment. A joint manifold was optimized on ROIs created by sparse regression and multilevel variable selection. Guerrero et al. [54] claimed that their method was the first to categorize the entire dataset by combining the intensity features of 1.5T and 3T MRI images from ADNI-1, ADNI-GO, and ADNI-2 into a unified manifold; however, the disadvantage of this method is the assumption that each feature space has a unique and ideal low-dimensional embedding. Problems may arise if there are significant differences in these embeddings. Li et al. [55] proposed a strategy to reduce the gap between various datasets and increase the classification performance with very little training data. In the feature space, different functional MRI (fMRI) data sources have different sample distributions. In the method of Li et al. [55], features from two distinct domains that were dispersed throughout two feature spaces were first extracted and selected; the example values from the two feature spaces were then matched into a single subspace using the modified subspace alignment method described by Fernando et al. [56]. Samples from the single subspace were utilized to create the classifier for AD prediction.
In addition, van Opbroek et al. [57] introduced a feature-space transformation (FST) mechanism for hippocampus segmentation to address variations in feature representations across source and target datasets. To facilitate consistency between both the source and target feature spaces, the approach utilized unlabeled source and target examples, which are sometimes known as source-target pairs. Then, median transformation was used to map the training data from the source feature space to the target feature space. If the data do not contain images of source-target pairs, however, this technique will not be effective. Wang et al. [58] used joint distribution adaptation (JDA) to close the gap between the source and target domains. Data from the two domains were first mapped into a more consistent feature space before assigning further weights to the target domain samples. Then, a classifier was applied to detect AD in fMRI scans using data from the new consistent feature space.
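The kernel-based MMD distance mentioned above for [52] measures how far apart two sets of feature vectors are in a kernel-induced space. The sketch below computes the standard biased empirical estimate of the squared MMD with a Gaussian RBF kernel; it is a generic illustration rather than the M2TL implementation, and the `gamma` bandwidth and the toy data are assumptions.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t).
    """
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


# Toy features: the "shifted" set mimics data from a different scanner or site.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
similar = [[0.1, 0.0], [0.9, 0.1], [0.0, 1.1]]
shifted = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

print(mmd_squared(source, similar) < mmd_squared(source, shifted))  # True
```

Feature-alignment methods typically add a term of this kind to the training objective so that the learned representation makes the source and target sets harder to tell apart.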
Deep DA methods integrate deep networks, such as generative adversarial networks (GANs), with adaptation techniques [33]. Sinha et al. [59] adopted an attention-guided GAN [60] to harmonize images from three publicly available brain MRI datasets and generate synthetic images. Then, a 2D AlexNet CNN model was used to predict AD. Moreover, a novel DA paradigm was created by Wang et al. [61] to address the domain shift issue. In one instance, a pretrained model was fine-tuned using a subgroup of data, such as age range, race, or scanner type, with a weight-constraint penalty term and TL. In a multi-study setting, Wang et al. [61] adapted a model trained on the source domain to the target domain using an auxiliary task, such as age regression or gender classification, which helps find the imaging features of the domain shift. The feature extractor was retrained on the source task, and the main task-related variables were regularized to transfer the model to the target data.
The studies listed above only used one source and one target domain; however, several studies have focused on sharing data among different medical institutions (e.g., using more than one source domain), while preserving patient privacy [62]. To fully utilize the collected heterogeneous data for AD detection, Cheng et al. [63] established a multi-domain TL (MDTL) technique. Two components were crucial for the constructed MDTL structure: (1) a multi-domain transfer feature selection (MDTFS) mechanism, which selects the most beneficial feature subset from multi-domain data; and (2) a multi-domain transfer classification (MDTC) technique, which classifies the presence of the disease to identify early-onset AD. Guan et al. [14] proposed a multi-source optimal transport (MSOT) approach for using data from multiple MRI facilities. Specifically, they first projected the data from several source domains to the target domain using optimal transport in an unsupervised manner; the Wasserstein distance was then used to measure how well each source domain matched the target domain, and this similarity was used to calculate the source domain weight. An SVM classifier was subsequently trained using the projected data from each source domain, and a weighted voting ensemble learning strategy was used to predict the target sample labels.
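The weighting-and-voting stage of a multi-source approach can be sketched as follows. This is not the MSOT implementation of [14]: it assumes one-dimensional features, equal-size samples (for which the Wasserstein-1 distance reduces to the mean absolute difference of the sorted values), and an exponential mapping from distance to weight that is chosen purely for illustration. All function names and data are hypothetical.

```python
import math


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def source_weights(sources, target):
    """Give larger weights to source domains that are closer to the target.

    The exp(-distance) mapping is an assumption made for illustration.
    """
    scores = [math.exp(-wasserstein_1d(s, target)) for s in sources]
    total = sum(scores)
    return [s / total for s in scores]


def weighted_vote(predictions, weights):
    """Combine one predicted label per source classifier into a final label."""
    tally = {}
    for label, w in zip(predictions, weights):
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)


# Toy 1D feature values from two source sites and one target site.
site_a = [0.0, 1.0, 2.0]
site_b = [10.0, 11.0, 12.0]
target_site = [0.5, 1.5, 2.5]

weights = source_weights([site_a, site_b], target_site)
# Each source-specific classifier casts one vote for a target sample.
print(weighted_vote(["AD", "CN"], weights))  # AD
```

Here the classifier trained on the closer site dominates the vote, which is the intended effect of weighting sources by their similarity to the target.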
The current DA approaches have demonstrated considerable improvements in the diagnosis of AD using a variety of modalities. Nevertheless, there are still some general shortcomings to these methods, which are discussed in the Discussion and Outlook section.

DISCUSSION AND OUTLOOK
Neuroimaging is a useful method for AD detection; however, regardless of the advantages of modern imaging modalities, interpreting the high-dimensional, complex, and probably noisy data puts much pressure on the physicians who inspect the images. To this end, automated ML-based methods have been extensively investigated to help achieve fast and accurate image inspection, and the detection of AD from neuroimaging data has been successfully accomplished using classical ML and feed-forward neural networks. Because existing DL algorithms build deep structures from scratch, they have various disadvantages, including the requirement for a large training dataset; developing TL strategies is therefore one major solution [37,64]. The idea behind TL is to utilize knowledge from a source domain to aid representation learning in a target domain; however, the following restrictions affect the feasibility of most current TL approaches: first, most TL approaches necessitate a significant amount of labeled target domain data to fine-tune the model, but there are generally limited labeled target domain data available; and second, most TL techniques concentrate on pretraining networks using massive natural image databases (e.g., ImageNet), which have characteristics that are largely different from those of medical images. Additionally, fine-tuning models on the target data may lead to overfitting if there is a significant domain difference between the source and the target domains [49,60].
DA is a type of TL that is particularly targeted at decreasing ML model performance gaps between domains with different data distribution patterns. Nevertheless, existing DA methods for AD detection have drawbacks. For example, the instance weighting method may produce very low weights for some samples. Most source samples are assigned lower weights when there is a significant cross-domain difference, which results in a smaller number of useful training instances [49,58]. A variety of methods have attempted to align the source and target distributions [56], but they either require extremely precise data assumptions or have a computational complexity that is unmanageable for huge datasets. Some techniques rely on subspace-based representations and are considerably less expensive, but the reliance on a basic linear assumption is too restrictive for situations in which there are severe domain differences [49,65]. As a result, further investigation is needed to develop more effective DA methods for AD detection. Some recommendations are discussed in the remainder of this section.

Multi-modality images
Various neuroimaging tools are available for AD detection (Figure 2); however, recent research has concentrated on a single modality (Figure 3a), such as MRI, which is widely used in medical facilities for the diagnosis and prognosis of AD (Figure 3b). Multi-modal neuroimaging enables the discovery of complex neurodegenerative conditions that develop during the progression of AD [49]. Additional research that adopts multi-modal neuroimaging data is required to comprehensively understand the disease development pattern. A significant issue with using multi-modal data is that not every participant undergoes all of the different imaging procedures. For example, amyloid-PET imaging is expensive and not affordable by all participants. As a result, combining sparse or missing datasets is a critical challenge in multi-modal data analysis that needs to be investigated further [29,66].

Unsupervised DA
DL has demonstrated an outstanding ability to detect AD by utilizing large quantities of labeled training images, but manually labeling medical images is difficult and demands professional skills and knowledge [67]. Utilization of TL, which transfers knowledge from the annotated source domain to the unlabeled target domain, reduces the requirement for numerous labeled images. Nevertheless, if the distribution of the annotated samples varies from the distribution of the unlabeled samples, the model will perform poorly [65,68]. This problem can be solved via UDA, the primary purpose of which is to reduce the gap in distribution between labeled source domains and unlabeled target domains [67,69]. Although UDA performs well on a variety of medical image tasks [67-69], few studies have utilized UDA to identify AD [47,60]. Therefore, additional research regarding the use of UDA to detect AD is needed.

Multi-source multi-target DA
Current DA techniques typically concentrate on single-source domain adaptation, but in application scenarios there may be numerous source domains. For example, classification studies combine information from various imaging modalities [34,65] or merge imaging information with clinical or genomics data [30]. It is also beneficial to transfer a model to various target domains. Because multi-source/multi-target DA in AD detection has not been extensively researched, future research is necessary [34,70,71]. Multi-site medical data transmission is also challenging to manage from a legal perspective due to the many variations in the definition of patient confidentiality and the legal framework around its implementation across different areas [62,72]. Federated learning (FL) is an ML strategy that sends model updates rather than actual data to maintain data privacy. FL trains models by utilizing decentralized clients under server management and may therefore be more suitable for this task. Due to the domain shift issue, however, models trained using FL may still have trouble generalizing to a new domain. Therefore, it is crucial to create DA algorithms for FL [73,74].
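The exchange of model updates instead of data can be illustrated with a minimal server-side aggregation step. The sketch uses federated averaging, one widely used aggregation rule that is not prescribed by the studies cited above; the function name, the flat parameter lists, and the per-site sample counts are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_params: one flat list of parameters per client (same length).
    client_sizes:  number of local training samples held by each client.
    Only parameters and counts reach the server; raw images never leave a site.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(n_params)]


# Two hypothetical hospitals send locally trained parameters to the server.
hospital_a = [1.0, 2.0]   # trained on 100 local scans
hospital_b = [3.0, 4.0]   # trained on 300 local scans

global_params = federated_average([hospital_a, hospital_b], [100, 300])
print(global_params)  # [2.5, 3.5]
```

Because the averaged model still reflects the mixture of client distributions, it can underperform on a site whose data differ from that mixture, which is where DA within FL becomes relevant.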

CONCLUSION
AD is among the leading causes of death, thus the accurate detection of early-stage AD is a primary concern. Neuroimaging provides detailed information on changes in brain structures, but computer-based strategies are needed to help neuroradiologists achieve fast and accurate neuroimaging-based AD detection. ML/DL algorithms trained on one data source may perform badly when evaluated on data from a different source due to a domain shift. Model adaptation across data with different distributions is therefore essential. This paper reviewed the existing studies that have addressed ML-based, and specifically DA-based, methods developed to detect AD using neuroimaging data. The purpose of this review was to provide new researchers in the relevant fields with a conceptual description of the problem as well as information about recent advances. The limitations of the existing studies were also discussed and possible research directions were provided.

Figure 2 | Representative MRI scans of patients without (a) and with AD (b), as well as representative PET scans of patients without (c) and with AD (d). These images are from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

Figure 3 | Percentages of reviewed studies utilizing single, cross, and multi-modal imaging data (a) and utilizing the respective imaging modalities (b).

Table 1 | Representative works in the field of ML- and DA-based AD detection.
