
      King Salman Center for Disability Research "KSCDR" is pleased to invite you to submit your scientific research to the "Journal of Disability Research - JDR", where the "JDR" comes within the Center's strategy aimed at maximizing the impact of research on the scientific field, by supporting and publishing scientific research on disability and its issues, which reflect positively on the level of services, rehabilitation, and care for individuals with disabilities.
      "JDR" is a scientific journal that has the lead in covering all areas of human, health and scientific disability at the regional and international levels.

      Is Open Access

      Enhancing Fetal Medical Image Analysis through Attention-guided Convolution: A Comparative Study with Established Models



            The ability to detect and track fetal growth is greatly aided by medical image analysis, which plays a crucial role in prenatal care. This study introduces an attention-guided convolutional neural network (AG-CNN) for maternal–fetal ultrasound image analysis, comparing its performance with that of established models (DenseNet 169, ResNet50, and VGG16). AG-CNN, featuring attention mechanisms, demonstrates superior results with a training accuracy of 0.95 and a testing accuracy of 0.94. Comparative analysis shows that AG-CNN outperforms the alternative models, whose testing accuracies are 0.90 for DenseNet 169, 0.88 for ResNet50, and 0.86 for VGG16. These findings underscore the effectiveness of AG-CNN in fetal image analysis, emphasising the role of attention mechanisms in enhancing model performance. The study’s results contribute to advancing the field of obstetric ultrasound imaging by introducing a novel model with improved accuracy, demonstrating its potential for enhancing diagnostic capabilities in maternal–fetal healthcare.



            The field of fetal medical image analysis has gained significant importance due to its vital role in maternal–fetal healthcare. Accurate and efficient analysis of ultrasound images is crucial for the early detection of anomalies and ensuring the well-being of both the mother and the fetus. In this context, the present study aims to contribute to the advancement of this field by introducing a novel attention-guided convolutional neural network (AG-CNN) for enhanced feature extraction in maternal–fetal ultrasound images. The field of medical image analysis has made incredible strides in recent years, completely altering how specialists diagnose and treat patients in a wide range of fields (Horgan et al., 2023). Medical imaging of the fetus is an important part of this field as it can reveal important information about the fetus’s growth and health (Mehrdad et al., 2021).

            In recent years, medical imaging has played a pivotal role in diagnosing anomalies and evaluating congenital and acquired disabilities. One area of significant focus is fetal medical image analysis, where advanced techniques contribute to the early detection of abnormalities, thereby facilitating timely interventions and improved outcomes. This study examines attention-guided convolution, presenting a novel technique for adaptive feature extraction in fetal medical image analysis. By using attention mechanisms, our approach aims to enhance the interpretation of ultrasound images, particularly in the context of anomalies and conditions associated with congenital and acquired disabilities. The use of a large dataset from real clinical settings allows us to explore the efficacy of the proposed technique across diverse cases, ranging from morphological anomalies to neurodevelopmental conditions. Medical imaging of the fetus, obtained by methods such as ultrasound and magnetic resonance imaging, provides insight into the complex stages of prenatal development, allowing for the early diagnosis of anomalies and developmental diseases (Tenajas et al., 2023).

            While considerable progress has been made in applying artificial intelligence (AI) to ultrasound analysis, a knowledge gap remains in how best to use attention mechanisms for feature extraction. Existing models may not fully exploit the fine detail within ultrasound images, which can limit diagnostic accuracy. Our study addresses this gap by examining the impact of attention-guided convolution on feature extraction in fetal medical image analysis, with the aim of refining existing methodologies and paving the way for more accurate diagnostic tools.

            However, due to the specific constraints provided by fetal medical imaging, information extraction from these images remains a complicated endeavour (Xiao et al., 2023). Due to factors such as background noise, anatomical heterogeneity, and fetal position changes, the collected pictures may lack clarity and consistency (Fiorentino et al., 2022). These complexities are typically beyond the capabilities of conventional image analysis methods, calling for more cutting-edge computational ways to tackle the problems (Iskandar et al., 2023). As a result of their remarkable performance in tasks including image segmentation, classification, and feature extraction, convolutional neural networks (CNNs) have become a useful tool in medical image analysis. In contrast to manually crafted features, which may not be able to capture the intricacies found in fetal medical imaging, these deep learning models may automatically learn important features from data (Cai et al., 2018). The complexity and subtlety of fetal anatomy and development make it difficult for CNNs to be applied to fetal medical imaging, notwithstanding their effectiveness (Fergus et al., 2021).

            The research problem at the core of this study is the need for a more nuanced and effective approach to feature extraction in maternal–fetal ultrasound images. The proposed AG-CNN model incorporates attention mechanisms to selectively focus on relevant image regions, potentially improving diagnostic accuracy. The main challenge is devising an adaptive feature extraction method that can reliably detect subtle details in fetal medical images in the presence of background noise and natural variation. When extracting features, conventional CNN architectures weight all regions of the image equally, which can amplify background noise and blur critical anatomical details. It is therefore crucial to create a method that can dynamically zero in on the important parts of an image while discarding the rest.

            Adaptive feature extraction in fetal medical image analysis faces several obstacles. First, fetal anatomy is not standardised across imaging sessions and gestational ages, so the appearance of fetal structures varies substantially, which makes a standard method for feature extraction harder to create. Second, because fetal organs are complex and delicate, the approach should be sensitive to small changes in the data while remaining robust to noise. Third, in medical applications, interpretability of the feature extraction process is critical so that physicians can understand the reasoning behind a model’s predictions. Fetal medical image analysis seeks to extract relevant and meaningful features from fetal images to aid tasks such as organ segmentation, anomaly identification, and age estimation. Traditional feature extraction approaches struggle with noise, anatomical heterogeneity, and fetal position changes. To overcome these obstacles, we present a new method called “attention-guided convolution,” which incorporates attention mechanisms into the CNN architecture.

            Attention-guided convolution selectively emphasises key features while downplaying those that are less relevant to the content of the input image. The resulting features should be more discriminative and more robust to noise, leading to better results when analysing fetal medical images. The problem can thus be stated as improving feature extraction in fetal medical image analysis by incorporating attention-guided convolution into the CNN architecture, with the goal of enhancing the precision and reliability of the analysis by compensating for noise, anatomical variability, and fetal position changes. The method also aims to help physicians make better judgements based on the model’s predictions by supplying them with interpretable attention maps.

            The study’s main objectives are:

            • To introduce attention-guided convolution to fetal medical image analysis and put it into practice. This requires a method that dynamically adjusts feature extraction to focus on important parts of the images while filtering out noise and extraneous features. Embedding attention mechanisms into the network’s architecture is intended to improve the CNN’s ability to recognise fetal anatomy.

            • To improve the extraction of fetal anatomy from medical images. We aim to show that, despite noise, variability, and changes in fetal position, attention-guided convolution can still identify and highlight delicate and complex fetal organs, thereby improving the precision of organ segmentation, a crucial step in processing fetal medical images.

            • To demonstrate that attention-guided convolution effectively mitigates the effects of background noise and other distracting features in fetal medical images. We aim to demonstrate the method’s reliability through experimental validation across a range of imaging modalities and gestational ages.

            • To shed light on how the attention mechanism works and how it affects feature extraction. By producing interpretable attention maps, we hope to help doctors and researchers better understand the model’s decision-making process, which should increase confidence in the model and speed up its adoption in clinical practice.

            This study makes several contributions to fetal medical image analysis. We propose “attention-guided convolution,” which embeds attention mechanisms directly into the CNN framework. To improve the accuracy and robustness of organ segmentation, anomaly detection, and age estimation, the method dynamically emphasises essential regions in fetal medical images while suppressing noise and unnecessary features. Our work also generates interpretable attention maps, which give physicians and researchers insight into the model’s decision-making process and so address the interpretability barrier. Through extensive experimental validation, we illustrate the versatility of the method across a variety of datasets and clinical contexts and show that it outperforms more conventional CNN methods. Collectively, our work paves the way for better use of fetal medical images in clinical practice and research, leading to improved diagnostics, better understanding of fetal development, and ultimately better outcomes.

            RELATED WORK

            The study by Horgan et al. (2023) provides a high-level overview of AI’s potential uses in obstetric ultrasound. This review takes a broad look at the present-day application of AI in the area, including its effects on diagnostic precision, workflow optimisation, and the difficulties of deploying AI technologies. Medical treatments, such as surgery and image-guided interventions, are the primary emphasis of the review of sophisticated medical telerobots provided by Mehrdad et al. (2021). The paper examines numerous facets of telerobotic systems, illuminating their successes, failures, and promising future prospects. Recent developments in ultrasound scanning with the help of AI are presented by Tenajas et al. (2023). This study explores the use of AI approaches in ultrasound systems to boost image quality, give operators immediate feedback, and streamline the scanning process.

            Xiao et al. (2023) assess the state of AI in fetal ultrasonography and its potential future developments. The authors highlight the contributions and problems of using AI approaches for diverse fetal ultrasound analysis tasks such as image segmentation, anomaly identification, and gestational age calculation. Deep learning techniques for fetal ultrasound image processing are reviewed in detail by Fiorentino et al. (2022). Their review covers a wide variety of uses, such as image segmentation, classification, and detection, demonstrating the efficacy of deep learning algorithms in dealing with the intricacies of fetal ultrasound images. A study on synthesising realistic ultrasound images of the fetal brain is presented by Iskandar et al. (2023). To provide deep learning models with more realistic training data, the authors propose a way to synthetically generate ultrasound images of fetal brains and explore the feasibility of using this method to broaden the applicability of algorithms used in fetal brain analysis.

            Standardised fetal ultrasound plane detection using eye tracking is presented by Cai et al. (2018). In order to improve the precision and reliability of plane localisation in clinical practice, the authors employ eye-tracking data to direct the detection of fetal ultrasound planes.

            The work of Fergus et al. (2021) centres on the application of one-dimensional CNNs to the modelling of segmented cardiotocography time-series signals. The study highlights the use of deep learning models trained on cardiotocography data for the early detection of abnormal delivery outcomes, demonstrating the potential of CNNs in enhancing prenatal care. A study on learning the architectures of deep neural networks using differential evolution is presented by Belciug (2022). The author applies this strategy to medical image processing to demonstrate the utility of evolutionary algorithms in improving neural network topologies. Automatic fetal abdominal segmentation from ultrasound images is proposed by Ravishankar et al. (2016) using a hybrid method. The authors show the promise of hybrid solutions for difficult segmentation tasks by accurately segmenting fetal abdominal tissues using a combination of deep learning and contour-based methods.

            For fetal ultrasound image segmentation, Zeng et al. (2021) present a deeply supervised attention-gated V-Net. The network design in that paper, which combines attention mechanisms with segmentation models, increases the accuracy of head circumference biometry from ultrasound images. An approach for detecting fetal movement and recognising anatomical planes is presented by Dandıl et al. (2021), which makes use of the YOLOv5 network. To aid in thorough evaluations of fetal health, the authors employ this network to recognise anatomical features and track fetal movement in ultrasound images. Burgos-Artizzu et al. (2020) assess deep CNNs for the automatic classification of common maternal–fetal ultrasound planes. The authors investigate the usefulness of CNNs in the classification of maternal–fetal ultrasound images, expanding our knowledge of automatic plane recognition.

            For the purpose of fetal head analysis, Alzubaidi et al. (2022) offer a transfer learning ensemble method. Using transfer learning and ensemble approaches, the authors propose a comprehensive solution for multi-task analysis, in this case predicting the gestational age and weight of a fetus from ultrasound scans. Sengan et al. (2022) use deep learning to segment echocardiographic images for prenatal diagnosis of fetal cardiac rhabdomyoma. The authors’ goal is to aid in the early detection of cardiac problems by using deep learning algorithms to segment images of the fetal heart. Classification of Down syndrome markers in fetal ultrasound images using dense neural networks is presented by Pregitha et al. (2022), who investigate the use of deep neural networks to detect Down syndrome. For recognising standard scan planes of the fetal brain in 2D ultrasound images, Qu et al. (2020) present a deep learning-based solution that uses deep CNNs to automatically recognise the common planes used for fetal brain scans.

            Automatic classification of common maternal–fetal ultrasound planes using deep CNNs was evaluated by Cerrolaza et al. (2018). The authors aim to advance automated plane recognition by expanding our knowledge of CNNs’ capability in recognising planes in maternal–fetal ultrasound. Deep learning methods for ultrasound during pregnancy are discussed by Diniz et al. (2021), who survey recent work that has applied deep learning techniques to ultrasound images acquired in pregnancy. The study by Wang et al. (2021) provides an extensive literature review on the application of deep learning to the processing of medical ultrasound images. The authors address the influence of deep learning approaches on improving diagnostic accuracy and clinical decision-making across a variety of settings. Płotka et al. (2022) discuss the results of a study in which deep learning models for fetal ultrasound video were compared with human observers, with the end goal of reaching the same biometric measurement accuracy as human observers. Automatic fetal biometry prediction using a deep convolutional network architecture is proposed by Ghelich Oghli et al. (2023), who present a deep learning strategy for predicting fetal biometric data. Pu et al. (2021) focus on automatic fetal ultrasound standard plane detection built on deep learning and the Industrial Internet of Things (IIoT). To accurately recognise common fetal ultrasound planes, the authors offer a solution that combines deep learning approaches with IIoT concepts. Table 1 summarises the related work.

            Table 1:

            Comparative table.

            | Reference | Dataset | Technique | Contribution | Limitation |
            |---|---|---|---|---|
            | Iskandar et al. (2023) | Fetus dataset | Image synthesis | Proposal of method for realistic ultrasound fetal brain imaging synthesis | No real dataset, focus on image synthesis |
            | Cai et al. (2018) | Fetal ultrasound data, eye-tracking data | Attention mechanisms | SonoEyeNet for standardised fetal ultrasound plane detection | Limited dataset and potential hardware dependency |
            | Fergus et al. (2021) | Cardiotocography data | 1D CNNs | Modelling segmented cardiotocography time-series signals | Focus on time-series data, no ultrasound |
            | Belciug (2022) | Fetus dataset | Differential evolution | Learning deep neural network architectures for medical imaging | No specific dataset mentioned |
            | Ravishankar et al. (2016) | Fetal ultrasound data | Hybrid approach | Automatic segmentation of fetal abdomen | No comprehensive dataset mentioned |
            | Zeng et al. (2021) | Fetal ultrasound data | Attention-gated V-Net | Head circumference biometry using deep learning | Limited detail on other techniques |
            | Dandıl et al. (2021) | Ultrasound scans | YOLOv5 network | Fetal movement detection, anatomical plane recognition | Limited scope, YOLOv5 specific |
            | Burgos-Artizzu et al. (2020) | Maternal–fetal ultrasound images | Deep CNNs | Automatic classification of maternal–fetal ultrasound planes | Focus on plane classification, no fetal outcome |
            | Burgos-Artizzu et al. (2020) | Maternal–fetal ultrasound images | Deep CNNs | Automatic classification of maternal–fetal ultrasound planes | Similar to Burgos-Artizzu et al. (2020) |
            | Alzubaidi et al. (2022) | Fetal head ultrasound data | Ensemble transfer learning | Fetal head analysis, gestational age, and weight prediction | Specific focus on fetal head analysis |
            | Sengan et al. (2022) | Fetal cardiac ultrasound images | Deep learning | Echocardiographic image segmentation for diagnosing fetal cardiac rhabdomyoma | Specific focus on cardiac analysis |
            | Pregitha et al. (2022) | Ultrasound fetal images | Dense neural network | Down syndrome marker classification | Specific focus on Down syndrome markers |
            | Qu et al. (2020) | 2D ultrasound images | Deep learning | Recognition of fetal brain standard scan planes | Specific focus on brain scan plane recognition |
            | Cerrolaza et al. (2018) | Maternal–fetal ultrasound images | Deep CNNs | Automatic classification of maternal–fetal ultrasound planes | Similar to Burgos-Artizzu et al. (2020) |
            | Diniz et al. (2021) | Ultrasound scans | Deep learning | Deep learning strategies for ultrasound in pregnancy | Broad review without specific dataset/technique |

            Abbreviation: CNN, convolutional neural network.

            An integrated method that combines the strengths of various segmentation architectures, attention mechanisms, and fusion approaches remains a largely unexplored area in medical image segmentation. While many separate studies have made important contributions, there has been a dearth of meta-analyses that examine how these advances interact with one another. In addition, comprehensive and generalisable solutions are lacking because most studies have looked at only one or two imaging modalities or health issues. Filling this void requires a standardised framework for medical image segmentation that makes use of attention-guided convolution, multi-modal fusion, and adaptive architectures.

            To overcome the obstacles presented by fetal images, previous research in medical image processing has mostly focused on modifying pre-existing CNN structures. Several methods have been investigated to address the problem of insufficient training data, including transfer learning from more general medical imaging domains, domain adaptation to account for anatomical heterogeneity, and data augmentation techniques. Interest in interpretable AI methods for medical imaging has also grown. Such methods can use attention mechanisms to assign different weights to distinct regions, making it easier to understand which parts of an image contribute to a model’s conclusion. However, the development of attention mechanisms optimised for the nuances of fetal medical imagery is still in its infancy.


            The approach used in this research applies an attention-guided CNN model to a large dataset of maternal–fetal ultrasound images from an actual clinical setting at BCNatal, which includes Hospital Clinic and Hospital Sant Joan de Deu in Barcelona, Spain. The curated dataset includes over 12,000 images from 1792 patients who were undergoing routine examinations in their second or third trimester of pregnancy. The gestational age range was 18-40 weeks, and multiple pregnancies, congenital abnormalities, and aneuploidies were excluded. A senior maternal–fetal doctor annotated each image in the dataset with an anatomical plane label. The dataset has six categories: five primary maternal–fetal anatomical planes and one further category for other images. Multiple operators used a variety of ultrasound equipment, including the Voluson E6, Voluson S10, and Aloka systems, to acquire the ultrasound images. Figure 1 shows the proposed process flow.

            Figure 1:

            The proposed working flow.

            Mathematical formulation

            Let I denote a fetal medical image, a two-dimensional matrix of pixel intensities. Our aim is to learn a set of features, F, that capture relevant anatomical structures while minimising the impact of noise and variability. Conventionally, a CNN extracts features using convolutional layers, which can be represented as:

            $$F_{ij} = \sum_{m} \sum_{n} I_{(i+m)(j+n)} \, K_{mn}$$

            where:

            $F_{ij}$ is the value of the feature map at position (i, j).

            $I_{(i+m)(j+n)}$ is the pixel intensity at position (i+m, j+n) in the input image.

            $K_{mn}$ is the convolution kernel weight at position (m, n).
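
            The sliding-window sum defined by these symbols can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the authors' implementation: the function name `convolve2d`, the "valid" output size (no padding), and the toy 3x3 image are all assumptions made for the example.

```python
def convolve2d(image, kernel):
    """Valid-mode sliding-window sum: F[i][j] = sum_m sum_n I[i+m][j+n] * K[m][n]."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1   # no padding, so the output shrinks
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n]
                for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 3x3 "image" and a 2x2 kernel that adds each pixel to its lower-right neighbour.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
features = convolve2d(image, kernel)
print(features)  # [[6, 8], [12, 14]]
```

            Every pixel under the kernel contributes with the same kernel weight regardless of where it sits in the image, which is the behaviour the attention mechanism is meant to modulate.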

            However, conventional convolution weights every region of the image equally, which may amplify noise and dilute crucial details. To address this problem, we introduce an attention mechanism that dynamically adjusts the convolution process. The attention mechanism prioritises certain parts of the image over others based on their importance to the task at hand. The attention map, A, is calculated from the following quantities:

            $A_{ij}$ is the attention weight at position (i, j).

            $W_a$ is the learnable attention parameter.

            $\sigma$ is the activation function.

            $\|$ denotes concatenation.
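
            The text defines these symbols but does not state which quantities are concatenated. One form consistent with the definitions, under the assumption that the feature value $F_{ij}$ and the pixel intensity $I_{ij}$ are the operands (this choice is illustrative and not confirmed by the source), would be:

```latex
A_{ij} = \sigma\!\left( W_a \left[\, F_{ij} \,\|\, I_{ij} \,\right] \right)
```

            With a sigmoid as $\sigma$, each $A_{ij}$ lies in (0, 1) and can act as a soft mask over the image.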

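            The attention-guided convolution is then formulated as follows:

            $$F^{\text{att}}_{ij} = \sum_{m} \sum_{n} A_{(i+m)(j+n)} \, I_{(i+m)(j+n)} \, K_{mn}$$

            where:

            $F^{\text{att}}_{ij}$ is the value of the attention-guided feature map at position (i, j).

            $A_{(i+m)(j+n)}$ is the attention weight at position (i+m, j+n).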
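
            To make the effect of the attention weights concrete, the sketch below multiplies each pixel by its attention weight before applying the kernel. It is an illustration under stated assumptions, not the AG-CNN itself: the attention map is supplied by hand rather than learned, and the name `attention_guided_conv` and the toy inputs are hypothetical.

```python
def attention_guided_conv(image, attention, kernel):
    """F_att[i][j] = sum_m sum_n A[i+m][j+n] * I[i+m][j+n] * K[m][n] (valid mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(attention[i + m][j + n] * image[i + m][j + n] * kernel[m][n]
                for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]

# All-ones attention leaves the plain convolution unchanged.
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
plain = attention_guided_conv(image, uniform, kernel)

# Hand-set attention (an assumption for the demo) that suppresses the centre pixel.
masked = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
guided = attention_guided_conv(image, masked, kernel)

print(plain)   # [[6, 8], [12, 14]]
print(guided)  # [[1, 8], [12, 9]]
```

            Only the outputs whose window covers the suppressed pixel change, which shows how the attention map down-weights a region without altering the kernel.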

            Dataset description

            This research used the BCNatal dataset, which was assembled from maternal–fetal ultrasound images acquired at Barcelona’s Hospital Clinic and Hospital Sant Joan de Deu. The dataset’s composition, labelling, and distribution are described in this section.

            Dataset composition

            The dataset contains more than 12,000 ultrasound images from 1792 patients who attended routine examinations during their second and third trimesters of pregnancy. It was created to be broadly representative of maternal–fetal anatomical planes, allowing for a wide range of research and potential clinical applications.

            Labelling and categories

            An experienced maternal–fetal doctor assigned an anatomical plane label to each image in the dataset. The labelling scheme includes the five most common maternal–fetal anatomical planes and a sixth category for all other images. The categories are as follows:

            Fetal abdomen

            Images in this category are used to assess fetal weight and the shape of the fetal abdomen.

            Fetal brain

            Labelled images from this category support the study of neurodevelopment. The category comprises three sub-planes:

            • Trans-thalamic: essential for studying neurodevelopment.

            • Trans-cerebellum: used in fetal weight analysis.

            • Trans-ventricular: used to analyse the growth of the heart and lungs.

            Fetal femur

            These images are helpful in estimating the birth weight of the fetus.

            Fetal thorax

            Images in this category help researchers learn more about how the fetal heart and lungs form.

            Maternal cervix

            Images from this category are used to study premature birth.

            Other

            This category includes a wide range of medical images used for a variety of applications.

            Figure 2 shows sample images from the dataset and illustrates the variety of ultrasound images obtained across the anatomical planes. Subfigures (a) through (h) show examples from different anatomical categories and convey the complexity and diversity of the data.

            Figure 2:

            Dataset samples.

            Dataset distribution

            Table 2 displays the dataset distribution across several anatomical categories and planes. For each anatomical plane category, this table reveals the clinical relevance, patient count, and image count. There is a wide variety of clinical settings and uses represented in the collection.

            Table 2:

            Dataset distribution across anatomical planes.

            | Anatomical plane | Clinical use | Number of patients | Number of images |
            |---|---|---|---|
            | Fetal abdomen | Morphology, fetal weight | 595 | 711 |
            | Fetal brain | Neurodevelopment | 1082 | 3092 |
            | Trans-cerebellum | Fetal weight | 575 | 714 |
            | Trans-ventricular | Heart and lung development | 446 | 597 |
            | Fetal femur | Fetal weight | 754 | 1040 |
            | Fetal thorax | Heart and lung development | 755 | 1718 |
            | Maternal cervix | Prematurity | 917 | 1626 |

            The distribution of these images across anatomical planes is shown graphically in Figure 3. The sub-figures represent the different anatomical categories and show how each group contributes to the variety of the dataset as a whole. This representation makes the dataset’s structure and clinical relevance easy to grasp.

            Figure 3:

            The distribution of images across anatomical planes.

            The Voluson E6, the Voluson S10, and the Aloka systems account for the bulk of the ultrasound machines represented in the dataset. Table 3 details the image distribution across these machines and their respective operators. This table shows how various machines and operators have contributed to the dataset.

            Table 3:

            Distribution of images across ultrasound machines.

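            | Ultrasound machine | Number of patients | Number of images | Operator number | Number of patients | Number of images |
            |---|---|---|---|---|---|
            | Voluson E6 | 807 | 5862 | Operator 1 | 407 | 2792 |
            | Voluson S10 | 91 | 1082 | Operator 2 | 344 | 2435 |
            | Aloka | 270 | 3560 | Operator 3 | 270 | 3560 |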

            Figure 4 shows how the images are distributed across the ultrasound machines used. The contribution of each machine type to the dataset is represented graphically, which helps show how the variety of machines and operators adds to the diversity of the dataset.

            Figure 4:

            Pie charts for distribution of images across ultrasound machines.

            Data pre-processing

            The ultrasound images must be pre-processed before they can be used for analysis or model training. This section details the steps, listed in Algorithm 1, that are used to improve the quality of the fetal images and their suitability for the attention-guided CNN model.

            Image resizing

            Variations in image size across the raw ultrasound data can reduce the model’s accuracy. Images are therefore scaled down to a uniform resolution while preserving their aspect ratio, which ensures consistency and lessens the computing burden. The resizing step can be written as follows:

            $$\text{Resized Image} = \operatorname{resize}(\text{Original Image}, \text{Target Resolution})$$

            Original Image refers to the raw ultrasound image.

            Target Resolution is the desired resolution for the resized image.
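
            The resizing step can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: it assumes a grayscale image stored as a list of rows, a square target resolution, and nearest-neighbour sampling. The helper names fit_within and resize_nearest are hypothetical.

```python
def fit_within(width, height, target):
    """Return (new_width, new_height) that fits inside a target x target
    square while preserving the aspect ratio."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_nearest(image, new_width, new_height):
    """Resize a grayscale image (list of rows) by nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[y * height // new_height][x * width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


# Hypothetical 4 x 2 image (width x height) resized to fit a 2 x 2 target.
original = [[10, 20, 30, 40],
            [50, 60, 70, 80]]
new_w, new_h = fit_within(width=4, height=2, target=2)
resized = resize_nearest(original, new_w, new_h)
```

            Scaling by the longer side keeps the whole image inside the target square; padding the shorter side to a fixed shape would be a separate step.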

            Image enhancement

            Image processing techniques are applied to the ultrasound images to increase contrast and reveal hidden details. A typical method, histogram equalisation, can be expressed as follows:

            $$\text{Enhanced Image} = \operatorname{histeq}(\text{Resized Image})$$

            Resized Image is the image after resizing.

            histeq denotes the histogram equalisation operation.
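
            Histogram equalisation can be sketched in a few lines. The version below is an illustrative assumption, not the authors’ code: it works on an 8-bit grayscale image stored as a list of rows and maps each intensity through the rescaled cumulative histogram.

```python
def histeq(image, levels=256):
    """Histogram-equalise a grayscale image (list of rows of ints in
    the range [0, levels - 1])."""
    pixels = [p for row in image for p in row]
    # Histogram of intensity values.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of the histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to equalise
        return [row[:] for row in image]
    # Look-up table that spreads the occupied intensities over the full range.
    lut = [round(max(0, c - cdf_min) / (total - cdf_min) * (levels - 1))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A low-contrast example: four neighbouring grey levels.
low_contrast = [[100, 101], [102, 103]]
enhanced = histeq(low_contrast)
```

            In this example the four adjacent grey levels are spread across the full 0 to 255 range, which is the contrast gain the step is meant to provide.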


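            Pixel values must be normalised to a consistent range to facilitate reliable model training. Typically, min-max normalisation is used to convert pixel values to the [0, 1] range:

            $$\text{Normalised Image} = \frac{\text{Enhanced Image} - \min(\text{Enhanced Image})}{\max(\text{Enhanced Image}) - \min(\text{Enhanced Image})}$$

            Enhanced Image represents the image after enhancement.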
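
            A minimal sketch of min-max normalisation, assuming the image is a list of rows of numeric pixel values. The function name and the handling of constant images are illustrative choices.

```python
def min_max_normalise(image):
    """Rescale pixel values linearly so the minimum maps to 0 and the
    maximum maps to 1."""
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


enhanced_image = [[0, 85], [170, 255]]
normalised = min_max_normalise(enhanced_image)
```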

            Data augmentation

            To avoid overfitting and boost model generalisation, data augmentation methods are used to artificially expand the diversity of the training dataset. The augmentation operations, which include rotation, flipping, and zooming, can be expressed as follows:

            $$\text{Augmented Image} = \operatorname{augment}(\text{Normalised Image})$$

            Normalised Image is the image after normalisation.

            augment denotes the data augmentation operation.
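
            The augmentation operations can be illustrated with simple list manipulations. The sketch below makes simplifying assumptions: rotation is restricted to 90 degrees, zooming is shown as a centre crop without resizing back, and each transform is applied with a hypothetical probability of 0.5. The source does not state the actual angles, zoom factors, or probabilities.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*image[::-1])]


def centre_zoom(image, margin):
    """Zoom in by cropping `margin` pixels from every side."""
    height, width = len(image), len(image[0])
    return [row[margin:width - margin] for row in image[margin:height - margin]]


def augment(image, rng):
    """Randomly flip and rotate an image; `rng` is a random.Random instance."""
    out = image
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = rotate_90(out)
    return out


augmented = augment([[1, 2], [3, 4]], random.Random(0))
```

            Passing an explicit random.Random instance keeps the augmentation reproducible when a seed is fixed.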

            Label encoding

            Each image’s anatomical plane label is mapped to an integer that identifies its class. This encoding simplifies model training because most machine learning techniques require numerical inputs. The label encoding procedure can be expressed as follows:

            $$\text{Encoded Label} = \operatorname{encode}(\text{Anatomical Plane Label})$$

            Anatomical Plane Label is the categorical label associated with the image.

            encode represents the label encoding operation.
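
            Label encoding amounts to a look-up from class names to integers. In the sketch below the label strings are placeholders chosen for illustration, and the alphabetical ordering of classes is an assumption. A one-hot helper is included because categorical cross-entropy is usually computed against one-hot targets.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, using sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode(labels, mapping):
    """Replace each categorical label by its integer code."""
    return [mapping[label] for label in labels]


def one_hot(index, num_classes):
    """Turn an integer code into a one-hot vector."""
    return [1 if i == index else 0 for i in range(num_classes)]


# Placeholder labels for illustration only.
plane_labels = ["fetal brain", "fetal femur", "fetal brain", "fetal abdomen"]
mapping = fit_label_encoder(plane_labels)
encoded = encode(plane_labels, mapping)
```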

            Algorithm 1:

            Data pre-processing of fetal ultrasound images

            Input: Raw Ultrasound Image, Anatomical Plane Label
            Output: Preprocessed Image, Encoded Label
            ResizedImage ← resize(RawUltrasoundImage, TargetResolution);
            // Resize the raw ultrasound image to the desired resolution for uniformity
            EnhancedImage ← histeq(ResizedImage);
            // Apply histogram equalization to improve image contrast and visibility
            NormalizedImage ← normalize(EnhancedImage);
            // Normalize pixel values to the [0, 1] range for stable model training
            AugmentedImage ← augment(NormalizedImage);
            // Apply data augmentation techniques to increase dataset diversity
            EncodedLabel ← encode(AnatomicalPlaneLabel);
            // Encode the anatomical plane label into a numerical value for model training
            Proposed novel model AG-CNN

            Here, we introduce our novel model, the AG-CNN (see Algorithm 2), which was developed for adaptive feature extraction in the interpretation of fetal medical images. To improve its capacity to focus on important regions and characteristics within ultrasound images, the AG-CNN incorporates attention mechanisms into the standard CNN architecture.

            Architecture overview

            The AG-CNN is made up of convolutional layers, pooling layers, attention modules, and fully connected layers. Targeting fetal ultrasound images, it seeks to automatically learn and extract the features that are crucial for precise classification and segmentation.

            Attention mechanism

            The AG-CNN relies on its attention mechanism to focus selectively on the important parts of the ultrasound images. Our model uses spatial attention, which creates attention maps that highlight the important details of an input image. The attention map A is generated as a weighted sum of the feature maps F from the previous convolutional layer.



            A is the attention map.

            W represents the learnable weight matrix.

            F denotes the feature maps.

            The attention map A is then multiplied element-wise with the feature maps F to obtain the attended feature maps, F_attended:

            $$F_{\text{attended}} = A \odot F$$

            ⊙ represents element-wise multiplication.
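
            The two attention steps can be sketched as follows. This is an illustration under stated assumptions, not the authors’ layer: the learnable weight matrix W is reduced to one hypothetical weight per channel, and a sigmoid is applied to the weighted sum so that attention values lie between 0 and 1. The source does not specify the nonlinearity.

```python
import math


def spatial_attention(feature_maps, weights):
    """Compute a spatial attention map and the attended feature maps.

    feature_maps: list of C channels, each a list of rows (H x W).
    weights: list of C floats standing in for the learnable matrix W.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum over channels at each location, squashed by a sigmoid
    # (the sigmoid is an assumption).
    attention = [
        [1.0 / (1.0 + math.exp(-sum(w * fm[y][x]
                                    for w, fm in zip(weights, feature_maps))))
         for x in range(width)]
        for y in range(height)
    ]
    # Element-wise multiplication of the attention map with every channel.
    attended = [
        [[attention[y][x] * fm[y][x] for x in range(width)]
         for y in range(height)]
        for fm in feature_maps
    ]
    return attention, attended


features = [[[0.0, 2.0]], [[0.0, -2.0]]]  # two channels, each 1 x 2
attention_map, attended_maps = spatial_attention(features, weights=[1.0, 0.0])
```

            The same attention map is shared by all channels, so locations with a high attention value are emphasised in every feature map.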

            CNN architecture with attention

            The attention mechanism is embedded in the convolutional layers of the AG-CNN. Figure 5 illustrates the resulting architecture.

            Figure 5:

            AG-CNN architecture. Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Loss function

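            For classification tasks, we use the categorical cross-entropy loss function L_classification to optimise the model’s weights. For segmentation tasks, we adopt the dice loss L_segmentation to ensure accurate boundary localisation. A commonly used form of the dice loss is:

            $$L_{\text{segmentation}} = 1 - \frac{2 \sum y_{\text{true}}\, y_{\text{pred}} + \epsilon}{\sum y_{\text{true}} + \sum y_{\text{pred}} + \epsilon}$$

            y_true represents the ground truth segmentation map.

            y_pred denotes the predicted segmentation map.

            ϵ is a small constant to avoid division by 0.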
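
            A smoothed dice loss of the kind described here can be sketched as below. The placement of the small constant in both numerator and denominator follows a common convention and is an assumption. Masks are flattened lists of values in [0, 1].

```python
def dice_loss(y_true, y_pred, eps=1e-6):
    """Smoothed dice loss between a ground-truth mask and a prediction."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


perfect = dice_loss([1, 0, 1, 0], [1, 0, 1, 0])  # full overlap
partial = dice_loss([1, 1, 0, 0], [1, 0, 0, 0])  # half of the mask found
disjoint = dice_loss([1, 0], [0, 1])             # no overlap
```

            The loss is 0 for a perfect overlap and approaches 1 when the predicted and true masks do not intersect.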

            Algorithm 2:

            Detailed architecture

            Input: Input Image (Dimensions: W × H)
            Output: Class Prediction
            FeatureMaps1 ← ApplyConvolution(InputImage, 3x3kernel);
            FeatureMaps2 ← ApplyConvolution(InputImage, 3x3kernel);
            FeatureMaps3 ← ApplyConvolution(InputImage, 3x3kernel);
            AttendedFeatureMaps ← ApplyAttention(FeatureMaps1, FeatureMaps2, FeatureMaps3);
            FeatureMaps4 ← ApplyConvolution(AttendedFeatureMaps, 3x3kernel);
            FeatureMaps5 ← ApplyConvolution(AttendedFeatureMaps, 3x3kernel);
            FeatureMaps6 ← ApplyConvolution(AttendedFeatureMaps, 3x3kernel);
            FlattenedFeatures ← ApplyPooling(FeatureMaps4, FeatureMaps5, FeatureMaps6, 2x2);
            FCLayer1Output ← ApplyFullyConnected(FlattenedFeatures);
            FCLayer2Output ← ApplyFullyConnected(FCLayer1Output);
            ClassPrediction ← Softmax(FCLayer2Output);
            Training strategy

            AG-CNN is trained using backpropagation and gradient descent, which adjust the model’s parameters to minimise the loss function. To avoid overfitting and maintain training stability, we additionally use dropout and batch normalisation.
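
            Two of the ingredients named above, a gradient descent update and dropout, can be sketched in plain Python. The learning rate and dropout rate used here are hypothetical; the source does not report the values used for AG-CNN. Batch normalisation is omitted for brevity.

```python
import random


def sgd_step(params, grads, learning_rate):
    """One gradient descent update: move each parameter against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and rescale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


updated = sgd_step(params=[1.0, -2.0], grads=[0.5, -1.0], learning_rate=0.1)
dropped = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0))
```

            Rescaling the surviving activations keeps their expected value unchanged, so dropout can simply be switched off at test time.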

            In conclusion, AG-CNN is intended to improve feature extraction when analysing medical images of a fetus. The model’s accuracy in classification and segmentation tasks is enhanced by the incorporation of attention mechanisms that teach it to zero in on important regions within ultrasound pictures. The adaptive feature extraction capabilities of the AG-CNN are the result of its structure, attention mechanism, loss functions, and training approach.


            Here, we present the results of the proposed AG-CNN model and compare its performance with three established architectures: DenseNet 169, ResNet50, and VGG16. The analysis uses three key measures: loss, accuracy, and the confusion matrix.

            Performance metrics

            We evaluated the models using the following criteria:

            The loss function measures the difference between the predicted and actual values; the lower the value, the better the model fits the data. The choice of loss function depends heavily on the task at hand and on the characteristics of the data. Categorical cross-entropy is a popular choice for fetal ultrasound image classification tasks. It is used for multi-class problems in which each input image belongs to exactly one class.

            The categorical cross-entropy loss is given by:

            $$L_{\text{classification}} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{C} y_{ij} \log(p_{ij})$$

            n is the number of samples (images) in the dataset.

            C is the number of classes.

            y_ij is the ground truth label of sample i for class j, which is 1 if the image belongs to class j and 0 otherwise.

            p_ij is the predicted probability, output by the model, that sample i belongs to class j.

            Figure 6 shows the training and testing loss of AG-CNN.
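
            The categorical cross-entropy computation can be checked on a small worked example. The sketch assumes one-hot ground truth labels and a loss averaged over the n samples. The clipping constant eps is an implementation detail added to avoid taking the logarithm of 0.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over n samples.

    y_true: n one-hot label vectors; y_pred: n predicted probability vectors.
    """
    n = len(y_true)
    total = 0.0
    for labels, probs in zip(y_true, y_pred):
        for y, p in zip(labels, probs):
            total += y * math.log(max(p, eps))
    return -total / n


# Two samples, three classes (illustrative numbers).
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = categorical_cross_entropy(y_true, y_pred)
```

            Only the probability assigned to the correct class contributes for each sample, so the loss here is the mean of -log(0.7) and -log(0.8).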

            Figure 6:

            Training and testing loss of AG-CNN. Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Accuracy is measured as the percentage of instances for which a correct class was predicted. It provides an overview of the reliability of the model. Accuracy of AG-CNN training and testing is depicted in Figure 7.

            Figure 7:

            Training and testing accuracy of AG-CNN. Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Confusion matrix: the confusion matrix lists the true positives, true negatives, false positives, and false negatives. Metrics such as accuracy, recall, and F1-score can be derived from it. Figure 8 shows the confusion matrix for AG-CNN, and Figure 9 shows the classes that AG-CNN classified correctly.
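
            The relationship between the confusion matrix and the derived metrics can be made concrete with a short sketch. The labels below are invented for illustration and are not results from the study. Rows index the true class and columns the predicted class.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: entry [t][p] is the number of samples of true
    class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def precision_recall_f1(matrix, cls):
    """Per-class precision, recall, and F1-score."""
    tp = matrix[cls][cls]
    fp = sum(row[cls] for row in matrix) - tp
    fn = sum(matrix[cls]) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
overall_accuracy = accuracy(cm)
class1_metrics = precision_recall_f1(cm, cls=1)
```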

            Figure 8:

            Confusion matrix for AG-CNN. Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Figure 9:

            Correctly classified classes by AG-CNN. Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Comparative analysis

            On our fetal ultrasound dataset, we compared AG-CNN’s results with those of the chosen architectures. To maintain a consistent standard of comparison, all models were trained using the identical sets of training and validation data.

            Figure 10a displays the training and testing accuracy curves, while Figure 10b displays the training and testing loss curves. Table 4 compares the models’ performance.

            Figure 10:

            Comparative analysis (a) training and testing accuracy curves and (b) training and testing loss curves. Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Table 4:

            Comparative performance of models.

            Model        | Training loss | Testing loss | Training accuracy | Testing accuracy
            DenseNet 169 | 0.40          | 0.53         | 0.92              | 0.90

            Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Figure 11 compares our proposed AG-CNN with the established models DenseNet 169, ResNet50, and VGG16. AG-CNN’s attention mechanism helps to capture relevant features, which in turn boosts the accuracy of its classifications. In addition, the confusion matrix analysis shows the categories in which each model falls short.

            Figure 11:

            Comparative confusion matrices. Abbreviation: AG-CNN, attention-guided convolutional neural network.

            Finally, the AG-CNN architecture outperforms the more conventional models in the classification of fetal ultrasound images. Its attention-guided strategy, which improves feature extraction and accuracy, makes it a promising solution for medical image analysis tasks.


            The study leverages AG-CNN for adaptive feature extraction in fetal medical image analysis. The utilisation of AG-CNN demonstrates its effectiveness in enhancing feature extraction, providing a more nuanced understanding of maternal–fetal anatomical structures. The innovative approach contributes to the field of fetal medical image analysis, offering promising outcomes for accurate and adaptive feature extraction. Furthermore, the comprehensive dataset from BCNatal, comprising over 12,000 images from routine pregnancy screenings, enhances the study’s robustness. The inclusion of diverse anatomical planes and careful labelling by a senior clinician enriches the dataset’s quality and ensures the model’s applicability to various clinical scenarios.

            Despite the strengths, certain limitations merit consideration. The study excludes cases of multiple pregnancies, congenital malformations, and aneuploidies, narrowing the scope of applicability. Additionally, the reliance on a specific dataset from BCNatal may introduce biases inherent to the population served by the two centres in Barcelona. The diversity of ultrasound machines and operators, while reflecting real-world variability, introduces potential variability in image quality. The study acknowledges these variations and provides a detailed breakdown, yet it is essential to be mindful of their impact on model generalisation.

            The discussion concludes by outlining potential directions for future research. Addressing the current study’s limitations may involve expanding the dataset to include a broader demographic and incorporating additional clinical scenarios. Further investigations could explore ensemble approaches, combining attention-guided techniques with other deep learning architectures. Moreover, the application of the proposed AG-CNN model to real-time scenarios and its integration into clinical workflows warrant exploration. Collaborative efforts with medical practitioners can enhance the model’s clinical relevance and foster translational applications in fetal medicine. In summary, the discussion reflects on the study’s achievements, acknowledges its limitations, and provides a roadmap for future research endeavours in the dynamic field of fetal medical image analysis.


            The study set out to enhance maternal–fetal medical image analysis through the application of AG-CNN. The results, as discussed in the preceding sections, underscore the efficacy of AG-CNN in adaptive feature extraction, providing valuable insights into various maternal–fetal anatomical structures. The central question guiding this study was whether AG-CNN could significantly contribute to the field of fetal medical image analysis. The affirmative answer is evident in the improved feature extraction capabilities demonstrated by AG-CNN, leading to enhanced accuracy in anatomical plane detection. By leveraging a meticulously curated dataset from routine pregnancy screenings, the study establishes a foundation for robust and adaptive model performance.

            AG-CNN was introduced in this study as a novel approach to adaptive feature extraction in fetal medical image analysis. When compared with well-established models such as DenseNet 169, ResNet50, and VGG16, the proposed AG-CNN showed superior performance, with smaller training and testing losses and higher training and testing accuracies. Recognition of fetal anatomical planes could benefit from the AG-CNN because of its ability to efficiently capture and emphasise key aspects through attention mechanisms. These findings demonstrate the potential for AG-CNN to aid in prenatal screening and obstetric diagnostics; hence, this technique shows promise as a valuable tool in the field of fetal medical image analysis.

            The contributions of this study extend beyond the realm of academic inquiry. AG-CNN’s proficiency in maternal–fetal image analysis holds the potential to redefine clinical practices in fetal medicine. The model’s adaptability to diverse clinical scenarios, as evidenced by the comprehensive dataset, positions it as a valuable tool for clinicians in real-world applications. Acknowledging the limitations inherent in any scientific endeavour, the study paves the way for future research directions. Expanding the dataset’s diversity, addressing real-time applicability, and exploring collaborative ventures with medical practitioners represent promising avenues for further exploration. In conclusion, the current study successfully tackles the research problem by demonstrating the effectiveness of AG-CNN in maternal–fetal medical image analysis. The findings not only contribute to the academic discourse but also hold significant implications for advancing clinical practices in fetal medicine.


            The authors extend their appreciation to the King Salman Center for Disability Research for funding this work through Research Group no. KSRG-2023-476. (funder id: http://dx.doi.org/10.13039/501100019345).



            Author and article information

            Journal of Disability Research
            King Salman Centre for Disability Research (Riyadh, Saudi Arabia)
            22 February 2024
            Volume 3, Issue 2, e20240005
            [1 ] Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia ( https://ror.org/02f81g417)
            [2 ] King Salman Center for Disability Research, Riyadh, Saudi Arabia ( https://ror.org/01ht2b307)
            [3 ] Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia ( https://ror.org/02f81g417)
            [4 ] Electrical Engineering Department, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia ( https://ror.org/02f81g417)
            Author notes
            Correspondence to: Emad Mahrous Awwad*, e-mail: 442106835@student.ksu.edu.sa, Tel: +966562939242
            Copyright © 2024 The Authors.

            This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY) 4.0, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

            Received: 06 September 2023
            Revised: 08 January 2024
            Accepted: 10 January 2024
            Page count
            Figures: 11, Tables: 4, References: 23, Pages: 16
            Funded by: King Salman Center for Disability Research
            Award ID: KSRG-2023-476

            Social policy & Welfare,Political science,Education & Public policy,Special education,Civil law,Social & Behavioral Sciences
            adaptive feature extraction,medical image analysis,convolutional neural network,diagnostic precision,fetal medical imaging,attention-guided convolution

