      Automated Gesture Recognition Using African Vulture Optimization with Deep Learning for Visually Impaired People on Sensory Modality Data


            Abstract

Gesture recognition for visually impaired persons (VIPs) is a useful technology for enhancing their communication and increasing accessibility. It is vital to understand the specific needs and challenges faced by VIPs when planning a gesture recognition model. However, typical gesture recognition methods frequently depend on visual input (for instance, cameras), so it is important to explore other sensory modalities for input. A deep learning (DL)-based gesture recognition method is effective for the interaction of VIPs with their devices. It offers a more intuitive and natural way of interacting with technology, making it more accessible for everybody. Therefore, this study presents an African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data (AVODL-GRSMD) technique. The AVODL-GRSMD technique mainly focuses on the utilization of a DL model with a hyperparameter tuning strategy for a productive and accurate gesture detection and classification process. The AVODL-GRSMD technique applies a primary data preprocessing stage to normalize the input sensor data and uses a multi-head attention-based bidirectional gated recurrent unit (MHA-BGRU) method for accurate gesture recognition. Finally, the hyperparameter optimization of the MHA-BGRU method is performed using the African Vulture Optimization (AVO) algorithm. A series of simulation analyses were performed to demonstrate the superior performance of the AVODL-GRSMD technique. The experimental values demonstrate the better recognition rate of the AVODL-GRSMD technique compared to that of the state-of-the-art models.

            Main article text

            INTRODUCTION

Human–computer interaction (HCI) refers to a regulated loop process in which the data sent by devices and humans should be considered as a whole ( Padmavathi, 2021). This interaction loop is an integration of feedforward and feedback. Meeting the needs of blind persons in accessing contextual and graphical data is challenging ( Agarwal and Das, 2023). Gesture detection has a considerable number of realistic applications, including remote teaching guidance, virtual reality/augmented reality (VR/AR) games, interaction with remote automation equipment, remote health care, and intelligent vehicles ( Gorobets et al., 2022). Moreover, gesture detection attracts special interest as a way to enhance the quality of life of people with hearing disorders, and it is utilized in translation tasks and sign language recognition for deaf persons ( Gangrade and Bharti, 2023). Several research studies on gesture detection have been carried out, and remarkable progress has been made in this field.

Gesture detection is classified into dynamic gesture detection and static gesture detection ( Li et al., 2022). Static gesture detection requires organizing and learning the gestures' spatial features without considering the temporal features ( Ryumin et al., 2023). Conversely, dynamic gesture detection needs to consider both the temporal and spatial features of the gesture, since it varies over time. Hence, dynamic gesture detection is more sophisticated than static gesture detection; however, the range of application of dynamic gestures is broader ( Sahana et al., 2022). This study offers a lightweight gesture action detection network for real-time HCI and control. However, the full potential of gesture detection remains an open issue because of the variances in the syntactic and semantic framework of gestures ( Pandey, 2023). At present, fully automatic methods for detecting various dynamic gestures do not exist ( Gangrade and Bharti, 2023). Devising such methods requires deep semantic analyses, which can only be conducted at a superficial level owing to the limitations of text analysis approaches and knowledge bases, attaining word-level detection. Thus, the presented approach provides a useful reference for sign language detection for persons with hearing disorders ( Varsha and Nair, 2021).

            This study presents an African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data (AVODL-GRSMD) technique. The AVODL-GRSMD technique utilizes the primary data preprocessing stage to normalize the input sensor data. The AVODL-GRSMD technique uses a multi-head attention-based bidirectional gated recurrent unit (MHA-BGRU) method for accurate gesture recognition. Finally, the hyperparameter optimization of the MHA-BGRU technique is performed by the use of the AVO method. A series of simulation analyses were conducted to validate the improved performance of the AVODL-GRSMD technique.

            RELATED STUDIES

The authors in Adeel et al. (2022) define a gesture-based confidence assessment (GCA) method for hand gesture recognition (HGR) to identify the state of mind based on hand actions, with an interview as the context. This method is also valuable for visually impaired persons (VIPs) when conducting or taking part in an interview. Previously, no work had been completed to identify a person's state of mind using HGR. This method relied on a convolutional neural network (CNN) with long short-term memory networks (LSTM) to capture the temporal data. In Zhang and Zeng (2022), touch gestures were predicted by a trained radial basis function (RBF) network, while compound gestures were modeled by a Petri net, which establishes a logic, timing, and spatial relationship model. As a result, Braille input based on multi-touch gesture recognition was implemented.

Deepa et al. (2023) presented a structure that utilizes a CNN for HGR. The gesture sign image is binarized, dividing the image into background and foreground, and contours are identified in the binarized image. Feature extraction is completed using the SIFT technique, and the extracted features are used by the CNN to recognize the gesture, which is output in text format. Can et al. (2021) proposed a DL-CNN approach that categorizes hand gestures efficiently in near-infrared and color natural images. This paper presents a novel deep learning (DL) technique based on a CNN for recognizing hand gestures, enhancing the recognition rate and reducing testing and training time.

Al-Hammadi et al. (2020) presented an effective deep convolutional neural network (DCNN) algorithm for HGR. The presented technique utilized transfer learning (TL) to overcome the lack of a large labeled hand gesture database. Kraljević et al. (2020) examined a smart home automation method specifically designed to provide real-time sign language recognition. A new hierarchical system was proposed, comprising resource- and time-aware components: a wake-up module and a higher-performance sign recognition module based on a Conv3D network. Lahiani and Neji (2018) proposed a static HGR method for mobile devices by integrating the histogram of oriented gradients (HOG) and local binary pattern (LBP) features to correctly identify hand poses.

            THE PROPOSED MODEL

This study concentrates on the development of an automated gesture recognition tool, named the AVODL-GRSMD technique, for visually impaired people based on sensory modality data. The AVODL-GRSMD technique exploits a DL model with a hyperparameter tuning strategy for an effective and accurate gesture detection and classification process. The AVODL-GRSMD technique follows three major processes, namely data preprocessing, MHA-BGRU-based recognition, and AVO-based hyperparameter tuning. Figure 1 demonstrates the workflow of the AVODL-GRSMD system.

            Figure 1:

Workflow of the AVODL-GRSMD system. Abbreviation: AVODL-GRSMD, African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

            Data preprocessing

To preprocess the input data, data normalization is adopted. The data recorded by the wearable sensors are normalized and cleaned to achieve suitable and consistent data for training the recognition component. First, the imputation process fills the missing values of the sensor database using linear interpolation. Next, the noise is removed with median filtering and a third-order low-pass Butterworth filter with a 20 Hz cutoff frequency. Finally, a normalization step standardizes all the sensor data using the mean and standard deviation.
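A minimal sketch of this preprocessing pipeline is given below, assuming Python with NumPy, pandas, and SciPy; the array layout (time steps × channels) and the 50 Hz sampling rate are illustrative assumptions rather than values stated in this section.

```python
# Minimal preprocessing sketch (assumption: raw wearable-sensor signals are a 2-D array
# of shape [time_steps, channels] sampled at fs = 50 Hz; these details are illustrative).
import numpy as np
import pandas as pd
from scipy.signal import medfilt, butter, filtfilt

def preprocess(raw, fs=50.0, cutoff=20.0, median_kernel=3):
    # 1) Imputation: fill missing sensor readings by linear interpolation.
    df = pd.DataFrame(raw).interpolate(method="linear", limit_direction="both")
    x = df.to_numpy()

    # 2) Noise removal: median filter, then a 3rd-order low-pass Butterworth
    #    filter with a 20 Hz cutoff frequency.
    x = np.stack([medfilt(x[:, c], kernel_size=median_kernel) for c in range(x.shape[1])], axis=1)
    b, a = butter(N=3, Wn=cutoff / (fs / 2.0), btype="low")
    x = filtfilt(b, a, x, axis=0)

    # 3) Standardization: zero mean, unit standard deviation per channel.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
```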

            Gesture recognition using the MHA-BGRU technique

At this stage, the preprocessed input is passed into the MHA-BGRU method for gesture recognition. The recurrent neural network (RNN) is specialized for processing sequence data, such as audio, time series, and text, unlike the CNN, which focuses on the spatial features of the input ( Bao et al., 2022). In general, the RNN conducts the same computation cyclically on all the segments of the sequence, and the subsequent output depends on the prior calculation. In terms of network architecture, it involves a memory that stores the hidden internal state $h_t$, which is evaluated from the input $x_t$ and the prior hidden layer (HL) state $h_{t-1}$, such that

(1) $h_t = f_W(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$.

In Eq. (1), $W_{hh}$ denotes the hidden-to-HL weight and $W_{yh}$ indicates the hidden-to-output weight; $f_W$ denotes the HL function, namely the tanh activation function, with the parameters $W$ shared across time (viz., $W_{xh}$ specifies the input-to-HL weight); $b_h$ represents the corresponding bias vector, and the predicted output is as follows:

(2) $y_t = W_{yh} h_t + b_y$.

However, this typical architecture suffers from vanishing and exploding gradients on long sequences.

            The bidirectional RNN structure allows the output layer to receive future and past data for all the points in the input series. More accurately, a forward RNN learns from prior information, whereas reverse RNN learns from future information in such a way that all the time steps make optimum usage of lower- and upper-relevant data. In addition, both outputs are spliced together as the concluding output of bidirectional recurrent neural network (BiRNN).

Given that, the BiGRU is a BiRNN that employs the GRU for all the hidden nodes. The BiGRU splits the GRU neurons into backward and forward layers that correspond to the negative and positive time directions, respectively.

The current state of the HL of the BiGRU is defined by the current input $x_t$, the HL state output of the backward layer $\overleftarrow{h}_{t-1}$, and that of the forward layer $\overrightarrow{h}_{t-1}$. Since the BiGRU can be considered as two single GRUs, the HL state of the BiGRU at time $t$ is obtained as the weighted sum of $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, expressed as follows:

(3) $\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1})$

(4) $\overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t-1})$

(5) $h_t = w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t + b_t$.

Briefly, the BiGRU allows modeling the relationship of the future and historical states of the sequence with the current state, thus raising prediction accuracy.

The attention mechanism was devised in the field of image recognition and is now utilized, instead of the RNN, in the field of machine translation. The attention module highlights the crucial influencing factors by allocating a weight to every element of the input sequence, thus increasing the accuracy of the model:

(6) $f(x_i, y) = (W_1 x_i) \cdot (W_2 y)$

(7) $\mathrm{Attention} = \sum_{i=1}^{n} \mathrm{softmax}\big(f(x_i, y)\big)\, x_i,$

where $x_i$ signifies the input sequence. The score $f(x_i, y)$ is mapped into the [0,1] interval by the normalized exponential (softmax) function, giving the "weight"; the dot-product attention output is then the weighted combination of the $x_i$.

The multi-head attention (MHA) module emerges as the situation requires, with attention-related modules being extensively utilized in image and natural language processing (NLP) tasks. Furthermore, the linear transformation parameters $W$ for $Q$, $K$, and $V$ are unique in every iteration; they are not shared. MHA can be utilized for processing the information from the BiGRU output layer instead of applying average or maximum pooling, as follows:

(8) $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V$

(9) $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \mathrm{head}_3, \ldots, \mathrm{head}_h)\, W^{O}.$

Therefore, the MHA module, a fusion of multiple attention mechanisms, is considered a weighting system that allocates weights to the HL states of the BiGRU so that the available information sources are exploited effectively when making predictions.
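An illustrative sketch of such an MHA-BGRU classifier is given below, assuming PyTorch; the hidden size, number of heads, learnable pooling query, and the input shape of nine inertial channels with 128 time steps are assumptions made for the example, not values specified in this article.

```python
# Minimal MHA-BGRU sketch: a bidirectional GRU whose hidden states are pooled by
# multi-head attention before classification (layer sizes are assumed values).
import torch
import torch.nn as nn

class MHABGRU(nn.Module):
    def __init__(self, n_channels, n_classes, hidden=64, heads=4):
        super().__init__()
        self.bigru = nn.GRU(n_channels, hidden, batch_first=True, bidirectional=True)
        self.mha = nn.MultiheadAttention(embed_dim=2 * hidden, num_heads=heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, 2 * hidden))  # learnable pooling query
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                        # x: [batch, time, channels]
        h, _ = self.bigru(x)                     # concatenated forward/backward hidden states
        q = self.query.expand(x.size(0), -1, -1)
        ctx, _ = self.mha(q, h, h)               # attention-weighted summary of all time steps
        return self.fc(ctx.squeeze(1))           # class logits

logits = MHABGRU(n_channels=9, n_classes=6)(torch.randn(8, 128, 9))  # e.g. 8 sensor windows
```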

            Hyperparameter tuning using the AVO algorithm

Finally, the AVO algorithm is applied for the optimal hyperparameter adjustment of the MHA-BGRU model. The process involved in the AVO algorithm is elaborated as follows ( Liu et al., 2023). After forming the initial population, the fitness value of every solution is evaluated, the best solution of the first vulture group and the best solution of the second vulture group are determined, and the following equation assigns the remaining solutions toward one of them:

(10) $R(i) = \begin{cases} \mathrm{BestVulture}_1 & \text{if } p_i = L_1 \\ \mathrm{BestVulture}_2 & \text{if } p_i = L_2. \end{cases}$

In Eq. (10), the parameters $L_1$ and $L_2$ should be initialized with values within [0,1] before the search process, and the sum of both parameters is equal to one. The probability $p_i$ is obtained as follows:

(11) $p_i = \dfrac{F_i}{\sum_{i=1}^{n} F_i}.$

Equations (12) and (13) provide the mathematical model of this behavior and govern the transition from the exploration phase (ERP) to the exploitation phase (ETp), which is driven by the satiety or hunger of the vultures.

(12) $T = h \times \left( \sin^{w}\!\left(\dfrac{\pi}{2} \times \dfrac{iter_i}{max\_iter}\right) + \cos\!\left(\dfrac{\pi}{2} \times \dfrac{iter_i}{max\_iter}\right) - 1 \right)$

(13) $F = (2 \times rand_1 + 1) \times z \times \left(1 - \dfrac{iter_i}{max\_iter}\right) + T$

In Eqs. (12) and (13), $F$ denotes the satiety rate of the agent, $iter_i$ shows the current iteration, $max\_iter$ represents the overall number of iterations, $z$ represents a randomly generated value within [−1, 1] that changes in every iteration, $h$ denotes a random number within [−2, 2], $rand_1$ represents a random number within [0, 1], and $w$ denotes a constant parameter. If $|F| > 1$, the agents look for food in dissimilar regions and the AVO algorithm enters the ERP; if $|F| < 1$, the AVO algorithm enters the ETp and the agents forage in the neighborhood of the current solution.
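A small sketch of this satiety computation is shown below, assuming NumPy; the constant exponent $w = 2.5$ is an assumed value, since the text only states that $w$ is a constant parameter.

```python
# Satiety computation of Eqs. (12)-(13); w = 2.5 is an assumed constant.
import numpy as np

def satiety(iter_i, max_iter, w=2.5, rng=np.random.default_rng()):
    h = rng.uniform(-2, 2)
    z = rng.uniform(-1, 1)
    rand1 = rng.uniform(0, 1)
    ratio = iter_i / max_iter
    T = h * (np.sin(np.pi / 2 * ratio) ** w + np.cos(np.pi / 2 * ratio) - 1)   # Eq. (12)
    return (2 * rand1 + 1) * z * (1 - ratio) + T                               # Eq. (13)

F = satiety(iter_i=10, max_iter=100)
phase = "exploration (ERP)" if abs(F) > 1 else "exploitation (ETp)"
```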

Here, the parameter $P_1$, which ranges from 0 to 1, is used to choose between two dissimilar exploration strategies and must be set before the search operation. A random number $rand_{P_1}$ within [0,1] is generated to select a strategy in the ERP: if $P_1 \ge rand_{P_1}$, Eq. (15) is exploited; if $P_1 < rand_{P_1}$, Eq. (17) is applied:

(14) $P(i+1) = \begin{cases} \text{Eq. (15)} & \text{if } P_1 \ge rand_{P_1} \\ \text{Eq. (17)} & \text{if } P_1 < rand_{P_1} \end{cases}$

(15) $P(i+1) = R(i) - D(i) \times F$

(16) $D(i) = |X \times R(i) - P(i)|.$

In Eqs. (14) and (15), $P(i+1)$ denotes the agent's position vector in the following iteration, and $F$ indicates the agent's satiety rate. In Eq. (16), $R(i)$ denotes the best agent, and $X$ denotes the coefficient with which the agent randomly moves to protect the food from other agents, given by $X = 2 \times rand$, where $rand$ denotes a random number within [0, 1]. $P(i)$ denotes the current position vector of the vulture. The second exploration strategy is given by

(17) $P(i+1) = R(i) - F + rand_2 \times \big((UB - LB) \times rand_3 + LB\big),$

where $rand_2$ is a random number within [0,1], $UB$ and $LB$ denote the upper and lower bounds of the search space, and $rand_3$ is a random number that takes values close to 1.
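The exploration-phase update can be sketched as follows, assuming NumPy vectors for agent positions; $R$ is the vulture selected by Eq. (10), lb/ub are the search-space bounds, and the value of $P_1$ is an assumption (the text only requires it to lie in [0, 1]).

```python
# Exploration-phase position update (Eqs. (14)-(17)) for a single agent.
import numpy as np

def explore(pos, R, F, lb, ub, P1=0.6, rng=np.random.default_rng()):
    if P1 >= rng.uniform(0, 1):                        # Eq. (14): pick a strategy
        D = np.abs(2 * rng.uniform(0, 1) * R - pos)    # Eq. (16), with X = 2 * rand
        return R - D * F                               # Eq. (15)
    rand2, rand3 = rng.uniform(0, 1), rng.uniform(0, 1)
    return R - F + rand2 * ((ub - lb) * rand3 + lb)    # Eq. (17)
```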

            The fourth stage: exploitation

If $|F| < 1$, the AVO algorithm enters the ETp, which also has two internal stages, each applying two dissimilar strategies. The selection between the strategies of each internal stage is defined by two parameters, $P_2$ and $P_3$: in the first stage, the $P_2$ parameter is used for selecting the strategy, and in the second stage, the $P_3$ parameter is used. Both parameters should be set to values between 0 and 1 before carrying out the search process.

The AVO algorithm enters the first phase of the ETp if $|F|$ is between 1 and 0.5. Here, two dissimilar approaches, rotating flight and siege fight, are implemented. $P_2$, which controls the selection between the strategies, should be set before the search process to a value between 0 and 1. First, $rand_{P_2}$, a random number within [0,1], is produced. If $P_2 \ge rand_{P_2}$, the siege-fight approach is gradually implemented; otherwise, the rotating-flight strategy is implemented.

(18) $P(i+1) = \begin{cases} \text{Eq. (19)} & \text{if } P_2 \ge rand_{P_2} \\ \text{Eq. (22)} & \text{if } P_2 < rand_{P_2}. \end{cases}$

If $|F| \ge 0.5$, the agents are comparatively full and have sufficient energy. If many agents congregate on one food source, this causes intense conflict over the food: the weak agents try to take food from the healthy agents and tire them out by gathering around them.

(19) $P(i+1) = D(i) \times (F + rand_4) - d(t)$

(20) $d(t) = R(i) - P(i),$

where $D(i)$ is evaluated using Eq. (16) and $F$ denotes the satiety rate of the agent; $rand_4$ denotes a random number within [0,1], which is used to increase the randomness factor. $R(i)$ is the best agent of the second group chosen by means of Eq. (10), $P(i)$ indicates the agent's position vector in the current iteration, and $d(t)$ is the distance between the agent and the best agent of the second group.

Agents frequently perform a rotating flight: a spiral is generated between each agent and one of the two best agents, which can be described as follows:

(21) $S_1 = R(i) \times \left(\dfrac{rand_5 \times P(i)}{2\pi}\right) \times \cos\big(P(i)\big), \qquad S_2 = R(i) \times \left(\dfrac{rand_6 \times P(i)}{2\pi}\right) \times \sin\big(P(i)\big)$

(22) $P(i+1) = R(i) - (S_1 + S_2).$

$R(i)$ indicates the position vector of one of the two best agents in the current iteration, and cos and sin represent the cosine and sine functions, respectively. $rand_5$ and $rand_6$ are randomly generated numbers between 0 and 1, and $S_1$ and $S_2$ are obtained using Eq. (21).
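The first exploitation stage can be sketched as follows, assuming NumPy vectors; $P_2 = 0.4$ is an assumed value in [0, 1], and $R$ is the best vulture selected by Eq. (10).

```python
# First-stage exploitation update (Eqs. (18)-(22)).
import numpy as np

def exploit_stage1(pos, R, F, P2=0.4, rng=np.random.default_rng()):
    if P2 >= rng.uniform(0, 1):                                       # Eq. (18): siege fight
        D = np.abs(2 * rng.uniform(0, 1) * R - pos)                   # Eq. (16)
        d = R - pos                                                   # Eq. (20)
        return D * (F + rng.uniform(0, 1)) - d                        # Eq. (19)
    S1 = R * (rng.uniform(0, 1) * pos / (2 * np.pi)) * np.cos(pos)    # Eq. (21): rotating flight
    S2 = R * (rng.uniform(0, 1) * pos / (2 * np.pi)) * np.sin(pos)
    return R - (S1 + S2)                                              # Eq. (22)
```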

            Exploitation (second stage)

In the second stage of the ETp, two behaviors are modeled: the accumulation of several kinds of agents over the food source, and an aggressive siege fight performed to find food. If $|F| < 0.5$, this step of the method is executed. First, $rand_{P_3}$ is generated randomly within [0,1]. If $P_3 \ge rand_{P_3}$, the strategy of accumulating different kinds of agents on the food source is applied; otherwise, the aggressive siege-fight strategy is implemented.

(23) $P(i+1) = \begin{cases} \text{Eq. (25)} & \text{if } P_3 \ge rand_{P_3} \\ \text{Eq. (26)} & \text{if } P_3 < rand_{P_3} \end{cases}$

(24) $A_1 = \mathrm{BestVulture}_1(i) - \dfrac{\mathrm{BestVulture}_1(i) \times P(i)}{\mathrm{BestVulture}_1(i) - P(i)^2} \times F, \qquad A_2 = \mathrm{BestVulture}_2(i) - \dfrac{\mathrm{BestVulture}_2(i) \times P(i)}{\mathrm{BestVulture}_2(i) - P(i)^2} \times F,$

where $\mathrm{BestVulture}_1(i)$ and $\mathrm{BestVulture}_2(i)$ denote the best agents of the first and second groups in the current iteration, $F$ denotes the agent's satiety rate, and $P(i)$ shows the current position vector of the agent.

(25) $P(i+1) = \dfrac{A_1 + A_2}{2}.$

Lastly, the aggregation of all agents is performed using Eq. (25), where $A_1$ and $A_2$ are obtained using Eq. (24) and $P(i+1)$ indicates the agent's position vector in the following iteration. Figure 2 represents the flowchart of the AVO algorithm.

            Figure 2:

            Flowchart of the AVO algorithm.

If $|F| < 0.5$, the head agent becomes hungry and weak and does not have sufficient energy to fight the other agents. At the same time, the other agents become aggressive while searching for food and move in dissimilar directions toward the head agent.

(26) $P(i+1) = R(i) - |d(t)| \times F \times \mathrm{Levy}(d),$

where $d(t)$ characterizes the distance of the agent to the best agent of the second group, evaluated using Eq. (20). Levy flight (LF) patterns are employed in Eq. (26) to improve the performance of the African vulture optimization algorithm (AVOA); LF has been recognized and utilized in several metaheuristic (MH) approaches and is computed using Eq. (27):

(27) $\mathrm{LF}(x) = 0.01 \times \dfrac{u \times \sigma}{|v|^{1/\beta}}, \qquad \sigma = \left( \dfrac{\Gamma(1+\beta) \times \sin\!\left(\dfrac{\pi\beta}{2}\right)}{\Gamma\!\left(\dfrac{1+\beta}{2}\right) \times \beta \times 2^{\left(\frac{\beta-1}{2}\right)}} \right)^{1/\beta}.$
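The second exploitation stage, covering the accumulation update of Eqs. (24)-(25) and the Levy-flight-based aggressive siege of Eqs. (26)-(27), can be sketched as follows; the exponent $\beta = 1.5$ and the value of $P_3$ are assumptions, not values given in the text.

```python
# Second-stage exploitation sketch (Eqs. (23)-(27)), assuming NumPy vectors for positions.
import numpy as np
from scipy.special import gamma

def levy(dim, beta=1.5, rng=np.random.default_rng()):
    # Eq. (27): Levy flight step built from two normally distributed samples u and v.
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.normal(0, sigma, dim), rng.normal(0, 1, dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def exploit_stage2(pos, best1, best2, R, F, P3=0.6, rng=np.random.default_rng()):
    if P3 >= rng.uniform(0, 1):                                        # Eq. (23)
        A1 = best1 - (best1 * pos) / (best1 - pos ** 2 + 1e-12) * F    # Eq. (24)
        A2 = best2 - (best2 * pos) / (best2 - pos ** 2 + 1e-12) * F
        return (A1 + A2) / 2.0                                         # Eq. (25)
    d = R - pos                                                        # Eq. (20)
    return R - np.abs(d) * F * levy(pos.shape[0], rng=rng)             # Eq. (26)
```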

Fitness selection is a critical aspect of the AVO algorithm. An encoded solution is utilized to evaluate the quality of the candidate solutions. Here, the precision value is the key criterion employed for designing the fitness function:

(28) $\mathrm{Fitness} = \max(P)$

(29) $P = \dfrac{TP}{TP + FP},$

where TP denotes the true positive values and FP the false positive values.
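A sketch of this fitness function is given below; the text defines the binary form of precision only, so the macro average over the six classes used here is an assumption, and the candidate hyperparameter set is assumed to have been used to train the model that produced the predictions.

```python
# Fitness sketch for Eqs. (28)-(29): the AVO algorithm keeps the candidate whose
# trained model maximizes the (macro-averaged) precision P on held-out data.
import numpy as np

def precision(y_true, y_pred, positive):
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    return tp / (tp + fp + 1e-12)                                      # Eq. (29)

def fitness(y_true, y_pred):
    classes = np.unique(y_true)
    return np.mean([precision(y_true, y_pred, c) for c in classes])    # Eq. (28)
```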

            RESULTS AND DISCUSSION

            In this section, the gesture recognition results of the AVODL-GRSM technique are validated on two datasets: UCI HAR dataset (UCI HAR) and USC HAD dataset (USC HAD), as illustrated in Table 1.

            Table 1:

            Details on two databases.

Class                     Label    No. of samples
                                   UCI HAR    USC HAD
Walking                   C-1      1722       8476
Walking upstairs          C-2      1544       4709
Walking downstairs        C-3      1406       4382
Sitting                   C-4      1777       5810
Standing                  C-5      1906       5240
Laying/sleeping           C-6      1944       8331
Total number of samples            10299      36948

Figure 3 demonstrates the classifier results of the AVODL-GRSM technique on the UCI HAR dataset. Figure 3a and b portray the confusion matrices rendered by the AVODL-GRSM method on the 70:30 split of the training phase (TRP) and testing phase (TSP). The figure specifies that the AVODL-GRSM technique has identified and classified all six class labels accurately. Likewise, Figure 3c shows the precision-recall (PR) analysis of the AVODL-GRSM approach, which indicates that the AVODL-GRSM approach has acquired maximum PR performance under the six classes. Eventually, Figure 3d illustrates the receiver operating characteristic (ROC) investigation of the AVODL-GRSM model; the results portray that the AVODL-GRSM approach attains proficient outcomes with maximum ROC values under the six class labels.

            Figure 3:

            Classifier result on the UCI HAR dataset. (a and b) Confusion matrices, (c) PR curve, and (d) ROC curve.

In Table 2, a brief recognition result of the AVODL-GRSM technique on the UCI HAR dataset is clearly portrayed. The results show that the AVODL-GRSM technique accurately recognizes the six activities. For instance, on 70% of TRP, the AVODL-GRSM technique obtains an average accuracy of 99.43%, precision of 98.32%, recall of 98.24%, F-score of 98.27%, and MCC of 97.93%. In addition, on 30% of TSP, the AVODL-GRSM technique gains an average accuracy of 99.64%, precision of 98.94%, recall of 98.93%, F-score of 98.93%, and MCC of 98.72%.

            Table 2:

            Gesture recognition outcome of the AVODL-GRSM system on the UCI HAR dataset.

UCI HAR dataset
Class                      Accuracy   Precision   Recall   F-score   MCC
Training phase (70%)
 Walking (C-1)             99.35      97.81       98.37    98.09     97.69
 Walking upstairs (C-2)    99.46      98.07       98.34    98.20     97.88
 Walking downstairs (C-3)  99.50      98.95       97.30    98.12     97.83
 Sitting (C-4)             99.51      98.95       98.24    98.59     98.30
 Standing (C-5)            99.53      99.62       97.85    98.73     98.45
 Laying/sleeping (C-6)     99.21      96.51       99.33    97.90     97.43
 Average                   99.43      98.32       98.24    98.27     97.93
Testing phase (30%)
 Walking (C-1)             99.68      98.80       99.20    99.00     98.81
 Walking upstairs (C-2)    99.68      98.49       99.35    98.92     98.73
 Walking downstairs (C-3)  99.68      99.32       98.42    98.86     98.68
 Sitting (C-4)             99.64      98.68       99.24    98.96     98.75
 Standing (C-5)            99.68      99.82       98.38    99.09     98.90
 Laying/sleeping (C-6)     99.51      98.52       99.01    98.77     98.47
 Average                   99.64      98.94       98.93    98.93     98.72

            Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

Figure 4 inspects the accuracy of the AVODL-GRSM method during training and validation on the UCI HAR database. The results indicate that the AVODL-GRSM technique attains higher accuracy values as the number of epochs increases. Moreover, the validation accuracy being higher than the training accuracy shows that the AVODL-GRSM method learns productively on the UCI HAR database.

            Figure 4:

            Accuracy curve of the AVODL-GRSM system on the UCI HAR dataset. Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

The loss analysis of the AVODL-GRSM method during training and validation on the UCI HAR database is shown in Figure 5. The results indicate that the AVODL-GRSM method attains close training and validation loss values, showing that the AVODL-GRSM technique learns productively on the UCI HAR database.

            Figure 5:

            Loss curve of the AVODL-GRSM system on the UCI HAR dataset. Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

Figure 6 shows the classifier results of the AVODL-GRSM method on the USC HAD dataset. Figure 6a and b illustrate the confusion matrices rendered by the AVODL-GRSM approach on the 70:30 split of TRP/TSP. The result specifies that the AVODL-GRSM approach has identified and classified all six class labels accurately. Likewise, Figure 6c shows the PR analysis of the AVODL-GRSM method, which indicates that the AVODL-GRSM algorithm has acquired higher PR performance under the six classes. Eventually, Figure 6d presents the ROC investigation of the AVODL-GRSM model; the figure portrays that the AVODL-GRSM approach attains productive outcomes with maximum ROC values under the six class labels.

            Figure 6:

            Classifier result on the USC HAD dataset. (a and b) Confusion matrices, (c) PR curve, and (d) ROC curve.

In Table 3, a brief recognition result of the AVODL-GRSM technique on the USC HAD dataset is clearly portrayed. The results show that the AVODL-GRSM technique accurately recognizes the six activities. For instance, on 70% of TRP, the AVODL-GRSM technique obtains an average accuracy of 99.38%, precision of 98.12%, recall of 98.17%, F-score of 98.14%, and MCC of 97.77%. In addition, on 30% of TSP, the AVODL-GRSM technique obtains an average accuracy of 99.40%, precision of 98.16%, recall of 98.16%, F-score of 98.15%, and MCC of 97.79%.

            Table 3:

            Gesture recognition outcome of AVODL-GRSM system on USC HAD dataset.

USC HAD dataset
Class                      Accuracy   Precision   Recall   F-score   MCC
Training phase (70%)
 Walking (C-1)             99.10      97.50       98.62    98.06     97.48
 Walking upstairs (C-2)    99.66      98.59       98.74    98.67     98.47
 Walking downstairs (C-3)  99.52      97.88       98.13    98.00     97.73
 Sitting (C-4)             99.30      98.39       97.17    97.78     97.36
 Standing (C-5)            99.41      97.20       98.62    97.90     97.56
 Laying/sleeping (C-6)     99.31      99.17       97.76    98.46     98.02
 Average                   99.38      98.12       98.17    98.14     97.77
Testing phase (30%)
 Walking (C-1)             99.28      98.12       98.74    98.43     97.96
 Walking upstairs (C-2)    99.60      98.69       98.28    98.48     98.25
 Walking downstairs (C-3)  99.51      97.67       98.13    97.90     97.63
 Sitting (C-4)             99.33      98.81       96.85    97.82     97.43
 Standing (C-5)            99.35      96.90       98.70    97.79     97.42
 Laying/sleeping (C-6)     99.33      98.74       98.27    98.50     98.07
 Average                   99.40      98.16       98.16    98.15     97.79

            Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

Figure 7 examines the accuracy of the AVODL-GRSM approach during training and validation on the USC HAD database. The figure indicates that the AVODL-GRSM technique attains higher accuracy values as the number of epochs increases. Moreover, the validation accuracy being higher than the training accuracy shows that the AVODL-GRSM approach learns efficiently on the USC HAD database.

            Figure 7:

            Accuracy curve of the AVODL-GRSM system on the USC HAD dataset. Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

The loss analysis of the AVODL-GRSM method during training and validation on the USC HAD database is shown in Figure 8. The results indicate that the AVODL-GRSM approach attains close training and validation loss values, showing that the AVODL-GRSM technique learns efficiently on the USC HAD database.

            Figure 8:

            Loss curve of the AVODL-GRSM system on the USC HAD dataset. Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

In Table 4 and Figure 9, the comparative outcomes of the AVODL-GRSM technique on the two datasets are provided ( Tahir et al., 2023). The results show that the AVODL-GRSM technique reaches effective recognition results on both datasets. For instance, on the UCI HAR dataset, the AVODL-GRSM technique provides an increased accuracy of 99.64%, while the existing MWHODL-SHAR, convolutional neural network-random forest (CNN-RF), residual network, deep CNN, CAE, human activity recognition on signal images (HARSI), and LSTM models provide lower accuracies of 99.09%, 96.27%, 95.45%, 94.20%, 97.94%, 95.86%, and 97.38%, respectively. Also, on the USC HAD dataset, the AVODL-GRSM technique provides an increased accuracy of 99.40%, while the existing MWHODL-SHAR, CNN-RF, residual network, deep CNN, CAE, HARSI, and LSTM methods provide lower accuracies of 99.03%, 97.84%, 95.86%, 94.06%, 94.73%, 95.76%, and 96.74%, respectively.

            Table 4:

            Comparative outcome of AVODL-GRSM approach with other systems on two datasets.

Accuracy (%)
Methods             UCI HAR dataset   USC HAD dataset
AVODL-GRSM          99.64             99.40
MWHODL-SHAR         99.09             99.03
CNN-RF              96.27             97.84
Residual network    95.45             95.86
Deep CNN            94.20             94.06
CAE                 97.94             94.73
HARSI               95.86             95.76
LSTM                97.38             96.74

            Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

            Figure 9:

            Comparative outcome of the AVODL-GRSM approach with other systems on two datasets. Abbreviation: AVODL-GRSM, African Vulture Optimization with Deep Learning-based Gesture Recognition for Visually Impaired People on Sensory Modality Data.

            These outcomes ensured the better performance of the AVODL-GRSM technique over other current methods.

            CONCLUSION

This study has focused on the development of an automated gesture recognition tool, named the AVODL-GRSMD technique, for visually impaired people on sensory modality data. The AVODL-GRSMD technique exploits a DL model with a hyperparameter tuning strategy for an effective and accurate gesture detection and classification process. The AVODL-GRSMD technique follows three major processes, namely data preprocessing, MHA-BGRU-based recognition, and AVO-based hyperparameter tuning. In this work, the hyperparameter optimization of the MHA-BGRU method is performed using the AVO algorithm. A series of simulation analyses were carried out to demonstrate the improved performance of the AVODL-GRSMD technique. The experimental values demonstrate the better recognition rate of the AVODL-GRSMD technique compared to that of the state-of-the-art models.

            CONFLICTS OF INTEREST

            The authors declare no conflicts of interest in association with the present study.

            DATA AVAILABILITY STATEMENT

            Data sharing not applicable to this article as no datasets were generated during the current study.

            REFERENCES

1. Adeel MI, Asad MA, Zeeshan MR, Amna M, Aslam M, Martinez-Enriquez AM. 2022. Gesture based confidence assessment system for visually impaired people using deep learning. In: Advances in Information and Communication: Proceedings of the 2022 Future of Information and Communication Conference (FICC), Vol. 2. Cham: Springer International Publishing. p. 135–147

2. Agarwal A, Das A. 2023. Facial gesture recognition based real time gaming for physically impairment. In: Artificial Intelligence: First International Symposium, ISAI 2022, Haldia, India, February 17-22, 2022. Cham: Springer Nature Switzerland. p. 256–264

            3. Al-Hammadi M, Muhammad G, Abdul W, Alsulaiman M, Bencherif MA, Mekhtiche MA. 2020. Hand gesture recognition for sign language using 3DCNN. IEEE Access. Vol. 8:79491–79509

            4. Bao K, Bi J, Gao M, Sun Y, Zhang X, Zhang W. 2022. An improved ship trajectory prediction based on AIS data using MHA-BiGRU. J. Mar. Sci. Eng. Vol. 10(6):804

            5. Can C, Kaya Y, Kılıç F. 2021. A deep convolutional neural network model for hand gesture recognition in 2D near-infrared images. Biomed. Phys. Eng. Express. Vol. 7(5):055005

6. Deepa S, Umamageswari A, Menaka S. 2023. A novel hand gesture recognition for aphonic people using convolutional neural network. In: Computer Vision and Machine Intelligence Paradigms for SDGs: Select Proceedings of ICRTAC-CVMIP 2021. Singapore: Springer Nature Singapore. p. 235–243

            7. Gangrade J, Bharti J. 2023. Vision-based hand gesture recognition for Indian sign language using convolution neural network. IETE J. Res. Vol. 69(2):723–732

8. Gorobets V, Merkle C, Kunz A. 2022. Pointing, pairing and grouping gesture recognition in virtual reality. In: Computers Helping People with Special Needs: 18th International Conference, ICCHP-AAATE 2022, Lecco, Italy, July 11–15, 2022. Cham: Springer International Publishing. p. 313–320

            9. Kraljević L, Russo M, Pauković M, Šarić M. 2020. A dynamic gesture recognition interface for smart home control based on Croatian sign language. Appl. Sci. Vol. 10(7):2300

            10. Lahiani H, Neji M. 2018. Hand gesture recognition method based on HOG-LBP features for mobile devices. Procedia Comput. Sci. Vol. 126:254–263

            11. Li J, Li C, Han J, Shi Y, Bian G, Zhou S. 2022. Robust hand gesture recognition using HOG-9ULBP features and SVM model. Electronics. Vol. 11(7):988

            12. Liu Q, Kosarirad H, Meisami S, Alnowibet KA, Hoshyar AN. 2023. An optimal scheduling method in IoT-fog-cloud network using combination of Aquila optimizer and African vultures optimization. Processes. Vol. 11(4):1162

            13. Padmavathi R. 2021. Expressive and Deployable Hand Gesture Recognition for Sign Way of Communication for Visually Impaired People.

14. Pandey S. 2023. Automated gesture recognition and speech conversion tool for speech impaired. In: Proceedings of Third International Conference on Advances in Computer Engineering and Communication Systems: ICACECS 2022. Singapore: Springer Nature Singapore. p. 467–476

            15. Ryumin D, Ivanko D, Ryumina E. 2023. Audio-visual speech and gesture recognition by sensors of mobile devices. Sensors. Vol. 23(4):2284

            16. Sahana T, Basu S, Nasipuri M, Mollah AF. 2022. MRCS: multi-radii circular signature based feature descriptor for hand gesture recognition. Multimed. Tools Appl. Vol. 81(6):8539–8560

            17. Tahir BS, Ageed ZS, Hasan SS, Zeebaree SRM. 2023. Modified wild horse optimization with deep learning enabled symmetric human activity recognition model. Comput. Mater. Contin. Vol. 75(2):4009–4024

            18. UCI HAR Dataset. https://www.kaggle.com/competitions/uci-har/data?select=UCI+HAR+Dataset+for+Kaggle

            19. USC HAD Dataset. https://sipi.usc.edu/had/

20. Varsha M, Nair CS. 2021. Indian sign language gesture recognition using deep convolutional neural network. In: 2021 8th International Conference on Smart Computing and Communications (ICSCC). IEEE. p. 193–197

            21. Zhang J, Zeng X. 2022. Multi-touch gesture recognition of Braille input based on Petri Net and RBF Net. Multimed. Tools Appl. Vol. 81(14):19395–19413

            Author and article information

            Journal
            jdr
            Journal of Disability Research
            King Salman Centre for Disability Research (Riyadh, Saudi Arabia )
21 July 2023
Volume 2, Issue 2: 37-48
            Affiliations
            [1 ] Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia ( https://ror.org/02f81g417)
            [2 ] King Salman Center for Disability Research, Riyadh, Saudi Arabia ( https://ror.org/01ht2b307)
            [3 ] Department of Computer Science, College of Computer, Qassim University, Saudi Arabia ( https://ror.org/01wsfe280)
            [4 ] Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia ( https://ror.org/04jt46d36)
            Author notes
            Author information
            https://orcid.org/0000-0002-6951-8823
            Article
            10.57197/JDR-2023-0019
            Copyright © 2023 The Authors.

            This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY) 4.0, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

            History
            : 16 May 2023
            : 22 June 2023
            : 22 June 2023
            Page count
            Figures: 9, Tables: 4, References: 21, Pages: 12
            Funding
            Funded by: King Salman Center for Disability Research
            Award ID: KSRG-2023-175
            The authors extend their appreciation to the King Salman Center for Disability Research for funding this work through Research Group no KSRG-2023-175.
            Categories

            Computer science
African vulture optimization algorithm, deep learning, gesture recognition, visually impaired person, hyperparameter tuning
