Sign Language Recognition Using Artificial Rabbits Optimizer with Siamese Neural Network for Persons with Disabilities

Sign language recognition is an effective solution for individuals with disabilities to communicate with others, helping them convey information through sign language. Recent advances in computer vision (CV) and image processing algorithms can be employed for effective sign detection and classification. Since the hyperparameters involved in Deep Learning (DL) algorithms considerably affect the classification results, metaheuristic optimization algorithms can be designed for their tuning. In this aspect, this manuscript offers the design of a Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network (SLR-AROSNN) technique for persons with disabilities. The proposed SLR-AROSNN technique mainly focuses on the recognition of multiple kinds of sign languages posed by disabled persons. The goal of the SLR-AROSNN technique lies in the effectual exploitation of CV, DL, and parameter tuning strategies. It employs the MobileNet model to derive feature vectors. For the identification and classification of sign languages, a Siamese neural network is used. At the final stage, the SLR-AROSNN technique makes use of the ARO algorithm to obtain improved sign recognition results. To illustrate the improvement of the SLR-AROSNN technique, a series of experimental validations is involved. The attained outcomes reported the supremacy of the SLR-AROSNN technique in the sign recognition process.


INTRODUCTION
Sign languages are categorized as natural languages and display every feature of other natural languages. The spatial and visual nature of sign languages and their variability pose a challenge for study in numerous domains, such as computer vision (CV), linguistics, machine learning, computer graphics, natural language processing (NLP), and medicine (Bora et al., 2023). The interpretation and linguistics of sign languages are concerned with the meaning conveyed through the use of sign language. In the late 1970s and early 1980s, with the recognition of sign languages as natural languages, linguistic research delved deeply into this domain (Katoch et al., 2022). Neural aspects were considered for fully grasping the connection between phonetics and sign languages. NLP is concerned with analysis, a task similar to comprehension and interpretation issues (Novopoltsev et al., 2023). Visualization and sign language synthesis is an area that addresses the creation of signed speech and the visualization problems of sign languages. Sign language recognition is the scientific area responsible for capturing and translating signed speech using artificial intelligence and CV methods (Mannan et al., 2022). In this concern, the effect of the digital divide on the urban experience of people with disabilities becomes a critical problem that needs to be solved. Considering the elderly in the global population, the percentage of susceptible, underrepresented people in the realm of smart city practices can be high (Das et al., 2023).
Deep learning (DL) is a class of learning methods developed to model difficult structures by combining several nonlinear transformations (Rwelli et al., 2022). The neural networks (NNs) involved in building deep neural networks are the vital building blocks of DL (Duy Khuat et al., 2021). Such approaches have enabled progress in image and sound processing, encompassing automated language processing, face identification, CV, voice recognition, spam identification, and various other fields such as genomics and drug discovery. There are numerous potential uses (Aarthi et al., 2023). First, DL enables computational models with numerous processing layers to obtain representations at various levels of abstraction (Herath and Ishanka, 2022). DL finds structure in large datasets using the backpropagation approach, which expresses how a model should alter the internal parameters used to compute the representation in each layer from the representation in the preceding layer (Grover et al., 2021).
This manuscript offers the design of a Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network (SLR-AROSNN) technique for persons with disabilities. The proposed SLR-AROSNN technique mainly focuses on the recognition of multiple kinds of sign languages posed by disabled persons. The goal of the SLR-AROSNN technique lies in the effectual exploitation of CV, DL, and parameter tuning strategies. It employs the MobileNet model to derive feature vectors. For the identification and classification of sign languages, a Siamese neural network (SNN) is used. At the final stage, the SLR-AROSNN technique makes use of the ARO algorithm to obtain improved sign recognition results. To illustrate the improvement of the SLR-AROSNN technique, a series of experimental validations is involved.
Obi et al. (2023) present American Sign Language (ASL) data and a CNN system. During the classification process, the hand image is passed through a filter and, after the filter is applied, the hand is sent through a classifier that forecasts the class of the hand gesture. AlKhuraym et al. (2022) aim to address the Arabic Sign Language (ArSL) recognition issue and ensure trade-offs between enhancing classifier performance and reducing the depth of deep networks to diminish the computational cost. Specifically, they modified EfficientNet methods and produced lightweight DL models for categorizing Arabic Sign Language gestures. Alsaadi et al. (2022) present a real-time ArSL detection method using a DL structure. The following steps were taken: initially, trusted scientific ArSL data are located; then, suitable DL structures are selected by inspecting related studies; next, an experiment is conducted to test the selected DL structures; then, the DL structure is built upon the extracted outcomes; eventually, a real-time detection system is designed. Athira et al. (2022) modeled a signer-independent vision-based gesture detection mechanism that can detect single-handed dynamic and static gestures, double-handed static gestures, and fingerspelled words of Indian Sign Language. In the preprocessing stage, the signs are extracted from real videos using skin-color segmentation. A suitable feature vector is mined from the gesture sequence after the co-articulation elimination stage. The acquired features are used for classification with an SVM. Galván-Ruiz et al. (2023) described a method devised for transcribing Spanish Sign Language. In this study, a Leap Motion volumetric sensor was leveraged because of its capability to track hand movement in three dimensions. Li et al. (2020) explored the temporal semantic structure of sign videos to learn discriminative features. To this end, the authors developed new sign video segment representations that consider multiple temporal granularities, thereby easing the necessity for precise video segmentation. Using the presented segment representation, the authors designed a new hierarchical sign video feature learning approach through a temporal semantic pyramid network named TSPNet. Jiang et al. (2020) devised a new fingerspelling detection approach that tested four different configurations of transfer learning (TL). Moreover, the Adam algorithm was compared with root mean square propagation and SGD with momentum, and a comparison of using data augmentation against not using it was carried out to achieve higher performance.

THE PROPOSED MODEL
This manuscript aims to develop an automated sign language detection model, named the SLR-AROSNN technique, for persons with disabilities. The proposed SLR-AROSNN technique mainly focuses on the recognition of multiple kinds of sign languages posed by disabled persons. The goal of the SLR-AROSNN technique lies in the effectual exploitation of CV, DL, and parameter tuning strategies. Figure 1 illustrates the overall process of the SLR-AROSNN method.

Phase I: MobileNet model
Primarily, the presented model employs the MobileNet model to derive feature vectors. The MobileNet architecture is designed for efficiency so that it can run on mobile or embedded devices (Taufiqurrahman et al., 2020). The depthwise (DW) separable convolution is the key building block of this model for reducing the amount of computation; it consists of two layers, a DW convolution and a pointwise convolution. With DW separable convolutions, MobileNet significantly decreases the number of convolutional-layer parameters while retaining good classification performance. In addition, a residual bottleneck block was introduced for reusing feature maps; thus, MobileNetV2 can accomplish good classification performance with a shorter training time. In the current study, both pretrained networks were applied. MobileNetV2 is similar to the original MobileNet, except that it exploits an inverted residual block with bottlenecking features, and it has a considerably smaller number of parameters than the original MobileNet. MobileNet supports any input size bigger than 32 × 32, with bigger image sizes providing better performance. The major difference between the two versions lies in the convolutional layers.
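To make the parameter saving concrete, the following minimal sketch (an illustrative calculation, not the authors' code) counts the weights of a standard convolution versus a depthwise separable convolution for one 3 × 3 layer with 128 input and 128 output channels:

```python
def standard_conv_params(k, c_in, c_out):
    # a k x k kernel spans all c_in channels, for each of c_out filters
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise: one k x k kernel per input channel
    # pointwise: a 1 x 1 convolution that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 128)        # 147456 weights
dws = depthwise_separable_params(3, 128, 128)  # 17536 weights
print(std, dws, round(std / dws, 1))           # roughly an 8x reduction
```

This roughly eight-fold reduction per layer is what allows MobileNet to keep classification quality while fitting the compute budget of mobile devices.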

Phase II: the SNN model
For the identification and classification of sign languages, the SNN classifier is used. The SNN comprises a set of interconnected NNs with shared parameters that are trained simultaneously, where learning in the unified architecture is guided by the contrastive loss function (Vasconcellos et al., 2023). The aim of the contrastive loss function is to enhance classification accuracy by evaluating the difference between pairs of samples, represented by the binary label y, where y = 1 indicates dissimilarity and y = 0 indicates similarity between samples. The function takes two input samples, x1 and x2, and evaluates the Euclidean distance between their corresponding embeddings (the mapping functions g(x1) and g(x2)). Next, the similarity between embeddings is normalized and calculated to form the distance measure d(x1, x2). During training, the parameters of g(·) are adjusted to minimize the loss by decreasing the distance between similar samples while maximizing it between dissimilar ones. This is accomplished by setting a threshold value (α) so that the distance between similar samples is smaller than α, whereas the distance between dissimilar samples is greater than α. This architecture is similar to the binary cross-entropy loss function widely applied in binary classification problems; its primary benefit is that it enables a multiclass problem to be transformed into binary classification problems. This allows flexible comparisons, namely comparing one class against several classes. For sign language classes, this flexibility can be highly effective in problems where there are many classes and comparisons between classes are significant.
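A minimal sketch of the contrastive loss described above (illustrative only; the embedding network g is replaced by precomputed embedding vectors, and the margin α and label convention follow the text):

```python
import numpy as np

def contrastive_loss(emb1, emb2, y, alpha=1.0):
    """Contrastive loss over an embedding pair.
    y = 0 marks a similar pair, y = 1 a dissimilar pair (as in the text)."""
    d = np.linalg.norm(emb1 - emb2)                  # Euclidean distance d(x1, x2)
    similar_term = (1 - y) * d ** 2                  # pulls similar pairs together
    dissimilar_term = y * max(0.0, alpha - d) ** 2   # pushes dissimilar pairs past alpha
    return 0.5 * (similar_term + dissimilar_term)

# identical embeddings of a similar pair incur zero loss
print(contrastive_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]), y=0))  # 0.0
# a dissimilar pair closer than the margin incurs a positive loss
print(contrastive_loss(np.array([1.0, 2.0]), np.array([1.0, 2.2]), y=1, alpha=1.0))
```

Note how the margin α only affects dissimilar pairs: once their distance exceeds α, they contribute nothing further to the loss.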

Phase III: ARO model
At the final stage, the SLR-AROSNN technique makes use of the ARO algorithm to obtain improved sign recognition results. The ARO algorithm is based on the survival strategies of rabbits, namely detour foraging and random hiding (Pop et al., 2022). In exploration (detour foraging), a rabbit eats the grass near the nests of other rabbits rather than its own. In exploitation (random hiding), a rabbit selects a shelter at random from its network of burrows to hide in. Additionally, the rabbits' energy declines over iterations, which makes them transition from detour foraging to random hiding.
Detour foraging means that every search agent tends to update its position toward another randomly selected individual while adding a perturbation. The mathematical expression of detour foraging can be given as follows:

v_i(t + 1) = x_j(t) + R · (x_i(t) − x_j(t)) + round(0.5 · (0.05 + r1)) · n1,  i, j = 1, …, n; j ≠ i  (3)
R = L · c  (4)
L = (e − e^(((t − 1)/T)^2)) · sin(2πr2)  (5)
c(k) = 1 if k ∈ g(l), and 0 otherwise, k = 1, …, d  (6)
g = randperm(d),  l = ⌈r3 · d⌉  (7)

where v_i(t + 1) represents the updated ith candidate position, x_i(t) the current position, n the population size, L the running length that defines the velocity of motion while conducting detour foraging, t the current iteration, T the maximum number of iterations, d the problem dimension, randperm a random permutation of the integers from 1 to d, round the rounding to the nearest integer, r1, r2, and r3 random values in the interval [0, 1], and n1 a standard normal random number. The perturbation in Equation (3) assists ARO to conduct a global search and escape local minima or maxima. This unique foraging methodology of moving toward other rabbits' nests rather than their own considerably assists exploration and ensures a global search.
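As an illustrative sketch (not the authors' implementation), the detour-foraging update described above can be written in NumPy as follows; the variable names mirror the symbols in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def detour_foraging(x_i, x_j, t, T):
    """One detour-foraging step: move candidate x_i toward a randomly chosen
    rabbit x_j, scaled by the running length, plus a small normal perturbation."""
    d = x_i.size
    r1, r2, r3 = rng.random(3)
    # running length L controls the speed of motion during foraging
    L = (np.e - np.exp(((t - 1) / T) ** 2)) * np.sin(2 * np.pi * r2)
    # mapping vector c: a random subset of dimensions receives the update
    c = np.zeros(d)
    g = rng.permutation(d)
    c[g[: int(np.ceil(r3 * d))]] = 1.0
    R = L * c
    n1 = rng.standard_normal()               # perturbation aiding global search
    return x_j + R * (x_i - x_j) + round(0.5 * (0.05 + r1)) * n1

x_i = np.array([0.2, -0.4, 0.7])
x_j = np.array([0.5, 0.1, -0.3])
v = detour_foraging(x_i, x_j, t=1, T=100)
print(v.shape)  # (3,)
```

Because the update is anchored at x_j (another rabbit's position), repeated steps pull the population across the search space rather than letting agents circle their own positions.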
At each iteration of ARO, a rabbit generates d burrows along the dimensions of the search space and randomly chooses one for hiding to diminish the probability of being preyed upon. The jth burrow of the ith rabbit is given below.
b_{i,j}(t) = x_i(t) + H · g(j) · x_i(t),  j = 1, …, d  (8)
H = ((T − t + 1)/T) · n2,  n2 ~ N(0, 1)  (9)

where b_{i,j}(t) is the jth burrow generated in the region of the ith rabbit and H indicates the hiding parameter, whose value declines linearly from 1 to 1/T with a random perturbation over the iterations. Initially, the burrows are generated in a larger area around the rabbit; this neighborhood decreases as the number of iterations increases. Equation (10) models the random hiding method:

v_i(t + 1) = x_i(t) + R · (r4 · b_{i,r}(t) − x_i(t))  (10)
b_{i,r}(t) = x_i(t) + H · g_r · x_i(t),  g_r(k) = 1 if k = ⌈r5 · d⌉, and 0 otherwise  (11)

where r4 and r5 denote random numbers within [0, 1] and b_{i,r}(t) denotes a burrow randomly chosen for hiding. After either random hiding or detour foraging, the position of the ith rabbit is updated using the following equation:

x_i(t + 1) = x_i(t) if f(x_i(t)) ≤ f(v_i(t + 1)), and x_i(t + 1) = v_i(t + 1) otherwise  (12)

That is, if the candidate position has better fitness than the current position, the rabbit departs the current position and remains at the candidate position generated by Equation (3) or Equation (10). A factor that aims to imitate the transition from exploration to exploitation is given below:

A(t) = 4 · (1 − t/T) · ln(1/r)  (14)

In Equation (14), r indicates a random number within [0, 1]. Once the factor A(t) > 1, the process performs exploration (global search); once A(t) ≤ 1, the process performs exploitation (local search). Figure 2 depicts the flowchart of the ARO algorithm.
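The random-hiding step and the energy factor described above can likewise be sketched in NumPy (again an illustrative reading of the equations, not the authors' code; the standard-normal perturbation in the hiding parameter is one common formulation):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hiding(x_i, t, T):
    """One random-hiding step: dig a burrow along one randomly chosen
    dimension and move the rabbit toward it (exploitation-phase sketch)."""
    d = x_i.size
    r4, r5 = rng.random(2)
    # hiding parameter H: shrinks from about 1 toward 1/T as t grows,
    # with a standard-normal perturbation
    H = (T - t + 1) / T * rng.standard_normal()
    # g_r marks the single random dimension along which the burrow is dug
    g_r = np.zeros(d)
    g_r[int(np.ceil(r5 * d)) - 1] = 1.0
    b = x_i + H * g_r * x_i                  # randomly chosen burrow b_{i,r}
    # running-length factor restricted to the chosen dimension
    L = (np.e - np.exp(((t - 1) / T) ** 2)) * np.sin(2 * np.pi * rng.random())
    R = L * g_r
    return x_i + R * (r4 * b - x_i)

def energy_factor(t, T):
    """Energy factor A(t) governing the exploration-to-exploitation switch."""
    r = max(rng.random(), 1e-12)             # guard against log(0)
    return 4 * (1 - t / T) * np.log(1 / r)

x = np.array([0.3, -0.8, 0.5, 1.2])
v = random_hiding(x, t=50, T=100)
print(v.shape)  # (4,)
```

A full ARO loop would evaluate A(t) each iteration, call the foraging step when A(t) > 1 and the hiding step otherwise, and keep whichever of the current and candidate positions has the lower fitness.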
The ARO method not only derives a fitness function to accomplish better classification performance but also assigns a positive value representing the quality of each candidate solution. The reduction of the classification error rate is regarded as the fitness function, as given in Equation (15):

fitness(x_i) = ClassifierErrorRate(x_i) = (number of misclassified samples / total number of samples) × 100  (15)
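A minimal sketch of this fitness evaluation, assuming the error rate is reported as a percentage as in Equation (15):

```python
def classification_error_rate(y_true, y_pred):
    """Fitness value for ARO: percentage of misclassified samples (lower is better)."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true) * 100

# one of four predictions is wrong -> 25% error rate
print(classification_error_rate([0, 1, 2, 1], [0, 1, 1, 1]))  # 25.0
```

In the tuning loop, each ARO candidate encodes a hyperparameter setting; the classifier is trained with that setting, and this error rate serves as the fitness to be minimized.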

EXPERIMENTAL VALIDATION
The performance of the SLR-AROSNN technique is examined using a dataset comprising 1000 samples with 10 classes, as demonstrated in Table 1. Figure 3 represents the sample images.

Journal of Disability Research 2023
In Table 2 and Figure 4, the recognition results of the SLR-AROSNN method are investigated under several epochs. The results highlight that the SLR-AROSNN technique gains improved results under each epoch. At 500 epochs, the SLR-AROSNN technique obtains an average accuracy, precision, recall, F-score, and G-measure of 99.14, 95.72, 95.70, 95.70, and 95.71%, respectively. At 1500 epochs, it obtains an average accuracy, precision, recall, F-score, and G-measure of 94.68, 73.40, 73.40, 73.33, and 73.37%, respectively. At 2500 epochs, it obtains an average accuracy, precision, recall, F-score, and G-measure of 97.20, 86.01, 86.00, 85.96, and 85.98%, respectively. Lastly, at 3000 epochs, it obtains an average accuracy, precision, recall, F-score, and G-measure of 96.66, 83.34, 83.30, 83.24, and 83.28%, respectively. Figure 5 inspects the accuracy of the SLR-AROSNN method in the training and validation process at 500 epochs. The figure indicates that the SLR-AROSNN technique reaches increasing accuracy values over increasing epochs.
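For reference, the reported measures are conventionally computed from confusion-matrix counts as follows (a generic sketch; the authors' exact evaluation code is not given):

```python
import math

def metrics(tp, fp, fn, tn):
    """Standard per-class measures from confusion-matrix counts."""
    precision = tp / (tp + fp)                      # correctness of positive calls
    recall = tp / (tp + fn)                         # coverage of true positives
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_score = 2 * precision * recall / (precision + recall)
    g_measure = math.sqrt(precision * recall)       # geometric mean of P and R
    return accuracy, precision, recall, f_score, g_measure

# e.g. 95 hits, 5 false alarms, 5 misses, 895 correct rejections
print(metrics(95, 5, 5, 895))
```

Per-class values of this kind are typically macro-averaged over the 10 sign classes to produce the table entries.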

Furthermore, the validation accuracy increasing along with the training accuracy shows that the SLR-AROSNN method learns effectively at 500 epochs.
The loss analysis of the SLR-AROSNN method in training and validation at 500 epochs is depicted in Figure 6. The results indicate that the SLR-AROSNN technique obtains close values of training and validation loss, showing that the SLR-AROSNN method learns efficiently at 500 epochs.
A brief precision-recall (PR) curve of the SLR-AROSNN technique at 500 epochs is demonstrated in Figure 7. The outcomes state that the SLR-AROSNN method results in high PR values. In addition, it is noticeable that the SLR-AROSNN technique reaches high PR values in all classes.
In Figure 8, an ROC study of the SLR-AROSNN technique at 500 epochs is revealed. The figure shows that the SLR-AROSNN system results in improved ROC values. Besides, the SLR-AROSNN technique extends enhanced ROC values to all classes.
In Table 3 and Figure 9, the overall results of the SLR-AROSNN method are compared with recent approaches (Dhulipala et al., 2022). Based on accuracy and the other measures, the SLR-AROSNN technique attains 98.77, 98.62, 95.42, 97.32, and 98.63%, correspondingly. Thus, the SLR-AROSNN technique can be employed for accurate recognition purposes.

Figure 1: Overall process of the SLR-AROSNN method. Abbreviation: SLR-AROSNN, Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network.

Figure 4: Average outcome of the SLR-AROSNN approach under varying epochs. Abbreviation: SLR-AROSNN, Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network.

Figure 5: Accuracy curve of the SLR-AROSNN approach under 500 epochs. Abbreviation: SLR-AROSNN, Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network.

Figure 6: Loss curve of the SLR-AROSNN approach under 500 epochs. Abbreviation: SLR-AROSNN, Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network.

Figure 8: ROC curve of the SLR-AROSNN approach under 500 epochs. Abbreviation: SLR-AROSNN, Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network.

Figure 9: Comparative outcome of the SLR-AROSNN approach with other algorithms. Abbreviation: SLR-AROSNN, Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network.
CONCLUSION
This manuscript aimed to develop an automated sign language detection model, named the SLR-AROSNN technique, for persons with disabilities. The proposed SLR-AROSNN technique mainly focused on the recognition of multiple kinds of sign languages posed by disabled persons. The goal of the SLR-AROSNN technique lies in the effectual exploitation of CV, DL, and parameter tuning strategies. It employs the MobileNet model to derive feature vectors. For the identification and classification of sign languages, the SNN classifier is used. At the final stage, the SLR-AROSNN technique makes use of the ARO algorithm to obtain improved sign recognition results. To illustrate the improvement of the SLR-AROSNN technique, a series of experimental validations is involved. The attained outcomes exhibited the supremacy of the SLR-AROSNN technique in the sign recognition process.

Table 1: Details of the database.

Table 2: Recognition outcome of the SLR-AROSNN approach under varying epochs.

Table 3: Comparative outcome of the SLR-AROSNN method with other algorithms. Abbreviation: SLR-AROSNN, Sign Language Recognition using Artificial Rabbits Optimizer with Siamese Neural Network.