INTRODUCTION
Today, personalized and adaptive human–computer interfaces, in contrast to frameworks designed for an "average" user, are widely found in an enormous number of applications (Alashhab et al., 2022). Machine learning (ML) methods for the automated analysis of body movements and facial expressions are currently used in numerous human–computer interaction (HCI) systems. People with disabilities face numerous problems in the community. Technology is evolving daily, yet little progress has been made in improving the standard of life of blind people (Muneeb et al., 2023). Many people around the world are deaf and mute, and interaction between a visually challenged person and a deaf-mute person remains a difficult task (Pandey, 2023). Sign language helps in interacting with blind and mute persons, and gesture detection is a widely applicable technology to assist mute and particularly blind persons (Ryumin et al., 2023). This study is relevant to two significant domains, ML and computer vision (CV). CV can be described as a domain that integrates techniques to acquire, process, and understand images (Fronteddu et al., 2022). It can be employed in different domains, namely image reconstruction, HCI, physics, healthcare, etc. ML is a subfield of computer science that evolved from the study of pattern recognition and computational learning in artificial intelligence (AI).
Hand gestures are a facet of body language, conveyed through the center of the palm, the shape formed by the hand, and the position of the fingers (Faria Oliveira et al., 2022). Hand gestures are of two types: dynamic and static. A dynamic gesture comprises a sequence of hand movements, such as waving, whereas a static gesture is a fixed hand shape (Gorobets et al., 2022). Hand movements in gestures vary: a handshake, for example, differs from one individual to another and varies with place and time. The major difference between gesture and posture is that posture emphasizes hand shape whereas gesture concentrates on hand movement (de Oliveira et al., 2022). The key approaches to hand gesture study can be categorized into camera (vision)-based sensor methods and wearable glove-based sensor methods. Hand gestures present an inspiring domain of study because they enable transmission and offer a natural means of communication, utilized across various applications (Moysiadis et al., 2022). Earlier, hand gesture detection was achieved with wearable sensors attached directly to the hands via gloves. Such sensors detected physical responses according to hand movement or finger bending, and the gathered data were processed on computers connected to the glove by wires (Parra-Dominguez et al., 2022). Such glove-based sensor schemes can be made portable by connecting the sensors to microcontrollers.
Mukhiddinov et al. (2023) developed a facial emotion detection technique for masked facial images based on feature analysis of the upper part of the face and low-light image enhancement with a convolutional neural network (CNN). First, the lower part of the input facial image is covered with a synthetic mask. Then, the authors apply an efficient feature extraction method based on facial landmark recognition, from which the structures and coordinates of the landmarks are recognized. Alashhab et al. (2022) devised a scheme for mobile devices controlled by hand gestures that lets the user operate the device and leverage numerous assistance functions through simple static and dynamic hand gestures. The scheme relies on a multihead neural network (NN) that detects and classifies the gestures and, depending on the gesture identified, executes a secondary stage that performs the respective action. Zhou et al. (2023) presented an improved D2NN design for this domain, in which a wavelet-like pattern reduces the number of variables in the network layer by modulating the phase of the incident light.
Abdulhussein and Raheem (2020) presented static American Sign Language (ASL) gesture detection by means of deep learning (DL). The method comprises two steps: first, static ASL binary images are resized using bicubic interpolation; second, good outcomes are obtained for hand-boundary detection by means of the Robert edge detection scheme. Moysiadis et al. (2022) developed a twofold system that provides (i) a real-time human–robot interaction structure tested in diverse situations and (ii) a real-time skeleton-based detection scheme for five hand gestures via ML and a depth camera. Six ML classifiers were tested, and ROS software was applied to "translate" the gestures into five commands performed by the robot.
Lu et al. (2023) proposed a gesture-language-recognition (GLR) feedback scheme combining ML technology and strain-sensor arrays. These strain-sensor arrays, combined with 3D-printed gloves, extract temporal and spatial data regarding finger movement. Incorporating multidimensional manipulation, AI-based GLR, and visual feedback, the smart model can precisely identify complicated gestures and offer real-time feedback to users. Mujahid et al. (2021) devised a lightweight method based on DarkNet-53 and a YOLOv3 CNN for gesture detection without additional enhancement preprocessing or image filtering. The presented technique was assessed on labeled hand-gesture data in the YOLO and Pascal VOC formats.
This research presents an interactive gesture technique using sandpiper optimization with a deep belief network (IGSPO-DBN). The purpose of the IGSPO-DBN technique is to enable people to handle devices and exploit different assistance models through different gestures. The IGSPO-DBN technique detects gestures and classifies them into several kinds using the DBN model. To boost the overall gesture-recognition rate, the IGSPO-DBN technique exploits the sandpiper optimization (SPO) algorithm as a hyperparameter optimizer. The simulation outcome of the IGSPO-DBN approach was tested on a gesture-recognition dataset.
THE PROPOSED MODEL
In this research work, we have concentrated on the development of automated gesture recognition using the IGSPO-DBN technique. Figure 1 exemplifies the overall flow of the IGSPO-DBN algorithm. The purpose of the IGSPO-DBN technique is to enable people to handle devices and exploit different assistance models through different gestures. The IGSPO-DBN technique detects gestures and classifies them into several kinds in three phases: data preprocessing, DBN-based gesture recognition, and SPO-based hyperparameter optimization.
Data preprocessing
For preprocessing the input data, three stages are followed (a code sketch is given after the list).
Missing values in the sensor database are imputed using linear interpolation.
Noise is removed with a median filter and a third-order low-pass Butterworth filter with a 20 Hz cutoff frequency.
A normalization step standardizes all sensor data using the mean and standard deviation. The cleaned and normalized data serve as input for model training and feature extraction.
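A minimal sketch of this preprocessing pipeline is given below, assuming raw sensor channels in a pandas DataFrame and a 100 Hz sampling rate (both assumptions on our part; the actual column layout and rate depend on the sensors used):

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt, medfilt

def preprocess(df: pd.DataFrame, fs: float = 100.0) -> np.ndarray:
    # 1) Impute missing sensor values by linear interpolation.
    x = df.interpolate(method="linear", limit_direction="both").to_numpy(float)

    # 2) Denoise: median filter per channel, then a third-order low-pass
    #    Butterworth filter with a 20 Hz cutoff (fs is the sampling rate in Hz).
    x = medfilt(x, kernel_size=[3, 1])
    b, a = butter(N=3, Wn=20.0 / (fs / 2.0), btype="low")
    x = filtfilt(b, a, x, axis=0)

    # 3) Standardize each channel to zero mean and unit standard deviation.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)
```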
Gesture recognition using the DBN model
For effectual identification of gestures, the DBN model is utilized. The DBN is a probabilistic generative model built by stacking restricted Boltzmann machines (RBMs) (Justin et al., 2023). The RBM is an effective means of extracting and representing data and is widely implemented in ML approaches. The RBM is a variant of the classical Boltzmann machine in which all connections within a layer are removed and only the connections between the visible and hidden layers are retained. The RBM is an energy-based model and is utilized as a generative model for different kinds of data, including speech, images, and text.
The energy function of the RBM is defined as

$$E(v, h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} v_i W_{ij} h_j \quad (1)$$

where $W_{ij}$ signifies the element of $W$ that interrelates the $i$th visible unit $v_i$ to the $j$th hidden unit $h_j$, and $b$ and $c$ denote the visible and hidden bias parameters. The corresponding Boltzmann distribution is then measured as follows:

$$P(v, h) = \frac{1}{Z}\exp(-E(v, h)) \quad (2)$$

where $Z$ is the normalizing partition function. Since only $v$ is observed, the hidden variable $h$ is marginalized out:

$$P(v) = \frac{1}{Z}\sum_{h}\exp(-E(v, h)) \quad (3)$$

where $P(v)$ implies the probability the model assigns to the visible vector $v$. Because there are no connections among nodes within a layer (intra-layer connections are absent), the respective conditional probabilities factorize as follows:

$$P(h \mid v) = \prod_{j} P(h_j \mid v), \qquad P(v \mid h) = \prod_{i} P(v_i \mid h) \quad (4)$$

For a binary database, Eq. (4) is modified as:

$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_{i} W_{ij} v_i\Big) \quad (5)$$

where $\sigma(\cdot)$ denotes the logistic function, $\sigma(x) = (1 + \exp(-x))^{-1}$. Stacked RBMs have proven effective at uncovering complex non-linearities layer by layer. A fast greedy learning procedure for the DBN yields the joint distribution over the observed vector $x$ and the $\ell$ hidden layers $h^k$:

$$P(x, h^1, \ldots, h^{\ell}) = \left(\prod_{k=0}^{\ell-2} P(h^k \mid h^{k+1})\right) P(h^{\ell-1}, h^{\ell}) \quad (6)$$

where $x = h^0$, $P(h^k \mid h^{k+1})$ represents the conditional distribution of the visible units given the hidden units of the RBM at level $k$ of the DBN, and $P(h^{\ell-1}, h^{\ell})$ signifies the joint distribution of the top-level RBM. Combining several layers into a DBN improves the expressiveness of the energy-based representation. In the proposed technique, two stacked RBMs are used to build the DBN without labeled data. Figure 2 demonstrates the architecture of the DBN.
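As an illustration, a two-RBM DBN of this kind can be sketched with scikit-learn's BernoulliRBM. The layer sizes and learning rate below are illustrative assumptions rather than values reported here, and inputs are assumed scaled to [0, 1] as BernoulliRBM expects:

```python
# Minimal DBN sketch: two stacked RBMs pretrained greedily (unsupervised),
# followed by a logistic-regression classifier for the gesture labels.
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

dbn = Pipeline([
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.01, n_iter=50, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.01, n_iter=50, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# X_train: preprocessed sensor windows scaled to [0, 1]; y_train: gesture labels.
# dbn.fit(X_train, y_train); y_pred = dbn.predict(X_test)
```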
SPO-based hyperparameter tuning
To boost the overall gesture-recognition rate, the IGSPO-DBN technique exploits the SPO algorithm as a hyperparameter optimizer. Sandpipers are seabirds that live in groups called colonies (Sankar et al., 2023). They use their intelligence to locate and attack prey. The algorithm comprises two stages: the migration (exploration) phase and the attacking (exploitation) phase.
Migration phase (exploration)
The migration phase models the seasonal movement of sandpipers from one place to another in search of the food that gives them energy.
During the migration phase, the sandpipers travel as a group. Initially, all sandpipers start from different locations to prevent collisions.
Within the group, every sandpiper moves toward the one with the best fitness value.
Because the problem is formulated as a minimization, the best fitness value is the smallest one.
Each sandpiper updates its location based on the fittest sandpiper.
During the migration phase, each sandpiper must fulfill three conditions, described in turn below.
Collision avoidance
The search agent (sandpiper) moves to a new collision-free position $S_p$, mathematically modeled as follows:

$$S_p(t) = S_m \times S_{cp}(t) \quad (7)$$

where $S_{cp}(t)$ specifies the existing position of the sandpiper, $t$ shows the current iteration, and $S_m$ symbolizes the movement of the sandpiper. The movement $S_m$ is evaluated as shown below:

$$S_m = S_{cf} - t \times \frac{S_{cf}}{t_{\max}} \quad (8)$$

where $S_{cf}$ specifies the sandpiper control frequency, which decreases linearly from 2 to 0, and $t$ varies from 0 to the maximal iteration $t_{\max}$.
Convergence toward the position of the best sandpiper
The sandpiper moves from its existing position $S_{cp}$ toward the fittest sandpiper $S_{best}$ in order to converge, computed as follows:

$$M_p(t) = S_{BC} \times \big(S_{best}(t) - S_{cp}(t)\big) \quad (9)$$

where $S_{BC}$ is a random factor responsible for exploration, evaluated as follows:

$$S_{BC} = 0.5 \times rand \quad (10)$$

where $rand$ denotes a random number within [0, 1].
Updating the position toward the best sandpiper
Lastly, the sandpiper updates its existing position relative to the fittest sandpiper, as shown below:

$$G_s(t) = S_p(t) + M_p(t) \quad (11)$$

where $G_s$ denotes the gap between the sandpiper's position and the fittest sandpiper's position.
Attacking phase (exploitation)
In the attacking stage, the sandpiper produces spiral behavior in the 3D plane, described as follows:

$$x' = r \times \sin(j), \quad y' = r \times \cos(j), \quad z' = r \times j, \quad r = l \times e^{jm} \quad (12)$$

where $r$ shows the radius of each turn of the spiral, $e$ is the base of the natural logarithm, $j$ is a parameter whose value lies within $[0, 2\pi]$, and $l$ and $m$ are constants defining the spiral shape; here both $l$ and $m$ are set to 1. The updated position of the sandpiper $S_{p\text{-}new}(t)$ is evaluated as follows:

$$S_{p\text{-}new}(t) = G_s(t) \times (x' + y' + z') \times S_{best}(t) \quad (13)$$
The SPO algorithm derives a fitness function (FF) to attain better classification performance. It yields a positive value in which smaller values represent better candidate solutions. In this case, the FF is the minimized classification error rate, as defined in Eq. (14) and summarized in Table 1:

$$fitness(x_i) = ClassifierErrorRate(x_i) = \frac{\text{number of misclassified instances}}{\text{total number of instances}} \times 100 \quad (14)$$
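A compact sketch of the SPO search loop in Eqs. (7)-(13), used here to minimize a generic fitness such as the error rate of Eq. (14), might look as follows; the population size, iteration budget, and example bounds are illustrative assumptions, and `error_rate` is a hypothetical callback:

```python
# Minimal SPO sketch implementing Eqs. (7)-(13) for fitness minimization.
import numpy as np

def spo(fitness, lb, ub, n_agents=20, t_max=100, s_cf=2.0, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    pos = rng.uniform(lb, ub, size=(n_agents, lb.size))
    best = min(pos, key=fitness).copy()
    best_fit = fitness(best)
    for t in range(t_max):
        s_m = s_cf - t * (s_cf / t_max)                   # Eq. (8): decreases 2 -> 0
        for i in range(n_agents):
            s_p = s_m * pos[i]                            # Eq. (7): collision avoidance
            s_bc = 0.5 * rng.random()                     # Eq. (10)
            m_p = s_bc * (best - pos[i])                  # Eq. (9): converge toward best
            g_s = s_p + m_p                               # Eq. (11): gap to the best
            j = rng.uniform(0.0, 2.0 * np.pi)             # spiral angle
            r = 1.0 * np.exp(j * 1.0)                     # Eq. (12): r = l*e^(jm), l = m = 1
            spiral = r * np.sin(j) + r * np.cos(j) + r * j
            pos[i] = np.clip(g_s * spiral * best, lb, ub)  # Eq. (13)
            fit = fitness(pos[i])
            if fit < best_fit:                            # keep the fittest sandpiper
                best, best_fit = pos[i].copy(), fit
    return best, best_fit

# Hypothetical usage: tune the DBN learning rate and hidden-layer size by
# minimizing a validation error rate returned by `error_rate` (Eq. (14)).
# best, err = spo(error_rate, lb=[1e-4, 32.0], ub=[0.1, 512.0])
```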
RESULTS AND DISCUSSION
The proposed model is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, a GeForce GTX 1050 Ti GPU (4 GB), 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. The parameter settings are as follows: learning rate 0.01, dropout 0.5, batch size 5, epoch count 50, and ReLU activation. In this section, the gesture-recognition outcome of the IGSPO-DBN approach is examined on the USC-HAD dataset, which has 36,948 instances across six classes.
Figure 3 demonstrates the classifier outcomes of the IGSPO-DBN approach on the test dataset. Figure 3a and b depicts the confusion matrices offered by the IGSPO-DBN approach on the 60:40 split of the training phase (TRP) and testing phase (TSP). The results reveal that the IGSPO-DBN model identifies and classifies all six class labels accurately. Figure 3c and d represents the gesture-detection outcomes of the IGSPO-DBN approach on the 60:40 TRP/TSP split. The outcomes indicate that the IGSPO-DBN system reaches an effectual recognition rate for all classes.
In Table 2 and Figure 4, the extensive gesture-recognition outcomes of the IGSPO-DBN system are portrayed. The outcomes imply that the IGSPO-DBN system performs well under all classes. For instance, on 60% of TRP, the IGSPO-DBN system acquires average accu_y, prec_n, reca_l, F_score, and AUC_score of 99.43, 98.26, 98.17, 98.21, and 98.91%, correspondingly. Likewise, on 40% of TSP, the IGSPO-DBN method gains average accu_y, prec_n, reca_l, F_score, and AUC_score of 99.37, 98.04, 98.08, 98.06, and 98.85%, correspondingly.
Table 2. Gesture-recognition results of the IGSPO-DBN technique on the 60:40 TRP/TSP split (all values in %).

Class | Accu_y | Prec_n | Reca_l | F_score | AUC_score
---|---|---|---|---|---
Training phase (60%) | | | | |
Walking (C-1) | 99.21 | 97.85 | 98.72 | 98.28 | 99.04
Walking upstairs (C-2) | 99.38 | 97.61 | 97.47 | 97.54 | 98.56
Walking downstairs (C-3) | 99.51 | 98.05 | 97.83 | 97.94 | 98.78
Sitting (C-4) | 99.59 | 98.69 | 98.74 | 98.72 | 99.25
Standing (C-5) | 99.48 | 98.65 | 97.68 | 98.17 | 98.73
Laying/sleeping (C-6) | 99.38 | 98.68 | 98.56 | 98.62 | 99.09
Average | 99.43 | 98.26 | 98.17 | 98.21 | 98.91
Testing phase (40%) | | | | |
Walking (C-1) | 99.15 | 98.20 | 98.09 | 98.14 | 98.78
Walking upstairs (C-2) | 99.30 | 97.32 | 97.27 | 97.30 | 98.44
Walking downstairs (C-3) | 99.49 | 97.52 | 98.24 | 97.87 | 98.95
Sitting (C-4) | 99.49 | 98.31 | 98.40 | 98.35 | 99.04
Standing (C-5) | 99.49 | 98.32 | 98.09 | 98.20 | 98.91
Laying/sleeping (C-6) | 99.32 | 98.55 | 98.40 | 98.48 | 98.99
Average | 99.37 | 98.04 | 98.08 | 98.06 | 98.85
Abbreviation: IGSPO-DBN, interactive gesture technique using sandpiper optimization with deep belief network.
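For reference, the metrics reported in Table 2 can be computed as sketched below; macro averaging over the six classes and one-vs-rest AUC are assumptions on our part, and the arrays stand in for real outputs of the 60:40 split:

```python
# Sketch of the Table 2 metrics with scikit-learn; averaging choices are assumed.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical predictions for a 6-class problem (stand-ins for model outputs).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 6, size=100)
y_score = rng.dirichlet(np.ones(6), size=100)   # per-class probabilities
y_pred = y_score.argmax(axis=1)

accu_y = accuracy_score(y_true, y_pred) * 100
prec_n = precision_score(y_true, y_pred, average="macro", zero_division=0) * 100
reca_l = recall_score(y_true, y_pred, average="macro", zero_division=0) * 100
f_score = f1_score(y_true, y_pred, average="macro", zero_division=0) * 100
auc_score = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro") * 100
```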
Figure 5 scrutinizes the accuracy of the IGSPO-DBN system during the training and validation procedure on the test database. The results state that the IGSPO-DBN system attains maximal accuracy values over increasing epochs. Moreover, the validation accuracy tracking above the training accuracy indicates that the IGSPO-DBN system generalizes capably on the test database.
The loss of the IGSPO-DBN algorithm during training and validation on the test database is displayed in Figure 6. The outcome denotes that the IGSPO-DBN system attains close values of training and validation loss, making it evident that the IGSPO-DBN algorithm learns effectively on the test database.
The experimental gesture-detection outcomes of the IGSPO-DBN method are compared with other approaches in Table 3 and Figure 7 (Tahir et al., 2023). Based on accu_y, the IGSPO-DBN system attains a higher value of 99.43%, while the MWHODL-SHAR, CNN-RF, Residual network, Deep CNN, CAE, HARSI, and LSTM approaches show lower values of 99.03, 97.84, 95.86, 94.06, 94.73, 95.76, and 96.74%, correspondingly. Moreover, with respect to prec_n, the IGSPO-DBN system demonstrates a superior value of 98.26%, while the MWHODL-SHAR, CNN-RF, Residual network, Deep CNN, CAE, HARSI, and LSTM systems show lower values of 97.56, 96.91, 95.03, 96.52, 98.00, 94.11, and 94.98%, correspondingly.
Table 3. Comparative gesture-recognition analysis of the IGSPO-DBN technique with existing approaches (all values in %).

Methods | Accu_y | Prec_n | Reca_l | F_score
---|---|---|---|---
IGSPO-DBN | 99.43 | 98.26 | 98.17 | 98.21
MWHODL-SHAR | 99.03 | 97.56 | 97.52 | 97.54
CNN-RF | 97.84 | 96.91 | 95.87 | 97.85
Residual network | 95.86 | 95.03 | 96.61 | 94.86
Deep CNN | 94.06 | 96.52 | 97.06 | 96.63
CAE model | 94.73 | 98.00 | 96.33 | 96.36
HARSI model | 95.76 | 94.11 | 95.08 | 96.45
LSTM model | 96.74 | 94.98 | 96.58 | 94.46
Abbreviation: IGSPO-DBN, interactive gesture technique using sandpiper optimization with deep belief network.
Similarly, in terms of reca_l, the IGSPO-DBN approach attains an enhanced value of 98.17%, while the MWHODL-SHAR, CNN-RF, Residual network, Deep CNN, CAE, HARSI, and LSTM methods yield lower values of 97.52, 95.87, 96.61, 97.06, 96.33, 95.08, and 96.58%, respectively. Eventually, with respect to F_score, the IGSPO-DBN methodology depicts a higher value of 98.21%, while the MWHODL-SHAR, CNN-RF, Residual network, Deep CNN, CAE, HARSI, and LSTM approaches show reduced values of 97.54, 97.85, 94.86, 96.63, 96.36, 96.45, and 94.46%, correspondingly.
CONCLUSION
In this research work, we have focused on the development of automated gesture recognition using the IGSPO-DBN technique. The purpose of the IGSPO-DBN technique is to enable people to handle devices and exploit different assistance models by using different gestures. The IGSPO-DBN technique detects gestures and classifies them into several kinds using the DBN model. To boost the overall gesture-recognition rate, the IGSPO-DBN technique exploits the SPO algorithm as a hyperparameter optimizer. The simulation outcome of the IGSPO-DBN system was tested on a gesture-recognition dataset, and the outcomes showed the improvement of the IGSPO-DBN algorithm over other systems. In the future, the proposed model can be extended to real-time datasets.