Prediction of Air Flight Cancellation during COVID-19 using Deep Learning Methods

Air traffic is vulnerable to external factors such as oil crises, natural disasters, economic recessions and disease outbreaks such as COVID-19. COVID-19 seems to have had a more severe and more rapid impact on air traffic numbers, in the form of sudden increases in flight cancellations, aircraft groundings and travel bans. Airlines are losing revenue and find it difficult to sustain operations for a long period. The problem affects the entire world. Passenger numbers have fallen significantly because flights are cancelled or planes fly empty between airports. This in turn has massively reduced airline revenues and forced many airlines to lay off employees or declare bankruptcy. Airlines also have to handle refunds for cancelled trips while trying to limit their losses. Aircraft manufacturers and airport operators have also laid off employees. According to some commentators, this crisis is the worst ever encountered in the history of the aviation industry. In this paper, flight cancellation prediction is accomplished using a deep learning framework in which two different recurrent neural networks are assembled as a single entity to infer the prediction results. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are employed to design the proposed predictive model, which is compared against a traditional neural network based Multi-layer Perceptron (MLP) model. Experimental results indicate an accuracy of 98.7% for the proposed model.


1. Introduction
Flight services are strongly affected by adverse weather conditions, which may lead to flight cancellation. Security issues, mechanical issues, air traffic restrictions, bird strikes, unavailability of aircraft and shortage of crew members are also responsible for flight cancellations. A study revealed that the reporting carriers cancelled 1.6% of their scheduled flights. Governments around the world decided to temporarily introduce travel restrictions due to the COVID-19 pandemic. As a result, airlines were forced to suspend their flights. Passengers received notifications about cancellations or information that their flights might be delayed. Airlines around the world have introduced new regulations for ticket changes and refunds in the current situation. These regulations apply only to flights whose cancellation has been confirmed.
Passengers experience less stress if the airline informs them early that a flight has been cancelled. It is better if an e-mail confirming the cancellation is sent to the passenger in advance, so that he/she does not need to travel from home to the airport.
Passengers are entitled to a refund from the airline for the unused ticket. The available refund options can be obtained from the airline. The airline authority sends a message listing the refund options it allows and also informs the passenger that the flight was cancelled due to COVID-19.
This paper attempts to predict flight cancellation using data mining techniques. Data mining techniques are useful for finding interesting patterns in existing databases by applying sophisticated algorithms. The acquired patterns form a knowledge base and thus help in taking informed decisions [2]. This study applies the predictive modelling tasks of data mining to detect flight cancellation. In this context, a deep learning (DL) [3] based mechanism is used to fulfil this research target. The Recurrent Neural Network (RNN) is a popular DL technique for predictive modelling tasks. LSTM and GRU [4] are two popular RNN variants that are integrated here under a single platform along with certain hyper-parameter tuning. In contrast to major studies such as [5][6][7][8][9][10], which predict flight delay scenarios, this paper focuses on flight cancellation detection. Flight departure delay may be one of the major reasons for flight cancellation. Hence, it is necessary to consider the impact of delays on the flight cancellation event. A computer-aided classifier can help to predict flight cancellation, which in turn can save resources and reduce passengers' anxiety.

2. Related Works
Many studies have addressed flight delay prediction. To analyse the arrival delay of flights, a comparative study of classifier models was carried out in [5]. The classifier models include random forest, Support Vector Machine (SVM), Gradient Boosting Classifier (GBC) and the k-nearest neighbour algorithm. The comparative results show that the gradient boosting classifier attains the best arrival delay prediction performance, 79.7%, on the total scheduled American Airlines flights. Another study [6] applied random forest, network-based air traffic delay prediction models that consider both temporal and network delay states. The application of deep learning techniques to analyse patterns in air traffic delays was presented in [7], where a Long Short-Term Memory RNN architecture was used to build the predictive model. Tu et al. [8] presented a model that considers both seasonal trends and daily propagation patterns; departure delays were estimated by applying statistical methods to analyse long-term and short-term patterns in air traffic delays. A Bayesian network was used in [9] to investigate delay propagation among airports. For the same objective, i.e., forecasting possible delays, a multilevel input layer artificial neural network (ANN) was used in [10].
Flight cancellation, a different event, was studied in [11] using support vector machine (SVM), decision tree (DT), Naïve Bayes and Logistic Regression (LR) methods. Their prediction performances were compared, and an accuracy of 90% was reached by SVM and LR. To the best of our knowledge, flight cancellation prediction has not been studied much yet. This field can still be explored, and hence this paper focuses on carrying out that prediction.

3.1 Neural Network and Deep Learning
Deep Learning (DL) is a subfield of Machine Learning (ML) that automates learning in machines, which is ultimately the goal of Artificial Intelligence (AI). By assembling more than two layers, DL provides a multi-layered hierarchical data representation, typically in the form of a neural network. It is beneficial because, owing to its self-adaptive nature, it does not require manual feature engineering. It relies on neural networks to support complex problem solving. Like neurons in the human brain, a large number of processing elements (nodes) are present in a neural network and work together to arrive at the best solution [2].

3.2 Hyper-parameters used in neural network training
Some fine-tuning of hyper-parameters is necessary before training a neural network. The hyper-parameters include the number of layers, number of nodes, learning rate, number of epochs, batch size and drop-out rate. These values should be adjusted to help the network learn successfully. The choice of activation function is another necessary decision that can improve the training procedure. Activation functions allow neural networks to learn non-linear relationships in the data and to produce a meaningful output signal. The sigmoid activation function may be used for activating output nodes when predicting binary class probabilities. It accepts the input and transforms it into the range 0 to 1, as shown in equation (1) [12].
$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$
The hyperbolic tangent (tanh) is also a non-linear activation function; it is a smoother, zero-centred function [12]. Its output lies between -1 and 1 and is given in equation (2).
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (2)$$
To reduce over-fitting in neural networks, the dropout technique is employed. During training it randomly detaches units, along with their incoming and outgoing connections, from the neural network. Neural networks achieve benchmark results in supervised classification tasks by using dropout [13].
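The dropout step described above can be illustrated with a few lines of plain Python. This is a minimal sketch, not the paper's implementation: it assumes the "inverted" dropout variant used by common libraries, in which surviving units are rescaled by 1/(1 - rate); the function name `dropout` and its parameters are hypothetical.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training."""
    if not training or rate == 0.0:
        # At inference time every unit is kept unchanged.
        return list(activations)
    keep = 1.0 - rate
    # Survivors are scaled by 1/keep so the expected activation is unchanged
    # (assumed variant; the source only says that units are detached).
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10
print(dropout(hidden, rate=0.5, rng=rng))
```

Each call draws a fresh random mask, so a different sub-network is trained on every batch, which is what discourages units from co-adapting.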
Hyper-parameters such as the number of epochs and the batch size are also used in neural network training. It should be ensured that the batch size is neither too small nor too large. If the batch is too small, it exhibits high variance, because it does not represent the entire dataset. On the other hand, a large batch may not fit in memory during training and may lead to over-fitting [14].
The use of an optimizer is mandatory in order to train multiple stacked Recurrent Neural Network (RNN) layers under a single framework. Adam is computationally efficient, has modest memory requirements and is easy to implement. It is a method for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. It is well accepted due to its applicability to non-stationary objectives and to problems with very noisy and/or sparse gradients [15].
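To make the "adaptive estimates of lower-order moments" concrete, the following sketch applies the Adam update rule to a single scalar parameter. It is an illustration under stated assumptions: the default values of `lr`, `beta1`, `beta2` and `eps` are the ones commonly quoted for Adam, while the function name `adam_step` and the toy quadratic objective are hypothetical and unrelated to the flight data.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * (theta - 3.0), m, v, t, lr=0.05)
print(theta)
```

Because the step is the ratio of the two moment estimates, its size is roughly bounded by `lr` regardless of the gradient scale, which is one reason Adam copes with noisy or sparse gradients.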

3.3 Recurrent Neural Network, LSTM and GRU
An RNN is built from repeated neural network modules and is specialized for analyzing sequential data. In this network, the output of the previous step is fed into the input of the current step. For example, the output obtained at step S_i affects the computation at step S_{i+1}. Hence, it is evident that an RNN accepts two types of input, the present input and the previous output, to produce the final output. Loops present in the RNN allow signals to travel both forward and backward. However, RNNs suffer from the vanishing gradient problem [16].
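The recurrence described above can be sketched for a single hidden unit. This is a simplified illustration with scalar weights and a tanh activation; the names `rnn_forward`, `w_x`, `w_h` and `b` are hypothetical and the weights are arbitrary.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit tanh RNN over a sequence and return every hidden state."""
    states, h = [], h0
    for x in inputs:
        # The new state depends on the present input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are still affected by it.
print(rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0))
```

The influence of the first input shrinks at every step because it is repeatedly multiplied by `w_h` and squashed by tanh, which gives an intuition for the vanishing gradient problem.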
To resolve this problem, variants of RNN such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are used. The LSTM output at time t is given by
$$y_t = \phi(W_{yr} r_t + b_y)$$
GRU is a gating mechanism in RNNs similar to an LSTM unit, but without an output gate. GRU is considered a variation of the LSTM because both have a similar design and produce comparable results in some cases. In a GRU, the update gate controls the information that flows into memory and the reset gate controls the information that flows out of memory. These two gate vectors decide which information is passed on to the output. A GRU can be trained to keep information from the past or to remove information that is irrelevant to the prediction. Let $x = (x_1, \ldots, x_T)$ be an input sequence, let $W$ denote the weight matrices and let $\sigma$ denote the sigmoid function. At time $t$, the activation $h_t^j$ of the GRU depends on the previous activation $h_{t-1}^j$ and the candidate activation $\tilde{h}_t^j$. This is formulated in equation (9).
The update gate $u_t^j$ and the reset gate $r_t^j$ are formulated in equations (10) and (11) respectively.
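The GRU relations referred to above can be written out explicitly. The block below is a sketch that assumes the widely used formulation of Chung et al. (2014), rewritten with this section's symbols ($u$ for the update gate, $r$ for the reset gate). The recurrent weight matrices $U$, $U_u$, $U_r$ and the element-wise product $\odot$ are notation introduced here, and the exact form of equations (9) to (11) in the original may differ.

```latex
\begin{align}
h_t^j &= (1 - u_t^j)\, h_{t-1}^j + u_t^j\, \tilde{h}_t^j \\
u_t^j &= \sigma\left(W_u x_t + U_u h_{t-1}\right)^j \\
r_t^j &= \sigma\left(W_r x_t + U_r h_{t-1}\right)^j \\
\tilde{h}_t^j &= \tanh\left(W x_t + U (r_t \odot h_{t-1})\right)^j
\end{align}
```

Under this formulation, an update gate close to 0 makes the unit copy its previous activation, while a reset gate close to 0 makes the candidate activation ignore the previous state.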

3.4 Model Evaluation
Evaluation metrics are used to compare the performance of a model against other models. Accuracy and loss need to be calculated for any deep model. They are computed for each epoch to assess training progress. A loss function (or cost function) [19] measures how large the model's errors are for each instance in the training set. In other words, the loss function quantifies how far the predicted values deviate from the original values. The cross-entropy function is commonly used as the loss function for binary classification problems. It measures the performance of a classification model whose output is a probability value between 0 and 1 [20].
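As an illustration of the cross-entropy loss for binary classification, the sketch below averages the per-instance loss over a small set of predictions. The function name `binary_cross_entropy`, the clipping constant `eps` and the sample values are hypothetical choices made for this example.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Confident, correct predictions give a lower loss than hesitant ones.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(binary_cross_entropy([1, 0], [0.6, 0.4]))
```

The loss grows without bound as a prediction approaches the wrong extreme, so a model is penalised heavily for being confidently wrong.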
Using true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), the accuracy and F1-score metrics are evaluated as in equations (12) and (15) respectively. Note that the F1-score relies on recall and precision, which are formulated in equations (13) and (14) respectively [21].
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (12)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (13)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (14)$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (15)$$
The Cohen-Kappa score [22] is also used as an evaluation metric in this paper. It is a statistical measure of inter-rater agreement for qualitative items in a classification problem, formulated in equation (17).
$$\text{Cohen-Kappa Score} = \frac{p_o - p_e}{1 - p_e} \qquad (17)$$
where $p_o$ denotes the relative observed agreement among raters and $p_e$ is the probability of agreement by chance.
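The metrics above can all be derived from the four confusion-matrix counts. The sketch below is an illustration with made-up counts, not results from the paper; the function name `classification_metrics` is hypothetical, and it assumes that none of the denominators is zero. For a binary classifier, the two "raters" in the Cohen-Kappa score are taken to be the true labels and the model's predictions.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, F1-score and Cohen's kappa from counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Chance agreement: truth and prediction both positive, or both negative.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)  # observed agreement p_o is the accuracy
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "kappa": kappa}

# Hypothetical counts for 100 flights.
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

On a dataset where cancellations are rare, accuracy alone can look high for a model that never predicts a cancellation, which is why the F1-score and kappa are reported alongside it.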
The Receiver Operating Characteristic (ROC) curve visualizes the relationship between the true positive rate (also known as recall) and the false positive rate, which are defined in equations (18) and (19).
$$\text{True positive rate (TPR)} = \frac{TP}{TP + FN} \qquad (18)$$
$$\text{False positive rate (FPR)} = \frac{FP}{FP + TN} \qquad (19)$$
The area under the ROC curve, often abbreviated as AUC, provides an aggregate measure of performance across all possible classification thresholds. AUC ranges from 0 to 1. A model with an AUC of 1 can be regarded as the best performing model, whereas a model whose predictions are all wrong has an AUC of 0 [23].
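AUC can also be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair; the function name `roc_auc` and the sample scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than efficiency.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs correct: 0.75
```

A perfectly ranking model scores 1, a perfectly inverted one scores 0, and random scoring gives about 0.5.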

Problem definition
This paper explores flight cancellation prediction on a particular link for a specific airport. A delay at time t may influence flight cancellation at time t+1. If the delay is large, the likelihood of cancellation is higher. However, the cancellation of a particular flight also depends on the distance between origin and destination, the tendency for flight diversion and the flight timings. In this context, a classifier model can be built whose output is a binary prediction of whether a particular flight will be cancelled or not.

Data Source
To carry out the flight cancellation prediction, this research uses data covering all flights in the USA in January 2019 and January 2020. This open-source data, released under U.S. Government Works, was retrieved from the Kaggle data repository [24]. The dataset consists of 1191331 records with some missing values. Figure 1 shows a summary of the missing values present in the dataset. The training and testing datasets are distinguished by the presence of the attribute 'CANCELLED'. Since predictions need to be obtained for the testing dataset, the target attribute is removed from it. The classifier is trained on January 2019 and predictions are then obtained for January 2020. In other words, the flight details of January 2019 form the training dataset, whereas the flight details of January 2020 form the testing dataset for which prediction results are obtained. Any classifier model learns from the training dataset by extracting hidden patterns and uses that knowledge when predicting unknown samples from the testing dataset.

Methodology
The research methodology of this study focuses on a neural network based framework. Two models are presented. The first is a stacked GRU-LSTM-RNN model that follows a DL architecture; it is the model proposed by this paper. The second is a traditional neural network based MLP classifier, which serves as the baseline against which the proposed model is compared.

Stacked GRU-LSTM Model
The

5. Experimental Results
Once the model is designed, the training process is executed for 2 epochs. The training procedure is evaluated using two metrics, the loss and the accuracy exhibited by the model.

6. Conclusions
To minimize the substantial losses caused by flight cancellations, this research was carried out to predict flight cancellation in advance. This study exemplifies the use of DL