Deep Learning Techniques for Cyber Security Intrusion Detection: A Detailed Analysis

In this study, we present a detailed analysis of deep learning techniques for intrusion detection. Specifically, we analyze seven deep learning models: deep neural networks, recurrent neural networks, convolutional neural networks, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines, and deep autoencoders. For each deep learning model, we study the performance of the model in binary classification and multiclass classification. We use the CSE-CIC-IDS2018 dataset as the benchmark dataset and TensorFlow as the software library in the intrusion detection experiments. In addition, we use the most important performance indicators, namely, accuracy, detection rate, and false alarm rate, to evaluate the efficiency of the methods.


INTRODUCTION
The major target of cyber attacks is a country's Critical National Infrastructure (CNI), such as ports, hospitals, and water, gas or electricity producers, which use and rely upon Supervisory Control and Data Acquisition (SCADA) systems and Industrial Control Systems (ICS) to manage their production. Protection of CNIs therefore becomes an essential issue. Generally, available protective measures are classified according to legal, technical, organizational, capacity building, and cooperation aspects. Apart from the regulations and policies that may be used to tackle cyber attacks on CNIs, specific practical measures need to be taken for these regulations to be effective (Maglaras et al., 2018).
Along with other preventive security mechanisms, such as access control and authentication, intrusion detection systems (IDS) are deployed as a second line of defense (Ahmim et al., 2018). An IDS based on specific rules or patterns of the normal behavior of the system can distinguish between normal and malicious actions (Ahmim et al., 2018). The need for cyber-physical security is rising, and traditional methods may no longer be effective (Stewart et al., 2017). According to Dewa and Maglaras (2016), data mining and its core feature, knowledge discovery, can significantly help in creating data-mining-based IDSs that achieve higher accuracy on novel types of intrusion and demonstrate more robust behaviour than traditional IDSs.
Moreover, many researchers struggle to find comprehensive and valid datasets to test and evaluate their proposed techniques; obtaining a suitable dataset is a significant challenge in itself. In order to test the efficiency of such mechanisms, a reliable dataset is needed that contains both benign traffic and several attacks, meets real-world criteria, and is publicly available (Sharafaldin et al., 2018).
Our contributions in this work are:
• We review papers that apply deep learning techniques to cyber security intrusion detection.
• We present all datasets used by papers that apply deep learning techniques to cyber security intrusion detection.
• We analyze seven deep learning techniques grouped into two classes of models, namely, deep discriminative models and generative/unsupervised models.
• We study the performance of each deep learning model in binary classification and multiclass classification using the CSE-CIC-IDS2018 dataset and TensorFlow.
The rest of this paper is organized as follows. Section 2 reviews intrusion detection systems based on deep learning techniques. In Section 3, we present the different datasets used by papers that apply deep learning techniques to intrusion detection. In Section 4, we present seven deep learning approaches. In Section 5, we study the performance of each deep learning technique in binary classification and multiclass classification. Lastly, Section 6 presents conclusions.

A REVIEW OF INTRUSION DETECTION SYSTEMS BASED ON DEEP LEARNING TECHNIQUES
This section describes intrusion detection systems based on deep learning techniques. Yin et al. (2017) integrated a recurrent neural network into an IDS for supervised classification learning. The study used the NSL-KDD dataset as the benchmark under three performance indicators: accuracy, true positive rate, and false positive rate. The highest accuracy is reported with 80 hidden nodes and a learning rate of 0.1. The paper also states the benefits of a recurrent neural network for intrusion detection.
In another study, Tang et al. (2018) suggested a gated recurrent unit recurrent neural network for intrusion detection in software-defined networking. The paper states a detection rate of 89% using a minimum number of features. The NSL-KDD dataset is used to evaluate network performance with four evaluation metrics: precision, recall, F-measure, and accuracy.
A multi-channel intelligent attack detection system that uses long short-term memory recurrent neural networks is described by Jiang et al. (2018). The NSL-KDD dataset is used to evaluate the performance of the proposed intelligent attack detection system. The performance of the long short-term memory recurrent neural network is reported as a 99.23% detection rate with a false alarm rate of 9.86% and an accuracy of 98.94%.
Convolutional neural networks were used by Basumallik et al. (2019). The study states that this combination shows a higher percentage of classification than a support vector machine.

Deep neural networks (DNNs)
A deep neural network is a multilayer perceptron (MLP) with more than three layers. An MLP is a class of feedforward artificial neural network, defined by the n layers that compose it and succeed each other, as presented in Figure 1.
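As an illustration of how successive layers feed forward, the following is a minimal sketch in plain Python (not the TensorFlow code used in the experiments). It assumes each layer applies an affine map followed by a sigmoid transfer function; the helper names and the toy weights are hypothetical and chosen only for the example.

```python
import math


def sigmoid(z):
    """Logistic transfer function applied to one pre-activation value."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, bias, inputs):
    """One layer: affine map (weights * inputs + bias), then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def mlp_forward(layers, inputs):
    """Feed `inputs` through each (weights, bias) layer in order."""
    activations = inputs
    for weights, bias in layers:
        activations = dense(weights, bias, activations)
    return activations


# Hypothetical toy network: 3 inputs -> 2 hidden units -> 1 output score.
toy_layers = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
score = mlp_forward(toy_layers, [1.0, 0.0, 1.0])
print(score)  # one value strictly between 0 and 1
```

In a binary intrusion detection setting, such an output score could be thresholded to separate benign from attack traffic; the threshold itself would be a design choice.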

Recurrent neural networks (RNNs)
A recurrent neural network is a neural network whose connection graph contains at least one cycle. There are many types of RNNs, such as Elman networks proposed by Elman (1990), Jordan networks proposed by Jordan (1997), and Echo State networks proposed by Jaeger and Haas (2004). Currently, RNNs based on Long Short-Term Memory (LSTM) are the most widely used. The RNN is defined by adding an interconnection matrix $V_W^M \in \mathbb{R}^{a_M \times a_M}$ to the layer $M \in [1, N]$ in order to obtain a layer M of the recurrent network. Following Figure 2 and Gelly and Gauvain (2017), the recurrent neural network algorithm is described in Algorithm 2.
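The role of the interconnection matrix can be sketched as follows. This is a simplified Elman-style recurrence in plain Python, not the algorithm of Gelly and Gauvain (2017): it assumes a single recurrent layer with a tanh transfer function, and the names `W` (input weights), `V` (square interconnection matrix), and `b` (bias), as well as the toy values, are assumptions made for the example.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(W, V, b, x_t, h_prev):
    """One recurrent update: h_t = tanh(W x_t + V h_prev + b)."""
    wx = matvec(W, x_t)
    vh = matvec(V, h_prev)
    return [math.tanh(a + c + d) for a, c, d in zip(wx, vh, b)]


def rnn_forward(W, V, b, sequence):
    """Run the recurrence over a sequence, starting from a zero state."""
    h = [0.0] * len(b)
    states = []
    for x_t in sequence:
        h = rnn_step(W, V, b, x_t, h)
        states.append(h)
    return states


# Hypothetical toy layer: 1 input feature, 2 hidden units.
W = [[0.5], [-0.3]]
V = [[0.1, 0.2], [0.0, 0.4]]  # square matrix: hidden units -> hidden units
b = [0.0, 0.0]
sequence = [[1.0], [1.0], [0.0]]
states = rnn_forward(W, V, b, sequence)
print(states)
```

Because `V` feeds the previous hidden state back into the layer, the same input at two different time steps generally produces different hidden states; setting `V` to zero removes this memory.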

Convolutional neural networks (CNNs)
Algorithm 2 Recurrent neural network. 1: Choose a learning pair (x(t), c(t));
A convolutional neural network is defined as a neural network that extracts features at a high resolution and then converts them into more complex features at a coarser resolution, as presented in Figure 3. There are many types of CNNs, such as ZFNet proposed by Zeiler and Fergus (2014), GoogleNet proposed by Szegedy et al. (2015), and ResNet proposed by He et al. (2016). A CNN is based on three types of layers: convolutional, pooling, and fully-connected layers. Following Gu et al. (2018), the feature value at location (x, y) in the k-th feature map of the M-th layer can be calculated as follows:
$$\mathrm{feature}^{M}_{x,y,k} = (W^{M}_{k})^{T} X^{M}_{x,y} + b^{M}_{k}$$
where $X^{M}_{x,y}$ is the input patch centered at location (x, y), and $W^{M}_{k}$ and $b^{M}_{k}$ are the weight vector and bias term of the k-th filter of the M-th layer.
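The activation value $\mathrm{activ}^{M}_{x,y,k}$ and pooling value $\mathrm{pool}^{M}_{x,y,k}$ of the convolution feature $\mathrm{feature}^{M}_{x,y,k}$ can be calculated as follows:
$$\mathrm{activ}^{M}_{x,y,k} = \mathrm{activation}(\mathrm{feature}^{M}_{x,y,k})$$
$$\mathrm{pool}^{M}_{x,y,k} = \mathrm{pooling}(\mathrm{feature}^{M}_{a,b,k}), \quad \forall (a,b) \in R_{x,y}$$
where $R_{x,y}$ is a local neighbourhood around location (x, y). The nonlinear activation function activation(·) can be ReLU, sigmoid, or tanh. The pooling operation pooling(·) can be average pooling or max pooling.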
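The three operations (convolution feature, activation, and pooling) can be sketched for a single-channel input in plain Python. This is a minimal illustration under stated assumptions: one filter, "valid" convolution indexed by the top-left corner of each patch rather than its center, ReLU as the activation, and non-overlapping max pooling. The function names and the toy image are hypothetical.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Feature map: dot product of the kernel with each input patch, plus bias.

    Patches are indexed by their top-left corner (x, y) for simplicity.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[x + i][y + j]
                for i in range(kh) for j in range(kw)) + bias
            for y in range(out_w)
        ]
        for x in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size neighbourhoods."""
    return [
        [
            max(feature_map[x + i][y + j]
                for i in range(size) for j in range(size))
            for y in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for x in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 input and a 2x2 filter that compares diagonal neighbours.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 3, 2, 1],
    [2, 0, 1, 0, 2],
    [1, 3, 2, 1, 0],
    [0, 1, 0, 2, 1],
]
kernel = [[1, 0], [0, -1]]
features = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = max_pool(relu(features))        # 2x2 pooled map
print(pooled)
```

Pooling reduces the 4x4 feature map to 2x2, which is the move from a higher to a coarser resolution described above.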

Restricted Boltzmann machine (RBMs)
An RBM is an undirected graphical model $G = \{W_{ij}, b_i, c_j\}$, as presented in Figure 4. There are two layers: the hidden layer and the visible layer. The two layers are fully connected through a set of weights $W_{ij}$ and biases $\{b_i, c_j\}$. Note that there is no connection between the units of the same layer.
Following Fischer and Igel (2012), the configuration of the connections between the visible units and the hidden units has an energy function, which can be defined as follows:
$$E(V, H) = -\sum_{i}\sum_{j} V_i W_{ij} H_j - \sum_{i} b_i V_i - \sum_{j} c_j H_j$$
Based on this energy function, the probability of each joint configuration can be calculated according to the Gibbs distribution as follows:
$$\mathrm{Prob}(V, H) = \frac{1}{Z} e^{-E(V, H)}$$
where Z is the partition function, which can be calculated as follows:
$$Z = \sum_{V \in \mathcal{V}} \sum_{H \in \mathcal{H}} e^{-E(V, H)}$$
where the curved letters $\mathcal{V}$ and $\mathcal{H}$ denote the space of the visible and hidden units, respectively.
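For a very small RBM, the energy, the partition function, and the Gibbs distribution can be computed by brute-force enumeration, which makes the definitions concrete. The sketch below assumes binary units, takes `b` as the visible biases and `c` as the hidden biases, and uses hypothetical toy parameters; enumeration is only feasible for a handful of units, since the number of configurations grows exponentially.

```python
import itertools
import math


def energy(v, h, W, b, c):
    """E(v, h) = -sum_ij v_i W_ij h_j - sum_i b_i v_i - sum_j c_j h_j."""
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(cj * hj for cj, hj in zip(c, h))
    return -interaction - visible_term - hidden_term


def joint_distribution(W, b, c):
    """Enumerate all binary (v, h) states; return Gibbs probabilities and Z."""
    states = [
        (v, h)
        for v in itertools.product((0, 1), repeat=len(b))
        for h in itertools.product((0, 1), repeat=len(c))
    ]
    unnormalized = {s: math.exp(-energy(s[0], s[1], W, b, c)) for s in states}
    Z = sum(unnormalized.values())  # partition function
    return {s: w / Z for s, w in unnormalized.items()}, Z


# Hypothetical toy RBM: 2 visible units, 2 hidden units.
W = [[0.5, -0.2], [0.3, 0.8]]
b = [0.1, -0.1]   # visible biases (assumed role)
c = [0.0, 0.2]    # hidden biases (assumed role)
probs, Z = joint_distribution(W, b, c)
print(Z, sum(probs.values()))
```

Lower-energy configurations receive higher probability, and dividing by Z is what makes the probabilities sum to one.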

Deep belief networks (DBNs)
A DBN is a multi-layer belief network in which each layer is a Restricted Boltzmann Machine, as presented in Figure 5. The DBN contains a layer of visible units and a layer of hidden units. The layer of visible units represents the data. The layer of hidden units learns to represent features. Following Hinton (2009), the probability of generating a visible vector, V, can be calculated as:
$$\mathrm{Prob}(V) = \sum_{H} \mathrm{Prob}(H \mid W)\, \mathrm{Prob}(V \mid H, W)$$
where $\mathrm{Prob}(H \mid W)$ is the prior distribution over hidden vectors.
Figure annotation: all connections between layers are undirected, with no within-layer connections.

Deep Boltzmann machines (DBMs)
A DBM is a network of symmetrically coupled stochastic binary units, which contains a set of visible units and a sequence of layers of hidden units, as presented in Figure 6. Following Salakhutdinov and Larochelle (2010), a DBM with three hidden layers can be defined by the energy of the state {V, H} as:
$$E(V, H, G) = -V^{T} W^{1} H^{1} - (H^{1})^{T} W^{2} H^{2} - (H^{2})^{T} W^{3} H^{3}$$
where $H = \{H^{1}, H^{2}, H^{3}\}$ is the set of hidden units, and $G = \{W^{1}, W^{2}, W^{3}\}$ are the model parameters.
The probability that the model assigns to a visible vector V can be defined as:
$$\mathrm{Prob}(V) = \frac{1}{Z} \sum_{H} e^{-E(V, H, G)}$$

Deep auto encoders (DA)
An autoencoder consists of two parts, the encoder and the decoder, as presented in Figure 7. Following Vincent et al. (2010), these two parts can be defined as follows:
$$y = f_{\theta}(x) = s(Wx + b)$$
$$z = g_{\theta'}(y) = s(W'y + b')$$
where x is an input vector; y is the hidden representation; z is the reconstruction; s is the sigmoid function; W and W' are weight matrices; and b is an offset vector of dimensionality d.
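The encoder and decoder mappings can be sketched in a few lines of plain Python. This sketch assumes the common form in which each part is an affine map followed by a sigmoid; the parameter names (`W`, `b`, `W_prime`, `b_prime`), the squared-error reconstruction measure, and the toy values are assumptions for illustration, and no training step is shown.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def affine_sigmoid(weights, bias, inputs):
    """Apply sigmoid(weights * inputs + bias) row by row."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def encode(x, W, b):
    """Encoder: map the input vector x to the hidden representation y."""
    return affine_sigmoid(W, b, x)


def decode(y, W_prime, b_prime):
    """Decoder: map the hidden representation y back to input space."""
    return affine_sigmoid(W_prime, b_prime, y)


def reconstruction_error(x, z):
    """Squared error between the input x and its reconstruction z."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))


# Hypothetical toy autoencoder: 4 input features -> 2 hidden units -> 4 outputs.
W = [[0.2, -0.1, 0.4, 0.0], [-0.3, 0.5, 0.1, 0.2]]
b = [0.0, 0.1]
W_prime = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2], [0.0, 0.3]]
b_prime = [0.0, 0.0, 0.1, -0.1]

x = [1.0, 0.0, 1.0, 0.0]
y = encode(x, W, b)
z = decode(y, W_prime, b_prime)
print(y, z, reconstruction_error(x, z))
```

In anomaly-based detection, a large reconstruction error on a sample is often treated as a sign that the sample differs from the traffic the autoencoder was trained on; whether that applies to the experiments here is not stated in the text.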

EXPERIMENTATION
We use the CSE-CIC-IDS2018 dataset for the experiments. Table 2 summarizes the statistics of attacks in the training and test datasets. The experiments are performed on Google Colaboratory under Python 3 using TensorFlow and a Graphics Processing Unit (GPU).

Performance metrics
We use the most important performance indicators, namely, detection rate (DR), false alarm rate (FAR), and accuracy (ACC). Table 3 shows the four possible cases of correct and wrong classification. The indicators are computed from these cases as follows:
$$DR = \frac{TP}{TP + FN}$$
$$FAR = \frac{FP}{FP + TN}$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively.
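The three indicators can be computed directly from the four confusion-matrix counts. The following sketch uses the standard definitions, treating attacks as the positive class; the function name and the counts are hypothetical and are not results from the experiments.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (detection rate, false alarm rate, accuracy) from raw counts.

    tp: attacks flagged as attacks      fn: attacks missed
    tn: benign flows passed as benign   fp: benign flows flagged as attacks
    """
    detection_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return detection_rate, false_alarm_rate, accuracy


# Hypothetical counts for illustration only.
dr, far, acc = detection_metrics(tp=90, tn=950, fp=50, fn=10)
print(f"DR={dr:.3f} FAR={far:.3f} ACC={acc:.3f}")
```

The sketch does not guard against empty classes: if a test set contains no attacks or no benign flows, the corresponding denominator is zero and the rate is undefined.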

Results
Table 4 shows the performance of deep discriminative models relative to the different attack types and benign. It shows that the deep neural network gives the highest true negative rate with 96.915%. The recurrent neural network gives the highest detection rate for seven attack types.
Table 5 shows the performance of generative/unsupervised models relative to the different attack types and benign. It shows that the deep belief network gives the highest true negative rate with 98.212% and the highest detection rate for four attack types, namely, Brute Force -XSS 92.281%, Brute Force -Web 91.427%, DoS attacks-Hulk 91.712%, and DDOS attack-LOIC-HTTP 97.612%. The deep autoencoder gives the highest detection rate for three attack types, namely, Brute Force -Web 95.311%, DoS attacks-Slowloris 97.120%, and Infilteration 97.818%. The deep Boltzmann machine gives the highest detection rate for five attack types, namely, DoS attacks-Hulk 93.072%, DoS attacks-SlowHTTPTest 95.993%, DoS attacks-GoldenEye 97.421%, DDOS attack-LOIC-UDP 96.654%, and Botnet 97.812%.
Table 6 presents the accuracy and training time of deep discriminative models with different learning rates and hidden nodes. Compared to both the deep neural network and the recurrent neural network, the convolutional neural network achieves a higher accuracy, 97.376%, when there are 100 hidden nodes and the learning rate is 0.5. Table 7 demonstrates the accuracy and training time of generative/unsupervised models with different learning rates and hidden nodes. The deep autoencoder achieves a higher accuracy, 97.372%, when there are 100 hidden nodes and the learning rate is 0.5, compared to the three other techniques, namely, the restricted Boltzmann machine, the deep belief network, and the deep Boltzmann machine.
The performance of deep learning techniques in terms of false alarm rate is depicted in Figure 8. Among the deep discriminative models, the mean false alarm rate of the convolutional neural network is better than that of both the deep neural network and the recurrent neural network. Among the generative/unsupervised models, the mean false alarm rate of the deep autoencoder is better than that of the three other techniques, namely, the restricted Boltzmann machine, the deep belief network, and the deep Boltzmann machine.

CONCLUSION
In this paper, we conducted a comparative study of deep learning techniques for intrusion detection, namely, deep discriminative models and generative/unsupervised models. Specifically, we analyzed seven deep learning approaches: deep neural networks, recurrent neural networks, convolutional neural networks, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines, and deep autoencoders.

the affine transformation defined by the matrix $W^{M}$ and the vector $b^{M}$. $n^{M} : \mathbb{R}^{a_M} \to \mathbb{R}^{a_M}$ is the transfer function of the layer M. The matrix $W^{M}$ is called the weight matrix between the layer M − 1 and the layer M. The vector $b^{M}$ is called the bias vector of the layer M. Following Figure 1 and Liu et al. (2017), the deep neural network algorithm based on MLP is described in Algorithm 1.

Figure 8: Performance of deep learning techniques in terms of false alarm rate

Table 1: Deep learning techniques for intrusion detection and the datasets they use

Deep Learning Techniques for Cyber Security Intrusion Detection: A Detailed Analysis. Ferrag • Maglaras • Janicke

Generative/unsupervised models include restricted Boltzmann machines (RBMs), deep belief networks (DBNs), deep Boltzmann machines (DBMs), and deep autoencoders (DA). Depending on how these deep learning techniques are intended for use, they can be categorized into three major classes: 1) deep networks for unsupervised or generative learning; 2) deep networks for supervised learning; and 3) hybrid deep networks.

Table 3: Confusion matrix

Table 4: Performance of deep discriminative models relative to the different attack types and benign

Table 5: Performance of generative/unsupervised models relative to the different attack types and benign

Table 6: The accuracy and training time of deep discriminative models with different learning rates and hidden nodes

Table 7: The accuracy and training time of generative/unsupervised models with different learning rates and hidden nodes (Hidden Nodes; LR: Learning Rate)