

Research Article

DL-IDS: Extracting Features Using CNN-LSTM Hybrid Network for Intrusion Detection System

Pengfei Sun,1 Pengju Liu,2 Qi Li,3 Chenxi Liu,2 Xiangling Lu,2 Ruochen Hao,2 and Jinpeng Chen1

1School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
2School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
3Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China

Correspondence should be addressed to Jinpeng Chen; chenjinpeng@nlsde.buaa.edu.cn

Received 6 April 2020; Revised 17 July 2020; Accepted 25 July 2020; Published 28 August 2020

Academic Editor: Huaming Wu

Copyright © 2020 Pengfei Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many studies have recently applied machine learning to improve network intrusion detection systems. Most of this research relies on manually extracted features, an approach that not only incurs high labor costs but also discards much of the information in the original data, resulting in low detection accuracy and models that cannot be deployed in practice. This paper develops DL-IDS (deep learning-based intrusion detection system), which uses a hybrid network of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network to extract the spatial and temporal features of network traffic data and thereby provide a better intrusion detection system. To reduce the influence of the unbalanced number of samples across attack types in the training data on model performance, DL-IDS uses a category weight optimization method to improve robustness. Finally, DL-IDS is tested on CICIDS2017, a reliable intrusion detection dataset that covers the common, up-to-date intrusions and cyberattacks. In the multiclassification test, DL-IDS reached 98.67% overall accuracy, and the accuracy for each attack type was above 99.50%.

1. Introduction

1.1. Background. In recent years, with the rapid development of emerging communications and information technologies such as 5G communications, the mobile Internet, the Internet of Things, cloud computing, and big data, network security has become increasingly important. As an important research topic in network security, intrusion detection has drawn the attention of experts and scholars. Common problems of traditional anomaly-based detection methods include inaccurate feature extraction from network traffic and difficulty in building attack detection models, which lead to a high false alarm rate when judging attack traffic. It is difficult for network security personnel to find unknown threats, which makes the defense inherently passive. In other words, traditional methods are no longer suited to the massive data scale of today's Internet.

In recent years, many scholars have explored how to use artificial intelligence (AI) to detect and analyze network traffic for intrusion detection and defense systems. Hassan et al. [1] proposed an ensemble-learning model based on the combination of a random subspace (RS) learning method with random trees (RT), which detected cyberattacks on SCADA by using the network traffic from a SCADA-based IoT platform. Khan and Gumaei [2] compared the most popular machine learning methods for intrusion detection in terms of accuracy, precision, recall, and training time cost. Alqahtani et al. [3] proposed the GXGBoost model to detect intrusion attacks based on a genetic algorithm and an extreme gradient boosting (XGBoost) classifier. Derhab et al. [4] proposed a security architecture that integrates Blockchain and software-defined network (SDN) technologies, which focuses on the security of commands in industrial IoT against forged commands and misrouting of

Hindawi Security and Communication Networks, Volume 2020, Article ID 8890306, 11 pages, https://doi.org/10.1155/2020/8890306

commands. The current mainstream methods are intrusion detection systems based on machine learning (ML) or deep learning (DL). Among them, ML-based systems mainly classify and detect network traffic by analyzing manually extracted features, while DL-based systems can not only analyze manually extracted features but also automatically extract features from the original traffic. Therefore, DL-based systems can circumvent the manual feature extraction problem and achieve higher detection accuracy than general ML-based systems.

To achieve higher accuracy, DL-based intrusion detection methods require a large amount of data for training, especially different types of attack traffic data. In real environments and in the existing datasets (KDD99, NSL-KDD, and CICIDS2017), attack traffic is always scarce compared with normal traffic. Moreover, because some types of attack traffic are difficult to capture and simulate, the amount of data available for model training is particularly small. These problems greatly restrict the accuracy of DL-based methods, making certain types of attacks difficult to judge.

1.2. Key Contributions. This paper proposes a DL-based intrusion detection system, DL-IDS, which uses a hybrid network of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network to extract the temporal and spatial features of network traffic data and improve the accuracy of intrusion detection. In the model training phase, DL-IDS uses category weights to optimize the model. This method reduces the effect of the unbalanced numbers of samples of several attack types in the training data on model performance and improves the robustness of training and prediction. Finally, we test DL-IDS on classifying multiple types of network traffic on the CICIDS2017 dataset and compare it with a CNN-only model, an LSTM-only model, and other machine learning models; we chose CICIDS2017 because it is a recent original network traffic dataset that simulates real situations. The results show that DL-IDS reached 98.67% overall accuracy, and the accuracy for each attack type was above 99.50%, which was the best result among all models.

1.3. Paper Organization. The remainder of this paper is organized as follows. Section 2 discusses the classification of abnormal traffic in previously published studies. Section 3 describes in detail the datasets and data preprocessing methods used in this study. The classifier structure and classification methods used for traffic classification under the proposed model are described in Section 4. Section 5 presents the results of our model evaluation with various hyperparameters. Section 6 summarizes the paper and discusses potential future development trends.

2. Related Work

With the continual expansion of the Internet, network security has become a problem that cannot be ignored. Malicious network behaviors, such as DDoS and brute force attacks, tend to be "mixed into" the overall network traffic. Security researchers seek to effectively analyze the malicious traffic in a given network so as to identify potential attacks and quickly stop them [5-8].

2.1. Traditional Intrusion Detection System. Traditional methods of intrusion detection mainly include statistical analysis methods [9], threshold analysis methods [10], and signature analysis methods [11]. These methods do reveal malicious traffic behavior; however, they require security researchers to encode their personal experience into rules and parameter settings, which is very inefficient. Said "experience" is also only a summary of the malicious traffic behavior found in the past and is typically difficult to quantify, so these methods cannot be readily adapted to the huge amount of network data and volatile network attacks of today's Internet.

2.2. Intrusion Detection System Based on ML. Advancements in machine learning have produced models that effectively classify and cluster traffic for the purposes of network security. Early researchers applied simple machine learning algorithms from classification-clustering problems in other fields, such as the k-Nearest Neighbor (KNN) [12], support vector machine (SVM) [13], and self-organizing map (SOM) [14], with good results on KDD99, NSL-KDD, DARPA, and other datasets. Unfortunately, these datasets are out of date and contain not only normal data but also attack data that are overly simple. It is difficult to use these datasets to simulate today's highly complex network environment. It is also difficult to achieve the expected effect using these algorithms to analyze malicious traffic in a relatively new dataset, as evidenced by our work in this study.

2.3. Intrusion Detection System Based on Deep Neural Network. The success of machine learning algorithms generally depends on data representation [15]. Representation learning, also called feature learning, is a deep neural network technique that can be used to learn the explanatory factors of variation behind the data. Ma et al. combined spectral clustering and deep neural network algorithms to detect intrusion behaviors [16]. Niyaz et al. used deep belief networks to develop an efficient and flexible intrusion detection system [17]. But these research methods construct their models to learn representations from manually designed traffic features, not taking full advantage of the ability of deep neural networks. Eesa et al. showed that a higher detection rate and accuracy rate with a lower false alarm rate can be obtained by using an improved traffic feature set [18]. Learning features directly from raw traffic data should be feasible, as it is in the fields of computer vision and natural language processing [19].

The two most widely used deep neural network models are the CNN and the RNN. The CNN uses original data as the direct input to the network, does not necessitate feature extraction or image reconstruction, has relatively few parameters, and requires relatively little data to process. CNNs have been proven to be highly effective in the field of image recognition


[20]. For certain network traffic protocols, CNNs can perform well with fast training. Fan and Ling-zhi [21] extracted very accurate features by using a multilayer CNN wherein the convolution layer connects to the sampling layer below; this model outperformed classical detection algorithms (e.g., SVM) on the KDD99 dataset. However, the CNN can only analyze a single input package; it cannot analyze timing information in a given traffic scenario. In reality, a single packet in an attack traffic scenario is normal data; when a large number of packets are sent at the same time or within a short period, those packets become malicious traffic. The CNN does not apply in this situation, which in practice may lead to a large number of missed alerts.

The recurrent neural network (RNN) is also often used to analyze sequential information. LSTM, a branch of RNNs, performs well in sequence analysis applications such as natural language processing. Kim et al. [22] compared the LSTM-RNN network against the Generalized Regression Neural Network (GRNN), Product-based Neural Network (PNN), k-Nearest Neighbor (KNN), SVM, Bayesian, and other algorithms on the KDD99 dataset and found that it was superior in every aspect they tested. The LSTM network alone, however, centers on the direct relationship between sequences rather than the analysis of a single packet, so it cannot readily replace the CNN in this regard.

Wu and Guo [23] proposed a hierarchical CNN+RNN neural network and carried out experiments on the NSL-KDD and UNSW-NB15 datasets. Hsu et al. [24] and Ahsan and Nygard [25] used another CNN+LSTM model to perform multiclassification experiments on the NSL-KDD dataset. Hassan et al. [26] proposed a hybrid deep learning model to detect network intrusions based on a CNN network and a weight-dropped long short-term memory (WDLSTM) network; that paper mainly conducted experiments on the UNSW-NB15 dataset. However, these studies are still based on features extracted in advance.

Abdulhammed et al. [27] used an Autoencoder and Principal Component Analysis to reduce the CICIDS2017 dataset's feature dimensions. The resulting low-dimensional features from both techniques were then used to build various classifiers to detect malicious attacks. Musafer et al. [28] proposed a novel IDS architecture based on an advanced Sparse Autoencoder and Random Forest to distinguish the patterns of normal packets from those of network attacks, with good results.

In this study, we adopted a malicious traffic analysis method based on a CNN and an LSTM to extract and analyze network traffic information from a raw network dataset in both the spatial and temporal dimensions. We conducted training and testing on the CICIDS2017 dataset, which closely simulates a real network environment. We ran a series of experiments to show that the proposed model enables very effective malicious flow analysis.

3. Datasets and Preprocessing

3.1. Dataset. The IDS is the most important defense tool against complex and large-scale network attacks, but the lack of available public datasets still hinders its further development. Many researchers have used private data within a single company or conducted manual data collection to test IDS applications, which affects the credibility of their results to some extent. Public datasets such as KDD99 and NSL-KDD [29] are comprised of manually selected stream characteristics rather than original network traffic. The timing of their data collection is also outdated compared to modern attack methods.

In this study, in an effort to best reflect real traffic scenarios in real networks as well as newer means of attack, we chose the CICIDS2017 dataset (Canadian Institute for Cybersecurity) [30], which contains benign traffic and up-to-date common attack traffic representative of a real network environment. This dataset constructs the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and e-mail protocols to accurately simulate a real network environment. The data capture period was from 9 a.m. on July 3, 2017, to 5 p.m. on July 7, 2017; a total of 51.1 GB of data flow was generated over this five-day period. The attack traffic collected includes eight types of attack: FTP-Patator, SSH-Patator, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS. As shown in Table 1, the attacks were carried out on Tuesday, Wednesday, Thursday, and Friday, morning and afternoon. Normal traffic was generated throughout the day on Monday and during the nonattack periods from Tuesday to Friday. The data type of this dataset is the pcap file.

After acquiring the dataset, we analyzed the original data and selected seven types of data for subsequent assessment according to the amount of data and its noise rate: Normal, FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan.

3.2. Network Traffic Segmentation Method. The format of the CICIDS2017 dataset is one pcap file per day. These pcap files contain a great deal of information, which is not conducive to training the model. Therefore, the primary task of traffic classification based on machine learning is to divide continuous pcap files into several discrete units according to a certain granularity. There are six ways to slice network traffic: by TCP, by connection, by network flow, by session, by service class, and by host. When the original traffic data is segmented according to different methods, it splits into quite different forms, so the selected network traffic segmentation method markedly influences the subsequent analysis.

We adopted a session sharding method in this study. A session consists of the packets of a bidirectional flow, that is, all packets that have the same five-tuple (source IP, source port, destination IP, destination port, and transport layer protocol) with interchangeable source and destination addresses and ports.
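The session definition above can be sketched as a small helper that maps both directions of a flow to one canonical key (an illustration, not the paper's code; the tuple layout is our own choice):

```python
# Sketch of the session definition: both directions of a flow map to the
# same canonical key, so packets can be grouped into sessions.

def session_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Direction-independent key for a packet's five-tuple."""
    a = (src_ip, src_port)
    b = (dst_ip, dst_port)
    # Order the two endpoints so that swapping source and destination
    # yields the same key.
    return (min(a, b), max(a, b), proto)

# Both directions of one TCP connection fall into the same session:
k_fwd = session_key("192.168.10.5", 51324, "172.16.0.1", 22, "tcp")
k_rev = session_key("172.16.0.1", 22, "192.168.10.5", 51324, "tcp")
assert k_fwd == k_rev
```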

3.3. Data Preprocessing. Data preprocessing begins with the original flow, namely the data in pcap format, to format the model input data. The CICIDS2017 dataset provides an original dataset in pcap format and a CSV file detailing some of the traffic. To transform the original data into the model input format, we conducted time division, traffic


segmentation, PKL file generation, PKL file labeling, matrix generation, and one_hot encoding. A flow chart of this process is given in Figure 1.

Step 1 (time division). Time division refers to intercepting the pcap file of the corresponding period from the original pcap file according to the attack time and type [30]. The input format is a pcap file, and the output format is still a pcap file. The time periods corresponding to each specific type and the size of each file are shown in Table 2.
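Time division amounts to filtering packets by capture timestamp; a minimal sketch follows (the packet representation as (timestamp, raw_bytes) pairs is our assumption, not the authors' code):

```python
from datetime import datetime

def time_slice(packets, start, end):
    """Keep the packets whose capture timestamp falls within [start, end].
    `packets` is assumed to be a list of (timestamp, raw_bytes) pairs."""
    return [p for p in packets if start <= p[0] <= end]

# Example: extract the FTP-Patator window on Tuesday morning (see Table 2).
start = datetime(2017, 7, 4, 9, 20)
end = datetime(2017, 7, 4, 10, 20)
pkts = [
    (datetime(2017, 7, 4, 9, 30), b"..."),   # inside the attack window
    (datetime(2017, 7, 4, 12, 0), b"..."),   # outside
]
assert len(time_slice(pkts, start, end)) == 1
```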

Step 2 (traffic segmentation). Traffic segmentation refers to dividing the pcap file obtained in Step 1 into corresponding sessions by sharding according to the IPs of the attack host and victim host corresponding to each time period [31]. The specific process is shown in Figure 2. This step involves shredding the pcap file of Step 1 into the corresponding flows using pkt2flow [31], which can split the pcap package into the flow format (i.e., the original pcap package is divided into different pcap packages according to the flows with different five-tuples). Next, the pcap packages are merged under the premise that the source and destination are interchangeable. Finally, the pcap package of Step 1 is divided into sessions.

Step 3 (generate the PKL file) and Step 4 (tag the PKL file). As shown in Table 1, the pcap file is still large after extraction; this creates a serious challenge for data reading in the model. To accelerate the data reading process, we packaged the traffic using the pickle tool in Python. We use PortScan-type traffic as an example of the packaging process here. In this class, many sessions are generated after Step 2, each of which is saved in a pcap file. We generated a label of the corresponding attack type for each session. Each session contains several data flows, and each data flow contains n1 packets. We then saved n2 sessions in a PKL file to speed up the process of reading the data. n1 can be changed as needed; according to our experimental results, we finally selected the best value, n1 = 8. The value of n2 can be calculated by formula (1). In this case, we packaged each type of session into a PKL file. The structure of the entire PKL file is shown in Figure 3.

n2 = (total number of packets of this type) / n1. (1)
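Steps 3 and 4 can be sketched with Python's pickle module as follows (a simplified illustration; the record layout is our assumption, not the authors' implementation):

```python
import os
import pickle
import tempfile

N1 = 8  # packets kept per data flow, the value selected in the paper

def package_sessions(sessions, label, path):
    """Step 3/4 sketch: save every session of one traffic class, truncated
    to N1 packets and tagged with its attack-type label, in one PKL file."""
    records = [{"label": label, "packets": s[:N1]} for s in sessions]
    with open(path, "wb") as f:
        pickle.dump(records, f)

def load_sessions(path):
    """Read a whole class of labeled sessions back in one call."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip with two toy "sessions" of raw packet bytes:
path = os.path.join(tempfile.mkdtemp(), "portscan.pkl")
package_sessions([[b"\x00" * 40] * 10, [b"\x01" * 40] * 3], "PortScan", path)
records = load_sessions(path)
assert len(records) == 2
assert records[0]["label"] == "PortScan"
assert len(records[0]["packets"]) == N1  # truncated from 10 packets to N1
```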

Step 5 (matrix generation). The input of the model must have a fixed length, so the next step is to unify the length of each session. The difference between the attacks lies mainly in the header, so we processed each packet to a uniform length of PACKET_LEN bytes; that is, if the packet length was greater than PACKET_LEN, the excess bytes were truncated, and if the packet length was less than PACKET_LEN, the packet was padded with -1. Each session is then converted into a matrix of MAX_PACKET_NUM_PER_SESSION * PACKET_LEN. According to the results of our experiment, we finally chose MAX_PACKET_NUM_PER_SESSION = 8 and PACKET_LEN = 40.
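Step 5 can be sketched directly from the description above (illustrative code, not the authors' implementation; how sessions with fewer than the maximum number of packets are padded is our assumption):

```python
PACKET_LEN = 40                     # bytes kept per packet
MAX_PACKET_NUM_PER_SESSION = 8      # packets kept per session

def packet_to_row(pkt_bytes):
    """Truncate a packet to PACKET_LEN bytes, or pad it with -1."""
    row = list(pkt_bytes[:PACKET_LEN])
    row += [-1] * (PACKET_LEN - len(row))
    return row

def session_to_matrix(packets):
    """Build the MAX_PACKET_NUM_PER_SESSION x PACKET_LEN session matrix;
    missing packets become all-padding rows (an assumption)."""
    rows = [packet_to_row(p) for p in packets[:MAX_PACKET_NUM_PER_SESSION]]
    while len(rows) < MAX_PACKET_NUM_PER_SESSION:
        rows.append([-1] * PACKET_LEN)
    return rows

m = session_to_matrix([b"\x45\x00" * 30, b"\x45"])  # one long, one short packet
assert len(m) == 8 and all(len(r) == 40 for r in m)
assert m[1][1] == -1  # the short packet is padded with -1
```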

Step 6 (one_hot encoding). So that the model can learn and classify effectively, the data from Step 5 are processed by one_hot encoding to convert qualitative features into quantitative features.

Table 1: CICIDS2017 dataset.

Date        Type                                       Size
Monday      Normal                                     11 GB
Tuesday     Normal + Brute Force + SFTP + SSH          11 GB
Wednesday   Normal + DoS + Heartbleed Attacks          13 GB
Thursday    Normal + XSS + Web Attack + Infiltration   7.8 GB
Friday      Normal + Botnet + PortScan + DDoS          8.3 GB

Figure 1: Data preprocessing process (pcap files → Step 1: time division → Step 2: traffic segmentation → Step 3: generate PKL files → Step 4: label PKL files → Step 5: unify the length and generate the matrix → Step 6: one_hot encoding → input into the model).

Table 2: Distribution of network traffic periods in CICIDS2017 dataset.

Attack         Time                      Size
Normal         Monday                    11 GB
FTP-Patator    Tuesday (9:20-10:20)      12 MB
SSH-Patator    Tuesday (14:00-15:00)     25 MB
DoS            Wednesday (9:47-11:23)    2.3 GB
Heartbleed     Wednesday (15:12-15:32)   79 MB
Infiltration   Thursday (14:19-15:45)    21 MB
PortScan       Friday (13:55-14:35)      31 MB


$$\mathrm{ohe}_i(B, \mathrm{num}) = \begin{cases} \mathrm{num}, & B = i, \\ 0, & B \ne i, \end{cases}$$

$$\mathrm{OHE}_j = \left(\mathrm{ohe}_{j1}, \ldots, \mathrm{ohe}_{j(\mathrm{Length})}\right) = \mathrm{ohe}_1 \oplus \cdots \oplus \mathrm{ohe}_{\mathrm{Length}},$$

$$\mathrm{Input}_k = \begin{pmatrix} \mathrm{OHE}_1 \\ \mathrm{OHE}_2 \\ \vdots \\ \mathrm{OHE}_N \end{pmatrix}. \quad (2)$$

where B is a byte in the data packet and num is the number used for encoding in one_hot encoding (in the model implementation, num = 1); ohe_i is one bit in the one_hot encoding of a byte, ⊕ is the series (concatenation) notation, and OHE_j is the one_hot encoding of a byte.
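Formula (2) can be sketched directly in code (how the -1 padding byte is encoded is not specified in the paper; mapping it to an extra 257th slot is our assumption):

```python
NUM_SYMBOLS = 257  # byte values 0..255 plus one slot for the -1 padding

def ohe(b, num=1):
    """ohe(B, num): one_hot encoding of a single byte, as in formula (2)."""
    vec = [0] * NUM_SYMBOLS
    vec[b if b >= 0 else 256] = num  # assumption: -1 pad uses the last slot
    return vec

def encode_packet(row):
    """OHE: concatenate (the series operation ⊕) the per-byte encodings."""
    out = []
    for b in row:
        out.extend(ohe(b))
    return out

v = ohe(0x45)
assert sum(v) == 1 and v[0x45] == 1
assert len(encode_packet([0x45, -1, 7])) == 3 * NUM_SYMBOLS
```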

4. DL-IDS Architecture

This section introduces the traffic classifier we built into DL-IDS, which uses a combination of a CNN and an LSTM to learn and classify traffic packets in both time and space. According to the different characteristics of the different types of traffic, we also used class weights to improve the stability of the model.

The overall architecture of the classifier is shown in Figure 4. The classifier is composed of a CNN and an LSTM. The CNN section is composed of an input and embedding layer, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and fully connected layer 5. Upon receiving a preprocessed PKL file, the CNN section processes it and returns a high-dimensional package vector to the LSTM section. The LSTM section is composed of LSTM layer 6, LSTM layer 7, fully connected layer 8, and the output layer. It processes a series of high-dimensional package vectors and outputs a vector representing the probability that the session belongs to each class. The Softmax layer outputs the final classification result according to the vector of probabilities.

4.1. CNN in DL-IDS. We converted the data packets obtained from the preprocessed data into a traffic image. The so-called traffic image is a combination of all or part of the byte values of a network traffic packet into a two-dimensional matrix. The data in a network traffic packet is composed of bytes with values in the range 0-255, which is the same as the value range of bytes in images. We took x bytes from the header of a packet and y bytes from the payload and composed them into a traffic image for subsequent processing, as discussed below.

As mentioned above, the CNN section is composed of input and embedding layers, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and fully connected layer 5. We combined convolution layer 1 and pooling layer 2 into Combination I, and convolution layer 3 and pooling layer 4 into Combination II. Each combination analyzes the input layer characteristics from a different perspective. In Combination I, a convolution layer with a small convolution kernel is used to extract local features of traffic image details (e.g., IP and port); clear-cut features and stable results can be obtained in the pooling layer. In Combination II, a large convolution kernel is used to analyze the relationship between two bytes that are far apart, such as information in the traffic payload.

After preprocessing and one_hot coding, the network traffic constitutes the input vector of the input layer. In the input layer, Length bytes of information are intercepted from the ith packet, Pkg_i = (B_1, B_2, ..., B_Length), and n Pkgs are then synthesized into the information set S = (Pkg_1, Pkg_2, ..., Pkg_n).

Formula (3) describes a convolution layer, where f is the side length of the convolution kernel (in the two convolution layers, f = 7 and f = 5), s is the stride, p is the padding, b is the bias, w is the weight, c is the number of channels, l is the layer index, L_l is the size of Z^l, and Z(i, j) is the pixel of the corresponding feature map. Additionally, (i, j) ∈ {0, 1, 2, ..., L_{l+1}}, where L_{l+1} = (L_l + 2p - f)/s + 1.
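The feature-map size relation L_{l+1} = (L_l + 2p - f)/s + 1 can be checked numerically (a sketch; the zero padding and unit stride below are illustrative defaults, not the paper's settings, and the pooling layers between the convolutions are omitted):

```python
def conv_output_size(L, f, p=0, s=1):
    """Side length of the feature map after a convolution:
    L_{l+1} = (L_l + 2p - f) / s + 1."""
    return (L + 2 * p - f) // s + 1

# A 40-wide input through the paper's two kernel sizes, f = 7 then f = 5:
L1 = conv_output_size(40, 7)   # 34
L2 = conv_output_size(L1, 5)   # 30
assert (L1, L2) == (34, 30)
```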

Figure 3: Structure of the entire PKL file. A PKL file is a format for storing files in Python for direct reading by code; it contains sessions (packages consisting of two-way flows), which contain flows (packages with the same five-tuple), which contain packets (the data units of TCP/IP protocol communication and transmission).

Figure 2: Traffic segmentation process (pcap files from Step 1 → split traffic into flows → merge the flows → output).


$$Z^{l+1}(i, j) = \sum_{k=1}^{c} \sum_{x=1}^{f} \sum_{y=1}^{f} \left[ Z_k^l(s \cdot i + x,\; s \cdot j + y) \cdot w_k^{l+1}(x, y) \right] + b. \quad (3)$$
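Formula (3) can be evaluated directly in a few lines of numpy (a naive reference implementation for clarity, not an efficient one, and with zero padding assumed):

```python
import numpy as np

def conv2d(Z, w, b=0.0, s=1):
    """Naive evaluation of formula (3). Z: (c, H, W) input feature maps,
    w: (c, f, f) kernel, b: bias, s: stride; returns the next feature map."""
    c, H, W = Z.shape
    f = w.shape[1]
    out_h = (H - f) // s + 1   # L_{l+1} with p = 0
    out_w = (W - f) // s + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Sum over channels k and kernel offsets (x, y), then add bias b.
            patch = Z[:, s * i: s * i + f, s * j: s * j + f]
            out[i, j] = np.sum(patch * w) + b
    return out

Z = np.ones((1, 3, 3))
w = np.ones((1, 2, 2))
assert np.allclose(conv2d(Z, w), 4.0)  # every 2x2 window of ones sums to 4
```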

The convolution layer contains an activation function (formula (4)) that assists in the expression of complex features, where the characteristic graph has K channels and A represents the output of the Z vector through the activation function. We used sigmoid and ReLU, respectively, after the two convolution layers:

$$A^l_{ijk} = f\left(Z^l_{ijk}\right). \quad (4)$$

After feature extraction in the convolution layer, the output image is transferred to the pooling layer for feature selection and information filtering. The pooling layer contains a preset pooling function that replaces the result of a single point in the feature map with the feature graph statistics of its adjacent region. The pooling layer is calculated by formula (5), where p is a prespecified parameter. We applied maximum pooling in this study, that is, p → ∞:

$$A^l_k(i, j) = \left[ \sum_{x=1}^{f} \sum_{y=1}^{f} A^l_k(s \cdot i + x,\; s \cdot j + y)^p \right]^{1/p}. \quad (5)$$
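Formula (5) with p → ∞ reduces to max pooling, which a short numpy sketch makes concrete (illustrative only; the default of non-overlapping windows is our assumption):

```python
import numpy as np

def lp_pool(A, f, p, s=None):
    """Formula (5): replace each f x f window of A with its Lp statistic."""
    s = s or f  # assumption: non-overlapping windows when stride is omitted
    out_h = (A.shape[0] - f) // s + 1
    out_w = (A.shape[1] - f) // s + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            win = A[s * i: s * i + f, s * j: s * j + f]
            out[i, j] = np.sum(win ** p) ** (1.0 / p)
    return out

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# As p grows, the Lp statistic converges to the window maximum (max pooling):
assert abs(lp_pool(A, f=2, p=64)[0, 0] - A.max()) < 1e-6
assert lp_pool(A, f=2, p=1)[0, 0] == A.sum()  # p = 1 is just the window sum
```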

We also used a back-propagation algorithm to adjust the model parameters. In the weight adjustment rule (formula (6)), δ is the delta error of the loss function with respect to the layer and α is the learning rate:

$$w^l = w^l - \alpha \sum \delta^l \cdot A^{l-1}. \quad (6)$$

We used the categorical cross-entropy loss function. To reduce training time and enhance gradient descent accuracy, we used the RMSProp optimization function to adjust the learning rate.

After two convolution and pooling operations, we extracted the entire traffic image into a smaller feature block, which represents the feature information of the whole traffic packet. The block can then be fed into the RNN system as an input to the RNN layer.

4.2. LSTM in DL-IDS. Normal network communications and network attacks are both carried out according to a certain network protocol. This means that attack packets must be ensconced in traffic alongside packets containing fixed parts of the network protocol, such as normal connection establishments, key exchanges, connections, and disconnections. In the normal portion of the attack traffic, no data can be used to determine whether the packet is intended to cause an attack. Using a CNN alone to train on the characteristics of a single packet as the basis for judging the nature of the traffic makes the data difficult to mark, leaves too much "dirty" data in the traffic, and produces altogether poor training results. In this study, we remedied this by introducing the LSTM, which takes the data of a single connection (from initiation to disconnection) as a group and judges the characteristics of all data packets in that group, and the relations among them, as a basis for judging the nature of the traffic. The natural language processing model performs well in traffic information processing [32] under a methodology similar to the grouping judgment method proposed here.

The LSTM section is composed of LSTM layer 6, LSTM layer 7, fully connected layer 8, and the Softmax and output layers. The main functions are realized by the two LSTM layers. The LSTM is a special RNN designed to resolve the gradient vanishing and gradient explosion problems in the process of long-sequence training. General RNN networks only have one tanh layer, while LSTM networks perform better in timing prediction through their unique forgetting and selective memory gates. Here we call the LSTM node a cell (C_t), the input and output of which are x_t and h_t, respectively.

The first step in the LSTM layer is to determine what information the model will discard from the cell state. This decision is made through the forgetting gate (formula (7)). The gate reads h_{t-1} and x_t and outputs a value between 0 and 1 for each number in the cell state C_{t-1}: 1 means "completely retained" and 0 means "completely discarded." W and b are the weights and biases in the neural network, respectively:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right). \quad (7)$$

Figure 4: Architecture of DL-IDS (Input → Convolution C1 → MaxPool S2 → Convolution C3 → MaxPool S4 → GlobalMaxPool S5 → LSTM L6 → LSTM L7 → full connection F8 → Softmax → Output).

6 Security and Communication Networks

The next step is to decide how much new information to add to the cell state. First, a sigmoid layer determines which information needs to be updated (formula (8)). A tanh layer generates a vector as a candidate for updating (formula (9)). The two parts are then combined to update the state of the cell (formula (10)):

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$  (8)

$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$  (9)

$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$  (10)

The output gate determines the output of the cell. First, a sigmoid layer determines which parts of the cell state are exported (formula (11)). Next, the cell state is processed through a tanh function to obtain a value between −1 and 1 and then multiplied by the output of the sigmoid gate. The output is determined accordingly (formula (12)):

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$  (11)

$h_t = o_t * \tanh(C_t)$  (12)
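For illustration, the gate equations can be condensed into a single cell-update step. The sketch below uses scalar inputs and hypothetical weight/bias containers (`W`, `b`) purely to make the data flow of formulas (7)–(12) concrete; it is not the trained DL-IDS network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step for scalar inputs, following formulas (7)-(12).

    W and b hold the forget (f), input (i), candidate (c), and output (o)
    gate parameters; each gate sees the concatenation [h_{t-1}, x_t].
    """
    z = (h_prev, x_t)  # concatenated input to every gate

    def gate(name, squash):
        wh, wx = W[name]
        return squash(wh * z[0] + wx * z[1] + b[name])

    f_t = gate("f", sigmoid)            # formula (7): forget gate
    i_t = gate("i", sigmoid)            # formula (8): input gate
    c_tilde = gate("c", math.tanh)      # formula (9): candidate state
    c_t = f_t * c_prev + i_t * c_tilde  # formula (10): new cell state
    o_t = gate("o", sigmoid)            # formula (11): output gate
    h_t = o_t * math.tanh(c_t)          # formula (12): new hidden state
    return h_t, c_t
```

With all weights and biases at zero, every sigmoid gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step, which makes the interplay of formulas (10) and (12) easy to trace by hand.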

In the proposed model, the feature maps of the n data packets in a group of traffic images in a connection serve as the input of the LSTM section. The feature relations between these n data packets are analyzed through the two LSTM layers. The first few packets may be used to establish connections; such packets may exist in normal data streams, but they may occur in attack data streams too. The next few packets may contain long payloads as well as attack data. The LSTM finds the groups containing attack data and marks all packets of those whole groups as attack groups.

LSTM layer 6 in DL-IDS has a linear activation function designed to minimize the training time; LSTM layer 7 is nonlinearly activated through the ReLU function. The flow comprises a multiclassification system, so the model is trained to minimize multiclass cross-entropy. We did not update the class weights at every step but instead only set the initial weights according to the volume of each type of data.
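Putting the two sections together, a model of this shape can be sketched in Keras (the framework listed in Table 4). The 85 LSTM units, the linear and ReLU activations, the softmax output, the multiclass cross-entropy loss, the 8-packet flows, the 40-byte packets, and the 7 classes come from the text; the convolution filter counts and kernel sizes are our assumptions, as the paper does not list them here.

```python
# Hypothetical sketch of the DL-IDS layout; conv filter counts and
# kernel sizes are illustrative guesses, not the authors' values.
from tensorflow.keras import layers, models

N_PACKETS = 8   # packets per flow (Section 5.5)
PKT_LEN = 40    # bytes kept per packet (Section 5.4)
N_CLASSES = 7   # Normal + six attack types (Table 3)

def build_dl_ids():
    inp = layers.Input(shape=(N_PACKETS, PKT_LEN, 1))
    # CNN section, applied independently to each packet in the flow
    x = layers.TimeDistributed(layers.Conv1D(32, 3, activation="relu"))(inp)  # C1
    x = layers.TimeDistributed(layers.MaxPooling1D(2))(x)                     # S2
    x = layers.TimeDistributed(layers.Conv1D(64, 3, activation="relu"))(x)    # C3
    x = layers.TimeDistributed(layers.MaxPooling1D(2))(x)                     # S4
    x = layers.TimeDistributed(layers.GlobalMaxPooling1D())(x)                # S5
    # LSTM section: linear activation (L6), then ReLU (L7)
    x = layers.LSTM(85, activation="linear", return_sequences=True)(x)        # L6
    x = layers.LSTM(85, activation="relu")(x)                                 # L7
    out = layers.Dense(N_CLASSES, activation="softmax")(x)                    # F8 + Softmax
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The `TimeDistributed` wrapper applies the same CNN to every packet in a flow, so the LSTM layers receive one per-packet feature vector per time step, matching the grouping judgment described above.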

4.3. Weight of Classes. The data obtained after preprocessing are shown in Table 3, where the quantities ("Num") of the different data types are clearly uneven. The number of type 0 is the highest, while those of types 2 and 4 are the lowest. This may affect the final learning outcome of the classification: for example, if the machine were to judge all the traffic as type 0, the accuracy of the model would still seem relatively high. We introduced class weights to resolve this problem; classes with different sample numbers in the classification were given different weights. class_weight is set according to the number of samples, and class_weight[i] is used instead of 1 to punish the errors in the class i samples. A higher class_weight means a greater emphasis on the class. Compared with the case where weights are not considered, more samples are classified into high-weight classes.

The class weight is calculated via formula (13), where $w_i$ represents the class weight of class i and $n_i$ represents the amount of traffic of class i:

$w_i = \dfrac{\sum_{i=0}^{K-1} n_i}{n_i}$  (13)

When training the model, the weighted loss function in formula (14) makes the model focus more on samples from underrepresented classes. K is the number of categories, y is the label (if the sample category is i, then $y_i = 1$; otherwise $y_i = 0$), and p is the output of the neural network, which is the probability that the model predicts category i, calculated by Softmax in this model. Loss function J is defined as follows:

$J = -\sum_{i=0}^{K-1} w_i y_i \log(p_i)$  (14)
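As a worked example, the weights implied by formula (13) can be computed directly from the Table 3 counts, and the weighted loss of formula (14) evaluated for a single sample. This is a plain-Python sketch; in the paper the weights are applied through the class_weight setting during training.

```python
import math

# Traffic counts per class, from Table 3
counts = {0: 477584, 1: 11870, 2: 7428, 3: 63240,
          4: 4755, 5: 64558, 6: 160002}

def class_weights(counts):
    """Formula (13): w_i = (total samples) / n_i."""
    total = sum(counts.values())
    return {i: total / n for i, n in counts.items()}

def weighted_cross_entropy(y_true, p, weights):
    """Formula (14): J = -sum_i w_i * y_i * log(p_i), with one-hot y."""
    return -sum(weights[i] * y_true[i] * math.log(p[i])
                for i in range(len(p)) if y_true[i])

w = class_weights(counts)
# The rarest class (Heartbleed, label 4) receives the largest weight,
# so misclassifying it is punished hardest.
```

Under these weights, a wrong prediction on a Heartbleed sample contributes roughly one hundred times more loss than the same mistake on a Normal sample, which is exactly the rebalancing effect described above.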

5. Experimental Results and Analysis

We evaluated the performance of the proposed model on the CICIDS2017 dataset across a series of selected parameters: (1) the impact of the length of data packets involved in training, (2) the influence of the number of packets in each flow, (3) the impact of the selected batch size, (4) the effect of the number of units in the LSTM, and (5) the influence of the weight of classes. We optimized the DL-IDS parameters accordingly and then compared the model against a sole CNN and a sole LSTM. The ratio of Train set, Validation set, and Test set is 18 : 1 : 1.

5.1. Metrics. We adopted four commonly used parameters for evaluating intrusion detection systems: accuracy (ACC), true positive rate (TPR), false positive rate (FPR), and F1-score. ACC represents the overall performance of the model. TPR represents the ratio of correctly identified positive samples to all positive samples, and FPR represents the ratio of negative samples wrongly assigned to the positive class to the total number of negative samples. Recall is the number of True Positives divided by the sum of the True Positives and the False Negatives. Precision is the number of correct positive predictions divided by the total number of samples predicted as positive. F1-score is a score of a classifier's accuracy and is defined as the weighted harmonic mean of the Precision and Recall measures of the classifier.

Table 3: Quantity of data per category in CICIDS2017.

Label  Attack        Num
0      Normal        477584
1      FTP-Patator   11870
2      SSH-Patator   7428
3      DoS           63240
4      Heartbleed    4755
5      Infiltration  64558
6      PortScan      160002


$\mathrm{ACC} = \dfrac{TP + TN}{TP + FP + FN + TN} \times 100\%$

$\mathrm{TPR} = \dfrac{TP}{TP + FN} \times 100\%$

$\mathrm{FPR} = \dfrac{FP}{FP + TN} \times 100\%$

$\mathrm{Precision} = \dfrac{TP}{TP + FP} \times 100\%$

$\mathrm{Recall} = \dfrac{TP}{TP + FN} \times 100\%$

$\mathrm{F1\text{-}score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100\%$  (15)

For each type of attack, TP is the number of samples correctly classified as this type, TN is the number of samples correctly classified as not this type, FP is the number of samples incorrectly classified as this type, and FN is the number of samples incorrectly classified as not this type. The definitions of TP, TN, FP, and FN are given in Figure 5.
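These one-vs-rest definitions translate directly into code. The helpers below compute the per-class counts and the formula (15) metrics from them; this is a sketch, and the guard clauses for empty denominators are our addition.

```python
def one_vs_rest_counts(y_true, y_pred, cls):
    """TP/TN/FP/FN for one class, treating every other label as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p != cls)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """ACC, TPR, FPR, and F1-score as percentages, per formula (15)."""
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    tpr = 100.0 * tp / (tp + fn) if tp + fn else 0.0   # equals Recall
    fpr = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)) if precision + tpr else 0.0
    return acc, tpr, fpr, f1
```

Running this over each of the seven labels in turn reproduces the per-class rows reported in the result tables of this section.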

5.2. Experimental Environment. The experimental configuration we used to evaluate the model parameters is described in Table 4.

5.3. LSTM Unit Quantity Effects on Model Performance. The number of units in the LSTM represents the model's output dimension. In our experiments, we found that model performance first improves and then begins to decline as the number of LSTM units increases. We ultimately selected 85 as the optimal number of LSTM units.

5.4. Training Packet Length Effects on Model Performance. Figure 6 shows the changes in ACC, TPR, and FPR as the length of packets extracted during training increases. As per the training results, model performance declines significantly when the packet length exceeds 70. It is possible that excessively long training data packets increase the proportion of data packets shorter than the chosen length, leading to an increase in the proportion of units holding the padding value of −1 and thus reducing the accuracy of the model. However, the data packet must exceed a certain length to ensure that effective, credible, and scientific content is put into training. This also prevents overfitting effects and provides more accurate classification of data packets with partially similar headers. We found that a length of 40 is optimal.
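The truncation and −1 filling discussed above can be illustrated with a small helper that forces every packet to the chosen fixed length. The scaling of byte values to [0, 1] is our assumption; the text only specifies the −1 padding value.

```python
def packet_to_vector(payload: bytes, length: int = 40):
    """Truncate or pad one packet to a fixed-length vector.

    Bytes are scaled to [0, 1]; positions past the end of a short
    packet are filled with -1, matching the padding value discussed
    in the text.  (The exact byte normalization is an assumption.)
    """
    vec = [b / 255.0 for b in payload[:length]]
    vec.extend([-1.0] * (length - len(vec)))
    return vec
```

A very large `length` makes most of each vector consist of −1 fill for short packets, which matches the accuracy drop observed beyond a packet length of 70.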

With the packet length set to 40, the efficiency and performance of the DL-IDS intrusion detection system in identifying various kinds of traffic are shown in Table 5.

5.5. Per-Flow Packet Quantity Effects on Model Performance. As the number of data packets in each flow involved in the training process increases, the features extracted by the model become more obvious and the recognition accuracy

Figure 5: TP, TN, FP, and FN in multiple classifications (a confusion matrix of actual versus predicted labels, with the true negative, false negative, false positive, and true positive regions marked).

Table 4: Experimental environment.

OS        CentOS Linux release 7.5.1804
CPU       Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz
RAM       126 GB
Anaconda  4.5.11
Keras     2.2.2
Python    3.6.5

Figure 6: Impact of training packet length on model performance (ACC, TPR, and FPR rates for packet lengths from 30 to 100).


of the model is enhanced. If this number is too high, however, the proportion of filler data packets increases, which hurts the model's ability to extract features. Figure 7 shows the impact of the number of packets per flow on model performance. We found that when the number of packets in each flow exceeds 8, the performance of the model declines significantly. We therefore chose 8 as the optimal per-flow packet quantity.

5.6. Batch Size Effects on Model Performance. Batch size is an important parameter in the model training process. Within a reasonable range, increasing the batch size can improve memory utilization and speed up data processing. If increased to an inappropriate extent, however, it can significantly slow down the process. As shown in Figure 8, we found that a batch size of 20 is optimal.

5.7. Class Weight Effects on Model Performance. Table 6 shows a comparison of two groups of experimental results, with and without class weights. Introducing the class weights does appear to reduce the impact of the imbalance in the quantity of each data type in the CICIDS2017 dataset on model performance.

5.8. Model Evaluation. The LSTM unit can effectively extract the temporal relationship between packets. Table 7 shows a comparison of accuracy between the DL-IDS model and models with the CNN or LSTM alone. The LSTM unit appears to effectively improve the identification efficiency of SSH-Patator, Infiltration, PortScan, and other attack traffic for enhanced model performance, possibly due to the apparent timing of these attacks. Compared to the LSTM model alone, however, adding a CNN further improves the identification efficiency of most attack traffic. As shown in Table 7, the proposed DL-IDS intrusion detection model has a very low false alarm rate and can classify network traffic more accurately than the CNN or LSTM alone.

Table 8 shows a comparison of the models using CNN and LSTM with traditional machine learning algorithms [33]. The DL-IDS model achieves the best performance among them, with the largest ACC value and the lowest FPR value.

The data input to DL-IDS is raw network traffic. There is no separate feature extraction step in the model, so the training and testing times include feature extraction time. The traditional machine learning algorithms do not account for data extraction or processing time, so we could not directly compare the time consumption of the various algorithms in Table 8. The training time and testing time of the model were under 600 s and 100 s, respectively, so we believe that DL-IDS achieves optimal detection effects in the same time frame as the traditional algorithms.

Table 5: Model performance with training packet length of 40.

Label         ACC (%)  TPR (%)  FPR (%)  F1-score (%)
Normal        99.54    99.52    0.00     99.61
FTP-Patator   99.62    92.29    0.27     91.45
SSH-Patator   99.63    87.04    0.00     86.87
DoS           99.61    98.25    0.00     97.87
Heartbleed    99.66    81.12    0.00     81.20
Infiltration  99.55    98.07    0.18     97.54
PortScan      99.54    98.42    0.00     98.71
Overall       98.67    97.21    0.47     93.32

Figure 7: Impact of per-flow packet quantity on model performance (ACC, TPR, and FPR rates for 6 to 16 packets per flow).

Figure 8: Impact of batch size on model performance (ACC, TPR, and FPR rates for batch sizes from 0 to 40).

Table 6: Effects of applying class weights on model performance.

     Without class weights (%)  With class weights (%)
ACC  97.16                      98.58
TPR  93.33                      97.04
FPR  0.38                       0.52


6. Conclusions and Future Research Directions

In this study, we proposed a DL-based intrusion detection system named DL-IDS, which utilizes a hybrid of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features from the network data flow and analyze the network traffic. In DL-IDS, the CNN and LSTM, respectively, extract the spatial features of a single packet and the temporal features of the data stream and then fuse them, which improves the performance of the intrusion detection system. Moreover, DL-IDS uses category weights for optimization in the training phase. This optimization method reduces the adverse effect of the unbalanced number of samples of each attack type in the Train set and improves the robustness of the model.

To evaluate the proposed system, we experimented on the CICIDS2017 dataset, which is often used by researchers as a benchmark. Normal traffic data and attack data of six typical types (FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan) were selected to test the ability of DL-IDS to detect attack data. Besides, we also used the same data to test the CNN-only model, the LSTM-only model, and some commonly used machine learning models.

The results show that DL-IDS reached 98.67% and 93.32% in overall accuracy and F1-score, respectively, outperforming all of the machine learning models. Also, compared with the CNN-only model and the LSTM-only model, DL-IDS reached over 99.50% in the accuracy of all attack types and achieved the best performance among these three models.

There are yet certain drawbacks to the proposed model, including low detection accuracy on Heartbleed and SSH-Patator attacks due to a lack of data. Generative Adversarial Networks (GANs) may be considered to overcome this drawback to some degree. Further, combining the model with some traditional traffic features may enhance overall performance. We plan to resolve these problems through further research.

Data Availability

Data will be made available upon request

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Beijing Natural Science Foundation under Grant no. 4194086, the National Natural Science Foundation of China under Grant no. 61702043, and the Key R&D Program of Hebei Province under Grant no. 201313701D.

References

[1] M. M. Hassan, A. Gumaei, S. Huda, and A. Ahmad, "Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, 2020.

[2] F. A. Khan and A. Gumaei, "A comparative study of machine learning classifiers for network intrusion detection," in Proceedings of the International Conference on Artificial Intelligence and Security, pp. 75–86, New York, NY, USA, July 2019.

[3] M. Alqahtani, A. Gumaei, M. Mathkour, and M. Maher Ben Ismail, "A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks," Sensors, vol. 19, no. 20, p. 4383, 2019.

[4] A. Derhab, M. Guerroumi, A. Gumaei et al., "Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security," Sensors, vol. 19, no. 14, p. 3119, 2019.

[5] T. Yaqoob, H. Abbas, and M. Atiquzzaman, "Security vulnerabilities, attacks, countermeasures, and regulations of networked medical devices: a review," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3723–3768, 2019.

[6] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, "Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch," Information Fusion, vol. 51, pp. 100–113, 2019.

[7] Z. A. Baig, S. Sanguanpong, S. N. Firdous et al., "Averaged dependence estimators for DoS attack detection in IoT networks," Future Generation Computer Systems, vol. 102, pp. 198–209, 2020.

[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48–60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709–716, 2018.

[10] H. Xu, F. Mueller, M. Acharya et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26–35, 2018.

[12] H. Xu, C. Fang, Q. Cao et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166–170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108–118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," in Recent Developments in Intelligent Computing, Communication and Devices, pp. 559–565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21–26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670–2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21–26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the Symposium Series on Computational Intelligence (SSCI), IEEE, Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86–94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69–79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386–396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V. Ramos and A. Abraham, "ANTIDS: self organized ant based clustering model for intrusion detection system," in Proceedings of the Fourth IEEE International Workshop on Soft Computing as Transdisciplinary Science and Technology (WSTST'05), Springer, Muroran, Japan, May 2005.

[30] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, January 2018.

[31] X. Chen, "A simple utility to classify packets into flows," https://github.com/caesar0301/pkt2flow.

[32] B. J. Radford and B. D. Richardson, "Sequence aggregation rules for anomaly detection in computer network traffic," 2018, https://arxiv.org/abs/1805.03735v2.

[33] A. Ahmim, M. A. Ferrag, L. Maglaras, M. Derdour, and H. Janicke, "A detailed analysis of using supervised machine learning for intrusion detection," 2019, https://www.researchgate.net/publication/331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection.



commands. The current mainstream methods are intrusion detection systems based on machine learning (ML) or deep learning (DL). Among them, ML-based systems mainly classify and detect network traffic by analyzing manually extracted features of the traffic, while DL-based systems can not only analyze manually extracted features but also automatically extract features from the original traffic. Therefore, DL-based systems can circumvent the manual feature extraction problem and enhance detection accuracy compared to general ML-based systems.

To achieve higher accuracy, DL-based intrusion detection methods require a large amount of data for training, especially different types of attack traffic data. In the actual environment and in the existing datasets (KDD99, NSL-KDD, and CICIDS2017), attack traffic is always scarce compared with normal traffic. Moreover, because some types of attack traffic are difficult to capture and simulate, the amount of data available for model training is particularly small. These problems greatly restrict the accuracy of DL-based methods, making it difficult to judge certain types of attacks.

1.2. Key Contributions. This paper proposes a DL-based intrusion detection system, DL-IDS, which uses a hybrid network of a Convolutional Neural Network (CNN) and a Long Short-Term Memory Network (LSTM) to extract the temporal and spatial features of network traffic data and improve the accuracy of intrusion detection. In the model training phase, DL-IDS uses category weights to optimize the model. This method reduces the effect of the unbalanced number of samples of several attack types in the model training samples on model performance and improves the robustness of training and prediction. Finally, we test DL-IDS on classifying multiple types of network traffic on the CICIDS2017 dataset and compare it with the CNN-only model, the LSTM-only model, and other machine learning models, because CICIDS2017 is a recent original network traffic dataset simulating real situations. The results show that DL-IDS reached 98.67% in overall accuracy, and the accuracy of each attack type was above 99.50%, the best results among all models.

1.3. Paper Organization. The remainder of this paper is organized as follows. Section 2 discusses the classification of abnormal traffic as per previously published studies. Section 3 describes in detail the datasets and data preprocessing methods we used in this study. The classifier structure and classification methods used for traffic classification under the proposed model are described in Section 4. Section 5 presents the results of our model evaluation with various hyperparameters. Section 6 summarizes the paper and discusses potential future development trends.

2. Related Work

With the continual expansion of the Internet, network security has become a problem that cannot be ignored. Malicious network behaviors, such as DDoS and brute force attacks, tend to be "mixed into" normal traffic. Security researchers seek to effectively analyze the malicious traffic in a given network so as to identify potential attacks and quickly stop them [5–8].

2.1. Traditional Intrusion Detection Systems. Traditional methods of intrusion detection mainly include statistical analysis methods [9], threshold analysis methods [10], and signature analysis methods [11]. These methods do reveal malicious traffic behavior; however, they require security researchers to encode their personal experience into rules and parameters, which is very inefficient. Said "experience" is also only a summary of the malicious traffic behavior found in the past and is typically difficult to quantify, so these methods cannot be readily adapted to the huge amount of network data and volatile network attacks of today's Internet.

2.2. Intrusion Detection Systems Based on ML. Advancements in machine learning have produced models that effectively classify and cluster traffic for the purposes of network security. Early researchers applied simple machine learning algorithms from classification-clustering problems in other fields, such as the k-Nearest Neighbor (KNN) [12], support vector machine (SVM) [13], and self-organizing map (SOM) [14] algorithms, with good results on the KDD99, NSL-KDD, DARPA, and other datasets. These datasets are out of date, unfortunately, and contain not only normal data but also attack data that are overly simple. It is difficult to use these datasets to simulate today's highly complex network environments. It is also difficult to achieve the expected effect using these algorithms to analyze malicious traffic in a relatively new dataset, as evidenced by our work in this study.

2.3. Intrusion Detection Systems Based on Deep Neural Networks. The success of machine learning algorithms generally depends on data representation [15]. Representation learning, also called feature learning, is a technique in deep neural networks that can be used to learn the explanatory factors of variation behind the data. Ma et al. combined spectral clustering and deep neural network algorithms to detect intrusion behaviors [16]. Niyaz et al. used deep belief networks to develop an efficient and flexible intrusion detection system [17]. But these research methods construct their models to learn representations from manually designed traffic features, not taking full advantage of the ability of deep neural networks. Eesa et al. showed that a higher detection rate and accuracy rate with a lower false alarm rate can be obtained by using an improved traffic feature set [18]. Learning features directly from raw traffic data should be feasible, as in the fields of computer vision and natural language processing [19].

The two most widely used deep neural network models are the CNN and the RNN. The CNN uses original data as the direct input to the network, does not necessitate feature extraction or image reconstruction, has relatively few parameters, and requires relatively little data to process. CNNs have been proven to be highly effective in the field of image recognition


[20]. For certain network traffic of protocols, CNNs can perform well through fast training. Fan and Ling-zhi [21] extracted very accurate features by using a multilayer CNN wherein the convolution layer connects to the sampling layer below it; this model outperformed classical detection algorithms (e.g., SVM) on the KDD99 dataset. However, the CNN can only analyze a single input package; it cannot analyze timing information in a given traffic scenario. In reality, a single packet in an attack traffic scenario is normal data; only when a large number of packets are sent at the same time or within a short period does the traffic become malicious. The CNN does not apply in this situation, which in practice may lead to a large number of missed alerts.

The recurrent neural network (RNN) is also often used to analyze sequential information. LSTM, a branch of RNNs, performs well in sequence information analysis applications such as natural language processing. Kim et al. [22] compared the LSTM-RNN network against the Generalized Regression Neural Network (GRNN), Product-based Neural Network (PNN), k-Nearest Neighbor (KNN), SVM, Bayesian, and other algorithms on the KDD99 dataset, finding that it was superior in every aspect they tested. The LSTM network alone, however, centers on direct relationships between sequences rather than the analysis of a single packet, so it cannot readily replace the CNN in this regard.

Wu and Guo [23] proposed a hierarchical CNN+RNN neural network and carried out experiments on the NSL-KDD and UNSW-NB15 datasets. Hsu et al. [24] and Ahsan and Nygard [25] used another CNN+LSTM model to perform multiclassification experiments on the NSL-KDD dataset. Hassan et al. [26] proposed a hybrid deep learning model to detect network intrusions based on a CNN network and a weight-dropped long short-term memory (WDLSTM) network, mainly conducting experiments on the UNSW-NB15 dataset. However, these studies are still based on features extracted in advance.

Abdulhammed et al. [27] used Autoencoder and Principal Component Analysis to reduce the CICIDS2017 dataset's feature dimensions. The resulting low-dimensional features from both techniques were then used to build various classifiers to detect malicious attacks. Musafer et al. [28] proposed a novel IDS architecture based on an advanced sparse autoencoder and Random Forest to distinguish the patterns of normal packets from those of network attacks, with good results.

In this study, we adopted a malicious traffic analysis method based on the CNN and LSTM to extract and analyze network traffic information from a raw network dataset in both the spatial and temporal dimensions. We conducted training and testing on the CICIDS2017 dataset, which well simulates a real network environment. We ran a series of experiments to show that the proposed model facilitates very effective malicious flow analysis.

3. Datasets and Preprocessing

3.1. Dataset. The IDS is the most important defense tool against complex, large-scale network attacks, but the lack of available public datasets still hinders its further development. Many researchers have used private data from within a single company or conducted manual data collection to test IDS applications, which affects the credibility of their results to some extent. Public datasets such as KDD99 and NSL-KDD [29] are comprised of data encompassing manually selected stream characteristics rather than original network traffic. The timing of the data collection is also outdated relative to modern attack methods.

In this study, in an effort to best reflect traffic scenarios in real networks as well as newer means of attack, we chose the CICIDS2017 dataset (Canadian Institute for Cybersecurity) [30], which contains benign traffic and up-to-date common attack traffic representative of a real network environment. This dataset constructs the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and e-mail protocols to accurately simulate a real network environment. The data capture period was from 9 a.m. on July 3, 2017, to 5 p.m. on July 7, 2017; a total of 51.1 GB of data flow was generated over this five-day period. The attack traffic collected includes eight types of attack: FTP-Patator, SSH-Patator, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS. As shown in Table 1, the attacks were carried out on Tuesday, Wednesday, Thursday, and Friday, morning and afternoon. Normal traffic was generated throughout the day on Monday and during the nonattack periods from Tuesday to Friday. The dataset is provided as pcap files.

After acquiring the dataset, we analyzed the original data and selected seven types of data for subsequent assessment according to the amount of data and its noise rate. They are Normal, FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan.

3.2. Network Traffic Segmentation Method. The format of the CICIDS2017 dataset is one pcap file per day. These pcap files contain a great deal of information, which is not conducive to training the machine. Therefore, the primary task of traffic classification based on machine learning is to divide continuous pcap files into several discrete units according to a certain granularity. There are six ways to slice network traffic: by TCP, by connection, by network flow, by session, by service class, and by host. When the original traffic data is segmented according to different methods, it splits into quite different forms, so the selected network traffic segmentation method markedly influences the subsequent analysis.

We adopted a session sharding method in this study. A session consists of a bidirectional flow, that is, all packets that share the same 5-tuple (source IP, source port, destination IP, destination port, and transport layer protocol), with source and destination addresses and ports interchangeable.
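A session key of this kind can be sketched in a few lines of Python (a minimal illustration with names of our choosing, not the authors' code):

```python
def session_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Canonical key for a bidirectional session: packets sharing the
    same 5-tuple map to one key regardless of direction."""
    a = (src_ip, src_port)
    b = (dst_ip, dst_port)
    lo, hi = sorted([a, b])  # order the endpoints so both directions collide
    return (lo, hi, proto)
```

Packets from either direction of the same connection then map to the same session bucket.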

3.3. Data Preprocessing. Data preprocessing begins with the original flow, namely, the data in pcap format, to produce the model input data. The CICIDS2017 dataset provides an original dataset in pcap format and a CSV file detailing some of the traffic. To transform the original data into the model input format, we conducted time division, traffic segmentation, PKL file generation, PKL file labeling, matrix generation, and one_hot encoding. A flow chart of this process is given in Figure 1.

Security and Communication Networks 3

Step 1 (time division). Time division refers to intercepting the pcap file of the corresponding period from the original pcap file according to the attack time and type [30]. The input format is a pcap file, and the output format is still a pcap file. The time periods corresponding to each specific type and the size of each file are shown in Table 2.

Step 2 (traffic segmentation). Traffic segmentation refers to dividing the pcap file obtained in Step 1 into corresponding sessions by sharding according to the IPs of the attack host and victim host in each time period [31]. The specific process is shown in Figure 2. This step involves shredding the pcap file of Step 1 into the corresponding flows using pkt2flow [31], which splits a pcap package into flow format (i.e., the original pcap package is divided into different pcap packages according to flows with different five-tuples). Next, the pcap packages are merged under the premise that the source and destination are interchangeable. Finally, the pcap package of Step 1 is divided into sessions.

Step 3 (generate the PKL file) and Step 4 (tag the PKL file). As shown in Table 1, the pcap file is still large after extraction; this creates a serious challenge for data reading in the model. To accelerate the data reading process, we packaged the traffic using the pickle tool in Python. We use PortScan traffic as an example of the packaging process here. In this class, many sessions are generated after Step 2, each of which is saved in a pcap file. We generated a label of the corresponding attack type for each session. Each session contains several data flows, and each data flow contains n1 packets. We then saved n2 sessions in a PKL file to speed up the process of reading the data. n1 can be changed as needed; according to our experimental results, we finally selected the best value, n1 = 8. The value of n2 can be calculated by formula (1). In this case, we packaged each type of session into a PKL file. The structure of the entire PKL file is shown in Figure 3.

n_2 = \frac{\text{total number of packets of this type}}{n_1}.  (1)

Step 5 (matrix generation). The input of the model must have a fixed length, so the next step is to unify the length of each session. The difference between attacks lies mainly in the header, so we processed each packet to a uniform length of PACKET_LEN bytes: if the packet length was greater than PACKET_LEN, the excess bytes were truncated, and if the packet length was less than PACKET_LEN, the packet was padded with -1. Each session is then arranged into a matrix of MAX_PACKET_NUM_PER_SESSION * PACKET_LEN. According to the results of our experiments, we finally chose MAX_PACKET_NUM_PER_SESSION = 8 and PACKET_LEN = 40.
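A minimal sketch of the truncate-and-pad rule in Step 5, using the paper's values PACKET_LEN = 40 and MAX_PACKET_NUM_PER_SESSION = 8 (the function itself is our illustration):

```python
PACKET_LEN = 40                  # bytes kept per packet
MAX_PACKET_NUM_PER_SESSION = 8   # packets kept per session
PAD = -1                         # filler value for missing bytes/packets

def session_to_matrix(session):
    """Turn a session (list of packets, each a bytes object) into a
    fixed MAX_PACKET_NUM_PER_SESSION x PACKET_LEN matrix by truncating
    long packets and padding short ones with -1."""
    matrix = []
    for pkt in session[:MAX_PACKET_NUM_PER_SESSION]:
        row = list(pkt[:PACKET_LEN])             # truncate long packets
        row += [PAD] * (PACKET_LEN - len(row))   # pad short packets
        matrix.append(row)
    while len(matrix) < MAX_PACKET_NUM_PER_SESSION:
        matrix.append([PAD] * PACKET_LEN)        # pad short sessions
    return matrix
```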

Step 6 (one_hot encoding). To allow the model to learn and classify effectively, the data from Step 5 are processed by one_hot encoding to convert qualitative features into quantitative features.

Table 1: CICIDS2017 dataset.

Date      | Type                                       | Size
Monday    | Normal                                     | 11.0 GB
Tuesday   | Normal + Brute Force + SFTP + SSH          | 11.0 GB
Wednesday | Normal + DoS + Heartbleed Attacks          | 13 GB
Thursday  | Normal + XSS + Web Attack + Infiltration   | 7.8 GB
Friday    | Normal + Botnet + PortScan + DDoS          | 8.3 GB

Figure 1: Data preprocessing process (pcap files; Step 1: time division; Step 2: traffic segmentation; Step 3: generate PKL files; Step 4: label PKL files; Step 5: unify the length and generate the matrix; Step 6: one_hot encoding; input into the model).

Table 2: Distribution of network traffic periods in the CICIDS2017 dataset.

Attack       | Time                      | Size
Normal       | Monday                    | 11 GB
FTP-Patator  | Tuesday (9:20-10:20)      | 12 MB
SSH-Patator  | Tuesday (14:00-15:00)     | 25 MB
DoS          | Wednesday (9:47-11:23)    | 2.3 GB
Heartbleed   | Wednesday (15:12-15:32)   | 79 MB
Infiltration | Thursday (14:19-15:45)    | 21 MB
PortScan     | Friday (13:55-14:35)      | 31 MB


\mathrm{ohe}_i(B, \mathrm{num}) = \begin{cases} \mathrm{num}, & B = i \\ 0, & B \neq i \end{cases}

\mathrm{OHE}_j = (\mathrm{ohe}_{j1}, \ldots, \mathrm{ohe}_{j\,\mathrm{Length}}) = \mathrm{ohe}_1 \oplus \cdots \oplus \mathrm{ohe}_{\mathrm{Length}}

\mathrm{Input}_k = (\mathrm{OHE}_1, \mathrm{OHE}_2, \ldots, \mathrm{OHE}_N)^{T}  (2)

where B is a byte in the data packet and num is the value used for encoding in one_hot encoding (in the model implementation, num = 1); ohe_i is a bit in the one_hot encoding of a byte, \oplus denotes concatenation, and OHE_j is the one_hot encoding of byte j.
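The byte-wise one_hot encoding above (with num = 1, as in the model implementation) can be sketched as follows; the function name is ours, and mapping the -1 padding value to an all-zero vector is our assumption:

```python
def one_hot_bytes(packet, num=1, depth=256):
    """One_hot-encode a packet: byte value B becomes a depth-256 vector
    whose B-th entry is `num` and all others are 0. The padding value
    -1 (outside the 0-255 byte range) maps to the all-zero vector."""
    out = []
    for b in packet:
        row = [0] * depth
        if 0 <= b < depth:
            row[b] = num
        out.append(row)
    return out
```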

4. DL-IDS Architecture

This section introduces the traffic classifier we built into DL-IDS, which uses a combination of CNN and LSTM to learn and classify traffic packets in both time and space. According to the different characteristics of different types of traffic, we also used class weights to improve the stability of the model.

The overall architecture of the classifier is shown in Figure 4. The classifier is composed of a CNN and an LSTM. The CNN section is composed of an input and embedding layer, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. Upon receiving a preprocessed PKL file, the CNN section processes it and returns a high-dimensional package vector to the LSTM section. The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the output layer. It processes a series of high-dimensional package vectors and outputs a vector representing the probability that the session belongs to each class. The Softmax layer outputs the final classification result according to this probability vector.

4.1. CNN in DL-IDS. We converted the data packets obtained from the preprocessed data into a traffic image. The so-called traffic image is a combination of all or part of the bit values of a network traffic packet into a two-dimensional matrix. The data in a network traffic packet is composed of bytes whose value range is 0-255, the same as the value range of bytes in images. We took x bytes from the header of a packet and y bytes from the payload and composed them into a traffic image for subsequent processing, as discussed below.

As mentioned above, the CNN section is composed of the input and embedding layers, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. We combined convolution layer 1 and pooling layer 2 into Combination I, and convolution layer 3 and pooling layer 4 into Combination II. Each Combination analyzes input layer characteristics from a different perspective. In Combination I, a convolution layer with a small convolution kernel is used to extract local features of traffic image details (e.g., IP and port); clear-cut features and stable results can then be obtained in the pooling layer. In Combination II, a large convolution kernel is used to analyze the relationship between bits that are far apart, such as information in the traffic payload.

After preprocessing and one_hot encoding, the network traffic constitutes the input vector of the input layer. In the input layer, Length bytes of information are intercepted from the ith packet, Pkg_i = (B_1, B_2, \ldots, B_{Length}), followed by synthesis of n Pkgs into the information set S = (Pkg_1, Pkg_2, \ldots, Pkg_n).

Formula (3) shows a convolution layer, where f is the side length of the convolution kernel (f = 7 and f = 5 in the two convolution layers), s is the stride, p is the padding, b is the bias, w is the weight, c is the number of channels, and l is the layer index. L_l is the size of Z^l, and Z(i, j) is the pixel of the corresponding feature map. Additionally, (i, j) \in \{0, 1, 2, \ldots, L_{l+1}\}, where L_{l+1} = ((L_l + 2p - f)/s) + 1.
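The size recurrence L_{l+1} = ((L_l + 2p - f)/s) + 1 is easy to check numerically; the stride and padding values below are illustrative assumptions (the paper fixes only the kernel sizes f = 7 and f = 5):

```python
def conv_out_len(L, f, s=1, p=0):
    """Output side length of a convolution or pooling layer:
    L_out = (L + 2p - f) / s + 1."""
    return (L + 2 * p - f) // s + 1

# With the paper's kernels f = 7 then f = 5 on a length-40 input
# (stride 1, no padding, ignoring pooling between layers for simplicity):
L1 = conv_out_len(40, f=7)   # 34
L2 = conv_out_len(L1, f=5)   # 30
```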

Figure 3: Structure of the entire PKL file (a PKL file is a Python storage format for direct reading in code; a session is a package consisting of two-way flows; a flow is a set of packets with the same five-tuple; a packet is the data unit in TCP/IP protocol communication and transmission).

Figure 2: Traffic segmentation process (pcap files from Step 1; traffic to flow; merge the flows; output).


Z^{l+1}(i, j) = \sum_{k=1}^{c} \sum_{x=1}^{f} \sum_{y=1}^{f} \left[ Z_k^{l}(s \cdot i + x,\; s \cdot j + y) \cdot w_k^{l+1}(x, y) \right] + b.  (3)

The convolution layer contains an activation function (formula (4)) that assists in the expression of complex features. K is the number of channels in the feature map, and A represents the output of the Z vector through the activation function. We used sigmoid and ReLU, respectively, after the two convolution layers:

A_{ijk}^{l} = f\left(Z_{ijk}^{l}\right).  (4)

After feature extraction in the convolution layer, the output image is transferred to the pooling layer for feature selection and information filtering. The pooling layer contains a preset pooling function that replaces the result of a single point in the feature map with the feature statistics of its adjacent region. The pooling layer is calculated by formula (5), where p is a prespecified parameter. We applied maximum pooling in this study, that is, p \to \infty:

A_k^{l}(i, j) = \left[ \sum_{x=1}^{f} \sum_{y=1}^{f} A_k^{l}(s \cdot i + x,\; s \cdot j + y)^{p} \right]^{1/p}.  (5)

We also used a back-propagation algorithm to adjust the model parameters. In the weight adjustment rule (formula (6)), \delta is the delta error of the loss function with respect to the layer and \alpha is the learning rate:

w^{l} = w^{l} - \alpha \sum \delta^{l} \cdot A^{l-1}.  (6)

We used the categorical cross-entropy algorithm as the loss function. To reduce training time and enhance gradient descent accuracy, we used the RMSProp optimization function to adjust the learning rate.

After two convolution and pooling operations, we extracted the entire traffic image into a smaller feature block, which represents the feature information of the whole traffic packet. The block can then be fed into the RNN system as input to the RNN layer.

4.2. LSTM in DL-IDS. Normal network communications and network attacks are both carried out according to a certain network protocol. This means that attack packets must be ensconced in traffic alongside packets containing fixed parts of the network protocol, such as normal connection establishments, key exchanges, connections, and disconnections. In the normal portion of the attack traffic, no data can be used to determine whether a packet is intended to cause an attack. Using a CNN alone, trained on the characteristics of a single packet, as the basis for judging the nature of the traffic makes the data difficult to mark, leaves too much "dirty" data in the traffic, and produces altogether poor training results. In this study, we remedied this by introducing the LSTM, which takes the data of a single connection (from initiation to disconnection) as a group and judges the characteristics of all data packets in that group, and the relations among them, as a basis for judging the nature of the traffic. The natural language processing model performs well in traffic information processing [32] under a methodology similar to the grouping judgment method proposed here.

The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the Softmax and output layers. The main functions are realized by the two LSTM layers. The LSTM is a special RNN designed to resolve the gradient vanishing and gradient explosion problems that arise when training on long sequences. General RNN networks have only one tanh layer, while LSTM networks perform better in timing prediction through their unique forgetting and selective memory gates. Here we call the LSTM node a cell (C_t), the input and output of which are x_t and h_t, respectively.

The first step in the LSTM layer is to determine what information the model will discard from the cell state. This decision is made through the forgetting gate (formula (7)). The gate reads h_{t-1} and x_t and outputs a value between 0 and 1 for each number in the cell state C_{t-1}: 1 means "completely retained" and 0 means "completely discarded." W and b are the weights and biases of the neural network, respectively:

f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right).  (7)

Figure 4: Architecture of DL-IDS (Input; C1: convolution; S2: MaxPool; C3: convolution; S4: MaxPool; S5: GlobalMaxPool; L6: LSTM; L7: LSTM; F8: full connection; Softmax; Output).


The next step is to decide how much new information to add to the cell state. First, a sigmoid layer determines which information needs to be updated (formula (8)). A tanh layer then generates a vector of candidate values (formula (9)). The two parts are combined to update the cell state (formula (10)):

i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right),  (8)

\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right),  (9)

C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t.  (10)

The output gate determines the output of the cell. First, a sigmoid layer determines which parts of the cell state are exported (formula (11)). Next, the cell state is processed through a tanh function to obtain a value between -1 and 1 and then multiplied by the output of the sigmoid gate; the output is determined accordingly (formula (12)):

o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right),  (11)

h_t = o_t \cdot \tanh\left(C_t\right).  (12)
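Formulas (7)-(12) can be traced in a minimal sketch of one LSTM cell step (scalar, unit-weight stand-ins for the weight matrices; purely illustrative, not the Keras implementation the paper uses):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following formulas (7)-(12). W and b hold
    scalar weights/biases for the forget (f), input (i), candidate (c),
    and output (o) gates; z stands in for the concatenation [h_{t-1}, x_t]."""
    z = h_prev + x_t
    f_t = sigmoid(W["f"] * z + b["f"])          # forget gate, formula (7)
    i_t = sigmoid(W["i"] * z + b["i"])          # input gate, formula (8)
    c_tilde = math.tanh(W["c"] * z + b["c"])    # candidate state, formula (9)
    c_t = f_t * c_prev + i_t * c_tilde          # cell update, formula (10)
    o_t = sigmoid(W["o"] * z + b["o"])          # output gate, formula (11)
    h_t = o_t * math.tanh(c_t)                  # hidden output, formula (12)
    return h_t, c_t
```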

In the proposed model, the feature maps of the n data packets in a group of traffic images within a connection serve as the input of the LSTM section. The feature relations between these n data packets were analyzed through the two LSTM layers. The first few packets may be used to establish connections; such packets may exist in normal data streams but may occur in attack data streams too. The next few packets may contain long payloads as well as attack data. The LSTM finds the groups containing attack data and marks all packets of those groups as attack groups.

LSTM layer 6 in DL-IDS has a linear activation function designed to minimize training time; LSTM layer 7 is nonlinearly activated through the ReLU function. The flow comprises a multiclassification system, so the model is trained to minimize multiclass cross-entropy. We did not update the class weights at every step but instead set initial weights according to the volume of each type of data.

4.3. Weight of Classes. The data obtained after preprocessing is shown in Table 3, where clearly the quantities ("Num") of the different data types are uneven. The number of type 0 samples is the highest while those of types 2 and 4 are the lowest. This may affect the final learning outcome of the classification; for example, if the machine were to judge all traffic as type 0, the accuracy of the model would still seem relatively high. We introduced class weights to resolve this problem: classes with different sample numbers were given different weights. class_weight is set according to the number of samples, and class_weight[i] is used instead of 1 to penalize errors in the class [i] samples. A higher class_weight means a greater emphasis on the class. Compared with the case where weights are not considered, more samples are classified into high-weight classes.

The class weight is calculated via formula (13), where w_i represents the class weight of class i and n_i represents the amount of traffic of class i:

w_i = \frac{\sum_{i=0}^{K-1} n_i}{n_i}.  (13)

When training the model, the weighted loss function in formula (14) makes the model focus more on samples from underrepresented classes. K is the number of categories, y is the label (if the sample category is i, then y_i = 1; otherwise y_i = 0), and p is the output of the neural network, that is, the probability that the model predicts category i, calculated by Softmax in this model. The loss function J is defined as follows:

J = -\sum_{i=0}^{K-1} w_i\, y_i \log\left(p_i\right).  (14)
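Formula (13) amounts to inverse-frequency weighting; a sketch using the per-class counts from Table 3 (the helper name is ours):

```python
def class_weights(counts):
    """Inverse-frequency class weights per formula (13):
    w_i = (total samples) / n_i, so rare classes get larger weights."""
    total = sum(counts.values())
    return {label: total / n for label, n in counts.items()}

# Sample counts per class from Table 3 (CICIDS2017 after preprocessing).
counts = {
    0: 477584,   # Normal
    1: 11870,    # FTP-Patator
    2: 7428,     # SSH-Patator
    3: 63240,    # DoS
    4: 4755,     # Heartbleed
    5: 64558,    # Infiltration
    6: 160002,   # PortScan
}
weights = class_weights(counts)
# The rarest class (Heartbleed, label 4) receives the largest weight.
```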

5. Experimental Results and Analysis

We evaluated the performance of the proposed model on the CICIDS2017 dataset across a series of selected parameters: (1) the impact of the length of data packets involved in training, (2) the influence of the number of packets in each flow, (3) the impact of the selected batch size, (4) the effect of the number of units in the LSTM, and (5) the influence of class weights. We optimized the DL-IDS parameters accordingly and then compared the result against a sole CNN and a sole LSTM. The ratio of Train set, Validation set, and Test set is 18 : 1 : 1.
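The 18 : 1 : 1 split can be sketched as follows (a plain sequential partition; the paper does not state how the split was drawn, so any shuffling or stratification is an assumption left out here):

```python
def split_18_1_1(samples):
    """Partition samples into Train/Validation/Test sets at 18 : 1 : 1,
    i.e., 90% / 5% / 5% of the data."""
    n = len(samples)
    n_val = n // 20
    n_test = n // 20
    train = samples[: n - n_val - n_test]
    val = samples[n - n_val - n_test : n - n_test]
    test = samples[n - n_test :]
    return train, val, test
```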

5.1. Metrics. We adopted four commonly used parameters for evaluating intrusion detection systems: accuracy (ACC), true positive rate (TPR), false positive rate (FPR), and F1-score. ACC represents the overall performance of the model, TPR represents the ratio of correctly identified positive samples to all positive samples, and FPR represents the ratio of negative samples wrongly assigned to the positive class to the total number of negative samples. Recall is the number of True Positives divided by the sum of True Positives and False Negatives. Precision is the number of correct positive predictions divided by the total number of samples predicted positive. F1-score measures a classifier's accuracy and is defined as the weighted harmonic mean of the classifier's Precision and Recall.

Table 3: Quantity of data per category in CICIDS2017.

Label | Attack       | Num
0     | Normal       | 477584
1     | FTP-Patator  | 11870
2     | SSH-Patator  | 7428
3     | DoS          | 63240
4     | Heartbleed   | 4755
5     | Infiltration | 64558
6     | PortScan     | 160002


\mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%,

\mathrm{TPR} = \frac{TP}{TP + FN} \times 100\%,

\mathrm{FPR} = \frac{FP}{FP + TN} \times 100\%,

\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%,

\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%,

\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100\%.  (15)
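Given per-class TP, TN, FP, and FN counts, formula (15) reduces to a few lines (the function name and return layout are ours):

```python
def metrics(tp, tn, fp, fn):
    """ACC, TPR, FPR, Precision, Recall, and F1 per formula (15),
    returned as percentages."""
    acc = (tp + tn) / (tp + fp + fn + tn) * 100
    tpr = tp / (tp + fn) * 100          # true positive rate = Recall
    fpr = fp / (fp + tn) * 100          # false positive rate
    precision = tp / (tp + fp) * 100
    recall = tpr
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "TPR": tpr, "FPR": fpr,
            "Precision": precision, "Recall": recall, "F1": f1}
```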

For each type of attack, TP is the number of samples correctly classified as this type, TN is the number of samples correctly classified as not this type, FP is the number of samples incorrectly classified as this type, and FN is the number of samples incorrectly classified as not this type. The definitions of TP, TN, FP, and FN are illustrated in Figure 5.

5.2. Experimental Environment. The experimental configuration we used to evaluate the model parameters is described in Table 4.

5.3. LSTM Unit Quantity Effects on Model Performance. The number of units in the LSTM represents the model's output dimension. In our experiments, we found that model performance is first enhanced and then begins to decline as the number of LSTM units continually increases. We ultimately selected 85 as the optimal number of LSTM units.

5.4. Training Packet Length Effects on Model Performance. Figure 6 shows the changes in ACC, TPR, and FPR with increasing length of the packets extracted during training. As per the training results, model performance significantly declines when the packet length exceeds 70. It is possible that excessively long training data packets increase the proportion of data packets shorter than the current packet length, leading to an increase in the proportion of units filled with -1 and thus reducing the accuracy of the model. However, the data packet must exceed a certain length to ensure that effective, credible, and scientific content is put into training. This also prevents overfitting effects and provides more accurate classification for data packets with partially similar headers. We found that a length of 40 is optimal.

Under the condition that the packet length is 40, the efficiency and performance of the DL-IDS intrusion detection system in identifying various kinds of traffic are shown in Table 5.

Figure 5: TP, TN, FP, and FN in multiple classifications (illustrated as a multiclass confusion matrix, with actual classes as rows and predicted classes as columns).

Table 4: Experimental environment.

OS       | CentOS Linux release 7.5.1804
CPU      | Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz
RAM      | 126 GB
Anaconda | 4.5.11
Keras    | 2.2.2
Python   | 3.6.5

Figure 6: Impact of training packet length on model performance (ACC, TPR, and FPR versus packet lengths from 30 to 100).

5.5. Per-Flow Packet Quantity Effects on Model Performance. As the number of data packets in each flow involved in the training process increases, the features extracted by the model become more obvious and the recognition accuracy of the model is enhanced. If this number is too high, however, the proportion of filler data packets increases, which degrades the model's ability to extract features. Figure 7 shows the impact of the number of packets per flow on model performance. We found that when the number of packets in each flow exceeds 8, the performance of the model declines significantly. We therefore chose 8 as the optimal per-flow packet quantity.

5.6. Batch Size Effects on Model Performance. Batch size is an important parameter in the model training process. Within a reasonable range, increasing the batch size can improve memory utilization and speed up data processing. If increased to an inappropriate extent, however, it can significantly slow down the process. As shown in Figure 8, we found that a batch size of 20 is optimal.

5.7. Class Weight Effects on Model Performance. Table 6 shows a comparison of two groups of experimental results, with and without class weights. Introducing class weights does appear to reduce the impact of the imbalanced number of samples of each type in the CICIDS2017 dataset on model performance.

5.8. Model Evaluation. The LSTM unit can effectively extract the temporal relationship between packets. Table 7 shows a comparison of accuracy between the DL-IDS model and models with the CNN or LSTM alone. The LSTM unit appears to effectively improve the identification efficiency of SSH-Patator, Infiltration, PortScan, and other attack traffic for enhanced model performance, possibly due to the apparent timing of these attacks. Compared to the LSTM model alone, however, adding a CNN further improves the identification efficiency of most attack traffic. As shown in Table 7, the proposed DL-IDS intrusion detection model has a very low false alarm rate and can classify network traffic more accurately than the CNN or LSTM alone.

Table 8 shows a comparison of the models using CNN and LSTM with traditional machine learning algorithms [33]. The DL-IDS model achieves the best performance among them, with the largest ACC value and the lowest FPR value.

The data input to DL-IDS is raw network traffic. There is no special feature extraction in the model; the training and testing times include the feature extraction time. The traditional machine learning algorithms do not account for data extraction or processing time, so we could not directly compare the time consumption of the various algorithms in Table 8. The training time and testing time of the model were under 600 s and 100 s, respectively, so we believe that DL-IDS achieves optimal detection effects in the same time frame as the traditional algorithms.

Table 5: Model performance with training packet length of 40.

Label        | ACC (%) | TPR (%) | FPR (%) | F1-score (%)
Normal       | 99.54   | 99.52   | 0.00    | 99.61
FTP-Patator  | 99.62   | 92.29   | 0.27    | 91.45
SSH-Patator  | 99.63   | 87.04   | 0.00    | 86.87
DoS          | 99.61   | 98.25   | 0.00    | 97.87
Heartbleed   | 99.66   | 81.12   | 0.00    | 81.20
Infiltration | 99.55   | 98.07   | 0.18    | 97.54
PortScan     | 99.54   | 98.42   | 0.00    | 98.71
Overall      | 98.67   | 97.21   | 0.47    | 93.32

Figure 7: Impact of per-flow packet quantity on model performance (ACC, TPR, and FPR versus 6 to 16 packets per flow).

Figure 8: Impact of batch size on model performance (ACC, TPR, and FPR versus batch sizes from 0 to 40).

Table 6: Effects of applying class weights on model performance.

Metric | Without class weights (%) | With class weights (%)
ACC    | 97.16                     | 98.58
TPR    | 93.33                     | 97.04
FPR    | 0.38                      | 0.52


6. Conclusions and Future Research Directions

In this study, we proposed a DL-based intrusion detection system named DL-IDS, which utilizes a hybrid of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features from the network data flow and analyze the network traffic. In DL-IDS, the CNN and LSTM respectively extract the spatial features of a single packet and the temporal features of the data stream and finally fuse them, which improves the performance of the intrusion detection system. Moreover, DL-IDS uses category weights for optimization in the training phase. This optimization method reduces the adverse effect of the unbalanced number of samples of each attack type in the Train set and improves the robustness of the model.

To evaluate the proposed system, we experimented on the CICIDS2017 dataset, which is often used by researchers as a benchmark. Normal traffic data and attack data of six typical types (FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan) were selected to test the ability of DL-IDS to detect attack data. We also used the same data to test the CNN-only model, the LSTM-only model, and some commonly used machine learning models.

The results show that DL-IDS reached 98.67% and 93.32% in overall accuracy and F1-score, respectively, performing better than all the machine learning models. Also, compared with the CNN-only model and the LSTM-only model, DL-IDS reached over 99.50% accuracy for all attack types and achieved the best performance among these three models.

There are yet certain drawbacks to the proposed model, including low detection accuracy on Heartbleed and SSH-Patator attacks due to a lack of data. Generative Adversarial Networks (GANs) may be considered to overcome this drawback to some degree. Further, combining the learned features with some traditional traffic features may enhance overall model performance. We plan to resolve these problems through further research.

Data Availability

Data will be made available upon request

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Beijing Natural Science Foundation under Grant no. 4194086, the National Natural Science Foundation of China under Grant no. 61702043, and the Key R&D Program of Hebei Province under Grant no. 201313701D.

References

[1] M. M. Hassan, A. Gumaei, S. Huda, and A. Ahmad, "Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, 2020.

[2] F. A. Khan and A. Gumaei, "A comparative study of machine learning classifiers for network intrusion detection," in Proceedings of the International Conference on Artificial Intelligence and Security, pp. 75-86, New York, NY, USA, July 2019.

[3] M. Alqahtani, A. Gumaei, M. Mathkour, and M. Maher Ben Ismail, "A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks," Sensors, vol. 19, no. 20, p. 4383, 2019.

[4] A. Derhab, M. Guerroumi, A. Gumaei et al., "Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security," Sensors, vol. 19, no. 14, p. 3119, 2019.

[5] T. Yaqoob, H. Abbas, and M. Atiquzzaman, "Security vulnerabilities, attacks, countermeasures, and regulations of networked medical devices: a review," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3723-3768, 2019.

[6] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, "Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch," Information Fusion, vol. 51, pp. 100-113, 2019.

[7] Z. A. Baig, S. Sanguanpong, S. N. Firdous et al., "Averaged dependence estimators for DoS attack detection in IoT networks," Future Generation Computer Systems, vol. 102, pp. 198-209, 2020.

Table 7: CNN, LSTM, and CNN+LSTM results.

             |          CNN (%)            |          LSTM (%)           |       CNN+LSTM (%)
Label        | ACC   TPR   FPR   F1        | ACC   TPR   FPR   F1        | ACC   TPR   FPR   F1
Normal       | 99.57 99.56 0.00  99.64     | 99.56 99.55 0.00  99.63     | 99.54 99.52 0.00  99.61
FTP-Patator  | 99.45 91.95 0.27  89.95     | 99.34 89.27 0.15  91.72     | 99.62 92.29 0.27  91.45
SSH-Patator  | 99.53 80.90 0.00  87.22     | 98.91 79.91 0.00  86.66     | 99.63 87.04 0.00  86.87
DoS          | 99.53 98.04 0.00  97.86     | 98.55 94.85 0.13  89.89     | 99.61 98.25 0.00  97.87
Heartbleed   | 99.64 80.08 0.00  80.88     | 99.63 81.12 0.07  77.69     | 99.66 81.12 0.00  81.20
Infiltration | 99.59 97.91 0.07  97.75     | 98.84 96.80 1.16  97.17     | 99.55 98.07 0.18  97.54
PortScan     | 99.35 97.48 0.00  98.51     | 99.42 97.99 0.00  94.04     | 99.54 98.42 0.00  98.71
Overall      | 98.44 96.46 0.36  93.11     | 96.83 94.21 1.54  90.97     | 98.67 97.21 0.47  93.32

Table 8: CNN and LSTM models versus traditional machine learning algorithms.

Model               | ACC (%) | TPR (%) | FPR (%) | F1-score (%)
MultinomialNB       | 72.52   | 78.20   | 33.16   | 52.06
Random Forest       | 96.08   | 95.47   | 3.30    | 76.71
J48                 | 97.32   | 96.80   | 2.17    | 91.43
Logistic Regression | 97.68   | 94.96   | 1.47    | 90.55
DL-IDS              | 98.67   | 97.21   | 0.47    | 93.32


[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48-60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709-716, 2018.

[10] H. Xu, F. Mueller, M. Acharya et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26-35, 2018.

[12] H. Xu, C. Fang, Q. Cao et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166-170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108-118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," in Recent Developments in Intelligent Computing, Communication and Devices, pp. 559-565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21-26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670-2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21-26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the Symposium Series on Computational Intelligence (SSCI), IEEE, Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86-94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69-79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386-396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V Ramos and A Abraham ldquoANTIDS self organized antbased clustering model for intrusion detection systemrdquo inProceedings of the Fourth IEEE InternationalWorkshop on SoftComputing as Transdisciplinary Science and Technology(WSTSTrsquo05) Springer Muroran JapanSpringer MuroranJapan May 2005

[30] I Sharafaldin A H Lashkari and A Ali ldquoToward generatinga new intrusion detection dataset and intrusion trafficcharacterizationrdquo in Proceedings of the He Fourth Interna-tional Conference on Information Systems Security and Privacy(ICISSP) Madeira Portugal January 2018

[31] X Chen ldquoA simple utility to classify packets into flowsrdquohttpsgithubcomcaesar0301pkt2flow

[32] B J Radford and B D Richardson ldquoSequence aggregationrules for anomaly detection in computer network trafficrdquo2018 httpsarxivorgabs180503735v2

[33] A AhmimM A Ferrag LMaglaras M Derdour andH JanickeldquoA detailed analysis of using supervised machine learning for in-trusion detectionrdquo 2019 httpswwwresearchgatenetpublication331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection

Security and Communication Networks 11


For network traffic of certain protocols, CNNs can perform well through fast training [20]. Fan and Ling-zhi [21] extracted very accurate features by using a multilayer CNN wherein the convolution layer connects to the sampling layer below; this model outperformed classical detection algorithms (e.g., SVM) on the KDD99 dataset. However, the CNN can only analyze a single input package: it cannot analyze timing information in a given traffic scenario. In reality, a single packet in an attack traffic scenario is normal data; only when a large number of packets are sent at the same time or within a short period does this packet become malicious traffic. The CNN does not apply in this situation, which in practice may lead to a large number of missed alerts.

The recurrent neural network (RNN) is also often used to analyze sequential information. LSTM, a branch of RNNs, performs well in sequence information analysis applications such as natural language processing. Kim et al. [22] compared the LSTM-RNN network against Generalized Regression Neural Network (GRNN), Product-based Neural Network (PNN), k-Nearest Neighbor (KNN), SVM, Bayesian, and other algorithms on the KDD99 dataset and found that it was superior in every aspect they tested. The LSTM network alone, however, centers on the direct relationship between sequences rather than the analysis of a single packet, so it cannot readily replace the CNN in this regard.

Wu and Guo [23] proposed a hierarchical CNN+RNN neural network and carried out experiments on the NSL-KDD and UNSW-NB15 datasets. Hsu et al. [24] and Ahsan and Nygard [25] used another CNN+LSTM model to perform multiclassification experiments on the NSL-KDD dataset. Hassan et al. [26] proposed a hybrid deep learning model to detect network intrusions based on a CNN and a weight-dropped long short-term memory (WDLSTM) network; their paper mainly conducted experiments on the UNSW-NB15 dataset. However, these studies are still based on features extracted in advance.

Abdulhammed et al. [27] used Autoencoder and Principal Component Analysis to reduce the CICIDS2017 dataset's feature dimensions. The resulting low-dimensional features from both techniques were then used to build various classifiers to detect malicious attacks. Musafer et al. [28] proposed a novel IDS architecture based on an advanced Sparse Autoencoder and Random Forest to distinguish the patterns of normal packets from those of network attacks, with good results.

In this study, we adopted a malicious traffic analysis method based on CNN and LSTM to extract and analyze network traffic information from the raw network dataset in both spatial and temporal dimensions. We conducted training and testing on the CICIDS2017 dataset, which closely simulates a real network environment. We ran a series of experiments showing that the proposed model facilitates highly effective malicious flow analysis.

3. Datasets and Preprocessing

3.1. Dataset. The IDS is the most important defense tool against complex and large-scale network attacks, but the lack of available public datasets still hinders its further development. Many researchers have used private data within a single company or conducted manual data collection to test IDS applications, which affects the credibility of their results to some extent. Public datasets such as KDD99 and NSL-KDD [29] are comprised of data encompassing manually selected stream characteristics rather than original network traffic. The timing of the data collection is also outdated compared to modern attack methods.

In this study, in an effort to best reflect real traffic scenarios in real networks as well as newer means of attack, we chose the CICIDS2017 dataset (Canadian Institute for Cybersecurity) [30], which contains benign traffic and up-to-date common attack traffic representative of a real network environment. This dataset constructs the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and e-mail protocols to accurately simulate a real network environment. The data capture period was from 9 a.m. on July 3, 2017, to 5 p.m. on July 7, 2017; a total of 51.1 GB of data flow was generated over this five-day period. The attack traffic collected includes eight types of attack: FTP-Patator, SSH-Patator, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS. As shown in Table 1, the attacks were carried out on Tuesday, Wednesday, Thursday, and Friday mornings and afternoons. Normal traffic was generated throughout the day on Monday and during the nonaggressive periods from Tuesday to Friday. The data type for this dataset is the pcap file.

After acquiring the dataset, we analyzed the original data and selected seven types of data for subsequent assessment according to the amount of data and its noise rate. They are Normal, FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan.

3.2. Network Traffic Segmentation Method. The format of the CICIDS2017 dataset is one pcap file per day. These pcap files contain a great deal of information, which is not conducive to training the machine. Therefore, the primary task of traffic classification based on machine learning is to divide continuous pcap files into several discrete units according to a certain granularity. There are six ways to slice network traffic: by TCP, by connection, by network flow, by session, by service class, and by host. When the original traffic data is segmented according to different methods, it splits into quite different forms, so the selected network traffic segmentation method markedly influences the subsequent analysis.

We adopted a session sharding method in this study. A session consists of all packets of a bidirectional flow, that is, all packets sharing the same 5-tuple (source IP, source port, destination IP, destination port, and transport layer protocol) with source and destination addresses and ports interchangeable.
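The direction-independent grouping described above can be sketched in a few lines of Python (an illustrative helper, not the paper's code; function and variable names are ours):

```python
def session_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Return a direction-independent key for a packet's 5-tuple.

    The two directions of one session (source and destination
    swapped) map to the same key, so grouping packets by this key
    yields sessions.
    """
    a = (src_ip, src_port)
    b = (dst_ip, dst_port)
    # Sorting the two endpoints makes the key order-independent.
    return (min(a, b), max(a, b), proto)

# Both directions of the same TCP connection share one key:
k1 = session_key("10.0.0.1", 51000, "10.0.0.2", 80, "TCP")
k2 = session_key("10.0.0.2", 80, "10.0.0.1", 51000, "TCP")
```

A dictionary keyed by `session_key(...)` then collects each session's packets regardless of direction.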

3.3. Data Preprocessing. Data preprocessing begins with the original flow, namely, the data in pcap format, and formats the model input data. The CICIDS2017 dataset provides an original dataset in pcap format and a CSV file detailing some of the traffic. To transform the original data into the model input format, we conducted time division, traffic segmentation, PKL file generation, PKL file labeling, matrix generation, and one_hot encoding. A flow chart of this process is given in Figure 1.

Step 1 (time division). Time division refers to intercepting the pcap file of the corresponding period from the original pcap file according to the attack time and type [30]. The input format is a pcap file, and the output format is still a pcap file. The time periods corresponding to each specific type, along with the file sizes, are shown in Table 2.

Step 2 (traffic segmentation). Traffic segmentation refers to dividing the pcap file obtained in Step 1 into the corresponding sessions by sharding according to the IPs of the attack host and victim host for each time period [31]. The specific process is shown in Figure 2. This step involves shredding the pcap file of Step 1 into the corresponding flows using pkt2flow [31], which splits a pcap package into flow format (i.e., the original pcap package is divided into different pcap packages according to flows with different 5-tuples). Next, the pcap packages are merged under the premise that source and destination are interchangeable. Finally, the pcap package of Step 1 is divided into sessions.

Step 3 (generate the PKL file) and Step 4 (tag the PKL file). As shown in Table 1, the pcap file is still large after extraction; this creates a serious challenge for data reading in the model. To accelerate the data reading process, we packaged the traffic using the pickle tool in Python. We use PortScan traffic as an example of the packaging process here. In this class, many sessions are generated after Step 2, each of which is saved in a pcap file. We generated a label of the corresponding attack type for each session. Each session contains several data flows, and each data flow contains n1 packets. We then saved the n2 sessions in a PKL file to speed up the process of reading the data. n1 can be changed as needed; according to the experimental results, we finally selected the best value, n1 = 8. The value of n2 can be calculated by formula (1). In this case, we packaged each type of session into a PKL file. The structure of the entire PKL file is shown in Figure 3.

\[
n_2 = \frac{\text{total number of packets of this type}}{n_1} \qquad (1)
\]
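The packaging in Steps 3 and 4 can be sketched with Python's pickle module (a minimal sketch; the file name and the byte-string session representation are illustrative, not the paper's exact format):

```python
import os
import pickle
import tempfile

def package_sessions(sessions, labels, path):
    """Serialize (session, label) pairs into a single PKL file so the
    training loop can read a whole traffic class in one pass."""
    with open(path, "wb") as f:
        pickle.dump(list(zip(sessions, labels)), f)

def load_sessions(path):
    """Restore the (session, label) pairs from a PKL file."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Toy example: two "sessions", each a list of raw packet byte strings.
sessions = [[b"\x45\x00\x00\x28", b"\x45\x00\x00\x34"], [b"\x45\x00\x00\x3c"]]
labels = ["PortScan", "PortScan"]
path = os.path.join(tempfile.gettempdir(), "portscan_sessions.pkl")
package_sessions(sessions, labels, path)
restored = load_sessions(path)
```

One pickle load replaces thousands of small pcap reads, which is the speedup the paper is after.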

Step 5 (matrix generation). The input of the model must have a fixed length, so the next step is to unify the length of each session. The difference between attacks lies mainly in the header, so we dealt with each packet according to a uniform length of PACKET_LEN bytes; that is, if the packet length was greater than PACKET_LEN, the excess bytes were truncated, and if the packet length was less than PACKET_LEN, the packet was filled with −1. Each session is then converted into a MAX_PACKET_NUM_PER_SESSION × PACKET_LEN matrix. According to the results of our experiment, we finally chose MAX_PACKET_NUM_PER_SESSION = 8 and PACKET_LEN = 40.
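The truncate-or-pad rule of Step 5 can be sketched as follows (illustrative code; the constants match the paper's chosen values, the function name is ours):

```python
PACKET_LEN = 40   # bytes kept per packet (paper's chosen value)
MAX_PACKETS = 8   # packets kept per session (paper's chosen value)
PAD = -1          # filler for short packets and missing packets

def session_to_matrix(session):
    """Turn a list of packets (bytes objects) into a fixed
    MAX_PACKETS x PACKET_LEN matrix: long packets are truncated,
    short packets are padded with -1, and missing packets become
    all-padding rows."""
    rows = []
    for pkt in session[:MAX_PACKETS]:
        row = list(pkt[:PACKET_LEN])              # truncate to 40 bytes
        row += [PAD] * (PACKET_LEN - len(row))    # pad short packets
        rows.append(row)
    while len(rows) < MAX_PACKETS:                # pad short sessions
        rows.append([PAD] * PACKET_LEN)
    return rows

m = session_to_matrix([b"\x45\x00" * 30, b"\x45"])  # a 60-byte and a 1-byte packet
```

The first toy packet is truncated from 60 to 40 bytes; the second is padded out with −1, as are the six missing packet rows.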

Step 6 (one_hot encoding). To allow the model to learn and classify effectively, the data from Step 5 are processed by one_hot encoding to convert qualitative features into quantitative features.

Table 1: CICIDS2017 dataset.

Date      | Type                                      | Size
Monday    | Normal                                    | 11.0 GB
Tuesday   | Normal + Brute Force + SFTP + SSH         | 11.0 GB
Wednesday | Normal + DoS + Heartbleed Attacks         | 13 GB
Thursday  | Normal + XSS + Web Attack + Infiltration  | 7.8 GB
Friday    | Normal + Botnet + PortScan + DDoS         | 8.3 GB

Figure 1: Data preprocessing process (pcap files → Step 1: time division → Step 2: traffic segmentation → Step 3: generate PKL files → Step 4: label PKL files → Step 5: unify the length and generate the matrix → Step 6: one_hot encoding → input into the model).

Table 2: Distribution of network traffic periods in CICIDS2017 dataset.

Attack      | Time                        | Size
Normal      | Monday                      | 11 GB
FTP-Patator | Tuesday (9:20–10:20)        | 12 MB
SSH-Patator | Tuesday (14:00–15:00)       | 25 MB
DoS         | Wednesday (9:47–11:23)      | 2.3 GB
Heartbleed  | Wednesday (15:12–15:32)     | 79 MB
Infiltration| Thursday (14:19–15:45)      | 21 MB
PortScan    | Friday (13:55–14:35)        | 31 MB


\[
ohe_i(B, num) =
\begin{cases}
num, & B = i,\\
0, & B \neq i,
\end{cases}
\qquad
OHE_j = ohe_1 \oplus \cdots \oplus ohe_{Length},
\qquad
Input_k =
\begin{pmatrix}
OHE_1\\
OHE_2\\
\vdots\\
OHE_N
\end{pmatrix}
\qquad (2)
\]

where B is a byte in the data packet and num is the value used for encoding in one_hot encoding (in the model implementation, num = 1), ohe_i is one bit in the one_hot encoding of a byte, ⊕ is the concatenation operator, and OHE_j is the one_hot encoding of a byte.
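A minimal sketch of this byte-level one_hot encoding (assuming a full 0–255 encoding range per byte, which the paper does not state explicitly; padding bytes of −1 are mapped here to all-zero vectors):

```python
def one_hot_byte(b, depth=256):
    """One-hot encode one byte value: a vector with a 1 at index b
    and 0 elsewhere (the formula's num parameter fixed to 1)."""
    vec = [0] * depth
    vec[b] = 1
    return vec

def one_hot_packet(packet_row):
    """Concatenate the one-hot encodings of every byte in a
    fixed-length packet row; -1 padding becomes an all-zero block."""
    encoded = []
    for b in packet_row:
        encoded.extend(one_hot_byte(b) if b >= 0 else [0] * 256)
    return encoded

row = [0x45, 0x00, -1]          # two real bytes plus one padding value
enc = one_hot_packet(row)       # 3 * 256 concatenated bits
```

Each byte thus contributes one block of the concatenation ⊕ in formula (2).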

4. DL-IDS Architecture

This section introduces the traffic classifier we built into DL-IDS, which uses a combination of CNN and LSTM to learn and classify traffic packets in both time and space. According to the different characteristics of different types of traffic, we also used class weights to improve the stability of the model.

The overall architecture of the classifier is shown in Figure 4. The classifier is composed of a CNN and an LSTM. The CNN section is composed of an input and embedding layer, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. Upon receiving a preprocessed PKL file, the CNN section processes it and returns a high-dimensional package vector to the LSTM section. The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the output layer. It processes a series of high-dimensional package vectors and outputs a vector representing the probability that the session belongs to each class. The Softmax layer outputs the final classification result according to this probability vector.

4.1. CNN in DL-IDS. We converted the data packets obtained from the preprocessed data into a traffic image. The so-called traffic image is a combination of all or part of the bit values of a network traffic packet arranged into a two-dimensional matrix. The data in the network traffic packet is composed of bytes. The value range of a byte is 0–255, which is the same as the value range of pixels in images. We took x bytes from the header of a packet and y bytes from the payload and composed them into a traffic image for subsequent processing, as discussed below.

As mentioned above, the CNN section is composed of input and embedding layers, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. We combined convolution layer 1 and pooling layer 2 into Combination I and convolution layer 3 and pooling layer 4 into Combination II. Each Combination allows the input layer characteristics to be analyzed from a different perspective. In Combination I, a convolution layer with a small convolution kernel is used to extract local features of traffic image details (e.g., IP and port); clear-cut features and stable results can be obtained in the pooling layer. In Combination II, a large convolution kernel is used to analyze the relationship between two bits that are far apart, such as information in the traffic payload.

After preprocessing and one_hot coding, the network traffic constitutes the input vector of the input layer. In the input layer, Length bytes of information are intercepted from the ith packet, Pkg_i = (B_1, B_2, ..., B_Length), followed by the synthesis of n Pkgs into the information set S = (Pkg_1, Pkg_2, ..., Pkg_n).

Formula (3) shows a convolution layer, where f is the side length of the convolution kernel (in the two convolution layers, f = 7 and f = 5), s is the stride, p is the padding, b is the bias, w is the weight, c is the number of channels, l is the layer index, L_l is the size of Z^l, and Z(i, j) is the pixel of the corresponding feature map. Additionally, (i, j) ∈ {0, 1, 2, ..., L_{l+1}}, where L_{l+1} = ((L_l + 2p − f)/s) + 1.

Figure 3: Structure of the entire PKL file. A PKL file (a format for storing files in Python for direct code reading) contains sessions (packages consisting of two-way flows); each session is composed of flows (packages with the same 5-tuple), and each flow is composed of packets (data units in TCP/IP protocol communication and transmission).

Figure 2: Traffic segmentation process (pcap files from Step 1 → traffic to flow → merge the flow → output).


\[
Z^{l+1}(i, j) = \sum_{k=1}^{c} \sum_{x=1}^{f} \sum_{y=1}^{f}
\left[ Z_k^{l}(s \cdot i + x,\; s \cdot j + y) \cdot w_k^{l+1}(x, y) \right] + b
\qquad (3)
\]
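Formula (3) can be checked with a small NumPy sketch (single output channel, toy sizes; an illustration of the formula, not the paper's implementation):

```python
import numpy as np

def conv2d(Z, W, b, s=1, p=0):
    """Convolution per formula (3): Z is (c, L, L), W is (c, f, f),
    b is a scalar bias; stride s, zero padding p. The output side
    length is (L + 2p - f) // s + 1."""
    c, L, _ = Z.shape
    f = W.shape[1]
    Zp = np.pad(Z, ((0, 0), (p, p), (p, p)))        # zero padding
    L_out = (L + 2 * p - f) // s + 1
    out = np.full((L_out, L_out), b, dtype=float)   # start from bias
    for i in range(L_out):
        for j in range(L_out):
            patch = Zp[:, s * i : s * i + f, s * j : s * j + f]
            out[i, j] += np.sum(patch * W)          # triple sum over c, x, y
    return out

Z = np.ones((1, 8, 8))   # one 8x8 input channel
W = np.ones((1, 3, 3))   # one 3x3 kernel
out = conv2d(Z, W, b=1.0)  # every output pixel is 3*3*1 + 1 = 10
```

With L = 8, f = 3, s = 1, p = 0, the output side length is (8 − 3)/1 + 1 = 6, matching the size relation stated above.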

The convolution layer contains an activation function (formula (4)) that assists in the expression of complex features, where K is the number of channels in the feature graph and A represents the output of the Z vector through the activation function. We used sigmoid and ReLU, respectively, after the two convolution layers.

\[
A_{ijk}^{l} = f\left(Z_{ijk}^{l}\right) \qquad (4)
\]

After feature extraction in the convolution layer, the output image is transferred to the pooling layer for feature selection and information filtering. The pooling layer contains a preset pooling function that replaces the result of a single point in the feature map with the statistics of its adjacent region. The pooling layer is calculated by formula (5), where p is a prespecified parameter. We applied maximum pooling in this study, that is, p → ∞.

\[
A_k^{l}(i, j) = \left[ \sum_{x=1}^{f} \sum_{y=1}^{f}
A_k^{l}(s \cdot i + x,\; s \cdot j + y)^{p} \right]^{1/p}
\qquad (5)
\]

We also used a back-propagation algorithm to adjust the model parameters. In the weight adjustment algorithm (formula (6)), δ is the delta error of the loss function with respect to the layer, and α is the learning rate.

\[
w^{l} = w^{l} - \alpha \sum \delta^{l} \cdot A^{l-1} \qquad (6)
\]

We used the categorical cross-entropy algorithm as the loss function. To reduce training time and enhance gradient descent accuracy, we used the RMSProp optimization function to adjust the learning rate.

After two convolution and pooling operations, we extracted the entire traffic image into a smaller feature block, which represents the feature information of the whole traffic packet. The block can then be fed into the RNN system as an input to the RNN layer.

4.2. LSTM in DL-IDS. Normal network communications and network attacks are both carried out according to a certain network protocol. This means that attack packets must be ensconced in traffic alongside packets containing fixed parts of the network protocol, such as normal connection establishments, key exchanges, connections, and disconnections. In the normal portion of the attack traffic, no data can be used to determine whether a packet is intended to cause an attack. Using a CNN alone to train on the characteristics of a single packet as the basis for judging the nature of the traffic makes the data difficult to mark, leaves too much "dirty" data in the traffic, and produces altogether poor training results. In this study, we remedied this by introducing the LSTM, which takes the data of a single connection (from initiation to disconnection) as a group and judges the characteristics of all data packets in that group, and the relations among them, as a basis for judging the nature of the traffic. The natural language processing model performs well in traffic information processing [32] under a methodology similar to the grouping judgment method proposed here.

The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the Softmax and output layers. The main functions are realized by the two LSTM layers. The LSTM is a special RNN designed to resolve the gradient disappearance and gradient explosion problems in the process of long-sequence training. General RNN networks have only one tanh layer, while LSTM networks perform better at timing prediction through their unique forgetting and selective memory gates. Here, we call the LSTM node a cell (C_t), the input and output of which are x_t and h_t, respectively.

The first step in the LSTM layer is to determine what information the model will discard from the cell state. This decision is made through the forgetting gate (formula (7)). The gate reads h_{t−1} and x_t and outputs a value between 0 and 1 for each number in the cell state C_{t−1}: 1 means "completely retained" and 0 means "completely discarded". W and b are the weight and bias in the neural network, respectively.

\[
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (7)
\]

Figure 4: Architecture of DL-IDS (Input → Convolution C1 → MaxPool S2 → Convolution C3 → MaxPool/GlobalMaxPool S4 → S5 → LSTM L6 → LSTM L7 → F8 → Softmax → Output).


The next step is to decide how much new information to add to the cell state. First, a sigmoid layer determines which information needs to be updated (formula (8)). A tanh layer then generates a candidate vector for the update (formula (9)). The two parts are combined to update the state of the cell (formula (10)).

\[
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (8)
\]

\[
\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \qquad (9)
\]

\[
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \qquad (10)
\]

The output gate determines the output of the cell. First, a sigmoid layer determines which parts of the cell state are exported (formula (11)). Next, the cell state is processed through a tanh function to obtain a value between −1 and 1 and multiplied by the output of the sigmoid gate; the output is determined accordingly (formula (12)).

\[
o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (11)
\]

\[
h_t = o_t * \tanh\left(C_t\right) \qquad (12)
\]
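Formulas (7)–(12) amount to the following single-step computation (a NumPy sketch with toy dimensions and zero weights for illustration, not the trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following formulas (7)-(12). W holds the
    four gate weight matrices (f, i, c, o) applied to [h_{t-1}, x_t];
    b holds the four biases."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, formula (7)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, formula (8)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, formula (9)
    c_t = f_t * c_prev + i_t * c_tilde       # cell state update, formula (10)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, formula (11)
    h_t = o_t * np.tanh(c_t)                 # hidden output, formula (12)
    return h_t, c_t

# Tiny example: 2-dim input, 3-dim hidden state, all-zero weights,
# so every gate evaluates to sigmoid(0) = 0.5.
n_h, n_x = 3, 2
W = {k: np.zeros((n_h, n_h + n_x)) for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = lstm_step(np.ones(n_x), np.zeros(n_h), np.ones(n_h), W, b)
```

With zero weights, the cell state halves (c = 0.5 · c_prev) and the output is 0.5 · tanh(0.5), which makes the gate arithmetic easy to verify by hand.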

In the proposed model, the feature maps of the n data packets in a group of traffic images in a connection serve as the input of the LSTM section. The feature relations between these n data packets are analyzed through the two LSTM layers. The first few packets may be used to establish connections; such packets may exist in normal data streams, but they may occur in attack data streams too. The next few packets may contain long payloads as well as attack data. The LSTM finds the groups containing attack data and marks all packets of those whole groups as attack groups.

LSTM layer 6 in DL-IDS has a linear activation function designed to minimize training time; LSTM layer 7 is nonlinearly activated through the ReLU function. The flow comprises a multiclassification system, so the model is trained to minimize multiclass cross-entropy. We did not update the ownership weight at every step but instead only needed to set the initial weight according to the volume of the various types of data.

4.3. Weight of Classes. The data obtained after preprocessing is shown in Table 3, where the quantities of the different data types are clearly uneven. The number of type 0 is the highest, while those of types 2 and 4 are the lowest. This may affect the final learning outcome of the classification: for example, if the machine were to judge all the traffic as type 0, the accuracy of the model would still seem relatively high. We introduced class weights to resolve this problem; classes with different sample numbers were given different weights. class_weight is set according to the number of samples, and class_weight[i] is used instead of 1 to penalize errors on class i samples. A higher class_weight means a greater emphasis on the class. Compared with the case without weights, more samples are classified into high-weight classes.

The class weight is calculated via formula (13), where w_i represents the class weight of class i and n_i represents the amount of traffic of class i:

\[
w_i = \frac{\sum_{j=0}^{K-1} n_j}{n_i} \qquad (13)
\]

When training the model, the weighted loss function in formula (14) makes the model focus more on samples from underrepresented classes. K is the number of categories, y is the label (if the sample category is i, then y_i = 1; otherwise, y_i = 0), and p is the output of the neural network, i.e., the probability that the model predicts category i, calculated by Softmax in this model. The loss function J is defined as follows:

\[
J = -\sum_{i=0}^{K-1} w_i\, y_i \log\left(p_i\right) \qquad (14)
\]
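Formulas (13) and (14) together can be sketched in a few lines of Python (toy class counts for illustration; the real counts are in Table 3):

```python
import math

def class_weights(counts):
    """Formula (13): w_i = (sum of all class counts) / n_i, so rarer
    classes receive proportionally larger weights."""
    total = sum(counts)
    return [total / n for n in counts]

def weighted_loss(weights, y_true, p):
    """Formula (14): weighted categorical cross-entropy for one sample;
    y_true is one-hot, p is the predicted probability vector."""
    return -sum(w * y * math.log(pi)
                for w, y, pi in zip(weights, y_true, p))

counts = [900, 90, 10]              # toy, heavily imbalanced distribution
w = class_weights(counts)           # rarest class gets weight 100
loss = weighted_loss(w, [0, 0, 1], [0.1, 0.1, 0.8])
```

Misclassifying a sample of the rarest class here costs 100× the unweighted penalty, which is exactly how the weighting counteracts the imbalance.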

5. Experimental Results and Analysis

We evaluated the performance of the proposed model on the CICIDS2017 dataset over a series of selected parameters: (1) the impact of the length of data packets involved in training, (2) the influence of the number of packets in each flow, (3) the impact of the selected batch size, (4) the effect of the number of units in the LSTM, and (5) the influence of the class weights. We optimized the DL-IDS parameters accordingly and then compared the result against a sole CNN and a sole LSTM. The ratio of Train set, Validation set, and Test set is 18 : 1 : 1.
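The 18 : 1 : 1 split can be expressed as follows (an illustrative sketch; the paper does not state whether samples are shuffled before splitting, which one would normally do first):

```python
def split_18_1_1(samples):
    """Split a sample list into train/validation/test in the paper's
    18:1:1 ratio, i.e., 90% / 5% / 5%."""
    n = len(samples)
    n_train = n * 18 // 20
    n_val = n // 20
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

train, val, test = split_18_1_1(list(range(200)))  # 180 / 10 / 10
```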

5.1. Metrics. We adopted four commonly used parameters for evaluating intrusion detection systems: accuracy (ACC), true positive rate (TPR), false positive rate (FPR), and F1-score. ACC represents the overall performance of the model, TPR represents the ratio of real positive samples correctly identified among all positive samples, and FPR represents the ratio of negative samples wrongly assigned to the positive class among all negative samples. Recall is the number of True Positives divided by the sum of the number of True Positives and the number of False Negatives. Precision is the number of correct positive predictions divided by the total number of positive predictions. F1-score is a measure of a classifier's accuracy, defined as the weighted harmonic mean of the Precision and Recall of the classifier.

Table 3: Quantity of data per category in CICIDS2017.

Label | Attack       | Num
0     | Normal       | 477584
1     | FTP-Patator  | 11870
2     | SSH-Patator  | 7428
3     | DoS          | 63240
4     | Heartbleed   | 4755
5     | Infiltration | 64558
6     | PortScan     | 160002


\[
\begin{aligned}
\mathrm{ACC} &= \frac{TP + TN}{TP + FP + FN + TN} \times 100\%\\[4pt]
\mathrm{TPR} &= \frac{TP}{TP + FN} \times 100\%\\[4pt]
\mathrm{FPR} &= \frac{FP}{FP + TN} \times 100\%\\[4pt]
\mathrm{Precision} &= \frac{TP}{TP + FP} \times 100\%\\[4pt]
\mathrm{Recall} &= \frac{TP}{TP + FN} \times 100\%\\[4pt]
\mathrm{F1\text{-}score} &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100\%
\end{aligned}
\qquad (15)
\]

For each type of attack, TP is the number of samples correctly classified as this type, TN is the number of samples correctly classified as not this type, FP is the number of samples incorrectly classified as this type, and FN is the number of samples incorrectly classified as not this type. The definitions of TP, TN, FP, and FN are given in Figure 5.

5.2. Experimental Environment. The experimental configuration we used to evaluate the model parameters is described in Table 4.

5.3. LSTM Unit Quantity Effects on Model Performance. The number of units in the LSTM represents the model's output dimension. In our experiments, we found that model performance first improves and then begins to decline as the number of LSTM units increases. We ultimately selected 85 as the optimal number of LSTM units.

5.4. Training Packet Length Effects on Model Performance. Figure 6 shows the changes in ACC, TPR, and FPR as the length of packets extracted during training increases. As per the training results, model performance significantly declines when the packet length exceeds 70. It is possible that excessively long training packets increase the proportion of data packets smaller than the current packet length, leading to an increase in the proportion of units filled with −1 and thus reducing the accuracy of the model. However, the data packet must exceed a certain length to ensure that effective, credible, and scientific content is put into training; this also prevents overfitting effects and provides more accurate classification of data packets with partial header similarity. We found that a length of 40 is optimal.

Under the condition that the packet length is 40, the efficiency and performance of the DL-IDS intrusion detection system in identifying various kinds of traffic are shown in Table 5.

5.5. Per-Flow Packet Quantity Effects on Model Performance. As the number of data packets in each flow involved in the training process increases, the features extracted by the model become more obvious and the recognition accuracy

Figure 5: TP, TN, FP, and FN in multiple classifications (a confusion matrix of actual versus predicted classes, annotating the true positive, true negative, false positive, and false negative regions for one class).

Table 4: Experimental environment.

OS       | CentOS Linux release 7.5.1804
CPU      | Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz
RAM      | 126 GB
Anaconda | 4.5.11
Keras    | 2.2.2
Python   | 3.6.5

Figure 6: Impact of training packet length on model performance (ACC, TPR, and FPR rates for packet lengths from 30 to 100).


of the model is enhanced. If this number is too high, however, the proportion of filler packets increases, affecting the model's ability to extract features. Figure 7 shows the impact of the number of packets per flow on model performance. We found that when the number of packets in each flow exceeds 8, the performance of the model declines significantly. We therefore chose 8 as the optimal per-flow packet quantity.

5.6. Batch Size Effects on Model Performance. Batch size is an important parameter in the model training process. Within a reasonable range, increasing the batch size can improve memory utilization and speed up data processing; if increased to an inappropriate extent, however, it can significantly slow down the process. As shown in Figure 8, we found that a batch size of 20 is optimal.

5.7. Class Weight Effects on Model Performance. Table 6 shows a comparison of two groups of experimental results, with and without class weights. Introducing the class weights does appear to reduce the impact of the imbalanced numbers of samples of various types in the CICIDS2017 dataset on model performance.

5.8. Model Evaluation. The LSTM unit can effectively extract the temporal relationship between packets. Table 7 shows a comparison of accuracy between the DL-IDS model and models with the CNN or LSTM alone. The LSTM unit appears to effectively improve the identification efficiency of SSH-Patator, Infiltration, PortScan, and other attack traffic for enhanced model performance, possibly due to the apparent timing of these attacks. Compared to the LSTM model alone, however, adding a CNN further improves the identification efficiency of most attack traffic. As shown in Table 7, the proposed DL-IDS intrusion detection model has a very low false alarm rate and can classify network traffic more accurately than the CNN or LSTM alone.

Table 8 shows a comparison of the models using CNN and LSTM with traditional machine learning algorithms [33]. The DL-IDS model achieves the best performance among them, with the largest ACC value and the lowest FPR value.

The data input to DL-IDS is raw network traffic. There is no separate feature extraction step in the model; the training and testing times include the feature extraction time. The traditional machine learning algorithms do not account for data extraction or processing time, so we could not directly compare the time consumption of the various algorithms in Table 8. The training time and testing time of the model were under 600 s and 100 s, respectively, so we believe that DL-IDS achieves optimal detection effects in the same time frame as the traditional algorithms.

Table 5: Model performance with training packet length of 40.

Label         ACC (%)   TPR (%)   FPR (%)   F1-score (%)
Normal        99.54     99.52     0.00      99.61
FTP-Patator   99.62     92.29     0.27      91.45
SSH-Patator   99.63     87.04     0.00      86.87
DoS           99.61     98.25     0.00      97.87
Heartbleed    99.66     81.12     0.00      81.20
Infiltration  99.55     98.07     0.18      97.54
PortScan      99.54     98.42     0.00      98.71
Overall       98.67     97.21     0.47      93.32

Figure 7: Impact of per-flow packet quantity on model performance (ACC, TPR, and FPR versus packet number per flow).

Figure 8: Selected batch size affecting model performance (ACC, TPR, and FPR versus batch size).

Table 6: Effects of applying class weight on model performance.

        Without class weights (%)   With class weights (%)
ACC     97.16                       98.58
TPR     93.33                       97.04
FPR     0.38                        0.52

Security and Communication Networks 9

6. Conclusions and Future Research Directions

In this study, we proposed a DL-based intrusion detection system named DL-IDS, which utilizes a hybrid of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features from the network data flow to analyze network traffic. In DL-IDS, the CNN and LSTM respectively extract the spatial features of a single packet and the temporal features of the data stream and finally fuse them, which improves the performance of the intrusion detection system. Moreover, DL-IDS uses category weights for optimization in the training phase. This optimization method reduced the adverse effect of the unbalanced number of samples of attack types in the Train set and improved the robustness of the model.

To evaluate the proposed system, we experimented on the CICIDS2017 dataset, which is often used by researchers as a benchmark. Normal traffic data and attack data of six typical types (FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan) were selected to test the ability of DL-IDS to detect attack data. Besides, we also used the same data to test the CNN-only model, the LSTM-only model, and some commonly used machine learning models.

The results show that DL-IDS reached 98.67% and 93.32% in overall accuracy and F1-score, respectively, which performed better than all the machine learning models. Also, compared with the CNN-only model and the LSTM-only model, DL-IDS reached over 99.50% in the accuracy of all attack types and achieved the best performance among these three models.

There are yet certain drawbacks to the proposed model, including low detection accuracy on Heartbleed and SSH-Patator attacks due to lack of data. Generative Adversarial Networks (GANs) may be considered to overcome this drawback to some degree. Further, combining with some traditional traffic features may enhance the overall model performance. We plan to resolve these problems through further research.

Data Availability

Data will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Beijing Natural Science Foundation under Grant no. 4194086, the National Natural Science Foundation of China under Grant no. 61702043, and the Key R&D Program of Hebei Province under Grant no. 201313701D.

References

[1] M. M. Hassan, A. Gumaei, S. Huda, and A. Ahmad, "Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, 2020.

[2] F. A. Khan and A. Gumaei, "A comparative study of machine learning classifiers for network intrusion detection," in Proceedings of the International Conference on Artificial Intelligence and Security, pp. 75–86, New York, NY, USA, July 2019.

[3] M. Alqahtani, A. Gumaei, M. Mathkour, and M. Maher Ben Ismail, "A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks," Sensors, vol. 19, no. 20, p. 4383, 2019.

[4] A. Derhab, M. Guerroumi, A. Gumaei et al., "Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security," Sensors, vol. 19, no. 14, p. 3119, 2019.

[5] T. Yaqoob, H. Abbas, and M. Atiquzzaman, "Security vulnerabilities, attacks, countermeasures, and regulations of networked medical devices - a review," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3723–3768, 2019.

[6] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, "Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch," Information Fusion, vol. 51, pp. 100–113, 2019.

[7] Z. A. Baig, S. Sanguanpong, S. N. Firdous et al., "Averaged dependence estimators for DoS attack detection in IoT networks," Future Generation Computer Systems, vol. 102, pp. 198–209, 2020.

Table 7: CNN, LSTM, and CNN+LSTM results (all values in %).

Label          CNN                          LSTM                         CNN+LSTM
               ACC    TPR    FPR   F1       ACC    TPR    FPR   F1       ACC    TPR    FPR   F1
Normal         99.57  99.56  0.00  99.64    99.56  99.55  0.00  99.63    99.54  99.52  0.00  99.61
FTP-Patator    99.45  91.95  0.27  89.95    99.34  89.27  0.15  91.72    99.62  92.29  0.27  91.45
SSH-Patator    99.53  80.90  0.00  87.22    98.91  79.91  0.00  86.66    99.63  87.04  0.00  86.87
DoS            99.53  98.04  0.00  97.86    98.55  94.85  0.13  89.89    99.61  98.25  0.00  97.87
Heartbleed     99.64  80.08  0.00  80.88    99.63  81.12  0.07  77.69    99.66  81.12  0.00  81.20
Infiltration   99.59  97.91  0.07  97.75    98.84  96.80  1.16  97.17    99.55  98.07  0.18  97.54
PortScan       99.35  97.48  0.00  98.51    99.42  97.99  0.00  94.04    99.54  98.42  0.00  98.71
Overall        98.44  96.46  0.36  93.11    96.83  94.21  1.54  90.97    98.67  97.21  0.47  93.32

Table 8: CNN and LSTM models versus traditional machine learning algorithms.

                     ACC (%)   TPR (%)   FPR (%)   F1-score (%)
MultinomialNB        72.52     78.20     33.16     52.06
Random Forest        96.08     95.47     3.30      76.71
J48                  97.32     96.80     2.17      91.43
Logistic Regression  97.68     94.96     1.47      90.55
DL-IDS               98.67     97.21     0.47      93.32


[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48–60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709–716, 2018.

[10] H. Xu, F. Mueller, M. Acharya et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26–35, 2018.

[12] H. Xu, C. Fang, Q. Cao et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166–170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108–118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," Recent Developments in Intelligent Computing, Communication and Devices, pp. 559–565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21–26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670–2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21–26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the Symposium Series on Computational Intelligence (SSCI), IEEE, Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86–94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69–79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386–396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V. Ramos and A. Abraham, "ANTIDS: self organized ant based clustering model for intrusion detection system," in Proceedings of the Fourth IEEE International Workshop on Soft Computing as Transdisciplinary Science and Technology (WSTST'05), Springer, Muroran, Japan, May 2005.

[30] I. Sharafaldin, A. H. Lashkari, and A. Ali, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the Fourth International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, January 2018.

[31] X. Chen, "A simple utility to classify packets into flows," https://github.com/caesar0301/pkt2flow.

[32] B. J. Radford and B. D. Richardson, "Sequence aggregation rules for anomaly detection in computer network traffic," 2018, https://arxiv.org/abs/1805.03735v2.

[33] A. Ahmim, M. A. Ferrag, L. Maglaras, M. Derdour, and H. Janicke, "A detailed analysis of using supervised machine learning for intrusion detection," 2019, https://www.researchgate.net/publication/331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection.



segmentation, PKL file generation, PKL file labeling, matrix generation, and one_hot encoding. A flow chart of this process is given in Figure 1.

Step 1 (time division). Time division refers to intercepting the pcap file of the corresponding period from the original pcap file according to the attack time and type [30]. The input format is a pcap file, and the output format is still a pcap file. The time periods corresponding to each specific type and the size of the files are shown in Table 2.

Step 2 (traffic segmentation). Traffic segmentation refers to dividing the pcap file obtained in Step 1 into corresponding sessions by sharding according to the IPs of the attack host and victim host corresponding to each time period [31]. The specific process is shown in Figure 2. This step involves shredding the pcap file of Step 1 into the corresponding flows using pkt2flow [31], which can split the pcap package into the flow format (i.e., the original pcap package is divided into different pcap packages according to flows with different five-tuples). Next, the pcap packages are merged under the premise that source and destination are interchangeable. Finally, the pcap package of Step 1 is divided into sessions.
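The merging step can be illustrated with a small sketch (our own simplified illustration, not the pkt2flow implementation; function names are hypothetical): packets are keyed by their five-tuple with the two endpoints treated as interchangeable, so both directions of a connection fall into the same session.

```python
def session_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-agnostic session key from a packet's five-tuple.

    Sorting the two endpoints means (A -> B) and (B -> A) packets map to
    the same key, merging both flow directions into one session.
    """
    end_a = (src_ip, src_port)
    end_b = (dst_ip, dst_port)
    return (proto,) + tuple(sorted([end_a, end_b]))

def group_into_sessions(packets):
    """Group (src_ip, src_port, dst_ip, dst_port, proto, payload) tuples
    into sessions keyed by their direction-agnostic five-tuple."""
    sessions = {}
    for src_ip, src_port, dst_ip, dst_port, proto, payload in packets:
        key = session_key(src_ip, src_port, dst_ip, dst_port, proto)
        sessions.setdefault(key, []).append(payload)
    return sessions
```

With this key, the split-then-merge of Figure 2 reduces to a single grouping pass over the packets of each time window.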

Step 3 (generate the PKL file) and Step 4 (tag the PKL file). As shown in Table 1, the pcap file is still large after extraction; this creates a serious challenge for data reading in the model. To accelerate the data reading process, we packaged the traffic using the pickle tool in Python. We use PortScan-type traffic as an example of the packaging process here. In this class, many sessions are generated after Step 2, each of which is saved in a pcap file. We generated a label of the corresponding attack type for each session. Each session contains several data flows, and each data flow contains n1 packets. We then saved n2 sessions in a PKL file to speed up the process of reading the data. n1 can be changed as needed; according to the experimental results, we finally selected the best value, that is, n1 = 8. The value of n2 can be calculated by formula (1). In this case, we packaged each type of sessions into a PKL file. The structure of the entire PKL file is shown in Figure 3.

\[ n_2 = \frac{\text{total number of packets of this type}}{n_1} \tag{1} \]
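Steps 3 and 4 can be sketched as follows (a simplified illustration, assuming each session is already a list of raw packet byte strings; the function and record names are ours, not the paper's code):

```python
import pickle

N1 = 8  # packets kept per flow: the best value found experimentally

def package_sessions(sessions, label, path):
    """Save labeled sessions to a PKL file for fast reading during training.

    Each session is truncated to its first N1 packets; the attack-type
    label is stored alongside, folding Step 4 (tagging) into the same pass.
    """
    records = [{"label": label, "packets": s[:N1]} for s in sessions]
    with open(path, "wb") as f:
        pickle.dump(records, f)

def load_sessions(path):
    """Read a PKL file back; much faster than reparsing the raw pcap."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Reading a pickled list of byte arrays avoids reparsing pcap headers on every epoch, which is the stated motivation for this step.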

Step 5 (matrix generation). The input of the model must have a fixed length, so the next step is to unify the length of each session. The difference between each attack is mainly in the header, so we dealt with each packet according to the uniform length of PACKET_LEN bytes; that is, if the packet length was greater than PACKET_LEN, then bytes were intercepted, and if the packet length was less than PACKET_LEN, bytes were filled with -1. Each session is then converted into a matrix of MAX_PACKET_NUM_PER_SESSION * PACKET_LEN. According to the results of our experiment, we finally chose MAX_PACKET_NUM_PER_SESSION as 8 and PACKET_LEN as 40.
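The truncate-or-pad rule above can be written directly (a sketch using the paper's two constants; padding of missing packets with whole -1 rows is our assumption for sessions shorter than 8 packets):

```python
PACKET_LEN = 40                   # bytes kept per packet
MAX_PACKET_NUM_PER_SESSION = 8    # packets kept per session

def session_to_matrix(session):
    """Convert a session (list of packet byte strings) into a fixed
    MAX_PACKET_NUM_PER_SESSION x PACKET_LEN matrix of byte values.

    Packets longer than PACKET_LEN are truncated; shorter packets (and
    missing packets) are padded with the fill value -1.
    """
    matrix = []
    for pkt in session[:MAX_PACKET_NUM_PER_SESSION]:
        row = list(pkt[:PACKET_LEN])                 # byte values 0-255
        row += [-1] * (PACKET_LEN - len(row))        # pad short packets
        matrix.append(row)
    while len(matrix) < MAX_PACKET_NUM_PER_SESSION:  # pad short sessions
        matrix.append([-1] * PACKET_LEN)
    return matrix
```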

Step 6 (one_hot encoding). To let the model learn and classify effectively, the data from Step 5 are processed by one_hot encoding to convert qualitative features into quantitative features.

Table 1: CICIDS2017 dataset.

Date        Type                                        Size
Monday      Normal                                      11.0 GB
Tuesday     Normal + Force + SFTP + SSH                 11.0 GB
Wednesday   Normal + DoS + Heartbleed Attacks           13 GB
Thursday    Normal + XSS + Web Attack + Infiltration    7.8 GB
Friday      Normal + Botnet + PortScan + DDoS           8.3 GB

Figure 1: Data preprocessing process. The pcap files pass through six steps before being input into the model: time division (Step 1), traffic segmentation (Step 2), PKL file generation (Step 3), PKL file labeling (Step 4), length unification and matrix generation (Step 5), and one_hot encoding (Step 6).

Table 2: Distribution of network traffic periods in CICIDS2017 dataset.

Attack        Time                       Size
Normal        Monday                     11 GB
FTP-Patator   Tuesday (9:20–10:20)       12 MB
SSH-Patator   Tuesday (14:00–15:00)      25 MB
DoS           Wednesday (9:47–11:23)     2.3 GB
Heartbleed    Wednesday (15:12–15:32)    79 MB
Infiltration  Thursday (14:19–15:45)     21 MB
PortScan      Friday (13:55–14:35)       31 MB


\[
ohe_i(B, num) =
\begin{cases}
num, & B = i \\
0, & B \neq i
\end{cases}
\]

\[
OHE_j = \left( ohe_{j,1}, \ldots, ohe_{j,\mathrm{Length}} \right) = ohe_1 \oplus \cdots \oplus ohe_{\mathrm{Length}},
\qquad
Input_k =
\begin{pmatrix}
OHE_1 \\
OHE_2 \\
\vdots \\
OHE_N
\end{pmatrix} \tag{2}
\]

where B is a byte in the data packet and num is the number used for encoding in one_hot encoding (in the model implementation, num = 1), ohe_i is a bit in the one_hot encoding of a byte, ⊕ is the series (concatenation) notation, and OHE_j is the one_hot encoding of a byte.
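Formula (2) can be sketched in a few lines (our illustration; treating the -1 padding value as an all-zero vector is our assumption, since the paper does not say how padding bytes are encoded):

```python
NUM = 1  # the value placed at the byte's position (num = 1 in the paper)

def ohe_byte(b, size=256):
    """One_hot encode a single byte value: a length-`size` vector that is
    NUM at index `b` and 0 elsewhere, per formula (2)."""
    vec = [0] * size
    if 0 <= b < size:      # padding value -1 stays all-zero (our choice)
        vec[b] = NUM
    return vec

def encode_packet(row):
    """Concatenate (series notation) the one_hot encodings of every byte
    in one packet row of the Step 5 matrix."""
    out = []
    for b in row:
        out.extend(ohe_byte(b))
    return out
```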

4. DL-IDS Architecture

This section introduces the traffic classifier we built into DL-IDS, which uses a combination of CNN and LSTM to learn and classify traffic packets in both time and space. According to the different characteristics of different types of traffic, we also used class weights to improve the stability of the model.

The overall architecture of the classifier is shown in Figure 4. The classifier is composed of a CNN and an LSTM. The CNN section is composed of an input and embedding layer, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. Upon receiving a preprocessed PKL file, the CNN section processes it and returns a high-dimensional package vector to the LSTM section. The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the output layer. It can process a series of high-dimensional package vectors and output a vector that represents the probability that the session belongs to each class. The Softmax layer outputs the final result of the classification according to the vector of probabilities.

4.1. CNN in DL-IDS. We converted the data packets obtained from the preprocessed data into a traffic image. The so-called traffic image is a combination of all or part of the bit values of a network traffic packet into a two-dimensional matrix. The data in the network traffic packet are composed of bytes, whose value range of 0–255 is the same as the value range of bytes in images. We took the x bytes in the header of a packet and the y bytes in the payload and composed them into a traffic image for subsequent processing, as discussed below.

As mentioned above, the CNN section is composed of the input and embedding layers, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. We combined convolution layer 1 and pooling layer 2 into Combination I, and convolution layer 3 and pooling layer 4 into Combination II. Each Combination allows for the analysis of input layer characteristics from a different perspective. In Combination I, a convolution layer with a small convolution kernel is used to extract local features of traffic image details (e.g., IP and port); clear-cut features and stable results can be obtained in the pooling layer. In Combination II, a large convolution kernel is used to analyze the relationship between two bits that are far apart, such as information in the traffic payload.

After preprocessing and one_hot coding, the network traffic constitutes the input vector of the input layer. In the input layer, Length bytes of information are intercepted from the ith packet, \( Pkg_i = (B_1, B_2, \ldots, B_{\mathrm{Length}}) \), followed by the synthesis S of n Pkgs' information, \( S = (Pkg_1, Pkg_2, \ldots, Pkg_n) \).

Formula (3) shows a convolution layer, where f is the side length of the convolution kernel (in the two convolution layers, f = 7 and f = 5), s is stride, p is padding, b is bias, w is weight, c is channel, l is layer, \( L_l \) is the size of \( Z^l \), and Z(i, j) is the pixel of the corresponding feature map. Additionally, \( (i, j) \in \{0, 1, 2, \ldots, L_{l+1}\} \), where \( L_{l+1} = ((L_l + 2p - f)/s) + 1 \).

Figure 3: Structure of the entire PKL file. A PKL file (a format for storing files in Python for direct code reading) contains sessions (packages consisting of two-way flows), which contain flows (packages with the same five-tuple), which contain packets (the data units in TCP/IP protocol communication and transmission).

Figure 2: Traffic segmentation process: the pcap files from Step 1 are split into flows, the flows are merged into sessions, and the sessions are output.


\[
Z^{l+1}(i, j) = \sum_{k=1}^{c} \sum_{x=1}^{f} \sum_{y=1}^{f} \left[ Z^{l}_{k}(s \cdot i + x,\; s \cdot j + y) \cdot w^{l+1}_{k}(x, y) \right] + b \tag{3}
\]

The convolution layer contains an activation function (formula (4)) that assists in the expression of complex features. K is the number of channels in the characteristic graph, and A represents the output of the Z vector through the activation function. We used sigmoid and ReLU, respectively, after the two convolution layers.

\[ A^{l}_{ijk} = f\left( Z^{l}_{ijk} \right) \tag{4} \]

After feature extraction in the convolution layer, the output image is transferred to the pooling layer for feature selection and information filtering. The pooling layer contains a preset pooling function that replaces the result of a single point in the feature map with the feature graph statistics of its adjacent region. The pooling layer is calculated by formula (5), where p is a prespecified parameter. We applied maximum pooling in this study, that is, \( p \to \infty \).

\[
A^{l}_{k}(i, j) = \left[ \sum_{x=1}^{f} \sum_{y=1}^{f} A^{l}_{k}(s \cdot i + x,\; s \cdot j + y)^{p} \right]^{1/p} \tag{5}
\]

We also used a back-propagation algorithm to adjust the model parameters. In the weight adjustment formula (formula (6)), δ is the delta error of the loss function with respect to the layer, and α is the learning rate.

\[ w^{l} = w^{l} - \alpha \sum \delta^{l} \cdot A^{l-1} \tag{6} \]

We used the categorical cross-entropy algorithm in the loss function. In order to reduce training time and enhance gradient descent accuracy, we used the RMSProp optimization function to adjust the learning rate.

After two convolution and pooling operations, we extracted the entire traffic image into a smaller feature block, which represents the feature information of the whole traffic packet. The block can then be fed into the RNN system as an input to the RNN layer.

4.2. LSTM in DL-IDS. Normal network communications and network attacks are both carried out according to a certain network protocol. This means that attack packets must be ensconced in traffic alongside packets containing fixed parts of the network protocol, such as normal connection establishments, key exchanges, connections, and disconnections. In the normal portion of the attack traffic, no data can be used to determine whether a packet is intended to cause an attack. Using a CNN alone to train on the characteristics of a single packet as the basis for the system to judge the nature of the traffic makes the data difficult to mark, leaves too much "dirty" data in the traffic, and produces altogether poor training results. In this study, we remedied this by introducing the LSTM, which takes the data of a single connection (from initiation to disconnection) as a group and judges the characteristics of all data packets in said group, and the relations among them, as a basis to judge the nature of the traffic. The natural language processing model performs well in traffic information processing [32] under a similar methodology to the grouping judgment method proposed here.

The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the Softmax and output layers. The main functions are realized by the two LSTM layers. The LSTM is a special RNN designed to resolve the gradient disappearance and gradient explosion problems in the process of long-sequence training. General RNN networks only have one tanh layer, while LSTM networks achieve better timing prediction through their unique forgetting and selective memory gates. Here, we call the LSTM node a cell \( (C_t) \), the input and output of which are \( x_t \) and \( h_t \), respectively.

The first step in the LSTM layer is to determine what information the model will discard from the cell state. This decision is made through the forgetting gate (formula (7)). The gate reads \( h_{t-1} \) and \( x_t \) and outputs a value between 0 and 1 for each number in the cell state \( C_{t-1} \): 1 means "completely retained" and 0 means "completely discarded." W and b are the weights and biases in the neural network, respectively.

\[ f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right) \tag{7} \]

Figure 4: Architecture of DL-IDS: input, convolution C1, max-pooling S2, convolution C3, max-pooling S4, global max-pooling S5, LSTM L6, LSTM L7, full connection F8, Softmax, and output.


The next step is to decide how much new information to add to the cell state. First, a sigmoid layer determines which information needs to be updated (formula (8)). A tanh layer generates a vector as a candidate for updating (formula (9)). The two parts are then combined to make an update to the state of the cell (formula (10)).

\[ i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right) \tag{8} \]

\[ \tilde{C}_t = \tanh\left( W_c \cdot [h_{t-1}, x_t] + b_c \right) \tag{9} \]

\[ C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \tag{10} \]

The output gate determines the output of the cell. First, a sigmoid layer determines which parts of the cell state are exported (formula (11)). Next, the cell state is processed through a tanh function to obtain a value between -1 and 1 and then multiplied by the output of the sigmoid gate; the output is determined accordingly (formula (12)).

\[ o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right) \tag{11} \]

\[ h_t = o_t \cdot \tanh\left( C_t \right) \tag{12} \]
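Formulas (7)–(12) amount to the standard LSTM cell update; one time step can be sketched in NumPy as follows (our illustration, not the paper's Keras internals; the dict-of-gates parameterization is ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step implementing formulas (7)-(12).

    W maps the concatenated [h_{t-1}, x_t] to each gate's preactivation;
    b holds the matching biases, keyed by gate name 'f', 'i', 'c', 'o'.
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, formula (7)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, formula (8)
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate state, formula (9)
    c_t = f_t * c_prev + i_t * c_hat         # cell update, formula (10)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, formula (11)
    h_t = o_t * np.tanh(c_t)                 # hidden output, formula (12)
    return h_t, c_t
```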

In the proposed model, the feature maps of the n data packets in a group of traffic images in a connection serve as the input of the LSTM section. The feature relations between these n data packets were analyzed through the two LSTM layers. The first few packets may be used to establish connections; such packets may exist in the normal data streams, but they may occur in the attack data streams too. The next few packets may contain long payloads as well as attack data. The LSTM finds the groups containing attack data and marks all packets of those whole groups as attack groups.

LSTM layer 6 in DL-IDS has a linear activation function designed to minimize the training time; LSTM layer 7 is nonlinearly activated through the ReLU function. The flow comprises a multiclassification system, so the model is trained to minimize multiclass cross-entropy. We did not update the ownership weight at every step but instead only needed to add the initial weight according to the volume of the various types of data.
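Sections 4.1 and 4.2 together can be approximated in Keras roughly as follows. This is a sketch under our assumptions: the filter counts, embedding treatment (we feed the one_hot vectors directly), dense size, and the TimeDistributed per-packet wrapping are guesses; only the kernel sizes f = 7 and f = 5, the sigmoid/ReLU convolution activations, the 85-unit linear and ReLU LSTM layers, the categorical cross-entropy loss, and the RMSProp optimizer come from the text.

```python
# Sketch of the Figure 4 pipeline; not the authors' exact implementation.
from tensorflow import keras
from tensorflow.keras import layers

MAX_PKT, PKT_LEN, N_CLASSES = 8, 40, 7

# Per-packet CNN: each packet is a PKT_LEN sequence of one_hot byte vectors.
packet_cnn = keras.Sequential([
    layers.Input(shape=(PKT_LEN, 256)),
    layers.Conv1D(32, 7, activation="sigmoid"),  # Combination I  (f = 7)
    layers.MaxPooling1D(2),
    layers.Conv1D(64, 5, activation="relu"),     # Combination II (f = 5)
    layers.MaxPooling1D(2),
    layers.GlobalMaxPooling1D(),
    layers.Dense(128),                           # full connection layer 5
])

# Session-level model: CNN features per packet, then two LSTM layers.
model = keras.Sequential([
    layers.Input(shape=(MAX_PKT, PKT_LEN, 256)),
    layers.TimeDistributed(packet_cnn),          # one feature vector per packet
    layers.LSTM(85, activation="linear", return_sequences=True),  # layer 6
    layers.LSTM(85, activation="relu"),                           # layer 7
    layers.Dense(N_CLASSES, activation="softmax"),  # layer 8 + Softmax
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```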

4.3. Weight of Classes. The data obtained after preprocessing are shown in Table 3, where clearly the quantities ("Num") of the different data types are uneven. The number of type 0 is the highest, while those of types 2 and 4 are the lowest. This may affect the final learning outcome of the classification: for example, if the machine were to judge all the traffic as type 0, the accuracy of the model would seem relatively high. We introduced the weights of classes to resolve this problem; classes with different sample numbers in the classification were given different weights. class_weight is set according to the number of samples, and class_weight[i] is used instead of 1 to punish the errors in the class i samples. A higher class_weight means a greater emphasis on the class. Compared with the case without considering the weights, more samples are classified into high-weight classes.

The class weight is calculated via formula (13), where \( w_i \) represents the class weight of class i and \( n_i \) represents the amount of traffic of class i.

\[ w_i = \frac{\sum_{j=0}^{K-1} n_j}{n_i} \tag{13} \]

When training the model, the weighted loss function in formula (14) makes the model focus more on samples from underrepresented classes. K is the number of categories, y is the label (if the sample category is i, then \( y_i = 1 \); otherwise, \( y_i = 0 \)), and p is the output of the neural network, which is the probability that the model predicts category i, calculated by Softmax in this model. The loss function J is defined as follows:

\[ J = -\sum_{i=0}^{K-1} w_i\, y_i \log\left( p_i \right) \tag{14} \]
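Formula (13) can be computed directly from the per-class counts of Table 3 (a small sketch; the class_weight dict mirrors the Keras-style usage the paper describes, and the function name is ours):

```python
def class_weights(counts):
    """Compute w_i = (sum of all class counts) / n_i, per formula (13)."""
    total = sum(counts.values())
    return {label: total / n for label, n in counts.items()}

# Per-class sample counts from Table 3 (CICIDS2017 after preprocessing).
counts = {0: 477584, 1: 11870, 2: 7428, 3: 63240,
          4: 4755, 5: 64558, 6: 160002}
class_weight = class_weights(counts)
```

The rarest classes (SSH-Patator and Heartbleed, labels 2 and 4) receive the largest weights, so their misclassifications are penalized most heavily in formula (14).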

5. Experimental Results and Analysis

We evaluated the performance of the proposed model on the CICIDS2017 dataset using a series of selected parameters: (1) the impact of the length of data packets involved in training, (2) the influence of the number of packets in each flow, (3) the impact of the selected batch size, (4) the effect of the number of units in the LSTM, and (5) the influence of the weight of classes. We optimized the DL-IDS parameters accordingly and then compared the model against a sole CNN and a sole LSTM. The ratio of Train set, Validation set, and Test set is 18 : 1 : 1.

5.1. Metrics. We adopted four commonly used parameters for evaluating intrusion detection systems: accuracy (ACC), true positive rate (TPR), false positive rate (FPR), and F1-score. ACC represents the overall performance of the model. TPR represents the ratio of real positive samples correctly assigned to the positive class to all positive samples, and FPR represents the ratio of negative samples wrongly assigned to the positive class to the total number of negative samples. Recall represents the number of True Positives divided by the number of True Positives plus the number of False Negatives. Precision represents the number of correct positive predictions divided by the total number of positive predictions. F1-score is a score of a classifier's accuracy, defined as the weighted harmonic mean of the Precision and Recall measures of the classifier.

Table 3: Quantity of data per category in CICIDS2017.

Label   Attack        Num
0       Normal        477584
1       FTP-Patator   11870
2       SSH-Patator   7428
3       DoS           63240
4       Heartbleed    4755
5       Infiltration  64558
6       PortScan      160002


\[
\begin{aligned}
ACC &= \frac{TP + TN}{TP + FP + FN + TN} \times 100\% \\[4pt]
TPR &= \frac{TP}{TP + FN} \times 100\% \\[4pt]
FPR &= \frac{FP}{FP + TN} \times 100\% \\[4pt]
Precision &= \frac{TP}{TP + FP} \times 100\% \\[4pt]
Recall &= \frac{TP}{TP + FN} \times 100\% \\[4pt]
F1\text{-}score &= \frac{2 \times Precision \times Recall}{Precision + Recall} \times 100\%
\end{aligned} \tag{15}
\]

For each type of attack, TP is the number of samples correctly classified as this type, TN is the number of samples correctly classified as not this type, FP is the number of samples incorrectly classified as this type, and FN is the number of samples incorrectly classified as not this type. The definitions of TP, TN, FP, and FN are given in Figure 5.
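Formula (15) translates directly into code (a convenience sketch for computing the per-class metrics from the confusion counts; the function name is ours):

```python
def metrics(tp, tn, fp, fn):
    """Compute ACC, TPR, FPR, and F1-score (as percentages) from one
    class's confusion counts, following formula (15)."""
    acc = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    tpr = 100.0 * tp / (tp + fn)            # identical to Recall
    fpr = 100.0 * fp / (fp + tn)
    precision = 100.0 * tp / (tp + fp)
    f1 = 2.0 * precision * tpr / (precision + tpr)
    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "F1": f1}
```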

5.2. Experimental Environment. The experimental configuration we used to evaluate the model is described in Table 4.

5.3. LSTM Unit Quantity Effects on Model Performance. The number of units in the LSTM represents the model's output dimension. In our experiments, we found that model performance first improves and then begins to decline as the number of LSTM units continually increases. We ultimately selected 85 as the optimal number of LSTM units.

5.4. Training Packet Length Effects on Model Performance. Figure 6 shows the changes in ACC, TPR, and FPR with increase in the length of packets extracted during training. As per the training results, model performance significantly declines when the packet length exceeds 70. It is possible that excessively long training data packets increase the proportion of data packets smaller than the current packet length, leading to an increase in the proportion of units with the filling value of -1 and thus reducing the accuracy of the model. However, the data packet must exceed a certain length to ensure that effective, credible, and scientific content is put into training. This also prevents overfitting effects and provides more accurate classification ability for data packets with partially similar headers. We found that a length of 40 is optimal.

Under the condition that the packet length is 40, the efficiency and performance of the DL-IDS intrusion detection system in identifying various kinds of traffic are shown in Table 5.

5.5. Per-Flow Packet Quantity Effects on Model Performance. As the number of data packets in each flow involved in the training process increases, the features extracted by the model become more obvious and the recognition accuracy

Figure 5: TP, TN, FP, and FN in multiple classifications, illustrated on a confusion matrix of actual versus predicted classes.

Table 4: Experimental environment.

OS        CentOS Linux release 7.5.1804
CPU       Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz
RAM       126 GB
Anaconda  4.5.11
Keras     2.2.2
Python    3.6.5

Figure 6: Impact of training packet length on model performance (ACC, TPR, and FPR versus packet length).


of the model is enhanced. If this number is too high, however, the proportion of filled data packets increases, thus affecting the model's ability to extract features. Figure 7 shows the impact of the number of packets per flow on model performance. We found that when the number of packets in each flow exceeds 8, the performance of the model declines significantly. We chose 8 as the optimal value of per-flow packet quantity.

56 Batch Size Effects onModel Performance Batch size is animportant parameter in the model training processWithin areasonable range increasing the batch size can improve thememory utilization and speed up the data processing Ifincreased to an inappropriate extent however it can sig-nificantly slow down the process As shown in Figure 8 wefound that a batch size of 20 is optimal

5.7. Class Weight Effects on Model Performance. Table 6 shows a comparison of two groups of experimental results, with and without class weights. Introducing the class weight does appear to reduce the impact of the imbalance of the number of data of various types in the CICIDS2017 dataset on model performance.

5.8. Model Evaluation. The LSTM unit can effectively extract the temporal relationship between packets. Table 7 shows a comparison of accuracy between the DL-IDS model and models with the CNN or LSTM alone. The LSTM unit appears to effectively improve the identification efficiency of SSH-Patator, Infiltration, PortScan, and other attack traffic for enhanced model performance, possibly due to the apparent timing of these attacks. Compared to the LSTM model alone, however, adding a CNN further improves the identification efficiency of most attack traffic. As shown in Table 7, the proposed DL-IDS intrusion detection model has a very low false alarm rate and can classify network traffic more accurately than the CNN or LSTM alone.

Table 8 shows a comparison of models using CNN and LSTM with traditional machine learning algorithms [33]. The DL-IDS model achieves the best performance among them, with the largest ACC value and the lowest FPR value.

The data input to DL-IDS is raw network traffic. There is no special feature extraction in the model; the training and testing time therefore include the feature extraction time. The traditional machine learning algorithms do not account for data extraction or processing time, so we could not directly compare the time consumption of the various algorithms in Table 8. The training time and testing time of the model were under 600 s and 100 s, respectively, so we believe that DL-IDS achieves optimal detection effects in the same time frame as the traditional algorithms.

Table 5: Model performance with training packet length of 40.

Label        | ACC (%) | TPR (%) | FPR (%) | F1-score (%)
Normal       | 99.54   | 99.52   | 0.00    | 99.61
FTP-Patator  | 99.62   | 92.29   | 0.27    | 91.45
SSH-Patator  | 99.63   | 87.04   | 0.00    | 86.87
DoS          | 99.61   | 98.25   | 0.00    | 97.87
Heartbleed   | 99.66   | 81.12   | 0.00    | 81.20
Infiltration | 99.55   | 98.07   | 0.18    | 97.54
PortScan     | 99.54   | 98.42   | 0.00    | 98.71
Overall      | 98.67   | 97.21   | 0.47    | 93.32

Figure 7: Impact of per-flow packet quantity on model performance (ACC, TPR, and FPR versus 6 to 16 packets per flow).

Figure 8: Selected batch size affecting model performance (ACC, TPR, and FPR versus batch sizes from 0 to 40).

Table 6: Effects of applying class weight on model performance.

    | Without class weights (%) | With class weights (%)
ACC | 97.16                     | 98.58
TPR | 93.33                     | 97.04
FPR | 0.38                      | 0.52


6. Conclusions and Future Research Directions

In this study, we proposed a DL-based intrusion detection system named DL-IDS, which utilizes a hybrid of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features from the network data flow to analyze the network traffic. In DL-IDS, the CNN and LSTM respectively extract the spatial features of a single packet and the temporal features of the data stream and finally fuse them, which improves the performance of the intrusion detection system. Moreover, DL-IDS uses category weights for optimization in the training phase. This optimization method reduced the adverse effect of the unbalanced number of attack-type samples in the Train set and improved the robustness of the model.

To evaluate the proposed system, we experimented on the CICIDS2017 dataset, which is often used by researchers as a benchmark. Normal traffic data and attack data of six typical types (FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan) were selected to test the ability of DL-IDS to detect attack data. Besides, we also used the same data to test the CNN-only model, the LSTM-only model, and some commonly used machine learning models.

The results show that DL-IDS reached 98.67% and 93.32% in overall accuracy and F1-score, respectively, which outperformed all the machine learning models. Also, compared with the CNN-only model and the LSTM-only model, DL-IDS reached over 99.50% in the accuracy of all attack types and achieved the best performance among these three models.

There are yet certain drawbacks to the proposed model, including low detection accuracy on Heartbleed and SSH-Patator attacks due to lack of data. Generative Adversarial Networks (GAN) may be considered to overcome this drawback to some degree. Further, combining with some traditional traffic features may enhance the overall model performance. We plan to resolve these problems through further research.

Data Availability

Data will be made available upon request

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by Beijing Natural Science Foundation under Grant no. 4194086, the National Natural Science Foundation of China under Grant no. 61702043, and the Key R&D Program of Hebei Province under Grant no. 201313701D.

References

[1] M. M. Hassan, A. Gumaei, S. Huda, and A. Ahmad, "Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, 2020.

[2] F. A. Khan and A. Gumaei, "A comparative study of machine learning classifiers for network intrusion detection," in Proceedings of the International Conference on Artificial Intelligence and Security, pp. 75–86, New York, NY, USA, July 2019.

[3] M. Alqahtani, A. Gumaei, M. Mathkour, and M. Maher Ben Ismail, "A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks," Sensors, vol. 19, no. 20, p. 4383, 2019.

[4] A. Derhab, M. Guerroumi, A. Gumaei et al., "Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security," Sensors, vol. 19, no. 14, p. 3119, 2019.

[5] T. Yaqoob, H. Abbas, and M. Atiquzzaman, "Security vulnerabilities, attacks, countermeasures, and regulations of networked medical devices - a review," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3723–3768, 2019.

[6] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, "Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch," Information Fusion, vol. 51, pp. 100–113, 2019.

[7] Z. A. Baig, S. Sanguanpong, S. N. Firdous et al., "Averaged dependence estimators for DoS attack detection in IoT networks," Future Generation Computer Systems, vol. 102, pp. 198–209, 2020.

Table 7: CNN, LSTM, and CNN+LSTM results. Each group of columns lists ACC, TPR, FPR, and F1-score (%).

Label        | CNN                       | LSTM                      | CNN+LSTM
Normal       | 99.57, 99.56, 0.00, 99.64 | 99.56, 99.55, 0.00, 99.63 | 99.54, 99.52, 0.00, 99.61
FTP-Patator  | 99.45, 91.95, 0.27, 89.95 | 99.34, 89.27, 0.15, 91.72 | 99.62, 92.29, 0.27, 91.45
SSH-Patator  | 99.53, 80.90, 0.00, 87.22 | 98.91, 79.91, 0.00, 86.66 | 99.63, 87.04, 0.00, 86.87
DoS          | 99.53, 98.04, 0.00, 97.86 | 98.55, 94.85, 0.13, 89.89 | 99.61, 98.25, 0.00, 97.87
Heartbleed   | 99.64, 80.08, 0.00, 80.88 | 99.63, 81.12, 0.07, 77.69 | 99.66, 81.12, 0.00, 81.20
Infiltration | 99.59, 97.91, 0.07, 97.75 | 98.84, 96.80, 1.16, 97.17 | 99.55, 98.07, 0.18, 97.54
PortScan     | 99.35, 97.48, 0.00, 98.51 | 99.42, 97.99, 0.00, 94.04 | 99.54, 98.42, 0.00, 98.71
Overall      | 98.44, 96.46, 0.36, 93.11 | 96.83, 94.21, 1.54, 90.97 | 98.67, 97.21, 0.47, 93.32

Table 8: CNN and LSTM models versus traditional machine learning algorithms.

                    | ACC (%) | TPR (%) | FPR (%) | F1-score (%)
MultinomialNB       | 72.52   | 78.20   | 33.16   | 52.06
Random Forest       | 96.08   | 95.47   | 3.30    | 76.71
J48                 | 97.32   | 96.80   | 2.17    | 91.43
Logistic Regression | 97.68   | 94.96   | 1.47    | 90.55
DL-IDS              | 98.67   | 97.21   | 0.47    | 93.32


[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48–60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709–716, 2018.

[10] H. Xu, F. Mueller, M. Acharya et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26–35, 2018.

[12] H. Xu, C. Fang, Q. Cao et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166–170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108–118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," Recent Developments in Intelligent Computing, Communication and Devices, pp. 559–565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21–26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670–2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21–26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the Symposium Series on Computational Intelligence (SSCI), IEEE, Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86–94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69–79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386–396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V. Ramos and A. Abraham, "ANTIDS: self organized ant-based clustering model for intrusion detection system," in Proceedings of the Fourth IEEE International Workshop on Soft Computing as Transdisciplinary Science and Technology (WSTST'05), Springer, Muroran, Japan, May 2005.

[30] I. Sharafaldin, A. H. Lashkari, and A. Ali, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the Fourth International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, January 2018.

[31] X. Chen, "A simple utility to classify packets into flows," https://github.com/caesar0301/pkt2flow.

[32] B. J. Radford and B. D. Richardson, "Sequence aggregation rules for anomaly detection in computer network traffic," 2018, https://arxiv.org/abs/1805.03735v2.

[33] A. Ahmim, M. A. Ferrag, L. Maglaras, M. Derdour, and H. Janicke, "A detailed analysis of using supervised machine learning for intrusion detection," 2019, https://www.researchgate.net/publication/331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection.



$$ \mathrm{ohe}_i(B,\, num) = \begin{cases} num, & B = i \\ 0, & B \ne i \end{cases} $$

$$ \mathrm{OHE}_j = \left(\mathrm{ohe}_j(1), \ldots, \mathrm{ohe}_j(\mathrm{Length})\right) = \mathrm{ohe}_1 \oplus \cdots \oplus \mathrm{ohe}_{\mathrm{Length}} $$

$$ \mathrm{Input}_k = \begin{pmatrix} \mathrm{OHE}_1 \\ \mathrm{OHE}_2 \\ \vdots \\ \mathrm{OHE}_N \end{pmatrix} \qquad (2) $$

where B is a byte in the data packet; num is the number used for encoding in one_hot encoding (in the model implementation, num = 1); ohe_i is a bit in the one_hot encoding of a byte; ⊕ is the series notation; and OHE_j is the one_hot encoding of a byte.
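With num = 1, formula (2) reduces to standard one-hot encoding followed by concatenation of the per-byte encodings. A minimal numpy sketch (the function names are illustrative, not the authors' implementation):

```python
import numpy as np

def ohe_byte(b, num=1, size=256):
    """Formula (2): one-hot encode byte B; position B gets `num`, all others 0."""
    v = np.zeros(size, dtype=np.int8)
    v[b] = num
    return v

def ohe_packet(packet_bytes):
    """Concatenate (the series operator, oplus in formula (2)) the encodings of all bytes."""
    return np.concatenate([ohe_byte(b) for b in packet_bytes])

enc = ohe_packet(b"\x00\xff")
# length = 2 * 256 = 512; positions 0 and 256 + 255 = 511 are set
```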

4. DL-IDS Architecture

This section introduces the traffic classifier we built into DL-IDS, which uses a combination of CNN and LSTM to learn and classify traffic packets in both time and space. According to the different characteristics of different types of traffic, we also used the weight of classes to improve the stability of the model.

The overall architecture of the classifier is shown in Figure 4. The classifier is composed of CNN and LSTM. The CNN section is composed of an input and embedded layer, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. Upon receiving a preprocessed PKL file, the CNN section processes it and returns a high-dimensional package vector to the LSTM section. The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the OUTPUT layer. It can process a series of high-dimensional package vectors and output a vector that represents the probability that the session belongs to each class. The Softmax layer outputs the final result of the classification according to the vector of probability.

4.1. CNN in DL-IDS. We converted the data packets obtained from the preprocessed data into a traffic image. The so-called traffic image is a combination of all or part of the bit values of a network traffic packet into a two-dimensional matrix. The data in the network traffic packet is composed of bytes. The value range of the bytes is 0–255, which is the same as the value range of the bytes in images. We took x bytes from the header of a packet and y bytes from the payload and composed them into a traffic image for subsequent processing, as discussed below.
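One possible reading of this construction (x header bytes plus y payload bytes per packet, padded with −1 where a part is too short) can be sketched as below; the helper name, the padding value, and the toy header length are our assumptions, not the paper's code:

```python
import numpy as np

def traffic_image_row(packet_bytes, header_len, x, y):
    """Take the first x bytes of the header and the first y bytes of the
    payload; pad each part with -1 if too short; return one image row."""
    header = list(packet_bytes[:header_len][:x])
    payload = list(packet_bytes[header_len:header_len + y])
    row = header + [-1] * (x - len(header)) + payload + [-1] * (y - len(payload))
    return np.array(row, dtype=np.int16)   # byte values in 0..255, padding = -1

pkt = bytes(range(30))   # toy packet: 20-byte "header" + 10-byte "payload"
row = traffic_image_row(pkt, header_len=20, x=16, y=24)
# row has length x + y = 40; the short payload part is padded with -1
```

Stacking one such row per packet yields the two-dimensional traffic image fed to the CNN.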

As mentioned above, the CNN section is composed of input and embedded layers, convolution layer 1, pooling layer 2, convolution layer 3, pooling layer 4, and full connection layer 5. We combined convolution layer 1 and pooling layer 2 into Combination I, and convolution layer 3 and pooling layer 4 into Combination II. Each Combination allows for the analysis of input layer characteristics from a different perspective. In Combination I, a convolution layer with a small convolution kernel is used to extract local features of traffic image details (e.g., IP and Port); clear-cut features and stable results can be obtained in the pooling layer. In Combination II, a large convolution kernel is used to analyze the relationship between two bits that are far apart, such as information in the traffic payload.

After preprocessing and one_hot coding, the network traffic constitutes the input vector of the input layer. In the input layer, length information is intercepted from the ith packet, Pkg_i = (B_1, B_2, ..., B_Length), followed by synthesis S of the information set of n Pkgs: S = (Pkg_1, Pkg_2, ..., Pkg_n).

Formula (3) shows a convolution layer, where f is the side length of the convolution kernel. In the two convolution layers, f = 7 and f = 5; s is stride, p is padding, b is bias, w is weight, c is channel, and l is layer. L_l is the size of Z^l, and Z(i, j) is the pixel of the corresponding feature map. Additionally, (i, j) ∈ {0, 1, 2, ..., L_{l+1}}, where L_{l+1} = ((L_l + 2p − f)/s) + 1.

Figure 3: Structure of entire PKL file. PKL file: a format for storing files in Python for direct code reading; Session: packages consisting of two-way flows; Flow: packages with the same quintuple; Packet: data units in TCP/IP protocol communication and transmission.

Figure 2: Traffic segmentation process (pcap files from Step 1 → traffic to flow → merge the flow → output).


$$ Z^{l+1}(i, j) = \sum_{k=1}^{c} \sum_{x=1}^{f} \sum_{y=1}^{f} \left[ Z^{l}_{k}\left(s \cdot i + x,\; s \cdot j + y\right) \cdot w^{l+1}_{k}(x, y) \right] + b \qquad (3) $$
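As a concrete check of formula (3), the convolution sum can be written out directly in numpy. This is an illustrative sketch with p = 0 (no padding), not the paper's implementation:

```python
import numpy as np

def conv2d(Z, w, b, s=1):
    """Formula (3): valid convolution over c input channels.
    Z: (c, L, L) input, w: (c, f, f) kernel, b: scalar bias, s: stride."""
    c, L, _ = Z.shape
    f = w.shape[1]
    L_out = (L - f) // s + 1           # L_{l+1} = ((L_l + 2p - f) / s) + 1 with p = 0
    out = np.zeros((L_out, L_out))
    for i in range(L_out):
        for j in range(L_out):
            # the triple sum over k, x, y collapses to one elementwise product
            patch = Z[:, s * i:s * i + f, s * j:s * j + f]
            out[i, j] = np.sum(patch * w) + b
    return out

Z = np.ones((1, 5, 5))
w = np.ones((1, 3, 3))
out = conv2d(Z, w, b=1.0)
# each output element sums a 3x3 patch of ones plus the bias: 9 + 1 = 10
```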

The convolution layer contains an activation function (formula (4)) that assists in the expression of complex features. K is the number of channels in the characteristic graph, and A represents the output vector of the Z vector through the activation function. We used sigmoid and ReLU, respectively, after the two convolution layers.

$$ A^{l}_{ijk} = f\left(Z^{l}_{ijk}\right) \qquad (4) $$

After feature extraction in the convolution layer, the output image is transferred to the pooling layer for feature selection and information filtering. The pooling layer contains a preset pooling function that replaces the result of a single point in the feature map with the feature-graph statistics of its adjacent region. The pooling layer is calculated by formula (5), where p is the prespecified parameter. We applied maximum pooling in this study, that is, p → ∞.

$$ A^{l}_{k}(i, j) = \left[ \sum_{x=1}^{f} \sum_{y=1}^{f} A^{l}_{k}\left(s \cdot i + x,\; s \cdot j + y\right)^{p} \right]^{1/p} \qquad (5) $$

We also used a back-propagation algorithm to adjust the model parameters. In the weight adjustment algorithm (formula (6)), δ is the delta error of the loss function with respect to the layer, and α is the learning rate.

$$ w^{l} = w^{l} - \alpha \sum \delta^{l} * A^{l-1} \qquad (6) $$

We used the categorical cross-entropy algorithm in the loss function. In order to reduce training time and enhance the gradient descent accuracy, we used the RMSProp optimization function to adjust the learning rate.

After the two convolution and pooling operations, we extracted the entire traffic image into a smaller feature block, which represents the feature information of the whole traffic packet. The block can then be fed into the RNN system as an input to the RNN layer.

4.2. LSTM in DL-IDS. Normal network communications and network attacks are both carried out according to a certain network protocol. This means that attack packets must be ensconced in traffic alongside packets containing fixed parts of the network protocol, such as normal connection establishments, key exchanges, connections, and disconnections. In the normal portion of the attack traffic, no data can be used to determine whether the packet is intended to cause an attack. Using a CNN alone to train on the characteristics of a single packet as the basis for the system to judge the nature of the traffic makes the data difficult to mark, leaves too much "dirty" data in the traffic, and produces altogether poor training results. In this study, we remedied this by introducing the LSTM, which takes the data of a single connection (from initiation to disconnection) as a group and judges the characteristics of all data packets in said group, and the relations among them, as a basis to judge the nature of the traffic. The natural language processing model performs well in traffic information processing [32] under a similar methodology as the grouping judgment method proposed here.

The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and Softmax and output layers. The main functions are realized by two LSTM layers. The LSTM is a special RNN designed to resolve gradient disappearance and gradient explosion problems in the process of long-sequence training. General RNN networks only have one tanh layer, while LSTM networks perform better at processing timing prediction through their unique forgetting and selective memory gates. Here we call the LSTM node a cell (C_t), the input and output of which are x_t and h_t, respectively.

The first step in the LSTM layer is to determine what information the model will discard from the cell state. This decision is made through the forgetting gate (formula (7)). The gate reads h_{t−1} and x_t and outputs a value between 0 and 1 to each number in the C_{t−1} cell state: 1 means "completely retained" and 0 means "completely discarded". W and b are weight and bias in the neural network, respectively.

$$ f_t = \sigma\left(W_f \cdot \left[h_{t-1}, x_t\right] + b_f\right) \qquad (7) $$

Figure 4: Architecture of DL-IDS (Input → C1 convolution → S2 MaxPool → C3 convolution → S4 MaxPool → S5 GlobalMaxPool → L6 LSTM → L7 LSTM → F8 → Softmax → Output).


The next step is to decide how much new information to add to the cell state. First, a sigmoid layer determines which information needs to be updated (formula (8)). A tanh layer generates a vector as an alternative for updating (formula (9)). The two parts are then combined to make an update to the state of the cell (formula (10)).

$$ i_t = \sigma\left(W_i \cdot \left[h_{t-1}, x_t\right] + b_i\right) \qquad (8) $$

$$ \tilde{C}_t = \tanh\left(W_c \cdot \left[h_{t-1}, x_t\right] + b_c\right) \qquad (9) $$

$$ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \qquad (10) $$

The output gate determines the output of the cell. First, a sigmoid layer determines which parts of the cell state are exported (formula (11)). Next, the cell state is processed through a tanh function to obtain a value between −1 and 1 and then multiplied by the output of the sigmoid gate; the output is determined accordingly (formula (12)).

$$ o_t = \sigma\left(W_o \cdot \left[h_{t-1}, x_t\right] + b_o\right) \qquad (11) $$

$$ h_t = o_t * \tanh\left(C_t\right) \qquad (12) $$
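Formulas (7)–(12) describe one step of a standard LSTM cell and can be written out directly; the parameter layout below (one weight matrix per gate, acting on the concatenation [h, x]) is an assumption for illustration, not the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step, formulas (7)-(12). W and b hold the parameters of
    the four gates (f, i, c, o); each W[k] acts on the concatenation [h, x]."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ hx + b["f"])        # (7)  forget gate
    i_t = sigmoid(W["i"] @ hx + b["i"])        # (8)  input gate
    C_tilde = np.tanh(W["c"] @ hx + b["c"])    # (9)  candidate state
    C_t = f_t * C_prev + i_t * C_tilde         # (10) cell state update
    o_t = sigmoid(W["o"] @ hx + b["o"])        # (11) output gate
    h_t = o_t * np.tanh(C_t)                   # (12) output
    return h_t, C_t

rng = np.random.default_rng(0)
d_h, d_x = 4, 3
W = {k: rng.normal(size=(d_h, d_h + d_x)) for k in "fico"}
b = {k: np.zeros(d_h) for k in "fico"}
h, C = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h), W, b)
# h is bounded because it is a gated tanh of the cell state
```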

In the proposed model, the feature maps of n data packets in a group of traffic images in a connection serve as the input of the LSTM section. The feature relations between these n data packets are analyzed through the two LSTM layers. The first few packets may be used to establish connections; such packets may exist in the normal data streams, but they may occur in the attack data streams too. The next few packets may contain long payloads as well as attack data. The LSTM finds the groups containing attack data and marks all packets of those whole groups as attack groups.

LSTM layer 6 in DL-IDS has a linear activation function designed to minimize the training time. LSTM layer 7 is nonlinearly activated through the ReLU function. The flow comprises a multiclassification system, so the model is trained to minimize multiclass cross-entropy. We did not update the ownership weight at every step but instead only needed to add the initial weight according to the volume of various types of data.

4.3. Weight of Classes. The data obtained after preprocessing is shown in Table 3, where clearly the quantities ("numbers") of different data types are uneven. The number of type 0 is the highest, while those of types 2 and 4 are the lowest. This may affect the final learning outcome of the classification. For example, if the machine were to judge all the traffic as type 0, the accuracy of the model would seem to be relatively high. We introduced the weights of classes to resolve this problem: classes with different sample numbers in the classification were given different weights. class_weight is set according to the number of samples, and class_weight[i] is used instead of 1 to punish the errors in the class [i] samples. A higher class_weight means a greater emphasis on the class. Compared with the case without considering the weight, more samples are classified into high-weight classes.

The class weight is calculated via formula (13), where w_i represents the class weight of class i and n_i represents the amount of traffic of class i:

$$ w_i = \frac{\sum_{i=0}^{K-1} n_i}{n_i} \qquad (13) $$

When training the model, the weighted loss function in formula (14) makes the model focus more on samples from underrepresented classes. K is the number of categories; y is the label (if the sample category is i, then y_i = 1; otherwise, y_i = 0); and p is the output of the neural network, which is the probability that the model predicts that the category is i and is calculated by Softmax in this model. Loss function J is defined as follows:

$$ J = -\sum_{i=0}^{K-1} w_i\, y_i \log\left(p_i\right) \qquad (14) $$
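Using the per-class counts from Table 3, formula (13) and the weighted loss of formula (14) can be computed as follows. This is a sketch for illustration; in the actual model the probabilities p come from the Softmax layer, and here we feed a toy uniform prediction:

```python
import numpy as np

# Per-class flow counts from Table 3 (Normal, FTP-Patator, SSH-Patator,
# DoS, Heartbleed, Infiltration, PortScan)
n = np.array([477584, 11870, 7428, 63240, 4755, 64558, 160002])

w = n.sum() / n                      # formula (13): w_i = (sum over i of n_i) / n_i
# the rarest classes (Heartbleed, SSH-Patator) receive the largest weights

def weighted_loss(y, p, w):
    """Formula (14): class-weighted categorical cross-entropy for one sample."""
    return -np.sum(w * y * np.log(p))

y = np.array([0, 0, 0, 0, 1, 0, 0])  # true class: Heartbleed (label 4)
p = np.full(7, 1.0 / 7)              # toy uniform Softmax output
loss = weighted_loss(y, p, w)        # a misprediction on a rare class costs more
```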

5. Experimental Results and Analysis

We evaluated the performance of the proposed model on the CICIDS2017 dataset using a series of selected parameters: (1) the impact of the length of data packets involved in training, (2) the influence of the number of packets in each flow, (3) the impact of the selected batch size, (4) the effect of the number of units in the LSTM, and (5) the influence of the weight of classes. We optimized the DL-IDS parameters accordingly and then compared them against a sole CNN and a sole LSTM. The ratio of Train set, Validation set, and Test set is 18 : 1 : 1.

5.1. Metrics. We adopted four commonly used parameters for evaluating intrusion detection systems: accuracy (ACC), true positive rate (TPR), false positive rate (FPR), and F1-score. ACC represents the overall performance of the model. TPR represents the ratio of the correctly identified positive samples to all positive samples, and FPR represents the ratio of the negative samples wrongly assigned to the positive sample type to the total number of all negative samples. Recall represents the number of True Positives divided by the number of True Positives plus the number of False Negatives. Precision represents the number of correct positive predictions divided by the total number of positive class values predicted. F1-score is a score of a classifier's accuracy and is defined as the weighted harmonic mean of the Precision and Recall measures of the classifier.

Table 3: Quantity of data per category in CICIDS2017.

Label | Attack       | Num
0     | Normal       | 477584
1     | FTP-Patator  | 11870
2     | SSH-Patator  | 7428
3     | DoS          | 63240
4     | Heartbleed   | 4755
5     | Infiltration | 64558
6     | PortScan     | 160002


$$ \mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\% $$

$$ \mathrm{TPR} = \frac{TP}{TP + FN} \times 100\% $$

$$ \mathrm{FPR} = \frac{FP}{FP + TN} \times 100\% $$

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \times 100\% $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% $$

$$ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100\% \qquad (15) $$

For each type of attack, TP is the number of samples correctly classified as this type, TN is the number of samples correctly classified as not this type, FP is the number of samples incorrectly classified as this type, and FN is the number of samples incorrectly classified as not this type. The definitions of TP, TN, FP, and FN are given in Figure 5.
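Given a multiclass confusion matrix like the one in Figure 5, the one-vs-rest quantities and the metrics of formula (15) can be computed per class. This is an illustrative sketch with a toy two-class matrix, not the authors' evaluation script:

```python
import numpy as np

def per_class_metrics(cm, k):
    """One-vs-rest metrics (formula (15)) for class k of a confusion matrix
    cm, where rows are actual classes and columns are predicted classes."""
    TP = cm[k, k]
    FN = cm[k].sum() - TP        # actual k, predicted as something else
    FP = cm[:, k].sum() - TP     # predicted k, actually something else
    TN = cm.sum() - TP - FN - FP
    acc = (TP + TN) / cm.sum() * 100
    tpr = TP / (TP + FN) * 100
    fpr = FP / (FP + TN) * 100
    precision = TP / (TP + FP) * 100
    f1 = 2 * precision * tpr / (precision + tpr)   # Recall equals TPR here
    return acc, tpr, fpr, f1

cm = np.array([[90, 10],
               [ 5, 95]])        # toy 2-class confusion matrix
acc, tpr, fpr, f1 = per_class_metrics(cm, k=1)
```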

5.2. Experimental Environment. The experimental configuration we used to evaluate the model parameters is described in Table 4.

5.3. LSTM Unit Quantity Effects on Model Performance. The number of units in the LSTM represents the model's output dimension. In our experiments, we found that model performance is first enhanced and then begins to decline as the number of LSTM units continually increases. We ultimately selected 85 as the optimal number of LSTM units.

54 Training Packet Length Effects on Model PerformanceFigure 6 shows the changes in ACC TPR and FPR withincrease in the length of packets extracted during trainingAs per the training results model performance significantlydeclines when the package length exceeds 70 It is possiblethat excessively long training data packets increase theproportion of data packets smaller than the current packetlength leading to an increase in the proportion of units witha median value of minus1 and thus reducing the accuracy of themodel However the data packet must exceed a certainlength to ensure that effective credible and scientificcontent is put into training is also prevents overfittingeffects and provides more accurate classification ability fordata packets with partial header similarity We found that alength of 40 is optimal

Under the condition that the packet length is 40 theefficiency and performance of the DL-IDS intrusion

detection system in identifying various kinds of traffic areshown in Table 5

55 Per-Flow Packet Quantity Effects on Model PerformanceAs the number of data packets in each flow involved in thetraining process increases the features extracted by themodel become more obvious and the recognition accuracy

12 0 0 0 0 0 0 0

00

0

0 0 0

00

0 0

00

00 0

0

0 0

8

11

11

1 3

3

3

2

2

00000

0 0100

0

0 0 0 0

000000

0

0

0 0 0 0 00

0 0

0 0

0

0

0 0

0 0

0

0

00

0

0

0

0

0

00

0

10

10

19

19

15

15

13

17

2

1

7

True negatives

False negatives

False

pos

itive

s

TN

FP

TPFN

TN

TN

Actu

al

Predicted

Figure 5 TP TN FP and FN in multiple classifications

Table 4 Experimental environmentOS CentOS Linux release 751804CPU Intel (R) Xeon (R) CPU E5-2620 v3 240GHzRAM 126GBAnaconda 4511Keras 222Python 365

100

80

60

40

20

0

ACC

TPR

FPR

rate

30 40 50 60 70 80 90 100Packet length

TPRFPRACC

Figure 6 Impact of training packet length on model performance

8 Security and Communication Networks

of the model is enhanced If this number is too highhowever the proportion of filling data packets increasesthus affecting the modelrsquos ability to extract features Figure 7shows the impact of the number of packets per flow onmodel performance We found that when the number ofpackets in each flow exceeds 8 the performance of the modeldeclines significantlyWe chose 8 as the optimal value of per-flow packet quantity in the network

56 Batch Size Effects onModel Performance Batch size is animportant parameter in the model training processWithin areasonable range increasing the batch size can improve thememory utilization and speed up the data processing Ifincreased to an inappropriate extent however it can sig-nificantly slow down the process As shown in Figure 8 wefound that a batch size of 20 is optimal

57 Class Weight Effects on Model Performance Table 6shows a comparison of two groups of experimental resultswith and without class weights Introducing the class weightdoes appear to reduce the impact of the imbalance of thenumber of data of various types in the CICIDS2017 dataseton model performance

58 Model Evaluation e LSTM unit can effectively ex-tract the temporal relationship between packets Table 7shows a comparison of accuracy between the DL-IDSmodel and models with the CNN or LSTM alone eLSTM unit appears to effectively improve the identifica-tion efficiency of SSH-Patator Infiltration PortScan andother attack traffic for enhanced model performancepossibly due to the apparent timing of these attacksCompared to the LSTM model alone however adding aCNN further improves the identification efficiency of mostattack traffic As shown in Table 7 the proposed DL-IDSintrusion detection model has very low false alarm rate andcan classify network traffic more accurately than the CNNor LSTM alone

Table 8 shows a comparison of models using CNN andLSTM with traditional machine learning algorithms [33]e DL-IDS model achieves the best performance amongthem with the largest ACC value and the lowest FPRvalue

e data input to DL-IDS is raw network traffic ere isno special feature extraction in the model the training andtesting time include the feature extraction time e tradi-tional machine learning algorithm does not consider dataextraction or processing time so we could not directlycompare the time consumption of the various algorithms inTable 8e training time and testing time of the model wereunder 600 s and 100 s respectively so we believe that the DL-IDS achieves optimal detection effects in the same timeframe as the traditional algorithm

Table 5 Model performance with training packet length of 40

Label ACC () TPR () FPR () F1-score ()Normal 9954 9952 000 9961FTP-Patator 9962 9229 027 9145SSH-Patator 9963 8704 000 8687Dos 9961 9825 000 9787Heartbleed 9966 8112 000 8120Infiltration 9955 9807 018 9754PortScan 9954 9842 000 9871Overall 9867 9721 047 9332

Figure 7: Impact of per-flow packet quantity on model performance (ACC, TPR, and FPR versus the number of packets per flow, from 6 to 16).

Figure 8: Effect of the selected batch size on model performance (ACC, TPR, and FPR versus batch size, from 0 to 40).

Table 6: Effects of applying class weights on model performance.

         Without class weights (%)   With class weights (%)
ACC      97.16                       98.58
TPR      93.33                       97.04
FPR      0.38                        0.52

Security and Communication Networks 9

6. Conclusions and Future Research Directions

In this study, we proposed a DL-based intrusion detection system named DL-IDS, which utilizes a hybrid of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features from the network data flow in order to analyze network traffic. In DL-IDS, the CNN and LSTM respectively extract the spatial features of a single packet and the temporal features of the data stream, and finally fuse them, which improves the performance of the intrusion detection system. Moreover, DL-IDS uses category weights for optimization in the training phase. This optimization method reduces the adverse effect of the unbalanced numbers of samples of the attack types in the Train set and improves the robustness of the model.

To evaluate the proposed system, we experimented on the CICIDS2017 dataset, which is often used by researchers as a benchmark. Normal traffic data and attack data of six typical types (FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan) were selected to test the ability of DL-IDS to detect attack data. Besides, we also used the same data to test the CNN-only model, the LSTM-only model, and some commonly used machine learning models.

The results show that DL-IDS reached 98.67% and 93.32% in overall accuracy and F1-score, respectively, performing better than all of the machine learning models. Also, compared with the CNN-only model and the LSTM-only model, DL-IDS reached over 99.50% in the accuracy of all attack types and achieved the best performance among these three models.

There are yet certain drawbacks to the proposed model, including low detection accuracy on Heartbleed and SSH-Patator attacks due to a lack of data. Generative Adversarial Networks (GAN) may be considered to overcome this drawback to some degree. Further, combining with some traditional traffic features may enhance the overall model performance. We plan to resolve these problems through further research.

Data Availability

Data will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Beijing Natural Science Foundation under Grant no. 4194086, the National Natural Science Foundation of China under Grant no. 61702043, and the Key R&D Program of Hebei Province under Grant no. 201313701D.

References

[1] M. M. Hassan, A. Gumaei, S. Huda, and A. Ahmad, "Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, 2020.

[2] F. A. Khan and A. Gumaei, "A comparative study of machine learning classifiers for network intrusion detection," in Proceedings of the International Conference on Artificial Intelligence and Security, pp. 75-86, New York, NY, USA, July 2019.

[3] M. Alqahtani, A. Gumaei, M. Mathkour, and M. Maher Ben Ismail, "A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks," Sensors, vol. 19, no. 20, p. 4383, 2019.

[4] A. Derhab, M. Guerroumi, A. Gumaei et al., "Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security," Sensors, vol. 19, no. 14, p. 3119, 2019.

[5] T. Yaqoob, H. Abbas, and M. Atiquzzaman, "Security vulnerabilities, attacks, countermeasures, and regulations of networked medical devices - a review," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3723-3768, 2019.

[6] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, "Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch," Information Fusion, vol. 51, pp. 100-113, 2019.

[7] Z. A. Baig, S. Sanguanpong, S. N. Firdous et al., "Averaged dependence estimators for DoS attack detection in IoT networks," Future Generation Computer Systems, vol. 102, pp. 198-209, 2020.

Table 7: CNN, LSTM, and CNN+LSTM results (ACC/TPR/FPR/F1-score, all in %).

Label         CNN                      LSTM                     CNN+LSTM
Normal        99.57/99.56/0.00/99.64   99.56/99.55/0.00/99.63   99.54/99.52/0.00/99.61
FTP-Patator   99.45/91.95/0.27/89.95   99.34/89.27/0.15/91.72   99.62/92.29/0.27/91.45
SSH-Patator   99.53/80.90/0.00/87.22   98.91/79.91/0.00/86.66   99.63/87.04/0.00/86.87
DoS           99.53/98.04/0.00/97.86   98.55/94.85/0.13/89.89   99.61/98.25/0.00/97.87
Heartbleed    99.64/80.08/0.00/80.88   99.63/81.12/0.07/77.69   99.66/81.12/0.00/81.20
Infiltration  99.59/97.91/0.07/97.75   98.84/96.80/1.16/97.17   99.55/98.07/0.18/97.54
PortScan      99.35/97.48/0.00/98.51   99.42/97.99/0.00/94.04   99.54/98.42/0.00/98.71
Overall       98.44/96.46/0.36/93.11   96.83/94.21/1.54/90.97   98.67/97.21/0.47/93.32

Table 8: The CNN+LSTM model versus traditional machine learning algorithms.

Model                ACC (%)   TPR (%)   FPR (%)   F1-score (%)
MultinomialNB        72.52     78.20     33.16     52.06
Random Forest        96.08     95.47     3.30      76.71
J48                  97.32     96.80     2.17      91.43
Logistic Regression  97.68     94.96     1.47      90.55
DL-IDS               98.67     97.21     0.47      93.32


[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48-60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709-716, 2018.

[10] H. Xu, F. Mueller, M. Acharya et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26-35, 2018.

[12] H. Xu, C. Fang, Q. Cao et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166-170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108-118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," in Recent Developments in Intelligent Computing, Communication and Devices, pp. 559-565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21-26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670-2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21-26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the Symposium Series on Computational Intelligence (SSCI), IEEE, Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86-94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69-79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386-396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V. Ramos and A. Abraham, "ANTIDS: self organized ant-based clustering model for intrusion detection system," in Proceedings of the Fourth IEEE International Workshop on Soft Computing as Transdisciplinary Science and Technology (WSTST'05), Springer, Muroran, Japan, May 2005.

[30] I. Sharafaldin, A. H. Lashkari, and A. Ali, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the Fourth International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, January 2018.

[31] X. Chen, "A simple utility to classify packets into flows," https://github.com/caesar0301/pkt2flow.

[32] B. J. Radford and B. D. Richardson, "Sequence aggregation rules for anomaly detection in computer network traffic," 2018, https://arxiv.org/abs/1805.03735v2.

[33] A. Ahmim, M. A. Ferrag, L. Maglaras, M. Derdour, and H. Janicke, "A detailed analysis of using supervised machine learning for intrusion detection," 2019, https://www.researchgate.net/publication/331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection.



Z^{l+1}(i, j) = \sum_{k=1}^{c} \sum_{x=1}^{f} \sum_{y=1}^{f} [ Z^{l}_{k}(s \cdot i + x, s \cdot j + y) \cdot w^{l+1}_{k}(x, y) ] + b    (3)
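Formula (3) can be read as a plain sliding-window sum over input channels. The sketch below is an illustrative NumPy implementation under assumed zero-based indexing; the function name `conv2d_single_out` and the shape conventions are ours, not the paper's.

```python
import numpy as np

def conv2d_single_out(Z, w, b=0.0, s=1):
    """Sketch of formula (3): one output channel of a convolution layer.

    Z: input feature maps, shape (c, H, W); w: kernel, shape (c, f, f);
    b: scalar bias; s: stride. Each output entry sums, over channels k
    and kernel offsets (x, y), Z[k, s*i + x, s*j + y] * w[k, x, y], plus b.
    """
    c, H, W = Z.shape
    _, f, _ = w.shape
    H_out = (H - f) // s + 1
    W_out = (W - f) // s + 1
    out = np.full((H_out, W_out), b, dtype=float)
    for i in range(H_out):
        for j in range(W_out):
            patch = Z[:, s * i:s * i + f, s * j:s * j + f]
            out[i, j] += np.sum(patch * w)  # sum over k, x, y at once
    return out
```

For example, an all-ones 2-channel 4x4 input convolved with an all-ones 2-channel 2x2 kernel yields a 3x3 map in which every entry equals 2 * 2 * 2 = 8.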

The convolution layer contains an activation function (formula (4)) that assists in the expression of complex features. K is the number of channels in the feature graph, and A represents the output of the Z vector through the activation function. We used sigmoid and ReLU, respectively, after the two convolution layers.

A^{l}_{i,j,k} = f(Z^{l}_{i,j,k})    (4)

After feature extraction in the convolution layer, the output image is transferred to the pooling layer for feature selection and information filtering. The pooling layer contains a preset pooling function that replaces the result of a single point in the feature map with the feature-map statistics of its adjacent region. The pooling layer is calculated by formula (5), where p is a prespecified parameter. We applied maximum pooling in this study, that is, p → ∞.

A^{l}_{k}(i, j) = \left[ \sum_{x=1}^{f} \sum_{y=1}^{f} A^{l}_{k}(s \cdot i + x, s \cdot j + y)^{p} \right]^{1/p}    (5)
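Formula (5) is an Lp pooling; as p grows, the p-norm of a window approaches the window maximum, which is the max pooling the paper uses. A minimal NumPy sketch (our naming, single non-negative feature map assumed):

```python
import numpy as np

def lp_pool(A, f=2, s=2, p=4):
    """Sketch of formula (5): Lp pooling over f x f windows with stride s.

    A: non-negative feature map, shape (H, W). For large p the result
    approaches max pooling (the paper's choice, p -> infinity).
    """
    H, W = A.shape
    H_out = (H - f) // s + 1
    W_out = (W - f) // s + 1
    out = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            win = A[s * i:s * i + f, s * j:s * j + f]
            out[i, j] = np.sum(win ** p) ** (1.0 / p)  # p-norm of the window
    return out
```

With p = 64 on the window [[1, 2], [3, 4]], the pooled value is already within a tiny fraction of the true maximum, 4.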

We also used a back-propagation algorithm to adjust the model parameters. In the weight-adjustment formula (6), δ is the delta error of the loss function with respect to the layer and α is the learning rate.

w^{l} = w^{l} - \alpha \sum \delta^{l} \cdot A^{l-1}    (6)
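Formula (6) is a plain gradient-descent step: the gradient for a layer's weights is the per-sample product of the delta error and the previous layer's activations, summed over the batch. A hedged one-function sketch (our naming; the paper's actual optimizer adapts the learning rate via RMSProp):

```python
import numpy as np

def sgd_update(w, delta, A_prev, lr=0.001):
    """Sketch of formula (6): w^l <- w^l - alpha * sum(delta^l * A^(l-1)).

    delta, A_prev: arrays whose leading axis is the batch; their
    elementwise product summed over the batch is the weight gradient.
    """
    grad = np.sum(delta * A_prev, axis=0)  # sum over the batch axis
    return w - lr * grad
```

For instance, with w = [1.0], four samples each contributing delta * A = 1, and lr = 0.1, the updated weight is 1 - 0.1 * 4 = 0.6.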

We used the categorical cross-entropy algorithm in the loss function. To reduce training time and enhance gradient-descent accuracy, we used the RMSProp optimization function to adjust the learning rate.

After two convolution and pooling operations, we extracted the entire traffic image into a smaller feature block, which represents the feature information of the whole traffic packet. The block can then be fed into the RNN system as an input to the RNN layer.

4.2. LSTM in DL-IDS. Normal network communications and network attacks are both carried out according to a certain network protocol. This means that attack packets must be ensconced in traffic alongside packets containing fixed parts of the network protocol, such as normal connection establishments, key exchanges, connections, and disconnections. In the normal portion of the attack traffic, no data can be used to determine whether a packet is intended to cause an attack. Using a CNN alone to train on the characteristics of a single packet, as the basis for the system to judge the nature of the traffic, makes the data difficult to mark, leaves too much "dirty" data in the traffic, and produces altogether poor training results. In this study, we remedied this by introducing the LSTM, which takes the data of a single connection (from initiation to disconnection) as a group and judges the characteristics of all data packets in said group, and the relations among them, as a basis to judge the nature of the traffic. The natural language processing model performs well in traffic information processing [32] under a methodology similar to the grouping judgment method proposed here.

The LSTM section is composed of LSTM layer 6, LSTM layer 7, full connection layer 8, and the Softmax and output layers. The main functions are realized by the two LSTM layers. The LSTM is a special RNN designed to resolve the gradient-disappearance and gradient-explosion problems in the process of long-sequence training. General RNN networks only have one tanh layer, while LSTM networks perform better in timing prediction through their unique forgetting and selective-memory gates. Here we call the LSTM node a cell (C_t), the input and output of which are x_t and h_t, respectively.

The first step in the LSTM layer is to determine what information the model will discard from the cell state. This decision is made through the forgetting gate (formula (7)). The gate reads h_{t-1} and x_t and outputs a value between 0 and 1 for each number in the cell state C_{t-1}: 1 means "completely retained" and 0 means "completely discarded." W and b are the weight and bias in the neural network, respectively.

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)    (7)

Figure 4: Architecture of DL-IDS (Input → Convolution C1 → MaxPool S2 → Convolution C3 → MaxPool S4 → GlobalMaxPool S5 → LSTM L6 → LSTM L7 → fully connected F8 → Softmax → Output).


The next step is to decide how much new information to add to the cell state. First, a sigmoid layer determines which information needs to be updated (formula (8)). A tanh layer generates a vector as an alternative for updating (formula (9)). The two parts are then combined to make an update to the state of the cell (formula (10)).

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)    (8)

\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)    (9)

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t    (10)

The output gate determines the output of the cell. First, a sigmoid layer determines which parts of the cell state are exported (formula (11)). Next, the cell state is processed through a tanh function to obtain a value between -1 and 1 and then multiplied by the output of the sigmoid gate; the output is determined accordingly (formula (12)).

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)    (11)

h_t = o_t * \tanh(C_t)    (12)
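Formulas (7)-(12) describe one time step of a standard LSTM cell. The sketch below wires them together in NumPy; the dictionary layout for the gate weights is our illustrative convention, not the paper's implementation (which used Keras).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step per formulas (7)-(12).

    W and b hold the four gate parameter sets keyed 'f', 'i', 'c', 'o';
    each W[k] has shape (hidden, hidden + input) and acts on [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])       # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])      # (7)  forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])      # (8)  input gate
    C_tilde = np.tanh(W['c'] @ z + b['c'])  # (9)  candidate state
    C_t = f_t * C_prev + i_t * C_tilde      # (10) new cell state
    o_t = sigmoid(W['o'] @ z + b['o'])      # (11) output gate
    h_t = o_t * np.tanh(C_t)                # (12) hidden output
    return h_t, C_t
```

Because o_t lies in (0, 1) and tanh is bounded, every component of h_t stays strictly inside (-1, 1), which keeps the recurrence numerically stable across long packet sequences.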

In the proposed model, the feature maps of the n data packets in a group of traffic images in a connection serve as the input of the LSTM section. The feature relations between these n data packets are analyzed through the two LSTM layers. The first few packets may be used to establish connections; such packets may exist in normal data streams, but they may occur in attack data streams too. The next few packets may contain long payloads as well as attack data. The LSTM finds the groups containing attack data and marks all packets of those whole groups as attack groups.

LSTM layer 6 in DL-IDS has a linear activation function designed to minimize the training time; LSTM layer 7 is nonlinearly activated through the ReLU function. The flow comprises a multiclassification system, so the model is trained to minimize multiclass cross-entropy. We did not update the ownership weight at every step but instead only needed to add the initial weight according to the volume of various types of data.

4.3. Weight of Classes. The data obtained after preprocessing are shown in Table 3, where clearly the quantities ("numbers") of the different data types are uneven. The number of type 0 is the highest, while those of types 2 and 4 are the lowest. This may affect the final learning outcome of the classification; for example, if the machine were to judge all the traffic as type 0, the accuracy of the model would seem to be relatively high. We introduced the weights of classes to resolve this problem: classes with different sample numbers in the classification were given different weights. class_weight is set according to the number of samples, and class_weight[i] is used instead of 1 to punish the errors in the class i samples. A higher class_weight means a greater emphasis on the class. Compared with the case without considering the weight, more samples are classified into high-weight classes.

The class weight is calculated via formula (13), where w_i represents the class weight of class i and n_i represents the amount of traffic of class i:

w_i = \frac{\sum_{k=0}^{K-1} n_k}{n_i}    (13)

When training the model, the weighted loss function in formula (14) makes the model focus more on samples from underrepresented classes. K is the number of categories, y is the label (if the sample category is i, then y_i = 1; otherwise y_i = 0), and p is the output of the neural network, which is the probability that the model predicts category i, calculated by Softmax in this model. Loss function J is defined as follows:

J = -\sum_{i=0}^{K-1} w_i y_i \log(p_i)    (14)
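Formulas (13) and (14) can be sketched in a few lines. The functions below are illustrative (our naming); in Keras, a dict of such weights is what one would pass to `Model.fit(class_weight=...)`.

```python
import numpy as np

def class_weights(counts):
    """Formula (13): w_i = (total samples) / n_i, one weight per class.
    Rarer classes therefore receive proportionally larger weights."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / counts

def weighted_cross_entropy(weights, y_onehot, p):
    """Formula (14): J = -sum_i w_i * y_i * log(p_i) for one sample."""
    return float(-np.sum(weights * y_onehot * np.log(p)))
```

Using the Table 3 counts (477584, 11870, 7428, 63240, 4755, 64558, 160002), the largest weight lands on class 4 (Heartbleed), the rarest category, so misclassifying a Heartbleed flow costs the model the most.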

5. Experimental Results and Analysis

We evaluated the performance of the proposed model on the CICIDS2017 dataset using a series of selected parameters: (1) the impact of the length of data packets involved in training, (2) the influence of the number of packets in each flow, (3) the impact of the selected batch size, (4) the effect of the number of units in the LSTM, and (5) the influence of the weight of classes. We optimized the DL-IDS parameters accordingly and then compared the model against a sole CNN and a sole LSTM. The ratio of Train set, Validation set, and Test set is 18:1:1.

5.1. Metrics. We adopted four commonly used parameters for evaluating intrusion detection systems: accuracy (ACC), true positive rate (TPR), false positive rate (FPR), and F1-score. ACC represents the overall performance of the model; TPR represents the ratio of correctly identified positive samples to all positive samples; and FPR represents the ratio of negative samples wrongly assigned to the positive class to the total number of negative samples. Recall is the number of True Positives divided by the sum of True Positives and False Negatives. Precision is the number of correct positive predictions divided by the total number of positive predictions. F1-score is a score of a classifier's accuracy and is defined as the weighted harmonic mean of the classifier's Precision and Recall.

Table 3: Quantity of data per category in CICIDS2017.

Label   Attack         Num
0       Normal         477584
1       FTP-Patator    11870
2       SSH-Patator    7428
3       DoS            63240
4       Heartbleed     4755
5       Infiltration   64558
6       PortScan       160002


ACC = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%

TPR = \frac{TP}{TP + FN} \times 100\%

FPR = \frac{FP}{FP + TN} \times 100\%

Precision = \frac{TP}{TP + FP} \times 100\%

Recall = \frac{TP}{TP + FN} \times 100\%

F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}    (15)
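The metric formulas (15) reduce to a small function over the four confusion counts. A sketch (our naming; percentages computed so that F1 is taken over the already-scaled Precision and Recall):

```python
def ids_metrics(TP, TN, FP, FN):
    """Formulas (15): ACC, TPR, FPR, and F1-score as percentages."""
    acc = (TP + TN) / (TP + FP + FN + TN) * 100
    tpr = TP / (TP + FN) * 100            # also Recall
    fpr = FP / (FP + TN) * 100
    precision = TP / (TP + FP) * 100
    f1 = 2 * precision * tpr / (precision + tpr)  # harmonic mean, in %
    return {'ACC': acc, 'TPR': tpr, 'FPR': fpr, 'F1': f1}
```

For example, with TP = 90, TN = 895, FP = 5, FN = 10, this gives ACC = 98.5%, TPR = 90%, FPR ≈ 0.56%, and F1 ≈ 92.31%.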

For each type of attack, TP is the number of samples correctly classified as this type, TN is the number of samples correctly classified as not this type, FP is the number of samples incorrectly classified as this type, and FN is the number of samples incorrectly classified as not this type. The definitions of TP, TN, FP, and FN are given in Figure 5.

5.2. Experimental Environment. The experimental configuration we used to evaluate the model parameters is described in Table 4.

5.3. LSTM Unit Quantity Effects on Model Performance. The number of units in the LSTM represents the model's output dimension. In our experiments, we found that model performance first improves and then begins to decline as the number of LSTM units increases. We ultimately selected 85 as the optimal number of LSTM units.

5.4. Training Packet Length Effects on Model Performance. Figure 6 shows the changes in ACC, TPR, and FPR as the length of packets extracted during training increases. As per the training results, model performance declines significantly when the packet length exceeds 70. It is possible that excessively long training data packets increase the proportion of data packets smaller than the current packet length, leading to an increase in the proportion of units with a value of -1 and thus reducing the accuracy of the model. However, the data packet must exceed a certain length to ensure that effective, credible, and scientific content is put into training. This also prevents overfitting effects and provides more accurate classification for data packets with partially similar headers. We found that a length of 40 is optimal.

Under the condition that the packet length is 40, the efficiency and performance of the DL-IDS intrusion detection system in identifying various kinds of traffic are shown in Table 5.

Figure 5: TP, TN, FP, and FN in multiple classifications (confusion matrix of actual versus predicted labels).

Table 4: Experimental environment.

OS        CentOS Linux release 7.5.1804
CPU       Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz
RAM       126 GB
Anaconda  4.5.11
Keras     2.2.2
Python    3.6.5

Figure 6: Impact of training packet length on model performance (ACC, TPR, and FPR versus packet length, from 30 to 100).

5.5. Per-Flow Packet Quantity Effects on Model Performance. As the number of data packets in each flow involved in the training process increases, the features extracted by the model become more obvious and the recognition accuracy of the model is enhanced. If this number is too high, however, the proportion of filling data packets increases, thus affecting the model's ability to extract features. Figure 7 shows the impact of the number of packets per flow on model performance. We found that when the number of packets in each flow exceeds 8, the performance of the model declines significantly, so we chose 8 as the optimal per-flow packet quantity in the network.

5.6. Batch Size Effects on Model Performance. Batch size is an important parameter in the model training process. Within a reasonable range, increasing the batch size can improve memory utilization and speed up data processing. If increased to an inappropriate extent, however, it can significantly slow down the process. As shown in Figure 8, we found that a batch size of 20 is optimal.


[20] Z Wang He Applications of Deep Learning on TrafficIdentification pp 21ndash26 BlackHat Las Vegas NV USA 2015

[21] J Fan and K Ling-zhi Intrusion Detection Algorithm Based onConvolutional Neural Network Beijing Institude of Tech-nology Beijing China 2017

[22] J Kim J Kim H L T u and H Kim ldquoLong short termmemory recurrent neural network classifier for intrusiondetectionrdquo in Proceedings of the 2016 International Conferenceon Platform Technology and Service (PlatCon) February 2016

[23] P Wu and H Guo ldquoLuNET a deep neural network fornetwork intrusion detectionrdquo in Proceedings of the Sympo-sium Series on Computational Intelligence (SSCI) IEEEXiamen ChinaIEEE Xiamen China December 2019

[24] C M Hsu H Y Hsieh S W Prakosa M Z Azhari andJ S Leu ldquoUsing long-short-term memory based convolu-tional neural networks for network intrusion detectionrdquo inProceedings of the International Wireless Internet ConferenceSpringer Taipei Taiwan pp 86ndash94 October 2018

[25] M Ahsan and K Nygard ldquoConvolutional neural networkswith LSTM for intrusion detectionrdquo in Proceedings of the 35thInternational Conference vol 69 pp 69ndash79 Seville SpainMay 2020

[26] M M Hassan A Gumaei A Ahmed M Alrubaian andG Fortino ldquoA hybrid deep learning model for efficient in-trusion detection in big data environmentrdquo InformationSciences vol 513 pp 386ndash396 2020

[27] R Abdulhammed H Musafer A Alessa M Faezipour andA Abuzneid ldquoFeatures dimensionality reduction approachesfor machine learning based network intrusion detectionrdquoElectronics vol 8 no 3 p 322 2019

[28] H Musafer A Abuzneid M Faezipour and A MahmoodldquoAn enhanced design of Sparse autoencoder for latent featuresextraction based on trigonometric simplexes for networkintrusion detection systemsrdquo Electronics vol 9 no 2 p 2592020

[29] V Ramos and A Abraham ldquoANTIDS self organized antbased clustering model for intrusion detection systemrdquo inProceedings of the Fourth IEEE InternationalWorkshop on SoftComputing as Transdisciplinary Science and Technology(WSTSTrsquo05) Springer Muroran JapanSpringer MuroranJapan May 2005

[30] I Sharafaldin A H Lashkari and A Ali ldquoToward generatinga new intrusion detection dataset and intrusion trafficcharacterizationrdquo in Proceedings of the He Fourth Interna-tional Conference on Information Systems Security and Privacy(ICISSP) Madeira Portugal January 2018

[31] X Chen ldquoA simple utility to classify packets into flowsrdquohttpsgithubcomcaesar0301pkt2flow

[32] B J Radford and B D Richardson ldquoSequence aggregationrules for anomaly detection in computer network trafficrdquo2018 httpsarxivorgabs180503735v2

[33] A AhmimM A Ferrag LMaglaras M Derdour andH JanickeldquoA detailed analysis of using supervised machine learning for in-trusion detectionrdquo 2019 httpswwwresearchgatenetpublication331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection

Security and Communication Networks 11

Page 7: DL-IDS: Extracting Features Using CNN-LSTM Hybrid Network ...Apr 06, 2020  · Table 2: Distribution of network traffic periods in CICIDS2017 dataset. Attack Time Size Normal Monday

The next step is to decide how much new information to add to the cell state. First, a sigmoid layer determines which information needs to be updated (formula (8)). A tanh layer then generates a vector of candidate values (formula (9)). The two parts are combined to update the state of the cell (formula (10)).

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (8)$$

$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \qquad (9)$$

$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t \qquad (10)$$

The output gate determines the output of the cell. First, a sigmoid layer determines which parts of the cell state are exported (formula (11)). Next, the cell state is processed through a tanh function to obtain a value between −1 and 1 and then multiplied by the output of the sigmoid gate; the output is determined accordingly (formula (12)).

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (11)$$

$$h_t = o_t \ast \tanh\left(C_t\right) \qquad (12)$$
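Taken together with the forget gate $f_t$ defined earlier, formulas (8)–(12) describe one LSTM cell step. The following NumPy sketch is illustrative only — the paper uses Keras, and the dictionary-of-matrices parameterization, names, and sizes here are our assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step following formulas (8)-(12).

    W and b hold the forget, input, candidate, and output parameters
    (hypothetical names; the paper does not expose explicit matrices).
    """
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate, formula (8)
    C_hat = np.tanh(W["c"] @ z + b["c"])   # candidate state, formula (9)
    C_t = f_t * C_prev + i_t * C_hat       # cell state update, formula (10)
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate, formula (11)
    h_t = o_t * np.tanh(C_t)               # hidden state, formula (12)
    return h_t, C_t

# Toy usage with random parameters (4 hidden units, 3 input features).
rng = np.random.default_rng(0)
n_h, n_x = 4, 3
W = {k: rng.normal(size=(n_h, n_h + n_x)) for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, C = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), W, b)
```

Because $o_t \in (0,1)$ and $\tanh(C_t) \in (-1,1)$, the hidden state is always bounded in $(-1, 1)$.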

In the proposed model, the feature maps of the n data packets in a group of traffic images in a connection serve as the input of the LSTM section. The feature relations between these n data packets are analyzed through the two LSTM layers. The first few packets may be used to establish connections; such packets may exist in normal data streams, but they may occur in attack data streams too. The next few packets may contain long payloads as well as attack data. The LSTM finds the groups containing attack data and marks all packets of those whole groups as attack groups.

LSTM layer 6 in DL-IDS has a linear activation function designed to minimize the training time; LSTM layer 7 is nonlinearly activated through the ReLU function. The model performs multiclass classification, so it is trained to minimize the multiclass cross-entropy. We did not update the class weights at every step; instead, we only set the initial weights according to the volume of each type of data.
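As a sketch of how this stack might look in Keras (the framework listed in Table 4): a packet-level CNN wrapped in TimeDistributed feeds two LSTM layers, the first linearly activated and the second ReLU-activated, with a softmax output trained on categorical cross-entropy. Only the 85 LSTM units (Section 5.3), the 8-packet-by-40-byte input shape (Sections 5.4–5.5), and the activations come from the paper; the convolution sizes are assumptions, not the authors' exact architecture:

```python
from tensorflow.keras import layers, models

NUM_PACKETS, PACKET_LEN, NUM_CLASSES = 8, 40, 7  # per Sections 5.4-5.5 and Table 3

# The same small CNN is applied to each packet (TimeDistributed), then the
# inter-packet temporal features are modeled by two LSTM layers: linear
# activation first, ReLU second, as described above.
model = models.Sequential([
    layers.Input(shape=(NUM_PACKETS, PACKET_LEN, 1)),
    layers.TimeDistributed(layers.Conv1D(32, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling1D(2)),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(85, activation="linear", return_sequences=True),
    layers.LSTM(85, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The class weights described in Section 4.3 would then be passed to `model.fit(..., class_weight=...)` rather than recomputed at every step.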

4.3. Weight of Classes. The data obtained after preprocessing are shown in Table 3, where the quantities ("Num") of different data types are clearly uneven. The number of type 0 samples is the highest, while those of types 2 and 4 are the lowest. This may affect the final learning outcome of the classification: for example, if the machine were to judge all the traffic as type 0, the accuracy of the model would still seem relatively high. We introduced class weights to resolve this problem, giving classes with different sample counts different weights. class_weight is set according to the number of samples, and class_weight[i] is used instead of 1 to penalize errors on samples of class i. A higher class_weight means a greater emphasis on that class; compared with the unweighted case, more samples are classified into high-weight classes.

The class weight is calculated via formula (13), where $w_i$ represents the class weight of class $i$ and $n_i$ represents the amount of traffic of class $i$:

$$w_i = \frac{\sum_{j=0}^{K-1} n_j}{n_i} \qquad (13)$$

When training the model, the weighted loss function in formula (14) makes the model focus more on samples from underrepresented classes. $K$ is the number of categories, $y$ is the label (if the sample category is $i$, then $y_i = 1$; otherwise $y_i = 0$), and $p_i$ is the output of the neural network, i.e., the probability (computed by Softmax in this model) that the model predicts category $i$. The loss function $J$ is defined as follows:

$$J = -\sum_{i=0}^{K-1} w_i y_i \log\left(p_i\right) \qquad (14)$$
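Formulas (13) and (14) can be checked numerically with the class counts from Table 3. A small sketch (the function name is ours; in practice the weights would be handed to Keras's class_weight mechanism rather than hand-rolled):

```python
import numpy as np

# Class counts from Table 3 (labels 0-6).
counts = np.array([477584, 11870, 7428, 63240, 4755, 64558, 160002])

# Formula (13): w_i = (sum of all n_j) / n_i, so rare classes get large weights.
weights = counts.sum() / counts

def weighted_cross_entropy(y_onehot, p, w):
    """Formula (14): J = -sum_i w_i * y_i * log(p_i)."""
    return -np.sum(w * y_onehot * np.log(p))

# Example: true class is 2 (SSH-Patator); the model assigns it probability 0.7.
y = np.eye(7)[2]
p = np.full(7, 0.05)
p[2] = 0.70
loss = weighted_cross_entropy(y, p, weights)
```

Note that Heartbleed (label 4), the rarest class, receives the largest weight, so mistakes on it are penalized most heavily.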

5. Experimental Results and Analysis

We evaluated the performance of the proposed model on the CICIDS2017 dataset using a series of selected parameters: (1) the impact of the length of data packets involved in training, (2) the influence of the number of packets in each flow, (3) the impact of the selected batch size, (4) the effect of the number of units in the LSTM, and (5) the influence of the class weights. We optimized the DL-IDS parameters accordingly and then compared the result against a sole CNN and a sole LSTM. The ratio of the training, validation, and test sets is 18 : 1 : 1.
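The 18 : 1 : 1 split can be sketched as follows (illustrative only; the paper does not describe its exact splitting code):

```python
import numpy as np

def split_18_1_1(n_samples, seed=0):
    """Shuffle sample indices and split them 18 : 1 : 1 into
    train / validation / test index arrays."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    part = n_samples // 20                  # one twentieth each for val and test
    train = idx[: n_samples - 2 * part]
    val = idx[n_samples - 2 * part : n_samples - part]
    test = idx[n_samples - part :]
    return train, val, test

train_idx, val_idx, test_idx = split_18_1_1(1000)
```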

5.1. Metrics. We adopted four commonly used parameters for evaluating intrusion detection systems: accuracy (ACC), true positive rate (TPR), false positive rate (FPR), and F1-score. ACC represents the overall performance of the model. TPR represents the ratio of correctly identified positive samples to all positive samples, and FPR represents the ratio of negative samples wrongly assigned to the positive class to the total number of negative samples. Recall is the number of true positives divided by the sum of true positives and false negatives. Precision is the number of true positive predictions divided by the total number of samples predicted as positive. F1-score measures a classifier's accuracy and is defined as the weighted harmonic mean of the Precision and Recall of the classifier.

Table 3: Quantity of data per category in CICIDS2017.

Label  Attack        Num
0      Normal        477584
1      FTP-Patator   11870
2      SSH-Patator   7428
3      DoS           63240
4      Heartbleed    4755
5      Infiltration  64558
6      PortScan      160002

Security and Communication Networks 7

$$\text{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%$$

$$\text{TPR} = \frac{TP}{TP + FN} \times 100\%$$

$$\text{FPR} = \frac{FP}{FP + TN} \times 100\%$$

$$\text{Precision} = \frac{TP}{TP + FP} \times 100\%$$

$$\text{Recall} = \frac{TP}{TP + FN} \times 100\%$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\% \qquad (15)$$

For each type of attack, TP is the number of samples correctly classified as this type, TN is the number of samples correctly classified as not this type, FP is the number of samples incorrectly classified as this type, and FN is the number of samples incorrectly classified as not this type. The definitions of TP, TN, FP, and FN are given in Figure 5.
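These per-class quantities can be computed one-versus-rest from a multiclass confusion matrix. A minimal sketch of formula (15) with hypothetical counts (not the paper's results):

```python
import numpy as np

def one_vs_rest_metrics(cm, k):
    """ACC, TPR, FPR, and F1 (formula (15)) for class k of confusion
    matrix cm, where cm[i, j] counts samples of true class i predicted as j."""
    tp = cm[k, k]
    fn = cm[k].sum() - tp          # class-k samples predicted as another class
    fp = cm[:, k].sum() - tp       # other samples predicted as class k
    tn = cm.sum() - tp - fn - fp
    acc = (tp + tn) / cm.sum() * 100
    tpr = tp / (tp + fn) * 100
    fpr = fp / (fp + tn) * 100
    precision = tp / (tp + fp) * 100
    recall = tpr
    f1 = 2 * precision * recall / (precision + recall)
    return acc, tpr, fpr, f1

# Hypothetical 3-class confusion matrix.
cm = np.array([[90,  5,  5],
               [10, 80, 10],
               [ 0,  5, 95]])
acc, tpr, fpr, f1 = one_vs_rest_metrics(cm, 0)   # metrics for class 0
```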

5.2. Experimental Environment. The experimental configuration we used to evaluate the model parameters is described in Table 4.

5.3. LSTM Unit Quantity Effects on Model Performance. The number of units in the LSTM represents the model's output dimension. In our experiments, we found that model performance first improves and then begins to decline as the number of LSTM units increases. We ultimately selected 85 as the optimal number of LSTM units.

5.4. Training Packet Length Effects on Model Performance. Figure 6 shows the changes in ACC, TPR, and FPR as the length of packets extracted during training increases. As per the training results, model performance significantly declines when the packet length exceeds 70. It is possible that excessively long training packets increase the proportion of data packets shorter than the chosen packet length, leading to an increase in the proportion of units filled with the padding value of −1 and thus reducing the accuracy of the model. However, the data packet must exceed a certain length to ensure that enough meaningful content is put into training. This also prevents overfitting and provides more accurate classification of data packets with partially similar headers. We found that a length of 40 is optimal.

Under the condition that the packet length is 40, the efficiency and performance of the DL-IDS intrusion detection system in identifying various kinds of traffic are shown in Table 5.

5.5. Per-Flow Packet Quantity Effects on Model Performance. As the number of data packets in each flow involved in the training process increases, the features extracted by the model become more obvious and the recognition accuracy

[Figure 5: TP, TN, FP, and FN in multiple classifications (confusion matrix of actual versus predicted classes; figure content not reproducible in text).]

Table 4: Experimental environment.

OS        CentOS Linux release 7.5.1804
CPU       Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz
RAM       126 GB
Anaconda  4.5.11
Keras     2.2.2
Python    3.6.5

[Figure 6: Impact of training packet length on model performance (ACC, TPR, and FPR versus packet lengths from 30 to 100).]


of the model is enhanced. If this number is too high, however, the proportion of padding packets increases, which impairs the model's ability to extract features. Figure 7 shows the impact of the number of packets per flow on model performance. We found that when the number of packets in each flow exceeds 8, the performance of the model declines significantly. We chose 8 as the optimal per-flow packet quantity.
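Based on the optima above, each flow becomes a fixed 8 × 40 matrix. This sketch illustrates the truncate-and-pad step; the −1 fill value follows the padding discussion in Section 5.4, while the byte scaling and function name are our assumptions:

```python
import numpy as np

PACKETS_PER_FLOW = 8   # optimal per-flow packet count (Section 5.5)
PACKET_LEN = 40        # optimal packet length (Section 5.4)
FILL = -1.0            # padding value for missing bytes/packets (assumed)

def flow_to_matrix(flow_packets):
    """Truncate/pad a flow (list of raw packet byte strings) into a fixed
    (PACKETS_PER_FLOW, PACKET_LEN) matrix of scaled byte values."""
    mat = np.full((PACKETS_PER_FLOW, PACKET_LEN), FILL, dtype=np.float32)
    for i, pkt in enumerate(flow_packets[:PACKETS_PER_FLOW]):
        data = np.frombuffer(pkt[:PACKET_LEN], dtype=np.uint8)
        mat[i, :len(data)] = data / 255.0   # scale bytes to [0, 1]
    return mat

# Example: a flow with one 10-byte packet and one 50-byte packet.
flow = [bytes(range(10)), bytes(50)]
x = flow_to_matrix(flow)
```

Short packets and missing packets are filled with −1, while overlong packets are truncated to 40 bytes.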

5.6. Batch Size Effects on Model Performance. Batch size is an important parameter in the model training process. Within a reasonable range, increasing the batch size can improve memory utilization and speed up data processing. If increased too far, however, it can significantly slow down the process. As shown in Figure 8, we found that a batch size of 20 is optimal.

5.7. Class Weight Effects on Model Performance. Table 6 shows a comparison of two groups of experimental results, with and without class weights. Introducing class weights does appear to reduce the impact of the imbalance among data types in the CICIDS2017 dataset on model performance.

5.8. Model Evaluation. The LSTM unit can effectively extract the temporal relationship between packets. Table 7 compares the accuracy of the DL-IDS model with models using the CNN or LSTM alone. The LSTM unit appears to effectively improve the identification of SSH-Patator, Infiltration, PortScan, and other attack traffic, possibly due to the apparent timing of these attacks. Compared to the LSTM model alone, however, adding a CNN further improves the identification of most attack traffic. As shown in Table 7, the proposed DL-IDS intrusion detection model has a very low false alarm rate and classifies network traffic more accurately than the CNN or LSTM alone.

Table 8 compares models using CNN and LSTM with traditional machine learning algorithms [33]. The DL-IDS model achieves the best performance among them, with the largest ACC value and the lowest FPR value.

The data input to DL-IDS is raw network traffic. There is no separate feature extraction step in the model, so the training and testing times include the feature extraction time. The traditional machine learning algorithms do not account for data extraction or processing time, so we could not directly compare the time consumption of the various algorithms in Table 8. The training and testing times of our model were under 600 s and 100 s, respectively, so we believe that DL-IDS achieves optimal detection effects in the same time frame as the traditional algorithms.

Table 5: Model performance with training packet length of 40.

Label         ACC (%)  TPR (%)  FPR (%)  F1-score (%)
Normal        99.54    99.52    0.00     99.61
FTP-Patator   99.62    92.29    0.27     91.45
SSH-Patator   99.63    87.04    0.00     86.87
DoS           99.61    98.25    0.00     97.87
Heartbleed    99.66    81.12    0.00     81.20
Infiltration  99.55    98.07    0.18     97.54
PortScan      99.54    98.42    0.00     98.71
Overall       98.67    97.21    0.47     93.32

[Figure 7: Impact of per-flow packet quantity on model performance (ACC, TPR, and FPR versus 6 to 16 packets per flow).]

[Figure 8: Selected batch size affecting model performance (ACC, TPR, and FPR versus batch sizes from 0 to 40).]

Table 6: Effects of applying class weights on model performance.

       Without class weights (%)  With class weights (%)
ACC    97.16                      98.58
TPR    93.33                      97.04
FPR    0.38                       0.52


6. Conclusions and Future Research Directions

In this study, we proposed a deep learning-based intrusion detection system named DL-IDS, which utilizes a hybrid of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features from the network data flow to analyze network traffic. In DL-IDS, the CNN and LSTM extract the spatial features of a single packet and the temporal features of the data stream, respectively, and finally fuse them, which improves the performance of the intrusion detection system. Moreover, DL-IDS uses category weights for optimization in the training phase. This optimization method reduced the adverse effect of the unbalanced numbers of attack-type samples in the training set and improved the robustness of the model.

To evaluate the proposed system, we experimented on the CICIDS2017 dataset, which is often used by researchers as a benchmark. Normal traffic data and attack data of six typical types (FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan) were selected to test the ability of DL-IDS to detect attack data. Besides, we also used the same data to test the CNN-only model, the LSTM-only model, and some commonly used machine learning models.

The results show that DL-IDS reached 98.67% and 93.32% in overall accuracy and F1-score, respectively, performing better than all of the machine learning models. Also, compared with the CNN-only model and the LSTM-only model, DL-IDS reached over 99.50% in the accuracy of all attack types and achieved the best performance among these three models.

There are yet certain drawbacks to the proposed model, including low detection accuracy on Heartbleed and SSH-Patator attacks due to a lack of data. Generative Adversarial Networks (GANs) may be considered to overcome this drawback to some degree. Further, combining with some traditional traffic features may enhance the overall model performance. We plan to resolve these problems through further research.

Data Availability

Data will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Beijing Natural Science Foundation under Grant no. 4194086, the National Natural Science Foundation of China under Grant no. 61702043, and the Key R&D Program of Hebei Province under Grant no. 201313701D.

References

[1] M. M. Hassan, A. Gumaei, S. Huda, and A. Ahmad, "Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, 2020.

[2] F. A. Khan and A. Gumaei, "A comparative study of machine learning classifiers for network intrusion detection," in Proceedings of the International Conference on Artificial Intelligence and Security, pp. 75–86, New York, NY, USA, July 2019.

[3] M. Alqahtani, A. Gumaei, M. Mathkour, and M. Maher Ben Ismail, "A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks," Sensors, vol. 19, no. 20, p. 4383, 2019.

[4] A. Derhab, M. Guerroumi, A. Gumaei et al., "Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security," Sensors, vol. 19, no. 14, p. 3119, 2019.

[5] T. Yaqoob, H. Abbas, and M. Atiquzzaman, "Security vulnerabilities, attacks, countermeasures, and regulations of networked medical devices: a review," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3723–3768, 2019.

[6] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, "Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch," Information Fusion, vol. 51, pp. 100–113, 2019.

[7] Z. A. Baig, S. Sanguanpong, S. N. Firdous et al., "Averaged dependence estimators for DoS attack detection in IoT networks," Future Generation Computer Systems, vol. 102, pp. 198–209, 2020.

Table 7: CNN, LSTM, and CNN+LSTM results.

              CNN (%)                     LSTM (%)                    CNN+LSTM (%)
Label         ACC    TPR    FPR   F1      ACC    TPR    FPR   F1      ACC    TPR    FPR   F1
Normal        99.57  99.56  0.00  99.64   99.56  99.55  0.00  99.63   99.54  99.52  0.00  99.61
FTP-Patator   99.45  91.95  0.27  89.95   99.34  89.27  0.15  91.72   99.62  92.29  0.27  91.45
SSH-Patator   99.53  80.90  0.00  87.22   98.91  79.91  0.00  86.66   99.63  87.04  0.00  86.87
DoS           99.53  98.04  0.00  97.86   98.55  94.85  0.13  89.89   99.61  98.25  0.00  97.87
Heartbleed    99.64  80.08  0.00  80.88   99.63  81.12  0.07  77.69   99.66  81.12  0.00  81.20
Infiltration  99.59  97.91  0.07  97.75   98.84  96.80  1.16  97.17   99.55  98.07  0.18  97.54
PortScan      99.35  97.48  0.00  98.51   99.42  97.99  0.00  94.04   99.54  98.42  0.00  98.71
Overall       98.44  96.46  0.36  93.11   96.83  94.21  1.54  90.97   98.67  97.21  0.47  93.32

Table 8: CNN and LSTM models versus traditional machine learning algorithms.

                     ACC (%)  TPR (%)  FPR (%)  F1-score (%)
MultinomialNB        72.52    78.20    33.16    52.06
Random Forest        96.08    95.47    3.30     76.71
J48                  97.32    96.80    2.17     91.43
Logistic Regression  97.68    94.96    1.47     90.55
DL-IDS               98.67    97.21    0.47     93.32


[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48–60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709–716, 2018.

[10] H. Xu, F. Mueller, M. Acharya et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26–35, 2018.

[12] H. Xu, C. Fang, Q. Cao et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166–170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108–118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," in Recent Developments in Intelligent Computing, Communication and Devices, pp. 559–565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21–26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670–2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21–26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86–94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69–79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386–396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V. Ramos and A. Abraham, "ANTIDS: self organized ant-based clustering model for intrusion detection system," in Proceedings of the Fourth IEEE International Workshop on Soft Computing as Transdisciplinary Science and Technology (WSTST'05), Springer, Muroran, Japan, May 2005.

[30] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the Fourth International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, January 2018.

[31] X. Chen, "A simple utility to classify packets into flows," https://github.com/caesar0301/pkt2flow.

[32] B. J. Radford and B. D. Richardson, "Sequence aggregation rules for anomaly detection in computer network traffic," 2018, https://arxiv.org/abs/1805.03735v2.

[33] A. Ahmim, M. A. Ferrag, L. Maglaras, M. Derdour, and H. Janicke, "A detailed analysis of using supervised machine learning for intrusion detection," 2019, https://www.researchgate.net/publication/331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection.


Page 8: DL-IDS: Extracting Features Using CNN-LSTM Hybrid Network ...Apr 06, 2020  · Table 2: Distribution of network traffic periods in CICIDS2017 dataset. Attack Time Size Normal Monday

ACC TP + TN

TP + FP + FN + TNlowast 100

TPR TP

TP + FNlowast 100

FPR FP

FP + TNlowast 100

Precision TP

TP + FPlowast 100

Recall TP

TP + FNlowast 100

F1 minus score 2lowastPrecisionlowastRecallPrecision + Recall

lowast 100

(15)

For each type of attack TP is the number of samplescorrectly classified as this type TN is the number ofsamples correctly classified as not this type FP is thenumber of samples incorrectly classified as this type andFN is the number of samples incorrectly classified as notthis typee definitions of TP TN FP and FN are given inFigure 5

52 Experimental Environment e experimental configu-ration we used to evaluate the model parameters is describedin Table 4

53 LSTM Unit Quantity Effects on Model Performancee number of units in the LSTM represents the modelrsquosoutput dimension In our experiments we found thatmodel performance is first enhanced and then begins todecline as the number of LSTM units continually increasesWe ultimately selected 85 as the optimal number of LSTMunits

54 Training Packet Length Effects on Model PerformanceFigure 6 shows the changes in ACC TPR and FPR withincrease in the length of packets extracted during trainingAs per the training results model performance significantlydeclines when the package length exceeds 70 It is possiblethat excessively long training data packets increase theproportion of data packets smaller than the current packetlength leading to an increase in the proportion of units witha median value of minus1 and thus reducing the accuracy of themodel However the data packet must exceed a certainlength to ensure that effective credible and scientificcontent is put into training is also prevents overfittingeffects and provides more accurate classification ability fordata packets with partial header similarity We found that alength of 40 is optimal

Under the condition that the packet length is 40 theefficiency and performance of the DL-IDS intrusion

detection system in identifying various kinds of traffic areshown in Table 5

55 Per-Flow Packet Quantity Effects on Model PerformanceAs the number of data packets in each flow involved in thetraining process increases the features extracted by themodel become more obvious and the recognition accuracy

12 0 0 0 0 0 0 0

00

0

0 0 0

00

0 0

00

00 0

0

0 0

8

11

11

1 3

3

3

2

2

00000

0 0100

0

0 0 0 0

000000

0

0

0 0 0 0 00

0 0

0 0

0

0

0 0

0 0

0

0

00

0

0

0

0

0

00

0

10

10

19

19

15

15

13

17

2

1

7

True negatives

False negatives

False

pos

itive

s

TN

FP

TPFN

TN

TN

Actu

al

Predicted

Figure 5 TP TN FP and FN in multiple classifications

Table 4 Experimental environmentOS CentOS Linux release 751804CPU Intel (R) Xeon (R) CPU E5-2620 v3 240GHzRAM 126GBAnaconda 4511Keras 222Python 365

100

80

60

40

20

0

ACC

TPR

FPR

rate

30 40 50 60 70 80 90 100Packet length

TPRFPRACC

Figure 6 Impact of training packet length on model performance

8 Security and Communication Networks

of the model is enhanced If this number is too highhowever the proportion of filling data packets increasesthus affecting the modelrsquos ability to extract features Figure 7shows the impact of the number of packets per flow onmodel performance We found that when the number ofpackets in each flow exceeds 8 the performance of the modeldeclines significantlyWe chose 8 as the optimal value of per-flow packet quantity in the network

56 Batch Size Effects onModel Performance Batch size is animportant parameter in the model training processWithin areasonable range increasing the batch size can improve thememory utilization and speed up the data processing Ifincreased to an inappropriate extent however it can sig-nificantly slow down the process As shown in Figure 8 wefound that a batch size of 20 is optimal

57 Class Weight Effects on Model Performance Table 6shows a comparison of two groups of experimental resultswith and without class weights Introducing the class weightdoes appear to reduce the impact of the imbalance of thenumber of data of various types in the CICIDS2017 dataseton model performance

58 Model Evaluation e LSTM unit can effectively ex-tract the temporal relationship between packets Table 7shows a comparison of accuracy between the DL-IDSmodel and models with the CNN or LSTM alone eLSTM unit appears to effectively improve the identifica-tion efficiency of SSH-Patator Infiltration PortScan andother attack traffic for enhanced model performancepossibly due to the apparent timing of these attacksCompared to the LSTM model alone however adding aCNN further improves the identification efficiency of mostattack traffic As shown in Table 7 the proposed DL-IDSintrusion detection model has very low false alarm rate andcan classify network traffic more accurately than the CNNor LSTM alone

Table 8 compares the CNN- and LSTM-based models with traditional machine learning algorithms [33]. The DL-IDS model achieves the best performance among them, with the largest ACC value and the lowest FPR value.

The data input to DL-IDS is raw network traffic. There is no separate feature extraction stage in the model, so the reported training and testing times include feature extraction. The traditional machine learning algorithms do not account for data extraction or processing time, so we could not directly compare the time consumption of the algorithms in Table 8. The training and testing times of our model were under 600 s and 100 s, respectively, so we believe that DL-IDS achieves optimal detection in a time frame comparable to the traditional algorithms.

Table 5: Model performance with training packet length of 40.

Label         ACC (%)  TPR (%)  FPR (%)  F1-score (%)
Normal        99.54    99.52    0.00     99.61
FTP-Patator   99.62    92.29    0.27     91.45
SSH-Patator   99.63    87.04    0.00     86.87
DoS           99.61    98.25    0.00     97.87
Heartbleed    99.66    81.12    0.00     81.20
Infiltration  99.55    98.07    0.18     97.54
PortScan      99.54    98.42    0.00     98.71
Overall       98.67    97.21    0.47     93.32
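The per-class metrics reported in these tables can be computed from one-vs-rest confusion counts. A minimal sketch with illustrative names (not the authors' evaluation code):

```python
# Compute per-class ACC, TPR, FPR, and F1 from true vs. predicted labels,
# treating the given class one-vs-rest against all others.
def class_metrics(y_true, y_pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    acc = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if tp + fn else 0.0          # recall / detection rate
    fpr = fp / (fp + tn) if fp + tn else 0.0          # false alarm rate
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * tpr / (prec + tpr) if prec + tpr else 0.0
    return acc, tpr, fpr, f1

m = class_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a")
```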

[Figure 7: Impact of per-flow packet quantity on model performance. The plot shows ACC, TPR, and FPR (y-axis: rate) against packet numbers per flow from 6 to 16.]

[Figure 8: Selected batch size affecting model performance. The plot shows ACC, TPR, and FPR (y-axis: rate) against batch sizes from 0 to 40.]

Table 6: Effects of applying class weight on model performance.

       Without class weights (%)  With class weights (%)
ACC    97.16                      98.58
TPR    93.33                      97.04
FPR    0.38                       0.52


6. Conclusions and Future Research Directions

In this study, we proposed a DL-based intrusion detection system named DL-IDS, which utilizes a hybrid of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to extract features from the network data flow and analyze network traffic. In DL-IDS, the CNN and LSTM extract the spatial features of single packets and the temporal features of the data stream, respectively, and finally fuse them, which improves the performance of the intrusion detection system. Moreover, DL-IDS uses category weights for optimization in the training phase. This optimization method reduces the adverse effect of the unbalanced numbers of attack-type samples in the training set and improves the robustness of the model.

To evaluate the proposed system, we experimented on the CICIDS2017 dataset, which is often used by researchers as a benchmark. Normal traffic data and attack data of six typical types (FTP-Patator, SSH-Patator, DoS, Heartbleed, Infiltration, and PortScan) were selected to test the ability of DL-IDS to detect attacks. We also used the same data to test the CNN-only model, the LSTM-only model, and some commonly used machine learning models.

The results show that DL-IDS reached 98.67% overall accuracy and a 93.32% F1-score, outperforming all of the machine learning models. Also, compared with the CNN-only and LSTM-only models, DL-IDS reached over 99.50% accuracy on all attack types and achieved the best performance among these three models.

There are certain drawbacks to the proposed model, including low detection accuracy on Heartbleed and SSH-Patator attacks due to a lack of data. Generative Adversarial Networks (GANs) may be considered to overcome this drawback to some degree. Further, combining the model with some traditional traffic features may enhance overall performance. We plan to address these problems in future research.

Data Availability

Data will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Beijing Natural Science Foundation under Grant no. 4194086, the National Natural Science Foundation of China under Grant no. 61702043, and the Key R&D Program of Hebei Province under Grant no. 201313701D.

References

[1] M. M. Hassan, A. Gumaei, S. Huda, and A. Ahmad, "Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, 2020.

[2] F. A. Khan and A. Gumaei, "A comparative study of machine learning classifiers for network intrusion detection," in Proceedings of the International Conference on Artificial Intelligence and Security, pp. 75–86, New York, NY, USA, July 2019.

[3] M. Alqahtani, A. Gumaei, M. Mathkour, and M. Maher Ben Ismail, "A genetic-based extreme gradient boosting model for detecting intrusions in wireless sensor networks," Sensors, vol. 19, no. 20, p. 4383, 2019.

[4] A. Derhab, M. Guerroumi, A. Gumaei et al., "Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security," Sensors, vol. 19, no. 14, p. 3119, 2019.

[5] T. Yaqoob, H. Abbas, and M. Atiquzzaman, "Security vulnerabilities, attacks, countermeasures, and regulations of networked medical devices: a review," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3723–3768, 2019.

[6] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, "Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch," Information Fusion, vol. 51, pp. 100–113, 2019.

[7] Z. A. Baig, S. Sanguanpong, S. N. Firdous et al., "Averaged dependence estimators for DoS attack detection in IoT networks," Future Generation Computer Systems, vol. 102, pp. 198–209, 2020.

Table 7: CNN, LSTM, and CNN+LSTM results.

              |         CNN (%)          |         LSTM (%)         |      CNN+LSTM (%)
Label         | ACC   TPR   FPR   F1     | ACC   TPR   FPR   F1     | ACC   TPR   FPR   F1
Normal        | 99.57 99.56 0.00  99.64  | 99.56 99.55 0.00  99.63  | 99.54 99.52 0.00  99.61
FTP-Patator   | 99.45 91.95 0.27  89.95  | 99.34 89.27 0.15  91.72  | 99.62 92.29 0.27  91.45
SSH-Patator   | 99.53 80.90 0.00  87.22  | 98.91 79.91 0.00  86.66  | 99.63 87.04 0.00  86.87
DoS           | 99.53 98.04 0.00  97.86  | 98.55 94.85 0.13  89.89  | 99.61 98.25 0.00  97.87
Heartbleed    | 99.64 80.08 0.00  80.88  | 99.63 81.12 0.07  77.69  | 99.66 81.12 0.00  81.20
Infiltration  | 99.59 97.91 0.07  97.75  | 98.84 96.80 1.16  97.17  | 99.55 98.07 0.18  97.54
PortScan      | 99.35 97.48 0.00  98.51  | 99.42 97.99 0.00  94.04  | 99.54 98.42 0.00  98.71
Overall       | 98.44 96.46 0.36  93.11  | 96.83 94.21 1.54  90.97  | 98.67 97.21 0.47  93.32

Table 8: CNN and LSTM models versus traditional machine learning algorithms.

                     ACC (%)  TPR (%)  FPR (%)  F1-score (%)
MultinomialNB        72.52    78.20    33.16    52.06
Random Forest        96.08    95.47    3.30     76.71
J48                  97.32    96.80    2.17     91.43
Logistic Regression  97.68    94.96    1.47     90.55
DL-IDS               98.67    97.21    0.47     93.32


[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48–60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709–716, 2018.

[10] H. Xu, F. Mueller, M. Acharya et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26–35, 2018.

[12] H. Xu, C. Fang, Q. Cao et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166–170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108–118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," in Recent Developments in Intelligent Computing, Communication and Devices, pp. 559–565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21–26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670–2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21–26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the Symposium Series on Computational Intelligence (SSCI), IEEE, Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86–94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69–79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386–396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V. Ramos and A. Abraham, "ANTIDS: self organized ant-based clustering model for intrusion detection system," in Proceedings of the Fourth IEEE International Workshop on Soft Computing as Transdisciplinary Science and Technology (WSTST'05), Springer, Muroran, Japan, May 2005.

[30] I. Sharafaldin, A. H. Lashkari, and A. Ali, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the Fourth International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, January 2018.

[31] X. Chen, "A simple utility to classify packets into flows," https://github.com/caesar0301/pkt2flow.

[32] B. J. Radford and B. D. Richardson, "Sequence aggregation rules for anomaly detection in computer network traffic," 2018, https://arxiv.org/abs/1805.03735v2.

[33] A. Ahmim, M. A. Ferrag, L. Maglaras, M. Derdour, and H. Janicke, "A detailed analysis of using supervised machine learning for intrusion detection," 2019, https://www.researchgate.net/publication/331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection.

Security and Communication Networks 11

Page 9: DL-IDS: Extracting Features Using CNN-LSTM Hybrid Network ...Apr 06, 2020  · Table 2: Distribution of network traffic periods in CICIDS2017 dataset. Attack Time Size Normal Monday

of the model is enhanced If this number is too highhowever the proportion of filling data packets increasesthus affecting the modelrsquos ability to extract features Figure 7shows the impact of the number of packets per flow onmodel performance We found that when the number ofpackets in each flow exceeds 8 the performance of the modeldeclines significantlyWe chose 8 as the optimal value of per-flow packet quantity in the network

56 Batch Size Effects onModel Performance Batch size is animportant parameter in the model training processWithin areasonable range increasing the batch size can improve thememory utilization and speed up the data processing Ifincreased to an inappropriate extent however it can sig-nificantly slow down the process As shown in Figure 8 wefound that a batch size of 20 is optimal

57 Class Weight Effects on Model Performance Table 6shows a comparison of two groups of experimental resultswith and without class weights Introducing the class weightdoes appear to reduce the impact of the imbalance of thenumber of data of various types in the CICIDS2017 dataseton model performance

58 Model Evaluation e LSTM unit can effectively ex-tract the temporal relationship between packets Table 7shows a comparison of accuracy between the DL-IDSmodel and models with the CNN or LSTM alone eLSTM unit appears to effectively improve the identifica-tion efficiency of SSH-Patator Infiltration PortScan andother attack traffic for enhanced model performancepossibly due to the apparent timing of these attacksCompared to the LSTM model alone however adding aCNN further improves the identification efficiency of mostattack traffic As shown in Table 7 the proposed DL-IDSintrusion detection model has very low false alarm rate andcan classify network traffic more accurately than the CNNor LSTM alone

Table 8 shows a comparison of models using CNN andLSTM with traditional machine learning algorithms [33]e DL-IDS model achieves the best performance amongthem with the largest ACC value and the lowest FPRvalue

e data input to DL-IDS is raw network traffic ere isno special feature extraction in the model the training andtesting time include the feature extraction time e tradi-tional machine learning algorithm does not consider dataextraction or processing time so we could not directlycompare the time consumption of the various algorithms inTable 8e training time and testing time of the model wereunder 600 s and 100 s respectively so we believe that the DL-IDS achieves optimal detection effects in the same timeframe as the traditional algorithm

Table 5 Model performance with training packet length of 40

Label ACC () TPR () FPR () F1-score ()Normal 9954 9952 000 9961FTP-Patator 9962 9229 027 9145SSH-Patator 9963 8704 000 8687Dos 9961 9825 000 9787Heartbleed 9966 8112 000 8120Infiltration 9955 9807 018 9754PortScan 9954 9842 000 9871Overall 9867 9721 047 9332

100

80

60

40

20

0

ACC

TPR

FPR

rate

6 8 10 12 14 16Packet number per flow

TPRFPRACC

Figure 7 Impact of per-flow packet quantity on modelperformance

TPRFPRACC

100

80

60

40

20

0

ACC

TPR

FPR

rate

0 10 20 30 40Batchsize

Figure 8 Selected batch size affecting model performance

Table 6 Effects of applying class weight on model performance

Without class weights () With class weights ()ACC 9716 9858TPR 9333 9704FPR 038 052

Security and Communication Networks 9

6 Conclusions and Future Research Directions

In this study we proposed a DL-based intrusion detectionsystem named DL-IDS which utilized a hybrid of Con-volutional Neural Network (CNN) and Long Short-TermMemory (LSTM) to extract features from the networkdata flow to analyze the network traffic In DL-IDS CNNand LSTM respectively extract the spatial features of asingle packet and the temporal feature of the data streamand finally fuse them which improve the performance ofintrusion detection system Moreover DL-IDS usescategory weights for optimization in the training phaseis optimization method reduced the adverse of thenumber of unbalanced samples of attack types in Train setand improved the robustness of the model

To evaluate the proposed system we experimented onthe CICIDS2017 dataset which is often used by researchersfor the benchmark Normal traffic data and some attack dataof six typical types of FTP-Patator SSH-Patator DosHeartbleed Infiltration and PortScan were selected to testthe ability of DL-IDS to detect attack data Besides we alsoused the same data to test the CNN-only model the LSTM-only model and some commonly used machine learningmodels

e results show that DL-IDS reached 9867 and9332 in overall accuracy and F1-score respectively whichperformed better than all machine learning models Alsocompared with the CNN-only model and the LSTM-onlymodel DL-IDS reached over 9950 in the accuracy of allattack types and achieved the best performance among thesethree models

ere are yet certain drawbacks to the proposed modelincluding low detection accuracy on Heartbleed and SSH-Patator attacks due to data lack Generative AdversarialNetworks (GAN) may be considered to overcome thedrawback to some degree Further combining with some

traditional traffic features may enhance the overall modelperformance We plan to resolve these problems throughfurther research

Data Availability

Data will be made available upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

is research was funded by Beijing Natural ScienceFoundation under Grant no 4194086 the National NaturalScience Foundation of China under Grant no 61702043 andthe Key RampD Program of Heibei Province under Grant no201313701D

References

[1] M M Hassan A Gumaei S Huda and A Ahmad ldquoIn-creasing the trustworthiness in the industrial IoT networksthrough a reliable cyberattack detection modelrdquo IEEETransactions on Industrial Informatics vol 16 no 9 2020

[2] F A Khan and A Gumaei ldquoA comparative study of machinelearning classifiers for network intrusion detectionrdquo in Pro-ceedings of the International Conference on Artificial Intelli-gence and Security pp 75ndash86 New York NY USA July 2019

[3] M Alqahtani A Gumaei M Mathkour and M Maher BenIsmail ldquoA genetic-based extreme gradient boosting model fordetecting intrusions in wireless sensor networksrdquo Sensorsvol 19 no 20 p 4383 2019

[4] A Derhab M Guerroumi A Gumaei et al ldquoBlockchain andrandom subspace learning-based IDS for SDN-enabled in-dustrial IoT securityrdquo Sensors vol 19 no 14 p 3119 2019

[5] T Yaqoob H Abbas and M Atiquzzaman ldquoSecurity vul-nerabilities attacks countermeasures and regulations ofnetworked medical devices-a reviewrdquo IEEE CommunicationsSurveys amp Tutorials vol 21 no 4 pp 3723ndash3768 2019

[6] X Jing Z Yan X Jiang and W Pedrycz ldquoNetwork trafficfusion and analysis against DDoS flooding attacks with anovel reversible sketchrdquo Information Fusion vol 51 no 51pp 100ndash113 2019

[7] Z A Baig S Sanguanpong S N Firdous et al ldquoAverageddependence estimators for DoS attack detection in IoT net-worksrdquo Future Generation Computer Systems vol 102pp 198ndash209 2020

Table 7 CNN LSTM and CNN+LSTM results

LabelCNN () LSTM () CNN+LSTM ()

ACC TPR FPR F1-score ACC TPR FPR F1-score ACC TPR FPR F1-scoreNormal 9957 9956 000 9964 9956 9955 000 9963 9954 9952 000 9961FTP-Patator 9945 9195 027 8995 9934 8927 015 9172 9962 9229 027 9145SSH-Patator 9953 8090 000 8722 9891 7991 000 8666 9963 8704 000 8687Dos 9953 9804 000 9786 9855 9485 013 8989 9961 9825 000 9787Heartbleed 9964 8008 000 8088 9963 8112 007 7769 9966 8112 000 8120Infiltration 9959 9791 007 9775 9884 9680 116 9717 9955 9807 018 9754PortScan 9935 9748 000 9851 9942 9799 000 9404 9954 9842 000 9871Overall 9844 9646 036 9311 9683 9421 154 9097 9867 9721 047 9332

Table 8 CNN and LSTM models versus traditional machinelearning algorithms

ACC () TPR () FPR () F1-score ()MultinomialNB 7252 7820 3316 5206Random Forest 9608 9547 330 7671J48 9732 9680 217 9143Logistic Regression 9768 9496 147 9055DL-IDS 9867 9721 047 9332

10 Security and Communication Networks

[8] Y Yuan H Yuan DW C Ho and L Guo ldquoResilient controlof wireless networked control system under denial-of-serviceattacks a cross-layer design approachrdquo IEEE Transactions onCybernetics vol 50 no 1 pp 48ndash60 2020

[9] A Verma and V Ranga ldquoStatistical analysis of CIDDS-001dataset for network intrusion detection systems using dis-tance-based machine learningrdquo Procedia Computer Sciencevol 125 pp 709ndash716 2018

[10] H Xu F Mueller M Acharya et al ldquoMachine learningenhanced real-time intrusion detection using timing infor-mationrdquo in Proceedings of the International Workshop onTrustworthy amp Real-Time Edge Computing for Cyber-PhysicalSystems Nashville TN USA 2018

[11] Y Wang W Meng W Li J Li W-X Liu and Y Xiang ldquoAfog-based privacy-preserving approach for distributed sig-nature-based intrusion detectionrdquo Journal of Parallel andDistributed Computing vol 122 pp 26ndash35 2018

[12] H Xu C Fang Q Cao et al ldquoApplication of a distance-weighted KNN algorithm improved by moth-flame optimi-zation in network intrusion detectionrdquo in Proceedings of the2018 IEEE 4th International Symposium on Wireless Systemswithin the International Conferences on Intelligent Data Ac-quisition and Advanced Computing Systems (IDAACS-SWS)pp 166ndash170 IEEE Lviv Ukraine September 2018

[13] S Teng N Wu H Zhu L Teng and W Zhang ldquoSVM-DT-based adaptive and collaborative intrusion detectionrdquo IEEECAAJournal of Automatica Sinica vol 5 no 1 pp 108ndash118 2018

[14] J Liu and L Xu ldquoImprovement of SOM classification algo-rithm and application effect analysis in intrusion detectionrdquoRecent Developments in Intelligent Computing Communica-tion and Devices pp 559ndash565 Springer Berlin Germany2019

[15] Y Bengio A Courville and P Vincent ldquoRepresentationlearning a review and new perspectivesrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 35 no 8pp 1798ndash1828 2013

[16] T Ma F Wang J Cheng Y Yu and X Chen ldquoA hybridspectral clustering and deep neural network ensemble algo-rithm for intrusion detection in sensor networksrdquo Sensorsvol 16 no 10 p 1701 2016

[17] Q NiyazW Sun A Y Javaid andM Alam ldquoA deep learningapproach for network intrusion detection systemrdquo in Pro-ceedings of the 9th International Conference on Bio-inspiredInformation and Communications Technologies pp 21ndash26Columbia NY USA December 2015

[18] A S Eesa Z Orman and A M A Brifcani ldquoA novel feature-selection approach based on the cuttlefish optimization al-gorithm for intrusion detection systemsrdquo Expert Systems withApplications vol 42 no 5 pp 2670ndash2679 2015

[19] I Goodfellow Y Bengio and A Courville Deep LearningMIT Press Cambridge MA USA 2016

[20] Z Wang He Applications of Deep Learning on TrafficIdentification pp 21ndash26 BlackHat Las Vegas NV USA 2015

[21] J Fan and K Ling-zhi Intrusion Detection Algorithm Based onConvolutional Neural Network Beijing Institude of Tech-nology Beijing China 2017

[22] J Kim J Kim H L T u and H Kim ldquoLong short termmemory recurrent neural network classifier for intrusiondetectionrdquo in Proceedings of the 2016 International Conferenceon Platform Technology and Service (PlatCon) February 2016

[23] P Wu and H Guo ldquoLuNET a deep neural network fornetwork intrusion detectionrdquo in Proceedings of the Sympo-sium Series on Computational Intelligence (SSCI) IEEEXiamen ChinaIEEE Xiamen China December 2019

[24] C M Hsu H Y Hsieh S W Prakosa M Z Azhari andJ S Leu ldquoUsing long-short-term memory based convolu-tional neural networks for network intrusion detectionrdquo inProceedings of the International Wireless Internet ConferenceSpringer Taipei Taiwan pp 86ndash94 October 2018

[25] M Ahsan and K Nygard ldquoConvolutional neural networkswith LSTM for intrusion detectionrdquo in Proceedings of the 35thInternational Conference vol 69 pp 69ndash79 Seville SpainMay 2020

[26] M M Hassan A Gumaei A Ahmed M Alrubaian andG Fortino ldquoA hybrid deep learning model for efficient in-trusion detection in big data environmentrdquo InformationSciences vol 513 pp 386ndash396 2020

[27] R Abdulhammed H Musafer A Alessa M Faezipour andA Abuzneid ldquoFeatures dimensionality reduction approachesfor machine learning based network intrusion detectionrdquoElectronics vol 8 no 3 p 322 2019

[28] H Musafer A Abuzneid M Faezipour and A MahmoodldquoAn enhanced design of Sparse autoencoder for latent featuresextraction based on trigonometric simplexes for networkintrusion detection systemsrdquo Electronics vol 9 no 2 p 2592020

[29] V Ramos and A Abraham ldquoANTIDS self organized antbased clustering model for intrusion detection systemrdquo inProceedings of the Fourth IEEE InternationalWorkshop on SoftComputing as Transdisciplinary Science and Technology(WSTSTrsquo05) Springer Muroran JapanSpringer MuroranJapan May 2005

[30] I Sharafaldin A H Lashkari and A Ali ldquoToward generatinga new intrusion detection dataset and intrusion trafficcharacterizationrdquo in Proceedings of the He Fourth Interna-tional Conference on Information Systems Security and Privacy(ICISSP) Madeira Portugal January 2018

[31] X Chen ldquoA simple utility to classify packets into flowsrdquohttpsgithubcomcaesar0301pkt2flow

[32] B J Radford and B D Richardson ldquoSequence aggregationrules for anomaly detection in computer network trafficrdquo2018 httpsarxivorgabs180503735v2

[33] A AhmimM A Ferrag LMaglaras M Derdour andH JanickeldquoA detailed analysis of using supervised machine learning for in-trusion detectionrdquo 2019 httpswwwresearchgatenetpublication331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection

Security and Communication Networks 11

Page 10: DL-IDS: Extracting Features Using CNN-LSTM Hybrid Network ...Apr 06, 2020  · Table 2: Distribution of network traffic periods in CICIDS2017 dataset. Attack Time Size Normal Monday

6 Conclusions and Future Research Directions

In this study we proposed a DL-based intrusion detectionsystem named DL-IDS which utilized a hybrid of Con-volutional Neural Network (CNN) and Long Short-TermMemory (LSTM) to extract features from the networkdata flow to analyze the network traffic In DL-IDS CNNand LSTM respectively extract the spatial features of asingle packet and the temporal feature of the data streamand finally fuse them which improve the performance ofintrusion detection system Moreover DL-IDS usescategory weights for optimization in the training phaseis optimization method reduced the adverse of thenumber of unbalanced samples of attack types in Train setand improved the robustness of the model

To evaluate the proposed system we experimented onthe CICIDS2017 dataset which is often used by researchersfor the benchmark Normal traffic data and some attack dataof six typical types of FTP-Patator SSH-Patator DosHeartbleed Infiltration and PortScan were selected to testthe ability of DL-IDS to detect attack data Besides we alsoused the same data to test the CNN-only model the LSTM-only model and some commonly used machine learningmodels

e results show that DL-IDS reached 9867 and9332 in overall accuracy and F1-score respectively whichperformed better than all machine learning models Alsocompared with the CNN-only model and the LSTM-onlymodel DL-IDS reached over 9950 in the accuracy of allattack types and achieved the best performance among thesethree models

ere are yet certain drawbacks to the proposed modelincluding low detection accuracy on Heartbleed and SSH-Patator attacks due to data lack Generative AdversarialNetworks (GAN) may be considered to overcome thedrawback to some degree Further combining with some

traditional traffic features may enhance the overall modelperformance We plan to resolve these problems throughfurther research

Data Availability

Data will be made available upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

is research was funded by Beijing Natural ScienceFoundation under Grant no 4194086 the National NaturalScience Foundation of China under Grant no 61702043 andthe Key RampD Program of Heibei Province under Grant no201313701D

References

[1] M M Hassan A Gumaei S Huda and A Ahmad ldquoIn-creasing the trustworthiness in the industrial IoT networksthrough a reliable cyberattack detection modelrdquo IEEETransactions on Industrial Informatics vol 16 no 9 2020

[2] F A Khan and A Gumaei ldquoA comparative study of machinelearning classifiers for network intrusion detectionrdquo in Pro-ceedings of the International Conference on Artificial Intelli-gence and Security pp 75ndash86 New York NY USA July 2019

[3] M Alqahtani A Gumaei M Mathkour and M Maher BenIsmail ldquoA genetic-based extreme gradient boosting model fordetecting intrusions in wireless sensor networksrdquo Sensorsvol 19 no 20 p 4383 2019

[4] A Derhab M Guerroumi A Gumaei et al ldquoBlockchain andrandom subspace learning-based IDS for SDN-enabled in-dustrial IoT securityrdquo Sensors vol 19 no 14 p 3119 2019

[5] T Yaqoob H Abbas and M Atiquzzaman ldquoSecurity vul-nerabilities attacks countermeasures and regulations ofnetworked medical devices-a reviewrdquo IEEE CommunicationsSurveys amp Tutorials vol 21 no 4 pp 3723ndash3768 2019

[6] X Jing Z Yan X Jiang and W Pedrycz ldquoNetwork trafficfusion and analysis against DDoS flooding attacks with anovel reversible sketchrdquo Information Fusion vol 51 no 51pp 100ndash113 2019

[7] Z A Baig S Sanguanpong S N Firdous et al ldquoAverageddependence estimators for DoS attack detection in IoT net-worksrdquo Future Generation Computer Systems vol 102pp 198ndash209 2020

Table 7 CNN LSTM and CNN+LSTM results

LabelCNN () LSTM () CNN+LSTM ()

ACC TPR FPR F1-score ACC TPR FPR F1-score ACC TPR FPR F1-scoreNormal 9957 9956 000 9964 9956 9955 000 9963 9954 9952 000 9961FTP-Patator 9945 9195 027 8995 9934 8927 015 9172 9962 9229 027 9145SSH-Patator 9953 8090 000 8722 9891 7991 000 8666 9963 8704 000 8687Dos 9953 9804 000 9786 9855 9485 013 8989 9961 9825 000 9787Heartbleed 9964 8008 000 8088 9963 8112 007 7769 9966 8112 000 8120Infiltration 9959 9791 007 9775 9884 9680 116 9717 9955 9807 018 9754PortScan 9935 9748 000 9851 9942 9799 000 9404 9954 9842 000 9871Overall 9844 9646 036 9311 9683 9421 154 9097 9867 9721 047 9332

Table 8 CNN and LSTM models versus traditional machinelearning algorithms

ACC () TPR () FPR () F1-score ()MultinomialNB 7252 7820 3316 5206Random Forest 9608 9547 330 7671J48 9732 9680 217 9143Logistic Regression 9768 9496 147 9055DL-IDS 9867 9721 047 9332

10 Security and Communication Networks

[8] Y. Yuan, H. Yuan, D. W. C. Ho, and L. Guo, "Resilient control of wireless networked control system under denial-of-service attacks: a cross-layer design approach," IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 48–60, 2020.

[9] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709–716, 2018.

[10] H. Xu, F. Mueller, M. Acharya, et al., "Machine learning enhanced real-time intrusion detection using timing information," in Proceedings of the International Workshop on Trustworthy & Real-Time Edge Computing for Cyber-Physical Systems, Nashville, TN, USA, 2018.

[11] Y. Wang, W. Meng, W. Li, J. Li, W.-X. Liu, and Y. Xiang, "A fog-based privacy-preserving approach for distributed signature-based intrusion detection," Journal of Parallel and Distributed Computing, vol. 122, pp. 26–35, 2018.

[12] H. Xu, C. Fang, Q. Cao, et al., "Application of a distance-weighted KNN algorithm improved by moth-flame optimization in network intrusion detection," in Proceedings of the 2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS), pp. 166–170, IEEE, Lviv, Ukraine, September 2018.

[13] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108–118, 2018.

[14] J. Liu and L. Xu, "Improvement of SOM classification algorithm and application effect analysis in intrusion detection," in Recent Developments in Intelligent Computing, Communication and Devices, pp. 559–565, Springer, Berlin, Germany, 2019.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen, "A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks," Sensors, vol. 16, no. 10, p. 1701, 2016.

[17] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th International Conference on Bio-inspired Information and Communications Technologies, pp. 21–26, Columbia, NY, USA, December 2015.

[18] A. S. Eesa, Z. Orman, and A. M. A. Brifcani, "A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems," Expert Systems with Applications, vol. 42, no. 5, pp. 2670–2679, 2015.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[20] Z. Wang, The Applications of Deep Learning on Traffic Identification, pp. 21–26, BlackHat, Las Vegas, NV, USA, 2015.

[21] J. Fan and K. Ling-zhi, Intrusion Detection Algorithm Based on Convolutional Neural Network, Beijing Institute of Technology, Beijing, China, 2017.

[22] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), February 2016.

[23] P. Wu and H. Guo, "LuNET: a deep neural network for network intrusion detection," in Proceedings of the Symposium Series on Computational Intelligence (SSCI), IEEE, Xiamen, China, December 2019.

[24] C. M. Hsu, H. Y. Hsieh, S. W. Prakosa, M. Z. Azhari, and J. S. Leu, "Using long-short-term memory based convolutional neural networks for network intrusion detection," in Proceedings of the International Wireless Internet Conference, pp. 86–94, Springer, Taipei, Taiwan, October 2018.

[25] M. Ahsan and K. Nygard, "Convolutional neural networks with LSTM for intrusion detection," in Proceedings of the 35th International Conference, vol. 69, pp. 69–79, Seville, Spain, May 2020.

[26] M. M. Hassan, A. Gumaei, A. Ahmed, M. Alrubaian, and G. Fortino, "A hybrid deep learning model for efficient intrusion detection in big data environment," Information Sciences, vol. 513, pp. 386–396, 2020.

[27] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.

[28] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.

[29] V. Ramos and A. Abraham, "ANTIDS: self organized ant-based clustering model for intrusion detection system," in Proceedings of the Fourth IEEE International Workshop on Soft Computing as Transdisciplinary Science and Technology (WSTST'05), Springer, Muroran, Japan, May 2005.

[30] I. Sharafaldin, A. H. Lashkari, and A. Ali, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proceedings of the Fourth International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, January 2018.

[31] X. Chen, "A simple utility to classify packets into flows," https://github.com/caesar0301/pkt2flow.

[32] B. J. Radford and B. D. Richardson, "Sequence aggregation rules for anomaly detection in computer network traffic," 2018, https://arxiv.org/abs/1805.03735v2.

[33] A. Ahmim, M. A. Ferrag, L. Maglaras, M. Derdour, and H. Janicke, "A detailed analysis of using supervised machine learning for intrusion detection," 2019, https://www.researchgate.net/publication/331673991_A_Detailed_Analysis_of_Using_Supervised_Machine_Learning_for_Intrusion_Detection.
