Anomaly Detection of IoT Cyberattacks in Smart City build on
Machine Learning Algorithms
Sushmitha R1, Deepa N P2 and K L Sudha3
1,2,3 Department of Electronics and Communication Engineering,
Dayananda Sagar College of Engineering, Visvesvaraya Technological University,
Karnataka, India
Abstract: The spread of heterogeneous IoT devices has led to smart cities built on Fog
architecture, which use advanced communication technologies to drive economic growth.
As the amount of data carried by IoT devices across a smart city network increases, so
does the exposure to IoT cyberattacks, which raises cybersecurity challenges. In the IoT
device layer, the devices are attached to sensors that connect, through the Fog
architecture, to cloud servers in the Cloud layer. To counter these cyberattacks in a
smart city, an Anomaly Detection of IoT (AD-IoT) system based on machine learning
algorithms is used to classify network traffic accurately. The AD-IoT system can detect
compromised IoT devices at the fog layer instead of the cloud layer, and it can achieve
high classification accuracy with a low false positive rate. In this work, the UNSW-NB15
dataset is used to evaluate the model and report its accuracy.
Keywords: Anomaly Detection, cybersecurity, fog architecture, smart city.
1. Introduction
Nowadays, many cyberattacks occur in smart cities because of the growth of network-based
services over the internet, so network security is becoming a more serious issue. A smart
city applies IoT technology to optimize infrastructure. Its aim is to enable a better
quality of life, economic competitiveness and sustainability. Cyberattacks increase
rapidly as more devices are interconnected, and they can gain unauthorized access to IoT
devices without the user’s awareness. The security challenges in a smart city are to
detect attacks across the variety of protocols used by IoT devices and to detect
cyberattacks from the IoT networks before they harm the city [1]. Earlier work on
detecting IoT attacks used a traditional Intrusion Detection System (IDS), which detects
only known attacks. Later, network-based anomaly detection systems drew on various
network logs to analyze the amount of data in a network. Even so, such systems still face
network attacks that are difficult to identify.
Anomaly detection therefore helps to admit only normal data into the network [2]. Across
heterogeneous networks (wired, wireless, LAN, etc.), an attacker can access information
while data is being collected and analyzed, for example by eavesdropping [3]. Machine
learning techniques help to address security problems and can be used in many
cybersecurity applications, including both signature-based and anomaly-based attack
detection. The drawback of signature-based techniques is that they cannot detect unknown
attacks [4]. The Anomaly Detection of IoT (AD-IoT) system, in contrast, observes the
amount of data in the fog layer and can detect hidden compromised devices based on Fog
architecture. Anomaly detection in the fog network is evaluated on attacks from a modern
dataset in an IoT network in a
Journal of Seybold Report ISSN NO: 1533-9211
VOLUME 15 ISSUE 9 2020 71
smart city. Hence, this system uses machine learning algorithms to classify traffic as
attack or normal while reducing the false positive rate.
Smart city formulated on Fog architecture
A smart city is built on fog computing to reduce the latency between the cloud layer and
the IoT sensor layer. The architecture consists of three layers: the cloud layer, the fog
layer and the IoT sensor layer. Figure 1 shows a smart city based on Fog architecture.
The cloud layer has servers that store and manage big data. The fog layer bridges the gap
between the sensing and cloud layers: it processes and aggregates the data [1] so that
computation and management happen at the edge of the network. The IoT sensing layer has a
set of sensors that enable data collection.
Figure 1. Smart city based on Fog architecture
Traditional Intrusion Detection System
Previously, traditional Intrusion Detection Systems (IDSs) [1] were used to monitor the
amount of data in the network. There are two main types of IDS:
1. Host-based Intrusion Detection System (HIDS)
2. Network-based Intrusion Detection System (NIDS)
A host-based IDS monitors and detects intrusion activity only on the computer system
where it runs. For example, when a computer is infected with a virus, files may go
missing or be deleted; installing antivirus software lets the infection be detected and
monitored. HIDS is therefore of limited use with some IoT devices.
A network-based IDS analyzes incoming network data, monitors the amount of data in the
network, and can distinguish malicious from non-malicious activity using hybrid
techniques that combine signature-based and anomaly-based detection. This method is
therefore used to classify traffic as attack or normal based on the amount of data across
the network.
2. Related work
In this section, some previous and recent techniques for detecting intrusion activities
are discussed [1]. Related work covers traditional Intrusion Detection Systems (IDS) in
IoT networks, IoT cyber threats and network behavior. Traditional IDS methods detect
cyberattacks in different ways, as a Host-based Intrusion Detection System (HIDS) or a
Network-based Intrusion Detection System (NIDS). IoT networks face difficulties with
traditional IDS methods because of limited resources, energy consumption, etc. NIDS
methods can therefore be used to protect IoT security services [2]. Several techniques
depend on hybrid methods [3]. The signature-based part of such methods, however, detects
an attack only when it matches the stored database, so it cannot detect unknown attacks
in network traffic.
An approach to anomaly intrusion detection was designed for both training and detection
of normal traffic and attacks using machine learning classification techniques and
pattern recognition [4]. Many supervised and unsupervised machine learning models based
on classification methods have been applied to cloud security with an input dataset and
its features [5], which helps in model selection and tuning. The UNSW-NB15 dataset [6],
[7] is used to evaluate the model’s accuracy and to generate its features. It has 49
features and 9 attack classes, which allows a model to differentiate attack from normal.
Future challenges for IoT security, and security solutions based on Machine Learning and
Deep Learning classification methods, are surveyed in [8]. The Anomaly Detection of IoT
approach detects cyberattacks at fog nodes, based on Fog architecture, using the modern
UNSW-NB15 dataset, and offers high performance and efficiency. The model learns from the
amount of data in the network by training machine learning algorithms to detect malicious
behavior.
3. Block diagram
The block diagram for detecting cyberattacks is shown in Figure 2. The model first
collects data from firewall logs, where it monitors the amount of data, and accepts the
data in the form of files. The dataset is the collection of data for the training set and
testing set. The features of the dataset are processed using Python modules and files.
Patterns are built from the labelled input dataset and the amount of data in the network.
Each network service has its own pattern, and the data can be categorized by service. An
anomaly is detected by observing activity in the network that does not fit these
patterns. After an anomaly is detected, an alert is displayed to the authorized user.
Figure 2. Block diagram
UNSW-NB15 dataset: The UNSW-NB15 dataset was created with the IXIA PerfectStorm tool in
the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate normal
activities and attack behaviors. It includes 49 features and 9 attack classes (Fuzzers,
Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms) that
reflect up-to-date malicious behaviors [7].
4. Working principle
The AD-IoT detection model is employed to detect various attacks. Figure 3 shows the
AD-IoT system model. The AD-IoT system is based on Fog architecture and consists of
multiple components. The IoT layer contains an enormous number of IoT devices, which
exposes it to cyber threats. These devices connect to gateways in the fog layer. Each
facility has its own gateway, and these gateways connect in turn to an Anomaly Detection
of IoT security gateway in the fog layer, which can serve many gateways. A security
gateway attached to a fog node can check communication across the network and monitor the
amount of data that moves through each fog node.
The UNSW-NB15 dataset is the input to the machine learning model. The features of the
UNSW-NB15 training set are used to train the model; relevant features are selected, and
the trained model processes its input to produce an output. The features of the UNSW-NB15
testing set are then used to test the model, which classifies each record as normal or
attack by binary classification. A detection at any fog node in the fog layer should
alert the security cloud server so that it can analyze the event and update the system.
To test and classify ‘attack’ and ‘normal’, a random forest algorithm is used, which is
discussed in the next section.
Figure 3. AD-IoT detection model
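The train-then-test flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the records are synthetic stand-ins for UNSW-NB15 rows, the helper make_flows and the chosen value ranges are hypothetical, and only three of the dataset's parameters (spkts, dpkts, sbytes) are imitated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)


def make_flows(n):
    """Synthetic stand-in for UNSW-NB15 records (hypothetical value ranges)."""
    label = rng.integers(0, 2, size=n)  # 0 = normal, 1 = attack
    # Attack records get a higher source-to-destination packet count.
    spkts = np.where(label == 1,
                     rng.integers(20, 29, size=n),
                     rng.integers(1, 15, size=n))
    dpkts = rng.integers(0, 10, size=n)     # uninformative in this toy data
    sbytes = rng.integers(46, 900, size=n)  # uninformative in this toy data
    return np.column_stack([spkts, dpkts, sbytes]), label


X_train, y_train = make_flows(400)  # stands in for the training set
X_test, y_test = make_flows(100)    # stands in for the testing set

# Train on the training set, then classify the testing set as 0 or 1.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

test_accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {test_accuracy:.3f}")
```

With the real dataset, the two calls to make_flows would be replaced by loading the training and testing CSV files; the fit, predict and scoring steps would stay the same.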
5. Random Forest algorithm
Random Forest is a supervised machine learning algorithm comprising a pool of
tree-structured classifiers. Each tree grows in accordance with a random vector. Its two
main parameters are the number of variables to be chosen at each node and the number of
trees that build the forest. The parameters of the UNSW-NB15 training set and testing set
are considered, and each is restricted to threshold values. Based on these threshold
values, records are classified as ‘normal’ or ‘attack’.
Figures 4 and 5 show Random Forest classifying normal and attack, using a few parameters
as an example. Consider the parameter spkts, the source-to-destination packet count: a
value in the range 20 to 28 represents the attack called Reconnaissance. The blue circles
in the tree represent the attack, and the white circles represent values that do not fall
in the range. The other parameters should be considered similarly.
Figure 4: Random Forest representing class Normal
Figure 5: Random Forest representing class Attack
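As a rough illustration of the two parameters and of threshold-based splitting, the sketch below fits a small forest on a single synthetic spkts column. It assumes scikit-learn, where the number of trees is n_estimators and the number of variables tried at each node is max_features; the data is invented, with attack records drawn from the 20 to 28 range quoted above and normal records assumed to lie below it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic records with one parameter, spkts (hypothetical data).
normal_spkts = rng.integers(1, 20, size=200)   # values 1..19
attack_spkts = rng.integers(20, 29, size=200)  # values 20..28
X = np.concatenate([normal_spkts, attack_spkts]).reshape(-1, 1)
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

forest = RandomForestClassifier(
    n_estimators=10,  # number of trees that build the forest
    max_features=1,   # number of variables tried at each node
    random_state=0,
)
forest.fit(X, y)

# Inspect the split thresholds of the first tree.
# Internal nodes have a feature index >= 0; leaves carry a negative index.
first_tree = forest.estimators_[0].tree_
split_thresholds = first_tree.threshold[first_tree.feature >= 0]
print("thresholds learned by the first tree:", split_thresholds)

in_range = forest.predict([[24]])[0]     # spkts inside the attack range
out_of_range = forest.predict([[5]])[0]  # spkts outside the attack range
print("spkts=24 ->", in_range, "| spkts=5 ->", out_of_range)
```

Each printed threshold is a cut point on spkts chosen by the tree, which is the code-level counterpart of the threshold values discussed in the text.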
6. Flow diagram
In this section, the modern, standard UNSW-NB15 dataset, whose training and testing sets
are the input to the system, is discussed. Feature extraction measures the data and
generates values. Next, feature selection picks the parameters of the training and
testing sets of the UNSW-NB15 dataset. Once these parameters are selected, different
machine learning routines can be applied, such as deep learning, KNN parameter tuning,
supervised accuracy, supervised on unsupervised, and supervised categorized accuracy.
Based on these routines, normal and attack records can be predicted.
Figure 6: Flow diagram
7. Analysis and Results
The Anomaly Detection of IoT model was tested on classifying attack and normal. The
approach is implemented in the Python programming language using modules such as OS,
Pandas, NumPy, CSV, time, Matplotlib, Pylab and sklearn. Experimental results for
detecting attacks and normal traffic over the training set and testing set of the
UNSW-NB15 dataset are discussed. The parameters of the training set and testing set of
the UNSW-NB15 dataset are named: proto (transaction protocol), service (HTTP, FTP), state
(the state and its dependent protocol), sbytes (source to destination bytes), dbytes
(destination to source bytes), spkts (source to destination packet count), dpkts
(destination to source packet count), sttl (source to destination time to live), dttl
(destination to source time to live), sloss (source packets retransmitted or dropped) and
dloss (destination packets retransmitted or dropped). If the category is Normal, the
assigned label is 0; for attacks, the assigned label is 1. These parameters, with their
threshold values, category and label, are tabulated in Tables 1 to 4.
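The labelling rule (0 for Normal, 1 for any attack) and the handling of the text-valued parameters can be sketched with pandas as below. The rows are invented for illustration, and the column name attack_cat for the category is an assumption, as is the use of integer codes for proto, service and state; the paper does not say how these were encoded.

```python
import pandas as pd

# A few invented records using parameter names from the text.
records = pd.DataFrame({
    "proto": ["tcp", "udp", "gre", "tcp"],
    "service": ["http", "-", "smtp", "ftp"],
    "state": ["FIN", "INT", "FIN", "INT"],
    "spkts": [12, 2, 20, 2],
    "attack_cat": ["Normal", "Normal", "DoS", "Fuzzers"],  # assumed column name
})

# Label rule from the text: 0 for Normal, 1 for any attack category.
records["label"] = (records["attack_cat"] != "Normal").astype(int)

# Replace each text-valued parameter with integer codes, numbered in
# order of first appearance, so a classifier can consume it.
for column in ["proto", "service", "state"]:
    records[column + "_code"] = pd.factorize(records[column])[0]

print(records[["attack_cat", "label", "proto_code", "service_code", "state_code"]])
```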
Table 1. UNSW-NB15 training set statistics for Normal
Parameters   Threshold value    Category   Label
proto        tcp, udp, arp      Normal     0
service      smtp, snmp, http   Normal     0
state        FIN, INT, CON      Normal     0
spkts        1-122              Normal     0
dpkts        0-126              Normal     0
sbytes       130-986            Normal     0
dbytes       0-1096             Normal     0
sttl         62-254             Normal     0
dttl         0-252              Normal     0
sloss        0-28               Normal     0
dloss        0-32               Normal     0
Table 2. UNSW-NB15 training set statistics for Attacks
Parameters   Threshold value    Category         Label
proto        ddp, mtp, unas     Backdoor         1
service      ftp                Fuzzers          1
state        INT                Backdoor         1
spkts        2                  Analysis         1
dpkts        0-8                Reconnaissance   1
sbytes       240-610            Shellcode        1
dbytes       0-2658             Exploits         1
sttl         0-254              DoS              1
dttl         0-252              Generic          1
sloss        2                  Worms            1
dloss        0-36               Exploits         1
Table 3. UNSW-NB15 testing set statistics for Normal
Parameters   Threshold value    Category   Label
proto        udp, arp, tcp      Normal     0
service      (-), http          Normal     0
state        INT, FIN, REQ      Normal     0
spkts        1-22               Normal     0
dpkts        0-10               Normal     0
sbytes       46-900             Normal     0
dbytes       0-354              Normal     0
sttl         0-254              Normal     0
dttl         0-255              Normal     0
Table 4. UNSW-NB15 testing set statistics for Attacks
Parameters   Threshold value    Category         Label
proto        gre                Analysis         1
service      smtp, http         Exploits         1
state        FIN                Fuzzers          1
spkts        20                 DoS              1
dpkts        0-6                Worms            1
sbytes       168-564            Reconnaissance   1
dbytes       0-354              Shellcode        1
sttl         0                  Backdoor         1
dttl         0-19               Generic          1
The machine learning routines used to obtain the results are listed below by label.
1. deeplearning_all_label – the file contains all labels from the training set and
normalizes the training data. Figure 7 shows a simulation snapshot of reading the
training and testing CSV files.
Figure 7. Reading the training and testing CSV files
2. deeplearning_each_label – categorization and normalization are performed for each
training set before testing. Figure 8 shows the training and test results for nine
attacks and one normal class.
Figure 8. Results for each attack label
3. knn_parameters_tuning – calculates the accuracy score for the training and testing
sets. The accuracy calculated for KNN neighbours from 1 to 29 is shown in Figure 9.
Figure 9. Representing knn parameters
4. supervised_on_unsupervised and supervised_accuracy – help to obtain the scored
values. The steps of training, testing, creating the model, predicting and calculating
the accuracy score are shown in Figure 10.
Figure 10. Representing the process for evaluation
5. supervised_label_categorized_accuracy – categorizes the labels and calculates accuracy
scores for normal and attacks. Figure 11 shows the accuracy score for each attack.
Figure 11. Representing the categorized accuracy
6. supervised_on_unsupervised – the labelled data provides an accuracy score with
clusters. Figure 12 shows the accuracy score with clusters.
Figure 12. Representing accuracy score with clusters
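The neighbour sweep described under knn_parameters_tuning can be sketched as a simple loop over k from 1 to 29. This is an assumption about what such a routine might look like, not the authors' script: the data is synthetic, the helper make_split is hypothetical, and scikit-learn's KNeighborsClassifier is used as the KNN implementation.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)


def make_split(n):
    """Synthetic two-feature records; label 1 (attack) sits at higher values."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n, 2))
    return X, y


X_train, y_train = make_split(300)
X_test, y_test = make_split(100)

# Fit one classifier per neighbour count and record its test accuracy.
scores = {}
for k in range(1, 30):  # neighbours 1 to 29, as in the text
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, knn.predict(X_test))

best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, test accuracy = {scores[best_k]:.3f}")
```

Plotting scores against k would give a curve of the kind Figure 9 is described as showing.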
The results are evaluated with a confusion matrix, which differentiates normal from
attack and helps to visualize the performance of the algorithms.
Figure 13: Confusion matrix for Random Forest
In Figure 13, observations of class Normal are True Negatives (no attacks found).
Similarly, observations of class Attack are True Positives (only attacks are found).
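For reference, the four cells of a binary confusion matrix and the rates derived from them can be computed as in the sketch below. The label vectors are invented for illustration and are not results from the paper; the code assumes scikit-learn's confusion_matrix with 0 as normal and 1 as attack.

```python
from sklearn.metrics import confusion_matrix

# Invented labels: 0 = normal, 1 = attack.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]

# With labels=[0, 1], rows are actual classes and columns are predicted
# classes, so ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
false_positive_rate = fp / (fp + tn)  # normal records flagged as attack

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} false positive rate={false_positive_rate:.2f}")
```

Here a true negative is a normal record classified as normal, a true positive is an attack classified as attack, and the false positive rate is the share of normal records wrongly flagged as attacks.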
The accuracy of Anomaly Detection of IoT in detecting cyberattacks can be shown with a
confusion matrix based on the following parameters of the UNSW-NB15 dataset: dur (record
total duration), proto (transaction protocol), is_sm_ips_ports (takes the value 1 if the
source IP address equals the destination IP address and the source port number equals the
destination port number, else 0) and label (0 for normal, 1 for attack records), as shown
in Figure 14.
Figure 14: Representing UNSW-NB15 testing set statistics to obtain Accuracy confusion matrix
Figure 15 shows the confusion matrix for accuracy. Actual labels are in rows and
predicted labels are in columns. Row zero and column zero indicate the parameter label
(normal), which has the value 0. Row one and column one indicate the parameter proto,
with the value 1734 and an error of 1. Row two and column two indicate the parameter dur,
with the value 9833. Row three and column three indicate the parameter is_sm_ips_ports,
with the value 8526. Row four and column four indicate the category Normal, with the
value 489. The diagonal contains the correct predictions. The accuracy observed from
Figure 15 is 99.99514162172667%.
Figure 15: Accuracy Confusion matrix
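The reported accuracy can be checked from the values quoted in the text: accuracy is the sum of the diagonal divided by the sum of all entries. The sketch below rebuilds the matrix under two assumptions that the text does not confirm: the single error in row one is the only off-diagonal entry, and it is placed in column two purely for illustration (its column does not affect the result).

```python
# Diagonal values quoted in the text for rows/columns 0 to 4.
diagonal = [0, 1734, 9833, 8526, 489]
size = len(diagonal)

# Start from an all-zero matrix and fill in the correct predictions.
matrix = [[0] * size for _ in range(size)]
for i, correct in enumerate(diagonal):
    matrix[i][i] = correct

# One error is reported for row one; its column is not stated in the
# text, so column two is an arbitrary, assumed placement.
matrix[1][2] = 1

correct_total = sum(matrix[i][i] for i in range(size))
grand_total = sum(sum(row) for row in matrix)
accuracy_percent = 100.0 * correct_total / grand_total

print(f"{correct_total} / {grand_total} = {accuracy_percent:.6f}%")
```

Under these assumptions the ratio is 20582 correct out of 20583 records, about 99.9951%, which agrees with the accuracy stated for Figure 15.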
Figure 16 shows the binary classification, which classifies each record as attack or
normal. Binary classification sorts the data into two groups. In Python, a heatmap
produces a two-dimensional graphical representation of a data matrix in which the values
are represented as colors. In Figure 16, the x-axis represents the predicted label and
the y-axis represents the actual label.
Figure 16. Binary classification
8. Conclusion
The Anomaly Detection of IoT (AD-IoT) system, an approach to network-based intrusion
detection, can effectively detect various types of attacks at the fog layer instead of
the cloud layer. It is based on Fog architecture and was evaluated on the UNSW-NB15
dataset using different supervised machine learning algorithms. The analysis of the
UNSW-NB15 dataset demonstrates the model’s accuracy.
REFERENCES
[1] Ibrahim Alrashdi, Ali Alqazzaz, Esam Aloufi, Raed Alharthi, Mohamed Zohdy, Hua
Ming, “Anomaly Detection of IoT Cyberattacks in Smart City Using Machine Learning”,
978-1-7281-0554-3, 2019 IEEE.
[2] Rashmi H Roplekar and N V Buradkar, “Survey of Random Forest Based Network
Anomaly Detection Systems”, IJARCCE, Vol. 6, Issue 12, December 2017.
[3] Rifkie Primartha and Bayu Adhi Tama, “Anomaly Detection using Random Forest: A
Performance Revisited,” 978-1-5386-1449-5/17/$31.00 ©2017 IEEE.
[4] Jadel Alsamiri and Khalid Alsubhi, “Internet of Things Cyber Attacks Detection using
Machine Learning,” International Journal of Advanced Computer Science and
Applications, Vol. 10, No. 12, 2019.
[5] Phyu Thi Htun and Kyaw Thet Khaing, “Anomaly Intrusion Detection System using
Random Forests and k-Nearest Neighbor,” International Journal of P2P Network Trends
and Technology, Volume 3, Issue 1, January to February 2013.
[6] Rupesh Raj Karn, Prabhakar Kudva and Ibrahim M Elfadel, “Dynamic Autoselection
and Autotuning of Machine Learning Models for Cloud Network Analytics,” 1045-9219
(c) 2018 IEEE.
[7] Nour Moustafa and Jill Slay, “Unsw-nb15: a comprehensive data set for network
intrusion detection systems”, in Military Communications and Information Systems
Conference, 2015, IEEE 2015, pp. 1-6.
[8] N. Moustafa and J. Slay, “The evaluation of network anomaly detection systems:
Statistical analysis of the unsw-nb15 data set and the comparison with the kdd99 data
set,” Information Security Journal: A Global Perspective, vol. 25, no. 1-3, pp. 18-31,
2016.
[9] Fatima Hussain, Rasheed Hussain, Syed Ali Hassan and Ekram Hossain, “Machine
Learning in IoT Security: Current Solutions and Future Challenges,” arXiv:
1904.05735v1, 14 March 2019.