
Anomaly Detection of IoT Cyberattacks in Smart City Based on Machine Learning Algorithms

Sushmitha R1, Deepa N P2 and K L Sudha3

1,2,3 Department of Electronics and Communication Engineering,

Dayananda Sagar College of Engineering, Visvesvaraya Technological University,

Karnataka, India

Abstract: The deployment of heterogeneous IoT devices has led to smart cities built on a Fog architecture, which use advanced communication technologies and drive economic growth. As IoT devices send ever larger amounts of data across smart-city networks, IoT cyberattacks increase, creating cybersecurity challenges. In a Fog architecture, the IoT devices in the IoT device layer are connected to sensors that are linked to cloud servers in the cloud layer. To counter these cyberattacks in a smart city, an Anomaly Detection of IoT (AD-IoT) system based on machine learning algorithms is used to predict attacks accurately. The AD-IoT system can detect compromised IoT devices at the fog layer rather than at the cloud layer, and it can achieve high classification accuracy with a low false positive rate. In this work, the UNSW-NB15 dataset is used to evaluate the model and report its accuracy.

Keywords: Anomaly Detection, cybersecurity, fog architecture, smart city.

1. Introduction

Nowadays, many cyberattacks occur in smart cities because of the growth of network-based services over the internet, and network security is becoming an increasingly serious issue. A smart city uses new IoT technology to optimize its infrastructure. Its aims are a better quality of life, economic competitiveness and sustainability, which in turn encourage the development of upcoming technologies. Cyberattacks increase rapidly as more devices are interconnected, and they can gain unauthorized access to IoT devices without the user's knowledge. The security challenges in a smart city are therefore to detect attacks across the variety of protocols used by IoT devices and to detect cyberattacks in IoT networks before they harm the city [1]. Earlier work on detecting IoT attacks used the traditional Intrusion Detection System (IDS) approach, which detects only known attacks. Later, network-based anomaly detection systems used the various logs collected in the network to analyze its traffic, although some network attacks remain difficult to identify.

Anomaly detection therefore aims to admit only normal traffic into the network [2]. Across heterogeneous networks such as wired, wireless and LAN, an attacker can access information while network data is being collected and analyzed, for example by eavesdropping [3]. Machine learning techniques help address these security problems and can be used in many cybersecurity applications, including both signature-based and anomaly-based attack detection. A drawback of signature-based techniques is that they cannot detect unknown attacks [4]. In contrast, the Anomaly Detection of IoT (AD-IoT) system monitors the traffic in the fog layer of a Fog architecture and can detect hidden compromised devices. Fog-based anomaly detection is demonstrated on a modern dataset of attacks in a smart-city IoT network. By using machine learning algorithms to classify traffic as attack or normal, this system reduces the false positive rate.

Journal of Seybold Report ISSN NO: 1533-9211

VOLUME 15 ISSUE 9 2020 71

Smart city built on Fog architecture

A smart city is built on fog computing to reduce the latency between the cloud layer and the IoT sensor layer. It consists of three layers: the cloud layer, the fog layer and the IoT sensor layer. Figure 1 shows a smart city based on Fog architecture. The cloud layer has servers that store and manage big data. The fog layer ensures the processing and aggregation of data [1]; it bridges the gap between the sensing and cloud layers by bringing computation and management to the edge of the network. The IoT sensing layer contains a set of sensors that collect data.
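To make the role of the fog layer more concrete, the following minimal Python sketch (an illustration, not the authors' implementation) shows a fog node aggregating raw sensor readings before forwarding a compact summary to the cloud layer. The class names FogNode and CloudServer, the sensor identifiers, and the choice of a per-sensor mean as the aggregate are all hypothetical assumptions.

```python
from collections import defaultdict
from statistics import mean


class CloudServer:
    """Hypothetical cloud-layer server that stores aggregated summaries."""

    def __init__(self):
        self.storage = []

    def receive(self, summary):
        self.storage.append(summary)


class FogNode:
    """Hypothetical fog node that buffers sensor readings and aggregates them at the edge."""

    def __init__(self, cloud):
        self.cloud = cloud
        self.buffer = defaultdict(list)

    def ingest(self, sensor_id, value):
        # Raw readings from the IoT sensing layer stay at the fog node.
        self.buffer[sensor_id].append(value)

    def flush(self):
        # Forward one summary per sensor instead of every raw reading.
        summary = {sid: mean(vals) for sid, vals in self.buffer.items()}
        self.cloud.receive(summary)
        self.buffer.clear()
        return summary


cloud = CloudServer()
fog = FogNode(cloud)
for sensor_id, value in [("temp-1", 21.0), ("temp-1", 23.0), ("air-7", 40.0)]:
    fog.ingest(sensor_id, value)
summary = fog.flush()
print(summary)  # {'temp-1': 22.0, 'air-7': 40.0}
```

Three raw readings reach the fog node, but only one summary message travels on to the cloud layer, which is the kind of latency and bandwidth saving the fog layer is intended to provide.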

Figure 1. Smart city based on Fog architecture

Traditional Intrusion Detection System

Traditional Intrusion Detection Systems (IDSs) [1] were previously used to monitor network traffic. There are two main types of IDS:

1. Host-based Intrusion Detection System (HIDS)

2. Network-based Intrusion Detection System (NIDS)

A host-based IDS monitors and detects intrusion activity only on the host computer system. For example, when a computer is infected with a virus, files may go missing or be deleted; antivirus software installed on the computer can detect and monitor the infection. Because a HIDS must run on each host, it is not suitable for some resource-limited IoT devices.

A network-based IDS analyzes incoming network traffic and can distinguish malicious from non-malicious activity using hybrid techniques that combine signature-based and anomaly-based detection. This method therefore classifies traffic as attack or normal based on the traffic observed across the network.
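The hybrid idea can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not a production NIDS and not the method of this paper: the signature set, the per-feature ranges learned from normal traffic, and the function names are all hypothetical.

```python
# Hypothetical signatures: (proto, service, state) tuples assumed to be known-bad.
KNOWN_SIGNATURES = {("udp", "dns", "INT"), ("tcp", "ftp", "FIN")}


def learn_normal_ranges(normal_records, features):
    """Anomaly baseline: min/max of each numeric feature over normal traffic."""
    return {
        f: (min(r[f] for r in normal_records), max(r[f] for r in normal_records))
        for f in features
    }


def classify(record, ranges):
    """Return 'attack' if a signature matches or any feature leaves its normal range."""
    if (record["proto"], record["service"], record["state"]) in KNOWN_SIGNATURES:
        return "attack"  # signature-based: catches known attacks only
    for f, (lo, hi) in ranges.items():
        if not lo <= record[f] <= hi:
            return "attack"  # anomaly-based: can flag previously unseen behavior
    return "normal"


normal = [
    {"proto": "tcp", "service": "http", "state": "FIN", "spkts": 10, "sbytes": 500},
    {"proto": "tcp", "service": "http", "state": "FIN", "spkts": 14, "sbytes": 900},
]
ranges = learn_normal_ranges(normal, ["spkts", "sbytes"])

base = {"proto": "tcp", "service": "http", "state": "FIN", "spkts": 12, "sbytes": 700}
print(classify(base, ranges))                      # normal
print(classify({**base, "service": "ftp"}, ranges))  # attack (signature match)
print(classify({**base, "spkts": 400}, ranges))      # attack (out of normal range)
```

The signature check alone would miss the third record, while the range check alone would miss the second; combining them is what makes the approach hybrid.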

2. Related work

This section discusses previous and recent techniques for detecting intrusion activity [1]. Related work covers traditional IDSs in IoT networks, IoT cyber threats and network behavior. Traditional IDS methods detect cyberattacks in different ways, either as a Host-based Intrusion Detection System (HIDS) or as a Network-based Intrusion Detection System (NIDS). IoT networks face difficulties with traditional IDS methods because of their limited resources and energy consumption, so NIDS methods can be used to protect IoT security services [2]. Several techniques rely on hybrid methods [3]. However, the signature-based part of such methods only detects attacks that match the stored signature database, so it cannot detect unknown attacks in network traffic.

One approach to anomaly intrusion detection was designed to train on and detect both normal and attack traffic using machine learning classification techniques and pattern recognition [4]. Many supervised and unsupervised machine learning classification models have been applied to cloud security using the input dataset and its features [5], which helps in model selection and tuning. The UNSW-NB15 dataset [6], [7] is used to evaluate the model's accuracy and provides its features. It contains 49 features and 9 attack categories, which allow a model to be trained to differentiate attack and normal traffic.

Future challenges for IoT security, and security solutions based on machine learning and deep learning classification approaches, are discussed in [8]. The Anomaly Detection of IoT approach detects cyberattacks at the fog nodes of a Fog architecture and is evaluated on the modern UNSW-NB15 dataset, achieving high performance and efficiency. The model learns from network traffic by training machine learning algorithms to detect malicious behavior.

3. Block diagram

The block diagram for detecting cyberattacks is shown in Figure 2. The model first collects data from firewall logs, which record the network traffic, and accepts this data in the form of files. The dataset is the collection of this data, divided into a training set and a testing set. Features are extracted from the dataset using Python modules. Patterns are built from the labelled input dataset and the network traffic; each network service has its own pattern, so the data can be categorized by service. An anomaly is detected when network activity deviates from these patterns, and an alert is then displayed to the authorized user.
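The pattern-then-alert steps of the block diagram can be sketched as follows. This minimal Python example assumes a labelled pandas DataFrame with UNSW-NB15-style columns service, sbytes and label; it learns a mean and standard deviation of sbytes per service from normal records only, and raises an alert when a new record deviates by more than k standard deviations. The z-score rule, k = 3, the example rows and the helper names are illustrative assumptions, not the method used in this paper.

```python
import pandas as pd


def build_service_patterns(train: pd.DataFrame) -> pd.DataFrame:
    """Per-service mean/std of sbytes, learned from normal (label == 0) records only."""
    normal = train[train["label"] == 0]
    return normal.groupby("service")["sbytes"].agg(["mean", "std"])


def check_record(record: dict, patterns: pd.DataFrame, k: float = 3.0) -> bool:
    """Return True (and print an alert) if the record deviates from its service pattern."""
    if record["service"] not in patterns.index:
        print(f"ALERT: unseen service {record['service']!r}")
        return True
    mu, sigma = patterns.loc[record["service"], ["mean", "std"]]
    if abs(record["sbytes"] - mu) > k * sigma:
        print(f"ALERT: sbytes={record['sbytes']} unusual for service {record['service']!r}")
        return True
    return False


# Made-up example rows; the label-1 row is excluded when patterns are built.
train = pd.DataFrame({
    "service": ["http", "http", "http", "smtp", "smtp", "smtp", "http"],
    "sbytes":  [500, 520, 480, 1000, 1040, 960, 9000],
    "label":   [0, 0, 0, 0, 0, 0, 1],
})
patterns = build_service_patterns(train)

print(check_record({"service": "http", "sbytes": 510}, patterns))   # False
print(check_record({"service": "http", "sbytes": 5000}, patterns))  # prints an alert, then True
```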

Figure 2. Block diagram

UNSW-NB15 dataset: The UNSW-NB15 dataset was created with the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a mix of normal activity and attack behavior. It includes 49 features and 9 attack categories representing contemporary malicious behavior: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms [7].
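A minimal pandas sketch of how the ten categories (nine attacks plus Normal) collapse into the binary label used in this work (0 for Normal, 1 for any attack). The column names attack_cat and label are assumed to follow the public UNSW-NB15 CSV files, and the category spellings are taken from the list above; they should be checked against the actual files (which may, for example, spell a category in the singular). The inline rows are made-up examples, not an excerpt of the dataset.

```python
import pandas as pd

# The nine attack categories listed for UNSW-NB15 (spelling as given in the text).
ATTACK_CATEGORIES = [
    "Fuzzers", "Analysis", "Backdoors", "DoS", "Exploits",
    "Generic", "Reconnaissance", "Shellcode", "Worms",
]


def add_binary_label(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with label 0 for Normal records and 1 for every attack category."""
    unknown = set(df["attack_cat"]) - set(ATTACK_CATEGORIES) - {"Normal"}
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    out = df.copy()
    out["label"] = (out["attack_cat"] != "Normal").astype(int)
    return out


# Made-up example rows, not taken from the dataset.
records = pd.DataFrame({"attack_cat": ["Normal", "DoS", "Normal", "Worms", "Exploits"]})
labelled = add_binary_label(records)
print(labelled["label"].tolist())  # [0, 1, 0, 1, 1]
```

Rejecting unexpected category strings, rather than silently labelling them as attacks, surfaces spelling mismatches early.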



4. Working principle

The AD-IoT detection model is used to detect various attacks. Figure 3 shows the AD-IoT system model. The AD-IoT system is based on a Fog architecture with multiple components; the IoT layer contains an enormous number of IoT devices, which expose the environment to cyber threats. These IoT devices connect to gateways in the fog layer. Each facility has its own gateway, and these gateways connect to an AD-IoT security gateway in the fog layer that serves many gateways. The security gateway at a fog node inspects communication across the network and monitors the traffic that passes through that fog node.
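The gateway hierarchy described above, with facility traffic passing through an AD-IoT security gateway at a fog node that reports detections to the security cloud server, can be sketched as a few Python classes. All class and method names are hypothetical, and the detector is passed in as a plain function so that any trained model could be plugged in; in this sketch a trivial packet-count rule (spkts > 100, an arbitrary assumption) stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class SecurityCloudServer:
    """Hypothetical cloud-side collector of alerts for analysis and system updates."""
    alerts: List[dict] = field(default_factory=list)

    def report(self, fog_node: str, facility: str, record: Record) -> None:
        self.alerts.append({"fog_node": fog_node, "facility": facility, "record": record})


@dataclass
class SecurityGateway:
    """Hypothetical AD-IoT security gateway at a fog node, serving many facility gateways."""
    name: str
    detector: Callable[[Record], int]  # returns 1 for attack, 0 for normal
    cloud: SecurityCloudServer
    inspected: int = 0

    def inspect(self, facility: str, record: Record) -> int:
        self.inspected += 1
        verdict = self.detector(record)
        if verdict == 1:
            # Only detections leave the fog layer.
            self.cloud.report(self.name, facility, record)
        return verdict


cloud = SecurityCloudServer()
gateway = SecurityGateway("fog-node-1", lambda r: int(r["spkts"] > 100), cloud)
traffic = [("hospital", {"spkts": 12}), ("traffic-lights", {"spkts": 450}), ("hospital", {"spkts": 8})]
verdicts = [gateway.inspect(facility, record) for facility, record in traffic]
print(verdicts)           # [0, 1, 0]
print(len(cloud.alerts))  # 1
```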

The UNSW-NB15 dataset is the input to the machine learning model, and the features of the UNSW-NB15 training set are used to train it. Relevant features are selected, and the trained model processes them to produce an output. The features of the UNSW-NB15 testing set are then used to test the model, which performs binary classification of each record as normal or attack according to the UNSW-NB15 attack labels. Whenever the system at a fog node detects an attack, it should alert the security cloud server so that the server can analyze the event and update the system. A random forest algorithm, discussed in the next section, is used to classify records as ‘attack’ or ‘normal’.
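A minimal scikit-learn sketch of the train/test binary classification step. Because UNSW-NB15 is not bundled here, synthetic data from make_classification stands in for the selected numeric features, and a random split stands in for the separate training and testing files; the sample size, 100 trees and random_state values are illustrative assumptions, and the resulting accuracy says nothing about the results reported in this paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for UNSW-NB15 features (label 0 = normal, 1 = attack).
X, y = make_classification(n_samples=2500, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Train on the training split, then classify every test record as 0 or 1.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.3f}")
```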

Figure 3. AD-IoT detection model

5. Random Forest algorithm

The Random Forest algorithm is a supervised machine learning algorithm that combines a pool of tree-structured classifiers. Each tree is grown according to an independently sampled random vector. Its two main parameters are the number of variables considered for splitting at each node and the number of trees that make up the forest. The parameters (features) of the UNSW-NB15 training and testing sets are used as inputs, and each tree splits on threshold values of these parameters; records are classified as ‘normal’ or ‘attack’ according to these thresholds. Figures 4 and 5 show Random Forest classifying normal and attack records using a few parameters as an example. For instance, for the parameter spkts (source-to-destination packet count), values from 20 to 28 indicate the attack called Reconnaissance. The blue circles in the tree represent the attack, and the white circles represent values that do not fall in this range. The other parameters are treated in the same way.
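In scikit-learn, the two main Random Forest parameters described above correspond to max_features (how many variables are tried at each node) and n_estimators (how many trees are grown). The sketch below uses synthetic data in which, following the spkts example above, records with 20 to 28 source-to-destination packets are labelled as attacks; the data, the extra random column and the parameter values are illustrative assumptions. It also prints the root split of one tree, showing that each tree classifies by comparing a feature against a learned threshold.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 3000
spkts = rng.integers(1, 61, size=n)  # synthetic packet counts 1..60
noise = rng.random(n)                # an uninformative second feature
X = np.column_stack([spkts, noise])
y = ((spkts >= 20) & (spkts <= 28)).astype(int)  # 1 = attack in the assumed 20-28 band

X_train, y_train = X[:2500], y[:2500]
X_test, y_test = X[2500:], y[2500:]

forest = RandomForestClassifier(
    n_estimators=100,  # number of trees that build the forest
    max_features=1,    # number of variables tried at each node
    random_state=0,
)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")

# Feature index (0 = spkts, 1 = noise) and threshold of the first tree's root split.
root = forest.estimators_[0].tree_
print("root split feature:", root.feature[0], "threshold:", root.threshold[0])
```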

Figure 4: Random Forest representing class Normal

Figure 5: Random Forest representing class Attack



6. Flow diagram

This section describes the processing flow, whose input is the modern, standard UNSW-NB15 dataset with its training and testing sets. Feature extraction measures the data and generates feature values. Feature selection then chooses which parameters of the UNSW-NB15 training and testing sets to use. Once these parameters are selected, different machine learning approaches can be applied: deep learning, KNN parameter tuning, supervised accuracy evaluation, supervised-on-unsupervised evaluation and per-category supervised accuracy. These machine learning algorithms are then used to predict whether a record is normal or an attack.

Figure 6: Flow diagram

7. Analysis and Results

The Anomaly Detection of IoT model was tested on classifying records as attack or normal. The approach is implemented in Python using the os, pandas, NumPy, csv, time, Matplotlib, pylab and sklearn (scikit-learn) modules. This section discusses the experimental results for detecting attack and normal records in the training and testing sets of the UNSW-NB15 dataset. The parameters used from these sets are: proto (transaction protocol), service (e.g. HTTP, FTP), state (the state and its dependent protocol), sbytes (source-to-destination bytes), dbytes (destination-to-source bytes), spkts (source-to-destination packet count), dpkts (destination-to-source packet count), sttl (source-to-destination time to live), dttl (destination-to-source time to live), sloss (source packets retransmitted or dropped) and dloss (destination packets retransmitted or dropped). Records in the Normal category are assigned label 0, and attack records are assigned label 1. The threshold values, category and label for these parameters are listed in Tables 1 to 4.



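Table 1. UNSW-NB15 training set statistics for Normal

Parameters | Threshold value | Category | Label
proto | tcp, udp, arp | Normal | 0
service | smtp, snmp, http | Normal | 0
state | FIN, INT, CON | Normal | 0
spkts | 1-122 | Normal | 0
dpkts | 0-126 | Normal | 0
sbytes | 130-986 | Normal | 0
dbytes | 0-1096 | Normal | 0
sttl | 62-254 | Normal | 0
dttl | 0-252 | Normal | 0
sloss | 0-28 | Normal | 0
dloss | 0-32 | Normal | 0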

Table 2. UNSW-NB15 training set statistics for Attacks

Parameters | Threshold value | Category | Label
proto | ddp, mtp, unas | Backdoor | 1
service | ftp | Fuzzers | 1
state | INT | Backdoor | 1
spkts | 2 | Analysis | 1
dpkts | 0-8 | Reconnaissance | 1
sbytes | 240-610 | Shellcode | 1
dbytes | 0-2658 | Exploits | 1
sttl | 0-254 | DoS | 1
dttl | 0-252 | Generic | 1
sloss | 2 | Worms | 1
dloss | 0-36 | Exploits | 1

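Table 3. UNSW-NB15 testing set statistics for Normal

Parameters | Threshold value | Category | Label
proto | udp, arp, tcp | Normal | 0
service | (-), http | Normal | 0
state | INT, FIN, REQ | Normal | 0
spkts | 1-22 | Normal | 0
dpkts | 0-10 | Normal | 0
sbytes | 46-900 | Normal | 0
dbytes | 0-354 | Normal | 0
sttl | 0-254 | Normal | 0
dttl | 0-255 | Normal | 0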



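Table 4. UNSW-NB15 testing set statistics for Attacks

Parameters | Threshold value | Category | Label
proto | gre | Analysis | 1
service | smtp, http | Exploits | 1
state | FIN | Fuzzers | 1
spkts | 20 | DoS | 1
dpkts | 0-6 | Worms | 1
sbytes | 168-564 | Reconnaissance | 1
dbytes | 0-354 | Shellcode | 1
sttl | 0 | Backdoor | 1
dttl | 0-19 | Generic | 1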
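To illustrate how threshold values like those in Table 1 could be applied directly, the following Python sketch labels a record as an attack (1) when any numeric parameter falls outside the range listed for Normal training records. This simple range check is a hypothetical illustration of threshold-based classification, not the Random Forest model evaluated in this work; the categorical parameters (proto, service, state) are deliberately left out, and a practical rule set would need tuning because the Normal and attack ranges in Tables 1 to 4 overlap.

```python
# Normal ranges copied from Table 1 (UNSW-NB15 training set, Normal records).
NORMAL_RANGES = {
    "spkts": (1, 122),
    "dpkts": (0, 126),
    "sbytes": (130, 986),
    "dbytes": (0, 1096),
    "sttl": (62, 254),
    "dttl": (0, 252),
    "sloss": (0, 28),
    "dloss": (0, 32),
}


def threshold_label(record: dict) -> int:
    """Return 0 (normal) if every parameter lies within its Table 1 range, else 1 (attack)."""
    for name, (low, high) in NORMAL_RANGES.items():
        if not low <= record[name] <= high:
            return 1
    return 0


# Hypothetical record whose values all fall inside the Table 1 ranges.
typical = {"spkts": 10, "dpkts": 8, "sbytes": 500, "dbytes": 400,
           "sttl": 62, "dttl": 252, "sloss": 0, "dloss": 0}
print(threshold_label(typical))                 # 0
print(threshold_label({**typical, "sttl": 0}))  # 1: sttl below the Normal range
```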

The machine learning scripts used to obtain the results are described below, each under its file label.

1. deeplearning_all_label – uses all labels from the training set and normalizes the training data. Figure 7 shows a simulation snapshot of reading the training and testing CSV files.

Figure 7. Reading the training and testing CSV files

2. deeplearning_each_label – performs categorization and normalization for each label of the training and testing sets. Figure 8 shows the training and test results for the nine attack categories and the Normal category.

Figure 8. Results for each attack label

3. knn_parameters_tuning – calculates the accuracy score on the training and testing sets. Figure 9 shows the accuracy for KNN with 1 to 29 neighbors.

Figure 9. KNN parameter tuning



4. supervised_on_unsupervised and supervised_accuracy – obtain the scores. Figure 10 shows the steps: creating the model, training it, testing it, predicting and calculating the accuracy score.

Figure 10. Evaluation process

5. supervised_label_categorized_accuracy – categorizes the labels and calculates accuracy scores for normal and attack records. Figure 11 shows the accuracy score for each attack category.

Figure 11. Categorized accuracy

6. supervised_on_unsupervised – uses the labelled data to compute accuracy scores for clusters. Figure 12 shows the accuracy scores with clusters.

Figure 12. Accuracy scores with clusters
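The neighbor sweep performed by knn_parameters_tuning can be sketched with scikit-learn as below. The script itself is not reproduced in this paper, so this is an assumed equivalent: features are min-max normalized (in the spirit of the normalization step described for the deep learning scripts), a KNeighborsClassifier is fitted for each k from 1 to 29, and training and testing accuracy are recorded. Synthetic data from make_classification replaces UNSW-NB15.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=1500, n_features=10, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

scores = {}
for k in range(1, 30):  # k = 1 ... 29 neighbors
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))

best_k = max(scores, key=lambda k: scores[k][1])
print(f"best k on the test set: {best_k}, test accuracy {scores[best_k][1]:.3f}")
```

Picking k by test accuracy mirrors a per-k accuracy plot such as Figure 9; when k is chosen this way, a separate validation split would give a less optimistic estimate of final accuracy.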



The results are based on the confusion matrix, which separates normal and attack records and helps visualize the performance of the algorithms.

Figure 13: Confusion matrix for Random Forest

In Figure 13, Normal records predicted as Normal are True Negatives (no attack found), and Attack records predicted as Attack are True Positives (attacks correctly detected); an Attack record predicted as Normal would be a False Negative.
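The cell meanings can be checked with scikit-learn's confusion_matrix, which places actual labels in rows and predicted labels in columns. With label 0 for Normal and 1 for Attack, ravel() returns the four cells in the order TN, FP, FN, TP. The label vectors below are made-up examples, not the paper's predictions.

```python
from sklearn.metrics import confusion_matrix

# Made-up example labels: 0 = Normal, 1 = Attack.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
```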

The accuracy of Anomaly Detection of IoT in detecting cyberattacks can also be shown with a confusion matrix built from the following UNSW-NB15 parameters, shown in Figure 14: dur (total record duration), proto (transaction protocol), is_sm_ips_ports (1 if the source and destination IP addresses are equal and the source and destination port numbers are equal, otherwise 0) and label (0 for normal, 1 for attack records).
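The is_sm_ips_ports definition translates directly into a pandas expression. In this sketch the column names srcip, sport, dstip and dsport are assumed to follow the raw UNSW-NB15 CSV files, and the rows are made-up examples.

```python
import pandas as pd

# Made-up flow records.
flows = pd.DataFrame({
    "srcip":  ["10.0.0.5", "10.0.0.5", "192.168.1.2"],
    "sport":  [80, 1234, 53],
    "dstip":  ["10.0.0.5", "10.0.0.9", "192.168.1.2"],
    "dsport": [80, 80, 5353],
})

# 1 only when both the IP addresses and the port numbers match, otherwise 0.
same_ip = flows["srcip"] == flows["dstip"]
same_port = flows["sport"] == flows["dsport"]
flows["is_sm_ips_ports"] = (same_ip & same_port).astype(int)
print(flows["is_sm_ips_ports"].tolist())  # [1, 0, 0]
```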



Figure 14: UNSW-NB15 testing set statistics used to obtain the accuracy confusion matrix

Figure 15 shows the confusion matrix for accuracy, with actual labels in rows and predicted labels in columns. Row zero and column zero indicate the parameter label (normal), which has the value 0. Row one and column one indicate the parameter proto, with a diagonal value of 1734 and one error. Row two and column two indicate the parameter dur, with a value of 9833. Row three and column three indicate the parameter is_sm_ips_ports, with a value of 8526. Row four and column four indicate the category Normal, with a value of 489. The diagonal contains the correct predictions, and the accuracy observed from the figure is 99.99514162172667%, or about 99.995%.
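The reported figure can be reproduced from the values quoted above. Assuming the diagonal holds 0, 1734, 9833, 8526 and 489 and that the single error in row one is the only off-diagonal entry (an assumption, since the full matrix appears only in Figure 15), accuracy is the diagonal sum divided by the total count.

```python
diagonal = [0, 1734, 9833, 8526, 489]  # diagonal values quoted for Figure 15
errors = 1                             # the single error in row one (assumed to be the only one)

correct = sum(diagonal)
total = correct + errors
accuracy_percent = 100 * correct / total
print(correct, total)             # 20582 20583
print(f"{accuracy_percent:.6f}")  # 99.995142
```

That 20582 / 20583 matches the reported 99.99514162172667% supports reading that value as a percentage.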

Figure 15: Accuracy Confusion matrix



Figure 16 shows the binary classification of records as attack or normal. Binary classification divides the data into two groups. In Python, a heatmap is a two-dimensional graphical representation of a matrix in which the values are represented as colors. In Figure 16, the x-axis represents the predicted label and the y-axis represents the actual label.

Figure 16. Binary classification

8. Conclusion

The Anomaly Detection of IoT (AD-IoT) system, a network-based intrusion detection approach, can effectively detect various types of attacks at the fog layer, rather than at the cloud layer, of a Fog architecture, using different supervised machine learning algorithms evaluated on the UNSW-NB15 dataset. The analysis of the UNSW-NB15 dataset demonstrates the model's accuracy.

REFERENCES

[1] Ibrahim Alrashdi, Ali Alqazzaz, Esam Aloufi, Raed Alharthi, Mohamed Zohdy and Hua Ming, “Anomaly Detection of IoT Cyberattacks in Smart City Using Machine Learning,” 978-1-7281-0554-3, 2019 IEEE.

[2] Rashmi H Roplekar and N V Buradkar, “Survey of Random Forest Based Network Anomaly Detection Systems,” IJARCCE, Vol. 6, Issue 12, December 2017.

[3] Rifkie Primartha and Bayu Adhi Tama, “Anomaly Detection using Random Forest: A Performance Revisited,” 978-1-5386-1449-5/17/$31.00 ©2017 IEEE.

[4] Jadel Alsamiri and Khalid Alsubhi, “Internet of Things Cyber Attacks Detection using Machine Learning,” International Journal of Advanced Computer Science and Applications, Vol. 10, No. 12, 2019.

[5] Phyu Thi Htun and Kyaw Thet Khaing, “Anomaly Intrusion Detection System using Random Forests and k-Nearest Neighbor,” International Journal of P2P Network Trends and Technology, Vol. 3, Issue 1, January to February 2013.



[6] Rupesh Raj Karn, Prabhakar Kudva and Ibrahim M Elfadel, “Dynamic Autoselection and Autotuning of Machine Learning Models for Cloud Network Analytics,” 1045-9219 (c) 2018 IEEE.

[7] Nour Moustafa and Jill Slay, “UNSW-NB15: a comprehensive data set for network intrusion detection systems,” in Military Communications and Information Systems Conference, IEEE, 2015, pp. 1-6.

[8] N. Moustafa and J. Slay, “The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set,” Information Security Journal: A Global Perspective, Vol. 25, No. 1-3, pp. 18-31, 2016.

[9] Fatima Hussain, Rasheed Hussain, Syed Ali Hassan and Ekram Hossain, “Machine Learning in IoT Security: Current Solutions and Future Challenges,” arXiv:1904.05735v1, 14 March 2019.


