
University of Manchester School of Computer Science

B.Sc.(Hons) Software Engineering

Auditing Service for Cloud Computing: Hive & AuBee

Third-Year Project Report

April, 2016

Author: Chihua Tan

Supervisor: Dr. Ning Zhang

Second Marker: Dr. Aphrodite Galata


ABSTRACT

Title: Auditing Service for Cloud Computing

Author: Chihua Tan

Supervisor: Dr. Ning Zhang

Second Marker: Dr. Aphrodite Galata

Date: April 2016

People love sharing their experiences and thoughts on their areas of interest, and for centuries books have been the typical medium for doing so. Compared with books, however, running a blog website on a virtual private server (VPS) offers lower cost, faster publication and a wider audience, and it has therefore become one of the most popular ways of sharing information. Operating systems and applications, including a VPS and its web server, record the history of their activity in log files. This project seeks to monitor these log files, detect potential threats in them and visualise the data on a web interface.

This report starts with the project’s background and then gives a comprehensive account of the development and implementation of the project. It also presents the results of the project, followed by possible future functions and a final conclusion.


ACKNOWLEDGEMENTS

I am thankful to the following people for their contribution to the development of my project:

Dr. Ning Zhang, my project supervisor, who provided valuable advice and patience throughout my project, allowing me to find breakthroughs and to overcome the obstacles I encountered along the way.

Dr. Aphrodite Galata, my second marker, who contributed to a better overall development with the constructive and pivotal feedback she provided after the demonstration and presentation. Each piece of feedback pointed out the actions essential for improvement.

My parents, whose endless and touching verbal support helped me to cope with the pressure during the development of this project.

I would also like to show my gratitude to my friends; without their technical suggestions I would have wasted much effort on unnecessary work.

Lastly, I will always appreciate the School of Computer Science at the University of Manchester for supplying the best facilities and environment to all Computer Science students.


TABLE OF CONTENTS

1 INTRODUCTION
1.1 Background
1.2 “Hive & AuBee”
1.3 Project Aim
1.4 Overview

2 DEVELOPMENT
2.1 Requirements
2.1.1 Functional Requirements
2.1.2 Non-Functional Requirements
2.2 Data Flow Diagram
2.2.1 AuBee - audited side
2.2.1.1 DDoS
2.2.1.2 Cc Attack
2.2.1.3 Brute-force Authentication Attack
2.2.2 Hive - auditing side
2.3 Web Design
2.3.1 Grid Diagram
2.3.2 Wireframe Diagram

3 IMPLEMENTATION
3.1 Collection of log data
3.1.1 Web Server - Jpcap
3.1.2 VPS - grep
3.2 Detection from log data
3.2.1 Information Entropy - Cc Attack
3.2.2 Entropy Formula
3.2.3 Entropy Threshold
3.2.4 hosts.deny - Brute-Force Attack
3.3 Transmission of Information
3.3.1 XML
3.3.2 TLS
3.3.3 How does TLS work?
3.4 Web Visualisation
3.4.1 Database - MySQL
3.4.2 Web development by Java

4 Results
4.1 Brute-Force Authentication Attempts
4.1.1 Synthetic Attempts
4.1.2 Intrusion Application
4.2 Cc attacks
4.2.1 Results from external sources
4.2.2 Results with using intrusion application

5 Conclusion
5.1 Future improvements in this project
5.2 Personal achievement

References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E


LIST OF FIGURES

Figure 2.1 The Data Flow Diagram about “Hive & AuBee” System
Figure 2.2 The monthly Gbps Diagram from 2009 to 2013 [8]
Figure 2.3 The Grid Diagram used in Web Design Development
Figure 3.1 Some lines of code with using Jpcap
Figure 3.3 Some lines of code with using “Grep” command
Figure 3.4 Some result of running the following command “grep Apr 20 23:.*Failed.password.for.invalid.user /var/log/auth.log”
Figure 3.5 Some examples about the number of required HTTP request
Figure 3.6 Diagram for showing how entropy threshold is calculated
Figure 3.8 Structure file of brute-force detection
Figure 3.9 Some output in one of XML files
Figure 3.10 The commands for generating the keystore and truststore
Figure 3.12 The table structure in the Hive’s database
Figure 3.13 The database procedure on calculating the summation
Figure 3.14 The diagram displaying Spring MVC in the Hive’s web
Figure 3.15 Some code from mapper.xml
Figure 4.1 Synthetic Request to VPS in Two minutes
Figure 4.2 The network traffic in VPS without detection
Figure 4.3 The network traffic in VPS with detection
Figure 4.4 The entropy graph of HTTP request in SDSC web server
Figure 4.5 The entropy graph of HTTP request in SDSC web server
Figure 4.6 Some HTTP request found in CSSA’s access log file
Figure 4.7 The interface of LOIC software
Figure 4.8 The output when there is a Cc attack


LIST OF TABLES

Table 2.1 Table showing functional requirements
Table 3.1 Table that explains each table in the Hive’s database
Table 4.1 Table that explains the operating specification
Table 5.1 Table contains the expectations in the future

LIST OF EQUATIONS

Equation 3.1 Formula for calculating the probability of resources
Equation 3.2 Formula for calculating the entropy
Equation 3.3 The formula used for detection of Cc attack


1 INTRODUCTION

1.1 Background

Since online blogging first appeared in the early 1990s, it has changed our lifestyle and the way we write and publish our thoughts for others to read. WordPress [1], the most popular and prominent content management platform, had registered roughly 76.5 million blogs by March 2014, and about 56.6 million new posts are published on WordPress blogs each month. This demonstrates the huge demand for blog websites. The demand for virtual private servers (VPS) has also boomed over the past three years. DigitalOcean [2], the second-largest VPS provider, hosted approximately 163,000 web-facing computers in May 2015. The number of computers hosted by DigitalOcean keeps climbing steadily each month, with no sign of decline.

Blogging is about teaching or sharing what the writer has experienced, known and learnt. Writing those thoughts into an article has many advantages, from boosting confidence to brainstorming ideas, which is why so many people are drawn to blogging. Furthermore, hosting a blog website on the Internet is affordable for almost everyone, since the minimum monthly price, $14.99 [3], is lower than the daily expenses of a person living in a developed city.

With the intense passion for writing about areas of interest and the attractive prices offered by hosting companies, building a blog website has become commonplace. Security concerns have consequently risen along with the growing number of users. Log files can be found on the virtual private server, the web server and any other application; they record the history of events that have happened on the server, and this data helps in analysing and detecting potential security risks. When security problems do appear, however, reviewing the logs is inconvenient for the users of a VPS or web server, because a different command is required to inspect each log in the command-line interface, and several terminal windows may be needed on a single screen.

1.2 “Hive & AuBee”

“Hive & AuBee” is the name chosen for the project. “AuBee” is a short combination of the words “audit” and “bee”: collecting raw data from a web server or VPS resembles the way bees gather. The hive, in turn, is the home to which the collected data is brought for visualisation and notification. The name leaves a positive impression on the application’s users and is succinct and easy to remember.


1.3 Project Aim

The project aims to develop an application that monitors the VPS’s system authorisation information and the web server’s activity, and that provides an administrator panel for reviewing the log data. Another vital function of the application is to identify malicious activity in the log data and to block visitors with malicious intent from the VPS or web server. “Hive & AuBee” targets people who run a blog on a VPS.

1.4 Overview

Chapter 1 outlines the general issues and motivation behind the selected project, along with the design of the project name and the project aims.

Chapter 2 describes the development of the project, explaining the requirements and the structure of the system with detailed tables and diagrams.

Chapter 3 explains the implementation of the project, including the techniques that have been used and the reasons for choosing them.

Chapter 4 demonstrates parts of the results visually and shows the actions that are taken after the various attacks are detected.

Chapter 5 summarises the project along with the potential improvements that could be made in the future.


2 DEVELOPMENT

Before proceeding to the implementation, this chapter outlines the development of the application. First, the functional and non-functional requirements are described, as they clearly set out the main purpose of the application. A data flow diagram follows, depicting how the data is generated, processed and shown on the screen, as well as the structure of the system. The last part of the chapter presents the web interface diagrams used in web development.

2.1 Requirements

Requirements gathering is the first and most critical step of the development process, as an elaborate set of requirements captures the end-user’s needs [4], that is, what the system is supposed to do, which helps programmers to save time.

2.1.1 Functional Requirements

The functional requirements of the application are a thorough description of the facilities required. Compared with a bare headline list of functional requirements, the MoSCoW technique makes it clearer and more straightforward for readers such as project managers, developers or stakeholders to understand the requirements fully. MoSCoW ranks the requirements into four categories [5], so readers can see which requirements are most important, in what order to develop them, and what not to deliver if circumstances are limited.

Table 2.1 Table showing functional requirements

Must
1. The system must capture the system administrator access log from the VPS
2. The system must capture the access log from the web server
3. The system must detect hostile visitors who attempt to connect to the VPS with brute-force logins
4. The system must detect one type of DDoS attack on the web server
5. The system must automatically refuse hostile visitors access to resources

Should
1. The system should create notifications with details of malicious events
2. The system should inform the application’s administrators about malicious events
3. The system should use an encrypted connection for passing data
4. The system should display the log data and other information on the web panel

Could
1. The system could collect data in a distributed manner from multiple VPSs and web servers
2. The system could be accessible from anywhere on the web
3. The system could allow a specified IP address to be marked as a hostile visitor on the web panel
4. The system could allow an IP address to be removed from the hostile list

Would Not
1. The system would not detect all DDoS attacks
2. The system would not have a mobile application


2.1.2 Non-Functional Requirements

For “Hive & AuBee”, the non-functional requirements fall into the following five categories: usability, reliability, maintainability, performance and supportability.

Usability: The web panel should have an aesthetic appearance that feels comfortable while users interact with it. Words in the navigation bar and icons should be displayed in a suitable size and colour so as to reduce the user’s eyestrain and attract the user’s attention [6].

Reliability: The system should have a reliable and secure database for storing sensitive information. For example, no connections to the database should be allowed except those from the “Hive” server.

Maintainability: The system should have coupling as low and cohesion as high as possible, which helps to reduce the possible causes of faults and therefore also improves the efficiency of debugging.

Performance: The system should serve the web panel with minimal response time, since response time directly affects the user experience.

Supportability: The web panel should be supported in all modern browsers, and a page layout suited to mobile screens should be available.

2.2 Data Flow Diagram

After deciding on the requirements of the system, drafting a data flow diagram is the next essential step for keeping myself on track and solving design problems systematically and logically. It provides a visual representation of the flow of information within the system [7], with a direct description of what happens to the data. The data flow diagram has three major components: entities (persons, users, servers), processes (the actions applied to the data) and data stores. The lines linking these components represent the data or information.


Figure 2.1 The Data Flow Diagram about “Hive & AuBee” System

The data flow diagram is divided into two main regions: “Hive” and “AuBee”.


2.2.1 AuBee - audited side

The section named “AuBee” concentrates on collecting the history logs of the VPS’s authentication and the web server’s activity, and on generating notifications for malicious events. The history log of the web server’s activity includes the HTTP requests made by clients and the HTTP responses created by the web server (refer to Appendix A for a visual demonstration). As for the VPS’s authentication information, all login attempts to the server are recorded by the operating system (see Appendix B). Log formats vary depending on the application’s programmer or the operating system. Standardising the log data from the VPS or web servers into one desired format is therefore not only necessary for the later analysis phase but also convenient for data retrieval. Regular expressions are one of the best options for this task because they offer platform portability and fast matching of patterns in a sequence of words.
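As an illustration of this standardisation step, the following minimal sketch parses one sshd line from auth.log into a normalised record. The class name AuthLogParser, the Attempt fields and the assumed OpenSSH message layout are illustrative assumptions, not taken from the project’s code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns one sshd line from auth.log into a normalised record. */
final class AuthLogParser {

    /** Normalised record; the field names are illustrative only. */
    static final class Attempt {
        final String timestamp;    // e.g. "Apr 20 23:01:17"
        final boolean success;     // true for "Accepted", false for "Failed"
        final boolean invalidUser; // true when sshd reported "invalid user"
        final String user;
        final String ip;

        Attempt(String timestamp, boolean success, boolean invalidUser, String user, String ip) {
            this.timestamp = timestamp;
            this.success = success;
            this.invalidUser = invalidUser;
            this.user = user;
            this.ip = ip;
        }
    }

    // Assumed OpenSSH message layout; adjust the pattern if the log format differs.
    private static final Pattern SSHD_LINE = Pattern.compile(
            "^(\\w{3}\\s+\\d{1,2} \\d{2}:\\d{2}:\\d{2}) \\S+ sshd\\[\\d+\\]: "
            + "(Failed|Accepted) password for (invalid user )?(\\S+) from (\\S+) port \\d+");

    /** Returns the parsed attempt, or an empty Optional for lines that do not match. */
    static Optional<Attempt> parse(String line) {
        Matcher m = SSHD_LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Attempt(
                m.group(1), m.group(2).equals("Accepted"), m.group(3) != null, m.group(4), m.group(5)));
    }

    public static void main(String[] args) {
        String line = "Apr 20 23:01:17 vps sshd[2211]: Failed password for invalid user admin "
                + "from 203.0.113.7 port 51234 ssh2";
        parse(line).ifPresent(a ->
                System.out.println(a.timestamp + " " + a.ip + " " + a.user + " success=" + a.success));
    }
}
```

Lines that do not match are returned as an empty Optional, so other log formats can be handled by adding further patterns without changing the callers.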

As for alerting the system’s users, warning information is produced by the analysis phase and sent to the user by e-mail from Java. It can also be visualised for the user, since it is stored in the Hive’s database.

Since sensitive information might be involved in the process, data protection should be applied during transmission. An example is the TLS protocol, which provides elaborate protection with the aim of guaranteeing the data’s privacy and integrity.

During the data analysis phase in the AuBee, alert notifications are generated if any suspicious activity is found. The detection of Challenge Collapsar attacks (Cc attacks) on the web server and of brute-force authentication attempts on the VPS is developed in this phase. The following subsections discuss the background of these suspicious activities from a general point of view.

2.2.1.1 DDoS

DDoS is the abbreviation for Distributed Denial of Service. It refers to an attack in which several compromised systems target a single system, such as a web server, causing a denial of service for the users of the targeted system. The danger posed by DDoS is enormous, as demonstrated by the following graph depicting the average monthly DDoS attack size from 2009 to 2013 in Mbps and Gbps [8].


Figure 2.2 The monthly Gbps Diagram from 2009 to 2013 [8]

Apart from a significant slump from September 2010 until January 2012, the trend has risen continuously, especially with an unprecedented ramp in 2013. In terms of the Open Systems Interconnection (OSI) layers, there are two main types of DDoS attack: network and transport layer attacks (layers 3 and 4) and application layer attacks (layer 7). Layer 3 and 4 attacks usually use TCP or UDP traffic to overwhelm or consume the resources of the targeted system until it goes offline. Application layer attacks are harder to detect because the attackers act like legitimate users while overloading elements of the website.

2.2.1.2 Cc Attack

The Challenge Collapsar attack is a typical application-layer attack. The attacker controls multiple proxy servers and uses them to flood the victim system with HTTP requests until it is saturated. Because the requests come from various IP addresses and imitate the behaviour of regular users, the anomaly is difficult to spot. Nevertheless, one possible way of detecting a Cc attack is to capture the HTTP requests and evaluate their information entropy.


2.2.1.3 Brute-force Authentication Attack

A brute-force attack, also referred to as a password attack, forces entry by exhaustively attempting a series of passwords, including frequently used letters or numbers, instead of decrypting the information or cipher. Besides working from a list of passwords, the attack can also enumerate every key combination, for example for a four-digit password on a smartphone. The time taken therefore depends on the length of the list, the strength of the encryption, the information known to the attacker and the computing power of the attacker’s device. Brute-force attacks have been used against VPSs for more than 15 years [9], yet it cannot be denied that they still occur on a massive scale today.

Placing IP addresses in the “hosts.deny” file refuses any connection from those IP addresses. Further description is given in the implementation chapter.

2.2.2 Hive - auditing side

The other side of the diagram illustrates the visualisation of data on web panels. After the data is received remotely from the AuBee over an encrypted (SSL/TLS) channel, the data from the VPS’s administration and the web server’s activity is stored directly in a database. The web panel then only needs to communicate with the Hive’s database to visualise the log data.

2.3 Web Design

Before starting the implementation of the web panels, designing the appearance of the web pages in two dimensions helped me to ensure efficient and effective progress. It is an essential phase of prevalent web design processes, and becoming familiar with it is of great support for future development.

2.3.1 Grid Diagram

A grid diagram can be defined as “a structure comprising a series of horizontal and vertical lines” [10], and its core function is to provide a reasonable arrangement of the content. It can also enhance the user experience, because the solid base supplied by the grid makes the variety of elements more comfortable to read and understand.


Figure 2.3 The Grid Diagram used in Web Design Development

2.3.2 Wireframe Diagram

A wireframe diagram is a schematic definition of the information hierarchy of the web pages, like an architectural blueprint [11]. Specific items or features, such as the navigation bar or search boxes, can be represented directly on one coherent diagram, which acts as an excellent guide for the implementation process.

Figure 2.4 The Wireframe Diagram used in Web Design Development


The above diagram is divided into three main sections: “Header”, “Content” and “Footer”. A vertical navigation bar is used instead of the more conventional horizontal one, although each design has its own advantages and disadvantages. Horizontal navigation benefits usability because new menu items can easily be added to drop-downs without affecting the other items, which is not the case in vertical navigation. Vertical navigation, on the other hand, is more flexible in terms of menu item space: once a horizontal bar is full, additional items cannot be added without changing the size of each item container. To keep open the possibility of adding extra items in the future, I chose vertical navigation. For presenting data, tables are adopted for their ability to display detailed information, and a line graph is chosen to show activity over time. Both designs aim to improve the user experience and to present the information readably.


3 IMPLEMENTATION

This chapter introduces the technology and tools used in the project and the reasons for choosing them. Some of the challenges met during the implementation stage are also described.

3.1 Collection of log data

This section shows how “Hive & AuBee” gathers the log data from web servers and the VPS.

3.1.1 Web Server - Jpcap

Web servers record their activity in log files, so an application can read through those files to obtain data. However, the log file is written at the highest layer of the OSI model, the application layer, whereas the ideal is to capture fresh data at the data link layer. Capturing at this layer allows quicker detection and protection, and it also reduces the coupling between the web server and the auditing application.

Jpcap offers several high-level interface classes that encapsulate the low-level work of the libpcap library, whose core task is to sniff data packets at the data link layer. Using Jpcap therefore allows programmers to concentrate on developing their applications. Jpcap first requires the specific network interface on the monitored machine and the port or protocol to be audited. The JpcapCaptor.openDevice() method is then used to open the network interface and return a captor instance. JpcapCaptor.processPacket() is the method that captures packets and passes each one to a callback method. Finally, Invoker is a class that implements the PacketReceiver interface and serves as the callback.

Figure 3.1 Some lines of code with using Jpcap


3.1.2 VPS - grep

On a Linux system, the VPS has a dedicated file called auth.log in the /var/log directory, which records all authentication attempts to the VPS. For searching this massive file, “grep” is one of the best-performing command-line utilities [12], and its use of regular expressions provides flexibility and high-speed text searching. The “grep” command generally needs two parameters: a string pattern (regular expression) and the file to search. Any log information can therefore be found simply by changing the regular expression.

Figure 3.3 Some lines of code with using “Grep” command

Furthermore, Java offers the Runtime.exec() methods for creating a native process, such as the command above, and returning a Process instance from which the output can be read. The retrieved log lines can then be sent to the Hive via the encrypted connection.
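A minimal sketch of this step is given below, assuming a “grep” binary is available on the PATH. The class name GrepRunner and the method name are hypothetical; only Runtime.exec() and the Process class come from the text above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around the grep command described above. */
final class GrepRunner {

    /** Runs "grep -E pattern file" and returns the matching lines. */
    static List<String> grep(String pattern, String file) throws IOException, InterruptedException {
        // The arguments are passed as an array, so the pattern needs no shell quoting.
        Process process = Runtime.getRuntime().exec(new String[] {"grep", "-E", pattern, file});
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                matches.add(line);
            }
        }
        // grep exits with 0 when lines matched, 1 when nothing matched and 2 or more on error.
        int exit = process.waitFor();
        if (exit > 1) {
            throw new IOException("grep exited with status " + exit);
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        // Reading /var/log/auth.log usually requires elevated privileges.
        String file = args.length > 0 ? args[0] : "/var/log/auth.log";
        for (String line : grep("Failed.password.for.invalid.user", file)) {
            System.out.println(line);
        }
    }
}
```

Passing the pattern as a separate array element also avoids the quoting problems that arise when a pattern containing spaces is typed directly into a shell.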

Figure 3.4 Part of the result of running the following command: grep "Apr 20 23:.*Failed.password.for.invalid.user" /var/log/auth.log

3.2 Detection from log data

This section describes the techniques used to analyse the log data from the web server and VPS in order to detect attacks. It also explains which actions are taken when an anomaly is found.

3.2.1 Information Entropy - Cc Attack

In everyday life, information is a measure of the decrease of uncertainty for a receiver [13]. For example, suppose I have a first appointment with my supervisor at her office but have forgotten the room number and the floor. There are three floors with 20 rooms on each floor. When I give my supervisor’s name to a student, he tells me the correct floor, which reduces the number of possible floors from 3 to 1 and the number of candidate rooms from 60 to 20. The information given has reduced the uncertainty. Entropy represents the amount of information the experimenter lacks prior to learning the outcome of a probabilistic process [14]. The value of entropy drops as more information is obtained and rises in the opposite situation.
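The office example can be quantified under the added assumption, not stated above, that every room is equally likely beforehand. The uncertainty is then the base-2 logarithm of the number of candidate rooms:

```latex
\begin{align*}
H_{\text{before}} &= \log_2 (3 \times 20) = \log_2 60 \approx 5.91 \text{ bits} \\
H_{\text{after}}  &= \log_2 20 \approx 4.32 \text{ bits} \\
H_{\text{before}} - H_{\text{after}} &= \log_2 3 \approx 1.58 \text{ bits}
\end{align*}
```

Learning the floor therefore supplies about 1.58 bits of information, which is exactly the uncertainty of choosing one floor out of three.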


In computing, data or numbers can be unpredictable, and entropy is a measure of the randomness of numbers [15]. Browsing web pages is in effect a random activity, so the entropy of the requested pages stays relatively steady under normal circumstances. A sudden surge in visits to one web page, however, reduces the randomness and results in much lower entropy. HTTP requests from clients can therefore be used to calculate the entropy, and a Cc attack is indicated when the entropy falls outside the entropy threshold.

3.2.2 Entropy Formula

Before setting up the entropy threshold as a baseline, this section introduces the input data and the entropy formula. In the Cc attack described in section 2.2.1.2, the attacker and the systems under the attacker’s control request one identical resource until the server collapses. The input for calculating the entropy is therefore obtained by extracting the requested resource from each HTTP request and counting how often each resource is requested.

Figure 3.5 some examples about the number of required HTTP request

Equation 3.1 formula for calculating the probability of resources

First, the probability, P, of each requested resource is computed as the number of HTTP requests for that resource divided by the total number of HTTP requests.

Equation 3.2 formula for calculating the entropy

The entropy is obtained as the negative of the sum, over all resources, of each probability multiplied by the base-2 logarithm of that probability. Base 2 is used by convention, especially in computing.
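Written out, the two descriptions above correspond to the standard Shannon entropy. The notation is an assumption made here, since the original equation images are not reproduced: n_i is the number of requests for resource i, N is the total number of requests in the block and k is the number of distinct resources.

```latex
\begin{align*}
p_i &= \frac{n_i}{N}, \qquad N = \sum_{i=1}^{k} n_i \\
H(X) &= -\sum_{i=1}^{k} p_i \log_2 p_i
\end{align*}

% Worked example with N = 100 requests over k = 4 resources:
\begin{align*}
\text{even load } (25, 25, 25, 25): \quad H(X) &= -4 \times 0.25 \log_2 0.25 = 2 \text{ bits} \\
\text{one hot resource } (97, 1, 1, 1): \quad H(X) &= -0.97 \log_2 0.97 - 3 \times 0.01 \log_2 0.01 \approx 0.24 \text{ bits}
\end{align*}
```

The second case illustrates why a flood of requests for a single resource shows up as a sharp drop in entropy.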


3.2.3 Entropy Threshold

Figure 3.6 diagram for showing how entropy threshold is calculated

In the training phase, the sampled log data of total size W is divided into blocks of equal size S. The entropy of each block is calculated, and the entropies are summed and averaged to give the average entropy, Haver(X). The maximum amplitude among these block entropies, Amp, becomes the entropy threshold and the trigger point. The operator of the detector is free to customise S, but a larger block size S can also more easily disturb the randomness captured by the entropy.

Equation 3.3 the formula used for detection of Cc attack

During the test phase, incoming HTTP requests are gathered consecutively and accumulated until they reach the same block size S, and a new entropy, H(X), is calculated. A Cc attack is deemed to exist if the result falls outside the range given by the entropy threshold, Amp. Finally, a warning message about the Cc attack is raised and sent to the Hive via the encrypted connection along with the log data.
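The training and test phases can be sketched as follows. This is one plausible reading rather than the project’s actual code: it assumes that Amp is the largest absolute deviation of a training block’s entropy from Haver(X), and that an attack is flagged when |H(X) - Haver(X)| exceeds Amp. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical entropy-based detector for blocks of requested resources. */
final class EntropyDetector {
    private final int blockSize;    // S
    private final double average;   // Haver(X)
    private final double amplitude; // Amp (assumed: largest deviation from the average)

    /** Shannon entropy (base 2) of the resources requested in one block. */
    static double entropy(List<String> resources) {
        Map<String, Integer> counts = new HashMap<>();
        for (String resource : resources) {
            counts.merge(resource, 1, Integer::sum);
        }
        double h = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / resources.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Training phase: split the sample (size W) into full blocks of size S. */
    EntropyDetector(List<String> trainingSample, int blockSize) {
        if (blockSize <= 0 || trainingSample.size() < blockSize) {
            throw new IllegalArgumentException("need at least one full block");
        }
        List<Double> entropies = new ArrayList<>();
        for (int start = 0; start + blockSize <= trainingSample.size(); start += blockSize) {
            entropies.add(entropy(trainingSample.subList(start, start + blockSize)));
        }
        double sum = 0.0;
        for (double h : entropies) {
            sum += h;
        }
        double avg = sum / entropies.size();
        double amp = 0.0;
        for (double h : entropies) {
            amp = Math.max(amp, Math.abs(h - avg));
        }
        this.blockSize = blockSize;
        this.average = avg;
        this.amplitude = amp;
    }

    /** Test phase: true when the block's entropy lies outside average +/- Amp. */
    boolean isAttack(List<String> block) {
        if (block.size() != blockSize) {
            throw new IllegalArgumentException("block must contain exactly " + blockSize + " requests");
        }
        return Math.abs(entropy(block) - average) > amplitude;
    }
}
```

A detector trained on normal traffic with a small S can then be asked, block by block, whether the most recent S requests look anomalous; a block consisting of a single repeated resource has entropy 0 and is flagged whenever the training traffic was more varied.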

3.2.4 hosts.deny - Brute-Force Attack

The “grep” command is a practical way of finding lines that match a string pattern. The log-based detection of brute-force attacks runs this command every minute and counts the number of suspicious attempts, such as invalid usernames and wrong passwords. The authentication threshold, such as the number of consecutive failed passwords or invalid usernames, can be configured by the operator of the detector at any time.


Figure 3.7 The code demonstrating the threshold of the Brute-Force detection

The following diagram outlines the general relationship between several files. The files in the Data/ directory are record files containing the past failed attempts of each IP address as well as the successful attempts. The benefit of this structure is that consecutive attempts can be counted across successive runs of the “grep” command. The dotted line denotes the dependency of the invalidUser and invalidPassword files on the validIp file: one benign attempt during the malicious counting erases that IP address’s record of failures in those two files.
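The counting rule described above can be sketched in memory as follows. This is a simplification under stated assumptions: the project keeps its records in files under Data/, whereas this hypothetical FailureTracker class keeps a single combined counter per IP address in a map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory version of the consecutive-failure counting. */
final class FailureTracker {
    private final int threshold;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    FailureTracker(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /** Records one attempt; returns true once the IP has reached the threshold. */
    boolean record(String ip, boolean success) {
        if (success) {
            // One benign attempt erases the failure history of this IP address.
            consecutiveFailures.remove(ip);
            return false;
        }
        int failures = consecutiveFailures.merge(ip, 1, Integer::sum);
        return failures >= threshold;
    }
}
```

With a threshold of 3, two failures followed by a successful login and two further failures do not trigger a block, whereas a third consecutive failure does.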

Figure 3.8 structure file of brute-force detection

If a brute-force attack is suspected, the IP address of the client and a notification are sent to the Hive via the encrypted connection along with the authentication history. Moreover, the IP address is placed in the /etc/hosts.deny file, which is checked by TCP Wrapper. TCP Wrapper controls access to network services, such as SSH, telnet or FTP, by hostname and IP address. In other words, it acts like a guard in front of TCP-based network services, and connections are denied if their IP addresses are listed in /etc/hosts.deny.
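A minimal sketch of the blocking step is shown below, assuming the common TCP Wrapper rule form “daemon_list: client_list” with the wildcard ALL as the daemon list. The class HostsDeny is hypothetical, and the file path is a parameter so that the sketch can be tried on a scratch file instead of the real /etc/hosts.deny, which requires root privileges to modify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that adds a deny rule for one IP address. */
final class HostsDeny {

    /** Appends "ALL: ip" unless the rule is already present; returns true if it was added. */
    static boolean block(Path hostsDeny, String ip) throws IOException {
        // Accept only characters that occur in IPv4/IPv6 literals, so no extra rules can be injected.
        if (!ip.matches("[0-9A-Fa-f:.]+")) {
            throw new IllegalArgumentException("not an IP address: " + ip);
        }
        String rule = "ALL: " + ip;
        List<String> existing = Files.exists(hostsDeny)
                ? Files.readAllLines(hostsDeny, StandardCharsets.UTF_8)
                : Collections.<String>emptyList();
        for (String line : existing) {
            if (line.trim().equals(rule)) {
                return false;
            }
        }
        // Assumes the existing file, if any, ends with a line break.
        Files.write(hostsDeny, Collections.singletonList(rule), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

Checking for an existing rule before appending keeps the file from growing when the same address is reported in several consecutive detection runs.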


3.3 Transmission of Information

This section discusses how the log data and notifications are packed for effective transmission and gives further information about the encrypted connections created by the TLS protocol.

3.3.1 XML

Figure 3.9 Some output in one of XML files

To transmit data over a network, XML has been used extensively by many programmers. It is a simple text-based format that facilitates common data access in the form of documents, data, configuration and more. It is relatively straightforward to learn, and a stored XML file is readable for debugging or manual editing. Java supplies rich libraries for parsing, modifying and retrieving XML. The two core classes used here are XMLEncoder and XMLDecoder, which serialise an object to XML and vice versa.
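The round trip through these two classes can be sketched as follows. The class XmlRoundTrip and the sample log lines are illustrative assumptions; a plain list of strings is encoded here, whereas the project presumably encodes its own log record objects.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing object-to-XML serialisation and back. */
final class XmlRoundTrip {

    /** Serialises a JavaBean-style object (or a standard collection) to XML bytes. */
    static byte[] toXml(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(value);
        }
        return out.toByteArray();
    }

    /** Reads back the first object stored in the XML document. */
    static Object fromXml(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        List<String> logLines = new ArrayList<>();
        logLines.add("GET /index.html");
        logLines.add("GET /about.html");
        byte[] xml = toXml(logLines);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(fromXml(xml));
    }
}
```

XMLEncoder relies on the JavaBeans conventions, so a custom record class would need to be public, with a public no-argument constructor and getter and setter pairs, for its properties to be written out.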

3.3.2 TLS

This section discusses security in data transmission. Java provides a variety of security protocols, such as TLS, that can be used to create a secure connection between two endpoints and to communicate in encrypted form. TLS is the abbreviation for Transport Layer Security, and its predecessor is SSL 3.0. The current version is TLS 1.2, and it is one of the most secure protocols to date. Its core function is to provide privacy and data integrity for the messages exchanged between two parties over a network. It also supports mutual authentication, which means that each party is required to prove its identity to the other before communication. Given that the “Hive & AuBee” system aims to secure sensitive information, the identities of both the AuBee and the Hive must be confirmed before data is transmitted.


Figure 3.10 The commands for generating the keystore and truststore

For mutual authentication, the cipher keys and certificates must first be created. “keytool”, a command-line utility shipped with the Java platform, can be used for this purpose. “keytool” first generates the public-private key pair and places it in a keystore file. The certificate is then signed and attached to the public key. The next step is to import each party’s certificate into the truststore file of the opposite party.

3.3.3 How does TLS work?

The TLS process includes the following basic phases:
1. Negotiation between the two sides of the supported algorithms
2. Key exchange and authentication on both sides
3. Symmetric cipher encryption for communication

First, the handshake starts when a client connects to a TLS-enabled server, requests a secure connection and presents a list of supported ciphers and hash functions. The server then informs the client of the cipher and hash function it has chosen. Next, the server sends back its identification in the form of a digital certificate, and the incoming certificate is compared with the stored certificate to confirm its origin. To generate the session keys used for the secure connection, the client encrypts a random number with the server’s public key and sends the result to the server. The result can only be decrypted with the server’s private key, so the server recovers the random number, and the key material for encryption and decryption is then generated from it.


Figure 3.11 The diagram showing how TLS protocol works
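The key-exchange step of the handshake can be illustrated in isolation. The sketch below is not a TLS implementation: it only shows a client encrypting a random secret with the server's RSA public key and the server recovering it with the private key. The class name, the 2048-bit key size and the 48-byte secret length are assumptions made for the illustration, and the later derivation of session keys from the secret is omitted.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;

// Hypothetical class, for illustration only.
final class KeyExchangeSketch {

    /** Server side: the key pair whose public half is carried in the certificate. */
    static KeyPair newServerKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Client side: encrypt the random secret so that only the key owner can read it. */
    static byte[] encryptSecret(byte[] secret, PublicKey serverPublicKey) {
        return rsa(Cipher.ENCRYPT_MODE, serverPublicKey, secret);
    }

    /** Server side: recover the secret with the private key. */
    static byte[] decryptSecret(byte[] encrypted, PrivateKey serverPrivateKey) {
        return rsa(Cipher.DECRYPT_MODE, serverPrivateKey, encrypted);
    }

    private static byte[] rsa(int mode, Key key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs both sides once and reports whether the server recovered the client's secret. */
    static boolean roundTripSucceeds() {
        KeyPair server = newServerKeyPair();
        byte[] secret = new byte[48];
        new SecureRandom().nextBytes(secret);
        byte[] onTheWire = encryptSecret(secret, server.getPublic());
        byte[] recovered = decryptSecret(onTheWire, server.getPrivate());
        return Arrays.equals(secret, recovered);
    }
}
```

An eavesdropper who sees only the encrypted bytes cannot recover the secret without the private key, which is why possession of the private key also serves as proof of the server's identity.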

As the explanation above shows, an appropriate cipher suite can be chosen to fit the situation, and the choice of symmetric cipher directly affects the speed of encryption and decryption. Because the system may need to transmit large amounts of data, high throughput is necessary. The supported cipher suites offer three options for the symmetric cipher: DES, 3DES and AES. AES supports larger key sizes than DES (128, 192 or 256 bits against 56 bits), which improves security, and it also processes data faster than 3DES. Therefore, I have chosen AES.
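One way to enforce this choice in Java is to restrict a connection to cipher suites whose names contain AES. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and matching on the substring "_AES_" relies on the naming convention of standard cipher suite names rather than on any dedicated API.

```java
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper class, for illustration only.
final class AesSuiteSketch {

    /** Keeps only the cipher suites whose name indicates AES as the symmetric cipher. */
    static List<String> aesOnly(String[] suites) {
        List<String> result = new ArrayList<>();
        for (String suite : suites) {
            if (suite.contains("_AES_")) {
                result.add(suite);
            }
        }
        return result;
    }

    /** Applies the filter to the suites supported by the default TLS context of this JVM. */
    static List<String> supportedAesSuites() {
        try {
            return aesOnly(SSLContext.getDefault().getSupportedSSLParameters().getCipherSuites());
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting list could then be converted to an array and passed to SSLSocket.setEnabledCipherSuites or SSLParameters.setCipherSuites, so that DES and 3DES suites are never negotiated.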

3.4 Web Visualisation

This section describes the structure of the database, such as the tables and procedures, and the techniques used in the web implementation.


3.4.1 Database - MySQL

Figure 3.12 The table structure in the Hive’s database

The diagram above displays all the tables that exist in the Hive’s database. A brief description of each table follows:

Table 3.1 Table that explains each table in the Hive’s database

TABLE          EXPLANATION
t_request      A table for storing the HTTP requests sent by clients to the web server
t_response     A table for storing the HTTP responses made by the web server
t_login_host   A table for saving the information about the authentication attempts to the VPS
t_report       A table for saving the number of HTTP requests, responses and login attempts in an hour
t_exinfo       A table for recording the notifications of Cc attacks or brute-force attacks
t_host         A table for synchronising with the /etc/hosts.deny file in the VPS
t_user         A table for keeping the user’s account information to access the “Hive & AuBee” system


For the task of aggregating log data at regular intervals, a MySQL stored procedure has substantial advantages. For example, it isolates the business rules and allows them to be tested independently of the application [16], which leads to low coupling and high cohesion. The following procedure serves this purpose: it counts the log entries recorded in each hour and stores the totals in the database for later use in trend diagrams.

Figure 3.13 The database procedure on calculating the summation.
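For readers who prefer to see the aggregation logic outside the database, the hourly count can also be sketched in Java. This is an illustration of the same idea, not a translation of the stored procedure in Figure 3.13: the class and method names are hypothetical, and the input is assumed to be a plain list of log timestamps.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class, for illustration only.
final class HourlyReportSketch {

    /** Counts how many log entries fall into each hour, keyed by the start of that hour. */
    static Map<LocalDateTime, Integer> countPerHour(List<LocalDateTime> timestamps) {
        Map<LocalDateTime, Integer> counts = new TreeMap<>();
        for (LocalDateTime timestamp : timestamps) {
            // Dropping minutes and seconds maps every entry to its hour bucket.
            counts.merge(timestamp.truncatedTo(ChronoUnit.HOURS), 1, Integer::sum);
        }
        return counts;
    }
}
```

Keeping this logic in a stored procedure instead, as the project does, means the counts are produced next to the data and the web application only has to read the finished totals.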

3.4.2 Web development by Java

I decided to use Java as the main programming language for web development in this project, for the reasons explained below. At present, the popularity of Java in web development and the robustness of the frameworks that support it are the key motivations for choosing it. Moreover, I have used Java frequently in the past, and exploring a new area such as web applications will also be valuable experience for my future career. The two core frameworks I have used to build the dynamic web site are Spring model-view-controller (MVC) and MyBatis.

The basic version of Spring is a lightweight framework of about 1 MB [17]. A full description would take five to seven pages, so only a few points are highlighted here. One of its core advantages is dependency injection, a design pattern that separates the construction of an object’s dependencies from the code that uses them and moves it into configuration. For example, retrieving data from a database normally requires constructing a connection object and waiting for the resource. Spring can store the database access details in an XML configuration file or in annotations, which allows the programmer to focus on business logic.

The Spring MVC framework follows a structure in which each module has a clear function and responsibility.


Figure 3.14 The diagram displaying Spring MVC in the Hive’s web

In the diagram above, the name of each module is the name of a package that exists in the web application of the “Hive & AuBee” system.

1. The client sends a request to the web application.
2. The controller intercepts the request and calls the appropriate methods of the service object.
3. The service object dispatches the request to the mapper object, which holds the database configuration.
4. When the data arrives, it is placed into the predefined model.
5. The controller delivers the model to the view for rendering the web page, which is then sent back to the client.
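The division of responsibility in these steps can be sketched in plain Java without any Spring dependency. Everything in the sketch is hypothetical: the class names, the "requests" view name and the in-memory mapper are stand-ins chosen for illustration, and in the real application Spring would create and inject these objects from annotations or XML configuration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical classes, for illustration only; the project itself uses Spring MVC and MyBatis.
final class MvcFlowSketch {

    /** Mapper layer: the only layer that knows how the rows are fetched. */
    interface RequestMapper {
        List<String> findRecentRequests();
    }

    /** Service layer: business logic that sits between the controller and the mapper. */
    static final class RequestService {
        private final RequestMapper mapper;

        RequestService(RequestMapper mapper) {
            this.mapper = mapper;   // the dependency is handed in, not constructed here
        }

        List<String> recentRequests() {
            return mapper.findRecentRequests();
        }
    }

    /** Controller layer: fills the model and names the view that should render it. */
    static final class RequestController {
        private final RequestService service;

        RequestController(RequestService service) {
            this.service = service;
        }

        String handle(Map<String, Object> model) {
            model.put("requests", service.recentRequests());
            return "requests";
        }
    }
}
```

Because each layer receives its collaborator through the constructor, a test can supply a fake mapper without touching a database, which is the practical benefit of the dependency injection described earlier.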

With the help of the MyBatis-Spring framework, only a few lines of code are needed, in clear contrast to achieving the same goal with JDBC. This is thanks to its simple XML and annotation configuration. Screenshots of the web interface are placed in Appendix C.

Figure 3.15 Some code from mapper.xml


4 Results

4.1 Brute-Force Authentication Attempts

Because the detection of brute-force authentication attempts is log-based, the threshold that controls it can be configured manually by the administrator. In other words, it is not a heuristic detection but a straightforward rule-based detection.
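A rule of this kind can be sketched in a few lines of Java. The sketch is an illustration under stated assumptions rather than the AuBee's actual implementation: the class and method names are hypothetical, the input is assumed to be the list of source addresses of failed logins found in one detection cycle, and the "sshd: address" line follows the usual TCP wrapper syntax for /etc/hosts.deny.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class, for illustration only.
final class BruteForceRuleSketch {

    /** Returns the addresses whose number of failed attempts is over the threshold. */
    static Set<String> addressesOverThreshold(List<String> failedAttemptIps, int threshold) {
        Map<String, Integer> failures = new LinkedHashMap<>();
        for (String ip : failedAttemptIps) {
            failures.merge(ip, 1, Integer::sum);
        }
        Set<String> blocked = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> entry : failures.entrySet()) {
            if (entry.getValue() > threshold) {
                blocked.add(entry.getKey());
            }
        }
        return blocked;
    }

    /** Formats one deny rule for the SSH daemon in TCP wrapper syntax. */
    static String hostsDenyLine(String ip) {
        return "sshd: " + ip;
    }
}
```

Since the rule is a single comparison against a configurable number, the administrator can tighten or relax the detection without changing any code.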

4.1.1 Synthetic Attempts

The detection can be verified in two ways. The first is to send synthetic authentication requests to the VPS and check the actions it takes once the number of attempts exceeds the threshold.

Figure 4.1 Synthetic Request to VPS in Two minutes

As explained in the implementation phase, the output of each cycle of the “grep” command is collected and recorded in files. This plainly shows that the number of consecutive failed attempts is the crucial factor in deciding whether an address is written to the /etc/hosts.deny file. Appendix D contains screenshots of the invalidIp file, the log file generated by the detection and the warning email sent by the detection.

4.1.2 Intrusion Application

The TCP wrapper, as explained in 3.2.4, uses the /etc/hosts.deny file as its configuration for deciding whether a request is granted access. In other words, when the detection is switched off, the number of TCP connections, the network traffic and the number of connected ports keep rising during an attack. The difference in network traffic between running with the detection off and on during a brute-force attack is therefore one of the features that can demonstrate the success of the detection.

The intrusion tool I used for testing is THC-Hydra [18]. It covers a wide range of services, including SSH, AFP, IMAP and MySQL, and provides a graphical user interface. More importantly, it performs notably well in a speed comparison with the Medusa and Ncrack tools on the SSH and FTP protocols. It is therefore a good tool for assessing the performance of the detection.

Real-time network traffic is measured and drawn in the terminal by an application called Speedometer [19]. Its colourful, dynamic interface was the main reason for choosing this tool.

Lastly, the following table displays the specifications of the aggressive (attacking) system and the affected system.

Table 4.1 Table that explains the operating specification

SYSTEM                  OPERATING SYSTEM   PROCESSOR               RAM        NETWORK SPEED
Aggressive System       OSX 10.10.5        2.7 GHz Intel Core i5   8GB DDR3   Download: 129.93 Mbit/s, Upload: 83.36 Mbit/s
Affected System (VPS)   Ubuntu 12.04 x64   2.4GHz GenuineIntel     4GB        Download: 917.58 Mbit/s, Upload: 446.90 Mbit/s


Figure 4.2 The network traffic in VPS without detection

Figure 4.3 The network traffic in VPS with detection

The graphs above, captured on the affected system (the VPS), show a baseline of about 1.5 KiB/s (kibibytes per second) of received network traffic. A sudden escalation appears once the intrusion tool is started. In Figure 4.2 the escalation does not stop until the attack has finished, whereas in Figure 4.3 the traffic returns to its ordinary level thanks to the detection and the TCP wrapper. The detection looks for anomalies at one-minute intervals, which is why the traffic does not drop immediately.

4.2 Cc attacks

As explained in section 3.2.1, the detection of a Cc attack is based on calculating the entropy of a certain number of HTTP requests and comparing it with a preset amplitude value. The prerequisite is an appropriate amplitude to use as the base value, and datasets of HTTP requests found online can help to establish it.
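The calculation can be sketched with the Shannon entropy H = -Σ p_i log2(p_i), where p_i is the share of requests in one round that target resource i. The sketch below is an illustration, not the project's detection code: the class and method names are hypothetical, and both the choice of the requested resource as the symbol and the rule of flagging a round whose entropy deviates from a baseline by more than the amplitude are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, for illustration only.
final class EntropySketch {

    /** Shannon entropy (base 2) of the requested resources in one round of requests. */
    static double shannonEntropy(List<String> resources) {
        if (resources.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String resource : resources) {
            counts.merge(resource, 1, Integer::sum);
        }
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / resources.size();
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    /** Flags a round whose entropy deviates from the baseline by more than the amplitude. */
    static boolean looksLikeAttack(double entropy, double baseline, double amplitude) {
        return Math.abs(entropy - baseline) > amplitude;
    }
}
```

A flood of requests for the same resource concentrates the distribution on one symbol, so the entropy of that round falls towards zero, while ordinary browsing spreads requests over many resources and keeps it higher.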

4.2.1 Results from external sources

For example, the Lawrence Berkeley National Laboratory (LBNL), a United States national laboratory located in the Berkeley Hills near Berkeley, hosts a moderated repository that supports widespread access to traces of Internet network traffic [20]. Two files containing one day of HTTP requests each and one file containing two weeks of HTTP requests are used for my experiments. They come from the web servers of the San Diego Supercomputer Center (SDSC), Research Triangle Park and the ClarkNet company respectively. The SDSC entropy graph plots the entropy value against the round number within the log file; the remaining entropy graphs can be found in Appendix E.

Figure 4.4 The entropy graph of HTTP request in SDSC web server

In the graph shown above and the graphs in Appendix E, the entropy values stay within an overall range of 5 to 7, which is regarded as normal behaviour. This is because a single web server offers only a limited set of HTTP resources, so the distribution of requests stays within a limited range.

Another file, the HTTP log of the Chinese Student Scholar Association (CSSA) website, was also tested. The site had recently been reported as performing slowly and was suspected of having been hacked. Because of the log configuration of its web server, only the HTTP traces for the month of February were available for examination.


Figure 4.5 The entropy graph of HTTP request in CSSA web server

The gradual downturn around round 200 could reflect the growing popularity of the website; for example, the Chinese New Year festival fell in that period. The plunges seen in the graph, by contrast, suggest that the server was under Cc attack. The following screenshot shows some of the HTTP requests made during those plunges. In conclusion, the entropy method proves to be useful.

Figure 4.6 Some HTTP requests found in CSSA’s access log file


4.2.2 Results using an intrusion application

A variety of DDoS attack tools are published online. Understanding their characteristics and methods of attack extends one’s knowledge of defence [21], and the tools can also be used for experiments. Here, I used the Low Orbit Ion Cannon (LOIC) tool to trigger the Cc attack. It can launch DDoS attacks over the TCP, UDP or HTTP protocol and offers a user-friendly interface.

The preset entropy threshold I deployed is derived from the log trace of the SDSC web server, because my personal web server does not have enough log traces to use.

Figure 4.7 The interface of LOIC software

Once the LOIC tool starts flooding my web server with HTTP requests, the AuBee receives the requests until a certain number is reached, and the detection application then starts calculating their entropy. The following screenshots show the entropy result for those requests and the actions taken once the Cc attack is recognised.

Figure 4.8 The output when there is a Cc attack


5 Conclusion

5.1 Future improvements in this project

There is plenty of room for improvement in this project; I am never fully satisfied, and my motivation to achieve a better outcome keeps growing. As Bruce Lee, the founder of the Jeet Kune Do martial arts system, said: “Be happy, but never satisfied” [22]. The following points are what I hope to accomplish in the future:

Table 5.1 Table contains the expectations in the future

1 Collect authentication or web traces on various platforms and operating systems
2 Shorten the detection interval in brute-force authentication detection
3 Use more features for examining Cc attacks
4 Provide more interactive functions on the web interface
5 Provide a mobile application

5.2 Personal achievement

Throughout the project, I have gained more than I anticipated and discovered much that I did not expect: for example, a broad understanding of protocols from TCP/UDP to SSH/HTTP, care in designing and implementing the system, and the exciting journey of learning new techniques for building the web interface. In short, I am delighted to have had the chance to undertake this project and to complete it.


References

[1] Smith, C. (2016). 25 Amazing WordPress Statistics. Available: http://expandedramblings.com/index.php/wordpress-statistics/. Last accessed 15th April 2016.

[2] Mutton, P. (2015). DigitalOcean becomes the second largest hosting company in the world. Available: http://news.netcraft.com/archives/2015/05/01/digitalocean-becomes-the-second-largest-hosting-company-in-the-world.html. Last accessed 15th April 2016.

[3] Francher, P. (2016). 2016's Best "VPS" Hosting Reviews. Available: http://www.hostingadvice.com/reviews/vps/. Last accessed 15th April 2016.

[4] maarga, P. (2016). Why requirements gathering is the most important part in a Notes Domino Project?. Available: http://maargasystems.com/why-requirements-gathering-is-the-most-important-part-in-a-notes-domino-project-a-project-leads-viewpoint/. Last accessed 17th April 2016.

[5] Haughey, D. (2014). MOSCOW method. Available: https://www.projectsmart.co.uk/moscow-method.php. Last accessed 17th April 2016.

[6] Cannon, T. (2012). An Introduction to Color Theory for Web Designers. Available: http://webdesign.tutsplus.com/articles/an-introduction-to-color-theory-for-web-designers--webdesign-1437. Last accessed 19th April 2016.

[7] visaul, P. (2015). Data Flow Diagram with Example. Available: https://www.visual-paradigm.com/tutorials/data-flow-diagram-example-food-ordering-system.jsp. Last accessed 19th April 2016.

[8] Paganini, P. (2013). DDoS Attacks: A serious unstoppable menace for IT security communities. Available: http://thehackernews.com/2013/10/ddos-attacks-serious-unstoppable-menace.html. Last accessed 21st April 2016.

[9] Cid, D. (2013). SSH Brute Force - The 10 Year Old Attack That Still Persists. Available: https://blog.sucuri.net/2013/07/ssh-brute-force-the-10-year-old-attack-that-still-persists.html. Last accessed 19th April 2016.

[10] Shillcock, R. (2013). All About Grid Systems. Available: http://webdesign.tutsplus.com/articles/all-about-grid-systems--webdesign-14471. Last accessed 30th April 2016.

[11] Lim, W. (2012). A Beginner's Guide to Wireframing. Available: http://webdesign.tutsplus.com/articles/a-beginners-guide-to-wireframing--webdesign-7399. Last accessed 19th April 2016.


[12] Williams, F. (2010). Grep command in Linux explained. Available: http://www.techradar.com/news/software/operating-systems/grep-command-in-linux-explained-699455. Last accessed 25th April 2016.

[13] Schneider, T. (1997). Information Is Not Entropy. Available: https://schneider.ncifcrf.gov/information.is.not.uncertainty.html. Last accessed 25th April 2016.

[14] Dolors. (2013). What is Information?. Available: http://crackingthenutshell.com/what-is-information-part-2a-information-theory/. Last accessed 25th April 2016.

[15] Sullivan, N. (2013). Ensuring Randomness with Linux's random number generator. Available: https://blog.cloudflare.com/ensuring-randomness-with-linuxs-random-number-generator/. Last accessed 25th April 2016.

[16] Hambrick, PJ. (2013). Advantages and Drawbacks of Using Stored Procedures for Processing Data. Available: http://www.seguetech.com/blog/06/04/Advantage-drawbacks-stored-procedures-processing-data. Last accessed 30th April 2016.

[17] springTutorial. (2014). What is Spring anyway?. Available: https://springvideotutorials.wordpress.com. Last accessed 30th April 2016.

[18] hydra. (2012). hydra introduction. Available: https://www.thc.org/thc-hydra/network_password_cracker_comparison.html. Last accessed 30th April 2016.

[19] Ward, I. (2016). Speedometer 2.8. Available: https://excess.org/speedometer/. Last accessed 30th April 2016.

[20] LBNL. (2000). The internet traffic archive. Available: http://ita.ee.lbl.gov/index.html. Last accessed 30th April 2016.

[21] Xu, Z. (2015). Evolution of DDoS Attack Tool. Available: https://nsfocusblog.com/2015/08/05/evolution-of-ddos-attack-tools/. Last accessed 30th April 2016.

[22] Lee, B. (2013). Bruce Lee Quotes. Available: http://www.goodreads.com/quotes/19527-be-happy-but-never-satisfied. Last accessed 30th April 2016.


Appendix A

The web server generates an access log entry whenever a client sends a request to it. Each entry has the following structure:

• The IP address of the client that sent the request
• The time at which the web server received the request
• The method of the request (GET or POST)
• The requested resource with its full path, sometimes also with parameters
• The version of the HTTP protocol
• The status code of the request, for example 200 for OK and 404 for resource not found
• The size of the response sent back to the client, in bytes
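The fields listed above can be pulled out of a log line with a regular expression. The sketch below assumes that the web server writes the widely used Common Log Format, which matches the structure described here; the class name, the method name and the sample line in the usage note are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only; assumes Common Log Format entries.
final class AccessLogSketch {

    // Groups: client IP, timestamp, method, resource, protocol version, status, size.
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)$");

    /** Returns {ip, time, method, resource, protocol, status, size}, or null if no match. */
    static String[] parse(String line) {
        Matcher matcher = LINE.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        String[] fields = new String[7];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = matcher.group(i + 1);
        }
        return fields;
    }
}
```

For an entry such as 192.0.2.7 - - [10/Apr/2016:13:55:36 +0000] "GET /index.html?page=2 HTTP/1.1" 200 2326, the parser separates the client address, the timestamp, the method, the resource with its parameter, the protocol version, the status code and the response size.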


Appendix B

The authentication log is generated by the operating system of the VPS when a client sends a login request to the VPS. Each entry has the following structure:

• The time at which the system received the request
• The fixed user name of the system, created by the VPS
• The protocol used in the request
• The port number used for the request
• The content of the request


Appendix C

Tomcat Page displaying the HTTP request and response and also the time-graph

IP List displaying the /etc/hosts.deny file in audited VPS


Appendix D

invalidPassword file for keeping a record of past failed attempts

validIp file for keeping a record of past successful attempts

log file for the status of the detection

Email received from the detection when a brute-force attack occurs


Appendix E

log file for the status of the detection

log file for the status of the detection
