big data (security issue)


Upload: mehedi-hassan

Post on 13-Apr-2017



Page 1: Big Data (security Issue)

8/29/2015

SECURITY ISSUES ASSOCIATED WITH BIG DATA IN CLOUD COMPUTING

Seminar: Advanced Topics One

Submitted by: Md. Mehedi Hassan

1/26

Supervisor: Sajjad Waheed, Associate Professor, Dept. of ICT, MBSTU

Page 2: Big Data (security Issue)


Outline
- Introduction
- Big Data
- Why Big Data
- Cloud Computing
- How Big Data is Related with Cloud Computing
- Why Choose Big Data as a Thesis Topic
- Introduction to Hadoop
  - MapReduce
  - Hadoop Distributed File System (HDFS)
- Application
- Advantages of Big Data
- Alternative of Big Data
- Security Issue of Big Data
- Motivation and Related Work
- Issues and Challenges
- The Proposed Approaches
- Conclusions


Page 3: Big Data (security Issue)


Introduction

Analyzing complex data and identifying patterns in it requires securely storing, managing, and sharing large amounts of complex data (big data).

Big data applications greatly benefit organizations, businesses, and companies, as well as many large-scale and small-scale industries.

Cloud resources are needed to support big data storage and projects, and big data is a strong business case for moving to the cloud.

This seminar focuses on the security issues in cloud computing that are associated with big data.


Page 4: Big Data (security Issue)


Big Data

Big Data is the term used to describe massive volumes of structured and unstructured data that are so large that they are very difficult to process using traditional databases and software technologies.

Figure: Big Data sources


Page 5: Big Data (security Issue)


Big Data (the 3Vs)

Volume: many factors contribute to increasing volume, such as stored transactions, live streaming, and data collected from sensors.

Variety:
- Structured: relational data
- Semi-structured: XML data
- Unstructured: Word, PDF, text, media logs

Velocity: deals with the pace at which data flows in from sources and from human interaction.
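To make the Variety point concrete, here is a minimal Python sketch (an illustration, not part of the original seminar) of how each kind of data is typically read: a fixed schema for structured rows, self-describing tags for semi-structured XML, and only ad hoc extraction for unstructured text. The sample records are hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: relational-style rows with a fixed schema (here, CSV text).
structured = "id,amount\n1,250.00\n2,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: XML carries its own tags, but fields may vary per record.
semi_structured = "<txn id='3'><amount>10.00</amount><note>ATM</note></txn>"
txn = ET.fromstring(semi_structured)
xml_record = {"id": txn.get("id"), **{child.tag: child.text for child in txn}}

# Unstructured: free text has no schema; here we only extract word tokens.
unstructured = "Customer called about a delayed transfer."
tokens = re.findall(r"[a-z]+", unstructured.lower())

print(rows)
print(xml_record)
print(tokens)
```

The further down this list the data sits, the more interpretation the processing code has to supply itself, which is part of why variety strains traditional databases.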


Page 6: Big Data (security Issue)


Why Big Data

- Speed, capacity, and scalability of cloud storage
- End users can visualize data
- Data can be managed better
- Companies can find new business opportunities
- Data analysis methods and capabilities will evolve


Page 7: Big Data (security Issue)


Cloud Computing

Cloud computing is a technology that relies on shared computing resources rather than on local servers or personal devices to handle applications.

In cloud computing, the word “cloud” refers to “the Internet”, so cloud computing is a type of computing in which services are delivered over the Internet.


Page 8: Big Data (security Issue)


How Big Data is Related with Cloud Computing

Cloud computing is a powerful technology for performing massive-scale and complex computing.

It eliminates the need to maintain expensive computing hardware, dedicated space, and software.

Big data needs large, on-demand compute power and distributed storage to handle the 3V (volume, variety, velocity) data problem, and the cloud seamlessly provides this elastic, on-demand compute power and storage.


Page 9: Big Data (security Issue)


Why Choose Big Data as a Thesis Topic

As a software developer, I have handled large volumes of banking transaction data.

I have already observed how long it takes to execute a particular SELECT statement or analytical SQL query.

The system becomes very slow when all branches are processing in parallel. This problem can be overcome using big data concepts, which are already used by Facebook, Google, IBM, and others, and are available as open source (Hadoop). For these reasons, I chose Big Data as my topic.


Page 10: Big Data (security Issue)


Introduction to Hadoop


Hadoop: an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.

The name comes from Doug Cutting's son's toy elephant.

Hadoop architecture has two major layers:
- Processing layer: MapReduce
- Storage layer: Hadoop Distributed File System (HDFS)

Figure: Hadoop architecture (MapReduce for distributed computation, HDFS for distributed storage, YARN framework, Common utilities)

Page 11: Big Data (security Issue)


Introduction to Hadoop (cont.)

How Hadoop works: Hadoop performs the following core tasks across a cluster of computers.
- Data is divided into directories and files.
- Files are then distributed across various cluster nodes.
- HDFS supervises the processing.
- Blocks are replicated.
- The sort that takes place between the map and reduce stages is performed.
- The sorted data is sent to a particular computer.
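The last two steps (sorting between the map and reduce stages and sending the sorted data to a particular computer) can be sketched as hash partitioning followed by a per-reducer sort. This is a minimal Python illustration, similar in spirit to Hadoop's default HashPartitioner but not its implementation; the function names, the use of CRC32 as the hash, and the sample map output are assumptions.

```python
import zlib
from collections import defaultdict

def partition(key: str, num_reducers: int) -> int:
    # Deterministic hash (CRC32) so every mapper sends a given key to the same
    # reducer. Python's built-in hash() is salted per process, so it is avoided.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle_and_sort(map_output, num_reducers: int) -> dict:
    """Route (key, value) pairs to reducers, then sort each reducer's input by key."""
    buckets = defaultdict(list)
    for key, value in map_output:
        buckets[partition(key, num_reducers)].append((key, value))
    # Each reducer receives its pairs sorted by key, mirroring the sort that
    # happens between the map and reduce stages.
    return {reducer: sorted(pairs) for reducer, pairs in buckets.items()}

map_output = [("bank", 1), ("loan", 1), ("bank", 1), ("atm", 1)]
for reducer, pairs in sorted(shuffle_and_sort(map_output, 2).items()):
    print(reducer, pairs)
```

Because the partition depends only on the key, all values for one key end up on the same computer, which is what lets a single reducer see the complete group.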

Advantages:
- Low-cost alternative to building bigger servers
- Fault tolerance and high availability
- Dynamic clustering
- Automatic data distribution
- Open source


Page 12: Big Data (security Issue)


MapReduce

What is MapReduce? A processing technique and programming model for distributed computing, based on Java. Its main elements are the mapper, the shuffle, and the reducer, and it operates on key-value pairs.
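The mapper, shuffle, and reducer roles can be illustrated with the classic word-count job. The sketch below runs all three phases in one Python process, so it shows the key-value data flow only, not Hadoop's Java API or its distribution across nodes; the function names and input lines are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))

print(word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
# {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

In a real cluster, many mapper instances would each process a different block of input, and the shuffle would move the grouped pairs over the network to the reducers.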


Page 13: Big Data (security Issue)


MapReduce (cont.)


Page 14: Big Data (security Issue)


MapReduce Example


Page 15: Big Data (security Issue)


Hadoop Distributed File System (HDFS)

HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework.

Features:
- Distributed storage and processing
- NameNode and DataNode servers
- An interface for interacting with HDFS in Hadoop
- Streaming access to file system data
- Cluster status checks


Page 16: Big Data (security Issue)


Hadoop Distributed File System (cont.)


Figure: HDFS architecture (labels: NameNode holding metadata (name, replicas, …) such as /home/foo/data, 3, …; client; metadata ops; block ops; read; write; DataNodes on Rack 1 and Rack 2; block replication)
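The figure's idea that the NameNode keeps only metadata while DataNodes on different racks hold the replicated blocks can be sketched as follows. This toy Python model assumes a replication factor of 3 (as in the figure's "/home/foo/data, 3" entry), a 128 MB block size, and the placement rule described in the HDFS architecture guide: one replica on the writer's node, one on a node in a different rack, and one on another node in that same remote rack. The class, node, and rack names are hypothetical.

```python
import random
from dataclasses import dataclass, field

REPLICATION = 3                  # replicas per block, as in the figure's entry
BLOCK_SIZE = 128 * 1024 * 1024   # assumed 128 MB block size

@dataclass
class NameNode:
    """Toy NameNode: stores metadata only (file -> replication and block locations)."""
    racks: dict[str, list[str]]  # rack id -> DataNode names (hypothetical)
    files: dict[str, dict] = field(default_factory=dict)

    def rack_of(self, node: str) -> str:
        return next(r for r, nodes in self.racks.items() if node in nodes)

    def place_replicas(self, writer: str, rng: random.Random) -> list[str]:
        # Assumes the writer is a DataNode and a remote rack has >= 2 nodes.
        first = writer  # replica 1: the writer's own node
        remote = rng.choice([r for r in self.racks if r != self.rack_of(writer)])
        second = rng.choice(self.racks[remote])  # replica 2: a remote rack
        # Replica 3: a different node on the same remote rack.
        third = rng.choice([n for n in self.racks[remote] if n != second])
        return [first, second, third]

    def create(self, path: str, size: int, writer: str, rng: random.Random) -> dict:
        num_blocks = -(-size // BLOCK_SIZE)  # ceiling division
        blocks = [self.place_replicas(writer, rng) for _ in range(num_blocks)]
        self.files[path] = {"replication": REPLICATION, "blocks": blocks}
        return self.files[path]

nn = NameNode(racks={"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]})
meta = nn.create("/home/foo/data", size=300 * 1024 * 1024, writer="dn1", rng=random.Random(42))
print(meta["replication"], len(meta["blocks"]))  # 3 3
```

Placing two of the three replicas on one remote rack limits cross-rack write traffic while still keeping a copy if a whole rack is lost.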

Page 17: Big Data (security Issue)


Application


- Homeland security
- Smarter healthcare
- Multi-channel sales
- Telecom
- Manufacturing
- Traffic control
- Trading analytics
- Search quality

Page 18: Big Data (security Issue)


Advantages of Big Data

- Cost reduction
- Faster, better decision making
- New products and services
- Risk analysis


Page 19: Big Data (security Issue)


Alternative of Big Data

- Apache Spark (less secure than Hadoop)
- Cluster MapReduce (slower and less secure than Hadoop)


Page 20: Big Data (security Issue)


Issues and Challenges

Network level:
- Distributed nodes
- Distributed data
- Internode communication

Authentication level:
- Data protection
- Administrative rights for nodes
- Authentication of applications and nodes
- Logging

Data level:
- Confidentiality
- Integrity
- Availability

Generic types:
- Traditional security tools
- Use of different technologies
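For the data-level Integrity concern, here is a minimal Python sketch of block-level integrity checking: a digest is recorded for each block at write time and recomputed on read, and any mismatch flags the block. HDFS keeps its own CRC-based checksums; the SHA-256 digests, the tiny block size, and the sample record here are assumptions for illustration only.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow (assumption)

def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]

def checksums(blocks: list[bytes]) -> list[str]:
    # Digest recorded when each block is written.
    return [hashlib.sha256(b).hexdigest() for b in blocks]

def corrupted_blocks(blocks: list[bytes], expected: list[str]) -> list[int]:
    # Recompute on read; any mismatch means the block was altered or damaged.
    return [i for i, (b, digest) in enumerate(zip(blocks, expected))
            if hashlib.sha256(b).hexdigest() != digest]

stored = split_blocks(b"account=42;balance=100")
recorded = checksums(stored)
stored[2] = b"XXXX"  # simulate tampering with one block
print(corrupted_blocks(stored, recorded))  # [2]
```

A plain digest reliably detects accidental corruption, but it detects deliberate tampering only if the recorded digests are themselves protected; against an attacker who can rewrite both, a keyed MAC such as HMAC would be needed.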


Page 21: Big Data (security Issue)


The Proposed Approaches

- File encryption
- Network encryption
- Logging
- Software format and node maintenance
- Node authentication
- Rigorous system testing of MapReduce jobs
- Honeypot nodes
- Layered framework for assuring cloud
- Third-party secure data publication to cloud
- Access control
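For Nodes Authentication, one simple technique is HMAC challenge-response: a node proves it holds a pre-shared key by computing a MAC over a fresh random challenge. Hadoop's secure mode actually relies on Kerberos, so this Python sketch is a swapped-in, simplified technique; the key table, function names, and node names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared keys, provisioned out of band for each trusted node.
NODE_KEYS = {"datanode-1": secrets.token_bytes(32)}

def issue_challenge() -> bytes:
    # Fresh random nonce so an old response cannot be replayed.
    return secrets.token_bytes(16)

def respond(key: bytes, node_id: str, challenge: bytes) -> str:
    # The node proves it holds the key without ever sending the key itself.
    return hmac.new(key, node_id.encode() + challenge, hashlib.sha256).hexdigest()

def verify(node_id: str, challenge: bytes, response: str) -> bool:
    key = NODE_KEYS.get(node_id)
    if key is None:
        return False  # unknown node: reject
    expected = respond(key, node_id, challenge)
    return hmac.compare_digest(expected, response)  # constant-time comparison

challenge = issue_challenge()
good = respond(NODE_KEYS["datanode-1"], "datanode-1", challenge)
print(verify("datanode-1", challenge, good))      # True
print(verify("datanode-1", challenge, "0" * 64))  # False
print(verify("rogue-node", challenge, good))      # False
```

The fresh nonce stops a captured response from being replayed later, and `hmac.compare_digest` avoids leaking timing information while comparing.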


Page 22: Big Data (security Issue)


Conclusions
- Highlighted the main advantages and applications of big data with cloud computing.
- Summarized the security issues associated with big data in cloud computing.
- Proposed how cloud environments can be secured for complex business operations.
- Proposed approaches for big data security.


Page 23: Big Data (security Issue)


Future Works

- Implement a data chaptering algorithm with data security
- Secure the data flow from Hadoop to the cloud with confidentiality


Page 24: Big Data (security Issue)


Q & A

Page 25: Big Data (security Issue)

