Data Mining with Big Data

Data mining With Big Data Presented By: Sandip B. Tipayle Patil Under the Guidance of Prof. Y.N.Patil DEPARTMENT OF COMPUTER ENGINEERING DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY Lonere.

Upload: khanfaizakram

Post on 22-Dec-2015


DESCRIPTION

Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including physical, biological and bio-medical sciences. This article presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.

TRANSCRIPT

Page 1: Data mining with bigdata

Data mining With Big Data

Presented By:

Sandip B. Tipayle Patil

Under the Guidance of

Prof. Y.N.Patil

DEPARTMENT OF COMPUTER ENGINEERING

DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY

Lonere.

Page 2: Data mining with bigdata

Outline

Introduction

What is Big Data?

How Much Data Really Exists?

Literature Review

4 Vs of Big Data

Proposed System

System Architecture

Big Data Mining Framework

Hadoop Framework

Big Data Challenges and Solutions

Conclusion

Page 3: Data mining with bigdata

Introduction

Page 4: Data mining with bigdata

Interesting Facts

The volume of business data worldwide, across all companies, doubles every 1.2 years (previously every 1.5 years).

About 2.5 quintillion (2,500 quadrillion) bytes of data are produced daily, and more than 90 percent of the world's data was produced within the past two years.

An average person today processes more data in a day than a 16th-century individual did in an entire lifetime.

In recent years, the cost of storage and processing power has dropped significantly.

Bad data or poor data quality costs US businesses $600 billion annually.

Facebook processes 10 TB of data every day; Twitter processes 7 TB.

Google has over 3 million servers, which processed over 2 trillion searches in 2012 (compared with only 22 million in 2000).
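The growth figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming steady exponential growth with a fixed doubling period; the function name `growth_factor` and the six-year horizon are illustrative choices, not part of the original slides.

```python
import math

QUADRILLION = 10**15
QUINTILLION = 10**18


def growth_factor(years: float, doubling_period_years: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# 2,500 quadrillion bytes per day is the same quantity as 2.5 quintillion bytes.
daily_bytes = 2500 * QUADRILLION

if __name__ == "__main__":
    print("Daily data in quintillion bytes:", daily_bytes / QUINTILLION)
    # Six years at a 1.2-year doubling period is five doublings (roughly 32x).
    print("Growth over 6 years:", round(growth_factor(6.0, 1.2), 2))
    # With the older 1.5-year doubling period, six years is four doublings (16x).
    print("Growth over 6 years (old rate):", round(growth_factor(6.0, 1.5), 2))
```

Under these assumptions, shortening the doubling period from 1.5 to 1.2 years doubles the volume reached after six years.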

Page 5: Data mining with bigdata

What is Big Data?

Page 6: Data mining with bigdata

“Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.”

-- Forrester

Page 7: Data mining with bigdata

Boring!

Page 8: Data mining with bigdata

“Big data is the data characterized by 3 attributes: volume, variety and velocity.”

-- IBM

Page 9: Data mining with bigdata

Random words

Page 10: Data mining with bigdata

Big Data is not about the size of the data; it's about the value within the data.

Page 11: Data mining with bigdata

What is …… ?

Data Mining

‣ computational process of discovering patterns in large data sets

Big Data

The term Big Data describes a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques.

Page 12: Data mining with bigdata

‘Big Data’ is similar to ‘small data’, but bigger

…but bigger data requires different approaches: techniques, tools, and architecture

…with an aim to solve new problems …or old problems in a better way

Page 13: Data mining with bigdata

How much data exists?

2.5 quintillion bytes of data are created EVERY DAY

IBM: 90 percent of the data in the world today was produced within the past two years

Forms of data?

Examples: Boeing jet, scientific data, sensor data, Internet data

Page 15: Data mining with bigdata

Literature Review

Data has grown tremendously.

This large amount of data is beyond the ability of existing software tools to manage.

Exploring the large volume of data and extracting useful information and knowledge is a challenge, and sometimes, it is almost infeasible.

Most people don't know what to do with all the data that they already have.

Page 16: Data mining with bigdata

Giant Elephant

Page 17: Data mining with bigdata

Huge data with heterogeneous and diverse dimensionality

‣ represents a huge volume of data

Autonomous sources with distributed and decentralized control

‣ a main characteristic of Big Data

Complex and evolving relationships

Page 18: Data mining with bigdata

4 Vs of Big Data

Volume: Data quantity

Velocity: Data speed

Variety: Data types

Veracity: Authenticity

Page 19: Data mining with bigdata

Proposed System:

Identifies relationships between different ideas

Is capable of handling huge volumes of data

Uses distributed parallel computing with the help of Hadoop

Provides a platform for processing data in different dimensions and summarizing results

The system architecture must be flexible enough that the components built on top of it, which express the various kinds of processing tasks, can tune it to run these different workloads efficiently.

The system will process this data within reasonable cost and time limits.
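The distributed parallel computing mentioned above follows the MapReduce model that Hadoop implements. The following is a minimal single-machine sketch of that model in Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are illustrative assumptions, and a thread pool stands in for the cluster's worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks: list[str]) -> dict[str, int]:
    """Run the map step over the chunks concurrently, then shuffle and reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Each string plays the role of one input split stored on a different node.
    print(word_count(["big data mining", "data mining with big data"]))
```

In a real Hadoop deployment the chunks would be blocks of a distributed file and the map and reduce steps would run on separate machines; the sketch only shows how the work is divided.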

Page 20: Data mining with bigdata

Gap due to Lack of analysis

Page 21: Data mining with bigdata

System Architecture:

Page 22: Data mining with bigdata

Hadoop Framework:

Page 23: Data mining with bigdata

Big Data Mining framework

Big Data Mining Platform

Big Data Semantics and Application Knowledge

I. Information Sharing and Data Privacy

II. Domain and Application Knowledge

Big Data Mining Algorithm

I. Local Learning and Model Fusion for Multiple Information Sources

II. Mining from Sparse, Uncertain, and Incomplete Data

III. Mining Complex and Dynamic Data
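One simple way to picture local learning and model fusion is for each autonomous source to compute a small summary of its own data and for a central site to combine only those summaries. The sketch below assumes the "model" is just a mean and a population variance, and that missing readings are represented by `None`; the names `LocalSummary`, `learn_locally`, and `fuse` are hypothetical and are not taken from the framework on the slide.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LocalSummary:
    """Sufficient statistics computed at one source; raw data never leaves it."""
    count: int
    total: float
    total_sq: float


def learn_locally(values: Iterable[Optional[float]]) -> LocalSummary:
    """Summarize one source, skipping missing (None) readings."""
    present = [v for v in values if v is not None]
    return LocalSummary(
        count=len(present),
        total=sum(present),
        total_sq=sum(v * v for v in present),
    )


def fuse(summaries: list[LocalSummary]) -> tuple[float, float]:
    """Fuse local summaries into a global mean and population variance."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations available to fuse")
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean ** 2
    return mean, variance


if __name__ == "__main__":
    source_a = learn_locally([1.0, None, 2.0, 3.0])  # one incomplete record
    source_b = learn_locally([4.0, 5.0])
    print(fuse([source_a, source_b]))
```

The fused result equals what a single site would compute over the pooled data, yet only three numbers per source are transmitted.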

Page 24: Data mining with bigdata

Big Data Mining Framework

Page 25: Data mining with bigdata

Challenges

Location of Big Data sources: Big Data is commonly stored in different locations.

Volume of Big Data: the size of Big Data grows continuously.

Hardware resources: RAM capacity

Privacy: medical reports, bank transactions

Having domain knowledge

Getting meaningful information

Page 26: Data mining with bigdata

Solutions

Parallel computing programming

An efficient computing platform will not have centralized data storage; instead, the platform will be distributed across large-scale storage.

Restricting access to the data
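The idea of replacing centralized storage with distributed storage can be sketched with hash partitioning, in which each record is assigned to a storage node according to a stable hash of its key. This is an illustrative assumption about how such a platform might place data, not the scheme of any particular system; `node_for_key` and `partition` are hypothetical helper names.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a storage node for a key using a stable hash (same key, same node)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(records: dict[str, str], num_nodes: int) -> dict[int, dict[str, str]]:
    """Spread records across nodes so that no single node holds all the data."""
    nodes: dict[int, dict[str, str]] = {i: {} for i in range(num_nodes)}
    for key, value in records.items():
        nodes[node_for_key(key, num_nodes)][key] = value
    return nodes


if __name__ == "__main__":
    records = {f"record-{i}": f"payload-{i}" for i in range(10)}
    for node_id, stored in partition(records, num_nodes=3).items():
        print(node_id, sorted(stored))
```

Because the hash is deterministic, any machine can locate a record from its key alone, without consulting a central index.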

Page 27: Data mining with bigdata

Advantages:

Fast response

Extract useful information

Prediction of required data from a large amount of data.

Delivery of better results in the form of visualization.

Page 28: Data mining with bigdata

Conclusion

We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is the potential to make faster advances in many scientific disciplines and to improve the profitability and success of many enterprises by using technologies such as Hadoop and Pig.

The proposed system will be fully serviceable across a large variety of application domains, since it is not cost-effective to address them in the context of one domain alone.

Furthermore, this system will provide fully transformative solutions and will naturally address the next generation of industrial applications. We must support and encourage this proposed framework in addressing the technical challenges of unstructured data if we are to achieve the promised benefits of Big Data.
