big data analytics

POORNIMA UNIVERSITY. Submitted by: Nitesh Saxena, M.TECH (CE). SEMINAR PRESENTATION ON: BIG DATA ANALYTICS. Submitted to: Ass. Prof. Nidhi Mishra

Upload: nitesh-saxena

Post on 14-Aug-2015


Page 1: Big data analytics

POORNIMA UNIVERSITY

Submitted by: Nitesh Saxena

M.TECH (CE)

SEMINAR PRESENTATION ON: BIG DATA ANALYTICS

Submitted to: Ass. Prof. Nidhi Mishra

Page 2: Big data analytics

CONTENT

1. Introduction
2. List of papers
3. Review process adopted
4. List of issues
5. List of solution approaches
6. Issue wise review
7. Strengths and Weaknesses
8. Scope of our work
9. Conclusion
10. References

Page 3: Big data analytics

INTRODUCTION

Human beings now create 2.5 quintillion bytes of data per day. The rate of data creation has increased so much that 90% of the data in the world today has been created in the last two years alone.

The term Big Data refers to large scale information management and analysis technologies that exceed the capability of traditional data processing technologies.

The incorporation of Big Data is changing Business Intelligence and Analytics by providing new tools and opportunities for leveraging large quantities of structured and unstructured data.

Big data analysis means the efficient and effective handling of large data.

Page 4: Big data analytics

LIST OF PAPERS

1) “Mobile Agent based New Framework for Improving Big Data Analysis” (2013)
2) “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis” (2013)
3) “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things” (2013)
4) “Road Traffic Big Data Collision Analysis Processing Framework” (2012)
5) “RUBA: Real-time Unstructured Big Data Analysis Framework” (2013)
6) “An Integrated Framework for Disaster Event Analysis in Big Data Environment” (2013)

Page 5: Big data analytics

7) “Large Imbalance Data Classification Based on MapReduce for Traffic Accident Prediction” (2014)
8) “Addressing Big Data Problem Using Hadoop and Map Reduce” (2012)
9) “Big R: Large-scale Analytics on Hadoop using R” (2014)
10) “High Performance and Fault Tolerant Distributed File System for Big Data Storage and Processing using Hadoop” (2014)
11) “Big Data Analysis Using Apache Hadoop” (2012)
12) “5Ws model for Big Data Analysis and Visualization” (2013)
13) “IRIS recognition on Hadoop: a biometrics system implementation on cloud computing” (2012)

Page 6: Big data analytics

14) “Log analysis in cloud computing environment with Hadoop and Spark” (2011)
15) “Minimizing Big Data Problems using Cloud Computing Based on Hadoop Architecture” (2012)
16) “Big R: Large-scale Analytics on Hadoop using R” (2014)
17) “Access Security on Cloud Computing Implemented in Hadoop System” (2012)
18) “Big Data Analysis Using Apache Hadoop” (2012)
19) “Applying Hadoop’s MapReduce Framework on Clustering the GPS Signals through Cloud Computing” (2011)
20) “IRIS recognition on Hadoop: a biometrics system implementation on cloud computing” (2012)

Page 7: Big data analytics

21) “Mass Log Data Processing and Mining Based on Hadoop and Cloud Computing” (2011)
22) “H2T: A simple Hadoop-to-Twister Translator for Cloud Computing” (2012)
23) “An In-depth Study of Map Reduce in Cloud Environment” (2014)
24) “Optimizing Multiway Joins in a Map-Reduce Environment” (2012)
25) “Comparing Map-Reduce and FREERIDE for Data-Intensive Applications” (2013)

Page 8: Big data analytics

Review process adopted

• There are five stages in the review process:

1. Stage 0
2. Stage 1
3. Stage 2
4. Stage 3
5. Stage 3+

Page 9: Big data analytics

Stage 0 – “Get a feel”

In this stage, we collect the source material: conference research papers.

Stage 1 – “Get the picture”

We build an overall picture of our research area from the collected material.

Page 10: Big data analytics

Stage 2 – “Get the detail”

We record the details of each paper, such as its title, issue, and solution approach, and determine what we are looking for and where to find it.

Stage 3 – “Evaluate the detail”

Here we examine each solution approach in detail: its algorithm, methodology, mathematical explanation, and assumptions.

Page 11: Big data analytics

Stage 3+ – “Synthesize”

Here we synthesize our review (topic, issue, solution approach, mathematical explanation of the solution approach, and type of research) and identify alternative approaches.

Page 12: Big data analytics

LIST OF ISSUES

These papers present different issues, which are listed below:

Paper no.: 1, 2, 3, 12, 13, 14, 15, 16
Issue: Big data analysis

Paper no.: 4, 6, 7, 17, 18, 19, 20
Issue: Real time big data analysis using Hadoop in cloud computing

Paper no.: 5, 8, 9, 10, 11, 21, 22, 23, 24, 25
Issue: Classification of big data using tools and frameworks

Page 13: Big data analytics

LIST OF SOLUTION APPROACHES

Paper no.: 1, 2, 3, 12, 13, 14, 15, 16
Issue: Big data analysis
Solutions:

1) MapReduce Agent Mobility (MRAM), used to overcome the drawbacks of Hadoop.

2) A new plug-in system, PuntStore, with pLSM (Punt Log Structured Merge Tree), which improves read and write throughput in NoSQL databases. COLA (Cache Oblivious Look-ahead Array) is also used for efficient insertion and range queries.

3) “IOT-StatisticDB”, a statistical database cluster mechanism that can support complicated statistical queries through PostgreSQL 8.2.4.

12) A 5Ws model to analyze big data attributes, patterns, and densities between data.
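The density idea behind the 5Ws model can be illustrated with a small sketch. The slides do not define the five dimensions or how density is computed, so the field names (what, why, where, when, who) and the definition of density as a value's share of all records are assumptions made only for illustration, not the paper's method.

```python
from collections import Counter

# Hypothetical dimension names; the slides only say "5Ws".
DIMENSIONS = ("what", "why", "where", "when", "who")

def densities(records, dimension):
    """Share of records taking each value along one dimension (assumed definition)."""
    counts = Counter(record[dimension] for record in records)
    total = len(records)
    return {value: n / total for value, n in counts.items()}

# Made-up example records, one dict per data item.
records = [
    {"what": "login", "why": "auth", "where": "AU", "when": "2013-01", "who": "u1"},
    {"what": "login", "why": "auth", "where": "AU", "when": "2013-01", "who": "u2"},
    {"what": "scan", "why": "probe", "where": "US", "when": "2013-02", "who": "u3"},
    {"what": "login", "why": "auth", "where": "IN", "when": "2013-02", "who": "u1"},
]

where_density = densities(records, "where")
```

Comparing such per-dimension shares across data sets is one simple way to look for patterns, such as an unusual concentration of traffic from one location.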

Page 14: Big data analytics

LIST OF SOLUTION APPROACHES

Paper no.: 4, 6, 7, 17, 18, 19, 20
Issue: Real time big data analysis using Hadoop in cloud computing
Solutions:

4) The Road Traffic Big Data Collision Analysis Processing Framework proposes a distributed CEP (complex event processing) that dynamically distributes the event-processing load for road traffic events.

6) An integrated framework using co-occurring theory and a Markov chain approach to find probabilities.

7) The Hadoop framework and a sampling method for removing the imbalance in data.
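The sampling method listed for paper 7 is not specified in the slides. As a minimal sketch of one common option, random undersampling, the code below drops records from larger classes until every class matches the rarest one. The function name, the `label_of` callback, and the example data are hypothetical, and the paper may well use a different sampling scheme.

```python
import random

def undersample(records, label_of, seed=0):
    """Randomly keep as many records per class as the rarest class has."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    target = min(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rng.sample(rows, target))
    return balanced

# Made-up data: 90 "no accident" records (label 0), 10 "accident" records (label 1).
records = [(i, 0) for i in range(90)] + [(i, 1) for i in range(10)]
balanced = undersample(records, label_of=lambda r: r[1])
```

Undersampling discards data, so on a Hadoop cluster it also shrinks the job; oversampling the rare class is the usual alternative when no records can be thrown away.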

Page 15: Big data analytics

LIST OF SOLUTION APPROACHES

Paper no.: 5, 8, 9, 10, 11, 21, 22, 23, 24, 25
Issue: Classification of big data using tools and frameworks
Solutions:

• Hadoop Distributed File System (HDFS), Hadoop cluster
• MapReduce programming framework
• Visual clustering analysis
• RUBA (Real-time Unstructured Big Data Analysis) framework
• Apache Hadoop

Page 16: Big data analytics

Issue-Wise Findings

Issue 1: Big Data Analysis

• Worked to improve big data analysis and overcome the drawbacks of Hadoop.

• Designed and developed the MapReduce Agent Mobility (MRAM) which is based on the Java Agent Development Framework (JADE).

• Discussed a few research works on big data analysis using Hadoop and described Hadoop's drawbacks in performance and reliability for big data analysis.

• Designed and developed a new plug-in system PuntStore with pLSM (Punt Log Structured Merge Tree) index engine to provide scalable and efficient index services for real-time data analysis.

• The Punt LSM (pLSM) can satisfy the needs for performing index probes in write optimized systems.
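For readers unfamiliar with log-structured merge trees, here is a minimal sketch of the general LSM idea that pLSM builds on. It is a generic toy, not PuntStore or pLSM: the class name, the flush threshold, and the in-memory "runs" standing in for on-disk files are all assumptions.

```python
import bisect

class TinyLSM:
    """Toy LSM-tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # newest writes, unsorted
        self.runs = []                        # sorted runs, newest first

    def put(self, key, value):
        # Writes only touch the memtable, which keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable as one immutable run sorted by key.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:                 # the newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value per key.
        merged = {}
        for run in reversed(self.runs):       # oldest first, newer overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Usage: `db = TinyLSM(memtable_limit=2)`, then `db.put("a", 1)` and `db.get("a")`. A lookup may have to probe several runs, which is why index probes are the expensive side of a write-optimized design.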

Page 17: Big data analytics

Issue 2: Real time big data analysis using Hadoop in cloud computing

• Worked to solve the road traffic collision problem through big data analysis and processing.

• Tested the proposed framework on road traffic data from a 45-mile section of the I-880N freeway in California, USA, integrating freeway traffic big data and collision data over a ten-year period (1 TB in size) to obtain the collision probability.

• Worked on real-time analysis and dynamic modification in unstructured big data analysis.

• Identified that the number of compute nodes becomes insufficient as the number of map tasks increases with growing dataset size.

• Hadoop lets users write distributed software easily even if they know nothing about the underlying infrastructure.

• A Markov chain with transition probabilities was applied to the random variables of cubes, and the result was used to find the probability of disaster events.
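The Markov chain step can be sketched as follows. The two states and the transition probabilities are made-up numbers for illustration; the slides do not give the paper's actual states, cubes, or probabilities.

```python
def step(dist, P):
    """One Markov step: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_after(dist, P, steps):
    """State distribution after the given number of steps."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Hypothetical two-state chain: state 0 = "normal", state 1 = "disaster event".
# Each row of P sums to 1; P[i][j] is the probability of moving from i to j.
P = [[0.9, 0.1],
     [0.5, 0.5]]
start = [1.0, 0.0]  # begin in the "normal" state with certainty

after_two = distribution_after(start, P, 2)  # after_two[1] is about 0.14
```

Starting from "normal", the chance of being in the "disaster event" state is 0.1 after one step and 0.9 × 0.1 + 0.1 × 0.5 = 0.14 after two.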

Page 18: Big data analytics

Issue 3: Classification of big data using tools and frameworks

• Worked to investigate database-kernel-level parallel statistical analysis techniques for massive sensor sampling data in the Internet of Things.

• Statistical analysis of sensor sampling data, as supported by the General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things (“IOT-StatisticDB”), is one of the most important procedures in IoT systems for transforming “data” into “knowledge”.

• Designed and developed a 5Ws model to analyze big data attributes, patterns, and densities between data.

• Hadoop Distributed File System (HDFS), Hadoop cluster.

• MapReduce programming framework.
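The MapReduce programming model named above can be sketched in a single process with the classic word-count example. This is plain Python, not the Hadoop API; the function names are illustrative, and a real cluster would run the map and reduce phases in parallel across nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "Big analysis"])
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread over many machines without coordination beyond the shuffle.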

Page 19: Big data analytics

STRENGTHS

• Addresses the failure of the centralized master node and the fault tolerance of the system in Hadoop.

• MRAM increases data analysis performance compared to Hadoop.

• Replaces MySQL with NoSQL, increasing read and write throughput and making searching, insertion, and deletion in the database easier.

• Provides parallel statistical analysis techniques for massive sensor sampling data in the Internet of Things.

• Solves the problem of sampling sensor data in parallel and distributed systems.

• Provides information about big data patterns and visualization using the 5Ws model.

• The 5Ws model and its applications can identify attackers' locations or IP addresses.

Page 20: Big data analytics

• Many kinds of real-time big data analysis can be done using Hadoop clustering techniques.

• Hadoop and HBase techniques can be used to analyze real-time road traffic collision data.

• CEP analysis can be used to analyze unstructured big data, such as CCTV data, and process it in a distributed system.

• Information about the current situation of a disaster event can be obtained.

Page 21: Big data analytics

WEAKNESSES

• Event analysis methods cannot yet provide fast and reliable insight into real-time data.

• MRAM is built on the Java Agent Development Framework (JADE), which makes it more complex to develop.

• pLSM NoSQL requires more storage space and memory.

• It is difficult to apply statistical analysis methods to unstructured data in a parallel and distributed environment.

• Obtaining useful traffic data from loop detectors is difficult.

Page 22: Big data analytics

SCOPE OF OUR WORK

• Further work can be done on Hadoop techniques such as MapReduce, HDFS, and HBase to process distributed data using the MRAM framework.

• We can apply the RUBA framework to the fields of U-city, U-plant and ITS.

• In future we can extend the 5Ws model by applying density classification to more areas and more data sets, and by using Gapminder’s visualization techniques.

• We can improve the current disaster event analysis methods to give faster and more reliable insight.

• Future work will focus on performance evaluation and modeling of Hadoop data-intensive applications on cloud platforms like Amazon Elastic Compute Cloud (EC2).

Page 23: Big data analytics

Conclusion

We have reviewed 25 research papers published between 2011 and 2014 on Big Data Analysis. The review process consists of a five-stage analysis (Stage 0 to Stage 3+). We found three main issues in the field of Big Data: big data analysis tools, classification of big data using tools and frameworks, and real time big data analysis.

After examining the solution approaches, we concluded that Big Data Analysis is the main area in which future work can be done. Of the many solution approaches we found, MapReduce Agent Mobility (MRAM), PuntStore with pLSM (Punt Log Structured Merge Tree), the “IOT-StatisticDB” statistical database cluster mechanism, and visual clustering analysis are the most promising because of their advantages and properties.

Page 24: Big data analytics

References

1) Youssef M. Essa, Gamal Attiya and Ayman El-Sayed, “Mobile Agent based New Framework for Improving Big Data Analysis”, 978-1-4799-2829-3/13, © 2013 IEEE.

2) Jin Wang, Yong Zhang, Yang Gao, Chunxiao Xing, “pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis”, 2013 IEEE 37th Annual Computer Software and Applications Conference.

3) Jinson Zhang, “5Ws model for big data analysis and visualization”, 2013 IEEE 16th International Conference on Computational Science and Engineering.

4) Zhiming Ding, Xu Gao, Jiajie Xu, and Hong Wu, “IOT-StatisticDB: A General Statistical Database Cluster Mechanism for Big Data Analysis in the Internet of Things”, 978-0-7695-5046-6/13, © 2013 IEEE.

5) Duckwon Chung, Xuhua Rui, Dugki Min, Hwasoo Yeo, “Road Traffic Big Data Collision Analysis Processing Framework” (2013).

Page 25: Big data analytics

6) Pyke Tin, Thi Thi Zin, Takashi Toriu, “An Integrated Framework for Disaster Event Analysis in Big Data Environments”, 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing.

7) Jaein Kim, Nacwoo Kim, Joonho Park, Kwangik Seo, “RUBA: Real-time Unstructured Big Data Analysis Framework”, 978-1-4799-0698-7/13, © 2013 IEEE.


Page 28: Big data analytics

Thank you