Big Data - Hadoop - MapReduce


Upload: avadhani-mk

Posted on 08-Feb-2017


TRANSCRIPT

Page 1: Big data- hadoop -MapReduce

Department of Mechanical Engineering

Humility, Entrepreneurship, Teamwork, Learning, Social Responsibility, Respect for Individual, Deliver The Promise

GMR Institute of Technology, Rajam

Term Paper: Final Review

GMR Institute of Technology
An Autonomous Institute Affiliated to JNTUK, Kakinada


Department of Computer Science Engineering

Page 2: Big data- hadoop -MapReduce


TITLE: Performance Analysis of MapReduce Task in Big Data Using Hadoop

May 1, 2023

by M. S. V. S. K. Avadhani (14341A05A4)

Under the guidance and supervision of

Mrs. K. Jayasri, Assistant Professor, Department of Computer Science Engineering

Page 3: Big data- hadoop -MapReduce


ABSTRACT

Big Data is a volume of data so large that it cannot be managed by traditional data management systems.

Data can take three forms: structured, unstructured, and semi-structured. Most big data is unstructured, and unstructured data is difficult to handle.

Hadoop is a technological answer to Big Data. The Apache Hadoop project provides tools and techniques to handle this huge amount of data: the Hadoop Distributed File System (HDFS) for storage and the MapReduce technique for processing.

This paper discusses work done on Hadoop by giving a number of files as input to the system and then analysing Hadoop's performance.

Page 4: Big data- hadoop -MapReduce


ABSTRACT (contd.)

It also discusses the behaviour of the map and reduce methods as the number of files increases, and the number of bytes written and read by these tasks.

Keywords: Big Data, Hadoop, HDFS, MapReduce.


Page 6: Big data- hadoop -MapReduce


Hadoop

• Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of commodity hardware.

• Storage: HDFS (Hadoop Distributed File System)
• Processing: MapReduce


Page 7: Big data- hadoop -MapReduce


• HDFS: A file system specially designed for storing huge data sets on clusters of commodity hardware, with a streaming access pattern.

• 5 services (the Hadoop 1.x daemons): NameNode, Secondary NameNode, and DataNode for storage; JobTracker and TaskTracker for MapReduce processing.
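To illustrate the storage side, the sketch below splits a file into fixed-size blocks and assigns each block's replicas to DataNodes, which is the kind of metadata the NameNode keeps. It assumes the Hadoop 2.x default block size of 128 MB and the default replication factor of 3. The function `split_into_blocks`, the DataNode names, and the round-robin placement are hypothetical simplifications; real HDFS uses a rack-aware placement policy.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 128 * 1024 * 1024  # assumed: Hadoop 2.x default block size (128 MB)
REPLICATION = 3                 # assumed: default replication factor

@dataclass
class Block:
    index: int
    size: int            # bytes in this block (the last block may be smaller)
    replicas: List[str]  # DataNodes holding a copy of this block

def split_into_blocks(file_size: int, datanodes: List[str],
                      block_size: int = BLOCK_SIZE,
                      replication: int = REPLICATION) -> List[Block]:
    """Split a file of file_size bytes into blocks and place their replicas.

    Hypothetical round-robin placement; real HDFS placement is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    blocks: List[Block] = []
    offset, index = 0, 0
    while offset < file_size:
        size = min(block_size, file_size - offset)
        # Choose `replication` distinct DataNodes starting at a rotating position.
        replicas = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(index, size, replicas))
        offset += size
        index += 1
    return blocks

if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for b in split_into_blocks(300 * 1024 * 1024, nodes):  # a 300 MB file
        print(b.index, b.size // (1024 * 1024), "MB", b.replicas)
```

A 300 MB file becomes two full 128 MB blocks and one 44 MB block, each stored on three different DataNodes, so losing a single node does not lose any block.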


Page 9: Big data- hadoop -MapReduce


• MapReduce: MapReduce is a processing technique and programming model for distributed computing, based on Java.

The MapReduce algorithm contains two important tasks, namely Map and Reduce.

Map stage: The map (mapper) job processes the input data and produces intermediate key/value pairs.

Reduce stage: This stage combines the Shuffle step and the Reduce step; the reducer processes the intermediate data coming from the mappers and produces the final output.
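The two stages can be illustrated with the word count application used in this analysis. The following is a single-process Python sketch of the data flow, not Hadoop's Java API: `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names that mimic what the mapper, the shuffle/sort step, and the reducer do across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, in key order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return [reduce_phase(k, vs) for k, vs in grouped.items()]

if __name__ == "__main__":
    for word, count in word_count(["how are you", "how is hadoop"]):
        print(f"{word}\t{count}")
```

In a real cluster many mappers run in parallel over HDFS blocks and the framework moves the intermediate pairs across the network during the shuffle; the sketch keeps only the logical order of the steps.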


Page 11: Big data- hadoop -MapReduce


• The performance of the MapReduce task, in terms of bytes written, file bytes read, and reduce input records, has been recorded in the table beside.

• The number of bytes written by the MapReduce task does not increase at the same rate as the number of files.


Page 12: Big data- hadoop -MapReduce


The reason is that when the reduce function reduces the map output, it merges the records that share a key. For example, when the word "how" appears twice, it is saved only once, with its count increased by one, instead of being written as a second record.
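This reasoning can be checked with a small experiment on synthetic data (not the data behind the table): run a word count over 1, 2, 4 and 8 input files that all contain the same text, and compare input bytes, reduce input records and output bytes. `SAMPLE_TEXT` and `run_word_count` are hypothetical, the measurements are plain local computations rather than Hadoop job counters, and the assumption that every file shares one vocabulary is what keeps the output nearly flat.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sample file; every synthetic file reuses this text,
# so all files share the same vocabulary ("how" appears twice, as in the slide's example).
SAMPLE_TEXT = "how are you how is hadoop\nhadoop stores big data\n"

def run_word_count(files: List[str]) -> Dict[str, int]:
    """Word count over in-memory 'files', returning simple byte/record measurements."""
    bytes_read = sum(len(f.encode("utf-8")) for f in files)
    # Without a combiner, every (word, 1) pair emitted by a mapper reaches a reducer.
    map_output = [word for f in files for word in f.split()]
    reduce_input_records = len(map_output)
    counts = Counter(map_output)
    # One output line per distinct word, in "word<TAB>count" form.
    output = "".join(f"{w}\t{c}\n" for w, c in sorted(counts.items()))
    return {
        "files": len(files),
        "bytes_read": bytes_read,
        "reduce_input_records": reduce_input_records,
        "bytes_written": len(output.encode("utf-8")),
    }

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(run_word_count([SAMPLE_TEXT] * n))
```

Going from one file to eight multiplies the bytes read and the reduce input records by eight, while the bytes written only grow from 54 to 56, because extra occurrences of an existing word change its count rather than add a line. Files that introduce new words would add output lines, so how flat the curve stays depends on how much the files' vocabularies overlap.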


Page 13: Big data- hadoop -MapReduce


Conclusion: We have analyzed the performance of the MapReduce task as the number of files increases, using the word count application of MapReduce for this analysis. The output shows that the bytes written do not increase in the same proportion as the number of files.



Page 15: Big data- hadoop -MapReduce


Thank you.

- Avadhani M. K.