Big Data: Hadoop MapReduce
Department of Mechanical Engineering
Humility, Entrepreneurship, Teamwork, Learning, Social Responsibility, Respect for Individual, Deliver The Promise
GMR Institute of Technology, Rajam
Term Paper Final Review
GMR Institute of Technology
An Autonomous Institute Affiliated to JNTUK, Kakinada
Department of Computer Science Engineering
Performance Analysis of MapReduce Tasks in Big Data Using Hadoop
by M. S. V. S. K. Avadhani (14341A05A4)
Under the guidance and supervision of
Mrs. K. Jayasri, Assistant Professor, Department of Computer Science Engineering
May 1, 2023
ABSTRACT: Big Data refers to data volumes too large to be managed by traditional data management systems.
Data can take three forms: structured, unstructured, and semi-structured. Most big data is unstructured, and unstructured data is difficult to handle. Hadoop is a technological answer to Big Data: the Apache Hadoop project provides tools and techniques for handling such huge amounts of data.
It offers the Hadoop Distributed File System (HDFS) for storage and the MapReduce technique for processing.
This paper discusses work done on Hadoop by supplying a number of files as input to the system and then analysing Hadoop's performance.
ABSTRACT (contd.)
It also discusses how the map and reduce methods behave as the number of input files increases, and the number of bytes written and read by these tasks.
Keywords: Big Data, Hadoop, HDFS, MapReduce.
Hadoop
• Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of commodity hardware.
• Storage: HDFS (Hadoop Distributed File System)
• Processing: MapReduce
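• HDFS: a file system specially designed for storing huge data sets on clusters of commodity hardware, with a streaming access pattern.
Five services (Hadoop 1.x daemons): NameNode, Secondary NameNode, JobTracker, DataNode and TaskTracker. The NameNode, Secondary NameNode and JobTracker run on the master node; a DataNode and a TaskTracker run on each slave node.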
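The sketch below makes the storage side concrete. It is a minimal Python illustration, not HDFS code, of how a file is cut into fixed-size blocks and how each block is replicated on several DataNodes. The NameNode keeps this kind of block-to-DataNode mapping as metadata.

Some details are hypothetical simplifications:
- The function place_blocks and the node names dn1 to dn4 are made up for illustration.
- Round-robin placement stands in for the rack-aware placement policy that real HDFS uses.

The 128 MB block size and replication factor of 3 match the defaults in Hadoop 2.x and later. Hadoop 1.x defaulted to 64 MB blocks.

```python
# Simplified model of HDFS block splitting and replication.
# Hypothetical: round-robin placement replaces HDFS's rack-aware policy.
from dataclasses import dataclass, field


@dataclass
class BlockPlacement:
    block_id: int
    size: int                                   # bytes stored in this block
    datanodes: list = field(default_factory=list)


def place_blocks(file_size: int, block_size: int, datanodes: list,
                 replication: int = 3) -> list:
    """Split file_size bytes into blocks and assign each block to
    `replication` distinct DataNodes, chosen round-robin."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    blocks = []
    offset, block_id = 0, 0
    while offset < file_size:
        size = min(block_size, file_size - offset)  # last block may be short
        start = block_id % len(datanodes)
        replicas = [datanodes[(start + i) % len(datanodes)]
                    for i in range(replication)]
        blocks.append(BlockPlacement(block_id, size, replicas))
        offset += size
        block_id += 1
    return blocks


if __name__ == "__main__":
    MB = 1024 * 1024
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    # A 300 MB file with 128 MB blocks gives blocks of 128, 128 and 44 MB.
    for b in place_blocks(300 * MB, 128 * MB, nodes):
        print(b.block_id, b.size // MB, b.datanodes)
```

Running it prints one line per block. A 300 MB file yields blocks of 128, 128 and 44 MB, each stored on three different DataNodes, so losing any single DataNode loses no data.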
• MapReduce: MapReduce is a processing technique and programming model for distributed computing; Hadoop's implementation of it is written in Java.
A MapReduce job consists of two main tasks, Map and Reduce.
Map stage: the mapper processes the input data, turning each input record into intermediate key/value pairs.
Reduce stage: this stage combines the Shuffle stage and the Reduce stage proper. The shuffle groups the mapper output by key, and the reducer processes each group to produce the final output.
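The three stages can be traced with a small word count example. The sketch below is plain Python that mimics the data flow of a MapReduce job in a single process. It is not Hadoop's Java API, and the function names map_phase, shuffle_phase, reduce_phase and word_count are hypothetical.

```python
# Single-process simulation of the MapReduce data flow for word count.
# Not Hadoop's API: it only mirrors the order map -> shuffle -> reduce.
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group all intermediate values by key, keys sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(word_count(["how are you", "how is it going"]))
    # {'are': 1, 'going': 1, 'how': 2, 'is': 1, 'it': 1, 'you': 1}
```

On a real Hadoop cluster the map calls run in parallel, usually on the nodes holding the input blocks, and the shuffle moves intermediate pairs over the network to the reducers. This single-process version only preserves the order of the steps.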
• The performance of the MapReduce task, in terms of bytes written, file bytes read and reduce input records, has been recorded in the accompanying table.
• The number of bytes written by the MapReduce task does not grow at the same rate as the number of input files.
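The trend described above can be reproduced in miniature. The sketch below assumes every input file has the same small content (SAMPLE_FILE, a hypothetical value) and measures the size of the word count output for 1, 10 and 100 files.

It only measures the final reducer output in a word<TAB>count layout. Hadoop's own byte counters also account for intermediate map output, so these numbers are illustrative and do not reproduce the paper's table.

```python
# Hypothetical experiment: N copies of the same small file through word count.
from collections import Counter

SAMPLE_FILE = "how are you\nhow is it going\n"  # assumed file content


def word_count_bytes(num_files):
    """Return (input bytes, output records, output bytes) for num_files copies."""
    files = [SAMPLE_FILE] * num_files
    input_bytes = sum(len(f.encode("utf-8")) for f in files)
    counts = Counter(word for f in files for word in f.split())
    # Output mimics "word<TAB>count\n", one line per distinct word.
    output = "".join(f"{w}\t{c}\n" for w, c in sorted(counts.items()))
    return input_bytes, len(counts), len(output.encode("utf-8"))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, word_count_bytes(n))
```

With this input, 1, 10 and 100 files give (28, 6, 36), (280, 6, 42) and (2800, 6, 48). The input grows a hundredfold, while the output stays at six records and grows by only 12 bytes, because only the digits of the counts get longer.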
The reason is that the reduce function combines the map output, merging repeated keys into one record. For example, if the word "how" occurs twice, it is saved only once, with its count increased by one for each occurrence, rather than written out twice.
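The "increased by one" behaviour is easiest to see in a reducer written in the style used by Hadoop Streaming. There, the reducer receives map output as word<TAB>1 lines already sorted by key. The function reduce_sorted below is a hypothetical sketch under that assumption: because equal words arrive next to each other, it keeps one running count and writes a single record per word.

```python
# Hadoop Streaming-style reducer sketch: input lines "word\t1" are sorted
# by key, so repeated words are adjacent and fold into one output record.
def reduce_sorted(lines):
    current_word, current_count = None, 0
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        if word == current_word:
            current_count += int(count)            # same word: add its 1
        else:
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield f"{current_word}\t{current_count}"    # flush the last word


if __name__ == "__main__":
    # Under Hadoop Streaming the lines would be read from sys.stdin instead.
    sorted_map_output = ["are\t1", "how\t1", "how\t1", "you\t1"]
    for record in reduce_sorted(sorted_map_output):
        print(record)
```

For the input shown, the two "how" lines collapse into the single record how<TAB>2. This is why the reducer writes fewer records than the mapper emits.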
Conclusion: We have analysed the performance of the MapReduce task as the number of files increases, using the MapReduce word count application. The results show that the bytes written do not increase in proportion to the increase in the number of files.
Thank you.
- Avadhani M. K.