hadoop
A Study of Hadoop in Map-Reduce
Poumita Das, Shubharthi Dasgupta, Priyanka Das
What is Big Data?
Big data is an evolving term that describes any voluminous amount of
structured, semi-structured and unstructured data that has the
potential to be mined for information.
The 3 V’s
Why DFS
An introduction to Map-Reduce
Map-Reduce programs are designed to process large volumes of data in a
parallel fashion. There are 3 steps:
• Map
• Shuffle
• Reduce
Map-Reduce continued
Map, Shuffle, Reduce
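The three steps can be illustrated with a small single-process sketch. This is not Hadoop code: it is a plain Python word count, chosen here only as an example, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. On a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, which is what lets a framework spread the work over many nodes.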
What is Hadoop?
Apache Hadoop is a framework
that allows for the distributed
processing of large data sets
across clusters of commodity
computers using a simple
programming model.
Hadoop core components
• NameNode
• DataNode
• Client
• User
• Job tracker
• Task tracker
NameNode
The NameNode maintains the namespace tree and the mapping of
blocks to DataNodes. In a cluster there may exist hundreds or even
thousands of DataNodes.
The Secondary NameNode periodically merges the NameNode's edit log into the
filesystem image (a checkpoint) and writes the result to persistent storage.
However, it is NOT a substitute for the NameNode.
DataNode
On startup, a DataNode connects to the NameNode, retrying until that
service comes up. It then responds to requests from the NameNode for
filesystem operations.
Client applications can talk directly to a DataNode, once the NameNode has
provided the location of the data.
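The division of labour described here (the NameNode holds only metadata, the DataNodes hold the bytes) can be sketched as a toy in-memory model. The class and method names below (ToyDataNode, ToyNameNode, get_block_locations, read_file) are hypothetical and do not correspond to the real Hadoop API; the sketch also assumes the client simply reads the first replica of each block.

```python
class ToyDataNode:
    """Holds block contents, keyed by block ID."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds metadata only: which blocks make up a file, and where they live."""

    def __init__(self):
        self.files = {}            # path -> ordered list of block IDs
        self.block_locations = {}  # block_id -> list of ToyDataNode replicas

    def get_block_locations(self, path):
        return [(b, self.block_locations[b]) for b in self.files[path]]


def read_file(namenode, path):
    """Ask the NameNode where the blocks are, then read from DataNodes directly."""
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        data += replicas[0].read_block(block_id)  # assumption: use first replica
    return data


if __name__ == "__main__":
    dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
    dn1.blocks[1] = b"hello "
    dn2.blocks[2] = b"world"
    nn = ToyNameNode()
    nn.files["/demo.txt"] = [1, 2]
    nn.block_locations = {1: [dn1], 2: [dn2]}
    print(read_file(nn, "/demo.txt"))
```

Note that no file data ever passes through the toy NameNode, which mirrors why a single NameNode can serve a cluster with many DataNodes.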
HDFS client
User applications access the filesystem using the HDFS client. A client mainly
performs 3 operations:
• Creating a new file
• File read
• File write
Creating a new file
File read
HDFS implements a single-writer, multiple-reader model.
That is, many clients can read a file in parallel,
while only one client can write to it at a time.
File write
An HDFS file consists of blocks.
When there is a need for a new
block, the NameNode allocates
a block with a unique block ID
and determines a list of
DataNodes to host replicas of
the block.
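A minimal sketch of this allocation step follows. It rests on several assumptions that are not in the slides: the class name ToyNameNode, a replication factor of 3, sequential integer block IDs, and random replica placement (real HDFS placement is rack-aware and more involved).

```python
import itertools
import random


class ToyNameNode:
    """Toy model of block allocation; not the real HDFS implementation."""

    def __init__(self, datanodes, replication=3, seed=None):
        self.datanodes = list(datanodes)
        self.replication = replication     # assumed replication factor
        self._ids = itertools.count(1)     # source of unique block IDs
        self._rng = random.Random(seed)
        self.file_blocks = {}              # path -> ordered list of block IDs
        self.block_replicas = {}           # block_id -> DataNodes hosting it

    def allocate_block(self, path):
        """Allocate a new block for `path` and choose DataNodes for its replicas."""
        block_id = next(self._ids)
        count = min(self.replication, len(self.datanodes))
        targets = self._rng.sample(self.datanodes, count)  # distinct nodes
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_replicas[block_id] = targets
        return block_id, targets


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], seed=0)
    print(nn.allocate_block("/logs/part-0"))
    print(nn.allocate_block("/logs/part-0"))
```

The writing client would then send the block's data to the returned DataNodes; that transfer is outside the scope of this sketch.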
Job tracker and task tracker
Hadoop ecosystem
• PIG
• HIVE
• MAHOUT
A Sample Program
The Output
Why Anagrams?
• Started out as a simple relaxation game, finding anagrams in sentences
• Games and Puzzles like Scrabble
• Ciphers, such as permutation and transposition ciphers
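The sample program itself is not reproduced in this transcript. As a hedged illustration of how anagram finding fits the Map-Reduce model, the sketch below uses the common approach of keying each word by its sorted letters, so that all anagrams of one another share a key. It is plain Python rather than Hadoop code, and the names map_word and find_anagrams are hypothetical.

```python
from collections import defaultdict


def map_word(word):
    """Map: emit (sorted letters, word); anagrams share the same key."""
    key = "".join(sorted(word.lower()))
    return key, word


def find_anagrams(words):
    """Group words by key (shuffle), then keep groups with 2+ words (reduce)."""
    groups = defaultdict(list)
    for word in words:
        key, value = map_word(word)
        groups[key].append(value)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    print(find_anagrams(["listen", "silent", "enlist", "google", "tinsel"]))
```

Words with no anagram partner in the input, such as "google" above, are dropped in the reduce step.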
Future scope
Given the wide range of applications of Hadoop, we have certain
graph-searching problems in mind that would be much easier to solve
with the help of the Map-Reduce engine.
Thank you