Understanding Hadoop Framework
TRANSCRIPT
7/27/2019 Understanding Hadoop framework
http://slidepdf.com/reader/full/understanding-hadoop-framework 1/31
Course Topics
Week 1 – Understanding Big Data
– Introduction to HDFS
Week 2 – Playing around with Cluster
– Data loading Techniques
Week 3 – Map-Reduce Basics, types and formats
– Use-cases for Map-Reduce
Week 4 – Analytics using Pig
– Understanding Pig Latin
Week 5 – Analytics using Hive
– Understanding HIVE QL
Week 6 – NoSQL Databases
– Understanding HBASE
Week 7 – Real world Datasets and
– Hadoop Project Environment
Week 8 – Project Reviews
– Planning a career in Big Data
How it works
Live classes
Class recordings
Module-wise Quizzes, Coding Assignments
24x7 on-demand technical support
Project work on large Datasets
Online certification exam
Lifetime access to the Learning Management System
Complimentary Java Classes
What is Big Data?
Facebook Example
Facebook users spend 10.5 b
(almost 20,000 years) online
network
Facebook has an average of
comments are posted every
Twitter Example
Twitter has over 500 million re
users.
The USA, whose 141.8 million represents 27.4 percent of all T
good enough to finish well ahe
Japan, the UK and Indonesia.
79% of US Twitter users are mo
recommend brands they follow
67% of US Twitter users are mo
buy from brands they follow
57% of all companies that use
for business use Twitter
Other Industrial Use Cases
• Insurance
• Healthcare
• Retail
– Recommendations
– Groupings
• Genome Sequencing
• Utilities
Hadoop Users
http://wiki.apache.org/hadoop/PoweredBy
Data volume is growing exponentially
• Estimated Global Data Volum
– 2011: 1.8 ZB
– 2015: 7.9 ZB
• The world's information doubl
• Over the next 10 years:
– The number of servers world
– Amount of information mana
data centers will grow by 50x
– Number of “files” enterprise
will grow by 75x
Source: http://www.emc.com/leaders
universe.htm, which was based on the
Universe Study
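As a rough check on the two volume estimates above (1.8 ZB in 2011 and 7.9 ZB in 2015), the short sketch below works out the yearly growth rate and doubling time they imply. It assumes smooth exponential growth between the two data points, which the slide does not state; the function names are made up for this illustration.

```python
import math


def implied_annual_growth(start_volume, end_volume, years):
    """Constant yearly growth factor that takes start_volume to end_volume."""
    return (end_volume / start_volume) ** (1 / years)


def doubling_time_years(growth_factor):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(growth_factor)


# Figures from the slide: 1.8 ZB (2011) and 7.9 ZB (2015).
factor = implied_annual_growth(1.8, 7.9, 2015 - 2011)
doubling = doubling_time_years(factor)

print(f"Implied growth per year: {(factor - 1) * 100:.1f}%")
print(f"Implied doubling time: {doubling:.2f} years")
```

Under that assumption the estimates imply growth of roughly 45% per year, which corresponds to a doubling time of a little under two years.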
Unstructured Data Is Exploding
Why DFS?
Read 1 TB Data
1 Machine (4 I/O Channels, Each Channel – 100 MB/s): 45 Minutes
10 Machines (4 I/O Channels, Each Channel – 100 MB/s): 4.5 Minutes
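The read times on this slide follow from simple throughput arithmetic, sketched below. The sketch assumes the data is spread evenly across the machines, that all channels read in parallel at the full 100 MB/s, that every machine has the same channel speed, and that 1 TB is 10^6 MB. The function name is hypothetical.

```python
def read_time_minutes(data_mb, machines, channels_per_machine, mb_per_sec_per_channel):
    """Minutes needed to read data_mb when every channel reads in parallel."""
    aggregate_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    return data_mb / aggregate_mb_per_sec / 60


ONE_TB_IN_MB = 1_000_000  # assumption: decimal terabyte

single_machine = read_time_minutes(ONE_TB_IN_MB, 1, 4, 100)
ten_machines = read_time_minutes(ONE_TB_IN_MB, 10, 4, 100)

print(f"1 machine:   {single_machine:.1f} minutes")
print(f"10 machines: {ten_machines:.1f} minutes")
```

This gives a little under 42 minutes for one machine and a tenth of that for ten machines, which the slide rounds to 45 and 4.5 minutes. The point is the tenfold speed-up from reading in parallel, not the exact figures.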
What Is a Distributed File System (DFS)?
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
Companies using Hadoop:
- Yahoo
- Amazon
- AOL
- IBM
- And many more at http://wiki.apache.org/hadoop/PoweredBy
Hadoop Eco-System
Hadoop Core Components:
HDFS – Hadoop Distributed File System (storage)
MapReduce (processing)
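To make the processing component more concrete, here is a minimal in-memory sketch of the MapReduce idea, using word count as the example. It is plain Python, not the Hadoop API: the function names are hypothetical, and a real job would run the map and reduce steps on many machines over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Each map call looks at one line only and each reduce call looks at one key only, which is why both steps can be spread across many machines.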
What is HDFS?
HDFS – Hadoop Distributed File System
Highly fault-tolerant
High throughput
Suitable for applications with large data sets
Streaming access to file system data
Can be built out of commodity hardware
Main Components Of HDFS:
NameNode:
master of the system
maintains and manages the blocks which are present on the DataNodes
DataNodes:
slaves which are deployed on each machine and provide the actual storage
responsible for serving read and write requests for the clients
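The division of labour between the NameNode and the DataNodes can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the class and function names, the tiny block size, and the round-robin placement are all invented for the illustration and are not how HDFS itself chooses DataNodes. What it does show is that the NameNode holds only metadata (which blocks make up a file and where they live), while the DataNodes hold the actual bytes.

```python
class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Keeps metadata only: which blocks form a file and where they are stored."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size      # toy value, chosen for the demo
        self.replication = replication    # must not exceed len(datanodes)
        self.file_to_blocks = {}          # path -> [block_id, ...]
        self.block_locations = {}         # block_id -> [DataNode, ...]
        self._next_block_id = 0

    def allocate_block(self, path):
        block_id = self._next_block_id
        self._next_block_id += 1
        # Toy placement: round-robin over the DataNodes (an assumption).
        start = block_id % len(self.datanodes)
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(self.replication)]
        self.file_to_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = targets
        return block_id, targets


def write_file(namenode, path, data):
    """Client side: ask the NameNode where to write, then send bytes to DataNodes."""
    for offset in range(0, len(data), namenode.block_size):
        block_id, targets = namenode.allocate_block(path)
        for node in targets:
            node.write_block(block_id, data[offset:offset + namenode.block_size])


def read_file(namenode, path):
    """Client side: look up block locations, then read each block from a replica."""
    chunks = []
    for block_id in namenode.file_to_blocks[path]:
        first_replica = namenode.block_locations[block_id][0]
        chunks.append(first_replica.read_block(block_id))
    return b"".join(chunks)


nodes = [DataNode(f"dn{i}") for i in range(3)]
nn = NameNode(nodes)
write_file(nn, "/demo.txt", b"hello hdfs")
print(read_file(nn, "/demo.txt"))
```

Note that file data never passes through the `NameNode` object in this sketch; it only hands out block ids and locations.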
Secondary NameNode:
Not a hot standby for the NameNode
Connects to NameNode every hour*
Housekeeping, backup of NameNode metadata
Saved metadata can be used to rebuild a failed NameNode
[Diagram: NameNode and Secondary NameNode, each holding metadata]
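The housekeeping done by the Secondary NameNode is usually described as checkpointing: it periodically fetches the NameNode's saved namespace image and the log of edits made since then, merges the two, and returns the merged image. The slide does not spell this out, so the sketch below is a toy model with hypothetical names, in which the namespace is just a set of paths and the edit log is a list of create/delete records.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    operation, path = edit
    if operation == "create":
        namespace.add(path)
    elif operation == "delete":
        namespace.discard(path)


def checkpoint(fsimage, edit_log):
    """Merge the last saved image with the edits logged since then."""
    merged = set(fsimage)  # work on a copy, leave the old image untouched
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged


# Toy state: the last saved image plus the operations logged since it was taken.
fsimage = {"/a", "/b"}
edit_log = [("create", "/c"), ("delete", "/a")]

new_fsimage = checkpoint(fsimage, edit_log)
edit_log = []  # after a checkpoint the log can start again from empty
print(sorted(new_fsimage))
```

Because the merged image is a saved copy of the metadata, it can be used to rebuild a failed NameNode, although any edits made after the last checkpoint would not be in it.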
JobTracker and TaskTracker:
HDFS Architecture
Job Tracker
Job Tracker Contd.
Job Tracker Contd.
Job Tracker Contd.
HDFS Client Creates a New File
Rack Awareness
Anatomy of a File Write:
Anatomy of a File Read:
Thank You
See You in Class Next Week