hadoop

Presented by NIKHIL P L

Upload: nikhilpl

Post on 27-Jan-2015

DESCRIPTION

Apache Hadoop Seminar

TRANSCRIPT

Page 1: Hadoop

Presented by NIKHIL P L

Page 2: Hadoop

Apache Hadoop

• Developer(s): Apache Software Foundation
• Type: Distributed file system
• License: Apache License 2.0
• Written in: Java
• OS: Cross-platform
• Created by: Doug Cutting (2005)
• Inspired by: Google's MapReduce and GFS

Page 3: Hadoop

Sub projects

• HDFS
– Distributed, scalable, and portable file system
– Stores large data sets
– Copes with hardware failure
– Runs on top of each node's existing file system

Page 4: Hadoop

HDFS - Replication

• Data blocks are replicated to multiple nodes

• This allows a node to fail without data loss
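
The replication idea above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the round-robin placement, the node names, and the helper functions `place_blocks` and `lost_blocks` are made up for illustration and are not HDFS's actual placement policy or API. The replication factor of 3 matches HDFS's default setting.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (hypothetical round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block so copies spread evenly.
        placement[block_id] = {
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        }
    return placement


def lost_blocks(placement, failed_nodes):
    """Return the blocks whose every replica sits on a failed node."""
    failed = set(failed_nodes)
    return [block for block, holders in placement.items() if holders <= failed]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # illustrative node names
placement = place_blocks(num_blocks=6, nodes=nodes, replication=3)

# A single node failure loses nothing, because two other copies remain.
print(lost_blocks(placement, ["node-b"]))
# Losing as many nodes as there are replicas can lose blocks.
print(lost_blocks(placement, ["node-a", "node-b", "node-c"]))
```

With three copies per block, data is lost only when all three holders of the same block fail at once.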

Page 5: Hadoop

Sub projects .

• MapReduce
– Technology from Google
– Hadoop's fundamental data processing model
– Built around Map and Reduce functions
– Useful in a wide range of applications: distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, statistical machine translation
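
The Map and Reduce functions mentioned above can be sketched with the classic word-count example. This is a minimal single-process illustration, not Hadoop's Java API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are assumptions made for this sketch. In a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["Hadoop stores data", "Hadoop processes data"]  # sample input

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across many machines without the functions needing to know about each other.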

Page 6: Hadoop

MapReduce - Workflow

Page 7: Hadoop

Hadoop cluster (Terminology)

Page 8: Hadoop

Types of Nodes

• HDFS nodes
– NameNode (Master)
– DataNode (Slaves)

• MapReduce nodes
– JobTracker (Master)
– TaskTracker (Slaves)
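
The master/slave split on the HDFS side can be sketched as follows. This is a toy model under stated assumptions, not Hadoop code: the classes `NameNode` and `DataNode` borrow the role names from the slide, but their methods, the helpers `write_file` and `read_file`, the 4-byte block size, and the placement rule are all hypothetical. Real HDFS blocks are far larger. The JobTracker/TaskTracker pair follows a similar master/worker pattern and is not modeled here.

```python
class DataNode:
    """Slave: holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Master: holds only metadata (which nodes store which block)."""

    def __init__(self):
        self.files = {}      # filename -> ordered list of block_ids
        self.locations = {}  # block_id -> list of DataNode objects

    def register(self, filename, block_id, datanodes):
        self.files.setdefault(filename, []).append(block_id)
        self.locations[block_id] = list(datanodes)


def write_file(namenode, datanodes, filename, data, block_size=4, replication=2):
    """Split data into blocks and store each block on `replication` slaves."""
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{filename}#{index}"
        targets = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        for node in targets:
            node.blocks[block_id] = data[offset:offset + block_size]
        namenode.register(filename, block_id, targets)


def read_file(namenode, filename):
    """Ask the master where each block lives, then fetch it from a slave."""
    parts = []
    for block_id in namenode.files[filename]:
        node = namenode.locations[block_id][0]  # any replica would do
        parts.append(node.blocks[block_id])
    return b"".join(parts)


workers = [DataNode(f"dn{i}") for i in range(3)]
master = NameNode()
write_file(master, workers, "log.txt", b"hello hadoop")
print(read_file(master, "log.txt"))
```

Note that the master never stores file contents in this sketch; it only answers the question of where the blocks are.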

Page 9: Hadoop

Types of Nodes .

Page 10: Hadoop

Sub projects ..

• Hive
– Provides data summarization, query, and analysis
– Initially developed by Facebook

• HBase
– Open-source, non-relational, distributed database
– Provides database capabilities modeled on Google's Bigtable

Page 11: Hadoop

Sub projects …

• ZooKeeper
– Distributed configuration service, synchronization service, notification system, and naming registry for large distributed systems

• Pig
– A language and compiler to generate Hadoop programs
– Originally developed at Yahoo!

Page 12: Hadoop

How does Hadoop work? .

• How HDFS works

Page 13: Hadoop

How does Hadoop work? ..

• How MapReduce works

Page 14: Hadoop

How does Hadoop work? …

• How MapReduce works

Page 15: Hadoop

How does Hadoop work? ….

• Managing Hadoop Jobs

Page 16: Hadoop

Applications

• Marketing analytics
• Machine learning (e.g., spam filters)
• Image processing
• Processing of XML messages

Page 17: Hadoop

• World's largest Hadoop production application
• ~20,000 machines running Hadoop

Page 18: Hadoop

• The largest Hadoop cluster in the world, with 100 PB of storage
• 1200 machines with 8 cores each + 800 machines with 16 cores each
• 32 GB of RAM per machine
• 65 million files in HDFS
• 12 TB of compressed data added per day
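
Adding up the hardware figures on this slide gives a sense of the cluster's aggregate capacity. The sketch below only sums the numbers as stated on the slide; the variable names are illustrative.

```python
# Figures as listed on the slide.
machines_8_core = 1200
machines_16_core = 800
ram_per_machine_gb = 32

total_machines = machines_8_core + machines_16_core
total_cores = machines_8_core * 8 + machines_16_core * 16
total_ram_gb = total_machines * ram_per_machine_gb

print(total_machines, total_cores, total_ram_gb)
```

That works out to 2,000 machines, 22,400 cores, and 64,000 GB of RAM in total.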

Page 19: Hadoop

Other Users

Page 20: Hadoop

Thanks