hadoop eco system-first class

Hadoop Eco-System Training & Hands-On

Upload: alogarg

Post on 15-Jan-2015


Category:

Technology



TRANSCRIPT

Page 1: Hadoop eco system-first class

Hadoop Eco-System Training & Hands-On

Page 2: Hadoop eco system-first class

Introduction

Introduction to Distributed Programming
› Sequential Programming
› Asynchronous Programming
› Concurrent Programming
› Distributed Programming
› Sequential Programming vs Asynchronous Programming
› Concurrent Programming vs Distributed Programming
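The contrast between sequential and concurrent programming can be illustrated with a minimal sketch. It is not taken from the slides: the function names (`count_words`, `run_sequential`, `run_concurrent`) and the thread pool are assumptions chosen for illustration. Distributed programming extends the same idea across several machines, which this single-process sketch does not attempt to show.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Unit of work: count the whitespace-separated words in one text."""
    return len(text.split())


def run_sequential(texts):
    """Sequential: each text is processed one after another."""
    return [count_words(t) for t in texts]


def run_concurrent(texts, workers=4):
    """Concurrent: texts are handed to a pool of worker threads.

    pool.map returns results in input order, so the output matches
    the sequential version.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_words, texts))


if __name__ == "__main__":
    texts = ["one two three", "four five", "six"]
    print(run_sequential(texts))
    print(run_concurrent(texts))
```

Both functions return the same list; only the way the work is scheduled differs.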

Page 5: Hadoop eco system-first class

Introduction
› Open-source framework for writing and running distributed applications.
› Suited for applications that process large amounts of data.

› Accessible - runs on cloud services (e.g. EC2) or on commodity hardware.
› Robust - recovers easily from hardware failures.
› Scalable - scales linearly to handle larger data by adding more nodes.
› Simple - lets users quickly write efficient parallel code.

› Used in data-intensive applications such as telecom, finance, and account overview pages.

› SCALE-OUT instead of SCALE-UP.

Page 9: Hadoop eco system-first class

Hadoop Vs SQL DB

› SCALE-OUT Vs SCALE-UP
› Key-value pairs instead of relational tables
› Functional programming instead of declarative SQL statements
› Offline batch processing Vs online transactions

Page 10: Hadoop eco system-first class

Table of Contents

How Hadoop Works
› Cluster of Nodes
› Types of Nodes
    Computation Nodes
        Job Tracker
        Task Tracker
    Storage Nodes
        Name Node
        Data Nodes
        Secondary Name Node

Page 33: Hadoop eco system-first class

Understanding MapReduce
› Scaling a simple program manually
    Example – Word Count – a single document
    Scaling Word Count for multiple documents
    Front End – Map Program
    Back End – Reduce Program
› How Hadoop Helps
    One Central Storage Server vs Distributed Storage
    Phase 2 distributed processing
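The word-count example outlined above can be sketched in plain Python to show how the map ("front end") and reduce ("back end") programs divide the work. This is a single-process illustration under stated assumptions, not Hadoop's API: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and the `shuffle` step stands in for the grouping by key that a MapReduce framework performs between the two phases.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map over every document, then reduce each group of counts."""
    pairs = []
    for document in documents:
        # Each call is independent, so documents could be mapped on
        # different machines in a real cluster.
        pairs.extend(map_words(document))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(docs))
```

Because `map_words` looks at one document at a time and `reduce_counts` looks at one word at a time, both steps can be spread across many nodes without changing the program's logic.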

Page 34: Hadoop eco system-first class

Installing Hadoop
Setting up Environment Variables
Hadoop Usage
Execution of Sample WordCount program on Hadoop
Setting up the Cluster
› Local Mode
› Pseudo-Distributed Mode
› Fully-Distributed Mode
Monitoring the output
› Web-based Cluster UI

Page 35: Hadoop eco system-first class

Working with Files in HDFS
› Basic File Commands
    Adding Files and Directories
    Removing Files and Directories
› Reading and Writing to HDFS programmatically
    Sample program
› Anatomy of a Map-Reduce Program
    Hadoop Data-Types
    Mapper
    Reducer
    Partitioner
    Combiner - Local Reduce
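The roles of the combiner ("local reduce") and the partitioner can be sketched as follows. This is an illustration in plain Python, not Hadoop code. The names `combine`, `partition` and `route` are hypothetical, and the CRC32 hash is an assumption chosen because it is stable between runs; Hadoop's default partitioner applies the same "hash of the key modulo number of reducers" idea using the key's Java hash code.

```python
import zlib
from collections import Counter


def combine(pairs):
    """Combiner: sum counts on the map side before anything is sent on."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Combine one mapper's output, then bucket it per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in combine(pairs):
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    mapper_output = [("the", 1), ("fox", 1), ("the", 1)]
    print(route(mapper_output, num_reducers=2))
```

The combiner shrinks three emitted pairs to two before they leave the mapper, and the partitioner guarantees that every occurrence of a given key reaches the same reducer.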

Page 36: Hadoop eco system-first class

Working with Files in HDFS
› Reading and Writing
    InputFormat
        TextInputFormat
        KeyValueTextInputFormat
    Creating a custom InputFormat
        InputSplit
        RecordReader
    OutputFormat
        Types of OutputFormat
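The difference between the two input formats listed above lies in the records they hand to the mapper. TextInputFormat yields the byte offset of each line as the key and the line text as the value, while KeyValueTextInputFormat splits each line at the first separator (a tab by default). The sketch below mimics that behaviour in plain Python; the generator names `text_records` and `key_value_records` are hypothetical, and it reads from an in-memory byte string rather than an HDFS input split.

```python
def text_records(data):
    """TextInputFormat style: yield (byte offset of line start, line text)."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def key_value_records(data, separator="\t"):
    """KeyValueTextInputFormat style: split each line at the first separator.

    A line without the separator becomes the key, with an empty value.
    """
    for _, line in text_records(data):
        key, _, value = line.partition(separator)
        yield key, value


if __name__ == "__main__":
    data = b"k1\tv1\nk2\tv2\n"
    print(list(text_records(data)))
    print(list(key_value_records(data)))
```

A custom InputFormat follows the same pattern: the record reader decides where one record ends and what the key and value of each record are.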