Hadoop Ecosystem: First Class
Hadoop Ecosystem Training & Hands-On
Introduction
Introduction to Distributed Programming
› Sequential Programming
› Asynchronous Programming
› Concurrent Programming
› Distributed Programming
› Sequential Programming vs Asynchronous Programming
› Concurrent Programming vs Distributed Programming
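To make these distinctions concrete, here is a small plain-Java sketch (the class and method names are hypothetical and not part of Hadoop) that counts words in a few documents first sequentially, then concurrently on a local thread pool. The `Future` returned by `submit` is the asynchronous part: the caller keeps going and collects each result later. A distributed program applies the same split-then-combine idea across machines instead of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of Hadoop.
class SequentialVsConcurrent {
    // Count whitespace-separated words in one document.
    static int countWords(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Sequential: handle one document after another on the calling thread.
    static int countSequential(List<String> docs) {
        int total = 0;
        for (String doc : docs) {
            total += countWords(doc);
        }
        return total;
    }

    // Concurrent: submit one task per document to a local thread pool.
    // submit() returns immediately with a Future (asynchronous); get() waits
    // for each partial count, and the partial counts are summed into one total.
    static int countConcurrent(List<String> docs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (String doc : docs) {
                partials.add(pool.submit(() -> countWords(doc)));
            }
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("word counting task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the quick brown fox", "jumps over", "the lazy dog");
        System.out.println(countSequential(docs));    // 9
        System.out.println(countConcurrent(docs, 2)); // 9
    }
}
```

Both versions return the same total; only the scheduling differs, which is the property a distributed framework must also preserve.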
Introduction
› Open-source framework for writing and running distributed applications.
› Suited for applications that process large amounts of data.
› Accessible: runs in the cloud (e.g., Amazon EC2) or on clusters of commodity hardware.
› Robust: designed to recover easily from hardware failures.
› Scalable: scales linearly to handle larger data by adding more nodes.
› Simple: lets you quickly write efficient parallel code.
› Used in data-intensive applications such as telecom, finance, and account overview pages.
› Scales OUT (adding more machines) instead of scaling UP (moving to a bigger machine).
Hadoop vs SQL DB
› Scale-out vs scale-up.
› Key-value pairs instead of a relational database.
› Functional programming (map and reduce) instead of declarative SQL statements.
› Offline batch processing vs online transactions.
Table of Contents
How Hadoop Works
› Cluster of Nodes
› Types of Nodes
  › Computation Nodes: JobTracker, TaskTracker
  › Storage Nodes: NameNode, DataNodes, Secondary NameNode
Understanding MapReduce
› Scaling a simple program manually
  › Example: Word Count for a single document
  › Scaling Word Count to multiple documents
  › Front end: Map program
  › Back end: Reduce program
› How Hadoop helps
  › One central storage server vs distributed storage
  › Phase 2: distributed processing
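The Word Count walkthrough can be sketched in plain Java, without the Hadoop libraries, to show the data flow: the map step emits `(word, 1)` pairs for each document, a shuffle groups the pairs by word, and the reduce step sums each group. This is a single-process illustration with hypothetical names and simplifying assumptions (words are lowercased and split on non-word characters); a real Hadoop job would instead extend Hadoop's `Mapper` and `Reducer` classes and use its `Writable` types.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not the Hadoop API.
class WordCountSketch {
    // Map phase: emit (word, 1) for every word in one document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : document.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle: group emitted values by key. TreeMap keeps keys sorted,
    // mirroring the sort Hadoop performs between the two phases.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts for one word (key kept to mirror reduce(key, values)).
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> documents) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String doc : documents) {
            allPairs.addAll(map(doc)); // on a cluster, each document could be mapped on a different node
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(allPairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the cat sat", "The dog sat")));
        // {cat=1, dog=1, sat=2, the=2}
    }
}
```

Because each `map` call depends only on its own document, the map step is what scales out across nodes; only the grouped pairs need to meet at the reducers.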
Installing Hadoop
› Setting up environment variables
› Hadoop usage
› Executing the sample WordCount program on Hadoop
› Setting up the cluster
  › Local mode
  › Pseudo-distributed mode
  › Fully distributed mode
Monitoring the output
› Web-based cluster UI
Working with Files in HDFS
› Basic file commands
  › Adding files and directories
  › Removing files and directories
› Reading and writing to HDFS programmatically: sample program
› Anatomy of a MapReduce program
  › Hadoop data types
  › Mapper
  › Reducer
  › Partitioner
  › Combiner (local reduce)
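A plain-Java sketch of the partitioner and combiner ideas (hypothetical class name, no Hadoop dependency). The `partition` method uses the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so a given key always reaches the same reducer. The combiner is modeled as a per-mapper pre-aggregation, which is only valid when the reduce function is associative and commutative, as summing word counts is.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not the Hadoop API.
class PartitionAndCombine {
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder, so every key maps into 0..numReducers-1.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Combiner ("local reduce"): pre-sum the (word, 1) pairs produced by ONE mapper
    // so fewer records cross the network during the shuffle.
    static Map<String, Integer> combine(List<String> mapperWords) {
        Map<String, Integer> local = new HashMap<>();
        for (String word : mapperWords) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Route each combined record to the reducer chosen by the partitioner.
    static List<Map<String, Integer>> route(Map<String, Integer> combined, int numReducers) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> record : combined.entrySet()) {
            int target = partition(record.getKey(), numReducers);
            reducers.get(target).merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return reducers;
    }

    public static void main(String[] args) {
        Map<String, Integer> combined = combine(List.of("a", "b", "a", "c", "a"));
        System.out.println(combined.get("a")); // 3: three (a, 1) records collapsed into one
        System.out.println(route(combined, 2));
    }
}
```

The combiner shrinks five map outputs to three records before routing; the partitioner guarantees that all counts for one word meet at a single reducer.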
Working with Files in HDFS
› Reading and writing
  › InputFormat: TextInputFormat, KeyValueTextInputFormat
  › Creating a custom InputFormat: InputSplit, RecordReader
  › OutputFormat: types of OutputFormat
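To illustrate what the two built-in input formats hand to a mapper, the sketch below mimics their record semantics on an in-memory string (plain Java, hypothetical names, assuming `\n` line endings and a single input split). With `TextInputFormat`, the key is the byte offset where the line starts and the value is the line itself; with `KeyValueTextInputFormat`, each line is split at the first separator (a tab by default), and a line without a separator becomes the key with an empty value.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it imitates record semantics, not Hadoop's classes.
class InputFormatSketch {
    // TextInputFormat-style records: key = byte offset of the line start,
    // value = line text without the trailing '\n'.
    static List<Map.Entry<Long, String>> textRecords(String data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < data.length()) {
            int end = data.indexOf('\n', start);
            if (end < 0) {
                end = data.length(); // last line without a trailing newline
            }
            String line = data.substring(start, end);
            records.add(new SimpleEntry<>(offset, line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            start = end + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-style records: split each line at the first separator.
    static List<Map.Entry<String, String>> keyValueRecords(String data, char separator) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (Map.Entry<Long, String> record : textRecords(data)) {
            String line = record.getValue();
            int sep = line.indexOf(separator);
            if (sep < 0) {
                records.add(new SimpleEntry<>(line, "")); // no separator: whole line is the key
            } else {
                records.add(new SimpleEntry<>(line.substring(0, sep), line.substring(sep + 1)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "alice\t30\nbob\t25\n";
        System.out.println(keyValueRecords(data, '\t')); // [alice=30, bob=25]
        for (Map.Entry<Long, String> record : textRecords(data)) {
            System.out.println(record.getKey() + " -> " + record.getValue());
        }
    }
}
```

A custom `InputFormat` changes exactly this step: its `RecordReader` decides how the bytes of an `InputSplit` become the key-value pairs passed to the mapper.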