Become an Expert in Hadoop: HDFS, MapReduce, Spark, HBase, Hive, Pig, Oozie, Sqoop & Flume
BECOME AN EXPERT IN HADOOP

About the Course
Hadoop is an open-source Apache project for storing and processing Big Data. Hadoop stores Big Data in a distributed, fault-tolerant manner on commodity hardware, and Hadoop ecosystem tools then perform parallel processing over HDFS (Hadoop Distributed File System). As organizations have realized the benefits of Big Data analytics, demand for Big Data and Hadoop professionals has grown sharply. Companies are looking for Big Data and Hadoop experts with knowledge of the Hadoop ecosystem and best practices for HDFS, MapReduce, Spark, HBase, Hive, Pig, Oozie, Sqoop & Flume. This course will introduce students to this rapidly growing field and equip them with its basic principles and tools as well as its general mindset.
- Students will learn concepts, techniques and tools they need to deal with various facets of Hadoop practice.
- These topics will be treated with a balanced approach to breadth and depth, with emphasis on integration and synthesis of concepts and their application to real-world problems.
- To make the learning contextual, real datasets from a variety of disciplines will be used.
Program Highlights
- Most comprehensive curriculum
- Training by passionate industry experts
- Each concept explained by the golden rule: Theory, Example, Software Implementation, Real-Time Applicability
- All classes explained with real-time project experience
- End-to-end explanation with architecture
- Designed for the industry
- Live project
- Placement assistance
- Free mock interviews for Hadoop interview preparation
- Handwritten notes and slide copies
- Detailed assistance in resume preparation
- Special attention for experienced people on previous experience
- Latest resources, blogs and articles shared
Audience
Basic knowledge of the following (FREE classes will be provided if needed):
- SQL commands
- Linux commands
- Java basics (OOPs concepts only)
Duration & Mode of Training 2 months, Online Training
Course Content
Understanding Big Data and Hadoop
Objectives: In this module, you will understand what Big Data and Hadoop are and how they are used, how Hadoop solves Big Data problems, the Hadoop ecosystem, Hadoop architecture, HDFS and how MapReduce works.
- Introduction to Big Data: What is Big Data? Big Data opportunities and challenges
- Characteristics of Big Data
- Limitations & solutions of Big Data architecture
- Introduction to Hadoop and the uses of Hadoop
- Hadoop & its features
- Hadoop ecosystem
- Hadoop 2.x core components
- Components of the Hadoop ecosystem
  i. Storage: HDFS (Hadoop Distributed File System)
  ii. Processing: MapReduce framework
- Different Hadoop distributions
Hadoop Architecture and HDFS
Objectives: In this module, you will learn about the cluster environment, Hadoop cluster architecture and the important configuration files of a Hadoop cluster; install a single-node cluster; and study the Hadoop Distributed File System in depth (replication, block size, Secondary NameNode, High Availability) as well as YARN in depth (ResourceManager and NodeManager).
- What is a cluster environment?
- Hadoop 2.x cluster architecture
- Hadoop cluster modes
- Common Hadoop shell commands
- Hadoop 2.x configuration files
- Single-node and multi-node cluster set-up
- Hadoop administration
- Significance of HDFS in Hadoop
- Features of HDFS
- Storage aspects of HDFS
- Replication in Hadoop
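The block size and replication topics above can be sketched in plain Python. This is only an illustration of the bookkeeping HDFS does, not Hadoop code; the 128 MB block size and replication factor 3 are the Hadoop 2.x defaults, and the round-robin placement stands in for HDFS's real rack-aware policy.

```python
# Sketch: how HDFS 2.x conceptually splits a file into fixed-size
# blocks and places replicas on DataNodes. Illustration only.

BLOCK_SIZE = 128 * 1024 * 1024   # dfs.blocksize default in Hadoop 2.x
REPLICATION = 3                  # dfs.replication default

def plan_blocks(file_size, nodes):
    """Return a list of (block_index, block_length, replica_nodes)."""
    blocks = []
    offset = 0
    i = 0
    while offset < file_size:
        length = min(BLOCK_SIZE, file_size - offset)
        # Round-robin placement stands in for the real rack-aware policy.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(REPLICATION)]
        blocks.append((i, length, replicas))
        offset += length
        i += 1
    return blocks

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block,
# each stored on three different nodes.
plan = plan_blocks(300 * 1024 * 1024, ["node1", "node2", "node3", "node4"])
for idx, length, replicas in plan:
    print(idx, length, replicas)
```

Note that the last block is smaller than the configured block size: HDFS never pads a file out to a block boundary.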
Hadoop MapReduce Framework
Objectives: In this module, you will understand the Hadoop MapReduce framework comprehensively and how MapReduce works on data stored in HDFS. You will also learn advanced MapReduce concepts such as InputFormat, OutputFormat, Partitioners, Combiners, and Shuffle and Sort.
- Why is MapReduce needed in Hadoop? Traditional way vs the MapReduce way
- YARN concepts and YARN architecture
- YARN MapReduce application execution flow
- YARN workflow
- Anatomy of a MapReduce program
- Input splits
  i. Need for input splits in MapReduce
  ii. InputSplit size
  iii. InputSplit size vs block size
  iv. InputSplits vs mappers
- Relation between input splits and HDFS blocks
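The map, shuffle-and-sort and reduce phases covered above can be sketched end to end in plain Python. This is the classic WordCount data flow only; real Hadoop runs mappers and reducers as distributed tasks over HDFS input splits.

```python
# Minimal sketch of the MapReduce programming model
# (map -> shuffle/sort -> reduce) in plain Python. Illustration only.
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word, like a WordCount Mapper.
    for word in line.split():
        yield word.lower(), 1

def shuffle_and_sort(pairs):
    # Group values by key and sort keys, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Sum the counts for one key, like a WordCount Reducer.
    yield key, sum(values)

lines = ["Hadoop stores Big Data", "Hadoop processes Big Data"]
mapped = [pair for line in lines for pair in mapper(line)]
result = dict(kv for key, values in shuffle_and_sort(mapped)
                 for kv in reducer(key, values))
print(result)
# {'big': 2, 'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

The same three-phase shape carries over to every MapReduce job; only the mapper and reducer bodies change.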
Advanced Hadoop MapReduce
Objectives: In this module, you will learn advanced MapReduce concepts such as Counters, the MapReduce programming model, Distributed Cache, MRUnit, MapReduce joins and XML parsing.
- Counters
- MapReduce programming model
- Writing a basic MapReduce program
  i. Driver code
  ii. Mapper code
  iii. Reducer code
- Identity Mapper and Identity Reducer
- Input formats in MapReduce
- Output formats in MapReduce
- Combiner and Partitioner
- Joins in MapReduce
  i. Map-side join
  ii. Reduce-side join
  iii. Real-time applicability
  iv. Distributed Cache
- XML file parsing using MapReduce
- MRUnit
- Custom input formats
- Hands-on exercises
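The map-side join and Distributed Cache items above fit together: the small table is copied to every mapper (this is what the Distributed Cache is for), so the join happens during the map phase with no shuffle at all. A plain-Python sketch of that idea, with invented employee/department data:

```python
# Sketch of a map-side join. The small dimension table is assumed to
# fit in memory on every mapper (shipped via Hadoop's Distributed
# Cache); large records are then joined with a lookup, no shuffle.

# Small side of the join, held in memory (invented sample data).
departments = {"d1": "Engineering", "d2": "Sales"}

def map_join(employee_record):
    emp_id, name, dept_id = employee_record
    # An in-memory lookup replaces the reduce-side join's shuffle/sort.
    return emp_id, name, departments.get(dept_id, "UNKNOWN")

employees = [("e1", "Asha", "d1"), ("e2", "Ravi", "d2"), ("e3", "Kim", "d9")]
joined = [map_join(rec) for rec in employees]
print(joined)
```

A reduce-side join, by contrast, tags records from both tables with the join key and lets the shuffle bring them together, which works for two large tables at the cost of a full shuffle.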
Apache Pig
Objectives: In this module, you will learn Apache Pig, the types of use cases where Pig fits, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming and testing Pig scripts. You will also work on a healthcare dataset.
- Introduction to Apache Pig
- MapReduce vs Pig
- Pig components & Pig execution
- Different data types in Pig
- Modes of execution in Pig
  i. Local mode
  ii. MapReduce (distributed) mode
- Transformations in Pig
- Pig Latin programs
- Shell and utility commands
- Pig UDFs & Pig Streaming
- Developing a simple Pig script
- Using GROUP BY, FILTER BY, DISTINCT, CROSS and SPLIT with a use case
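To make the FILTER and GROUP BY transformations concrete, here is a plain-Python sketch of what a small Pig script computes. The Pig Latin in the comment is indicative only, and the healthcare-style records are invented for illustration.

```python
# Sketch of what Pig Latin's FILTER and GROUP BY compute. In Pig this
# would be roughly:
#   patients = LOAD 'healthcare.csv' AS (name, age, city);
#   adults   = FILTER patients BY age >= 18;
#   by_city  = GROUP adults BY city;
#   counts   = FOREACH by_city GENERATE group, COUNT(adults);
# The records below are invented sample data.
from collections import Counter

patients = [
    {"name": "A", "age": 34, "city": "Pune"},
    {"name": "B", "age": 12, "city": "Pune"},
    {"name": "C", "age": 51, "city": "Delhi"},
]

adults = [p for p in patients if p["age"] >= 18]   # FILTER ... BY
counts = Counter(p["city"] for p in adults)        # GROUP ... BY + COUNT
print(dict(counts))
# {'Pune': 1, 'Delhi': 1}
```

In MapReduce mode, Pig compiles exactly this kind of pipeline into one or more MapReduce jobs, which is the "tight coupling" the objectives mention.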
Apache Hive
Objectives: This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts and Hive UDFs.
- Introduction to Apache Hive
- Hive vs Pig
- Hive architecture and components
  i. Driver
  ii. Compiler
  iii. Executor
- Hive Metastore
  i. Importance of the Hive Metastore
  ii. Communication mechanism with the Metastore and configuration details
- Limitations of Hive
- Comparison with traditional databases
- Hive data types and data models
  i. Array
  ii. Struct
  iii. Map
- Conditional functions in Hive; importance of the CASE statement with a use case
- Hive partitions
- Hive bucketing
- Hive tables (managed tables and external tables)
- Importing data
- Querying data & managing outputs
- Hive scripts
- User-defined functions (UDFs) in Hive
  i. UDFs
  ii. UDAFs
  iii. UDTFs
  iv. Need for UDFs in Hive
- Use cases
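The partitioning topic above comes down to a directory convention: each partition column becomes one `col=value` directory level under the table's warehouse path. A small sketch of that layout; the warehouse root below is Hive's conventional default, and the table and columns are invented examples.

```python
# Sketch of how Hive lays out a partitioned managed table on HDFS:
# every partition column adds one "col=value" directory level.
# /user/hive/warehouse is the conventional default warehouse path;
# the 'sales' table and its columns are invented for illustration.
WAREHOUSE = "/user/hive/warehouse"

def partition_path(table, partition_spec):
    """partition_spec: ordered (column, value) pairs."""
    parts = "/".join(f"{col}={val}" for col, val in partition_spec)
    return f"{WAREHOUSE}/{table}/{parts}"

print(partition_path("sales", [("year", 2015), ("month", 1)]))
# /user/hive/warehouse/sales/year=2015/month=1
```

Because a `WHERE year = 2015` query only needs to scan the matching directory, partitioning prunes data before any processing starts; bucketing then subdivides data within a partition by the hash of a column.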
Advanced Apache Hive
Objectives: In this module, you will understand advanced Apache Hive concepts such as UDF, Dynamic Partitioning, Hive indexes and views, and optimizations in Hive.
- Hive Serializer/Deserializer (SerDe)
- HiveQL: joining tables, dynamic partitioning
- Custom MapReduce scripts
- Hive indexes and views
- Hive query optimizers
- Hive Thrift Server
- Hive UDFs
Sqoop
Objectives: This module will give you in-depth knowledge of Sqoop and of loading data from a database using Sqoop.
- Introduction to Sqoop
- How to connect to a relational database using Sqoop
- Performance implications of Sqoop import and export, and how to improve performance
- Different Sqoop commands
- Different flavors of imports
  i. Historical
  ii. Incremental
- Export
- Sqoop imports to Hive tables
- Sqoop imports to HBase
- Sqoop incremental load vs history load & limitations of incremental load
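The incremental-import idea above is simple at its core: Sqoop remembers a check column and the last value it saw, and each run pulls only newer rows. A plain-Python sketch of that selection logic (the rows are invented sample data; real Sqoop expresses this through its `--incremental`, `--check-column` and `--last-value` options):

```python
# Sketch of Sqoop's incremental "append" import logic: pull only rows
# whose check column exceeds the saved last-value, then advance the
# last-value. Sample rows are invented for illustration.

def incremental_import(rows, check_column, last_value):
    """Return (new_rows, new_last_value)."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    if new_rows:
        last_value = max(r[check_column] for r in new_rows)
    return new_rows, last_value

source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
# A previous run already imported up to id = 1.
batch, last = incremental_import(source, "id", 1)
print(len(batch), last)
# 2 3
```

This is also where the limitations mentioned above come from: an append-style incremental load sees new rows but not updates or deletes to rows it has already imported.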
Apache HBase
Objectives: This module will give you in-depth knowledge of Apache HBase, HBase architecture, HBase run modes and its components.
- Introduction to NoSQL databases and HBase
- HBase vs RDBMS
- HBase concepts
- HBase architecture
- HBase run modes
- HBase configuration
- HBase cluster deployment
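The HBase-vs-RDBMS contrast above rests on HBase's data model: instead of fixed columns, a table is essentially a sorted map from (row key, column family:qualifier, timestamp) to a value, and reads return the newest version by default. A minimal in-memory sketch of that model (the table and columns are invented examples, and the real HBase client API looks quite different):

```python
# Sketch of HBase's data model: a map from (row key, "family:qualifier")
# to timestamped versions of a value. Illustration only, not the HBase API.

class MiniHTable:
    def __init__(self):
        self.cells = {}   # (row, column) -> {timestamp: value}

    def put(self, row, column, value, ts):
        self.cells.setdefault((row, column), {})[ts] = value

    def get(self, row, column):
        versions = self.cells.get((row, column), {})
        # The latest timestamp wins, like HBase's default read.
        return versions[max(versions)] if versions else None

t = MiniHTable()
t.put("user1", "info:name", "Asha", ts=1)
t.put("user1", "info:name", "Asha K", ts=2)
print(t.get("user1", "info:name"))
# Asha K
```

Rows that never set a column simply have no cell for it, which is why sparse, wide tables are cheap in HBase but awkward in an RDBMS.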
Advanced Apache HBase
Objectives: This module will cover advanced Apache HBase concepts. We will see demos on HBase bulk loading & HBase filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster & why HBase uses ZooKeeper.
- HBase data model
  i. Column families
  ii. Column qualifiers
  iii. Row keys
- HBase concepts and architecture
- HBase shell
- HBase client API
- HBase data-loading techniques
- Apache ZooKeeper introduction
  i. ZooKeeper data model
  ii. ZooKeeper service
- HBase bulk loading
- Getting and inserting data
- HBase filters
- Hive–HBase integration
Oozie
Objectives: In this module, you will understand how multiple Hadoop ecosystem components work together to solve Big Data problems. This module also covers the Apache Oozie workflow scheduler for Hadoop jobs.
- Oozie introduction
- Oozie architecture
- Oozie workflow
- Oozie job submission
  i. workflow.xml
  ii. coordinator.xml
  iii. job.properties
- Scheduling an Oozie workflow
- Use case
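Of the files listed above, workflow.xml is the one that defines the job itself: a directed graph of action and control nodes. A minimal sketch of a workflow.xml with a single MapReduce action (the application name, parameter names and paths are invented for illustration):

```xml
<!-- Minimal sketch of an Oozie workflow: one MapReduce action with
     success/failure transitions. Names and paths are invented. -->
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>MapReduce action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The `${...}` placeholders are filled in from job.properties at submission time, and a coordinator.xml can then trigger this workflow on a schedule or when input data becomes available.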