Ph no. 7395899448
No. 3 Rajeswari Nagar Vengathur Ondikuppam Chennai -602002
About Us:
Zeyobron Analytics was established by a team of Big Data enthusiasts and experts in 2018.
Today, it is one of the leading IT training and software development companies.
We leave no stone unturned in providing world-class training, helping our students learn
Big Data-oriented business applications such as Hadoop, Analytics, Data Science, and the Internet of
Things (IoT), the fastest-growing, trend-setting technologies that provide a competitive advantage in the
ever-changing IT world.
Our training methodology is designed to shape your career for the future. We attend to every
individual student, and highly skilled and experienced industry experts assist them in clearing their
doubts about the course. These courses are in high demand in the corporate field, and by becoming an
expert in your course, you will be offered jobs from top MNCs, eventually helping you settle into your career.
The Outcome of Training:
Certification is provided upon successful completion of the course
Lifetime validity and assistance for students through phone and social media
Hands-on experience on real-time projects
Goal-oriented training with different learning methodologies for different students
Mock interviews are held to make it easier for you to clear the real interviews
We help you prepare a strong resume highlighting the big data projects and courses
you have completed
Our team helps you get placed by sharing job openings 24/7
Tests are conducted at the end of every session to make sure students are up to date
Our expert trainers cover batch, interactive, and real-time data processing
You will learn to deploy, integrate, install, and configure the various advanced tools
associated with each course
Course Content for Hadoop, Spark & Cloud
BIG DATA
Evolution of Data – Introduction to Big Data – Classification – Size Hierarchy – Why Big Data is
Trending (IoT, DevOps, Cloud Computing, Enterprise Mobility) – Challenges in Big Data –
Characteristics – Tools for Big Data – Why Big Data draws attention in the IT industry – What do we do
with Big Data – How Big Data can be analyzed – Typical Distributed Systems – Drawbacks in
Traditional Distributed Systems
LINUX
History and Evolution - Architecture – Development Commands – Env Variables - File Management
– Directories Management – Admin Commands – Advanced Commands – Shell Scripting – Groups
and User managements – Permissions – Important directory structure – Disk utilities – Compression
Techniques – Misc Commands
HADOOP: HDFS (1 and 2)
What is Hadoop? - Evolution of Hadoop - Features of Hadoop - Characteristics of Hadoop - Hadoop
compared with Traditional Distributed Systems - When to use Hadoop - When not to use Hadoop -
Components of Hadoop (HDFS & MapReduce) - Hadoop Architecture - Daemons in Hadoop Versions
1 & 2 - How Data is stored in Hadoop (Cluster, Datacenter, Split, Block, Rack Awareness, Replication,
Heartbeat) - Hadoop 1.0 Limitations - NameNode High Availability - NameNode Federation - How
Metadata is stored on Disk (FSImage & Edit log file) - Role of the Secondary NameNode - Anatomy of File
Read & File Write - Data Integrity - Serialization - Compression - What happens when copying data
in a Hadoop cluster? - CentOS Linux Commands Exercise - Hadoop Next Gen (ver 2) single-node
Pseudo-mode Cluster installation - Hadoop Commands Exercise
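The block storage and rack-aware replication covered above can be sketched in plain Python. The block size and replication factor follow Hadoop's common defaults (128 MB, 3 replicas); the rack and node names are invented for illustration:

```python
# Sketch: how HDFS splits a file into blocks and places replicas.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2 default
REPLICATION = 3

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def place_replicas(block_id, nodes_by_rack, replication=REPLICATION):
    """Rack awareness: first replica on one rack, the others on a second rack."""
    racks = sorted(nodes_by_rack)
    first_rack = racks[block_id % len(racks)]
    other_rack = racks[(block_id + 1) % len(racks)]
    targets = [nodes_by_rack[first_rack][0]]
    targets += nodes_by_rack[other_rack][:replication - 1]
    return targets

# A 300 MB file becomes three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
```

A real NameNode also weighs node load and client locality when choosing targets; this only shows the one-rack-plus-another placement rule.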
SQOOP – RDBMS
Introduction & History - Installation and Configuration - Why Sqoop - In-depth Architecture
- Sqoop Import Properties - Sqoop Export Architecture - Commands (Import to HDFS, Hive and HBase
from MySQL) - Export - Incremental Import - Saved Jobs - Import All Tables -
Sqoop Workouts - Sqoop Best Practices & Performance Tuning - Sqoop
Import/Export Use Cases - Mock Test on Sqoop
HIVE – SQL & OLAP Layer on Hadoop
Introduction – Architecture - Hive Vs RDBMS - Detailed Installation (Metastore, Integrating with
Hue)- Starting Metastore and Hive Server - Data types (Primitive, Collection) - Create Tables
(Managed, external) and DML operations (load, insert, export) - Managed Vs External tables - QL
Queries (select, where, group by, having, sort by, order by) - Hive access through Hive Client,
Beeline and Hue - File Formats (RC, ORC, Sequence)- Partitioning (static and dynamic), partition
with external table, dropping partitions and corresponding configuration parameters - Bucketing,
Partitioning Vs Bucketing - Views, different types of joins (inner, outer) - Queries (Union, union all,
intersection, minus) - Add files to the distributed cache, jars to the class path - Optimized joins
(MapSide join, Bucketing join) - Compressions on tables (LZO, Snappy) - Serde (XML Serde,
JsonSerde) - Parallel execution, Sampling data, Speculative execution - Two POCs using large
datasets on the above topics - Mock Test on Hive and its Architecture
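The bucketing idea above can be sketched in plain Python: Hive hashes the bucketing column and takes it modulo the bucket count to pick a file. Hive's hash functions differ per column type (for an int column the hash is the value itself), and the column and bucket count here are illustrative:

```python
# Sketch: Hive-style bucketing, bucket = hash(key) % num_buckets.
NUM_BUCKETS = 4

def bucket_for(key, num_buckets=NUM_BUCKETS):
    # For an int bucketing column, Hive's hash is the value itself;
    # Python's hash() stands in for other types.
    return key % num_buckets if isinstance(key, int) else hash(key) % num_buckets

rows = [(1, "a"), (2, "b"), (5, "c"), (6, "d")]
buckets = {}
for user_id, value in rows:
    buckets.setdefault(bucket_for(user_id), []).append((user_id, value))

# Rows 1 and 5 land in bucket 1, rows 2 and 6 in bucket 2. A bucket-map
# join can then pair bucket i of one table directly with bucket i of another.
```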
HADOOP – PROCESSING ARCHITECTURE
Hadoop Ecosystems Road Map - MapReduce Flow - MapReduce job submission in a YARN cluster in
detail - What is MapReduce? - How MapReduce works at a high level - Types of Input and Output
Formats - MapReduce in detail - Different types of files supported (Text, Sequence, Map and Avro) -
Storage & Processing Daemons Architecture Version 1 - Role of Job Tracker and Task Tracker -
Processing Daemons Architecture Version 2 (Resource Manager, Application Master, Node Manager),
Architecture and Failure handling – Schedulers - Resource Manager High Availability - YARN
Architecture
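The map, shuffle and reduce phases covered above can be sketched in plain Python with the classic word count; the sample lines are made up:

```python
from collections import defaultdict

# Sketch: the MapReduce flow as map -> shuffle (group by key) -> reduce.
def map_phase(line):
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data big wins", "data wins"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 2, "data": 2, "wins": 2}
```

In real Hadoop the shuffle happens across the network between mappers and reducers; here it is a single in-memory grouping.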
SCALA
Scala Introduction – History - Why Scala - Scala Installation - Get deep insights into the functioning
of Scala - Execute Pattern Matching in Scala - OOPs concepts (Classes, Objects, Collections,
Inheritance, Abstraction and Encapsulation) - Functional Programming in Scala (Closures, Currying,
Expressions, Anonymous Functions) - Know the concepts of classes in Scala - Object Orientation in
Scala (Primary, Auxiliary Constructors, Singleton Objects, Companion Objects) - Traits - Abstract
classes
HBASE – Hadoop Database
Introduction to NoSQL - Types of NOSQL - Characteristics of NoSQL - CAP Theorem - What is HBase -
Brief History - Row vs Col - HDFS vs HBASE - RDBMS vs HBASE - Storage Hierarchy – Characteristics -
Table Design - HMaster - Regions - Region Server - Inside Region Server - HBase Architecture (Read
Path, Write Path, Compactions, Splits ) - Minor/Major Compactions – Region Splits - Installation -
Configuration - Role of Zookeeper - HBase Shell - Introduction to Filters - Row Key Design - Map
reduce Integration - Performance Tuning - Hands on - Mock Test on HBase and Its Architecture
CASSANDRA – NoSQL Database
Introduction to Cassandra – Comparing with other DBs – Installation of Single Node Cassandra (Lab)
- KeySpaces - CQL – using cqlsh and commands (Hands-on Lab) - CQL Data Types - Cassandra
Architecture - Failure Detection and Recovery (Fault Tolerance) – Data Replication – Spark
Integration with Cassandra (get and put)
OOZIE – Workflow scheduling and monitoring tool
Introduction – History - Why Oozie – Components – Architecture – Layers - Workflow Engine –
Nodes – Workflow – Coordinator - Action (MapReduce, Hive, Pig and Sqoop) - Introduction to
Bundle - Email Notification - Error Handling – Installation – Workouts
PHOENIX
Introduction – Architecture – Datamodel - Indexes – Concepts – Installation – Shell – Bulk loading –
Salt bucketing – phoenix script - Workouts
NIFI
Introduction – Core Components – Architecture – UI – Data Provenance – NiFi with Kafka and Spark –
Installation – Workouts – Real-time streaming – Twitter data capture
KAFKA
Introduction –Applications - Architecture – Components – Replication – Distribution – Partitions –
Topics – Producer - Consumer – Broker - Installation – Workouts – Producing – Consuming –
Creation of Topics
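Partition assignment for keyed messages, covered above under Partitions and Topics, can be sketched as follows. Kafka's default partitioner applies murmur2 to the key bytes; a simple byte-sum hash stands in here for illustration:

```python
# Sketch: a Kafka producer picks partition = hash(key) % num_partitions.
NUM_PARTITIONS = 3

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    key_hash = sum(key.encode("utf-8"))  # stand-in for Kafka's murmur2
    return key_hash % num_partitions

# The same key always lands on the same partition, which is what
# preserves per-key ordering for consumers.
assert partition_for("order-42") == partition_for("order-42")
```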
SPARK
Introduction – Scala/Python – History – Overview – MR vs Spark – Spark Libraries – Why Spark –
RDDs – Spark Internals – Transformations – Actions – DAG – Fault Tolerance – Lineage –
Terminologies – Cluster types – Hadoop Integration – Spark SQL – Data frames – DataSets –
Optimizers – AST – Session – Structured Streaming– RDDs to Relations – Spark Streaming – Why
Spark Streaming– Data masking techniques – SCD implementation - Real time use cases – End to
end realtime integration with NIFI, Kafka, Spark Streaming, EC2, Cassandra, RDBMS, Different
Filesystems, Hive, Oozie & HBase
MICROSOFT AZURE
Azure HDInsight cluster creation - Overview of creating clusters for Kafka, Hadoop and Spark -
Deploying the streaming code in the Azure cluster
ZEYOBRON REAL TIME PROJECTS
Project 1 – Retail Project (Ecommerce)
MySQL incremental data processing in Hadoop using Hive –
Handling the incremental data generated by each e-commerce order transaction, which is updated in a SQL
table every day. The newly added data is imported to HDFS each day for processing. This workflow is
automated as the final part of the project.
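The daily incremental pull can be sketched in plain Python. Sqoop's `--incremental append` mode with `--check-column` and `--last-value` does this bookkeeping for real; the order rows here are invented:

```python
# Sketch: incremental import driven by a checkpoint on an increasing id column.
def incremental_import(source_rows, last_value):
    """Pull only rows whose id is greater than the saved checkpoint."""
    new_rows = [row for row in source_rows if row["id"] > last_value]
    new_last = max((row["id"] for row in new_rows), default=last_value)
    return new_rows, new_last

orders = [{"id": 1, "amount": 250}, {"id": 2, "amount": 90}]
batch1, checkpoint = incremental_import(orders, last_value=0)   # first full pull

orders.append({"id": 3, "amount": 410})                         # next day's order
batch2, checkpoint = incremental_import(orders, checkpoint)     # only the new row
```

Sqoop saved jobs persist the checkpoint between runs, which is what makes the daily automation possible.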
Project 2– Insurance Project
SQOOP HIVE SEMI-STRUCTURED DATA (AVRO) PROCESSING
Processing semi-structured data such as Avro, which offers rich schema evolution and efficient
serialization. Once the data is imported through Sqoop as an Avro file, the file is pulled to the edge node
to generate its schema, which is then transferred to HDFS. Once both the data and the schema are
available in HDFS, a Hive table is created on top of the data and processed using the AVSC schema file
with the help of the Avro deserializer.
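A minimal sketch of the AVSC schema file involved, with invented record and field names. Avro's schema evolution relies on new fields carrying defaults, so files written with the old schema stay readable under the new one:

```python
import json

# Sketch: an Avro schema (.avsc) as a JSON document, plus one evolution step.
schema_v1 = {
    "type": "record",
    "name": "Policy",
    "fields": [
        {"name": "policy_id", "type": "int"},
        {"name": "holder", "type": "string"},
    ],
}

# Evolved schema: a new optional field with a default, so data files
# written with schema_v1 can still be deserialized.
schema_v2 = json.loads(json.dumps(schema_v1))  # deep copy via JSON round-trip
schema_v2["fields"].append(
    {"name": "premium", "type": ["null", "double"], "default": None}
)

# Dumped as JSON, this is what the .avsc file on HDFS would contain for
# the Hive table's Avro schema reference.
avsc_text = json.dumps(schema_v2, indent=2)
```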
Project 3- Banking Project
Spark Hive HBase batch data processing and OLAP analysis in HBase
Batch data in HDFS is fetched through Spark, the necessary transformations are applied, and the result is
written into a Hive table mapped to an external HDFS location. The data is then inserted into a Hive
storage-handler table, which acts as a connector between external systems and HBase.
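The flow can be sketched in plain Python, with HBase-style storage modeled as a dict from row key to column-family cells; the table layout, row-key scheme and field names are illustrative, not the project's actual schema:

```python
# Sketch: transform batch rows, then store them keyed by row key the way
# HBase lays them out (row key -> "family:qualifier" -> value).
def transform(txn):
    """The 'necessary transformations': normalize and derive a row key."""
    return {
        "rowkey": f"{txn['account']}#{txn['date']}",
        "cf:amount": round(txn["amount"], 2),
        "cf:type": txn["type"].upper(),
    }

hbase_table = {}  # stand-in for the Hive-HBase handler table

batch = [{"account": "A1", "date": "2023-01-05", "amount": 99.999, "type": "credit"}]
for txn in batch:
    cell = transform(txn)
    hbase_table[cell.pop("rowkey")] = cell

# OLAP-style point lookup by row key, as HBase serves it:
row = hbase_table["A1#2023-01-05"]
```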
Project 4 – Telecom – Unstructured data Project using Spark and Kafka
Unstructured JSON data, streamed from a directory to Kafka through NiFi, is processed by Spark, where
the necessary UDFs are written to convert the unstructured data into structured form for processing with
Spark SQL.
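The UDF idea can be sketched in plain Python with invented field names; in the real project a function body like this would be registered as a Spark UDF and applied per row before querying with Spark SQL:

```python
import json

# Sketch: turn a raw, loosely structured JSON string into a fixed-schema record.
def to_structured(raw: str):
    """Parse raw JSON and coerce it to a fixed (caller, callee, duration) tuple."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed records become NULL rows
    return (
        str(doc.get("caller", "")),
        str(doc.get("callee", "")),
        int(doc.get("duration", 0)),
    )

records = [
    '{"caller": "911", "callee": "100", "duration": "42"}',
    "not json at all",
]
structured = [to_structured(r) for r in records]
# First record parses cleanly; the second yields None instead of crashing the job.
```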
Project 5 – Telecom JSON semi-structured data processing using Spark Structured Streaming and Kafka
RESTful data is fetched and produced to Kafka through NiFi, processed using Spark Structured Streaming,
and produced to another Kafka topic after the necessary transformations and actions are applied. From
there the data is fed to SQL for visualization in a Grafana dashboard.
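A plain-Python stand-in for the streaming hop, with lists standing in for the Kafka topics and an invented temperature transformation; Spark Structured Streaming performs the same consume-transform-produce loop continuously over unbounded data:

```python
import json

# Sketch: consume from an input "topic", transform each message, and
# produce to an output "topic". The message schema is made up.
input_topic = ['{"city": "Chennai", "temp_c": 31}', '{"city": "Delhi", "temp_c": 24}']
output_topic = []

def transform(message):
    doc = json.loads(message)
    doc["temp_f"] = doc["temp_c"] * 9 / 5 + 32  # the "necessary transformation"
    return json.dumps(doc)

for message in input_topic:  # one micro-batch over the available messages
    output_topic.append(transform(message))

# In the project, output_topic rows are produced back to Kafka and then
# fed to SQL for the Grafana dashboard.
```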