Ph no. 7395899448

No. 3, Rajeswari Nagar, Vengathur, Ondikuppam, Chennai - 602002

We offer:

About Us:

Zeyobron Analytics was established in 2018 by a team of Big Data enthusiasts and experts. Today it is one of the leading IT training and software development companies. We leave no stone unturned in providing world-class training, helping our students learn Big Data-oriented business applications such as Hadoop, Analytics, Data Science, and the Internet of Things (IoT), the fast-growing, trend-setting technologies that provide a competitive advantage in the ever-changing IT world.

Our training methodology is designed to shape your career for the future. We attend to every individual student, and highly skilled, experienced industry experts help them clear their doubts about the course. These courses are in high demand in the corporate field, and by becoming an expert in one of them you will be offered jobs from top MNCs, eventually helping you settle into your career.

The Outcome of Training:

Certification is provided on successful completion of the course
Lifetime validity and assistance for students through phone and social media
Hands-on experience on real-time projects
Goal-oriented training and different learning methodologies for different students
Mock interviews are held to make it easier for you to clear the main interviews
We help you prepare an impressive resume highlighting the big data projects and the courses you have learned
Our team helps you get placed by sharing job news 24/7
Tests are conducted at the end of every session to make sure students are up to date
Our expert trainers cover batch, interactive, and real-time data processing
You will learn to deploy, integrate, install, and configure the various advanced tools associated with the different courses

Course Content for Hadoop, Spark & Cloud

BIG DATA
Evolution of Data – Introduction to Big Data – Classification – Size Hierarchy – Why Big Data is Trending (IoT, DevOps, Cloud Computing, Enterprise Mobility) – Challenges in Big Data – Characteristics – Tools for Big Data – Why Big Data draws attention in the IT industry – What do we do with Big Data – How Big Data can be analyzed – Typical Distributed System – Drawbacks in Traditional Distributed Systems

LINUX
History and Evolution – Architecture – Development Commands – Env Variables – File Management – Directory Management – Admin Commands – Advanced Commands – Shell Scripting – Groups and User Management – Permissions – Important Directory Structure – Disk Utilities – Compression Techniques – Misc Commands

HADOOP: HDFS (1 and 2)
What is Hadoop? – Evolution of Hadoop – Features of Hadoop – Characteristics of Hadoop – Hadoop compared with Traditional Distributed Systems – When to use Hadoop – When not to use Hadoop – Components of Hadoop (HDFS & MapReduce) – Hadoop Architecture – Daemons in Hadoop Versions 1 & 2 – How data is stored in Hadoop (Cluster, Datacenter, Split, Block, Rack Awareness, Replication, Heartbeat) – Hadoop 1.0 Limitations – NameNode High Availability – NameNode Federation – How metadata is stored on disk (FSImage & Edit log) – Role of the Secondary NameNode – Anatomy of File Read & File Write – Data Integrity – Serialization – Compression – What happens when copying data in a Hadoop cluster? – CentOS Linux Commands Exercise – Hadoop Next Gen (v2) single-node pseudo-mode cluster installation – Hadoop Commands Exercise

SQOOP – RDBMS
Introduction & History – Installation and Configuration – Why Sqoop – In-depth Architecture – Sqoop Import Properties – Sqoop Export Architecture – Commands (Import to HDFS, Hive, HBase from MySQL) – Export – Incremental Import – Saved Jobs – Import All Tables – Sqoop Workouts – Sqoop Best Practices & Performance Tuning – Sqoop Import/Export Use Cases – Mock Test on Sqoop

HIVE – SQL & OLAP Layer on Hadoop
Introduction – Architecture – Hive vs RDBMS – Detailed Installation (Metastore, Integrating with Hue) – Starting the Metastore and HiveServer – Data Types (Primitive, Collection) – Creating Tables (Managed, External) and DML Operations (Load, Insert, Export) – Managed vs External Tables – QL Queries (select, where, group by, having, sort by, order by) – Hive access through the Hive Client, Beeline and Hue – File Formats (RC, ORC, Sequence) – Partitioning (Static and Dynamic), Partitioning with External Tables, Dropping Partitions and the Corresponding Configuration Parameters – Bucketing, Partitioning vs Bucketing – Views, Different Types of Joins (Inner, Outer) – Queries (Union, Union All, Intersection, Minus) – Adding Files to the Distributed Cache and Jars to the Classpath – Optimized Joins (Map-side Join, Bucketing Join) – Compression on Tables (LZO, Snappy) – SerDes (XML SerDe, JSON SerDe) – Parallel Execution, Sampling Data, Speculative Execution – Two POCs using a large dataset on the above topics – Mock Test on Hive and Its Architecture

HADOOP – PROCESSING ARCHITECTURE
Hadoop Ecosystem Road Map – MapReduce Flow – MapReduce Job Submission in a YARN Cluster in Detail – What is MapReduce? – How MapReduce works at a high level – Types of Input and Output Formats – MapReduce in Detail – Different file types supported (Text, Sequence, Map and Avro) – Storage & Processing Daemons Architecture (Version 1) – Role of the Job Tracker and Task Tracker – YARN daemons (Resource Manager, Application Master, Node Manager), Architecture and Failure Handling – Schedulers – Resource Manager High Availability – YARN Architecture

SCALA
Scala Introduction – History – Why Scala – Scala Installation – Get deep insights into the functioning of Scala – Execute Pattern Matching in Scala – OOP Concepts (Classes, Objects, Collections, Inheritance, Abstraction and Encapsulation) – Functional Programming in Scala (Closures, Currying, Expressions, Anonymous Functions) – Know the concepts of classes in Scala – Object Orientation in Scala (Primary and Auxiliary Constructors, Singleton Objects, Companion Objects) – Traits – Abstract Classes

HBASE – Hadoop Database
Introduction to NoSQL – Types of NoSQL – Characteristics of NoSQL – CAP Theorem – What is HBase – Brief History – Row vs Column Storage – HDFS vs HBase – RDBMS vs HBase – Storage Hierarchy – Characteristics – Table Design – HMaster – Regions – Region Server – Inside the Region Server – HBase Architecture (Read Path, Write Path, Compactions, Splits) – Minor/Major Compactions – Region Splits – Installation – Configuration – Role of ZooKeeper – HBase Shell – Introduction to Filters – Row Key Design – MapReduce Integration – Performance Tuning – Hands-on – Mock Test on HBase and Its Architecture

CASSANDRA
Introduction to Cassandra – Comparison with other DBs – Installation of a Single-Node Cassandra (Lab) – Keyspaces – CQL using cqlsh and commands (Hands-on Lab) – CQL Data Types – Cassandra Architecture – Failure Detection and Recovery (Fault Tolerance) – Data Replication – Spark Integration with Cassandra (get and put)

OOZIE – Workflow Scheduling and Monitoring Tool
Introduction – History – Why Oozie – Components – Architecture – Layers – Workflow Engine – Nodes – Workflow – Coordinator – Actions (MapReduce, Hive, Pig and Sqoop) – Introduction to Bundles – Email Notification – Error Handling – Installation – Workouts

PHOENIX
Introduction – Architecture – Data Model – Indexes – Concepts – Installation – Shell – Bulk Loading – Salt Bucketing – Phoenix Script – Workouts

NIFI
Introduction – Core Components – Architecture – UI – Data Provenance – NiFi – Kafka – Spark – Installation – Workouts – Real-Time Streaming – Twitter Data Capture

KAFKA
Introduction – Applications – Architecture – Components – Replication – Distribution – Partitions – Topics – Producer – Consumer – Broker – Installation – Workouts – Producing – Consuming – Creation of Topics

SPARK
Introduction – Scala/Python – History – Overview – MR vs Spark – Spark Libraries – Why Spark – RDDs – Spark Internals – Transformations – Actions – DAG – Fault Tolerance – Lineage – Terminologies – Cluster Types – Hadoop Integration – Spark SQL – DataFrames – Datasets – Optimizers – AST – Session – Structured Streaming – RDDs to Relations – Spark Streaming – Why Spark Streaming – Data Masking Techniques – SCD Implementation – Real-Time Use Cases – End-to-end real-time integration with NiFi, Kafka, Spark Streaming, EC2, Cassandra, RDBMS, Different Filesystems, Hive, Oozie & HBase

MICROSOFT AZURE
Azure HDInsight Cluster Creation – Overview of creating clusters for Kafka, Hadoop and Spark – Deploying the streaming code in an Azure cluster

ZEYOBRON REAL-TIME PROJECTS

Project 1 – Retail Project (E-commerce)

MySQL incremental data processing in Hadoop using Hive

Handling the incremental data from each e-commerce order transaction, which gets updated in a SQL table every day. The data grows daily and is imported into HDFS for processing; automating this pipeline forms the final part of the project.
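As an illustration only, a Sqoop saved job of the following shape could drive such an incremental import; the connection string, table, column and path names here are hypothetical, not the project's actual values.

    # Hypothetical connection, table and path names, for illustration only
    sqoop job --create orders_incremental -- import \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username retail_user --password-file /user/sqoop/.db_password \
      --table orders \
      --incremental append \
      --check-column order_id \
      --last-value 0 \
      --target-dir /user/hive/warehouse/retail.db/orders

    # Each scheduled run imports only the new rows; Sqoop updates --last-value itself
    sqoop job --exec orders_incremental

For rows that are updated in place rather than only appended, --incremental lastmodified together with a --merge-key column is the usual alternative.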

Project 2 – Insurance Project

SQOOP HIVE SEMI-STRUCTURED DATA (AVRO) PROCESSING

Processing semi-structured data such as Avro, which offers rich schema evolution and efficient serialization. Once the data has been imported through Sqoop as an Avro file, we pull the Avro file to the edge node to generate its schema and transfer the schema to HDFS. Once the data and schema are available in HDFS, a Hive table is created on top of the data and processed using the AVSC schema file with the help of the Avro deserializer (SerDe).
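As a rough illustration of this flow (the database, table, schema and path names are assumptions, not the project's actual values):

    # Import the table as Avro data files
    sqoop import --connect jdbc:mysql://dbhost:3306/insurance \
      --username ins_user -P --table policies \
      --as-avrodatafile --target-dir /data/insurance/policies

    # Pull one Avro file to the edge node, extract its schema, push the .avsc to HDFS
    # (avro-tools may also be invoked as: java -jar avro-tools.jar getschema ...)
    hdfs dfs -get /data/insurance/policies/part-m-00000.avro .
    avro-tools getschema part-m-00000.avro > policies.avsc
    hdfs dfs -put policies.avsc /schemas/insurance/

Then, in Hive, the table is declared on top of the data using the AVSC file and the Avro SerDe:

    CREATE EXTERNAL TABLE policies
    STORED AS AVRO
    LOCATION '/data/insurance/policies'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/insurance/policies.avsc');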

Project 3 – Banking Project

Spark-Hive-HBase batch data processing and OLAP analysis in HBase

The batch data in HDFS is fetched through Spark, the necessary transformations are applied, and the result is written into a Hive table backed by an external HDFS location. The data is then inserted into a Hive storage-handler table, which acts as the connector between external systems and HBase.
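A minimal Spark (Scala) sketch of this flow; the paths, database and table names and the transformation are assumptions, and it presumes the staging table and the HBase-handler table already exist in Hive.

    // Minimal sketch; paths, table names and the transformation are assumptions.
    import org.apache.spark.sql.SparkSession

    object BankingBatchLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("banking-batch-to-hbase")
          .enableHiveSupport()
          .getOrCreate()

        // Fetch the batch data from HDFS and apply the required transformations
        val txns = spark.read.option("header", "true")
          .csv("hdfs:///data/banking/transactions")   // assumed location
          .filter("amount IS NOT NULL")

        // Stage into the Hive table that points at an external HDFS location
        txns.write.mode("overwrite").insertInto("banking.txn_staging")

        // The handler table was created in Hive with
        //   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        // and an hbase.columns.mapping; inserting into it pushes the rows to HBase,
        // where they can be queried for OLAP-style analysis.
        spark.sql("INSERT INTO TABLE banking.txn_hbase SELECT * FROM banking.txn_staging")

        spark.stop()
      }
    }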

Project 4 – Telecom – Unstructured Data Project using Spark and Kafka

Unstructured JSON data streamed from a directory into Kafka through NiFi is processed by Spark; the necessary UDFs are written to convert the unstructured data into structured data so that it can be processed with Spark SQL.
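A minimal Scala sketch of the UDF-plus-Spark-SQL step only (the NiFi and Kafka ingestion side is omitted); the input path and the JSON field names are assumptions.

    // Minimal sketch: turn loosely structured JSON strings into columns with a UDF,
    // then query them through Spark SQL. Path and field names are assumptions.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    object TelecomUdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("telecom-udf").getOrCreate()
        import spark.implicits._

        // Raw records, one loosely structured JSON string per line
        val raw = spark.read.textFile("hdfs:///data/telecom/raw").toDF("line")

        // UDF that pulls one named field out of the raw string
        def field(name: String) = udf { line: String =>
          val parts = line.split("\"" + name + "\":\"")
          if (parts.length > 1) parts(1).takeWhile(_ != '"') else ""
        }

        val structured = raw
          .withColumn("caller", field("caller")($"line"))
          .withColumn("cell", field("cell")($"line"))

        // Once structured, ordinary Spark SQL applies
        structured.createOrReplaceTempView("cdr")
        spark.sql("SELECT cell, COUNT(*) AS calls FROM cdr GROUP BY cell").show()

        spark.stop()
      }
    }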

Project 5 – Telecom – Semi-structured JSON data processing using Spark Structured Streaming and Kafka

RESTful data is fetched and produced to Kafka through NiFi, processed with Spark Structured Streaming, and, after the necessary transformations and actions are applied, produced to another Kafka topic. From there the data is fed into SQL so that the results can be visualized on a Grafana dashboard.
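A minimal Spark Structured Streaming sketch of the Kafka-in/Kafka-out part of this pipeline; the broker address, topic names, record schema and checkpoint path are assumptions, and the SQL/Grafana hand-off is left out.

    // Minimal sketch: read JSON records from one Kafka topic, transform them, and
    // write the result to another topic. Broker, topics and schema are assumptions.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object TelecomStructuredStreaming {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("telecom-structured-streaming").getOrCreate()
        import spark.implicits._

        val schema = new StructType()
          .add("caller", StringType)
          .add("cell", StringType)
          .add("duration", IntegerType)

        // Semi-structured JSON records produced to Kafka by NiFi
        val parsed = spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "telecom-in")
          .load()
          .select(from_json(col("value").cast("string"), schema).as("r"))
          .select("r.*")

        // Apply the transformations and produce the result to another Kafka topic;
        // a downstream consumer can land it in SQL for the Grafana dashboard.
        val out = parsed.filter($"duration" > 0)
          .select(to_json(struct($"caller", $"cell", $"duration")).as("value"))

        out.writeStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("topic", "telecom-out")
          .option("checkpointLocation", "hdfs:///tmp/telecom-ss-checkpoint")
          .start()
          .awaitTermination()
      }
    }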