



[Since 2009]

In Exclusive Association with

26,500+ Participants | 10,000+ Brands | 1900+ Trainings | 55+ Countries

BIG DATA COURSE

is training partner for

in India to train their partners

Specialization in Hive, Pig, Sqoop, Flume, Oozie

Big Data Engineer


Big Data Engineer / ETL Engineer

Some of the core skills required for this role are:

1: Working in a big data environment/Hadoop ecosystem, which includes various tools like MapReduce, Hive, Sqoop, Pig, HBase, Spark, Flume, etc., and the ability to design solutions.

2: Well versed with the concepts of data warehousing, ETL, and the fundamentals of Hadoop: YARN, HDFS, MapReduce, etc.

3: Hands-on experience in big data ETL technologies like HiveQL, Pig Latin, Sqoop, Flume, etc.

4: Good knowledge of big data querying tools such as Hive, Beeline, Impala.

5: Familiarity with creating data pipelines and Oozie workflows for scheduling ETL batch processing jobs.

6: Experience with NoSQL databases such as HBase, Cassandra, MongoDB.

Are you passionate about data and the technology needed to drive a multi-billion-dollar business? This course is for you, and it is designed to address those opportunities for the role of Big Data/ETL Engineer. The primary description of this role includes:

Design and develop ETL platforms (using Hive, Pig, Sqoop, Flume, etc.) on Hadoop for various business use cases that are fault-tolerant, highly distributed, and robust.

Work on structured and semi-structured data to put it to business use. This involves organizing the data (collecting, storing, processing) and analyzing huge sets of structured and semi-structured data for business analytics solutions using cutting-edge, open-source tools and techniques.

With the massive upscaling of data on a daily basis, there is huge demand for people with the skills to manage and analyze data and to help organizations make data-driven decisions that solve business use cases. A growing field like this offers a new and challenging career opportunity to people who are enthusiastic about data, have a zeal to learn cutting-edge technologies, and bring a strong analytical and logical-reasoning mindset.


Salient Features

Who is this Course for

Programmers, Software Engineers, Data Engineers, Database Experts, Data Analysts, SQL Experts, Developers, Testers, Quality Engineers, Java/J2EE Developers, Python Developers

Course Highlights

Govt. of India (Vskills) Certified Course
3 Hrs/Week Live Instructor-Led Online Sessions
Lifetime Access to Updated Content and Videos
Industry and Academia Faculty
15 Days of Project Work
Specialize in Hive, Pig, Sqoop, Flume, Oozie
Active Q/A Forum
Placement Support
Class Labs/Home Assignments (10 Hours/Week Learning Time)
Personalised Training Program
Industry's Top Big Data Advisors

Top Big Data Tools Covered

Course Advisors

Shweta Gupta, Vice President, Tech.

Shweta Gupta has 19+ years of technology leadership experience. She holds a patent and a number of publications in ACM, IEEE, and IBM journals such as Redbooks and developerWorks.

Manas Garg, Architect

Manas Garg heads Analytics for Marketing at PayPal, making data-driven decisions for marketing success.

Vishal Mishra, CEO & Co-Founder

Vishal is a technology influencer and CEO of Right Relevance, a platform used by millions for content and influencer discovery.


Course Trainers

ROHIT KUMAR

Rohit Kumar is a big data researcher with publications in many prestigious international conferences. He has 6+ years of industry experience and expertise in various programming languages, including Java, Scala, C++, Python, and Haskell. He has worked with a variety of database systems, such as MySQL, Microsoft SQL Server, and Oracle Coherence, and with many big data systems, including Hadoop, Apache Spark, Apache Storm, Kafka, and MongoDB.

Course Curriculum

Foundation Courseware (5 Weeks)

Introduction to Big Data Storage
What is Big Data (Evolution)
Introduction to Big Data
Problems with Traditional Large-Scale Systems
Introduction to Distributed File Systems/Computing

Introduction to Hadoop and HDFS
Limitations and Solutions of Traditional Systems
Motivation for Hadoop
History of Hadoop
Benefits of Hadoop
Hadoop Ecosystem

Big Data Solution Landscape
Industry Insight: Use Cases of Big Data Analytics
Big Data Technology Career Path
Cloudera Hadoop Docker Image Installation


Hadoop Architecture
Hadoop 1.x Core Components
Hadoop 2.x Core Components
Fundamentals of Hadoop
Hadoop Master-Slave Architecture
YARN for Resource Management
Different Types of Cluster Setups
Understanding Hadoop Configuration Files
Hadoop Security
HDFS Architecture
Hadoop Fault Tolerance
Hands-On Exercise: HDFS Commands

Processing Framework: MapReduce
Understanding MapReduce
MapReduce Overview
Data Flow of MapReduce
YARN MapReduce Detail Flow
Concept of Mapper & Reducer
Speculative Execution
Hadoop Fault Tolerance
Submission & Initialization of a MapReduce Job
Monitoring & Progress of a MapReduce Job
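The Mapper & Reducer concept in this module can be sketched in plain Python, in the style of a Hadoop Streaming job. This is an illustrative single-machine simulation, not course material; the function names and sample input are made up.

```python
# Illustrative sketch of the MapReduce mapper/reducer concept.
# In a real Hadoop Streaming job, Hadoop feeds input lines to the mapper,
# shuffles and sorts the emitted (key, value) pairs by key, and feeds the
# grouped pairs to the reducer.

from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit (word, 1) for every word in an input line."""
    for word in line.strip().split():
        yield (word.lower(), 1)

def reducer(pairs):
    """Sum the counts for each key; pairs must be sorted by key."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

def run_job(lines):
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    mapped = [pair for line in lines for pair in mapper(line)]
    mapped.sort(key=itemgetter(0))  # the "shuffle & sort" phase
    return dict(reducer(mapped))

# Example: word count over two lines of input.
counts = run_job(["big data big wins", "data pipelines"])
# counts == {"big": 2, "data": 2, "pipelines": 1, "wins": 1}
```

The same mapper/reducer pair, reading stdin and writing stdout, is essentially what a Hadoop Streaming word-count job distributes across a cluster.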

Data Storage: RDBMS, NoSQL Database - HBase
Introduction to NoSQL Databases
NoSQL vs RDBMS
NoSQL Database Types
Introduction to HBase
HBase vs RDBMS
HBase Architecture
HBase Components

Big Data and Cloud Platforms
Introduction to Cloud Platforms
Introduction to Cloud Computing
Cloud Computing Models
Understanding Public, Private, and Hybrid Cloud
Characteristics of Cloud Computing
Major Players in the Market - AWS, Azure, Google Cloud
Overview of Amazon Web Services
Amazon Web Services Cloud Platform
Big Data on Cloud - Amazon EMR
Amazon Cloud Storage - S3
Adoption of AWS in the Public and Private Sector


Big Data Engineer

Data Warehousing Platforms on Hadoop
Overview of ETL and Data Warehousing

Data Query and Reporting - I: Data Analysis Using Hive
Introduction to Hive
Hive Architecture and Query Flow
Introduction to HiveServer2 - Beeline
Hive Table Types
Hive Operations on CLI and HUE
Hive File Formats
Partitioning, Bucketing, Views, Indexes
HiveQL Scripting - JOINs, Partitioning, Execution
Hive Integration with Spark: PySpark
Hive Integration with HBase: CRUD Operations
Hands-On Exercise
Quiz
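As a rough illustration of the partitioning topic in this module: Hive stores each partition of a table in its own directory, named after the partition-column values, so queries that filter on those columns can skip unrelated directories (partition pruning). The sketch below mimics that path layout in Python; the table name, columns, and warehouse path are hypothetical.

```python
# Illustrative sketch of Hive-style partition layout on HDFS.
# A table partitioned by (dt, country) stores each row under a directory
# whose path encodes the partition key values.

def partition_path(table, row, partition_cols):
    """Return the HDFS-style directory a row would land in."""
    parts = "/".join(f"{col}={row[col]}" for col in partition_cols)
    return f"/user/hive/warehouse/{table}/{parts}"

row = {"tweet_id": 1, "text": "hello", "dt": "2020-03-15", "country": "IN"}
path = partition_path("tweets", row, ["dt", "country"])
# path == "/user/hive/warehouse/tweets/dt=2020-03-15/country=IN"
```

A query such as `... WHERE dt = '2020-03-15'` would then only read files under that one directory instead of scanning the whole table.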


Data Query and Reporting - II: Data Analysis Using Pig
Introduction to Pig
Pig Architecture
Hive vs Pig: Use Cases
Pig Latin Datatypes
Pig Modes
Pig Latin Program & Execution
Pig Latin Operations
Data Analysis Using Pig on Structured Data
Data Analysis Using Pig on Semi-structured Data
Hands-On Exercise
Quiz

Data Query and Reporting - III: Data Analysis Using Impala
Introduction to Impala
Impala Architecture
Hive vs Impala
Hands-On Exercise
Quiz

Data Movement Frameworks: Sqoop and Flume
Introduction to Sqoop and Its Architecture
Import Data from RDBMS to HDFS & Hive
Export Data from HDFS & Hive to RDBMS
Introduction to Flume and Its Architecture
Real-time Streaming Data Ingestion into Hive
Hands-On Exercise
Quiz
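A toy sketch of what a Sqoop import automates: reading rows from an RDBMS table and writing them out as delimited text lines, the kind of files an import places on HDFS. It uses Python's built-in sqlite3 as a stand-in database; the `users` table and its columns are made up for illustration.

```python
# Toy, single-machine stand-in for a Sqoop import: RDBMS rows -> delimited
# text lines. Real Sqoop runs this as parallel MapReduce tasks against a
# production database and writes the output to HDFS.

import sqlite3

def import_table(conn, table, delimiter=","):
    """Dump every row of `table` as delimiter-separated lines.
    Note: `table` is interpolated directly for brevity; a real tool
    must not build SQL from untrusted input."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    return [delimiter.join(str(v) for v in row) for row in cursor]

# Hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "asha"), (2, "ravi")])

lines = import_table(conn, "users")
# lines == ["1,asha", "2,ravi"]
```

The reverse direction (delimited files back into database rows) is the essence of a Sqoop export.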

Job Scheduling Framework: Oozie
Introduction to Oozie
Oozie Architecture
Fork and Join Control Nodes, Oozie Coordinator
Oozie Workflow Setup and Execution: HiveQL, Shell, Pig Latin Scripts
Hands-On Exercise
Quiz
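The fork/join idea behind an Oozie workflow can be illustrated with a small dependency-ordered runner in Python: a fork launches independent actions, and a join waits for all of them before the next action runs. This is a conceptual sketch of the control flow, not how Oozie itself works (Oozie workflows are XML definitions executed on the cluster); the action names are hypothetical.

```python
# Conceptual sketch of Oozie-style fork/join control flow as a dependency
# graph: an action runs only after all actions it joins on have finished.

def run_workflow(deps):
    """deps: action -> list of actions it must wait for (join semantics).
    Returns one valid execution order."""
    order = []
    done = set()

    def run(action):
        if action in done:
            return
        for dep in deps[action]:   # the "join": wait for all dependencies
            run(dep)
        done.add(action)
        order.append(action)

    for action in deps:            # the "fork": independent actions may
        run(action)                # start in any order
    return order

# A fork launches sqoop_import and flume_ingest in parallel; the join
# makes hive_load wait for both before it runs.
deps = {
    "sqoop_import": [],
    "flume_ingest": [],
    "hive_load": ["sqoop_import", "flume_ingest"],
}
order = run_workflow(deps)
# "hive_load" is always last in the returned order
```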

Prerequisite: Basic Unix Commands, RDBMS, SQL

9 Weeks


Capstone Project (3 Weeks)

Tools Covered

Tools for Everyone

Tools for Big Data Engineer

With the advancement of social media, gathering and analyzing social media data for marketing and trend analysis is becoming quite popular. Among all the social media platforms currently available, Twitter is one of the most popular, where people actively share their views. Many companies mine and analyze Twitter data for various purposes, for example:

1. Voter sentiment analysis during elections, by analyzing the polarity of tweets and the sentiments of people after a political event; many believe Twitter played a critical role in the 2016 US election.

2. Marketing of new products, such as understanding when, where, and how consumers speak about purchasing your product or category, and tracking changes over time.

3. Evaluating campaign impact, by assessing whether your latest creative campaign generated social buzz and reviewing which interest segments the campaign resonated with most.

Currently, approximately 500 million tweets are generated daily. That's more than 50 GB of data every day, and storing and analyzing it efficiently is one of the most challenging tasks in Twitter data analysis.

As part of the Big Data Engineering Specialization, we will look into the following:

Explore various ways to analyze Twitter data in a few typical business scenarios.

Solve interesting assignments analyzing Twitter data.

Use Hive to store and analyze Twitter data, and use Pig Latin to perform similar analysis.

Explore how to swiftly switch from one technology to another in a problem-solving situation.

Build a complete ETL pipeline flow using Sqoop/Flume.

Advanced (extra credit): build the same pipeline using PySpark/SparkSQL.
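To give a flavour of the kind of analysis the capstone performs in HiveQL or Pig Latin, here is a tiny pure-Python stand-in: extracting hashtags from tweet text and counting how often each appears. The sample tweets are made up for illustration.

```python
# Tiny stand-in for a Hive/Pig hashtag-frequency query over tweet text.

from collections import Counter

def hashtag_counts(tweets):
    """Count occurrences of each #hashtag across a list of tweet texts."""
    tags = (word.lower().rstrip(".,!?")
            for tweet in tweets
            for word in tweet.split()
            if word.startswith("#"))
    return Counter(tags)

tweets = [
    "Loving the new phone! #tech #gadgets",
    "#tech stocks are up today",
]
counts = hashtag_counts(tweets)
# counts["#tech"] == 2, counts["#gadgets"] == 1
```

In the capstone, the same aggregation would be expressed as a GROUP BY in HiveQL or a GROUP/FOREACH in Pig Latin over tweets stored in HDFS.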


Batch Options

Weekend Batch Option

Duration: 14 Weeks

Certification

Certificate By: Digital Vidya

Fee: INR 29,900 (+GST)

Interested? Contact Us!

+91-84680-02880
[email protected]
www.digitalvidya.com