[Since 2009]
26,500+ Participants | 10,000+ Brands | 1900+ Trainings | 55+ Countries
BIG DATA COURSE
Specialization in Hive, Pig, Sqoop, Flume, Oozie
Big Data Engineer / ETL Engineer
Some of the core skills required for this role are:
1. Working in a big data environment with the Hadoop ecosystem, which includes tools such as MapReduce, Hive, Sqoop, Pig, HBase, Spark, and Flume, and being able to design solutions with them.
2. Being well versed in the concepts of data warehousing, ETL, and the fundamentals of Hadoop, YARN, HDFS, MapReduce, etc.
3. Hands-on experience with big data ETL technologies such as HiveQL, Pig Latin, Sqoop, and Flume.
4. Good knowledge of big data querying tools such as Hive, Beeline, and Impala.
5. Familiarity with creating data pipelines and Oozie workflows for scheduling ETL batch processing jobs.
6. Experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
With the massive daily growth of data, there is huge demand for people with the skills to manage and analyze it and to help organizations make data-driven decisions that solve business use cases. A growing field like this offers a new and challenging career opportunity to people who are enthusiastic about data, keen to learn cutting-edge technologies, and equipped with a strong analytical and logical mindset.

Are you passionate about data and the technology needed to drive a multi-billion-dollar business? This course is for you and is designed to prepare you for the role of Big Data/ETL Engineer. The primary description of this role includes:

Design and develop ETL platforms (using Hive, Pig, Sqoop, Flume, etc.) on Hadoop for various business use cases, making them fault-tolerant, highly distributed, and robust.

Work with structured and semi-structured data to put it to business use. This involves organizing the data (collecting, storing, processing) and analyzing huge sets of structured and semi-structured data for business analytics solutions using cutting-edge open-source tools and techniques.
Salient Features

Who is this Course for
Programmers, Software Engineers, Data Engineers, Database Experts, Data Analysts, SQL Experts, Developers, Testers, Quality Engineers, Java/J2EE Developers, Python Developers

Course Highlights
Govt. of India (Vskills) Certified Course
3 Hrs/Week Live Instructor-Led Online Sessions
Lifetime Access to Updated Content and Videos
Industry and Academia Faculty
15 Days of Project Work
Specialization in Hive, Pig, Sqoop, Flume, Oozie
Active Q&A Forum
Placement Support
Class Labs/Home Assignments (10 Hours/Week Learning Time)
Personalised Training Program
Industry's Top Big Data Advisors

Top Big Data Tools Covered

Course Advisors
Shweta Gupta, Vice President, Tech.
Shweta Gupta has 19+ years of technology leadership experience. She holds a patent and has a number of publications in ACM, IEEE, and IBM journals such as Redbooks and developerWorks.

Manas Garg, Architect
Manas Garg heads Analytics for Marketing at PayPal, where he drives data-driven decisions for marketing success.

Vishal Mishra, CEO & Co-Founder
Vishal is a technology influencer and the CEO of Right Relevance, a platform used by millions for content and influencer discovery.
Course Trainers

Rohit Kumar
Rohit Kumar is a big data researcher with publications in many prestigious international conferences. He has six-plus years of industry experience and expertise in various programming languages, including Java, Scala, C++, Python, and Haskell. He has worked with a variety of database systems, such as MySQL, Microsoft SQL Server, and Oracle Coherence, and with many big data systems, including Hadoop, Apache Spark, Apache Storm, Kafka, and MongoDB.

Course Curriculum

Foundation Courseware (5 Weeks)

Introduction to Big Data Storage: What is Big Data (Evolution); Introduction to Big Data; Problems with Traditional Large-Scale Systems; Introduction to Distributed File Systems/Computing

Introduction to Hadoop and HDFS: Limitations and Solutions of Traditional Systems; Motivation for Hadoop; History of Hadoop; Benefits of Hadoop; Hadoop Ecosystem

Big Data Solution Landscape: Industry Insight - Use Cases of Big Data Analytics; Big Data Technology Career Path; Cloudera Hadoop Docker Image Installation
Hadoop Architecture: Hadoop 1.x Core Components; Hadoop 2.x Core Components; Fundamentals of Hadoop; Hadoop Master-Slave Architecture; YARN for Resource Management; Different Types of Cluster Setups; Understanding Hadoop Configuration Files; Hadoop Security; HDFS Architecture; Hadoop Fault Tolerance; Hands-On Exercise: HDFS Commands
Processing Framework (MapReduce): Understanding MapReduce; MapReduce Overview; Data Flow of MapReduce; YARN MapReduce Detailed Flow; Concept of Mapper & Reducer; Speculative Execution; Hadoop Fault Tolerance; Submission & Initialization of a MapReduce Job; Monitoring & Progress of a MapReduce Job
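The mapper-and-reducer flow covered in this module can be sketched in plain Python. This is a toy, single-machine illustration of the classic word-count pattern, not Hadoop's actual API: the map step emits (word, 1) pairs, a shuffle step groups them by key, and the reduce step sums each group.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum all counts collected for one key.
    return word, sum(counts)

def run_job(lines):
    # Shuffle/sort phase: group the intermediate pairs by key
    # before handing each group to the reducer.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(w, c) for w, c in grouped.items())
```

On a real cluster the mapper and reducer run on many machines in parallel and the shuffle moves data across the network; the three-phase structure is the same.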
Data Storage (RDBMS and NoSQL Databases - HBase): Introduction to NoSQL Databases; NoSQL vs RDBMS; NoSQL Database Types; Introduction to HBase; HBase vs RDBMS; HBase Architecture; HBase Components
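A rough mental model for the HBase storage layout covered in this module is a sparse, sorted map: row key, then column family, then column qualifier, then value. The toy Python sketch below uses nested dicts to illustrate that shape; it is not the real HBase client API, and it omits cell versioning by timestamp.

```python
class ToyHBaseTable:
    """Toy model of an HBase table: row key -> column family -> qualifier -> value."""

    def __init__(self, column_families):
        # Column families are fixed at table-creation time, as in HBase.
        self.column_families = set(column_families)
        self.rows = {}  # sparse: only stored cells take space

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        # Missing cells simply come back as None: the table is sparse.
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

    def scan(self):
        # HBase stores rows sorted by row key; emulate that with sorted().
        for row_key in sorted(self.rows):
            yield row_key, self.rows[row_key]
```

Unset cells cost nothing to store, which is why HBase suits wide, sparse tables with millions of possible columns per row.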
Big Data and Cloud Platforms (Introduction to Cloud Platforms): Introduction to Cloud Computing; Cloud Computing Models; Understanding Public, Private, and Hybrid Clouds; Characteristics of Cloud Computing; Major Players in the Market - AWS, Azure, Google Cloud; Overview of Amazon Web Services; Amazon Web Services Cloud Platform; Big Data on the Cloud - Amazon EMR; Amazon Cloud Storage - S3; Adoption of AWS in the Public and Private Sectors
Big Data Engineer Specialization
Data Warehousing Platforms on Hadoop: Overview of ETL and Data Warehousing
Data Query and Reporting - I (Data Analysis Using Hive): Introduction to Hive; Hive Architecture and Query Flow; Introduction to HiveServer2 - Beeline; Hive Table Types; Hive Operations on CLI and HUE; Hive File Formats; Partitioning, Bucketing, Views, Indexes; HiveQL Scripting - JOINs, Partitioning, Execution; Hive Integration with Spark: PySpark; Hive Integration with HBase: CRUD Operations; Hands-On Exercise; Quiz
Data Query and Reporting - II (Data Analysis Using Pig): Introduction to Pig; Pig Architecture; Hive vs Pig: Use Cases; Pig Latin Datatypes; Pig Modes; Pig Latin Program & Execution; Pig Latin Operations; Data Analysis Using Pig on Structured Data; Data Analysis Using Pig on Semi-Structured Data; Hands-On Exercise; Quiz
Data Query and Reporting - III (Data Analysis Using Impala): Introduction to Impala; Impala Architecture; Hive vs Impala; Hands-On Exercise; Quiz
Data Movement Frameworks (Sqoop and Flume): Introduction to Sqoop and Its Architecture; Import Data from RDBMS to HDFS & Hive; Export Data from HDFS & Hive to RDBMS; Introduction to Flume and Its Architecture; Real-Time Streaming Data Ingestion into Hive; Hands-On Exercise; Quiz
Job Scheduling Framework (Oozie): Introduction to Oozie; Oozie Architecture; Fork and Join Control Nodes; Oozie Coordinator; Oozie Workflow Setup and Execution: HiveQL, Shell, and Pig Latin Scripts; Hands-On Exercise; Quiz
Prerequisites: Basic Unix Commands, RDBMS, SQL
9 Weeks
Capstone Project (3 Weeks)
With the advancement of social media, gathering and analyzing social media data for marketing and trend analysis has become quite popular. Among all the social media platforms available today, Twitter is one of the most popular, and people share their views on it actively. Many companies mine and analyze Twitter data for various purposes, for example:
1. Voter sentiment analysis during elections, by analyzing the polarity of tweets and the sentiments of people after a political event; many believe Twitter played a critical role in the 2016 US election.
2. Marketing of new products, such as understanding when, where, and how consumers speak about purchasing your product or category, and tracking changes over time.
3. Evaluating campaign impact, by assessing whether your latest creative campaign generated social buzz and reviewing which interest segments the campaign resonated with most.
Currently, approximately 500 million tweets are generated daily! That is more than 50 GB of data every day, and storing and analyzing it efficiently is one of the most challenging tasks in Twitter data analysis.
As part of the Big Data Engineering Specialization, we will:
Explore various ways to analyze Twitter data in a few typical business scenarios.
Solve interesting assignments in which we analyze Twitter data.
Use Hive to store and analyze Twitter data, and perform similar analyses with Pig Latin.
Explore how to switch swiftly from one technology to another in a problem-solving situation.
Build a complete ETL pipeline flow using Sqoop/Flume.
Advanced (extra credit): build the same pipeline using PySpark/SparkSQL.
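To give a flavour of the batch aggregations in this project, here is a minimal pure-Python sketch that counts hashtags in a batch of tweets. The tweets are invented for illustration; over real data, the same grouping is what a HiveQL GROUP BY or a Pig GROUP ... FOREACH ... COUNT would express.

```python
from collections import Counter

def extract_hashtags(tweet):
    # Pull out #hashtag tokens, normalised to lower case,
    # stripping trailing punctuation.
    return [w.lower().rstrip(".,!?") for w in tweet.split() if w.startswith("#")]

def top_hashtags(tweets, n=3):
    # Batch aggregation over the whole tweet set: equivalent in spirit to
    # SELECT tag, COUNT(*) FROM tweets GROUP BY tag ORDER BY COUNT(*) DESC.
    counts = Counter(tag for t in tweets for tag in extract_hashtags(t))
    return counts.most_common(n)
```

The same logic ports almost directly to PySpark for the extra-credit pipeline, with the generator expression replaced by a flatMap over an RDD or DataFrame of tweets.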
Batch Options
Weekend Batch Option

Duration: 14 Weeks
Fee: INR 29,900 (+GST)
Certification: Govt. of India (Vskills) Certified Course

Interested? Contact Us!
Digital Vidya
+91-84680-02880
www.digitalvidya.com