about intellipaat · 2019-08-14 · the big data hadoop certification combo course provided by the...
Post on 23-May-2020
www.intellipaat.com ©Copyright IntelliPaat. All rights reserved.
About Intellipaat
Intellipaat is a global online professional training provider. We offer
some of the most up-to-date, industry-designed certification
training programs in the domains of Big Data, Data Science & AI,
Business Intelligence, Cloud, Blockchain, Database, Programming,
Testing, SAP and 150 more technologies.
We help professionals make the right career decisions, select
trainers with over a decade of industry experience, provide extensive
hands-on projects, rigorously evaluate learner progress and offer
industry-recognized certifications. We also help corporate clients
upskill their workforce and keep them in sync with the changing
technology and digital landscape.
About The Course
The Big Data Hadoop certification combo course provided by the
pioneering e-learning institute Intellipaat will help you master various
aspects of Big Data Hadoop, Apache Storm, Apache Spark and the Scala
programming language. Big Data Hadoop, Spark and Scala are taught
through online classroom training; Apache Storm is covered by
self-paced videos for self-study.
Instructor Led Training: 102 Hrs of highly interactive instructor led training
Self-Paced Training: 114 Hrs of Self-Paced sessions with Lifetime access
Exercise and project work: 166 Hrs of real-time projects after every module
Lifetime Access: Lifetime access and free upgrade to latest version
Support: Lifetime 24*7 technical support and query resolution
Get Certified: Get global industry recognized certifications
Job Assistance: Job assistance through 80+ corporate tie-ups
Flexi Scheduling: Attend multiple batches for lifetime & stay updated.
Why take this Course?
This is a comprehensive course to help you make a big leap into the
Big Data Hadoop ecosystem. This training will give you enough
proficiency to work on real-world Big Data projects, build
resilient Hadoop clusters, perform high-speed data processing using
Apache Spark, write versatile applications in Scala
and so on. Above all, this is a great combo course to help you land
the best jobs in the Big Data domain.
Big Data Hadoop Course Content
1. Hadoop Installation and Setup
2. Introduction to Big Data Hadoop and Understanding HDFS and MapReduce
3. Deep Dive in MapReduce
4. Introduction to Hive
5. Advance Hive and Impala
6. Introduction to Pig
7. Flume, Sqoop and HBase
8. Hadoop Administration – Multi-node Cluster Setup Using
Amazon EC2
9. Hadoop Administration – Cluster Configuration
10. Hadoop Administration – Maintenance, Monitoring and
Troubleshooting
11. ETL Connectivity with Hadoop Ecosystem
12. Project Solution Discussion and Cloudera Certification
Tips and Tricks
Following topics will be available only in self-paced mode
1. Hadoop Application Testing
2. Roles and Responsibilities of Hadoop Testing Professional
3. Framework Called MR Unit for Testing of Map-Reduce Programs
4. Unit Testing
5. Test Execution
6. Test Plan Strategy and Writing Test Cases for Testing Hadoop Application
Scala Course Content
1. Introduction to Scala
2. Pattern Matching
3. Executing the Scala Code
4. Classes Concept in Scala
5. Case Classes and Pattern Matching
6. Concepts of Traits with Example
7. Scala Java Interoperability
8. Scala Collections
9. Mutable Collections Vs. Immutable Collections
10. Use Case Bobsrockets Package
Spark Course Content
1. Introduction to Spark
2. Spark Basics
3. Working with RDDs in Spark
4. Aggregating Data with Pair RDDs
5. Writing and Deploying Spark Applications
6. Writing and Deploying Spark Applications
7. Parallel Processing
8. Spark RDD Persistence
9. Spark MLlib
10. Integrating Apache Flume and Apache Kafka
11. Spark Streaming
12. Improving Spark Performance
13. Spark SQL and Data Frames
14. Scheduling/Partitioning
Apache Storm Course Content
1. Understanding Architecture of Storm
2. Installation of Apache Storm
3. Introduction to Apache Storm
4. Apache Kafka Installation
5. Apache Storm Advanced
6. Storm Topology
7. Overview of Trident
8. Storm Components and classes
9. Cassandra Introduction
10. Boot Stripping
Hadoop Installation and Setup
The architecture of Hadoop 2.0 cluster
What is High Availability and Federation
How to set up a production cluster, various shell commands in Hadoop
Understanding configuration files in Hadoop 2.0
Installing single node cluster with Cloudera Manager and understanding Spark, Scala, Sqoop, Pig and Flume
Introduction to Big Data Hadoop and Understanding HDFS and MapReduce
Introducing Big Data and Hadoop, what is Big Data and where does Hadoop fit in
Two important Hadoop ecosystem components, namely, MapReduce and HDFS
In-depth Hadoop Distributed File System – Replications, Block Size, Secondary NameNode, High Availability
In-depth YARN – resource manager and node manager
Deep Dive in MapReduce
Learning the working mechanism of MapReduce
Understanding the mapping and reducing stages in MR
Various terminologies in MR like Input Format, Output Format, Partitioners, Combiners, Shuffle and Sort
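The MapReduce stages named in this module can be sketched in plain Python, with no Hadoop involved. This is only an illustration under simplifying assumptions: everything runs in one process, and the function names (map_phase, combine, partition, shuffle_and_sort, reduce_phase, word_count) are hypothetical and not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before it is shuffled."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer for a key (a simple deterministic hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Shuffle pairs to their reducer, then group values under sorted keys."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_phase(grouped):
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines, num_reducers=2):
    """Run the stages in order: map, combine, shuffle and sort, reduce."""
    mapper_outputs = [combine(map_phase(line)) for line in lines]
    result = {}
    for grouped in shuffle_and_sort(mapper_outputs, num_reducers):
        result.update(reduce_phase(grouped))
    return result


counts = word_count(["big data big cluster", "data node"])
```

The combiner runs on each mapper's output before the shuffle, so fewer pairs have to be moved between stages; the final counts are the same with or without it.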
Introduction to Hive
Introducing Hadoop Hive, detailed architecture of Hive
Comparing Hive with Pig and RDBMS
Working with Hive Query Language, creation of database, table, Group by and other clauses
Various types of Hive tables, HCatalog, storing the Hive Results, Hive partitioning and Buckets
Advance Hive and Impala
Indexing in Hive, the Map Side Join in Hive
Working with complex data types, the Hive User-defined Functions
Introduction to Impala, comparing Hive with Impala, the detailed architecture of Impala
Introduction to Pig
Apache Pig introduction, its various features
Various data types and schema in Pig
The available functions in Pig, Pig Bags, Tuples and Fields
Flume, Sqoop and HBase
Apache Sqoop introduction, overview, importing and exporting data, performance improvement with Sqoop
Sqoop limitations, introduction to Flume and understanding the architecture of Flume
What is HBase and the CAP theorem
Hadoop Administration – Multi-node Cluster Setup Using Amazon EC2
Create a 4-node Hadoop cluster setup
Running the MapReduce Jobs on the Hadoop cluster
Successfully running the MapReduce code and working with the Cloudera Manager setup
Hadoop Administration – Cluster Configuration
The overview of Hadoop configuration, the importance of Hadoop configuration file, the various parameters and values of configuration
The HDFS parameters and MapReduce parameters
Setting up the Hadoop environment, the Include and Exclude configuration files
The administration and maintenance of NameNode, DataNode directory structures and files
What is a File system image and understanding Edit log
Hadoop Administration – Maintenance, Monitoring and Troubleshooting
Introduction to the checkpoint procedure
NameNode failure and how to ensure the recovery procedure, Safe Mode, Metadata and Data backup
Various potential problems and solutions, what to look for and how to add and remove nodes
ETL Connectivity with Hadoop Ecosystem
How ETL tools work in Big Data Industry
Introduction to ETL and data warehousing
Working with prominent use cases of Big Data in ETL industry and end-to-end ETL PoC showing Big Data integration with ETL tool
Project Solution Discussion and Cloudera Certification Tips and Tricks
Working towards the solution of the Hadoop project, its problem statements and the possible solution outcomes
Preparing for the Cloudera certifications, points to focus on for scoring the highest marks and tips for cracking Hadoop interview questions
Following topics will be available only in self-paced mode
Hadoop Application Testing
Why testing is important, Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Commissioning and Decommissioning of data nodes testing, Reliability testing and Release testing
Roles and Responsibilities of Hadoop Testing Professional
Understanding the Requirement, preparation of the Testing Estimation, Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion
ETL testing at every stage (HDFS, Hive and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, which includes but is not limited to data verification and Reconciliation
User Authorization and Authentication testing (Groups, Users, Privileges, etc.), reporting defects to the development team or manager and driving them to closure
Consolidating all the defects and creating defect reports, validating new features and issues in Core Hadoop
Framework Called MR Unit for Testing of Map-Reduce Programs
Reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports
Responsible for creating a testing framework called MR Unit for testing of MapReduce programs
Unit Testing
Automation testing using Oozie and data validation using the QuerySurge tool
Test Execution
Test plan for HDFS upgrade, test automation and result
Test Plan Strategy and Writing Test Cases for Testing Hadoop Application
How to test install and configure
Introduction to Scala
Introducing Scala and deployment of Scala for Big Data applications and Apache Spark analytics
Scala REPL, Lazy Values, Control Structures in Scala
Directed Acyclic Graph (DAG), First Spark Application Using SBT/Eclipse, Spark Web UI, Spark in Hadoop Ecosystem
Pattern Matching
The importance of Scala, the concept of REPL (Read Evaluate Print Loop)
Deep dive into Scala pattern matching, type inference, higher-order functions, currying, traits, application space and Scala for data analysis
Executing the Scala Code
Learning about the Scala Interpreter, static object timer in Scala and testing string equality in Scala, implicit classes in Scala
The concept of currying in Scala and various classes in Scala
Classes Concept in Scala
Learning about the Classes concept, understanding the constructor overloading, various abstract classes
The hierarchy types in Scala
The concept of object equality and the val and var methods in Scala
Case Classes and Pattern Matching
Understanding sealed traits, wildcard, constructor, tuple, variable pattern and constant pattern
Concepts of Traits with Example
Understanding traits in Scala, the advantages of traits
Linearization of traits, the Java equivalent, and avoiding boilerplate code
Scala Java Interoperability
Implementation of traits in Scala and Java and handling the extension of multiple traits
Scala Collections
Introduction to Scala collections, classification of collections
The difference between Iterator and Iterable in Scala and example of list sequence in Scala
Mutable Collections Vs. Immutable Collections
The two types of collections in Scala, Mutable and Immutable collections, understanding lists and arrays in Scala
The list buffer and array buffer, queue in Scala and double-ended queue Deque, Stacks, Sets, Maps and Tuples in Scala
Use Case Bobsrockets Package
Introduction to Scala packages and imports
The selective imports, the Scala test classes
Introduction to JUnit test class, JUnit interface via JUnit 3 suite for Scala test
Packaging of Scala applications in Directory Structure and examples of Spark Split and Spark Scala
Introduction to Spark
Introduction to Spark, how Spark overcomes the drawbacks of working with MapReduce, understanding in-memory MapReduce
Interactive operations on MapReduce, Spark stack, fine vs. coarse-grained update, Spark Hadoop YARN, HDFS Revision, YARN Revision
The overview of Spark and how it is better than Hadoop, deploying Spark without Hadoop, Spark history server and Cloudera distribution
Spark Basics
Spark installation guide, Spark configuration, memory management, executor memory vs. driver memory
Working with Spark Shell, the concept of resilient distributed datasets (RDD)
Learning to do functional programming in Spark and the architecture of Spark
Working with RDDs in Spark
Spark RDD, creating RDDs, RDD partitioning, operations and transformations in RDD, deep dive into Spark RDDs
The RDD general operations, a read-only partitioned collection of records
Using the concept of RDD for faster and efficient data processing, RDD actions such as collect, count, collectAsMap and saveAsTextFile, and pair RDD functions
Aggregating Data with Pair RDDs
Understanding the concept of Key-Value pair in RDDs
Learning how Spark makes MapReduce operations faster
Various operations of RDD, MapReduce interactive operations, fine and coarse-grained update and Spark stack
Writing and Deploying Spark Applications
Comparing the Spark applications with Spark Shell
Creating a Spark application using Scala or Java
Deploying a Spark application, Scala built application, creation of mutable list, set and set operations, list, tuple, concatenating list
Creating application using SBT, deploying application using Maven
The web user interface of Spark application, a real-world example of Spark and configuring Spark
Parallel Processing
Learning about Spark parallel processing
Deploying on a cluster, introduction to Spark partitions
File-based partitioning of RDDs, understanding of HDFS and data locality, mastering the technique of parallel operations
Comparing repartition and coalesce and RDD actions
Spark RDD Persistence
The execution flow in Spark
Understanding the RDD persistence overview, Spark execution flow, and Spark terminology
Distributed shared memory vs. RDD, RDD limitations
Spark shell arguments, distributed persistence
RDD lineage, Key-Value pair for sorting implicit conversions like CountByKey, ReduceByKey, SortByKey and AggregateByKey
Spark MLlib
Introduction to Machine Learning
Types of Machine Learning
Introduction to MLlib
Various ML algorithms supported by MLlib
Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-means clustering techniques, building a Recommendation Engine
Integrating Apache Flume and Apache Kafka
Why Kafka, what is Kafka, Kafka architecture, Kafka workflow
Configuring Kafka cluster, basic operations, Kafka monitoring tools
Integrating Apache Flume and Apache Kafka
Spark Streaming
Introduction to Spark Streaming
Features of Spark Streaming, Spark Streaming workflow
Initializing StreamingContext, Discretized Streams (DStreams), Input DStreams and Receivers, transformations on DStreams, Output Operations on DStreams
Windowed Operators and why they are useful
Important Windowed Operators, Stateful Operators
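To give a feel for what a windowed operator does, the sketch below simulates a sliding window over micro-batches in plain Python. It does not use Spark; the function windowed_counts and the sample batches are hypothetical, and the window length and slide interval are counted in batches rather than seconds for simplicity.

```python
from collections import Counter


def windowed_counts(batches, window_length, slide_interval):
    """Count items over a sliding window of micro-batches.

    batches: list of lists, one inner list per batch interval.
    window_length, slide_interval: measured in number of batches.
    Returns one Counter per window evaluation.
    """
    results = []
    # The window is evaluated every slide_interval batches and covers
    # the most recent window_length batches available at that point.
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        window = Counter()
        for batch in batches[start:end]:
            window.update(batch)
        results.append(window)
    return results


# Four micro-batches of hypothetical events.
batches = [["a", "b"], ["a"], ["c"], ["a", "c"]]
counts = windowed_counts(batches, window_length=3, slide_interval=2)
```

Because the window (3 batches) is longer than the slide (2 batches), consecutive windows overlap, which is why an event can be counted in more than one result.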
Improving Spark Performance
Introduction to various variables in Spark like shared variables and broadcast variables
Learning about accumulators
The common performance issues and troubleshooting the performance problems
Spark SQL and Data Frames
Learning about Spark SQL, the context of SQL in Spark for providing structured data processing, JSON support in Spark SQL
Working with XML data, parquet files, creating Hive context, writing Data Frame to Hive, reading JDBC files
Understanding the Data Frames in Spark, creating Data Frames, manual inferring of schema
Working with CSV files, reading JDBC tables, Data Frame to JDBC
User-defined functions in Spark SQL, shared variables and accumulators
Learning to query and transform data in Data Frames
How Data Frame provides the benefit of both Spark RDD and Spark SQL and deploying Hive on Spark as the execution engine
Scheduling/Partitioning
Learning about the scheduling and partitioning in Spark, hash partition, range partition
Scheduling within and around applications, static partitioning, dynamic sharing, fair scheduling
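The difference between hash partitioning and range partitioning can be shown with a small plain-Python sketch. It is an illustration only, not Spark's implementation: the two functions are hypothetical, keys are assumed to be non-negative integers, and a simple modulo stands in for a real hash function.

```python
import bisect


def hash_partition(key, num_partitions):
    """Assign an integer key to a partition by taking it modulo the count."""
    return key % num_partitions


def range_partition(key, boundaries):
    """Assign a key to a partition using sorted, inclusive upper bounds.

    boundaries holds the upper bound of every partition except the last,
    so len(boundaries) + 1 partitions exist in total.
    """
    return bisect.bisect_left(boundaries, key)


keys = [3, 17, 42, 58, 99]
hashed = [hash_partition(k, 3) for k in keys]
ranged = [range_partition(k, [33, 66]) for k in keys]
```

Hash partitioning scatters neighbouring keys across partitions, while range partitioning keeps each partition's keys contiguous and ordered, which suits sorted output and range scans.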
Understanding Architecture of Storm
Big Data characteristics, understanding Hadoop distributed computing
The Bayesian Law, deploying Storm for real time analytics
Apache Storm features
Comparing Storm with Hadoop
Storm execution and learning about Tuple, Spout and Bolt
Installation of Apache Storm
Installing Apache Storm and various types of run modes of Storm
Introduction to Apache Storm
Understanding Apache Storm and the data model
Apache Kafka Installation
Installation of Apache Kafka and its configuration
Map partition with index, the Zip, GroupByKey, Spark master high availability, standby masters with ZooKeeper
Single-node Recovery with Local File System and High Order Functions
Apache Storm Advanced
Understanding of advanced Storm topics like Spouts, Bolts, Stream Groupings
Topology and its Life cycle and learning about Guaranteed Message Processing
Storm Topology
Various grouping types in Storm, reliable and unreliable messages, Bolt structure and life cycle
Understanding Trident topology for failure handling
Process and Call Log Analysis Topology for analyzing call logs for calls made from one number to another
Overview of Trident
Understanding of Trident Spouts and their different types
Various Trident Spout interfaces and components
Familiarizing with Trident Filter, Aggregator and Functions and a practical, hands-on use case on solving the call log problem using Storm Trident
Storm Components and classes
Various components, classes and interfaces in Storm like the BaseRichBolt class, IRichBolt interface, IRichSpout interface and BaseRichSpout class, and the various ways of working with them
Cassandra Introduction
Understanding Cassandra, its core concepts and its strengths and deployment
Boot Stripping
Twitter Boot Stripping, detailed understanding of Boot Stripping
Concepts of Storm and Storm Development Environment
Project Works
Hadoop Projects
Project 1 : Working with MapReduce, Hive and Sqoop
Industry : General
Problem Statement : How to successfully import data using Sqoop into HDFS for data analysis.
Topics : As part of this project, you will work on the various Hadoop components like
MapReduce, Apache Hive and Apache Sqoop. You will work with Sqoop to import data
from a relational database management system like MySQL into HDFS. You need to deploy
Hive for summarizing data, querying and analysis. You have to convert SQL queries using
HiveQL for deploying MapReduce on the transferred data. You will gain considerable
proficiency in Hive and Sqoop after the completion of this project.
Highlights
Sqoop data transfer from RDBMS to Hadoop
Coding in Hive Query Language
Data querying and analysis
Project 2: Work on MovieLens data for finding the top movies
Industry : Media and Entertainment
Problem Statement : How to create the top ten movies list using the MovieLens data
Topics : In this project you will work exclusively on the publicly available MovieLens
rating data sets. The project involves writing a MapReduce program to analyze the MovieLens
data and create the list of top ten movies. You will also work with Apache Pig and Apache
Hive for working with distributed datasets and analyzing them.
Highlights
MapReduce program for working on the data file
Apache Pig for analyzing data
Apache Hive data warehousing and querying
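For the top-movies problem in Project 2, the core logic can be sketched in plain Python before it is written as a MapReduce job. This is a minimal sketch under assumptions: the function top_movies, the (user_id, movie_id, rating) tuple layout and the sample ratings are hypothetical and are not taken from the MovieLens files or the course material.

```python
from collections import defaultdict


def top_movies(ratings, n=10, min_ratings=1):
    """Return the n movies with the highest average rating.

    ratings: iterable of (user_id, movie_id, rating) tuples.
    min_ratings: ignore movies rated fewer times than this.
    """
    # "Map" step: key every rating by its movie.
    by_movie = defaultdict(list)
    for _user_id, movie_id, rating in ratings:
        by_movie[movie_id].append(rating)

    # "Reduce" step: average the ratings collected for each movie.
    averages = [
        (movie_id, sum(values) / len(values))
        for movie_id, values in by_movie.items()
        if len(values) >= min_ratings
    ]

    # Highest average first; ties are broken by movie id.
    averages.sort(key=lambda item: (-item[1], item[0]))
    return averages[:n]


# Hypothetical ratings, not real MovieLens records.
sample = [(1, "A", 5), (2, "A", 4), (1, "B", 3), (2, "C", 5), (3, "C", 5)]
top_two = top_movies(sample, n=2)
```

Raising min_ratings is one way to keep a movie with a single five-star rating from topping the list.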
Project 3 : Hadoop YARN Project; End-to-end PoC
Industry : Banking
Problem Statement : How to bring the daily data (incremental data) into the Hadoop
Distributed File System
Topics : In this project, we have transaction data which is recorded/stored daily in the RDBMS.
This data is transferred every day into HDFS for further Big Data Analytics. You will work on
a live Hadoop YARN cluster. YARN is part of the Hadoop 2.0 ecosystem that lets Hadoop
decouple from MapReduce and deploy more competitive processing and a wider array of
applications. You will work on the YARN central resource manager.
Highlights
Using Sqoop commands to bring the data into HDFS
End to End flow of transaction data
Working with the data from HDFS
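The idea behind a daily incremental load can be sketched in plain Python: remember the highest key already copied, and on each run fetch only the rows above it. Sqoop offers an incremental append mode built on the same idea of a check column and a last value; the code below is only a simulation with hypothetical names (incremental_import, txn_id) and hypothetical data, not Sqoop itself.

```python
def incremental_import(source_rows, last_value, check_column="txn_id"):
    """Return the rows newer than last_value and the updated watermark.

    source_rows: list of dicts standing in for rows of an RDBMS table.
    last_value: highest check_column value already copied to the target.
    """
    new_rows = [row for row in source_rows if row[check_column] > last_value]
    new_last_value = max(
        (row[check_column] for row in new_rows), default=last_value
    )
    return new_rows, new_last_value


# Hypothetical transaction table; txn_id only ever increases.
table = [
    {"txn_id": 1, "amount": 120},
    {"txn_id": 2, "amount": 75},
    {"txn_id": 3, "amount": 310},
]

# Day one copies everything; day two copies only the new row.
day_one, watermark = incremental_import(table, last_value=0)
table.append({"txn_id": 4, "amount": 55})
day_two, watermark = incremental_import(table, last_value=watermark)
```

This only works when the check column never decreases and existing rows are not updated in place; updated rows would need a different strategy, such as a last-modified timestamp.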
Project 4: Table Partitioning in Hive
Industry : Banking
Problem Statement : How to improve the query speed using Hive data partitioning.
Topics : This project involves working with Hive table data partitioning. Ensuring the right
partitioning helps to read the data, deploy it on the HDFS, and run the MapReduce jobs at a
much faster rate. Hive lets you partition data in multiple ways. This will give you hands-on
experience in partitioning of Hive tables manually, deploying single SQL execution in dynamic
partitioning and bucketing of data so as to break it into manageable chunks.
Highlights
Manual Partitioning
Dynamic Partitioning
Bucketing
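Why partitioning and bucketing speed up queries can be illustrated with a plain-Python sketch. It is a conceptual model only, not Hive: the functions and the sample transactions are hypothetical, and a simple modulo on an integer column stands in for Hive's bucketing hash.

```python
from collections import defaultdict


def partition_rows(rows, column):
    """Group rows by the partition column, one group per distinct value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def bucket_rows(rows, column, num_buckets):
    """Spread rows over a fixed number of buckets using an integer column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[column] % num_buckets].append(row)
    return buckets


def query_partition(partitions, value):
    """Read only the matching partition instead of scanning every row."""
    return partitions.get(value, [])


# Hypothetical banking transactions.
transactions = [
    {"account_id": 101, "city": "Pune", "amount": 250},
    {"account_id": 102, "city": "Delhi", "amount": 90},
    {"account_id": 103, "city": "Pune", "amount": 40},
    {"account_id": 104, "city": "Delhi", "amount": 700},
]

by_city = partition_rows(transactions, "city")
pune_rows = query_partition(by_city, "Pune")
buckets = bucket_rows(transactions, "account_id", num_buckets=2)
```

A filter on the partition column touches one group of rows and skips the rest, while bucketing keeps the number of chunks fixed no matter how many distinct values the column has.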
Project 5 : Connecting Pentaho with Hadoop Ecosystem
Industry : Social Network
Problem Statement : How to deploy ETL for data analysis activities.
Topics : This project lets you connect Pentaho with the Hadoop ecosystem. Pentaho works
well with HDFS, HBase, Oozie and ZooKeeper. You will connect the Hadoop cluster with
Pentaho data integration, analytics, Pentaho server and report designer. This project will give
you complete working knowledge on the Pentaho ETL tool.
Highlights
Working knowledge of ETL and Business Intelligence
Configuring Pentaho to work with Hadoop distribution
Loading, transforming and extracting data into Hadoop cluster
Project 6: Multi-node Cluster Setup
Industry : General
Problem Statement : How to set up a Hadoop real-time cluster on Amazon EC2.
Topics : This project gives you the opportunity to work on a real-world Hadoop multi-node
cluster setup in a distributed environment. You will get a complete demonstration of working
with various Hadoop cluster master and slave nodes, installing Java as a prerequisite for
running Hadoop, installing Hadoop and mapping the nodes in the Hadoop cluster.
Highlights
Hadoop installation and configuration
Running a Hadoop multi-node setup using a 4-node cluster on Amazon EC2
Deploying a MapReduce job on the Hadoop cluster.
Project 7 : Hadoop Testing Using MRUnit
Industry : General
Problem Statement : How to test MapReduce applications
Topics : In this project you will gain proficiency in Hadoop MapReduce code testing using
MRUnit. You will learn about real-world scenarios of deploying MRUnit, Mockito and
PowerMock. This will give you hands-on experience in various testing tools for Hadoop
MapReduce. After completion of this project you will be well-versed in test-driven development
and will be able to write light-weight test units that work specifically on the Hadoop
architecture.
Highlights
Writing JUnit tests using MRUnit for MapReduce applications
Mocking static methods using PowerMock and Mockito
MapReduce Driver for testing the map and reduce pair
Project 8: Hadoop WebLog Analytics
Industry : Internet Services
Problem Statement : How to derive insights from web log data
Topics : This project involves making sense of web log data in order to derive
valuable insights from it. You will load the server data onto a Hadoop cluster
using various techniques. The web log data can include various URLs visited, cookie data, user
demographics, location, date and time of web service access, etc. In this project you will
transport the data using Apache Flume or Kafka, and handle workflow and data cleansing using
MapReduce, Pig or Spark. The insights thus derived can be used for analyzing customer behavior
and predicting buying patterns.
Highlights
Aggregation of log data
Apache Flume for data transportation
Processing of data and generating analytics
Project 9 : Hadoop Maintenance
Industry : General
Problem Statement : How to administer a Hadoop cluster
Topics : This project involves working on the Hadoop cluster to maintain and
manage it. You will work on a number of important tasks that include recovering data,
recovering from failure, adding and removing machines from the Hadoop cluster and
onboarding users on Hadoop.
Highlights
Working with Name Node directory structure
Audit logging, data node block scanner and balancer.
Failover, fencing, DISTCP and Hadoop file formats.
Project 10: Twitter Sentiment Analysis
Industry : Social Media
Problem Statement : Find out how people reacted to India's demonetization move
by analyzing their tweets.
Topics : This project involves analyzing tweets to find out what people are
saying about the demonetization decision taken by the Indian government. You then look for
key phrases and words and analyze them using a dictionary and the value attributed to each
word based on the sentiment it conveys.
Highlights
Download the tweets and Load into Pig storage
Divide tweets into words to calculate sentiment
Rating the words from +5 to -5 using the AFINN dictionary
Filtering the tweets and analyzing sentiment
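The word-rating approach listed in the highlights can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions: SAMPLE_LEXICON is a tiny hypothetical excerpt in the style of a -5 to +5 word list, the function names are made up for illustration, and the project itself uses Pig rather than Python.

```python
import re

# Hypothetical excerpt of a word-score lexicon (scores from -5 to +5);
# these entries are illustrative and do not replace a full dictionary.
SAMPLE_LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3, "problem": -2}


def tokenize(tweet):
    """Divide a tweet into lowercase words."""
    return re.findall(r"[a-z']+", tweet.lower())


def tweet_score(tweet, lexicon):
    """Average the scores of the rated words; 0.0 if no word is rated."""
    scores = [lexicon[word] for word in tokenize(tweet) if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def classify(tweet, lexicon):
    """Label a tweet by the sign of its average score."""
    score = tweet_score(tweet, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [
    classify(tweet, SAMPLE_LEXICON)
    for tweet in [
        "Great move, good for the economy",
        "Terrible queues, bad problem",
        "Banks open at nine",
    ]
]
```

Averaging over the rated words only keeps long tweets from scoring higher just because they contain more words; summing the scores instead is an equally simple alternative.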
Project 11 : Analyzing IPL T20 Cricket
Industry : Sports and Entertainment
Problem Statement : Analyze the entire cricket match and get answers to any question
regarding the details of the match.
Topics : This project involves working with the IPL dataset that has information regarding
batting, bowling, runs scored, wickets taken and more. This dataset is taken as input, and then it
is processed so that the entire match can be analyzed based on the user queries or needs.
Highlights
Load the data into HDFS
Analyze the data using Apache Pig or Hive
Based on user queries give the right output
Apache Spark Projects
Project 1 : Movie Recommendation
Industry : Entertainment
Problem Statement : How to recommend the most appropriate movie to a user based on their
taste
Topics : This is a hands-on Apache Spark project deployed for the real-world application of
movie recommendations. This project helps you gain essential knowledge in Spark MLlib, which
is a Machine Learning library; you will learn how to apply collaborative filtering, regression,
clustering and dimensionality reduction using Spark MLlib. Upon finishing the project, you will
have first-hand experience in Apache Spark streaming data analysis, sampling, testing and
statistics, among other vital skills.
Highlights
Apache Spark MLlib component
Statistical analysis
Regression and clustering
Project 2 : Twitter API Integration for tweet Analysis
Industry : Social Media
Problem Statement : Analyzing the user sentiment based on the tweet
Topics : This is a hands-on Twitter analysis project using the Twitter API for analyzing
tweets. You will integrate the Twitter API and write the essential server-side code
in Python or PHP. Finally, you will read the results and apply
various operations such as filtering, parsing and aggregating, depending on the tweet analysis
requirement.
Highlights
Making requests to Twitter API
Building the server-side code
Filtering, parsing and aggregating data
Project 3 : Data Exploration Using Spark SQL – Wikipedia Data Set
Industry : Internet
Problem Statement : Making sense of Wikipedia data using Spark SQL
Topics : In this project you will be using the Spark SQL tool for analyzing the Wikipedia data.
You will gain hands-on experience in integrating Spark SQL for various applications like batch
analysis, Machine Learning, visualizing and processing of data and ETL processes, along with
real-time analysis of data.
Highlights
Machine Learning using Spark
Deploying data visualization
Spark SQL integration
Apache Spark – Scala Project
Project 1 : Movie Recommendation
Industry : Entertainment
Topics : This is a project wherein you will gain hands-on experience in deploying Apache Spark
for movie recommendation. You will be introduced to the Spark Machine Learning Library (MLlib),
with a guide to its algorithms and coding. You will understand
how to deploy collaborative filtering, clustering, regression, and dimensionality reduction in
MLlib. Upon the completion of the project, you will gain experience in working with streaming
data, sampling, testing and statistics.
Project 2 : Twitter API Integration for Tweet Analysis
Industry : Social Media
Topics : With this project, you will learn to integrate the Twitter API for analyzing tweets. You will
write server-side code in a scripting language like PHP, Ruby or Python
to call the Twitter API and get the results in JSON format. You will then read the results
and perform various operations like aggregation, filtering and parsing as needed to come
up with the tweet analysis.
Project 3 : Data Exploration Using Spark SQL – Wikipedia Data set
Industry : Technology
Topics : This project lets you work with Spark SQL. You will gain experience in working with
Spark SQL for combining it with ETL applications, real time analysis of data, performing batch
analysis, deploying Machine Learning, creating visualizations and processing of graphs.
Apache Storm Project
Project 1 : Call Log Analysis Using Trident
Industry : Technology
Topics : In this project, you will be working on call logs to decipher the data and gather
valuable insights using Apache Storm Trident. You will work extensively with data about calls
made from one number to another. The aim of this project is to resolve the call log issues with
Trident stream processing and low latency distributed querying. You will gain hands-on
experience in working with Spouts and Bolts, along with various Trident functions, filters,
aggregation, joins and grouping.
Project 2 : Twitter Data Analysis Using Trident
Industry : Social Media
Topics : This is a project that involves working with Twitter data and processing it to extract
patterns out of it. Apache Storm Trident is well suited to real-time analysis of
tweets. While working with Trident, you will be able to simplify the task of live Twitter feed
analysis. In this project, you will gain real-world experience of working with Spouts, Bolts,
Trident filters, joins, aggregation, functions and grouping.
Project 3 : The US Presidential Election Result Analysis Using Trident DRPC Query
Industry : Politics
Topics : This is a project that lets you work on the US presidential election results and predict
who is leading and trailing on a real-time basis. For this, you work exclusively with the Trident
distributed remote procedure call server. After the completion of the project, you will know how
to access data residing in a remote computer or network and deploy it for real-time processing,
analysis and prediction.
Job Assistance Program
Intellipaat is offering job assistance to all the learners who have completed the training. You
should get a minimum of 60% marks in the qualifying exam to avail job assistance.
Intellipaat has exclusive tie-ups with over 80 MNCs for placements.
Successfully finish the training
Get your resume updated
Start receiving interview calls
Intellipaat Alumni Working in Top Companies
Robin Jack
Mainframe Senior Developer at IBM
This software testing automation training is the most practical and easy way to learn
Selenium covering all topics.
David Juvan
Software Tester at Dell
I'm extremely impressed with this training session. Thanks to the instructor who was very
patient in explaining all our doubts clearly. I was concerned initially whether I had made the right
choice in picking the right institute. But now I will definitely recommend Intellipaat for
training courses.
Niharika Mittal
Blockchain Developer and Testing Enthusiast at IBM
This is a great way to learn Selenium automated testing. The best part is that the entire
Selenium course is in line with the industry certification.
Frequently Asked Questions
Q 1. What is the criterion for availing the Intellipaat job assistance program?
Ans. All Intellipaat learners who have successfully completed the training post April 2017 are
directly eligible for the Intellipaat job assistance program.
Q 2. Which are the companies that I can get placed in?
Ans. We have exclusive tie-ups with MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma,
Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, and more. So you have the
opportunity to get placed in these top global companies.
Q 3. Do I need to have prior industry experience for getting an interview call?
Ans. There is no need to have any prior industry experience for getting an interview call. In fact,
the successful completion of the Intellipaat certification training is equivalent to six months of
industry experience. This is definitely an added advantage when you are attending an interview.
Q 4. If I don’t get a job in the first attempt, can I get another chance?
Ans. Definitely, yes. Your resume will be in our database and we will circulate it to our MNC
partners until you get a job. So there is no upper limit to the number of job interviews you can
attend.
Q 5. Does Intellipaat guarantee a job through its job assistance program?
Ans. Intellipaat does not guarantee any job through the job assistance program. However, we
will definitely offer you full assistance by circulating your resume among our affiliate partners.
Our Clients
+80 Corporates