hadoop bigdata training bangalore.pptx
TRANSCRIPT
-
8/10/2019 Hadoop Bigdata Training Bangalore.pptx
Big Data & Hadoop
-
Introduction to Big Data
Big Data: a term for collections of data sets that are large in
volume and complex in structure.
Volumes are measured in petabytes (1 PB = 1024 TB) or exabytes
(1 EB = 1024 PB) and will soon reach zettabytes (1 ZB = 1024 EB).
Hence, the data are hard to interpret and process with existing
traditional data processing applications and tools.
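The unit arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the slide's binary convention of 1024 per step; the `UNITS` list and the `convert` helper are hypothetical names and not part of any Hadoop tool.

```python
# Units in ascending order; each step up is a factor of 1024 (slide's convention).
UNITS = ["TB", "PB", "EB", "ZB"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between TB, PB, EB and ZB using 1024 per step."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * (1024 ** steps)


print(convert(1, "PB", "TB"))  # 1024
print(convert(1, "ZB", "TB"))  # 1073741824
```

Converting downwards (for example ZB to TB) multiplies by 1024 once per step, so one zettabyte is 1024 x 1024 x 1024 terabytes.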
-
Why Big Data
To manage huge volumes of data more effectively.
To benefit from the speed, capacity and scalability of cloud storage.
To gain potential insights through data analysis methods.
Companies can find new prospects and business opportunities.
Unlike with other methods, business users can visualize the data
with Big Data.
-
Big Data Overview
Big Data includes:
Traditional structured databases from inventories, orders and
customer information.
Unstructured data from the web, social networking sites, etc.
The problem with these massive datasets is that they can't be
analyzed with standard tools and procedures.
Processing this data appropriately can help an organization
gain useful insights into its business prospects.
-
Unstructured Data Growth
No. of emails sent per second: 2.9 million
Videos uploaded on YouTube per minute: 20 hours
Data processed by Google per day: 20 petabytes
Tweets per day: 50 million
Minutes spent on Facebook per month: 700 billion
Data sent and received by mobile users per day: 1.3 exabytes
Products ordered on Amazon per second: 73 items
*Source: http://ibm.com/
-
Unstructured data Growth
-
Hadoop Overview
Hadoop allows batch processing of colossal data sets
(petabytes and exabytes) as a series of parallel processes.
A Hadoop cluster comprises a number of server "nodes".
Nodes store and process data in a parallel and distributed
fashion.
It's a parallelized, distributed storage and processing framework
that can operate on commodity servers.
-
Commodity Hardware
Commodity hardware offers an average amount of computing resources.
It doesn't imply low quality, but affordability.
Hadoop clusters run on commodity servers.
Commodity servers have an average ratio of disk space to
memory, unlike specialized servers with high memory or CPU.
The servers are not designed specifically for a distributed storage
and processing framework, but they are made to fit the purpose.
-
Benefits of Hadoop
Scalable: Hadoop can store and distribute very large data sets
across hundreds of inexpensive servers that operate in parallel.
Failure Tolerance: HDFS can replicate files a specified
number of times and can automatically re-replicate data blocks
that were stored on nodes that have failed.
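The replication idea above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not HDFS code: `place_blocks` and `re_replicate` are hypothetical helpers, nodes are plain strings, and replicas are placed at random rather than with HDFS's actual placement policy.

```python
import random


def place_blocks(block_ids, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes (random placement, for illustration)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}


def re_replicate(placement, live_nodes, replication=3):
    """Return a new placement in which blocks that lost replicas are copied to live nodes."""
    live = sorted(live_nodes)
    repaired = {}
    for block, holders in placement.items():
        surviving = {node for node in holders if node in live_nodes}
        if surviving:
            # Copy from a surviving replica to live nodes that do not hold the block yet.
            candidates = [node for node in live if node not in surviving]
            missing = max(replication - len(surviving), 0)
            surviving.update(candidates[:missing])
        # If no replica survived, the block cannot be copied and stays lost in this model.
        repaired[block] = surviving
    return repaired


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_blocks(["b1", "b2", "b3"], nodes)
repaired = re_replicate(placement, set(nodes) - {"n1"})
print(repaired)
```

After node `n1` is dropped, every block in `repaired` is back to three replicas, all on the remaining nodes.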
-
Benefits of Hadoop
Cost-Effective: Hadoop is a scale-out architecture that can store all
of a company's data for later use, offering computing
and storage capabilities at a reasonable price.
Speed: Hadoop's storage method is based on a distributed
file system, so processing can run on the nodes where the data is
stored, resulting in faster data processing.
Flexible: Hadoop can easily access new data sources and different
types of data to generate insights.
-
*Source: http://datanami.com/
-
Why Hadoop
It provides insights into daily operations.
It drives new product ideas.
It is used by companies for research and development and for
marketing analysis.
Image and text processing.
It analyses huge amounts of data in comparatively less time.
Network monitoring.
Log and/or clickstream analysis of various kinds.
-
Hadoop Forecast
*Source: http://alliedmarketresearch.com/
-
Who can Learn Hadoop
Anyone with basic knowledge of Java and Linux.
Even if you haven't been introduced to Java and Linux before, you can
learn them in parallel with Hadoop.
Hadoop roles are available for Architects, Developers, Testers and
Linux/Network/Hardware Administrators.
Some need knowledge of Java and some don't.
-
Who can Learn Hadoop
SQL knowledge will help in learning HiveQL, the query language of
Apache Hive in the Hadoop ecosystem.
Knowledge of Linux will be helpful in understanding
Hadoop command-line parameters.
But even without any prior knowledge of Java and Linux,
you can learn Hadoop with the help of a few basic classes.
-
#Trending: Hadoop Jobs
*Source: http://the451group.com/
-
Job Opportunities in Hadoop
MNCs like IBM, Microsoft and Oracle have integrated with
Hadoop.
Also, companies like Facebook, Hortonworks, Amazon, eBay
and Yahoo! are currently looking for Hadoop professionals.
So, companies are looking for IT professionals with strong
Hadoop MapReduce skills.
-
Salary Trend in Hadoop
*Source: http://itproportal.com/
-
Hadoop Architecture
The two main components of Hadoop are:
Hadoop Distributed File System (HDFS), the storage
component that breaks files into blocks, replicates them and stores
them across the cluster.
MapReduce, the processing component that distributes the
workload for operations on files stored in HDFS and
automatically restarts failed work.
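The way the two components fit together can be sketched with a single-process word count in Python. This only illustrates the map, shuffle and reduce steps and is not the Hadoop API: the function names are hypothetical, blocks are modelled as small groups of lines rather than fixed-size byte ranges, and everything runs sequentially instead of across a cluster.

```python
from collections import defaultdict


def split_into_blocks(lines, lines_per_block=2):
    """Stand-in for HDFS splitting a file into blocks (here: groups of lines)."""
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    blocks = split_into_blocks(lines)
    # In a real cluster each block would be mapped on a different node in parallel.
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return reduce_phase(shuffle(mapped))


print(word_count(["big data", "Big Hadoop", "data data"]))
```

Because each block is mapped independently, a failed map task can be rerun on its block alone, which is the property that makes automatic restarts of failed work practical.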
-
*Source: http://cloudera.com/
-
Hadoop Ecosystem
Apache Hadoop Distributed File System offers storage of large
files across multiple machines.
Apache MapReduce is a programming model for processing large data sets
with a parallel, distributed algorithm on a cluster.
Apache Hive is a data warehouse on distributed storage that facilitates
data summarization, queries and the management of large datasets.
Apache Pig is an engine for executing data flows in parallel on
Apache Hadoop.
Apache HBase is a non-relational distributed database that performs
real-time operations on large tables.
-
Hadoop Ecosystem
Apache Flume is an aggregator that collects unstructured data into HDFS.
Apache Sqoop is a system for transferring bulk data
between HDFS and relational databases.
Apache Oozie is a workflow scheduler system to manage Apache
Hadoop jobs.
Apache ZooKeeper is a coordination service with tools for writing
correct distributed applications.
Apache Avro is a framework for data modelling, serialization and
making Remote Procedure Calls.
-
Q & A