hadoop bigdata training bangalore.pptx

8/10/2019 Hadoop Bigdata Training Bangalore.pptx

1/27

Big Data & Hadoop


2/27


3/27

Introduction to Big Data

Big Data: A term for collections of data sets that are large in volume and complex in structure.

Volumes are in Petabytes (1024 TB) or Exabytes (1024 PB) & will soon be Zettabytes (1024 EB).

Hence, the data are hard to interpret & process with existing traditional data processing applications and tools.
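The unit ladder above (each step is a factor of 1024) can be checked with a few lines of Python. This is only an illustrative sketch; the names `UNITS`, `to_bytes` and `convert` are made up for this example and are not part of Hadoop.

```python
# Binary storage units as used on the slide: each step up is a factor of 1024.
# UNITS, to_bytes and convert are hypothetical helper names for illustration.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (1 KB = 1024 B)."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value between any two units in UNITS."""
    return to_bytes(value, from_unit) / 1024 ** UNITS.index(to_unit)


print(convert(1, "PB", "TB"))  # 1 Petabyte expressed in Terabytes
print(convert(1, "EB", "PB"))  # 1 Exabyte expressed in Petabytes
print(convert(1, "ZB", "EB"))  # 1 Zettabyte expressed in Exabytes
```

Each call prints 1024.0, matching the figures in brackets on the slide.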


4/27

Why Big Data

To manage huge volumes of data in a better way.

Benefit from the data speed, capacity & scalability of cloud storage.

Potential insights through data analysis methods.

Companies can find new prospects & business opportunities.

Unlike other methods, with Big Data, business users can visualize the data.


5/27

Big Data Overview

Big Data includes:

Traditional structured databases: from inventories, orders and customer information.

Unstructured data: from the web, social networking sites, etc.

The problem with these massive datasets is that they can't be analyzed with standard tools & procedures.

Processing these data appropriately can help an organization gain useful insights into its business prospects.


6/27

Unstructured data Growth

No. of emails sent per second: 2.9 Million

Videos uploaded on YouTube per min: 20 hours

Data processed by Google per day: 20 Petabytes

Tweets per day: 50 Million

Minutes spent on Facebook per month: 700 Billion

Data sent & received by mobile users per day: 1.3 Exabytes

Products ordered on Amazon per second: 73 items

*Source: http://www.ibm.com/


7/27

Unstructured data Growth


8/27


9/27


10/27

Hadoop Overview

Hadoop allows batch processing of colossal data sets (Petabytes & Exabytes) as a series of parallel processes.

A Hadoop cluster comprises a number of server "nodes".

Nodes store and process data in a parallel and distributed fashion.

It's a parallelized, distributed storage & processing framework that can operate on commodity servers.


11/27

Commodity Hardware

"Commodity" refers to an average, widely available amount of computing resources.

It doesn't imply low quality, but affordability.

Hadoop clusters run on commodity servers.

Commodity servers have an average ratio of disk space to memory, unlike specialized servers with high memory or CPU.

These servers are not designed specifically for a distributed storage and processing framework, but they are made to fit the purpose.


12/27

Benefits of Hadoop

Scalable: Hadoop can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.

Failure Tolerance: HDFS can replicate files a specified number of times and can automatically re-replicate the data blocks that were held on failed nodes.
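The replication behaviour described above can be pictured with a toy simulation. The sketch below is not HDFS code: `place_blocks` and `handle_node_failure` are hypothetical helpers, the node names are invented, placement is random rather than rack-aware, and the replication factor of 3 is an assumption chosen to mirror HDFS's usual default.

```python
import random


def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}


def handle_node_failure(placement, failed, live_nodes, replication=3, seed=0):
    """Forget the failed node, then copy under-replicated blocks to other live nodes."""
    rng = random.Random(seed)
    for holders in placement.values():
        holders.discard(failed)
        # Live nodes that do not yet hold a copy of this block.
        candidates = [node for node in live_nodes if node not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(rng.randrange(len(candidates))))
    return placement


nodes = ["node1", "node2", "node3", "node4", "node5"]  # invented node names
placement = place_blocks(num_blocks=4, nodes=nodes)

# Simulate losing node2: every block should end up with 3 copies again.
live = [node for node in nodes if node != "node2"]
placement = handle_node_failure(placement, "node2", live)
for block, holders in placement.items():
    print(block, sorted(holders))
```

After the simulated failure, every block is again held by three live nodes, which is the effect the slide attributes to automatic re-replication.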


13/27

Benefits of Hadoop

Cost-Effective: Hadoop is a scale-out architecture that can store all of a company's data for later use, offering computing and storage capabilities at a reasonable price.

Speed: Hadoop's storage method is based on a distributed file system, which lets data be processed in parallel on the nodes where it is stored, resulting in much faster data processing.

Flexible: Hadoop can easily access new data sources and different types of data to generate insights.


14/27

*Source: http://www.datanami.com/


15/27

Why Hadoop

It provides insights into daily operations

Drives new product ideas

Used by companies for research and development and for marketing analysis.

Image and text processing.

Analyses huge amounts of data in comparatively less time.

Network monitoring

Log and/or click stream analysis of various kinds.


16/27

Hadoop Forecast

*Source: http://www.alliedmarketresearch.com/


17/27

Who can Learn Hadoop

Anyone with basic knowledge of Java & Linux.

Even if you haven't been introduced to Java & Linux before, you can learn them in parallel with Hadoop.

Hadoop project roles include Architect, Developer, Tester, and Linux/Network/Hardware Administrator.

Some of these roles need knowledge of Java and some don't.


18/27

Who can Learn Hadoop

SQL knowledge will help in learning HiveQL, the query language of Apache Hive in the Hadoop Ecosystem.

Knowledge of Linux will be helpful in understanding Hadoop command line parameters.

But even without any prerequisite knowledge of Java & Linux, with the help of a few basic classes you can learn Hadoop.


19/27

#Trending: Hadoop Jobs

*Source: http://www.the451group.com/


20/27

Job Opportunities in Hadoop

MNCs like IBM, Microsoft & Oracle have integrated with Hadoop.

Also, companies like Facebook, Hortonworks, Amazon, eBay and Yahoo! are currently looking for Hadoop professionals.

So, companies are looking for IT professionals with solid Hadoop MapReduce skills.


21/27

Salary Trend in Hadoop

*Source: http://www.itproportal.com/


22/27

Hadoop Architecture

The 2 main components of Hadoop are:

Hadoop Distributed File System (HDFS) is the storage component that breaks files into blocks, replicates them and stores them across the cluster.

MapReduce is the processing component that distributes the workload for operations on files stored in HDFS and automatically restarts failed work.
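The MapReduce idea can be illustrated with the classic word count, written here in plain Python rather than against the Hadoop API. This is a single-process sketch under that assumption: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and on a real cluster the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key before they reach the reducers."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across many nodes, which is what lets the framework distribute the workload.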


23/27

*Source: http://www.cloudera.com/


24/27

Hadoop Ecosystem

Apache Hadoop Distributed File System offers storage of large files across multiple machines.

Apache MapReduce is a programming model for processing large data sets with a parallel & distributed algorithm on a cluster.

Apache Hive is a data warehouse on distributed storage that facilitates data summarization, queries and the management of large datasets.

Apache Pig is an engine for executing data flows in parallel on Apache Hadoop.

Apache HBase is a non-relational distributed database that performs real-time operations on large tables.


25/27

Hadoop Ecosystem

Apache Flume is an aggregator that moves unstructured data into HDFS.

Apache Sqoop is a system for transferring bulk data between HDFS and relational databases.

Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Apache ZooKeeper is a coordination service with tools for writing correct distributed applications.

Apache Avro is a framework for data modelling, serialization and Remote Procedure Calls.


26/27

Q & A



27/27
