hadoop bigdata training bangalore.pptx

Upload: code-frux

Post on 02-Jun-2018


TRANSCRIPT

  • 8/10/2019 Hadoop Bigdata Training Bangalore.pptx

    1/27

    Big Data & Hadoop

    Introduction to Big Data

    Big Data: a term for collections of data sets that are large in volume and complex in structure.

    Volumes are measured in petabytes (1 PB = 1024 TB) or exabytes (1 EB = 1024 PB) and will soon reach zettabytes (1 ZB = 1024 EB).

    As a result, the data is hard to interpret and process with traditional data processing applications and tools.
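
    The unit ladder on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the helper name to_terabytes is hypothetical, and it assumes the 1024-based steps quoted on the slide.

```python
# Storage units from the slide; each step up is a factor of 1024 (assumed).
UNITS = ["TB", "PB", "EB", "ZB"]

def to_terabytes(value, unit):
    """Convert a value given in TB, PB, EB or ZB into terabytes."""
    return value * 1024 ** UNITS.index(unit)

print(to_terabytes(1, "PB"))  # 1024
print(to_terabytes(1, "EB"))  # 1048576
print(to_terabytes(1, "ZB"))  # 1073741824
```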

    Why Big Data

    To manage huge volumes of data more effectively.

    To benefit from the speed, capacity and scalability of cloud storage.

    To uncover potential insights through data analysis methods.

    Companies can find new prospects and business opportunities.

    Unlike other methods, Big Data lets business users visualize the data.

    Big Data Overview

    Big Data includes:

    Traditional structured databases (from inventories, orders and customer information).

    Unstructured data (from the web, social networking sites, etc.).

    The problem with these massive data sets is that they can't be analyzed with standard tools and procedures.

    Processing this data appropriately can help an organization gain useful insights into its business prospects.

    Unstructured Data Growth

    Emails sent per second: 2.9 million

    Video uploaded to YouTube per minute: 20 hours

    Data processed by Google per day: 20 petabytes

    Tweets per day: 50 million

    Minutes spent on Facebook per month: 700 billion

    Data sent and received by mobile users per day: 1.3 exabytes

    Products ordered on Amazon per second: 73 items

    *Source: http://ibm.com/

    Hadoop Overview

    Hadoop allows batch processing of colossal data sets (petabytes and exabytes) as a series of parallel processes.

    A Hadoop cluster comprises a number of server "nodes".

    Nodes store and process data in a parallel and distributed fashion.

    It's a parallelized, distributed storage and processing framework that can operate on commodity servers.

    Commodity Hardware

    Commodity hardware means servers with an average amount of computing resources.

    It doesn't imply low quality, but affordability.

    Hadoop clusters run on commodity servers.

    Commodity servers have an average ratio of disk space to memory, unlike specialized servers with high memory or CPU.

    These servers are not designed specifically for a distributed storage and processing framework, but they can be made to fit the purpose.

    Benefits of Hadoop

    Scalable: Hadoop can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.

    Failure tolerance: HDFS can replicate files a specified number of times and can automatically re-replicate the data blocks that were stored on failed nodes.
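
    The replication idea above can be sketched in a few lines. This is a toy model, not HDFS code: it assumes a 128 MB block size and a replication factor of 3 (common HDFS defaults that the slide does not state), the node names and the helpers place_blocks and re_replicate are hypothetical, and the round-robin placement is a simplification of HDFS's real rack-aware policy.

```python
import math

BLOCK_SIZE_MB = 128   # assumed block size (not stated on the slide)
REPLICATION = 3       # assumed replication factor (not stated on the slide)

def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION):
    """Split a file into blocks and give each block `replication` distinct nodes.

    Uses simple round-robin placement; needs len(nodes) >= replication.
    """
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }

def re_replicate(placement, failed, nodes, replication=REPLICATION):
    """Drop replicas held by the failed node and copy them to surviving nodes."""
    alive = [n for n in nodes if n != failed]
    for holders in placement.values():
        holders[:] = [n for n in holders if n != failed]
        for candidate in alive:
            if len(holders) >= replication:
                break
            if candidate not in holders:
                holders.append(candidate)
    return placement

nodes = ["node1", "node2", "node3", "node4"]   # hypothetical cluster
layout = place_blocks(300, nodes)              # a 300 MB file becomes 3 blocks
layout = re_replicate(layout, "node2", nodes)  # simulate node2 failing
for block, holders in layout.items():
    print(block, holders)
```

    After the simulated failure, every block is back to three replicas and none of them lives on the failed node.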

    Benefits of Hadoop

    Cost-effective: Hadoop is a scale-out architecture that can store all of a company's data for later use, offering computing and storage capabilities at a reasonable price.

    Speed: Hadoop's storage method is based on a distributed file system, which allows much faster data processing.

    Flexible: Hadoop can easily access new data sources and different types of data to generate insights.

    *Source: http://datanami.com/

    Why Hadoop

    It provides insights into daily operations.

    It drives new product ideas.

    Companies use it for research and development and for marketing analysis.

    Image and text processing.

    It analyzes huge amounts of data in comparatively less time.

    Network monitoring.

    Log and/or clickstream analysis of various kinds.

    Hadoop Forecast

    *Source: http://alliedmarketresearch.com/

    Who can Learn Hadoop

    Anyone with basic knowledge of Java and Linux.

    Even if you haven't been introduced to Java and Linux before, you can learn them in parallel with Hadoop.

    Hadoop project roles are available for Architects, Developers, Testers and Linux/Network/Hardware Administrators.

    Some need knowledge of Java and some don't.

    Who can Learn Hadoop

    SQL knowledge will help in learning HiveQL, which is part of the Hadoop ecosystem.

    Knowledge of Linux will be helpful in understanding Hadoop command-line parameters.

    But even without any prior knowledge of Java and Linux, you can learn Hadoop with the help of a few basic classes.

    #Trending: Hadoop Jobs

    *Source: http://the451group.com/

    Job Opportunities in Hadoop

    MNCs like IBM, Microsoft and Oracle have integrated with Hadoop.

    Also, companies like Facebook, Hortonworks, Amazon, eBay and Yahoo! are currently looking for Hadoop professionals.

    So, companies are looking for IT professionals with sufficient Hadoop MapReduce skills.

    Salary Trend in Hadoop

    *Source: http://itproportal.com/

    Hadoop Architecture

    The two main components of Hadoop are:

    Hadoop Distributed File System (HDFS), the storage component that breaks files into blocks, replicates them and stores them across the cluster.

    MapReduce, the processing component that distributes the workload for operations on files stored in HDFS and automatically restarts failed work.
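
    The MapReduce idea can be illustrated with a word count written in plain Python. This is a single-process sketch of the map, shuffle and reduce steps, not Hadoop's API: the function names and sample lines are made up for illustration, and a real cluster would run the map and reduce tasks on many nodes and restart any that fail.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into a single count."""
    return key, sum(values)

# Hypothetical input; in Hadoop these lines would come from blocks in HDFS.
lines = ["big data needs hadoop", "hadoop stores big data"]

pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

    Because each line is mapped independently and each key is reduced independently, both steps can be spread across nodes, which is what lets the framework distribute the workload.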

    *Source: http://cloudera.com/

    Hadoop Ecosystem

    Apache Hadoop Distributed File System offers storage of large files across multiple machines.

    Apache MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.

    Apache Hive is a data warehouse on distributed storage that facilitates data summarization, queries and the management of large data sets.

    Apache Pig is an engine for executing data flows in parallel on Apache Hadoop.

    Apache HBase is a non-relational distributed database that performs real-time operations on large tables.

    Hadoop Ecosystem

    Apache Flume is an aggregator that moves unstructured data into HDFS.

    Apache Sqoop is a system for transferring bulk data between HDFS and relational databases.

    Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

    Apache ZooKeeper is a coordination service with tools for writing correct distributed applications.

    Apache Avro is a framework for data modelling, serialization and Remote Procedure Calls.

    Q & A
