Introduction to Big Data and Hadoop

Upload: greycampus

Post on 13-Jan-2017

Agenda

• What is Big Data?
• Facts about Big Data
• Need for Big Data
• Hadoop Overview
• Big Data Course at GreyCampus
• Understanding HDFS and MapReduce
• Enterprise Hadoop
• Hadoop Career Opportunities
• Pre-Requisites to learn Big Data

What is Big Data?

• Big data is a buzzword describing a volume of data so large that it is difficult to process using traditional database and software techniques
• In most enterprise scenarios, the data is too large, moves too fast, or exceeds current processing capacity
• Despite these problems, big data has the potential to help companies improve operations and make faster, more intelligent decisions
• Big Data has 3 attributes: Volume, Velocity and Variety
• Applications now generate massive amounts of data, measured in terabytes and petabytes
• For example, a stock market can generate about 1 terabyte of data per day

Facts about Big Data

Need for Big Data

• The digital world currently holds about 4 zettabytes of data
• Predictions suggest it might reach 40 zettabytes by 2020
• The size of the data doubles every 2 years
• We need mechanisms to store this data
• We need to process this data almost in real time to help businesses make informed decisions
• Fact: We are generating big data and we need faster information
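The growth figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming pure exponential growth with a fixed doubling period. The function names are hypothetical, and the slides do not state the starting year, so the code only computes elapsed time.

```python
import math


def years_to_reach(start_zb: float, target_zb: float, doubling_years: float = 2.0) -> float:
    """Years needed to grow from start_zb to target_zb if size doubles every doubling_years."""
    doublings = math.log2(target_zb / start_zb)
    return doublings * doubling_years


def projected_size(start_zb: float, years: float, doubling_years: float = 2.0) -> float:
    """Projected size in zettabytes after `years` of doubling every doubling_years."""
    return start_zb * 2 ** (years / doubling_years)


if __name__ == "__main__":
    # Growing from 4 ZB to 40 ZB is a factor of 10, i.e. log2(10) doublings.
    print(f"Years from 4 ZB to 40 ZB: {years_to_reach(4, 40):.1f}")
    print(f"Size after 6 years: {projected_size(4, 6):.1f} ZB")
```

Under these assumptions, a tenfold increase takes a little under seven years, so the "doubles every 2 years" rule and the 40 zettabyte prediction are roughly consistent with each other.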

Overview of Hadoop

Big Data Course at GreyCampus

Course Topics:

• Module 1 – Introduction to Big Data
• Module 2 – HDFS Architecture
• Module 3 – MapReduce
• Module 4 – Advanced MapReduce
• Module 5 – Hive
• Module 6 – Pig
• Module 7 – HBase and Zookeeper
• Module 8 – Sqoop and Flume – Moving Data to and from HDFS
• Module 9 – Hadoop Ecosystem and Components – Introduction
• Module 10 – Commercial Distributions of Hadoop

Features of HDFS

• When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines
• File systems that manage the storage across a network of machines are called distributed filesystems
• HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware
• There are Hadoop clusters running today that store petabytes of data
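To make the idea of partitioning a file across machines concrete, here is a toy sketch in Python. It assumes the HDFS defaults of 128 MB blocks (Hadoop 2.x and later) and a replication factor of 3. The function names, node names, and the round-robin placement are illustrative assumptions only; real HDFS placement is rack-aware and is decided by the NameNode.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, default HDFS block size in Hadoop 2.x and later
REPLICATION = 3                 # default HDFS replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block that a file of file_size bytes occupies."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy round-robin placement (an assumption, not the HDFS policy):
    each block is stored on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb)
    print([size // mb for size in blocks])  # block sizes in MB
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

Because every block lives on several machines, losing one machine does not lose the file, and different blocks of the same file can be read in parallel.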

Features of MapReduce

• Developers don’t have to worry about the plumbing for their jobs
• There are no threads, inter-process communication or semaphores to program
• Just write programs that process part of your input files and produce the output
• The mappers and reducers share nothing: each mapper is independent of what the other mappers do, and each reducer is independent of the other reducers
• So the mappers and reducers can run massively in parallel
• The MapReduce system is built to handle failure
• The system is robust: it handles failures automatically, so users don’t have to take any action
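The programming model described above can be illustrated with the classic word-count example. This is a single-process Python sketch of the map, shuffle and reduce phases, not Hadoop's actual Java API. The function names are hypothetical, and a real cluster would run many mappers and reducers on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line.
    It looks only at its own line, so mappers share nothing."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all values by key. The framework does this for you."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts for one word, independently of other words."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence within one process."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big Hadoop data"]))
```

The developer writes only `mapper` and `reducer`. Splitting the input, moving data between phases, and rerunning failed tasks are left to the framework.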

Enterprise Hadoop

Hadoop Career Advantages

More Job Opportunities!

Look who is hiring!

Hadoop means high on salary!

Transform your career

Future of Big Data

Careers in Big Data

• “By 2015, 4.4 million IT jobs globally will be created to support big data, generating 1.9 million IT jobs in the United States,” said Peter Sondergaard, senior vice president at Gartner and global head of Research. “In addition, every big data-related role in the U.S. will create employment for three people outside of IT, so over the next four years a total of 6 million jobs in the U.S. will be generated by the information economy.”

• “But there is a challenge. There is not enough talent in the industry. Our public and private education systems are failing us. Therefore, only one-third of the IT jobs will be filled. Data experts will be a scarce, valuable commodity,” Mr. Sondergaard said. “IT leaders will need immediate focus on how their organization develops and attracts the skills required. These jobs will be needed to grow your business. These jobs are the future of the new information economy.”

Pre-Requisites

• Good programming skills
• Basic understanding of database management systems
• Knowledge of core Java (an added advantage)