Big Data and Hadoop
TRANSCRIPT

Handling of Big Data Using the Hadoop Framework
Rahul Mahawar, 1142030 (IT-2)
Contents

BIG DATA
- What is big data? How do we generate it?
- The problem with big data
- How to handle it

HADOOP
- What is Hadoop?
- Components
- The terminology behind it
- Why choose Hadoop
What is big data?
How big is this data? How do we generate it? How should it be handled? How do we process it? Where do we store it? Why is big data a problem?

Data becomes "big" for a system when it goes beyond that system's capacity, in terms of storage or in terms of processing power.

In other words: "Small data is when is fit in RAM. Big data is when is crash because is not fit in RAM." - DevOps Borat
How big is big data? A rice analogy:
- Byte: 1 grain of rice
- Kilobyte: 1 cup of rice
- Megabyte: 8 bags of rice
- Gigabyte: 3 semi trucks
- Terabyte: 2 container ships
- Petabyte: a small island
- Exabyte: almost a state
- Zettabyte: fills the Pacific Ocean
- Yottabyte: an Earth-size rice ball

(The slide also marks eras along this scale: Apple 1st Gen, Desktop, Internet, Big Data, The Future?)
How do we generate big data? In every 60 seconds:
- 3.3 million posts
- 342,000 tweets
- 41,000 photo uploads
- 4 million searches
- 50 billion messages
- 120 hours of video uploaded
The problem with big data: is it storage, or processing?

Is it storage? Google's server farms show that storage at this scale is already being handled.
Processing big data. Domains that generate data demanding heavy processing:
- Stock exchanges
- Power grid lines
- Banks
- Airlines
- CCTV cameras
- Hospitality
What is Hadoop? Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
Why distribution?
Suppose 100 files arrive every hour, but one person can sign only 50 per hour:
- 2nd hour: 50 pending + 100 new = 150 to sign
- 3rd hour: 100 pending + 100 new = 200 to sign
- 4th hour: 150 pending + 100 new = 250 to sign
The backlog grows by 50 files every hour. Distribute the work instead: with a second signer also handling 50 per hour, the pair keeps up with the 100 arrivals. No pending files.
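The backlog arithmetic above can be simulated directly. A toy sketch; the arrival rate, signing rate, and hour count are the slide's numbers, and `pending_after` is a made-up helper name:

```python
def pending_after(hours, arrive_per_hour=100, sign_per_hour=50, signers=1):
    """Simulate the signing backlog hour by hour."""
    pending = 0
    for _ in range(hours):
        pending += arrive_per_hour                        # new files arrive
        pending -= min(pending, sign_per_hour * signers)  # sign what we can
    return pending

print(pending_after(4, signers=1))  # one signer: backlog keeps growing -> 200
print(pending_after(4, signers=2))  # two signers keep up -> 0
```

With one signer the backlog grows by 50 every hour; with two, arrivals and signatures balance exactly, which is the slide's point about distributing work.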
From a single server to a cluster of computers built from commodity hardware.
Commodity hardware: hardware that is cheap, affordable in price, and easy to obtain.
Components of Hadoop (2):
- HDFS: for storing
- MapReduce: for processing
HDFS [Hadoop Distributed File System]
A file system specially designed for Hadoop: to store huge amounts of data using commodity hardware. Why special? In an ordinary system the block size is 4 KB; when we install Hadoop, the block size is 64 MB.
Storing a file: a 200 MB File.txt
The client sends a request to the Name Node to store File.txt. The file is split into 4 blocks:
a.txt + b.txt + c.txt + d.txt = 64 MB + 64 MB + 64 MB + 8 MB = 200 MB
The Name Node keeps the metadata for File.txt, recording which Data Nodes hold each block, e.g. a.txt: 1,2,3; b.txt: 3,4,5; c.txt: 5,6,7; d.txt: 6,7,9 (each block is replicated across several Data Nodes). The Data Nodes send a heartbeat and a block report back to the Name Node, and the client receives an acknowledgement.
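The split above can be sketched in Python. This is a toy illustration of the arithmetic, not Hadoop's actual API; the 64 MB block size and 200 MB file size come from the slides:

```python
# Toy sketch of HDFS-style block splitting (sizes in MB, as on the slide).
BLOCK_SIZE_MB = 64

def split_into_blocks(file_size_mb):
    """Return the sizes of the blocks a file would be split into."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(BLOCK_SIZE_MB, remaining))
        remaining -= BLOCK_SIZE_MB
    return blocks

print(split_into_blocks(200))  # a 200 MB file -> [64, 64, 64, 8]
```

The last block is smaller than the block size, which is exactly the slide's 64 + 64 + 64 + 8 = 200 MB split.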
MapReduce [technique to process the data]
- Map: the technique that maps the work to the desired locations (the nodes where the data lives).
- Reduce: the technique that combines the results to get the final output.
Running a job: from Program to Output.txt
The client submits the program. The Job Tracker obtains the metadata for File.txt from the Name Node (a.txt: 1,3,4; b.txt: 3,5,7; c.txt: 4,7,8; d.txt: 6,7,8) and assigns the map work to Task Trackers running on the Data Nodes that hold the blocks. When the map tasks finish, the Reduce step combines their results into Output.txt.
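The map and reduce steps can be sketched as a toy word count in plain Python. This is a minimal illustration of the programming model only, not Hadoop's actual Java API, and the block contents are made up:

```python
from collections import Counter

# Each "block" of the input file is mapped independently (as a Task Tracker
# would do on the Data Node holding that block), then the per-block results
# are reduced into one final output.

blocks = [                      # made-up contents of three file blocks
    "big data big problems",
    "hadoop handles big data",
    "data data everywhere",
]

def map_block(text):
    """Map step: count words within one block."""
    return Counter(text.split())

def reduce_counts(partials):
    """Reduce step: merge per-block counts into the final output."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

result = reduce_counts(map_block(b) for b in blocks)
print(result["data"])  # -> 4
```

Because each `map_block` call touches only its own block, the map calls could run on different machines; only the small per-block counters travel to the reducer.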
Why is this fast? The 200 MB file sits on 4 different nodes. If processing 200 MB on one machine takes t seconds, processing one 64 MB block takes roughly t/4 seconds, and the blocks are processed in parallel: roughly 4 times faster. Now think of a file of 1000 MB, or 1000 GB.
Why Hadoop?
- Scalable
- Fast
- Cost effective
- Handles failure efficiently
- Uses a simple programming model
Drawbacks of Hadoop
- Not a fit for small data
- Replication overhead
- Potential stability issues
- Maintaining a cluster is hard
- Very complex algorithms
Thank You