Big Data and Hadoop
TRANSCRIPT

Handling of Big Data Using the Hadoop Framework
Rahul Mahawar, 1142030 (IT-2)
Contents

BIG DATA
- What is big data? How do we generate it?
- The problem with big data
- How to handle it

HADOOP
- What is Hadoop?
- Components
- The terminology behind it
- Why choose Hadoop
What is big data?
How big is this data? How do we generate it? How should it be handled? How do we process it? Where do we store it? Why is big data a problem?

Data becomes "big" for a system when it goes beyond that system's capacity, in terms of storage or in terms of processing power.

In other words: "Small data is when is fit in RAM. Big data is when is crash because is not fit in RAM." - DevOps Borat
How big is big data? A rice analogy:
- Byte: 1 grain of rice
- Kilobyte: 1 cup of rice
- Megabyte: 8 bags of rice
- Gigabyte: 3 semi trucks
- Terabyte: 2 container ships
- Petabyte: a small island
- Exabyte: almost a state
- Zettabyte: fills the Pacific Ocean
- Yottabyte: an Earth-size rice ball

(The slide also marks eras along this scale: Apple 1st Gen, Desktop, Internet, Big Data, The Future?)
How do we generate big data? In every 60 seconds:
- 3.3 million posts
- 342,000 tweets
- 41,000 photo uploads
- 4 million searches
- 50 billion messages
- 120 hours of video uploaded
The problem with big data: is it storage, or processing?

Is it storage? Google's server farms show that storage at this scale is already being handled.
Processing big data. Domains that generate data demanding heavy processing:
- Stock exchanges
- Power grid lines
- Banks
- Airlines
- CCTV cameras
- Hospitality
What is Hadoop? Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
Why distribution?
Suppose 100 files arrive every hour, but one person can sign only 50 per hour:
- 2nd hour: 50 pending + 100 new = 150 to sign
- 3rd hour: 100 pending + 100 new = 200 to sign
- 4th hour: 150 pending + 100 new = 250 to sign
The backlog grows by 50 files every hour. Distribute the work instead: with a second signer also handling 50 per hour, the pair keeps up with the 100 arrivals. No pending files.
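The backlog arithmetic above can be simulated directly. A toy sketch; the arrival rate, signing rate, and hour count are the slide's numbers, and `pending_after` is a made-up helper name:

```python
def pending_after(hours, arrive_per_hour=100, sign_per_hour=50, signers=1):
    """Simulate the signing backlog hour by hour."""
    pending = 0
    for _ in range(hours):
        pending += arrive_per_hour                        # new files arrive
        pending -= min(pending, sign_per_hour * signers)  # sign what we can
    return pending

print(pending_after(4, signers=1))  # one signer: backlog keeps growing -> 200
print(pending_after(4, signers=2))  # two signers keep up -> 0
```

With one signer the backlog grows by 50 every hour; with two, arrivals and signatures balance exactly, which is the slide's point about distributing work.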
From a single server to a cluster of computers built from commodity hardware.
Commodity hardware: hardware that is cheap, affordable in price, and easy to obtain.
Components of Hadoop (2):
- HDFS: for storing
- MapReduce: for processing
HDFS [Hadoop Distributed File System]
A file system specially designed for Hadoop: to store huge amounts of data using commodity hardware. Why special? In an ordinary system the block size is 4 KB; when we install Hadoop, the block size is 64 MB.
Storing a file: a 200 MB File.txt
The client sends a request to the Name Node to store File.txt. The file is split into 4 blocks:
a.txt + b.txt + c.txt + d.txt = 64 MB + 64 MB + 64 MB + 8 MB = 200 MB
The Name Node keeps the metadata for File.txt, recording which Data Nodes hold each block, e.g. a.txt: 1,2,3; b.txt: 3,4,5; c.txt: 5,6,7; d.txt: 6,7,9 (each block is replicated across several Data Nodes). The Data Nodes send a heartbeat and a block report back to the Name Node, and the client receives an acknowledgement.
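The split above can be sketched in Python. This is a toy illustration of the arithmetic, not Hadoop's actual API; the 64 MB block size and 200 MB file size come from the slides:

```python
# Toy sketch of HDFS-style block splitting (sizes in MB, as on the slide).
BLOCK_SIZE_MB = 64

def split_into_blocks(file_size_mb):
    """Return the sizes of the blocks a file would be split into."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(BLOCK_SIZE_MB, remaining))
        remaining -= BLOCK_SIZE_MB
    return blocks

print(split_into_blocks(200))  # a 200 MB file -> [64, 64, 64, 8]
```

The last block is smaller than the block size, which is exactly the slide's 64 + 64 + 64 + 8 = 200 MB split.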
MapReduce [technique to process the data]
- Map: the technique that maps the work to the desired locations (the nodes where the data lives).
- Reduce: the technique that combines the results to get the final output.
Running a job: from Program to Output.txt
The client submits the program. The Job Tracker obtains the metadata for File.txt from the Name Node (a.txt: 1,3,4; b.txt: 3,5,7; c.txt: 4,7,8; d.txt: 6,7,8) and assigns the map work to Task Trackers running on the Data Nodes that hold the blocks. When the map tasks finish, the Reduce step combines their results into Output.txt.
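The map and reduce steps can be sketched as a toy word count in plain Python. This is a minimal illustration of the programming model only, not Hadoop's actual Java API, and the block contents are made up:

```python
from collections import Counter

# Each "block" of the input file is mapped independently (as a Task Tracker
# would do on the Data Node holding that block), then the per-block results
# are reduced into one final output.

blocks = [                      # made-up contents of three file blocks
    "big data big problems",
    "hadoop handles big data",
    "data data everywhere",
]

def map_block(text):
    """Map step: count words within one block."""
    return Counter(text.split())

def reduce_counts(partials):
    """Reduce step: merge per-block counts into the final output."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

result = reduce_counts(map_block(b) for b in blocks)
print(result["data"])  # -> 4
```

Because each `map_block` call touches only its own block, the map calls could run on different machines; only the small per-block counters travel to the reducer.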
Why is this fast? The 200 MB file sits on 4 different nodes. If processing 200 MB on one machine takes t seconds, processing one 64 MB block takes roughly t/4 seconds, and the blocks are processed in parallel: roughly 4 times faster. Now think of a file of 1000 MB, or 1000 GB.
Why Hadoop?
- Scalable
- Fast
- Cost effective
- Handles failure efficiently
- Uses a simple programming model
Drawbacks of Hadoop
- Not a fit for small data
- Replication overhead
- Potential stability issues
- Maintaining a cluster is hard
- Very complex algorithms
Thank You