Introduction to Big Data and Hadoop

Presentation on Big Data & Hadoop

PRESENTED BY: AROHI KHANDELWAL

Contents :
• What is BIG DATA ?
• Why BIG DATA ?
• Hadoop
• Hadoop Architecture
• Hadoop Distributed File System
• HDFS Architecture
• Map Reduce
• How Map Reduce Works ?
• Hadoop Ecosystem
• What is Hadoop used for ?
• Users of Hadoop
• Advantage & Disadvantage of Hadoop
• Conclusion

What Is BIG DATA ?

Big Data : Volume, Variety, Velocity

Why BIG DATA ?

Mobile phones increased 70.3% to 918m in the last two years.

Twitter has 328m monthly active users – 55% growth

Facebook has 765m active users.

Google+ has 495m monthly active users – 45% growth

LinkedIn has 300m users.

Every single minute, 48 hours of video are posted.

Hadoop :

• Open source distributed computing framework.
• Written mainly in Java.
• Named by Doug Cutting after his son’s toy elephant.

[Diagram: Hadoop = Storage + Processing]

Hadoop Architecture :

Hadoop is designed and built on two independent frameworks, namely :
• Hadoop Distributed File System
• Map Reduce

[Diagram: Hadoop = Map Reduce + HDFS]

Hadoop Distributed File System :

• Based on the Google File System.
• Data is stored in the form of blocks.
• Provides data reliability.
• Provides fast processing of data.
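
Block storage can be illustrated with a small sketch. It is not HDFS code: the helper name `split_into_blocks` is hypothetical, and the 128 MB block size is an assumption (a common default in recent Hadoop releases, and configurable).

```python
# Toy illustration of block-structured storage; not actual HDFS code.
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_index, length_in_bytes) pairs covering a file."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    index = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((index, length))
        offset += length
        index += 1
    return blocks


# A 300 MB file needs two full blocks and one partial block.
blocks = split_into_blocks(300 * 1024 * 1024)
for index, length in blocks:
    print(f"block {index}: {length // (1024 * 1024)} MB")
```

Only the last block is smaller than the block size, so a file slightly larger than one block does not waste a full second block.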

HDFS Architecture :

Hadoop Distributed File System has :
• Name node (master; keeps the file system metadata)
• Data nodes (workers; store the data blocks)
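
The split between the two roles can be sketched as a toy model. The class `ToyNameNode`, the node names, and the round-robin placement are all assumptions made for illustration; real HDFS uses a rack-aware placement policy. The replication factor of 3 is likewise an assumed, configurable value.

```python
# Toy model of the name node's bookkeeping; not the real HDFS API.
class ToyNameNode:
    def __init__(self, data_nodes, replication=3):
        if replication > len(data_nodes):
            raise ValueError("need at least as many data nodes as replicas")
        self.data_nodes = list(data_nodes)
        self.replication = replication
        # Metadata only: (filename, block_index) -> data nodes holding a copy.
        self.block_map = {}

    def add_file(self, filename, num_blocks):
        """Record where each block of a file is stored (simple round-robin)."""
        n = len(self.data_nodes)
        for b in range(num_blocks):
            replicas = [self.data_nodes[(b + r) % n]
                        for r in range(self.replication)]
            self.block_map[(filename, b)] = replicas

    def locate(self, filename, block_index):
        """Answer a client's question: which data nodes hold this block?"""
        return self.block_map[(filename, block_index)]


name_node = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
name_node.add_file("logs.txt", num_blocks=3)
print(name_node.locate("logs.txt", 0))
```

The name node in this sketch never touches file contents. It only answers where blocks live, and the data nodes would hold the actual bytes.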

Map Reduce :

Map : takes a set of data and breaks individual elements into tuples (key/value pairs).

Reduce : takes Map’s output as input and combines those data tuples into a smaller set of tuples.
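
A word count is the usual way to illustrate the two steps. The sketch below runs in a single Python process and only mimics the idea; the names `map_fn`, `shuffle`, `reduce_fn` and `run_job` are hypothetical and are not part of the Hadoop API.

```python
from collections import defaultdict


def map_fn(line):
    """Map: break a line into (word, 1) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key, as happens between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(key, values):
    """Reduce: combine the tuples for one key into a single tuple."""
    return (key, sum(values))


def run_job(lines):
    mapped = [pair for line in lines for pair in map_fn(line)]
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))


counts = run_job(["big data", "Big Hadoop data"])
print(counts)
```

Five input tuples become three output tuples, which is the "smaller set" the Reduce step produces.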

How Map Reduce works ?

Hadoop Ecosystem :
• HDFS
• YARN
• Map Reduce V2
• HBase
• Hive
• Apache Pig
• Oozie
• ZooKeeper
• Sqoop

What is Hadoop used for ?

Search
• Yahoo, Amazon

Log processing
• Facebook, Yahoo

Data Warehouse
• Facebook, AOL

Video & Image Analysis
• New York Times

Users of Hadoop :

Advantages of Hadoop :

• Platform independent.
• Block structured file system.
• Can store any kind of data.
• Huge storage capacity.
• Rapidly processes large amounts of data in parallel.
• Fault tolerance.
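
Fault tolerance comes from keeping several copies of each block. A minimal sketch, assuming three replicas and hypothetical node names (`read_block` is an illustrative helper, not a Hadoop call):

```python
def read_block(replicas, live_nodes):
    """Return the first live node holding the block, or None if all copies are lost."""
    for node in replicas:
        if node in live_nodes:
            return node
    return None


replicas = ["dn1", "dn2", "dn3"]          # assumed: three copies of one block
live_nodes = {"dn1", "dn2", "dn3", "dn4"}

live_nodes.discard("dn1")                  # simulate one data node failing
served_by = read_block(replicas, live_nodes)
print(served_by)
```

The read still succeeds after the failure because another replica is available. Data is lost only if every node holding a copy fails.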

Disadvantages of Hadoop :

• Not fit for small data.
• Setup issues.
• Programming model is very restrictive.

Summary

Hadoop excels at Big Data, analytics, and batch processing.

Not real-time, no random access; not a database.

HDFS makes it all possible:
• Fault tolerant file system.
• Fast data access.

Pig and Hive are easy to use.

THANKING YOU …
