Big Data and Hadoop Ecosystem

Building a Smarter Planet: Introduction to Big Data and Hadoop
Presented by: Avishek Ghosh, Academy of Technology, Adisaptagram

Upload: avishek-ghosh

Post on 14-Jan-2017

TRANSCRIPT

BIG DATA AND HADOOP ECOSYSTEM

What is Big Data?
Big data: a growing torrent of data

$600 buys a disk drive that can store all of the world's music

What Launched the Big Data Era?

Data Torrent
Computing Anytime, Anywhere

Big Data Era

Where Does Big Data Come From?
3 major sources of Big Data

Machine-Generated Data: It's Everywhere and There's a Lot!

Big Plane -> Big Data?

More data = more safety
Sensors
Temperature
Pressure
Malfunctions
Real-time problem detection

Big Data Generated by People: The Unstructured Challenge

Text-heavy
Unstructured
Daily Facebook data (30+ PB) > all US academic libraries (2 PB)

Company     Data Processed Daily
eBay        100 petabytes (PB)
Google      100 PB
Facebook    30+ PB
Twitter     100 terabytes (TB)
Spotify     64 TB
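The daily volumes above mix petabytes and terabytes, which makes them hard to compare at a glance. The following is a minimal sketch that converts them to a common unit. It assumes decimal units (1 PB = 1000 TB) and treats "30+" as a lower bound of 30; the names `DAILY_VOLUME`, `to_terabytes`, and `volumes_in_tb` are made up for this illustration.

```python
# Daily volumes as quoted on the slide.
# Assumption: "30+ PB" is recorded as its lower bound, 30 PB.
DAILY_VOLUME = {
    "eBay": (100, "PB"),
    "Google": (100, "PB"),
    "Facebook": (30, "PB"),
    "Twitter": (100, "TB"),
    "Spotify": (64, "TB"),
}

# Assumption: decimal units, so 1 PB = 1000 TB.
UNIT_IN_TB = {"TB": 1, "PB": 1000}


def to_terabytes(amount, unit):
    """Convert an (amount, unit) pair to terabytes."""
    return amount * UNIT_IN_TB[unit]


def volumes_in_tb(volumes):
    """Return a {company: terabytes} mapping for easy comparison."""
    return {
        company: to_terabytes(amount, unit)
        for company, (amount, unit) in volumes.items()
    }


if __name__ == "__main__":
    ranked = sorted(
        volumes_in_tb(DAILY_VOLUME).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for company, terabytes in ranked:
        print(f"{company:<10}{terabytes:>10,} TB")
```

With everything in terabytes, the gap becomes concrete: under these assumptions the eBay and Google figures are 1000 times the Twitter figure.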

The Unstructured Data Challenge

Structure

80%-90% of all data is unstructured!

Tools
Data
Skilled People

Value

Data Acquisition
Storage
Retrieval
Cleaning
Processing

Organization-Generated Data: Structured But Often Siloed
Commercial Transactions
Banking/Stock Records
Credit Cards
Government Open Data
E-Commerce
Medical Records

Real-World Examples

16 million shipments per day
40 million tracking records
UPS is estimated to have 16 PB of data about its operations

Can you guess how much money UPS can save by reducing each driver's route by just 1 mile?
50 million dollars!

How Much Are Companies Spending on Big Data?

Benefits of Using Big Data
Efficient operation
Higher sales
Improved safety
Customer satisfaction
Better profit margins
Improved product placement

Characteristics of Big Data: The Vs of Big Data

Getting Started: Why Hadoop?
The Hadoop ecosystem is great for Big Data

Major Goals
Enable scalability
Optimize for a variety of data types
Facilitate a shared environment
Provide value
Handle fault tolerance

The Hadoop Ecosystem

Main Hadoop Components
MapReduce
YARN
HDFS
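To make the MapReduce component a little more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is plain Python and only illustrates the programming model; it does not use Hadoop's API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this illustration. On a real cluster these steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data big value", "data torrent"]
    print(word_count(docs))
```

The split into three small functions mirrors the idea that each map call looks at only one record and each reduce call at only one key, which is what lets the work be spread across machines.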

HDFS = foundation for the Hadoop Ecosystem
What is HDFS?

Up to 200 petabytes, 1 billion files and blocks!

QUESTIONS?

SOURCES:
University of California, San Diego (Super Computer)
http://www.cloudera.com/
http://www.ibm.com/big-data/us/en/

ACKNOWLEDGEMENTS:

I would like to thank Prof. Prasenjit Das for her cordial support and encouragement, which were among the key resources behind this presentation. Thanks also to all the CSE faculty for their support.
