Big Data and Hadoop Ecosystem
Building a Smarter Planet
INTRODUCTION TO BIG DATA AND HADOOP
Presented By: Avishek Ghosh
ACADEMY OF TECHNOLOGY, ADISAPTAGRAM
BIG DATA AND HADOOP ECOSYSTEM
What is Big Data?
Big data: a growing torrent of data
$600 buys a disk drive that can store all of the world's music
What Launched the Big Data Era?
Data Torrent
Computing Anytime, Anywhere
Big Data Era
Where Does Big Data Come From?
3 major sources of Big Data
Machine-Generated Data: It's Everywhere and There's a Lot!
Big Plane -> Big Data?
More data = more safety
Sensors
Temperature
Pressure
Malfunctions
Real-time problem detection
Big Data Generated by People: The Unstructured Challenge
Text-heavy
Unstructured
Daily Facebook data (30+ PB) > all US academic libraries (2 PB)
Company     Data Processed Daily
eBay        100 petabytes (PB)
Google      100 PB
Facebook    30+ PB
Twitter     100 terabytes (TB)
Spotify     64 TB
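The daily volumes listed above mix petabytes and terabytes, so converting them to one unit makes the gap easier to see. The following is only a rough sketch: it assumes decimal units (1 PB = 1000 TB), treats "30+ PB" as its lower bound of 30 PB, and uses illustrative names (`daily_volume`, `to_terabytes`) that are not part of the presentation.

```python
# Daily volumes as listed on the slide.
# Assumption: "30+ PB" is represented by its lower bound, 30 PB.
TB_PER_PB = 1000  # Assumption: decimal (SI) units; binary units would use 1024.

daily_volume = {
    "eBay": (100, "PB"),
    "Google": (100, "PB"),
    "Facebook": (30, "PB"),
    "Twitter": (100, "TB"),
    "Spotify": (64, "TB"),
}


def to_terabytes(amount, unit):
    """Convert an (amount, unit) pair to terabytes."""
    if unit == "PB":
        return amount * TB_PER_PB
    if unit == "TB":
        return amount
    raise ValueError(f"unsupported unit: {unit}")


# Normalize every entry to terabytes per day.
volumes_tb = {name: to_terabytes(*value) for name, value in daily_volume.items()}

# Print the companies from largest to smallest daily volume.
for name, tb in sorted(volumes_tb.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10}{tb:>10,} TB/day")
```

Under these assumptions, the largest entries on the slide are about a thousand times the size of the Twitter figure.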
The Unstructured Data Challenge
Structure
80%-90% of all data is unstructured!
Tools
Data
Skilled People
Value
Data Acquisition
Storage
Retrieval
Cleaning
Processing
Organization-Generated Data: Structured but Often Siloed
Commercial Transactions
Banking/Stock Records
Credit Cards
Government Open Data
E-Commerce
Medical Records..
Real-World Examples
16 million shipments per day
40 million tracking records
UPS is estimated to have 16 PB of data about its operations
Can you guess how much money UPS can save by reducing each driver's route by just 1 mile?
50 million dollars!
How Much Are Companies Spending on Big Data?
Benefits of Using Big Data
Efficient Operation
Higher Sales
Improved Safety
Customer Satisfaction
Better Profit Margins
Improved Product Placement
Characteristics of Big Data: The Vs of Big Data
Getting Started: Why Hadoop?
The Hadoop Ecosystem is Great for Big Data
Major Goals
Enable Scalability
Optimize for a Variety of Data Types
Facilitate a Shared Environment
Provide Value
Handle Fault Tolerance
The Hadoop Ecosystem
Main Hadoop Components
MapReduce
YARN
HDFS
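MapReduce, the processing component listed above, breaks a job into a map phase that emits key/value pairs, a shuffle step that groups values by key, and a reduce phase that combines each group. The sketch below imitates that flow in a single Python process for a word count. It is not Hadoop's actual API, and the function names and sample lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, not taken from the presentation.
sample = ["big data big value", "data everywhere"]
counts = word_count(sample)
print(counts)
```

On a real cluster the map and reduce steps would run in parallel on many machines; this sketch only shows how the data flows between them.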
HDFS = foundation for the Hadoop Ecosystem
What is HDFS?
Up to 200 petabytes, 1 billion files and blocks!
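HDFS stores each file as a sequence of fixed-size blocks and keeps several copies of every block, which is why the slide counts files and blocks together. As a rough sketch, the arithmetic below assumes the common defaults of a 128 MB block size and a replication factor of 3. Both values are configurable per cluster, and the function name `block_layout` is illustrative only.

```python
# Assumption: common HDFS defaults; real clusters may configure other values.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size, in bytes
REPLICATION = 3                 # number of copies kept of each block


def block_layout(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Estimate how many blocks and block replicas one file occupies."""
    # Integer ceiling division: the last block may be only partly filled.
    blocks = (file_size_bytes + block_size - 1) // block_size
    return {
        "blocks": blocks,
        "replicas": blocks * replication,
        # The final block is not padded, so raw usage is file size times copies.
        "raw_bytes": file_size_bytes * replication,
    }


one_gib = 1024 * 1024 * 1024
layout = block_layout(one_gib)
print(layout)
```

Under these assumptions a 1 GiB file occupies 8 blocks, stored as 24 block replicas across the cluster.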
QUESTIONS?
SOURCES:
University of California, San Diego (Super Computer)
http://www.cloudera.com/
http://www.ibm.com/big-data/us/en/
ACKNOWLEDGEMENTS:
I would like to thank Prof. Prasenjit Das for her cordial support and encouragement, which was one of the key resources behind this presentation. Thanks also to all the CSE faculty for their support.