Introduction to Big Data by Manouj Bongirr
DESCRIPTION
Introduction to Big Data by Manouj Bongirr, presented at Big Data Meetup - Pune Chapter #BigDataPune
TRANSCRIPT
Copyright ©2012 Big Logic Technologies
-- Big Logic was founded in the US, based upon seeing the value of Apache Hadoop as it provides a Big Data Analytics Platform.
-- At Big Logic, we share our experiences after guiding many enterprises through successful Big Data projects. We empower you to decide on build versus buy when it comes to achieving your defined business objectives across various technical environments.
A Big Data - Technology, Consulting & Training Firm
Big data is a term applied to data sets whose size is beyond the ability of commonly used
software tools to capture, manage, and process the data within a tolerable elapsed time.
Gartner Predicts
• 800% data growth over the next 5 years
• 80-90% of data produced today is unstructured
Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010
2009: 800,000 petabytes → 2020: 35 zettabytes (i.e. 35 billion TBs)
44x as much data and content over the coming decade
gigabyte (GB)   10^9    1024 MB
terabyte (TB)   10^12   1024 GB
petabyte (PB)   10^15   1024 TB
exabyte (EB)    10^18   1024 PB
zettabyte (ZB)  10^21   1024 EB
yottabyte (YB)  10^24   1024 ZB
1 zettabyte = 1 099 511 627 776 GB
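The figure above uses binary (power-of-1024) units, which can be checked with a one-liner; this is an illustrative sketch, not from the original slides:

```python
# Each step up the ladder (GB -> TB -> PB -> EB -> ZB) multiplies by 1024,
# so 1 ZB expressed in GB is 1024 raised to the 4th power.
zb_in_gb = 1024 ** 4
print(f"1 zettabyte = {zb_in_gb:,} GB")  # 1,099,511,627,776 GB, matching the slide
```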
Source: http://www.slideshare.net/cultureofperformance/gartner-idc-and-mckinsey-on-big-data
“Moore's law is the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years.”
– Intel co-founder Gordon E. Moore
HDD Max Size: 6TB
RAM Max Capacity: 32GB
CPU Max Speed
If I need to process 100TB datasets:
• On 1 node:
– scanning @ 50MB/s = 23 days
• On a 1000-node cluster:
– scanning @ 50MB/s = 33 min
Challenges: hardware failures, and processing and combining data from multiple disks
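The scan-time arithmetic above can be verified directly; the sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly parallel, idealized scanning:

```python
# Back-of-the-envelope scan time for 100 TB at 50 MB/s,
# on 1 node vs. an idealized 1000-node cluster.
DATASET_BYTES = 100 * 10**12   # 100 TB
SCAN_RATE = 50 * 10**6         # 50 MB/s per disk

single_node_seconds = DATASET_BYTES / SCAN_RATE
cluster_seconds = single_node_seconds / 1000   # assumes perfect parallelism

print(f"1 node:     {single_node_seconds / 86400:.1f} days")   # ~23.1 days
print(f"1000 nodes: {cluster_seconds / 60:.1f} minutes")       # ~33.3 minutes
```

The two results match the slide's 23 days and 33 minutes.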
• Apache Hadoop is an open source framework for storing, processing, and analysing massive amounts of multi-structured data in a distributed environment.
• Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
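The MapReduce model Hadoop implements can be sketched with a toy word count: map emits (key, value) pairs, the framework groups pairs by key (the shuffle), and reduce aggregates each key's values. This is a single-process illustration of the programming model, not the Hadoop API:

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one word.
    return (key, sum(values))

lines = ["Big Data needs big tools", "Hadoop processes big data"]
pairs = [p for line in lines for p in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["big"])  # 3
```

In real Hadoop, the map and reduce functions run on many nodes and the shuffle moves data across the network; the logic per phase is the same.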
If you are in any of the above segments, you would be part of the above revenue.