Introduction to Big Data by Manouj Bongirr

Copyright ©2012 Big Logic Technologies

Upload: pranav-kulkarni

Post on 29-Nov-2014


DESCRIPTION

Introduction to Big Data by Manouj Bongirr presented at Big Data Meetup - Pune Chapter #BigDataPune

TRANSCRIPT

Page 1: Introduction to Big Data by Manouj Bongirr

Page 2: Introduction to Big Data by Manouj Bongirr

-- Big Logic was founded in the US after seeing the value of Apache Hadoop as a Big Data analytics platform.

-- At Big Logic, we share our experiences after guiding many enterprises through successful Big Data projects. We empower you to decide on build versus buy when it comes to achieving your defined business objectives across various technical environments.

A Big Data - Technology, Consulting & Training Firm

Page 3: Introduction to Big Data by Manouj Bongirr

Page 4: Introduction to Big Data by Manouj Bongirr

Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.

Gartner predicts:

800% data growth over the next 5 years

80-90% of data produced today is unstructured

Page 5: Introduction to Big Data by Manouj Bongirr

Page 6: Introduction to Big Data by Manouj Bongirr

Page 7: Introduction to Big Data by Manouj Bongirr

Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010

2009: 800,000 petabytes

2020: 35 zettabytes, i.e. 35 billion TB

44x as much data and content over the coming decade

gigabyte (GB): 10^9 bytes (1024 MB)

terabyte (TB): 10^12 bytes (1024 GB)

petabyte (PB): 10^15 bytes (1024 TB)

exabyte (EB): 10^18 bytes (1024 PB)

zettabyte (ZB): 10^21 bytes (1024 EB)

yottabyte (YB): 10^24 bytes (1024 ZB)

1 zettabyte = 1,099,511,627,776 GB (that is, 1024^4 GB)
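The figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `convert` helper and the `UNITS` list are hypothetical names, not from the deck. It assumes the 1 ZB to GB figure uses steps of 1024, while the IDC figures (35 billion TB, 44x) use steps of 1000.

```python
# Unit names in increasing order; each step is a factor of `base`.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def convert(value, from_unit, to_unit, base=1024):
    """Convert between units using a fixed step (1024 binary, 1000 decimal)."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    if steps >= 0:
        return value * base ** steps
    return value / base ** (-steps)


# 1 ZB expressed in GB with 1024-based steps (four steps: ZB -> EB -> PB -> TB -> GB).
zb_in_gb = convert(1, "ZB", "GB")

# IDC's 2020 estimate expressed in TB with 1000-based steps.
tb_2020 = convert(35, "ZB", "TB", base=1000)

# Growth factor from 800,000 PB (2009) to 35 ZB (2020), 1000-based steps.
growth = convert(35, "ZB", "PB", base=1000) / 800_000

print(zb_in_gb)          # 1099511627776
print(tb_2020)           # 35000000000
print(round(growth, 2))  # 43.75, which the slide rounds to 44x
```

Mixing the two conventions is why the slide can list both a power of ten and a multiple of 1024 for each unit; the two differ by about 7% at the gigabyte level and more for larger units.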

Page 8: Introduction to Big Data by Manouj Bongirr

Page 9: Introduction to Big Data by Manouj Bongirr

Source: http://www.slideshare.net/cultureofperformance/gartner-idc-and-mckinsey-on-big-data

Page 10: Introduction to Big Data by Manouj Bongirr

“Moore's law is the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years.”

Named after Intel co-founder Gordon E. Moore

Page 11: Introduction to Big Data by Manouj Bongirr

HDD max size: 6 TB

RAM max capacity: 32 GB

CPU max speed

Page 12: Introduction to Big Data by Manouj Bongirr

Page 13: Introduction to Big Data by Manouj Bongirr

Page 14: Introduction to Big Data by Manouj Bongirr

If I need to process a 100 TB dataset:

• On 1 node:

– scanning @ 50 MB/s = 23 days

• On a 1,000-node cluster:

– scanning @ 50 MB/s per node = 33 min

Challenge: hardware problems, plus processing and combining data from multiple disks
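The 23-day and 33-minute figures follow from dividing the dataset size by the aggregate scan rate. The sketch below reproduces them under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), every node scanning at the full 50 MB/s, and no coordination overhead. The function name `scan_time_seconds` is hypothetical.

```python
TB = 10 ** 12  # bytes, decimal convention (assumption)
MB = 10 ** 6   # bytes, decimal convention (assumption)


def scan_time_seconds(dataset_bytes, nodes, rate_bytes_per_s):
    """Time to scan a dataset split evenly across nodes, each at the given rate."""
    return dataset_bytes / (nodes * rate_bytes_per_s)


single = scan_time_seconds(100 * TB, nodes=1, rate_bytes_per_s=50 * MB)
cluster = scan_time_seconds(100 * TB, nodes=1000, rate_bytes_per_s=50 * MB)

print(f"1 node:     {single / 86400:.1f} days")  # 1 node:     23.1 days
print(f"1000 nodes: {cluster / 60:.1f} min")     # 1000 nodes: 33.3 min
```

A real cluster would fall short of this ideal, since failed nodes and the cost of combining partial results add time, which is the challenge the slide points to.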

Page 15: Introduction to Big Data by Manouj Bongirr

• Apache Hadoop is an open source framework for storing, processing and analysing massive amounts of multi-structured data in a distributed environment.

• Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
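To give a feel for the MapReduce idea mentioned above, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are hypothetical, and a real cluster would run the map and reduce steps on many machines, each working on data stored on its own disks.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Each string stands in for a block of data held on a different node.
chunks = ["big data big logic", "data on many disks"]
counts = word_count(chunks)
print(counts["big"], counts["data"], counts["disks"])  # 2 2 1
```

Because each chunk is mapped independently, the map step can be spread across as many nodes as there are chunks; only the shuffle needs to move data between them.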

Page 16: Introduction to Big Data by Manouj Bongirr

If you work in any of the segments above, you would share in the revenue shown above.

Page 17: Introduction to Big Data by Manouj Bongirr
