Big Data Tools Hadoop S.S.Mulay Sr. V.P. Engineering February 1, 2013

TRANSCRIPT

Page 1: Big Data Tools Hadoop S.S.Mulay Sr. V.P. Engineering February 1, 2013

Big Data Tools

Hadoop

S.S. Mulay, Sr. V.P. Engineering

February 1, 2013

Confidential Netmagic Internal Use Only

Hadoop – A Prelude

Apache Project and Animal Friendly names

Some of the projects under the Apache Foundation worth mentioning:

● Apache Zookeeper
● Apache Tomcat
● Apache Pig

And now Hadoop

Hadoop – The Name

Hadoop – The Relevance

Two important things to know when discussing Big Data:

● MapReduce

● Hadoop

Hadoop – How was it Born?

● To process huge volumes of data, as the amount of data being generated continued to increase rapidly (Big Data).

● The Web was also generating more and more information, and indexing that content was becoming quite challenging.

Hadoop – The Reality Vs Myth

Hadoop – Some Use Cases

Hadoop – What do we expect from it?

If we analyze the mentioned use cases, we realize that

Hadoop – Components which come to the rescue

Hadoop – Who’s Using It?

Uses Hadoop and HBase for:
• Social services
• Structured data storage
• Processing for internal use

Uses Hadoop for:
• Amazon's product search indices
They process millions of sessions daily for analytics.

Uses Hadoop for:
• Search optimization
• Research

Uses Hadoop for:
• Databasing and analyzing Next Generation Sequencing (NGS) data produced for the Cancer Genome Atlas (TCGA) project and other groups

Uses Hadoop for:
• Internal log reporting/parsing systems designed to scale to infinity and beyond
• Web-wide analytics platform

Uses Hadoop:
• As a source for reporting/analytics and machine learning

And many more...

Hadoop – The Various Forms Today

Hadoop – Use Case Example – Log Processing

● Some of the practical use cases for log processing generally in use today:

Assume we have huge logs, running into terabytes, generated over a period of time, and we want to know:

Hadoop – Use Case Example – Log Processing

In the conventional method: parallelism is on a per-file basis, not within a single file.

[Diagram labels: Task - new, Concatenate Data Set, Final Data Set]
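The per-file parallelism described above can be sketched roughly as follows. This is only an illustration, not the setup from the slides: the log format (status code as the last field), the function names `count_status_codes` and `process_files`, and the in-memory "files" are all assumptions made for the example. Each task handles one whole file, and the partial data sets are concatenated into the final data set afterwards.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_status_codes(lines):
    """One task: scan a whole log file (an iterable of lines) serially."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            # Hypothetical log format: the status code is the last field.
            counts[fields[-1]] += 1
    return counts


def process_files(files):
    """Run one task per file, then concatenate the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_status_codes, files))
    final = Counter()
    for partial in partials:
        final.update(partial)  # merge step: partial data sets -> final data set
    return final


# Two small in-memory "files" stand in for real log files.
logs = [
    ["GET /a 200", "GET /b 404"],
    ["GET /a 200"],
]
result = process_files(logs)
```

Note that a single very large file is still scanned by one worker from start to finish, which is the limitation the slide points out.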

Hadoop – Use Case Example – Log Processing

With MapReduce:
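As a rough illustration of the idea, the sketch below runs the map, shuffle and reduce phases in a single Python process. It is not Hadoop's API: the helper names (`split_input`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`), the split size and the log format are assumptions made for the example. The point it shows is that one large input is cut into splits, so the work inside a single file can be spread across workers.

```python
from collections import defaultdict


def split_input(lines, split_size):
    """Cut one large input into fixed-size splits (stand-ins for file blocks)."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_phase(split):
    """Map: emit a (status_code, 1) pair for every log line in one split."""
    for line in split:
        fields = line.split()
        if fields:
            # Hypothetical log format: the status code is the last field.
            yield fields[-1], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines, split_size=2):
    pairs = []
    # In a real cluster each split could be mapped on a different node;
    # here the splits are simply processed one after another.
    for split in split_input(lines, split_size):
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))


single_log = ["GET /a 200", "GET /b 404", "GET /a 200", "GET /c 500", "GET /a 200"]
totals = run_job(single_log)
```

Here the five lines of one file are handled as three splits, and the per-key totals come out the same as a serial scan would produce.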

Hadoop – Use Case Example – Log Processing

● Infrastructure realities in the conventional method:

● How things change with MapReduce:

● Assuming:
  ● A single disk can transfer data at 75 MB/sec
  ● A Hadoop cluster of 4000 nodes, each server with 6 disks
  ● The overall throughput of the setup would be
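Under these assumptions, the aggregate figure can be worked out directly. The sketch below is a back-of-the-envelope calculation only: it assumes every disk streams at its full rate in parallel and ignores network, CPU and replication overheads, none of which the slide addresses. The function names and the 1 TB example size are introduced here for illustration.

```python
DISK_MB_PER_SEC = 75   # single-disk transfer rate, from the slide
DISKS_PER_NODE = 6     # disks per server, from the slide
NODES = 4000           # cluster size, from the slide


def aggregate_throughput_mb_per_sec(nodes, disks_per_node, disk_mb_per_sec):
    """Ideal aggregate throughput if all disks stream in parallel."""
    return nodes * disks_per_node * disk_mb_per_sec


def seconds_to_read(size_mb, throughput_mb_per_sec):
    """Time to scan a data set at a given sustained throughput."""
    return size_mb / throughput_mb_per_sec


total = aggregate_throughput_mb_per_sec(NODES, DISKS_PER_NODE, DISK_MB_PER_SEC)
print(total)  # 1800000 MB/sec, i.e. about 1.8 TB/sec

one_tb_mb = 1_000_000  # 1 TB expressed in MB (decimal units, an assumption)
print(seconds_to_read(one_tb_mb, DISK_MB_PER_SEC))  # single disk: roughly 3.7 hours
print(seconds_to_read(one_tb_mb, total))            # ideal cluster: well under a second
```

Real clusters will fall short of this ideal figure, but the gap between one disk and thousands of disks read in parallel is the point of the comparison.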

Hadoop – Big Data Integration Challenges

Hadoop – Native Solutions & Challenges

Hadoop – Advantages of Commercial Solutions

Hadoop – Commercial Solutions For Hadoop

The solutions fit into 2 categories:
● Infrastructure Automation
● Application Automation

Gartner Report – Magic Quadrant for Data Integration Tools

Hadoop & Cloud – Hand in Hand?

What advantages does Cloud bring in:

Thus, running Hadoop on Cloud brings the above advantages to enterprises. All the commercial distributions available today offer a virtual image option for deployment on a Cloud / virtualization platform.

Virtualization solution providers like VMware have come up with Project “Serengeti” to support quick deployment and management of Hadoop on Cloud.

Cloud service providers like Amazon, Netmagic and others offer an option to deploy Hadoop infrastructure on Cloud.

Contact Details

For related queries/ feedback, mail [email protected]

+91-9820453568

Thank You

http://www.linkedin.com/companies/netmagic

http://twitter.com/netmagic

http://www.facebook.com/NetmagicSolutions

http://www.youtube.com/user/netmagicsolutions