dunning strata-2012-27-02


Upload: mapr-technologies

Post on 26-Jan-2015


Category:

Technology



TRANSCRIPT

Page 1: Dunning strata-2012-27-02

1©MapR Technologies - Confidential

Expect More from Hadoop!

Page 2: Dunning strata-2012-27-02

My Background

University, Startups
– Aptex, MusicMatch, ID Analytics, Veoh
– big data since before it was big

Open source
– even before the internet
– Apache Hadoop, Mahout, ZooKeeper, Drill
– bought the beer at first HUG

MapR Founding member of Apache Drill

Page 3: Dunning strata-2012-27-02

MapR Technologies

Enterprise quality distribution for Hadoop
– Many extensions beyond basic Hadoop

Super strong team
– Long history of successful startups

Strong supporter of Apache Drill
– and open source in general

Page 4: Dunning strata-2012-27-02

meta-Hadoop?

Page 5: Dunning strata-2012-27-02

meta

Meta- (from Greek: μετά = "after", "beyond", "with", "adjacent", "self"), is a…

Page 6: Dunning strata-2012-27-02

Beyond ≠ Answering yesterday’s problems

Page 7: Dunning strata-2012-27-02

Philosophy First

What is History?

Page 8: Dunning strata-2012-27-02

The study of the past

(what came before now)

Page 9: Dunning strata-2012-27-02

What is the future?

(it comes after now)

Page 10: Dunning strata-2012-27-02

Page 11: Dunning strata-2012-27-02

Page 12: Dunning strata-2012-27-02

But the future also has a past!

Page 13: Dunning strata-2012-27-02

the future of the past is not

the past of the future

Page 14: Dunning strata-2012-27-02

Do you remember the future?

Page 15: Dunning strata-2012-27-02

Page 16: Dunning strata-2012-27-02

Page 17: Dunning strata-2012-27-02

Page 18: Dunning strata-2012-27-02

Those are yesterday’s answers

Page 19: Dunning strata-2012-27-02

and also the seeds of tomorrow

Page 20: Dunning strata-2012-27-02

Guys wearing Fedoras

Page 21: Dunning strata-2012-27-02

Hadoop has a history

Page 22: Dunning strata-2012-27-02

Hadoop also has a future

Page 23: Dunning strata-2012-27-02

The Old Future of Hadoop

Implementing yet another Google paper
– Map-reduce and HDFS, and YARN and Tez
– more and more, but not really different

Ecosystem additions (more Google papers)
– simpler programming: Hive, Pig and Crunch (Sawzall, FlumeJava, etc.)
– key-value store (Bigtable)
– ad hoc query (Dremel)
– also not really different

Stands apart from other computing
– required by HDFS and other limitations

Page 24: Dunning strata-2012-27-02

The New Future of Hadoop

Real-time processing
– Combines real-time and long-time

Integration with traditional IT
– No need to stand apart

Integration with new technologies
– Solr, Node.js, Twisted all should work directly on Hadoop

Fast and flexible computation
– Drill logical plan language

Page 25: Dunning strata-2012-27-02

Example #1: Search Abuse

Page 26: Dunning strata-2012-27-02

History matrix

One row per user

One column per thing

Page 27: Dunning strata-2012-27-02

Recommendation based on cooccurrence

Cooccurrence gives item-item mapping

One row and column per thing
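The two slides above can be illustrated with a minimal sketch. It assumes binary interactions and raw cooccurrence counts (the product of the history matrix with its own transpose); the slides do not say how counts are weighted or filtered, and the sample users and things are hypothetical.

```python
# Hypothetical interaction log: (user, thing) pairs.
history = [
    ("alice", "a"), ("alice", "b"),
    ("bob", "a"), ("bob", "b"), ("bob", "c"),
    ("carol", "b"), ("carol", "c"),
]


def history_matrix(pairs):
    """Build the history matrix: one row per user, one column per thing."""
    users = sorted({u for u, _ in pairs})
    things = sorted({t for _, t in pairs})
    row = {u: i for i, u in enumerate(users)}
    col = {t: j for j, t in enumerate(things)}
    matrix = [[0] * len(things) for _ in users]
    for u, t in pairs:
        matrix[row[u]][col[t]] = 1  # binary: the user touched the thing
    return users, things, matrix


def cooccurrence(matrix):
    """Item-item counts C = A^T A: one row and one column per thing."""
    n = len(matrix[0])
    c = [[0] * n for _ in range(n)]
    for user_row in matrix:
        for i in range(n):
            for j in range(n):
                c[i][j] += user_row[i] * user_row[j]
    return c


users, things, a = history_matrix(history)
c = cooccurrence(a)
for name, counts in zip(things, c):
    print(name, counts)
```

Entry `c[i][j]` counts the users who interacted with both thing `i` and thing `j`; the diagonal is simply each thing's total number of users.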

Page 28: Dunning strata-2012-27-02

Cooccurrence matrix can also be implemented as a search index
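One way to read this slide, sketched under assumptions: each item becomes a "document" whose indicator field lists the items it cooccurs with, and a user's history is the query. The in-memory inverted index below stands in for a real search engine such as Solr; the `min_count` threshold, the scoring by number of matching terms, and the sample counts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical item-item cooccurrence counts (symmetric, diagonal omitted).
cooccurrence = {
    "a": {"b": 2, "c": 1},
    "b": {"a": 2, "c": 2},
    "c": {"a": 1, "b": 2},
}


def build_index(cooc, min_count=2):
    """Inverted index: indicator item -> items whose document lists it."""
    index = defaultdict(set)
    for item, neighbours in cooc.items():
        for other, count in neighbours.items():
            if count >= min_count:  # keep only the stronger cooccurrences
                index[other].add(item)
    return index


def recommend(index, user_history, top_n=3):
    """Query with the user's history; rank items by number of matching terms."""
    scores = defaultdict(int)
    for seen in user_history:
        for item in index.get(seen, ()):
            if item not in user_history:  # do not recommend what was already seen
                scores[item] += 1
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


index = build_index(cooccurrence)
print(recommend(index, {"a"}))
print(recommend(index, {"b"}))
```

The design point is that recommendation at serving time reduces to an ordinary search query, so the web tier needs nothing beyond the search engine and the user's recent history.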

Page 29: Dunning strata-2012-27-02

[Diagram, indexing path. Labels: Complete history; Cooccurrence (Mahout); Item meta-data; Solr indexing (several Solr indexers); Index shards]

Page 30: Dunning strata-2012-27-02

[Diagram, serving path. Labels: User history; Web tier; Solr search (several Solr instances); Item meta-data; Index shards]

Page 31: Dunning strata-2012-27-02

Objective Results

At a very large credit card company

History is all transactions, all web interaction

Processing time cut from 20 hours per day to 3

Recommendation engine load time decreased from 8 hours to 3 minutes

Page 32: Dunning strata-2012-27-02

Scaling Estimates – Twitter Fire hose

Old School
– 8+ separate clusters, 20-25 nodes
– >3 Kafka nodes
– >2 TwitterLogger
– 5-10 Hadoop
– >3 Storm
– 3 zookeepers (or not?)
– NAS for web storage
– >2 web servers

MapR
– one platform
– 5-10 nodes total, any node does any job
– Full HA included, backups included, disaster recovery included

Page 33: Dunning strata-2012-27-02

Example #2: Web Technology

Page 34: Dunning strata-2012-27-02

[Diagram labels: Real-time data; Raw logs; Fast analysis (Storm); Analytic output]

Page 35: Dunning strata-2012-27-02

[Diagram labels: Raw logs; Large analysis (map-reduce); Analytic output]

Page 36: Dunning strata-2012-27-02

Presentation tier (d3 + node.js)

Analytic output

Browser query

Raw logs
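The three slides above pair a fast path (Storm) and a large batch path (map-reduce) with a presentation tier that answers browser queries. A minimal sketch of how the two analytic outputs might be merged at query time follows. It assumes a hypothetical log format in which the first token of each line is a topic, and plain Python stands in for Storm, map-reduce and node.js.

```python
from collections import Counter


def batch_counts(raw_logs):
    """Large analysis: recount every raw log line (map-reduce stand-in)."""
    return Counter(line.split()[0] for line in raw_logs)


class FastCounter:
    """Fast analysis: count events that arrived after the last batch run
    (Storm stand-in)."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, line):
        self.counts[line.split()[0]] += 1


def answer_query(batch, fast, topic):
    """Presentation tier: merge both analytic outputs for one browser query."""
    return batch[topic] + fast.counts[topic]


# Hypothetical data: lines already covered by the batch run ...
raw_logs = ["hadoop tweet1", "drill tweet2", "hadoop tweet3"]
batch = batch_counts(raw_logs)

# ... and events that arrived since then.
fast = FastCounter()
for line in ["hadoop tweet4", "solr tweet5"]:
    fast.on_event(line)

print(answer_query(batch, fast, "hadoop"))  # 2 from batch + 1 from fast path
```

When both paths write their output to the same storage, the query side only has to read two results and add them; no data needs to be copied between systems.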

Page 37: Dunning strata-2012-27-02

Old School Storm: Complex architecture

[Diagram labels: Twitter; Twitter API; TwitterLogger; Kafka API; Kafka Cluster (x3); Storm; Flume; Hadoop; HDFS Data; NAS; Web Data; Web-server; http]

Page 38: Dunning strata-2012-27-02

MapR: One Platform with Streaming Writes

[Diagram labels: Twitter; Twitter API; TwitterLogger; Catcher (x2); Topic Queue; Storm; MapR; NFS (x4); Web Data; Web-server; http; Optional MapReduce; HDFS API]

Users can also run extended analytics/MapReduce on the stored data
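The streaming-writes idea on this slide can be sketched with ordinary file I/O: a catcher appends records to a topic-queue file while a consumer reads whatever has been added since its last offset. This is an illustration only. A temporary directory stands in for an NFS mount point, and the file name, record format and function names are hypothetical.

```python
import os
import tempfile


def catcher_append(path, records):
    """Catcher: append records to the topic-queue file with plain file I/O."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        for record in records:
            f.write(record + "\n")


def read_new(path, offset):
    """Consumer: return complete lines written after `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # skip a trailing partial line, if any
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


mount = tempfile.mkdtemp()  # stand-in for an NFS mount of the cluster
queue = os.path.join(mount, "tweets.log")

catcher_append(queue, ["tweet one", "tweet two"])
first, pos = read_new(queue, 0)

catcher_append(queue, ["tweet three"])
second, pos = read_new(queue, pos)

print(first, second, pos)
```

Because the reader tracks a byte offset, the same file can later be processed in bulk from the beginning without moving it anywhere.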

Page 39: Dunning strata-2012-27-02

Page 40: Dunning strata-2012-27-02

Objective Results

Real-time + long-time analysis is seamless

Web tier can be rooted directly on Hadoop cluster

No need to move data

Page 41: Dunning strata-2012-27-02

The future is not what we thought it would be

Page 42: Dunning strata-2012-27-02

It is better!

Page 43: Dunning strata-2012-27-02

Get Involved!

Tweet: #strataconf #mapr @ted_dunning

Page 44: Dunning strata-2012-27-02

Get Involved!

Join Apache Drill!
– [email protected]
– Follow @apachedrill

Join MapR!
– [email protected]

Download these slides
– http://www.mapr.com/company/events/strata-conference-2-2-27-13

Contact me:
– [email protected]
– [email protected]
– @ted_dunning