treasure data: big data analytics on heroku

Treasure Data: Big Data Analytics on Heroku
Muga Nishizawa, Chief Software Architect

Muga Nishizawa (@muga_nishizawa)
Chief Software Architect, Treasure Data

Treasure Data Overview

• Founded to deliver big data analytics in days, not months, without specialist IT resources, for one-tenth the cost of other alternatives
• Service-based subscription business model
• World-class open source team
  • Founded world’s largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
• Treasure Data is in production
  • 20 customers incl. Fortune 500 companies
  • 100+ billion records stored
  • Processing 10,000 messages per second

Our Customers – Fortune Global 500 leaders and start-ups including:

One Hundred Billion Records and Growing!

[Chart: records stored over time; y-axis ticks from 20 to 120, x-axis from Sep 2011 to Aug 2012]

Treasure Data Service
“Store Your Data Now for Future Insights”

Treasure Data Service

[Diagram: data from Apache, apps, RDBMS and other data sources flows into Treasure Data, which provides columnar data storage and a query processing cluster running MapReduce jobs. Users reach it through the Query API (Hive; Pig to be supported) over JDBC and REST, using td-command and BI apps.]

Example log record (database myappdb, table buylog):

    2012-02-04 01:33:51  myappdb.buylog  {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"}
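The record on this slide has three parts: a timestamp, a tag of the form database.table, and a JSON body. A minimal Python sketch of that shape follows. The `make_event` helper and the dictionary layout are hypothetical, chosen only for illustration; they are not Treasure Data's wire format, and the slide does not say which time zone the timestamp uses (UTC is assumed here).

```python
import json
from datetime import datetime, timezone


def make_event(tag, record, when):
    """Build one log event from a 'database.table' tag, a time, and a record.

    Hypothetical layout for illustration only.
    """
    database, table = tag.split(".", 1)
    return {
        "database": database,
        "table": table,
        "time": int(when.timestamp()),  # Unix time in seconds
        "record": record,
    }


event = make_event(
    "myappdb.buylog",
    {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"},
    datetime(2012, 2, 4, 1, 33, 51, tzinfo=timezone.utc),  # assumed UTC
)
print(json.dumps(event, sort_keys=True))
```

Because the body is schema-free JSON, new keys can be added to later records without changing anything up front.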


Example query with td-command:

    $ td query -w -d myappdb \
        "SELECT \
           TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day, \
           COUNT(1) AS cnt \
         FROM buylog \
         GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') \
         ORDER BY cnt"
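To make clear what the td query above computes, here is a small local Python sketch of the same aggregation: format each row's time as a calendar day, count rows per day, and order by count. It assumes the time column holds Unix timestamps in seconds and uses a fixed UTC-7 offset as a stand-in for "PDT"; the sample rows are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset standing in for the "PDT" argument (assumption).
PDT = timezone(timedelta(hours=-7), "PDT")


def count_per_day(rows):
    """Count rows per PDT calendar day, then order by count (ascending)."""
    days = (
        datetime.fromtimestamp(row["time"], PDT).strftime("%Y-%m-%d")
        for row in rows
    )
    counts = Counter(days)
    return sorted(counts.items(), key=lambda item: item[1])


def unix(dt):
    """Convert an aware datetime to Unix seconds."""
    return int(dt.timestamp())


# Hypothetical buylog rows; only the time column matters here.
buylog = [
    {"time": unix(datetime(2012, 5, 26, 12, 0, tzinfo=PDT))},
    {"time": unix(datetime(2012, 5, 27, 3, 0, tzinfo=timezone.utc))},  # 20:00 on 05-26 in PDT
    {"time": unix(datetime(2012, 5, 27, 9, 0, tzinfo=PDT))},
]

for day, cnt in count_per_day(buylog):
    print(day, cnt)
```

The second sample row shows why the time zone argument matters: it falls on 05-27 in UTC but is counted under 05-26 in PDT.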


Query result:

    +------------+------+
    | day        | cnt  |
    +------------+------+
    | 2012-05-26 | 4981 |
    | 2012-05-27 | 4481 |
    | 2012-05-28 | 481  |
    +------------+------+


Comparing On-Premise & Cloud Big Data Markets

[Chart: deployment model (On-Premise vs. Cloud) plotted against data volume (Low to High)]
• On-Premise: Traditional DBMS (ODS, Data Mart), Data Warehouse, Hadoop
• Cloud: Database-as-a-Service, Big Data-as-a-Service

© 2012 Forrester Research, Inc. Reproduction Prohibited

Treasure Data as Heroku Add-on

Demo with Heroku

Synergy Effect for Data-Driven Development!

The Power of the Cloud

• Easier to Scale
• Easier to Maintain
• Easier to Iterate

Implementation Process: Traditional DW and On-Premise Big Data

Dramatically streamlined implementation process

Heroku × Treasure Data

Viki.com: “Global Hulu”

Viki Before
• Hard to manage Hadoop
• Complicated data collection

Viki After
• No more Hadoop maintenance
• Versatile data collector, td-agent

Please Try It!

How Does It Work?

Query Processing
• Query Language
• Query Execution
• Columnar Data
• Object Storage

1/4: Compile SQL into MapReduce

SELECT COUNT(DISTINCT ip) FROM tbl;
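As a rough illustration of how a query like COUNT(DISTINCT ip) can be expressed as map, shuffle and reduce steps, here is a toy single-process Python sketch. It is not the plan the service actually generates; the function names and the sample table are hypothetical.

```python
from itertools import groupby


def map_phase(rows):
    """Map: emit one (ip, None) pair per input row."""
    for row in rows:
        yield (row["ip"], None)


def shuffle(pairs):
    """Shuffle: sort by key and group, so each distinct key forms one group."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(ordered, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Reduce: every group is one distinct ip, so count the groups."""
    return sum(1 for _key, _values in grouped)


# Hypothetical contents of tbl.
tbl = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]

distinct_ips = reduce_phase(shuffle(map_phase(tbl)))
print(distinct_ips)
```

In a real cluster the map and reduce steps run on many nodes at once, and the shuffle moves pairs with the same key to the same reducer.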

2/4: MapReduce is executed in parallel

cc2.8xlarge cluster compute instance (up to 100 nodes * 32 threads)


3/4: Columnar Data Access

10Gbps Network

Read ONLY the Required Part of Data
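The idea of reading only the required part of the data can be sketched in Python as follows. This is a simplified in-memory model, not Treasure Data's storage format: the `ColumnStore` class and the sample rows are hypothetical, and a real columnar store would also compress each column and read it over the network.

```python
def to_columns(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


class ColumnStore:
    """Toy column store that records which columns a query touches."""

    def __init__(self, rows):
        self._columns = to_columns(rows)
        self.columns_read = []

    def read(self, name):
        """Return one column and note that it was read."""
        self.columns_read.append(name)
        return self._columns[name]


# Hypothetical rows, modeled on the buylog record shown earlier.
rows = [
    {"user": "12345", "path": "/buyItem", "price": 150},
    {"user": "67890", "path": "/buyItem", "price": 200},
    {"user": "12345", "path": "/landing", "price": 0},
]

store = ColumnStore(rows)
total_price = sum(store.read("price"))  # like SELECT SUM(price)
print(total_price, store.columns_read)
```

A query that sums price never touches the user or path columns, so a row-oriented layout would read roughly three times as much data for the same answer in this example.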


4/4: Object-based Storage

Enjoy Data-Driven Development!

Big Data for the Rest of Us

www.treasure-data.com | @TreasureData

Great Investors
• Bill Tai
• Naren Gupta – Nexus Ventures, Director of Red Hat, TIBCO
• Dave Stamm – Clarify, Daisy Systems, Enkata
• Othman Laraki – Twitter
• James Lindenbaum, Adam Wiggins and Orion Henry – Heroku
• Anand Babu Periasamy and Hitesh Chellani – Gluster
• Yukihiro “Matz” Matsumoto – Creator of Ruby, now at Heroku
• Dan Scheinman – Former Cisco SVP
• Jean-Philippe Emelie Marcos – Tango, D.E. Shaw
• + executives from Cisco, Red Hat, Salesforce.com, GREE

What are your options?

Traditional
• Too much complexity
• Too long to get live
• Too expensive to maintain
• Can only innovate at speed of vendor

On-Premise Hadoop
• Never designed for analytic processing
• Too many people
• Too much software from too many sources

Cloud Hadoop
• Partial solution
• Vendor lock-in

Example Use Case – MySQL to TD

