Treasure Data: Big Data Analytics on Heroku

Muga Nishizawa, Chief Software Architect

Uploaded by salesforce-developers-japan on 28 May 2015


DESCRIPTION

Slides from a talk given in the theater of the Developer Zone at Cloudforce Japan on December 6, 2012.

TRANSCRIPT

Page 1: Treasure Data: Big Data Analytics on Heroku

Treasure Data: Big Data Analytics on Heroku
Muga Nishizawa, Chief Software Architect

Page 2

Muga Nishizawa (@muga_nishizawa), Chief Software Architect, Treasure Data

Page 3

Treasure Data Overview
• Founded to deliver big data analytics in days, not months, without specialist IT resources, at one-tenth the cost of alternatives
• Service-based subscription business model
• World-class open source team
  • Founded the world’s largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
• Treasure Data is in production
  • 20 customers, including Fortune 500 companies
  • 100+ billion records stored
  • Processing 10,000 messages per second

Page 4

Our Customers – Fortune Global 500 leaders and start-ups including:

Page 5

One Hundred Billion Records and Growing!

[Chart: records stored, Sep 2011 to Aug 2012; y-axis from 20 to 120]

Page 6

Treasure Data Service: “Store Your Data Now for Future Insights”

Page 7

Treasure Data Service

[Architecture diagram]
• Data sources: Apache, apps, RDBMS, and other data sources
• Treasure Data: columnar data storage and a query processing cluster running MapReduce jobs
• Query API: Hive, Pig (to be supported), JDBC, REST
• Users: td-command, BI apps

“Store Your Data Now for Future Insights”

Page 8

2012-02-04 01:33:51 myappdb.buylog {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"}
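The record above has three parts: a timestamp, a `database.table` tag (`myappdb.buylog`), and a JSON body. As a minimal Python sketch, assuming exactly that space-separated display layout and assuming the timestamp is UTC (the slide does not say), the line can be split apart and the timestamp kept as Unix seconds in a `time` field, like the `time` column the query on the next slide reads. `parse_record` is a hypothetical helper, not part of any Treasure Data tool.

```python
import json
from datetime import datetime, timezone


def parse_record(line: str) -> dict:
    """Split 'YYYY-MM-DD HH:MM:SS db.table {json}' into its parts (hypothetical helper)."""
    date_part, clock_part, tag, body = line.split(" ", 3)
    database, table = tag.split(".", 1)
    # Assumption: the displayed timestamp is UTC.
    stamp = datetime.strptime(f"{date_part} {clock_part}", "%Y-%m-%d %H:%M:%S")
    record = json.loads(body)
    # Keep the event time as Unix seconds in a "time" field.
    record["time"] = int(stamp.replace(tzinfo=timezone.utc).timestamp())
    return {"database": database, "table": table, "record": record}


line = ('2012-02-04 01:33:51 myappdb.buylog '
        '{"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"}')
parsed = parse_record(line)
print(parsed["database"], parsed["table"], parsed["record"]["time"])  # myappdb buylog 1328319231
```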


Page 9

$ td query -w -d myappdb \
  "SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day, \
          COUNT(1) AS cnt \
   FROM buylog \
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') \
   ORDER BY cnt"
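To make the query's semantics concrete, here is a small Python equivalent over in-memory records. It is a sketch under two assumptions: each record carries a `time` field in Unix seconds, and "PDT" is treated as a fixed UTC-7 offset with no daylight-saving transitions. `daily_counts` and the sample records are hypothetical. It buckets records by their PDT calendar day, counts each bucket, and sorts by count ascending, which is what `ORDER BY cnt` without a direction does.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: "PDT" approximated as a fixed UTC-7 offset (no DST handling).
PDT = timezone(timedelta(hours=-7), "PDT")


def daily_counts(records):
    """Rough equivalent of GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') ... ORDER BY cnt."""
    days = (datetime.fromtimestamp(r["time"], PDT).strftime("%Y-%m-%d") for r in records)
    counts = Counter(days)
    # ORDER BY cnt with no direction sorts ascending.
    return sorted(counts.items(), key=lambda item: item[1])


buylog = [
    {"time": 1338076800},  # 2012-05-27 00:00 UTC, still 2012-05-26 in PDT
    {"time": 1338076860},  # one minute later, same PDT day
    {"time": 1338105600},  # 2012-05-27 08:00 UTC = 01:00 PDT on 2012-05-27
]
print(daily_counts(buylog))  # [('2012-05-27', 1), ('2012-05-26', 2)]
```

The first sample shows why the time zone argument matters: midnight UTC on May 27 still falls on May 26 in PDT.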


Page 10

+------------+------+
| day        | cnt  |
+------------+------+
| 2012-05-26 | 4981 |
| 2012-05-27 | 4481 |
| 2012-05-28 |  481 |
+------------+------+


Page 11

Comparing On-Premise & Cloud Big Data Markets

[Chart: on-premise vs. cloud offerings by data volume (low to high). On-premise: Traditional DBMS (ODS, Data Mart), Data Warehouse, Hadoop. Cloud: Database-as-a-Service, Big Data-as-a-Service.]

© 2012 Forrester Research, Inc. Reproduction Prohibited

Page 12

Treasure Data as Heroku Add-on

Page 13

Demo with Heroku

Page 14

Synergy Effect for Data-Driven Development!

Page 15

The Power of the Cloud
• Easier to Scale
• Easier to Maintain
• Easier to Iterate

Page 16

Implementation Process: Traditional DW and On-Premise Big Data

Page 17

Implementation Process: Traditional DW and On-Premise Big Data

Heroku × Treasure Data
Dramatically streamlined implementation process

Page 18

Viki.com: “Global Hulu”

Page 19

Viki Before
• Hard to manage Hadoop
• Complicated data collection

Page 20

Viki After
• No more Hadoop maintenance
• Versatile data collector, td-agent

Page 21

Please Try It!

Page 22

How Does It Work?

Page 23

Query Processing: Query Language, Query Execution
Columnar Data
Object Storage

Page 24

1/4: Compile SQL into MapReduce

SELECT COUNT(DISTINCT ip) FROM tbl;
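The slide does not show the compiled plan. As a toy, single-process Python illustration of the general idea (not the actual plan Treasure Data or Hive generates), `COUNT(DISTINCT ip)` can be expressed as a map step that emits each `ip` as a key, a shuffle that groups equal keys, and a reduce step that counts the groups. The rows of `tbl` here are made up.

```python
from collections import defaultdict


def map_phase(rows):
    # Map: emit (ip, 1) for every row; only the key matters for DISTINCT.
    for row in rows:
        yield row["ip"], 1


def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: each distinct ip arrives as exactly one group, so count the groups.
    return len(groups)


def count_distinct_ip(rows):
    return reduce_phase(shuffle(map_phase(rows)))


tbl = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]
print(count_distinct_ip(tbl))  # 2
```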

Page 25

2/4: MapReduce is executed in parallel

cc2.8xlarge Cluster Compute instances (up to 100 nodes × 32 threads)

SELECT COUNT(DISTINCT ip) FROM tbl;
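The slide's figure of up to 100 nodes × 32 threads works out to at most 3,200 concurrent workers. A sketch of the parallel step in Python, assuming the rows are already split into partitions (one per worker): each worker computes its partition's distinct set (a local combine), and a final step unions the sets and counts them. A thread pool stands in for cluster nodes here, so this shows the data flow only, not real cluster scheduling or speedup; the partitions and the `workers` default are made up.

```python
from concurrent.futures import ThreadPoolExecutor

NODES, THREADS_PER_NODE = 100, 32
MAX_WORKERS = NODES * THREADS_PER_NODE  # at most 3,200 parallel tasks


def partial_distinct(partition):
    # Local combine: the distinct ips seen within one partition.
    return {row["ip"] for row in partition}


def parallel_count_distinct(partitions, workers=4):
    # Threads stand in for cluster slots (illustrative only).
    with ThreadPoolExecutor(max_workers=min(workers, MAX_WORKERS)) as pool:
        partial_sets = list(pool.map(partial_distinct, partitions))
    # Final reduce: union the partial sets, then count.
    return len(set().union(*partial_sets))


partitions = [
    [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}],
    [{"ip": "10.0.0.2"}, {"ip": "10.0.0.3"}],
    [{"ip": "10.0.0.1"}],
]
print(MAX_WORKERS, parallel_count_distinct(partitions))  # 3200 3
```

Combining locally before the final union means each worker ships at most one entry per distinct ip it saw, rather than one per row.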

Page 26

3/4: Columnar Data Access

10 Gbps Network

Read ONLY the Required Part of Data

SELECT COUNT(DISTINCT ip) FROM tbl;
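A toy Python model of the idea on this slide, assuming every row shares one schema: when a table is stored column by column, `COUNT(DISTINCT ip)` touches only the `ip` column, whereas a row-oriented scan would read every field of every row. `ColumnarTable`, its `columns_read` bookkeeping, and the sample rows are hypothetical and exist only to show which columns a query reads; this is not Treasure Data's storage format.

```python
class ColumnarTable:
    """Stores each column as its own list (toy model; assumes a uniform schema)."""

    def __init__(self, rows):
        names = list(rows[0]) if rows else []
        self.columns = {name: [row[name] for row in rows] for name in names}
        self.columns_read = set()

    def read_column(self, name):
        # Record which columns a query actually touched.
        self.columns_read.add(name)
        return self.columns[name]


def count_distinct(table, column):
    return len(set(table.read_column(column)))


rows = [
    {"ip": "10.0.0.1", "path": "/buyItem", "price": 150},
    {"ip": "10.0.0.2", "path": "/landing", "price": 0},
    {"ip": "10.0.0.1", "path": "/landing", "price": 0},
]
tbl = ColumnarTable(rows)
print(count_distinct(tbl, "ip"), sorted(tbl.columns_read))  # 2 ['ip']
```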

Page 27

4/4: Object-based Storage

Page 28

Enjoy Data-Driven Development!

Page 29

Big Data for the Rest of Us

www.treasure-data.com | @TreasureData

Page 30

Page 31

Page 32

Great Investors
• Bill Tai
• Naren Gupta – Nexus Ventures, Director of Red Hat, TIBCO
• Dave Stamm – Clarify, Daisy Systems, Enkata
• Othman Laraki – Twitter
• James Lindenbaum, Adam Wiggins and Orion Henry – Heroku
• Anand Babu Periasamy and Hitesh Chellani – Gluster
• Yukihiro “Matz” Matsumoto – Creator of Ruby, now at Heroku
• Dan Scheinman – Former Cisco SVP
• Jean-Philippe Emelie Marcos – Tango, D.E. Shaw
• + executives from Cisco, Red Hat, Salesforce.com, GREE

Page 33

What are your options?

Traditional
• Too much complexity
• Too long to get live
• Too expensive to maintain
• Can only innovate at the speed of the vendor

On-Premise Hadoop
• Never designed for analytic processing
• Too many people
• Too much software from too many sources

Cloud Hadoop
• Partial solution
• Vendor lock-in

Page 34

Page 35

Example Use Case – MySQL to TD

Page 36

Example Use Case – MySQL to TD