Keynote: Getting Serious about MySQL and Hadoop at Continuent

©Continuent 2014 Getting Serious about MySQL and Hadoop at Continuent Robert Hodges, CEO

Upload: continuent

Post on 27-Jan-2015

Category:

Technology


DESCRIPTION

Lean, mean MySQL and hulking Hadoop clusters may seem like an odd couple, but tying them together is now priority #1 for many MySQL users. This keynote, given on the first day of the Percona Live MySQL Conference & Expo 2014, explores the data management trends spurring integration, how the MySQL community is stepping up, and where the integration may go in the future. Robert Hodges, CEO at Continuent, outlines how work at Continuent fits into this picture and how we are contributing to the MySQL community's response to Hadoop.

TRANSCRIPT

Page 1: Keynote: Getting Serious about MySQL and Hadoop at Continuent

©Continuent 2014

Getting Serious about MySQL and Hadoop at Continuent

Robert Hodges, CEO

Page 2: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Why should MySQL users care about Hadoop?

Page 3: Keynote: Getting Serious about MySQL and Hadoop at Continuent

What is a Hadoop?

Hadoop Distributed File System (HDFS)

MapReduce

Spark

Hive

Storm

Pig

Shark

Mahout

HBase

Oozie

Avro

HCatalog

Scalding

Stinger

Impala

Sqoop

Ambari

Cassandra

Zookeeper

Page 4: Keynote: Getting Serious about MySQL and Hadoop at Continuent

With this much funding it must be good

(ZDNet)

(jaxenter.com)

(forbes.com)

(451 Group)

Page 5: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Hadoop analyzes any type of data

Server Logs

Social media feeds

Geolocation data

Clickstreams

Sensor readings

Business transactions

Analytic reports

Page 6: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Hadoop data loading is simple

    mysql> select * into
        -> outfile '/tmp/sakila.rental.csv'
        -> fields terminated by ','
        -> lines terminated by '\n'
        -> from sakila.rental;
    Query OK, 16044 rows affected (0.03 sec)

    mysql> quit
    Bye
    $ hadoop fs -put /tmp/sakila.rental.csv

Page 7: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Hadoop exploits downward cost of storing and processing data

[Chart: Disk Storage -- Average Cost Per Gigabyte, 1990 to 2014; vertical axis on a logarithmic scale from $0.01 to $10,000.00]

(Source: John McCallum, http://www.jcmit.com)

Page 8: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Hadoop is shifting from batch to real-time analytics

[Chart: Cycle time for different iterative algorithms (Page Rank, K-Means Clustering, Logistic Regression), Core Hadoop vs. Spark; horizontal axis from 0 to 160. Values shown on the chart: 0.96, 4.1, 14, 110, 155, 80]

(Source: Pat McDonough, http://spark-summit.org/2013)

Page 9: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Hadoop is becoming the way that users "see" data

Page 10: Keynote: Getting Serious about MySQL and Hadoop at Continuent

What does it mean to integrate with Hadoop?

Page 11: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Three integration problems

1. Continuous, high-performance loading

2. Meaningful analytics on Hadoop

3. Optimized operation for large-scale deployment

Page 12: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Thesis: Snapshots

Data volumes? System load?

Latency? Change history?

Dump/load

Page 13: Keynote: Getting Serious about MySQL and Hadoop at Continuent

MySQL does not do it that way...

Binlog

Replication

Page 14: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Antithesis: Real-time replication

Raw files? Overwrite/append?

Replication

Binlog

Page 15: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Synthesis: Snapshots + real-time replication

Replication

CSV Files

Buffered Transactions

Binlog

Dump/load

Page 16: Keynote: Getting Serious about MySQL and Hadoop at Continuent

We can implement that!

MySQL

binlog_format=row

MySQL Binlog

Tungsten 3.0 Master

hadoop

Tungsten 3.0 Slave

hadoop

CSV Files

Apache Sqoop/ETL

Fast data filtering

Buffered CSV

Programmable load scripts

Parallel apply

Parallel table dumps

Low impact replication from the binlog

Page 17: Keynote: Getting Serious about MySQL and Hadoop at Continuent

How do you like your data?

(Your data stored in MySQL)

    +---------+--------------------+-------------+--------+
    | film_id | title              | rental_rate | length |
    +---------+--------------------+-------------+--------+
    |     556 | MALTESE HOPE       |        4.99 |    127 |
    |     557 | MANCHURIAN CURTAIN |        2.99 |    177 |
    |     558 | MANNEQUIN WORST    |        2.99 |     71 |
    |     559 | MARRIED GO         |        2.99 |    114 |
    +---------+--------------------+-------------+--------+

Page 18: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Does it really look better like this?

    556,MALTESE HOPE,4.99,127\n
    557,MANCHURIAN CURTAIN,3.99,177\n
    558,MANNEQUIN WORST,2.99,71\n
    559,MARRIED GO,2.99,114\n

field separator

file partitioning

record separator

compression

type conversions

(Your data stored in Hadoop)
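The field-separator, record-separator, and type-conversion choices listed on this slide can be sketched in a few lines of Python. This is only an illustration, not Tungsten's or Sqoop's actual export code: the function name `rows_to_hadoop_csv`, its parameters, and the choice to write NULL as `\N` are assumptions made for the example.

```python
import csv
import io
from decimal import Decimal


def rows_to_hadoop_csv(rows, field_sep=",", record_sep="\n", null_marker="\\N"):
    """Serialize rows as delimited text, writing NULL values as a marker string.

    Hypothetical helper: the separators and the NULL marker are parameters
    because each is a decision the loader and the Hadoop side must agree on.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=field_sep, lineterminator=record_sep)
    for row in rows:
        # Type conversion: every value becomes text; None becomes the marker.
        writer.writerow([null_marker if v is None else str(v) for v in row])
    return buf.getvalue()


films = [
    (556, "MALTESE HOPE", Decimal("4.99"), 127),
    (559, "MARRIED GO", Decimal("2.99"), 114),
]
csv_text = rows_to_hadoop_csv(films)
print(csv_text, end="")
```

Note that Python's `csv` module quotes any field that contains the separator. Whether the reader on the Hadoop side understands that quoting is something to verify for the tool in use.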

Page 19: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Or this?

(INSERT)

    I,57,556,2014-03-27 21:04:24.000,556,MALTESE HOPE,4.99,127\n

(UPDATE)

    D,57,557,2014-03-27 21:04:24.000,557,\N,\N,\N\n
    I,57,558,2014-03-27 21:04:24.000,557,MANCHURIAN CURTAIN,2.99,177\n

(DELETE)

    D,57,559,2014-03-27 21:04:24.000,558,\N,\N,\N\n
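The change records above can be read as an operation code, a transaction sequence number, a row id within the transaction, a commit timestamp, the key, and then the remaining column values. That reading is an interpretation of the slide, not a documented format. Under it, a minimal Python sketch of the encoding might look like this (`fmt` and `to_change_rows` are hypothetical names):

```python
from datetime import datetime

NULL = r"\N"  # marker used on the slide for columns with no value


def fmt(opcode, seqno, row_id, commit_ts, key, values):
    """Format one change record as a comma-separated line (no trailing newline)."""
    ts = commit_ts.strftime("%Y-%m-%d %H:%M:%S") + f".{commit_ts.microsecond // 1000:03d}"
    fields = [opcode, seqno, row_id, ts, key] + [NULL if v is None else v for v in values]
    return ",".join(str(f) for f in fields)


def to_change_rows(kind, seqno, row_id, commit_ts, key, values):
    """Return the record(s) for one row change; an update becomes delete + insert."""
    blank = [None] * len(values)
    if kind == "insert":
        return [fmt("I", seqno, row_id, commit_ts, key, values)]
    if kind == "delete":
        return [fmt("D", seqno, row_id, commit_ts, key, blank)]
    if kind == "update":
        return [
            fmt("D", seqno, row_id, commit_ts, key, blank),
            fmt("I", seqno, row_id + 1, commit_ts, key, values),
        ]
    raise ValueError(f"unknown change kind: {kind}")


ts = datetime(2014, 3, 27, 21, 4, 24)
update_rows = to_change_rows("update", 57, 557, ts, 557, ["MANCHURIAN CURTAIN", "2.99", 177])
for line in update_rows:
    print(line)
```

Writing an update as a delete followed by an insert keeps every record self-describing, which is what lets a later merge step work from the log alone.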

Page 20: Keynote: Getting Serious about MySQL and Hadoop at Continuent

One more thing to replicate...

Dump/load

Replication

CSV Files

Buffered Transactions

Binlog

Table metadata

Page 21: Keynote: Getting Serious about MySQL and Hadoop at Continuent

A more civilized view of data

(Your data viewed through Hive)

    556  MALTESE HOPE        3.99  127
    557  MANCHURIAN CURTAIN  3.99  177
    558  MANNEQUIN WORST     2.99  71
    559  MARRIED GO          2.99  114
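Replicating table metadata is what makes a Hive view over the CSV files possible. As a rough sketch of the idea, and not the actual generator in Continuent's tools, the following Python turns a list of MySQL column definitions into a Hive external table statement. The type mapping, the function name `hive_ddl`, and the HDFS location are all assumptions for illustration; in particular, mapping `decimal` to `DOUBLE` is a simplification that gives up exactness.

```python
# Hypothetical, deliberately small MySQL-to-Hive type mapping (an assumption).
MYSQL_TO_HIVE = {
    "tinyint": "TINYINT", "smallint": "SMALLINT", "int": "INT", "bigint": "BIGINT",
    "decimal": "DOUBLE", "float": "FLOAT", "double": "DOUBLE",
    "char": "STRING", "varchar": "STRING", "text": "STRING",
    "datetime": "TIMESTAMP", "timestamp": "TIMESTAMP",
}


def hive_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement over comma-delimited text files.

    columns is a list of (name, mysql_type) pairs, e.g. ("title", "varchar(255)").
    Unknown types fall back to STRING.
    """
    cols = []
    for name, mysql_type in columns:
        # "smallint(5) unsigned" -> "smallint"
        base = mysql_type.lower().split("(")[0].split()[0]
        cols.append(f"  {name} {MYSQL_TO_HIVE.get(base, 'STRING')}")
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        + ",\n".join(cols)
        + "\n)\n"
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        + "LINES TERMINATED BY '\\n'\n"
        + "STORED AS TEXTFILE\n"
        + f"LOCATION '{location}'"
    )


ddl = hive_ddl(
    "film",
    [
        ("film_id", "smallint(5) unsigned"),
        ("title", "varchar(255)"),
        ("rental_rate", "decimal(4,2)"),
        ("length", "smallint(5) unsigned"),
    ],
    "/user/example/staging/film",  # hypothetical HDFS directory
)
print(ddl)
```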

Page 22: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Are we done yet?

Transaction logs Snapshot

????

Page 23: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Introducing a useful MapReduce trick...

Transaction logs UNION ALL Snapshot

MAP

SHUFFLE: Sort by key(s), transaction order

REDUCE: Emit last row per key if not a delete

Materialized view including all updates
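The steps on this slide can be sketched in plain Python, with an in-memory sort standing in for the MapReduce shuffle. This is a minimal illustration under stated assumptions, not the code from Continuent's tools: the function name `materialize`, the tuple layouts, and the convention that snapshot rows carry sequence number 0 (so they sort before every logged change) are all made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def materialize(snapshot_rows, change_rows):
    """Merge a snapshot with a change log into a current view of the table.

    snapshot_rows: iterable of (key, values)
    change_rows:   iterable of (opcode, seqno, row_id, key, values),
                   where opcode is "I" (insert) or "D" (delete)
    Returns a dict mapping key -> values for rows that still exist.
    """
    # UNION ALL: treat each snapshot row as an insert that predates the log.
    records = [("I", 0, 0, key, values) for key, values in snapshot_rows]
    records.extend(change_rows)

    # SHUFFLE: sort by key, then by transaction order (seqno, row_id).
    records.sort(key=lambda r: (r[3], r[1], r[2]))

    # REDUCE: emit the last record per key unless it is a delete.
    view = {}
    for key, group in groupby(records, key=itemgetter(3)):
        last = list(group)[-1]
        if last[0] != "D":
            view[key] = last[4]
    return view


snapshot = [
    (556, ("MALTESE HOPE", "4.99", 127)),
    (557, ("MANCHURIAN CURTAIN", "3.99", 177)),
    (558, ("MANNEQUIN WORST", "2.99", 71)),
]
changes = [
    ("D", 57, 1, 557, None),                                # update, part 1
    ("I", 57, 2, 557, ("MANCHURIAN CURTAIN", "2.99", 177)),  # update, part 2
    ("D", 58, 1, 558, None),                                # delete
    ("I", 59, 1, 559, ("MARRIED GO", "2.99", 114)),          # insert
]
view = materialize(snapshot, changes)
```

Because the result depends only on the snapshot and the ordered log, the view can be rebuilt at any time without locking the source or pausing replication.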

Page 24: Keynote: Getting Serious about MySQL and Hadoop at Continuent

...With some amazing properties

Apache Sqoop

Tungsten Replication

Buffered CSV Files

No replication failures due to consistency

Reconstruct consistent views at will

No locks

No transactions

No need to pause processing

Reprovision any table at will

Table metadata

Page 25: Keynote: Getting Serious about MySQL and Hadoop at Continuent

We can implement that too!!

https://github.com/continuent/continuent-tools-hadoop

Continuent Hadoop Tools

Schema creation

Materialized view generation

Data comparison

Apache 2.0 licensing

Page 26: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Optimizing large scale deployments

Replicator

m1 (slave)

m2 (slave)

m3 (slave)

Replicator

m1 (master)

m2 (master)

m3 (master)

Replicator

Replicator

RBR

RBR

RBR

Page 27: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Where we want to be

Single path loading

CSV Files

Buffered Transactions

Binlog

Page 29: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Tungsten 3.0 Roadmap for Hadoop

Q1 2014 / Q2 2014

Features

• Parallel extractor

• Polished MapReduce tools

• Improved schema change handling

• Binary data conversion

• HortonWorks 2.0

Features

• Scripted load

• Better block commit

• Hive CSV format

• Hive DDL generation

• Partitioned files

• Auto-recovery

• Parallel batch apply

• Sqoop integration

• Cloudera 4.x/5.0

Page 30: Keynote: Getting Serious about MySQL and Hadoop at Continuent

How can we prepare for Hadoop integration?

Page 31: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Users can prepare...

• Use Unicode/UTF8

• Standardize on UTC for time

• Enable row replication

• Cluster your data in a way that supports restarts

Page 32: Keynote: Getting Serious about MySQL and Hadoop at Continuent

MySQL can prepare...

By being MySQL

Page 33: Keynote: Getting Serious about MySQL and Hadoop at Continuent

The MySQL community can prepare...

• Fast heterogeneous replication and loading

• Innovative projects to make relational data easy to consume on Hadoop

• Competing solutions that improve life for users

Page 34: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Conclusion

• Hadoop is for real and the MySQL community needs to adapt

• The challenge is to move data to Hadoop and make it easy to integrate into analytics

• MySQL can be *the* preferred RDBMS to use with Hadoop

Page 35: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Thanks to our many customers

Page 36: Keynote: Getting Serious about MySQL and Hadoop at Continuent

Wed 2:20pm Ballroom B - Hadoop for MySQL People

Thurs 1pm Ballroom D - From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication

We’re Hiring!

http://www.continuent.com