big data with mysql

26
Ivan Zoratti Big Data with MySQL Percona Live Santa Clara 2013 V1304.01 Friday, 3 May 13

Upload: ivan-zoratti

Post on 10-May-2015

3.633 views

Category:

Technology


1 download

DESCRIPTION

I presented this slides for the first time at the Percona Live Conference 2013 in Santa Clara.

TRANSCRIPT

Page 1: Big Data with MySQL

Ivan Zoratti

Big Data with MySQLPercona Live Santa Clara 2013

V1304.01 Friday, 3 May 13

Page 2: Big Data with MySQL

Who is Ivan

?Friday, 3 May 13

Page 3: Big Data with MySQL

SkySQL

•Leading provider of open source databases, services and solutions

•Home for the founders and the original developers of the core of MySQL

•The creators of MariaDB, the drop-off, innovative replacement of MySQL

Friday, 3 May 13

Page 4: Big Data with MySQL

What is Big Data?http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg

Friday, 3 May 13

Page 5: Big Data with MySQL

PAGE

Big Data!

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

5

http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg

Friday, 3 May 13

Page 6: Big Data with MySQL

PAGE

Big Data By Structure

6

Unstructured•Store everything you have/you find•In any format and shape•You do not know how to use it, but it may

come handy•Storing unstructured data is usually cheaper than

storing it in a more structured datastore•Does not fit well in a relational database•Examples:•Text: Plain text, documents, web content,

messages•Bitmap: Image, audio, video

•Typical approach:•Mining, pattern recognition, tagging•Usually batch analysis

Structured•Store only what you need•In a good format, ready to be used•You should already know how to use it, or at

least what it means•Storing structured data is quite expensive•Raw data, indexing, denormalisation,

aggregation•A relational database is still the best choice•Examples:•Machine-Generated Data (MGD)•Tags, counters, sales

•Typical approach:•BI tools, reporting•Real time analysis change data capture

Friday, 3 May 13

Page 7: Big Data with MySQL

PAGE

Unstructured•Store everything you have/you find•In any format and shape•You do not know how to use it, but it may

come handy•Storing unstructured data is usually cheaper than

storing it in a more structured datastore•Does not fit well in a relational database•Examples:•Text: Plain text, documents, web content,

messages•Bitmap: Image, audio, video

•Typical approach:•Mining, pattern recognition, tagging•Usually batch analysis

Structured•Store only what you need•In a good format, ready to be used•You should already know how to use it, or at

least what it means•Storing structured data is quite expensive•Raw data, indexing, denormalisation,

aggregation•A relational database is still the best choice•Examples:•Machine-Generated Data (MGD)•Tags, counters, sales

•Typical approach:•BI tools, reporting•Real time analysis change data capture

Big Data By Structure

7Friday, 3 May 13

Page 8: Big Data with MySQL

PAGE

How “Big” is Big Data?

•Data Factors•Size•Speed to collect/

generate•Variety

•Resources•Administrators•Developers•Infrastructure

•Growth•Collection•Processing•Availability

•To whom?•For how long?•In which format?

•Aggregated•Detailed

8Friday, 3 May 13

Page 9: Big Data with MySQL

PAGE

How to manage Big Data

•Collection - Storage - Archive

•Load - Transform - Analyze

•Access - Explore - Utilize

9

http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it

Friday, 3 May 13

Page 10: Big Data with MySQL

Big Data with MySQLhttp://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg

Friday, 3 May 13

Page 11: Big Data with MySQL

PAGE

Technologies toUse / Consider / Watch•MyISAM and MyISAM compression

•InnoDB compression

•MySQL 5.6 Partitioning

•MariaDB Optimizer

•MariaDB Virtual & Dynamic Columns

•Cassandra Storage Engine

•Connect Storage Engine

•Columnar Databases•InfiniDB•Infobright

•TokuDB Storage Engine

11Friday, 3 May 13

Page 12: Big Data with MySQL

PAGE

Columnar Databases

•Automatic compression

•Automatic column storage

•Data distribution

•Map/Reduce approach

•MPP / Parallel loading

•No indexes

•On public clouds, HW or SW appliances

12Friday, 3 May 13

Page 13: Big Data with MySQL

PAGE

TokuDB

•Increased Performance

•Increased Compression

•Online administration

•No Index rebuild

13Friday, 3 May 13

Page 14: Big Data with MySQL

PAGE

MyISAM

•Static, dynamic and compressed format

•Multiple key cache, CACHE INDEX and LOAD INDEX

•Compressed tables

•Horizontal partitioning (manual)

•External locking

14Friday, 3 May 13

Page 15: Big Data with MySQL

PAGE

InnoDB/XtraDB•Data Load

•Pre-order data•Split data into chunks•unique_checks = 0;•foreign_key_checks = 0;•sql_log_bin = 0;•innodb_autoinc_lock_mode = 2;

•Compression and block size

•Persistent optimizer stats•innodb_stats_persistent•innodb_stats_auto_recalc

15

SET GLOBAL innodb_file_per_table = 1;SET GLOBAL innodb_file_format = Barracuda;

CREATE TABLE t1 ( c1 INT PRIMARY KEY, c2 VARCHAR(255) ) ROW_FORMAT = COMPRESSED KEY_BLOCK_SIZE = 8;

LOAD   DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec)Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0

LOAD   DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2;Query OK, 134217728 rows affected (25 min 20.75 sec)Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0

Friday, 3 May 13

Page 16: Big Data with MySQL

PAGE

Partitioning (MySQL 5.6)•Partitioning Types

•RANGE, LIST, RANGE COLUMN, HASH, LINEAR HASH, KEY LINEAR KEY, sub-partitions

•Partition and lock pruning

•Use of INDEX and DATA DIRECTORY

•PARTITION ADD, DROP, REORGANIZE, COALESCE, TRUNCATE, EXCHANGE, REBUILD, OPTIMIZE, CHECK, ANALYZE, REPAIR

16

CREATE TABLE t1 ( c1 INT, c2 DATE ) PARTITION BY RANGE( YEAR( c2 ) ) SUBPARTITION BY HASH ( TO_DAYS( c2 ) ) ( PARTITION p0 VALUES LESS THAN (1990) ( SUBPARTITION s0 DATA DIRECTORY = '/disk0/data' INDEX DIRECTORY = '/disk0/idx', SUBPARTITION s1 DATA DIRECTORY = '/disk1/data' INDEX DIRECTORY = '/disk1/idx' ),...

ALTER TABLE t1 EXCHANGE PARTITION p3 WITH TABLE t2;

-- Range and List partitionsALTER TABLE t1 REORGANIZE PARTITION p0,p1,p2,p3 INTO ( PARTITION m0 VALUES LESS THAN (1980), PARTITION m1 VALUES LESS THAN (2000));

-- Hash and Key partitionsALTER TABLE t1 COALESCE PARTITION 10;ALTER TABLE t1 ADD PARTITION PARTITIONS 5;

Friday, 3 May 13

Page 17: Big Data with MySQL

PAGE

MariaDB Optimizer

•Multi-Range Read (MRR)*

•Index Merge / Sort intersection

•Batch Key Access*

•Block hash join

•Cost-based choice of range vs. index_merge

•ORDER BY ... LIMIT <limit>* •MariaDB 10

•Subqueries•Semi-join*

•Materialization*

•subquery cache

•LIMIT ... ROWS EXAMINED <limit>

17

(*) - Available in MySQL 5.6

Friday, 3 May 13

Page 18: Big Data with MySQL

PAGE

Virtual & Dynamic ColumnsVIRTUAL COLUMNS

•For InnoDB, MyISAM and Aria•PERSISTENT (stored) or VIRTUAL

(generated)

18

CREATE TABLE t1 ( c1 INT NOT NULL, c2 VARCHAR(32), c3 INT AS ( c1 MOD 10 ) VIRTUAL, c4 VARCHAR(5) AS ( LEFT(B,5) ) PERSISTENT);

DYNAMIC COLUMNS•Implement a schemaless,

document store•COLUMN_ CREATE, ADD, GET, LIST,

JSON, EXISTS, CHECK, DELETE•Nested colums are allowed•Main datatypes are allowed•Max 1GB documents

CREATE TABLE assets ( item_name VARCHAR(32) PRIMARY KEY, dynamic_cols BLOB );

INSERT INTO assets VALUES ( 'MariaDB T-shirt', COLUMN_CREATE( 'color', 'blue', 'size', 'XL' ) );INSERT INTO assets VALUES ( 'Thinkpad Laptop', COLUMN_CREATE( 'color', 'black', 'price', 500 ) );

Friday, 3 May 13

Page 19: Big Data with MySQL

PAGE

Cassandra Storage Engine•Column Family == Table•Rowkey, static and dynamic

columns allowed•Batch key access supportSET cassandra_default_thrift_host = '192.168.0.10'

CREATE TABLE cassandra_tbl ( rowkey INT PRIMARY KEY, col1 VARCHAR(25), col2 BIGINT, dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes ) ENGINE = cassandra KEYSPACE = 'cassandra_key_space' COLUMN_FAMILY = 'column_family_name';

19Friday, 3 May 13

Page 20: Big Data with MySQL

PAGE

Connect Storage Engine•Any file format as MySQL TABLE:

•ODBC•Text, XML, *ML•Excel, Access etc.

•MariaDB CREATE TABLE options•Multi-file table•Table Autocreation

•Condition push down

•Read/Write and Multi Storage Engine Join

•CREATE INDEX

20

CREATE TABLE handoutENGINE = CONNECTTABLE_TYPE = XMLFILE_NAME = 'handout.htm'HEADER = yes OPTION_LIST = 'name = TABLE, coltype = HTML, attribute = (border=1;cellpadding=5)';

Friday, 3 May 13

Page 21: Big Data with MySQL

Starting Your Big Data ProjectFriday, 3 May 13

Page 22: Big Data with MySQL

PAGE

Why would you use MySQL?

• Time

• Knowledge

• Infrastructure

• Costs

• Simplified Integration

• Not so “big” data

22Friday, 3 May 13

Page 23: Big Data with MySQL

PAGE

Apache Hadoop & Friends

23

HDFS

MapReduce

PIG HIVE

HCatalog

HBA

SE

ZooK

eepe

r

•Mahout•Ambari, Ganglia,

Nagios•Sqoop•Cascading•Oozie•Flume•Protobuf, Avro,

Thrift•Fuse-DFS•Chukwa•Cassandra

Friday, 3 May 13

Page 24: Big Data with MySQL

PAGE

MySQL & Friends

24

MySQL/MariaDB/Storage Engines

SQL Optimizer

ScriptsStored Procedures DML

DB Schema / DDL

MyS

QL/

Mar

iaD

B

SkyS

QL

DS

•Mahout•SDS, Ganglia,

Nagios•mysqlimport•Cascading•Talend, Pentaho•Connect

Friday, 3 May 13

Page 25: Big Data with MySQL

PAGE

Join us at the Solutions Day

•Cassandra and Connect Storage Engine

•Map/Reduce approach - Proxy optimisation

•Multiple protocols and more

25Friday, 3 May 13