c* summit 2013: can't we all just get along? mariadb and cassandra by colin charles
Post on 10-May-2015
1.731 Views
Preview:
DESCRIPTION
TRANSCRIPT
#CASSANDRA13
Colin Charles | colin@mariadb.org | SkySQL Ab | http://mariadb.org/ @bytebot on Twitter | http://bytebot.net/blog/
MariaDB and Cassandra Interoperability
#CASSANDRA13
whoami*Work on MariaDB today
*Formerly of MySQL AB (USD$1bil acquisition -> Sun Microsystems)
*Worked on The Fedora Project & OpenOffice.org previously
*Monty Program Ab is a major sponsor of MariaDB
*SkySQL & Monty Program Ab merge
*MariaDB governed by MariaDB Foundation
#CASSANDRA13
What we will discuss today...*What is MariaDB?
*MariaDB Architecture
*The Cassandra Storage Engine (CassandraSE)
*Data & Command Mapping
*Use Cases
*Benchmarks
*Conclusions
#CASSANDRA13
What is MariaDB?*Community developed, feature enhanced, backward compatible MySQL
*Drop-in replacement to MySQL
*Shipped in many Linux distributions as a default
*Enhanced features: threadpool, table elimination, optimizer changes (subqueries materialize!), group commit in the replication binary log, HandlerSocket, SphinxSE, multi-source replication, dynamic columns
#CASSANDRA13
#CASSANDRA13
MariaDB/MySQL and NoSQL*HandlerSocket
*memcached access to InnoDB
*Hadoop Applier
*LevelDB Storage Engine
*Cassandra Storage Engine
*CONNECT Engine
#CASSANDRA13
Dynamic Columns*Store a different set of columns for every row in the table
*Basically a blob with handling functions (GET, CREATE, ADD, DELETE, EXISTS, LIST, JSON)
*Dynamic columns can be nested
*You can request rows in JSON format
*You can now name dynamic columns as well
INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE("column_name", "value");
#CASSANDRA13
Cassandra background*Distributed key/value store (limited range scan support), optionally flexible schema (pre-defined “static” columns, ad-hoc dynamic columns), automatic sharding/replication, eventual consistency
*Column families are like “tables”
*Row key -> column mapping
*Supercolumns are not supported
#CASSANDRA13
CQL at workcqlsh> CREATE KEYSPACE mariadbtest
... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1};
cqlsh> use mariadbtest;
cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key, data1 varchar, data2 bigint ) with compact storage;
cqlsh:mariadbtest> insert into cf1 (pk, data1,data2) values ('row1', 'data-in-cassandra', 1234);
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where pk='row1';
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where data2=1234;
Bad Request: No indexed columns present in by-columns clause with Equal operator
cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2';
Bad Request: line 1:34 missing EOF at 'or'
#CASSANDRA13
CQL*Looks like SQL at first glance
*No joins or subqueries
*No GROUP BY, ORDER BY must be able to use available indexes
*WHERE clause must represent an index lookup
*Simple goal of the Cassandra Storage Engine? Provide a “view” of Cassandra’s data from MariaDB
#CASSANDRA13
Getting started*Get MariaDB 10.0.2 from https://downloads.mariadb.org/
*Load the Cassandra plugin
- From SQL:MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so';
- Or start it from my.cnf
[mysqld]
...
plugin-load=ha_cassandra.so
#CASSANDRA13
Is everything ok?*Check to see that it is loaded - SHOW PLUGINS
MariaDB [(none)]> show plugins;
+--------------------+--------+-----------------+-----------------+---------+
| Name | Status | Type | Library | License |
+--------------------+--------+-----------------+-----------------+---------+
...
| CASSANDRA | ACTIVE | STORAGE ENGINE | ha_cassandra.so | GPL |
+--------------------+--------+-----------------+-----------------+---------+
#CASSANDRA13
Create an SQL table which is a view of a column familyMariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113';
MariaDB [test]> create table t2 (pk varchar(36) primary key,
-> data1 varchar(60),
-> data2 bigint
-> ) engine=cassandra
-> keyspace='mariadbtest'
-> thrift_host='10.196.2.113'
-> column_family='cf1';
*thrift_host can be set per-table
*@@cassandra_default_thrift_host allows to re-point the table to different node dynamically, and not change table DDL when Cassandra IP changes
#CASSANDRA13
Potential issues*SELinux blocks the connection
ERROR 1429 (HY000): Unable to connect to foreign data source: connect() failed: Permission denied [1]
*Disable SELinux: echo 0 > /selinux/enforce
*Cassandra 1.2 with Column Families (CFs) without “COMPACT STORAGE”
ERROR 1429 (HY000): Unable to connect to foreign data source: Column family cf1 not found in keyspace mariadbtest
*Change in Cassandra 1.2 that broke Pig as well; we’ll update this soon
#CASSANDRA13
Accessing Cassandra data from MariaDB*Get data from Cassandra
MariaDB [test]> select * from t2;
+------+-------------------+-------+
| pk | data1 | data2 |
+------+-------------------+-------+
| row1 | data-in-cassandra | 1234 |
+------+-------------------+-------+
*Insert data into CassandraMariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123);
*Ensure Cassandra sees inserted datacqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
row2 | data-from-mariadb | 123
#CASSANDRA13
Data mapping between Cassandra and SQLcreate table tbl ( pk varchar(36) primary key, data1 varchar(60), data2 bigint ) engine=cassandra keyspace='ks1' column_family='cf1'
*MariaDB table represents Cassandra’s Column Family
- can use any table name, column_family=... specifies CF
#CASSANDRA13
Data mapping between Cassandra and SQLcreate table tbl ( pk varchar(36) primary key, data1 varchar(60), data2 bigint ) engine=cassandra keyspace='ks1' column_family='cf1'
*MariaDB table represents Cassandra’s Column Family
- can use any table name, column_family=... specifies CF
*Table must have a primary key
- name/type must match Cassandra’s rowkey
#CASSANDRA13
Data mapping between Cassandra and SQLcreate table tbl ( pk varchar(36) primary key, data1 varchar(60), data2 bigint ) engine=cassandra keyspace='ks1' column_family='cf1'
*MariaDB table represents Cassandra’s Column Family- can use any table name, column_family=... specifies CF
*Table must have a primary key- name/type must match Cassandra’s rowkey
*Columns map to Cassandra’s static columns- name must be same as in Cassandra, datatypes must match, can be subset of CF’s columns
#CASSANDRA13
Datatype mappingCassandra MariaDB
blob BLOB, VARBINARY(n)
ascii BLOB, VARCHAR(n), use charset=latin1
text BLOB, VARCHAR(n), use charset=utf8
varint VARBINARY(n)
int INT
bigint BIGINT, TINY, SHORT
uuid CHAR(36) (text in MariaDB)
timestamp TIMESTAMP (second), TIMESTAMP(6) (microsecond), BIGINT
boolean BOOL
float FLOAT
double DOUBLE
decimal VARBINARY(n)
counter BIGINT
#CASSANDRA13
Dynamic columns revisited*Cassandra supports “dynamic column families”, can access ad-hoc columns
create table tbl( rowkey type PRIMARY KEY
column1 type, ... dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes) engine=cassandra keyspace=... column_family=...;
insert into tbl values (1, column_create('col1', 1, 'col2', 'value-2')); select rowkey, column_get(dynamic_cols, 'uuidcol' as char) from tbl;
#CASSANDRA13
All data mapping is safe*CassandraSE will refuse incorrect mappings (throw errors)create table t3 (pk varchar(60) primary key, no_such_field int)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not be mapped to any field in Cassandra'
create table t3 (pk varchar(60) primary key, data1 double)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Failed to map column data1 to datatype org.apache.cassandra.db.marshal.UTF8Type'
#CASSANDRA13
Command Mapping*Cassandra commands
- PUT (upsert)
- GET (performs a scan)
- DELETE (if exists)
*SQL commands
- SELECT -> GET/Scan
- INSERT -> PUT (upsert)
- UPDATE/DELETE -> read/write
#CASSANDRA13
SELECT command mapping*MariaDB has an SQL interpreter
*CassandraSE supports lookups and scans
*Can now do:
- arbitrary WHERE clauses
- JOINs between Cassandra tables and MariaDB tables (BKA supported)
#CASSANDRA13
Batched Key Access is fast!select max(l_extendedprice) from orders, lineitem where
o_orderdate between $DATE1 and $DATE2 andl_orderkey=o_orderkey
#CASSANDRA13
DML command mapping*No SQL semantics
- INSERT overwrites rows
- UPDATE reads, then writes (have you updated what you read?)
- DELETE reads, then writes (can’t be sure if/what you’ve deleted)
*CassandraSE doesn’t make it SQL!
#CASSANDRA13
CassandraSE use cases*Collect massive amounts of data like web page hits
*Collect massive amounts of data from sensors
*Updates are non-conflicting
- keyed by UUIDs, timestamps
*Reads are served with one lookup
*Good for certain kinds of data (though moving from SQL entirely may be difficult)
#CASSANDRA13
Access Cassandra data from SQL*Send an update to Cassandra
- be a sensor
*Get a piece of data from Cassandra
- This webpage was last viewed by...
- Last known position of this user was...
- You are user number n of n-thousands...
#CASSANDRA13
From MariaDB...*Want a table that is:
- auto-replicated
- fault-tolerant
- very fast
*Get Cassandra and create a CassandraSE table
#CASSANDRA13
CassandraSE non-use cases*Huge, sift through all data joins?
- use Pig
*Bulk data transfer to/from Cassandra Cluster?
- use Sqoop
*A replacement for InnoDB?
- remember no full SQL semantics, InnoDB is useful for myriad reasons
#CASSANDRA13
A tiny benchmark*One table
*Amazon EC2 environment
- m1.large nodes
- ephemeral disks
*Stream of single-line INSERTs
*Tried InnoDB & CassandraSE
*No tuning
#CASSANDRA13
A tiny benchmark II*InnoDB with tuning, same setup as before
#CASSANDRA13
Conclusions*CassandraSE can be used to peek at data in Cassandra from MariaDB
*It is not a replacement for Pig/Hive
*It is really easy to setup & use
#CASSANDRA13
Roadmap*Do you want support for:
- fast counter column updates?
- awareness/discovery of Cassandra cluster topology?
- secondary indexes?
- ... ?
#CASSANDRA13
Resources*https://kb.askmonty.org/en/cassandrase/
*http://wiki.apache.org/cassandra/DataModel
*http://cassandra.apache.org/
*http://www.datastax.com/docs/1.1/ddl/column_family
#CASSANDRA13
THANK YOU
Colin Charles | colin@mariadb.org | SkySQL Ab | http://mariadb.org/ @bytebot on Twitter | http://bytebot.net/blog/
#CASSANDRA13
Cassandra SE internals*Developed against Cassandra 1.1
*Uses Thrift API
- cannot stream CQL resultset in 1.1
- cannot use secondary indexes
*Only supports AllowAllAuthenticator
*In Cassandra 1.2
- “CQL Binary Protocol” with streaming
- CASSANDRA-5234: Thrift can only read CFs “WITH COMPACT STORAGE”
top related