cassandra - doag deutsche oracle-anwendergruppe e.v. · • cql creates a common languageso that...
TRANSCRIPT
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Apache CassandraBig Data: Why do I need Cassandra
Jan Ott
@jhsott
Our company.
Trivadis DOAG16: Cassandra2 17.11.16
Trivadis is a market leader in IT consulting, system integration, solution engineeringand the provision of IT services focusing on andtechnologiesin Switzerland, Germany, Austria and Denmark. We offer our services in the followingstrategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
O P E R A T I O N
COPENHAGEN
MUNICH
LAUSANNE
BERN
ZURICH
BRUGG
GENEVA
HAMBURG
DÜSSELDORF
FRANKFURT
STUTTGART
FREIBURG
BASEL
VIENNA
With over 600 specialists and IT experts in your region.
Trivadis DOAG16: Cassandra3 17.11.16
14 Trivadis branches and more than600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget:CHF 5.0 million
Financially self-supporting andsustainably profitable
Experience from more than 1,900 projects per year at over 800customers
17.11.16 Trivadis DOAG16: Cassandra4
Technology on its own won't help you.You need to know how to use it properly.
Agenda
1. Introduction - What is Apache Cassandra?
2. Cassandra Data Model
3. First Steps in the Cassandra World
4. Cassandra Query Language - CQL
5. Summary
17.11.16 Trivadis DOAG16: Cassandra5
What is Apache Cassandra?
17.11.16 Trivadis DOAG16: Cassandra6
The Cassandra Elevator Pitch
„Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneablyconsistent, row-oriented database that bases its distributiondesign on Amazon’s Dynamo and its data model on Google’sBigtable. Created at Facebook, it is now used at some of themost popular sites on the Web.“
Cassandra: The Definitive Guide by Jeff Carpenter and Eben Hewitt
17.11.16 Trivadis DOAG16: Cassandra7
History of Cassandra
Bigtable Dynamo
17.11.16 Trivadis DOAG16: Cassandra8
Highlights
• Apache Cassandra™ is free• Distributed & Decentralized• Elastic Scalable• High Available• Fault Tolerance• Tunable data consistency• CQL language (like SQL)
17.11.16 Trivadis DOAG16: Cassandra9
Elastic Scalable
• Capable of comfortably scaling to petabytes• New nodes = Linear performance increases• Add and remove nodes online
100'000
txns/sec
200'000
txns/sec
400'000
txns/sec
17.11.16 Trivadis DOAG16: Cassandra10
Continuous availability
• All nodes the same
• Customized replication affords tunable data redundancy
• Read/write from any node
• Can replicate data among different physical data center racks
Node1
Node2
Node3
Node4[26-50]
[0-25]
[51-75]
[76-100] [0-25]
[0-25]
[26-50]
[26-50]
[51-75]
[51-75]
[76-100]
[76-100]
client
17.11.16 Trivadis DOAG16: Cassandra11
Multi Data-centers (DCs)
• Out-of-the-box (config only)
• Client writes local
• Data syncs across WAN
• Replication per Data Center
• Cloud platforms support
• AWS config for multi-regions DCs
• Google Compute Engine (GCE) Support
• Microsoft Azure support
• Cloud Stack support
17.11.16 Trivadis DOAG16: Cassandra12
Multi DC usages
• Data locality• Disaster recovery
Node1
Node2
Node3
Node4
Node1
Node2
Node3
Node4
WestEastClient
Trivadis DOAG16: Cassandra17.11.1613
Write Requests
coordinator sends a write request to all replicas that own the row being written
17.11.16 Trivadis DOAG16: Cassandra14
Read Requests
There are two types of read requests that a coordinator can send to a replica:
• A direct read request
• A background read repair request
The number of replicas contacted by a direct read request is determined by the consistency level specified by the client.
17.11.16 Trivadis DOAG16: Cassandra15
Tunable Data Consistency
What do I need?
• Writes• Reads• Consistency 1
2
3
4
5
6
� Any
� One/Two/Three
� Local_One
� Quorum
� Local_Quorum
� Each_Quorum
� All
Writes
� One/Two/Three
� Local_One
� Quorum
� Local_Quorum
� Each_Quorum
� All
Reads
17.11.16 Trivadis DOAG16: Cassandra16
Who is using Cassandra?
Largest infrastructure running over 75,000 Cassandra nodes, storing more than 10 petabytes of data with one cluster was over 1,000 nodes
17.11.16 Trivadis DOAG16: Cassandra17
Cassandra Data Model
17.11.16 Trivadis DOAG16: Cassandra18
Cassandra Data Model – 3. Normalform
• Forget Normalization – DENORMALIZE
• Design by Query
• No Joins – Denormalize
- Model
- Materialized Views
- Do it on the client side – not recommended
• No Referential Integrity
- Possible to define but not enforced
17.11.16 Trivadis DOAG16: Cassandra19
How Cassandra stores data
• Model brought from Google Bigtable• Row Key and a lot of columns• Column names sorted (UTF8, Int, Timestamp, etc.)
ColumnName … Column Name
ColumnValue ColumnValue
Timestamp Timestamp
TTL TTL
RowKey
1 2Billion
BillionofRows
Trivadis DOAG16: Cassandra17.11.1620
Static Column Family – "Skinny Row"
21
rowkey
CREATE TABLE skinny (rowkey text, c1 text PRIMARY KEY,c2 text,c3 text,
PRIMARY KEY (rowkey));
GrowsuptoBillionofRows
rowkey-1 c1 c2 c3
value-c1 value-c2 value-c3
rowkey-2 c1 c3
value-c1 value-c3
rowkey-3 c1 c2 c3
value-c1 value-c2 value-c3
c1 c2 c3
PartitionKey
17.11.16 Trivadis DOAG16: Cassandra
Dynamic Column Family – "Wide Row"
22
rowkey
BillionofRows rowkey-1 ckey-1:c1 ckey-1:c2
value-c1 value-c2
rowkey-2
rowkey-3
CREATE TABLE wide (rowkey text, ckey text,c1 text,c2 text,
PRIMARY KEY (rowkey, ckey) WITH CLUSTERING ORDER BY (ckey ASC);
ckey-2:c1 ckey-2:c2
value-c1 value-c2
ckey-3:c1 ckey-3:c2
value-c1 value-c2
ckey-1:c1 ckey-1:c2
value-c1 value-c2
ckey-2:c1 ckey-2:c2
value-c1 value-c2
ckey-1:c1 ckey-1:c2
value-c1 value-c2
ckey-2:c1 ckey-2:c2
value-c1 value-c2
ckey-3:c1 ckey-3:c2
value-c1 value-c2
1 2Billion
PartitionKey Clustering Key
17.11.16 Trivadis DOAG16: Cassandra
First Steps
17.11.16 Trivadis DOAG16: Cassandra23
Getting Cassandra
Apache Cassandra Distribution• http://cassandra.apache.org/
DataStax Distribution• DataStax Enterprice 5.0 - Sandbox - VM
https://academy.datastax.com/downloads/welcomeVM with Cassandra (1 node), DataStax DevCenter, DataStax OpsCenter
• Oracle Virtual Boxhttp://www.oracle.com/technetwork/server-storage/virtualbox/downloads/index.html
• Login – datastax/datastax
17.11.16 Trivadis DOAG16: Cassandra24
DataStax OpsCenter
• At-a-Glance Cluster Management
• Point-and-Click Provisioning andAdministration
• Secured Administration
• Always On Management and Monitoring
• Visual Monitoring and Tuning
• Best Practice Advice
• Proactive Assistance
• Smart Data Protection17.11.16 Trivadis DOAG16: Cassandra25
DataStax OpsCenter
Point-and-Click Provisioning and Administration
17.11.16 Trivadis DOAG16: Cassandra26
DataStax OpsCenter
Visual Monitoring and Tuning
17.11.16 Trivadis DOAG16: Cassandra27
CQL – Cassandra Query Language
17.11.16 Trivadis DOAG16: Cassandra28
Introducing CQL
• CQL is a reintroduction of schema so that you don't have to read code to understand the data model.
• CQL creates a common language so that details of the data model can be easily communicated.
• CQL is a best-practices Cassandra interface and hides the messy details.
17.11.16 Trivadis DOAG16: Cassandra29
CQL Language
• SQL like syntax
• Data Definition Language – DDL CREATE / ALTER / DROP / …
• Data Manipulation Language – DML INSERT, UPDATE, DELETE
• Query data with SELECT
• Build in Functions – COUNT, MIN, MAX, sum, avg, LIMIT, ...
• UDF – User Defined Function / UDA - User Defined Aggregate
17.11.16 Trivadis DOAG16: Cassandra30
CQL Shell for Apache Cassandra
cqlsh is the command line utility for execution CQL commands (think of SQL*Plus for Cassandra)
CQL3 is default since Cassandra 1.2
$ cqlshConnected to DataStaxCluster at localhost:9160.[cqlsh 4.1.0 | Cassandra 2.0.5.24 | CQL spec 3.1.1 | Thrift protocol 19.39.0]Use HELP for help.cqlsh>
17.11.16 Trivadis DOAG16: Cassandra31
CQL Shell for Apache Cassandra
$ cat create-table.cql | cqlsh
$ cqlsh –f create-table.cql
cqlsh> SOURCE '~/cassandra_training/cql/create-table.cql'
Execute a script with the –f option
Alternatively pie scripts into cqlsh
Source files inside cqlsh
17.11.16 Trivadis DOAG16: Cassandra32
Creating a Keyspace
Create a keyspace with SimpleStrategy and replication factor option
Make the new keyspace the active one
cqlsh> CREATE KEYSPACE my_spaceWITH REPLICATION = {'class':'SimpleStrategy',
'replication_factor':1};
cqlsh> USE my_space;
cqlsh:my_space>
17.11.16 Trivadis DOAG16: Cassandra33
Describing a Keyspace
Use the DESCRIBE KEYSPACE to show the metadata of the keyspace
cqlsh> DESCRIBE KEYSPACE my_space;
CREATE KEYSPACE training WITH replication = {'class': 'SimpleStrategy','replication_factor': '1'
};
cqlsh>
17.11.16 Trivadis DOAG16: Cassandra34
Create a Static table Dept
Use CREATE TABLE to create a static column family (table) named ”dept"
cqlsh:my_space> CREATE TABLE dept(deptno int,dname varchar,loc varchar,PRIMARY KEY (deptno));
dname loc
10 ACCOUNTING NEWYORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
17.11.16 Trivadis DOAG16: Cassandra35
Create a Dynamic table (wide-row) Employee
A Dynamic Table is also created with the CREATE TABLE statement but using a composite partition key cqlsh:training> CREATE TABLE emp(
empno int,ename varchar,…deptno int,primary key (dname,ename));
KING:empno ... CLARK:empno ...
10 7839 ... 7782 ...
JONES:empno ... SCOTT:empno ... FORD:empno ...
20 7566 ... 7788 ... 7902 ...
17.11.16 Trivadis DOAG16: Cassandra36
Truncate / Drop Table
Use TRUNCATE to truncate the data
Use DROP TABLE to drop the whole table, operation is irreversible and removes all information within the specified table!
• Will raise an error, if it does not exist, use IF EXISTS to prevent (new in 2.0):
cqlsh:training> TRUNCATE employee;
cqlsh:training> DROP TABLE employee;
cqlsh:training> DROP TABLE IF EXISTS employee;
17.11.16 Trivadis DOAG16: Cassandra37
Insert data into Dept
• PRIMARY KEY is always required• Insert with same primary key => update
cqlsh:training> INSERT INTO dept (deptno, dname, loc)VALUES (10, 'ACCOUNTING', 'NEW YORK');
17.11.16 Trivadis DOAG16: Cassandra38
Retrieving data from Dept table
SELECT statement returns rows and columns, just as in SQLIt can optionally also have a WHERE clause, an ORDER BY clause and a LIMIT clause
cqlsh:training> SELECT deptno, dname FROM dept LIMIT 2;
deptno | dname--------+------------
10 | ACCOUNTING30 | SALES
(2 rows)
17.11.16 Trivadis DOAG16: Cassandra39
Retrieving data from Dept table (II)
Restriction on column other than PRIMARY KEY won't work
Can be solved with an Index (but be careful, better use de-normalization)
cqlsh:my_space> SELECT * FROM dept WHERE loc = 'NEW YORK';InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh:my_space> CREATE INDEX ON dept(loc);cqlsh:my_space> SELECT * FROM dept WHERE loc = 'NEW YORK';deptno | dname | loc
--------+------------+----------10 | ACCOUNTING | NEW YORK
17.11.16 Trivadis DOAG16: Cassandra40
Update data in Dept
• WHERE over Primary Key• If Primary Key does not exist => INSERT
cqlsh:my_space> UPDATE dept SET loc = 'LOS ANGELES' WHERE deptno = 10;
17.11.16 Trivadis DOAG16: Cassandra41
Cassandra Data TypesCategory CQLDataType Description
String ascii US-ASCIIcharacterstring
text UTF-8encodedstring, usedmostofthetimefor
storingStringdata.
varchar UTF-8Strings.
inet Used forstoringIPaddresses
Numeric int 32-bitsignedinteger
float 32-bitIEEE-754floatingpoint
double 64-bitIEEE-754floatingpoint
varint Arbitraryprecision integers
bigint 64-bitnumber,equivalenttolong.
decimal Variable-precisiondecimal
counter Distributedcountervalue(64-bitlong)
17.11.16 Trivadis DOAG16: Cassandra42
Cassandra Data Types (II)
Category CQLDataType Description
UUIDs uuid AUUIDinstandardUUIDformat
timeuuid Type1UUIDonly,forstoringuniquetime-base IDs
Collections list Ordered collectionofoneormoreelements
map Collectionofarbitrary key-valuepairs
set Unorderedcollectionofoneormoreunique
elements
Miscellaneous boolean Boolean(true/false)
blob Usedfor storingbinarydatawritteninhexadecimal
timestamp Date/Time
17.11.16 Trivadis DOAG16: Cassandra43
Batch operation
• COMMIT ?• BEGIN BATCH … APPLY BATCH – execute multiple mutations – single operation
BEGIN BATCHINSERT INTO dept (deptno, dname, loc)VALUES (50, 'IT', 'ZURICH');UPDATE emp SET sal = 9000 WHERE empno = 9000;
APPLY BATCH;
17.11.16 Trivadis DOAG16: Cassandra44
Alter Table
• ALTER TABLE change meta data
• CQL is quickflexible schema, not changes to existing data
cqlsh:my_space> ALTER TABLE deptADD operational BOOLEAN;
cqlsh:training> DESCRIBE TABLE employee;
CREATE TABLE my_space.dept (deptno int PRIMARY KEY,dname text,loc text,operational boolean
) WITH...;
cqlsh:my_space> SELECT * FROM dept LIMIT 2;deptno | dname | loc | operational
--------+------------+-------------+-------------
50 | IT | ZURICH | null10 | ACCOUNTING | LOS ANGELES | null
(2 rows)
17.11.16 Trivadis DOAG16: Cassandra45
Collections
CQL3 also supports collections for storing complex data structures• Set {value,…}, List [value,…], Map {key:value,…}
cqlsh:training> CREATE TABLE collection_sample(id int PRIMARY KEY,
string_set set<text>,string_list list<text>,string_map map<text, text>);
cqlsh:training> INSERT INTO coll(id, string_set, string_list, string_map) VALUES (1,
{'text1','text2','text1'}, ['text1','text2','text1'], {'key1':'value1'});
17.11.16 Trivadis DOAG16: Cassandra46
Collections (II)
cqlsh:training> SELECT * FROM collection_sample;
id | string_list | string_map | string_set----+-----------------------------+--------------------+--------------------1 | ['text1', 'text2', 'text1'] | {'key1': 'value1'} | {'text1', 'text2'}
(1 rows)
17.11.16 Trivadis DOAG16: Cassandra47
UDF – User Defined Function
Sample CodeCREATE FUNCTION count_if_true(input boolean) RETURNS NULL ON NULL INPUT RETURNS intLANGUAGE java AS 'if (input) return 1; else return total;';
SELECT door_number, count_if_true(is_open)FROM my_doors;
„CREATE OR REPLACE” or “IF NOT EXSITS” Syntax possible
17.11.16 Trivadis DOAG16: Cassandra48
UDA – User Defined Aggregate
Sample CodeCREATE FUNCTION state_count_if_true(total int, input boolean) RETURNS NULL ON NULL INPUT RETURNS intLANGUAGE java AS 'if (input) return total+1; else return total;';
CREATE AGGREGATE total_open (boolean)SFUNC state_count_if_trueSTYPE intINITCOND 0;
SELECT door_number, total_open(is_open)FROM my_doors;
„CREATE OR REPLACE” or “IF NOT EXSITS” Syntax possible
17.11.16 Trivadis DOAG16: Cassandra49
Materialized Views
Relieve the pain of manual denormalization
cqlsh:training> CREATE MATERIALIZED VIEW employee_by_role (AS SELECT role, name, ageFROM employeeWHERE role IS NOT NULLPRIMARY KEY (role, name);
cqlsh:training> CREATE TABLE employee_by_role (role text, name text, age int,PRIMARY KEY (role, name));
Cassandra3.0
17.11.16 Trivadis DOAG16: Cassandra50
Time-to-Live (TTL) on Insert
• Insert a row with a TTL in seconds (30s) • after that the row is deleted
cqlsh:my_space> UPDATE emp USING ttl 15 SET sal = 8000 WHERE empno = 9000;
cqlsh:my_space> SELECT ename, sal, ttl(sal) FROM emp WHERE empno = 9000;
ename | sal | ttl(sal)-------+------+----------
null | 8000 | 15
(1 rows)
17.11.16 Trivadis DOAG16: Cassandra51
Summary
• Just great J• No single point of failure – Ring Model• Distribution over nodes / rack’s / data center’s• Tuneable Consistency• CQL• Spark / Cassandra Integration
• CQL limited
• Forget 20 years of experience in relational modelling L => DENORMALIZE J
17.11.16 Trivadis DOAG16: Cassandra52
References
• BooksCassandra: The Definite Guide, 2nd Edition
• Apache – Cassandra CQL Documentationhttps://cassandra.apache.org/doc/latest/cql/index.html
• DataStax – CQL Documentationhttp://docs.datastax.com/en/cql/3.3/cql/cql_using/useAboutCQL.html
• Netflix and Cassandrahttp://techblog.netflix.com/search/label/Cassandra
17.11.16 Trivadis DOAG16: Cassandra54
Trivadis @ DOAG 2016
Booth: 3rd Floor – next to the escalatorKnow how, T-Shirts, Contest and Trivadis Power to goWe look forward to your visitBecause with Trivadis you always win !
17.11.16 Trivadis DOAG16: Cassandra55