version 4svoboda/courses/2016... · datamodel columnvalues • emptyvalue null • atomicvalue...
TRANSCRIPT
B4M36DS2: Database Systems 2h p://www.ksi.mff.cuni.cz/˜svoboda/courses/2016-1-B4M36DS2/
Lecture 10
Column-Family Stores: CassandraMar n [email protected]
5. 12. 2016
Charles University in Prague, Faculty of Mathema cs and PhysicsCzech Technical University in Prague, Faculty of Electrical Engineering
Lecture OutlineColumn-family stores
• General introduc onApache Cassandra
• Data model• Cassandra query language
DDL statementsDML statements
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 2
Wide Column StoresData model
• Column family (table)Table is a collec on of similar rows (not necessarily iden cal)
• RowRow is a collec on of columns
– Should encompass a group of data that is accessed togetherAssociated with a unique row key
• ColumnColumn consists of a column name and column value(and possibly other metadata records)Scalar values, but also flat sets, lists or mapsmay be allowed
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 3
Wide Column StoresQuery pa erns
• Create, update or remove a row within a given column family• Select rows according to a row key or simple condi ons
Warning• Wide column stores are not just a special kind of RDBMSs
with a variable set of columns!
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 4
Wide Column StoresSuitable use cases
• Event logging, content management systems, blogs, …I.e. for structured flat data with similar schema
When not to use• ACID transac ons are required• Complex queries: aggrega on (SUM, AVG, …), joining, …• Early prototypes: i.e. when database design may change
Representa ves• Apache Cassandra, Apache HBase, Apache Accumulo,
Hypertable, Google Bigtable
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 5
Wide Column StoresRepresenta ves
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 6
Apache Cassandra
Apache CassandraColumn-family database
• h p://cassandra.apache.org/• Features
Open-source, high availability, linear scalability, sharding(spanning mul ple datacenters), peer-to-peer configurablereplica on, tunable consistency, MapReduce support
• Developed by Apache So ware Founda onOriginally at Facebook
• Implemented in Java• Opera ng systems: cross-pla orm• Ini al release in 2008
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 8
Data ModelDatabase system structure
Instance→ keyspaces→ tables→ rows→ columns
• Keyspace• Table (column family)
Collec on of (similar) rowsTable schema must be specified, yet can be modified later on
• RowCollec on of columnsRows in a table do not need to have the same columnsEach row is uniquely iden fied by a primary key
• ColumnName-value pair + addi onal data
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 9
Data ModelColumn values
• Empty valuenull
• Atomic valueNa ve data types such as texts, integers, dates, …Tuples
– Tuple of anonymous fields, each of any type (even different)User defined types (UDT)
– Set of named fields of any type
• Collec onsLists, sets, andmaps
– Nested tuples, UDTs, or collec ons are allowed, but currentlyonly in frozen mode (such elements are serialized when stored)
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 10
Data ModelCollec ons
• ListSorted collec on of non-unique valuesList elements are ordered by their posi onsNot always recommended because of performance issues
– Internal read-before-write opera ons have to be executed
• SetSorted collec on of unique values
• MapSorted collec on of key-value pairsMap elements are ordered by their keysKeys must be unique
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 11
Sample DataTable of actors
id
trojanname
( Ivan, Trojan )year1964
movies{ samotari, medvidek }
machacek
name( Jiří, Macháček )
year1966
movies{ medvidek, vratnelahve, samotari }
schneiderovaname
( Jitka, Schneiderová )year1973
movies{ samotari }
sverakname
( Zdeněk, Svěrák )year1936
movies{ vratnelahve }
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 12
Sample DataTable ofmovies
id
samotaritle
Samotářiyear2000
actorsnull
genres[ comedy, drama ]
medvidek
tleMedvídek
director( Jan, Hřebejk )
year2007
actors{ trojan: Ivan, machacek: Jirka }
vratnelahvetle
Vratné lahveyear2006
actors{ machacek: Robert Landa }
zelarytle
Želaryyear2003
actors{}
genres[ romance, drama ]
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 13
Data ModelAddi onal data associated with…
the whole column in case of atomic values, orevery element of a collec on
• Time-to-live (TTL)A er a certain amount of me (number of seconds)a given value is automa cally deleted
• Timestamp (write me)Timestamp of the last value modifica onAssigned automa cally or manually as well
• Both the records can be queriedUnfortunately not in case of collec ons and their elements
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 14
Cassandra APICQLSH
• Interac ve command line shell• bin/cqlsh• Uses CQL (Cassandra Query Language)
Client drivers• Provided by the community• Available for various languages
Java, Python, Ruby, PHP, C++, Scala, Erlang, …
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 15
Query LanguageCQL = Cassandra Query Language
• Declara ve query languageInspired by SQL
• DDL statementsCREATE KEYSPACE – creates a new keyspaceCREATE TABLE – creates a new table…
• DML statementsSELECT – selects and projects rows from a single tableINSERT – inserts rows into a tableUPDATE – updates columns of rows in a tableDELETE – removes rows from a table…
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 16
KeyspacesCREATE KEYSPACE
CREATECREATE KEYSPACEKEYSPACE
IFIF NOTNOT EXISTSEXISTS
keyspace namekeyspace name
WITHWITH option nameoption name == option valueoption value
• Creates a new keyspace• Replica on op on is mandatory
SimpleStrategy (one replica on factor)NetworkTopologyStrategy(individual replica on factor for each data center)
CREATE KEYSPACE moviedbWITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 17
KeyspacesUSE
• Changes the current keyspace
USEUSE keyspace namekeyspace name
DROP KEYSPACE• Removes a keyspace, all its tables, data etc.
DROPDROP KEYSPACEKEYSPACE keyspace namekeyspace name
IFIF EXISTSEXISTS
ALTER KEYSPACE• Modifies op ons of an exis ng keyspace
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 18
TablesCREATE TABLE
• Creates a new table within the current keyspace• Each table must have exactly one primary key specified
CREATECREATE TABLETABLE
IFIF NOTNOT EXISTSEXISTS
table nametable name ((
column namecolumn name typetype
PRIMARYPRIMARY KEYKEY
,,
,, primary keyprimary key
))
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 19
TablesExamples – tables for actors and movies
CREATE TABLE actors (id TEXT PRIMARY KEY,name TUPLE<TEXT, TEXT>,year SMALLINT,movies SET<TEXT>
)
CREATE TABLE movies (id TEXT,title TEXT,director TUPLE<TEXT, TEXT>,year SMALLINT,actors MAP<TEXT, TEXT>,genres LIST<TEXT>,countries SET<TEXT>,PRIMARY KEY (id)
)
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 20
TablesPrimary key has two parts:
• Compulsory par on keySingle column or mul ple columnsDescribes how table rows are distributed among par ons
• Op onal clustering columnsDefines the clustering order,i.e. how table rows are locally stored within a par on
PRIMARYPRIMARY KEYKEY ((
column namecolumn name
(( column namecolumn name
,,
)),, column namecolumn name
))
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 21
TablesDROP TABLE
• Removes a table together with all data it contains
DROPDROP TABLETABLE table nametable name
IFIF EXISTSEXISTS
TRUNCATE TABLE• Preserves a table but removes all data it contains
TRUNCATETRUNCATE
TABLETABLE
table nametable name
ALTER TABLE• Allows to alter, add or drop table columns
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 22
TypesTypes of columns
• Na ve types• Tuples• Collec on types: lists, sets, andmaps• User-defined types
native typenative type
TUPLETUPLE << typetype
,,
>>
LISTLIST << typetype >>
SETSET << typetype >>
MAPMAP << typetype ,, typetype >>
......
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 23
TypesNa ve types
• tinyint, smallint, int, bigintSigned integers (1B, 2B, 4B, 8B)
• varintArbitrary-precision integer
• decimalVariable-precision decimal
• float, doubleFloa ng point numbers (4B, 8B)
• booleanBoolean values true and false
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 24
TypesNa ve types
• text, varcharUTF8 encoded stringEnclosed in single quotes (not double quotes)
– Escaping sequence: ''
• asciiASCII encoded string
• date, time, timestampDates, mes and mestampsE.g. '2016-12-05', '2016-12-05 09:15:00', 1480929300
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 25
TypesNa ve types
• counter – 8B signed integerOnly 2 opera ons supported: incremen ng and decremen ng
– I.e. value of a counter cannot be set to a par cular numberRestric ons in usage
– Counters cannot be a part of a primary key– Either all table columns (outside the primary key) are counters,
or none of them– TTL is not supported– …
• blob – arbitrary bytes• inet – IP address (both IPv4 and IPv6)• …
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 26
LiteralsTuple and collec on literals
• Literals for tuples, lists, sets, andmaps
((
termterm
,,
))
[[
termterm
,,
]]
{{
termterm
,,
}}
{{
termterm :: termterm
,,
}}
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 27
Selec onSELECT statement
• Selects matching rows from a single table
SELECT clauseSELECT clause FROM clauseFROM clause
WHERE clauseWHERE clause
GROUP BY clauseGROUP BY clause ORDER BY clauseORDER BY clause LIMIT clauseLIMIT clause
ALLOWALLOW FILTERINGFILTERING
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 28
Selec onClauses of SELECT statements
• SELECT – columns or values to appear in the result• FROM – single table to be queried• WHERE – filtering condi ons to be applied on table rows• GROUP BY – columns used for grouping of rows• ORDER BY – criteria defining the order of rows in the result• LIMIT – number of rows to be included in the result
ExampleSELECT id, title, actorsFROM moviesWHERE year = 2000 AND genres CONTAINS 'comedy'
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 29
Selec onFROM clause
• Defines a single table to be queriedFrom the current / specified keyspace
• I.e. joining of mul ple tables is not possible
FROMFROM
keyspace namekeyspace name ..
table nametable name
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 30
Selec onWHERE clause
• One or more rela ons a row must sa sfyin order to be included in the query result
WHEREWHERE relationrelation
ANDAND
• Only simple condi ons can be expressed andnot all rela ons are allowed, e.g.:
only primary key columns can be involvedunless secondary index structures existnon-equal rela ons on par on keys are not supported…
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 31
Selec onWHERE clause: rela ons
column namecolumn name
(( column namecolumn name
,,
))
==
!=!=
<<
<=<=
=>=>
>>
termterm
ININ (( termterm
,,
))
CONTAINSCONTAINS
CONTAINSCONTAINS KEYKEY
termterm
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 32
Selec onWHERE clause: rela ons
• Comparisons=, !=, <, <=, =>, >
• INReturns true if the actual value is one of the enumerated
• CONTAINSMay only be used on collec ons (lists, sets, and maps)Returns true if a collec on contains a given element
• CONTAINS KEYMay only be used on mapsReturns true is a map contains a given key
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 33
Selec onSELECT clause
• Defines columns or values to be included in the result* = all the table columnsAliases can be defined using AS
SELECTSELECT
DISTINCTDISTINCT
**
selectorselector
ASAS identifieridentifier
,,
• DISTINCT – duplicate rows are removed
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 34
Selec onSELECT clause: selectors
column namecolumn name
termterm
COUNTCOUNT (( ** ))
WRITETIMEWRITETIME (( column namecolumn name ))
TTLTTL (( column namecolumn name ))
• COUNT(*)Number of all the rows in a group (see aggrega on)
• WRITETIME and TTLSelects mestamp / remaining me-to-live of a given columnCannot be used on collec ons and their elementsCannot be used in other clauses (e.g. WHERE)
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 35
Selec onORDER BY clause
• Defines the order of rows returned in the query result• Only orderings induced by clustering columns are allowed!
ORDERORDER BYBY column namecolumn name
ASCASC
DESCDESC
,,
LIMIT clause• Limits the number of rows returned in the query result
LIMITLIMIT integerinteger
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 36
Selec onGROUP BY clause
• Groups rows of a table according to certain columns• Only groupings induced by primary key columns are allowed!
GROUPGROUP BYBY column namecolumn name
,,
• When a non-grouping column is selected without anaggregate func on, the first value encounter is alwaysreturned
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 37
Selec onGROUP BY clause: aggregates
• Na ve aggregatesCOUNT(column)
– Number of all the values in a given column– null values are ignored
MIN(column), MAX(column)– Minimal / maximal value in a given column
SUM(column)– Sum of all the values of a given column
AVG(column)– Average of all the values of a given column
• User-defined aggregates
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 38
Selec onALLOW FILTERINGmodifier
• By default, only non-filtering queries are allowedI.e. queries wherethe number of rows read∼ the number of rows returnedSuch queries have predictable performance
– They will execute in a me that is propor onalto the amount of data returned
• ALLOW FILTERING enables (some) filtering queries
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 39
Inser onsINSERT statement
• Inserts a new row into a given tableWhen a row with a given primary key already exists,it is updated
• At least primary key columns must be specified… and they have to always be explicitly enumerated
INSERTINSERT INTOINTO
keyspace namekeyspace name ..
table nametable name
(( column namecolumn name
,,
)) VALUESVALUES (( termterm
,,
))
IFIF NOTNOT EXISTSEXISTS USINGUSING update parametersupdate parameters
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 40
Inser onsExample
INSERT INTO movies (id, title, director, year, actors, genres)VALUES (
'stesti','Štěstí',('Bohdan', 'Sláma'),2005,{ 'vilhelmova': 'Monika', 'liska': 'Toník' },[ 'comedy', 'drama' ]
)USING TTL 86400
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 41
Inser ons and UpdatesUpdate parameters
• TTL: me-to-live0 or null or simply missing for persistent values
• TIMESTAMP: write me
TIMESTAMPTIMESTAMP
TTLTTL
integerinteger
ANDAND
• Only newly inserted / updated values are really affected
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 42
UpdatesUPDATE statement
• Updates exis ng rows within a given tableWhen a row with a given primary key does not yet exist,it is inserted
• All primary key columns must be specified in theWHERE clause
UPDATEUPDATE
keyspace namekeyspace name ..
table nametable name
USINGUSING update parametersupdate parameters
SETSET assignmentassignment
,,
WHERE clauseWHERE clause
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 43
UpdatesUPDATE statement: assignments
• Describe modifica ons to be applied• Allowed assignments:
Value of a whole column is replacedValue of a list or map element is replacedValue of an UDT field is replaced
column namecolumn name
column namecolumn name [[ termterm ]]
column namecolumn name .. field namefield name
== termterm
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 44
UpdatesExamples
UPDATE moviesSET
year = 2006,director = ('Jan', 'Svěrák'),actors = { 'machacek': 'Robert Landa', 'sverak': 'Josef Tkaloun' },genres = [ 'comedy' ],countries = { 'CZ' }
WHERE id = 'vratnelahve'
UPDATE moviesSET
actors = actors + { 'vilhelmova': 'Helenka' },genres = [ 'drama' ] + genres,countries = countries + { 'SK' }
WHERE id = 'vratnelahve'
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 45
UpdatesExamples
UPDATE moviesSET
actors = actors - { 'vilhelmova', 'landovsky' },genres = genres - [ 'drama', 'sci-fi' ],countries = countries - { 'SK' }
WHERE id = 'vratnelahve'
UPDATE moviesSET
actors['vilhelmova'] = 'Helenka',genres[1] = 'comedy'
WHERE id = 'vratnelahve'
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 46
Dele onsDELETE statement
• Removes exis ng rows / columns / collec on elementsfrom a given table
DELETEDELETE
column namecolumn name
column namecolumn name [[ termterm ]]
column namecolumn name .. field namefield name
,,
FROM clauseFROM clause WHERE clauseWHERE clause
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 47
Lecture ConclusionCassandra
• Column-family storeCassandra query language
• DDL statements• DML statements
SELECT, INSERT, UPDATE, DELETE
B4M36DS2: Database Systems 2 | Lecture 10: Column-Family Stores: Cassandra | 5. 12. 2016 48