learn to use stratio crossdata
DESCRIPTION
This is achieved thanks to its generic architecture and the definition of a custom SQL-like language. Our language augments the classical SQL data manipulation language in order to add support for streaming queries. From the point of view of the user, a common logical view of the existing catalogs and datastores is presented independently of which cluster or technology stores a particular table. Supporting multiple architectures imposes two main challenges: how to normalize the access to the datastores, and how to cope with datastore limitations. In order to be able to access multiple datastore technologies Crossdata defines a common unifying interface containing a basic set of operations that a datastore may support. New connectors can be easily added to Crossdata to increase its connectivityTRANSCRIPT
Let’s think about how we approach projects
3
4
5
6
7
8
9
10
Don’t worry
12
o A new technology that: • Is not limited by the underlying datastore capabili9es
• Leverages Spark to perform non-‐na9vely supported opera9ons
• Supports batch and streaming queries
• Supports mul9ple clusters and technologies
Crossdata
13 #BDS14
Learn to use Stratio Crossdata Developing your first connector
14
Daniel Higuero [email protected] @dhiguero
Alvaro Agea [email protected] @alvaroagea
#BDS14
• Crossdata architecture • Crossdata Connectors • Contest
Agenda
15 #BDS14
Our architecture
16 #BDS14
o Crossdata defines an IConnector extension interface o User can easily add new connectors to support • Different datastores • Different processing engines • Different versions
o Where each connector defines its capabili9es
Connecting to the outside world
17
Our planner will choose the best connector for each query
#BDS14
Query execution
18
Parsing Valida6on Planning Execu6on datastore
Connector1
Connector2
Connector3
Our planner will choose the best connector for each query
#BDS14
o Stra9o Crossdata offers the possibility of accessing a single catalog across a set of datastores.
• Mul9ple clusters can coexist to op9mize plaOorm performance
§ E.g., produc9on cluster, test cluster, write-‐op9mized cluster, read-‐op9mized cluster, etc.
• A table is saved in a unique datastore
Multi-cluster support
19 #BDS14
Logical and physical mapping
20
C* Produc6on M development Other datastores
App catalog
Users table Test table old_users table
SELECT * FROM app.users;
#BDS14
Metadata Management
21
o Some datastores are schemaless but our applica9ons are not!
• Flexible schemas vs Schemaless
• Crossdata provides a Metadata manager that stores schemas for any datasource
§ Remember ODBC and those BI tools
Metadata in the era of Schemaless NoSQL datastores
?
101001010101010101101010101111010001111001000
22 #BDS14
Metadata management
23
C* produc6on
Connector
Infinispan
Metadata Store
Metadata Manager
2
Updated metadata informa6on is
maintained among Crossdata servers using Infinispan
If the connector does not support metadata opera6ons those are
skipped 2 1
#BDS14
ODBC/JDBC
24
o Well-‐known interface standard (for BI tools, external apps, …)
o We have implemented it using Simba SDK
o It opens the full poten9al of Stra9o Crossdata to the external world
o Currently tested with Tableau, Qlikview and MS Excel
Stratio Crossdata ODBC/JDBC
25
One ODBC/JDBC for all datastores!
#BDS14
Connectors
26
o The Crossdata core is abstracted of the inner workings of each Connectors.
• Common IConnector interface.
• Use of XML manifest to define datastore and connector capabilies
o Each connector may access different datastores
o Each connector supports many clusters of the same datastore technology
Crossdata Connectors
27 #BDS14
Crossdata connectors
28 #BDS14
Connector0
Datastore 0
Crossdata Server
Connector1 Connector2
Datastore 1 Datastore 2
ConnectorApp ConnectorApp ConnectorApp
o The ConnectorApp abstracts the communica9on with the server using AKKA.
• No worries about transferring data, status, etc.
• It takes an implementa9on of IConnector and launches the required actors
Connector interface
29 #BDS14
IConnectorImpl
Datastore
ConnectorApp
IConnector
IConnector
30 #BDS14
public interface IConnector { /** * Get the name of the connector. * @return A name. */
String getConnectorName(); /** * Get the names of the datastores supported by the connector. * Several connectors may declare the same datastore name. * @return The names. */
String[] getDatastoreName();
IConnector (II)
31 #BDS14
/** * Initialize the connector service. * @param configuration The configuration. * @throws InitializationException If the connector initialization fails. */ void init(IConfiguration configuration) throws InitializationException; /** * Connect to a datastore using a set of options. * @param credentials The required credentials * @param config The cluster configuration. * @throws ConnectionException If the connection could not be established. */ void connect(ICredentials credentials,
ConnectorClusterConfig config) throws ConnectionException;
IConnector (III)
32 #BDS14
/** * Get the storage engine. * ... */ IStorageEngine getStorageEngine() throws UnsupportedException; /** * Get the query engine. * ... */ IQueryEngine getQueryEngine() throws UnsupportedException; /** * Get the metadata engine ... */ IMetadataEngine getMetadataEngine() throws UnsupportedException;
o Defines opera9ons related with the metadata management
IMetadataEngine
33 #BDS14
createCatalog(ClusterName, CatalogMetadata) dropCatalog(ClusterName, CatalogName) createTable(ClusterName, TableMetadata) alterTable(ClusterName, TableName, AlterOptions) dropTable(ClusterName, TableName) createIndex(ClusterName, IndexMetadata) dropIndex(ClusterName, IndexName)
o Defines opera9ons related with wri9ng data
IStorageEngine
34 #BDS14
insert(ClusterName, TableMetadata, Row) insert(ClusterName, TableMetadata, Collection<Row>) delete(ClusterName, TableName, Collection<Filter>) update(ClusterName, TableName, Collection<Relation>, Collection<Filter>) truncate(ClusterName, TableName)
o Defines opera9ons related to querying data
IQueryEngine
35 #BDS14
execute(LogicalWorkflow) asyncExecute(String, LogicalWorkflow, IResultHandler) stop(String)
o Graph representa9on of a query, composed of two types of logical steps
§ Transforma9on step: one input, one output
§ Union step: n inputs, one output
Logical workflows
36 #BDS14
o Consider the following query
Building a Logical Workflow
37 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
Parsing Valida6on Planning
Building a Logical Workflow - Project
38 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
Iden9fy tables and required fields
Building a Logical Workflow - Project
39 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
For each table, retrieve all columns that are involved in the query
Building a Logical Workflow - Project
40 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
For each table, retrieve all columns that are involved in the query
Building a Logical Workflow - Project
41 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
Project TWEET(id, user, hashtag)
Build a Project logical step per table
Building a Logical Workflow - Project
42 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
Project TWEET(id, user, hashtag)
Building a Logical Workflow - Project
43 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
Project TWEET(id, user, hashtag)
Project MENTIONS(user, counter)
Building a Logical Workflow - Filters
44 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET(id, user, hashtag)
P MENTIONS(user, counter)
Next, we add filtering steps ASAP
Building a Logical Workflow - Filters
45 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET(id, user, hashtag)
P MENTIONS(user, counter)
Filter (hashtag = ‘bds14’)
Building a Logical Workflow - Filters
46 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET(id, user, hashtag)
P MENTIONS(user, counter) Filter (counter > 100)
Filter (hashtag = ‘bds14’)
Building a Logical Workflow - Window
47 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter (hashtag = ‘bds14’)
Building a Logical Workflow - Window
48 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter (hashtag = ‘bds14’) Window (2 min)
Building a Logical Workflow - Join
49 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter Window (2 m)
Building a Logical Workflow - Join
50 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter Window (2 m) Join m.user = t.user
Building a Logical Workflow – Order By
51 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter Window (2 m) Join
Building a Logical Workflow – Order By
52 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter Window (2 m) Join GroupBy
(m.counter)
Building a Logical Workflow – Limit
53 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter Window (2 m) Join GroupBy
(m.counter)
Building a Logical Workflow – Limit
54 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter (counter > 100)
Filter Window (2 m) Join GroupBy
(m.counter) Limit 100
Building a Logical Workflow – Select
55 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter
Filter Window Join GroupBy Limit
100
Building a Logical Workflow – Select
56 #BDS14
SELECT tweet.id, tweet.user FROM transactions WITH WINDOW 2 minutes JOIN mentions ON mentions.user = tweet.user WHERE mentions.counter > 100 AND tweet.hashtag = ‘#bds14’ ORDER BY mentions.counter LIMIT 100
P TWEET
P MENTIONS Filter
Filter Window Join GroupBy Limit
100 Select id, user
IQueryEngine.execute()
o Na9ve • Cassandra • MongoDB
• Aerospike • Elas9cSearch • Stra9o Streaming
o based
• Cassandra • MongoDB
• Aerospike • HDFS
Existing connectors
57 #BDS14
Demo - IRC
58
Crossdata Connector Challenge
59
Crossdata Connector Challenge
60 #BDS14
Deadline: February, 2nd 2015
Crossdata Connector Challenge
61 #BDS14
Crossdata Connector Challenge
62 #BDS14
stra9o.github.io/crossdata stra9o.github.io/crossdata/contest
crossdata.atlassian.net
More information
63 #BDS14
Learn to use Stratio Crossdata Developing your first connector
64
Daniel Higuero [email protected] @dhiguero
Alvaro Agea [email protected] @alvaroagea
#BDS14