learn to use stratio crossdata

64
Let’s think about how we approach projects

Upload: alvaro-agea-herradon

Post on 02-Jul-2015

344 views

Category:

Software


0 download

DESCRIPTION

This is achieved thanks to its generic architecture and the definition of a custom SQL-like language. Our language augments the classical SQL data manipulation language in order to add support for streaming queries. From the point of view of the user, a common logical view of the existing catalogs and datastores is presented independently of which cluster or technology stores a particular table. Supporting multiple architectures imposes two main challenges: how to normalize the access to the datastores, and how to cope with datastore limitations. In order to be able to access multiple datastore technologies Crossdata defines a common unifying interface containing a basic set of operations that a datastore may support. New connectors can be easily added to Crossdata to increase its connectivity

TRANSCRIPT

Page 1: Learn to use Stratio Crossdata

Let’s think about how we approach projects

Page 2: Learn to use Stratio Crossdata
Page 3: Learn to use Stratio Crossdata

3  

Page 4: Learn to use Stratio Crossdata

4  

Page 5: Learn to use Stratio Crossdata

5  

Page 6: Learn to use Stratio Crossdata

6  

Page 7: Learn to use Stratio Crossdata

7  

Page 8: Learn to use Stratio Crossdata

8  

Page 9: Learn to use Stratio Crossdata

9  

Page 10: Learn to use Stratio Crossdata

10  

Page 11: Learn to use Stratio Crossdata

Don’t worry

Page 12: Learn to use Stratio Crossdata

12  

Page 13: Learn to use Stratio Crossdata

o  A  new  technology  that: •  Is  not  limited  by  the  underlying  datastore  capabili9es

•  Leverages  Spark  to  perform  non-­‐na9vely  supported  opera9ons

•  Supports  batch  and  streaming  queries

•  Supports  mul9ple  clusters  and  technologies

Crossdata

13  #BDS14

Page 14: Learn to use Stratio Crossdata

Learn to use Stratio Crossdata Developing your first connector

14  

Daniel Higuero [email protected] @dhiguero

Alvaro Agea [email protected] @alvaroagea

#BDS14

Page 15: Learn to use Stratio Crossdata

•  Crossdata architecture •  Crossdata Connectors •  Contest

Agenda

15  #BDS14

Page 16: Learn to use Stratio Crossdata

Our architecture

16  #BDS14

Page 17: Learn to use Stratio Crossdata

o  Crossdata  defines  an  IConnector  extension  interface o  User  can  easily  add  new  connectors  to  support •  Different  datastores •  Different  processing  engines •  Different  versions

o  Where  each  connector  defines  its  capabili9es

Connecting to the outside world

17  

Our planner will choose the best connector for each query

#BDS14

Page 18: Learn to use Stratio Crossdata

Query execution

18  

Parsing   Valida6on   Planning   Execu6on  datastore  

Connector1  

Connector2  

Connector3  

Our planner will choose the best connector for each query

#BDS14

Page 19: Learn to use Stratio Crossdata

o  Stra9o  Crossdata  offers  the  possibility  of  accessing  a  single  catalog  across  a  set  of  datastores.  

•  Mul9ple  clusters  can  coexist  to  op9mize  plaOorm  performance

§  E.g.,  produc9on  cluster,  test  cluster,  write-­‐op9mized  cluster,  read-­‐op9mized  cluster,  etc.  

•  A  table  is  saved  in  a  unique  datastore

Multi-cluster support

19  #BDS14

Page 20: Learn to use Stratio Crossdata

Logical and physical mapping

20  

C*  Produc6on   M  development   Other  datastores  

App  catalog  

Users  table   Test  table   old_users  table  

SELECT  *  FROM  app.users;  

#BDS14

Page 21: Learn to use Stratio Crossdata

Metadata Management

21  

Page 22: Learn to use Stratio Crossdata

o  Some  datastores  are  schemaless  but  our  applica9ons  are  not!  

•  Flexible  schemas  vs  Schemaless

•  Crossdata  provides  a  Metadata  manager  that  stores  schemas  for  any  datasource

§  Remember  ODBC  and  those  BI  tools

Metadata in the era of Schemaless NoSQL datastores

?    

101001010101010101101010101111010001111001000  

22  #BDS14

Page 23: Learn to use Stratio Crossdata

Metadata management

23  

C*  produc6on  

Connector  

Infinispan  

Metadata  Store  

Metadata  Manager  

2  

Updated  metadata  informa6on  is  

maintained  among  Crossdata  servers  using  Infinispan  

If  the  connector  does  not  support  metadata  opera6ons  those  are  

skipped   2  1  

#BDS14

Page 24: Learn to use Stratio Crossdata

ODBC/JDBC

24  

Page 25: Learn to use Stratio Crossdata

o  Well-­‐known  interface  standard  (for  BI  tools,  external  apps,  …)

o  We  have  implemented  it  using  Simba  SDK

o  It  opens  the  full  poten9al  of  Stra9o  Crossdata  to  the  external  world

o  Currently  tested  with  Tableau,  Qlikview  and  MS  Excel

Stratio Crossdata ODBC/JDBC

25  

One ODBC/JDBC for all datastores!

#BDS14

Page 26: Learn to use Stratio Crossdata

Connectors

26  

Page 27: Learn to use Stratio Crossdata

o  The  Crossdata  core  is  abstracted  of  the  inner  workings  of  each  Connectors.

•  Common  IConnector  interface.

•  Use  of  XML  manifest  to  define  datastore  and  connector  capabilies

o  Each  connector  may  access  different  datastores

o  Each  connector  supports  many  clusters  of  the  same  datastore  technology

Crossdata Connectors

27  #BDS14

Page 28: Learn to use Stratio Crossdata

Crossdata connectors

28  #BDS14

Connector0  

Datastore  0  

Crossdata  Server  

Connector1   Connector2  

Datastore  1   Datastore  2  

ConnectorApp   ConnectorApp   ConnectorApp  

Page 29: Learn to use Stratio Crossdata

o  The  ConnectorApp  abstracts  the  communica9on  with  the  server  using  AKKA.

•  No  worries  about  transferring  data,  status,  etc.

•  It  takes  an  implementa9on  of  IConnector  and  launches  the  required  actors

Connector interface

29  #BDS14

IConnectorImpl  

Datastore  

ConnectorApp  

IConnector  

Page 30: Learn to use Stratio Crossdata

IConnector

30  #BDS14

public  interface  IConnector  {            /**            *  Get  the  name  of  the  connector.            *  @return  A  name.            */  

       String  getConnectorName();            /**            *  Get  the  names  of  the  datastores  supported  by  the  connector.            *  Several  connectors  may  declare  the  same  datastore  name.            *  @return  The  names.            */  

       String[]  getDatastoreName();  

Page 31: Learn to use Stratio Crossdata

IConnector (II)

31  #BDS14

       /**            *  Initialize  the  connector  service.            *  @param  configuration  The  configuration.            *  @throws  InitializationException  If  the  connector  initialization  fails.            */          void  init(IConfiguration  configuration)              throws  InitializationException;            /**            *  Connect  to  a  datastore  using  a  set  of  options.            *  @param  credentials  The  required  credentials            *  @param  config            The  cluster  configuration.            *  @throws  ConnectionException  If  the  connection  could  not  be  established.            */          void  connect(ICredentials  credentials,  

   ConnectorClusterConfig  config)  throws  ConnectionException;  

Page 32: Learn to use Stratio Crossdata

IConnector (III)

32  #BDS14

       /**            *  Get  the  storage  engine.            *  ...            */          IStorageEngine  getStorageEngine()  throws  UnsupportedException;            /**            *  Get  the  query  engine.            *  ...            */          IQueryEngine  getQueryEngine()  throws  UnsupportedException;            /**            *  Get  the  metadata  engine  ...            */          IMetadataEngine  getMetadataEngine()  throws  UnsupportedException;  

Page 33: Learn to use Stratio Crossdata

o  Defines  opera9ons  related  with  the  metadata  management

IMetadataEngine

33  #BDS14

createCatalog(ClusterName,  CatalogMetadata)      dropCatalog(ClusterName,  CatalogName)      createTable(ClusterName,  TableMetadata)        alterTable(ClusterName,  TableName,  AlterOptions)          dropTable(ClusterName,  TableName)      createIndex(ClusterName,  IndexMetadata)          dropIndex(ClusterName,  IndexName)  

Page 34: Learn to use Stratio Crossdata

o  Defines  opera9ons  related  with  wri9ng  data

IStorageEngine

34  #BDS14

   insert(ClusterName,  TableMetadata,  Row)      insert(ClusterName,  TableMetadata,  Collection<Row>)      delete(ClusterName,  TableName,  Collection<Filter>)      update(ClusterName,  TableName,                          Collection<Relation>,  Collection<Filter>)  truncate(ClusterName,  TableName)        

Page 35: Learn to use Stratio Crossdata

o  Defines  opera9ons  related  to  querying  data

IQueryEngine

35  #BDS14

             execute(LogicalWorkflow)      asyncExecute(String,  LogicalWorkflow,  IResultHandler)                      stop(String)            

Page 36: Learn to use Stratio Crossdata

o  Graph  representa9on  of  a  query,  composed  of  two  types  of  logical  steps

§  Transforma9on  step:  one  input,  one  output

§  Union  step:  n  inputs,  one  output

Logical workflows

36  #BDS14

Page 37: Learn to use Stratio Crossdata

o  Consider  the  following  query

Building a Logical Workflow

37  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

Parsing   Valida6on   Planning  

Page 38: Learn to use Stratio Crossdata

Building a Logical Workflow - Project

38  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

Iden9fy  tables  and  required  fields

Page 39: Learn to use Stratio Crossdata

Building a Logical Workflow - Project

39  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

For  each  table,  retrieve  all  columns  that  are  involved  in  the  query

Page 40: Learn to use Stratio Crossdata

Building a Logical Workflow - Project

40  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

For  each  table,  retrieve  all  columns  that  are  involved  in  the  query

Page 41: Learn to use Stratio Crossdata

Building a Logical Workflow - Project

41  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

Project  TWEET(id,  user,  hashtag)  

Build  a  Project  logical  step  per  table

Page 42: Learn to use Stratio Crossdata

Building a Logical Workflow - Project

42  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

Project  TWEET(id,  user,  hashtag)  

Page 43: Learn to use Stratio Crossdata

Building a Logical Workflow - Project

43  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

Project  TWEET(id,  user,  hashtag)  

Project  MENTIONS(user,  counter)  

Page 44: Learn to use Stratio Crossdata

Building a Logical Workflow - Filters

44  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET(id,  user,  hashtag)  

P  MENTIONS(user,  counter)  

Next,  we  add  filtering  steps  ASAP

Page 45: Learn to use Stratio Crossdata

Building a Logical Workflow - Filters

45  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET(id,  user,  hashtag)  

P  MENTIONS(user,  counter)  

Filter  (hashtag  =  ‘bds14’)  

Page 46: Learn to use Stratio Crossdata

Building a Logical Workflow - Filters

46  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET(id,  user,  hashtag)  

P  MENTIONS(user,  counter)   Filter  (counter  >  100)  

Filter  (hashtag  =  ‘bds14’)  

Page 47: Learn to use Stratio Crossdata

Building a Logical Workflow - Window

47  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter  (hashtag  =  ‘bds14’)  

Page 48: Learn to use Stratio Crossdata

Building a Logical Workflow - Window

48  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter  (hashtag  =  ‘bds14’)   Window  (2  min)  

Page 49: Learn to use Stratio Crossdata

Building a Logical Workflow - Join

49  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter   Window  (2  m)  

Page 50: Learn to use Stratio Crossdata

Building a Logical Workflow - Join

50  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter   Window  (2  m)   Join  m.user  =  t.user  

Page 51: Learn to use Stratio Crossdata

Building a Logical Workflow – Order By

51  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter   Window  (2  m)  Join  

Page 52: Learn to use Stratio Crossdata

Building a Logical Workflow – Order By

52  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter   Window  (2  m)  Join   GroupBy  

(m.counter)  

Page 53: Learn to use Stratio Crossdata

Building a Logical Workflow – Limit

53  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter   Window  (2  m)  Join   GroupBy  

(m.counter)  

Page 54: Learn to use Stratio Crossdata

Building a Logical Workflow – Limit

54  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  (counter  >  100)  

Filter   Window  (2  m)  Join   GroupBy  

(m.counter)  Limit  100  

Page 55: Learn to use Stratio Crossdata

Building a Logical Workflow – Select

55  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  

Filter   Window  Join   GroupBy   Limit  

100  

Page 56: Learn to use Stratio Crossdata

Building a Logical Workflow – Select

56  #BDS14

SELECT  tweet.id,  tweet.user  FROM  transactions      WITH  WINDOW  2  minutes      JOIN  mentions  ON  mentions.user  =  tweet.user      WHERE  mentions.counter  >  100  AND  tweet.hashtag  =  ‘#bds14’      ORDER  BY  mentions.counter      LIMIT  100                          

P  TWEET  

P  MENTIONS   Filter  

Filter   Window  Join   GroupBy   Limit  

100  Select  id,  user  

IQueryEngine.execute()

Page 57: Learn to use Stratio Crossdata

o  Na9ve •  Cassandra •  MongoDB

•  Aerospike •  Elas9cSearch •  Stra9o  Streaming

o                         based  

•  Cassandra •  MongoDB

•  Aerospike •  HDFS

Existing connectors

57  #BDS14

Page 58: Learn to use Stratio Crossdata

Demo - IRC

58  

Page 59: Learn to use Stratio Crossdata

Crossdata Connector Challenge

59  

Page 60: Learn to use Stratio Crossdata

Crossdata Connector Challenge

60  #BDS14

Deadline: February, 2nd 2015

Page 61: Learn to use Stratio Crossdata

Crossdata Connector Challenge

61  #BDS14

Page 62: Learn to use Stratio Crossdata

Crossdata Connector Challenge

62  #BDS14

Page 63: Learn to use Stratio Crossdata

stra9o.github.io/crossdata stra9o.github.io/crossdata/contest

crossdata.atlassian.net

More information

63  #BDS14

Page 64: Learn to use Stratio Crossdata

Learn to use Stratio Crossdata Developing your first connector

64  

Daniel Higuero [email protected] @dhiguero

Alvaro Agea [email protected] @alvaroagea

#BDS14