Sébastien Jelsch
London, 7-11-2015
Big Data MDX with Mondrian and Apache Kylin
Agenda
▪ OLAP-on-Hadoop with Apache Kylin
▪ Features
▪ Apache Kylin & Mondrian
▪ Conclusion & Discussion
Big Data
Situation
▪ More and more data becoming available on Hadoop
▪ Limitations in existing Business Intelligence Tools
  ○ Limited support for Hadoop
  ○ Data size growing exponentially
  ○ High latency of interactive queries
▪ Challenges to adapt Hadoop for interactive analysis
  ○ OLAP capability on Hadoop ecosystem not ready yet
OLAP and Big Data
Goals
▪ Full OLAP capability and advanced functionality
▪ Interactive analysis with sub-second latency
▪ ANSI SQL or MDX for analysts and engineers
▪ Seamless integration with BI Tools
▪ High concurrency with thousands of end users
▪ Distributed, scale-out architecture for large data volumes
What is Apache Kylin?
Solution: Apache Kylin
Extreme OLAP Engine for Big Data
▪ Distributed Analytics Engine from eBay
▪ OLAP-on-Hadoop
▪ Provides SQL interface for multidimensional analysis
▪ Based on Hadoop ecosystem
Open-sourced: 1 October 2014
Accepted into incubation: 25 November 2014
Current version: 1.1 (25 October 2015)
Short introduction into OLAP
[Figure: OLAP Cube with three axes labeled Beer / Water / Wine, Berlin / Paris / London, and 2013 / 2014 / 2015. Visible cell values on the front face: 8, 7, 14 / 12, 22, 19 / 30, 15, 25.]
Apache Kylin: Architecture
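[Figure: Apache Kylin architecture. 3rd party apps, the web app, and BI tools send SQL to the REST server (BI tools via JDBC / ODBC). Behind the REST server sit the query engine and routing. Queries are answered either from the OLAP cubes in HBase (key-value data, low latency) or from Hive / HDFS (star schema data, mid latency). Also shown: metadata and the cube build engine, which sits between Hive and the HBase cubes.]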
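As a rough illustration of the "SQL over REST" path in the architecture, the following Python sketch builds (but does not send) an HTTP request for Kylin's query endpoint. The endpoint path `/kylin/api/query` and the payload fields reflect my understanding of Kylin's REST API and should be checked against the documentation for the version in use; the host, project name, table name, and credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, project, sql, user, password, limit=100):
    """Build a POST request carrying a SQL query for a Kylin REST server.

    Assumption: the server exposes POST {base_url}/kylin/api/query and
    uses HTTP Basic authentication.
    """
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical server, project, table, and credentials.
request = build_kylin_query(
    base_url="http://localhost:7070",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    user="ADMIN",
    password="KYLIN",
)
print(request.get_method(), request.full_url)
# prints: POST http://localhost:7070/kylin/api/query
```

Against a live server, `urllib.request.urlopen(request)` would submit the query. BI tools would normally use the JDBC / ODBC drivers instead of calling the REST server directly.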
Apache Kylin: Cube Designer
Apache Kylin: Monitoring
Apache Kylin: SQL Interface
Apache Kylin and MDX
SQL returns a two-dimensional result set; it was not designed for more dimensions.
Wish:
▪ Multidimensional result set
▪ Consider hierarchies and levels in the data
Query Language: MDX
Pentaho's Mondrian
Mondrian
▪ OLAP Engine
▪ Transforms MDX queries into SQL
▪ Multidimensional representation of data
▪ Integrated into Saiku / Pentaho's Business Analytics Platform
▪ Extensible through SQL dialects, e.g. MySQL, Postgres, Hive, Impala, ...
Apache Kylin + Mondrian: Idea
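[Figure: integration idea. An OLAP client sends MDX to Mondrian. Mondrian uses a Mondrian schema (XML describing measures, dimensions, hierarchies, levels, and attributes) and a Kylin dialect to query Apache Kylin (HBase, cuboids, ...) over JDBC.]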
Apache Kylin + Mondrian: Implementation
Work done:
▪ Kylin dialect created
▪ Optimized Kylin's JDBC driver
▪ Bugs fixed to get Mondrian working with Kylin
TBD:
▪ Integrate Kylin dialect into Mondrian's official code*
▪ Make every MDX query executable
Successful tests**:
▪ Current Saiku and Mondrian 4.4
▪ Current Saiku and Mondrian 3.x (not thoroughly tested)
* Pull Request: https://github.com/pentaho/mondrian/pull/480
** GitHub Project: https://github.com/mustangore/kylin-mondrian-interaction
Apache Kylin + Mondrian: Examples
Apache Kylin: Conclusion
▪ Extremely fast and scalable OLAP Engine
▪ OLAP-on-Hadoop
▪ Depends on Apache Hadoop infrastructure
▪ MOLAP Cube
▪ Incremental refresh of cubes
▪ Integration into existing BI Tools
▪ MDX queries with Mondrian possible (ongoing work)
Contact
Sébastien Jelsch
Big Data Scientist
inovex GmbH
Office Karlsruhe
Ludwig-Erhard-Allee 6
76131 Karlsruhe
Tel: +49 176 - 45786280
E-Mail: [email protected]
@inovexgmbh | @Mustangore
Thank you for your attention
Introduction into OLAP
[Figure: cuboid lattice for four dimensions. Each node is a 4-bit mask marking which dimensions the cuboid keeps, ranging from the 0-cuboid (0,0,0,0) to the N-cuboid (1,1,1,1).]
Cube: all combinations
Cuboid: one single combination
The number of cuboids grows exponentially
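To make the lattice concrete, here is a minimal Python sketch that enumerates every cuboid of a four-dimensional cube as a bit mask and groups the masks by level. The dimension names are hypothetical and serve only as an illustration.

```python
from itertools import product


def enumerate_cuboids(dimensions):
    """Return (mask, kept_dimensions) for every combination of dimensions."""
    cuboids = []
    for mask in product((0, 1), repeat=len(dimensions)):
        kept = tuple(d for d, bit in zip(dimensions, mask) if bit)
        cuboids.append((mask, kept))
    return cuboids


# Hypothetical dimension names.
dims = ["year", "city", "product", "channel"]
cuboids = enumerate_cuboids(dims)

# Group masks by how many dimensions they keep:
# level 0 is the 0-cuboid, level len(dims) is the N-cuboid.
by_level = {}
for mask, kept in cuboids:
    by_level.setdefault(sum(mask), []).append(mask)

print(len(cuboids))  # 16
for level in sorted(by_level):
    print(level, len(by_level[level]))
# 0 1
# 1 4
# 2 6
# 3 4
# 4 1
```

Each additional dimension doubles the number of masks, which is the exponential growth the slide refers to.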
Apache Kylin: Aggregation Groups
Problem: the number of cuboids grows exponentially
Example:
Cube with 30 dimensions
Number of cuboids: 2³⁰ > 1 billion
Solution: Partial Cube
Divide the dimensions of the OLAP cube into aggregation groups
Example:
30 dimensions split into 3 groups of 10 dimensions
Number of cuboids: 2¹⁰ + 2¹⁰ + 2¹⁰ = 3072 << 1 billion
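The arithmetic behind the aggregation-group example can be checked with a few lines of Python. This sketch follows the slide's simplified count (a plain sum of 2^k per group); it does not deduplicate the empty combination that every group shares, and the function names are made up for illustration.

```python
def full_cube_cuboids(n_dimensions):
    """Number of cuboids when every combination of dimensions is materialized."""
    return 2 ** n_dimensions


def partial_cube_cuboids(group_sizes):
    """Simplified count for a partial cube: each group is cubed on its own."""
    return sum(2 ** size for size in group_sizes)


full = full_cube_cuboids(30)
partial = partial_cube_cuboids([10, 10, 10])

print(full)             # 1073741824
print(partial)          # 3072
print(full // partial)  # 349525
```

The trade-off is that a query combining dimensions from different groups no longer finds a precomputed cuboid that matches it exactly.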
Apache Kylin: Cube Build Process
[Figure: cube build process, steps (1) to (3). Elements shown: source Hive tables, an intermediate Hive table produced with HiveQL, dimension dictionaries, and a MapReduce job that writes the N-cuboid to HDFS as SequenceFiles.]
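[Figure: cube build process, continued. Successive MapReduce jobs derive the N-1-cuboid from the N-cuboid and so on down to the 0-cuboid, each stored on HDFS as SequenceFiles. A final MapReduce job produces HFiles, which are bulk-imported into HBase.]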
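The layer-by-layer idea in the build process (N-cuboid first, then each smaller cuboid derived from an already-built parent) can be sketched in plain Python. This is an in-memory stand-in for the MapReduce jobs, not Kylin's implementation; the fact rows and dimension names are invented, and the HFile / HBase steps are not modeled.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(rows, dims, keep):
    """Sum the measure of `rows`, grouping by the dimensions listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    totals = defaultdict(int)
    for key, measure in rows:
        totals[tuple(key[i] for i in idx)] += measure
    return list(totals.items())


def build_by_layer(fact_rows, dims):
    """Build all cuboids layer by layer, from the N-cuboid down to the 0-cuboid."""
    base = tuple(dims)
    cube = {base: aggregate(fact_rows, dims, base)}  # N-cuboid from the flat table
    for size in range(len(dims) - 1, -1, -1):        # layers N-1, ..., 0
        for keep in combinations(dims, size):
            # Any already-built cuboid with exactly one more dimension can be the parent.
            parent = next(
                p for p in cube if len(p) == size + 1 and set(keep) <= set(p)
            )
            cube[keep] = aggregate(cube[parent], list(parent), keep)
    return cube


# Invented fact rows: ((year, city, product), amount).
dims = ["year", "city", "product"]
fact_rows = [
    ((2014, "Berlin", "Beer"), 8),
    ((2014, "Paris", "Beer"), 7),
    ((2015, "Berlin", "Wine"), 12),
    ((2015, "Berlin", "Beer"), 3),
]

cube = build_by_layer(fact_rows, dims)
print(len(cube))              # 8
print(dict(cube[("city",)]))  # {('Berlin',): 23, ('Paris',): 7}
print(dict(cube[()]))         # {(): 30}
```

Deriving each layer from the one above means only the first step has to read the full flat table; every later step works on data that is already aggregated.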