future of data paris - bi and big data

BI and Big Data: practical experience

Future Of Data

Mathias Kluba, Technical Architect, Data Management, SGCIB

Business Intelligence (BI) and Big Data?

Staging / ODS: source data

Data Sources

EL

Data warehouse: single aggregated view

ETL ETL

Data marts: specific views
Data Quality: data cleansing
Data Quality: completeness

Ad hoc query
Reporting

Data Quality: data accuracy

Reporting
Data mining
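The layered flow on this slide (data sources, staging / ODS, data warehouse, data marts, with data quality checks along the way) can be sketched in a few lines. This is only an illustration using an in-memory SQLite database: the table names, the columns (`trade_id`, `desk`, `amount`) and the cleansing rule are hypothetical and do not come from the talk.

```python
import sqlite3


def build_pipeline(rows):
    """Load (trade_id, desk, amount) tuples through three layers.

    All table and column names are hypothetical, chosen for illustration.
    """
    db = sqlite3.connect(":memory:")

    # EL: copy the source rows unchanged into the staging / ODS layer.
    db.execute("CREATE TABLE staging (trade_id INTEGER, desk TEXT, amount REAL)")
    db.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

    # ETL with data cleansing: drop incomplete rows, normalise desk names.
    db.execute(
        """CREATE TABLE warehouse AS
           SELECT trade_id, UPPER(TRIM(desk)) AS desk, amount
           FROM staging
           WHERE desk IS NOT NULL AND TRIM(desk) <> '' AND amount IS NOT NULL"""
    )

    # ETL: a data mart exposing one specific, aggregated view.
    db.execute(
        """CREATE TABLE mart_desk_totals AS
           SELECT desk, SUM(amount) AS total, COUNT(*) AS trades
           FROM warehouse
           GROUP BY desk"""
    )
    return db


def completeness(db):
    """Share of staged rows that survived cleansing (a simple completeness metric)."""
    staged = db.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    kept = db.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
    return kept / staged if staged else 1.0


sample = [(1, " fx ", 100.0), (2, "FX", 50.0), (3, None, 10.0),
          (4, "rates", None), (5, "Rates", 25.0)]
db = build_pipeline(sample)
print(db.execute("SELECT * FROM mart_desk_totals ORDER BY desk").fetchall())
print(completeness(db))
```

Reporting and ad hoc queries would then read from `mart_desk_totals` rather than from the raw staging data.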


Data Sources

EL

Data warehouse: single aggregated view

ETL ETL

Data marts: specific views

Big Data – BI Offloading




Data Sources


ETL ETL




http://hortonworks.com/blog/how-pioneering-banks-adopt-hadoop-for-enterprise-data-management/

ODBC driver compatibility?

HiveQL specifics?

Kerberos?

Low latency? Indexes?

Spark SQL
- No Ranger integration
- Could not get it integrated with Knox
- Kerberos + Tableau compatibility impossible

Phoenix
- Compatible with Microstrategy
- No impersonation/authentication on the Query Server
- No support for HBase namespaces*

*in 4.7.0 with HDP 2.5; supported since 4.8.0

Solr
- Compatible with Microstrategy but not with Tableau

Hawq
- PostgreSQL protocol/SQL… but not quite

Drill
- Perhaps the right solution, but not tested enough…

https://github.com/airbnb/superset
https://github.com/Quantiply/grafana-plugins/tree/master/features/druid

- No SQL!! Usable only through the web UIs
- No security!!! Multi-tenancy is hard
- Time-series only
- Some aggregation functions are hard to implement

- Column-oriented performance!
- Scalability
- Uses HDFS for historical storage
- Real-time ingestion from Kafka
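To make the "no SQL" point concrete: assuming the tool on this slide is Druid (the Grafana plugin link on the slide suggests so), queries are written as JSON documents rather than SQL. The sketch below only builds such a document for a native `timeseries` query. The datasource `trades` and the metric `amount` are hypothetical names, and nothing is sent to a server.

```python
import json


def timeseries_query(datasource, start, end, metric):
    """Build a Druid native 'timeseries' query as a plain dict.

    The datasource and metric names passed in are illustrative assumptions.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        # Druid intervals are ISO-8601 "start/end" strings.
        "intervals": [f"{start}/{end}"],
        "aggregations": [
            {"type": "doubleSum", "name": f"total_{metric}", "fieldName": metric}
        ],
    }


query = timeseries_query("trades", "2016-01-01", "2016-02-01", "amount")
print(json.dumps(query, indent=2))
```

A client would typically POST this JSON to the broker's query endpoint. A BI tool that only speaks SQL over ODBC/JDBC cannot produce it, which is why the web UIs linked on the slide are needed.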

- The UI… it's a good start…
- Bug in LDAP authentication
- Bug in HBase namespace handling
- Cube configuration is sometimes complex
- Tableau works well, Microstrategy does not
- ODBC driver is Windows-only
- Not very multi-tenant

- Scalability, supports large cube volumes
- SQL!! And a REST API!
- Easy to install: uses the components of the Hadoop stack


Data Sources


ETL ETL


- SSAS Tabular Models: limited by available RAM
- Load time from Hadoop into the cube

- Very good performance with in-memory column-oriented storage
- Mature technology
- Excel / PowerBI compatibility
- Complex aggregation functions
- Powerful visibility model

