© OpenBI, LLC 2010

Pentaho Big Data Overview & Demo
Presented to Chicago Big Data Meetup
April 19, 2012


Quick Quiz

When it comes to big data programming, which person are you?

A   B   C


About OpenBI

• OpenBI is a premier professional services firm that provides affordable, best-in-class Business Intelligence, Analytics and Big Data solutions:
  – Performance Management-driven, directing BI investments to evidence-based business results
  – Cost-effective, leveraging open-source/low-cost technologies with high-value, quick initial-result engagements
  – Reliable, created by industry experts and using repeatable best practices and templates


About Pentaho

• Leader in business analytics & data integration
• Subscription-based business model
• Achieved critical mass:
  – Over 1,200 commercial customers
  – Over 10,000 production deployments
  – Over 185 countries
• Stewardship of the most important open source analytics projects

INDUSTRY RECOGNITION
OVER 160 PARTNERS GLOBALLY


Comprehensive BI Platform

CENTRAL ADMINISTRATION, AUDITING & MONITORING

DELIVER When & Where Users Need It
  – Embedded: ISV & packaged applications, SaaS / cloud applications
  – Standalone: Web, mobile, print, e-mail

STREAMLINE Information Delivery

VISUALIZE & Report Information In Any Style
  – Reporting: interactive, operational, enterprise
  – Analysis: ad hoc exploration, multi-dimensional
  – Dashboards: interactive metrics, rich visualizations
  – Data mining: advanced & predictive analytics

METADATA LAYER
  – In-memory caching, high performance, relational OLAP cubes

INTEGRATE, CLEANSE, & ENRICH DATA
  – Data integration: Hadoop clustering, graphical ETL designer, enterprise scalability
  – Direct access

ACCESS All Enterprise Data Sources
  – ERP / CRM / enterprise apps (e.g. SAP, Oracle)
  – Hadoop & NoSQL data
  – Unstructured & semi-structured data (logs, social, machine-generated)
  – Relational data sources
  – Cloud (e.g. Salesforce, Amazon, Dell)


Explosion of Big Data Solutions

• Analytic databases
  – e.g. Vertica, Vectorwise, Teradata/Aster Data, Netezza, Infobright, etc.
  – Columnar, MPP, in-memory, DW appliances, OLAP databases
• Hadoop
  – Apache, Cloudera, EMC Greenplum HD, Hadapt, Hortonworks, MapR
• NoSQL
  – e.g. HBase, MongoDB, Cassandra, etc.


The Big Data Challenge


Hadoop and NoSQL Challenges

• Disconnected
  – Not easily connected to data sources
• Extremely technical
  – Requires highly technical resources
  – High barrier to entry
  – Long development and deployment cycles
• No "whole product"
  – Lacks consistent management and systems tools
  – Lacks data integration & orchestration tools
• Performance when moving data is imperative
• Not optimized for BI
  – Not a database
  – High latency
  – Limited SQL access


A Big Data Solution

Pentaho in the Big Data Ecosystem

Analytics
  – Pentaho Business Analytics
  – 3rd party tools: R, 3rd party BI tools, applications

Data Integration
  – Kettle (Pentaho Data Integration): data integration, job orchestration, workflow, scheduling, high performance, visual IDE

Big Data Mgmt
  – Analytic databases
  – NoSQL databases
  – Hadoop: Java MapReduce, Pig, Pentaho MapReduce


Pentaho as a Hadoop Client

• HDFS integration
  – Read/write files
  – Copy/move/delete
  – Folder management
• Java MapReduce execution
• Pig script execution
• Hive
  – Queries
  – Scripts

[Figure: PDI and Hadoop]


Pentaho MapReduce

Pentaho Kettle engine executing in the cluster

• Pentaho MapReduce
  – PDI transformations as map/combine/reduce programs
• No node-specific installation
  – Distributed cache used to distribute Kettle JAR files
• Visual MapReduce programming and debugging IDE

[Figure: PDI and Hadoop]
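To make "PDI transformations as map/combine/reduce programs" concrete, here is a minimal, in-process Python sketch of the map, combine, shuffle, and reduce data flow that those three transformations plug into. It is not Pentaho's API: `mapper`, `combiner`, `reducer`, `group_by_key`, and `run_job` are hypothetical names, the input lines are made up, and the shuffle that Hadoop performs across the cluster is only simulated locally.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical stand-ins for the three PDI transformations a Pentaho
# MapReduce job would wire in as mapper, combiner and reducer.

def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1

def combiner(key: str, values: Iterable[int]) -> Iterator[tuple[str, int]]:
    """Combine step: pre-aggregate on the map side to shrink the shuffle."""
    yield key, sum(values)

def reducer(key: str, values: Iterable[int]) -> Iterator[tuple[str, int]]:
    """Reduce step: produce the final count per key."""
    yield key, sum(values)

def group_by_key(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Simulated shuffle/sort: collect every value emitted for each key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def run_job(splits: list[list[str]]) -> dict[str, int]:
    """Map and combine each input split, then shuffle and reduce."""
    combined: list[tuple[str, int]] = []
    for split in splits:  # in Hadoop, one map task per input split
        mapped = (pair for line in split for pair in mapper(line))
        for key, values in group_by_key(mapped).items():
            combined.extend(combiner(key, values))
    result: dict[str, int] = {}
    for key, values in group_by_key(combined).items():
        for out_key, out_value in reducer(key, values):
            result[out_key] = out_value
    return result

if __name__ == "__main__":
    splits = [["big data meetup", "big data demo"], ["pentaho big data"]]
    print(run_job(splits))
    # {'big': 3, 'data': 3, 'demo': 1, 'meetup': 1, 'pentaho': 1}
```

The combiner here happens to reuse the reducer's logic, which is only valid because summing counts is associative; a real job would pick its combiner to match its own aggregation.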


Graphical Interfaces to NoSQL Databases

• Read/write to
  – HBase
  – Cassandra
  – MongoDB
  – Others to come…
• NoSQL to RDBMS
• NoSQL to report

[Figure: NoSQL, PDI, Reports]
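The "NoSQL to RDBMS" path boils down to mapping nested documents onto flat relational rows. The sketch below illustrates that step in plain Python with the standard-library `sqlite3` module. It is an assumption-laden illustration, not how PDI does it: the documents are hard-coded dicts standing in for records read from a store such as MongoDB, and the `customers`/`orders` schema and the `load_documents` helper are hypothetical. In PDI the equivalent mapping would be configured graphically rather than hand-coded.

```python
import sqlite3

# Hypothetical nested documents, standing in for records read from a
# document store such as MongoDB.
DOCUMENTS = [
    {"_id": "c1", "name": "Acme",
     "orders": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]},
    {"_id": "c2", "name": "Globex", "orders": []},
]

def load_documents(conn: sqlite3.Connection, docs: list[dict]) -> None:
    """Flatten each document into one parent row plus one row per nested order."""
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " customer_id TEXT NOT NULL REFERENCES customers(id),"
        " sku TEXT NOT NULL, qty INTEGER NOT NULL)"
    )
    for doc in docs:
        conn.execute("INSERT INTO customers VALUES (?, ?)", (doc["_id"], doc["name"]))
        # A missing "orders" array is treated the same as an empty one.
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(doc["_id"], o["sku"], o["qty"]) for o in doc.get("orders", [])],
        )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_documents(conn, DOCUMENTS)
    query = (
        "SELECT c.name, COUNT(o.sku), COALESCE(SUM(o.qty), 0) "
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id ORDER BY c.id"
    )
    for row in conn.execute(query):
        print(row)  # ('Acme', 2, 3) then ('Globex', 0, 0)
```

Splitting the nested array into a child table keyed by the parent id is one common design choice; denormalizing into a single wide table is another, and which fits depends on the reports built downstream.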


Graphical Job Orchestration

• Integrates classic ETL & big data processes
  – Scheduling
  – Events
  – Dependencies
  – Error handling
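Two of the orchestration features listed above, dependencies and error handling, can be illustrated with a small dependency-ordered job runner. This is a hedged Python sketch built on the standard-library `graphlib` module (Python 3.9+), not a description of Pentaho's job engine; the job names and the `run_jobs` helper are hypothetical, scheduling and events are left out, and every dependency is assumed to also appear as a key in `jobs`.

```python
from graphlib import TopologicalSorter
from typing import Callable

def run_jobs(
    jobs: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
) -> tuple[list[str], dict[str, str]]:
    """Run jobs in dependency order, skipping anything downstream of a failure.

    Returns (names of completed jobs in run order, {job name: failure reason}).
    """
    sorter: TopologicalSorter[str] = TopologicalSorter()
    for name in jobs:
        sorter.add(name, *depends_on.get(name, ()))
    completed: list[str] = []
    failed: dict[str, str] = {}
    for name in sorter.static_order():  # raises CycleError on circular dependencies
        upstream = sorted(d for d in depends_on.get(name, ()) if d in failed)
        if upstream:
            # Skipped jobs are recorded as failed, so skips propagate transitively.
            failed[name] = f"skipped: upstream {upstream} failed"
            continue
        try:
            jobs[name]()
            completed.append(name)
        except Exception as exc:  # error-handling hook: record and keep going
            failed[name] = f"error: {exc}"
    return completed, failed

if __name__ == "__main__":
    # Hypothetical pipeline: the HDFS copy fails, so the Pig step and the
    # warehouse load are skipped, while the independent report job still runs.
    def copy_to_hdfs() -> None:
        raise RuntimeError("HDFS unavailable")

    jobs = {
        "extract_logs": lambda: None,
        "copy_to_hdfs": copy_to_hdfs,
        "pig_script": lambda: None,
        "load_warehouse": lambda: None,
        "refresh_reports": lambda: None,
    }
    depends_on = {
        "copy_to_hdfs": {"extract_logs"},
        "pig_script": {"copy_to_hdfs"},
        "load_warehouse": {"pig_script"},
        "refresh_reports": {"extract_logs"},
    }
    completed, failed = run_jobs(jobs, depends_on)
    print("completed:", completed)
    for name, reason in failed.items():
        print(f"{name}: {reason}")
```

Continuing past a failure instead of aborting lets independent branches finish, which mirrors the "error handling" bullet; a stricter runner could stop at the first error instead.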


Pentaho Big Data Value Proposition

Integrated
Accessible
Productive

• One data programming platform
  – Big data & traditional ETL
• Connect anything to anything
  – No more islands
• MapReduce for data programmers
  – No/minimal Java required
• Increase utilization of your Hadoop investment
• Graphical, integrated IDE
  – Rich MapReduce programming semantics
  – Pre-built connectors to NoSQL & analytic DBs and HDFS
  – Same technology used for traditional ETL programs


Demo


Thank You!

Dave Reinke
[email protected]
630-405-8404