slides pentaho-hadoop-weka


Upload: lucboudreau

Post on 26-Jan-2015


Page 1: Slides pentaho-hadoop-weka

F**** around with Big Data and Predictive Analytics

Featuring Kettle, Weka & Hadoop.

© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Page 2: Slides pentaho-hadoop-weka

Pentahuh?


Page 3: Slides pentaho-hadoop-weka

What’s Pentaho exactly?

CENTRAL ADMINISTRATION, AUDITING & MONITORING

DELIVER When & Where Users Need It

STREAMLINE Information Delivery

VISUALIZE & Report Information In Any Style

ACCESS All Enterprise Data Sources

ISV & Packaged Applications

SaaS / Cloud Applications

EMBEDDED

Web

Mobile

Print

E-Mail

STANDALONE

‣ Advanced & Predictive Analytics

DATA MINING

‣ Interactive

‣ Operational

‣ Enterprise

REPORTING

‣ Ad hoc Exploration

‣ Multi-Dimensional

ANALYSIS

‣ Interactive Metrics

‣ Rich Visualizations

DASHBOARDS

ERP / CRM / Enterprise Apps (e.g. SAP, Oracle)

Hadoop & NoSQL Data

Unstructured & semi-structured (XML, Excel, Files, etc.)

Relational Data Sources

Cloud (e.g. Salesforce, Amazon, Dell)

‣ Data Integration

‣ Graphical ETL Designer

INTEGRATE, CLEANSE, & ENRICH DATA

‣ In Memory Caching

‣ High Performance

ANALYTICS ACCELERATOR

‣ Direct Access

‣ Hadoop Clustering / Scheduling

‣ Instant OLAP Cubes

‣ Enterprise Scalability

Page 4: Slides pentaho-hadoop-weka

We do open source analytics.


Page 5: Slides pentaho-hadoop-weka

Why does Pentaho claim to have anything to do with Big Data??


Page 6: Slides pentaho-hadoop-weka

Project Kettle: powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach


Page 7: Slides pentaho-hadoop-weka

Bring the code to the data

[Diagram labels: JDBC]

Page 8: Slides pentaho-hadoop-weka

Bring the code to the data

[Diagram labels: JDBC, Kettle]

Page 9: Slides pentaho-hadoop-weka

Bring the code to the data

[Diagram labels: Kettle (repeated)]

Page 10: Slides pentaho-hadoop-weka

Project Weka: a comprehensive set of tools for machine learning and data mining


Page 11: Slides pentaho-hadoop-weka


Page 12: Slides pentaho-hadoop-weka


Page 13: Slides pentaho-hadoop-weka

Bring Weka to the data

[Diagram labels: Kettle (repeated), JDBC]

Page 14: Slides pentaho-hadoop-weka

Bring Weka to the data


Page 15: Slides pentaho-hadoop-weka

JDBC Services for Kettle: runtime optimization and SQL pushdown


Page 16: Slides pentaho-hadoop-weka

A smart(er) JDBC Layer

[Diagram labels: Kettle (repeated), Kettle JDBC]

SELECT CUSTOMER_ID, SUM(UNIT_SALES)
FROM SALES_FACT
WHERE AGE_GROUP_ID > 3
GROUP BY CUSTOMER_ID;

Page 17: Slides pentaho-hadoop-weka

A smart(er) JDBC Layer

SELECT CUSTOMER_ID
FROM SALES_FACT;

SELECT CUSTOMER_ID, SUM(UNIT_SALES)
FROM SALES_FACT
WHERE AGE_GROUP_ID > 3
GROUP BY CUSTOMER_ID;

[Diagram labels: Kettle (repeated), Kettle JDBC]
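The idea behind SQL pushdown on these two slides can be sketched in a few lines. This is only an illustration, not the Kettle JDBC layer itself: an in-memory SQLite database stands in for the data store, the rows are invented, and the function names (`make_store`, `without_pushdown`, `with_pushdown`) are hypothetical. It compares pulling every row to the client with sending the filter and aggregation to the store.

```python
import sqlite3
from collections import defaultdict


def make_store():
    """Build a tiny in-memory SALES_FACT table (made-up rows, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SALES_FACT ("
        "CUSTOMER_ID INTEGER, AGE_GROUP_ID INTEGER, UNIT_SALES INTEGER)"
    )
    rows = [
        (1, 2, 10),
        (1, 4, 5),
        (2, 5, 7),
        (2, 5, 3),
        (3, 1, 9),
    ]
    conn.executemany("INSERT INTO SALES_FACT VALUES (?, ?, ?)", rows)
    return conn


def without_pushdown(conn):
    """Fetch every row, then filter and aggregate on the client side."""
    fetched = conn.execute(
        "SELECT CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES FROM SALES_FACT"
    ).fetchall()
    totals = defaultdict(int)
    for customer_id, age_group_id, unit_sales in fetched:
        if age_group_id > 3:
            totals[customer_id] += unit_sales
    return dict(totals), len(fetched)


def with_pushdown(conn):
    """Send the WHERE and GROUP BY to the store; only results come back."""
    fetched = conn.execute(
        "SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT "
        "WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID"
    ).fetchall()
    return dict(fetched), len(fetched)


if __name__ == "__main__":
    conn = make_store()
    print(without_pushdown(conn))
    print(with_pushdown(conn))
```

Both functions return the same totals per customer, but the pushed-down version transfers one row per customer instead of one row per sale. On a cluster, that difference is what makes a pushdown-aware JDBC layer attractive.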

Page 18: Slides pentaho-hadoop-weka

The gains

• Job design and administration become trivial.

• Runs the rich Kettle plugin environment directly on the nodes.

• Performs much better than Hive.

• The JDBC layer is pretty neat.

Page 19: Slides pentaho-hadoop-weka

The caveats

• True parallel machine learning algorithms are rare and hard to design.

• Not an actual production-ready design.

• Clients might have caches, which must be notified by the BD store for updates.
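To make the first caveat concrete, here is a minimal sketch of one of the few kinds of model that do parallelize cleanly: a one-variable least-squares line whose fit depends only on sums, so each node can summarize its own partition and the summaries can be merged. This is not Weka's implementation and not what the talk demonstrated. The function names (`partial_stats`, `merge_stats`, `fit_line`) and the sample partitions are hypothetical.

```python
from functools import reduce


def partial_stats(rows):
    """Map step: reduce one partition of (x, y) pairs to five running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge_stats(a, b):
    """Reduce step: sums from two partitions combine by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Solve the least-squares line y = slope * x + intercept from the sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    # Three made-up partitions, as if stored on three nodes; points lie on y = 2x + 1.
    partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(4, 9)]]
    total = reduce(merge_stats, (partial_stats(p) for p in partitions))
    print(fit_line(total))
```

Only five numbers per partition cross the network, whatever the partition size. Most learning algorithms do not decompose into order-independent sums like this, which is why the caveat holds in general.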

Page 20: Slides pentaho-hadoop-weka


Demo!

Page 21: Slides pentaho-hadoop-weka


Thank you!

Join the conversation. You can find us on:

blog.pentaho.com

@Pentaho

Facebook.com/Pentaho

Pentaho Business Analytics

Page 22: Slides pentaho-hadoop-weka


Want to learn more?

Learning Linear Models in Hadoop with Weka
http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html

Introduction to MapReduce with Pentaho Data Integration
http://www.youtube.com/watch?v=KZe1UugxXcs