
Post on 27-Jul-2018


Enterprise Data Warehouse Optimization with Hadoop Big Data

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

@Pentaho #BigDataWebSeries

Your Hosts Today

Dave Henry SVP Enterprise Solutions

Davy Nys VP EMEA & APAC

Source/copyright: The Human Face of Big Data

Pentaho Webinar Series

Sign-up at: pentaho.com

Goals for Today

To understand:

•  Challenges with the current EDW architecture

•  Trends and shifts in data processing

•  How Hadoop can help

•  How to leverage Hadoop with Pentaho Visual MapReduce

Complete Analytics and Visual Data Management

•  Hadoop
•  NoSQL Databases
•  Analytic Databases
•  Data Discovery & Visualization
•  Enterprise & Ad Hoc Reporting
•  Predictive Analytics & Machine Learning
•  Data Ingestion, Manipulation & Integration

Traditional Data Warehouse Architecture

Diagram labels:
•  Structured Data, Unstructured Data
•  Source data acquisition / Ingestion; Initial consolidation as required
•  Cleansing; Transformation; Change Data Capture; Data Warehouse Management
•  Extract, Transform, Load
•  Metadata
•  Data Mart(s) / Warehouse
•  Dashboard, Report, Analyzer

Trends with Data Processing

Data Load

•  Data volumes from existing sources are steadily increasing

•  Requirement to make data available for longer periods of time (3 years -> 30 years)

•  New sources of data are desired for analysis – machine-generated or external/3rd-party data

“ELTL” Approach to Data Load

•  Extract data from source systems
•  Load it (in its raw form) into the EDW
•  Transform it via SQL, creating new tables
•  Load the new tables into the “official” data warehouse
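The four ELTL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the EDW, and the table names (staging_orders, clean_orders, dw_orders) and sample rows are hypothetical, not taken from the webinar.

```python
import sqlite3

# Hypothetical rows extracted from a source system: (order_id, amount_text).
source_rows = [(1, "10.50"), (2, "n/a"), (3, "4.25")]

edw = sqlite3.connect(":memory:")  # in-memory stand-in for the EDW

# Extract + Load: land the rows in the EDW in raw (text) form.
edw.execute("CREATE TABLE staging_orders (order_id INTEGER, amount_text TEXT)")
edw.executemany("INSERT INTO staging_orders VALUES (?, ?)", source_rows)

# Transform: SQL running inside the EDW builds a new, cleaned table.
edw.execute(
    """
    CREATE TABLE clean_orders AS
    SELECT order_id, CAST(amount_text AS REAL) AS amount
    FROM staging_orders
    WHERE amount_text GLOB '[0-9]*'
    """
)

# Load: copy the new table into the "official" warehouse table.
edw.execute("CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, amount REAL)")
edw.execute("INSERT INTO dw_orders SELECT order_id, amount FROM clean_orders")

loaded = edw.execute(
    "SELECT order_id, amount FROM dw_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Note that every statement after the extract runs on the same database connection. In a real EDW that is the point of concern: raw landing, transformation and final load all compete with end-user queries for the same capacity.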

Challenges with Traditional Approaches

EDW can’t handle increasing data and workloads, so companies must:

•  Reduce the volume of data
•  Restrict end-user access (# of users or access windows) to accommodate longer batch processing windows
•  Purchase additional capacity (hardware / licenses), which can be as much as $100K / TB

Then, companies are faced with the following challenges:

•  The compromise itself
•  The incremental outlay of capital required to expand the EDW or purchase more proprietary ETL tool capacity
•  The inability of the incumbent ETL vendor to work with Hadoop

Solution Architecture with Hadoop

Diagram labels:
•  Structured Data, Unstructured Data
•  Data Integration: Source data acquisition / Ingestion; Initial consolidation as required
•  ETL (appears twice in the diagram)
•  Metadata
•  Data Integration: Cleansing; Transformation; Change Data Capture; Data Warehouse Management
•  Data Mart(s) / Warehouse
•  Dashboard, Report, Analyzer

Core Benefits

1.  Improve performance
    –  Meet critical data processing SLAs
2.  Retain all data for analysis
3.  Lower costs of data management, growth
4.  Extend existing EDW capacity
    –  Increase ROI from current investments

Graphic labels: Costs, Time, Flexibility

Challenges with Hadoop: Scripting and Coding

Graphic labels: Costs, Time, Flexibility

Pentaho: Quickest, Most Complete Solution for Big Data

Design, develop and deploy 15x faster:

•  Full continuity from data access to decisions – complete data integration & business analytics platform for any big data store
•  Faster development, faster runtime – visual development, distributed execution
•  Instant and interactive analysis – no coding, no ETL required

Solution Architecture & Demo


Data Warehouse Optimization

Diagram labels:
•  Data Sources: ERP, CRM, CDR, Logs, Other Data
•  Big Data Architecture: Raw Data, Parsed Data, Analytic Datasets, Master Data
•  Data Warehouse (Master & Transactional Data)
•  Analytic Data Mart(s)
•  Tape Archive

Pentaho for Hadoop – Data Integration + Analytics

Diagram labels:
•  INGEST: Ingestion (Structured Data, Unstructured Data)
•  VISUAL MAP REDUCE: Processing (Raw Data, Parsed Data, Analytic Datasets, Master Data)
•  ORCHESTRATE
•  ANALYZE: Analysis & Reporting
•  Pig, Oozie, Flume, Hive, HBase, Sqoop
•  ERP, CRM, DW
•  Data Integration, Analytics

Example – Call Record Processing

•  What are the top 10 states for outbound calls on Fridays, Saturdays and Sundays?

•  Data available:
   –  Call records: date/timestamp & source phone #
   –  Reference data: area code by country, state & time zone (North American Numbering Plan)

•  Goal:
   –  Parse, enrich and filter the data
   –  Load the data into Postgres for analysis

•  Challenge:
   –  Prepare the data without impacting the EDW (no ELT)
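Once the prepared data has been loaded, the business question reduces to a simple ranking query. The sketch below is illustrative only: an in-memory SQLite database stands in for Postgres, and the table name weekend_calls_by_state and the counts are hypothetical.

```python
import sqlite3

# Hypothetical output of the preparation step: (state, weekend outbound calls).
prepared = [("NY", 120), ("CA", 340), ("TX", 210), ("FL", 95)]

db = sqlite3.connect(":memory:")  # stand-in for the Postgres analysis database
db.execute(
    "CREATE TABLE weekend_calls_by_state (state TEXT PRIMARY KEY, calls INTEGER)"
)
db.executemany("INSERT INTO weekend_calls_by_state VALUES (?, ?)", prepared)

# Rank states by call volume and keep at most ten; ties are broken by state code.
top_states = db.execute(
    "SELECT state, calls FROM weekend_calls_by_state "
    "ORDER BY calls DESC, state LIMIT 10"
).fetchall()
print(top_states)
```

Because parsing, enrichment and filtering happen before the load, the analysis database only has to store and sort a small aggregated table.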

Hadoop Data Processing Scenario

INGEST
•  Ingestion: Structured Data, Unstructured Data
•  Raw Data
•  Master Data

Visual MapReduce

VISUAL MAP REDUCE
•  Processing: Raw Data, Parsed Data, Analytic Datasets
•  Master Data

1.  MapReduce Input – calling data
2.  Calculate Month, Day, Day of Week
3.  Extract 3-digit area code
4.  Lookup geo master data in HDFS
5.  Filter for weekend and US-only calls
6.  Create “Value” field for Key-Value Pair
7.  Create “Key” field for Key-Value Pair
8.  MapReduce Output – Key-Value Pair

Java Programming
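For readers who want to see the logic behind the eight steps, here is a minimal plain-Python sketch of an equivalent mapper and reducer. It is not Pentaho's Visual MapReduce or the Hadoop API. The record layout ("timestamp,phone"), the GEO_MASTER lookup table, the function names and the sample records are all assumptions made for illustration; in the webinar scenario the geo master data lives in HDFS.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical geo master data keyed by 3-digit area code: (country, state).
GEO_MASTER = {"212": ("US", "NY"), "415": ("US", "CA"), "416": ("Canada", "ON")}
# datetime.weekday() counts Monday as 0, so Friday=4, Saturday=5, Sunday=6.
WEEKEND_DAYS = {4, 5, 6}


def map_call(line):
    """Turn one raw call record into zero or one (key, value) pair."""
    timestamp_text, phone = line.strip().split(",")            # step 1: input
    ts = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M:%S")
    # step 2: month and day are derived as on the slide; only day_of_week is used here
    month, day, day_of_week = ts.month, ts.day, ts.weekday()
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                                    # drop leading "1"
    area_code = digits[:3]                                     # step 3
    geo = GEO_MASTER.get(area_code)                            # step 4: lookup
    if geo is None:
        return []
    country, state = geo
    if country != "US" or day_of_week not in WEEKEND_DAYS:     # step 5: filter
        return []
    value = 1                                                  # step 6
    key = state                                                # step 7
    return [(key, value)]                                      # step 8: output


def reduce_counts(pairs):
    """Sum the values for each key, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


records = [
    "2013-03-01 09:15:00,212-555-0100",    # Friday, US
    "2013-03-02 18:40:00,(415) 555-0101",  # Saturday, US
    "2013-03-03 11:05:00,1-212-555-0102",  # Sunday, US, with leading 1
    "2013-03-04 08:00:00,212-555-0103",    # Monday: filtered out
    "2013-03-02 12:00:00,416-555-0104",    # Saturday, non-US: filtered out
]
pairs = [pair for line in records for pair in map_call(line)]
totals = reduce_counts(pairs)
print(totals)
```

The state is used as the key and a constant 1 as the value, so the reduce step yields one call count per state, which is the shape the top-10 question needs.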

End of Demo

Leveraging Hadoop with Pentaho

OEM
–  Flexibility, Extensibility, Architected to Embed

Pricing
–  One of the top reasons customers choose us

Community/Open Source Cache
–  Similar to Hadoop

Data Management Platform
–  Visual Map Reduce, Orchestration, Connectivity
–  Fusion of all data sources & processing
–  Control/Manage/Optimize flow of data

Hybrid
–  Leverages non-Hadoop infrastructure

Overall Benefits

Business Benefits

•  You can defer upgrades to expensive EDW hardware

•  You can offload batch processing from the EDW and make it more available to end-users (improve performance / comply with SLAs)

•  With better performance you may need smaller cluster sizes

•  This is a low-risk use case that lets you get familiar with Hadoop while creating business value

•  It’s easy to evaluate – you don’t need to modify your cluster and risk disrupting the configuration

Technical Benefits

You should keep your EDW, but use Hadoop and Pentaho to optimize data processing

Q & A

Contact Us or Sign-up at: pentaho.com