Post on 27-Jul-2018

TRANSCRIPT

Enterprise Data Warehouse Optimization with Hadoop Big Data

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

@Pentaho #BigDataWebSeries

Your Hosts Today

Dave Henry, SVP Enterprise Solutions

Davy Nys, VP EMEA & APAC

Source/copyright: The Human Face of Big Data

Pentaho Webinar Series

Sign-up at: pentaho.com

Goals for Today

To understand:

•  Challenges with the current EDW architecture

•  Trends and shifts in data processing

•  How Hadoop can help

•  How to leverage Hadoop with Pentaho Visual MapReduce

Complete Analytics and Visual Data Management

Hadoop NoSQL Databases

Data Discovery & Visualization

Enterprise & Ad Hoc Reporting

Predictive Analytics & Machine Learning

Data Ingestion, Manipulation & Integration

Analytic Databases

Traditional Data Warehouse Architecture

Source data acquisition / Ingestion Initial consolidation as required

Cleansing Transformation Change Data Capture Data Warehouse Management

Extract, Transform, Load

Dashboard

Report

Analyzer

Structured Data

Unstructured Data

Data Mart(s) / Warehouse

Metadata

Trends with Data Processing

Data Load

•  Volumes from existing data sources are steadily increasing

•  Requirement to make data available for longer periods of time (3 years -> 30 years)

•  New sources of data are desired for analysis – machine-generated or external/3rd-party data

“ELTL” Approach to Data Load

•  Extract data from source systems

•  Load it (in its raw form) into the EDW

•  Transform it via SQL, creating new tables

•  Load the new tables into the “official” data warehouse
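The four ELTL steps can be sketched in Python as follows. This is a minimal illustration, not a production pattern: an in-memory SQLite database stands in for the EDW, and the source rows and all table names (`stg_orders`, `tmp_sales_by_country`, `dw_sales_by_country`) are hypothetical. The point it shows is that the transformation work in step 3 runs inside the warehouse itself, which is the workload the rest of the deck proposes to offload.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount as raw text, country code).
source_rows = [
    (1, "19.99", "us"),
    (2, "5.00", "US"),
    (3, "12.50", "de"),
]

# In-memory SQLite database used as a stand-in for the EDW (assumption).
edw = sqlite3.connect(":memory:")

# Steps 1 and 2: extract from the source and load the data, in raw form,
# into a staging table inside the EDW.
edw.execute("CREATE TABLE stg_orders (order_id INTEGER, amount TEXT, country TEXT)")
edw.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)

# Step 3: transform via SQL inside the EDW, creating a new table.
edw.execute(
    """
    CREATE TABLE tmp_sales_by_country AS
    SELECT UPPER(country) AS country,
           SUM(CAST(amount AS REAL)) AS total_amount
    FROM stg_orders
    GROUP BY UPPER(country)
    """
)

# Step 4: load the new table into the "official" warehouse table.
edw.execute(
    "CREATE TABLE dw_sales_by_country (country TEXT PRIMARY KEY, total_amount REAL)"
)
edw.execute(
    "INSERT INTO dw_sales_by_country "
    "SELECT country, total_amount FROM tmp_sales_by_country"
)
edw.commit()

result = edw.execute(
    "SELECT country, total_amount FROM dw_sales_by_country ORDER BY country"
).fetchall()
print(result)
```

Every statement after the initial load consumes EDW compute and storage, including the intermediate table that exists only to feed the final load.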

Challenges with Traditional Approaches

EDW can’t handle increasing data and workloads, so companies must:

•  Reduce the volume of data

•  Restrict end-user access (# of users or access windows) to accommodate longer batch processing windows

•  Purchase additional capacity (hardware / licenses), which can be as much as $100K / TB

Then, companies are faced with the following challenges:

•  The compromise itself

•  The incremental outlay of capital required to expand the EDW or purchase more proprietary ETL tool capacity

•  The inability of the incumbent ETL vendor to work with Hadoop
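To make the capacity cost concrete, here is a small worked calculation in Python. The $100K / TB ceiling and the 3-year to 30-year retention shift come from the slides; the ingest rate of 10 TB per year is a purely hypothetical assumption chosen for illustration, and the model ignores compression, growth in ingest rate, and price changes.

```python
# Figures taken from the slides.
COST_PER_TB_USD = 100_000      # "as much as $100K / TB"
RETENTION_YEARS_OLD = 3
RETENTION_YEARS_NEW = 30

# Hypothetical assumption, not from the source: constant ingest of 10 TB per year.
TB_PER_YEAR = 10


def edw_capacity_tb(retention_years: int, tb_per_year: float = TB_PER_YEAR) -> float:
    """Capacity needed to keep `retention_years` of data at a constant ingest rate."""
    return retention_years * tb_per_year


old_tb = edw_capacity_tb(RETENTION_YEARS_OLD)
new_tb = edw_capacity_tb(RETENTION_YEARS_NEW)

# Upper-bound cost of the additional capacity at the quoted per-TB price.
extra_cost_usd = (new_tb - old_tb) * COST_PER_TB_USD

print(f"{old_tb} TB -> {new_tb} TB, up to ${extra_cost_usd:,.0f} in added EDW capacity")
```

Under these assumptions the retention change alone multiplies the required capacity by ten, which is why the deck treats "purchase additional capacity" as a compromise and not a solution.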

Solution Architecture with Hadoop

Data Integration Source data acquisition / Ingestion Initial consolidation as required

ETL ETL Metadata

Dashboard

Report

Analyzer

Structured Data

Unstructured Data

Data Integration Cleansing Transformation Change Data Capture Data Warehouse Management

Data Mart(s) / Warehouse

Core Benefits

1.  Improve performance
    –  Meet critical data processing SLAs

2.  Retain all data for analysis

3.  Lower costs of data management, growth

4.  Extend existing EDW capacity
    –  Increase ROI from current investments

Costs

Time

Flexibility

Challenges with Hadoop: Scripting and Coding

Costs

Time

Flexibility

Pentaho: Quickest, Most Complete Solution for Big Data

Design, develop and deploy 15x faster:

•  Full continuity from data access to decisions – complete data integration & business analytics platform for any big data store

•  Faster development, faster runtime – visual development, distributed execution

•  Instant and interactive analysis – no coding, no ETL required

Solution Architecture & Demo

Data Warehouse Optimization

Data Sources: ERP, CRM, CDR, Logs, Other Data

Big Data Architecture: Raw Data, Parsed Data, Analytic Datasets, Master Data

Data Warehouse (Master & Transactional Data)

Analytic Data Mart(s)

Tape Archive

ORCHESTRATE

ERP DW

Processing

CRM

Pig, Oozie, Flume, Hive, HBase, Sqoop

Raw Data

Parsed Data

Analytic Datasets

Pentaho for Hadoop – Data Integration + Analytics

Master Data

Analysis & Reporting

ANALYZE

VISUAL MAP REDUCE

Data Integration Analytics

INGEST

Ingestion

Structured Data

Unstructured Data

Example – Call Record Processing

•  What are the top 10 states for outbound calls on Fridays, Saturdays and Sundays?

•  Data available:
   –  Call records: date/timestamp & source phone #
   –  Reference data: area code by country, state & time zone (North American Numbering Plan)

•  Goal:
   –  Parse, enrich and filter the data
   –  Load the data into Postgres for analysis

•  Challenge:
   –  Prepare the data without impacting the EDW (no ELT)

Raw Data

Hadoop Data Processing Scenario

Master Data

Ingestion Structured Data

Unstructured Data

INGEST

Processing

Raw Data

Parsed Data

Analytic Datasets

Visual MapReduce

Master Data

VISUAL MAP REDUCE

1.  MapReduce Input – calling data

2.  Calculate Month, Day, Day of Week

3.  Extract 3-digit area code

4.  Look up geo master data in HDFS

5.  Filter for weekend and US-only calls

6.  Create “Value” field for Key-Value Pair

7.  Create “Key” field for Key-Value Pair

8.  MapReduce Output – Key-Value Pair

Java Programming
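For readers who want to see the logic of the eight steps in code, here is a minimal Python sketch. It is neither Pentaho's Visual MapReduce nor the Java code from the demo. The record layout (`timestamp,phone`), the in-memory `GEO_MASTER` table (standing in for the geo master data in HDFS), the choice of state as key and 1 as value, and all function names are hypothetical. "Weekend" is taken to mean Friday through Sunday, as in the example question.

```python
from collections import Counter
from datetime import datetime

# Hypothetical geo master data: area code -> (country, state).
GEO_MASTER = {
    "212": ("US", "NY"),
    "415": ("US", "CA"),
    "416": ("CA", "ON"),
}

# datetime.weekday(): Monday is 0, so Friday/Saturday/Sunday are 4/5/6.
WEEKEND_DAYS = {4, 5, 6}


def map_call_record(line):
    """Turn one raw call record into a (key, value) pair, or None if filtered out."""
    # Step 1: input record, assumed to be "timestamp,source phone number".
    timestamp_text, source_phone = line.strip().split(",")
    # Step 2: calculate month, day and day of week.
    ts = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M:%S")
    month, day, day_of_week = ts.month, ts.day, ts.weekday()
    # Step 3: extract the 3-digit area code (assumes a 10-digit NANP number,
    # optionally preceded by a country code).
    digits = "".join(ch for ch in source_phone if ch.isdigit())
    area_code = digits[-10:-7]
    # Step 4: look up the geo master data.
    geo = GEO_MASTER.get(area_code)
    # Step 5: keep only weekend calls from US area codes.
    if geo is None or geo[0] != "US" or day_of_week not in WEEKEND_DAYS:
        return None
    # Steps 6 to 8: build the value and key, then emit the key-value pair.
    value = 1
    key = geo[1]
    return key, value


def reduce_counts(pairs):
    """Sum the values per key, as a reducer would."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


records = [
    "2013-03-01 09:15:00,212-555-0100",    # Friday, NY
    "2013-03-02 18:30:00,(415) 555-0101",  # Saturday, CA
    "2013-03-03 11:00:00,212-555-0102",    # Sunday, NY
    "2013-03-04 08:00:00,212-555-0103",    # Monday, filtered out
    "2013-03-02 12:00:00,416-555-0104",    # Saturday, not US, filtered out
]

pairs = [pair for pair in map(map_call_record, records) if pair is not None]
top_states = reduce_counts(pairs).most_common(10)
print(top_states)
```

On a cluster the map and reduce functions would run in parallel over HDFS blocks; the sketch runs them sequentially on five sample records only to show the data flow.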

Solution Architecture & Demo

End of Demo

Leveraging Hadoop with Pentaho

OEM
–  Flexibility, Extensibility, Architected to Embed

Pricing
–  One of top reasons customers choose us

Community/Open Source Cache
–  Similar to Hadoop

Data Management Platform
–  Visual Map Reduce, Orchestration, Connectivity
–  Fusion of all data sources & processing
–  Control/Manage/Optimize flow of data

Hybrid
–  Leverages non-Hadoop infrastructure

Overall Benefits

Business Benefits

•  You can defer upgrades to expensive EDW hardware

•  You can offload batch processing from the EDW and make it more available to end-users (improve performance / comply with SLAs)

•  With better performance you may need smaller cluster sizes

•  This is a low-risk use case that lets you get familiar with Hadoop while creating business value

•  It’s easy to evaluate – you don’t need to modify your cluster and risk disrupting the configuration

Technical Benefits

You should keep your EDW, but use Hadoop and Pentaho to optimize data processing

Q & A

Contact Us or Sign-up at: pentaho.com
