Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Upload: hortonworks

Post on 15-Jul-2015

TRANSCRIPT

Page 1: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Optimizing the Modern Data Architecture with Attunity, Hortonworks and RCG Global Services

We do Hadoop.

Page 2: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Speakers

•  Hortonworks
◦  Adis Cesir, Big Data Solution Engineer

•  RCG Global Services
◦  Ramu Kalvakuntla, Principal, Big Data Practice

•  Attunity
◦  Santosh Chitakki, Director of Product Management

Page 3: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Partnership

•  Strategy and Solution Delivery
•  Hadoop Distribution, Support and Training
•  Any Data, Anywhere, Anytime

RCG Global Services, Hortonworks and Attunity are partnering to provide an EDW optimization solution that delivers real financial benefits by effectively implementing Apache Hadoop to augment current EDW platforms.

Page 4: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Traditional systems under pressure

Challenges
•  Can’t manage new data
•  Constrains data to app
•  Costly to scale

[Chart: Business Value, contrasting LAGGARDS with INDUSTRY LEADERS. Traditional data: ERP, CRM, SCM. New Data: Clickstream, Geolocation, Web Data, Internet of Things, Docs and emails, Server logs. Data volume: 2.8 Zettabytes in 2012, 40 Zettabytes in 2020.]

Page 5: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

A Typical EDW Faces Three Challenges

1.  Data Storage: storing cold data or throwing data away

2.  Processing Capacity: wasting processing cycles on low value workloads

3.  New Data Sources: unable to capture and use new data

[Diagram: the three challenges (1, 2, 3) marked on a typical EDW landscape. ANALYTICS: Data Marts, Business Analytics, Visualization & Dashboards. DATA SYSTEMS: Systems of Record (RDBMS, ERP, CRM, Other). NEW SOURCES: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.]

Page 6: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Most EDWs Are Used Inefficiently

In a typical EDW*:

1.  Data Storage:
–  More than 50% of data is unused

2.  Processing Capacity:
–  55% of CPU capacity is ETL
–  35% of CPU consumed by ETL is to load unused data
–  30-40% of CPU is consumed by only 5% of ETL workloads

[Diagram: ANALYTICS (Data Marts, Business Analytics, Visualization & Dashboards) and DATA SYSTEMS (Systems of Record: RDBMS, ERP, CRM, Other), with data marked as Hot, Warm and Cold.]

Why pay first class price for economy data?
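
The two ETL figures on this slide can be combined into a single number. The sketch below assumes that "35% of CPU consumed by ETL" means 35% of the ETL share rather than 35% of total CPU; that reading, and the function name, are assumptions and not something the deck states.

```python
# Percentages as quoted on the slide. Combining them is an interpretation,
# not a figure taken from the deck.
ETL_SHARE_OF_CPU = 55      # % of total EDW CPU capacity spent on ETL
UNUSED_SHARE_OF_ETL = 35   # % of that ETL CPU spent loading unused data


def cpu_spent_loading_unused_data(etl_pct: float, unused_pct: float) -> float:
    """Return the share of total CPU (in %) spent loading unused data."""
    return etl_pct * unused_pct / 100


wasted = cpu_spent_loading_unused_data(ETL_SHARE_OF_CPU, UNUSED_SHARE_OF_ETL)
print(f"{wasted:.2f}% of total EDW CPU loads data that is never queried")
```

Under that assumption, roughly a fifth of the warehouse's CPU goes to loading data nobody reads, which is the capacity an offload would aim to free.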

Page 7: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Optimization: Realize Cost Savings with HDP

Archive data away from the EDW
•  Move cold or rarely used data to Hadoop as active archive
•  Store more data longer

Offload costly ETL processes
•  Free your EDW to perform high-value functions like analytics & operations, not ETL
•  Use Hadoop for advanced ELT

Enrich the value of your EDW
•  Use Hadoop to refine new data sources, such as web and machine data, for new analytical context

HDP helps you reduce costs and optimize the value associated with your EDW
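
One way to pick candidates for the active archive is to classify each table by how recently it was queried. The following is a minimal sketch: the 90-day and 365-day thresholds, the table names and the `last_access` mapping are all hypothetical, since the deck does not define what counts as hot, warm or cold.

```python
from __future__ import annotations

from datetime import date

# Hypothetical thresholds; the deck does not define hot/warm/cold.
WARM_AFTER_DAYS = 90
COLD_AFTER_DAYS = 365


def temperature(last_queried: date, today: date) -> str:
    """Classify a table by the number of days since it was last queried."""
    age_days = (today - last_queried).days
    if age_days >= COLD_AFTER_DAYS:
        return "cold"
    if age_days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"


def archive_candidates(last_access: dict[str, date], today: date) -> list[str]:
    """Return the names of cold tables, i.e. candidates to move to Hadoop."""
    return sorted(
        name for name, seen in last_access.items()
        if temperature(seen, today) == "cold"
    )


# Illustrative catalog: table name -> date of the most recent query.
catalog = {
    "sales_2015": date(2015, 6, 30),
    "sales_2012": date(2012, 12, 31),
    "web_sessions": date(2015, 2, 1),
}
today = date(2015, 7, 15)
candidates = archive_candidates(catalog, today)
print(candidates)
```

In practice the last-access dates would come from the warehouse's query history rather than a hand-written dictionary.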

[Diagram: SOURCES (Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured) and Existing Systems (ERP, CRM, SCM) alongside HDP and an MPP EDW, serving ANALYTICS (Data Marts, Applications, Business Analytics, Visualization & Dashboards). Within HDP, YARN: Data Operating System runs Interactive, Real-Time, Batch and Partner ISV workloads on HDFS (Hadoop Distributed File System).]

Page 8: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Challenge with traditional Architecture

[Diagram: Source Layer (DB, Structured Data) → Data Collection & Processing (ETL / ELT) → Integration, Storage & Business View (EDW) → ETL → Business / Department Specific (Data Marts)]

•  Time spent understanding source data and defining destination structure
•  High latency between data generation and availability
•  Incapable/high complexity when dealing with loosely structured data
•  No linear scale
•  High license cost
•  Large code footprint
•  Data discarded due to cost or performance
•  Low or no visibility into transactional data
•  EDW used as an ETL tool with 100s of staging tables

Page 9: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Offload/Archive/Process – Hadoop based Platform

[Diagram: Source Layer (DB, Structured Data; Clickstream, Social, Geo, Sensor, Server Logs, Unstructured) → Data Collection, Integration, Storage and Processing (Integrate, Transform, Archive, Enrich) → EDW → Data Marts]

•  Store transactional data
•  Retain 7+ years of data (Hot archive)
•  Data Lineage – ability to store intermediate data sets
•  Becomes an analytics platform for data scientists
•  Linearly scalable commodity hardware
•  Massively parallel compute and storage
•  Support for any type of data: structured or unstructured with any volume and velocity
•  Data Warehouse can now focus less on storage and transformation and more on presentation

Page 10: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Optimization Customer Stories

Archive: TrueCar stores data on millions of car purchases at $0.12 per GB with HDP, well below the $19 per GB possible with other solutions.

Offload: Luminar cut its ETL processing times from 3 days to 3 hours with HDP, quickly refreshing its models with new customer transaction data.

Enrich: ZirMed enriches its EDW with new data, including pharmacy receipts, text messages, and patient web searches.
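
The Archive and Offload figures above translate into simple ratios. The sketch below only does arithmetic on the quoted numbers; the 1,000 GB volume is a hypothetical example, not a TrueCar figure.

```python
# Figures quoted in the customer stories.
HDP_COST_PER_GB = 0.12      # USD per GB with HDP
OTHER_COST_PER_GB = 19.0    # USD per GB with other solutions
ETL_HOURS_BEFORE = 3 * 24   # 3 days
ETL_HOURS_AFTER = 3         # 3 hours


def storage_cost(gigabytes: float, cost_per_gb: float) -> float:
    """Return the storage cost in USD for a given volume."""
    return gigabytes * cost_per_gb


cost_ratio = OTHER_COST_PER_GB / HDP_COST_PER_GB
etl_speedup = ETL_HOURS_BEFORE / ETL_HOURS_AFTER

# Hypothetical volume, chosen only to make the difference concrete.
example_gb = 1_000
print(f"Storage cost ratio: about {cost_ratio:.0f}x")
print(f"ETL speedup: {etl_speedup:.0f}x")
print(f"{example_gb} GB: ${storage_cost(example_gb, HDP_COST_PER_GB):,.2f} "
      f"vs ${storage_cost(example_gb, OTHER_COST_PER_GB):,.2f}")
```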

Page 11: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Hadoop Driver: Enabling the Data Lake

Journey to the Data Lake with Hadoop

[Chart: SCALE plotted against SCOPE, leading to the DATA LAKE and Systems of Insight]

Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps

Goal:
•  Centralized architecture
•  Data-driven business

Data Lake Definition
•  Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
•  Any App, Any Data: Multiple applications accessing all data affording new insights and opportunities.
•  Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value.

Page 12: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Modern Data Architecture

•  Reduce cost and improve performance by off-loading EDW data and processing to the Hortonworks Data Platform (HDP)

•  Implement a platform that scales incrementally using low cost hardware and software

•  Support unstructured, semi-structured and structured data in a single analytics platform

•  Enable superior analytic capabilities providing insight that is not possible to achieve from their current environments

•  Provide seamless access to data for analysis and business applications

Page 13: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Solution Model - Modern Data Architecture

EDW Optimization Roadmap: Identify offload candidates, create architectural blueprint, implementation roadmap, business case and ROI

EDW Optimization Implementation: Execute data and ETL/ELT off-load, active archive, implement data ingestion and data service

Data Value Realization: Provide insight, data in motion, advanced analytics, information value creation, and visualization

Enterprise Enablement: Enterprise access, enriched data sources, service orchestration and data virtualization

Page 14: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

EDW Optimization – Roadmap and Analysis

Current State Analysis
•  Assess current reporting, ELT/ETL, and analytical processes
•  Review logical and physical data models
•  Assess current technical architecture

Data Usage Analysis
•  Analyze data usage
•  Identify under-utilized:
◦  Schemas
◦  Tables / Columns
◦  Data
•  Identify off-load opportunities

Workload Analysis
•  Analyze EDW workload:
◦  Reads vs. writes
◦  ETL vs. ELT
◦  Analytical vs. batch SQL
◦  CPU consumption
◦  CPU utilization

Blueprint & Roadmap
•  Prioritize opportunities
•  Define future Hadoop architecture and capacity needs
•  Develop implementation plan
•  Create business case / ROI
•  Create and review Executive Summary with clients

[Timeline: Current State Analysis, Data Usage Analysis, Workload Analysis, and Blueprint & Roadmap scheduled across Week1 to Week4]
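
The workload analysis step amounts to grouping query activity by category and comparing CPU shares. A minimal sketch, assuming the query log has already been reduced to (category, CPU seconds) pairs; the categories and numbers below are illustrative only.

```python
from collections import defaultdict


def cpu_share_by_category(log):
    """Return each workload category's share of total CPU, in percent.

    `log` is an iterable of (category, cpu_seconds) pairs.
    """
    totals = defaultdict(float)
    for category, cpu_seconds in log:
        totals[category] += cpu_seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: 100 * secs / grand_total for cat, secs in totals.items()}


# Hypothetical query-log summary.
sample_log = [
    ("ETL", 300.0),
    ("ELT", 250.0),
    ("Analytical", 350.0),
    ("ETL", 100.0),
]
shares = cpu_share_by_category(sample_log)
for category, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category:<12}{pct:5.1f}%")
```

The same grouping works for reads vs. writes or any other split listed above, as long as each log entry carries the relevant label.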

Page 15: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

EDW Optimization – Implementation

[Timeline: Data Off-load, Process Off-load, Data Services, and Analysis & Reporting scheduled across Month 1, Month 2, Month 3, Month …]

•  POC / Reference Implementation (if needed)
•  Install / expand HDP cluster
•  Analyze off-load data sets
•  Automate data ingestion
•  Implement active archiving
•  Provide schema-on-read for direct business analysis
•  Migrate resource intensive analysis to Hadoop
•  Connect analysis and visualization tools to Hadoop
•  Migrate EDW ETL/ELT workload to Hadoop
•  De-normalize data to optimize performance
•  Load Hadoop ETL/ELT output data back into EDW
•  Provide data virtualization for data transparency across Hadoop and MPP databases
•  Build business services for reporting and enterprise applications
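
Schema-on-read means the raw data is stored untouched and a structure is applied only when someone reads it. The sketch below illustrates the idea in plain Python rather than with any Hadoop tool; the sample records, column names and types are hypothetical.

```python
import csv
import io

# Raw records kept exactly as they arrived; no schema was enforced on write.
RAW = "2015-07-01,store-17,19.99\n2015-07-02,store-04,5.00\n"

# A schema is just (column name, converter) pairs chosen by the reader.
SALES_SCHEMA = [("sale_date", str), ("store_id", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    """Yield one dict per raw line, applying the schema at read time."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


rows = list(read_with_schema(RAW, SALES_SCHEMA))
total = sum(row["amount"] for row in rows)
print(rows[0])
print(f"Total amount: {total:.2f}")
```

A different analyst could read the same raw text with a different schema, which is what lets business users query data before anyone has modeled it for the warehouse.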

Page 16: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Data Warehouse Optimization - An Iterative Process

•  Identify low-hanging fruit

•  Get buy-in from stakeholders

•  Plan and implement in increments

•  Continuously assess and iterate

Page 17: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Attunity Visibility Data Usage Analysis (Sample)

•  Unused Data (e.g. Tables with no ‘SELECT’ statements)

70 Terabytes in Unused Databases
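
The "tables with no SELECT statements" check can be approximated from a query log. This is a rough sketch and not how Attunity Visibility works internally: the regular expression only catches simple FROM and JOIN references, and the table names and log entries are hypothetical.

```python
import re

# Simplistic table-reference matcher; real SQL needs a proper parser.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def unused_tables(all_tables, query_log):
    """Return tables that no SELECT statement in the log reads from."""
    read = set()
    for sql in query_log:
        if sql.lstrip().upper().startswith("SELECT"):
            read.update(name.lower() for name in TABLE_REF.findall(sql))
    return sorted({table.lower() for table in all_tables} - read)


# Hypothetical catalog and query log.
tables = ["orders", "customers", "orders_2008_backup"]
log = [
    "SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id",
    "INSERT INTO orders_2008_backup VALUES (1, 'x')",
]
unused = unused_tables(tables, log)
print(unused)
```

Tables that are written to but never read, like the backup table here, are the ones that show up as unused.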

Page 18: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Attunity Visibility Data Usage Analysis (Sample)

•  History of data used in large “Fact” table

•  Queries go back only 2 years

•  Maintains 8 years of data
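
If queries touch only the latest 2 of 8 retained years, the older 6 years are archive candidates. The sketch below estimates the archivable share under the assumption that each year holds about the same volume, which the slide does not state; real fact tables usually grow over time, so the true share is likely smaller.

```python
def archivable_fraction(years_retained: float, years_queried: float) -> float:
    """Return the fraction of data older than the queried window.

    Assumes equal data volume per year (an assumption, not a slide figure).
    """
    if years_retained <= 0:
        raise ValueError("years_retained must be positive")
    cold_years = max(years_retained - years_queried, 0)
    return cold_years / years_retained


fraction = archivable_fraction(years_retained=8, years_queried=2)
print(f"{fraction:.0%} of the fact table could move to the active archive")
```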

Page 19: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Attunity Visibility Workload Analysis (Sample)

Almost 60% of CPU to load and ingest data

•  Intensive ETL workloads

Page 20: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Attunity Visibility Workload Analysis (Sample)

The top 100 repetitive SQL statements, out of 101,000 ETL SQL statements, account for 30+% of the CPU consumed by ETL.
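
Finding the most expensive repetitive statements means treating queries that differ only in their literal values as one statement, then ranking by CPU. A minimal sketch of that idea follows; the normalization rules and the sample log are assumptions for illustration, not the method Attunity Visibility uses.

```python
import re
from collections import Counter


def normalize(sql: str) -> str:
    """Replace literals with '?' so repeated statements group together."""
    sql = re.sub(r"'[^']*'", "?", sql)            # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals
    return " ".join(sql.lower().split())


def top_statement_share(log, top_n: int) -> float:
    """Return the % of total CPU used by the top_n normalized statements.

    `log` is an iterable of (sql_text, cpu_seconds) pairs.
    """
    cpu = Counter()
    for sql, cpu_seconds in log:
        cpu[normalize(sql)] += cpu_seconds
    total = sum(cpu.values())
    if total == 0:
        return 0.0
    top = sum(seconds for _, seconds in cpu.most_common(top_n))
    return 100 * top / total


# Hypothetical ETL log entries.
etl_log = [
    ("INSERT INTO stage_sales SELECT * FROM src WHERE day = '2015-07-01'", 40.0),
    ("INSERT INTO stage_sales SELECT * FROM src WHERE day = '2015-07-02'", 40.0),
    ("UPDATE dim_store SET region = 'EU' WHERE id = 17", 15.0),
    ("DELETE FROM tmp_load WHERE batch = 3", 5.0),
]
share = top_statement_share(etl_log, top_n=1)
print(f"Top statement uses {share:.0f}% of ETL CPU")
```

The two daily inserts collapse into one normalized statement, so a single repeated pattern accounts for most of the CPU in this toy log.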

Page 21: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Attunity Visibility – The Data Dashboard

Completely Analyze Workloads And Data Usage

Reduce Cost | Optimize Performance | Justify Investments

User Activity Data Usage Workload Performance

Page 22: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

RCG Success Stories

Top Retailers

•  Completed EDW optimization projects for two large retailers

•  Offloading cold data and ELT to Hadoop

•  Cost savings projected between $6M and $10M

Top Financial Services

•  Currently working with two large Fortune 100 financial companies

•  Offloading 40TB to 60TB of raw data from EDW platforms to Hadoop

•  Re-architecting their batch decision processing with savings between $10M and $15M

Page 23: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

Next Steps…

Download the Hortonworks Sandbox

•  Learn Hadoop

•  Build Your Analytic App

•  Try Hadoop

Learn more about our partnerships

http://hortonworks.com/partner/rcg-global-services/

http://hortonworks.com/partner/attunity/

Page 24: Optimizing your Modern Data Architecture - with Attunity, RCG Global Services and Hortonworks

SAN JOSE June 9-11

BRUSSELS April 15-16

•  Deep-dive technical content
•  65+ sessions and 5 tracks
•  1,000 attendees
•  Sponsorships Available
•  Including Pre and Post event community meetups and BOFs
•  Hadoop training available

•  100+ sessions and 7 tracks
•  Deep-dive technical content
•  5,000 attendees
•  Sponsorships Available
•  Including Pre and Post event community meetups and BOFs
•  Hadoop training available

www.hadoopsummit.org

The Largest Hadoop Community Events in Europe and North America