
Page 1: Modern Data Integration: A Paradigm Shift · files.meetup.com/1831741/CHUG_29APR2015.pdf · CHUG – Packard Place: Charlotte, Apr 29th 2015

Modern Data Integration: A Paradigm Shift

Making a leap from traditional world to modern world of Data Integration

CHUG – Packard Place: Charlotte, Apr 29th 2015

Page 2

Traditional Data Integration – Or Legacy?

1. “Existing approaches to data integration won’t meet future needs as the use of technology continues to change. Drastic measures must be taken now to prepare enterprises for the arrival of this technology, and to position enterprises to take full advantage” [1] – Gaurav Dhillon, ex-CEO of Informatica, CEO of SnapLogic

2. “New approaches to managing data, as well as the rapid growth of data, make traditional data integration technology unusable” [2] – David S. Linthicum, Linthicum Research

3. “Extract, transform and load (ETL) processes have been the way to move and prepare data for analysis within data warehouses, but will the rise of Hadoop bring the end of ETL?” [3] – InformationWeek

4. What Informatica’s Buyout Means to Big Data Integration: “...the news represented a death knell of sorts for “old style” ETL and the recognition that newer data integration technologies and techniques are here to stay...” [4] – Datanami.com

1: http://video.snaplogic.com/iJe/webinar-the-death-of-traditional-data-integration/

2: http://tngconsultores.com/kw/pluginfile.php/140/mod_forum/attachment/983/Death_of_Traditional_DI.pdf

3: http://www.informationweek.com/big-data/big-data-analytics/big-data-debate-end-near-for-etl/d/d-id/1107641?

4: http://www.datanami.com/2015/04/08/what-informaticas-buyout-means-to-big-data-integration/

Page 3

Changing Times and New Challenges

• Massive Data Being Created
• Emerging Data
• Limited Structure
• Data In Motion
• SaaS

Page 4

What comes to mind when we think of Big Data Integration

Pig, Sqoop, Hive, HBase, Spark, MapReduce, Python, Java, Scala

Typical Big Data Project Implementation

Exploratory
• Technology evaluations
• Evaluations focus more on conceptual aspects than on long-term sustainability
• Data integration consists of scripting or using legacy tools
• “Voila, I can sqoop data from ___ to Hadoop”

Pilot
• Take the exploratory work to the next level
• Scale the use cases to more complicated scenarios
• More scripts, more coding, or a larger legacy-tool codebase
• “Why should we look for other tools? _____ is working, right?”

Production
• Enterprise integration with a production-like system
• Several months of development, no standards, no best practices
• Overgrown complexity often leads to either failure or re-engineering
• “To make it right, it will take ___ million dollars”

Page 5

How Diyotta addresses these challenges

Pig, Sqoop, Hive, HBase, Spark, MapReduce, Python, Java, Scala

Keeps existing skillsets relevant to Big Data without the need to learn new and rapidly evolving technologies

Page 6

Diyotta – Leading Modern Data Integration

1. Take the processing to where the data lives

2. Fully leverage all platforms based on what they are designed to do well

3. Move data point-to-point to avoid single server bottlenecks

4. Manage all of the business rules and data logic centrally

5. Make changes quickly using existing rules and logic
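
The first principle above, taking the processing to where the data lives, can be sketched as follows. This is only an illustration and not Diyotta's implementation: it assumes a SQL-capable target, uses Python's built-in sqlite3 as a stand-in for Hadoop or an MPP platform, and the `loans` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

def aggregate_in_client(conn):
    """ETL style: pull every row out of the platform, then aggregate in the tool."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM loans"):
        totals[region] = totals.get(region, 0) + amount
    return totals

def aggregate_in_platform(conn):
    """ELT style: send one instruction; the platform aggregates next to the data."""
    sql = "SELECT region, SUM(amount) FROM loans GROUP BY region"
    return dict(conn.execute(sql))

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("east", 100), ("west", 250), ("east", 50)],
)

# Both approaches agree on the result; only the amount of data moved differs.
print(aggregate_in_client(conn) == aggregate_in_platform(conn))
```

In the second function only the per-region totals leave the platform, which is the point of sending instructions to the data instead of moving the data to a central server.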

Page 7

Deployment Architecture for Hadoop

Drag & drop interface for creating designs to perform Ingestion, Blending, Enrichment, Transformation, Summarization, Load, Provision

Design stages:
• Source: source any data
• Join: join heterogeneous objects
• Transform: use native libraries to transform
• Aggregate: summarize & merge data
• Load: load processed data in Hadoop (HDFS, Hive)
• Provision: export data to provisioning platform

Data flows frictionlessly from source to target

Client modules:
• Browser-based developer studio
• Real-time execution monitoring
• Data lineage
• Security & administration
• Scheduling & orchestration
• Other client modules

[Architecture diagram: design-time metadata and run-time E-L-T instructions. The Data Integration Engine can be deployed on either the edge node or the name node. It sends extract instructions to external source systems and transform & load instructions to the native target system. The Hadoop cluster is shown with a master (name) node, four data nodes, and HDFS.]

Page 8

Paradigm Shift - Modern Data Integration

Design once, use many times

Know your data

Deploy agents, deliver instructions

Optimize actions

Adapt quickly

Page 9

Diyotta – Partners

Diyotta is certified on all Hadoop distributions and MPP platforms

Page 10

Business Problem Statement

Scotiabank
Traded: TSX, NYSE and TTSE
Total Assets: $800B
Global Presence: 55 Countries
Net Income: 8B

Objective:
• Improve data availability and reduce latency for Business Users using Hadoop as the data provisioning platform
• Source data is present in diverse source systems across various banking organizations including Mortgage, Real-Estate, Securities, Retail banking, Collateral Management, etc.
• Provide a comprehensive & holistic view to Business Users, which is currently not possible because data sits unintegrated and siloed across different data platforms
• Reduce the overall cost of Data Management, driven by costly storage in analytical platforms such as Netezza and DB2, plus the additional cost of maintaining large teams to manage data pipelines, data enrichment and data blending

Approach:
• Enable Hadoop as the Data Provisioning Platform to land all data from various source systems into Hadoop
• Identify a light-weight Data Integration solution to ingest, enrich, transform, blend and export data in Hadoop
• Empower Business Users and other applications to access data from Hadoop using Modern Analytics tools

Page 11

Accomplishments

Time to Market
• Lending & Mortgage data (Phase I) development and production in a record 4 weeks
• Wizard-based development accelerators for rapid data migration across various data platforms
• Data delivered to business in less than 1 month as opposed to the 6 months originally planned in the Client roadmap

Cost & Resource Optimization
• Optimizing and containing TCO of data integration within Hadoop: licensing, hardware, maintenance & upgrades
• Leveraging existing SMEs/developers, with no need for additional skills or to pay more for substandard resources
• Maximize ROI: fully leverage the existing data platform with the quickest time to value

Diyotta Difference – Business Value
• Data for over 2 million mortgage customers available for real-time reporting & decision making
• Transformed a technology-driven project into immediate business value
• Future-proofing Data Integration on Hadoop regardless of the underlying Hadoop distribution, today or tomorrow

Page 12

Implementing Data Lake in days

Sources feeding the DATA LAKE: RDBMS, Files, Logs, JSON

Data lake as the emerging approach to speed up the delivery of information and insights to the business without the delays traditionally experienced with cumbersome data warehousing processes.

Manage the data lake
- Automatic target structure creation
- Multiple target options
- Enable metadata discovery
- Standardize data formats based on use-cases
- Allow schema on read
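
Schema on read, the last item in the list above, can be illustrated with a minimal sketch: raw records are stored unchanged, and each consumer applies its own schema when it reads them. The records, field names, and the `mortgage_schema` mapping below are hypothetical and are not taken from the presentation.

```python
import json

# Raw records land in the lake unchanged; their fields may differ (hypothetical data).
raw_lines = [
    '{"customer_id": "17", "balance": "2500.50", "branch": "Toronto"}',
    '{"customer_id": "42", "balance": "310"}',
]

# A schema chosen by one consumer at read time: field name -> (type, default).
mortgage_schema = {"customer_id": (int, None), "balance": (float, 0.0)}

def read_with_schema(lines, schema):
    """Parse raw JSON lines, keeping only schema fields and casting them."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }

rows = list(read_with_schema(raw_lines, mortgage_schema))
print(rows)
```

A different consumer could read the same raw lines with a different schema, for example one that keeps `branch`, without the stored data being reloaded or restructured.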

Do not let your data lake become a data swamp

Page 13

Modern Data Integration – By DI Experts, For DI Experts

• 60 years of data integration experience on the senior executive team.
“Designed by data integration professionals for data integration professionals.”

• Co-founder is the architect of modern data integration.
Sanjay Vyas, author of “An Executive’s Guide to Modern Data Integration”

• Global coverage with customers and engineers on three continents.
“Scale up to handle global concerns; scale down to handle single-location project.”

• Investors and advisors committed to long term success.
“The best minds in DI consulting, delivery, and implementation back Diyotta.”

Page 14

Parameters – Time, Cost, Resources, Functionality

[Chart: Time on the horizontal axis; Cost, Resource, Complexity on the vertical axis]

Diyotta lets you:

• SAVE TIME

• CONTROL COST

• REDUCE COMPLEXITY

Typical big data implementations:
- take months to implement
- require specialized and rare skillsets at premium cost
- introduce complexity with additional functionality, production

Page 15

What’s Next

OPTION 1: Arrange a Deep Dive into Diyotta Offerings

OPTION 2: Provide Access to Data and Platforms to Demonstrate Diyotta Value

OPTION 3: Install Diyotta Evaluation and Guide Internal Usage of the Product