TRANSCRIPT
Modern Data Integration: A Paradigm Shift
Making the leap from the traditional world to the modern world of Data Integration
CHUG – Packard Place, Charlotte · April 29th, 2015
Traditional Data Integration – Or Legacy?
1. “Existing approaches to data integration won’t meet future needs as the use of technology continues to change. Drastic measures must be taken now to prepare enterprises for the arrival of this technology, and to position enterprises to take full advantage.” [1] – Gaurav Dhillon, ex-CEO of Informatica, CEO of SnapLogic
2. “New approaches to managing data, as well as the rapid growth of data, make traditional data integration technology unusable.” [2] – David S. Linthicum, Linthicum Research
3. “Extract, transform and load (ETL) processes have been the way to move and prepare data for analysis within data warehouses, but will the rise of Hadoop bring the end of ETL?” [3] – InformationWeek
4. “..the news represented a death knell of sorts for ‘old style’ ETL and the recognition that newer data integration technologies and techniques are here to stay..” [4] – Datanami.com, “What Informatica’s Buyout Means to Big Data Integration”
1: http://video.snaplogic.com/iJe/webinar-the-death-of-traditional-data-integration/
2: http://tngconsultores.com/kw/pluginfile.php/140/mod_forum/attachment/983/Death_of_Traditional_DI.pdf
3: http://www.informationweek.com/big-data/big-data-analytics/big-data-debate-end-near-for-etl/d/d-id/1107641?
4: http://www.datanami.com/2015/04/08/what-informaticas-buyout-means-to-big-data-integration/
Massive Data Being Created
• Emerging Data
• Limited Structure
• Data In Motion
• SaaS
Changing Times and New Challenges
What comes to mind when we think of Big Data Integration?
Pig · Sqoop · Hive · HBase · Spark · MapReduce · Python · Java · Scala
Typical Big Data Project Implementation
Exploratory
• Technology evaluations
• Evaluations focus mainly on conceptual aspects rather than long-term sustainability
• Data integration consists of scripting or using legacy tools (a typical hand-rolled script is sketched after this list)
• “Voilà, I can sqoop data from ___ to Hadoop”

Pilot
• Take the exploratory work to the next level
• Scale the use cases to more complicated scenarios
• More scripts, more coding, or a larger legacy-tool codebase
• “Why should we look for other tools? ___ is working, right?”

Production
• Enterprise integration with a production-like system
• Several months of development, no standards, no best practices
• Overgrown complexity often leads to either failure or re-engineering
• “To make it right, it will take ___ million dollars”
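The “I can sqoop data to Hadoop” moment usually lives in a one-off script. Below is a minimal, illustrative Python wrapper around the Sqoop CLI; the JDBC URL, table, and HDFS path are hypothetical placeholders, and Sqoop is assumed to be installed and on the PATH.

    import subprocess

    def sqoop_import(jdbc_url, table, target_dir, mappers=4):
        """Shell out to Sqoop to copy one RDBMS table into HDFS."""
        subprocess.run([
            "sqoop", "import",
            "--connect", jdbc_url,          # e.g. jdbc:mysql://dbhost/sales (hypothetical)
            "--table", table,               # source table to copy
            "--target-dir", target_dir,     # HDFS landing directory
            "--num-mappers", str(mappers),  # parallelism of the import
        ], check=True)

    sqoop_import("jdbc:mysql://dbhost/sales", "orders", "/data/raw/orders")

Scripts like this multiply during the pilot phase, and the production phase inherits the maintenance burden.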
How Diyotta addresses these challenges
Pig · Sqoop · Hive · HBase · Spark · MapReduce · Python · Java · Scala

Diyotta keeps your existing skillset relevant to Big Data without the need to learn new, rapidly evolving technologies.
Diyotta – Leading Modern Data Integration
1. Take the processing to where the data lives (a minimal sketch of this pushdown idea follows this list)
2. Fully leverage all platforms based on what they are designed to do well
3. Move data point-to-point to avoid single server bottlenecks
4. Manage all of the business rules and data logic centrally
5. Make changes quickly using existing rules and logic
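As a concrete illustration of principles 1 and 4, here is a minimal Python sketch, not Diyotta’s actual engine, of metadata-driven pushdown: business rules live centrally as design-time metadata, and the engine renders them into run-time HiveQL that executes where the data lives. The mapping structure, table names, and host are hypothetical; PyHive is assumed for connectivity.

    from pyhive import hive

    # Central design-time metadata: one mapping, easy to change and reuse.
    mapping = {
        "source": "raw.orders",
        "target": "curated.daily_orders",
        "rules": {"order_day": "to_date(order_ts)",   # hypothetical rules
                  "revenue": "sum(amount)"},
        "group_by": ["to_date(order_ts)"],
    }

    def render_elt(m):
        """Render run-time E-L-T instructions (HiveQL) from the metadata."""
        cols = ", ".join(f"{expr} AS {name}" for name, expr in m["rules"].items())
        return (f"INSERT OVERWRITE TABLE {m['target']} "
                f"SELECT {cols} FROM {m['source']} "
                f"GROUP BY {', '.join(m['group_by'])}")

    conn = hive.connect(host="hadoop-edge", port=10000)  # hypothetical host
    conn.cursor().execute(render_elt(mapping))           # Hive does the heavy lifting

Changing a business rule means editing the central mapping rather than rewriting pipeline code, which is what “make changes quickly using existing rules and logic” amounts to in practice.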
Deployment Architecture for Hadoop
Drag & Drop interface for creating designs that perform Ingestion, Blending, Enrichment, Transformation, Summarization, Load, and Provision
Source → Join → Transform → Aggregate → Load → Provision (a PySpark sketch of this flow follows)
• Source: source any data
• Join: join heterogeneous objects
• Transform: use native libraries to transform
• Aggregate: summarize & merge data
• Load: load processed data into Hadoop (HDFS, Hive)
• Provision: export data to the provisioning platform
Data flows friction-free from source to target
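For readers who think in code, here is a minimal PySpark sketch of the same Source → Join → Transform → Aggregate → Load flow (Provision would export the result to a downstream platform). Table and column names are hypothetical, not taken from the slides.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    orders = spark.table("raw.orders")         # Source: any data
    customers = spark.table("raw.customers")   # Source: a heterogeneous object

    result = (
        orders.join(customers, "customer_id")                 # Join
              .withColumn("order_day", F.to_date("order_ts")) # Transform
              .groupBy("order_day", "region")                 # Aggregate
              .agg(F.sum("amount").alias("revenue"))          # Summarize & merge
    )

    # Load the processed data into Hadoop (Hive over HDFS)
    result.write.mode("overwrite").saveAsTable("curated.daily_revenue")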
Design-time metadata → run-time E-L-T instructions
Client modules:
• Browser-based developer studio
• Real-time execution monitoring
• Data lineage
• Security & administration
• Scheduling & orchestration
• Other client modules
[Diagram: The Diyotta Data Integration Engine can be deployed on either an edge node or the master/name node. It delivers extract instructions to external source systems and transform & load instructions to the native target system, while the data itself lands directly on the HDFS data nodes.]
Paradigm Shift - Modern Data Integration
Design once, use many times
Know your data
Deploy agents, deliver instructions
Optimize actions
Adapt quickly
Diyotta – Partners
Diyotta is certified on all Hadoop distributions and MPP platforms
Business Problem Statement
Scotiabank · Traded: TSX, NYSE, and TTSE · Total Assets: $800B · Global Presence: 55 countries · Net Income: $8B
Objective:
• Improve data availability and reduce latency for business users, using Hadoop as the data provisioning platform
• Source data resides in diverse source systems across various banking organizations, including Mortgage, Real Estate, Securities, Retail Banking, and Collateral Management
• Provide business users the comprehensive, holistic view that is currently impossible because data sits unintegrated and siloed across different data platforms
• Reduce the overall cost of data management, driven by costly storage in analytical platforms such as Netezza and DB2 plus the cost of maintaining large teams to manage data pipelines, data enrichment, and data blending
Approach:
• Enable Hadoop as the data provisioning platform, landing all data from the various source systems in Hadoop
• Identify a lightweight data integration solution to ingest, enrich, transform, blend, and export data in Hadoop
• Empower business users and other applications to access data from Hadoop using modern analytics tools
Accomplishments
Time to Market
• Lending & mortgage data (Phase I) developed and in production in a record 4 weeks
• Wizard-based development accelerators for rapid data migration across various data platforms
• Data delivered to the business in less than 1 month, versus the 6 months originally planned in the client roadmap

Cost & Resource Optimization
• Optimized and contained the TCO of data integration within Hadoop: licensing, hardware, maintenance & upgrades
• Leveraged existing SMEs/developers; no need to acquire new skills or pay a premium for scarce resources
• Maximized ROI by fully leveraging the existing data platform with the quickest time to value

Diyotta Difference – Business Value
• Data for over 2 million mortgage customers available for real-time reporting & decision making
• Transformed a technology-driven project into immediately delivered business value
• Future-proofed data integration on Hadoop, regardless of the underlying Hadoop distribution today or tomorrow
Implementing Data Lake in days
[Diagram: RDBMS, file, log, and JSON sources feed the Data Lake.]
The data lake is emerging as the approach to speed up the delivery of information and insights to the business, without the delays traditionally experienced with cumbersome data warehousing processes.
Manage the data lake:
• Automatic target structure creation (a sketch of schema inference follows below)
• Multiple target options
• Enable metadata discovery
• Standardize data formats based on use cases
• Allow schema on read
Do not let your data lake become a data swamp
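One way to keep the lake from becoming a swamp is to describe data as it lands. Below is a minimal sketch of “automatic target structure creation”: inferring a Hive external-table DDL from a sample JSON record so the data stays queryable via schema on read. The type mapping, SerDe choice, and paths are simplifying assumptions, not Diyotta’s implementation.

    import json

    # Simplified mapping from Python sample types to Hive column types.
    HIVE_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "STRING"}

    def infer_ddl(table, sample_json, location):
        """Generate external-table DDL from one sample JSON record."""
        record = json.loads(sample_json)
        cols = ",\n  ".join(
            f"{name} {HIVE_TYPES.get(type(value), 'STRING')}"
            for name, value in record.items()
        )
        return (f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
                f"ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\n"
                f"LOCATION '{location}'")

    print(infer_ddl("lake.web_logs",
                    '{"ip": "10.0.0.1", "status": 200, "latency_ms": 12.5}',
                    "/data/lake/web_logs"))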
Modern Data Integration – By DI Experts, For DI Experts
• 60 years of data integration experience on the senior executive team. “Designed by data integration professionals for data integration professionals.”
• Co-founder is an architect of modern data integration: Sanjay Vyas, author of “An Executive’s Guide to Modern Data Integration”.
• Global coverage, with customers and engineers on three continents. “Scale up to handle global concerns; scale down to handle a single-location project.”
• Investors and advisors committed to long-term success. “The best minds in DI consulting, delivery, and implementation back Diyotta.”
Parameters – Time, Cost, Resources, Functionality
[Chart: time plotted against cost, resources, and complexity.]
Diyotta lets you:
• SAVE TIME
• CONTROL COST
• REDUCE COMPLEXITY
Typical big data implementations:
• take months to implement
• require specialized and rare skillsets at premium cost
• introduce complexity with additional functionality in production
What’s Next
OPTION 1
Arrange a Deep Dive into Diyotta Offerings

OPTION 2
Provide Access to Data and Platforms to Demonstrate Diyotta Value

OPTION 3
Install Diyotta Evaluation and Guide Internal Usage of the Product