3 reasons data virtualization matters in your portfolio
DATA VIRTUALIZATION PACKED LUNCH WEBINAR SERIES
Sessions Covering Key Data Integration Challenges Solved with Data Virtualization
Next session
3 Reasons Data Virtualization Matters in Your Portfolio
Thursday, November 16th, 2017 | 11:00am PT | 2:00pm ET
Alberto Pan
Denodo’s CTO
Pablo Alvarez
Denodo’s Director of Product Management
Paul Moxon
Denodo’s Data Architectures & Chief Evangelist
Data Integration – “The Way We Were…”
[Diagram: Operational Data Stores, Staging Area, Data Warehouse, Data Marts, and Analytics and Reporting, linked by three ETL steps]
The Data Integration Challenge
Manually access different systems
IT responds with point-to-point data integration
Takes too long to get answers to business users
[Diagram: consumers (Marketing, Sales, Executive, Support) and data sources (Database, Apps, Warehouse, Cloud, Big Data, Documents, NoSQL)]
“Data bottlenecks create business bottlenecks.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
The Solution – A Data Abstraction Layer
Abstracts access to disparate data sources
Acts as a single repository (virtual)
Makes data available in real-time to consumers
DATA ABSTRACTION LAYER
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
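As an illustration of the abstraction-layer idea above, here is a minimal Python sketch. It is not Denodo's API: the `AbstractionLayer` class, the view names, and the in-memory lists standing in for a CRM and a warehouse are all hypothetical. It only shows consumers going through one access point that reads the sources at query time instead of copying them.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class AbstractionLayer:
    """Hypothetical single access point over several data sources."""

    def __init__(self) -> None:
        self._views: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        # Only the fetch function is stored; no data is copied into the layer.
        self._views[name] = fetch

    def query(self, name: str, **filters: object) -> List[Row]:
        # The source is read at query time, so results reflect its current state.
        rows = self._views[name]()
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


# Stand-ins for two disparate sources (assumed data, for illustration only).
crm_rows = [
    {"customer": "Acme", "region": "EMEA"},
    {"customer": "Globex", "region": "APAC"},
]
warehouse_rows = [{"customer": "Acme", "total_sales": 1200}]

layer = AbstractionLayer()
layer.register("customers", lambda: crm_rows)
layer.register("sales", lambda: warehouse_rows)

emea_before = layer.query("customers", region="EMEA")
crm_rows.append({"customer": "Initech", "region": "EMEA"})  # the source changes
emea_after = layer.query("customers", region="EMEA")        # the change is visible at once
print(len(emea_before), len(emea_after))
```

Because the layer holds only fetch functions, a change in the source appears in the next query without any reload step.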
Summary
• Modern Data Architectures are much more complex than the architectures of just
10 years ago
• Replicating (copying) data into a central repository doesn’t work at this scale or
complexity
• Data Virtualization can provide access to all of your data, in real-time, and
supporting self-service with a common data model (in the context of the
business users)
• Let’s find out how…
Logical Data Warehouse
“The Logical Data Warehouse (LDW) is a new data management
architecture for analytics combining the strengths of traditional
repository warehouses with alternative data management and access
strategy.”
Gartner Hype Cycle for Enterprise Information Management, 2012
The State and Future of Data Integration. Gartner, 25 May 2016
Physical data movement architectures that aren’t designed to
support the dynamic nature of business change, volatile
requirements and massive data volume are increasingly being
replaced by data virtualization.
Evolving approaches (such as the use of LDW architectures) include
implementations beyond repository-centric techniques
DW + Cloud dimensional data
[Diagram labels: Time Dimension, Fact table (sales), Product Dimension, and Customer Dimension; sources: EDW and CRM (SFDC Customer)]
Multiple DW integration
[Diagram labels: Finance EDW and Marketing EDW; Time Dimension, Sales fact, Product Dimension, Region, City, Store, Customer, Fidelity facts]
*Real examples: Nationwide POC, IBM tests
DW Historical offloading
Horizontal partitioning
[Diagram labels: Time Dimension, Fact table (sales), Product Dimension, and Retailer Dimension; the sales fact table is split into Current Sales and Historical Sales; source: EDW]
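A small sketch of horizontal partitioning, assuming the current and historical sales rows sit in separate tables and consumers see a single logical `sales` view. Both tables live in one in-memory SQLite database here purely for brevity. In the scenario on the slide they would sit in different systems, and all table and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two physical partitions of the same fact table (assumed schema).
    CREATE TABLE current_sales (sale_date TEXT, retailer TEXT, amount REAL);
    CREATE TABLE historical_sales (sale_date TEXT, retailer TEXT, amount REAL);

    INSERT INTO current_sales VALUES ('2017-10-01', 'R1', 100.0), ('2017-11-01', 'R2', 50.0);
    INSERT INTO historical_sales VALUES ('2015-03-01', 'R1', 70.0), ('2016-07-01', 'R2', 30.0);

    -- One logical view: consumers query "sales" and never see the split.
    CREATE VIEW sales AS
        SELECT sale_date, retailer, amount FROM current_sales
        UNION ALL
        SELECT sale_date, retailer, amount FROM historical_sales;
    """
)

# A query spanning both partitions.
totals = dict(
    conn.execute(
        "SELECT retailer, SUM(amount) FROM sales GROUP BY retailer ORDER BY retailer"
    ).fetchall()
)

# A query whose date filter only matches rows in the current partition.
recent_count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE sale_date >= '2017-01-01'"
).fetchone()[0]

print(totals, recent_count)
```

The consumer-facing query never mentions which partition a row comes from, which is what lets older data move to cheaper storage without breaking reports.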
Summary
▪ “The LDW is an evolution and augmentation of DW practices, not a replacement”
▪ “A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a
semantic layer can contain many combination of use cases, many business definitions of
the same information”
▪ “The LDW permits an IT organization to make a large number of datasets available for
analysis via query tools and applications.”
Gartner, Magic Quadrant for Data Integration, 2017
The Denodo Platform ... incorporates dynamic query optimization as
a key value point. This capability includes support for cost-based
optimization specifically for high data volume and complexity;... it
has also added an in-memory data grid with Massively Parallel
Processing (MPP) architecture to its platform.
Query Optimization: Example (1)
Naive Strategy (BI Tools, BDI Tools, Simple federation engines):
Obtain Total Sales By Customer Country in the Last Two Years
[Diagram: query plan with union, join, and group by]
• Customers (3M): 3M rows
• Sales this year (290M): 290M rows
• Sales previous years (3B): 300M rows (sales previous year)
• 593M rows through the network
Query Optimization: Example (2)
Denodo Strategy
Obtain Total Sales By Customer Country in the Last Two Years
[Diagram: query plan with group by customer pushed down to each sales source, followed by union, join, and group by]
• Customers (3M): 3M rows
• Sales this year (290M): 3M rows (sales by customer this year)
• Sales previous years (3B): 3M rows (sales by customer previous year)
• 9M rows through the network
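The difference between the two strategies can be simulated in a few lines of Python. This is only a sketch of the aggregation push-down idea, not Denodo's optimizer. The data is synthetic and scaled down (100 customers, 5,000 sales rows per year), and the function names are hypothetical. Rows "transferred" are counted as rows leaving a source.

```python
import random
from collections import Counter

random.seed(0)

# Synthetic, scaled-down stand-ins for the three sources on the slide.
customers = {cid: random.choice(["US", "ES", "DE"]) for cid in range(100)}
sales_this_year = [(random.randrange(100), random.randint(1, 50)) for _ in range(5000)]
sales_prev_year = [(random.randrange(100), random.randint(1, 50)) for _ in range(5000)]


def naive():
    """Pull every row to the integration layer, then union, join, and group."""
    transferred = len(sales_this_year) + len(sales_prev_year) + len(customers)
    totals = Counter()
    for cid, amount in sales_this_year + sales_prev_year:
        totals[customers[cid]] += amount
    return dict(totals), transferred


def partial_by_customer(sales):
    """Stands in for a GROUP BY customer executed inside the source system."""
    partial = Counter()
    for cid, amount in sales:
        partial[cid] += amount
    return partial


def pushdown():
    """Each sales source returns at most one pre-aggregated row per customer."""
    partial_this = partial_by_customer(sales_this_year)
    partial_prev = partial_by_customer(sales_prev_year)
    transferred = len(partial_this) + len(partial_prev) + len(customers)
    totals = Counter()
    for partial in (partial_this, partial_prev):
        for cid, amount in partial.items():
            totals[customers[cid]] += amount  # join with customers, group by country
    return dict(totals), transferred


naive_totals, naive_rows = naive()
pushdown_totals, pushdown_rows = pushdown()
print(naive_rows, pushdown_rows, naive_totals == pushdown_totals)
```

The rewrite is valid because the sum is distributive: summing per customer first and then per country gives the same totals as summing the raw rows per country, while each sales source ships at most one row per customer.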
Query Optimization: Example (and 3)
[Diagram labels: aggregation pushdown (group by customer at each sales source), union, join, group by, Integrated MPP processing]
• 3M rows (sales by customer this year)
• 3M rows (sales by customer previous year)
• 3M rows (customers)

System         Execution Time   Optimization Technique
No Rewriting   20 min           None
Denodo 6       51 sec           Aggregation push-down
Denodo 7       13 sec           Aggregation push-down + MPP integration
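To put the reported figures in proportion, the short calculation below derives speedup factors from the execution times above and the network row totals from the two strategies. The numbers are taken from the slides as given. The variable names are only for illustration.

```python
# Execution times from the table, converted to seconds.
timings_sec = {"No Rewriting": 20 * 60, "Denodo 6": 51, "Denodo 7": 13}
baseline = timings_sec["No Rewriting"]
speedups = {name: baseline / t for name, t in timings_sec.items()}

# Rows moved through the network (in millions), as listed on the slides.
naive_rows_m = 290 + 300 + 3   # sales this year + sales previous year + customers
pushdown_rows_m = 3 + 3 + 3    # two pre-aggregated sales inputs + customers
row_reduction = naive_rows_m / pushdown_rows_m

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster than no rewriting")
print(f"Network rows: {naive_rows_m}M vs {pushdown_rows_m}M ({row_reduction:.1f}x fewer)")
```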
Query Optimization: Summary
▪ You can achieve excellent performance in Logical Analytics Architectures.
▪ Key techniques needed:
▪ Advanced Dynamic Optimization to minimize network traffic and leverage the
power of data sources
▪ In-memory MPP processing to speed operations at the DV layer
▪ Advanced incremental caching for reusing commonly used data and complex
calculations
The Promise of Self-Service Initiatives
• Let business users access the data that they need and stop IT being a bottleneck
• That’s the vision as sold by many BI tool vendors
• i.e. give me the tools and access to the data and stand back ☺
Self-Service Issues…
• Tools are designed for data analysts (or power users)
• Users who are happy finding, wrangling, cleansing data
• Creating calculations, aggregations within the data
• What about the other business users?
• People who don’t want to spend hours fighting the spreadsheet…
• Will they use common definitions for key business entities and
metrics?
• Or will they pick and choose their own?
• Ultimately, can you trust the numbers?
• Where did the data come from? How has it been manipulated?
Rob van der Meulen, Gartner
Gartner predicts that by 2018 most business users
will have access to self-service tools, but that only
one in 10 initiatives will be sufficiently well-
governed to avoid data inconsistencies that
negatively impact the business.
Self-Service with Guardrails
• Don’t build just for the ‘data cowboys’
• Create a common and consistent semantic layer
• Everyone is using the same definitions and metrics
• Create pre-integrated, pre-calculated data services
• Saves the user having to do this themselves
• Ensures consistency of calculations, etc.
• But allow the cowboys to ‘roam and wrangle’
• Even the cowboys can only access ‘approved’ data sources
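The guardrails above can be pictured as a small registry: metric definitions live in one shared place, and every computation is checked against a list of approved sources. The sketch below is hypothetical throughout (the `SemanticLayer` class, the `net_revenue` metric, and the source names are invented for illustration) and does not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, int]
Metric = Callable[[List[Row]], int]


@dataclass
class SemanticLayer:
    """Hypothetical shared definitions plus an allow-list of data sources."""

    approved_sources: Set[str]
    metrics: Dict[str, Metric] = field(default_factory=dict)

    def define_metric(self, name: str, fn: Metric) -> None:
        # One definition per metric name, shared by every consumer.
        self.metrics[name] = fn

    def compute(self, metric: str, source: str, rows: List[Row]) -> int:
        # Guardrail: even power users may only work with approved sources.
        if source not in self.approved_sources:
            raise PermissionError(f"source {source!r} is not approved")
        return self.metrics[metric](rows)


layer = SemanticLayer(approved_sources={"edw.sales"})
layer.define_metric("net_revenue", lambda rows: sum(r["gross"] - r["refunds"] for r in rows))

rows = [{"gross": 100, "refunds": 10}, {"gross": 50, "refunds": 5}]
net_revenue = layer.compute("net_revenue", "edw.sales", rows)

try:
    layer.compute("net_revenue", "laptop_spreadsheet", rows)
    rejected = False
except PermissionError:
    rejected = True

print(net_revenue, rejected)
```

Everyone who asks for `net_revenue` gets the same calculation, and a request against an unapproved source fails loudly instead of silently producing a different number.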
Indiana University – Decision Support Initiative
• Multi-campus public university system in the state of Indiana
• 110,000 students, 8,700 academic staff, 9 campuses statewide
• DSI Goal: To provide timely, relevant, and accurate data to decision makers
within the University system
• Turning disparate data into actionable information
• DSI portal provides a ‘one stop shop’ for key data
• Prepackaged data sets available for users
• Role-based access
• Data provisioned through Denodo Platform
• http://dsi.iu.edu
The Benefits of Data Virtualization
Complete enterprise information, combining Web, cloud, streaming, and structured data
ROI realization within 6 months, with the flexibility to adjust to unforeseen changes
An 80% reduction in integration costs, in terms of resources and technology
Real-time integration and data access, enabling faster business decisions
“Get it Real-time and Get it Fast!”
Next steps
Download Denodo Express: www.denodoexpress.com
Access Denodo Platform on AWS:
www.denodo.com/en/denodo-platform/denodo-platform-for-aws
Thank you!
© Copyright Denodo Technologies and Daman, Inc. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies and Daman, Inc.