Red Hat JBoss Data Virtualization: Bill Kemp, Sr. Solutions Architect
DESCRIPTION
Red Hat is… Today the collaboration between Red Hat and SAP continues. Engineers from both companies are working towards a common target: enhancing the interoperability of JBoss Enterprise middleware with the existing SAP landscape. Specifically, Red Hat and SAP are collaborating on development efforts for tools that are designed to simplify the integration of SAP data and business processes with other enterprise data and applications. The aim of such integration, of course, is a more intelligent enterprise, one that can maximize the value of your data assets in accelerating business decisions. “By running tests and executing numerous examples for specific teams, we were able to prove […] not only would the solution work, but it will perform better & at a fraction of the costs.” MICHAEL BLAKE, Director, Systems & Architecture
TRANSCRIPT
RED HAT JBOSS DATA VIRTUALIZATION
Bill Kemp, Sr. Solutions Architect
August 28, 2014

Innovate faster, in a smarter way
A family of lightweight, enterprise-grade products that are ideal for open hybrid cloud environments.
Red Hat JBoss Middleware
User Interaction: JBoss Portal
Business Process Management: JBoss BRMS, JBoss BPM Suite
Application Integration: JBoss A-MQ, JBoss Fuse, JBoss Fuse Service Works
Development Tools: JBoss Developer Studio
Data Integration: JBoss Data Virtualization
Management Tools: JBoss Operations Network
Foundation: JBoss EAP, JBoss Web Server, JBoss Data Grid
ACCELERATE, INTEGRATE, AUTOMATE

Agenda
Business Problem
Product Overview
Customer Stories
Competition
Prospecting Guidance
Pricing & Promotions

Business Challenges
Data Driven Economy
Data is becoming the new raw material of business: an economic input almost on a par with capital and labor.
"Every day I wake up and ask, how can I flow data better, manage data better, analyze data better?" (CIO, Wal-Mart)

Data Challenges Getting Bigger
Big Data, Cloud, and Mobile
Existing data integration approaches are not sufficient:
Extracting and moving data adds latency and cost
Every project solves data access and integration in a different way
Solutions are tightly coupled to data sources
Poor flexibility and agility
Constant Change: BI Reports; Operational Reports; Enterprise Applications; SOA Applications; Mobile Applications
Integration Complexity: How to align?
Siloed & Complex: Hadoop; NoSQL; Cloud Apps; Data Warehouse & Databases; Mainframe; XML, CSV & Excel Files; Enterprise Apps

Business Objective
Turn Data into Actionable Information
Only 28% of users have any meaningful data access
Reduce costs for finding and accessing highly fragmented data
Improve time to market for new products and services by simplifying data access and integration
Deliver the IT solution agility necessary to capitalize on constantly changing market conditions
Transform fragmented data into actionable information that delivers competitive advantage
Over 70% of BI project effort lies in the integration of source data
Technology Overview

What does Data Virtualization software do?
Turn Fragmented Data into Actionable Information
DATA CONSUMERS
Data Virtualization software virtually unifies data spread across various disparate sources and makes it available to applications as a single consolidated data source.
The data virtualization software implements a three-step process to bridge data sources and data consumers:
Connect: Fast access to data from diverse data sources.
Compose: Easily create unified virtual data models and views by combining and transforming data from multiple sources.
Consume: Expose consistent information to data consumers in the right form through standard data access methods.
Easy, Real-time Information Access: BI Reports, SOA Applications
Data Virtualization Software: Consume, Compose, Connect
Virtual Consolidated Data Source: Virtualize, Abstract, Federate
Easy data accessibility through standard interfaces, e.g. SQL, Web Services, etc.
Exposes non-relational sources as relational
Read and write data in place
Real-time access
No data replication/duplication required

So let's define the attributes of a data virtualization solution. The first thing that a data virtualization product does is virtualize the data, regardless of where it is. It makes the data look as if it were in one place, so applications don't need to know where the data is, because the data virtualization software does that for you.

The second thing that data virtualization does is federate the data. You're running a query which spans multiple databases or data warehouses. You want that query to run efficiently and with optimum performance. To do that, you need a variety of techniques, like caching and pushdown optimization, and you need knowledge of the source databases to make this whole environment run as smoothly and efficiently as possible.

Thirdly, it abstracts the data into the format of choice. It conforms the data so that it's in a consistent format, regardless of the native structure or syntax of the data. One point I should make here is that you don't want a tool which will force you to have a particular format. What you want is a format that suits your business, rather than one which is imposed on you. So the data virtualization tool itself needs to be agile and flexible, in the sense of being able to provide a data format that suits you.

The fourth thing you have a requirement for is to present the data in a consistent fashion. It doesn't matter whether it's a business intelligence application, a mash-up, or a regular application; whatever it is, you want to be able to present the data in a consistent format to the business and to participating applications.

Imagine if all the up-to-date data you need to take informed action were available to you on demand as one unified source. This is the capability provided by Data Virtualization software.

DATA SOURCES (Siloed & Complex): Oracle; DW; SAP; XML, CSV & Excel files; Salesforce.com

Turn Siloed Data into Actionable Information
[Architecture diagram]
Data Consumers: Mobile Applications; BI Reports & Analytics; ESB, ETL; SOA Applications & Portals
Easy, Real-time Information Access
Consume: Standards-based Data Provisioning (JDBC, ODBC, SOAP, REST, OData)
Compose: Unified Virtual Database / Common Data Model (Virtualize, Transform, Federate)
Connect: Native Data Connectivity
Other labels: Design Tools; JBoss Dashboard; Optimization; Data Transformations; Caching; Security; Metadata
Data Sources (Siloed & Complex): Data Warehouse & Databases; XML, CSV & Excel Files; Enterprise Apps; Hadoop; NoSQL; Cloud Apps; Mainframe

The data virtualization software provides a three-step process to connect data sources and data consumers:
Connect: Fast access to data from disparate systems (databases, files, services, applications, etc.) with disparate access methods and storage models.
Compose: Easily create reusable, unified common data models and virtual data views by combining and transforming data from multiple sources.
Consume: Seamlessly expose unified, virtual data models and views in real time through a variety of open-standard data access methods to support different tools and applications.
JBoss Data Virtualization software implements all three steps internally while hiding the complexity of data access methods, transformation, and data merge logic from information consumers. This enables organizations to acquire actionable, unified information when they want it and the way they want it, i.e. at business speed.

Consider... How would your organization change
Inconsistent, Incomplete Information
Uninformed, Delayed Decisions
Costly Business Risk and Exposure
If data were readily reusable in place rather than requiring significant effort to build new intermediary data tiers?
If data could be repurposed quickly into new applications and business processes?
If all applications and business processes could get all of the information needed, in the form needed, where needed and when needed?

JBoss Data Virtualization Use Cases
Self-Service Business Intelligence
The virtual, reusable data model provides a business-friendly representation of data, allowing users to interact with their data without having to know the complexities of their database or where the data is stored, and allowing multiple BI tools to acquire data from a centralized data layer. Gain better insights from Big Data by using JBoss Data Virtualization to integrate with existing information sources.

360° Unified View
Deliver a complete view of master and transactional data in real time. The virtual data layer serves as a unified, enterprise-wide view of business information that improves users' ability to understand and leverage enterprise data.

Agile SOA Data Services
A data virtualization layer delivers the missing data services layer to SOA applications. JBoss Data Virtualization increases agility and loose coupling through virtual data stores, without the need to touch underlying sources, and through data services that encapsulate the data access logic and allow multiple business services to acquire data from a centralized data layer.

Regulatory Compliance
The data virtualization layer delivers data firewall functionality. JBoss Data Virtualization improves data quality via centralized access control, a robust security infrastructure, and a reduction in physical copies of data, thus reducing risk. Furthermore, the metadata repository catalogs enterprise data locations and the relationships between the data in various data stores, enabling transparency and visibility.

Enable Self-Service Business Intelligence
Shared, Reusable Logic = Lighter, Faster Client Development
[Diagram, BI tool centric] Each BI tool (Microsoft, Cognos) carries its own non-sharable and duplicated stack: Presentation Logic; KPI Calculations; Semantic Data Model; Data Security Policy; Data Transformation Logic; Data Integration Logic; Data Access Logic.
[Diagram, with JBoss Data Virtualization] Each BI tool keeps only Presentation Logic; the shared and reusable layer holds KPI Calculations; Semantic Data Model; Data Security Policy; Data Transformation Logic; Data Integration Logic; Data Access Logic.
Data sources in both diagrams: Database; Data Warehouse; ERP App; Cloud App.

360° Unified View
Complete View of Master and Transactional Data in Real-time
[Diagram]
Consumers: BI Reports; CRM Apps; Portal
JBoss Data Virtualization (Shared & Reusable): Unified Customer View; Unified Product View; Unified xBusiness View
Master Data Management Hub: Data Repository; Workflow
Operational Data Sources: Enterprise Apps; DB

Agile SOA Data Services
Shared, Reusable Logic = Lighter, Faster Service Development
[Diagram, without data virtualization] Each Web service carries its own non-sharable and duplicated stack: Business Logic; Semantic Data Model; Data Security Policy; Data Transformation Logic; Data Integration Logic; Data Access Logic.
[Diagram, with JBoss Data Virtualization] Each Web service keeps only Business Logic; the shared and reusable layer holds the Semantic Data Model; Data Security Policy; Data Transformation Logic; Data Integration Logic; Data Access Logic.
Data sources in both diagrams: Database; Data Warehouse; ERP App; Cloud App.

JBoss Data Virtualization Key Business Values
Increase ROA: Improved utilization of data assets; derive more value from existing investments; complements existing systems
Boost Agility: Better/faster than hand coding; faster, less costly than batch data movement; data virtualization provides loose coupling
Improve Productivity: Right data at the right time to the right people; decision support, BI with a complete view of information
Better Information Control: Powerful security, auditing, data firewall; avoid data silo proliferation; central data access and policy, compliance

JBoss Data Virtualization Key Differentiators
Lowest TCO: Cost leadership lowers the adoption barrier; core-based subscriptions provide flexibility across small to large deployments
Openness: Open, community-based innovation; no vendor lock-in
Cloud Ready: Private, public and hybrid cloud deployments
Comprehensive: Integrated with the JBoss Middleware portfolio for an end-to-end business solution; single-vendor support simplifies IT operations
Performance: Fast query processing optimizations, low footprint; comprehensive data provisioning options; quick data visualization through the business dashboard

Customer Success
Self-Service BI and Hybrid data integration use case
Global Biotech Company: Self-Service Data for Self-Service Business Intelligence
Situation/Needs:
Needed to integrate cloud application data (salesforce.com) with on-premise, real-time data (role mgmt, territory mgmt and authentication systems) for operational reporting and monitoring
Need to ensure HIPAA compliance
Need to support multiple BI tools
Solution:
Used Data Virtualization to provide a unified interface to data for multiple BI tools
Virtual views isolate BI applications from changes in the source data systems
Single point of data access ensured security policy enforcement and HIPAA compliance
Benefits:
Enabled business users to use the BI tools of their choice while IT ensured better control of information
Rapid development cycle through the use of common data models
Sensitive data is protected to meet strict compliance requirements
[Diagram] Consumers: Portal; Spotfire; Business Objects; Crystal Reports. JBoss Data Virtualization (Consume, Compose, Connect). Connections: Web Service; JNDI; JDBC. Sources: Role Membership; LDAP Server; Cloud CRM; Navigator Security.

Regional Bank Single View of Loans Processing
Unified 360° view use case
Situation/Needs:
Thousands of loans in process
Management seeks visibility and control, while loan operations needs to speed up funding steps
Loan data spread across many databases/systems
Solution:
Consolidate all data into a virtual data mart
Transformation of data differences
Provide real-time data access to the management portal and loan workflow system
Benefits:
Management gets timely information on funding needs, exposure and operating metrics
Loan officers receive all the information needed to process the loan faster
Sensitive data is protected
[Diagram] Consumers (via Web Services): Management Reporting; Loan Processing Workflow Mgmt. JBoss Data Virtualization (Consume, Compose, Connect). Sources (via Web Services): Loan Origination & Approvals; Risk Analysis; Loan Funding.

Large US Bank VISA Data Security & Governance
Data firewall use case
Situation/Needs:
VISA PCI mandates protection of cardholder info
Can't maintain a common security policy across multiple data stores
Solution:
Create a data firewall across many data sources
Federate rather than replicate
Common access policy across all sources
Common data definitions
Audit trail
Benefits:
One set of data security policies
Can prove to regulators that data is protected
[Diagram] Web Portal; JBoss Data Virtualization (Consume, Compose, Connect); Data Sources

Multinational Insurance Company SOA Data Services Layer
Agile SOA Data Services use case
Situation/Needs:
Deploying SOA reference architecture
Want a common data model across all sources
Don't want tightly bound physical data sources
Change data sources without breaking apps/services
Solution:
All data is accessed via data services
Data Virtualization provides abstraction and a logical data model for the enterprise
Expose data as Web services and SQL
Benefits:
All applications will get the same data through use of a common model
Easier to expose data to new applications
Easier to make changes to data sources
[Diagram] SOA Applications; SOA/ESB; JBoss Data Virtualization (Consume, Compose, Connect); Data Sources

Gain Better Insight from Big Data: Intelligent Inventory Management
Big Data integration use case
Objective: Right merchandise, at the right time and price
Problem: Cannot utilize social data and sentiment analysis with their inventory and purchase management system
Solution: Leverage JBoss Data Virtualization to mash up sentiment analysis data with inventory and purchasing system data. Leveraged BRMS to optimize pricing and stocking decisions.
[Diagram] Analytical Apps; JBoss BRMS (Data Driven Decision Management); JBoss Data Virtualization (Consume, Compose, Connect). Sources: Hive; Purchase Mgmt Application; Inventory Databases; Sentiment Analysis.

Better Together - Big Data and Data Virtualization
Big Data is not another silo: customers combine multiple technologies
Combine structured and unstructured analysis: augment the data warehouse with additional external sources, such as social media
Combine high-velocity and historical analysis: analyze and react to data in motion; adjust models with deep historical analysis
Reuse structured data for analysis: experimentation and ad-hoc analysis with structured data

Better Together - Big Data and Data Virtualization
Capture, Process and Integrate Data Volume, Velocity, Variety
[Diagram]
Consumers: BI Analytics (historical, operational, predictive); SOA Composite Applications
Integrate & Analyze: Data Integration (JBoss Data Virtualization); In-memory Cache (JBoss Data Grid)
Capture & Process: Messaging and Event Processing (JBoss A-MQ and JBoss BRMS); Hadoop
Red Hat Enterprise Linux & Virtualization; Red Hat Storage
Inputs: Structured Data; Streaming Data; Semi-Structured Data

Product Details
JBoss Data Virtualization: Supported Data Sources
Enterprise RDBMS: Oracle, IBM DB2, Microsoft SQL Server, Sybase ASE, MySQL, PostgreSQL, Ingres
Enterprise EDW: Teradata, Netezza, Greenplum
Hadoop: Apache, HortonWorks, Cloudera (more coming)
Office Productivity: Microsoft Excel, Microsoft Access, Google Spreadsheets
Specialty Data Sources: ModeShape Repository, Mondrian, MetaMatrix, LDAP
NoSQL: JBoss Data Grid, MongoDB
Enterprise & Cloud Applications: Salesforce.com, SAP
Technology Connectors: Flat Files, XML Files, XML over HTTP, SOAP Web Services, REST Web Services, OData Services

Key New Features and Capabilities
Data connectivity enhancements: Hadoop integration (Hive Big Data), NoSQL (MongoDB Tech Preview) and JBoss Data Grid; OData support (SAP integration)
Developer productivity improvements: New VDB Designer 8 and integration with JBoss Developer Studio v7; enhanced column-level security, VDB import/reuse, and native queries
Simplified deployment and packaging: Requires JBoss EAP only (included with subscription); removes the dependency on SOA Platform
Business Dashboard: New rapid data reporting/visualization capability

Business Dashboard
Quickly Visualize your Data

OData Support
OData (OASIS Open Data Protocol)
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=odata
Objective: The OASIS OData TC works to simplify the querying and sharing of data across disparate applications and multiple stakeholders for re-use in the enterprise, cloud, and mobile devices. A REST-based protocol, OData builds on HTTP, AtomPub, and JSON, using URIs to address and access data feed resources. It enables information to be accessed from a variety of sources including (but not limited to) relational databases, file systems, content management systems, and traditional Web sites.
Data Services v6 supports OData in two ways:
Connect to and access OData sources
Act as an OData server to client applications

Data Virtualization Designer
Model Driven Development
Eclipse-based graphical tool for modeling, analyzing, integrating, resolving semantic differences, and testing multiple data sources to produce relational, XML and Web service views that expose your business data without any programming.
Shows structural transformations and dependencies
Defines transformations

Metadata Repository & Governance
S-RAMP: SOA Repository Artifact Model & Protocol
OASIS specification that defines:
a common data model for repositories
an interaction protocol to facilitate the use of common tooling and sharing of data
S-RAMP repository capabilities:
Store and retrieve content and metadata
Classification of artifacts (e.g. XSD, WSDL, VDB, ...)
Clients interact via ATOM/REST
XPath2-based query language
Integration with Maven
[Diagram] ATOM Binding (REST); Core Model; Documents; Derived Models (Read Only); JCR Storage (ModeShape + Infinispan)

Semantic Mediation & Integration
[Diagram]
Consumers: Business Intelligence Applications; Search Applications; Web Services
Application views of information: Relational, XML (XML document: Claims, Billing, Policies)
Semantic Data Services
Data Dictionary: based on logical data model or XML schema; support for multiple COIs; support for multiple versions
Example fields: bldg_id, SITENUM, Facility_ID, Location_ID, bldg_type, Depot_Number, Location_Type
Authoritative Sources: mapped to logical view
Multiple internal/external information sources

JBoss Data Virtualization Logical Architecture
[Diagram] Data Consumers; Data Sources

JBoss Data Virtualization System Flow
Tooling, VirtualDB, Engine, Server

Users create data models based on metadata:
Imported from data sources
Supplied via DDL
Provided by Engine
Specified by user
Models are packaged in a Virtual Database (VDB).
[Diagram label] Connector Binding Properties

VDB Internals
Virtual Databases (VDBs) are deployment archives similar to .WAR files.
VDBs contain:
Source metadata and models
View metadata and models
System metadata
Connection information, which is bound to sources at deployment time
VDBs are deployed to the query engine.
[Diagram labels] Source Models; View Models; Manifesto Info; Connector Binding Properties

Query Engine
The Query Engine is the core data virtualization functionality: a federating relational query engine with a rule- and cost-based optimizer, advanced query planner, caching, and hint processing.
The Query Engine hosts VDBs, binds to data sources, and performs query execution and results processing.
[Diagram labels] Data Consumer Apps; JDBC API; Query Engine; VDB; C1; C2; Connector Binding (1); Connector Binding (2); Oracle DB; SQL Server DB; Admin Socket Transport
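
The federating behaviour described for the query engine can be illustrated with a small stand-in. The sketch below is not JBoss Data Virtualization or Teiid code: it assumes Python's built-in sqlite3 module, with two attached in-memory databases playing the role of two bound sources. The names crm, erp, customers and orders are hypothetical. It only shows the idea that a consumer issues one SQL statement over one connection while the data lives in separate stores.

```python
import sqlite3


def build_federated_connection():
    """Open one connection that fronts two independent stand-in sources."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database. They stand in for
    # two physical sources bound to one virtual database (hypothetical names).
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS erp")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO erp.orders VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    return conn


def total_by_customer(conn):
    """Run a single query that joins rows held in the two stand-in sources."""
    sql = """
        SELECT c.name, SUM(o.amount)
        FROM crm.customers AS c
        JOIN erp.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(total_by_customer(build_federated_connection()))
```

The consumer code in total_by_customer never names a host, driver or file; in this toy setup only the schema prefixes hint that two stores are involved, and a virtual view layer would normally hide those as well.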
[Diagram labels] JBoss EAP; Applications; Security (JAAS); TransactionManager; JDV Runtime Engine (BufferMgr, Threading, Local Caches, etc.); VDBs; ODBC Socket Transport; Admin Socket Transport; JDBC Socket Transport; Profile Service; ODBC; JDBC; Admin / AdminShell; RHQ; DS; JCA; Translators; Embedded DS; xxx-ds.xml; yyy-ds.xml; zzz-ds.xml
The server runtime environment is JBoss EAP. The Teiid query engine is hosted in JBoss EAP and uses key container-provided services:
Transaction manager
JAAS security framework
Container-managed data sources
EAP management infrastructure
EAP deployment
The server exposes views/services to consumers and manages connections and connection pools for data sources.

Rich Security Capabilities
Multiple forms of authentication:
Client authentication: LoginModules (File, LDAP); Kerberos (JDBC/ODBC); HTTP Basic, WS UsernameToken Profile (Web Services)
PassThrough authentication
Source authentication: source credentials, Caller Identity (same credentials as client), RoleBasedCredentialMap (credentials per role), execution payload/custom
Authorization:
Create, Read, Update, Delete, Execute permissions
Row-based security
Column masking
Additional security:
Transport encryption (SSL: Anon, 1-way, 2-way)
Password encryption

Transactions Support
All scopes are handled by JBoss Transactions JTA.
Three scopes:
Global (through XAResource)
Local (autocommit = false)
Command (autocommit = true)
Command scope behavior is handled through txnAutoWrap={ON|OFF|DETECT}.
Isolation level is set on a per-connector basis.

Customization & Extensibility
Many forms of customization available:
Extended connectors/translators
New connectors/translators
User-defined functions
Custom logging
Administrative API
XML-based virtual database, DDL support
Custom metadata injection
Embeddable engine
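
As a rough illustration of what a user-defined function adds, the sketch below registers a custom scalar function with a SQL engine and calls it from a query. It assumes Python's built-in sqlite3 module as a stand-in and is not the JBoss Data Virtualization UDF mechanism; the function name digits_only and the contacts table are hypothetical.

```python
import re
import sqlite3


def digits_only(value):
    """Strip every non-digit character; pass NULL through unchanged."""
    if value is None:
        return None
    return re.sub(r"\D", "", value)


def open_store():
    """Create a stand-in store and register the custom function on it."""
    conn = sqlite3.connect(":memory:")
    # After registration, SQL statements can call digits_only like a built-in.
    conn.create_function("digits_only", 1, digits_only)
    conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?)",
        [("Ito", "(919) 555-0100"), ("Cole", "919.555.0101"), ("Park", None)],
    )
    return conn


def normalized_phones(conn):
    """Every consumer gets the same formatting by calling the shared function."""
    sql = "SELECT name, digits_only(phone) FROM contacts ORDER BY name"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(normalized_phones(open_store()))
```

Keeping the formatting rule in one registered function, rather than in each client, is the point of the example; how such a function is packaged and deployed in a real server is outside this sketch.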
Performance Optimization: Load Handling
Memory usage: the BufferManager acts as a memory manager for batches (with passivation) to ensure that memory will not be exhausted.
Non-blocking source queries: rather than waiting for source query results, processor threads detach from the plan and pick up a plan that has work.
Time slicing: plans produce batches for a time slice before re-queuing and allowing their thread to do other work (preemptive control only between batches).
Caching: ResultSets, processing plans, internal materialized views, etc.

Performance Optimization: Caching & Materialized View
[Diagram labels] Virtual Table; Source Table; Materialized Table; Oracle; SQL Server; Files (XML, text, etc.); Result Set Cache; In-coming Query; Cached? (Yes/No); Results; Save?; Materialization Support; Virtual Database; JBoss Data Virtualization Server
Multiple levels of caching to meet performance requirements and manage load on source systems:
Materialized Views: external or internal materialized views; ability to override use of materialized views
Result Set Caching: applied to results returned from user queries and virtual procedure calls; configurable time to live and maximum number of entries
Code Table Caching: suited for integrating reference data with transaction/operational data, e.g. country code, state code, etc.
Caching hints to set time-to-live, memory preference, and updatability

Performance Optimization: Query
Access patterns: criteria requirements on pushdown queries
Pushdown: decompose the user query into source queries
Projection minimization to remove unused select items
Decompose aggregates over joins/unions
Generating SQL matching Teiid system functions
Dependent joins (can use hints): feed equi-join values from one side of the join to the other
Partition-aware aggregation and joins
Optional join (can use hints): removes an unused join child
Multi-source models: allow multiple homogeneous schemas to be used through the same model
Copy criteria: uses criteria transitivity to minimize join tuples
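
Three of the techniques listed above (pushdown, projection minimization and dependent joins) can be sketched by hand. This is a simplified illustration under stated assumptions, not the Teiid planner: two separate sqlite3 connections stand in for two sources, and the function engineers_born_since, the table layouts and the log list are all hypothetical. The year filter uses SQLite's strftime because SQLite has no year() function.

```python
import sqlite3


def demo_sources():
    """Build two independent stand-in sources (hypothetical data)."""
    dept_src = sqlite3.connect(":memory:")
    dept_src.execute("CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)")
    dept_src.executemany(
        "INSERT INTO departments VALUES (?, ?)",
        [(10, "Engineering"), (20, "Sales")],
    )
    emp_src = sqlite3.connect(":memory:")
    emp_src.execute(
        "CREATE TABLE employees "
        "(title TEXT, lastname TEXT, birthday TEXT, dept_id INTEGER)"
    )
    emp_src.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [
            ("Dr", "Ito", "1975-03-02", 10),
            ("Ms", "Cole", "1965-07-19", 10),
            ("Mr", "Park", "1980-01-05", 20),
        ],
    )
    return dept_src, emp_src


def engineers_born_since(dept_src, emp_src, dept_name, year, log):
    """Join across two sources while shipping as few rows as possible."""
    # Pushdown: the department filter is evaluated by the source itself.
    # Projection minimization: only the join key is requested.
    dept_sql = "SELECT dept_id FROM departments WHERE dept_name = ?"
    log.append(dept_sql)
    dept_ids = [row[0] for row in dept_src.execute(dept_sql, (dept_name,))]
    if not dept_ids:
        return []
    # Dependent join: the keys found on one side are fed into the query
    # sent to the other side, so non-matching employees never leave it.
    marks = ", ".join("?" for _ in dept_ids)
    emp_sql = (
        "SELECT title, lastname FROM employees "
        f"WHERE dept_id IN ({marks}) "
        "AND CAST(strftime('%Y', birthday) AS INTEGER) >= ? "
        "ORDER BY lastname"
    )
    log.append(emp_sql)
    return emp_src.execute(emp_sql, (*dept_ids, year)).fetchall()


if __name__ == "__main__":
    sent = []
    departments, employees = demo_sources()
    print(engineers_born_since(departments, employees, "Engineering", 1970, sent))
    print(sent)
```

The log list records the SQL actually sent to each source, which makes it easy to see that filters and column lists travel to the data rather than the data travelling to the filter. A real optimizer would choose between this and other strategies using costs and hints.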
Performance Optimization: Query Planning
Distinct phases: parse, resolve, validation, rewrite, optimization, process plan creation.
Rewrite canonicalizes and simplifies.
The optimization phase follows with rules/hints/costing.
Non-federated optimization is similar to a mature RDBMS.
Optimizer plan structure is a flexible tree, distinct from the command form and processing plans.
Planning is typically quick and deterministic; prepared plans are recommended.

Thank You
Q&A

Additional Position Slides
Integration Technologies

Integration Technologies: When to use What?
[Chart labels] Data Virtualization; Service Oriented (ESB); Extract, Transform, Load (ETL); Responsiveness (Real Time, Batch); Integration Style (Data, Process)

SOA-Centric Integration
Data Virtualization Complements SOA-Centric Integration (ESB)
Our key message is that SOA-centric approaches to implementing data integration/synchronization require large amounts of service/workflow development and result in solutions with lots of moving parts, which can benefit from a model-based data virtualization technology that requires no data integration coding.

SOA-Centric Integration | Data Virtualization
Multi-step process or workflow development using graphical tooling | Real-time transactional access to data across multiple heterogeneous data sources for operational data needs
Data is treated as a special type of step that typically contains a SQL statement to execute against a source | Specialized, graphical tooling for easy mapping between different models of data
Resulting approach is static and cannot be queried | On-demand, query-able access and update of real-time, up-to-date data
Relational or XML data only | Any data source

Data Virtualization Complements Extract, Transform, Load (ETL)
Our key message is that most operational data consumption problems cannot be solved with a data warehouse but instead require specific tooling and technology focusing on model-based data consumption, integration, and exchange.

ETL | Data Virtualization
Bulk / batch data operations for data consolidation, reporting and analysis | Real-time bi-directional access to data across multiple heterogeneous data sources for operational and analytical data needs
Involves periodically moving / copying / consolidating large amounts of data | No moving or copying of data required; finer-grained operational data sets
No on-demand access to real-time data | On-demand access and update of real-time, up-to-date data
Limited data sources only (relational, structured files) | Any data source

Additional Position Slides
Top 10 Ways Data Virtualization enables Agile business intelligence development

#1 Data Flattening - Simplified Tables
The table structures implemented in a data store might be complex
to access for the data consumers. This leads to complex queries for
retrieving data and that complicates application development. Data
virtualization could present a simpler and more appropriate table
structure, simplifying application development and maintenance.
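
A minimal sketch of the flattening idea, assuming Python's built-in sqlite3 module as a stand-in for the virtualization layer: the store keeps a normalized person/address pair of tables, and a view presents one simple row per person. The table names, the view name person_flat and the sample rows are hypothetical.

```python
import sqlite3


def build_store():
    """Create a normalized store plus a flattened view over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, last_name TEXT);
        CREATE TABLE address (person_id INTEGER, kind TEXT, city TEXT);
        INSERT INTO person VALUES (1, 'Ito'), (2, 'Cole');
        INSERT INTO address VALUES
            (1, 'home', 'Raleigh'), (1, 'work', 'Durham'), (2, 'home', 'Boston');

        -- The view hides the join and the pivot from data consumers.
        CREATE VIEW person_flat AS
        SELECT p.person_id,
               p.last_name,
               MAX(CASE WHEN a.kind = 'home' THEN a.city END) AS home_city,
               MAX(CASE WHEN a.kind = 'work' THEN a.city END) AS work_city
        FROM person AS p
        LEFT JOIN address AS a ON a.person_id = p.person_id
        GROUP BY p.person_id, p.last_name;
        """
    )
    return conn


def flat_rows(conn):
    """What a consumer writes: a plain SELECT with no joins."""
    sql = "SELECT last_name, home_city, work_city FROM person_flat ORDER BY person_id"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(flat_rows(build_store()))
```

The consumer query stays the same if the underlying tables are later reorganized, as long as the view definition is updated to match.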
Every data consumer can benefit from those simplified table structures.

#2 Tools Agnostic Common Data Model
[Diagram] Data Consumers: Jaspersoft; Cognos; Business Objects; Microsoft. Reusable, Common, Semantic Data Model. JBoss Data Virtualization Virtual DB. Data Sources.
Data Virtualization provides a unified semantic layer. What that means is that it doesn't matter what BI tool you're using. The fact is that most large organizations have multiple BI tools. In theory, it might be a good idea if they standardized on a single one, but in practice that's probably not gonna happen. What the data virtualization layer allows you to do is to have a single interface which supports all of those BI tools. You shouldn't have to change the way that you have a query running in Cognos or Business Objects, or whatever tool you happen to use. You should be able to run it exactly as it runs now, hit the data virtualization layer, and that will provide the data for you.

#3 Centralized Data Transformation
[Diagram] Data Consumers: Report 1; Report 2; Report 3; Report 4. JBoss Data Virtualization: format consistency, (123) 123/456/7890 123,456,7890 [123]. Data Sources.
Particular data values in a data store might have formats that aren't suitable for some data consumers. Imagine that most data consumers want to process telephone number values as pure digits and not in the form in which the area code is separated from the subscriber number by a dash. A data virtualization server could implement this transformation, and all the data consumers would use it.

#4 Centralized Business KPIs & Metrics Calculations
[Diagram] Data Consumers: BI App 1; BI App 2; BI App 3; BI App 4. JBoss Data Virtualization: Net Profit; Operating Margin; Net Sales. Data Sources.
Similarly, if multiple data consumers have to access multiple data stores, each and every data consumer has to include code that is responsible for calculating business metrics, and each may apply different calculation rules to data from those data stores. The consequence is a lot of variation in business metric formulas and calculation methods. A data virtualization server centralizes the key business metric calculation code, and all data consumers share that code.

#5 Centralize Data Integration
[Diagram] Data Consumers: BI App 1; BI App 2; BI App 3; BI App 4. JBoss Data Virtualization: Virtual Customer Master; Virtual Master Data; Virtual Product Master. Data Sources.
If multiple data consumers have to access multiple data stores, each and every data consumer has to include code that is responsible for integrating those data stores. The consequence is a lot of replication of data integration solutions. A data virtualization server centralizes integration code, and all data consumers share that integration code.

#6 Ubiquitous Data Consumption
BI App 1   BI App 2   BI App 3   BI App 4
Data Consumers
JDBC, ODBC, SOAP, REST, XML, JMS, POJO, Hibernate
JBoss Data Virtualization
Standards-based Provisioning

Different data stores might be using different storage formats.
For example, some of the data might be stored in a SQL database,
some in Excel spreadsheets, some in indexed sequential files, some
in databases supporting database languages other than SQL, some in
XML documents, and some of the data might even be hidden in
HTML-based webpages. A data virtualization server can offer one
unified API and database language to access all these different
storage formats, therefore simplifying data access for the data
consumers. They will only have to support one language and API.

Data Sources

#7 Optimized Data Access
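The cost difference between join strategies can be sketched in a few
lines of Python. Two in-memory lists stand in for separate data stores;
one function ships every row to the consumer before joining, the other
pushes the filter down so that only matching rows travel. All names and
rows here are hypothetical, and the "rows shipped" counter is a
simplification of what a real cost-based optimizer would weigh.

```python
# Two hypothetical data stores, modeled as in-memory lists of rows.
departments = [
    {"dept_id": 1, "dept_name": "Engineering"},
    {"dept_id": 2, "dept_name": "Sales"},
    {"dept_id": 3, "dept_name": "HR"},
]
employees = [
    {"lastname": "Adams", "dept_id": 1},
    {"lastname": "Baker", "dept_id": 2},
    {"lastname": "Chen", "dept_id": 1},
    {"lastname": "Diaz", "dept_id": 3},
    {"lastname": "Evans", "dept_id": 2},
]


def join_at_consumer(dept_name):
    """Ship every row from both stores, then join and filter locally."""
    shipped = len(employees) + len(departments)
    names = [
        e["lastname"]
        for e in employees
        for d in departments
        if e["dept_id"] == d["dept_id"] and d["dept_name"] == dept_name
    ]
    return names, shipped


def join_with_pushdown(dept_name):
    """Let each store apply the filter so only matching rows are shipped."""
    matching_depts = [d for d in departments if d["dept_name"] == dept_name]
    dept_ids = {d["dept_id"] for d in matching_depts}
    matching_emps = [e for e in employees if e["dept_id"] in dept_ids]
    shipped = len(matching_depts) + len(matching_emps)
    return [e["lastname"] for e in matching_emps], shipped


naive_names, naive_shipped = join_at_consumer("Engineering")
pushed_names, pushed_shipped = join_with_pushdown("Engineering")
```

Both strategies return the same employees; they differ only in how many
rows cross the wire, which is the decision the slide assigns to the
virtualization server rather than to the developer.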
Federating relational query engine
Rule- and cost-based optimizer, advanced query planner
Multi-level caching
Pushdown queries

When data from multiple data stores is joined, a performance
question is where and how the join is processed: is all the data
first shipped to the data consumer, which then processes the join,
or should the data from one data store be shipped to another, so
that the other data store processes the join? Other processing
strategies exist. A developer should not have to be concerned with
such an issue. Therefore, this is a task taken over by a data
virtualization server.

#8 No Data Latency

Virtual Table
    SELECT e.title, e.lastname
    FROM Employees AS e
    JOIN Departments AS d ON e.dept_id = d.dept_id
    WHERE year(e.birthday) >= 1970
      AND d.dept_name = 'Engineering'

A data virtualization product can integrate data live. So, when a
data consumer queries data, only then is data from the data stores
retrieved and integrated. Compare this to ETL solutions, which
integrate data on a schedule. The result of an ETL integration
process has to be stored before it can be used for reporting. Live
data integration is called on-demand data integration, whereas ETL
delivers scheduled data integration.

Data Source(s)

#9 Minimize Need for Data Replication and Duplication
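The contrast between a copied (physical) data set and a virtual one can
be shown with SQLite from Python's standard library. This is only an
analogy: a single in-memory database stands in for separate source
systems, and the table and view names are invented for the example. A
table created by copying plays the role of a batch-loaded data mart,
while a view plays the role of a virtual table evaluated at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO orders (amount) VALUES (10.0), (20.0)")

# Physical approach: copy the rows once, as a scheduled batch load would.
conn.execute("CREATE TABLE orders_mart AS SELECT * FROM orders")

# Virtual approach: a view that reads the source whenever it is queried.
conn.execute("CREATE VIEW orders_virtual AS SELECT * FROM orders")

# A new row reaches the source after the batch load has already run.
conn.execute("INSERT INTO orders (amount) VALUES (30.0)")

physical_total = conn.execute(
    "SELECT SUM(amount) FROM orders_mart"
).fetchone()[0]
virtual_total = conn.execute(
    "SELECT SUM(amount) FROM orders_virtual"
).fetchone()[0]
```

The copied table stays at its load-time total until the next batch run,
whereas the view reflects the new row immediately.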
Activities required to set up a physical vs. virtual data mart

Physical data mart:
  Define Data Structure
  Define ETL Logic
  Prepare HW Server
  Install and Configure RDBMS
  Create Database
  Physical DB Design and Tuning
  Load Tables and Setup Batch Updates
  Requires DBA, Developer to maintain and manage

VS.

Virtual data mart:
  Design Data Structure
  Define Mappings
  Define Virtual Tables
  Enable Caching (if needed)

#10 Centralize Security

Data Sanitization
Column-level masking
Access and audit control
Centralize compliance policies

Appendix
Large Investment Bank Dashboard Derivatives Trading

BI Use Case

Situation / Need:
  Monitor derivatives security trades to prevent rogue trades and
  financial loss
  Trading data spread across many databases/systems

Solution:
  Consolidate all trading data into a single view
  Real-time access
  Transformation of data differences

Benefits:
  Prevent financial loss
  Saved time and cost to develop application
  Easier to manage data changes

Dashboard Custom App
JBoss Data Virtualization
Consume   Compose   Connect
Data Sources
Large Financial Services Institution Single View of Customer

Unified 360° View Use Case

Situation / Needs:
  600 different brokerage offices, 600 databases
  Can't access account information from other offices
  Can't manage customer, only individual accounts

Solution:
  Enable a CRM application to find customer information with a
  single query across all databases
  Real-time access

Benefits:
  Better manage customer
  Simpler/faster application development

Brokerage CRM App
JBoss Data Virtualization
Consume   Compose   Connect
600 geographically dispersed DBs
Competitive Landscape

Platform Competitors
  IBM (InfoSphere Federation Server)
  Oracle (Oracle Data Service Integrator)

Strengths:
  Comprehensive offerings but require multiple SKUs

Weakness (Exploit/Attack):
  Extremely expensive
  Complexity requires lots of services
  Proprietary

Only Open Source Data Virtualization:

Lowest TCO for broad adoption compared to competing solutions,
especially as customers are looking for ways to reduce spending.

Lower business risk due to open, community-based technology. No
vendor lock-in.

Outperforms competitive solutions: faster query performance, most
comprehensive data provisioning options, and simple data
visualization through the dashboard.

Comprehensive Solution: JBoss Data Virtualization is fully
integrated and certified with the JBoss stack. It is part of a more
comprehensive offering than those from pure-play vendors, providing
shorter time to value.
Pure-Play Competitors

Informatica (Power Center Data Virtualization Edition)

Strengths:
  Integrated ETL and Data Virtualization offering
  Integrated Data Quality support
  Data Integration leadership

Weakness (Exploit/Attack):
  Always push ETL first
  Extremely expensive
  Proprietary

Composite Software (Composite Information Server)
Denodo (Denodo Platform)

Strengths:
  Easy to use tools
  Broad Connectivity
  Performance

Weakness:
  Lack of comprehensive platform
  Weak data provisioning support

TCO for Mass Adoption: Lower TCO and pricing compared to competing
solutions, especially as customers are looking for ways to reduce
spending. Core-based subscriptions are easy to understand and
provide flexibility across small to large deployments.

Lower business risk due to open, community-based technology: No
vendor lock-in. Note that many government organizations have a
stated preference for open source products.

Outperforms competitive solutions: faster query performance, more
provisioning options that simplify data consumption, and a
dashboard that helps with data reporting and visualization.

Comprehensive Solution: JBoss Data Virtualization is fully
integrated and certified with the JBoss stack. It is part of a more
comprehensive offering than those from pure-play vendors, providing
shorter time to value.
Model Driven Development

Data Virtualization Designer

Logical Models representing virtual, unified data views
Shows structural transformations and dependencies
Defines transformations with:
  Selects, Joins, Criteria, Functions, Unions, User Defined
Physical Models representing actual data sources

Eclipse-based graphical modeling tool for modeling, analyzing,
integrating and testing multiple data sources to produce
Relational, XML and Web Service Views that expose your business
data without programming.
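The relationship between physical models and a logical model can be
approximated with a small SQLite sketch run from Python. Two tables play
the role of physical models for separate sources, and a view built from
a union plus a function plays the role of the logical, unified model.
The table, column, and view names are invented for illustration, and
SQLite is only a stand-in here, not the designer tool described on the
slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Physical models: two sources with different column names.
    CREATE TABLE crm_customers (id INTEGER, full_name TEXT);
    CREATE TABLE billing_customers (cust_no INTEGER, name TEXT);
    INSERT INTO crm_customers VALUES (1, 'ada lovelace');
    INSERT INTO billing_customers VALUES (2, 'Alan Turing');

    -- Logical model: one unified view defined with a union and a function.
    CREATE VIEW customer_master AS
        SELECT id AS customer_id, UPPER(full_name) AS name
        FROM crm_customers
        UNION ALL
        SELECT cust_no, UPPER(name)
        FROM billing_customers;
    """
)

rows = conn.execute(
    "SELECT customer_id, name FROM customer_master ORDER BY customer_id"
).fetchall()
```

A consumer querying `customer_master` sees one consistent shape and
never needs to know that the rows come from two differently structured
sources.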
Business Dashboard

Quickly Visualize your Data

JBoss Data Virtualization

Lean Virtual Data Integration
Comprehensive data federation, integration, transformation and
provisioning through the creation of reusable virtual logical data
models that are easily consumable through standards-based SQL
(JDBC, ODBC, Hibernate) and Web Services (REST, SOAP, OData)
interfaces.

Model Driven Development
Eclipse-based graphical tool lets you map and transform data from
sources to target formats, as well as resolve semantic differences,
create virtual data structures at a physical or logical level, and
use declarative interfaces that are compatible with and optimized
for your applications.

Universal Connectivity with Big Data and Cloud
Support for Hadoop, NoSQL, and SaaS data integration along with all
major enterprise RDBMS, Data Warehouses and files (XML, CSV, Excel)
and strong extensibility support for custom connectors.