Data Virtualization, Data Federation & IaaS with JBoss Teiid
DESCRIPTION
Enterprises have always grappled with the problem of information silos that needed to be merged using multiple data warehouses (DWs) and business intelligence (BI) tools so that they could mine this disparate data for business decisions and strategy. Traditionally this data integration was done with ETL, consolidating multiple DBMSs into a single data storage facility. Data virtualization enables abstraction, transformation, federation, and delivery of data taken from a variety of heterogeneous data sources as if it were a single virtual data source, without the need to physically copy the data for integration. It allows consuming applications or users to access data from these various sources via a request to a single access point and delivers information-as-a-service (IaaS). In this presentation, we will explore what data virtualization is and how it differs from the traditional data integration architecture. We'll also validate the data virtualization and federation concepts by working through an example (see videos at the GitHub repo) that federates data across two heterogeneous data sources, MySQL and MongoDB, using the JBoss Teiid data virtualization platform.
TRANSCRIPT
DATA VIRTUALIZATION & INFORMATION AS A SERVICE (IAAS)
By Anil Allewar
Senior Solutions Architect - Synerzip
About Me!!
Anil Allewar
Senior Solutions Architect @ Synerzip
Technology Evangelist & speaker
Core interests: JEE, EAI, EII
Agenda
• Use cases
• What does it mean?
• Architecture explained
• Implementation Frameworks
• Demo
• Questions?
Why it makes sense?
Use Cases
[Diagram: Financial Data, OLTP Data, 3rd Party Data, Web Service 1, Web Service 2, Legacy Data and Excel files feeding a Data Warehouse and a Data Mart through ETL jobs and a custom program]
Traditional Data Integration
[Diagram: Source Systems, ETL, Enterprise Information System, ETL, Business Applications]
Problems with ETL
More than 1 copy of data for staging
Intermediate data => Errors
Lead time to add new source
Domain knowledge for mapping
Batch Process => No real time data
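The "no real time data" point can be illustrated with a minimal sketch. It is not tied to any ETL product; the record layout and the function names `etl_batch_load` and `virtual_read` are assumptions made up for this illustration.

```python
import copy

# Hypothetical source system: a live list of order records.
source_orders = [{"id": 1, "status": "NEW"}]


def etl_batch_load(source):
    """Copy the source rows into a separate staging store (a snapshot)."""
    return copy.deepcopy(source)


def virtual_read(source):
    """Read the source at request time; nothing is copied ahead of time."""
    return list(source)


# Batch run: the warehouse now holds a second copy of the data.
warehouse_orders = etl_batch_load(source_orders)

# The source changes after the batch has run.
source_orders[0]["status"] = "SHIPPED"
source_orders.append({"id": 2, "status": "NEW"})

stale = [row["status"] for row in warehouse_orders]
live = [row["status"] for row in virtual_read(source_orders)]

print("warehouse copy:", stale)
print("live read:", live)
```

The warehouse copy keeps reporting the state as of the last batch run, while the request-time read reflects both the update and the new row.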
Problems with DBMS consolidation
Alternate approach => Single EIS (say RDBMS)
Extensive changes to existing apps
Might not satisfy everyone's requirements
Data Virtualization & Federation
Single API to access data
Only metadata stored at virtualization layer
Real time access without copying/moving data
Federate data across hetero/homogenous sources
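As a rough sketch of these four points, the snippet below models a virtualization layer that stores only metadata (a virtual name mapped to a fetch function) and reads the sources on every request. It does not use Teiid; `VirtualLayer`, the table names and the sample rows are assumptions for illustration, with `sqlite3` standing in for a relational source and a list of dicts standing in for a document source.

```python
import sqlite3

# Stand-in for a relational source (sqlite3 keeps the sketch self-contained).
relational = sqlite3.connect(":memory:")
relational.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
relational.executemany(
    "INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ben")]
)

# Stand-in for a document source: a list of dict "documents".
documents = [{"customer_id": 1, "city": "Pune"}]


class VirtualLayer:
    """Holds only metadata: a virtual table name mapped to a fetch function."""

    def __init__(self):
        self._catalog = {}

    def register(self, virtual_name, fetch):
        self._catalog[virtual_name] = fetch

    def select(self, virtual_name):
        # Data is pulled from the source on every request, never stored here.
        return self._catalog[virtual_name]()


def fetch_customers():
    rows = relational.execute("SELECT id, name FROM customer ORDER BY id")
    return [{"id": r[0], "name": r[1]} for r in rows]


layer = VirtualLayer()
layer.register("customers", fetch_customers)
layer.register("addresses", lambda: list(documents))

before = layer.select("customers")
relational.execute("INSERT INTO customer VALUES (3, 'Chen')")
after = layer.select("customers")

print(len(before), len(after))
```

Callers only ever talk to `layer.select(...)`, regardless of which kind of source sits behind a virtual name, and a row added to the source is visible on the next request without any copy step.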
Data Virtualization
Architecture
[Diagram: User Application, Common Access API, Runtime & Query Engine hosting the Virtual Database, Translator 1 and Translator 2, Connector 1 and Connector 2]
Vendors
Commercial Products
• Composite Software: http://www.compositesw.com/data-virtualization/
• Denodo: http://www.denodo.com/en/product/overview.php?n=h
• IBM: http://www-03.ibm.com/software/products/en/ibminfofedeserv
• Informatica: http://www.informatica.com/us/data-virtualization/
• Red Hat: http://www.redhat.com/products/jbossenterprisemiddleware/data-virtualization/
Open Source
• JBoss Teiid: http://teiid.jboss.org/
Selected Platform – JBoss Teiid
Open Source
Number of relational/NoSQL/ERP/CRM data stores
JEE standards
Add custom EIS support using JEE components
Active & responsive community
Synerzip contribution: Defect discovery, root cause analysis, feature verification
Teiid Components
Virtual Database: container for components used to integrate data from multiple data sources
Source Models: structure and characteristics of physical data sources
View Models: structure and characteristics of abstract structures you want to expose to your applications
Teiid Designer: Eclipse based UI to dynamically discover data source objects and apply data federation; generates a virtual database from 1 or more sources
Teiid Components
Translator
• Provides abstraction layer between Teiid Query Engine and source system
• Converts Teiid SQL commands to source specific execution commands
• Converts result data from source system to Teiid specific format
Resource Adapter
• Provides connectivity to the physical data source
• Integration provided through Java Connector Architecture (JCA) API
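The translator role described above can be sketched in a few lines. This is a conceptual illustration only and does not use Teiid's actual translator API (which is Java based); `EngineCommand`, `SqlTranslator`, `DocumentTranslator` and their methods are hypothetical names invented here.

```python
from dataclasses import dataclass


@dataclass
class EngineCommand:
    """Hypothetical, simplified command issued by the query engine."""
    table: str
    column: str
    value: object


class SqlTranslator:
    """Turns the engine command into a parameterized SQL statement."""

    def translate(self, cmd):
        sql = f"SELECT * FROM {cmd.table} WHERE {cmd.column} = ?"
        return (sql, (cmd.value,))

    def to_engine_rows(self, source_rows, columns):
        # Relational sources return tuples; the engine format here is dicts.
        return [dict(zip(columns, row)) for row in source_rows]


class DocumentTranslator:
    """Turns the same command into a document-store style filter."""

    def translate(self, cmd):
        return {"collection": cmd.table, "filter": {cmd.column: cmd.value}}

    def to_engine_rows(self, source_docs, columns):
        # Keep only the requested fields so both sources look alike.
        return [{c: doc.get(c) for c in columns} for doc in source_docs]


cmd = EngineCommand(table="orders", column="customer_id", value=7)
columns = ["id", "customer_id"]

sql_request = SqlTranslator().translate(cmd)
doc_request = DocumentTranslator().translate(cmd)

rows = SqlTranslator().to_engine_rows([(1, 7)], columns)
docs = DocumentTranslator().to_engine_rows(
    [{"_id": "x", "id": 2, "customer_id": 7}], columns
)

print(sql_request)
print(doc_request)
print(rows + docs)
```

The same engine command becomes a different source-specific request per translator, and both result shapes are normalized to one row format before they go back to the engine. Opening the connection itself would be the resource adapter's job and is left out of this sketch.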
Teiid – Supported EIS
Amazon SimpleDB, Apache Accumulo, Apache SOLR, Cassandra, File, Google Spreadsheet, JPA, LDAP, Excel (as file), SalesForce
JDBC: MS Access, DB2, Derby, excel-odbc, Greenplum, H2, Hive (for accessing Hadoop), Oracle, Teradata and most RDBMS
MongoDB, Object, OData, OLAP, Web Services, SAP Netweaver Gateway
Performance Characteristics
Access same data using Oracle and Teiid drivers
Retrieval times comparable when accessing tables having no Blobs
[Chart: No. of rows vs. time (ms), no Blobs; series: Oracle-JDBC, Teiid-JDBC]
Performance Characteristics
Teiid slower when accessing Blob data
Can be tuned
[Chart: No. of rows vs. time (ms), Blobs; series: Oracle-JDBC, Teiid-JDBC; data labels as extracted: 0, 0, 2, 42, 21,804, 32,531, 185,454]
Demo
[Diagram: JDBC Client, JDBC API, Teiid Runtime & Query Engine hosting the Federated VDB, MySQL Translator and MongoDB Translator, RDBMS Resource Adapter and MongoDB Resource Adapter, MySQL]
Demo – Steps
Pre-requisites
• MySQL server 5.5+ installed
• MongoDB 2.4.x+ installed
Steps
• Load the MySQL and MongoDB databases with sample data
• Set up environment – JBoss, Eclipse
• Create Teiid project in Eclipse using Teiid Designer
• Import source model using JDBC
• Create the virtual model and federate data from the source model
• Create a virtual database (VDB) and deploy to JBoss
• Access data using JDBC client or through browser using OData
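To give a feel for what the federated view in these steps returns, here is a self-contained sketch that joins a relational source with a document source. It does not connect to Teiid, MySQL or MongoDB: `sqlite3` stands in for MySQL, a list of dicts stands in for a MongoDB collection, and the `customer`/order schema and `customer_orders_view` are assumptions (the demo's real schema is in the GitHub repo).

```python
import sqlite3

# Stand-in for the MySQL source.
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
rdbms.executemany(
    "INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ben")]
)

# Stand-in for the MongoDB source.
order_docs = [
    {"customer_id": 1, "item": "laptop", "amount": 900},
    {"customer_id": 1, "item": "mouse", "amount": 20},
    {"customer_id": 2, "item": "desk", "amount": 150},
]


def customer_orders_view():
    """Federated view: join relational customers with order documents."""
    totals = {}
    for doc in order_docs:
        key = doc["customer_id"]
        totals[key] = totals.get(key, 0) + doc["amount"]
    rows = rdbms.execute("SELECT id, name FROM customer ORDER BY id")
    return [
        {"id": cid, "name": name, "total_spent": totals.get(cid, 0)}
        for cid, name in rows
    ]


federated = customer_orders_view()
for row in federated:
    print(row)
```

In the actual demo this join is defined once in the view model and executed by the Teiid query engine, so a JDBC or OData client sees a single table rather than two sources.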
Demo – Scenario
[Diagram: Federated Data]
Demo – Connection Profile
Demo – Source Model
Demo – Source Model Generation
Demo – Map Source To View
Demo – Association
Demo – Data Federation
Demo – Source Code
Source code: https://github.com/anilallewar/JBoss-Teiid
Contains
• Configuration files
• Instructions
• "How-to" videos
• VDBs, source models and view models
Conclusion
Data Virtualization and Federation is a rapidly emerging technology that solves traditional BI/ETL problems.
It provides lower time to market, distributes data across the enterprise as a service and provides real time access to enterprise data.