Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Data Integration with Hadoop
Jeff Pollock, Vice President, Oracle Data Integration Product Management and Strategy
Madhu Raviendran Nair, Marketing Director, Oracle Data Integration
Introducing the Big Data Reservoir
Big Data: Too Much of a Good Thing
Oracle Confidential – Internal/Restricted/Highly Restricted
Datafication is leading to Data Explosion
Smart Device Growth: 1.3 Billion today, 12.5 Billion by 2020
Data Production Increase: 22× (2011-2016)
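The slide states growth as totals rather than rates. As a rough worked check, growth by a factor F over n years corresponds to a compound annual rate r = F^{1/n} - 1. Taking "today" to mean 2014, the deck's copyright year (an assumption), the slide's figures imply roughly:

```latex
\begin{aligned}
r &= F^{1/n} - 1 \\
r_{\text{data}} &= 22^{1/5} - 1 \approx 1.856 - 1 \approx 0.86
  \quad \text{(2011 to 2016, } n = 5\text{)} \\
r_{\text{devices}} &= \left(\frac{12.5}{1.3}\right)^{1/6} - 1 \approx 1.458 - 1 \approx 0.46
  \quad \text{(2014 to 2020, } n = 6\text{)}
\end{aligned}
```

That is, about 86% more data produced each year and about 46% more smart devices each year; both are averages derived only from the slide's endpoints.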
The Big Data Paralysis
Produce Data vs. Use Data
12%: Executives who feel they understand the impact data will have on their organizations
Big Data Reservoir to Drive Results
• Get Fast Answers to New Questions
• Create a Data Reservoir
• Predict More, More Accurately
• Accelerate Data-Driven Action
Why the word “Reservoir”?
https://blogs.oracle.com/bigdata/entry/big_data_and_analytic_top
Big Data, Data Integration and Data Reservoir
Business Value of a Reservoir Architecture
COST: Lower TCO for the Data Warehouse
• Control the costs of the Data Warehouse
• Massive value multipliers for Teradata and Netezza customers
• Put an end to the annual upgrade cycle
SPEED: Faster LoB Access to Analytic Data
• Give analytics to the business earlier in the data lifecycle
• Empower IT to focus data modeling and report design on the highest-value analytics
• Run BI queries faster
VALUE: New Types of Analytics for All Data
• Support exploratory analytics directly from the Hadoop cluster
• Run streaming analytics on big data from Storm, Flume, etc.
• Drive new business solutions (telematics data, machine data, log data, unstructured data)
The Hadoop Opportunity for Big Data Reservoir
[Diagram labels: Data Flow, DW, Data Discovery, Data Preparation, Deep Data Storage]
• Data Discovery: support for exploratory analytics without time-consuming modeling. Data is staged and merged in Hadoop to provide a single place to explore and discover data.
• Data Preparation: lower-cost data staging and data preparation. External data staging and long-running batch jobs run in Hadoop to make the most of the DB.
• Deep Data Storage: lower-cost storage for questionable business data. Store more raw detail data for less cost while keeping aggregates in the DB.
Oracle Differentiation in Data Integration
Native Capture: deeply integrated capture from the #1 database (~50% market share) means OGG will be the preferred choice
Hadoop Agnostic: generate transformation code into popular Hadoop frameworks/languages using Knowledge Modules (KMs); other ETL vendors must recompile their engine
Real-time Delivery: dominant market share for OGG and battle-hardened robustness
E-LT Engine: ODI's dominant market share and large-scale E-LT use cases on the DW translate into battle-hardened robustness for Hadoop E-LT
Differentiated Oracle Data Integration Features
Differentiated Data Integration “Know How” and Core Capabilities
Big Data Reservoir for EDW: Continuous Data Delivery and Pushdown ELT Transformations
[Diagram labels: Sources; Data Replication; Data Synchronization; Data Reservoir with Staging and Detail; Hadoop Data Transformation (Pig, HiveQL); Fast load]
Oracle for Big Data Reservoir: Continuous Data Delivery and Pushdown ELT Transformations
[Diagram labels: Sources; Data Replication; Data Synchronization; Data Reservoir with Staging and Detail; Hadoop Data Transformation (Pig, HiveQL); Fast load; annotated with Oracle GoldenGate and Oracle Data Integrator]
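To make "Data Replication" and "Data Synchronization" more concrete, here is a minimal sketch of the idea behind log-based change delivery: rather than re-extracting whole tables in batch, the target applies an ordered stream of captured row changes and remembers how far it got. The ChangeRecord type, its fields, and apply_changes are hypothetical illustrations; this is not GoldenGate's trail format or API, and the in-memory dict stands in for a reservoir table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured row change (hypothetical format, not a GoldenGate trail record)."""
    seq: int                               # commit order; higher means later
    op: str                                # "INSERT", "UPDATE" or "DELETE"
    key: Any                               # primary-key value of the affected row
    row: Optional[Dict[str, Any]] = None   # changed column values; None for DELETE


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  changes: Iterable[ChangeRecord],
                  last_applied_seq: int = 0) -> int:
    """Apply captured changes to a keyed table in commit order.

    Returns the highest sequence number applied, to be kept as a checkpoint.
    Records at or below last_applied_seq are skipped, so replaying the same
    stream after a restart does not apply a change twice.
    """
    for change in sorted(changes, key=lambda c: c.seq):
        if change.seq <= last_applied_seq:
            continue  # already applied in an earlier run
        if change.op in ("INSERT", "UPDATE"):
            # Upsert: merge the changed columns into the current row, if any.
            target[change.key] = {**target.get(change.key, {}), **(change.row or {})}
        elif change.op == "DELETE":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
        last_applied_seq = change.seq
    return last_applied_seq


# Demo: a new row, an update to it, then a short-lived row that is deleted.
reservoir: Dict[Any, Dict[str, Any]] = {}
stream = [
    ChangeRecord(1, "INSERT", 101, {"name": "Ada", "tier": "gold"}),
    ChangeRecord(2, "UPDATE", 101, {"tier": "platinum"}),
    ChangeRecord(3, "INSERT", 102, {"name": "Lin", "tier": "silver"}),
    ChangeRecord(4, "DELETE", 102),
]
checkpoint = apply_changes(reservoir, stream)
print(checkpoint, reservoir)  # 4 {101: {'name': 'Ada', 'tier': 'platinum'}}
```

In a real deployment the checkpoint would be persisted with the target so a restarted apply process resumes where it stopped; the sketch leaves persistence and transactions out.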
Reference Architecture – Logical View
• Raw Data Reservoir: immutable raw data reservoir; raw data at rest is not interpreted
• Foundation Data Layer: immutable modelled data in a business-process-neutral form, abstracted from business process changes
• Access & Performance Layer: past, current and future interpretation of enterprise data, structured to support agile access and navigation
• Discovery Lab Sandboxes: project-based data stores to support specific discovery objectives
• Rapid Development Sandboxes: project-based data stores to facilitate rapid content/presentation delivery
Data Sources: Structured Data Sources (Operational Data, COTS Data, Streaming & BAM); Master & Reference Data Sources; Data Engines & Poly-structured sources (Content, Docs, Web & Social Media, SMS)
Other components shown: Data Ingestion; Information Interpretation; Virtualisation & Query Federation; Information Services; Pre-built & Ad-hoc BI Assets; Enterprise Performance Management; Data Science
Oracle Data Integration Can Help Right Now
[Diagram labels: Any Sources; Staging; Prod; Detail; MR (MapReduce); Transformation; Fast Load; Oracle Data Integrator; Oracle GoldenGate]
#1 – Tools, not Spaghetti • “ETL 101”: avoid complex, costly custom coding
#2 – Non-invasive Capture and Staging • Move data without inefficient batch extracts
#3 – Processing is Taken to the Data • No separate ETL engine needed • Eliminate unnecessary data movement • Reclaim time lost to network latency and overhead
#4 – Native Hadoop Execution • Choose the right Hadoop language for your use case: HiveQL, Pig, Spark, Storm, Java/MR2, etc. • Template-driven code generation keeps pace with change on the Hadoop platform
#5 – Native SQL Pushdown • Optimize some join types within the Data Warehouse
#6 – Oracle Optimized • OGG and ODI are certified to run on the Oracle appliances
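Items #3 and #4 describe taking processing to the data by generating native Hadoop code from templates. As a minimal sketch of that idea, assuming a hypothetical Mapping structure and hand-written templates (this is not ODI's Knowledge Module syntax), one engine-neutral mapping can be rendered into either HiveQL or Pig Latin, both of which then run inside the cluster. The Pig template assumes HCatalog's HCatLoader/HCatStorer are available; the table names are illustrative.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, List


@dataclass
class Mapping:
    """Engine-neutral description of one transformation (hypothetical)."""
    source: str
    target: str
    columns: List[str]   # output expressions, e.g. "amount * 1.1 AS amount_usd"
    predicate: str       # row filter in syntax that every template below accepts


# One template per execution language. Changing a template changes the generated
# code without touching the mapping or recompiling any runtime engine.
TEMPLATES: Dict[str, Template] = {
    "hiveql": Template(
        "INSERT OVERWRITE TABLE $target\n"
        "SELECT $select_list\n"
        "FROM $source\n"
        "WHERE $predicate;"
    ),
    "pig": Template(
        "src = LOAD '$source' USING org.apache.hive.hcatalog.pig.HCatLoader();\n"
        "flt = FILTER src BY $predicate;\n"
        "out = FOREACH flt GENERATE $select_list;\n"
        "STORE out INTO '$target' USING org.apache.hive.hcatalog.pig.HCatStorer();"
    ),
}


def generate(mapping: Mapping, language: str) -> str:
    """Render a mapping as code that runs inside the cluster, next to the data."""
    template = TEMPLATES[language]  # raises KeyError for an unsupported language
    return template.substitute(
        source=mapping.source,
        target=mapping.target,
        select_list=", ".join(mapping.columns),
        predicate=mapping.predicate,
    )


orders = Mapping(
    source="staging.orders",
    target="detail.orders_clean",
    columns=["order_id", "amount * 1.1 AS amount_usd"],
    predicate="amount > 0",
)
for language in TEMPLATES:
    print(f"-- {language}")
    print(generate(orders, language))
```

Because the templates, not a compiled engine, carry the language-specific knowledge, supporting another Hadoop language in this sketch means adding one entry to TEMPLATES.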
Dunnhumby
• No more sampling
• From 2 weeks to 2 minutes
• Complex custom analysis
Improving Banking Service Quality
• Customer 360 vision
• CEO-driven initiative
• Adaptability and time to market
Cable TV
• Capture true user preference
• Model behavior
• Refine marketing
Myth Busters: ETL Workload Offloading versus ETL Technology
Dominant Perception:
1. Hadoop will replace the Data Warehouse
2. Hadoop is mainly for Unstructured Data
3. Hadoop is a Data Integration solution
Reality:
1. Hadoop is a supplement to the Data Warehouse
2. Hadoop is for both Structured and Unstructured Data
3. Hadoop is not a Data Integration solution
ETL workloads are a critical Hadoop use case!
Bridging Big Data and Enterprise Data: Oracle Big Data Platform
Data Reservoir + Data Warehouse
[Diagram product labels: Oracle Big Data Connectors, Oracle Data Integrator, Oracle GoldenGate, Oracle Event Processing, Oracle Advanced Analytics, Oracle Database, Oracle Spatial & Graph, Oracle NoSQL Database, Cloudera Hadoop, Oracle R Distribution, Oracle Industry Models]
Data Flow View – Data Factory and Discovery Lab
[Diagram labels: Data Streams; Events & Data; Structured Enterprise Data; Other Data; Event Engine; Data Reservoir; Data Factory; Enterprise Information Store; Reporting; Discovery Lab; Discovery Output; Execution; Innovation; Actionable Events; Actionable Information; Actionable Insights]
Embedding Big Data in Corporate DNA
Oracle for Data Integration with Hadoop
Proven Technology (RISK)
• Unlike custom coding, a tools-based approach is proven to result in lower long-term operating costs
• Oracle GoldenGate is the industry standard for data replication
• Oracle invented E-LT pushdown processing and is 10x more widely deployed than competitors
Better Architecture (SCALE)
• Oracle GoldenGate provides the most scalable, native integration for database replication
• Oracle Data Integrator provides ultimate scalability and choice for Hadoop data transformations
• A consistent agent-based architecture avoids multiple, incompatible engines (e.g., Informatica and IBM)
Best for Oracle (COMPLETE)
• Exadata: OGG and ODI are deeply integrated and are the only replication and ETL processes certified to run on the appliance
• Big Data Appliance: deeply integrated technology, part of the core reference architecture
• Big Data Connectors: ODI is included with the core connector technologies for Hadoop
Oracle Simplifies Big Data Integration
Open Comprehensive Big Data Platform
• Appliance with Hadoop cluster
• Analytic tools
• DI tools and connectors
Heterogeneous & Best of Breed
• Differentiated and powerful DI capabilities for Teradata, Netezza, Microsoft, DB2, Sybase, etc.
Faster Time to Value
• Flexible configurations
• Out-of-the-box performance with DI
• Unified management: Enterprise Manager (EM) plug-ins for the appliance and DI tools
• Single support contact: hardware, software, networking and ASR
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.