Oracle Big Data Governance Webcast Charts
Data governance for Hadoop and big data
Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Oracle Data Integration and Governance for Big Data
Jeff Pollock, Vice President, Oracle Data Integration & Governance
Madhu Raviendran Nair, Marketing Director, Oracle Data Integration & Governance
Data Governance for the Big Data Reservoir
Big Data Reservoir Drives Big Results: Business Drivers for Big Data Initiatives
Get Fast Answers to New Questions
Create a Data Reservoir
Predict More, More Accurately
Accelerate Data-Driven Action
Oracle for the Big Data Reservoir: Oracle Data Integration Provides the Architectural Components
[Diagram: Sources feed the Data Reservoir (Staging, Detail) through Fast Load, Data Replication and Data Synchronization; Hadoop Data Transformation uses HiveQL, Pig/Oozie and Spark. Products shown: Oracle Data Integrator; Oracle GoldenGate (GG to Flume, GG to Kafka, GG to Hive).]
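The data replication and synchronization path in this architecture relies on change data capture (CDC). Changes read from a source system are applied, in commit order, to a copy held in the reservoir. The Python sketch below shows only that apply step, under simplifying assumptions. The `Change` record shape and the `apply_changes` function are hypothetical. They do not represent GoldenGate's API or trail format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    # Hypothetical change record: operation type, primary key, new row image
    op: str                      # "insert", "update" or "delete"
    key: int
    row: Optional[dict] = None   # None for deletes


def apply_changes(target: Dict[int, dict], changes: Iterable[Change]) -> Dict[int, dict]:
    """Apply change records, in commit order, to a keyed copy of a table."""
    for ch in changes:
        if ch.op in ("insert", "update"):
            # Upsert: the last committed image for a key wins
            target[ch.key] = dict(ch.row)
        elif ch.op == "delete":
            target.pop(ch.key, None)
        else:
            raise ValueError(f"unknown operation: {ch.op}")
    return target


if __name__ == "__main__":
    replica: Dict[int, dict] = {}
    log = [
        Change("insert", 1, {"name": "Ana", "country": "Brazil"}),
        Change("insert", 2, {"name": "Ben", "country": "Canada"}),
        Change("update", 1, {"name": "Ana", "country": "Chile"}),
        Change("delete", 2),
    ]
    print(apply_changes(replica, log))
    # → {1: {'name': 'Ana', 'country': 'Chile'}}
```

Applying changes in commit order is what keeps the replica consistent. Reordering the update and delete records above would produce a different final state.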
But What About Data Governance?
https://blogs.oracle.com/bigdata/entry/big_data_and_analytic_top
The Data Governance Opportunity with Big Data
Solving business and IT data challenges
…to manage Risk/Compliance
Records retention
Rediscovery
Litigation support
Data access management
Information security and protection
Minimize corporate liability through proper governance of data
…to drive Business Value
Metadata discovery
Metadata & glossary cataloging
Data profiling
Data cleansing lifecycle
Data remediation
Maximize opportunity by ensuring trusted data is easily available for data-driven business processes
Big Data Governance Myths: Do the same principles apply to Big Data and traditional data governance?
Perception
1. Data Governance has reduced significance in Big Data
2. Data Reservoirs should always contain only raw data in full fidelity
3. Big Data and Hadoop architectures are black boxes
Reality
1. Big Data without governance and quality is just Big Bad Data
2. Data reservoirs contain all data: raw, formatted and enriched.
3. If you use the data (you will!), you need to govern its lifecycle.
Data Governance and the Data Reservoir
The Big Data Governance Problem
1 – How do we clean up the data lake?
2 – How do we keep the data reservoir clean?
Data Governance Is Not Easy: There Is No Silver Bullet!
Data Governance
Metadata Management
Business Glossary
Data Profiling
Data Cleansing
Data Archiving
Data Privacy
PEOPLE
PROCESS
TECHNOLOGY
…people and process first, …tools and capabilities next, …and, there is no magic!
“…the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organizations use in a big data context comes from outside, or is of unknown structure and origin. This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data."
- Ted Friedman, Gartner, http://www.gartner.com/newsroom/id/2854917
Oracle Data Governance for the Data Reservoir Right Now
Data Governance
Metadata Management
Business Glossary
Data Profiling
Data Cleansing
Data Archiving
Data Privacy
Oracle Enterprise Metadata Management
• Metadata Management – horizontal and semantic data lineage for all big data sources
• Business Glossary – simple tools to catalog, link and collaborate on business terms
Oracle Enterprise Data Quality
Oracle Enterprise Data Quality delivers a complete, best-of-breed and business-friendly approach to data cleansing, resulting in trustworthy data for applications and improved business reliability.
• Profiling – simple-to-use data health check that can work with sample sets of all data
• Cleansing – validate, match and de-duplicate data records from any business application
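A profiling "health check" typically reports three things for each column: how many values are missing, how many are distinct, and which character patterns occur. The sketch below computes those three statistics, optionally over a random sample. The function names and the pattern notation are illustrative assumptions. In that notation, each letter becomes A and each digit becomes 9. This is not how Oracle Enterprise Data Quality's profilers work internally.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def value_pattern(value: str) -> str:
    """Map letters to 'A', digits to '9' and keep other characters as-is."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def profile_column(values: List[Optional[str]], sample_size: Optional[int] = None,
                   seed: int = 0) -> Dict[str, object]:
    """Profile one column, optionally on a reproducible random sample."""
    if sample_size is not None and sample_size < len(values):
        values = random.Random(seed).sample(values, sample_size)
    # Treat None and empty strings as missing values
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": Counter(value_pattern(v) for v in present).most_common(),
    }


if __name__ == "__main__":
    zips = ["94065", "94065", "9406", None, "SW1A 1AA", ""]
    print(profile_column(zips))
```

Pattern frequencies are often the fastest way to spot problems. In the example, a four-digit postal code or a foreign-format code appears as a rare pattern next to the dominant `99999`.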
Oracle Big Data SQL
Extends Oracle SQL to Hadoop and NoSQL and the security of Oracle Database to all your data. It also includes a unique Smart Scan service that minimizes data movement and maximizes performance.
• Data Privacy – leverage the Oracle DB security model on data that physically resides in Hadoop
• Archiving – Seamlessly locate aged data in queryable tables physically stored in Hadoop
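Archiving aged data essentially means partitioning records by an age cutoff. Older rows can then move to cheaper storage such as Hadoop while remaining queryable. The minimal Python sketch below assumes that each record carries a hypothetical `created` date and that retention is expressed as a number of days. It illustrates the partitioning rule only, not Big Data SQL's archiving mechanism.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def split_by_age(records: Iterable[Dict], today: date,
                 keep_days: int) -> Tuple[List[Dict], List[Dict]]:
    """Return (active, archive); records created before the cutoff are archived."""
    cutoff = today - timedelta(days=keep_days)
    active: List[Dict] = []
    archive: List[Dict] = []
    for rec in records:
        # Strictly older than the cutoff date counts as aged
        (archive if rec["created"] < cutoff else active).append(rec)
    return active, archive


if __name__ == "__main__":
    rows = [
        {"id": 1, "created": date(2013, 6, 1)},
        {"id": 2, "created": date(2014, 6, 1)},
    ]
    active, archive = split_by_age(rows, today=date(2014, 12, 31), keep_days=365)
    print([r["id"] for r in active], [r["id"] for r in archive])
```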
Oracle Enterprise Metadata Management (OEMM)
• Metadata Management – horizontal and semantic data lineage for all big data sources
• Business Glossary – simple tools to catalog, link and collaborate on business terms
Business Data Catalog
Report to Source Lineage
Impact Analysis
Audit, Versioning & Diff Reports
Social/Collaboration Features
Annotations and Tagging
Comprehensive Harvesting
3rd Party BI Metadata
3rd Party ETL Metadata
3rd Party DB Metadata
3rd Party Modeling Tools
Big Data Metadata
Metadata Standards
Value of Enterprise Metadata Management
Solves significant pain points for a wide variety of business consumers and technical staff
[Diagram components: App, ETL, CDC, GG (GoldenGate), Data Reservoir, BI Dashboards]
Roles: Executive, BI Developer, Sys Admin, Application User, Data Steward, ETL Developer, Data Scientist
Questions:
How was this sales figure calculated?
What will happen if I change this table?
What reports use the mainframe data?
Where did this data come from?
Which reports use this customer data?
Can I trust the sources of this customer data?
I want to design an experiment to measure the success of a signup page. What data do I have?
Metadata Management Use Cases
My dashboard does not match this report… why?
Where did this data come from?
Where can I find the data I need for analytics?
Which ETL mappings or BI reports will be affected by my column change?
What systems does the data flow through?
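Questions such as "Which ETL mappings or BI reports will be affected by my column change?" are answered by walking the lineage graph downstream from the changed object. The minimal sketch below makes two assumptions. First, lineage is stored as a hypothetical mapping from each object to the objects that consume it. Second, all object names are invented for illustration. OEMM's own repository model is not shown here.

```python
from collections import deque
from typing import Dict, List, Set


def impacted(downstream: Dict[str, List[str]], changed: str) -> Set[str]:
    """Breadth-first walk collecting every object that depends on `changed`."""
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in downstream.get(node, []):
            if consumer not in seen:  # avoids revisiting shared or cyclic paths
                seen.add(consumer)
                queue.append(consumer)
    return seen


# Hypothetical lineage: source column -> ETL mapping -> table -> BI reports
LINEAGE: Dict[str, List[str]] = {
    "crm.CUSTOMER.ACCT_NAME": ["etl.load_customers"],
    "etl.load_customers": ["dw.CUSTOMER_DIM"],
    "dw.CUSTOMER_DIM": ["bi.Sales by Customer", "bi.Churn Dashboard"],
}

if __name__ == "__main__":
    print(sorted(impacted(LINEAGE, "crm.CUSTOMER.ACCT_NAME")))
    # → ['bi.Churn Dashboard', 'bi.Sales by Customer', 'dw.CUSTOMER_DIM', 'etl.load_customers']
```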
Simple Screens for both Business and IT User Profiles
Comprehensive Data Lineage for IT
Simple to Navigate All Metadata
Business / IT Collaboration
Search Driven Business Access
Two Crucial Styles of Metadata Management
Vertical Lineage (Biz Terms to IT) – Links a business-friendly set of terms to the IT metadata and operational assets. Captures business glossary, taxonomy, ontology and conceptual models.
Horizontal Column-Level Lineage (BI Fields to Source Columns) – Links the data fields from Business Intelligence dashboards or reports back to the source columns. Covers schemas, BI view layers, ETL transformations, calculations, etc.
[Diagram labels: “Customer”, “FNAME|LNAME”, “NAME”, “ACCT_NAME”, “SALES”, “NE_SALES”, “NORTH”, “AGG_TOTAL”]
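Both styles can be modeled as edges in a single graph. Horizontal edges link each BI field to the columns it is derived from. Vertical edges link a glossary term to the IT columns that implement it. The sketch below borrows field names from the diagram, such as `AGG_TOTAL`, `NE_SALES`, `SALES`, `ACCT_NAME` and `FNAME|LNAME`. The way they connect, the `src.`/`dw.`/`report.` prefixes and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Horizontal lineage (assumed): field -> fields it is derived from
DERIVED_FROM: Dict[str, List[str]] = {
    "report.AGG_TOTAL": ["dw.NE_SALES.AMOUNT"],
    "dw.NE_SALES.AMOUNT": ["src.SALES.AMOUNT"],
    "report.ACCT_NAME": ["src.CUSTOMER.FNAME", "src.CUSTOMER.LNAME"],
}

# Vertical lineage (assumed): business glossary term -> IT columns
TERM_TO_COLUMNS: Dict[str, List[str]] = {
    "Customer": ["src.CUSTOMER.FNAME", "src.CUSTOMER.LNAME"],
}


def source_columns(field: str) -> Set[str]:
    """Follow horizontal lineage upstream to columns with no parents (assumes no cycles)."""
    parents = DERIVED_FROM.get(field)
    if not parents:
        return {field}
    result: Set[str] = set()
    for parent in parents:
        result |= source_columns(parent)
    return result


def terms_for(column: str) -> List[str]:
    """Follow vertical lineage from an IT column back to business terms."""
    return [term for term, cols in TERM_TO_COLUMNS.items() if column in cols]


if __name__ == "__main__":
    print(source_columns("report.AGG_TOTAL"))   # {'src.SALES.AMOUNT'}
    print(terms_for("src.CUSTOMER.FNAME"))      # ['Customer']
```

Combining the two answers questions such as "Where did this data come from, and what does it mean to the business?" in one traversal.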
Data Flow View – Data Factory and Metadata Management
[Diagram components: inputs Data Streams, Structured Enterprise Data and Other Data; Event Engine, Data Reservoir, Data Factory, Enterprise Information Store; Reporting, Discovery Lab; outputs Actionable Events, Actionable Information, Actionable Insights; Execution, Innovation, Discovery Output, Events & Data; overlay: Metadata Management and Business Glossary]
Comprehensive Data Integration & Governance Capabilities
Dynamic Data Movement
– Low-impact capture, stage in Hadoop
– Continuous data availability
Data Transformation
– Bulk data movement
– Pushdown data processing
Data Federation
– Virtualized data services
Data Quality & Verification
– Fix quality at the source
– Verify data consistency
Metadata Management
– Lineage and impact analysis
– Business glossary semantics
Data Governance Foundation
Oracle Data Integrator (Transformation)
Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)
FastLoad
Oracle GoldenGate (Movement)
Enterprise Metadata Management & Business Glossary (Business Glossary, Data Lineage, Impact Analysis and Data Provenance)
Data Service Integrator (Federation)
GoldenGate Veridata (Online Data Verification)
ELT Processing on Hadoop or SQL
Continuous Availability
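"Verify data consistency" means confirming that a replicated target still matches its source. One common technique is to hash each row by key on both sides and then compare the hashes. The Python sketch below is a minimal illustration of that technique, not GoldenGate Veridata's actual algorithm. The function names and the tuple-of-strings row shape are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Tuple[str, ...]


def row_hashes(rows: Dict[int, Row]) -> Dict[int, str]:
    """Hash each row; the unit separator (assumed absent from values) keeps ('ab','c') != ('a','bc')."""
    return {k: hashlib.sha256("\x1f".join(v).encode("utf-8")).hexdigest()
            for k, v in rows.items()}


def compare(source: Dict[int, Row], target: Dict[int, Row]) -> Dict[str, List[int]]:
    """Classify keys as missing from the target, extra in the target, or out of sync."""
    src, tgt = row_hashes(source), row_hashes(target)
    return {
        "missing": sorted(src.keys() - tgt.keys()),
        "extra": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


if __name__ == "__main__":
    source = {1: ("Ana", "Brazil"), 2: ("Ben", "Canada")}
    target = {1: ("Ana", "Chile"), 3: ("Cy", "Peru")}
    print(compare(source, target))
```

Comparing compact hashes instead of full rows reduces the volume of data that has to be moved between systems for the check.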
Oracle Enterprise Data Quality
• Profiling – simple-to-use data health check that can work with sample sets of all data
• Cleansing – validate, match and de-duplicate data records from any business application
Profile
Standardize
Match
Govern
Unified Workbench
Market-leading business usability for all types of data
Unparalleled time-to-value, rapid deployments
High-performance engine operates in real time or batch
Out-of-the-box global knowledge base for worldwide coverage
Foundation for a comprehensive data governance program
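The Profile, Standardize and Match steps can be illustrated with a deliberately simple de-duplication pass. First, each record is standardized into a normalized match key. Then, records that share a key are grouped. Production matching engines use fuzzy and weighted comparisons. This Python sketch uses exact matching on normalized keys only. The `name`/`postcode` fields and the match rule are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Tuple


def standardize(text: str) -> str:
    """Strip accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_text.lower())
    return " ".join(cleaned.split())


def match_key(record: Dict[str, str]) -> Tuple[str, str]:
    # Hypothetical rule: same standardized name and postcode => same entity
    return standardize(record["name"]), standardize(record["postcode"])


def deduplicate(records: List[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Group records by match key; each group is one candidate entity."""
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return list(groups.values())


if __name__ == "__main__":
    people = [
        {"name": "José Silva", "postcode": "01310-100"},
        {"name": "jose  silva.", "postcode": "01310 100"},
        {"name": "Ann Lee", "postcode": "94065"},
    ]
    for group in deduplicate(people):
        print(len(group), group[0]["name"])
```

Standardizing before matching is what allows "José Silva" and "jose  silva." to collapse into a single candidate entity.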
Big Data Governance Lifecycle Tools
[Diagram: Governance Cockpit for Data Stewards & Stakeholders – Business Sources, Operational Data Flows, Quality KPIs, Case Management, Exception Review; Design Time: Metadata Management, Business Glossary]
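The governance cockpit ties quality KPIs to exception review and case management. The minimal sketch below illustrates that loop under one assumption: data quality rules can be expressed as named predicates. Each rule's pass rate becomes a KPI, and each failing record is queued as an open exception case for a data steward. The rules, field names and `ExceptionCase` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]


@dataclass
class ExceptionCase:
    rule: str
    record: dict
    status: str = "open"   # a steward would later resolve or dismiss the case


def evaluate(records: List[dict],
             rules: Dict[str, Rule]) -> Tuple[Dict[str, float], List[ExceptionCase]]:
    """Return per-rule pass rates (KPIs) and a queue of exception cases."""
    kpis: Dict[str, float] = {}
    cases: List[ExceptionCase] = []
    for name, check in rules.items():
        failures = [r for r in records if not check(r)]
        # An empty batch is treated as fully passing
        kpis[name] = 1.0 - len(failures) / len(records) if records else 1.0
        cases.extend(ExceptionCase(name, r) for r in failures)
    return kpis, cases


# Hypothetical data quality rules
RULES: Dict[str, Rule] = {
    "email_present": lambda r: bool(r.get("email")),
    "country_iso2": lambda r: len(r.get("country", "")) == 2,
}

if __name__ == "__main__":
    batch = [{"email": "a@example.com", "country": "BR"}, {"email": "", "country": "Brazil"}]
    kpis, cases = evaluate(batch, RULES)
    print(kpis, len(cases))
```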
Enterprise-Wide Governance Board
Top US Payroll Provider: Oracle Enterprise Data Quality for governance on 100M records per month
Privacy and Deep Data Access with Oracle Big Data SQL
-- Join Hadoop-resident web logs with customer data in Oracle Database
SELECT w.sess_id, c.name
FROM web_logs w
JOIN customers c ON w.cust_id = c.customer_id
WHERE w.source_country = 'Brazil';
Relevant SQL runs on BDA (Big Data Appliance) nodes
Only the columns and rows needed to answer the query are returned
[Diagram labels: Hadoop Cluster, Big Data SQL, Oracle Database, tables CUSTOMERS and WEB_LOGS, “10’s of Gigabytes of Data”]
• Data Privacy – leverage the Oracle DB security model on data that physically resides in Hadoop
• Archiving – Seamlessly locate aged data in queryable tables physically stored in Hadoop
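The claim that only the columns and rows needed to answer the query leave Hadoop describes predicate and projection pushdown. The Python sketch below imitates the query above with in-memory lists. A hypothetical `storage_scan` applies the WHERE filter and the column projection where the data lives. Only the reduced rows are then joined with the customer table. This is a conceptual model, not how Smart Scan is implemented, and the table contents are invented.

```python
from typing import Callable, Dict, Iterable, List


def storage_scan(rows: Iterable[Dict], predicate: Callable[[Dict], bool],
                 columns: List[str]) -> List[Dict]:
    """Filter and project next to the data so fewer bytes leave the storage node."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]


def hash_join(left: List[Dict], right: List[Dict], lkey: str, rkey: str) -> List[Dict]:
    """Join two row lists on lkey == rkey using a dictionary built on the right side."""
    index: Dict[object, List[Dict]] = {}
    for right_row in right:
        index.setdefault(right_row[rkey], []).append(right_row)
    return [{**left_row, **right_row}
            for left_row in left for right_row in index.get(left_row[lkey], [])]


WEB_LOGS = [  # assumed to reside in Hadoop
    {"sess_id": 1, "cust_id": 10, "source_country": "Brazil", "url": "/a"},
    {"sess_id": 2, "cust_id": 11, "source_country": "Peru", "url": "/b"},
    {"sess_id": 3, "cust_id": 11, "source_country": "Brazil", "url": "/c"},
]
CUSTOMERS = [{"customer_id": 10, "name": "Ana"}, {"customer_id": 11, "name": "Ben"}]

if __name__ == "__main__":
    reduced = storage_scan(WEB_LOGS, lambda r: r["source_country"] == "Brazil",
                           ["sess_id", "cust_id"])
    result = [(row["sess_id"], row["name"])
              for row in hash_join(reduced, CUSTOMERS, "cust_id", "customer_id")]
    print(result)   # [(1, 'Ana'), (3, 'Ben')]
```

In the example, two of the three log rows and two of the four log columns cross to the database side. At tens of gigabytes, that reduction is the point of the design.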
Oracle Does Big Data Integration & Governance Better
Oracle Data Integration Governance vs. “Other Guys”
Dynamic Data Movement vs. Batch Data Movement & Weak CDC Tools
NoETL Engine vs. ETL Engine H/W Alongside Hadoop
Most Heterogeneous vs. Proprietary Vendor Lock-in, Incomplete Metadata
Business-Friendly Governance Tools, Wide & Current 3rd Party Support, Comprehensive Platform
vs. Mix and Match of 6+ Legacy Tools, Inflexible Metadata Models & Frameworks, Incomplete Governance Features
Data Governance
Metadata Management
Business Glossary
Data Profiling
Data Cleansing
Data Archiving
Data Privacy
Most Heterogeneous, Deep 3rd Party Support
Hadoop HBase Hadoop Hive/Flume HP Enscribe HP NonStop HP Neoview Hypersonic SQL IBM DB2 i Series IBM DB2 UDB IBM DB2 z Series IBM Informix IBM Netezza JMS / MQ Microsoft Access Microsoft SQL Server MySQL Pivotal Greenplum PostgreSQL Salesforce.com SAP BW / BI SAP ERP / ECC SAS SQL/MP SQL/MX Sybase ASE Sybase IQ Teradata
Adaptive Altova Apache HCatalog Apache Hive/HQL Borland CA ERwin Cloudera Impala COBOL Copybook DataStax Embarcadero EMC ProActivity GentleWare Google BigQuery Grandite Hadapt Hive Hortonworks Hive IBM Cognos IBM DB2 IBM DataStage IBM Discovery IBM Federation Server IBM Lotus Notes IBM Netezza IBM Rational Rose IBM Rational Architect Informatica Metadata Mgr. Informatica PowerCenter
CoSORT ISO SQL Standard (DDL) MapR Hadoop Hive MicroFocus Microsoft Access Microsoft Office Excel Microsoft Visio Microsoft SQL Server Microsoft SSIS Microsoft Visual Studio MicroStrategy MagicDraw OMG CWM Standard OMG UML Standard Oracle BI Answers Oracle BI Enterprise Edition Oracle BI Server Oracle DAC Oracle Data Integrator Oracle Data Modeler Oracle Database Oracle Designer Oracle Hyperion Applications Oracle Hyperion Essbase Oracle Warehouse Builder Pivotal Greenplum PostgreSQL
QlikView SAP BO Crystal Reports SAP BO Designer SAP BO Desktop Intelligence SAP BO Repository SAP BO Data Integrator SAP BO Data Steward SAP Master Data Management SAP Sybase PowerDesigner SAP Sybase ASE Database SAS Data Integration Studio SAS BI Server SAS Information Map SAS Metadata Management SAS OLAP Server Select Sparx Architect Syncsort Tableau Talend Teradata Tigris Visible W3C DTD & XSD Schema
Operational Integration (Movement / Transformation)
Metadata Harvesting (Glossary, Lineage & Impact Analysis)
Oracle Database Oracle Exadata Oracle Big Data Appliance Oracle TimesTen Oracle OLAP Oracle Business Intelligence Oracle BI Applications Oracle E-Business Suite Oracle JD Edwards EnterpriseOne Oracle JD Edwards World Oracle Fusion Applications Oracle Governance Risk and Compliance Oracle Fusion AIA Oracle Retail Applications Oracle Agile BI / DW Oracle Agile PLM for Process Oracle iFlex FlexCUBE Oracle iFlex Mantas Oracle Hyperion Applications Oracle PeopleSoft Oracle Siebel CRM / OnDemand Oracle Communications Oracle WebLogic Server Oracle Coherence Data Grid Oracle SOA Suite Oracle Enterprise Service Bus
+ open APIs and a standards-based metamodel
Oracle Simplifies Big Data Integration & Governance
Comprehensive Big Data Integration and Data Governance Platform
Appliance w/Hadoop Cluster
Analytic Tools
DI Tools and Connectors
Heterogeneous & Best of Breed
Differentiated and powerful DI capabilities for Teradata, Netezza, Microsoft, DB2, Sybase…
Faster Time to Value
Flexible configurations
Out-of-the-box (OOTB) performance with DI
Unified Management – Enterprise Manager (EM) plug-ins for the appliance and DI tools
Single Support Contact – hardware, software, networking and ASR
Join the Community
#ODI12c #GoldenGate12c #OEDQ #OEMM
Connect with Oracle on Social Media
Or connect via the web:
Oracle Data Integration blog
blogs.oracle.com/dataintegration
Oracle Data Integration Home Page
oracle.com/goto/dataintegration