Oracle Big Data and Database Analytics - Jordi Trill
DESCRIPTION
Oracle provides a complete, open solution that is simple to deploy and combines hardware and software to bring Big Data environments and architectures into enterprise IT environments that require high levels of reliability, security and productivity. With Oracle Big Data SQL it is possible to maintain multiple data repositories (Hadoop, NoSQL and relational) and access them in a unified way through SQL, with maximum performance and minimal data movement.
TRANSCRIPT
Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Oracle Big Data and Database Analytics in the Enterprise
Jordi Trill, Core Tech Business Development Manager
The World has Changed!
Five Forces Are Transforming IT
The convergence of big data, social, mobile, IoT and cloud computing – five distinct, yet increasingly intertwined technology trends that exist in an overlapping matrix, where the importance of each increases because it leverages one of the others
Social, Mobile, Cloud, Big Data, IoT
What is Digital Business?
The use of any digital technology to promote, sell and enable innovative products, services and experiences
How Are Companies Using Digital?
Customer Experience: 44%
Operational Improvement: 30%
New Business Models: 26%
Big Data, Cloud, IoT, Social, Mobile
Digital Business Strategies
Digital Transformation
Digital Disruption
Digital Disruption
The change that occurs when new digital technologies and/or business models affect the value proposition of existing goods and services
Digital Transformation
The re-alignment of, or new investment in, technology and/or business models to more effectively engage consumers or employees
Big Data Is The Datafication Of Everything
Thoughts, Things, Activities
Challenge #1: Data Production Outweighs Consumption
Ability to Produce Data
Ability to Consume Data
12%: Executives who feel they understand the impact data will have on their organizations
Source: The Economist Intelligence Unit
Challenge #2: Data Analysis Takes Too Long
Source: Richard Hackathorn’s Components of Action Time
[Chart: Business Value against Response Time, marking Business event, Data captured, Analysis completed, Action taken]
of executives say too much critical information is delivered too late
Data Warehouse 2.0: Data Warehousing in the Age of Big Data
Data Warehouse 1.0
1. Still too many data marts
2. Batch updates = stale data sets
3. Instinct-based decision making
4. Analysis bolted onto a limited set of business processes
Data Warehouse 2.0
1. Integrated, consolidated architecture
2. Real-time ELT = data always fresh
3. Fact-based decision making
4. Analysts focus on discovery and driving business value
The Path to Monetizing Big Data
Source: Tom Davenport – Harvard Business Review
Hadoop and Relational DBMS
Integrated Data Management Platform
Oracle Solution: Big Data Management System
Relational
30 years of development
High concurrency
Rich language
Tool support
Security
Mission-critical performance
Hadoop
Innovative, economical and flexible
Scale out simply
Cost effective
Rapidly evolving
Less formalization
Relational + Hadoop: BDMS
Integrated Data Management Platform
“The genius of 'and', the tyranny of 'or'”
Ken Rudin, Director of Analytics (Facebook)
Original from James Collins, Stanford University
http://tdwi.org/Articles/2013/05/06/Facebooks-Relational-Platform.aspx?Page=1
Big Data Applications
Business Analytics
Big Data Management System
Data Warehouse Data Reservoir +
Discovery Biz Intelligence +
by Industry & LoB
Key Requirements
Single view of ALL data
Optimized performance across data sets
Continuous, enterprise-grade functionality
Security, Resource Management, Backup and DR
Oracle Big Data Management System
OLTP/Data Warehouse Data Reservoir +
Oracle Big Data Connectors
Oracle Data Integrator
Oracle Advanced Analytics
Oracle Database
Oracle Spatial & Graph
Oracle NoSQL Database
Cloudera Hadoop
Oracle R Distribution
Oracle Industry Models
Oracle GoldenGate
Oracle Data Integrator
Oracle Event Processing
Oracle Event Processing
Apache Flume
Oracle GoldenGate
Oracle Advanced Analytics
Oracle Database
Oracle Spatial & Graph
Oracle In-Memory Columnar Store
Oracle Big Data SQL
Oracle Big Data Appliance
Hardware (X4-2 Full; 18 nodes)
288 CPU cores with 1152 GB RAM
864 TB of raw disk storage
40 Gb/s InfiniBand
Integrated Software (Pre-installed, pre-optimized)
Includes all components of Cloudera Enterprise and Add-ons
Cloudera CDH
Cloudera Impala
Cloudera HBase (with Apache Accumulo)
Cloudera Search
Apache Spark
Cloudera Manager (incl. BDR and Navigator)
Oracle NoSQL Database
Oracle R Distribution
Oracle Big Data Appliance Installation: Oracle vs. Custom
Physical Installation (10 racks): Custom 286 hours; Oracle 16 hours
Electricians: Custom 236 hours, 616 cables; Oracle 16 hours, 32 cables
Network Engineers: Custom 264 hours, 864 cables; Oracle 6 hours, 14 cables
Storage Engineers: Custom 320 hours, 576 cables; Oracle n/a
System Administrators: Custom 232 hours; Oracle n/a
Totals (Oracle vs. Custom): 38 vs. 1338 hours; 19 vs. 677 elapsed hours; 46 vs. 2344 cables
Enterprise Security
Authentication through Kerberos
Authorization through Apache Sentry
Auditing through Oracle Audit Vault
Encryption for Data-at-Rest
Network Encryption
Oracle NoSQL Database
Reliable Flexible Fast Simple
An advanced key-value database designed as a cost-effective, high-performance solution for simple operations on collections of data, with built-in high availability and elastic scale-out.
less is more
Big Data Technology Today: R Statistical Programming Language
Open source language and environment
Used for statistical computing and graphics
Strength in easily producing publication-quality graphs
Highly extensible
Created by Robert Gentleman and Ross Ihaka
JSON Native
In-Database Open R
In-Database Map Reduce
OLAP Engine
SQL Pattern Matching
Data Redaction
Adaptive Execution Plans
XML Native
Spatial & Graph Analysis
In-Database Analytics
Query Performance Acceleration: Adaptive Execution Plans
Good SQL execution without intervention
[Diagram: the plan starts as a nested loops join (NL) of a table scan of T1 with an index scan of T2; when the threshold is exceeded, the plan switches to a hash join (HJ) of table scans of T1 and T2]
Plan decision deferred until runtime
Final decision is based on statistics collected during execution
If statistics prove to be out of range, sub-plans can be swapped
Bad effects of skew eliminated & queries significantly accelerated
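The runtime switch described on this slide can be illustrated with a small Python sketch. It is not Oracle's optimizer: the function name `adaptive_join`, the row threshold and the dictionary standing in for the index on T2 are all assumptions made for illustration. Rows from T1 are buffered and counted; if the count stays within the threshold the join probes the existing index (nested loops), otherwise it scans T2 once and builds a hash table (hash join).

```python
from itertools import chain
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def adaptive_join(t1_rows: Iterable[Row], t2_rows: List[Row],
                  t2_index: Dict[object, List[Row]], key: str,
                  threshold: int) -> Tuple[str, List[Row]]:
    """Join T1 to T2 on `key`, choosing the join method while rows arrive."""
    t1_iter = iter(t1_rows)
    buffered: List[Row] = []
    # Statistics collector: buffer T1 rows until the threshold is exceeded.
    for row in t1_iter:
        buffered.append(row)
        if len(buffered) > threshold:
            break

    if len(buffered) <= threshold:
        # Few T1 rows: nested loops, probing the existing index on T2.
        method = "NL"
        lookup = t2_index
    else:
        # Threshold exceeded: switch to a hash join over a full scan of T2.
        method = "HJ"
        lookup = {}
        for row in t2_rows:
            lookup.setdefault(row[key], []).append(row)

    joined = [
        {**left, **right}
        for left in chain(buffered, t1_iter)
        for right in lookup.get(left[key], [])
    ]
    return method, joined


# Hypothetical data: T2 has two rows and a ready-made index on "id".
t2 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
t2_idx = {1: [t2[0]], 2: [t2[1]]}
small_t1 = [{"id": 1, "qty": 5}]
large_t1 = [{"id": i % 3, "qty": i} for i in range(10)]
```

Both paths return the same rows; only the chosen method differs, which is the point of deferring the decision until the actual row count is known.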
Pattern Matching
• Scalable discovery of business event sequences:
– Clickstream Analysis
– Fraud Detection
– Stock Analysis
– Dropped Calls Analysis
– Automatic Medical Detections
• Historically requires complex SQL or external code to execute
EVENT TIME LOCATION
A 1 SFO
A 1 SFO
A 2 ATL
A 2 LAX
B 2 SFO
C 2 LAX
C 3 LAS
A 3 SFO
B 3 NYC
C 4 NYC
> 1 min
A 2 ATL
A 2 LAX
B 2 SFO
C 2 LAX
“Find one or more event A followed by one B followed by one or more C in a 1 minute interval”
Recognize patterns in sequences of rows
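To make the quoted rule concrete, here is a minimal Python sketch that scans the event table above for "one or more A, then one B, then one or more C". The helper names `match_at` and `find_matches` are hypothetical, and the reading of "in a 1 minute interval" as "every row is less than one minute after the first matched row" is an assumption; that reading happens to select the four time-2 rows repeated on the slide.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, int, str]  # (event, time in minutes, location)


def match_at(events: List[Event], start: int, window: int) -> Optional[int]:
    """Return the exclusive end index of an A+ B C+ match at `start`, or None."""
    t0 = events[start][1]

    def take(kind: str, pos: int, limit: Optional[int] = None) -> int:
        # Consume consecutive rows of `kind` whose time is inside the window.
        taken = 0
        while (pos < len(events) and events[pos][0] == kind
               and events[pos][1] - t0 < window
               and (limit is None or taken < limit)):
            pos += 1
            taken += 1
        return pos

    a_end = take("A", start)
    b_end = take("B", a_end, limit=1)
    c_end = take("C", b_end)
    if a_end == start or b_end == a_end or c_end == b_end:
        return None
    return c_end


def find_matches(events: List[Event], window: int = 1) -> List[List[Event]]:
    """Collect every non-overlapping A+ B C+ match, scanning in row order."""
    matches, i = [], 0
    while i < len(events):
        end = match_at(events, i, window)
        if end is None:
            i += 1
        else:
            matches.append(events[i:end])
            i = end  # resume after the last matched row
    return matches


# The rows from the slide, in the order shown.
events = [
    ("A", 1, "SFO"), ("A", 1, "SFO"), ("A", 2, "ATL"), ("A", 2, "LAX"),
    ("B", 2, "SFO"), ("C", 2, "LAX"), ("C", 3, "LAS"), ("A", 3, "SFO"),
    ("B", 3, "NYC"), ("C", 4, "NYC"),
]
matches = find_matches(events)
```

Writing this by hand for each new pattern is the "complex SQL or external code" the slide refers to; a declarative pattern clause replaces the scanning logic.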
SQL Pattern Matching in action
Example: Find a double bottom pattern (W-shape) in ticker stream
• Find a W-shape pattern in a ticker stream:
• Output the beginning and ending date of the pattern
• Calculate the average price in each W-shape
• Find only patterns that lasted less than a week
SQL Pattern Matching in action Example: Find W-Shape
New syntax for discovering patterns using SQL: MATCH_RECOGNIZE ( )
SELECT . . . FROM ticker MATCH_RECOGNIZE ( . . . )
Find a W-shape pattern in a ticker stream:
• Set the PARTITION BY and ORDER BY clauses
SELECT … FROM ticker MATCH_RECOGNIZE ( PARTITION BY name ORDER BY time
Find a W-shape pattern in a ticker stream:
• Define the pattern – the “W-shape”
SELECT … FROM ticker MATCH_RECOGNIZE ( PARTITION BY name ORDER BY time PATTERN (X+ Y+ W+ Z+)
Find a W-shape pattern in a ticker stream:
• Define the pattern – the “W-shape”
SELECT … FROM ticker MATCH_RECOGNIZE ( PARTITION BY name ORDER BY time PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price)))
Find a W-shape pattern in a ticker stream:
• Define the measures to output once a pattern is matched:
• FIRST: beginning date
• LAST: ending date
SELECT … FROM ticker MATCH_RECOGNIZE ( PARTITION BY name ORDER BY time
MEASURES FIRST(x.time) AS first_x, LAST(z.time) AS last_z
PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price)))
Find a W-shape pattern in a ticker stream:
• Calculate average price in the second ascent
SELECT first_x, last_z, avg_price
FROM ticker MATCH_RECOGNIZE (
PARTITION BY name ORDER BY time
MEASURES FIRST(x.time) AS first_x,
LAST(z.time) AS last_z,
AVG(z.price) AS avg_price
ONE ROW PER MATCH
PATTERN (X+ Y+ W+ Z+)
DEFINE X AS (price < PREV(price)),
Y AS (price > PREV(price)),
W AS (price < PREV(price)),
Z AS (price > PREV(price) AND
z.time - FIRST(x.time) <= 7 ))
Average stock price: $52.00
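For readers without a database at hand, the logic of the query above can be approximated in Python. This is a sketch under stated assumptions, not a reimplementation of MATCH_RECOGNIZE: the function `find_w_shapes` is hypothetical, list indexes stand in for days, and how matching resumes after a match may differ from the SQL semantics. Each day-over-day move is encoded as a letter, and the regular expression `D+U+D+(U+)` plays the role of PATTERN (X+ Y+ W+ Z+).

```python
import re
from typing import List, Tuple


def find_w_shapes(prices: List[float], max_span: int = 7) -> List[Tuple[int, int, float]]:
    """Return (first_x, last_z, avg_z_price) per W-shape; list indexes act as days."""
    # Encode each move: D = price fell, U = price rose, F = unchanged.
    moves = "".join(
        "D" if cur < prev else "U" if cur > prev else "F"
        for prev, cur in zip(prices, prices[1:])
    )
    results = []
    for m in re.finditer(r"D+U+D+(U+)", moves):
        first_x = m.start() + 1  # move k compares day k + 1 with day k
        # Keep only second-ascent days within max_span days of the first X row.
        z_days = [d for d in range(m.start(1) + 1, m.end(1) + 1)
                  if d - first_x <= max_span]
        if not z_days:
            continue
        avg_z = sum(prices[d] for d in z_days) / len(z_days)
        results.append((first_x, z_days[-1], avg_z))
    return results


# Made-up prices: fall, rise, fall, rise, then flat.
prices = [10, 8, 6, 9, 11, 7, 5, 8, 12, 12]
shapes = find_w_shapes(prices)
```

With these made-up prices the first fall is on day 1, the second ascent covers days 7 and 8, and the average over that ascent is 10.0.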
            next = lineNext.getQuantity();
        }
        if (!q.isEmpty() && (prev.isEmpty() || (eq(q, prev) && gt(q, next)))) {
            state = "S";
            return state;
        }
        if (gt(q, prev) && gt(q, next)) {
            state = "T";
            return state;
        }
        if (lt(q, prev) && lt(q, next)) {
            state = "B";
            return state;
        }
        if (!q.isEmpty() && (next.isEmpty() || (gt(q, prev) && eq(q, next)))) {
            state = "E";
            return state;
        }
        if (q.isEmpty() || eq(q, prev)) {
            state = "F";
            return state;
        }
        return state;
    }
    private boolean eq(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return a.equals(b);
    }
    private boolean gt(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return Double.parseDouble(a) > Double.parseDouble(b);
    }
    private boolean lt(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return Double.parseDouble(a) < Double.parseDouble(b);
    }
    public String getState() {
        return this.state;
    }
}
BagFactory bagFactory = BagFactory.getInstance();
@Override
public Tuple exec(Tuple input) throws IOException {
    long c = 0;
    String line = "";
    String pbkey = "";
    V0Line nextLine;
    V0Line thisLine;
    V0Line processLine;
    V0Line evalLine = null;
    V0Line prevLine;
    boolean noMoreValues = false;
    String matchList = "";
    ArrayList<V0Line> lineFifo = new ArrayList<V0Line>();
    boolean finished = false;
    DataBag output = bagFactory.newDefaultBag();
    if (input == null) {
        return null;
    }
    if (input.size() == 0) {
        return null;
    }
    Object o = input.get(0);
    if (o == null) {
        return null;
    }
    //Object o = input.get(0);
    if (!(o instanceof DataBag)) {
        int errCode = 2114;
        String msg = "Expected input to be DataBag, but"
SELECT first_x, last_z
FROM ticker MATCH_RECOGNIZE (
PARTITION BY name ORDER BY time
MEASURES FIRST(x.time) AS first_x,
LAST(z.time) AS last_z
ONE ROW PER MATCH
PATTERN (X+ Y+ W+ Z+)
DEFINE X AS (price < PREV(price)),
Y AS (price > PREV(price)),
W AS (price < PREV(price)),
Z AS (price > PREV(price) AND
z.time - FIRST(x.time) <= 7 ))
SQL Pattern Matching: Finding Double Bottom (W)
250+ lines of Java and Pig vs. 12 lines of SQL
20x less code, 5x faster
Simplified Code Development: Native Support for TOP-N Queries
• New OFFSET and FETCH FIRST clauses
• Specify number or percentage of rows to return
• ANSI 2008/2011 compliant with additional extensions
“Who are the top 5 money makers in my enterprise?”
SELECT empno, ename, deptno
FROM emp
ORDER BY sal DESC, comm DESC FETCH FIRST 5 ROWS ONLY;
versus
SELECT empno, ename, deptno
FROM (SELECT empno, ename, deptno, sal, comm,
row_number() OVER (ORDER BY sal DESC, comm DESC) rn
FROM emp
)
WHERE rn <= 5
ORDER BY sal DESC, comm DESC;
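What the OFFSET and FETCH FIRST clauses ask for can be sketched in a few lines of Python. The function `fetch_rows` and the sample rows are illustrative assumptions, and rounding a percentage up to a whole row is a choice made here rather than a statement about Oracle's behaviour.

```python
import math
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def fetch_rows(rows: List[Row], order_key: Callable[[Row], object],
               offset: int = 0, first: Optional[int] = None,
               percent: Optional[float] = None) -> List[Row]:
    """Sort descending, skip `offset` rows, then keep a count or a percentage."""
    ordered = sorted(rows, key=order_key, reverse=True)
    if percent is not None:
        # Assumption: a percentage is rounded up to a whole number of rows.
        first = math.ceil(len(ordered) * percent / 100)
    end = None if first is None else offset + first
    return ordered[offset:end]


# Illustrative employee rows (names and salaries are sample values).
emp = [
    {"ename": "KING", "sal": 5000}, {"ename": "SCOTT", "sal": 3000},
    {"ename": "FORD", "sal": 3000}, {"ename": "JONES", "sal": 2975},
    {"ename": "BLAKE", "sal": 2850}, {"ename": "CLARK", "sal": 2450},
    {"ename": "ALLEN", "sal": 1600},
]


def by_salary(row: Row) -> object:
    return row["sal"]


top5 = [r["ename"] for r in fetch_rows(emp, by_salary, first=5)]
next2 = [r["ename"] for r in fetch_rows(emp, by_salary, offset=5, first=2)]
top30pct = [r["ename"] for r in fetch_rows(emp, by_salary, percent=30)]
```

Note that "top" requires a descending sort; an ascending sort with the same row limit would return the five lowest earners instead.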
Oracle OLAP Built-in Access to Analytic Calculations
Multidimensional analytic engine that analyzes summary data
Offers improved query performance and fast, incremental updates
Embedded in Oracle Database instance and storage
Example Analytical Questions
How do sales in the Western region this quarter compare with sales a year ago?
What will sales next quarter be?
What factors can we alter to improve the sales forecast?
R code and/or SQL
Models run in-database
Avoid Data Movement
Large data sets
Built-in security
Oracle Advanced Analytics R Enterprise and Data Mining
Oracle Spatial and Graph: High performance, simplified geospatial analysis all through SQL
ROADS
RNAME ID TYPE LANES GEOMETRY
M40 140 HWY 6
M25 141 HWY 4
SELECT a.owner_name, a.acquisition_status
FROM properties a, projects b
WHERE sdo_within_distance (a.property_geom, b.project_geom, 'distance = .1 unit = mile') = 'TRUE'
AND b.project_id = 189498;
Vector Performance Acceleration: 50X performance improvement (2D/3D queries) - Spatial joins, touch, contains, overlaps, complex masks
Oracle Spatial and Graph: Network Graph
Explicitly stores and maintains connectivity
Attributes at link and node level
Example algorithms: Traveling salesman, spanning tree, shortest path, sub-path, within cost, nearest neighbors
Very fast network analysis
Graph model to represent physical and logical networks
Oracle Spatial and Graph: Semantic Graph
• Supports all relevant W3C standards
• View relational data as RDF graph
• 60% data compression reduces storage and enhances performance
RDF Semantic Graph
In-Database MapReduce
[Diagram: inside Oracle Database, a table feeds parallel Map steps that emit key-value (K, V) pairs to Reduce steps, which produce a result table]
timestamp userid pageid
10:00:00 12345 A73_2
10:00:02 8901 A74_3
10:00:03 12345 A73_3
10:01:12 12345 A74_4
session userid pageid duration
0 12345 A73_2 3
0 12345 A73_3 70
0 12345 A74_4 12
1 8901 A74_3 89
MapReduce within the Oracle Database:
select session, userid, pageid, duration
from table(oracle_map_reduce.reducer(cursor(
select * from table(oracle_map_reduce.mapper(cursor(
select * from clicks))) map_result)));
=> Works on internal and external data sources
=> Leverage PL/SQL skills for big data analytics
=> High efficiency through parallel pipelined infrastructure
=> In-database execution allows for fast query performance
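The click-to-session example on the In-Database MapReduce slide can be mirrored outside the database with a short Python sketch. The functions `mapper`, `reducer` and `sessionize` are hypothetical stand-ins for the pipelined table functions in the query, and the duration rule (seconds until the same user's next click) is an assumption. The slide's duration column evidently relies on clicks that are not shown, so the numbers below differ from it.

```python
from collections import defaultdict
from datetime import datetime

# The four click rows shown on the slide: (timestamp, userid, pageid).
clicks = [
    ("10:00:00", 12345, "A73_2"),
    ("10:00:02", 8901, "A74_3"),
    ("10:00:03", 12345, "A73_3"),
    ("10:01:12", 12345, "A74_4"),
]


def mapper(rows):
    """Map: emit (userid, (timestamp, pageid)) key-value pairs."""
    for ts, userid, pageid in rows:
        yield userid, (datetime.strptime(ts, "%H:%M:%S"), pageid)


def reducer(userid, visits):
    """Reduce: a page's duration is the gap until the user's next click."""
    visits = sorted(visits)
    for (ts, pageid), nxt in zip(visits, visits[1:] + [None]):
        duration = None if nxt is None else int((nxt[0] - ts).total_seconds())
        yield userid, pageid, duration


def sessionize(rows):
    """Group mapper output by key, then reduce each group into session rows."""
    groups = defaultdict(list)
    for userid, visit in mapper(rows):
        groups[userid].append(visit)
    return [
        (session, *out)
        for session, (userid, visits) in enumerate(groups.items())
        for out in reducer(userid, visits)
    ]


sessions = sessionize(clicks)
```

The last click of each user has no following click in the sample, so its duration is left as None rather than guessed.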
Soc. Sec. # 115-69-3428
DOB 11/06/71
NAME SARA JONES
Policy enforced redaction of sensitive data
Data Redaction
Data Analyst
ETL / Data Quality Processes
Dynamically Masking for Data Warehouses
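As a rough illustration of policy-enforced redaction, the sketch below masks sensitive columns at read time depending on who is asking. It is not Oracle's Data Redaction API: the policy table, the role names and the choice that the ETL role sees unmasked data while the analyst sees masked data are all assumptions.

```python
import re
from typing import Callable, Dict


def mask_digits(value: str, keep_last: int = 0) -> str:
    """Replace digits with X, optionally leaving the last `keep_last` characters."""
    cut = len(value) - keep_last
    return re.sub(r"\d", "X", value[:cut]) + value[cut:]


# Hypothetical policy: which columns are redacted, and how.
POLICY: Dict[str, Callable[[str], str]] = {
    "ssn": lambda v: mask_digits(v, keep_last=4),  # partial redaction
    "dob": mask_digits,                            # full redaction of digits
}
EXEMPT_ROLES = {"etl"}  # assumed: trusted processes read the stored values


def redact(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Apply the policy when the row is read; the stored data never changes."""
    if role in EXEMPT_ROLES:
        return dict(record)
    return {col: POLICY[col](val) if col in POLICY else val
            for col, val in record.items()}


record = {"name": "SARA JONES", "dob": "11/06/71", "ssn": "115-69-3428"}
analyst_view = redact(record, "analyst")
etl_view = redact(record, "etl")
```

Because masking happens on the way out, the same table can serve both audiences without keeping a second, scrubbed copy.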
Oracle Big Data Connectors
Data Load: Oracle Loader for Hadoop
Data Access: Oracle SQL Connector for HDFS
R Analytics: Oracle R Advanced Analytics for Hadoop
Oracle Data Integrator Knowledge Modules
XML/XQuery: Oracle XQuery for Hadoop
XQuery, R Client
Optimized for Hadoop
Maximum parallelism
Fast performance
Analyze all your data in-place
Oracle Big Data Connectors and Data Integrator
Big Data Appliance +
Hadoop
Exadata +
Oracle Database
15TB / hour
10x Faster
When Data Lives in Many Places…
Relational: Profit and Loss
Hadoop: Application Logs
NoSQL: Customer Profiles
SQL
Oracle Big Data SQL
Query All Data without Application Change or Data Conversion
Big Data Appliance +
Cloudera Hadoop
Oracle NoSQL DB
Exadata +
Oracle Database
Oracle Catalog
External Table
create table customer_address
( ca_customer_id number(10,0)
, ca_street_number char(10)
, ca_state char(2)
, ca_zip char(10))
organization external (
TYPE ORACLE_HIVE
DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS
(com.oracle.bigdata.cluster hadoop_cl_1)
LOCATION ('hive://customer_address')
)
HDFS Data Node
HDFS Name Node
Hive metadata
External Table
Hive metadata
Big Data SQL
Query all data with Oracle SQL
Smart scan in Hadoop to optimize data requests
Oracle Big Data SQL
SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;
Relevant SQL runs on BDA nodes
10s of gigabytes of data
Only columns and rows needed to answer query are returned
Hadoop Cluster
Big Data SQL
Oracle Database
CUSTOMERS WEB_LOGS
Fast Smart Scan
Massive Parallelism
Storage Indexes
Filtered Locally
Minimized Data Movement
Intelligent Query Optimization
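The idea behind the Smart Scan points above (filter locally, skip blocks using storage indexes, return only the needed columns) can be sketched in plain Python. Nothing here is Oracle code: `make_block`, `smart_scan` and the sample rows are hypothetical, and the min/max summary per block is a simplified stand-in for a storage index.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def make_block(rows: List[Row], column: str) -> dict:
    """Store rows with the min/max of one column, a stand-in for a storage index."""
    values = [row[column] for row in rows]
    return {"rows": rows, "min": min(values), "max": max(values)}


def smart_scan(blocks: List[dict], column: str, value: object,
               wanted: List[str]) -> Tuple[List[Row], int]:
    """Filter and project on the storage side; report how many blocks were skipped."""
    hits, skipped = [], 0
    for block in blocks:
        # Storage index check: the value cannot be in this block, so skip it.
        if not (block["min"] <= value <= block["max"]):
            skipped += 1
            continue
        for row in block["rows"]:
            if row[column] == value:
                hits.append({name: row[name] for name in wanted})
    return hits, skipped


# Made-up WEB_LOGS blocks on the Hadoop side and a CUSTOMERS lookup in the database.
web_logs = [
    make_block([
        {"sess_id": 1, "cust_id": 10, "source_country": "Brazil", "url": "/a"},
        {"sess_id": 2, "cust_id": 11, "source_country": "Chile", "url": "/b"},
    ], "source_country"),
    make_block([
        {"sess_id": 3, "cust_id": 10, "source_country": "Spain", "url": "/c"},
        {"sess_id": 4, "cust_id": 11, "source_country": "Peru", "url": "/d"},
    ], "source_country"),
]
customers = {10: "Ana", 11: "Luis"}

rows, skipped = smart_scan(web_logs, "source_country", "Brazil", ["sess_id", "cust_id"])
# The join runs on the database side, over the small filtered result only.
result = [(r["sess_id"], customers[r["cust_id"]]) for r in rows if r["cust_id"] in customers]
```

Only one narrow row crosses from the storage side to the join here, which is the "minimized data movement" the slide refers to.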
Oracle’s Vision of a Big Data Management System
One fast SQL query, on all your data.
Oracle SQL on Hadoop and beyond
• With a Smart Scan service as in Exadata
• Without federation or fragmented stores
• With the security and certainty of Oracle Database
Oracle Big Data and Fast Data Integrated Solution Stack
Decide
Oracle Coherence
Oracle Event Processing
Oracle BAM
Fast Data: Real Time Streaming – Filtering – Pattern M. – Monitoring
Oracle Real-Time Decisions
Big Data Management System: Acquire – Organize – Analyze
In-Database Analytics
In-Memory Columnar Store
Oracle Advanced Analytics
Oracle Database
Applications
Oracle NoSQL Database
Cloudera Hadoop
Oracle R Distribution
Oracle Big Data Connectors
Oracle Data Integrator
Oracle BI Enterprise Edition
Endeca Information Discovery
Data Sources
Oracle Big Data SQL
Unified Data Platform
Advanced Query & Analysis: Full Power of SQL and Advanced Analytics
Transparent to Applications: No Changes to Application Code
Single View of All Data: Unified Metadata Across RDBMS & Hadoop
Fastest Performance: Utilize SQL Processing Across the Platform
Leverage Existing Skills: Lower Cost & Complexity of Big Data Adoption