seamless access from oracle database to your big...
TRANSCRIPT
Seamless Access from Oracle Database to Your Big Data
Brian MacdonaldBig Data and Analytics SpecialistOracle Enterprise Architect
September 24, 2015
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Agenda• Hadoop and SQL access methods• What is Oracle Big Data SQL• Big Data SQL Architecture• Big Data SQL Configuration• Roadmap• Customer Story• Q&A
9/23/2015 2
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
First Lets Define Big Data
Structured & Unstructured Data
&
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
SQL on Hadoop is ObviousAlthough Implementations Vary
• Hive• Impala, HAWQ, IBM Big SQL• Oracle SQL Connector for Hadoop (OSCH)• Oracle Big Data SQL • A million more (Tez, Presto, Hadapt, Stinger, Polybase, Drill, Lots of
start ups)
Stinger
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
SQL Analytics ChallengeSeparate silos of information to analyze
5
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
SQL Analytics ChallengeNo comprehensive SQL interface
6
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Big Data SQLHadoop + NoSQL + Relational…
7
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Big Data SQL – A New Architecture
• Powerful, high-performance SQL on Hadoop– Full Oracle SQL capabilities on Hadoop– SQL query processing local to Hadoop nodes
• Simple data integration of Hadoop and Oracle Database– Single SQL point-of-entry to access all data– Scalable joins between Hadoop and RDBMS data
• Oracle Security – Govern all Data through a Single Set of Security Policies– Redaction, VPD, etc.– Tool Access
8
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Use Rich Oracle SQL Dialect Over All DataSnapshot of Oracle SQL Analytic Functions
• Ranking functions
– rank, dense_rank, cume_dist, percent_rank, ntile
• Window Aggregate functions (moving and cumulative)
– Avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions
– Direct inter-row reference using offsets
• Reporting Aggregate functions
– Sum, avg, min, max, variance, stddev, count, ratio_to_report
• Statistical Aggregates
– Correlation, linear regression family, covariance
• Linear regression
– Fitting of an ordinary-least-squares regression line to a set of number pairs.
– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
• Descriptive Statistics
– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations
– Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).
• Cross Tabs
– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis Testing
– Student t-test , F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
• Distribution Fitting
– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Big Data SQL Architecture• Two components of Oracle Big Data SQL
– External Table extension– Big Data SQL Server Software on Big Data Appliance
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Big Data SQL Architecture• Two components of Oracle Big Data SQL
– External Table extension– Big Data SQL Server Software on Big Data Appliance
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 12
A Smarter Oracle External Table
Oracle Table
HDFS Data
You define:
You get:
• Table name
• Automatic discovery of Hive table metadata
• Oracle types
• Automatic translation from Hadoop types• Automatic conversion from any InputFormat
• Any Degree of Parallelism
• Fan-out Parallelism across the Hadoop cluster
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Unify Metadata: Publish Hive Metadata to Oracle Catalog
13
CREATE TABLE movieapp_log_json(click VARCHAR2(4000))ORGANIZATION EXTERNAL(TYPE ORACLE_HIVEDEFAULT DIRECTORY DEFAULT_DIR)
REJECT LIMIT UNLIMITED;
Big Data Appliance+
Hadoop/NoSQL
Exadata+
Oracle Database
Oracle Catalog
External Table
Hive metadata
External Table
Hive Metastore
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Accessible through Oracle Data DictionaryImmediately – So the DBA doesn’t need to go to Hadoop
• ALL_HIVE_DATABASES
• ALL_HIVE_TABLES
• ALL_HIVE_COLUMNS
• DBA_HIVE_DATABASES
• DBA_HIVE_TABLES
• DBA_HIVE_COLUMNS
• USER_HIVE_DATABASES
• USER_HIVE_TABLES
• USER_HIVE_COLUMNS
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 15
Extend Oracle External Tables
CREATE TABLE movielog (
click VARCHAR2(4000))
ORGANIZATION EXTERNAL (
TYPE ORACLE_HIVE
DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS
(
com.oracle.bigdata.tablename logs
com.oracle.bigdata.cluster mycluster
))
REJECT LIMIT UNLIMITED;
• New types of external tables– ORACLE_HIVE (inherit metadata)– ORACLE_HDFS (specify metadata)
• Access parameters for Big Data– Hadoop cluster– Remote Hive database/table
• DBMS_HADOOP Package for automatic import
• SQLDeveloper Integration (Create Table)
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
SQLDeveloper Integration
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
How Data is Stored in HadoopAs files. Pretty Simple
17
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}{"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}{"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}}{"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}
Example: 1TB File
CREATE TABLE ORDER (custid VARCHAR2(10), recommended VARCHAR2(20), activity (NUMBER 8,2))ORGANIZATION EXTERNAL (TYPE oracle_hdfs)LOCATION ("hdfs:/usr/cust/summary/*");
• Assumes Default Values• Table Options
– Fields– Column Maps– Delimiters– Fileformats
• json, textfile, sequencefile,…
– Serdes• i.e regex
More (See Docs)
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Creating an External Table against Hive
DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE (cluster_id IN VARCHAR2,db_name IN VARCHAR2 := NULL,hive_table_name IN VARCHAR2,hive_partition IN BOOLEAN,table_name IN VARCHAR2 := NULL,perform_ddl IN BOOLEAN DEFAULT FALSE,text_of_ddl OUT VARCHAR2);
set serveroutput onDECLAREDDLout VARCHAR2(4000);BEGINdbms_hadoop.create_extddl_for_hive(CLUSTER_ID=>'bigdatalite',DB_NAME=>'brian',HIVE_TABLE_NAME=>'movie',HIVE_PARTITION=>FALSE,TABLE_NAME=>'movie',PERFORM_DDL=>FALSE,TEXT_OF_DDL=>DDLout);dbms_output.put_line(DDLout);END;
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 19
Oracle External Tables – Flexibility for Varied File Structures
• Transparent schema-for-read– Use fast C-based readers when possible– Use native Hadoop classes otherwise
• Engineered to understand parallelism– Map external units of parallelism to Oracle
• Architected for extensibility– StorageHandler capability enables support for other
data sources– Examples: MongoDB, HBase, Oracle NoSQL DB
CREATE TABLE ORDER (cust_num VARCHAR2(10),
order_num VARCHAR2(20), order_total NUMBER(8,2))ORGANIZATION EXTERNAL (
TYPE ORACLE_HIVEDEFAULT DIRECTORY DEFAULT_DIR
)PARALLEL 20REJECT LIMIT UNLIMITED;
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
StorageHandlers: Extensibility Beyond HDFS
Hive Metastore
Oracle Big Data SQL
StorageHandlers are a metadata bridge.
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Big Data SQL Architecture• Two components of Oracle Big Data SQL
– External Table extension– Big Data SQL Server Software on Big Data Appliance
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
What gives Exadata extreme performance?
22
Oracle Database 12c
SQL
Offload Query toExadata Storage Servers
Small data subsetquickly returned
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Introducing Oracle Big Data SQL
23
Massively Parallel SQL Query across Oracle, Hadoop and NoSQL
Oracle Database 12cHadoop & NoSQL
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Big Data Appliance X5-2
Sun Oracle X5-2L Servers with per server:• 2 * 18 Core Intel Xeon E5 Processors• 128 GB Memory• 96TB Disk space
Integrated Software (4.2):• Oracle Linux 6.6• Oracle Big Data SQL 1.1*• Cloudera Distribution of Apache Hadoop 5.4 – EDH Edition• Cloudera Manager 5.4• Oracle R Distribution• Oracle NoSQL Database CE
24
* Oracle Big Data SQL is separately licensed
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Introducing Oracle Big Data SQL
25
Massively Parallel SQL Query across Oracle, Hadoop and NoSQL
Oracle Database 12c
Offload Query toExadata Storage Servers
Small data subsetquickly returned
Hadoop & NoSQL
Offload Query toData Nodes
SQL
data subset
SQL
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Storage Layer
26
Big Data SQL Server: A New Hadoop Processing Engine
Filesystem (HDFS) NoSQL Databases(Oracle NoSQL DB, Hbase)
Resource Management (YARN, cgroups)
Processing Layer
MapReduceand Hive Spark Impala Search Big Data
SQL
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
How do we query Hadoop?Big Data SQL Query Execution
HDFSData NodeBDSQL
HDFS Data NodeBDSQL
Query compilation determines:• Data locations • Data structure• Parallelism
1
Fast reads using Big Data SQL Server• Schema-for-read using Hadoop classes• Smart Scan selects only relevant data
2
Process filtered result• Move relevant data to database• Join with database tables• Apply database security policies
3Hive Metastore
HDFSNameNode 1
2 3
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Apply Advanced Security on Hadoop & NoSQL
28
Same security policies across all data
Oracle Database 12cHadoop
Redacteddata
subset
SQLRaw JSON
data inHadoop
JSON
Customer data
in Oracle
DBMS_REDACT.ADD_POLICY(object_schema => 'sales',object_name => 'customer_detail',column_name => 'last_name',policy_name => 'customer_privacy',function_type => DBMS_REDACT.FULL,expression => '1=1');
• Redaction• Virtual Private Database• Fine-grain Access Control
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Configuration• Install Oracle Big Data SQL on the BDA using Mammoth• Run the Big Data SQL-Exadata installation script on each Oracle Exadata
database node– Sets up connectivity from Exadata to the Big Data SQL Servers on the BDA.– Installs a Hadoop client– Configure directories and files– Big Data SQL Agent– Oracle directory objects– Others
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Directories• Two Types of directories are created• Common Directory – must be on cluster wide shared files system
– Subdirectories for jar files– bigdata.properties (paths,etc.)
• Cluster Directory(s)– Configuration details for each BDA Cluster– Sub directory of Common directory
• Oracle Directories that point to these Dirs– ORACLE_BIGDATA_CONFIG – Common Directory– ORACLE_BIGDATA_CL_XXXX – One for each Cluster directory (case sensitive)
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Big Data SQL Agents • Created by Install Script• This multi-threaded agent bridges the metadata between Oracle Database
and Hadoop.• It launches a single JVM - instead of one for every process (which can be
quite slow). • create public database link BDSQL$_XXXX using
'extproc_connection_data'; (XXXX is the name of each BDA cluster from Cluster Directories
• create public database link BDSQL$_DEFAULT_CLUSTER using 'extproc_connection_data';
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
If Kerberos is used on BDA
• Must create ticket (kinit) for BDS user• BDS runs as Oracle User• Need to renew tickets
– cron– Other automation to be released soon
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Requirements - For Now• Exadata• Oracle 12.1.0.2.1+• Storage Servers 2.1.1.1 or 12.1.1.0• Exadata configured on the same InfiniBand subnet as BDA• Exadata and BDA connected by InfiniBand
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
RoadmapSubsequent content subject to change!
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Today
• Hadoop DoP linked to RDBMS DoP– Lead to many idle PQ processes– Required explicit declaration
Next
• Unlink Hadoop and RDBMS DoP• Automatic max Hadoop parallelism
– Even on serial tables
• An average of 40% faster– Even at equivalent DoP
Enhanced Parallelism
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Today
• All blocks in a query must be read from disk– Large (256MB) disk I/O for each block
Next
• Automatically create Storage Indexes in Big Data SQL Agents
• Check index before reading blocks– Skip unnecessary I/Os
• An average of 65% faster– Up to 100x faster for highly selective
queries
Storage Indexing
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Customer Examples
37
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Building Customer Loyalty
Deliver personalized multi-channel content to every customer (example: Kroger’s ‘MyMagazine’)
Expand to a wide variety of interaction data to build customer profiles
Challenges
Customer loyalty marketing and programs for major retailers and consumer brands
Company Overview
2x improvements in campaign performance Large-scale concurrent processing of complex SQL
70% of analysis is done in SQL, uses R as well
Benefits
Solution Overview Oracle Exadata X3-8 Oracle Database with Advanced Analytics Oracle ZFS Backup Appliance Big Data Appliance Next: Big Data SQL
SQL Analysis R-based Analysis Machine Learning
BDAZFS X3-8X3-8
Source Systems (at Client)
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Thank You&
Q&A