turbocharge sql performance with oracle database 12c philip moore senior data architect and...
TRANSCRIPT
Turbocharge SQL Performance with Oracle Database 12c
Philip Moore
Senior Data Architect and Developer
222© 84.51° 2015 | Public
We believe in making people’s lives easier by putting the customer at the center of everything we do
WHO IS 84.51°
333© 84.51° 2015 | Public
84.51° HELPS COMPANIES PUT THE CUSTOMER AT THE CENTER OF EVERY DECISION
With insights from the data, we personalize the experience, making your business more relevant to consumers who matter most
Second, we work with you to change the organization, embedding the principle of shopper first
In so doing, we grow measurable brand value for our clients
444© 84.51° 2015 | Public
Q2
02Q
3 02
Q4
02Q
1 03
Q2
03Q
3 03
Q4
03Q
1 04
Q2
04Q
3 04
Q4
04Q
1 05
Q2
05Q
3 05
Q4
05Q
1 06
Q2
06Q
3 06
Q4
06Q
1 07
Q2
07Q
3 07
Q4
07Q
1 08
Q2
08Q
3 08
Q4
08Q
1 09
Q2
09Q
3 09
Q4
09Q
1 10
Q2
10Q
3 10
Q4
10Q
1 11
Q2
11Q
3 11
Q4
11Q
1 12
Q2
12Q
3 12
Q4
12
0%
-1%-2%
-1%-1%
0%
1%
1%1%
2% 2%2%
3%4%
5%
6%6%
5% 5% 5% 5%6%
5%6%
5%
6%
4%
3%3%
1% 1%
2%3%
2%
4%
5%
5%5% 5%
4%4%
3%4%
Identical Store Sales (Exc Fuel)
% C
han
ge
in I
den
tica
l S
tore
Sal
es Y
ear
on
Yea
rPLACING THE CUSTOMER AT THE CENTER WORKS
JV Partnership began in 2003
10%
0%
555© 84.51° 2015 | Public 5
WHO AM I?
Philip Moore
• Exadata Data Warehouse Architect / Lead Oracle
Developer – 84.51°
• 15+ years Oracle data warehousing (since Oracle 8i)
• 5+ years with Oracle Exadata (v2, x2, x3)
• Oracle Certified Professional
‑ PL/SQL Developer
‑ 10g DBA
• Oracle SQL Certified Expert
?
Experience with:• SQL Server
• Teradata
• Netezza
• Greenplum
• PostgreSQL
666
Before we start on 12c…
Let’s Talk about:
84.51’s Oracle Data Warehouse
777© 84.51° 2015 | Public 7
• ETL• Analyst ad-hoc queries (no-SLA)• Loyalty Campaign Mailer
• ETL• Products –– The Shop• Pre-canned queries with SLA
Oracle Exadata runs the heart of our Data Warehouse. It has enabled us to deliver things we could not do before.
We use an Active-Active Disaster Recovery configuration –– using a dual ETL/ELT strategy to load data into our Primary site and to the D/R and Products site
Primary Site D/R and Products Site
WE USE ORACLE EXADATA
ETL
ETL
888© 84.51° 2015 | Public 8
Fact Table200+ Billion
rows
Logically — We use a traditional Kimball star schema
Example client model:
Customers300+ Million rows
Terminals500+ rows
Stores15,000+ rows
Dates4,700+ rows
Products8+ Million rows
84.51 - Data Warehouse Point of Sale Data Model
999© 84.51° 2015 | Public 9
Physically — we use the following strategies
For Dimensions
• Oracle Advanced Compression (COMPRESS FOR OLTP)
• Primary Keys are enforced — in RELY mode
• Oracle DIMENSION objects
• Exadata Hybrid Columnar Compression (HCC) — QUERY HIGH
• We will put the latest 2 years (“Hot” data) into the In-Memory Column Store
• Partition by RANGE on our date field, 1 week per partition
“Soft” Referential Integrity for Foreign Key relationships
• NUMBER for Numbers
• DATE for Dates
• VARCHAR2 for Characters
For Facts
Proper data types for our data
NOT NULL and CHECK constraints to validate data and help CBO
PHYSICAL MODEL STRATEGIES FOR EXADATA
101010
Now Let’s Talk Oracle 12c New Features that will speed up the Data Warehouse
111111© 84.51° 2015 | Confidential
12c – New Data Warehousing FeaturesThese new features will accelerate your Data Warehouse SQL queries
Oracle In-Memory Column Store
In-Memory Aggregation (Vector Group By)
Attribute Clustering
Adaptive Optimization
Approximate Aggregation
121212© 84.51° 2015 | Public 12
Query plans are not longer set in stone!
Before 12c – the Optimizer’s decision was locked – meaning that no matter how bad its cardinality estimates were for each phase of execution – it continued down the ill-advised path.
The Optimizer can now change path in mid-query execution – for:
o Join Paths
– Helps recover from stale stats causing nested loop joins with large driving tables for example.
o Parallel Query Distribution Methods (Broadcast / Hash)
– This helps your query recover from high skew – preventing an unbalanced
This feature will take queries than run in hours in 11gR2 run in minutes or less!
ADAPTIVE OPTIMIZATIONThe Cost-Based Optimizer just got a whole lot smarter!
adaptive_join.html
131313© 84.51° 2015 | Public 13
Data is organized in columnar, compressed format
Ideal for Data Warehousing (Decision Support Systems) – with analytical workloads
Oracle In-Memory excels at:
o Predicate Filtering (due to SIMD instructions and IMCU “pruning”)
o Bloom Filters for large joins
o Full Table Scans
o Additive Aggregation (hopefully non-additive as well soon)
You will typically see 10-50x performance improvement for non-Exadata, and about 5-10x with Exadata. Note: with Exadata – In-Memory doesn’t shine until you have a highly concurrent workload…
With Engineered Systems – you can choose how the data is spread across RAC Nodes:
o Distribution (default) – (best for fact tables)
o Duplication – consider this for Dimension tables
ORACLE IN-MEMORY COLUMN STORERevolutionizing Analytical Query Performance
141414© 84.51° 2015 | Public 14
Also known as “I.M.A.” – or “Vector Group By”
The methodology was taken from Oracle OLAP – and fitted into In-Memory with use by SQL for the first time
The data is aggregated as it is scanned – reducing TEMP usage to zero in most cases – allowing for blazing fast aggregation for additive measures.
The decision is Cost-Based – and by default only kicks in if your fact table has 10 million or more rows.
You can force Vector Group By in your star aggregation query by using the “/*+ vector_transform */” hint
We have seen 16x performance improvement with Vector Group By as compared to the “traditional” Hash Group By aggregation method.
IN-MEMORY AGGREGATIONA Smarter way to Aggregate
no_vector_group_by.html vector_group_by.html
151515© 84.51° 2015 | Public 15
For the first time – the way data is stored and sorted in a table (or partition) is declarative
Any direct-path INSERTs or “ALTER TABLE … MOVE” commands which load or reorganize the table will obey the table-level attribute clustering directives (stored in the data dictionary)
Storing data that logically belongs together helps improve performance, compression ratios, and improved concurrency for users of your system
You can use “Join Attribute Clustering” to organize your fact data by dimensional attributes. This can greatly speed up star queries.
You can choose between two options:
o Linear (basic ORDER BY) – allowing for better compression – benefitting In-Memory as well as Oracle Exadata Hybrid Columnar Compression
o Interleaved – a.k.a “Z-Order Curve Fitting” – ideal for hierarchical aggregation via star queries
Can be combined with Zone Maps (on Exadata) – to facilitate zone pruning and dramatically reduce I/O as a result
ATTRIBUTE CLUSTERINGStoring similar “fact” rows together
161616© 84.51° 2015 | Public 16
ATTRIBUTE CLUSTERINGExample – from Oracle 12c Data Warehousing Guide documentation
• For Linear-Ordered tables – the data is simply sorted in the order of the column(s) you specify
• For Interleaved-Ordered tables – the data is arranged so that the data is contiguous in a multi-dimensional manner – meaning that contiguous regions contain data for similar “Country” and “Category” values
171717© 84.51° 2015 | Public 17
This technique helps solve the difficult: “Count Distinct Problem” – see: https://en.wikipedia.org/wiki/Count-distinct_problem
HyperLogLog provides about 97%+ accuracy while using far fewer resources than traditional “COUNT (DISTINCT expr)” aggregation
HyperLogLog is distributable – meaning that it can be merged between parallel slaves – effectively making Distinct Counts “additive” in nature.
APPROX_COUNT_DISTINCT provides nearly 10x query performance improvement when used for star queries which perform Distinct Counts with hierarchical aggregation
The more Distinct Counts you do in your query – the more it shines
This feature is near and dear to my heart – after learning about HyperLogLog – I requested (via an SR enhancement request via “My Oracle Support”) that Oracle adopt the “APPROX_COUNT_DISTINCT” function based upon Flajolet’s algorithm.
Oracle put the customer first and graciously added this revolutionary new feature
Many more approximate features could (will) be coming to an Oracle release near you!
APPROXIMATE AGGREGATIONUsing the brilliant: HyperLogLog algorithm
exa_approx_count_distinct.html exa_exact_count_distinct.html
181818
THANK YOU!