turbocharge sql performance with oracle database 12c philip moore senior data architect and...

Turbocharge SQL Performance with Oracle Database 12c

Philip Moore

Senior Data Architect and Developer

222© 84.51° 2015 | Public

We believe in making people’s lives easier by putting the customer at the center of everything we do

WHO IS 84.51°

333© 84.51° 2015 | Public

84.51° HELPS COMPANIES PUT THE CUSTOMER AT THE CENTER OF EVERY DECISION

With insights from the data, we personalize the experience, making your business more relevant to consumers who matter most

Second, we work with you to change the organization, embedding the principle of shopper first

In so doing, we grow measurable brand value for our clients

444© 84.51° 2015 | Public

Q2

02Q

3 02

Q4

02Q

1 03

Q2

03Q

3 03

Q4

03Q

1 04

Q2

04Q

3 04

Q4

04Q

1 05

Q2

05Q

3 05

Q4

05Q

1 06

Q2

06Q

3 06

Q4

06Q

1 07

Q2

07Q

3 07

Q4

07Q

1 08

Q2

08Q

3 08

Q4

08Q

1 09

Q2

09Q

3 09

Q4

09Q

1 10

Q2

10Q

3 10

Q4

10Q

1 11

Q2

11Q

3 11

Q4

11Q

1 12

Q2

12Q

3 12

Q4

12

0%

-1%-2%

-1%-1%

0%

1%

1%1%

2% 2%2%

3%4%

5%

6%6%

5% 5% 5% 5%6%

5%6%

5%

6%

4%

3%3%

1% 1%

2%3%

2%

4%

5%

5%5% 5%

4%4%

3%4%

Identical Store Sales (Exc Fuel)

% C

han

ge

in I

den

tica

l S

tore

Sal

es Y

ear

on

Yea

rPLACING THE CUSTOMER AT THE CENTER WORKS

JV Partnership began in 2003

10%

0%

555© 84.51° 2015 | Public 5

WHO AM I?

Philip Moore

• Exadata Data Warehouse Architect / Lead Oracle

Developer – 84.51°

• 15+ years Oracle data warehousing (since Oracle 8i)

• 5+ years with Oracle Exadata (v2, x2, x3)

• Oracle Certified Professional

‑ PL/SQL Developer

‑ 10g DBA

• Oracle SQL Certified Expert

?

Experience with:• SQL Server

• Teradata

• Netezza

• Greenplum

• PostgreSQL

666

Before we start on 12c…

Let’s Talk about:

84.51’s Oracle Data Warehouse

777© 84.51° 2015 | Public 7

• ETL• Analyst ad-hoc queries (no-SLA)• Loyalty Campaign Mailer

• ETL• Products –– The Shop• Pre-canned queries with SLA

Oracle Exadata runs the heart of our Data Warehouse. It has enabled us to deliver things we could not do before.

We use an Active-Active Disaster Recovery configuration –– using a dual ETL/ELT strategy to load data into our Primary site and to the D/R and Products site

Primary Site D/R and Products Site

WE USE ORACLE EXADATA

ETL

ETL

888© 84.51° 2015 | Public 8

Fact Table200+ Billion

rows

Logically — We use a traditional Kimball star schema

Example client model:

Customers300+ Million rows

Terminals500+ rows

Stores15,000+ rows

Dates4,700+ rows

Products8+ Million rows

84.51 - Data Warehouse Point of Sale Data Model

999© 84.51° 2015 | Public 9

Physically — we use the following strategies

For Dimensions

• Oracle Advanced Compression (COMPRESS FOR OLTP)

• Primary Keys are enforced — in RELY mode

• Oracle DIMENSION objects

• Exadata Hybrid Columnar Compression (HCC) — QUERY HIGH

• We will put the latest 2 years (“Hot” data) into the In-Memory Column Store

• Partition by RANGE on our date field, 1 week per partition

“Soft” Referential Integrity for Foreign Key relationships

• NUMBER for Numbers

• DATE for Dates

• VARCHAR2 for Characters

For Facts

Proper data types for our data

NOT NULL and CHECK constraints to validate data and help CBO

PHYSICAL MODEL STRATEGIES FOR EXADATA

101010

Now Let’s Talk Oracle 12c New Features that will speed up the Data Warehouse

111111© 84.51° 2015 | Confidential

12c – New Data Warehousing FeaturesThese new features will accelerate your Data Warehouse SQL queries

Oracle In-Memory Column Store

In-Memory Aggregation (Vector Group By)

Attribute Clustering

Adaptive Optimization

Approximate Aggregation

121212© 84.51° 2015 | Public 12

Query plans are not longer set in stone!

Before 12c – the Optimizer’s decision was locked – meaning that no matter how bad its cardinality estimates were for each phase of execution – it continued down the ill-advised path.

The Optimizer can now change path in mid-query execution – for:

o Join Paths

– Helps recover from stale stats causing nested loop joins with large driving tables for example.

o Parallel Query Distribution Methods (Broadcast / Hash)

– This helps your query recover from high skew – preventing an unbalanced

This feature will take queries than run in hours in 11gR2 run in minutes or less!

ADAPTIVE OPTIMIZATIONThe Cost-Based Optimizer just got a whole lot smarter!

adaptive_join.html

131313© 84.51° 2015 | Public 13

Data is organized in columnar, compressed format

Ideal for Data Warehousing (Decision Support Systems) – with analytical workloads

Oracle In-Memory excels at:

o Predicate Filtering (due to SIMD instructions and IMCU “pruning”)

o Bloom Filters for large joins

o Full Table Scans

o Additive Aggregation (hopefully non-additive as well soon)

You will typically see 10-50x performance improvement for non-Exadata, and about 5-10x with Exadata. Note: with Exadata – In-Memory doesn’t shine until you have a highly concurrent workload…

With Engineered Systems – you can choose how the data is spread across RAC Nodes:

o Distribution (default) – (best for fact tables)

o Duplication – consider this for Dimension tables

ORACLE IN-MEMORY COLUMN STORERevolutionizing Analytical Query Performance

141414© 84.51° 2015 | Public 14

Also known as “I.M.A.” – or “Vector Group By”

The methodology was taken from Oracle OLAP – and fitted into In-Memory with use by SQL for the first time

The data is aggregated as it is scanned – reducing TEMP usage to zero in most cases – allowing for blazing fast aggregation for additive measures.

The decision is Cost-Based – and by default only kicks in if your fact table has 10 million or more rows.

You can force Vector Group By in your star aggregation query by using the “/*+ vector_transform */” hint

We have seen 16x performance improvement with Vector Group By as compared to the “traditional” Hash Group By aggregation method.

IN-MEMORY AGGREGATIONA Smarter way to Aggregate

no_vector_group_by.html vector_group_by.html

151515© 84.51° 2015 | Public 15

For the first time – the way data is stored and sorted in a table (or partition) is declarative

Any direct-path INSERTs or “ALTER TABLE … MOVE” commands which load or reorganize the table will obey the table-level attribute clustering directives (stored in the data dictionary)

Storing data that logically belongs together helps improve performance, compression ratios, and improved concurrency for users of your system

You can use “Join Attribute Clustering” to organize your fact data by dimensional attributes. This can greatly speed up star queries.

You can choose between two options:

o Linear (basic ORDER BY) – allowing for better compression – benefitting In-Memory as well as Oracle Exadata Hybrid Columnar Compression

o Interleaved – a.k.a “Z-Order Curve Fitting” – ideal for hierarchical aggregation via star queries

Can be combined with Zone Maps (on Exadata) – to facilitate zone pruning and dramatically reduce I/O as a result

ATTRIBUTE CLUSTERINGStoring similar “fact” rows together

161616© 84.51° 2015 | Public 16

ATTRIBUTE CLUSTERINGExample – from Oracle 12c Data Warehousing Guide documentation

• For Linear-Ordered tables – the data is simply sorted in the order of the column(s) you specify

• For Interleaved-Ordered tables – the data is arranged so that the data is contiguous in a multi-dimensional manner – meaning that contiguous regions contain data for similar “Country” and “Category” values

171717© 84.51° 2015 | Public 17

This technique helps solve the difficult: “Count Distinct Problem” – see: https://en.wikipedia.org/wiki/Count-distinct_problem

HyperLogLog provides about 97%+ accuracy while using far fewer resources than traditional “COUNT (DISTINCT expr)” aggregation

HyperLogLog is distributable – meaning that it can be merged between parallel slaves – effectively making Distinct Counts “additive” in nature.

APPROX_COUNT_DISTINCT provides nearly 10x query performance improvement when used for star queries which perform Distinct Counts with hierarchical aggregation

The more Distinct Counts you do in your query – the more it shines

This feature is near and dear to my heart – after learning about HyperLogLog – I requested (via an SR enhancement request via “My Oracle Support”) that Oracle adopt the “APPROX_COUNT_DISTINCT” function based upon Flajolet’s algorithm.

Oracle put the customer first and graciously added this revolutionary new feature

Many more approximate features could (will) be coming to an Oracle release near you!

APPROXIMATE AGGREGATIONUsing the brilliant: HyperLogLog algorithm

exa_approx_count_distinct.html exa_exact_count_distinct.html

https://en.wikipedia.org/wiki/Count-distinct_problem

https://en.wikipedia.org/wiki/Count-distinct_problem

181818

THANK YOU!

turbocharge sql performance with oracle database 12c philip moore senior data architect and...

Documents