Do’s and Don’ts of Oracle Database In-Memory

Jorge Barba Infrastructure Principal at Accenture Enkitec Group https://jorgebarbablog.wordpress.com Mar 2016

Agenda

A. Overview

B. Configuration

C. Optimizer

D. Queries

E. Usage with Oracle Technologies

F. Do's and Don'ts

G. Conclusion

Overview

Database In-Memory

In-Memory Column Store is an optional, static SGA pool that stores segments in columnar format.

It is a supplement to the Buffer Cache.

The database keeps the columnar data transactionally consistent with the Buffer Cache.

Row Format vs. Column Format

Transactions run faster on row format

Example: Query or Insert a sales order

Fast processing a few rows, many columns.

Analytics run faster on column format

Example: Report on sales totals by region

Fast accessing few columns, many rows

Dual Format Database

Same table on both formats

Analytics use the new in-memory Column Format

OLTP uses the Row Format

Scanning Memory

Buffer Cache

Have to walk along that row until we find col4.

IM Column Store

Go directly to the col4 structure and scan all the entries.
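The difference between the two scans can be sketched with a toy model. This is only an illustration of the access pattern, not of how Oracle lays out blocks or IMCUs; the table contents and the function names are made up for the example.

```python
# Toy table with four columns; the values are invented for illustration.
rows = [
    (1, "A", 10.0, 7),
    (2, "B", 20.0, 3),
    (3, "C", 30.0, 9),
]

def scan_col4_row_format(table_rows):
    """Row format: visit every row and walk along it to the fourth field."""
    return [row[3] for row in table_rows]

# Column format: one contiguous structure per column.
column_names = ["col1", "col2", "col3", "col4"]
columns = {name: list(values) for name, values in zip(column_names, zip(*rows))}

def scan_col4_column_format(table_columns):
    """Column format: go straight to the col4 structure and read its entries."""
    return list(table_columns["col4"])

if __name__ == "__main__":
    print(scan_col4_row_format(rows))
    print(scan_col4_column_format(columns))
```

Both functions return the same values; the point is that the column-format scan never touches col1 to col3.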

Storage Index

Storage Indexes

Automatically created and maintained for each column in the Column Store.

Allow data pruning based on filter predicates in the SQL statement.

Keep track of the minimum and maximum values of each column in every In-Memory Compression Unit (IMCU).

If the column value is outside the minimum and maximum range for an IMCU, the scan of that IMCU is avoided.
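The pruning rule above can be sketched in a few lines. This is a simplified model under stated assumptions: `ColumnUnit` and `scan_equal` are hypothetical names, each unit holds a plain Python list, and only an equality predicate is handled.

```python
from dataclasses import dataclass

@dataclass
class ColumnUnit:
    """Hypothetical stand-in for one column's data inside an IMCU."""
    values: list

    def __post_init__(self):
        # The "storage index": min and max are recorded when the unit is built.
        self.min_value = min(self.values)
        self.max_value = max(self.values)

def scan_equal(units, target):
    """Return (matching values, number of units pruned) for `col = target`."""
    matches = []
    pruned = 0
    for unit in units:
        # If the target lies outside [min, max], the unit cannot contain it.
        if target < unit.min_value or target > unit.max_value:
            pruned += 1
            continue
        matches.extend(v for v in unit.values if v == target)
    return matches, pruned

if __name__ == "__main__":
    units = [ColumnUnit([1, 5, 9]), ColumnUnit([10, 15, 20]), ColumnUnit([21, 30, 40])]
    print(scan_equal(units, 15))
```

Pruning works best when values are clustered, so that the min/max ranges of different units overlap as little as possible.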

In-Memory Join and Bloom Filter (BF)

•  A Bloom filter transforms a join into a filter that can be applied as part of the scan of the larger table.

•  Very efficiently applied to column format data via SIMD vector processing.

•  Appears in two places in the execution plan: once when it is created and again when it is applied.

SELECT count(*)
FROM lineorder lo, part p
WHERE lo.lo_partkey = p.p_partkey
AND lo.lo_shipmode = 'TRUCK'
AND lo.lo_ordtotalprice BETWEEN 55000000 AND 56000000
AND p.p_name = 'papaya burlywood';
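As a rough sketch of the idea, the following toy Bloom filter is built from the filtered small table and then applied while scanning the large one. It is not Oracle's implementation: the class, the sizes, the hash scheme, and the function `bloom_join_count` are all assumptions made for illustration, and no SIMD is involved.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter; sizes and hash scheme are arbitrary choices."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

def bloom_join_count(part_rows, lineorder_rows, part_name):
    # Creation: filter the small table and record its join keys.
    matching_keys = {p["p_partkey"] for p in part_rows if p["p_name"] == part_name}
    bloom = BloomFilter()
    for key in matching_keys:
        bloom.add(key)
    # Use: apply the filter as part of the scan of the large table.
    candidates = [lo for lo in lineorder_rows if bloom.might_contain(lo["lo_partkey"])]
    # The join itself removes any false positives the filter let through.
    return sum(1 for lo in candidates if lo["lo_partkey"] in matching_keys)
```

Because a Bloom filter never produces false negatives, the final count is exact; the filter only reduces how many large-table rows reach the join.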

Vector Group By

•  New optimizer transformation introduced with Oracle 12.1.0.2.0.

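•  Example: find the total sales of footwear products in outlet stores.

•  The transformation runs in two phases: first the dimension tables are scanned to build key vectors, then the fact table is scanned and the key vectors are used to filter and aggregate the rows. The combination of these two phases dramatically improves the efficiency of a multiple table join with complex aggregations.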

| 9 | KEY VECTOR (USE) | :KV0000
| 10 | KEY VECTOR (USE) | :KV0001
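A minimal sketch of the key vector idea, assuming a single dimension table for brevity (the plan fragment above uses two key vectors). The function name, the dictionary-based key vector, and the column names `key`, `dim_key`, and `amount` are hypothetical and chosen only for this illustration.

```python
def vector_group_by(dim_rows, fact_rows, dim_filter, group_col):
    """Toy two-phase aggregation with one dimension table."""
    # Phase 1: scan the dimension table and build the key vector,
    # mapping each qualifying join key to a dense group id.
    group_ids = {}   # grouping value -> dense group id
    key_vector = {}  # dimension join key -> dense group id
    for row in dim_rows:
        if dim_filter(row):
            gid = group_ids.setdefault(row[group_col], len(group_ids))
            key_vector[row["key"]] = gid

    # Phase 2: scan the fact table once; the key vector both filters rows
    # and tells us which accumulator slot to add to.
    accumulator = [0] * len(group_ids)
    for row in fact_rows:
        gid = key_vector.get(row["dim_key"])
        if gid is not None:
            accumulator[gid] += row["amount"]

    return {value: accumulator[gid] for value, gid in group_ids.items()}
```

In this sketch the fact table is never joined row by row against the dimension table; the lookup in the key vector replaces the join.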

SIMD (Single Instruction, Multiple Data)

A set of column values is evaluated together in a single CPU instruction. The column format is designed to maximize the number of column entries loaded and evaluated in a single CPU instruction. 8 entries are loaded into the register for evaluation.

Configuration

Configuring the In-Memory Column Store

Configuration

•  INMEMORY_SIZE=1520M

•  Minimum of 100M

•  Part of the SGA

•  Fixed size

SQL> select * from v$sga;

NAME                      VALUE
-------------------- ----------
Fixed Size              2932632
Variable Size         587202664
Database Buffers     2097152000
Redo Buffers           13844480
In-Memory Area       1593835520

alter system set inmemory_size=1520M scope=spfile;
shutdown immediate;
startup

Populating the In-Memory Column Store

ALTER TABLE lineorder INMEMORY;

ALTER TABLE lineorder NO INMEMORY;

CREATE TABLE customer …

PARTITION BY LIST

(PARTITION p1 …… INMEMORY,

PARTITION p2 …… NO INMEMORY);

Eligible segments are:

•  Tables

•  Partitions

•  Subpartitions

•  Materialized Views

Not supported:

•  IOTs, Hash clusters, Out of line LOBs.

Composition of In-Memory Area and DML

SQL> select pool, alloc_bytes
  2  from v$inmemory_area;

POOL           ALLOC_BYTES
-------------- -----------
1MB POOL        1274019840
64KB POOL        301989888
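The byte counts in this output and in the v$sga listing can be cross-checked with some simple arithmetic. The snippet below only uses the numbers shown on the slides; the remark that the 1MB pool holds the columnar data and the 64KB pool holds metadata reflects Oracle's documentation as I understand it and is not stated in the slides.

```python
MB = 1024 * 1024

# INMEMORY_SIZE=1520M, as set with ALTER SYSTEM on the configuration slide.
inmemory_size_bytes = 1520 * MB

# ALLOC_BYTES per pool, copied from the v$inmemory_area output.
pools = {
    "1MB POOL": 1274019840,   # assumed: column-format data
    "64KB POOL": 301989888,   # assumed: metadata
}
total_pool_bytes = sum(pools.values())

def describe():
    lines = [f"INMEMORY_SIZE = {inmemory_size_bytes} bytes"]
    for name, alloc in pools.items():
        share = alloc / total_pool_bytes
        lines.append(f"{name}: {alloc // MB} MB ({share:.1%} of the two pools)")
    lines.append(f"Pools total: {total_pool_bytes // MB} MB")
    return lines

if __name__ == "__main__":
    print("\n".join(describe()))
```

The first figure matches the "In-Memory Area" value reported by v$sga. The two pools add up to slightly less than the full area; the sketch does not try to explain that difference.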

•  Bulk Data Loads

•  Typically conducted as a direct path load.

•  The size of the missing data will be visible in the BYTES_NOT_POPULATED column (V$IM_SEGMENTS).

•  Partition Exchange Loads

•  Partition big tables or fact tables.

•  Transaction Processing

•  Single row data change operations (DML) execute via the Buffer Cache.

Repopulation

•  Oracle Database will repopulate an IMCU when the number of stale entries in it reaches a threshold.

•  Repopulation is more frequent for IMCUs that are accessed frequently or have higher percentage of stale rows.

•  The IMCO (In-Memory Coordinator) background process may also trigger repopulation.
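The threshold behavior can be illustrated with a toy model. Everything here is an assumption made for the sketch: the class `ToyIMCU`, the 20% threshold, and the dictionary used as a journal. Oracle's real threshold is internal and its bookkeeping is considerably more involved.

```python
class ToyIMCU:
    """Toy model of an IMCU that is rebuilt once enough rows are stale."""

    def __init__(self, rows, stale_threshold=0.2):
        self.rows = dict(rows)       # row id -> value as originally populated
        self.journal = {}            # row id -> newer value written by DML
        self.stale_threshold = stale_threshold  # hypothetical value
        self.repopulations = 0

    def apply_dml(self, row_id, new_value):
        # The columnar copy is not changed in place; the change is journaled.
        self.journal[row_id] = new_value
        if len(self.journal) / len(self.rows) >= self.stale_threshold:
            self.repopulate()

    def read(self, row_id):
        # A consistent read prefers the journal entry over the stale value.
        return self.journal.get(row_id, self.rows[row_id])

    def repopulate(self):
        # Rebuild the columnar copy with current values and clear the journal.
        self.rows.update(self.journal)
        self.journal.clear()
        self.repopulations += 1
```

The model shows why a high rate of change raises the overhead: more stale rows mean more journal lookups during scans and more frequent rebuilds.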

Overhead

•  Keeping the IM Column Store transactionally consistent adds overhead.

•  The amount of overhead depends on the rate of change, the compression level, the location of the changed rows, and the type of operations being performed.

Optimizer

Optimizer

1.  In 12c (12.1.0.2) the optimizer is fully aware of the In-Memory Column Store, which means that it costs reads from the In-Memory Column Store.

2.  It uses the same statistics as before, plus new In-Memory statistics.

Optimizer 10053 trace

SELECT /* opt_trace_test */ sum(lo_revenue)
FROM lineorder lo, customer c
WHERE lo.lo_custkey=c.c_custkey
AND c_region='AFRICA';

column sql_text format a30
select sql_id, child_number, sql_text
from v$sql
where sql_text like '%opt_trace_test%';

SQL_ID
-------------
5b8n5m6gtx71r

Optimizer 10053 trace (cont)

alter session set max_dump_file_size = unlimited;

execute DBMS_SQLDIAG.DUMP_TRACE(-
  p_sql_id=>'5b8n5m6gtx71r', -
  p_child_number=>0, -
  p_component=>'Optimizer', -
  p_file_id=>'TRACE_10053');

How to Obtain Tracing of Optimizer Computations (EVENT 10053) (Doc ID 225598.1)

col value format a90
SELECT value FROM v$diag_info WHERE name='Default Trace File';

VALUE
---------------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/db_inst2/DBNAME/trace/DB_INST2_ora_130114_TRACE_10053.trc

Optimizer 10053 trace (cont)

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: CUSTOMER  Alias: C
  #Rows: 120000  SSZ: 0  LGR: 0  #Blks: 1882  AvgRowLen: 107.00  NEB: 0  ChainCnt: 0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 1  IMCRowCnt: 120000  IMCJournalRowCnt: 3000  #IMCBlocks: 0  IMCQuotient: 1.000000
  Column (#1): C_CUSTKEY(NUMBER)
    AvgLen: 5  NDV: 120000  Nulls: 0  Density: 0.000008  Min: 1.000000  Max: 120000.000000
***********************
Table Stats::
  Table: LINEORDER  Alias: LO
  #Rows: 23996670  SSZ: 0  LGR: 0  #Blks: 335060  AvgRowLen: 96.00  NEB: 0  ChainCnt: 0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 23  IMCRowCnt: 23996670  IMCJournalRowCnt: 599917  #IMCBlocks: 0  IMCQuotient: 1.000000
  Column (#3): LO_CUSTKEY(NUMBER)
    AvgLen: 5  NDV: 80504  Nulls: 0  Density: 0.000012  Min: 1.000000  Max: 119999.000000

Disabling and Enabling In-Memory

To disable In-Memory scans, set the parameter INMEMORY_QUERY = DISABLE. Plans are then costed using disk statistics and scans do not use the IM Column Store.

Use the INMEMORY hint to enable In-Memory scans for a statement even if INMEMORY_QUERY is disabled.

The NO_INMEMORY hint disables In-Memory scans even if the table is populated in the In-Memory Column Store.

SELECT /*+ INMEMORY */ sum(lo_revenue)
FROM lineorder lo, customer c
WHERE lo.lo_custkey=c.c_custkey
AND c_region='AFRICA';

SELECT /*+ NO_INMEMORY */ sum(lo_revenue)
FROM lineorder lo, customer c
WHERE lo.lo_custkey=c.c_custkey
AND c_region='AFRICA';

Optimizer Summary

•  In 12.1.0.2 the Optimizer is In-Memory aware.

•  In-Memory statistics automatically generated at parse.

•  Cost model adjusted for costing In-Memory Scans

•  New INMEMORY and NO_INMEMORY hints

Queries

Queries

•  We have chosen some of the queries that are candidates to benefit from Database In-Memory.

Function MAX(column)

select max(lo_ordtotalprice)

from lineorder;

MAX(LO_ORDTOTALPRICE)

---------------------

55903140

Elapsed time: 0.004

select /*+ NO_INMEMORY */ max(lo_ordtotalprice)

from lineorder;

MAX(LO_ORDTOTALPRICE)

---------------------

55903140

Elapsed time: 4.014

B-tree index on lo_ordtotalprice

create index ordtotalprice_ix on lineorder(lo_ordtotalprice);


from lineorder;


---------------------

55903140

Elapsed time: 0.001

How about Result Cache?

SQL> show parameter result_cache_max_size

big integer 22000K


from lineorder;


---------------------

55903140

Elapsed time: 0.001

How do we know it used the In-Memory Column Store?

select n.name, s.value from v$mystat s, v$statname n where s.statistic#=n.statistic# and n.name like 'IM scan CUs columns accessed';

-----------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 3 | 2224 (12)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 3 | | |

| 2 | TABLE ACCESS INMEMORY FULL| LINEORDER | 24M| 68M| 2224 (12)| 00:00:01 |

-----------------------------------------------------------------------------------------

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
IM scan CUs columns accessed                                             45

One equality predicate

SELECT lo_orderkey, lo_custkey, lo_revenue

FROM lineorder

WHERE lo_orderkey = 4000000;

LO_ORDERKEY LO_CUSTKEY LO_REVENUE

----------- ----------- -----------

4000000 51832 6983797

4000000 51832 917952

4000000 51832 2501733

4000000 51832 7895007

Elapsed time: 0.002

SELECT /*+ NO_INMEMORY */ lo_orderkey, lo_custkey, lo_revenue

FROM lineorder

WHERE lo_orderkey = 4000000;

LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------

4000000 51832 6983797

4000000 51832 917952

4000000 51832 2501733

4000000 51832 7895007

Elapsed time: 4.868

Session statistics for the In-Memory run:

IM scan CUs columns accessed 3

IM scan segments minmax eligible 44

IM scan CUs pruned 43

One equality predicate

B-tree index on lo_orderkey

create index lo_orderkey_ix on lineorder (lo_orderkey);

SELECT /*+ NO_INMEMORY */ lo_orderkey, lo_custkey, lo_revenue

FROM lineorder

WHERE lo_orderkey = 4000000;

LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------

4000000 51832 6983797

4000000 51832 917952

4000000 51832 2501733

4000000 51832 7895007

Elapsed time: 0.001

Three equality predicates

SELECT lo_orderkey, lo_custkey, lo_revenue

FROM lineorder

WHERE lo_custkey = 13286

AND lo_shipmode = 'TRUCK'

AND lo_orderpriority = '3-MEDIUM';

LO_ORDERKEY LO_CUSTKEY LO_REVENUE

----------- ----------- -----------

8268262 13286 6777268

8268262 13286 6207689

9048868 13286 6394887

17521920 13286 2905822

19397281 13286 3573400

Elapsed time: 0.002

SELECT /*+ NO_INMEMORY */ lo_orderkey, lo_custkey, lo_revenue

FROM lineorder

WHERE lo_custkey = 13286

AND lo_shipmode = 'TRUCK'

AND lo_orderpriority = '3-MEDIUM';

LO_ORDERKEY LO_CUSTKEY LO_REVENUE

----------- ----------- -----------

8268262 13286 6777268

8268262 13286 6207689

9048868 13286 6394887

17521920 13286 2905822

19397281 13286 3573400

Elapsed time: 4.868




How about a composite index?

create index cust_ship_pri_ix on lineorder

(lo_custkey, lo_shipmode, lo_orderpriority);

SELECT /*+ NO_INMEMORY */ lo_orderkey, lo_custkey, lo_revenue

FROM lineorder

WHERE lo_custkey = 13286

AND lo_shipmode = 'TRUCK'

AND lo_orderpriority = '3-MEDIUM';

LO_ORDERKEY LO_CUSTKEY LO_REVENUE

----------- ----------- -----------

8268262 13286 6777268

8268262 13286 6207689

9048868 13286 6394887

17521920 13286 2905822

19397281 13286 3573400

Elapsed time: 0.001

Greater Than (>) Instead of Equality (=)

SELECT MAX(lo_ordtotalprice)

FROM lineorder

WHERE lo_quantity > 74;

Elapsed time: 0.002

SELECT MAX(lo_ordtotalprice)

FROM lineorder

WHERE lo_quantity > 74;

Elapsed time: 4.82




How about an index?

create index lo_quantity_ix on lineorder(lo_quantity);

SELECT /*+ INDEX(a lo_quantity_ix) */ MAX(lo_ordtotalprice)

FROM lineorder a

WHERE lo_quantity > 74;

Elapsed time: 0.001

Bloom Filter Example

SELECT c.c_name, c.c_nation

FROM customer c, part p,

lineorder lo

WHERE lo.lo_custkey = c.c_custkey

AND lo.lo_partkey = p.p_partkey

AND p.p_name = 'white salmon';

C_NAME C_NATION

------------------------- ---------------

Customer#000118324 JORDAN

Customer#000119371 ETHIOPIA

Customer#000119386 SAUDI ARABIA

Customer#000118412 VIETNAM

...

...

Vector Group By Example

SELECT /*+ VECTOR_TRANSFORM */ c.c_name, c.c_nation, sum(lo_ordtotalprice)

FROM customer c, part p,

lineorder lo

WHERE lo.lo_custkey = c.c_custkey

AND lo.lo_partkey = p.p_partkey

AND p.p_name = 'white salmon'

group by c.c_name, c.c_nation;

... ...

C_NAME C_NATION SUM(LO_ORDTOTALPRICE)

------------------------- --------------- ---------------------

Customer#000118540 ROMANIA 12634439

Customer#000119051 ARGENTINA 28032775

Customer#000118129 JAPAN 22258279

Customer#000118198 UNITED STATES 17849279

Usage with Oracle Technologies

RAC

Each node in a RAC environment has its own IM column store.

Objects populated into memory will be distributed across all of the IM column stores in the cluster.

ALTER TABLE lineorder INMEMORY DISTRIBUTE BY PARTITION;

ALTER TABLE lineorder INMEMORY DISTRIBUTE AUTO DUPLICATE ALL;

M6-32

M6-32 SMP removes the overhead of distributing queries across a cluster and coordinating transactions.

The algorithms are NUMA-optimized.

The memory interconnect is far faster than any network.

Exadata

Complete fault-tolerant In-Memory solution.

Exceed DRAM limits and transparently scale across Memory, Flash and Disk.

Initial population of data into the In-Memory column store from storage is very fast.

The In-Memory Aggregation optimization can be offloaded to Exadata storage cells.

Oracle Technologies

•  Data Guard

•  GoldenGate

•  Oracle Multitenant

•  Partitioning

•  Parallelism

•  Resource Manager

•  RMAN

•  ALTER TABLE EXCHANGE

Partition Exchange

1.  Create an external table for the flat files.

2.  Use CTAS to create a non-partitioned table and gather table stats.

3.  Enable the INMEMORY attribute on the non-partitioned table.

4.  Populate the non-partitioned table in the column store.

5.  Alter table <table_name> exchange partition <part name> with table <non part table>;

Do’s and Don’ts

The benefits of IM Column Store

Speed up scans, joins, and aggregates:

•  Scans of large tables using predicates such as =, <, >, and IN.

•  Queries on a subset of the columns in a table, for example selecting 5 of 100 columns.

•  Joins, accelerated by converting predicates on small dimension tables into filters on a large fact table.

Do or Don’t?

•  Business applications

•  Ad-hoc analytic queries

•  Data warehouse workloads

Do or Don’t?

OLTP databases: short transactions using index lookups

Do

•  Queries that scan a large number of rows and apply filters that use operators such as =, <, >, and IN.

•  Queries that select a small number of columns from a table or materialized view with a large number of columns, such as a query that selects five columns from a table with 100 columns.

•  Queries that join a small table to a large table.

•  Queries that aggregate data.

Don’t

•  Queries with complex predicates.

•  Queries that select a large number of columns.

•  Queries that return a large number of rows.

•  Queries with multiple large table joins.

Don’t

In-Memory speeds up analytic data access, not:

•  Network round trips, logon/logoff

•  Parsing, PL/SQL, complex functions

•  Data processing (as opposed to access)

•  Complex joins or aggregations where not much data is filtered before processing

•  Load and select once: staging tables, ETL, temp tables

Conclusion

Is it for me?

KEEP CALM

AND ASK ME

Just ask me …

Please visit my blog at: https://jorgebarbablog.wordpress.com
