Do’s and Don’ts of Oracle Database In-Memory
Jorge Barba Infrastructure Principal at Accenture Enkitec Group https://jorgebarbablog.wordpress.com Mar 2016
Agenda
A. Overview
B. Configuration
C. Optimizer
D. Queries
E. Usage with Oracle Technologies
F. Do's and Don'ts
G. Conclusion
Overview
Database In-Memory
In-Memory Column Store is an optional, static SGA pool that stores segments in columnar format.
It is a supplement to the Buffer Cache.
The database keeps the columnar data transactionally consistent with the Buffer Cache.
Row Format vs. Column Format
Transactions run faster on row format
Example: Query or Insert a sales order
Fast when processing a few rows with many columns.
Analytics run faster on column format
Example: Report on sales totals by region
Fast when accessing a few columns across many rows.
Dual Format Database
Same table in both formats
Analytics use the new in-memory Column Format
OLTP uses the Row Format
Scanning Memory
Buffer Cache
Have to walk along that row until we find col4.
IM Column Store
Go directly to the col4 structure and scan all the entries.
Storage Indexes
Automatically created and maintained for each column in the Column Store.
Allow data pruning based on filter predicates in the SQL statement.
Keep track of the minimum and maximum values for each column in an IMCU.
If the column value is outside the minimum and maximum range for an IMCU, the scan of that IMCU is avoided.
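One way to observe this pruning is to look at the session-level IM scan statistics after running a filtered query. The query below is a small sketch that assumes the session can read v$mystat and v$statname; the statistic names are the ones reported later in this deck.

```sql
-- Run the filtered query first, then check how many IMCUs were skipped.
-- Values are cumulative for the session, so compare before and after.
SELECT n.name, s.value
FROM   v$mystat s, v$statname n
WHERE  s.statistic# = n.statistic#
AND    n.name IN ('IM scan segments minmax eligible',
                  'IM scan CUs pruned',
                  'IM scan CUs columns accessed');
```

If the pruned count is close to the eligible count, most IMCUs were skipped on their minimum and maximum values alone.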
In-Memory Join and Bloom Filter (BF)
• A Bloom filter transforms a join into a filter that can be applied as part of the scan of the larger table.
• Very efficiently applied to column format data via SIMD vector processing.
• Appears in two places in the execution plan: once when it is created and again when it is applied.
SELECT count(*)
FROM lineorder lo, part p
WHERE lo.lo_partkey=p.p_partkey
AND lo.lo_shipmode='TRUCK'
AND lo.lo_ordtotalprice between 55000000 and 56000000
AND p.p_name='papaya burlywood';
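To see both appearances of the Bloom filter, display the execution plan for the query. This is a minimal sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY against the same lineorder and part tables. The plan shape depends on statistics and version, so the operation names in the comment are what to look for rather than guaranteed output.

```sql
EXPLAIN PLAN FOR
SELECT count(*)
FROM   lineorder lo, part p
WHERE  lo.lo_partkey = p.p_partkey
AND    lo.lo_shipmode = 'TRUCK'
AND    lo.lo_ordtotalprice BETWEEN 55000000 AND 56000000
AND    p.p_name = 'papaya burlywood';

-- Look for JOIN FILTER CREATE and JOIN FILTER USE on the same :BF0000 name
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```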
Vector Group By
• New optimizer transformation introduced with Oracle 12.1.0.2.0.
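• Example: find the total sales of footwear products in outlet stores.
• Works in two phases: key vectors are built from the dimension tables, then applied while the fact table is scanned and aggregated.
• The combination of these two phases dramatically improves the efficiency of a multiple table join with complex aggregations.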
| 9 | KEY VECTOR (USE) | :KV0000 |
| 10 | KEY VECTOR (USE) | :KV0001 |
SIMD Single Instruction Multiple Data
• A set of column values is evaluated together in a single CPU instruction.
• Designed to maximize the number of column entries loaded and evaluated in a single CPU instruction.
• 8 entries are loaded into the register for evaluation.
Configuration
Configuring the In-Memory Column Store
Configuration
• INMEMORY_SIZE=1520M
• Minimum of 100M
• Part of the SGA
• Fixed size
SQL> select * from v$sga;
NAME                      VALUE
-------------------- ----------
Fixed Size              2932632
Variable Size         587202664
Database Buffers     2097152000
Redo Buffers           13844480
In-Memory Area       1593835520
alter system set inmemory_size=1520M scope=spfile;
shutdown immediate;
startup
Populating the In-Memory Column Store
ALTER TABLE lineorder INMEMORY;
ALTER TABLE lineorder NO INMEMORY;
CREATE TABLE customer …
PARTITION BY LIST
(PARTITION p1 …… INMEMORY,
PARTITION p2 …… NO INMEMORY);
Eligible segments are:
• Tables
• Partitions
• Subpartitions
• Materialized Views
Not supported:
• IOTs, Hash clusters, Out of line LOBs.
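Beyond the basic INMEMORY clause, the compression level and population priority can be set in the same statement, and the data dictionary shows what is currently configured. The sketch below assumes lineorder is a non-partitioned table in the current schema. The compression level and priority are illustrative choices, not a recommendation from the slides.

```sql
-- Illustrative settings: populate without waiting for a first scan,
-- using the QUERY LOW compression level
ALTER TABLE lineorder INMEMORY MEMCOMPRESS FOR QUERY LOW PRIORITY HIGH;

-- Check the In-Memory attributes recorded for the table
SELECT table_name, inmemory, inmemory_priority, inmemory_compression
FROM   user_tables
WHERE  table_name = 'LINEORDER';
```

With the default PRIORITY NONE, a segment is populated only after it is first scanned. For partitioned tables the attributes are reported per partition in USER_TAB_PARTITIONS.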
Composition of In-Memory Area and DML
SQL> select pool, alloc_bytes
  2  from v$inmemory_area;
POOL           ALLOC_BYTES
-------------- -----------
1MB POOL        1274019840
64KB POOL        301989888
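The two pools serve different purposes: the 1MB pool holds the columnar data itself, while the 64KB pool holds metadata about the populated objects. To see how much of each pool is in use, the same view also exposes USED_BYTES and POPULATE_STATUS, as in this small sketch:

```sql
-- Compare allocated and used space per pool while population is running
SELECT pool, alloc_bytes, used_bytes, populate_status
FROM   v$inmemory_area;
```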
Composition of In-Memory Area and DML
• Bulk Data Loads
• Typically conducted as a direct path load.
• The size of the missing data will be visible in the BYTES_NOT_POPULATED column (V$IM_SEGMENTS).
• Partition Exchange Loads
• Partition big tables or fact tables.
• Transaction Processing
• Single row data change operations (DML) execute via the Buffer Cache.
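After a bulk load or a partition exchange, V$IM_SEGMENTS shows whether the new data has reached the column store yet. This is a minimal sketch, assuming the lineorder table from the earlier examples:

```sql
-- BYTES_NOT_POPULATED > 0 means part of the segment is still only on disk
SELECT segment_name, populate_status, inmemory_size, bytes, bytes_not_populated
FROM   v$im_segments
WHERE  segment_name = 'LINEORDER';
```

A POPULATE_STATUS of COMPLETED together with BYTES_NOT_POPULATED = 0 indicates the segment is fully populated on this instance.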
Repopulation
• Oracle Database repopulates an IMCU when the number of stale entries in it reaches a threshold.
• Repopulation is more frequent for IMCUs that are accessed frequently or have higher percentage of stale rows.
• The IMCO background process may also repopulate IMCUs.
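The background work for population and repopulation is governed by two initialization parameters. The query below is a sketch for checking their current values and assumes access to v$parameter.

```sql
SELECT name, value
FROM   v$parameter
WHERE  name IN ('inmemory_max_populate_servers',
                'inmemory_trickle_repopulate_servers_percent');
```

INMEMORY_MAX_POPULATE_SERVERS caps the number of background populate servers, and INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT limits the share of them used for trickle repopulation.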
Overhead
• Keeping the IM Column Store transactionally consistent.
• The overhead depends on the rate of change, the compression level, the location of the changed rows, and the type of operations being performed.
Optimizer
Optimizer
1. In 12c (12.1.0.2) the optimizer is fully aware of the In-Memory Column Store, which means it costs reads from the In-Memory Column Store.
2. It uses the same statistics as before, plus new In-Memory statistics.
Optimizer 10053 trace
SELECT /* opt_trace_test */ sum(lo_revenue)
FROM lineorder lo, customer c
WHERE lo.lo_custkey=c.c_custkey
AND c_region='AFRICA';
column sql_text format a30
select sql_id, child_number, sql_text
from v$sql
where sql_text like '%opt_trace_test%';
SQL_ID
-------------
5b8n5m6gtx71r
Optimizer 10053 trace (cont)
alter session set max_dump_file_size = unlimited;
execute DBMS_SQLDIAG.DUMP_TRACE(-
  p_sql_id=>'5b8n5m6gtx71r', -
  p_child_number=>0, -
  p_component=>'Optimizer', -
  p_file_id=>'TRACE_10053');
How to Obtain Tracing of Optimizer Computations (EVENT 10053) (Doc ID 225598.1)
col value format a90
SELECT value FROM v$diag_info WHERE name='Default Trace File';
VALUE
---------------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/db_inst2/DBNAME/trace/DB_INST2_ora_130114_TRACE_10053.trc
Optimizer 10053 trace (cont)
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: CUSTOMER Alias: C
  #Rows: 120000 SSZ: 0 LGR: 0 #Blks: 1882 AvgRowLen: 107.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
  #IMCUs: 1 IMCRowCnt: 120000 IMCJournalRowCnt: 3000 #IMCBlocks: 0 IMCQuotient: 1.000000
  Column (#1): C_CUSTKEY(NUMBER)
    AvgLen: 5 NDV: 120000 Nulls: 0 Density: 0.000008 Min: 1.000000 Max: 120000.000000
***********************
Table Stats::
  Table: LINEORDER Alias: LO
  #Rows: 23996670 SSZ: 0 LGR: 0 #Blks: 335060 AvgRowLen: 96.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
  #IMCUs: 23 IMCRowCnt: 23996670 IMCJournalRowCnt: 599917 #IMCBlocks: 0 IMCQuotient: 1.000000
  Column (#3): LO_CUSTKEY(NUMBER)
    AvgLen: 5 NDV: 80504 Nulls: 0 Density: 0.000012 Min: 1.000000 Max: 119999.000000
Disabling and Enabling In-Memory
To disable In-Memory scans, set the parameter INMEMORY_QUERY = DISABLE. Plans are then costed on disk statistics and scans do not use the IM Column Store.
The INMEMORY hint enables In-Memory scans for a statement even if INMEMORY_QUERY is disabled. It does not by itself force a full table scan.
The NO_INMEMORY hint disables In-Memory scans even if the table is in the In-Memory Column Store.
SELECT /*+ INMEMORY */ sum(lo_revenue)
FROM lineorder lo, customer c
WHERE lo.lo_custkey=c.c_custkey
AND c_region='AFRICA';
SELECT /*+ NO_INMEMORY */ sum(lo_revenue)
FROM lineorder lo, customer c
WHERE lo.lo_custkey=c.c_custkey
AND c_region='AFRICA';
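The parameter can also be changed for a single session, which is a convenient way to compare timings without touching other users. A minimal sketch:

```sql
-- Turn off In-Memory scans for this session only
ALTER SESSION SET inmemory_query = DISABLE;

-- ... run and time the query against the row store here ...

-- Turn In-Memory scans back on
ALTER SESSION SET inmemory_query = ENABLE;
```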
Optimizer Summary
• In 12.1.0.2 the Optimizer is In-Memory aware.
• In-Memory statistics automatically generated at parse.
• Cost model adjusted for costing In-Memory Scans
• New INMEMORY and NO_INMEMORY hints
Queries
Queries
• We have chosen some of the queries that are candidates to benefit from Database In-Memory.
Function MAX(column)
select max(lo_ordtotalprice)
from lineorder;
MAX(LO_ORDTOTALPRICE)
---------------------
55903140
Elapsed time: 0.004
select /*+ NO_INMEMORY */ max(lo_ordtotalprice)
from lineorder;
MAX(LO_ORDTOTALPRICE)
---------------------
55903140
Elapsed time: 4.014
Function MAX(column)
B-tree index on lo_ordtotalprice
create index ordtotalprice_ix on lineorder(lo_ordtotalprice);
select /*+ NO_INMEMORY */ max(lo_ordtotalprice)
from lineorder;
MAX(LO_ORDTOTALPRICE)
---------------------
55903140
Elapsed time: 0.001
Function MAX(column)
How about Result Cache?
SQL> show parameter result_cache_max_size
big integer 22000K
select /*+ NO_INMEMORY */ max(lo_ordtotalprice)
from lineorder;
MAX(LO_ORDTOTALPRICE)
---------------------
55903140
Elapsed time: 0.001
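With the default RESULT_CACHE_MODE of MANUAL, a query uses the result cache only when it carries the RESULT_CACHE hint. The slide does not show which mode was in effect, so the following is a sketch of the hinted form under that assumption:

```sql
-- Assumes RESULT_CACHE_MODE = MANUAL, so the hint is needed to cache the result
SELECT /*+ NO_INMEMORY RESULT_CACHE */ max(lo_ordtotalprice)
FROM   lineorder;
```

The first execution computes and caches the result. Later executions return the cached value until the underlying table changes.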
How do we know it used the In-Memory Column Store?
select n.name, s.value
from v$mystat s, v$statname n
where s.statistic#=n.statistic#
and n.name like 'IM scan CUs columns accessed';
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 2224 (12)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
| 2 | TABLE ACCESS INMEMORY FULL| LINEORDER | 24M| 68M| 2224 (12)| 00:00:01 |
-----------------------------------------------------------------------------------------
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
IM scan CUs columns accessed                                             45
One equality predicate
SELECT lo_orderkey, lo_custkey, lo_revenue
FROM lineorder
WHERE lo_orderkey = 4000000;
LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------
4000000 51832 6983797
4000000 51832 917952
4000000 51832 2501733
4000000 51832 7895007
Elapsed time: 0.002
SELECT /*+ NO_INMEMORY */lo_orderkey, lo_custkey, lo_revenue
FROM lineorder
WHERE lo_orderkey = 4000000;
LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------
4000000 51832 6983797
4000000 51832 917952
4000000 51832 2501733
4000000 51832 7895007
Elapsed time: 4.868
IM scan CUs columns accessed 3
IM scan segments minmax eligible 44
IM scan CUs pruned 43
One equality predicate
B-tree index on lo_orderkey
create index lo_orderkey_ix on lineorder (lo_orderkey);
SELECT /*+ NO_INMEMORY */lo_orderkey, lo_custkey, lo_revenue
FROM lineorder
WHERE lo_orderkey = 4000000;
LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------
4000000 51832 6983797
4000000 51832 917952
4000000 51832 2501733
4000000 51832 7895007
Elapsed time: 0.001
Three equality predicates
SELECT lo_orderkey, lo_custkey, lo_revenue
FROM lineorder
WHERE lo_custkey = 13286
AND lo_shipmode = 'TRUCK'
AND lo_orderpriority = '3-MEDIUM';
LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------
8268262 13286 6777268
8268262 13286 6207689
9048868 13286 6394887
17521920 13286 2905822
19397281 13286 3573400
Elapsed time: 0.002
SELECT /*+ NO_INMEMORY */ lo_orderkey, lo_custkey, lo_revenue
FROM lineorder
WHERE lo_custkey = 13286
AND lo_shipmode = 'TRUCK'
AND lo_orderpriority = '3-MEDIUM';
LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------
8268262 13286 6777268
8268262 13286 6207689
9048868 13286 6394887
17521920 13286 2905822
19397281 13286 3573400
Elapsed time: 4.868
IM scan CUs columns accessed 3
IM scan segments minmax eligible 44
IM scan CUs pruned 43
How about composite index?
create index cust_ship_pri_ix on lineorder
(lo_custkey, lo_shipmode, lo_orderpriority);
SELECT /*+ NO_INMEMORY */ lo_orderkey, lo_custkey, lo_revenue
FROM lineorder
WHERE lo_custkey = 13286
AND lo_shipmode = 'TRUCK'
AND lo_orderpriority = '3-MEDIUM';
LO_ORDERKEY LO_CUSTKEY LO_REVENUE
----------- ----------- -----------
8268262 13286 6777268
8268262 13286 6207689
9048868 13286 6394887
17521920 13286 2905822
19397281 13286 3573400
Elapsed time: 0.001
Greater Than (>) Instead of Equality (=)
SELECT MAX(lo_ordtotalprice)
FROM lineorder
WHERE lo_quantity > 74;
Elapsed time: 0.002
SELECT /*+ NO_INMEMORY */ MAX(lo_ordtotalprice)
FROM lineorder
WHERE lo_quantity > 74;
Elapsed time: 4.82
IM scan CUs columns accessed 3
IM scan segments minmax eligible 44
IM scan CUs pruned 43
How about an index?
create index lo_quantity_ix on lineorder(lo_quantity);
SELECT /*+ INDEX(a lo_quantity_ix) */ MAX(lo_ordtotalprice)
FROM lineorder a
WHERE lo_quantity > 74;
Elapsed time: 0.001
Bloom Filter Example
SELECT c.c_name, c.c_nation
FROM customer c, part p,
lineorder lo
WHERE lo.lo_custkey = c.c_custkey
AND lo.lo_partkey = p.p_partkey
AND p.p_name = 'white salmon';
C_NAME C_NATION
------------------------- ---------------
Customer#000118324 JORDAN
Customer#000119371 ETHIOPIA
Customer#000119386 SAUDI ARABIA
Customer#000118412 VIETNAM
...
...
Vector Group By Example
SELECT /*+ VECTOR_TRANSFORM */ c.c_name, c.c_nation, sum(lo_ordtotalprice)
FROM customer c, part p,
lineorder lo
WHERE lo.lo_custkey = c.c_custkey
AND lo.lo_partkey = p.p_partkey
AND p.p_name = 'white salmon'
group by c.c_name, c.c_nation;
... ...
C_NAME C_NATION SUM(LO_ORDTOTALPRICE)
------------------------- --------------- ---------------------
Customer#000118540 ROMANIA 12634439
Customer#000119051 ARGENTINA 28032775
Customer#000118129 JAPAN 22258279
Customer#000118198 UNITED STATES 17849279
Usage with Oracle Technologies
RAC
Each node in a RAC environment has its own IM column store. Objects populated into memory are distributed across all of the IM column stores in the cluster.
ALTER TABLE lineorder INMEMORY DISTRIBUTE BY PARTITION;
ALTER TABLE lineorder INMEMORY DISTRIBUTE AUTO DUPLICATE ALL;
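To check how a table ended up spread across the cluster, the global version of V$IM_SEGMENTS can be queried per instance. This is a sketch assuming the lineorder table and access to the GV$ views:

```sql
-- One row per instance that holds part of the segment
SELECT inst_id, segment_name, populate_status, inmemory_size, bytes_not_populated
FROM   gv$im_segments
WHERE  segment_name = 'LINEORDER'
ORDER  BY inst_id;
```

For a distributed table, a non-zero BYTES_NOT_POPULATED on one instance typically reflects the portion held by the other instances rather than missing data.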
M6-32
• An M6-32 SMP system removes the overhead of distributing queries across a cluster and coordinating transactions.
• Algorithms are NUMA optimized.
• The memory interconnect is far faster than any network.
Exadata
Complete fault-tolerant In-Memory solution.
Exceed DRAM limits and transparently scale across Memory, Flash and Disk.
Initial population of data into the In-Memory column store from storage is very fast.
The In-Memory Aggregation optimization can be offloaded to Exadata storage cells.
Oracle Technologies
• Data Guard
• GoldenGate
• Oracle Multitenant
• Partitioning
• Parallelism
• Resource Manager
• RMAN
• ALTER TABLE EXCHANGE
Partition Exchange
1. Create an external table for the flat files.
2. Use CTAS to create a non-partitioned table and gather table stats.
3. Set the INMEMORY attribute on the non-partitioned table.
4. Populate the non-partitioned table in the column store.
5. alter table <table_name> exchange partition <part name> with table <non part table>;
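The steps above can be sketched in SQL as follows. All object names here (lineorder_ext, lineorder_stage, p_2016_03) are hypothetical placeholders, and step 1 is omitted because the external table definition depends on the file layout.

```sql
-- Hypothetical names: lineorder_ext (external table), lineorder_stage, p_2016_03

-- Step 2: load the flat-file data into a non-partitioned table and gather stats
CREATE TABLE lineorder_stage AS SELECT * FROM lineorder_ext;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'LINEORDER_STAGE');

-- Step 3: mark the staging table for the column store
ALTER TABLE lineorder_stage INMEMORY;

-- Step 4: with the default priority, a full scan triggers population
SELECT /*+ FULL(s) */ count(*) FROM lineorder_stage s;

-- Step 5: swap the populated table into the partitioned table
ALTER TABLE lineorder EXCHANGE PARTITION p_2016_03 WITH TABLE lineorder_stage;
```

Population runs in the background, so it is worth confirming in V$IM_SEGMENTS that the staging table is fully populated before running the exchange.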
Do’s and Don’ts
The benefits of IM Column Store
Speed up scans, joins, and aggregates:
• Scans of large tables using predicates like =, <, >, IN.
• Querying a subset of columns in a table, for example, selecting 5 of 100 columns.
• Accelerating joins by converting predicates on small dimension tables into filters on a large fact table.
Do or Don’t?
• Business applications
• Ad-hoc analytic queries
• Data warehouse workloads
Do or Don’t?
• OLTP databases: short transactions using index lookups
Do
• Queries that scan a large number of rows and apply filters that use operators such as =, <, >, and IN.
• Queries that select a small number of columns from a table or materialized view with a large number of columns, such as a query that selects five columns from a table with 100 columns.
• Queries that join a small table to a large table.
• Queries that aggregate data.
Don’t
• Queries with complex predicates.
• Queries that select a large number of columns.
• Queries that return a large number of rows.
• Queries with multiple large table joins.
Don’t
In-Memory speeds up analytic data access, not:
• Network round trips, logon/logoff
• Parsing, PL/SQL, complex functions
• Data processing (as opposed to access)
• Complex joins or aggregations where not much data is filtered before processing
• Load and select once: staging tables, ETL, temp tables
Conclusion
Is it for me?
KEEP CALM
AND ASK ME
Just ask me …
Please visit my blog at: https://jorgebarbablog.wordpress.com