8th tuc meeting - eugene i. chong (oracle usa). balancing act to improve rdf query performance in...
TRANSCRIPT
![Page 1: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/1.jpg)
1 Confidential – Oracle Restricted
![Page 2: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/2.jpg)
<Insert Picture Here>
Balancing Act to improve RDF Query Performance in Oracle Database Eugene I. Chong
![Page 3: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/3.jpg)
3
Agenda
• RDF Query processing Issues • RDF Order-By and Filter Processing • RDF In-Memory Processing • RDF In-Memory Virtual Columns • Conclusion
Confidential – Oracle Restricted
![Page 4: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/4.jpg)
4
Oracle RDF
• RDF_LINK$ table (triples) – normalized – subject, predicate, object IDs
• RDF_VALUE$ table (ID to value mapping) – value, type, etc.
• Issues – frequent joins with RDF_VALUE$ table to present results,
process filters and order-by queries – complete de-normalization incurs large storage requirements – self-joins: large intermediate join results
![Page 5: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/5.jpg)
5
Oracle RDF Filters and Order-By Processing
• SPARQL order-by semantics – order: no values, blank nodes, IRIs, literals – case statement: value type, numeric value, date value, string
value – ORDER BY CASE WHEN (V4.VALUE_TYPE IS NULL)
THEN 0 WHEN (V4.VALUE_TYPE IN ('BLN','BN')) THEN 1 WHEN (V4.VALUE_TYPE IN ('URI','UR')) THEN 2 WHEN (V4.VALUE_TYPE IN ('PL', 'PLL', 'CPLL', 'PL@',
'PLL@', 'CPLL@', 'TL', 'TLL', 'CTLL', 'LIT')) THEN (CASE WHEN (V4.LANGUAGE_TYPE IS NOT NULL)
THEN 5 ……..
![Page 6: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/6.jpg)
6
Oracle RDF Filters and Order-By Processing – literal type - numeric: TO_NUMBER( ) – literal type - date/time: TO_TIMESTAMP_TZ ( ), DECODE( ) – use function calls to generate SQL for order-by – case statements executed for every row at runtime – same problem for filters
• Solution – materialize value type and values in RDF_VALUE$ table – stored as ORDER_TYPE, ORDER_NUM, ORDER_DATE – filled in at load time – generate SQL: ORDER BY order_type, order_num,
order_date, value_name – filter clause: WHERE order_num < to_number(89)
![Page 7: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/7.jpg)
7
Oracle RDF Order-By and Filter Performance using BSBM Benchmark Queries (in secs)
0
10
20
30
40
50
60
70
80
90
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 BI1 BI2 BI3 BI4 BI5 BI6 BI7 BI8
Without Order Columns With Order Columns
![Page 8: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/8.jpg)
8
Oracle RDF In-Memory Processing
• Utilize Oracle IMC – load frequently accessed columns in memory
• RDF_LINK$ table: subject, predicate, object IDs • RDF_VALUE$: id, value
– fast full scan of the table: good for hash join
• Experiment – 32GB memory, 2TB disk space – LUBM benchmark queries (8,763,829 rows including
entailment) – varying the size of the memory: 6G(100%), 4G(56%),
2G(27%), 1G(12%)
![Page 9: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/9.jpg)
9
Oracle RDF In-Memory Query Times (in sec) for LUBM Benchmark Queries • 100% : 4x – 6x gain
• 56%
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (100%)
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (56%)
![Page 10: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/10.jpg)
10
Oracle RDF In-Memory Query Times (in sec) for LUBM Benchmark Queries • 27%
• 12%
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (27%)
0
10
20
30
40
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
No IM
IM (12%)
![Page 11: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/11.jpg)
11
Oracle RDF In-Memory Full Scan Performance (in sec)
• Fetching 3 IDs from RDF_LINK$ table • 100% - 190x gain
0
0.5
1
1.5
2
2.5
IM (100%) IM (56%) IM (27%) IM (12%)
No IM
IM
![Page 12: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/12.jpg)
12
Oracle RDF In-Memory Virtual Columns • In-memory complete de-normalization without
incurring disk storage requirements – define virtual columns in RDF_LINK$ table for values, types,
etc. : VALUE_NAME_S, VALUE_NAME_P, VALUE_NAME_O, etc.
– useful for fully populated data in memory: virtual model Virtual column in-memory performance (in min) –fetching 3 IDs & 3 VCs
0
5
10
15
20
25
IM (100%) IM (56%) IM (27%) IM (12%)
No IM
IM-No VC
IM-VC
![Page 13: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/13.jpg)
13
Oracle RDF In-Memory Virtual Columns
– remove joins with RDF_VALUE$ table – queries are processed on RDF_LINK$ table only – compression, smart scans (in-memory storage index),
dictionary code for values, SIMD vector processing
Confidential – Oracle Restricted
![Page 14: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/14.jpg)
14
Oracle RDF In-memory Virtual Column Performance using LUBM Benchmark Queries (in secs) • Up to 8x gain
• As the number of joins increases, a bigger gain is achievable
0
10
20
30
40
50
60
70
80
90
100
Q9 (3 joins) Q2 (2 joins) Q6 (2 joins) Q13 (1 join) Q14 (1 join)
No IMVC IMVC
![Page 15: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/15.jpg)
15
Oracle RDF In-Memory Virtual Columns
• Can apply to data mart/data warehousing star/ snowflake schema – remove joins with dimension tables
• Can apply to any applications where joined tables have one-to-one mapping on their join keys
Confidential – Oracle Restricted
![Page 16: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/16.jpg)
16
Conclusion • Significant performance improvement
– use order columns in place of complex logic in the query for RDF filter and order-by processing
– improve hash joins by in-memory processing of frequently accessed columns
– remove costly joins using in-memory virtual columns by complete de-normalization for fully populated data
![Page 17: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/17.jpg)
17 Confidential – Oracle Restricted
<Insert Picture Here>
Your Questions
![Page 18: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database](https://reader031.vdocuments.us/reader031/viewer/2022022203/5875834d1a28ab78498b7d31/html5/thumbnails/18.jpg)
18