Download - Oracle Index study for Event TAG DB M. Boschini [email protected] S. Della Torre [email protected]
![Page 2: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/2.jpg)
Outline TAG DB issue:
space usage issue INDEX accounting for ~ 50% of used space...
query performance Tests with random data Tests with AMS01 data Tests with AMS02 Cosmic data
M. Boschini Oracle Index Study for STATUS DB – TIM October 2008 - CERN
![Page 3: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/3.jpg)
EVENT TAG DB Goal: test feasibility of storing in an Oracle DB TAG
info for AMS02 events TAG is a 64bit number AMS02 MC has, as for now, no meaningful TAG info
(0 or error bit) We start from previous studies:
http://ams.cern.ch/AMS/Reports/AMSnote-2000_09_05b.ps.gz
http://ams.cern.ch/AMS/Reports/Computing/Apr2004/M.Boschini_LV3_DB.pdf
![Page 4: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/4.jpg)
EVENT TAG DB
Test INDEX type space usage Test INDEX usage Time insert & selections What can we use ?
![Page 5: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/5.jpg)
test environment SLC 4.6 (64bit) 2 dual core Intel(R) Xeon(R) CPU @ 2.00GHz Oracle 11g 64 bit
we opted for Oracle11g because of product lifetime All data is fed to DB using C-OCI programs Oracle setup:
user AMSDES with dedicated Bigfile tablespace overkill wrt Oracle's suggestion (>= 1 TB) ...?
default TEMP TBLSpace (3 GB)
![Page 6: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/6.jpg)
test environment Dummy Table:
RUN (NUMBER) EVENT (NUMBER) TAG (BINARY_DOUBLE)
TAG field will be used for indexing ...assuming queries will be mainly on TAG
10^8 records, equivalent to ~ 10 days of DAQ Test B-Tree vs BitMap Index
as hinted by our previous studies...
![Page 7: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/7.jpg)
Indexes (theory) Btree
default index type in Oracle very space consuming. Not advised for dataware housing
dataware house: write few, read many, HUGE data sample BitMap
advised for dataware housing Bitmap indexes are stored as an array of zero-and-one
values, with one entry for each row. Should be used only on low cardinality fields
![Page 8: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/8.jpg)
Indexes (theory) BitMap
From Oracle Advanced Programming White Paper, Sep. 2008
“...Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems...”
![Page 9: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/9.jpg)
Tests Items to be tested are:
SPACE OCCUPANCY = f(index type) QUERY PERFORMANCE = f(index type)
Index types: B-Tree and Bitmap We started with a random generated sample We then used an AMS01 sample
40 ntuples no request on charge, pmass, pmom, lat (nothing!)
AMS02 Cosmic data
![Page 10: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/10.jpg)
Disk Space TABLE SIZE = 5 GB
this means that for AMS02 we'll need 500 GB B-Tree INDEX SIZE = 4.8 GB
this means that for AMS02 we'd need 480 GB BitMap INDEX SIZE = 70 MB
this means that for AMS02 we'd need 7 GB thus, at least for space reasons, we should use
BitMap INDEX, a factor 30 smaller.
![Page 11: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/11.jpg)
when to create an INDEX After TABLE has been populated ! B-Tree INDEX creation time:
4 µsec/record mainly SORT and disk space allocation
BitMap INDEX creation time: 0.5 µsec/record
![Page 12: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/12.jpg)
Theory: is an INDEX useful ? Not always... ORACLE has a built-in decision taking algorithm
(EXECUTION PLAN) which decides how to actually implement a query.
Execution plan tries to optimize disk accesses and CPU usage.
So, e.g., if reading an index which returns “many” records takes more than X, than the index is not used.
These are based, since Oracle 10.0.2i on Cost Based Optimizer and Dynamic Sampling (by default ON)
![Page 13: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/13.jpg)
EXECUTION PLANPLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3591811347
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |
| 1 | TABLE ACCESS FULL | N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:05 |
![Page 14: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/14.jpg)
EXECUTION PLANPLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3591811347
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |
| 1 | TABLE ACCESS BY INDEX ROWID| N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:04 |
|* 2 | INDEX RANGE SCAN | TAG_RAND_IDX | 36793 | | 106 (0)| 00:00:02 |
![Page 15: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/15.jpg)
IDX usage Test To test if Index is actually used, we wrote a simple
PLSQL+Perl program that Dynamically populates statistics table
HOW-MANY <--> TAG_STA For each TAG_STA value, generates and analyzes
EXECUTION PLAN for SELECT run, event from TABLE wher tag=TAG_STA;
We thus run CARDINALITY queries to DB and analyze the Optimizer decisions...
![Page 16: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/16.jpg)
Sample1 CARDINALITY:5relative population > 5.7%
![Page 17: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/17.jpg)
Sample2 CARDINALITY:2517relative population: 6x10E-4
![Page 18: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/18.jpg)
Performance Sample1 and Sample2 both do NOT use INDEX in the
select query. This is because Oracle's built-in optimization algorithm
discovers that too many records will be returned, and thus decides to ignore the INDEX
Thus: LOW/MEDIUM cardinality with HIGH relative
population makes INDEX uneffective
![Page 19: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/19.jpg)
Sample3 CARDINALITY:1024relative population 6x10E-5
![Page 20: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/20.jpg)
Performance Sample3 uses INDEX ! This is because Oracle's builtin optimization algorithm
discovers that TABLE ACCESS BY INDEX ROWID is efficient.
Thus LOW/MEDIUM cardinality with “correct for
Oracle” relative population makes INDEX effective !
![Page 21: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/21.jpg)
Real Data: AMS01 We then used an AMS01 sample
40 ntuples (no request on charge, pmass, pmom, ecc) “scrambled” this sample to get 10^8 records
Average relative cardinality = 2x10E-6 BitMap Index is USED ! Selection time
no IDX: 22 sec IDX: 0.05 sec
INDEX really effective
![Page 22: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/22.jpg)
Real Data: AMS02 !!! We also used AMS02 Cosmic Rays data
we used only 2 2x10E8 events no request on charge, pmass, pmom, ...(nothing!)
Table size: 5.5 GB 2 BitMap IDX size: 57 MB 2
just for the sake of curiosity: a UNIQUE requirement on RUN+EVENTNO creates an INDEX 5 GB big !!!
Btree IDX size: 4.2 GB 2
![Page 23: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/23.jpg)
Real Data: AMS02 !!! Mean Relative cardinality: 1.8x10E-4 Max Relative cardinality: 3x10E-3 BitMap Index can be USED !
tested EXECUTION PLAN for all TAG values.
![Page 24: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/24.jpg)
Real Data: AMS02 !!! Selection time
no IDX: Average Selection time: 24 sec 0.01 sec/record once retrieved
IDX: Average Selection time: 0.05 sec 0.005 sec/record once retrieved
Again, B-Map index is very effective.
![Page 25: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/25.jpg)
naïve usage... The most naïve way to design the DB is 1 table for
each month of DAQ. This leads to ~36 tables. Most naïve way to query them all is using a UNION
statement which still uses index with nearly no overhead for the SORT
UNIQUE/UNION-ALL part...
![Page 26: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/26.jpg)
PLAN_TABLE_OUTPUT------------------------------------------------------------------------------------------------------------------------------------Plan hash value: 2624995586
----------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 44 | 1408 | 290K (50)| 00:58:03 || 1 | SORT UNIQUE | | 44 | 1408 | 290K (50)| 00:58:03 || 2 | UNION-ALL | | | | | || 3 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV | 42 | 1344 | 145K (1)| 00:29:10 || 4 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 5 | BITMAP INDEX SINGLE VALUE | COMIX_TDV | | | | || 6 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV1 | 2 | 64 | 144K (1)| 00:28:53 || 7 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 8 | BITMAP INDEX SINGLE VALUE | COMIX_TDV1 | | | | |----------------------------------------------------------------------------------------------
44 rows returned out of 377,516,896 rows in total
Rough estimate for SELECT on 3 years: 25 minutes...
![Page 27: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/27.jpg)
Conclusions In general, Bit-Map index is very efficient in space usage
30 times smaller than normal Btree index Index is used in queries if relative cardinality is low If so, selection time is very low. According to AMS01 and AMS02 Cosmic Rays
analyzed, BitMap index can be used We could thus use it for AMS02 expected space usage: 500 GB (table)+ 6 GB (idx)
EventStatusTable02 files will use ~ 150 GB...
![Page 28: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/28.jpg)
![Page 29: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it](https://reader031.vdocuments.us/reader031/viewer/2022032606/56649e945503460f94b991a7/html5/thumbnails/29.jpg)
AMS01 sample CARDINALITY:89860
average relative population 5.7%