not in vs not exists
DESCRIPTION
Comparison between Not In and Not Exists in OracleTRANSCRIPT
I want to share a small comparison between NOT EXISTS and NOT IN operator. We see these 2 operators
mostly in the sub-query and in reporting query. Few points that we should understand regarding these
operations:
1. Not In and Not Exists will produces exactly the same result if there is no NULL value in the sub-query. If
we have NULL value in the sub-query, we can add one more condition to eliminate that value (for
example: … AND (column_name IS NOT NULL …). The other approach is to define the column as
NOT NULL.
2. Not Exists is more efficient because when Oracle found 1 record in the sub-query that satisfies the
condition, Oracle will move to the next record of main query. If there is efficient index access to that
sub-query, Oracle will be able to use that index.
3. In case of Not In, Oracle has to scan the whole population (full table scan) of sub-query before Oracle
can move to the next record of main query,it requires more IO. If there is an efficient index access to
the sub-query, Oracle will not be able to use that index because of implicit LNNVL function in the
predicate.
4. Not In will be translated as ((column_name != VALUE_1) AND (column_name != VALUE_2)
AND (column_name != VALUE_3) … AND (column_name != VALUE_N)).
Let’s the party started! This is the test data.
We have 2 exactly the same tables (structure and data). BULK_ACT will be the main table, while BULK_ACT will
be the sub-query table. There is an index on each table. Later we update 2 records in the main table to
simulate NULL value and 1 different value between main and sub-query table.
This is the session statistics comparison (XLS) of those 2 operations and also the output of trace file of event
10053.
exist_vs_in.xlsx
In Predicate Information section of Not In version, we can see 1 bind variable, :B1, which can be translated as
Oracle executes the sub-query over and over again. We have 100,000 rows in BULK_ACT, so we will have
100,000 times Table Full Scan against BULK_GRP. Let’s check the session statistics for the confirmation.
There should be only 2 table scans in the Not Exists version, but we got 7 in above table, so the other 5 table
scans are belong to another tables/ processes. Looking forward to the Not In version, we got 100,006 table
scans, if 5 of its are belong to another tables, then we have only 100,001 table scans for the Not In version. 1
table scan for BULK_ACT and 100,000 table scans for BULK_GRP. The next 2 statistics (table scan blocks gotten
and table scan rows gotten) also confirmed that more operation were done in the Not In version.
Due to this huge table scans execution, the execution time and number of consistent gets for Not In version
also increase significantly.
Not In version with
dynamic sampling is on
Not Exists versionwith
dynamic sampling is on
So, what will be happened if we remove the “IS NOT NULL” part? The null value will be reported in the query as
below. There 2 rows in the output: 100,001 and NULL.
As mentioned earlier, we can also define the column as NOT NULL to have the same result.
What’s the Improvement in 11g?
There is new method which introduced in 11g to handle the inefficient Not In version (without re-writing the
query). I rerun the Not In version in 11g and here is the result.
As seen below, the new “HASH JOIN ANTI NA” has been introduced. The NA is stand for Null-Aware (please
read: http://structureddata.org/2008/05/22/null-aware-anti-join/). So we don’t need to rewrite the query
since Oracle can take the benefit of Hash Join, instead of using Filter (as we have seen in previous 10g version).
In above link, you will see also another new Hash Join method, “HASH JOIN ANTI SNA”. So please go through
that site, read and get your hand dirt with trial and error
-heri-
The new HASH JOIN ANTI NA