not in vs not exists

I want to share a small comparison between NOT EXISTS and NOT IN operator. We see these 2 operators

mostly in the sub-query and in reporting query. Few points that we should understand regarding these

operations:

1. Not In and Not Exists will produces exactly the same result if there is no NULL value in the sub-query. If

we have NULL value in the sub-query, we can add one more condition to eliminate that value (for

example: … AND (column_name IS NOT NULL …). The other approach is to define the column as

NOT NULL.

2. Not Exists is more efficient because when Oracle found 1 record in the sub-query that satisfies the

condition, Oracle will move to the next record of main query. If there is efficient index access to that

sub-query, Oracle will be able to use that index.

3. In case of Not In, Oracle has to scan the whole population (full table scan) of sub-query before Oracle

can move to the next record of main query,it requires more IO. If there is an efficient index access to

the sub-query, Oracle will not be able to use that index because of implicit LNNVL function in the

predicate.

4. Not In will be translated as ((column_name != VALUE_1) AND (column_name != VALUE_2)

AND (column_name != VALUE_3) … AND (column_name != VALUE_N)).

Let’s the party started! This is the test data.

We have 2 exactly the same tables (structure and data). BULK_ACT will be the main table, while BULK_ACT will

be the sub-query table. There is an index on each table. Later we update 2 records in the main table to

simulate NULL value and 1 different value between main and sub-query table.

This is the session statistics comparison (XLS) of those 2 operations and also the output of trace file of event

10053.

exist_vs_in.xlsx

In Predicate Information section of Not In version, we can see 1 bind variable, :B1, which can be translated as

Oracle executes the sub-query over and over again. We have 100,000 rows in BULK_ACT, so we will have

100,000 times Table Full Scan against BULK_GRP. Let’s check the session statistics for the confirmation.

There should be only 2 table scans in the Not Exists version, but we got 7 in above table, so the other 5 table

scans are belong to another tables/ processes. Looking forward to the Not In version, we got 100,006 table

scans, if 5 of its are belong to another tables, then we have only 100,001 table scans for the Not In version. 1

table scan for BULK_ACT and 100,000 table scans for BULK_GRP. The next 2 statistics (table scan blocks gotten

and table scan rows gotten) also confirmed that more operation were done in the Not In version.

Due to this huge table scans execution, the execution time and number of consistent gets for Not In version

also increase significantly.

Not In version with

dynamic sampling is on

Not Exists versionwith

dynamic sampling is on

So, what will be happened if we remove the “IS NOT NULL” part? The null value will be reported in the query as

below. There 2 rows in the output: 100,001 and NULL.

As mentioned earlier, we can also define the column as NOT NULL to have the same result.

What’s the Improvement in 11g?

There is new method which introduced in 11g to handle the inefficient Not In version (without re-writing the

query). I rerun the Not In version in 11g and here is the result.

As seen below, the new “HASH JOIN ANTI NA” has been introduced. The NA is stand for Null-Aware (please

read: http://structureddata.org/2008/05/22/null-aware-anti-join/). So we don’t need to rewrite the query

since Oracle can take the benefit of Hash Join, instead of using Filter (as we have seen in previous 10g version).

http://structureddata.org/2008/05/22/null-aware-anti-join/

In above link, you will see also another new Hash Join method, “HASH JOIN ANTI SNA”. So please go through

that site, read and get your hand dirt with trial and error

-heri-

The new HASH JOIN ANTI NA

not in vs not exists

Technology