ITPub博客

首页 > 数据库 > Oracle > 如何让Not in语句跑出与not exists相当的效率

如何让Not in语句跑出与not exists相当的效率

原创 Oracle 作者:oliseh 时间:2014-11-10 11:48:18 0 删除 编辑

 关于Not inNot exists的讨论由来已久,not exists在性能上要优于not in,一时间几乎所有包含not inselect语句都被要求替换成not exists,那么not in是否真的那么实力不济,让我们静下心来细细的品味一下。

 首先还是简要回顾一下与这个话题有关的基础知识,以select c1 from test1 where c1 not in (select c2 from test2)为例,语句的本义是将test1表中c1字段值取出,同时要求这个c1字段值不能和test2表中c2字段的任何值相等。这里强调一点,子查询select c2 from test2的结果中不能有null值(即c2字段不能有null值),否则整个查询不会返回任何值,如果不能保证这点,那么慎用not in,这属于基本概念不清。反观select c1 from test1 where not exists (select 1 from test2 where test2.c2=test1.c1),子查询中只比较non-null值,所以不存在这个问题。因此我们下面基于not in的讨论只适用于其子查询不会返回null值的情况。

#####先建立两张测试表

grant resource,connect,unlimited tablespace,select any dictionary to scott identified by abcd_1234;

 

sqlplus scott/abcd_1234

 

create table tabinner as select * from dba_objects where object_id is not null;

 

create table tabouter as select * from dba_objects where object_id is not null;

 

create index ind_tabinner on tabinner(object_id);

 

create index ind_tabouter on tabouter(object_id);

 

SQL> desc tabinner

 Name                                      Null?    Type

 ----------------------------------------- -------- ----------------------------

 OWNER                                              VARCHAR2(30)

 OBJECT_NAME                                        VARCHAR2(128)

 SUBOBJECT_NAME                                     VARCHAR2(30)

 OBJECT_ID                                          NUMBER

 DATA_OBJECT_ID                                     NUMBER

 OBJECT_TYPE                                        VARCHAR2(19)

 CREATED                                            DATE

 LAST_DDL_TIME                                      DATE

 TIMESTAMP                                          VARCHAR2(19)

 STATUS                                             VARCHAR2(7)

 TEMPORARY                                          VARCHAR2(1)

 GENERATED                                          VARCHAR2(1)

 SECONDARY                                          VARCHAR2(1)

 NAMESPACE                                          NUMBER

 EDITION_NAME                                       VARCHAR2(30)

 

exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'TABINNER',cascade=>TRUE);

 

exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'TABOUTER',cascade=>TRUE);

 

#####查看not in执行计划:

set autotrace traceonly explain

select object_id from tabouter where object_id not in (select object_id from tabinner);

Execution Plan

----------------------------------------------------------

Plan hash value: 3199961677

 

-------------------------------------------------------------------------------

| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |          |   172K|  1008K|    71M  (1)|239:44:31 |

|*  1 |  FILTER            |          |       |       |            |          |

|   2 |   TABLE ACCESS FULL| TABOUTER |   172K|  1008K|   442   (1)| 00:00:06 |

|*  3 |   TABLE ACCESS FULL| TABINNER |     1 |     6 |   442   (1)| 00:00:06 |

-------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter( NOT EXISTS (SELECT 0 FROM "TABINNER" "TABINNER" WHERE

              LNNVL("OBJECT_ID"<>:B1)))

   3 - filter(LNNVL("OBJECT_ID"<>:B1))

  

注:谓词信息里的LNNVL("OBJECT_ID"<>:B1)等价于OBJECT_ID =:B1 or OBJECT_ID is null

 

#####查看not exists执行计划:

select object_id from tabouter where not exists (select 1 from tabinner where object_id=tabouter.object_id);

 

Execution Plan

----------------------------------------------------------

Plan hash value: 3225897791

 

----------------------------------------------------------------------------------------------

| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |              |  1721 | 20652 |       |   811   (2)| 00:00:10 |

|*  1 |  HASH JOIN ANTI       |              |  1721 | 20652 |  3032K|   811   (2)| 00:00:10 |

|   2 |   TABLE ACCESS FULL   | TABOUTER     |   172K|  1008K|       |   441   (1)| 00:00:06 |

|   3 |   INDEX FAST FULL SCAN| IND_TABINNER |   172K|  1008K|       |    71   (2)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - access("OBJECT_ID"="TABOUTER"."OBJECT_ID")

 

not innot exists执行计划对比:

not in使用的是基于两表FTS结果的Filter,这种join略好于nest loop joinNot exists使用的是基于Tabinner.object_id字段索引扫描和tabouterFTSHash反连接。从Cost估算上not exists都要远胜于not in

 

对于上面的执行结果,我们不免发出如下疑问

为何not in无法使用索引?

因为要使用到索引全扫描前提是要访问到的值都在索引中,而null值是不保存在索引中的,虽然tabinner.object_id里不含有null值但是object_id字段并没有定义成not null,所以optimizer还是非常严谨的以FTS的方式去访问tabinner表,要解决这个问题的方法有两种,

1、子查询中where条件加上is not null引导optimizer去走索引:

select object_id from tabouter where object_id not in (select object_id from tabinner where object_id is not null);  

 

Execution Plan

----------------------------------------------------------

Plan hash value: 3633759339

 

--------------------------------------------------------------------------------------

| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |              |   172K|  1008K|    11M  (4)| 39:09:39 |

|*  1 |  FILTER               |              |       |       |            |          |

|   2 |   TABLE ACCESS FULL   | TABOUTER     |   172K|  1008K|   442   (1)| 00:00:06 |

|*  3 |   INDEX FAST FULL SCAN| IND_TABINNER |     1 |     6 |    72   (3)| 00:00:01 |

--------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter( NOT EXISTS (SELECT 0 FROM "TABINNER" "TABINNER" WHERE

              "OBJECT_ID" IS NOT NULL AND LNNVL("OBJECT_ID"<>:B1)))

   3 - filter("OBJECT_ID" IS NOT NULL AND LNNVL("OBJECT_ID"<>:B1))

 

2、将tabinner.object_id字段加上not null限制:

select object_id from tabouter where object_id not in (select object_id from tabinner);

 

Execution Plan

----------------------------------------------------------

Plan hash value: 3633759339

 

--------------------------------------------------------------------------------------

| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |              |   172K|  1008K|    11M  (3)| 39:02:48 |

|*  1 |  FILTER               |              |       |       |            |          |

|   2 |   TABLE ACCESS FULL   | TABOUTER     |   172K|  1008K|   442   (1)| 00:00:06 |

|*  3 |   INDEX FAST FULL SCAN| IND_TABINNER |     1 |     6 |    72   (3)| 00:00:01 |

--------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter( NOT EXISTS (SELECT 0 FROM "TABINNER" "TABINNER" WHERE

              LNNVL("OBJECT_ID"<>:B1)))

   3 - filter(LNNVL("OBJECT_ID"<>:B1))

 

第二个疑问来了:虽然我们加了not null或者is not nullnot in语句的cost11M还是比not exists811高出一大截,还有没有进一步提升的空间?

我们看到使用了索引之后,接下来的焦点是两表关联所使用的Filter方法,刚才提到过Filter的效率之差仅次于nest loop join,优化的目标是改成not exists所使用的hash join anti,使用hash join anti的前提是外层查询里和内层子查询关联的字段不能为null,我们刚才将tabinner.object_id字段改成了not null,现在要将tabouter.object_id也改成not null

 

Alter table tabouter modify(object_id not null);

 

#####修改tabouter.object_idnot null之后的not innot exists执行效率相同

select object_id from tabouter where object_id not in (select object_id from tabinner);

Execution Plan

----------------------------------------------------------

Plan hash value: 2451368449

 

----------------------------------------------------------------------------------------------

| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |              |  1721 | 20652 |       |   441   (2)| 00:00:06 |

|*  1 |  HASH JOIN ANTI       |              |  1721 | 20652 |  3032K|   441   (2)| 00:00:06 |

|   2 |   INDEX FAST FULL SCAN| IND_TABOUTER |   172K|  1008K|       |    71   (2)| 00:00:01 |

|   3 |   INDEX FAST FULL SCAN| IND_TABINNER |   172K|  1008K|       |    71   (2)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - access("OBJECT_ID"="OBJECT_ID")

 

select object_id from tabouter where not exists (select 1 from tabinner where object_id=tabouter.object_id);

 

Execution Plan

----------------------------------------------------------

 

Plan hash value: 2451368449

 

----------------------------------------------------------------------------------------------

| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |              |  1721 | 20652 |       |   441   (2)| 00:00:06 |

|*  1 |  HASH JOIN ANTI       |              |  1721 | 20652 |  3032K|   441   (2)| 00:00:06 |

|   2 |   INDEX FAST FULL SCAN| IND_TABOUTER |   172K|  1008K|       |    71   (2)| 00:00:01 |

|   3 |   INDEX FAST FULL SCAN| IND_TABINNER |   172K|  1008K|       |    71   (2)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - access("OBJECT_ID"="TABOUTER"."OBJECT_ID")

 

 

如果发现optimizer还是没有使用hash join anti,可以在子查询中加上/*+ hash_aj */

select object_id from tabouter where object_id not in (select /*+ HASH_AJ */ object_id from tabinner);

 

至此我们经过为表字段增加not null约束、在子查询where条件中加入is not null等手段终于将not innot exists的执行性能拉平了,那么有没有一种方法不对表或者SQL作任何修改就能达到同样的目的?

如果你的版本是 11g及以上,那么恭喜你,你可以这样做。11g有一个隐含参数” _optimizer_null_aware_antijoin”,字面意思理解就是引导优化器进行null值敏感的反连接,说具体点在_optimizer_null_aware_antijoin=TRUE的情况下,对于select object_id from tabouter where object_id not in (select object_id from tabinner)语句即使tabouter.object_idtabinner.object_id字段都没有增加not null约束,也可以使用效率较高的hash join anti,我们来测试一下

 

#####我们先把taboutertabinner两张表的object_id恢复成nullable

alter table tabouter modify (object_id null);

alter table tabinner modify (object_id null);

 

#####修改_optimizer_null_aware_antijoin=TRUE,运行not in语句

set autotrace traceonly explain

alter session set "_optimizer_null_aware_antijoin"=TRUE;

select object_id from tabouter where object_id not in (select object_id from tabinner);

 

Execution Plan

----------------------------------------------------------

Plan hash value: 911337931

 

---------------------------------------------------------------------------------------

| Id  | Operation          | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |          |  1721 | 20652 |       |  1181   (1)| 00:00:15 |

|*  1 |  HASH JOIN ANTI NA |          |  1721 | 20652 |  3032K|  1181   (1)| 00:00:15 |

|   2 |   TABLE ACCESS FULL| TABOUTER |   172K|  1008K|       |   441   (1)| 00:00:06 |

|   3 |   TABLE ACCESS FULL| TABINNER |   172K|  1008K|       |   441   (1)| 00:00:06 |

---------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - access("OBJECT_ID"="OBJECT_ID")

 

HASH JOIN ANTI NA后面的NA表示Null Aware,可以看出cost值为1181,比之前not exists跑出的cost 811值略大,这是因为没有用到tabinner表的索引扫描,如果我们在not in后的subquery中加上is not null,就完全一样了,执行计划中的SNA表示single null aware即参与join的两个列中有一个列是nullable

 

set autotrace traceonly explain

 

select object_id from tabouter where object_id not in (select object_id from tabinner where object_id is not null);

 

Execution Plan

----------------------------------------------------------

Plan hash value: 659037736

 

----------------------------------------------------------------------------------------------

| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |              |  1721 | 20652 |       |   811   (2)| 00:00:10 |

|*  1 |  HASH JOIN ANTI SNA   |              |  1721 | 20652 |  3032K|   811   (2)| 00:00:10 |

|   2 |   TABLE ACCESS FULL   | TABOUTER     |   172K|  1008K|       |   441   (1)| 00:00:06 |

|*  3 |   INDEX FAST FULL SCAN| IND_TABINNER |   172K|  1008K|       |    72   (3)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - access("OBJECT_ID"="OBJECT_ID")

   3 - filter("OBJECT_ID" IS NOT NULL)

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/53956/viewspace-1327031/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
不仅仅专注Oracle database技术, member of SHOUG

注册时间:2014-04-06

  • 博文量
    128
  • 访问量
    1616676