not in & not exists (3-1)
1) NULL值对结果集的影响
如果子查询返回的结果集存在null值会导致父查询没有结果集返回。
比如查询select * from T1 where x not in ( select y from T2 where z = Z),如果子查询select y from T2 where z = Z有null值返回,父查询将得不到任何结果.
test@GZSERVER> update inner_large_t set id = null where rownum <= 1
2 /
已更新 1 行。
test@GZSERVER> select count(*) from outer_small_t where id not in (select id fro
m inner_large_t)
2 /
COUNT(*)
----------
0
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1203 Card=1 Bytes=
13)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'OUTER_SMALL_T' (TABLE) (Cost=2
Card=50 Bytes=650)
4 2 TABLE ACCESS (FULL) OF 'INNER_LARGE_T' (TABLE) (Cost=4
8 Card=3500 Bytes=45500)
对于父查询中表OUTER_SMALL_T中的某个id值x与子查询表INNER_LARGE_T中的每一个id值y1,y2,y3…进行比较,如果满足x!=y1 and x!=y2 and x!=y3…则x返回到整个查询的结果集中。但因为任何值与null比较均为null,如果子查询存在null的值会导致父查询中没有一个值可以满足条件,故没有结果集返回。
test@GZSERVER> select count(*) from outer_small_t t1
2 where not exists (select id from inner_large_t t2 where t1.id=t2.id)
3 /
COUNT(*)
----------
501
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=26
)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (ANTI) (Cost=45 Card=1 Bytes=26)
3 2 TABLE ACCESS (FULL) OF 'OUTER_SMALL_T' (TABLE) (Cost=2
Card=1001 Bytes=13013)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_INNER_LARGE_T_1' (INDEX
) (Cost=42 Card=70001 Bytes=910013)
对于not exists,由于访问INNER_LARGE_T使用了索引扫描,而索引中是不会包含null值的,可以认为not exists不会理会子查询中null值而直接与非null值作hash反连接。
2) NULL值对执行计划的影响
test@GZSERVER> alter session set OPTIMIZER_FEATURES_ENABLE='10.1.0'
2 /
会话已更改。
已用时间: 00: 00: 00.00
test@GZSERVER> select count(*) from outer_small_t where id not in (select id fro
m inner_large_t)
2 /
COUNT(*)
----------
500
已用时间: 00: 00: 20.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1203 Card=1 Bytes=
13)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'OUTER_SMALL_T' (TABLE) (Cost=2
Card=50 Bytes=650)
4 2 TABLE ACCESS (FULL) OF 'INNER_LARGE_T' (TABLE) (Cost=4
8 Card=3500 Bytes=45500)
Statistics
----------------------------------------------------------
328 recursive calls
0 db block gets
102703 consistent gets
0 physical reads
…
test@GZSERVER> alter table inner_large_t modify id not null
2 /
表已更改。
已用时间: 00: 00: 00.00
test@GZSERVER> alter table outer_small_t modify id not null
2 /
表已更改。
已用时间: 00: 00: 00.00
test@GZSERVER> select count(*) from outer_small_t where id not in (select id fro
m inner_large_t)
2 /
COUNT(*)
----------
500
已用时间: 00: 00: 00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=26
)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (ANTI) (Cost=45 Card=1 Bytes=26)
3 2 INDEX (FAST FULL SCAN) OF 'IDX_OUTER_SMALL_T_ID' (INDE
X) (Cost=2 Card=1001 Bytes=13013)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_INNER_LARGE_T_1' (INDEX
) (Cost=42 Card=70001 Bytes=910013)
Statistics
----------------------------------------------------------
348 recursive calls
0 db block gets
264 consistent gets
0 physical reads
…
我们可以看到,如果outer_small_t和inner_large_t的列id没有not null约束的话,执行效率相当低,而加入not null约束后,由于使用了hash反连接,执行效率得到极大的提高:从原来的20.03秒提升为00.01秒,102703的逻辑读减少为264逻辑读。
其实在父、子查询均加入id is not null也是可以达到调整的效果:
test@GZSERVER> select count(*) from outer_small_t
2 where id is not null and id not in (select id from inner_large_t where id i
s not null)
3 /
COUNT(*)
----------
500
已用时间: 00: 00: 00.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=26
)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (ANTI) (Cost=45 Card=1 Bytes=26)
3 2 INDEX (FAST FULL SCAN) OF 'IDX_OUTER_SMALL_T_ID' (INDE
X) (Cost=2 Card=1001 Bytes=13013)
4 2 INDEX (FAST FULL SCAN) OF 'IDX_INNER_LARGE_T_1' (INDEX
) (Cost=42 Card=70001 Bytes=910013)
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/6906/viewspace-21643/,如需转载,请注明出处,否则将追究法律责任。