ITPub博客

首页 > 数据库 > PostgreSQL > PostgreSQL DBA(106) - pgAdmin(Don't do this:NOT IN)

PostgreSQL DBA(106) - pgAdmin(Don't do this:NOT IN)

原创 PostgreSQL 作者:husthxd 时间:2019-10-14 18:29:09 0 删除 编辑

no zuo no die系列,来自于pg的wiki。

这是第四部分,不要使用NOT IN。
理由是:

1.NOT IN behaves in unexpected ways if there is a null present
2.Because of point 1 above, NOT IN (SELECT …) does not optimize very well. In particular, the planner can’t transform it into an anti-join, and so it becomes either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the planner only allows that plan for small result sets; the plain subplan is horrifically slow (in fact O(N²)). This means that the performance can look good in small-scale tests but then slow down by 5 or more orders of magnitude once a size threshold is crossed; you do not want this to happen.

第一个理由是如存在null的话,会有未如预期的结果。

[local]:5432 pg12@testdb=# select * from tbl;
 id | value 
----+-------
  1 |     2
(1 row)
Time: 1.359 ms
[local]:5432 pg12@testdb=# select * from tbl where id not in (1,null);
 id | value 
----+-------
(0 rows)
Time: 1.443 ms

原因是任意值跟null的运算结果都是null,不是false也不是true,因此没有结果返回。

[local]:5432 pg12@testdb=# create table t_null(id int);
CREATE TABLE
Time: 168.932 ms
[local]:5432 pg12@testdb=# insert into t_null values(1);
INSERT 0 1
Time: 6.136 ms
[local]:5432 pg12@testdb=# insert into t_null values(null);
INSERT 0 1
Time: 2.611 ms
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_null b);
 id | value 
----+-------
(0 rows)
Time: 6.682 ms
[local]:5432 pg12@testdb=#

第二个理由是在结果集很大的情况下会有性能问题(实际上,o(  ̄︶ ̄)o)

[local]:5432 pg12@testdb=# explain verbose select * from tbl a where a.id not in (select b.id from t_big_null b);
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Seq Scan on public.tbl a  (cost=0.00..291892417.45 rows=1130 width=8)
   Output: a.id, a.value
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..233311.72 rows=10000048 width=4)
           Output: b.id
           ->  Seq Scan on public.t_big_null b  (cost=0.00..144248.48 rows=10000048 width=4)
                 Output: b.id
(8 rows)
Time: 2.552 ms
[local]:5432 pg12@testdb=# explain verbose select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=308312.08..347459.58 rows=1 width=8)
   Output: a.id, a.value
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on public.tbl a  (cost=0.00..32.60 rows=2260 width=8)
         Output: a.id, a.value
   ->  Hash  (cost=144248.48..144248.48 rows=10000048 width=4)
         Output: b.id
         ->  Seq Scan on public.t_big_null b  (cost=0.00..144248.48 rows=10000048 width=4)
               Output: b.id
(9 rows)
Time: 2.960 ms

实际执行SQL:

[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value 
----+-------
(0 rows)
Time: 2.176 ms
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
 id | value 
----+-------
(0 rows)
Time: 3540.645 ms (00:03.541)
[local]:5432 pg12@testdb=#

执行计划,not exists的成本比not in要低,但实际上,not in的执行要比not exists要快很多!

参考资料
Don’t Do This

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/6906/viewspace-2659954/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
ITPUB数据库版块资深版主,对Oracle、PostgreSQL有深入研究。

注册时间:2007-12-28

  • 博文量
    1416
  • 访问量
    3855849