ITPub博客

首页 > 数据库 > PostgreSQL > PostgreSQL DBA(107) - pgAdmin(Don't do this:NOT IN vs NOT EXISTS)

PostgreSQL DBA(107) - pgAdmin(Don't do this:NOT IN vs NOT EXISTS)

原创 PostgreSQL 作者:husthxd 时间:2019-10-15 11:42:16 1 删除 编辑

延续昨天的话题,NOT IN vs NOT EXISTS,实际执行SQL:

[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value 
----+-------
(0 rows)
Time: 2.176 ms
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
 id | value 
----+-------
(0 rows)
Time: 3540.645 ms (00:03.541)
[local]:5432 pg12@testdb=#

执行计划,not exists的成本比not in要低,但实际上,not in的执行要比not exists要快很多!Why?

考虑到os page cache的因素,重启数据库,重新执行SQL:

[pg12@localhost ~]$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2019-10-15 11:25:30.683 CST [2376] LOG:  starting PostgreSQL 12.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
2019-10-15 11:25:30.691 CST [2376] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-10-15 11:25:30.691 CST [2376] LOG:  listening on IPv6 address "::", port 5432
2019-10-15 11:25:30.706 CST [2376] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-10-15 11:25:31.153 CST [2376] LOG:  redirecting log output to logging collector process
2019-10-15 11:25:31.153 CST [2376] HINT:  Future log output will appear in directory "pg_log".
. done
server started
[pg12@localhost ~]$ psql -d testdb
Timing is on.
Expanded display is used automatically.
psql (12.0)
Type "help" for help.
[local]:5432 pg12@testdb=#  select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value 
----+-------
(0 rows)
Time: 143.010 ms
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
 id | value 
----+-------
(0 rows)
Time: 7581.171 ms (00:07.581)
[local]:5432 pg12@testdb=#

NOT IN比起NOT exists还是要快很多,猜测是数据值1在第一个block中,实际上读取了第一个block就返回了,因此调整数据顺序重新测试。

[local]:5432 pg12@testdb=# truncate table t_big_null;
TRUNCATE TABLE
Time: 99.209 ms
[local]:5432 pg12@testdb=# insert into t_big_null select generate_series(2,10000000);
INSERT 0 9999999
Time: 17223.163 ms (00:17.223)
[local]:5432 pg12@testdb=# insert into t_big_null values(1);
INSERT 0 1
Time: 2.771 ms
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value 
----+-------
(0 rows)
Time: 5633.978 ms (00:05.634)
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
 id | value 
----+-------
(0 rows)
Time: 3938.083 ms (00:03.938)
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value 
----+-------
(0 rows)
Time: 7.334 ms
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
 id | value 
----+-------
(0 rows)
Time: 3218.846 ms (00:03.219)
[local]:5432 pg12@testdb=#

NOT IN第一次较慢,但第二次执行就很快了,考虑到page cache的因素,因此清除缓存再测试

[root@localhost ~]# sync; echo 3 > /proc/sys/vm/drop_caches
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value 
----+-------
(0 rows)
Time: 6.415 ms

时间并没有太大出入,仍然很快。

推测NOT IN执行很快是因为NOT IN执行的时候只需要碰到相同的马上就返回,而且在本例中由于tbl表只有一条记录,运气好的话,很快就可以返回得到结果。
如果我们在tbl表中插入另外一条在t_big_null中不存在的记录,那么时间上很可能会比not exists要慢。
而not exists执行时需要创建hash表,因此每次的时间都会大体稳定,不会出现太大的变动。

[local]:5432 pg12@testdb=# truncate table tbl;
TRUNCATE TABLE
Time: 5.333 ms
[local]:5432 pg12@testdb=# insert into tbl(id,value) values(1,1);
INSERT 0 1
Time: 4.678 ms
[local]:5432 pg12@testdb=# insert into tbl(id,value) values(10000002,10000002);
INSERT 0 1
Time: 3.429 ms
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
    id    |  value   
----------+----------
 10000002 | 10000003
(1 row)
Time: 3972.879 ms (00:03.973)
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
    id    |  value   
----------+----------
 10000002 | 10000003
(1 row)
Time: 4022.345 ms (00:04.022)
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
    id    |  value   
----------+----------
 10000002 | 10000003
(1 row)
Time: 3246.868 ms (00:03.247)
[local]:5432 pg12@testdb=# select * from tbl a where not exists (select b.id from t_big_null b where a.id = b.id);
    id    |  value   
----------+----------
 10000002 | 10000003
(1 row)
Time: 3224.394 ms (00:03.224)
[local]:5432 pg12@testdb=#

结果验证了推测。

参考资料
Don’t Do This

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/6906/viewspace-2660005/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
ITPUB数据库版块资深版主,对Oracle、PostgreSQL有深入研究。

注册时间:2007-12-28

  • 博文量
    1416
  • 访问量
    3855856