ITPub博客

首页 > 数据库 > Oracle > 示例说明动态采样的作用,并演示动态采样对有内在关系的多列查询的影响

示例说明动态采样的作用,并演示动态采样对有内在关系的多列查询的影响

原创 Oracle 作者:buptdream 时间:2014-08-06 15:53:49 0 删除 编辑
 动态采样技术是为了弥补CBO 优化器下,在段对象(表,索引,分区)没有分析的情况下,为了使优化器得到得到足够的信息以保证做出正确的执行计划而发明的一种技术,它可以把它看作是分析手段的一种补充。当段对象没有统计信息时(即没有做分析),动态采样技术可以通过直接从需要分析的对象上收集一些数据块(采样)来获得CBO 需要的统计信息。
   下面我们来演示一下动态采用的效果:
   Last login: Fri Aug 31 21:34:50 2012 from 10.8.0.248
[root@localhost ~]# su - oracle
[oracle@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 8月 31 22:09:48 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> conn change/change
Connected.
SQL> drop table t purge;

Table dropped.

SQL> create table t  as  select owner, object_type   from dba_objects;  -----------创建一个表

Table created.

SQL> select count(*) from t; ------------ 表的记录数

  COUNT(*)
----------
     72340

SQL> set autotrace traceonly        
SQL> set linesize 1000
通过在hint 中加入dynamic_sampling 的方式来使用动态采样,如果我们把它的级别设为0,那么此时CBO
将不会做动态采用操作,此时CBO 唯一可以使用的信息就是表存储在数据字典中的一些信
息,比如有多少个extents,有多少个blocks 等等。
SQL> select /*+ dynamic_sampling(t 0) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16990 |   464K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16990 |   464K|    58   (0)| 00:00:01 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         44  recursive calls
          0  db block gets
       5012  consistent gets
          0  physical reads
          0  redo size
    2306268  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed
T 表里面实际上是有72340 条数据,而CBO 告诉我们只有16990 条和实际数量相差非常多。这种偏离的猜测,往往会导致CBO 做出错
误的判断,如果一个表没有分析,又没有动态采样,那么CBO 做出错误的执行计划的可能行非常高
SQL> select * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 84402 |  2307K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 84402 |  2307K|    58   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2) -------------采用了level=2的动态采用


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       5083  consistent gets
          0  physical reads
          0  redo size
    2306268  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed
CBO估计出的记录为84402 和实际的记录数有点相近。因为CBO 只是收集了有限的一些数据块作分析,来对
整个表的数据做出估算,但它比没有动态采样已经好了很多,能够帮助CBO 做出更为正确
的执行计划。
下面我们将表中的记录全部删除,看看没有动态采用和采用动态采用的差别。
SQL> delete from t;

72340 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3335594643

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | DELETE STATEMENT   |      |     1 |    58   (0)| 00:00:01 |
|   1 |  DELETE            | T    |       |            |          |
|   2 |   TABLE ACCESS FULL| T    |     1 |    58   (0)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         87  recursive calls
      75432  db block gets
        324  consistent gets
          0  physical reads
   18876988  redo size
        848  bytes sent via SQL*Net to client
        770  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> commit;

Commit complete.

SQL> select /*+ dynamic_sampling(t 0) */ * from t;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 16990 |   464K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 16990 |   464K|    58   (0)| 00:00:01 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        200  consistent gets
          0  physical reads
          0  redo size
        414  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
CBO 估算出结果集和没有删除数据时是一样的,当一个表的数据被delete 掉以后,这个表所分配的extents
和blocks 是不会被回收的(高水位值不变),所以在CBO 看起来,它依然是有那么多的数
据在那里16990 条。采用level=2的动态采用,CBO认为只有一条记录,和实际非常相近。
SQL> select * from t;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    28 |    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |     1 |    28 |    58   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        200  consistent gets
          0  physical reads
          0  redo size
        414  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


SQL> drop table t purge;

Table dropped.

动态采用除了可以在段对象没有分析时,给CBO 提供分析数据之外,还有一个独特的
能力,它可以对不同列之间的相关性做统计;
相对的,表分析的信息通常是独立的,比如:
1 表的行数,平均行长的。
2 表的每个列的最大值,最小值,重复率,也可能包含直方图。
3 索引的聚合因子,索引叶的块数目,索引的高度等。
这些信息相互之间都是独立的,当查询涉及到列之间的相关性时,这些信息就显得不够了。


下面创建一张表T,它包含2 个列:flag1 和flag2,这两列总组存放了两个值,‘Y’和
‘N’,但是对于每一行记录,flag1 和flag2 的值永远保持一个是‘Y’而另一个是‘N’或
者相反,绝不会相同。
两个值相同的情况没有,下面我们看一下这个例子:

[oracle@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 8月 31 22:33:50 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> conn change/change
Connected.
SQL> drop table t purge;

Table dropped.

SQL> create table t as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1, decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.* from dba_objects a;

Table created.

SQL> select count(*) from t; ----------表中有记录72340条。

  COUNT(*)
----------
     72340

SQL> create index t_idx on t(flag1,flag2);----------创建联合索引

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T',method_opt=>'for all indexed columns size 254' );---------收集统计信息

PL/SQL procedure successfully completed.
这是表里的数据总量,总量的1/2,和总量的1/4,我们获得这个值是要和CBO 估算出的
结果集的数量做对比,以判断CBO 估算的正确性:

SQL> select num_rows, num_rows/2,num_rows/2/2 from user_tables where table_name = 'T';

  NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
     72340      36170        18085

SQL> set autotrace traceonly
SQL> select * from t where flag1='N';

36170 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 36229 |  3573K|   300   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 36229 |  3573K|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3455  consistent gets
          0  physical reads
          0  redo size
    1887389  bytes sent via SQL*Net to client
      27045  bytes received via SQL*Net from client
       2413  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      36170  rows processed
 此时CBO获取t表中,flag1为N的情况为36229条记录,和总表的1/2非常相近。
SQL> select * from t where flag2='N';

36170 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 36111 |  3561K|   300   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 36111 |  3561K|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3452  consistent gets
          0  physical reads
          0  redo size
    1885652  bytes sent via SQL*Net to client
      27045  bytes received via SQL*Net from client
       2413  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      36170  rows processed
同样,对于flag=2 的记录数是总数的一半,此时CBO 表现完美,得出的执行计划也是最优
的。
看下面这个sql:
实际上,flag1 和flag2 是不可能一样的,也就是表中不可能存在一条记
录,它的值同时为‘N’或者同时为’Y’,这意味着,这条SQL 的返回值为0 条。

SQL> select * from t where flag1 = 'N' and flag2 = 'N';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 18085 |  1783K|   300   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 18085 |  1783K|   300   (1)| 00:00:04 |-----------CBO估算为18085条记录,其实算法就是NUM_ROWS/2/2,这和实际偏差比较大。
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG2"='N' AND "FLAG1"='N')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1077  consistent gets
          0  physical reads
          0  redo size
       1485  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';  -------下面采用level=3的动态采样

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

--------------------------------------------------------------------------------
-----

| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time
    |

--------------------------------------------------------------------------------
-----

|   0 | SELECT STATEMENT            |       |     6 |   606 |     2   (0)| 00:00
:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     6 |   606 |     2   (0)| 00:00
:01 |

|*  2 |   INDEX RANGE SCAN          | T_IDX |     6 |       |     1   (0)| 00:00
:01 |

--------------------------------------------------------------------------------
-----


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FLAG1"='N' AND "FLAG2"='N')

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
        140  consistent gets
          0  physical reads
          0  redo size
       1485  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
上面的执行计划中,CBO估算记录为6条,和实际非常接近。由于结果集很小,选择了索引。

Oracle 为动态采样化分了11 个级别,采样的数据块越多,得到的分析数据就越接近于真实,但同时伴随着资源消耗的也越大:
SQL> exec dbms_stats.delete_table_stats(user,'t');

PL/SQL procedure successfully completed.

SQL> select /*+dynamic_sampling(t 1) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 81606 |    16M|   300   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    | 81606 |    16M|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         48  recursive calls
          0  db block gets
       5871  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 5) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  4951 |  1020K|   299   (0)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    |  4951 |  1020K|   299   (0)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       5867  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 10) */ * from t; --------可见level 10的信息是最准确的。

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 72340 |    14M|   300   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    | 72340 |    14M|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
       6905  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 8) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 72340 |    14M|   300   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    | 72340 |    14M|   300   (1)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       6905  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed

SQL> select /*+dynamic_sampling(t 6) */ * from t;

72340 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2473 |   509K|   299   (0)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T    |  2473 |   509K|   299   (0)| 00:00:04 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
       5867  consistent gets
          0  physical reads
          0  redo size
    8592752  bytes sent via SQL*Net to client
      53566  bytes received via SQL*Net from client
       4824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      72340  rows processed
  动态采样也需要额外的消耗数据库资源,所以,在一个这样的环境中,SQL 被反复的
执行,变量被绑定,硬分析很少----是不宜使用动态采样的,就像OLTP 系统。动态采样发
生在硬分析时候,如果很少有硬分析发上,动态采样的意义就不大。
而OLAP 或者数据仓库环境下,SQL 的执行消耗的资源要远远大于SQL 解析,那么让
解析再消耗多一点资源做一些动态采样分析,从而做出一个最优的执行计划是非常值得的,
实际上在这样的环境下,硬分析消耗的资源几乎是可以忽略的。一般来说 在OLAP 或者数据仓库环境中,将动态采样的level 设置为3 或者4 是
比较好;相反,在一个OLTP 系统下,不应该使用动态采样。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/10701850/viewspace-1246585/,如需转载,请注明出处,否则将追究法律责任。

下一篇: 直方图
请登录后发表评论 登录
全部评论

注册时间:2008-12-04

  • 博文量
    25
  • 访问量
    335703