ITPub博客

首页 > 数据库 > Oracle > 直方图

直方图

原创 Oracle 作者:oracle_mao 时间:2015-12-24 12:55:00 1 删除 编辑

直方图概念:
 直方图,当某个列数据分布不均衡时,为了让cbo生成的执行计划更准确,可以收集直方图。我们可能需要对表收集直方图。直方图最大的桶数(Bucket)是254,默认桶数是75,桶数可以取值范围是1-254,。收集时直方图的时候很耗费cpu,oracle在收集直方图的时候会对表进行全表扫描,对于所有的列值放入不同的桶中。一般情况下,要理性的收集,如果收集不当可能会造成绑定变量窥探。
直方图用在什么情况下?
  列的值分布非常不均衡的时候,并且where条件中经常用到这个列。

绑定变量窥探概念:
 使用绑定变量,可以使sql共享,从来不产生那么多的硬解析,可以避免4031错误。但有时候绑定变量也会产生一些不好的事情,由于绑定变量而导致oracle没有正常的走应该走的执行计划。绑定变量窥探就是oracle在第一次硬解析一个带有绑定变量的sql时,会窥探(查看)变量的赋值,将真实值带入并生成一个执行计划,而以后同类型的sql都使用改执行计划,如果此时的真实值恰好是选择性小的值,生成了走索引的执行计划,那万一下次的真实值的选择性很大,其实走全表扫描更好,但由于之前的绑定变量窥探导致走了索引,那性能会查很多。
直方图实验:
实验:
SQL> create table test as select * from dba_objects;

Table created.
SQL> BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SYS',
                                    tabname          => 'TEST',
                                    estimate_percent => 100,---采样比率
                                    method_opt       => 'for all columns size skewonly',
                                    degree           => 1,---指定并行度
                                    cascade          => TRUE);--指定也收集相关表的索引的统计信息
    END;
   /
PL/SQL procedure successfully completed.

 --这里method_opt(与列的统计相关)中size后面的值(size后面就是和直方图有关)有几种:
auto:Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.它基于数据的分布以及应用程序访问列的方式来创建直方图。
repeat:只会为现有的直方图重新分析索引,不再生成新的直方图。
1-254桶数:size后面如果跟着10,那就代表用10个桶
skewonly:Oracle determines the columns to collect histograms based on the data distribution of the columns。只会根据column的数据分布情况决定是否收集histogram。
这里auto和skewonly类似,但区别是skewonly一定会收集这个列的直方图,但auto是会根据列的数据和workload。

SQL> select a.column_name,
           b.num_rows,
           a.num_distinct Cardinality,
            round(a.num_distinct / b.num_rows * 100, 2) selectivity,
          a.histogram,
          a.num_buckets
      from dba_tab_col_statistics a, dba_tables b
     where a.owner = b.owner
       and a.table_name = b.table_name
      and a.table_name = 'TEST';
COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER                               13202           8         .06 FREQUENCY                 8
OBJECT_NAME                         13202       10219        77.4 HEIGHT BALANCED         254
SUBOBJECT_NAME                      13202          83         .63 FREQUENCY                83
OBJECT_ID                           13202       13202         100 NONE                      1
DATA_OBJECT_ID                      13202        2689       20.37 HEIGHT BALANCED         254
OBJECT_TYPE                         13202          37         .28 FREQUENCY                37
CREATED                             13202         389        2.95 HEIGHT BALANCED         254
LAST_DDL_TIME                       13202         407        3.08 HEIGHT BALANCED         254
TIMESTAMP                           13202         402        3.04 HEIGHT BALANCED         254
STATUS                              13202           1         .01 FREQUENCY                 1
TEMPORARY                           13202           2         .02 FREQUENCY                 2
GENERATED                           13202           2         .02 FREQUENCY                 2
SECONDARY                           13202           1         .01 FREQUENCY                 1
NAMESPACE                           13202          15         .11 FREQUENCY                15
EDITION_NAME                        13202           0           0 NONE                      0

15 rows selected.
解释一下:
CARDINALITY是基数,
SELECTIVITY是选择性,
HISTOGRAM是直方图,
频率直方图(FREQUENCY HISTOGRAM):num_buckets<254时,oracle会收集频率直方图。也可以说频率直方图的话,num_buckets就等于distinct的数量,(如果size 后面跟着的值不是桶数的话,比如是auto的话,那么如果一个列的基数小于254,那么它的桶数就等于基数。

高度平衡直方图(HEIGHT BALANCED):num_buckets=254时,oracle会收集高度平衡直方图。

对于object_name,不应该收集直方图,可能是收集统计信息的时候,不小心收集的直方图(method_opt       => 'for all columns size skewonly',),导致对表收集统计信息的时候对所有的列都收集了直方图。也可以说如果当一个列的值得选择性很高的话,或者说这个列的distinct值很大的话大到已经要接近与主键的话,那就没必要收集直方图了。


在owner有直方图的情况下,测试执行计划:

SQL> create index ind_mao on test(owner);

Index created.
SQL> set autotrace traceonly
SQL> select * from test where owner='SYS';

9280 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9280 |   806K|    48   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |  9280 |   806K|    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"='SYS')


Statistics
----------------------------------------------------------
        112  recursive calls
          0  db block gets
       1016  consistent gets
          0  physical reads
          0  redo size
     404254  bytes sent via SQL*Net to client
       7214  bytes received via SQL*Net from client
        620  SQL*Net roundtrips to/from client
         76  sorts (memory)
          0  sorts (disk)
       9280  rows processed
SQL> select * from test where owner='SH';
Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     2 |   178 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST    |     2 |   178 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MAO |     2 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"='SH')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       1519  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed


删除直方图信息:
SQL> BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SYS',
                                    tabname          => 'TEST',
                                    estimate_percent => 100,
                                    method_opt       => 'for all columns size 1',
                                    degree           => 1,
                                    cascade          => TRUE);
    END;
   /
SQL> select a.column_name,
  2         b.num_rows,
  3         a.num_distinct Cardinality,
  4          round(a.num_distinct / b.num_rows * 100, 2) selectivity,
  5         a.histogram,
  6         a.num_buckets
  7    from dba_tab_col_statistics a, dba_tables b
  8   where a.owner = b.owner
  9     and a.table_name = b.table_name
 10     and a.table_name = 'TEST';
COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER                               13202           8         .06 NONE                      1
OBJECT_NAME                         13202       10219        77.4 NONE                      1
SUBOBJECT_NAME                      13202          83         .63 NONE                      1
OBJECT_ID                           13202       13202         100 NONE                      1
DATA_OBJECT_ID                      13202        2689       20.37 NONE                      1
OBJECT_TYPE                         13202          37         .28 NONE                      1
CREATED                             13202         389        2.95 NONE                      1
LAST_DDL_TIME                       13202         407        3.08 NONE                      1
TIMESTAMP                           13202         402        3.04 NONE                      1
STATUS                              13202           1         .01 NONE                      1
TEMPORARY                           13202           2         .02 NONE                      1
GENERATED                           13202           2         .02 NONE                      1
SECONDARY                           13202           1         .01 NONE                      1
NAMESPACE                           13202          15         .11 NONE                      1
EDITION_NAME                        13202           0           0 NONE                      0

15 rows selected.

SQL>  select * from test where owner='SH';


Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |  1650 |   143K|    46   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST    |  1650 |   143K|    46   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MAO |  1650 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"='SH')


Statistics
----------------------------------------------------------
         37  recursive calls
          0  db block gets
         64  consistent gets
          0  physical reads
          0  redo size
       1519  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          2  rows processed
这里虽然只返回2行结果,但oracle认为返回了1650行结果。这里的1650其实是13202/8。
我们没有对owner列收集直方图,那么oracle就会认为owner这个列是均衡的,就会认为行数(13202)/基数(8个值) 就是oracle认为的行数。得到了1650行,oracle觉得1650行对于1w多行中是应该走索引的。所以就走了索引。
select * from test where owner='SYS';
SQL> select * from test where owner='SYS';

9280 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |  1650 |   143K|    46   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST    |  1650 |   143K|    46   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MAO |  1650 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"='SYS')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1421  consistent gets
          0  physical reads
          0  redo size
     938251  bytes sent via SQL*Net to client
       7214  bytes received via SQL*Net from client
        620  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9280  rows processed
如果是sys的话,同样计算,oracle也是因为行数为1650,也还是会走索引。
select /*+ full(test) */ * from test where owner='SYS';
SQL> select /*+ full(test) */ * from test where owner='SYS';

9280 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1650 |   143K|    48   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |  1650 |   143K|    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"='SYS')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        789  consistent gets
          0  physical reads
          0  redo size
     404254  bytes sent via SQL*Net to client
       7214  bytes received via SQL*Net from client
        620  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9280  rows processed
逻辑读比走索引要低。

为啥全表扫描要比索引的逻辑读要高呢?

SQL> select blocks from dba_segments where segment_name='TEST'
  2  ;

    BLOCKS
----------
       256

SQL> show parameter db_f

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count        integer     120

多块读的个数是120。那消耗io的次数就是789/16次。所以49次io就可以全表扫描完毕了。

索引扫描的io次数:
索引扫描是单词io单块读(除了INDEX FAST FULL SCAN)。我们这里的索引是INDEX RANGE SCAN ,是单块读,假如索引的高度是2,那如果扫描索引块的话,至少要扫描2个块,那就需要2次io,刚才索引返回9280行数据,如果一个索引块能存100条记录,那需要90个索引块才可以得到这些数据。所以需要进行90次的io,然后得到rowid后需要回表再得到数据。得到9k多个rowid后,回到表里查这些查rowid所对应的表中数据,至少还需要至少1百个左右的io。

如果有一个1000w行的表,那么如果查询返回结果是500w,那索引的高度为3的话,那一个索引块存100行记录,那就需要扫描5w次,再回表的话,也需要几万个io。那差不多就是10w次io。但如是全表扫描的话,那就只有1w个io。索引是单块读,所以涉及的等待事件也是db file sequential read 。如果看到这个等待事件,可以判断出可能是需要全表的但走了索引。

一次io单块读和一次io多快读的时间差别并不太大,对于现在的存储,多快读其实大多是在多个存储里读的,因为存储有条带化,多块读的话是在多个磁盘读,单块读的话是在一个存储里读,所以差别并不大。所以速度基本一样的。其实也都是毫秒里计算的,只有io扫描了上千万次,那可能给我们的感觉才是慢,如果只是io几万次,那我们可能是感觉不出来慢的。

如果我们没计算直方图,那oracle就简单的相除,如果我们计算了直方图,那oracle就知道了,不能单纯的相除了,要根据桶里的值来计算。



来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/24500180/viewspace-1952637/,如需转载,请注明出处,否则将追究法律责任。

下一篇: 收集统计信息
请登录后发表评论 登录
全部评论

注册时间:2011-03-28

  • 博文量
    94
  • 访问量
    752910