ITPub博客

首页 > 数据库 > Oracle > global stats字段

global stats字段

Oracle 作者:wmsok 时间:2014-08-26 13:29:49 0 删除 编辑
Subject:         Global statistics - An Explanation
          Doc ID:         Note:236935.1         Type:         FAQ
          Last Revision Date:         17-JUL-2008         Status:         PUBLISHED

PURPOSE

This article explains what Global statistics are, how to gather and check them in the dictionary views.
It also explains some differences between DBMS_STATS and the legacy ANALYZE command.

SCOPE & APPLICATION

DBAs and Support Engineers

What are Global statistics ?

Global statistics are gathered statistics that provide information regarding an object as a whole. Global statistics are directly gathered against each level of the object in question at the time of statistic gathering by selecting from the object itself at the desired level to represent the underlying objects.

For example, if a table is partitioned, then statistics at the table level can be gathered directly (and are called Global Statistics) or can be derived from the statistics on the underlying partitions (called Composite, Aggregate or Derived Statistics).
Even though a SubPartition has no underlying objects, the statistics gathered at SubPartition level are called Global Statistics as they are global for that level of the object.

Unless the query predicate narrows the query to a single partition, the optimizer will use the global statistics. Because most queries are not likely to be this restrictive, it is most important to have accurate global statistics. Intuitively, it can seem that generating some global statistics from partition-level statistics is straightforward; however, for some statistics this is not true and can be prone to inaccuracies. For example, it is very difficult to determine the number of distinct values for a column from the number of distinct values found in each partition, since there is potential for overlap in values. (See Example for an illustration of this)

How to gather Global Statistics ?

The gathering of true table level Global Statistics is specific to the DBMS_STATS package. Global statistics can exist with the ANALYZE command but they will not be directly collected from the table as a whole, rather they will be derived from aggregation of the individual partitions. See DBMS_STATS versus ANALYZE

The Granularity parameter of DBMS_STATS package specifies the level to which statistics should be gathered.

Granularity         Global Statistics Level
Global         Table
Partition         Partition
SubPartition         SubPartition
Default         Table + Partition
All         Table + Partition + Subpartition


Setting granularity to 'DEFAULT', or 'ALL' gathers statistics at more than 1 level.

For example, to collect global statistics from all levels of a SubPartitioned table :
Set the Granularity parameter to 'ALL' and DBMS_STATS will generate 3 sets of SQL to gather the statistics:

   1. select .... from table
   2. select .... from table (partition P1) -- Repeated for each partition
   3. select .... from table (subpartition SP1) -- Repeated for each subpartition in each partition 


How to check in dictionary views if Global Statistics are gathered ?
Check the GLOBAL_STATS column :

    * Value:
          o YES means Global Statistics are gathered.
          o NO means statistics are derived by aggregation from a lower level. 

    * At the Table Level:
          o [DBA|ALL|USER]_TABLES
          o [DBA|ALL|USER]_TAB_COL_STATISTICS

    * At the Partition Level:
          o [DBA|ALL|USER]_TAB_PARTITIONS
          o [DBA|ALL|USER]_PART_COL_STATISTICS

    * At the SubPartition level:
          o [DBA|ALL|USER]_TAB_SUBPARTITIONS
          o [DBA|ALL|USER]_SUBPART_COL_STATISTICS



DBMS_STATS versus ANALYZE

As discussed above, the DBMS_STATS package can gather global statistics at multiple levels as specified by the granularity parameter.
In contrast, the legacy ANALYZE command collects statistics only at the lowest level and derives higher level statistics by aggregation.
These aggregated statistics are sometimes less accurate than the global statistics since it is not possible to precisely determine the overlap of values across partitions. Statistics for the number of distinct values and density are particularly susceptible to inaccuracy.(See Example for an illustration of this)

The optimizer uses global statistics as the basis of access path generation unless query predicates narrow the query to a single partition. Since most queries are unlikely to be this restrictive, it is important to have accurate global statistics. Gathering global statistics with the DBMS_STATS package is highly recommended.

Key points:

    * When ANALYZE gathers statistics GLOBAL_STATS is always set to NO


    * When DBMS_STATS gathers statistics, the GLOBAL_STATS column is populated thus:
          o GLOBAL_STATS = YES - means global statistics have been gathered
          o GLOBAL_STATS = NO - means statistics will be derived by aggregation from lower level objects 

    * Note that in Oracle 8i and 9.0 DBMS_STATS may contain recursive ANALYZE statements (for example to gather index statistics). If ANALYZE has been called by DBMS_STATS, then GLOBAL_STATS will have been set to "NO" even though DBMS_STATS was used (since ANALYZE has been called to actually gather the statistics).


    * Gathering global statistics is likely to take more time than deriving them by aggregation, but global statistics are more accurate.


    * If ALL low level statistics are gathered (partition or subpartition) specifically without the higher level (so NO global_stats = yes at upper level), then upper "aggregate" level stats are derived from the lower levels and GLOBAL_STATS is set to NO against the derived statistics.
      For example, if statistics are gathered with a granularity of 'partition' then partition level stats will be gathered and GLOBAL_STATS set to YES, Table level statistics will be derived from all Partition statistics and GLOBAL_STATS set to NO. No subpartition statistics will be gathered. (See Example for an illustration of this)


    * Global statistics (global_stats = yes) are NEVER derived (overwritten/refreshed) from a new aggregation of the statistics collected against underlying objects. The CBO will still use them if available. If they are not available (global_stats = no), new aggregated statistics will be derived from these new statistics collected against underlying objects.


    * It is important that the usage of DBMS_STATS or ANALYZE is kept consistent for each object as otherwise the resultant statistics can become very confusing.
      For example:
         1. Gather statistics was done with DBMS_STATS and granularity level set to 'ALL'.
            The result of this is that all statistics are global statistics and GLOBAL_STATS column was set to YES at all levels.
         2. To refresh statistics, ANALYZE command is then used.
            ANALYZE is only able to update statistics at the lowest level (where they are gathered) and set the Global_stats column value to NO at this level.
            Other levels don't have their statistics refreshed and keep the global_stats column value to YES.



Global Statistics Examples

    * Table T3 has 3 columns I, P and SP. (See Table Creation Information)

      The table is PARTITIONed BY RANGE(p) and SUBPARTITIONed BY HASH(sp)
      Column I has 100000 distinct values.
      Dependent on the gathering commands (DBMS_STATS as opposed to ANALYZE), the CBO will use either 100000 or 59845 distinct values for I. This is explained below.


      Consider the following statistics gathering commands:

      1) The following commands will record the number of distinct values for column I at Table Level as 100000:

      exec DBMS_STATS.GATHER_TABLE_STATS('ST','T3',granularity => 'ALL');
      exec DBMS_STATS.GATHER_TABLE_STATS('ST','T3',granularity => 'DEFAULT');

      2) These commands will record the number of distinct values for column I at Table Level as 59845:

      ANALYZE TABLE ST.T3 COMPUTE STATISTICS;
      exec DBMS_STATS.GATHER_TABLE_STATS('ST','T3',granularity => 'SUBPARTITION');

      Clearly these is a significant difference in the values recorded. If analyze is used (ie without gathered global statistics) then this inaccuracy could result in the selection of a suboptimal plan.




    * Table T2 has 2 columns I, P . (See Table Creation Information)

      The table is PARTITIONed BY RANGE(p)
      Column I has 2 distinct values 'P1' and 'P2'.
          o 10000 times 'P1' in partition P1
          o 1 times 'P2' in partition P2


      Dependent on the gathering commands (DBMS_STATS as opposed to ANALYZE), the CBO will use either 0.00005 or 0.5000 as the Density value for column I.
      For a select such as:

      SELECT P FROM T2 WHERE I = 'P2'
      This can mean that the CBO will use a Cardinality (CDN) of 1 or 5001.

      Additionally, because aggregated statistics don't have any histograms, this can lead to high variations in the Density value.


      Consider the following statistics gathering commands:

      1) The following commands will record histograms for I at Table level Density as DENS = 0.0005 and Cardinality as CDN = 1:

      exec dbms_stats.gather_table_stats(NULL,'T2', method_opt => 'for columns I size 10');

      2) These commands will record histograms for column I at Partition Level Density as DENS = 0.5000 and Cardinality as CDN=5001:

      analyze table T2 compute statistics for columns I size 10, P size 1;
      exec DBMS_STATS.GATHER_TABLE_STATS(NULL,'T2',granularity => 'PARTITION', method_opt => 'for columns I size 10, P size 1');

      Clearly these is a significant difference in the values recorded. If gathered global statistics are not used then this inaccuracy can result in the selection of a suboptimal plan.



More Examples of exactly what is gathered by various commands:

A) Gather global statistics at all levels:

exec DBMS_STATS.GATHER_TABLE_STATS('ST','T3',granularity => 'ALL');

    * Collects the following statistics and sets GLOBAL_STATS column value to YES
          o SubPartition Level
                + USER_TAB_SUBPARTITIONS
                + USER_SUBPART_COL_STATISTICS 
          o Partition Level
                + USER_TAB_PARTITIONS
                + USER_PART_COL_STATISTICS 
          o Table Level
                + USER_TABLES
                + USER_TAB_COL_STATISTICS 


B) Gather global statistics at Table and partition levels, no SubPartition statistics:

exec DBMS_STATS.GATHER_TABLE_STATS('ST','T3',granularity => 'DEFAULT');

    * Collects the following statistics and sets GLOBAL_STATS column value to YES
          o Partition Level
                + USER_TAB_PARTITIONS
                + USER_PART_COL_STATISTICS 
          o Table Level
                + USER_TABLES
                + USER_TAB_COL_STATISTICS 

    * No Statistics:
          o SubPartition Level
                + USER_TAB_SUBPARTITIONS
                + USER_SUBPART_COL_STATISTICS 


C) Gather no global statistics at any level:

ANALYZE TABLE ST.T3 COMPUTE STATISTICS;

    * collects only the lowest level statistics
          o SubPartition Level
                + USER_TAB_SUBPARTITIONS
                + USER_SUBPART_COL_STATISTICS 

    * Derives upper level statistics by aggregation :
          o Partition Level
                + USER_TAB_PARTITIONS
                + USER_PART_COL_STATISTICS 
          o Table Level
                + USER_TABLES
                + USER_TAB_COL_STATISTICS 

    * GLOBAL_STATS Column value always set to NO with ANALYZE 


D) Gather global statistics at SubPartition level and derive Table and Partition statistics:

exec DBMS_STATS.GATHER_TABLE_STATS('ST','T3',granularity => 'SUBPARTITION');

    * Collects only the lowest level statistics and sets GLOBAL_STATS column value to YES
          o SubPartition Level
                + USER_TAB_SUBPARTITIONS
                + USER_SUBPART_COL_STATISTICS 

    * Derives upper level statistics by aggregation and GLOBAL_STATS column value = NO. This will not overwrite GLOBAL_STATS=YES at higher level if stats have been previously gathered at that level
          o Partition Level
                + USER_TAB_PARTITIONS
                + USER_PART_COL_STATISTICS 
          o Table Level
                + USER_TABLES
                + USER_TAB_COL_STATISTICS 

Table Creation Information

The following is information for creating the table in the above example:

drop table ST.T3;

CREATE TABLE ST.T3(i number,  p number,sp number)
PARTITION BY RANGE(p)
SUBPARTITION BY HASH(sp) SUBPARTITIONS 2 
(PARTITION q1 VALUES LESS THAN(3) TABLESPACE USERS,
PARTITION q2 VALUES LESS THAN(MAXVALUE) TABLESPACE USERS
);

declare 
  i number; 
begin 
  for i in 1..100000 loop 
    insert into T3 values(i,mod(i,7), mod(i,8)); 
    if( mod(i, 1000) = 0) then commit; end if; 
  end loop; 
for i in 1..50000 loop 
    insert into T3 values(i,mod(i,7), mod(i,8)+5); 
    if( mod(i, 1000) = 0) then commit; end if; 
  end loop; 
end; 
/

Table Creation Information

The following is information for creating the table in the above example:

drop table ST.T2;

CREATE TABLE "T2" (
  "I" varchar2(2)
, "P"  number

PARTITION  BY RANGE (P) (
  PARTITION "P1" VALUES LESS THAN  (2) 
, PARTITION "P2" VALUES LESS  THAN (MAXVALUE)  ) ;

BEGIN
for i in 1..10000 loop
insert into T2 (I,P) values ('P1',1);
END LOOP;
insert into T2 (I,P) values ('P2',2);
commit;
END;
/

RELATED DOCUMENTS

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/25960404/viewspace-1257918/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2011-07-07

  • 博文量
    24
  • 访问量
    119100