ITPub博客

首页 > Linux操作系统 > Linux操作系统 > How CBO estimate the row source for group operation

How CBO estimate the row source for group operation

原创 Linux操作系统 作者:realkid4 时间:2012-12-13 21:39:05 0 删除 编辑

 

For Oracle CBO, there are two epochs developed from 9i to 10g, IO cost model and CPU cost model. No matter which kinds of cost calculation model used, Row Source value (cardinality) which reflects IO throughout plays an important part in cost weight.

 

So, how many rows will be returned for a specific Oracle execution operation is our focus. In most cases, the row returned estimated by CBO will determinate the final SEP(SQL Execution Plan) result.

 

In this article, we will discuss about how CBO estimate the group operation result. And let’s try to get a hard look behind the curtain.

 

1. Background and Environment

 

We chose Oracle 10gR2 as the experiment environment. And try to create the object table.

 

 

SQL> select * from v$version;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

PL/SQL Release 10.2.0.1.0 - Production

CORE        10.2.0.1.0         Production

 

TNS for Linux: Version 10.2.0.1.0 - Production

NLSRTL Version 10.2.0.1.0 - Production

 

SQL> create table t as select * from dba_objects;

Table created

 

 

The columns owner and object_type are important experimental objects. And we do not collect the statistic data at first.

 

 

SQL> select count(distinct(owner)), count(distinct(object_type)) from t;

 

COUNT(DISTINCT(OWNER)) COUNT(DISTINCT(OBJECT_TYPE))

---------------------- ----------------------------

                    18                           39

 

SQL> select count(*) from t;

 

  COUNT(*)

----------

     49747

 

 

 

2. Group estimated rows result without statistic

 

From Oracle 10g, the feature dynamic sampling is in use in case of object statistic missing. If Oracle detects there is no statistic not collected for specific objects, it will read a small number of data blocks and do a quick statistic collection.

 

 

 

SQL> show parameter dyn

 

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

optimizer_dynamic_sampling           integer     2

 

SQL> select table_name,num_rows, blocks, sample_size,last_analyzed from dba_tables where table_name='T' and wner='SYS';

 

TABLE_NAME                       NUM_ROWS     BLOCKS SAMPLE_SIZE LAST_ANALYZED

------------------------------ ---------- ---------- ----------- -------------

T                                                               

 

 

At present time, there is no statistical data for table T.

 

--Single Column Group by operation

SQL> explain plan for select owner, count(*) from t group by owner;

Explained

 

SQL> select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 47235625

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      | 47853 |   794K|   158   (5)| 00:00:02 |

|   1 |  HASH GROUP BY     |      | 47853 |   794K|   158   (5)| 00:00:02 |

|   2 |   TABLE ACCESS FULL| T    | 47853 |   794K|   153   (2)| 00:00:02 |

---------------------------------------------------------------------------

Note

-----

   - dynamic sampling used for this statement

 

13 rows selected

 

SQL> explain plan for select object_type, count(*) from t group by object_type;

Explained

 

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 47235625

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      | 47853 |   514K|   159   (6)| 00:00:02 |

|   1 |  HASH GROUP BY     |      | 47853 |   514K|   159   (6)| 00:00:02 |

|   2 |   TABLE ACCESS FULL| T    | 47853 |   514K|   153   (2)| 00:00:02 |

---------------------------------------------------------------------------

Note

-----

   - dynamic sampling used for this statement

 

13 rows selected

 

 

From the two SEP output, we can find the dynamic sampling applies.

 

The group operation will aggregate result set into their distinct number of rows. But in the above results, we find that Oracle estimate the all rows number (47853) is the group by operation result. To put it another way, Oracle do not know the column distinct value.

 

As for the value 47853, it is obvious that Oracle guess it from the dynamic sampling operation.

 

To think it further, how many result estimated if we group by more than one column.

 

 

 

--Two Columns Group by Operation

SQL> explain plan for select owner, object_type, count(*) from t group by owner, object_type;

Explained

 

SQL> select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 47235625

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      | 47853 |  1308K|   159   (6)| 00:00:02 |

|   1 |  HASH GROUP BY     |      | 47853 |  1308K|   159   (6)| 00:00:02 |

|   2 |   TABLE ACCESS FULL| T    | 47853 |  1308K|   153   (2)| 00:00:02 |

---------------------------------------------------------------------------

Note

-----

   - dynamic sampling used for this statement

 

13 rows selected

 

 

The result is same.

 

Conclusion: In case with statistical data missing, Oracle CBO will not consider the group by operation feature, and it will return the row source value directly from lower level operation.

 

3. Group estimated rows with statistical data (Single Group By)

 

Let’s check the behavior. when table t has its formal statistical data.

 

 

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade => true);

 

PL/SQL procedure successfully completed

 

SQL>  select table_name,num_rows, blocks, sample_size,last_analyzed from dba_tables where table_name='T' and wner='SYS';

 

TABLE_NAME   NUM_ROWS     BLOCKS SAMPLE_SIZE LAST_ANALYZED

---------- ---------- ---------- ----------- -------------

T               49747        683       49747 11-十二月-12

 

 

SQL> select column_name, num_distinct, DENSITY from dba_tab_columns where table_name='T' and wner='SYS';

 

COLUMN_NAME                    NUM_DISTINCT    DENSITY

------------------------------ ------------ ----------

OWNER                                    16     0.0625

OBJECT_NAME                           29772 3.35886067

SUBOBJECT_NAME                           45 0.02222222

OBJECT_ID                             49747 2.01017146

DATA_OBJECT_ID                         3620 0.00027624

OBJECT_TYPE                              31 0.03225806

CREATED                                 851 0.00117508

LAST_DDL_TIME                           850 0.00117647

TIMESTAMP                               862 0.00116009

STATUS                                    1          1

TEMPORARY                                 2        0.5

GENERATED                                 2        0.5

SECONDARY                                 1          1

 

13 rows selected

 

 

The statistical item ‘NUM_DISTINCT’ is the most important value for group operation. The group will aggregate the result set into distinct records. So, column distinct value would affect CBO estimation process.

 

 

 

SQL> explain plan for select owner, count(*) from t group by owner;

Explained

 

SQL>  select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 47235625

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |    16 |    96 |   159   (6)| 00:00:02 |

|   1 |  HASH GROUP BY     |      |    16 |    96 |   159   (6)| 00:00:02 |

|   2 |   TABLE ACCESS FULL| T    | 49747 |   291K|   153   (2)| 00:00:02 |

---------------------------------------------------------------------------

 

9 rows selected

 

SQL> explain plan for select object_type, count(*) from t group by object_type;

Explained

 

SQL>  select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 47235625

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |    31 |   279 |   159   (6)| 00:00:02 |

|   1 |  HASH GROUP BY     |      |    31 |   279 |   159   (6)| 00:00:02 |

|   2 |   TABLE ACCESS FULL| T    | 49747 |   437K|   153   (2)| 00:00:02 |

---------------------------------------------------------------------------

 

9 rows selected

 

 

CBO identifies the column distinct value, and determinate that the column distinct value is the single group operation row returned.

 

Conclusion: When the column statistical data is collected, single column group operation estimate the row returned as column distinct value.

 

4. Group estimated rows with statistical data (Multi-Group By)

 

For more than one column group by operation, things become more complicated and mysterious.

 

First, let’s ponder how CBO would do for this case. Column distinct value is no doubt to be the important factor. But the problem is column correlation. We cannot ensure the result row number will be Descartes (m*n).

 

 

SQL> explain plan for select owner, object_type, count(*) from t group by owner, object_type;

Explained

 

SQL> select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 47235625

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |   351 |  5265 |   159   (6)| 00:00:02 |

|   1 |  HASH GROUP BY     |      |   351 |  5265 |   159   (6)| 00:00:02 |

|   2 |   TABLE ACCESS FULL| T    | 49747 |   728K|   153   (2)| 00:00:02 |

---------------------------------------------------------------------------

 

9 rows selected

 

 

Rows result 351 is not the value of Descartes (16×31=496). What about the true result value. I will use hint “gather_plan_statistics” to do the analysis.

 

 

SQL> select sql_id from v$sqlarea where sql_text like 'select /*+ GATHER_PLAN_STATISTICS */%';

 

SQL_ID

-------------

1yvt4jzv7jhrx

 

SQL> select * from table(dbms_xplan.display_cursor('1yvt4jzv7jhrx',0,'allstats last'));

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

SQL_ID  1yvt4jzv7jhrx, child number 0

-------------------------------------

select /*+ GATHER_PLAN_STATISTICS */owner, object_type, count(*) from t group by

owner, object_type

Plan hash value: 47235625

--------------------------------------------------------------------------------

| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buff

--------------------------------------------------------------------------------

|   1 |  HASH GROUP BY     |      |      1 |    351 |    198 |00:00:00.14 |

|   2 |   TABLE ACCESS FULL| T    |      1 |  49747 |  49747 |00:00:00.11 |

--------------------------------------------------------------------------------

 

14 rows selected

 

 

We guess that there are some internal calculation processes for CBO group operation. And CBO has already considered the column correlation. Let’s try to guess the methods.

 

 

Row Source: 16×31=496

          496/351=1.4131

 

 

I guess the factor is 1.4131. Only I can prove the factor by another test.

 

 

SQL> select column_name, num_distinct, DENSITY, NUM_NULLS from dba_tab_columns where table_name='M' and wner='SYS';

 

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS

------------------------------ ------------ ---------- ----------

OWNER                                    15 0.06666666          0

TABLE_NAME                             1520 0.00065789          0

TABLESPACE_NAME                           3 0.33333333        224

CLUSTER_NAME                             10        0.1       1488

IOT_NAME                                 25       0.04       1499

STATUS                                    1          1          0

PCT_FREE                                  6 0.16666666         50

PCT_USED                                  3 0.33333333        839

INI_TRANS                                 6 0.16666666         50

MAX_TRANS                                 2        0.5         50

 

 

SQL> explain plan for select owner,tablespace_name, count(*) from m group by owner, tablespace_name;

 

Explained

 

SQL> select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 230065229

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |    32 |   416 |    13   (8)| 00:00:01 |

|   1 |  HASH GROUP BY     |      |    32 |   416 |    13   (8)| 00:00:01 |

|   2 |   TABLE ACCESS FULL| M    |  1524 | 19812 |    12   (0)| 00:00:01 |

---------------------------------------------------------------------------

 

9 rows selected

 

     

      15×3=45

      45/32=1.40625

 

 

The factor is very close. So I make the assumption that the factor is 1.41 when we do the two column group by operation.

 

 

To prove my guess, another test:

 

 

SQL> explain plan for select owner,iot_name, count(*) from m group by owner,iot_name;

 

Explained

 

SQL> select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 230065229

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |   266 |  2128 |    13   (8)| 00:00:01 |

|   1 |  HASH GROUP BY     |      |   266 |  2128 |    13   (8)| 00:00:01 |

|   2 |   TABLE ACCESS FULL| M    |  1524 | 12192 |    12   (0)| 00:00:01 |

---------------------------------------------------------------------------

 

9 rows selected

      

       15×25=375

       375/1.41=265.9

 

 

The result is very close to CBO estimation. What if three column group by operation?

 

 

 

SQL> explain plan for select owner,iot_name, tablespace_name, count(*) from m group by owner,tablespace_name,iot_name;

 

Explained

 

SQL> select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 230065229

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |   563 |  8445 |    13   (8)| 00:00:01 |

|   1 |  HASH GROUP BY     |      |   563 |  8445 |    13   (8)| 00:00:01 |

|   2 |   TABLE ACCESS FULL| M    |  1524 | 22860 |    12   (0)| 00:00:01 |

---------------------------------------------------------------------------

 

9 rows selected

 

 

Three columns are grouped in one SQL. The factor will work twice.

 

 

     15×25×3=1125

     1125/1.41/1.41=565.86

 

 

The result is close.

 

 

SQL> explain plan for select owner, object_type, temporary, count(*) from t group by owner, object_type, temporary;

 

Explained

 

SQL> select * from table(dbms_xplan.display);

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 47235625

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |   496 |  8432 |   160   (6)| 00:00:02 |

|   1 |  HASH GROUP BY     |      |   496 |  8432 |   160   (6)| 00:00:02 |

|   2 |   TABLE ACCESS FULL| T    | 49747 |   825K|   154   (2)| 00:00:02 |

---------------------------------------------------------------------------

 

9 rows selected

 

 

           16*31*2=922

           922/1.41/1/41=498

 

 

Although the difference is enlarged, I guess it is allowable for us at present time.

 

 

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/17203031/viewspace-751065/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
求道~

注册时间:2010-11-30

  • 博文量
    545
  • 访问量
    7822241