ITPub博客

首页 > Linux操作系统 > Linux操作系统 > How to compare the CBO estimate Row-Source and Actual Row Returned

How to compare the CBO estimate Row-Source and Actual Row Returned

原创 Linux操作系统 作者:realkid4 时间:2012-10-22 15:45:46 0 删除 编辑

 

Compared with the outdated RBO, one significant feature of CBO is that the SEP (SQL Execution Plan) is generated based on Data Object statistic and CBO internal Cost formula. Different SEPs have different CBO Cost calculation result, and CBO would choice the optimal SEP as the actual SEP.

 

Oracle 9i and 10g are the milestones in Oracle CBO history. In 9i, most fundamental cost algorithms are established. And 10g introduces automatic statistic collection job and mature dynamic sampling. The statistic’s missing and outdated problems are solved in some extent.

 

But no matter how high frequency the statistic job runs and how powerful the dynamic sampling are, the outdate statistic problem always happens in many cases. The direct effect of outdate statistic is that Oracle could not estimate nearly correct row source result. Row source result indicates how many data row would be returned, and it would affect the cost result of each SEP steps.

 

In many cases we found that Oracle generate the wrong SEP just because Oracle estimate the wrong row source result. For example, Oracle evaluates a step “Full Table Scan” to a table, and it will return row source value 100 based on the statistic information. But in actual environment, the operation would return only 1 row. Outdated statistic leads to wrong cost calculation result and wrong SEP choice.

 

For our performance tuning work, it is very necessary for us to see the optimizer row source result and actual row result in one SEP result.

 

Before 10g, we can use event 10046 to compare the result. But now, we have some more convenient tools to finish the task.

 

1Data and Environment Preparation

 

We choice Oracle 11R2 as the environment, and use the usual test table as before.

 

 

SQL> select * from v$version;

 

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

PL/SQL Release 11.2.0.1.0 - Production

CORE        11.2.0.1.0         Production

 

 

 

Notes the parameter statistic statistics_level’s value, it would affect the Oracle behavior. during the CBO working.

 

 

SQL> show parameter statistics_level;

 

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

statistics_level                     string      TYPICAL

 

 

We create a test table T and index on column owner.

 

 

SQL> create table t as select * from dba_objects;

Table created

 

SQL> create index idx_t_owner on t(owner);

Index created

 

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade => true);

PL/SQL procedure successfully completed

 

 

The statistic information could be found in Dictionary view.

 

 

SQL> col last_analyzed for a20;

SQL> select num_rows, last_analyzed from dba_tables where wner='SYS' and table_name='T';

 

  NUM_ROWS LAST_ANALYZED

---------- --------------------

     72269 2012/10/11 20:34:39

 

 

SQL> col column_name for a20;

SQL> select column_name, num_distinct, num_nulls, HISTOGRAM from dba_tab_col_statistics where wner='SYS' and table_name='T';

 

COLUMN_NAME          NUM_DISTINCT  NUM_NULLS HISTOGRAM

-------------------- ------------ ---------- ---------------

OWNER                          24          0 NONE

OBJECT_NAME                 43676          0 NONE

SUBOBJECT_NAME                110      71941 NONE

OBJECT_ID                   72269          0 NONE

DATA_OBJECT_ID               7485      64744 NONE

……

 

15 rows selected

 

 

Notes: Although column owner is skew, but there is not histogram for this column.

 

 

2. Usage of hint “GATHER_PLAN_STATISTICS”

 

In default CBO configuration cases, we could use hint “GATHER_PLAY_STATISTICS” plus dbms_xplan’s format parameter to compare the estimated row source and actual row source for a CBO SEP.

 

 

SQL> alter system flush shared_pool;

 

System altered

 

SQL> set pagesize 1000;

SQL> select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SCOTT';

 

  COUNT(*)

----------

         6

 

 

The package dbms_xplan is widely used in the SQL execution plan research. It can help us to extract SEP information from shared pool or PLAN_TABLE, and then organize the raw data into a more readable format.

 

The format parameter in display_cursor procedure is used to format the output result. In 10g, Oracle introduces format value “allstats last” to show Estimated and Actual row result in the same SEP report.

 

Notes: In normal CBO working model, hint “GATHER_PALN_STATISTICS” plus “allstats last” value can help to generate the desirable report.

 

 

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------

SQL_ID  dgzx9p378gpc3, child number 1

-------------------------------------

select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SCOTT'

Plan hash value: 1232703844

 

-------------------------------------------------------------------------------------------

| Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |

-------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |             |      1 |        |      1 |00:00:00.01 |       2 |

|   1 |  SORT AGGREGATE   |             |      1 |      1 |      1 |00:00:00.01 |       2 |

|*  2 |   INDEX RANGE SCAN| IDX_T_OWNER |      1 |   3011 |      6 |00:00:00.01 |       2 |

-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("OWNER"='SCOTT')

 

已选择19行。

 

 

Two columns are important in the output report: E-Rows and A-Rows. E-Rows refers to Estimated Rows returned, and it means how many rows could return calculated by CBO based on statistic.

 

The column A-Rows reflects the true row Oracle return during the SQL fetch process. It shows the real effect and performance of your SEP.

 

Take the wner=’SCOTT’ select statement for example. One signification point is that the gap in operation “Index Range Scan”. The estimated row source value is 3011, but the actual result is only 6. How does CBO calculate the value 3011.

 

In dictionary information, we find that there is not histogram on column owner. The row number for table t is 72269, and distinct number value is 24. In that case, Oracle will assume the value distribution is average. So when CBO conforms equal mark condition, it will return (72269/24)=3011.

 

But the actual case is different, wner=’SCOTT’ returns only 6 rows. So the gap can be found in output report.

 

The same case is shown below:

 

 

SQL> select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SYS';

 

  COUNT(*)

----------

     30918

 

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------

SQL_ID  2gy6rmm3jazff, child number 0

-------------------------------------

select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SYS'

 

Plan hash value: 1232703844

 

-------------------------------------------------------------------------------------------

| Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |

-------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |             |      1 |        |      1 |00:00:00.01 |      67 |

|   1 |  SORT AGGREGATE   |             |      1 |      1 |      1 |00:00:00.01 |      67 |

|*  2 |   INDEX RANGE SCAN| IDX_T_OWNER |      1 |   3011 |  30918 |00:00:00.10 |      67 |

-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("OWNER"='SYS')

 

已选择19行。

 

 

Obviously, the wner=’SYS’ returns more record in actual case, but the estimated value is still 3011.

 

The best method to deal with skew column is using histogram.

 

 

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade => true);

 

PL/SQL procedure successfully completed

 

SQL> alter system flush shared_pool;

System altered

 

SQL> select column_name, num_distinct, num_nulls, HISTOGRAM from dba_tab_col_statistics where wner='SYS' and table_name='T';

 

COLUMN_NAME          NUM_DISTINCT  NUM_NULLS HISTOGRAM

-------------------- ------------ ---------- ---------------

OWNER                          24          0 FREQUENCY

OBJECT_NAME                 43676          0 NONE

SUBOBJECT_NAME                110      71941 NONE

 

 

Histogram is generated on column owner. Let’s see its effect.

 

 

SQL> select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SCOTT';

 

  COUNT(*)

----------

         6

 

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------

SQL_ID  dgzx9p378gpc3, child number 0

-------------------------------------

select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SCOTT'

 

Plan hash value: 1232703844

-------------------------------------------------------------------------------------------

| Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |

-------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |             |      1 |        |      1 |00:00:00.01 |       2 |

|   1 |  SORT AGGREGATE   |             |      1 |      1 |      1 |00:00:00.01 |       2 |

|*  2 |   INDEX RANGE SCAN| IDX_T_OWNER |      1 |     13 |      6 |00:00:00.01 |       2 |

-------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("OWNER"='SCOTT')

已选择19行。

 

SQL> select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SYS';

  COUNT(*)

----------

     30918

 

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------

SQL_ID  2gy6rmm3jazff, child number 0

-------------------------------------

select /*+ GATHER_PLAN_STATISTICS */ count(*) from t where wner='SYS'

 

Plan hash value: 3398067621

 

-----------------------------------------------------------------------------------------------

| Id  | Operation             | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |

-----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |             |      1 |        |      1 |00:00:00.04 |     178 |

|   1 |  SORT AGGREGATE       |             |      1 |      1 |      1 |00:00:00.04 |     178 |

|*  2 |   INDEX FAST FULL SCAN| IDX_T_OWNER |      1 |  31302 |  30918 |00:00:00.28 |     178 |

-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter("OWNER"='SYS')

已选择19行。

 

 

The gap between E-Rows and A-Rows is narrow. It would lead CBO to calculate the more precise cost result and finally generate better SEP.

 

3. Alternative Operation

 

 

Hint “GATHER_PLAN_STATISTICS” make CBO collect more statistics information during the SQL execution. It just likes a switch to on or off the function.

 

By default, Oracle would not turn on the function. We can have one method to open the function.

 

Parameter statistics_level’s default value is typical. If we configure the value to all, Oracle could collect all statistic during its running.

 

 

SQL> alter system flush shared_pool;

 

系统已更改。

 

SQL> alter session set statistics_level='ALL';

 

会话已更改。

 

 

After the configuration, we do not need to use the hint to affect default behavior.

 

 

SQL> select count(*) from t where wner='SYS';

 

  COUNT(*)

----------

     30918

 

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

 

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------

SQL_ID  1m79xbmjjavag, child number 0

-------------------------------------

select count(*) from t where wner='SYS'

 

Plan hash value: 3398067621

 

-----------------------------------------------------------------------------------------------

| Id  | Operation             | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |

-----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |             |      1 |        |      1 |00:00:00.29 |     178 |

|   1 |  SORT AGGREGATE       |             |      1 |      1 |      1 |00:00:00.29 |     178 |

|*  2 |   INDEX FAST FULL SCAN| IDX_T_OWNER |      1 |  31302 |  30918 |00:00:00.15 |     178 |

-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter("OWNER"='SYS')

已选择19行。

 

 

 

4. Conclusion

 

The hint gather_plan_statistics plan format parameter in package dbms_xplan can be a help tools when we do the SQL tuning and optimizer internal research.

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/17203031/viewspace-746946/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
求道~

注册时间:2010-11-30

  • 博文量
    545
  • 访问量
    7814737