ITPub博客

首页 > Linux操作系统 > Linux操作系统 > Pipelined table function statistics and dynamic sampling

Pipelined table function statistics and dynamic sampling

原创 Linux操作系统 作者:chncaesar 时间:2013-09-16 10:43:47 0 删除 编辑
This is an extraction of Adrian Billington's article:
http://www.oracle-developer.net/display.php?id=429

At some point, you might need to join a pipelined function to another rowsource (such as a table, a view, or the intermediate output of other joins within a SQL execution plan). Rowsource statistics (such as cardinality, data distribution, nulls, etc) are critical to achieving efficient execution plans, but in the case of pipelined functions (or indeed any table function), the cost-based optimizer doesn't have much information to work with.

cardinality heuristics for pipelined table functions

Up to and including Oracle Database 11g Release 1, the CBO applies a heuristic cardinality to pipelined and table functions in SQL statements and this can sometimes lead to inefficient execution plans. The default cardinality appears to be dependent on the value of the DB_BLOCK_SIZE initialization parameter, but on a database with a standard 8Kb block size Oracle uses a heuristic of 8,168 rows. I can demonstrate this quite easily with a pipelined function that pipes a subset of columns from the employees table. Using Autotrace in SQL*Plus to generate an execution plan, I see the following.

/* Files on web: cbo_setup.sql and cbo_test.sql */
SQL> SELECT *
  2  FROM   TABLE(pipe_employees) e;

Execution Plan
----------------------------------------------------------
Plan hash value: 1802204150

--------------------------------------------------------------------
| Id  | Operation                         | Name           | Rows  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                |  8168 |
|   1 |  COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES |       |
--------------------------------------------------------------------
This pipelined function actually returns 50,000 rows, so if I join this pipelined function to the departments table, I run the risk of getting a suboptimal plan.

/* File on web: cbo_test.sql */
SQL> SELECT *
  2  FROM   departments           d
  3  ,      TABLE(pipe_employees) e
  4  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 4098497386

----------------------------------------------------------------------
| Id  | Operation                           | Name           | Rows  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |  8168 |
|   1 |  MERGE JOIN                         |                |  8168 |
|   2 |   TABLE ACCESS BY INDEX ROWID       | DEPARTMENTS    |    27 |
|   3 |    INDEX FULL SCAN                  | DEPT_ID_PK     |    27 |
|*  4 |   SORT JOIN                         |                |  8168 |
|   5 |    COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES |       |
----------------------------------------------------------------------
As predicted, this appears to be a suboptimal plan; it is unlikely that a sort-merge join will be more efficient than a hash join in this scenario. So how do I influence the CBO? For this example I could use simple access hints such as LEADING and USE_HASH to effectively override the CBO’s cost-based decision and secure a hash join between the table and pipelined function. However, for more complex SQL statements, it is quite difficult to provide all the hints necessary to “lock down” an execution plan. It is often far better to provide the CBO with better statistics with which to make its decisions. 

Optimizer dynamic sampling
This feature was enhanced in Oracle Database 11g (11.1.0.7) to include sampling for table and pipelined functions;

Dynamic sampling is an extremely useful feature that enables the optimizer to take a small statistics sample of one or more objects in a query during the parse phase. You might use dynamic sampling when you haven’t gathered statistics on all of your tables in a query or when you are using transient objects such as global temporary tables. Starting with version 11.1.0.7, the Oracle database is able to use dynamic sampling for table or pipelined functions.

To see what difference this feature can make, I’ll repeat my previous query but include a DYNAMIC_SAMPLING hint for the pipe_employees function.

/* File on web: cbo_test.sql */
SQL> SELECT /*+ DYNAMIC_SAMPLING(5) */
  2         *
  3  FROM   departments           d
  4  ,      TABLE(pipe_employees) e
  5  WHERE  d.department_id = e.department_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 815920909

---------------------------------------------------------------------
| Id  | Operation                          | Name           | Rows  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                | 50000 |
|*  1 |  HASH JOIN                         |                | 50000 |
|   2 |   TABLE ACCESS FULL                | DEPARTMENTS    |    27 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES |       |
---------------------------------------------------------------------
This time, the CBO has correctly computed the 50,000 rows that my function returns and has generated a more suitable plan. Note that I used the word “computed” and not “estimated” because in version 11.1.0.7 and later, the optimizer takes a 100% sample of the table or pipelined function, regardless of the dynamic sampling level being used (this is also the case in Oracle Database 11g Release 2). I used level 5, but I could have used anything between level 2 and level 10 to get exactly the same result. This means, of course, that dynamic sampling can be potentially costly or time-consuming if it is being used for queries involving high-volume or long-running pipelined functions.

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/638844/viewspace-772753/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2013-07-30

  • 博文量
    102
  • 访问量
    899690