很多客户的数据库从9i升到10g 后都出现了性能问题，有些甚至比9i时下降超过50%。后来经过分析绝大多数都是和大量SQL使用了错误的执行计划有关。为什么10g 会出现这么严重的执行计划偏差呢？在9i的时代里，我们都自己编写分析脚本来对表和索引进行分析，在使用采样比例上，大家一般都比较保守，基本上都在20%以上。而10G提供了一个令人心动的技术就是自动分析，10g 数据库由于取消了RBO优化模式，缺省的优化器模式变为了ALL_ROWS，因此对分析数据的依赖程度更高，因此Oracle自带了一个标准的分析脚本，并且定期进行分析。不过随着这种新技术的使用，问题就出现了，oracle 的自动分析脚本往往只做很少量的采样，导致大量的表的评估信息出现严重偏差。最终很多客户选择了停止自动采集作业，改为手工分析。
Recommendations for Gathering Optimizer Statistics on 10g
Doc ID: 605439.1 Type: HOWTO
Modified Date : 28-JUL-2009 Status: PUBLISHED
Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4
Oracle Server - Standard Edition - Version: 10.1.0.2 to 10.2.0.4
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4
This Document outline the recommended method to gather a standard set of optimizer statistics for use by the Cost Based Optimizer under Oracle 10g.
IMPORTANT: PLEASE NOTE:
These recommendations apply to the majority of databases.------针对所有主要数据库都适用这些建议
The recommendations aim to generate statistics with as much statistical accuracy as possible. To this end 100% sample sizes are suggested since any reduction in sample size is always a concession to accuracy. It is acknowledged that such 100% samples are potentially time consuming and consideration needs to be made to fit the statistics gathering activities within the existing maintenance window.
Gathering new optimizer statistics should maintain or improve existing execution plans, but it is possible that some queries performance may degrade. Note that from 10gR1 previous copies of statistics are maintained by default for the last 30 days and can be restored in the case of problems
Gathering new optimizer statistics may invalidate cursors in the shared pool so it is prudent to restrict all gathering operations execution to periods of low activity in the database, such as the scheduled maintenance windows.
For very large systems, the gathering of statistics can be a very time consuming and resource intensive activity.In this environment sample sizes need to be carefully controlled to ensure that gathering completes within acceptable timescale and resource constraints and within the maintenance window.In these environments, it is also recommended to utilise change based statistics gathering to avoid re-gathering information unnecessarily
On 10g it is recommended to:
Gather statistics using scheduled statistics gathering scripts. In most cases the default scripts provide an adequate level of sampling taking into account the following recommendations:
Use sample size that is large enough. On 10g support suggests an estimate sample size of 100% (if it is possible for this to fit within the maintenance window), even if that means that statistics are gathered on a reduced frequency. If 100% is not feasible, try using at least an estimate of 30%. Generally, the accuracy of the statistics overall outweighs the day to day changes in most applications. This setting is because the default AUTO_SAMPLE_SIZE uses a very small estimate percentage which can result in poor estimates.
Ensure all objects (tables and indexes) have stats gathered. An easy way to achieve this is to use the CASCADE parameter.
Ensuring that any columns with skewed data distribution have histograms collected, and at sufficient resolution using the METHOD_OPT parameter. Support recommends a conservative and more plan-stable approach of "adding a histogram only if it is known to be needed" rather than collecting column statistics on all columns. This can be achieved manually or by using the SKEWONLY option to automatically add column statistics to columns that contain data with a non-uniform. distribution. Using the default column statistics setting of AUTO which means that DBMS_STATS will decide which columns to add histogram to where it believes that they may help to produce a better plan. If statistics are not completely up to date then the presence of Histograms can cause trouble when parsing values are out of range, or between values for "frequency" histograms. In these circumstances the optimizer has to make guesses which may be inaccurate and, on occasion, cause poor plans.
Note that in earlier versions the default setting for the METHOD_OPT parameter was "FOR ALL COLUMNS SIZE 1" which would collect only a high and a low value and effectively meant that there were no detailed column statistics. It is known that in some cases, the effect of a histogram is adverse to the generation of a better plan so users moving between versions may initially wish to set this parameter to its pre-upgrade release value, and later adjust to the post-upgrade release default value.
f partitions are in use, gather global statistics if possible due to time constraints. Global stats are very important but gathering is often avoided due to the sizes involved and length of time to required. If 100% samples are not possible then support would recommend going for a minimum of 1%. Gathering with small sample sizes (e.g. 0.001, 0.0001, 0.00001 etc. ) can be very effective but equally, a large proportion of the data will not be examined which could prove decisive to the optimizer's plan choices. Note that the available range for the ESTIMATE_PERCENT parameter is a very flexible [0.000001 -> 100] which can use very small sample sizes suitable for huge partitioned tables. Testing will reveal the most suitable settings for each system.
Note 236935.1 Global statistics - An Explanation
Gather system statistics to reflect the CPU loading of the system and to improve the accuracy of the CBO's estimates by providing the CBO with CPU cost estimates in addition to the normal I/O cost estimates. See:
Note 470316.1 Using Actual System Statistics (Collected CPU and IO information Note 149560.1 Collect and Display System Statistics (CPU and IO) for CBO usage Note 153761.1 Scaling the System to Improve CBO optimizer
Note that the defaults for statistics gathering on different versions of Oracle are not necessarily the same, for example:
9i : 100%
10g : DBMS_STATS.AUTO_SAMPLE_SIZE (using very small estimate percentage)
11g : DBMS_STATS.AUTO_SAMPLE_SIZE (using larger estimate percentage - 100%)
9i : "FOR ALL COLUMNS SIZE 1" effectively no detailed column statistics.
10g and 11g : "FOR ALL COLUMNS SIZE AUTO" - This setting means that DBMS_STATS decides which columns to add histogram to where it believes that they may help to produce a better plan.
In 11g, using auto size for ESTIMATE_PERCENT defaults to 100% and therefore is as accurate as possible. Additionally, even though a 100% sample is collected, the gathering process is really fast since a new hashing algorithm is used to compute the statistics rather than sorting (in 9i and 10g the "slow" part was typically the sorting). In 10g the default ESTIMATE_PERCENT sample size was extremely small which often resulted in poor statistics and is therefore not recommended.
来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/665930/viewspace-694434/，如需转载，请注明出处，否则将追究法律责任。