ITPub博客

首页 > Linux操作系统 > Linux操作系统 > ADDM: Automatic Database Diagnostic Monitor

ADDM: Automatic Database Diagnostic Monitor

Linux操作系统 作者:tthero00boo 时间:2013-11-24 17:54:16 0 删除 编辑
from:
http://blog.csdn.net/tianlesoftware/article/details/5630942

 


 ADDM(Automatic Database Diagnostic Monitor) 是植入Oracle数据库的一个自诊断引擎.
 ADDM 通过检查和分析AWR获取的数据来判断Oracle数据库中可能的问题.

 Oracle10g中推出了新的优化诊断工具:数据库自动诊断监视工具(Automatic Database Diagnostic Monitor :ADDM)
 和SQL优化建议工具(SQL Tuning Advisor: STA)。
 这两个工具的结合使用,能使DBA节省大量优化时间,也大大减少了系统宕机的危险。
 简单点说,ADDM就是收集相关的统计数据到自动工作量知识库(Automatic Workload Repository :AWR)中,
 而STA则根据这些数据,给出优化建议。

有了STA以后,它就可以根据ADDM采集到的数据直接给出优化建议,甚至给出优化后的语句。


与其他顾问一样,ADDM也接受来自AWR的统计信息和其他信息。
只要生成快照,MMON进程就会自动运行ADDM(Automatic Database Diagnostic)。

自动生成的ADDM报告总是会包括当前快照与前一个快照之间的时间段,因此在默认情况下可以访问每小时的ADDM报告。
如果希望ADDM报告跨越更长的时间段,那么也可以手动的调用ADDM生成包括任意两个快照之间时间段的报告。
自动快照以及手动收集快照都会触发ADDM。ADDM报告默认在30天后清除。

 

ADDM能发现定位的问题包括:

 操作系统内存页入页出问题
 由于Oracle负载和非Oracle负载导致的CPU瓶颈问题
 导致不同资源负载的Top SQL语句和对象——CPU消耗、IO带宽占用、潜在IO问题、RAC内部通讯繁忙
 按照PLSQL和JAVA执行时间排的Top SQL语句.
 过多地连接 (login/logoff).
 过多硬解析问题——由于shared pool过小、书写问题、绑定大小不适应、解析失败原因引起的。
 过多软解析问题
 索引查询过多导致资源争用.
 由于用户锁导致的过多的等待时间 (通过包dbms_lock加的锁)
 由于DML锁导致的过多等待时间(例如锁住表了)
 由于管道输出导致的过多等待时间(如通过包dbms_pipe.put进行管道输出)
 由于并发更新同一个记录导致的过多等待时间(行级锁等待)
 由于ITL不够导致的过多等待时间(大量的事务操作同一个数据块)
 系统中过多的commit和rollback(logfile sync事件).
 由于磁盘带宽太小和其他潜在问题(如由于logfile太小导致过多的checkpoint,
  MTTR设置问题,过多的undo操作等等)导致的IO性能问题I
 对于DBWR进程写数据块,磁盘IO吞吐量不足
 由于归档进程无法跟上redo日至产生的速度,导致系统变慢
 redo数据文件太小导致的问题
 由于扩展磁盘分配导致的争用
 由于移动一个对象的高水位导致的争用问题
 内存太小问题——SGA Target, PGA, Buffer Cache, Shared Pool
 在一个实例或者一个机群环境中存在频繁读写争用的热块
 在一个实例或者一个机群环境中存在频繁读写争用的热对象
 RAC环境中内部通讯问题
 LMS进程无法跟上导致锁请求阻塞
 在RAC环境中由于阻塞和争用导致的实例倾斜
 RMAN导致的IO和CPU问题
 Streams和AQ问题
 资源管理等待事件

ADDM提供了一个整体的优化方案.基于一段时间内的AWR snapshots(默认一小时一次)可以执行ADDM 分析,
它可以帮我们诊断在这段期间内数据库可能存在的瓶颈.

ADDM分析的目标是减小吞吐量的度量值, 在这里我们将它称为"DB TIME".
DB TIME是一个累积值(数据库服务器处理用户请求所花费的时间).
它包括了等待时间和CPU处理的时间(针对所有活跃的用户进程而言),可以通过查询下面两个视图来获得它的值: 
V$SESS_TIME_MODEL, V$SYS_TIME_MODEL.

AWR收集的数据时放到内存中(share pool),通过一个新的后台进程MMON定期写到磁盘中。
所以10g的share pool要求比以前版本更大,一般推荐比以前大15-20%。

通过减小"DB TIME", 使用同样多的系统资源,数据库服务器可以处理更多的用户请求,也就是提高了吞吐量.
通过ADDM报告的问题是按照DB time排序的.

/* ADDM 分析的结果 */

ADDM 分析的结果以一些"Finding"的样式来表达. 每个"Finding"都属于以下三种类型之一:

1. 问题: 描述了导致数据库性能问题的根源;
2. 征兆: 包含了可能导致其他问题的信息
3. 信息: 报告其他没有问题的模块


/* 设置ADDM */

缺省情况下,ADDM已经被自动启用,通过初始化参数文件中的STATISTICS_LEVEL来控制.
这个参数应该被设置成TYPICAL或者ALL(缺省值是TYPICAL).如果你将这个参数设置成basic,很多Oracle的特性将被屏蔽.

ADDM 对于I/O性能的评估分析在部分程度上依赖于这个DBIO_EXPECTED.
这个参数的含义是读取一个数据块所花费的平均时间(以微秒为单位). Oracle使用的是缺省值(10毫秒=10000微秒),
对于现在流行的硬盘来说, 这是一个比较合适的值.如果你的硬盘比较陈旧,或者你有一个非常好的RAM DISK,请修改这个值.

DBIO_EXPECTED

ADDM analysis of I/O performance partially depends on a single argument, DBIO_
EXPECTED, that describes the expected performance of the I/O subsystem. The
value of DBIO_EXPECTED is the average time it takes to read a single database
block in microseconds. Oracle uses the default value of 10 milliseconds, which is an
appropriate value for most modern hard drives. If your hardware is signi?cantly
different, such as very old hardware or very fast RAM disks, consider using a
different value.


为了决定DBIO_EXPECTED这个参数该怎样去正确地配置,需要完成下面的步骤

1. 基于你的机器的硬件,估量一下读取单个数据库块所花费的平均时间.
注意:这个度量应该针对随机的I/O(包括寻道的时间).传统的值应该属于5000-20000微秒这个区间.

2. 为接下来的ADDM执行设置一个时间参数. 例如:如果估计的值是8000微秒,你应该以SYS的身份执行
下面的过程:

EXECUTE DBMS_ADVISOR.SET_DEFAULT_PARAMETER ('ADDM','DBIO_EXPECTED',8000);

/* 诊断与ADDM相关的问题: */
为了诊断数据库性能问题, ADDM分析可以跨越任意两个snapshots,只要它们满足下面两个条件:
1. 两个快照在创建过程中没有错误并且没有被删除;
2. 两个快照期间数据库不能发生关闭和启动的事件
(同statspack).

/* 运行 addm */
最简单的运行ADDM分析的方法就是运行Enterprise Manager.
另外,也可以手工地执行 $ORACLE_HOME/rdbms/admin/addmrpt.sql以及dbms_advisor包.
这些脚本和包可以被任何用户执行,只要它们被赋予了ADVISOR的角色.

使用addmrpt.sql来运行
和statspack包中的spreport.sql非常相似

 /* 1. 手工创建一个AWR快照 */
SQL> exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

 /* 2. 进行一些高负荷操作 */
ed
:set nonumber

declare
    v_var number;
begin
    for i in 1..10
    loop
        select count(*) into v_var from xujh.small_table ,xujh.big_table;
        dbms_output.put_line('time:'||i||':'||v_var);
    end loop;
end;
/

SQL> /
time:1:10000000
time:2:10000000
time:3:10000000
time:4:10000000
time:5:10000000
time:6:10000000
time:7:10000000
time:8:10000000
time:9:10000000
time:10:10000000

PL/SQL procedure successfully completed.

/* 3. 再创建一个AWR快照 */
SQL> exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

/* 4. 创建一个优化诊断任务并执行 */

先获取到两次快照的ID:

SQL> select snap_id from
  2  (SELECT * FROM dba_hist_snapshot
  3   ORDER BY snap_id desc)
  4   where rownum <=2;

   SNAP_ID
----------
       393
       392

然后创建优化任务,并执行。

DECLARE
    task_name VARCHAR2(30) := 'DEMO_ADDM01';
    task_desc VARCHAR2(30) := 'ADDM Feature Test';
    task_id NUMBER;
BEGIN
    dbms_advisor.create_task('ADDM', task_id, task_name, task_desc, null);
    dbms_advisor.set_task_parameter(task_name, 'START_SNAPSHOT', 392);
    dbms_advisor.set_task_parameter(task_name, 'END_SNAPSHOT', 393);
    dbms_advisor.set_task_parameter(task_name, 'INSTANCE', 1);
    dbms_advisor.set_task_parameter(task_name, 'DB_ID', 2432893466);
    dbms_advisor.execute_task(task_name);
END;
/

SQL> /

PL/SQL procedure successfully completed.

/* 5. 查看优化建议结果 */

通知函数dbms_advisor.get_task_report可以得到优化建议结果

SQL> SET LONG 1000000 PAGESIZE 0 LONGCHUNKSIZE 1000
SQL> COLUMN get_clob FORMAT a80
SQL> SELECT dbms_advisor.get_task_report('DEMO_ADDM01', 'TEXT', 'ALL') FROM DUAL;

/* 4,5 可以直接执行$ORACLE_HOME/rdbms/admin/addmrpt.sql来得到,
这个脚本的执行过程和statspack脚本执行过程类似:
*/

SQL> @?/rdbms/admin/addmrpt.sql

....

Listing the last 3 days of Completed Snapshots

                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
myorcl11     MYORCL11           370 22 Nov 2013 01:16      1

                                371 22 Nov 2013 09:05      1
                                372 22 Nov 2013 10:00      1
                                373 22 Nov 2013 11:00      1
                                374 22 Nov 2013 12:00      1
                                375 22 Nov 2013 12:11      1
                                376 22 Nov 2013 13:00      1
                                377 22 Nov 2013 14:00      1

                                378 23 Nov 2013 13:35      1
                                379 23 Nov 2013 15:00      1

                                380 24 Nov 2013 07:51      1

                                381 24 Nov 2013 09:00      1
                                382 24 Nov 2013 10:00      1
                                383 24 Nov 2013 11:00      1
                                384 24 Nov 2013 12:00      1

                                385 24 Nov 2013 13:00      1

                                386 24 Nov 2013 14:23      1
                                387 24 Nov 2013 15:00      1
                                388 24 Nov 2013 15:53      1
                                389 24 Nov 2013 15:55      1
                                390 24 Nov 2013 16:30      1
                                391 24 Nov 2013 17:00      1
                                392 24 Nov 2013 17:24      1
                                393 24 Nov 2013 17:34      1

 

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 386
Begin Snapshot Id specified: 386

Enter value for end_snap: 393
End   Snapshot Id specified: 393

 

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is addmrpt_1_386_393.txt.  To use this name,
press to continue, otherwise enter an alternative.

Enter value for report_name:

Using the report name addmrpt_1_386_393.txt

$ less addmrpt_1_386_393.txt

/* 如果是RAC环境下,可以执行$ORACLE_HOME/rdbms/admin/addmrpti.sql,这脚本的执行,
会多出要求输入DB ID和instance ID的要求。
*/

 

/* 诊断结果分析 */

我们从上面的建议结果看到了,ADDM Report的结果与Statspack Report的结果大不相同。
Statspack Report的结果给出的都是统计数据、各种事件,然后由DBA根据这些数据给出优化建议,
而ADDM Report的结果包含就已经是给出的优化建议了
 
Summary of Findings
-------------------
   Description                     Active Sessions      Recommendations
                                   Percent of Activity
   ------------------------------  -------------------  ---------------
1  "User I/O" wait Class           .01 | 24.19          0
2  Hard Parse                      .01 | 17.97          0
3  Session Connect and Disconnect  0 | 5.17             1
4  "Scheduler" Wait Class          0 | 4.11             0
5  Commits and Rollbacks           0 | 3.72             1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          Findings and Recommendations
          ----------------------------

Finding 1: "User I/O" wait Class
Impact is .01 active sessions, 24.19% of total activity.
--------------------------------------------------------
Wait class "User I/O" was consuming significant database time.
The throughput of the I/O subsystem was not significantly lower than expected.
The Oracle instance memory (SGA and PGA) was adequately sized.

   No recommendations are available.


Finding 2: Hard Parse
Impact is .01 active sessions, 17.97% of total activity.
--------------------------------------------------------
Hard parsing of SQL statements was consuming significant database time.
Hard parses due to cursor environment mismatch were not consuming significant
database time.
Hard parsing SQL statements that encountered parse errors was not consuming
significant database time.
Hard parses due to literal usage and cursor invalidation were not consuming
significant database time.
The Oracle instance memory (SGA and PGA) was adequately sized.

   No recommendations are available.


Finding 3: Session Connect and Disconnect
Impact is 0 active sessions, 5.17% of total activity.
-----------------------------------------------------
Session connect and disconnect calls were consuming significant database time.

   Recommendation 1: Application Analysis
   Estimated benefit is 0 active sessions, 5.17% of total activity.
   ----------------------------------------------------------------
   Action
      Investigate application logic for possible reduction of connect and
      disconnect calls. For example, you might use a connection pool scheme in
      the middle tier.


Finding 4: "Scheduler" Wait Class
Impact is 0 active sessions, 4.11% of total activity.
-----------------------------------------------------
Wait class "Scheduler" was consuming significant database time.

   No recommendations are available.


Finding 5: Commits and Rollbacks
Impact is 0 active sessions, 3.72% of total activity.
-----------------------------------------------------
Waits on event "log file sync" while performing COMMIT and ROLLBACK operations
were consuming significant database time.

   Recommendation 1: Host Configuration
   Estimated benefit is 0 active sessions, 3.72% of total activity.
   ----------------------------------------------------------------
   Action
      Investigate the possibility of improving the performance of I/O to the
      online redo log files.

   Rationale
      The average size of writes to the online redo log files was 5 K and the
      average time per write was 3 milliseconds.
   Rationale
      The total I/O throughput on redo log files was 6.9 K per second for
      reads and 7.2 K per second for writes.
   Rationale
      The redo log I/O throughput was divided as follows: 0% by RMAN and
      recovery, 50% by Log Writer, 0% by Archiver, 0% by Streams AQ and 49% by
      all other activity.

   Symptoms That Led to the Finding:
   ---------------------------------
      Wait class "Commit" was consuming significant database time.
      Impact is 0 active sessions, 3.72% of total activity.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

          Additional Information
          ----------------------

Miscellaneous Information
-------------------------
Wait class "Application" was not consuming significant database time.
Wait class "Concurrency" was not consuming significant database time.
Wait class "Configuration" was not consuming significant database time.
CPU was not a bottleneck for the instance.
Wait class "Network" was not consuming significant database time.

The database's maintenance windows were active during 100% of the analysis
period.


找到了有问题的SQL后我们就可以用 Oracle SQL Tuning Advisor 工具来优化该SQL...

 


/* 关于DBIO_EXPECTED 读取一个数据块所花费的平均时间(以微秒为单位)
   vmware 乱搞了一通  */

/* test 1 太空闲 iostat */
[root@localhost ~]# iostat -x -k
Linux 2.6.32-358.el6.i686 (localhost.myrh6)     11/24/2013      _i686_  (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.81    0.00    1.48    3.12    0.00   91.59

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               1.00    11.52    3.79    3.61   104.22    60.50    44.51     0.14   18.77   6.21   4.60
sdb               0.00     1.07    0.03    0.06     0.35     4.52   110.25     0.00   48.96   8.73   0.08

/* 只开了一个会话,运行一个大查询...,参考性结果 */

rkB/s = 104.22   blksize = 8k
select 1000*1000/(104.22/8) from dual;

76760.6985223566

/* test 2 非随机 dd */
[root@localhost ~]# /usr/bin/time dd if=/opt/oracle/oradata/myorcl11/system01.dbf of=/dev/null bs=8k       
98561+0 records in
98561+0 records out
807411712 bytes (807 MB) copied, 30.2891 s, 26.7 MB/s
0.00user 2.23system 0:30.33elapsed 7%CPU (0avgtext+0avgdata 3072maxresident)k
1570632inputs+0outputs (0major+228minor)pagefaults 0swaps

[root@localhost ~]# echo $((807411712/1024/8))
98561 -- blocks
[root@localhost ~]# echo $((302891 * 100 / 98561)) --echo 不能算小数,30.2891*1000*1000懒得切计算器..
307

/* test 3 hdparm */
[root@localhost ~]# hdparm -T /dev/sda  --内存缓存速度

/dev/sda:
 Timing cached reads:   7904 MB in  2.00 seconds = 3953.71 MB/sec

[root@localhost ~]# hdparm -t /dev/sda  --真正磁盘速度

/dev/sda:
 Timing buffered disk reads:  110 MB in  3.07 seconds =  35.86 MB/sec

select 1000*1000 /(35.86*1024/8) from dual;

217.86

/* 多次取平均,等到空了再看下...*/


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/28859270/viewspace-777443/,如需转载,请注明出处,否则将追究法律责任。

下一篇: Observing Transaction
请登录后发表评论 登录
全部评论

注册时间:2013-06-30

  • 博文量
    31
  • 访问量
    142548