ITPub Blog


enq: KO - fast object checkpoint

Oracle · Author: deitylee · 2013-12-12 15:53:40


The test database was running a bulk update against a table of 200 million rows, updating 200,000 rows per batch and sleeping 1 second between batches.
A developer then emailed to report that queries against this table never returned, so I logged on to the server to investigate.
First I tried the query
select * from justin t where rownum < 10; sure enough, it did not return, and the session appeared to be hung.
The database wait events showed this SQL waiting on enq: KO - fast object checkpoint.
A quick search turned up a detailed explanation from Tanel Poder:
https://forums.oracle.com/forums/thread.jspa?threadID=855718
You see this event because of the object-level checkpoint which happens when you run direct path reads on a segment (such as a serial direct read full table scan or a parallel execution full segment scan).
Now the question is how much of your session's response time is spent waiting for that event, and whether the gain in query speed due to direct read/parallel execution outweighs the time spent waiting for this checkpoint.
Waiting for KO enqueue - fast object checkpoint means that your session has sent CKPT process a message with instruction to do object level checkpoint and is waiting for a reply.
CKPT in turn asks DBWR process to perform the checkpoint and may wait for response.
So you should check what the DBWR and CKPT processes are doing - whether they're stuck waiting for something (such as a controlfile enqueue), doing I/O, or completely idle.
This you can do using v$session_wait or v$session_event...
You can run this:
select sid,program from v$session where program like '%CKPT%' or program like '%DBW%';
and then run
select sid,event,state,p1,p2,p3,seconds_in_wait from v$session_wait where sid in (select sid from v$session where program like '%CKPT%' or program like '%DBW%');
Please post the output here - if the CKPT or DBWR processes are hung waiting for something, these queries would show it.
If they're not completely hung it may be that they're just very slow (due to bad disk write performance, for example), or there is a rare chance that CKPT somehow missed the message your session sent (or vice versa). But that would be rare...
Running the suggested SQL against this database:
SQL> select sid,program from v$session where program like '%CKPT%' or program like '%DBW%';

       SID PROGRAM
---------- ------------------------------------------------
       253 oracle@localhost.localdomain (CKPT)
      2521 oracle@localhost.localdomain (DBW0)
      2773 oracle@localhost.localdomain (DBW1)

SQL> select sid,event,state,p1,p2,p3,seconds_in_wait from v$session_wait where sid in (select sid from v$session where program like '%CKPT%' or program like '%DBW%');

       SID EVENT                          STATE              P1         P2         P3 SECONDS_IN_WAIT
---------- ------------------------------ ---------- ---------- ---------- ---------- ---------------
       253 rdbms ipc message              WAITING           300          0          0               3
      2521 db file async I/O submit       WAITING           967          0          0               1
      2773 db file async I/O submit       WAITING          1357          0          0               2
CKPT is waiting on rdbms ipc message (an idle wait), while both DBWR processes are waiting on db file async I/O submit.
MetaLink note 1274737.1 describes db file async I/O submit in detail:
According to the Oracle documentation, asynchronous I/O is disabled for normal file systems when FILESYSTEMIO_OPTIONS=NONE. But when DISK_ASYNCH_IO is set to TRUE, the wait event list shows 'db file async I/O submit'.
According to the unpublished Bug 9649885 DB FILE ASYNC I/O SUBMIT EVENT NOT TRACKED WHEN DISK_ASYNCH_IO = TRUE, when DISK_ASYNCH_IO=TRUE,
the wait event 'db file async I/O submit' is posted even if the I/O calls cannot be performed asynchronously, and this is the current behavior.
'db file async I/O submit' should be treated as 'db file parallel write' in previous releases. Changing DISK_ASYNCH_IO to FALSE removes the wait event 'db file async I/O submit'.
Checking the current parameter settings:
SQL> show parameter filesystemio_options

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      NONE
SQL> show parameter disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string
asm_diskstring                       string
disk_asynch_io                       boolean     TRUE
This matches the scenario described in the note (later, after disk_asynch_io was set to false and the instance restarted, the DBWR wait event changed to db file parallel write). But this is only a symptom, not the root cause.
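For reference, the parameter change mentioned above can be sketched as follows (disk_asynch_io and filesystemio_options are both static parameters, so SCOPE=SPFILE and an instance restart are required):

```sql
-- Disable the misleading 'db file async I/O submit' accounting
ALTER SYSTEM SET disk_asynch_io = FALSE SCOPE = SPFILE;
-- Or, alternatively, enable asynchronous I/O properly at the filesystem level:
-- ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
-- Restart the instance, then verify:
SHOW PARAMETER disk_asynch_io
```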
Checking OS performance metrics, starting with vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0  4 519316 101216  40428 31199052    0    0  9920   688 2292 3954  1  0 86 13  0
0  3 519316  93904  40428 31205652    0    0  3880 13856 2958 4425  0  0 92  8  0
0  4 519316  93832  40424 31205632    0    0  5268  8372 2485 3691  1  1 88 10  0
0  4 519316  93888  40436 31205620    0    0  9268   304 2168 3587  1  0 90  9  0
0  4 519316  93944  40456 31205584    0    0  9408   452 2217 3763  1  0 88 11  0
0  4 519316  94880  40448 31204720    0    0  4160  5656 1818 2442  1  0 90  9  0
bi/bo are fairly high, and both the b and wa columns show waits, which indicates the system has an I/O problem (b = number of processes blocked waiting for I/O, normally close to 0; wa = percentage of CPU time spent waiting for I/O).
Checking disk I/O with iostat:
[root@localhost oracle]# iostat -d -x -k 1 5
Linux 2.6.18-128.el5 (localhost.localdomain)    08/31/2011

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   331.68 581.19 43.56  4752.48  1508.91    20.04     3.48    5.71   1.59  99.11
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00   331.68 581.19 43.56  4752.48  1508.91    20.04     3.48    5.71   1.59  99.11

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    83.00 376.00 33.00  3144.00   432.00    17.49     2.97    6.95   2.44 100.00
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00    83.00 376.00 33.00  3144.00   432.00    17.49     2.97    6.95   2.44 100.00
--%util: the percentage of time the device spent servicing I/O requests; if %util approaches 100%, the I/O subsystem is saturated and this disk is likely a bottleneck.

Also, checking the database's redo logs: this database does not run in archivelog mode, and every group except the CURRENT one is ACTIVE;
SQL> select * from V$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------- ------------ ---------
         1          1       3501  524288000        512          1 NO  ACTIVE               145309090 31-AUG-11     145345927 31-AUG-11
         2          1       3502  524288000        512          1 NO  CURRENT              145345927 31-AUG-11   2.8147E+14
         3          1       3496  524288000        512          1 NO  ACTIVE               145154118 31-AUG-11     145187494 31-AUG-11
         4          1       3497  524288000        512          1 NO  ACTIVE               145187494 31-AUG-11     145210653 31-AUG-11
         5          1       3498  524288000        512          1 NO  ACTIVE               145210653 31-AUG-11     145237393 31-AUG-11
         6          1       3499  524288000        512          1 NO  ACTIVE               145237393 31-AUG-11     145271423 31-AUG-11
         7          1       3500  524288000        512          1 NO  ACTIVE               145271423 31-AUG-11     145309090 31-AUG-11
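Since archiving is disabled, the number of ACTIVE groups is a quick indicator of how far the checkpoint (and hence DBWR) is lagging behind LGWR; a small sketch against the same view:

```sql
-- Count redo log groups by status: a pile of ACTIVE groups means
-- DBWR cannot keep up and log switches may eventually stall
SELECT status, COUNT(*) AS groups
FROM   v$log
GROUP  BY status;
```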

At this point we can draw a rough conclusion:
the DML against table justin increased the disk I/O load. A full-table-scan query such as select * from justin issued at the same time (select count(*) from justin does not hit this, since it uses an index scan) triggers the so-called object-level checkpoint:
the dirty blocks for the object must first be flushed from the buffer cache to disk, and under this I/O load that takes a very long time, so the query is blocked. The hanganalyze trace file confirms this as well.

Solution:
Tuning the disks themselves is cumbersome, and this is only a test database, so instead:
Increase db_writer_processes from 2 to 8 to strengthen the rate at which dirty blocks are written to disk;
Increase log_buffer from just over 20 MB to 40 MB and add 3 more logfile groups, to keep redo from making the I/O situation worse;
After restarting the database with these changes and re-testing, the query still waits on enq: KO - fast object checkpoint, but the wait time is greatly reduced, and iostat shows %util averaging around 30-40%.
Additionally, FAST_START_MTTR_TARGET, which had not been set before, was set to 50 along with the other parameter changes, to speed up incremental checkpointing.
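Taken together, the adjustments above could be applied roughly like this (a sketch: the logfile paths and group numbers are placeholders, and db_writer_processes/log_buffer are static parameters that only take effect after a restart):

```sql
ALTER SYSTEM SET db_writer_processes    = 8        SCOPE = SPFILE;  -- was 2
ALTER SYSTEM SET log_buffer             = 41943040 SCOPE = SPFILE;  -- 40 MB
ALTER SYSTEM SET fast_start_mttr_target = 50;  -- dynamic; pushes incremental checkpoints harder
-- Add three more redo log groups (paths and group numbers are placeholders)
ALTER DATABASE ADD LOGFILE GROUP 8  ('/u01/oradata/test/redo08.log') SIZE 500M;
ALTER DATABASE ADD LOGFILE GROUP 9  ('/u01/oradata/test/redo09.log') SIZE 500M;
ALTER DATABASE ADD LOGFILE GROUP 10 ('/u01/oradata/test/redo10.log') SIZE 500M;
```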




This post draws on myownstars' thread on tuning enq: KO - fast object checkpoint:
http://www.itpub.net/thread-1478846-1-1.html



From the ITPUB Blog; link: http://blog.itpub.net/16381228/viewspace-1063083/. If reposting, please credit the source; otherwise legal liability may be pursued.
