Explanation of the buffer busy waits wait event

Original post. Category: Linux operating system. Author: 听海★蓝心梦. Date: 2009-03-16 17:45:34

Approaches to handling buffer busy waits

http://www.matrix.org.cn/thread. ... 6689&forumId=36

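buffer busy waits are often caused by very frequent inserts, by segments that need to be rebuilt, or by too few rollback segments.

When it occurs:
A block is being read into the buffer cache, or the buffer is in use by another session. The wait occurs when a session
needs a buffer that is held in a non-shareable mode or is still being read into the cache. This value should not exceed 1%.

Solutions:
This wait can usually be reduced in several ways: enlarge the data buffer cache, add freelists, lower pctused, add rollback segments,
increase initrans, consider locally managed tablespaces (LMT), and check whether hot blocks are the cause (if so, a reverse key index or a smaller block size may help).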


P1 = file# (Absolute File# in Oracle8 onwards)
P2 = block#
P3 = id (Reason Code)


Reason codes:
A block is being read
=====  
100       We want to NEW the block but the block is currently being read by another session (most likely for undo).  
200       We want to NEW the block but someone else is using the current copy so we have to wait for them to finish.  
230       Trying to get a buffer in CR/CRX mode , but a modification has started on the buffer that has not yet been completed.  
          -  A modification is happening on a SCUR or XCUR buffer, but has not yet completed  
          (dup.) 231  CR/CRX scan found the CURRENT block, but a modification has started on the buffer that has not yet been completed.  
130       Block is being read by another session and no other suitable block image was found, so we wait until the read is completed. This may also occur after a buffer cache assumed deadlock. The kernel can't get a buffer in a certain amount of time and assumes a deadlock. Therefore it will read the CR version of the block.  
110       We want the CURRENT block either shared or exclusive but the Block is being read into cache by another session, so we have to wait until their read() is completed.  
          (duplicate)  120  We want to get the block in current mode but someone else is currently reading it into the cache. Wait for them to complete the read. This occurs during buffer lookup.  
210       The session wants the block in SCUR or XCUR mode. If this is a buffer exchange or the session is in discrete TX mode, the session waits for the first time and the second time escalates the block as a deadlock and so does not show up as waiting very long. In this case the statistic: "exchange deadlocks" is incremented and we yield the CPU for the "buffer deadlock" wait event.  
          (duplicate)  220  During buffer lookup for a CURRENT copy of a buffer we have found the buffer but someone holds it in an incompatible mode so we have to wait.  
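The reason codes above can be kept in a small lookup table so that P1/P2/P3 values copied from v$session_wait are easier to read. The following is a minimal sketch in Python, not part of the original note: the descriptions are condensed from the list above, and the function name `describe_wait` and the sample values are hypothetical.

```python
# Reason codes (P3) for "buffer busy waits", condensed from the list above.
REASON_CODES = {
    100: "want to NEW the block; another session is reading it (most likely for undo)",
    110: "want CURRENT block (shared or exclusive); another session is reading it into cache",
    120: "want block in current mode during buffer lookup; another session is reading it into cache",
    130: "block is being read by another session; no other suitable block image found",
    200: "want to NEW the block; another session is using the current copy",
    210: "want SCUR/XCUR; buffer exchange or discrete TX mode",
    220: "CURRENT copy found during buffer lookup, but held in an incompatible mode",
    230: "want CR/CRX; a modification on the buffer has not yet completed",
    231: "CR/CRX scan found the CURRENT block; a modification has not yet completed",
}


def describe_wait(p1, p2, p3):
    """Format one wait: P1 = file#, P2 = block#, P3 = reason code."""
    reason = REASON_CODES.get(p3, "unknown reason code")
    return f"file {p1}, block {p2}: [{p3}] {reason}"


if __name__ == "__main__":
    # Hypothetical values, as they might be copied from v$session_wait.
    print(describe_wait(7, 1234, 130))
```

Unknown codes are reported as such rather than raising an error, since the list above may not be complete for every Oracle version.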


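1. Buffer waits by calling location (X$ tables are only visible to SYS and differ between versions):
SELECT kcbwhdes, why0+why1+why2 "Gets", "OTHER_WAIT"
    FROM x$kcbsw s, x$kcbwh w
   WHERE s.indx=w.indx
     AND s."OTHER_WAIT">0
   ORDER BY 3
  ;

2. Buffer wait count per datafile:
SELECT count, file#, name
    FROM x$kcbfwait, v$datafile
   WHERE indx + 1 = file#
   ORDER BY count
  ;

3. Segments with extents in a given file:
SELECT distinct owner, segment_name, segment_type
    FROM dba_extents
   WHERE file_id= &FILE_ID
  ;

4. File, block and reason code for sessions currently waiting:
SELECT p1 "File", p2 "Block", p3 "Reason"
    FROM v$session_wait
   WHERE event='buffer busy waits'
  ;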
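Query 3 lists every segment that has extents in a file, while P2 from query 4 points at a single block. To narrow the result to one segment, the block number can be matched against the extent boundaries (the file_id, block_id and blocks columns of dba_extents). The sketch below shows that range check in Python on rows already fetched from dba_extents; the `Extent` type, the function name and the sample rows are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional


class Extent(NamedTuple):
    owner: str
    segment_name: str
    segment_type: str
    file_id: int
    block_id: int  # first block of the extent
    blocks: int    # number of blocks in the extent


def find_segment(extents: Iterable[Extent], file_id: int, block: int) -> Optional[Extent]:
    """Return the extent that contains (file_id, block), or None if none does."""
    for ext in extents:
        if ext.file_id == file_id and ext.block_id <= block < ext.block_id + ext.blocks:
            return ext
    return None


if __name__ == "__main__":
    # Hypothetical rows standing in for the output of a dba_extents query.
    rows = [
        Extent("APP", "ORDERS", "TABLE", 7, 9, 8),
        Extent("APP", "ORDERS_PK", "INDEX", 7, 17, 8),
    ]
    print(find_segment(rows, 7, 20))
```

The same condition can be written directly in SQL as block_id <= P2 AND P2 < block_id + blocks, which avoids fetching every extent of the file.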
  
  
Detailed explanation of the related solution
================
This document discusses a rare and difficult to diagnose database performance  
problem characterized by extremely high buffer busy waits that occur at  
seemingly random times.  The problem persists even after traditional buffer  
busy wait tuning practices are followed (typically, increasing the number of  
freelists for an object).   
  
SCOPE & APPLICATION
-------------------

This document is intended for support analysts and customers.  It applies to  
both Unix and Windows-based systems, although the examples here will be  
particular to a Unix-based (Solaris) system.

In addition to addressing a specific buffer busy wait performance problem,  
in section II, this document presents various techniques to diagnose and  
resolve this problem by using detailed data from a real-world example.  The  
techniques illustrated here may be used to diagnose other I/O and performance  
problems.



RESOLVING INTENSE AND "RANDOM" BUFFER BUSY WAIT PERFORMANCE PROBLEMS
--------------------------------------------------------------------

This document is composed of two sections: a summary section that broadly  
discusses the problem and its resolution, and a detailed diagnostics section  
that shows how to collect and analyze various database and operating system  
diagnostics related to this problem.  The detailed diagnostics section is  
provided to help educate the reader with techniques that may be useful in
other situations.


I.  Summary
~~~~~~~~~~~

1.  Problem Description
~~~~~~~~~~~~~~~~~~~~~~~

At seemingly random times without regard to overall load on the database,  
the following symptoms may be witnessed:

-        Slow response times on an instance-wide level
-        Long wait times for "buffer busy waits" in Bstat/Estat or Statspack reports
-        Large numbers of sessions waiting on buffer busy waits for a group of  
        objects (identified in v$session_wait)
         
Some tuning effort may have been spent in identifying the segments  
involved in the buffer busy waits and rebuilding those segments with a higher  
number of freelists or freelist groups (from 8.1.6 on one can dynamically add  
process freelists; segments only need to be rebuilt if changing freelist  
groups).  Even after adding freelists, the problem continues and is not  
diminished in any way (although regular, concurrency-based buffer busy waits  
may be reduced).

2.  Problem Diagnosis
~~~~~~~~~~~~~~~~~~~~~
         
     The problem may be diagnosed by observing the following:
         
        - The alert.log file shows many occurrences of ORA-600, ORA-7445 and  
          core dumps during or just before the time of the problem.
         
         
         
        - The core_dump_dest directory contains large core dumps during the  
          time of the problem. There may either be many core dumps or a few  
          very large core dumps (100s of MB per core file), depending on the  
          size of the SGA.
         
          (Check the files under cdump and their sizes.)
         
        - sar -d shows devices that are completely saturated and have high  
          request queues and service times.  These devices and/or their  
          controllers are part of logical volumes used for database files.
          (Disk usage.)
         
         
        - Buffer busy waits, write complete waits, db file parallel writes and  
          enqueue waits are high (in top 5 waits, usually in that order).   
          Note that in environments using Oracle Advanced Replication, the  
          buffer busy waits may at times appear on segments related to  
          replication (DEF$_AQCALL table, TRANORDER index, etc...).
         
         
3.  Problem Resolution
~~~~~~~~~~~~~~~~~~~~~~
         
The cause for the buffer busy waits and other related waits might be a  
saturated disk controller or subsystem impacting the database's ability to read
or write blocks.  The disk/controller may be saturated because of the many  
core dumps occurring simultaneously requiring hundreds of megabytes each.  If  
the alert.log or core_dump_dest directory has no evidence of core dumps, then  
the source of the I/O saturation must be found.  It may be due to non-database  
processes, another database sharing the same filesystems, or a poorly tuned  
I/O subsystem.
         
        The solution is as follows:

                1) Find the root cause for the I/O saturation (core dumps,  
                   another process or database, or poorly performing I/O  
                   subsystem) and resolve it.
        OR,  
                2) If evidence of core dumps is found:
                        -  Find the causes for the core dumps and resolve  
                           them (patch, etc)
                        -  Move the core_dump_dest location to a filesystem  
                           not shared with database files.
                        -  Use the following init.ora parameters to reduce  
                           or avoid the core dumps:
                                shadow_core_dump = partial
                                background_core_dump = partial
                        These core dump parameters can also be set to "none"  
                        but this is not recommended unless the causes for the  
                        core dumps have been identified.

  
B. SAR Diagnostics
~~~~~~~~~~~~~~~~~~

SAR, IOSTAT, or similar tools are critical to diagnosing this problem because  
they show the health of the I/O system during the time of the problem.  The  
SAR data for the example we are looking at is shown below (shown  
using "sar -d -f /var/adm/sa/sa16"):

SunOS prod1 5.6 Generic_105181-23 sun4u    05/16/01

01:00:00 device        %busy   avque   r+w/s  blks/s  avwait  avserv

         sd22            100    72.4    2100    2971     0.0    87.0
         sd22,c            0     0.0       0       0     0.0     0.0
         sd22,d            0     0.0       0       0     0.0     0.0
         sd22,e          100    72.4    2100    2971     0.0    87.0
                                 /\
                                 ||
                extremely high queue values (usually less than 2 during peak)
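When many devices are listed, the saturated ones can be picked out mechanically. The following is a rough Python sketch that parses `sar -d` text in the column layout shown above and flags devices whose %busy and avque are both high. The avque threshold of 2 comes from the remark above; the %busy threshold of 90 and all function names are assumptions for illustration.

```python
FIELDS = ("busy", "avque", "rw_s", "blks_s", "avwait", "avserv")

SAMPLE = """\
01:00:00 device        %busy   avque   r+w/s  blks/s  avwait  avserv

         sd22            100    72.4    2100    2971     0.0    87.0
         sd22,c            0     0.0       0       0     0.0     0.0
         sd22,d            0     0.0       0       0     0.0     0.0
         sd22,e          100    72.4    2100    2971     0.0    87.0
"""


def parse_sar_d(text):
    """Return one dict per device line; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and ":" in tokens[0]:  # leading hh:mm:ss timestamp
            tokens = tokens[1:]
        if len(tokens) != len(FIELDS) + 1:
            continue
        try:
            values = [float(t) for t in tokens[1:]]
        except ValueError:  # column header line
            continue
        row = dict(zip(FIELDS, values))
        row["device"] = tokens[0]
        rows.append(row)
    return rows


def saturated(rows, busy_pct=90.0, avque=2.0):
    """Names of devices that are both busy and queueing (assumed thresholds)."""
    return [r["device"] for r in rows if r["busy"] >= busy_pct and r["avque"] > avque]


if __name__ == "__main__":
    print(saturated(parse_sar_d(SAMPLE)))
```

The column order differs between sar implementations, so `FIELDS` would need adjusting for anything other than the Solaris layout shown here.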

By mapping the sd22 device back to the device number (c3t8d0) and then back to  
the logical volume through to the filesystem (using "df" and Veritas'  
utility /usr/sbin/vxprint), it was determined the filesystem shared the same  
controller (c3) as several database files (among them were the datafiles for  
the SYSTEM tablespace).   

By looking within the filesystems using the aforementioned controller (c3),  
several very large (1.8 GB) core dumps were found in the core_dump_dest  
directory, produced around the time of the problem.


The following lists some key statistics to look at:

Statistic                          Total   per Second    per Trans
----------------------- ---------------- ------------ ------------
consistent changes                43,523         12.1          2.4     Much
free buffer inspected              6,087          1.7          0.3 <== higher
free buffer requested            416,010        115.6         23.1     than
logons cumulative                 15,718          4.4          0.9     normal
physical writes                   24,757          6.9          1.4
write requests                       634          0.2          0.0


iii.  Tablespace I/O Summary   

The average wait times for tablespaces will be dramatically higher.

Tablespace IO Summary for DB: PROD  Instance: PROD  Snaps:    3578 -   3579  
                                                                              
                        Avg Read                  Total Avg Wait
Tablespace        Reads   (ms)        Writes      Waits   (ms)   
----------- ----------- -------- ----------- ---------- --------
BOM            482,368      7.0      18,865      3,161    205.9    very
CONF           157,288      0.6         420        779   9897.3 <= high
CTXINDEX        36,628      0.5           7          4     12.5    very
RBS                613    605.7      23,398      8,253   7694.6 <= high
SYSTEM          18,360      3.6         286         78    745.5
DB_LOW_DATA     16,560      2.6       1,335         14     24.3

For example, if the waits are caused by hot blocks, pctfree can be raised to a large value, trading space for performance.


================
Wait event statistics over a sampling interval
================
CREATE TABLE sinoview.previous_events AS
SELECT SYSDATE timestamp, v$system_event.*
FROM   v$system_event;
EXECUTE dbms_lock.sleep (30);
SELECT   A.event,
         A.total_waits
         - NVL (B.total_waits, 0) total_waits,
         A.time_waited
         - NVL (B.time_waited, 0) time_waited
FROM     v$system_event A, sinoview.previous_events B
WHERE    A.event NOT IN ('client message',
                         'dispatcher timer',
                         'gcs for action',
                         'gcs remote message',
                         'ges remote message',
                         'i/o slave wait',
                         'jobq slave wait',
                         'lock manager wait for remote message',
                         'null event',
                         'parallel query dequeue',
                         'pipe get',
                         'PL/SQL lock timer',
                         'pmon timer',
                         'PX Deq Credit: need buffer',
                         'PX Deq Credit: send blkd',
                         'PX Deq: Execute Reply',
                         'PX Deq: Execution Msg',
                         'PX Deq: Signal ACK',
                         'PX Deq: Table Q Normal',
                         'PX Deque wait',
                         'PX Idle Wait',
                         'queue messages',
                         'rdbms ipc message',
                         'slave wait',
                         'smon timer',
                         'SQL*Net message from client',
                         'SQL*Net message to client',
                         'SQL*Net more data from client',
                         'virtual circuit status',
                         'wakeup time manager' )
AND      B.event (+) = A.event
ORDER BY time_waited;
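The query above subtracts an earlier snapshot of v$system_event from the current values, treating events that were absent from the snapshot as zero (the outer join with NVL). The same delta logic is sketched below in Python for two snapshots that have already been fetched. The dictionary layout, the function name and the sample numbers are hypothetical, and only a few of the idle events listed above are repeated here.

```python
# A few of the idle events excluded by the query above.
IDLE_EVENTS = frozenset({
    "pmon timer",
    "smon timer",
    "rdbms ipc message",
    "SQL*Net message from client",
    "SQL*Net message to client",
})


def wait_deltas(before, after, idle_events=IDLE_EVENTS):
    """Per-event change between two snapshots, sorted by time waited.

    Each snapshot maps event name -> (total_waits, time_waited).
    Events missing from `before` count from zero, like NVL(B.x, 0).
    """
    deltas = []
    for event, (waits, time_waited) in after.items():
        if event in idle_events:
            continue
        prev_waits, prev_time = before.get(event, (0, 0))
        deltas.append((event, waits - prev_waits, time_waited - prev_time))
    deltas.sort(key=lambda d: d[2])  # ascending, as in ORDER BY time_waited
    return deltas


if __name__ == "__main__":
    # Hypothetical snapshot values taken 30 seconds apart.
    before = {"buffer busy waits": (100, 500), "pmon timer": (10, 3000)}
    after = {
        "buffer busy waits": (160, 1400),
        "pmon timer": (20, 6000),
        "write complete waits": (5, 200),
    }
    for row in wait_deltas(before, after):
        print(row)
```

Because the sort is ascending, the events with the largest increase in time waited appear last, which matches the ordering of the SQL version.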

From the ITPUB blog, link: http://blog.itpub.net/751371/viewspace-567557/. If you repost this article, please credit the source; otherwise legal action may be pursued.
