ITPub博客

首页 > 数据库 > Oracle > Metlink:Truncates Taking Too Long...

Metlink:Truncates Taking Too Long...

Oracle 作者:yyp2009 时间:2014-03-07 16:59:35 0 删除 编辑

Applies to:

Oracle Database - Enterprise Edition - Version 9.2.0.4 and later
Information  in this document applies to any platform.
***Checked for relevance on 03-Nov-2011***


Symptoms

Concurrent truncate on small tables are taking a very long time to be completed. When the truncate is executed individually, they are completed fast. The problem is manifested as long waits for "local write time" but the number of blocks written is really small:

10046 TRACE FILE
    ---------------------------
    TRUNCATE TABLE SYSADM.PS_JCF_BAL_TAO4^M
   
   
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        0      0.00       0.00          0          0          0           0
    Execute      1      2.15     968.81         39        515         57           0
    Fetch        0      0.00       0.00          0          0          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        1      2.15     968.81         39        515         57           0
   
   
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      local write wait                              367        4.52        371.24
      db file sequential read                        11        0.02          0.17
      enqueue                                        55        3.07        154.79
      rdbms ipc reply                                98        2.09        185.63
      SQL*Net message to client                       1        0.00          0.00
      SQL*Net message from client                     1        0.00          0.00

   The trace files shows that basically the Avg. Wait Time for "enqueue -> CI" and "local write time" is sky
    rocketing when the truncates are taking place.
   
    The Avg. Wait time for local write time went from 3ms to 1002ms (that is huge),
    and also there most of the waits timed out (meaning that it had to wait very long).
   
    The same happened for the enqueue, it went from 15ms to 2800ms with many timeouts, even when there
    wre not too many requests it waited a lot. Also notice that the number of physical reads and
    writes did not increment dramatically to justify just an I/O issue caused by additional workload.

Cause

Processes that involve temporary tables being truncated and repopulated in multiple, concurrent batch streams may present this situation.
The underlying problem is we have to write the object's dirty buffers to disk prior to actually truncating or dropping the object. This ensures instance recoverability and avoids a stuck recovery. It seems at first glance perfectly reasonable to simply truncate a temporary table, then repopulate for another usage. And then to do the temporary poplulate/truncate
operations in concurrent batches to increase throughput. However, in reality the concurrent truncates get bogged down as dbwr gets busy flushing those dirty block buffers from the buffer cache. You will see huge CI enqueue
waits. The multiple truncate operations in concurrent streams absolutely kill throughput.This is specially critical with large buffers.

There was also a disscussion in Bug: 4147840 (non-publish) where a peoplesoft process was causing this behaviour
because of the above explanation and they seemed to fix it by changing some peoplesoft code
to implement delete rather than truncate on samll temporary tables.

Solution

This is a recommendation directly from the PS/Oracle 9i tuning guide.
   
    Create the temporary tables on a tablespace that has different Oracle block
    size than the rest of the tablespaces. By doing this, the temporary tables will
    be placed on a different buffer pool, which will improve truncate time. Having
    a separate buffer pool for temporary tables will also reduce RO enqueue
    contention when multiple AE jobs are running in parallel and truncating temp
    tables.

As explained before the largest amount of time is due to
the scanning of the whole buffer cache:

"In 9i, the foreground process first acquires the RO enqueue in exclusive mode s
o that an object can be flushed out of buffer cache. Then CI enqueue is held so
that cross instance calls (CIC) can be issued to background processes. The CKPT
process executes the CIC by scanning the whole buffer cache for the candidate bl
ocks and moves the dirty blocks to a special list so that the DBWR processes can
write them out. This CIC completes after all the blocks have been either writte
n out or invalidated. The RO enqueue is then released by the foreground so that
another session can proceed with its drop or truncate operation.

There are two drawbacks with this process in 9i. First, the CICs are serialized
on the RO enqueue. So concurrent drops or truncates are executed one after
another. Second, the whole cache is scanned to find the candidate blocks in the
cache. This is very expensive with large cache size."

So, if the blocks to be scanned are in a separate buffer (which we can assume it will be fairly
small because it will only hold small temporary table) it is likely that it will take a short
period of time.

In 9.2.0.5 and higher, it may also help to make sure a "temp" table
that is frequently truncated have storage defined so that it occupies
one extent. But this workaround is only available as long as the extent
is no more than 50% the size of the buffer cache. In non-RAC environments
the table still has to be smaller than 50% of the buffer cache, but it
allows the table to have up to 5 extents before falling back to the old
algorithm.

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/13750068/viewspace-1103233/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2008-10-17

  • 博文量
    330
  • 访问量
    1030063