ITPub博客

首页 > 数据库 > Oracle > Metlink:Troubleshooting:WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!

Metlink:Troubleshooting:WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!

原创 Oracle 作者:yyp2009 时间:2014-01-24 10:52:19 0 删除 编辑

Applies to:

Oracle Database - Personal Edition - Version 8.0.6.0 and later
Oracle Database - Standard Edition - Version 8.0.6.0 and later
Oracle Database - Enterprise Edition - Version 8.0.6.0 and later
Information in this document applies to any platform.

Purpose

The purpose of the note is to help troubleshoot reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "

Troubleshooting Steps

What is a Row Cache Enqueue Lock?

The Row Cache or Data Dictionary Cache is a memory area in the shared pool that holds data dictionary information. Row cache holds data as rows instead of buffers.A Row cache enqueue lock is a lock on the data dictionary rows. The enqueue will be on a specific data dictionary object. This is called the enqueue type and can be found in the v$rowcache view

For a list of row cache types per version see:

Document 468334.1  How to Match a Row Cache Object Child Latch to its Row Cache

What is the meaning of the warning WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK?

This wait event is used when we are trying to acquire a lock on a rowcache entry.
When Row cache contention occurs, if the enqueue cannot be obtained within a certain predetermined time period, a trace file will be generated in the user_dump_dest or background_dump_dest depending on whether a user or background process created the trace file. The alert log is usually updated accordingly with the warning and the location of the trace file.
The database detects that a key resource is being held for too long and flags this up to the administrator so that this situation can be resolved.This may also be accompanied by database hang or slowdown.
The message in the alert.log and the trace file generated will tend to contain  the message:

> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<

If the rowcache entry lock cannot be granted immediately then we enter a loop where we free the row cache objects latch, wait (using the above event), re-acquire the latch and then again  try to acquire the rowcache lock. In single-instance mode, this is repeated for up to 1000 times before the process reports WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK message. Under RAC  it repeats until we fail to acquire the instance lock or we are interrupted.

The systemstate dumped can provide some useful information for diagnosing the cause of the contention.

Note: The "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message is raised when a threshold is reached and so if that threshold is NOT reached it will not be raised. This means that less severe issues, with identical causes, may not output the message.

 

Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"

SGA Shrink/Resize Operations

When the SGA is dynamically resized, varous latches need to be held to prevent changes from being made while the operation completes. If the resize takes a while or is happening frequently you can see "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" occurring. The key identifiers for this is high waits for 'SGA: allocation forcing component growth' or similar waits at the top of waits in AWR and the blocking session for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" waiting for waiting for 'SGA: allocation forcing component growth' (or similar). There are a couple of fixes available, see:

Document 7189722.8 Bug 7189722 - Frequent grow/shrink SGA resize operations
Document 9267837.8 Bug 9267837 - Auto-SGA policy may see larger resizes than needed

Issues by Row Cache Enqueue Type

For each enqueue type, there are a limited number of operations that require each enqueue. The enqueue type therefore may give an indication as the type of operation that may be causing the issue. As such some common reasons are outlined below:

DC_TABLESPACES

Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.

DC_SEQUENCES

Check for appropriate caching of sequences for the application requirements.

Document 853652.1 RAC and Sequences
Document 395314.1 RAC Hangs due to small cache size on SYS.AUDSES$ - fixed in 10.2.0.3
Document 6027068.8 Bug 6027068 - Contention on ORA_TQ_BASE sequence -fixed in 10.2.0.5 and 11.2.0.1
DC_USERS

Deadlock and resulting "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.

Document 4604972.8 Bug 4604972 - Deadlock on dc_users by Concurrent Grant/Revoke - fixed in 11.1.0.6
Document 6143420.8 Bug 6143420 - Deadlock involving "ROW CACHE LOCK" on dc_users AND "CURSOR: PIN S WAIT ON X"- fixed in 10.2.0.5 and 11.1.0.6  DC_OBJECTS
Document 12772404.8 Bug 12772404 - Significant "row cache objects" latch contention when using VPD
DC_OBJECT_IDS
Document 11693365.8 Bug 11693365 - Concurrent Drop table and Select on Reference constraint table hangs(deadlock) - fixed in 12.1
DC_SEGMENTS

This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.

DC_ROLLBACK_SEGMENTS

This is due to rollback segment allocation. Just like dc_segments,identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required.

DC_TABLE_SCNS
Document 5756769.8 Bug 5756769 - Deadlock between Create MVIEW and DML - fixed in 10.2.0.5 ,11.1.07 and 11.2.0.1
DC_AWR_CONTROL

This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these.

 RAC Specific Bugs

Document 6004916.8 Bug 6004916 - Hang involving row cache enqueues in RAC (ORA-4021) - fixed in 102.0.5 and 11.1.0.6
Document 8666117.8 Bug 8666117 - High row cache latch contention in RAC - fixed in 11.2.0.2 and 12.1
Document 9866045.8 Bug 9866045 - Long wait on 'wait for master scn' in LCK causing long row cache lock waits - fixed in 12.1


What information can be gathered to help identify the cause?

Systemstate dump

A systemstate dump is automatically generated by Oracle when the issue occurs and listed in the alert.log

Wed Sep 21 13:39:19 2011
> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
System State dumped to trace file /oracle/diag/rdbms/..../.trc

 

AWR, ADDM and ASH Reports

Run AWR and ASH reports for the time when the problem is reported as well as a report leading up to the problem as these can sometimes help build a picture of when a problem actually started. The AWR ,ADDM and ASH reports can compliment each other in getting a complete picture.

Depending on the interval used for generating AWR snapshot, get a report for the smallest time frame available. The default for snapshots is an hourly interval.

SQL>@$ORACLE_HOME/rdbms/admin/awrrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/addmrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/ashrpt.sql


As systemstate is complicated to analyze, it is recommended to create a Service Request and upload the alert.log, systemstate dump and AWR reports preceding and during the problem to support.


How to interpret the information obtained

Systemstate dump

Often the wait for a Row cache enqueue is the culmination of a chain of events and the lock being held is a symptom of another issue where a process holding the requested row cache enqueue is being blocked by other processes. ie it is often a symptom, not the cause.

Systemstate dumps can help to find which row cache is being requested and may help find the blocking process.

Example 1:

 

Oracle process number: 77
Unix process pid: 10846, image: oracle@cpdb4532

*** 2011-05-13 08:08:58.775
*** SERVICE NAME:(ALFCMR_SERVICE) 2011-05-13 08:08:58.775
*** SESSION ID:(1076.796) 2011-05-13 08:08:58.775
> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<
row cache enqueue: session: 0x1df57ade8, mode: N, request: S


The header of the trace shows:

  • the Oracle process number (pid) of the process waiting for the row cache enqueue lock (in this case Process 77)
  • the mode row cache enqueue is being requested in (request: S)

So, in the example above Process 77 is waiting for a row cache needed in shared mode (request: S) .

Systemstate contains process state information for every process in the database so looking for this process in the systemstate:

PROCESS 77
----------------------------------------
.
.
----------------------------------------
SO: 0x1cdf11958, type: 50, owner: 0x17d198288, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=0x1df57ade8 object=0x1dc9a5d30, request=S
savepoint=0x87b70d
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)
.
.


 Above we see that PROCESS 77 is requesting row cache dc_users in shared mode.
If process 77 is waiting that implies it is being blocked by another process so we now need to search the systemstate to determine what is holding the resource and blocking the process.

It is best to search for the object reference which in this  case is object=0x1dc9a5d30.
If this is done we find that Process 218 is requesting this object in eXclusive:

PROCESS 218:
----------------------------------------
.
.
SO: 0x1cdf118f8, type: 50, owner: 0x1ceb0f178, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=0x1da54cf68 object=0x1dc9a5d30, request=X
savepoint=0x11e
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)


A request in Exclusive mode will result in any requests in shared mode to wait behind it until the process is granted the request in exclusive mode and has released it. Thus this will block. Note this is a request for an exclusive not an exclusive hold, so something must be blocking this process. Looking at the other processes, we see that Process 164 has the object held in mode=S

PROCESS 164:
----------------------------------------
.
.
O/S info: user: u1m, term: , ospid: 1234, machine: cpc44711
program:
last wait for 'SQL*Net message from client' blocking sess=0x(nil) seq=36289 wait_time=6943 seconds since wait started=2539
driver id=54435000, #bytes=1, =0
.
.
SO: 0x1cdf11418, type: 50, owner: 0x1ccc26120, flag: INIT/-/-/0x00
row cache enqueue: count=2 session=0x1df578318 object=0x1dc9a5d30, mode=S
savepoint=0xb1bd8e
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)
hash=fc968070 typ=11 transaction=(nil) flags=00000002
own=0x1dc9a5e00[0x1cdf11448,0x1cdf11448] wat=0x1dc9a5e10[0x1cdf11928,0x17d5192e0] mode=S


So, Process 164 is holding the row cache enqueue in Shared mode (mode=S) and thus preventing process 218 from obtaining the row cache enqueue in eXclusive. Furthermore we see that the process 164 is on CPU (the systemstate shows last wait for 'SQL*Net message from client' and not waiting for 'SQL*Net message from client'). To further diagnose support would need to check the stack calls included in the dump to determine why this session was on CPU and holding this enqueue for so long (based on 'seconds since wait started=2539').

Example 2:


In this example PROCESS 18 (MMON) was waiting for row cache type dc_awr_control in SHARED mode.

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /opt/oracle10/product/10.2.0
System name: SunOS
Node name: saecopt51
Release: 5.10
Version: Generic_144488-04
Machine: sun4v
Instance name: PORT_V16
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 6196, image: oracle@sae (MMON)
.
.

PROCESS 18:
----------------------------------------
.
.
last wait for 'ksdxexeotherwait' wait_time=0.000013 sec, seconds since wait started=6
.
.
SO: 39bf1f0e8, type: 50, owner: 3980783a0, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=3be37ea80 object=39a79f090, request=S
savepoint=0x41f0ae
row cache parent object: address=39a79f090 cid=22(dc_awr_control)
hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a
own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X
.
.


The row cache lock for this object (object=39a79f090)  is being held by PROCESS 269 in exclusive mode (mode=X). The process is waiting for 'SGA: allocation forcing component growth' .

PROCESS 269:
----------------------------------------
.
.

waiting for 'SGA: allocation forcing component growth' wait_time=0, seconds since wait started=3

.
.
SO: 39bf1f148, type: 50, owner: 3bc39f560, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=3be1b7c98 object=39a79f090, mode=X
savepoint=0x41efe8
row cache parent object: address=39a79f090 cid=22(dc_awr_control)
hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a
own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X
.
.


Thus the root cause in this case is the resizing of SGA and the wait on row cache is secondary result of this.
If we use the AWR report for the period to correlate this information:

AWR Report

Top 5 Timed Events                                         Avg %Total               
~~~~~~~~~~~~~~~~~~                                        wait   Call               
Event                                 Waits    Time (s)   (ms)   Time Wait Class    
------------------------------ ------------ ----------- ------ ------ ----------    
SGA: allocation forcing compon   42,067,317      38,469      1    7.6      Other    
CPU time                                          2,796           0.6               
db file sequential read             132,906         929      7    0.2   User I/O    
latch free                        4,282,858         704      0    0.1      Other    
log file switch (checkpoint in          904         560    620    0.1 Configurat    
          -------------------------------------------------------------       


We can see clearly in the Top 5 Timed Events that there is significant waiting for this event across the system and 'SGA: allocation forcing component growth' is a major issue at this time.
The root cause of the "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message is the resizing activity. The top 5 events do not even show waits for the 'row cache' symptom.

NOTE:
If the re-sizing activity is less severe, you may not get a "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message - primarily because is a threshold before this is raised. You may however see waits for other events. One common example is outlined in:

Document 742599.1 High 'cursor: pin S wait on X' and/or 'library cache lock' Waits Generated by Frequent Shared Pool/Buffer Cache Resize Activity

For Frequent resizing, there are a couple of potential fixes available, see:

Document 7189722.8 Bug 7189722 - Frequent grow/shrink SGA resize operations
Document 9267837.8 Bug 9267837 - Auto-SGA policy may see larger resizes than needed

Possible Issue in Pre-10g versions

Prior to 10g there was a limitation to detect deadlock at the row cache level. Possible workarounds to minimize the possibility of a deadlock occurring are:

  • set TIMED_STATISTICS=FALSE
  • set _row_cache_cursors=20 or more (10 default)
  • don't do any kind of tracing
Document 30802.1 Init.ora Parameter "ROW_CACHE_CURSORS" Reference Note

Troubleshooting Other Issues

For guidance troubleshooting other performance issues see:

Document 1377446.1 Troubleshooting Performance Issues

 

 

Bug 12772404 - Significant "row cache objects" latch contention when using VPD (Doc ID 12772404.8)

References

BUG:11693365 - GETTING ERROR 'WAITED TOOL LONG FOR ROW CACHE ENQUEUE LOCK'

BUG:5756769 - ROW CACHE DEADLOCK "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"



BUG:8666117 - LCK0 PROCESS STUCK AT WAITING FOR "LATCH: ROW CACHE OBJECTS"

NOTE:11693365.8 - Bug 11693365 - Concurrent Drop table and Select on Reference constraint table hangs (deadlock)
NOTE:1377446.1 - * Troubleshooting Performance Issues
NOTE:30802.1 - Init.ora Parameter "ROW_CACHE_CURSORS" Reference Note
NOTE:395314.1 - RAC Hangs due to small cache size on SYS.AUDSES$
NOTE:4604972.8 - Bug 4604972 - Deadlock on dc_users by Concurrent Grant/Revoke
NOTE:468334.1 - How to Match a Row Cache Object Child Latch to its Row Cache
NOTE:5756769.8 - Bug 5756769 - Deadlock between Create MVIEW and DML
NOTE:6004916.8 - Bug 6004916 - Hang involving row cache enqueues in RAC (ORA-4021)
NOTE:6027068.8 - Bug 6027068 - Contention on ORA_TQ_BASE$ sequence
NOTE:6143420.8 - Bug 6143420 - Deadlock involving "ROW CACHE LOCK" on dc_users AND "CURSOR: PIN S WAIT ON X"
NOTE:742599.1 - High 'cursor: pin S wait on X' and/or 'library cache lock' Waits. Cause: Shared Pool/Buffer Cache Resize Activity
NOTE:853652.1 - RAC and Sequences
NOTE:8666117.8 - Bug 8666117 - High row cache latch contention in RAC
NOTE:9866045.8 - Bug 9866045 - Long wait on 'wait for master scn' in LCK causing long row cache lock waits

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/13750068/viewspace-1074859/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2008-10-17

  • 博文量
    330
  • 访问量
    1028939