Oracle Database - Personal Edition - Version 10.1.0.2 and later
Oracle Database - Enterprise Edition - Version 10.1.0.2 and later
Oracle Database - Standard Edition - Version 10.1.0.2 and later
Information in this document applies to any platform.
***Checked for relevance on 14-Nov-2013***
Diagnosing the cause of a database hang is a complex task. It can involve sifting through large volumes of trace data, which then need to be correlated to determine a cause. Because of this, it may not be feasible to resolve these issues through a document such as this one. However, this document can advise on collecting first-stage diagnostic information and related supporting information.
This document supersedes:
Troubleshooting Oracle Database Hanging Issues for versions from 7 to 9--Exhaustive
Priority: Immediate Resolution
A database 'hang' situation is a rare occurrence and can have many diverse causes, but a true hang can be devastating to the operation of a system. The priority is usually to get the system working as quickly as possible.
Often this means killing processes or restarting the system, which has the unfortunate side effect of removing all the evidence needed to diagnose the cause of the issue.
Note that a hang and a spin can often be indistinguishable, since both display the same symptom: the system locks up and becomes inoperable. Hangs and spins tend to be resolved in different ways, but the initial diagnostics are similar. To distinguish between the two, see the article 'No Response from the Server, Does it Hang or Spin?' listed at the end of this document.
Diagnostic Data Collection
If possible, collect the following during the hang, before taking any emergency resolution action.
The basic information needed to diagnose a hang is:
- Hanganalyze and systemstate dumps: collected automatically, manually or possibly via HANGFG
- Snapshots of general database performance: typically AWR reports
- Some background system information: typically collected via RDA
Starting from 11g release 1, the dia0 background process collects hanganalyze information and stores it in memory in the "hang analysis cache". It does this every 3 seconds for local hanganalyze information and every 10 seconds for global (RAC) hanganalyze information. This information can provide a quick view of the hang chains present at the time a hang is experienced.
Guidance for General Scenarios
In general, diagnosing the cause of a hang means establishing which process or processes are holding a resource that is blocking the others. This can be established in many ways; a typical case is a process that is completely stuck waiting for some other activity. When that activity completes or is terminated, the waiters are freed. In a simple example, a waiter waits for a holder and freeing the holder resolves the hang, but a chain of holders and waiters can exist, which makes identifying the real holder more complex.
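The idea of following a chain of waiters to the real holder can be illustrated with a small sketch. This is not how hanganalyze itself works and it reads no Oracle trace format; it assumes a hypothetical dictionary, `waits_on`, that maps each session to the single session currently blocking it (or `None` if it is not blocked). The session names are invented for the example.

```python
from typing import Dict, List, Optional


def find_root_holder(waits_on: Dict[str, Optional[str]], start: str) -> List[str]:
    """Follow waiter -> holder links from `start` and return the chain visited.

    The last element is the final holder (a session blocked by nothing).
    If the links loop back on themselves, the walk stops when a session is
    seen for the second time, so the chain ends with the repeated session.
    """
    chain = [start]
    seen = {start}
    current = start
    while True:
        holder = waits_on.get(current)  # None means "not waiting on anyone"
        if holder is None:
            return chain
        chain.append(holder)
        if holder in seen:
            return chain  # a cycle: stop instead of looping forever
        seen.add(holder)
        current = holder


# Hypothetical data: S101 waits on S205, which waits on S317, which is not blocked.
example_waits = {"S101": "S205", "S205": "S317", "S317": None}
root_chain = find_root_holder(example_waits, "S101")
print(" -> ".join(root_chain))
```

In this made-up case the walk ends at S317, so that session, not the immediate blocker S205, would be the one to investigate first.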
In terms of interpreting the diagnostics:
Hanganalyze is probably the easiest hang diagnostic to interpret, since it is designed to be a summary rather than an exhaustive dump. Hanganalyze trace provides a quick pointer to the waiting processes and the potential holder processes, and it identifies 'cycles' (i.e. groups of processes that are blocking each other).
- Interpreting HANGANALYZE trace files to diagnose hanging and performance problems
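To make the notion of a 'cycle' concrete, the following sketch finds groups of sessions that block each other. It illustrates the concept only and does not parse hanganalyze output. It assumes the same kind of hypothetical input as a simple waiter-to-holder mapping, in which each session waits on at most one other session; the names `find_cycles` and `waits_on` are introduced here for illustration.

```python
from typing import Dict, List, Optional


def find_cycles(waits_on: Dict[str, Optional[str]]) -> List[List[str]]:
    """Return every cycle in a waiter -> holder mapping.

    Assumes each session waits on at most one holder, so following the
    links from any session gives a single path that either ends at an
    unblocked session or runs into a session already visited.
    """
    cycles: List[List[str]] = []
    done = set()  # sessions whose outcome is already known
    for start in waits_on:
        if start in done:
            continue
        path: List[str] = []
        position: Dict[str, int] = {}  # session -> index in the current path
        node: Optional[str] = start
        while node is not None and node not in done and node not in position:
            position[node] = len(path)
            path.append(node)
            node = waits_on.get(node)
        if node is not None and node in position:
            # The path ran back into itself: everything from the first
            # occurrence of `node` onward forms the cycle.
            cycles.append(path[position[node]:])
        done.update(path)
    return cycles


# Hypothetical data: A, B and C block each other; D waits on that cycle; E is idle.
example = {"A": "B", "B": "C", "C": "A", "D": "A", "E": None}
found = find_cycles(example)
print(found)
```

Here D is a victim of the cycle rather than a member of it, which mirrors the distinction between processes that are merely waiting and processes that are blocking each other.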
Systemstate trace gives an exhaustive trace of all processes that can be explored to determine some more detail about the holder itself. Its interpretation is complex and is beyond the scope of this article, although some of the specific articles below do touch on this. Generally these should be sent to support for interpretation.
General Systemwide Reports
AWR reports provide system-wide information that can help determine the general area that processes are waiting on. They can also give useful comparison information from the build-up to an issue and from its aftermath.
RDA provides background structure information that can be useful for elimination of potential solutions once found.
Guidance for Specific Scenarios
The following articles cover diagnostics that can be used to approach specific situations that may lead to hangs:
Intermittent performance problems happen without warning, last for a short duration, and are extremely difficult to diagnose. To assist with these issues, Support have developed the following tools to help collect information in such circumstances:
Troubleshooting Other Issues
For guidance on troubleshooting other performance issues, see:
- Troubleshooting 'latch: cache buffers chains' Wait Contention
- How to Use AWR Reports to Diagnose Database Performance Issues
- Automatic Workload Repository (AWR) Reports - Start Point
- Troubleshooting Performance Issues
- Troubleshooting Database Contention With V$Wait_Chains
- Interpreting HANGANALYZE trace files to diagnose hanging and performance problems for 9i and 10g.
- OSWatcher Black Box (Includes: [Video])
- Remote Diagnostic Agent (RDA) - Getting Started
- LTOM - The On-Board Monitor User Guide
- HANGFG User Guide
- CASE STUDY: Using Real-Time Diagnostic Tools to Diagnose Intermittent Database Hangs
- How to Avoid Contention Based Hangs
- How to Collect Diagnostics for Database Hanging Issues
- Troubleshooting Oracle Database Hanging Issues for versions from 7 to 9--Exhaustive.
- No Response from the Server, Does it Hang or Spin?
- 'PMON failed to acquire latch, see PMON dump' in Alert Log - How To Diagnose
Source: ITPUB blog, link: http://blog.itpub.net/17252115/viewspace-1133584/. If you reproduce this article, please cite the source; otherwise legal liability will be pursued.