In this Document
|What is needed to diagnose "Database Hang" issues?|
|A. Dumps and Traces|
|Hanganalyze and Systemstate Dumps|
|Collecting Hanganalyze and Systemstate Dumps|
|Logging in to the system|
|Collection commands for Hanganalyze and Systemstate: Non-RAC:|
|Collection commands for Hanganalyze and Systemstate: RAC|
|Collection commands for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088|
|Collection commands for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088|
|Explanation of Hanganalyze and Systemstate Levels|
|B. Provide AWR/Statspack snapshots of General database performance|
|C. Gather an up to date RDA|
|Proactive Methods to gather information on a Hanging System|
|Oracle Enterprise Manager 12c Real-Time ADDM|
|Retroactive Information Collection|
When a database appears to be hung, it is useful to collect information from the database in order to determine the root cause of the hang. The root cause of the hang can often be isolated and solved using the diagnostic information gathered. Alternatively, if this is not possible, we can use the information obtained in order to help eliminate future occurences
Database hangs are characterised by a number of processes waiting for some other activities to complete. Typically there is one or more blockers that are stuck or perhaps working hard and not freeing resources quickly enough. In order to diagnose this the following diagnostics are needed:
A. Hanganalyze and Systemstate Dumps
B. AWR/Statspack snapshots of General database performance
C. Up to date RDA
Please refer to the relevant sections below for more details on how to collect these.
Hanganalyze and Systemstate dumps provide information on the processes in the database at a specific point in time. Hanganalyze provides information on all processes involved in the hang chain, whereas systemstate provides information on all processes in the database. When looking at a potential hang situation, you need to determine whether a process is stuck or moving slowly. By collecting these dumps at 2 consecutive intervals this can be established. If a process is stuck, these traces also provide the information to start further diagnosis and possibly help to provide the solution.
Using SQL*Plus connect as SYSDBA using the following command:
If there are problems making this connection then in 10gR2 and above, the sqlplus "preliminary connection" can be used :
For more about connecting with a preliminary connection, see:
Collection commands for Hanganalyze and Systemstate: Non-RAC:
Sometimes, database may actually just be very slow and not actually hanging. It is therefore recommended, where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".
There are 2 bugs affecting RAC that without the relevant patches being applied on your system, make using level 266 or 267 very costly. Therefore without these fixes in place it highly unadvisable to use these level
For information on these patches see:
In RAC environment, a dump will be created for all RAC instances in the DIAG trace file for each instance.Explanation of Hanganalyze and Systemstate Levels
If connection to the system is not possible in any form, then please refer to the following article which describes how to collect systemstates in that situation:
On RAC Systems, hanganalyze, systemstates and some other RAC information can be collected using the 'racdiag.sql' script, see:
Starting from 11g release 1, the dia0 background processes starts collecting hanganalyze information and stores this in memory in the "hang analysis cache". It does this every 3 seconds for local hanganalyze information and every 10 seconds for global (RAC) hanganalyze information. This information can provide a quick view of hang chains occurring at the time of a hang being experienced.
For more information see:
Hangs are a visible effect of a number of potential causes, this can range from a single process issue to something brought on by a global problem.
Collecting information about the general performance of the database in the build up to, during and after the problem is of primary importance since these snapshots can help to determine the nature of the load on the database at these times and can provide vital diagnostic information. This may prove invaluable in identifying the area of the problem and ultimately resolving the issue.
To do this, please take and upload snapshot reports of database performance (AWR (or statspack) reports) immediately before, during and after the hang..
Please refer to the following article for details of what to collect:
An up to date current RDA provides a lot of additional information about the configuration of the database and performance metrics and can be examined to spot background issues that may impact performance.
See the following note on My Oracle Support:
On some systems a hang can occur when the DBA is not available to run diagnostics or at times it may be too late to collect the relevant diagnostics. In these cases, the following methods may be used to gather diagnostics:
Real-Time ADDM is a feature of Oracle Enterprise Manager Cloud Control 12c that allows you to analyze database performance automatically when you cannot logon to the database because it is hung or performing very slowly due to a performance issue. It analyzes current performance when database is hanging or running slow and reports sources of severe contention.
For more information see the following video:
Sometimes we may only notice a hang after it has occurred. In this case the following information may help with Root Cause Analysis:
来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/17252115/viewspace-1134066/，如需转载，请注明出处，否则将追究法律责任。