ITPub博客

首页 > Linux操作系统 > Linux操作系统 > Diagnosing Database Hanging Issues [ID 61552.1]

Diagnosing Database Hanging Issues [ID 61552.1]

原创 Linux操作系统 作者:lhl1212 时间:2011-02-12 15:52:13 0 删除 编辑
Diagnosing Database Hanging Issues [ID 61552.1]

  修改时间 12-OCT-2010     类型 BULLETIN     状态 PUBLISHED  

In this Document
  Purpose
  Scope and Application
  Diagnosing Database Hanging Issues
     1. Describe the problem
     2. Look for Errors
     3. Spool out relevant V$ VIEWS
     4. Gather OS-Level Info
     5. Get SYSTEMSTATE and HANGANALYZE Dumps
     7. Generate a CORE DUMP And Run a DEBUGGER.
     8. If certain processes get PROCESSSTATE dump.
     9. Check List for diagnostics.
  References


Applies to:

Oracle Server - Enterprise Edition
Oracle Server - Enterprise Edition - Version: 7.0.16.0 and later    [Release: 7.0 and later]
Information in this document applies to any platform.

Purpose

This bulletin describes the methods and tools used in diagnosing database hanging issues and performance issue).
These issues may be due to tuning problems, design problems or Oracle bugs such as latching issues.
We determine the cause by narrowing down the problem and getting as much information as possible WHILE THE DATABASE IS HANGING.

The following diagnostic steps are discussed.

1. Describe the Problems. Under all situations go through this sections and describe the problem.
2 Look for errors.
3. Do the requested queries.
4. Gather OS level data.
5. Get systemstate and hanganalyze dumps.
6. Get BSTAT-ESTAT OR STATPACK output.
7. Generate a core dump and run a debugger (contact support first).
8. If certain process get PROCESSSTATE dumps
9. Check List for diagnostics.

Note: It may not be necessary to shutdown the database to stop the hanging. But if you MUST shutdown before speaking to support, please attempt to get these diagnostics so that we can debug the problem.
Without these diagnostics it may be impossible for us to determine the cause of the problem.

Scope and Application

Intended Audience: All Database Administrators

Diagnosing Database Hanging Issues

1. Describe the problem


What releases of all involved products are you running?
- complete release like database release 8.0.5.2, etc...
Is the "hanging" a true freeze or is activity just very slow?
(Are logs switching in the alert log? Be sure to check cpu, I/O and memory utilization - see step 4).

When does the hanging start?

Does it stop? How long does it last?

Does the hanging happen suddenly or is it a gradual performance degradation, perhaps precipitated by increased activity?

How many users are connected?

Has load on system increased recently?

Check for OTRACE creating files. (see Note 45482.1)

Unix                                      NT
cd $ORACLE_HOME/otrace/admin              cd orant\otrace73\admin
ls *.dat                                  dir *.dat
Note: If .dat files are being created and OTRACE utility is not being used, disable this option by setting the init.ora parameter oracle_trace_enable = FALSE.

Are any events set in the init.ora?

Which ones are affected?

What kind of work are they doing?

How much data are we talking about?

Are you running Oracle Parallel Server (OPS)?
- Does the problem persist if you shutdown all but one instance?
- Please note: Some of the solutions discussed here are useful for OPS; many are not. Different tuning rules apply;
   for example, a "smaller" buffer cache often means better performance in OPS. Also, the most common sources 
   of hanging issues in OPS are not discussed here. These include: PCM locking problems, pinging, space   
   management problems, lack of work partitioning, internode parallel query tuning requirements, shared disk or
   virtual shared disk problems, network problems, and DLM problems.

Are you running Multi-Threaded Server (MTS)?
- Does the problem persist without it?
- How many users?
Are you using any Oracle applications or tools (financials, forms, etc.)?

Did you recently upgrade the rdbms, application or tool, operating system, memory or hardware?

Does the problem reproduce?

How often?

Can you reproduce it at will?

When does it NOT reproduce?

What EXACTLY is hanging? What is NOT? Get all the information requested for each area that applies:

(a) The entire database?

- All instances?
- All connections?
- All operations?
- All nodes?

- Can you start a session and select * from dual?

- How often are log switches occurring?

- If there are archive-related errors in the alert log (see step 2), stop reading this bulletin: Archiving problems always freeze
the database. Review the error(s) and resolve the problem.
Example: the device may be full. Another common cause is that the ARCH process is stopped while the database is in
ARCHIVELOG mode.
Check this by connecting internal and typing ARCHIVE LOG LIST. If ARCHIVELOG mode and automatic archival is
DISABLED, that's the problem. Either restart the ARCH process by typing ALTER SYSTEM ARCHIVE LOG START
(or log_archive_start = true in init.ora) or shutdown, startup mount exclusive and ALTER DATABASE
NOARCHIVELOG.

- If archiving problems have been ruled out and the database is truly frozen, complete steps 1, 2, 3, 4, 5 and 6.

- If database is not really frozen but is suffering performance degradation due to increased activity, complete steps 1, 2, 3, 4 and 6. Do not forget 1(d).

(b) A specific SQL operation?
Be sure you also do step 8.
Get a tkprof output with timed_statistics and explain plan.
See Note 10585.1 QUERY AND APPLICATION TUNING USING EXPLAIN AND TKPROF UTILITY
Note 10585.1 QUERY AND APPLICATION TUNING USING EXPLAIN AND TKPROF UTILITY
Note 41634.1 TKPROF and Problem Solving
Note 1019631.6 TFTS SCRIPT. TO SIMPLIFY THE USE OF EXPLAIN PLAN

- Select statement?
- It may need tuning. (This is useful for more than sql syntax.)
- If a complex sql statement is hanging, can you break successfully?
- See 1 (c)

- Parallel query?

- See Note 240762.1 script. to show Explain Plan for Parallel Queries
- Could be space transaction contention. If there are ora-1575 errors in the alert log, try setting pct_increase = 0
    in the temporary tablespace to disable SMON from coalescing contiguous extents, thus reducing contention
    with query slaves. Also try spreading datafiles over several disks, increasing sort_area_size (but don't introduce
    paging) and possibly "reducing" degree of parallelism. Refer to additional tuning documentation.

- Update, insert or delete?
- possibly locking. Be sure to get v$lock output in step 3. There are the following scripts available to help debug locking problems.
Note 1020012.6 TFTS SCRIPT. TO RETURN MEDIUM DETAIL LOCKING INFO
Note 1020008.6 TFTS FULLY DECODED LOCKING SCRIPT
Note 1019527.6 TFTS CHECK FOR FOREIGN KEY LOCKING
Note 1020007.6 TFTS: DISPLAY LOCKS & GIVE SID AND SERIAL# TO KILL
- Do you have constraints and/or triggers on the object(s) you're updating? Could be a cascading locking problem.
Try putting index(es) on relevant foreign key columns. This modifies the locking behavior. on the parent. (See
Application Developers Guide for more info).
See Note 33453.1 REFERENTIAL INTEGRITY AND LOCKING
- also see steps 1(c) and 1(d).

- DDL?
- possibly a data dictionary related problem. See step 1(c).
- if "create index" it might be a space transaction contention issue.
- Does the operation run faster the second time around?
- Try tuning i/o. The following could be helpful: clustering, increasing db_block_size, distributing i/o (i.e. striping, separating
indexes and data, parallelizing). Also try setting pre_page_sga = true in the init.ora.
- Is the operation always slower immediately after a large update? This is delayed block cleanout.

(c) Specific object?

- Can you do ANY operations on the object?
- Try select count(*)
- If updates only are the problem, could be locking.Scripts are available to help resolve these issues see:
  Be sure to do get lock information 1(b) and 1(d).
  There are the following scripts available to help debug locking problems.
  Note 1020012.6 TFTS SCRIPT. TO RETURN MEDIUM DETAIL LOCKING INFO
  Note 1020008.6 TFTS FULLY DECODED LOCKING SCRIPT
  Note 1019527.6 TFTS CHECK FOR FOREIGN KEY LOCKING
  Note 1020007.6 TFTS: DISPLAY LOCKS & GIVE SID AND SERIAL# TO KILL

- Did you preallocate extents to the object? This will bump up the high water mark and cause full table scans to appear
to "hang". Full table scans always scan up to the high water mark, even when the table contains only a few rows. The
solution is to avoid preallocating extents except immediately before a large parallel procedural insert or conventional load.
Don't preallocate extents for a direct load.

- Try ANALYZE TABLE VALIDATE STRUCTURE CASCADE;
- any errors? Errors signify table or index corruption.Call support.

- If no errors after ANALYZE, run these queries and send the output and trace files to support:

SPOOL analyze.log;
SELECT *
FROM sys.dba dba_tables
WHERE table_name = '';
- block level space utilization (i.e. initrans, maxtrans, pctfree), a high chain count, etc., could be part of the problem.

SELECT i.*
FROM sys.index_stats i, sys.dba_indexes d
WHERE i.name = d.index_name
AND d.table_name = '';SELECT i.*
FROM sys.index_stats i, sys.dba_indexes d
WHERE i.name = d.index_name
AND d.table_name = '';

- an unwieldy index could be part of the problem. Do you do a lot of updates/deletes?

SPOOL OFF;

- Get tkprof output as in 1(b).

- If a view, do the above for the underlying tables.
- To identify the underlying tables:
SELECT text
FROM sys.dba_views
WHERE view_name = '';
- you may need to SET ARRAYSIZE 1 / SET LONG 2000 to see all the text.
- definitely get tkprof output as in 1(b).
- Data dictionary objects might also be affected by the above.

(d) Large scale update operation (i.e. perhaps using sqlloader, import or batch job)?

- Do you have index(es) on the table?

- Does the update run during periods of peak activity?

- Do you see "checkpoint not complete" messages in the alert.log?

- This signifies that the redo logs are too small; they are having trouble keeping up.

- Is the tablespace wherein the table resides in hot backup mode? (see v$backup)

- A tablespace in hot backup mode generates redo "records" rather than "vectors". During a large update this can cause
severe contention and performance degradation.

- Are any of the following init.ora parameters set to non-default values:

db_block_checkpoint_batch
db_block_checksum
db_block_lru_statistics
db_block_lru_extended_statistics
row_locking
serializable
timed_statistics

- If using sqlloader, are you loading with conventional path? Using REPLACE option? (Try using TRUNCATE instead).
Any sql functions in the control file? Have you tried tuning with READBUFFERS, BINDSIZE, ROWS, PARALLEL?

- If using import, are you using commit=y? indexes=y? constraints = y? Have you tried increasing BUFFER? Any info in
LOG?

- Try increasing the number of system buffers and tuning i/o (see 1(b)).

- If many users are active during the update, resource contention could be contributing to system slowdown.
Rollback segments, redo latches, i/o and db_block_buffers are some possible areas of contention. V$session_wait
and Bstat Estat output are good places to start researching contention issues. See section on estat bstat.

-Do steps 1(b), 1(c), 3, 4 and 5.


(e) Specific package, procedure or PRO*C application?

- What does it do?

- What SQL statement is it stuck on? see step. Can you remove the statement and get it to run successfully in sqlplus?

- Get tkprof output on the application, procedure or isolated sql statement with timed_statistics and explain plan.

- Optional: if a procedure, utilize DBMS_ALERT to pinpoint where the hanging starts.

- If a PRO*C program, and tkprof identifies "parsing" as the bottleneck of the embedded sql statement, tune this with
precompiler parameters hold_cursor and release_cursor. See bulletin on precompiler features.

- If a package, can you execute any of the individual procedures successfully?

- Is the package/procedure swapping in and out of the shared pool? Pinning might be advisable (a package would have to
be created for a standalone procedure).

SPOOL objcache.log

SELECT *
FROM v$db_object_cache
WHERE name = '';

SPOOL OFF;

- Resource contention could be contributing to system slowdown. For example, library cache contention (affected by
shared_pool_size) is possible in addition to the short list of contention areas mentioned in 1(d).
Refer to additional tuning documentation.

- Be sure to do step 3.

- What is the cpu / io utilization? (get this from v$sesstat above or see step 4)

- Is the process swapping in and out of memory? (see step 4)

- Did you change the process priority or "nice" it?

- Steps 3 and 4 recommended.

(f) remote access only?

- Can you do select * from dual@db_link?

- Can you connect to the remote machine and perform. the operation locally? Can you select * from dual locally?

- If you can do these tests and the hanging reproduces, you should answer all these questions for the REMOTE machine.

- Are you doing a distributed update?

- What is distributed_lock_timeout set to in the init.ora.

- Are you refreshing snapshots? using symmetric replication?

- Is it a remote select statement that's hanging?

- Get a tkprof output with timed_statistics and explain plan. On the explain plan, you'll see that certain statements just say
"REMOTE"; the actual statements sent across the network are recorded in the OTHERS column of the PLAN_TABLE -
spool a select of that column.
- If joining two tables on a remote node, try putting a 'joining view' on that node and then selecting against the view.
See Note 41634.1 TKPROF and Problem Solving
Note 10585.1 QUERY AND APPLICATION TUNING USING EXPLAIN AND TKPROF UTILITY

- The following may reduce any network overhead that's contributing to the problem: setting ARRAYSIZE in sql
operations, using anonymous pl/sql blocks rather than independent sql statements, using explicit rather than implicit
cursors. Refer to additional tuning documentation.

(g) Operations using 3rd party application (Power Builder, etc).

- Can you get the problem to reproduce in sqlplus? (If you can do this test but can't reproduce the problem, you need to
call the 3rd party vendor).

(h) Database shutdown/startup?

- How was it shutdown? normal? immediate? abort? did it crash?

- If startup hangs and the database went down abnormally (crash or shutdown abort)and there are no errors in the alert
log, this could be normal instance recovery, an internal matter (if errors),system problems (see system logs), inadequate
cleanup at shutdown (try shutdown abort / startup), unsupported software/ os combination or (less likely) data corruption.
Wait 30 minutes then call support if it doesn't open.

During normal instance recovery an example of the messages that will be written to the alert log are listed below:

Starting ORACLE instance (normal)
Starting up ORACLE RDBMS Version: 8.1.6.3.0.
System parameters with non-default values:
processes = 50
...
Crash recovery completed successfully
Tue May 22 16:15:26 2001
Thread 1 advanced to log sequence 7139
Thread 1 opened at log sequence 7139
Current log# 2 seq# 7139 mem# 0: /u02/oradata/V817/redo02.log
Successful open of redo thread 1.
Tue May 22 16:15:26 2001
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue May 22 16:15:51 2001
Completed: alter database open

- If shutdown normal hangs, this means Oracle is waiting for active sessions to log off.

- If shutdown immediate hangs, this could be for the same reasons as a hang at startup.

- Be sure to check for errors in step 2. Tracing the process may also be useful: see step 4. V$session_wait may be
useful: see step 3.

- Note: if using a system debugger to trace a hanging startup or shutdown operation, be sure to get the relevant svrmgr or
sqlplus "shadow" (child) pid. On unix boxes:

ps -ef | grep svrmgrl
usupport 26615 19866 ...etc... svrmgrl
ps -ef | grep 26615
usupport 52472 26615 ...etc... oracleV732 (DESCR...

Pid 52472 is what you want to trace.

2. Look for Errors


(a) Check the tail of the alert_.log (or portion of the alert corresponding to the hang). Check for ANY messages other than log switches ("advanced to log sequence...")

Note: to find the alert.log directory, check the background_dump_dest parameter in the init.ora or config.ora. Or do the following command.

$ sqlplus "/as sysdba"
show parameter dest
(b) Check the same directory for trace files generated around the time of the hang. Look in them for error messages. Don't search whole files; relevant messages appear towards the beginning.
On unix ls -ltr will show you files that have been updated lately last.

(c) If having remote access problems, check the sql*net trace directory for traces as well.

(d) Check system error logs for messages. On many unix systems:

cd /var/adm
ls -ltr
Check for messages files recently updated.

(e) Be prepared to send a portion of the alert.log at least 4 hours prior to the problem, init.ora and any traces to support.

For fastest service:

Move trace files to oracle-ftp
cd server/incoming
mkdir
cd
place traces files, alert log and any other files here

Or
- each file should go into a separate email message
- put the tar number and file description in the subject line
- don't uuencode or attach the files if you don't have to (i.e. unless they're too big) - just read them into the message as ascii.

If you can't comply with all of the above, just do your best.
Faxing is also possible but may not be practical.
At least, please have the information READY to communicate to support before you call.

3. Spool out relevant V$ VIEWS


WHILE the database is hanging, run the following queries:
SPOOL v_views.log;

SELECT *
FROM v$parameter;

SELECT class, value, name
FROM v$sysstat;

SELECT sid, id1, id2, type, lmode, request
FROM v$lock;

SELECT l.latch#, n.name, h.pid, l.gets, l.misses,
l.immediate_gets, l.immediate_misses, l.sleeps
FROM v$latchname n, v$latchholder h, v$latch l
WHERE l.latch# = n.latch#
AND l.addr = h.laddr(+);

SELECT *
FROM v$session_wait
ORDER BY sid;

/* repeat last query 3 times - we want to see who's repeatedly waiting*/

SPOOL OFF;

If a specific session hangs do these queries:

SPOOL session.log;

- Get the sid from the pid or vice versa:

SELECT s.sid, p.spid
FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND ... < p.spid = or perhaps
s.sid = >

- What SQL statement is being executed?

SELECT s.sid, s.status, q.sql_text
FROM v$session s, v$sqltext q
WHERE s.sql_hash_value = q.hash_value
AND s.sql_address = q.address
AND s.sid =
order by q.piece;

-Get output from V$SESSION_WAIT

column sid format 990
column seq# format 99990
column wait_time heading 'WTime' format 99990
column event format a30
column p1 format 9999999990
column p2 format 9999999990
column p3 format 9990
select sid,event,seq#,p1,p2,p3,wait_time from V$session_wait
where sid=
order by sid;
*** It is needed to run this statement a few times to see if the wait events actually change

- What is LOCKWAIT column set to in V$SESSION?

SELECT lockwait
FROM v$session
WHERE sid = ;

- If not null, who's holding what lock on which object?

col Username format A15
col Sid format 9990 heading SID
col Type format A4
col Lmode format 990 heading 'HELD'
col Request format 990 heading 'REQ'
col Id1 format 9999990
col Id2 format 9999990

select SN.Username, M.Sid, M.Type,
DECODE(M.Lmode, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row
Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(Lmode,'990'))) Lmode,
DECODE(M.Request, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row
Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(M.Request, '990'))) Request,
M.Id1, M.Id2 from V$SESSION SN, V$LOCK M
WHERE (SN.Sid = M.Sid and M.Request ! = 0)
or (SN.Sid = M.Sid and M.Request = 0 and Lmode != 4 and (id1, id2)
in (select S.Id1, S.Id2 from V$LOCK S where Request != 0 and S.Id1
= M.Id1 and S.Id2 = M.Id2) ) order by Id1, Id2, M.Request;

- What is LATCHWAIT column set to in V$PROCESS?

SELECT latchwait
FROM v$process
WHERE spid = ;SELECT latchwait
FROM v$process
WHERE spid = ;

- If not null, who's holding what latch?

column name format a32 heading 'LATCH NAME'
column pid heading 'HOLDER PID'
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+) and a.latch# = c.latch#
and c.name like '&latch_name%' order by a.latch#;
- Did the lock holder or latch holder shut off his/her terminal without logging off? This can leave a zombie shadow
process holding the resource.
Kill it with "alter system kill session ''".
If on sqlnet 2.1 or above, try setting sqlnet.expire_time in the sqlnet.ora on the server side. (For more info, see page A-16 of "Understanding SQL*NET" v2.1).

- If session is not frozen but slow, get session-specific statistics:

SELECT s.sid, s.value, t.name
FROM v$sesstat s, v$statname t
WHERE s.statistic# = t.statistic#
AND s.sid = ;

- If session is extremely slow or truly frozen, get session-specific wait info:

SELECT *
FROM v$session_wait
where sid = ;


- repeat above query 3 times.

SPOOL OFF;

- If this is distributed transaction run the following statements from EVERY NODE involved in the transaction:

SPOOL 2pc.log;
SELECT * FROM dba_2pc_pending;
SELECT * FROM pending_sessions$;
SELECT * FROM pending_sub_sessions$;
SELECT * FROM dba_2pc_neighbors;
SPOOL OFF;

The following spool will help determine the cause in MTS.

SPOOL mts.log
- How busy the dispatchers are:

select name,network,status,
(busy /(busy + idle)) * 100 "% of time busy" from v$dispatchers
- query the shared servers view V$SHARED_SERVERS:

select name,status,requests, (busy /(busy + idle)) * 100 "% of time busy"
from v$shared_servers
Spool off

4. Gather OS-Level Info


***RDA output gives the most comprehensive data on OS and database configuration

For information about how to download, install and run the RDA scripts refer to Note 314422.1 "Remote Diagnostic Agent (RDA) 4.0 - Overview"

a. Briefly describe your architecture. Include number of CPUs / disks. Are you using raw devices, NFS mounted
files, shared disks, RAM disks...Are you mirroring? etc.

b. Measure unusual OS level activity: excessive CPU or i/o, paging, swapping, etc. There are other monitor tools such
as TOP. Sun has utilities on their webpage that can be downloaded for monitoring the system.
If you're not comfortable with the utilities listed below, be sure to do step 3 and, if you have isolated a troubled process, the v$sesstat output in step.

- On unix: use SAR, VMSTAT and NETSTAT (man sar)
- You can use truss, trace (man truss) or oradbx (call support) for tracing specific processes.
- You can also unix debuggers such as dbx, xdb, adb or truss to dump process activity. Example:

dbx -a /*attach to hanging process*/
where /*dump Oracle stack */
detach /*detach from process*/

truss -o /tmp/mytruss -fae -p
- On vms: use MONITOR (help monitor)
- you can use analyze/process (help analyze) or orambx (call support) for tracing specific processes.

- On novl: use LOAD MONITOR (gui - no online documentation)

- On Windows NT: use Performance Monitor, Event Monitor, Dr. Watson (see Resource kit for documentation).
- use pview for monitoring specific processes (resource kit).

c. Check the system log for additional information. On many platforms it is at /var/adm the files are the message files.

5. Get SYSTEMSTATE and HANGANALYZE Dumps

This creates a large trace file in the user_dump_dest (30M or more is not unusual).
Note: the init.ora parameter MAX_DUMP_FILE_SIZE controls the maximum trace file size. Using Oradebug and setting unlimit will allow a complete dump which we will need.

Do this step for sure if the entire database is frozen or nearly frozen and if this condition came on suddenly and there are no archive errors in the alert log. Please note: As systemstate dumps are instance specific, they tend to be inconclusive with hanging problems involving Oracle Parallel Server (OPS) unless you get them from each node. You will need 3 system state dumps from each node for OPS.

NOTE: Do systemstate dump 3 times in a row, so we can determine whether
the processes are hung or active.

For Oracle 8.0.5.x to 8.1.7.x

$ svrmgrl
svrmgr> connect internal

svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';

EXIT ... then reconnect

svrmgr>ALTER SESSION SET MAX_DUMP_FILE_SIZE=UNLIMITED;
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';

For Oracle 9.2.0.1 or higher

$ sqlplus /nolog
connect / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
wait 90 seconds
oradebug hanganalyze 3
exit


$ sqlplus /nolog
connect / as sysdba
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 10
wait 90 seconds
oradebug dump systemstate 10
wait 90 seconds
oradebug dump systemstate 10
exit

Use systemstate level 266 in 9.2.0.6+.  Use level 10 prior to this version.

For HangFG for Unix Platforms.
NOTE.362094.1 HANGFG User Guide

For RAC environment and datbase hang, please review Note 135714.1.

6. Get BSTAT-ESTAT OR (for 8.1.6 or higher) STATSPACK output

If the entire database is suffering performance degradation .
See Note 62161.1 System wide Tuning using UTLESTAT Reports in Oracle7/8
See Note 149113.1 Installing and Configuring StatsPack Package
See Note 149124.1 Creating a StatsPack performance report

Send this report to support along with the other files requested.

Note 149124.1 Creating a StatsPack performance report

Send this report to support along with the other files requested.

7. Generate a CORE DUMP And Run a DEBUGGER.

This step is a last resort. It is included to advise you of an option when others are exhausted (e.g. when
systemstate dump hangs). You should be in contact with support before proceeding. Do not send a core to Oracle unless requested.
Core: Debugger: Documentation:
  • Unix: kill -11 dbx, adb man
  • VMS: N/A analyze/system help
  • Windows NT pview Dr. Watson resource kit
  • Novl vdb3 or vdb4 vdb3 or vdb4 Appx of Ora7
  • Guide for Netware

8. If certain processes get PROCESSSTATE dump.

- Trace the process from the os level (see step 4).

Get processstate dumps. Do 3 times. This generates a trace file in your user_dump_dest (from sqlplus: show parameter user_dump_dest).

$ sqlplus "/as sysdba"
oradebug setospid
oradebug unlimit
oradebug dump processstate 10
Get errorstacks from the process. Do 3 times. This generates a trace file in your user_dump_dest (from sqlplus: show parameter user_dump_dest).

$ sqlplus "/as sysdba"
oradebug setospid
oradebug unlimit
oradebug dump errorstack 3

9. Check List for diagnostics.


  • Go over the information you have gathered and then place it on Oracle External FTP site.
  • Please include all configuration information releases and options used.
  • Please include any changes you made to the system recently hardware or software.
  • Please include results from locking scripts if this may be a locking issue.
  • If you did an analyze please include results.
  • Did you run an analyze.log if so include it?
  • Did you include init parameters and config files?
  • Did you include the statements that are having the problem?
  • If this is a tool such as import, export, sqlloader did you include the exact statements used to run and the parameter files?
  • Did you answer the questions and include the answers?
  • Did you get the explain plan and tkprof output?
  • If you have a small reproducible test case did you include it?
    •    Statements that are being run.
    •    All table, index, view definitions.
    •    All constraints. In short everything it would take for us to reproduce it.
  • Did you trace the processes with a debugger? If so did you include the trace from the debugger?
  • Have you included all trace files?
  • Have you included the alert.log?
  • If core dumps where generated did you translate them and send the stack?(Do not send core files.)
  • Have you sent the sql*net trace files?
  • Did you include v_views.log?
  • Have you included session.log;
  • Did you spool session-specific wait info:
  • If this is distributed transaction have you included 2PC.log?
  • If MTS have you included mts.log?
  • Did you gather os-level information on CPU, Memory and I/O?
  • Did you GET SYSTEMSTATE DUMPS and place these on ftp site?
  • Did you run BSTAT-ESTAT and include the report?
  • Did you include PROCESSSTATE and ERRORSTACK dumps.

References

NOTE:135714.1 - Script. to Collect RAC Diagnostic Information (racdiag.sql)
NOTE:362094.1 - HANGFG User Guide
NOTE:41634.1 - TKProf Basic Overview
NOTE:69739.1 - How to Turn Archiving ON and OFF

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/611609/viewspace-687086/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2008-12-08

  • 博文量
    252
  • 访问量
    496839