
Diagnosing an instance that hangs at startup

Original post. Category: Linux operating systems. Author: cc59. Date: 2007-06-09 00:00:00

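Someone accidentally cut the power to the database server.
When the database was started afterwards, it hung. The alert log stopped at the following lines: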

RECO started with pid=7
CJQ0 started with pid=8
QMN0 started with pid=9
Sun May 27 11:22:28 2007
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Sun May 27 11:22:28 2007
ALTER DATABASE MOUNT
Sun May 27 11:22:35 2007
Successful mount of redo thread 1, with mount id 2423313207.
Sun May 27 11:22:35 2007
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Sun May 27 11:22:35 2007
ALTER DATABASE OPEN
Sun May 27 11:22:35 2007
Beginning crash recovery of 1 threads
Sun May 27 11:22:35 2007
Started first pass scan
Sun May 27 11:22:36 2007
Completed first pass scan
0 redo blocks read, 0 data blocks need recovery
Sun May 27 11:22:36 2007
Started recovery at
Thread 1: logseq 78153, block 3, scn 2110.630842439
Recovery of Online Redo Log: Thread 1 Group 3 Seq 78153 Reading mem 0
Mem# 0 errs 0: /dev/ora/redolog3
Sun May 27 11:22:36 2007
Ended recovery at
Thread 1: logseq 78153, block 3, scn 2110.630862440
0 data blocks read, 0 data blocks written, 0 redo blocks read
Crash recovery completed successfully
Sun May 27 11:22:36 2007
Thread 1 advanced to log sequence 78154
Thread 1 opened at log sequence 78154
Current log# 2 seq# 78154 mem# 0: /dev/ora/redolog2
Successful open of redo thread 1.
Sun May 27 11:22:36 2007
LOG_CHECKPOINT_INTERVAL was set when MTTR advisory was switched on.
Sun May 27 11:22:36 2007
SMON: enabling cache recovery
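
To confirm where startup stalls, it can help to read the last timestamped entry of the alert log programmatically. The following is a minimal sketch, assuming the alert log uses the timestamp layout shown above; the function name `parse_alert_log` and the sample text are illustrative and not part of any Oracle tool.

```python
from datetime import datetime

# Timestamp layout seen in the alert log above, e.g. "Sun May 27 11:22:36 2007".
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_alert_log(text):
    """Group alert-log lines under the timestamp line that precedes them.

    Lines that appear before the first timestamp are ignored.
    """
    entries = []  # list of (datetime, [message lines])
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            stamp = datetime.strptime(line, ALERT_TS_FORMAT)
        except ValueError:
            stamp = None  # not a timestamp line, so it is a message line
        if stamp is not None:
            entries.append((stamp, []))
        elif entries:
            entries[-1][1].append(line)
    return entries


# Hypothetical input: the tail of the alert log quoted in this post.
sample = """\
Sun May 27 11:22:36 2007
Thread 1 opened at log sequence 78154
Current log# 2 seq# 78154 mem# 0: /dev/ora/redolog2
Successful open of redo thread 1.
Sun May 27 11:22:36 2007
LOG_CHECKPOINT_INTERVAL was set when MTTR advisory was switched on.
Sun May 27 11:22:36 2007
SMON: enabling cache recovery
"""

entries = parse_alert_log(sample)
last_stamp, last_messages = entries[-1]
print("Last alert-log entry:", last_stamp, last_messages)
```

If the last entry stays unchanged for a long time, as it did here after "SMON: enabling cache recovery", the open has probably stalled at that step.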

With the instance started in mount mode, I ran a hang analysis:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /disk1/ora9i/920/admin/ora9i/udump/ora9i_ora_1202.trc
SQL>

The trace file contains the following:

*** SESSION ID:(12.4) 2007-05-27 10:34:21.904
*** 2007-05-27 10:34:21.903
==============
HANG ANALYSIS:
==============
Open chains found:
Other chains found:
Chain 1 : :
<0/3/1/0xb7a8fc98/957/LGWR wait for redo copy>
Chain 2 : :
<0/8/1/0xb7a915c0/967/wakeup time manager>
Chain 3 : :
<0/9/3/0xb7a924d8/989/No Wait>
Chain 4 : :
<0/12/4/0xb7a92ee8/1202/No Wait>
Extra information that will be dumped at higher levels:
[level 5] : 4 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 10] : 7 node dumps -- [IGN]


One wait stands out here: LGWR wait for redo copy. My diagnosis was that the problem is probably related to undo.
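
The chain entries in the trace can also be pulled apart with a short script. This is only a sketch: the field names (`cnode`, `sid`, `serial`, `address`, `ospid`, `wait`) are my reading of the entry layout, not something the trace itself states. The reading is supported by Chain 4, whose values 12, 4, and 1202 match the `SESSION ID:(12.4)` line and the trace file name `ora9i_ora_1202.trc`.

```python
import re

# One chain entry looks like: <0/3/1/0xb7a8fc98/957/LGWR wait for redo copy>
# The field names below are assumptions, see the note above.
NODE_RE = re.compile(r"<(\d+)/(\d+)/(\d+)/(0x[0-9a-fA-F]+)/(\d+)/([^>]*)>")


def parse_chain_nodes(trace_text):
    """Return one dict per chain entry found in a hanganalyze trace."""
    nodes = []
    for m in NODE_RE.finditer(trace_text):
        nodes.append({
            "cnode": int(m.group(1)),
            "sid": int(m.group(2)),
            "serial": int(m.group(3)),
            "address": m.group(4),
            "ospid": int(m.group(5)),
            "wait": m.group(6).strip(),
        })
    return nodes


# The chains from the trace quoted in this post.
sample = """\
Chain 1 : :
<0/3/1/0xb7a8fc98/957/LGWR wait for redo copy>
Chain 2 : :
<0/8/1/0xb7a915c0/967/wakeup time manager>
Chain 3 : :
<0/9/3/0xb7a924d8/989/No Wait>
Chain 4 : :
<0/12/4/0xb7a92ee8/1202/No Wait>
"""

nodes = parse_chain_nodes(sample)
# Keep only the sessions that report a wait event.
waiting = [n for n in nodes if n["wait"] != "No Wait"]
for n in waiting:
    print(n["sid"], n["ospid"], n["wait"])
```

Filtering out "No Wait" leaves the two sessions that report a wait, which narrows down what to look at next.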


So I set event 10015 in the initialization parameter file. The event is described as: Undo Segment Recovery


event="10015 trace name context forever,level 10"
The resulting trace file contains the following:
KCRA: buffers claimed = 0/0, eliminated = 0
Acquiring rollback segment SYSTEM
Recovering rollback segment _SYSSMU5$
Recovering rollback segment _SYSSMU6$
Recovering rollback segment _SYSSMU7$
Recovering rollback segment _SYSSMU8$
Recovering rollback segment _SYSSMU9$
Recovering rollback segment _SYSSMU10$
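
To list the rollback segments mentioned in a 10015 trace without reading it by eye, a small parser is enough. The sketch below assumes the trace lines have exactly the "Acquiring/Recovering rollback segment NAME" form shown above; the function name `rollback_segment_steps` is illustrative.

```python
import re

# Matches lines such as "Recovering rollback segment _SYSSMU5$".
SEGMENT_RE = re.compile(r"^(Acquiring|Recovering) rollback segment (\S+)$")


def rollback_segment_steps(trace_text):
    """Return (action, segment_name) pairs in the order they appear."""
    steps = []
    for line in trace_text.splitlines():
        m = SEGMENT_RE.match(line.strip())
        if m:
            steps.append((m.group(1), m.group(2)))
    return steps


# The 10015 trace excerpt quoted in this post.
sample = """\
KCRA: buffers claimed = 0/0, eliminated = 0
Acquiring rollback segment SYSTEM
Recovering rollback segment _SYSSMU5$
Recovering rollback segment _SYSSMU6$
Recovering rollback segment _SYSSMU7$
Recovering rollback segment _SYSSMU8$
Recovering rollback segment _SYSSMU9$
Recovering rollback segment _SYSSMU10$
"""

steps = rollback_segment_steps(sample)
recovering = [name for action, name in steps if action == "Recovering"]
print("First segment recovered:", recovering[0])
print("Last segment mentioned:", recovering[-1])
```

The script only reports which segments the trace names and in what order. Deciding which of them is the one actually blocking startup still requires judgment.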

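This shows that startup hangs while the undo segments are being recovered, beginning with _SYSSMU5$.
That undo segment can therefore be taken offline, so we open the database with the hidden parameters below. The redo logs have to be reset as part of this process.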
_ALLOW_RESETLOGS_CORRUPTION = TRUE
_OFFLINE_ROLLBACK_SEGMENTS=(_SYSSMU5$)

After that, the database opened successfully.

Source: ITPUB blog, link: http://blog.itpub.net/104152/viewspace-140015/. If you repost this article, please credit the source; otherwise legal action may be pursued.
