Database suddenly hung

Original post · Database Development · Author: yeahokay · Posted: 2012-02-01 17:33:45

rac2 crsd.log:

2012-02-01 16:10:05.708: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Could not join /oracle/product/10.2.0/crs_1/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2012-02-01 16:10:05.763: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac2.vip! (timeout=60)
2012-02-01 16:10:05.763: [ CRSAPP][1500211520]0CheckResource error for ora.perac2.vip error code = -2
2012-02-01 16:11:48.986: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Could not join /oracle/product/10.2.0/crs_1/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2012-02-01 16:11:49.389: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac2.vip! (timeout=60)
2012-02-01 16:11:49.390: [ CRSAPP][1500211520]0CheckResource error for ora.perac2.vip error code = -2
2012-02-01 16:18:27.234: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Could not join /oracle/product/10.2.0/crs_1/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2012-02-01 16:18:27.373: [ CRSEVT][1500211520]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac2.vip! (timeout=60)
2012-02-01 16:18:27.374: [ CRSAPP][1500211520]0CheckResource error for ora.perac2.vip error code = -2


rac1 crsd.log:
2012-02-01 16:18:18.476: [ CRSEVT][1497348416]0CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs_1/bin/racgwrap(check) timed out for ora.perac1.vip! (timeout=60)
2012-02-01 16:18:19.574: [ CRSAPP][1497348416]0CheckResource error for ora.perac1.vip error code = -2
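
To see at a glance which resources are hitting the check timeout, the "timed out for" lines can be tallied per resource. This is only a rough sketch: `count_timeouts` is a hypothetical helper name, and it assumes the log lines have the same shape as the excerpts above.

```shell
# Count "timed out for <resource>!" entries per resource in a crsd.log-style file.
# count_timeouts is an illustrative helper, not an Oracle-supplied tool.
count_timeouts() {
    awk '/timed out for/ {
        for (i = 2; i <= NF; i++)
            if ($i == "for" && $(i - 1) == "out") {
                r = $(i + 1)        # e.g. "ora.perac2.vip!"
                sub(/!$/, "", r)    # drop the trailing "!"
                n[r]++
            }
    }
    END { for (r in n) print n[r], r }' "$1" | sort -rn
}
```

It would be run against each node's log, for example `count_timeouts crsd.log`; the exact log path depends on the installation.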

Neither node's ocssd.log nor the system messages file shows any errors or warnings.

The alert log reports ORA-3136 errors.


Related document
10g/11gR1: Many Orphaned Or Hanging "racgmain" Processes Running [ID 732086.1]

Cause
crsd.bin invokes the racgmain to check the status of the resources that are managed by CRS. The racgmain is invoked through the wrapper script racgwrap.

If a resource action times out, crsd kills the action script (racgwrap), but the racgmain process is not killed. Over time, this can leave many orphaned racgmain processes on the system, which eventually slows the system down because of resource contention at the OS level.

Internal bug:6196746 addresses this issue.
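
One way to check whether a node is affected is to look for `racgmain check` processes that have been re-parented to PID 1. The filter below is a minimal sketch: `list_orphans` is a hypothetical helper name, and it expects `pid ppid args` lines on standard input, such as those printed by `ps -eo pid,ppid,args`.

```shell
# Print the PIDs of "racgmain check" processes whose parent is PID 1 (orphans).
# Input: lines of "pid ppid args...", e.g. from: ps -eo pid,ppid,args
# list_orphans is an illustrative helper, not an Oracle-supplied tool.
list_orphans() {
    awk '$2 == 1 && /racgmain check/ { print $1 }'
}
```

Typical use would be `ps -eo pid,ppid,args | list_orphans | wc -l` to count them. A steadily growing count would be consistent with the behaviour the note describes.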


Solution


- This is fixed in the 11.1.0.7 patchset. If you hit this issue on 10gR2, apply the 10.2.0.4 patchset and the latest CRS bundle patch; the fix is included in the CRS bundle patch from bundle #2 onwards.

- The following can be used as a temporary workaround until the patch is applied.


1. Make a copy of racgwrap located under $ORACLE_HOME/bin and $CRS_HOME/bin on ALL Nodes

2. Edit the file racgwrap and modify the last 3 lines from:

~~~
$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status
~~~

to:

~~~
# Line added to fix for Bug 6196746
exec $ORACLE_HOME/bin/racgmain "$@"
~~~

3. Kill all the orphan racgmain processes running.

$ ps -ef|grep "racgmain check"
oracle 18701 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check
oracle 14653 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check
oracle 24517 1 0 Aug 1 ? 0:00 /oracle/product/10.2.0/database/bin/racgmain check

$ kill -9 <pid>
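
The effect of the one-line `exec` change in step 2 can be illustrated with plain `sh`, without touching the real racgwrap or racgmain. This is a sketch under the assumption that crsd signals only the PID of the script it started. Both function names are hypothetical. Each function prints two PIDs: the wrapper's, then that of the command it runs.

```shell
# Wrapper in the style of the original racgwrap: the command runs as a
# separate child process, so the two PIDs differ. Killing the wrapper's PID
# would leave the child running.
pids_without_exec() {
    sh -c 'echo $$; sh -c "echo \$\$"; status=$?; exit $status'
}

# Wrapper in the style of the workaround: exec replaces the wrapper process
# with the command, so both PIDs are the same and one kill reaches it.
pids_with_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}
```

Running the two functions side by side shows two different PIDs in the first case and the same PID twice in the second, which is presumably why the modified script no longer leaves racgmain behind when crsd kills it.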

References
BUG:7009245 - "RACGMAIN CHECK" PROCESS NOT TERMINATING


Source: ITPUB blog, http://blog.itpub.net/786540/viewspace-1057250/. Please credit the source when reposting; otherwise legal liability will be pursued.
