Both Nodes Rebooted

Original post | Category: Linux Operating System | Author: ljz-gy | Date: 2011-03-29 11:17:50
Bug 8552596: NODE EVICTED WITH "TERMINATING INSTANCE DUE TO ERROR 481"  

Bug attributes

Type:                      B - Defect
Fixed in product version:  -
Severity:                  2 - Severe Loss of Service
Product version:           10.2.0.4
Status:                    33 - Suspended, Req'd Info not Avail
Platform:                  212 - IBM AIX on POWER Systems (64-bit)
Created:                   27-May-2009
Platform version:          5.3
Updated:                   03-Jul-2009
Base bug:                  -
Database version:          10.2.0.4
Affects platforms:         Generic
Product source:            Oracle

Related products

Product line:              Oracle Database Products
Family:                    Oracle Database
Area:                      Oracle Database
Product:                   5 - Oracle Server - Enterprise Edition

Hdr: 8552596 10.2.0.4 RDBMS 10.2.0.4 RAC PRODID-5 PORTID-212
Abstract: NODE EVICTED WITH "TERMINATING INSTANCE DUE TO ERROR 481"

*** 05/27/09 05:28 am ***
TAR:
----

PROBLEM:
--------
Instance terminated with the following error in alert log file.
===============================================================
Error: KGXGN aborts the instance (6)
Tue May 26 15:29:29 2009
USER: terminating instance due to error 481
Tue May 26 15:29:29 2009
System state dump is made for local instance
System State dumped to trace file
/oracle/app/admin/ODSPROD/bdump/odsprod1_diag_430314.trc
Tue May 26 15:29:39 2009
Termination issued to instance processes. Waiting for the processes to exit
Instance terminated by USER, pid = 1667548

DIAGNOSTIC ANALYSIS:
--------------------
Both the RDBMS and ASM instances reported errors from CSS at
2009-05-26 15:29:28:
>>>
*** 15:29:28.953
2009-05-26 15:29:28.953: [ CSSCLNT]clsssRecvMsg: comm error received, comrc
11, con (110ef8c90), msg (fffffffffffd2b0), msgl 144
2009-05-26 15:29:28.990: [ CSSCLNT]clssgsGGetStatus:  communications failed
(0/3/-10944)
2009-05-26 15:29:28.990: [ CSSCLNT]clssgsGGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSS
CM problem, please abort
*** 15:29:28.991
Node monitor becomes unavailable for service
2009-05-26 15:29:29.191: [ CSSCLNT]clsssRecvMsg: comm error received, comrc
11, con (110ef8c90), msg (fffffffffffd2b0), msgl 144
2009-05-26 15:29:29.191: [ CSSCLNT]clssgsGGetStatus:  communications failed
(0/3/-10944)
2009-05-26 15:29:29.191: [ CSSCLNT]clssgsGGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSS
>>>>

in css log
====
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmHandleExitUpdate: (src
2) grock ocr_ODS_CRS, member 1
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmRPCDone: rpc
110741ed8 (RPC#1829) state 6, flags 0x100
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmDelMemCompl: rpc
110741ed8, ret 0, client 111f0cd70
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clscsendx: (111f0d970)
Connection not active

[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmSendClient: Send
failed rc 6, con (111f0d970), client (111f0cd70), proc (0)
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmFreeRPCIndex:
freeing rpc 1829
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmRemoveMember:
grock(ocr_ODS_CRS) member(1/111f0cbb0) nodeNum(1) flags(0x0) type(2)
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmDispatchCMXMSG():
msg type(6) src(2) dest(65535) size(352) tag(00000000) incarnation(4)
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmHandleExitUpdate:
(src 2) grock crs_version, member 0
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmRPCDone: rpc
110742040 (RPC#1830) state 6, flags 0x100
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmDelMemCompl: rpc
110742040, ret 0, client 111f0dd50
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clscsendx: (111f0e430)
Connection not active


++ "Connection not active" messages start appearing in the CSS log at the
same time (15:29:28).
The last OSWatcher snapshot was taken at 15:29:01.
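
To make the timing comparison above reproducible, here is a minimal Python sketch that finds the first "Connection not active" record in a CSSD log and measures how long after the last OSWatcher snapshot it occurred. The record layout is assumed from the excerpt in this bug only; the helper names (iter_cssd_records, first_occurrence) are hypothetical, and continuation lines are folded into their record because the excerpt here is wrapped at about 80 columns.

```python
import re
from datetime import datetime

# Record prefix as it appears in the excerpt, e.g.
# "[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   ..."
CSSD_RECORD = re.compile(
    r"^\[\s*CSSD\](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*(.*)$"
)

def iter_cssd_records(lines):
    """Yield (timestamp, text) pairs, folding wrapped lines into their record."""
    stamp, parts = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = CSSD_RECORD.match(line)
        if m:
            if stamp is not None:
                yield stamp, " ".join(parts)
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            parts = [m.group(2)]
        elif stamp is not None:
            parts.append(line)  # continuation of the previous record
    if stamp is not None:
        yield stamp, " ".join(parts)

def first_occurrence(lines, needle):
    """Return the timestamp of the first record containing needle, or None."""
    for stamp, text in iter_cssd_records(lines):
        if needle in text:
            return stamp
    return None

# Lines copied from the CSS log excerpt in this bug.
sample = """\
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmDelMemCompl: rpc
110741ed8, ret 0, client 111f0cd70
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clscsendx: (111f0d970)
Connection not active
""".splitlines()

first_seen = first_occurrence(sample, "Connection not active")
last_snapshot = datetime(2009, 5, 26, 15, 29, 1)  # last OSWatcher snapshot
gap = first_seen - last_snapshot
print(first_seen, gap)
```

The gap shows that OSWatcher has no sample covering the moment of the failure, so the "good" node statistics describe the system roughly half a minute earlier.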

The css seems to respawn very quickly here:
>>>>
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmFreeRPCIndex:
freeing rpc 1832
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmRemoveMember:
grock(_ORA_CRS_MEMBER_p690ap0) member(0/111f12250) nodeNum(1) flags(0x12)
type(3)
[    CSSD]2009-05-26 15:29:43.860 >USER:    Copyright 2009, Oracle version
10.2.0.4.0
[    CSSD]2009-05-26 15:29:43.860 >USER:    CSS daemon log for node p690ap0,
number 1, in cluster ODS_CRS
[    CSSD]2009-05-26 15:29:43.873 [1] >TRACE:   clssscmain: local-only set to
false
>>>>>

++ This is strange: CSS respawned at 15:29:43, where a node reboot would
have been expected ++.
According to errpt.log, the reboot did not happen until 15:41, and cssd is
seen starting again at 15:41.
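
The respawn can be located mechanically. The following Python sketch treats the startup banner ("CSS daemon log for node", as seen in the excerpt) as the sign of a new cssd process and reports the gap since the last >TRACE: record before it. Reading the banner as a restart marker is an assumption based on this log alone, and find_cssd_restarts is a hypothetical helper name.

```python
import re
from datetime import datetime

STAMP = re.compile(r"\[\s*CSSD\](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
BANNER = "CSS daemon log for node"  # startup banner text taken from the excerpt

def find_cssd_restarts(lines):
    """Return (last_trace_time, banner_time) for each startup banner seen."""
    restarts = []
    last_trace = None
    for line in lines:
        m = STAMP.search(line)
        if not m:
            continue  # wrapped continuation line or unrelated text
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if BANNER in line:
            if last_trace is not None:
                restarts.append((last_trace, stamp))
        elif ">TRACE:" in line:
            last_trace = stamp
    return restarts

# Lines copied from the CSS log excerpt in this bug.
sample = """\
[    CSSD]2009-05-26 15:29:28.329 [3857] >TRACE:   clssgmRemoveMember:
grock(_ORA_CRS_MEMBER_p690ap0) member(0/111f12250) nodeNum(1) flags(0x12)
type(3)
[    CSSD]2009-05-26 15:29:43.860 >USER:    Copyright 2009, Oracle version
10.2.0.4.0
[    CSSD]2009-05-26 15:29:43.860 >USER:    CSS daemon log for node p690ap0,
number 1, in cluster ODS_CRS
[    CSSD]2009-05-26 15:29:43.873 [1] >TRACE:   clssscmain: local-only set to
false
""".splitlines()

restarts = find_cssd_restarts(sample)
for before, after in restarts:
    print("cssd restarted at", after, "gap", after - before)
```

Run against the full ocssd.log, the same scan should also show the second start at 15:41 that follows the reboot recorded in errpt.log.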


The issues are:
=============
1) Why did CSS report a communication failure? The node statistics in
OSWatcher look healthy.
2) What caused cssd to respawn at 15:29? Normally the node should reboot
when cssd terminates abnormally.

WORKAROUND:
-----------

RELATED BUGS:
-------------

REPRODUCIBILITY:
----------------

TEST CASE:
----------

STACK TRACE:
------------

SUPPORTING INFORMATION:
-----------------------

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------

DIAL-IN INFORMATION:
--------------------

IMPACT DATE:
------------

*** 05/27/09 05:41 am ***
*** 05/27/09 05:52 am *** (CHG: Sta->16)
*** 05/27/09 09:57 am ***
*** 05/27/09 05:36 pm ***
*** 05/27/09 05:37 pm *** (CHG: Sta->10)
*** 05/28/09 08:01 am *** (CHG: Sta->16)
*** 05/28/09 08:01 am ***
*** 05/28/09 09:16 am ***
*** 05/28/09 09:16 am *** (CHG: Sta->10)
*** 05/28/09 01:00 pm *** (CHG: Sta->16)
*** 05/28/09 01:00 pm ***
*** 05/29/09 10:20 am *** (CHG: Sta->10)
*** 05/29/09 10:20 am ***
*** 05/31/09 02:15 am ***
*** 05/31/09 02:17 am ***
*** 05/31/09 02:17 am *** (CHG: Sta->16)
*** 05/31/09 02:20 am ***
*** 06/01/09 05:35 pm ***
*** 06/01/09 05:35 pm *** (CHG: Sta->10)
*** 06/01/09 05:36 pm ***
*** 06/01/09 09:22 pm ***
*** 06/01/09 09:23 pm ***
*** 06/01/09 11:37 pm *** (CHG: Sta->16)
*** 06/01/09 11:37 pm ***
*** 06/01/09 11:39 pm ***
*** 06/04/09 12:37 am ***
*** 06/04/09 12:37 am ***
*** 06/04/09 01:16 am ***
*** 06/04/09 01:34 am *** (CHG: Sta->10)
*** 06/04/09 02:55 am ***
*** 06/04/09 02:58 am *** (CHG: Sta->16)
*** 06/04/09 02:58 am ***
*** 06/04/09 03:01 am ***
*** 06/04/09 03:23 am *** (CHG: Sta->10)
*** 06/04/09 03:23 am ***
*** 06/04/09 03:19 pm ***
*** 06/04/09 03:21 pm ***
*** 06/05/09 06:47 pm ***
*** 07/03/09 05:52 pm *** (CHG: Sta->33)
*** 07/03/09 05:52 pm ***


Source: ITPUB博客, link: http://blog.itpub.net/22123669/viewspace-691193/. If you repost this article, please credit the source; otherwise legal action may be pursued.
