With only one voting disk, a bug can crash CSSD when it cannot find the voting disk, leaving the DB unable to communicate with ASM and finally shutting down the instance

Original post. Category: Network Security. Author: yeahokay. Date: 2012-08-15 14:28:38

CSSD aborting from thread clssnmvDiskPingMonitorThread

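When only one voting disk is configured, a bug can be triggered: the CSSD process crashes because it cannot find the voting disk, the DB then loses communication with ASM, and the instance finally shuts down.

This is an Oracle bug.

What triggers the bug is unknown.

The voting disk is stored in ASM, and at installation time only one was created (external redundancy was chosen). With a single voting disk, the bug can make the CSSD process (ocssd) lose communication with the voting disk. CSSD then terminates, which in turn breaks communication between ASM and the DB, so the DB goes down.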
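The failure mode follows from the majority rule for voting disks: a node must be able to access more than half of the configured voting disks, so with a single disk there is no tolerance at all. A minimal sketch of that arithmetic is below; the function names `votes_needed` and `tolerated_failures` are hypothetical helpers for illustration, not part of any Oracle tool.

```python
def votes_needed(configured: int) -> int:
    """Strict majority of the configured voting disks."""
    if configured < 1:
        raise ValueError("at least one voting disk must be configured")
    return configured // 2 + 1


def tolerated_failures(configured: int) -> int:
    """How many voting disks may become unreachable before CSSD must abort."""
    return configured - votes_needed(configured)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"configured={n} need={votes_needed(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

With one configured disk the threshold is 1 and the tolerance is 0, which matches the "0 of 1 configured voting disks available, need 1" message in the log.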

Related documents:
ASM Crashed Due to CSS Crash With Voting File Checks [ID 1468826.1]
11.2.0.3 Node Reboot With "CSSD aborting from thread clssnmvDiskPingMonitorThread" if Only one Voting Disk/File is Configured [ID 1466639.1]


Error logs:

ocssd_node1.log
----------------------
2012-08-13 03:17:42.765: [ CSSD][1109031232]clssnmSendingThread: sent 5 status msgs to all nodes
2012-08-13 03:17:47.766: [ CSSD][1109031232]clssnmSendingThread: sending status msg to all nodes
2012-08-13 03:17:47.766: [ CSSD][1109031232]clssnmSendingThread: sent 5 status msgs to all nodes
2012-08-13 03:17:49.983: [ CSSD][1091426624](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2012-08-13 03:17:49.984: [ CSSD][1091426624]###################################
2012-08-13 03:17:49.984: [ CSSD][1091426624]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2012-08-13 03:17:49.984: [ CSSD][1091426624]###################################
2012-08-13 03:17:49.984: [ CSSD][1091426624](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-08-13 03:17:49.984: [ CSSD][1091426624]
2012-08-13 03:17:49.984: [ CSSD][1091426624]calling call entry argument values in hex
2012-08-13 03:17:49.984: [ CSSD][1091426624]location type point (? means dubious value)
2012-08-13 03:17:49.984: [ CSSD][1091426624]-------------------- -------- -------------------- ----------------------------
2012-08-13 03:17:49.989: [ CSSD][1091426624]clssscExit()+740 call kgdsdst() 000000000 ? 000000000 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 0410D8568 ? 000000001 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 000000001 ? 000000003 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]clssnmvDiskCheck()+ call clssscExit() 7FC21424A8A0 ? 000000002 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]3356 0410D8568 ? 000000001 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 000000001 ? 000000003 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]clssnmvDiskPingMoni call clssnmvDiskCheck() 7FC21424A8A0 ? 7FC2140A3C40 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]torThread()+423 0410DD0B8 ? 000000000 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 000000001 ? 000000003 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]clssscthrdmain()+25 call clssnmvDiskPingMoni 7FC21424A8A0 ? 7FC2140A3C40 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]3 torThread() 0410DD0B8 ? 000000000 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 000000001 ? 000000003 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]start_thread()+221 call clssscthrdmain() 7FC21424A8A0 ? 7FC2140A3C40 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 7FC2140A3C40 ? 000000000 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 000000001 ? 000000003 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]clone()+109 call start_thread() 0410DD940 ? 7FC2140A3C40 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 7FC2140A3C40 ? 000000000 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 000000001 ? 000000003 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624]0000000000000000 call clone() 0410DD940 ? 7FC2140A3C40 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 7FC2140A3C40 ? 000000000 ?
2012-08-13 03:17:49.990: [ CSSD][1091426624] 000000001 ? 000000003 ?

...

ocssd_node2.log
----------------------
2012-08-13 03:17:47.716: [ CSSD][1110337856]clssnmSendingThread: sending status msg to all nodes
2012-08-13 03:17:47.717: [ CSSD][1110337856]clssnmSendingThread: sent 5 status msgs to all nodes
2012-08-13 03:17:50.011: [ CSSD][1113491776]clssnmHandleMeltdownStatus: node node1, number 1, has experienced a failure in thread number 9 and is shutting down
2012-08-13 03:17:52.336: [GIPCHAUP][1102911808] gipchaUpperProcessDisconnect: processing DISCONNECT for hendp 0x7f86184a7b40 [0000000000000845] { gipchaEndpoint : port 'gm2_crs/f483-4e28-b94b-8942', peer 'node1:9a12-5b0a-0d30-d102', srcCid 00000000-00000845, dstCid 00000000-00007cb4, numSend 0, maxSend 100, groupListType 1, hagroup 0x7f86100468a0, usrFlags 0x4000, flags 0x204 }
2012-08-13 03:17:52.336: [ CSSD][1113491776]clssnmHandleManualShut: Manual shutdown of node nodename node1 nodenum 1
2012-08-13 03:17:52.337: [ CSSD][1113491776]clssnmMarkNodeForRemoval: node 1, node1 marked for removal
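To find this signature quickly in a large ocssd.log, the abort line can be matched with a regular expression. The sketch below is only an illustration built from the log format shown above; `find_voting_disk_aborts` and `SAMPLE` are hypothetical names, and the pattern assumes the `(:CSSNM00018:)` message keeps the wording seen in this excerpt.

```python
import re

# Pattern derived from the CSSNM00018 line in the excerpt above.
ABORT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*CSSD\]\[\d+\]\(:CSSNM00018:\)clssnmvDiskCheck: Aborting, "
    r"(?P<available>\d+) of (?P<configured>\d+) configured voting disks "
    r"available, need (?P<need>\d+)"
)


def find_voting_disk_aborts(lines):
    """Return one dict per CSSNM00018 abort line found in `lines`."""
    hits = []
    for line in lines:
        m = ABORT_RE.match(line.strip())
        if m:
            hits.append({
                "timestamp": m["ts"],
                "available": int(m["available"]),
                "configured": int(m["configured"]),
                "need": int(m["need"]),
            })
    return hits


# Two lines copied from the ocssd_node1.log excerpt above.
SAMPLE = [
    "2012-08-13 03:17:47.766: [ CSSD][1109031232]clssnmSendingThread: "
    "sent 5 status msgs to all nodes",
    "2012-08-13 03:17:49.983: [ CSSD][1091426624](:CSSNM00018:)"
    "clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, "
    "need 1",
]

if __name__ == "__main__":
    for hit in find_voting_disk_aborts(SAMPLE):
        print(hit)
```

For a real log, the same function can be fed an open file object, for example `find_voting_disk_aborts(open(path))`, where `path` points at the node's ocssd.log.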

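Solutions:

Method 1: add voting disks so that there are always 3 to 5 of them
or
Method 2: apply the patch. Oracle published this bug on its website on June 18, 2012, and on July 19, 2012 further announced that the latest 11.2.0.3.2 and 11.2.0.3.3 are also affected, with the fix planned for 12.1.
Finally, on August 3, a patch <13869978> of about 350 MB was released. The patch is bundled with the latest 11.2.0.3.2 or 11.2.0.3.3, which means the PSU is applied along with it.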

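The current environment uses ASM external redundancy:

Note:
ASM has three redundancy modes
1. External redundancy: only one copy of the data is kept, and data safety depends entirely on external RAID redundancy.
2. Normal redundancy: two copies of the data are kept, so usable space is cut in half.
3. High redundancy: three copies of the data are kept, so usable space is only one third.

In addition, no voting disk can be added under this redundancy mode, because the number of voting disks kept in ASM is determined by the redundancy policy described above. The only option here is therefore to apply the patch to avoid the bug.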
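The relationship between redundancy mode, data copies, and usable space can be written down as a small lookup. This is a sketch only: the copy counts come from the note above, while the `voting_files` values (1, 3, 5) are an assumption based on Oracle's 11.2 documentation as I understand it, not something stated in this post. `REDUNDANCY` and `usable_fraction` are hypothetical names.

```python
from fractions import Fraction

# copies: from the note above.
# voting_files: assumed from Oracle 11.2 documentation, not from this post.
REDUNDANCY = {
    "external": {"copies": 1, "voting_files": 1},
    "normal": {"copies": 2, "voting_files": 3},
    "high": {"copies": 3, "voting_files": 5},
}


def usable_fraction(level: str) -> Fraction:
    """Fraction of raw disk group capacity left for data at this level."""
    return Fraction(1, REDUNDANCY[level]["copies"])


if __name__ == "__main__":
    for level, info in REDUNDANCY.items():
        print(f"{level}: copies={info['copies']} "
              f"usable={usable_fraction(level)} "
              f"voting_files={info['voting_files']}")
```

Under that assumption, getting more than one voting disk means placing the voting files in a normal or high redundancy disk group, which is why Method 1 is not available in an external-redundancy setup.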

Source: "ITPUB Blog", link: http://blog.itpub.net/786540/viewspace-1059183/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
