RAC PMON Crash: A Troubleshooting Case

Original post | Linux Operating System | Author: flying_warrior | Date: 2011-04-22 23:55:26
Node 2 of our RAC crashed. The alert log contained the following entry:

ORA-07445: exception encountered: core dump [kssdch()+2188] [SIGSEGV] [Address not mapped to object] [0x00008239D] [] []
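When the same error shows up in several logs, it helps to pull the failing function, offset, signal and address out of the ORA-07445 line so they can be compared field by field. Below is a minimal Python sketch of that idea. The regular expression and the names `CoreDump` and `parse_ora_07445` are my own assumptions for illustration, not part of any Oracle tool, and the pattern only covers the line layout shown in this post.

```python
import re
from typing import NamedTuple, Optional


class CoreDump(NamedTuple):
    function: str   # e.g. "kssdch"
    offset: int     # e.g. 2188
    signal: str     # e.g. "SIGSEGV"
    reason: str     # e.g. "Address not mapped to object"
    address: str    # e.g. "0x00008239D"


# \s+ between the bracketed fields also tolerates lines wrapped by the log viewer.
_ORA_07445 = re.compile(
    r"ORA-0?7445: exception encountered: core dump\s+"
    r"\[(\w+)\(\)\+(\d+)\]\s+"
    r"\[(\w+)\]\s+"
    r"\[([^\]]*)\]\s+"
    r"\[(0x[0-9A-Fa-f]+)\]"
)


def parse_ora_07445(text: str) -> Optional[CoreDump]:
    """Return the fields of the first ORA-07445 entry in text, or None."""
    match = _ORA_07445.search(text)
    if match is None:
        return None
    function, offset, sig, reason, address = match.groups()
    return CoreDump(function, int(offset), sig, reason, address)
```

Comparing `function` and `offset` between two incidents is a quick first check; the faulting address usually differs from crash to crash and is less useful for matching.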


A related error can be caused by a data dictionary bug triggered by PL/SQL Developer. That problem no longer exists after V5, however, and we are running PL/SQL Developer V8.

(k2g table)

error 602 detected in background process

ORA-00602: internal programming exception

ORA-07445: exception encountered: core dump [kssdch()+2188] [SIGSEGV] [Address not mapped to object] [0x00008239D] [] []

 

This problem really can bring the instance down, and the bug was only fixed in 11g.

 

SIGSEGV
Typically, the signals seen are SIGBUS (signal 10, bus error) and SIGSEGV (signal 11, segmentation violation).  There are other UNIX signals and exceptions that may happen, however, they are likely caused by OS problems rather than an Oracle problem.  Examples of other signals are: SIGINT, SIGKILL, SIGSYS.  A complete list is available in Note:1038055.6.

Explanation of the error

SIGSEGV

        Segmentation violation.  This signal can also result from an illegal pointer reference or an array bound error.
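The signal numbers quoted in the note are platform-specific. On Linux x86-64, for example, SIGSEGV is 11 but SIGBUS is 7 rather than 10. A small Python sketch for checking what the local platform uses (the helper name `signal_number` is my own, introduced only for illustration):

```python
import signal
from typing import Optional


def signal_number(name: str) -> Optional[int]:
    """Return the number this platform assigns to a signal name, or None if it has none."""
    sig = getattr(signal, name, None)
    if isinstance(sig, signal.Signals):
        return int(sig)
    return None


# Print the two signals the note calls typical for ORA-07445.
for name in ("SIGBUS", "SIGSEGV"):
    print(name, signal_number(name))
```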

 

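So this still looks like a software error. The note blames the OS only for the other signals, not for SIGSEGV. Forum posts also mention flushing the shared pool (flush shared_pool) as a way to resolve the problem.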

 

Below is a bug report; I have excerpted the key parts.

 

When attempting to cleanup after a SQL*Net connection is terminated, the following error occurs:

 

ORA-07445: exception encountered: core dump [kssdct()+94] [SIGSEGV] [Address not mapped to object] [0x00000240E] [] []

 

and then the instance is terminated, due to PMON reporting the below errors:

 

ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kssdch()+2188] [SIGSEGV] [Address not mapped to object] [0x00000241E] [] []

 


  Oracle 10.2.0.5 on Linux x86-64.
  8 node RAC database
  Intermittent instance failures on one node. So far, two failures.

 

This sequence of events looks much like our crash:
  ORA-602: internal programming exception
  ORA-7445: exception encountered: core dump [kssdch()+2188] [SIGSEGV]
  [Address not mapped to object] [0x0000708FB] [] []
  Thu Nov 18 01:02:04 GMT 2010
  PMON: terminating instance due to error 602
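To find every point where PMON brought the instance down, the alert log can be scanned for the termination message above. This is a rough Python sketch. The function name `pmon_termination_errors` is an assumption of mine, and the pattern matches only the exact wording shown in this excerpt.

```python
import re
from typing import List

_PMON_TERMINATION = re.compile(r"PMON: terminating instance due to error (\d+)")


def pmon_termination_errors(alert_log_text: str) -> List[int]:
    """Return the error number of each PMON-initiated instance termination, in log order."""
    return [int(code) for code in _PMON_TERMINATION.findall(alert_log_text)]
```

Fed the excerpt above, the function returns a single entry for error 602, which is the ORA-00602 reported just before the termination.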

 

This problem is tied to unpublished bug 9184754.

I could not look up its contents.

 

 

 

==================

 

In any case, the problem has been fixed. Oracle's recommendation is as follows.

Download and apply one-off Patch:9184754 on top of your version/platform combination, if available.

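I compared the call stacks and they match exactly. Make sure your own call stack matches as well; if it does not, do not jump to the conclusion that this is the same bug.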

Call  stack :  kssdct() <- kwqbcsecl() <- ksuxds() <- ksudel() <- opidcl() ... 
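Comparing the stacks by eye is error-prone, so here is a minimal Python sketch of the check. It assumes the stack is written in the `a() <- b() <- ...` form used above. The helper names `parse_call_stack` and `stacks_match` are mine and are not taken from any Oracle utility.

```python
from typing import List


def parse_call_stack(stack: str) -> List[str]:
    """Split 'a() <- b() <- ...' into bare function names, dropping a trailing '...'."""
    frames = []
    for part in stack.split("<-"):
        name = part.strip()
        if name.endswith("..."):
            name = name[:-3].strip()
        if name.endswith("()"):
            name = name[:-2]
        if name:
            frames.append(name)
    return frames


def stacks_match(reference: str, observed: str) -> bool:
    """True when the observed stack begins with every frame of the reference, in order."""
    ref = parse_call_stack(reference)
    obs = parse_call_stack(observed)
    return len(ref) > 0 and obs[:len(ref)] == ref
```

The reference stack from the note ends in `...`, so the check treats it as a prefix. An observed stack may continue past `opidcl`, but a missing or reordered frame in between counts as a mismatch.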

After the patch was applied, the problem was resolved.

 

For details, see Doc ID 1281101.1.

 

 


 

 

Below is some other material I looked up on my own; treat it as study notes.

==============================

 

Disable RAC

3.  Change the working directory to $ORACLE_HOME/lib:

cd $ORACLE_HOME/lib

4.  Run the following make command to relink the Oracle binaries without the RAC option:

make -f ins_rdbms.mk rac_off

 

make -f ins_rdbms.mk ioracle

 

==========================

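There are three reasons a RAC node can fail. First, the node leaves the cluster normally. Second, the node's heartbeat dies (the heartbeat is recorded in the controlfile). Third, communication between nodes is interrupted.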

 

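RAC uses UDP for interconnect communication by default, because UDP carries less overhead than TCP/IP and does not need a three-way handshake, and the private interconnect rarely drops packets.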

 

Causes of communication interruption

If a message is not received for a timeout period, then a “communication failure” is assumed. This is more relevant for UDP, as Reliable Shared Memory (RSM), Reliable DataGram protocol (RDG), and Hyper Messaging Protocol (HMP) do not need it, since the acknowledgment mechanisms are built into the cluster communication and protocol itself.
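The timeout rule is easy to picture with plain UDP sockets. The Python sketch below is only an illustration on the loopback interface and has nothing to do with Oracle's actual IPC code. The names `receive_or_fail` and `demo`, the buffer size and the timeout values are all assumptions of mine.

```python
import socket
from typing import Optional, Tuple


def receive_or_fail(sock: socket.socket, timeout_s: float) -> Optional[bytes]:
    """Wait up to timeout_s for one datagram; None stands for 'communication failure'."""
    sock.settimeout(timeout_s)
    try:
        data, _addr = sock.recvfrom(4096)
        return data
    except socket.timeout:
        return None


def demo() -> Tuple[Optional[bytes], Optional[bytes]]:
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        # Nothing has been sent yet, so this wait runs into the timeout.
        silent = receive_or_fail(receiver, 0.2)
        # UDP needs no handshake: sendto works without any prior connection setup.
        sender.sendto(b"heartbeat", receiver.getsockname())
        heard = receive_or_fail(receiver, 1.0)
        return silent, heard
    finally:
        receiver.close()
        sender.close()
```

UDP itself never reports that a datagram was lost, so the receiver's timer is the only way to notice silence. That is why the quoted text says the timeout matters most for UDP.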

 

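UDP is for the most part an unreliable protocol. If packets are being dropped, it may be possible to work around the protocol, for example by setting _reliable_block_sends=TRUE. That might make the traffic go over TCP, but I do not know yet.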

user-mode IPC protocols such as RDG (on HP Tru64 UNIX TruCluster) or HP HMP are used,

 

 

_lgwr_async_broadcasts = true: this parameter controls whether asynchronous broadcasts are allowed.

In 9i, every COMMIT required all nodes to write redo.

Source: ITPUB Blog, link: http://blog.itpub.net/21818314/viewspace-693195/. If you repost this article, please credit the source; otherwise legal action will be pursued.
