
ORA-29740

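Original  Linux Operating System  Author: gamble_god  Date: 2012-06-12 08:30:50
Yesterday a customer called to report that one node of their database had restarted on its own, and asked us to come on site and analyze the cause.
The customer's database environment is Oracle 11.1.0.7 RAC on Solaris 10.
We checked alert_SID.log. The messages logged before the restart were as follows: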
Sun Jun 10 22:27:59 2012
IPC Send timeout detected.Sender: ospid 2838
Receiver: inst 1 binc -312867816 ospid 2876
IPC Send timeout to 1.0 inc 168 for msg type 8 from opid 11
Sun Jun 10 22:28:01 2012
Communications reconfiguration: instance_number 1
Sun Jun 10 22:28:01 2012
Trace dumping is performing id=[cdmp_20120610222801]
Waiting for clusterware split-brain resolution
Sun Jun 10 22:38:18 2012
Errors in file /u01/oracle/diag/rdbms/oracle/oracle2/trace/oracle2_lmon_2836.trc  (incident=273710):
ORA-29740: evicted by member 1, group incarnation 170
Sun Jun 10 22:42:24 2012
Starting ORACLE instance (normal)
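
To make the timeline in the excerpt above easier to read, the gaps between the timestamped entries can be computed. The following is a minimal Python sketch, assuming the alert log timestamps use the `Sun Jun 10 22:27:59 2012` layout shown here with English day and month names. The helper name `elapsed_seconds` is illustrative and not part of any Oracle tool.

```python
from datetime import datetime

# Timestamp layout as it appears in the alert log excerpt above
# (assumes English day/month names, i.e. the default C locale).
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start: str, end: str) -> int:
    """Return the whole seconds between two alert-log timestamp lines."""
    t0 = datetime.strptime(start, ALERT_TS_FORMAT)
    t1 = datetime.strptime(end, ALERT_TS_FORMAT)
    return int((t1 - t0).total_seconds())


if __name__ == "__main__":
    timeout = "Sun Jun 10 22:27:59 2012"  # IPC Send timeout detected
    evicted = "Sun Jun 10 22:38:18 2012"  # ORA-29740 reported
    restart = "Sun Jun 10 22:42:24 2012"  # instance starts again
    print(elapsed_seconds(timeout, evicted))  # timeout -> eviction
    print(elapsed_seconds(evicted, restart))  # eviction -> restart
```

For the excerpt above this gives 619 seconds (a little over 10 minutes) between the IPC Send timeout and the eviction, and 246 seconds between the eviction and the instance restart.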

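The first thing that stood out was "Waiting for clusterware split-brain resolution", which made us suspect that a failure of the interconnect (heartbeat) network had caused the node to restart. We checked the network configuration: the interconnect NICs on both nodes are connected to a switch, and the switch had logged no port up/down events.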

Searching MOS for the ORA-29740 error turned up a note that closely matches our situation:
ORA-29740 Instance (ASM/DB) eviction on Solaris SPARC [ID 761717.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.1.0.7 - Release: 11.1 to 11.1
Oracle Solaris on SPARC (64-bit)
***Checked for relevance on 24-Sep-2010***
Sun Solaris SPARC (64-bit)
DB or ASM instance using the option RAC.

Symptoms

o An instance is evicted from the cluster reporting the error:

Errors in file /u01/app/oracle/diag/asm/+asm/+ASM3/trace/+ASM3_lmon_27256.trc
(incident=8057):
ORA-29740: evicted by member 0, group incarnation 70

o The eviction was due to an IPC Send timeout reported in the alert log files for the instances:

IPC Send timeout detected.Sender: ospid 27268
Receiver: inst 1 binc 528748320 ospid 8728

IPC Send timeout detected. Receiver ospid 8728

o The Sender process reveals:

1. The process waited a few seconds (instead of 5 min):

GSIPC:KSXPCB: msg 0x3b0ad9218 status 32, type 43, dest 0, rcvr 1
GSIPC:KSXPCB: msg 0x3b0ad9218 send timed out inc 68 waited 2181325 usec
GSIPC:KSXPCB: dest_inc 68  sys_inc 68

- in this case the Sender waited 2 secs and declared the Send timeout.

2. The network segment was retransmitted only one time (CNT=1), but normally a process should resend the message several times before declaring an IPC Send timeout:

SKGXPCTX: 0x109a6b380 ctx
..
   sconno     accono   ertt  state   seq#   RcvPid   TotCredits  sent rtrans acks
..
seq=30893 len=200 accno=0x2aeb82ec start TS=0xa3c51d64 rt TS=0xa3c51d84 X CNT=1

- note that in some cases CNT can be 3 or 4.
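
The two symptoms above (a short wait and a low retransmit count) can be pulled out of the sender's trace with a small script. This is a rough Python sketch, assuming the trace lines look exactly like the samples quoted in the note. The regular expressions and the function names `send_timeout_wait_seconds` and `retransmit_count` are illustrative assumptions, not an Oracle-provided format definition or API.

```python
import re

# Patterns modelled on the sample trace lines quoted above (assumed layout).
WAITED_RE = re.compile(r"send timed out inc (\d+) waited (\d+) usec")
CNT_RE = re.compile(r"\bCNT=(\d+)")


def send_timeout_wait_seconds(line: str):
    """Return the wait before the send timeout in seconds, or None."""
    match = WAITED_RE.search(line)
    if match is None:
        return None
    return int(match.group(2)) / 1_000_000  # microseconds -> seconds


def retransmit_count(line: str):
    """Return the CNT value from a segment line, or None if absent."""
    match = CNT_RE.search(line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    gsipc = ("GSIPC:KSXPCB: msg 0x3b0ad9218 send timed out "
             "inc 68 waited 2181325 usec")
    segment = ("seq=30893 len=200 accno=0x2aeb82ec start TS=0xa3c51d64 "
               "rt TS=0xa3c51d84 X CNT=1")
    print(send_timeout_wait_seconds(gsipc))  # about 2.18 seconds
    print(retransmit_count(segment))         # 1
```

A wait of roughly two seconds combined with CNT=1 matches the premature timeout pattern the note describes, as opposed to the 5 minute wait it mentions as expected.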

Changes

Most of the reported cases have occurred in ASM environments on the Solaris SPARC platform running 11.1.0.6.0 and 11.1.0.7.0. However, the code where the error is produced is the same for ASM and DB instances running on RAC, so the problem is not specific to ASM.

Cause

Development has found that during code optimization the original statement that verifies that the network communication has not expired was changed, producing a premature IPC Send timeout.

Solution

Apply the patch for the bug:

BUG 7653579 - ASM INSTANCE EVICTED WITH ORA-29740

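We then went back through alert_SID.log and found 13 ORA-29740 errors that had caused node 2 to restart. At this point we could conclude that the node restarts were caused by an Oracle bug.
We downloaded the corresponding patch and applied it with opatch.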
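
The recheck of the alert log described above (counting ORA-29740 evictions) can be scripted rather than done by eye. Below is a minimal Python sketch, assuming the eviction lines follow the `ORA-29740: evicted by member N, group incarnation M` form seen in the excerpts. The function name `find_evictions` is illustrative, and the alert log path is taken from the command line because the real path depends on the installation.

```python
import re
import sys
from collections import Counter

# Pattern modelled on the ORA-29740 lines quoted in this post (assumed layout).
EVICTION_RE = re.compile(
    r"ORA-29740: evicted by member (\d+), group incarnation (\d+)"
)


def find_evictions(lines):
    """Return a (member, incarnation) tuple for each ORA-29740 line."""
    hits = []
    for line in lines:
        match = EVICTION_RE.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: count_evictions.py <path to alert log>")
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(sys.argv[1], errors="replace") as log:
        evictions = find_evictions(log)
    print(f"{len(evictions)} ORA-29740 evictions found")
    for member, count in Counter(m for m, _ in evictions).items():
        print(f"  evicted by member {member}: {count}")
```

Run against the alert log of the affected instance, the total printed on the first line is the figure to compare with the manual count.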

Source: "ITPUB Blog", link: http://blog.itpub.net/12366929/viewspace-732515/. If you repost this article, please credit the source; otherwise legal action will be pursued.
