
Instance Restart Caused by Cleared IP Addresses

Original  Network Security  Author: yangtingkun  Date: 2013-07-19 23:03:22

A customer's 10.2.0.4 RAC environment on Solaris 10 suddenly experienced instance restarts.


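The database ran normally until about 3 p.m., after which the two nodes restarted one after the other, and the instance on one of them could not start automatically. The alert logs of both instances show that, before the restarts, both nodes reported obvious ORA-27504 errors: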

Wed Apr 10 15:00:05 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_10997.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11007.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11009.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11011.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
.
.
.
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25688
Receiver: inst 2 binc 427282 ospid 11838
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25724
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25680
Receiver: inst 2 binc 431591 ospid 11822
Receiver: inst 2 binc 431795 ospid 11874
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25684
Receiver: inst 2 binc 428985 ospid 11826
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25708
Receiver: inst 2 binc 430048 ospid 11858
Wed Apr 10 15:07:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.1 inc 4 for msg type 44 from opid 7
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.12 inc 4 for msg type 44 from opid 21
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.2 inc 4 for msg type 44 from opid 8
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.3 inc 4 for msg type 44 from opid 10
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.8 inc 4 for msg type 44 from opid 15
Wed Apr 10 15:08:13 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:08:16 2013
IPC Send timeout detected.Sender: ospid 25748
Receiver: inst 2 binc 430164 ospid 11890
.
.
.
Wed Apr 10 15:08:53 2013
IPC Send timeout to 1.13 inc 4 for msg type 36 from opid 176
Wed Apr 10 15:08:53 2013
IPC Send timeout to 1.15 inc 4 for msg type 36 from opid 167
Wed Apr 10 15:08:57 2013
IPC Send timeout to 1.4 inc 4 for msg type 32 from opid 180
.
.
.
Wed Apr 10 15:15:51 2013
Evicting instance 2 from cluster
Wed Apr 10 15:16:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:16:40 2013
Waiting for instances to leave:
2
Wed Apr 10 15:17:00 2013
Waiting for instances to leave:
2
Wed Apr 10 15:17:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:17:20 2013
Waiting for instances to leave:
2

The error messages on node 2 are similar:

.
.
.
Wed Apr 10 15:19:07 2013
Errors in file /oracle/admin/orcl/udump/orcl2_ora_14065.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:19:08 2013
Errors in file /oracle/admin/orcl/udump/orcl2_ora_14057.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:19:46 2013
ospid 11820: network interface with IP address 192.168.168.4 no longer operational
requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:20:46 2013
ospid 11820: network interface with IP address 192.168.168.4 no longer operational
requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_lmon_11818.trc:
ORA-29740: evicted by member 0, group incarnation 6
Wed Apr 10 15:20:55 2013
LMON: terminating instance due to error 29740
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_smon_11924.trc:
ORA-29740: evicted by member , group incarnation
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_lmse_11886.trc:
ORA-29740: evicted by member , group incarnation
Wed Apr 10 16:11:37 2013
Starting ORACLE instance (normal)
Wed Apr 10 16:11:45 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:45 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:45 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:11:45 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:11:50 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:50 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:50 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:11:50 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:11:54 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:54 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:54 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:11:54 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:29 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:29 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:29 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:29 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:47 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:47 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:47 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:47 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:52 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:52 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:52 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:52 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:56 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:56 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:56 2013
Oracle Instance Startup operation failed. Another process may be attempting to startup or shutdown this Instance.
Wed Apr 10 16:12:56 2013
Failed to acquire instance startup/shutdown serialization primitive

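The cause is easy to work out from the error messages: the IP address on node 2 was changed, which broke the interconnect heartbeat. Node 1 tried to evict node 2 from the cluster, but because it could not communicate with node 2, it could only wait for node 2 to restart.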
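The reasoning above amounts to reading a timeline out of the alert log: when an interface was first reported missing, and when the eviction started. The following Python sketch shows one way to extract that timeline. The function name `summarize` and the regular expressions are illustrative assumptions, not part of any Oracle tool, and the timestamp parsing assumes an English locale.

```python
import re
from datetime import datetime

# Alert-log timestamp lines look like "Wed Apr 10 15:00:05 2013".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
MISSING_RE = re.compile(r"requested interface (\d+\.\d+\.\d+\.\d+) not found")
EVICT_RE = re.compile(r"Evicting instance (\d+) from cluster")


def summarize(alert_log):
    """Return the first time each interface was reported missing,
    plus (instance, time) pairs for every eviction message."""
    first_missing = {}
    evictions = []
    current = None  # timestamp of the log entry being read
    for raw in alert_log.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            # %a and %b are locale dependent; an English locale is assumed.
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        if current is None:
            continue  # ignore text that precedes the first timestamp
        m = MISSING_RE.search(line)
        if m:
            first_missing.setdefault(m.group(1), current)
        m = EVICT_RE.search(line)
        if m:
            evictions.append((int(m.group(1)), current))
    return {"first_missing": first_missing, "evictions": evictions}


# A few lines copied from the node 1 alert log shown earlier.
sample = """\
Wed Apr 10 15:00:05 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_10997.trc:
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:15:51 2013
Evicting instance 2 from cluster
"""

result = summarize(sample)
print(result["first_missing"]["192.168.168.3"])  # 2013-04-10 15:00:05
print(result["evictions"])
```

Running the same function over both nodes' alert logs gives the moment each interconnect address disappeared, which can then be compared with the operating system log.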

Checking the operating system log on node 2:

Apr 10 15:00:04 bj-sst-xhm-3f2-m5k-02 ip: [ID 482227 kern.notice] ip_arp_done: init failed
Apr 10 15:07:37 bj-sst-xhm-3f2-m5k-02 Had[4135]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 CPU usage on bj-sst-xhm-3f2-m5k-02 is 92%
Apr 10 15:18:41 bj-sst-xhm-3f2-m5k-02 sshd[13485]: [ID 800047 auth.error] error: Failed to allocate internet-domain X11 display socket.

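The ip_arp_done: init failed message at 15:00:04 indicates that the network interface was configured using a host name and that the host's IP address was changed online.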

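Finally, the shell history confirmed what happened: someone had logged in as root and meant to run ifconfig -a6 to check the IPv6 addresses, but mistyped the command as ifconfig -a 6, with an extra space between a and 6. This set all of the host's IP addresses to 0.0.0.0, which caused the errors above.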
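The difference between the two commands is only in how the shell splits them into arguments. The Python sketch below uses `shlex.split` to show that split, and adds a deliberately conservative classifier that a wrapper script could consult before running ifconfig as root. The function `classify_ifconfig` and its rules are assumptions made for illustration. They do not model the full ifconfig grammar and may flag some harmless commands as modifying.

```python
import shlex


def classify_ifconfig(command):
    """Classify an ifconfig command line as 'read-only' or 'modifying'.

    Rough heuristic for illustration only, not a model of ifconfig itself.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "ifconfig":
        raise ValueError("not an ifconfig command")
    args = tokens[1:]
    options = [a for a in args if a.startswith("-")]
    operands = [a for a in args if not a.startswith("-")]
    if any(o.startswith("-a") for o in options):
        # With an -a style option there is no interface name, so any
        # remaining operand is treated as a setting applied to all interfaces.
        return "modifying" if operands else "read-only"
    # Otherwise the first operand is the interface name; anything after it
    # is treated as a configuration change.
    return "modifying" if len(operands) > 1 else "read-only"


for cmd in ("ifconfig -a6", "ifconfig -a 6"):
    print(cmd, "->", shlex.split(cmd), classify_ifconfig(cmd))
```

In `ifconfig -a6` the 6 is part of the option, while in `ifconfig -a 6` it becomes a separate operand, which is why the second form is classified as modifying.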

This shows once again that for a privileged user such as root, any carelessness can have very serious consequences.
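The alert log itself suggests the first check: "Check output from ifconfig command". As a minimal sketch of automating that check, the Python function below takes captured `ifconfig -a` output and reports which expected addresses are no longer configured. The sample output, the interface name `nic1`, and the function name `missing_addresses` are all hypothetical. Only the address 192.168.168.3 comes from the logs above.

```python
import re

# Matches the IPv4 address on lines such as "        inet 192.168.168.3 netmask ...".
INET_RE = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)", re.MULTILINE)


def missing_addresses(ifconfig_output, expected):
    """Return the expected IPv4 addresses that do not appear in the output."""
    present = set(INET_RE.findall(ifconfig_output))
    return [ip for ip in expected if ip not in present]


# Illustrative output only; the interface name and flags are made up.
healthy = """\
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
nic1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 192.168.168.3 netmask ffffff00 broadcast 192.168.168.255
"""
# Simulate the incident: the address has been reset to 0.0.0.0.
broken = healthy.replace("inet 192.168.168.3", "inet 0.0.0.0")

print(missing_addresses(healthy, ["192.168.168.3"]))  # []
print(missing_addresses(broken, ["192.168.168.3"]))   # ['192.168.168.3']
```

A check like this, run periodically against the interconnect addresses, would flag a cleared address within minutes instead of leaving it to be found in the alert log afterwards.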

Source: ITPUB Blog, link: http://blog.itpub.net/4227/viewspace-1060787/. If you repost this article, please credit the source; otherwise legal action may be taken.
