AIX RAC 9i: Heartbeat Cable Disconnection Test (Continued)

Original post | Linux operating systems | Author: westzq1984 | Date: 2009-05-13 12:15:09

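Yesterday's test showed that when the heartbeat cable is disconnected, the cluster evicts node 2 and shuts down the database on it, but HACMP itself keeps running. I also forgot to check whether network adapter takeover happened.
Today I tested adapter takeover and client access.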

[oracle@P61A:/u01/app/oracle]$ifconfig -a
en0: flags=4e080863,80
        inet 12.0.0.61 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
        inet 10.10.1.61 netmask 0xffffff00 broadcast 10.10.1.255
        inet 10.10.3.201 netmask 0xffffff00 broadcast 10.10.3.255
        inet 10.10.3.101 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
[oracle@P61A:/u01/app/oracle]$rsh P61B ifconfig -a
en0: flags=4e080863,80
        inet 12.0.0.62 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
        inet 10.10.1.62 netmask 0xffffff00 broadcast 10.10.1.255
        inet 10.10.3.202 netmask 0xffffff00 broadcast 10.10.3.255
        inet 10.10.3.102 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
        
        
21:59:45 Unplugged the heartbeat cable on P61B
[oracle@P61A:/u01/app/oracle]$date
Tue May 12 21:59:48 CDT 2009
 
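Remote connections to instances 1 and 2 still worked, and some views (but not all) could be queried. Otherwise the system hung: nothing that modifies data or the data dictionary could run.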

P61B
Tue May 12 22:04:55 2009
IPC Send timeout detected. Sender ospid 340088
Tue May 12 22:05:20 2009
IPC Send timeout detected. Sender ospid 385220
Tue May 12 22:05:27 2009
Communications reconfiguration: instance 0
Tue May 12 22:05:32 2009
IPC Send timeout detected. Sender ospid 364746
Tue May 12 22:05:32 2009
IPC Send timeout detected. Sender ospid 254204
Tue May 12 22:05:57 2009
Trace dumping is performing id=[cdmp_20090512220527]
Tue May 12 22:06:00 2009
IPC Send timeout detected. Sender ospid 389318
Tue May 12 22:06:23 2009
Waiting for clusterware split-brain resolution
Tue May 12 22:06:52 2009
IPC Send timeout detected. Sender ospid 356598
Tue May 12 22:07:54 2009
Trace dumping is performing id=[cdmp_20090512220724]
Tue May 12 22:10:34 2009
IPC Send timeout detected. Sender ospid 327754
Tue May 12 22:16:23 2009
Errors in file /u01/app/oracle/admin/rac/bdump/rac2_lmon_356598.trc:
ORA-29740: evicted by member 1, group incarnation 3
Tue May 12 22:16:23 2009
LMON: terminating instance due to error 29740
Instance terminated by LMON, pid = 356598

P61A
Tue May 12 22:05:11 2009
IPC Send timeout detected. Sender ospid 450792
Tue May 12 22:05:23 2009
IPC Send timeout detected. Sender ospid 233720
Tue May 12 22:05:23 2009
IPC Send timeout detected. Sender ospid 266430
Tue May 12 22:05:42 2009
IPC Send timeout detected. Sender ospid 225300
Communications reconfiguration: instance 1
Waiting for clusterware split-brain resolution
Tue May 12 22:06:44 2009
Trace dumping is performing id=[cdmp_20090512220614]
Tue May 12 22:16:13 2009
Evicting instance 2 from cluster
Tue May 12 22:16:19 2009
Reconfiguration started (old inc 2, new inc 4)
List of nodes:
 0
 Nested/batched reconfiguration detected.
 Global Resource Directory frozen
one node partition
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 Resources and enqueues cleaned out
 Resources remastered 699
 745 GCS shadows traversed, 0 cancelled, 0 closed
 304 GCS resources traversed, 0 cancelled
 set master node info
 Submitted all remote-enqueue requests
 Update rdomain variables
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 745 GCS shadows traversed, 0 replayed, 0 unopened
 Submitted all GCS remote-cache requests
 24 write requests issued in 745 GCS resources
 6 PIs marked suspect, 0 flush PI msgs
Tue May 12 22:16:19 2009
Reconfiguration complete
 Post SMON to start 1st pass IR
Tue May 12 22:16:19 2009
Instance recovery: looking for dead threads
Tue May 12 22:16:19 2009
Beginning instance recovery of 1 threads
Tue May 12 22:16:19 2009
Started redo scan
Tue May 12 22:16:19 2009
Completed redo scan
 569 redo blocks read, 28 data blocks need recovery
Tue May 12 22:16:22 2009
Started recovery at
 Thread 2: logseq 7, block 3, scn 0.0
Tue May 12 22:16:22 2009
Recovery of Online Redo Log: Thread 2 Group 3 Seq 7 Reading mem 0
  Mem# 0 errs 0: /dev/rtrac_redo2_11
Tue May 12 22:16:22 2009
Completed redo application
Tue May 12 22:16:22 2009
Ended recovery at
 Thread 2: logseq 7, block 572, scn 0.271410
 2 data blocks read, 28 data blocks written, 569 redo blocks read
Ending instance recovery of 1 threads
SMON: about to recover undo segment 11
SMON: mark undo segment 11 as available
SMON: about to recover undo segment 12
SMON: mark undo segment 12 as available
SMON: about to recover undo segment 13
SMON: mark undo segment 13 as available
SMON: about to recover undo segment 14
SMON: mark undo segment 14 as available
SMON: about to recover undo segment 15
SMON: mark undo segment 15 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 19
SMON: mark undo segment 19 as available
SMON: about to recover undo segment 20
SMON: mark undo segment 20 as available
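
The alert-log excerpts above alternate timestamp lines with message lines. As a rough sketch (not part of the original test), the following Python pairs each message with the most recent timestamp and measures how long P61A needed from the eviction to the end of redo application. The sample text is a handful of lines copied from the P61A excerpt; the function name `parse_alert_log` is made up for illustration.

```python
from datetime import datetime

# A few lines copied from the P61A alert-log excerpt above.
ALERT_LOG = """\
Tue May 12 22:16:13 2009
Evicting instance 2 from cluster
Tue May 12 22:16:19 2009
Reconfiguration started (old inc 2, new inc 4)
Tue May 12 22:16:19 2009
Reconfiguration complete
Tue May 12 22:16:22 2009
Completed redo application
"""

# Timestamp layout used in these excerpts, e.g. "Tue May 12 22:16:13 2009".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_alert_log(text):
    """Return (timestamp, message) pairs.

    A message line inherits the most recent timestamp line seen before it.
    """
    events = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp line, so treat it as a message.
            events.append((current, line))
    return events


events = parse_alert_log(ALERT_LOG)
evicted = next(ts for ts, msg in events if msg.startswith("Evicting instance"))
recovered = next(ts for ts, msg in events if msg.startswith("Completed redo application"))
print((recovered - evicted).total_seconds(), "seconds from eviction to end of redo application")
```

On this excerpt the gap is only a few seconds, which suggests that almost all of the outage was spent waiting for the eviction decision rather than on reconfiguration or instance recovery.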

It took about 5 minutes to detect the split-brain and about 17 minutes to resolve it.
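
The two figures can be checked against the timestamps quoted in this post. The sketch below assumes that the clocks on both nodes agree and that 21:59:45 (the time noted by hand when the cable was pulled) is accurate. The milestone labels are my own wording.

```python
from datetime import datetime


def at(hms):
    """Build a datetime on the test day (2009-05-12) from an HH:MM:SS string."""
    return datetime.strptime("2009-05-12 " + hms, "%Y-%m-%d %H:%M:%S")


# Time the heartbeat cable was unplugged, as noted in the post.
cable_pulled = at("21:59:45")

# Milestones taken from the P61B alert-log excerpt.
milestones = {
    "first IPC send timeout": at("22:04:55"),
    "waiting for split-brain resolution": at("22:06:23"),
    "instance 2 terminated by LMON": at("22:16:23"),
}


def elapsed(start, end):
    """Return the gap between two datetimes as (minutes, seconds)."""
    total_seconds = int((end - start).total_seconds())
    return divmod(total_seconds, 60)


for label, ts in milestones.items():
    minutes, seconds = elapsed(cable_pulled, ts)
    print(f"{label}: {minutes} min {seconds} s after the cable was pulled")
```

The first symptom shows up a little over 5 minutes after the cable is pulled, and the instance is terminated between 16 and 17 minutes after it, which matches the rounded figures above.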

[oracle@P61A:/u01/app/oracle]$ifconfig -a
en0: flags=4e080863,80
        inet 12.0.0.61 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
        inet 10.10.1.61 netmask 0xffffff00 broadcast 10.10.1.255
        inet 10.10.3.201 netmask 0xffffff00 broadcast 10.10.3.255
        inet 10.10.3.101 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
[oracle@P61A:/u01/app/oracle]$rsh P61B ifconfig -a
en0: flags=4e080863,80
        inet 12.0.0.62 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
        inet 10.10.1.62 netmask 0xffffff00 broadcast 10.10.1.255
        inet 10.10.3.202 netmask 0xffffff00 broadcast 10.10.3.255
        inet 10.10.3.102 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

The IPs were not taken over either, and there was no way they could have been.
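
Comparing two `ifconfig -a` listings by eye is error-prone. A small sketch like the one below parses the addresses per interface and reports any address that an interface gained between two snapshots. The sample text is the P61A output from this post, which is identical before and after the test. The helper names are made up for illustration, and the parser only handles the AIX-style layout shown here.

```python
import re


def inet_by_interface(ifconfig_output):
    """Map each interface name to the set of IPv4 addresses listed under it."""
    result = {}
    current = None
    for line in ifconfig_output.splitlines():
        header = re.match(r"^(\w+): flags=", line)
        if header:
            current = header.group(1)
            result[current] = set()
            continue
        # "inet6" lines do not match because a space must follow "inet".
        addr = re.match(r"^\s+inet (\d+\.\d+\.\d+\.\d+) ", line)
        if addr and current is not None:
            result[current].add(addr.group(1))
    return result


def gained_addresses(before, after):
    """Return {interface: addresses present in 'after' but not in 'before'}."""
    gained = {}
    for iface, addrs in after.items():
        new = addrs - before.get(iface, set())
        if new:
            gained[iface] = new
    return gained


P61A_BEFORE = """\
en0: flags=4e080863,80
        inet 12.0.0.61 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
        inet 10.10.1.61 netmask 0xffffff00 broadcast 10.10.1.255
        inet 10.10.3.201 netmask 0xffffff00 broadcast 10.10.3.255
        inet 10.10.3.101 netmask 0xffffff00 broadcast 10.10.3.255
"""
# The post-test listing for P61A in this post is identical to the pre-test one.
P61A_AFTER = P61A_BEFORE

before = inet_by_interface(P61A_BEFORE)
after = inet_by_interface(P61A_AFTER)
print(gained_addresses(before, after))  # an empty dict means no address moved to P61A
```

Had a takeover occurred, one of P61B's addresses would appear in the result for P61A.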


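I also ran the same test on Linux. There, too, it took about 15 minutes to resolve the split-brain. One node was forcibly shut down, but the host did not reboot.
The ORACM process crashed, while the GSD process was still running.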

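Configuring a service IP for 9i seems pointless to me. 9i does not resolve a split-brain by rebooting the host, so a service IP controlled by HACMP would probably need a dedicated script to fail over, which adds little value.
Compare how Oracle's cluster software on Linux handles this: it has no concept of a service IP either.
9i resolves a split-brain far too slowly, taking roughly 15 minutes. During that window some queries still run (presumably those whose data is already in the SGA and which need no new SQL parse), while any other new query simply hangs until it times out.
With 9i, as with 10g, it should be enough for HACMP to manage the volume groups.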

 

Source: ITPUB Blog, link: http://blog.itpub.net/8242091/viewspace-594945/. If you repost this article, please credit the source; otherwise legal action will be pursued.
