On two separate two-node RAC databases (OS: AIX 5300-11-04-1015, DB: 10.2.0.4/10.2.0.5), a VIP frequently relocates to the other node.
$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....61.inst application    ONLINE    ONLINE    cassdb1
ora....62.inst application    ONLINE    ONLINE    cassdb2
ora.cass.db    application    ONLINE    ONLINE    cassdb1
ora....B1.lsnr application    ONLINE    ONLINE    cassdb1
ora....db1.gsd application    ONLINE    ONLINE    cassdb1
ora....db1.ons application    ONLINE    ONLINE    cassdb1
ora....db1.vip application    ONLINE    ONLINE    cassdb1
ora....B2.lsnr application    ONLINE    OFFLINE
ora....db2.gsd application    ONLINE    ONLINE    cassdb2
ora....db2.ons application    ONLINE    ONLINE    cassdb2
ora....db2.vip application    ONLINE    ONLINE    cassdb1
ora.cassdb2.vip.log:
2016-11-21 03:38:54.122: [ RACG][1] [897304][1][ora.cassdb2.vip]: Invalid parameters, or failed to bring up VIP (host=cassdb2)
crsd.log:
2016-11-21 03:38:54.129: [ CRSRES][11124]32ora.cassdb2.vip on cassdb2 went OFFLINE unexpectedly
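errpt reports no errors, so the most likely cause is a communication problem with the gateway.
Enable VIP tracing (run as root; the VIP does not need to be stopped):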
crsctl debug log res "ora.cassdb1.vip:5"
crsctl debug log res "ora.cassdb2.vip:5"
VIP log from the next occurrence:
Mon Nov 21 22:57:02 BEIST 2016 [ 1286242 ] About to execute command: /usr/sbin/ping -S 10.4.40.9 -c 1 -w 1 10.4.40.254
2016-11-21 22:57:06.711: [ RACG][1] [1437858][1][ora.cassdb2.vip]: Mon Nov 21 22:57:04 BEIST 2016 [ 1286242 ] About to execute command: /usr/sbin/ping -S 10.4.40.4 -c 1 -w 1 10.4.40.254
Mon Nov 21 22:57:06 BEIST 2016 [ 1286242 ] IsIfAlive: RX packets checked if=en0 failed
2016-11-21 22:57:06.711: [ RACG][1] [1437858][1][ora.cassdb2.vip]: Mon Nov 21 22:57:06 BEIST 2016 [ 1286242 ] Interface en0 checked failed (host=cassdb2)
According to "VIPs Often Go Offline Unexpectedly and Relocate to Another Node" (Doc ID 1297867.1), this is indeed a problem communicating with the gateway.
Looking at the racgvip code:
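To confirm the intermittent loss independently of Clusterware, one option is to probe the gateway in a loop and log only the failures, so the timestamps can be compared with the VIP relocations. The following is a rough sketch, not part of racgvip: it assumes the AIX ping options seen in the VIP log above (-S, -c 1, -w 1) and takes the base IP and gateway address from that log. The function names probe_once and watch_gateway are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not Oracle code: probe the gateway repeatedly from the
# interface's base IP and print one timestamped line per failed probe.
# Assumption: AIX ping syntax as shown in the VIP log (-S source, -c 1, -w 1).
PING=${PING:-/usr/sbin/ping}
SLEEP=${SLEEP:-sleep}
SRC_IP=${SRC_IP:-10.4.40.4}       # base IP of en0, taken from the VIP log
GATEWAY=${GATEWAY:-10.4.40.254}   # default gateway, taken from the VIP log

# probe_once: send a single ping; the exit status is that of ping.
probe_once() {
    $PING -S "$SRC_IP" -c 1 -w 1 "$GATEWAY" > /dev/null 2>&1
}

# watch_gateway COUNT: run COUNT probes one second apart and report failures.
watch_gateway() {
    n=$1
    while [ "$n" -gt 0 ]
    do
        probe_once || echo "`date` ping from $SRC_IP to $GATEWAY failed"
        $SLEEP 1
        n=`expr "$n" - 1`
    done
}
```

For example, after sourcing the file, "watch_gateway 3600 >> /tmp/gw_ping.log" would cover roughly an hour; the log path is only an illustration.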
# Check the status of the interface thro' pinging gateway
if [ -n "$DEFAULTGW" ]
then
    _RET=1
    # get base IP address of the interface
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    do
        if [ -n "$tmpIP" ]
        then
            logx "About to execute command: $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW"
            $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
        else
            logx "About to execute command: $PING $PING_TIMEOUT $DEFAULTGW"
            $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
        fi
        _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
        if [ "$_O1" != "$_O2" ]
        then
            # RX packets numbers changed
            _RET=0
            break
        fi
        $SLEEP 1
        x=`$EXPR $x - 1`
    done
    if [ $_RET -ne 0 ]
    then
        logx "IsIfAlive: RX packets checked if=$_IF failed"
    else
        logx "IsIfAlive: RX packets checked if=$_IF OK"
    fi
else
    logx "IsIfAlive: Default gateway is not defined (host=$HOSTNAME)"
    if [ $FAIL_WHEN_DEFAULTGW_NO_FOUND -eq 1 ]
    then
        _RET=1
    else
        _RET=0
    fi
fi
if [ $_RET -eq 1 ]
then
    logx "Interface $_IF checked failed (host=$HOSTNAME)"
fi
logx "IsIfAlive: end for if=$_IF"
return $_RET
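Because the ping to the gateway got no reply within the 1-second timeout, the RX packet counts "_O1" and "_O2" stayed equal, so the interface check failed and the VIP relocated.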
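The counter comparison at the heart of this check can be sketched in isolation. This is a minimal sketch and not the racgvip code itself: the function names rx_packets and counters_changed are hypothetical, and it matches the interface name exactly, whereas racgvip uses the prefix regex /^$_IF/. It assumes, as the racgvip awk does, that the received-packet count is the fifth column of the netstat output.

```shell
#!/bin/sh
# Sketch of the RX-counter test used by IsIfAlive (hypothetical names).
# Assumption: "netstat -n -I <if>" prints the interface name in column 1
# and the received-packet count in column 5.

# rx_packets IFNAME: read netstat-style text on stdin and print column 5
# of the first row whose first field is exactly IFNAME.
rx_packets() {
    awk -v ifname="$1" '$1 == ifname { print $5; exit }'
}

# counters_changed BEFORE AFTER: succeed when the counter moved between the
# two samples, fail when it is unchanged (the case that marks the interface
# as failed in racgvip).
counters_changed() {
    [ "$1" != "$2" ]
}

# Intended use on the affected node (commented out; needs the real interface):
#   before=`netstat -n -I en0 | rx_packets en0`
#   /usr/sbin/ping -S 10.4.40.4 -c 1 -w 1 10.4.40.254 > /dev/null 2>&1
#   after=`netstat -n -I en0 | rx_packets en0`
#   counters_changed "$before" "$after" || echo "en0 check failed"
```

Note that the test only looks at whether any packet arrived on the interface between the two samples; it does not inspect the ping exit status.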
Source: ITPUB Blog, link: http://blog.itpub.net/37279/viewspace-2128838/. If you repost this article, please credit the source; otherwise legal action will be pursued.