VIPs Often Go Offline Unexpectedly and Relocate to Another Node

Original | Oracle | Author: warmbreeze | Date: 2016-11-22 10:21:09
On two separate two-node RAC databases (OS: AIX 5300-11-04-1015, DB: 10.2.0.4/10.2.0.5), a VIP frequently relocates to the other node.
$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....61.inst application    ONLINE    ONLINE    cassdb1     
ora....62.inst application    ONLINE    ONLINE    cassdb2     
ora.cass.db    application    ONLINE    ONLINE    cassdb1     
ora....B1.lsnr application    ONLINE    ONLINE    cassdb1     
ora....db1.gsd application    ONLINE    ONLINE    cassdb1     
ora....db1.ons application    ONLINE    ONLINE    cassdb1     
ora....db1.vip application    ONLINE    ONLINE    cassdb1     
ora....B2.lsnr application    ONLINE    OFFLINE               
ora....db2.gsd application    ONLINE    ONLINE    cassdb2     
ora....db2.ons application    ONLINE    ONLINE    cassdb2     
ora....db2.vip application    ONLINE    ONLINE    cassdb1     
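
In the output above, ora....db2.vip is ONLINE on cassdb1 instead of cassdb2, and the cassdb2 listener is OFFLINE. A minimal sketch for spotting this automatically is shown below. The function name find_relocated_vips is hypothetical, and the sketch assumes that the node suffix left visible in the truncated resource name (for example "db2") matches the end of the home node's host name, which holds for this cluster but may not hold elsewhere.

```shell
# Hypothetical helper (not part of Oracle Clusterware).
# Reads `crs_stat -t` output on stdin and prints every VIP resource that is
# ONLINE on a host whose name does not end with the node suffix visible in
# the (truncated) resource name.
find_relocated_vips() {
  awk '$1 ~ /\.vip$/ && $4 == "ONLINE" && NF >= 5 {
    name = $1
    sub(/\.vip$/, "", name)   # ora....db2.vip becomes ora....db2
    sub(/^.*\./, "", name)    # ora....db2 becomes db2
    host = $5
    # Assumption: the home node name ends with the suffix kept in the resource name.
    if (substr(host, length(host) - length(name) + 1) != name)
      print $1, "is running on", host
  }'
}

# Usage on a cluster node:
#   crs_stat -t | find_relocated_vips
```

An empty result suggests that every VIP is on its home node; any printed line is a VIP worth investigating.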


ora.cassdb2.vip.log:
2016-11-21 03:38:54.122: [    RACG][1] [897304][1][ora.cassdb2.vip]: Invalid parameters, or failed to bring up VIP (host=cassdb2)


crsd.log:
2016-11-21 03:38:54.129: [  CRSRES][11124]32ora.cassdb2.vip on cassdb2 went OFFLINE unexpectedly


errpt reports no errors, so the most likely cause is a problem communicating with the gateway.


Enable VIP tracing (run as root; the VIP does not need to be stopped):
crsctl debug log res "ora.cassdb1.vip:5"
crsctl debug log res "ora.cassdb2.vip:5"


VIP log from the next occurrence:
Mon Nov 21 22:57:02 BEIST 2016 [ 1286242 ] About to execute command: /usr/sbin/ping -S 10.4.40.9  -c 1 -w 1 10.4.40.254


2016-11-21 22:57:06.711: [    RACG][1] [1437858][1][ora.cassdb2.vip]: Mon Nov 21 22:57:04 BEIST 2016 [ 1286242 ] About to execute command: /usr/sbin/ping -S 10.4.40.4  -c 1 -w 1 10.4.40.254
Mon Nov 21 22:57:06 BEIST 2016 [ 1286242 ] IsIfAlive: RX packets checked if=en0 failed


2016-11-21 22:57:06.711: [    RACG][1] [1437858][1][ora.cassdb2.vip]: Mon Nov 21 22:57:06 BEIST 2016 [ 1286242 ] Interface en0 checked failed (host=cassdb2)
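
To see how often the interface check fails, the traced VIP log can be filtered for the "IsIfAlive ... failed" message. The following is a rough sketch only: the function name list_failed_checks is hypothetical, it reads the log on stdin, and it assumes the message layout shown in the excerpt above (timestamp, bracketed process ID, then the IsIfAlive text).

```shell
# Hypothetical helper (not part of Oracle Clusterware).
# Reads a traced VIP log on stdin and prints "<timestamp> <interface>" for
# every failed RX-packet check, based on the message format seen above.
list_failed_checks() {
  sed -n 's/^\(.*\) \[ [0-9]* ] IsIfAlive: RX packets checked if=\([^ ]*\) failed.*$/\1 \2/p'
}

# Usage, with the log path left as a placeholder:
#   list_failed_checks < /path/to/ora.cassdb2.vip.log
```

Comparing the printed timestamps with network or switch logs may help show whether the failures line up with gateway problems.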




According to "VIPs Often Go Offline Unexpectedly and Relocate to Another Node" (Doc ID 1297867.1), this is indeed a problem communicating with the gateway.




Looking at the racgvip code:


  # Check the status of the interface thro' pinging gateway
  if [ -n "$DEFAULTGW" ]
  then
    _RET=1
    # get base IP address of the interface
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    do
      if [ -n "$tmpIP" ]
      then
        logx "About to execute command: $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW"
        $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      else
        logx "About to execute command: $PING $PING_TIMEOUT $DEFAULTGW"
        $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      fi
      _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
      if [ "$_O1" != "$_O2" ]
      then
        # RX packets numbers changed
        _RET=0
        break
      fi
      $SLEEP 1
      x=`$EXPR $x - 1`
    done
    if [ $_RET -ne 0 ]
    then
      logx "IsIfAlive: RX packets checked if=$_IF failed"
    else
      logx "IsIfAlive: RX packets checked if=$_IF OK"
    fi
   else
    logx "IsIfAlive: Default gateway is not defined (host=$HOSTNAME)"
    if [ $FAIL_WHEN_DEFAULTGW_NO_FOUND -eq 1 ]
    then
      _RET=1
    else
      _RET=0
    fi
  fi
 
  if [ $_RET -eq 1 ]
  then
    logx "Interface $_IF checked failed (host=$HOSTNAME)"
  fi


  logx "IsIfAlive: end for if=$_IF"
  return $_RET


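Because the ping to the gateway got no reply within 1 second, the RX packet counts "_O1" and "_O2" stayed equal, so the interface check failed and the VIP relocated.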
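
The decision logic can be modelled in isolation to make the failure condition easier to see. This is a simplified sketch, not Oracle's code: rx_check is a hypothetical function that takes the RX packet count read before the first ping, followed by the counts read after each ping attempt, and applies the same "did the counter change?" comparison as the loop in racgvip.

```shell
# Simplified model of the RX-counter comparison in racgvip (illustration only).
# $1            = RX packet count before the first ping (the role of _O1)
# remaining args = counts read after each ping attempt (the role of _O2),
#                  one per CHECK_TIMES iteration
rx_check() {
  before=$1
  shift
  for after in "$@"
  do
    if [ "$before" != "$after" ]
    then
      # The counter moved, so the interface received traffic.
      echo "OK"
      return 0
    fi
  done
  # The counter never changed across all attempts.
  echo "failed"
  return 1
}

# Usage:
#   rx_check 100 100 100   # counter never changed
#   rx_check 100 100 101   # counter changed on the second attempt
```

In this model a single counter change on any attempt is enough to pass, so the check only fails when every ping goes unanswered and no other traffic arrives on the interface during the whole loop.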

Source: ITPUB Blog, link: http://blog.itpub.net/37279/viewspace-2128838/. Please credit the source when reposting; otherwise legal action may be taken.
