Managing the ONS Service in Oracle 10gR2 RAC Clusterware

Original post · Category: Linux Operating Systems · Author: 尛样儿 · Date: 2012-06-12 11:45:40
       
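        This post discusses the management of the ONS service through a real case.
        In a 10gR2 RAC environment, the voting disk data was lost and no backup existed, so the plan was to wipe the Clusterware configuration and rerun the root.sh script to bring Clusterware back up. Following the article at http://space.itpub.net/23135684/viewspace-721081, the command /u01/crs/bin/racgons add_config rhel:6251 rhel2:6251 was executed successfully, and the vipca script was then run to create the nodeapps for both nodes. (Note: vipca creates the ONS service automatically, so creating ONS beforehand with racgons was unnecessary.) During creation and startup, however, the ONS service on the second node failed to start, so the ONS log on the second node was examined.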
The ONS log is located at:
/u01/app/oracle/crs/log/rhel2/racg/ora.rhel2.ons.log
The general format is: $ORA_CRS_HOME/log//racg/ora..ons.log 
The log contains the following messages:
2012-06-12 17:21:05.030: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
Adding remote host rhel:6251
1: {node = rhel2, port = 6251}

2012-06-12 17:21:05.032: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.

2012-06-12 17:21:05.032: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
Adding remote host rhel:6251
1: {node = rhel2, port = 6251}
onsctl: ons failed to start

2012-06-12 17:21:05.133: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/oracle/crs

2012-06-12 17:21:05.133: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: cmd = /u01/app/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/app/oracle/crs/bin/onsctl start

2012-06-12 17:21:05.133: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: rc = 1, time = 1.650s

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
Adding remote host rhel:6251
1: {node = rhel2, port = 6251}

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: ons is not running ...

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/oracle/crs

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: cmd = /u01/app/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/app/oracle/crs/bin/onsctl ping

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: clsrcexecut: rc = 1, time = 0.310s

2012-06-12 17:21:05.448: [    RACG][3999896128] [13230][3999896128][ora.rhel2.ons]: end for resource = ora.rhel2.ons, action = start, status = 1, time = 2.060s

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: onsctl: shutting down ons daemon ...
GETHOSTBYNAME(rhel): 2
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: Adding remote host rhel:6251
1: {node = rhel2, port = 6251}
onsctl: shutdown of ons failed!

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/oracle/crs

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: clsrcexecut: cmd = /u01/app/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/app/oracle/crs/bin/onsctl stop

2012-06-12 17:21:07.228: [    RACG][740729408] [13260][740729408][ora.rhel2.ons]: clsrcexecut: rc = 3, time = 0.470s

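        The log above suggests that the problem is a port mismatch: the remote port for the local node in the local configuration does not match the one stored in OCR. ONS was created manually with port 6251, while the ONS created by vipca may not use port 6251, so the ports on the two sides do not match.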
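
To make the diagnosis above repeatable, the following minimal Python sketch scans an ONS log excerpt for the node/port entries and for the mismatch warning. The function name `summarize_ons_log` and the sample text are illustrative assumptions; the line formats are taken from the log shown above, and the sketch is not part of any Oracle tooling.

```python
import re

# Matches entries such as "0: {node = rhel, port = 6251}" as printed in the log.
NODE_ENTRY = re.compile(r"\d+: \{node = (?P<node>[^,]+), port = (?P<port>\d+)\}")
MISMATCH = "Remote port for local node in local config does not match that from OCR."


def summarize_ons_log(text):
    """Return (node -> port mapping found in the log, True if the mismatch warning appears)."""
    ports = {}
    mismatch = False
    for line in text.splitlines():
        entry = NODE_ENTRY.search(line)
        if entry:
            ports[entry.group("node")] = int(entry.group("port"))
        elif MISMATCH in line:
            mismatch = True
    return ports, mismatch


# Sample lines copied from the log excerpt above.
sample = """\
GETHOSTBYNAME(rhel): 2
Remote port for local node in local config does not match that from OCR.
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
Adding remote host rhel:6251
1: {node = rhel2, port = 6251}
onsctl: ons failed to start
"""

ports, mismatch = summarize_ons_log(sample)
print(ports)
print(mismatch)
```

The mapping shows which ports OCR returned for each node, and the flag shows whether ONS complained about the local configuration.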

1. The onsctl tool
The help output of the onsctl tool is shown below:
[root@rhel1 bin]# ./onsctl
usage: ./onsctl start|stop|ping|reconfig|debug

start                            - Start opmn only.
stop                             - Stop ons daemon
ping                             - Test to see if ons daemon is running
debug                            - Display debug information for the ons daemon
reconfig                         - Reload the ons configuration
help                             - Print a short syntax description (this).
detailed                         - Print a verbose syntax description.

[root@rhel1 bin]# ./onsctl detailed
usage: ./onsctl start|stop|ping|reconfig|debug

start
    Start ons daemon

stop
    Shutdown ons daemon

reconfig
    Trigger ons to re-read it's configuration files.

ping
    Test to see if ons daemon is alive

debug
    Display debug information about the ons daemon

help
    Print a short syntax description.

detailed
    Print a verbose syntax description (this message).


Running onsctl ping on the first node:
[root@rhel1 bin]# ./onsctl ping
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
GETHOSTBYNAME(rhel): 2
Adding remote host rhel:6251
GETHOSTBYNAME(rhel): 2
1: {node = rhel2, port = 6251}
Adding remote host rhel2:6251
ons is running ...
        ONS is already running on the first node.

Running onsctl ping on the second node:
[root@rhel2 bin]# ./onsctl ping
Number of configuration nodes retrieved: 2
0: {node = rhel, port = 6251}
GETHOSTBYNAME(rhel): 2
Adding remote host rhel:6251
GETHOSTBYNAME(rhel): 2
1: {node = rhel2, port = 6251}
Remote port for local node in local config does not match that from OCR.
ons is not running ...

On the second node, ONS has not started because of the port mismatch reported in the output above.
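
When checking several nodes, the status line at the end of the `onsctl ping` output is the part that matters. The sketch below, a hedged illustration only, reduces captured output to a running / not running answer. The helper name `ons_is_running` is hypothetical, and the two status strings are the ones shown in the transcripts above.

```python
def ons_is_running(ping_output):
    """Return True or False based on the onsctl ping status line, or None if no status line is found."""
    for line in ping_output.splitlines():
        text = line.strip()
        # Check the negative form first; both strings are taken from the transcripts.
        if text.startswith("ons is not running"):
            return False
        if text.startswith("ons is running"):
            return True
    return None


# Tail of the output captured on each node above.
node1_output = """\
1: {node = rhel2, port = 6251}
Adding remote host rhel2:6251
ons is running ...
"""
node2_output = """\
1: {node = rhel2, port = 6251}
Remote port for local node in local config does not match that from OCR.
ons is not running ...
"""

print(ons_is_running(node1_output))
print(ons_is_running(node2_output))
```

Returning None for unrecognized output keeps a missing status line from being mistaken for a stopped daemon.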

2. Checking the node processes:
ONS processes on the first node:
[root@rhel1 bin]# ps -ef | grep ons
root      2412     1  0 16:47 ?        00:00:00 sendmail: accepting connections
oracle   13513     1  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d
oracle   13515 13513  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d
root     15646  3340  0 17:22 pts/0    00:00:00 grep ons

ONS processes on the second node:
[root@rhel2 bin]# ps -ef | grep ons
root      2400     1  0 16:45 ?        00:00:00 sendmail: accepting connections
root     13847  3546  0 17:22 pts/0    00:00:00 grep ons
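
Note that `grep ons` also matches the sendmail line (because of the word "connections") and the grep command itself. As a rough sketch, the following Python function keeps only lines whose executable is named exactly `ons`. It assumes the eight-column `ps -ef` layout shown above; the name `find_ons_daemons` is made up for this example.

```python
import os


def find_ons_daemons(ps_output):
    """Return (pid, command) pairs for ps -ef lines whose executable basename is exactly 'ons'."""
    daemons = []
    for line in ps_output.splitlines():
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        command = fields[7]
        executable = command.split()[0]
        if os.path.basename(executable) == "ons":
            daemons.append((int(fields[1]), command))
    return daemons


# Output copied from the two transcripts above.
node1_ps = """\
root      2412     1  0 16:47 ?        00:00:00 sendmail: accepting connections
oracle   13513     1  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d
oracle   13515 13513  0 17:17 ?        00:00:00 /u01/app/oracle/crs/opmn/bin/ons -d
root     15646  3340  0 17:22 pts/0    00:00:00 grep ons
"""
node2_ps = """\
root      2400     1  0 16:45 ?        00:00:00 sendmail: accepting connections
root     13847  3546  0 17:22 pts/0    00:00:00 grep ons
"""

print(find_ons_daemons(node1_ps))
print(find_ons_daemons(node2_ps))
```

With this filter the first node reports two `ons -d` processes and the second node reports none, which matches the manual reading of the output.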


3. The ONS configuration file
Running the find command located the ONS configuration files:
./opmn/conf/ons.config.tmp
./opmn/conf/ons.config
./opmn/conf/ons.config.backup.10205

[root@rhel1 crs]# cat ./opmn/conf/ons.config
localport=6113
remoteport=6200
loglevel=3
useocr=on

Clearly, the ports in the configuration file do not match the port 6251 configured with racgons.
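
The comparison can be sketched in a few lines of Python: read the key=value pairs from ons.config and compare `remoteport` with the port registered in OCR. The helper `parse_ons_config` and the variable names are assumptions for illustration; the file contents and the OCR port 6251 come from the output above.

```python
def parse_ons_config(text):
    """Parse simple key=value lines, as seen in ons.config, into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank or unrecognized lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Contents of ./opmn/conf/ons.config as shown above.
config_text = """\
localport=6113
remoteport=6200
loglevel=3
useocr=on
"""

settings = parse_ons_config(config_text)
file_port = int(settings["remoteport"])
ocr_port = 6251  # port passed to "racgons add_config" earlier in this post

ports_match = file_port == ocr_port
print(settings)
print(ports_match)
```

Here the file says 6200 while OCR holds 6251, so `ports_match` is False, consistent with the warning printed by onsctl.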

4. The RACGONS tool
The help output of RACGONS is shown below:
[root@rhel1 bin]# ./racgons
To add ONS daemons configuration:
./racgons.bin add_config hostname:port [hostname:port] ...
To remove ONS daemons configuration:
./racgons.bin remove_config hostname[:port] [hostname:port] ...

        OCR may hold two sets of ONS entries. Run the following command to remove the existing port 6251 configuration:
[root@rhel1 bin]# ./racgons remove_config rhel:6251 rhel2:6251
racgons: Existing key value on rhel = 6251.
racgons: rhel:6251 removed from OCR.
racgons: Existing key value on rhel2 = 6251.
racgons: rhel2:6251 removed from OCR.

Restart the nodeapps:
[root@rhel1 bin]# ./srvctl start nodeapps -n rhel2
[root@rhel1 bin]# ./srvctl start nodeapps -n rhel1

Check the status of both nodes:
[root@rhel1 bin]# ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rhel1.gsd  application    ONLINE    ONLINE    rhel1
ora.rhel1.ons  application    ONLINE    ONLINE    rhel1
ora.rhel1.vip  application    ONLINE    ONLINE    rhel1
ora.rhel2.gsd  application    ONLINE    ONLINE    rhel2
ora.rhel2.ons  application    ONLINE    ONLINE    rhel2
ora.rhel2.vip  application    ONLINE    ONLINE    rhel2

Everything is back to normal.
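
For a quick scripted check of the same status table, the sketch below parses `crs_stat -t` output and lists any resource whose target or state is not ONLINE. It assumes the five-column layout shown above; the function name `parse_crs_stat` is hypothetical.

```python
def parse_crs_stat(table):
    """Parse crs_stat -t rows (Name Type Target State Host) into a list of dicts."""
    resources = []
    for line in table.splitlines():
        fields = line.split()
        # Skip the header, the separator line, and anything that is not a resource row.
        if len(fields) != 5 or not fields[0].startswith("ora."):
            continue
        name, res_type, target, state, host = fields
        resources.append(
            {"name": name, "type": res_type, "target": target, "state": state, "host": host}
        )
    return resources


# Table copied from the crs_stat -t output above.
table = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rhel1.gsd  application    ONLINE    ONLINE    rhel1
ora.rhel1.ons  application    ONLINE    ONLINE    rhel1
ora.rhel1.vip  application    ONLINE    ONLINE    rhel1
ora.rhel2.gsd  application    ONLINE    ONLINE    rhel2
ora.rhel2.ons  application    ONLINE    ONLINE    rhel2
ora.rhel2.vip  application    ONLINE    ONLINE    rhel2
"""

resources = parse_crs_stat(table)
not_online = [r["name"] for r in resources if r["target"] != "ONLINE" or r["state"] != "ONLINE"]
print(len(resources))
print(not_online)
```

An empty `not_online` list corresponds to the healthy state shown in the table.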
--end--



Source: "ITPUB博客", link: http://blog.itpub.net/23135684/viewspace-732562/. If you repost this article, please credit the source; otherwise legal action may be taken.