An Experimental Look at How the CLUSTER_INTERCONNECTS Parameter Affects Instances

Original post  Oracle  Author: 尛样儿  Date: 2015-10-20 19:23:39

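    In an Oracle RAC environment, Cache Fusion between RAC instances normally runs over the Clusterware private interconnect. Since 11.2.0.2 this is usually provided through HAIP, which both increases bandwidth (up to four interconnect networks) and makes the interconnect fault tolerant. For example, on a RAC node with four interconnect networks, three of them can fail at the same time without bringing down Oracle RAC or Clusterware.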

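    However, when several databases are deployed in one RAC environment, the Cache Fusion traffic of the different database instances can interfere with one another: some databases need more bandwidth, others less. To keep the interconnect traffic of multiple databases in the same RAC environment from affecting each other, Oracle provides the cluster_interconnects parameter at the database level. It overrides the default interconnect and makes the database instance use the specified networks for Cache Fusion traffic. The parameter provides no fault tolerance, however, as the following experiment shows: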

Oracle RAC environment: 12.1.0.2.0 Standard Cluster on Oracle Linux 5.9 x64.

1. Network configuration.

Node 1:
[root@rhel1 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:50:56:A8:16:15                       <<<< eth0: management network.
          inet addr:172.168.4.20  Bcast:172.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13701 errors:0 dropped:522 overruns:0 frame:0
          TX packets:3852 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1122408 (1.0 MiB)  TX bytes:468021 (457.0 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B                       <<<< eth1: public network.
          inet addr:10.168.4.20  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23074 errors:0 dropped:520 overruns:0 frame:0
          TX packets:7779 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:15974971 (15.2 MiB)  TX bytes:2980403 (2.8 MiB)

eth1:1    Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B  
          inet addr:10.168.4.22  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B  
          inet addr:10.168.4.24  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:50:56:A8:21:0A                       <<<< eth2: interconnect network, one of the Clusterware HAIP interfaces.
          inet addr:10.0.1.20  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11322 errors:0 dropped:500 overruns:0 frame:0
          TX packets:10279 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:6765147 (6.4 MiB)  TX bytes:5384321 (5.1 MiB)

eth2:1    Link encap:Ethernet  HWaddr 00:50:56:A8:21:0A   
          inet addr:169.254.10.239  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth3      Link encap:Ethernet  HWaddr 00:50:56:A8:F7:F7                       <<<< eth3: interconnect network, one of the Clusterware HAIP interfaces.
          inet addr:10.0.2.20  Bcast:10.0.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:347096 errors:0 dropped:500 overruns:0 frame:0
          TX packets:306170 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:210885992 (201.1 MiB)  TX bytes:173504069 (165.4 MiB)

eth3:1    Link encap:Ethernet  HWaddr 00:50:56:A8:F7:F7  
          inet addr:169.254.245.28  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth4      Link encap:Ethernet  HWaddr 00:50:56:A8:DC:CC                      <<<< eth4 to eth9: interconnect networks, but not part of Clusterware HAIP.
          inet addr:10.0.3.20  Bcast:10.0.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7247 errors:0 dropped:478 overruns:0 frame:0
          TX packets:6048 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3525191 (3.3 MiB)  TX bytes:2754275 (2.6 MiB)

eth5      Link encap:Ethernet  HWaddr 00:50:56:A8:A1:86  
          inet addr:10.0.4.20  Bcast:10.0.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40028 errors:0 dropped:480 overruns:0 frame:0
          TX packets:23700 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:15139172 (14.4 MiB)  TX bytes:9318750 (8.8 MiB)

eth6      Link encap:Ethernet  HWaddr 00:50:56:A8:F7:53  
          inet addr:10.0.5.20  Bcast:10.0.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13324 errors:0 dropped:470 overruns:0 frame:0
          TX packets:128 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1075873 (1.0 MiB)  TX bytes:16151 (15.7 KiB)

eth7      Link encap:Ethernet  HWaddr 00:50:56:A8:E4:78  
          inet addr:10.0.6.20  Bcast:10.0.6.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13504 errors:0 dropped:457 overruns:0 frame:0
          TX packets:120 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1158553 (1.1 MiB)  TX bytes:14643 (14.2 KiB)

eth8      Link encap:Ethernet  HWaddr 00:50:56:A8:C0:B0  
          inet addr:10.0.7.20  Bcast:10.0.7.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13272 errors:0 dropped:442 overruns:0 frame:0
          TX packets:126 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1072609 (1.0 MiB)  TX bytes:15999 (15.6 KiB)

eth9      Link encap:Ethernet  HWaddr 00:50:56:A8:5E:F6  
          inet addr:10.0.8.20  Bcast:10.0.8.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14316 errors:0 dropped:431 overruns:0 frame:0
          TX packets:127 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1169023 (1.1 MiB)  TX bytes:15293 (14.9 KiB)

Node 2:
[root@rhel2 ~]# ifconfig -a                                                       <<<< Network configuration matches node 1.
eth0      Link encap:Ethernet  HWaddr 00:50:56:A8:C2:66  
          inet addr:172.168.4.21  Bcast:172.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19156 errors:0 dropped:530 overruns:0 frame:0
          TX packets:278 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4628107 (4.4 MiB)  TX bytes:37558 (36.6 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:A8:18:1A  
          inet addr:10.168.4.21  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21732 errors:0 dropped:531 overruns:0 frame:0
          TX packets:7918 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4110335 (3.9 MiB)  TX bytes:14783715 (14.0 MiB)

eth1:2    Link encap:Ethernet  HWaddr 00:50:56:A8:18:1A  
          inet addr:10.168.4.23  Bcast:10.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:50:56:A8:1B:DD  
          inet addr:10.0.1.21  Bcast:10.0.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:410244 errors:0 dropped:524 overruns:0 frame:0
          TX packets:433865 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:206461212 (196.8 MiB)  TX bytes:283858870 (270.7 MiB)

eth2:1    Link encap:Ethernet  HWaddr 00:50:56:A8:1B:DD  
          inet addr:169.254.89.158  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth3      Link encap:Ethernet  HWaddr 00:50:56:A8:2B:68  
          inet addr:10.0.2.21  Bcast:10.0.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:323060 errors:0 dropped:512 overruns:0 frame:0
          TX packets:337911 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:176652414 (168.4 MiB)  TX bytes:212347379 (202.5 MiB)

eth3:1    Link encap:Ethernet  HWaddr 00:50:56:A8:2B:68  
          inet addr:169.254.151.103  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth4      Link encap:Ethernet  HWaddr 00:50:56:A8:81:DB  
          inet addr:10.0.3.21  Bcast:10.0.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:37308 errors:0 dropped:507 overruns:0 frame:0
          TX packets:27565 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:10836885 (10.3 MiB)  TX bytes:14973305 (14.2 MiB)

eth5      Link encap:Ethernet  HWaddr 00:50:56:A8:43:EA  
          inet addr:10.0.4.21  Bcast:10.0.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:38506 errors:0 dropped:496 overruns:0 frame:0
          TX packets:27985 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:10940661 (10.4 MiB)  TX bytes:14859794 (14.1 MiB)

eth6      Link encap:Ethernet  HWaddr 00:50:56:A8:84:76  
          inet addr:10.0.5.21  Bcast:10.0.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13653 errors:0 dropped:484 overruns:0 frame:0
          TX packets:114 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1102617 (1.0 MiB)  TX bytes:14161 (13.8 KiB)

eth7      Link encap:Ethernet  HWaddr 00:50:56:A8:B6:4F  
          inet addr:10.0.6.21  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13633 errors:0 dropped:474 overruns:0 frame:0
          TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1101251 (1.0 MiB)  TX bytes:14343 (14.0 KiB)

eth8      Link encap:Ethernet  HWaddr 00:50:56:A8:97:62  
          inet addr:10.0.7.21  Bcast:10.0.7.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13633 errors:0 dropped:459 overruns:0 frame:0
          TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1102065 (1.0 MiB)  TX bytes:14343 (14.0 KiB)

eth9      Link encap:Ethernet  HWaddr 00:50:56:A8:28:10  
          inet addr:10.0.8.21  Bcast:10.0.8.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13764 errors:0 dropped:446 overruns:0 frame:0
          TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1159479 (1.1 MiB)  TX bytes:14687 (14.3 KiB)


2. Current cluster interconnect configuration.

[grid@rhel1 ~]$ oifcfg getif
eth1  10.168.4.0  global  public
eth2  10.0.1.0  global  cluster_interconnect
eth3  10.0.2.0  global  cluster_interconnect
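
The interfaces that Clusterware treats as interconnects can be pulled out of this listing with a small filter. The following is only a sketch: the function name list_interconnect_ifaces is made up for illustration, and it assumes the four-column layout shown above (name, subnet, scope, type).

```shell
# Print the interface names whose type column contains cluster_interconnect.
# Reads `oifcfg getif` output on stdin (columns: name subnet scope type).
# list_interconnect_ifaces is a hypothetical helper, not an Oracle tool.
list_interconnect_ifaces() {
    awk '$NF ~ /cluster_interconnect/ { print $1 }'
}

# Sample input copied from the listing above.
sample='eth1  10.168.4.0  global  public
eth2  10.0.1.0  global  cluster_interconnect
eth3  10.0.2.0  global  cluster_interconnect'

printf '%s\n' "$sample" | list_interconnect_ifaces
# prints:
# eth2
# eth3
```

On a cluster node the same filter would be fed directly, for example `oifcfg getif | list_interconnect_ifaces`.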


3. Before changing the cluster_interconnects parameter.

SQL> show parameter cluster_interconnect

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_interconnects                string

cluster_interconnects is empty by default.

SQL> select * from v$cluster_interconnects;

NAME            IP_ADDRESS       IS_ SOURCE                              CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2:1          169.254.10.239   NO                                           0
eth3:1          169.254.245.28   NO                                           0

V$CLUSTER_INTERCONNECTS displays one or more interconnects that are being used for cluster communication.

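    Querying v$cluster_interconnects shows that the RAC environment is currently using HAIP. Note that the addresses shown here are the HAIP addresses, not the addresses configured on the operating system; the output looks different once the parameter has been changed.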
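
In this environment the HAIP addresses all fall in the 169.254.x.x link-local range, while the operating system addresses are in 10.0.x.x. A quick way to tell the two apart when reading v$cluster_interconnects output might look like the sketch below; the helper name is_haip_address is an assumption made for this example, and the prefix test is only a heuristic.

```shell
# Return 0 if the address is in the 169.254.x.x range used by the HAIP
# addresses in this environment, 1 otherwise.
# is_haip_address is a hypothetical helper name.
is_haip_address() {
    case "$1" in
        169.254.*.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Two addresses from the query above, plus one OS-configured address.
for ip in 169.254.10.239 169.254.245.28 10.0.1.20; do
    if is_haip_address "$ip"; then
        echo "$ip: HAIP address"
    else
        echo "$ip: OS-configured address"
    fi
done
```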


4. Changing the cluster_interconnects parameter.

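    Now change the cluster_interconnects parameter. To raise the interconnect bandwidth as far as possible, we specify nine interconnect addresses for each node:
SQL> alter system set cluster_interconnects="10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20:10.0.5.20:10.0.6.20:10.0.7.20:10.0.8.20:10.0.9.20" scope=spfile sid='orcl1';    <<<< Note: separate the IPs with colons and enclose the list in double quotes. Setting cluster_interconnects overrides the Clusterware interconnect networks reported by oifcfg getif, which are the default networks for RAC interconnect traffic.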

System altered.

SQL> alter system set cluster_interconnects="10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21:10.0.5.21:10.0.6.21:10.0.7.21:10.0.8.21:10.0.9.21" scope=spfile sid='orcl2';

System altered.

Restarting the database instances produced the following errors:
Advanced Analytics and Real Application Testing options
[oracle@rhel1 ~]$ srvctl stop database -d orcl
[oracle@rhel1 ~]$ srvctl start database -d orcl
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-5017: The resource action "ora.orcl.db start" encountered the following error: 
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP.  Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel2/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.orcl.db' on 'rhel2' failed
CRS-5017: The resource action "ora.orcl.db start" encountered the following error: 
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP.  Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel1/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.orcl.db' on 'rhel1' failed
CRS-2632: There are no more servers to try to place resource 'ora.orcl.db' on that would satisfy its placement policy

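So even with cluster_interconnects, no more than four network addresses can be specified, the same limit as HAIP.

We therefore drop the last five IPs and keep the first four for the interconnect:
Node 1: 10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20
Node 2: 10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21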
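
Since a value with too many addresses only fails at the next instance startup, it can help to check the list before writing it to the spfile. The sketch below counts the colon-separated addresses and compares the count with the limit of 4 reported by ORA-27303 in this test. The function names and the MAX_INTERCONNECT_IPS variable are illustrative assumptions, and the limit may differ in other releases.

```shell
# Limit taken from the ORA-27303 message in this test ("Max supported is 4").
MAX_INTERCONNECT_IPS=4

# Count the colon-separated addresses in a cluster_interconnects value.
count_interconnect_ips() {
    if [ -z "$1" ]; then
        echo 0
        return 0
    fi
    printf '%s\n' "$1" | awk -F: '{ print NF }'
}

# Succeed only if the value stays within the limit.
check_interconnect_value() {
    ci_count=$(count_interconnect_ips "$1")
    if [ "$ci_count" -gt "$MAX_INTERCONNECT_IPS" ]; then
        echo "rejected: $ci_count addresses given, at most $MAX_INTERCONNECT_IPS supported" >&2
        return 1
    fi
    echo "ok: $ci_count addresses"
}

# The four-address value used for node 1 passes.
check_interconnect_value "10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20"
# The original nine-address value is rejected.
check_interconnect_value "10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20:10.0.5.20:10.0.6.20:10.0.7.20:10.0.8.20:10.0.9.20" || true
```

This only checks the count; it does not verify that each address is actually configured on an interface.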


5. Testing the fault tolerance of the cluster_interconnects parameter.

Now let us test how fault tolerant cluster_interconnects is:

SQL> set linesize 200
SQL> select * from v$cluster_interconnects;

NAME            IP_ADDRESS       IS_ SOURCE                              CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2            10.0.1.20        NO  cluster_interconnects parameter          0
eth3            10.0.2.20        NO  cluster_interconnects parameter          0
eth4            10.0.3.20        NO  cluster_interconnects parameter          0
eth5            10.0.4.20        NO  cluster_interconnects parameter          0

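After the instances are restarted, RAC uses the four IPs specified earlier for the interconnect.

The instances on both RAC nodes are running normally: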
[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2

Manually take down one of the interconnect NICs on node 1:
[root@rhel1 ~]# ifdown eth4                     <<<<  This NIC is not one of the HAIP interfaces.

[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2
srvctl still reports the instances as running.

Log in locally with sqlplus:
[oracle@rhel1 ~]$ sql

SQL*Plus: Release 12.1.0.2.0 Production on Tue Oct 20 18:11:35 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected.
SQL>    
This state is clearly not right.

Checking the alert log shows the following errors:
2015-10-20 18:10:22.996000 +08:00
SKGXP: ospid 32107: network interface query failed for IP address 10.0.3.20.
SKGXP: [error 32607] 
2015-10-20 18:10:31.600000 +08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_qm03_453.trc  (incident=29265) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27501: IPC error creating a port
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29265/orcl1_qm03_453_i29265.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_cjq0_561.trc  (incident=29297) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29297/orcl1_cjq0_561_i29297.trc
2015-10-20 18:10:34.724000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181034], requested by (instance=1, osid=561 (CJQ0)), summary=[incident=29297].
2015-10-20 18:10:35.819000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181035], requested by (instance=1, osid=453 (QM03)), summary=[incident=29265].

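Judging from the log, the instance did not go down; it hung. The alert log of the database instance on the other node shows no errors, so the other RAC instance is unaffected.

Bring the NIC back up manually: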
[root@rhel1 ~]# ifup eth4

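The instance returns to normal immediately, and it never went down during the whole process.

What happens if an interface that belongs to HAIP goes down? Does that affect the instance? Take eth2 down: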
[root@rhel1 ~]# ifdown eth2

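In this test the instance hangs as well, just as when a non-HAIP interface was taken down, and it returns to normal once the interface is restored.

    Summary: the tests show that, whether the specified interfaces are HAIP interfaces or not, setting cluster_interconnects leaves the interconnect without fault tolerance. A failure of any one of the specified interfaces hangs the instance, and the instance recovers only after the interface is restored. The cluster_interconnects parameter also supports at most four IP addresses.
In a RAC environment that hosts multiple databases, setting the cluster_interconnects initialization parameter on the database instances overrides the default Clusterware interconnect and isolates the interconnect traffic of the different instances from one another. But a failure of any specified NIC hangs the instance, so high availability is not guaranteed.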
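
Because a single missing address is enough to hang the instance, one possible safeguard is a periodic check that every address listed in cluster_interconnects is still configured on some interface. The sketch below is an assumption-laden illustration rather than anything used in the test above: missing_interconnect_ips is a made-up helper, it parses the one-line-per-address format of `ip -o -4 addr show`, and the sample input is hand-written to mimic node 1 with eth4 down.

```shell
# Print each address from a colon-separated cluster_interconnects value ($1)
# that does not appear in `ip -o -4 addr show` output ($2).
# missing_interconnect_ips is a hypothetical helper name.
missing_interconnect_ips() {
    ci_value=$1
    addr_output=$2
    printf '%s\n' "$ci_value" | tr ':' '\n' | while IFS= read -r ci_ip; do
        [ -n "$ci_ip" ] || continue
        # Field 4 of `ip -o -4 addr show` is ADDRESS/PREFIXLEN.
        if ! printf '%s\n' "$addr_output" |
            awk -v ip="$ci_ip" '{ split($4, a, "/"); if (a[1] == ip) found = 1 } END { exit !found }'
        then
            echo "$ci_ip"
        fi
    done
}

# Hand-written sample (not captured from the test system): eth4 is absent.
sample_addrs='3: eth2    inet 10.0.1.20/24 brd 10.0.1.255 scope global eth2
4: eth3    inet 10.0.2.20/24 brd 10.0.2.255 scope global eth3
6: eth5    inet 10.0.4.20/24 brd 10.0.4.255 scope global eth5'

missing_interconnect_ips "10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20" "$sample_addrs"
# prints: 10.0.3.20
```

On a live node the second argument would be "$(ip -o -4 addr show)". A check like this only detects the condition; it does not give cluster_interconnects the failover behavior that HAIP provides.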


Related article:
   "Oracle CLUSTER_INTERCONNECTS参数详解" (Oracle CLUSTER_INTERCONNECTS Parameter Explained): http://blog.itpub.net/23135684/viewspace-714734/

--end--

From "ITPUB博客", link: http://blog.itpub.net/23135684/viewspace-1815252/. If you repost this article, please credit the source; otherwise legal action will be pursued.
