Oracle RAC Troubleshooting (1): Cluster Failure Caused by NetworkManager

Original post. Category: Oracle. Author: 减数分裂. Date: 2019-06-09 10:30:03


Environment: test

DB: Oracle 11.2.0.4.0

OS: Oracle Linux Server release 6.3 on Oracle VM VirtualBox

Nodes: rac1, rac2

Instances: cjcdb1, cjcdb2


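Problem: after the database servers rac1 and rac2 were restored from snapshots, the cluster could not be used normally.

Issue 1: CRS, CSS, Event Manager and related services fail to start on rac1 and rac2.

Root cause: the NetworkManager service was running on both nodes. While it was active, eth1 could not be brought up, and CSS reported a disk heartbeat but no network heartbeat between the nodes.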


Solution:

---1 Check the cluster status

[root@rac1 bin]# ./crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@rac1 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4534: Cannot communicate with Event Manager

[root@rac2 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager
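
When the same check is repeated on several nodes, it can help to filter the output down to the components that are not online. The following is a minimal sketch; the helper name crs_problems is hypothetical and not part of Oracle's tooling, and it assumes the CRS-nnnn message format shown in this post.

```shell
#!/bin/sh
# Sketch: print only the "crsctl check crs" lines that do not report
# "is online". crs_problems is a hypothetical helper name.
crs_problems() {
    # Reads "crsctl check crs" output on stdin, keeps the CRS-nnnn
    # message lines, and drops those ending in "is online".
    grep '^CRS-[0-9]*:' | grep -v 'is online *$'
}

# Typical use on a node (Grid home path as used in this post):
#   /u01/app/11.2.0/grid/bin/crsctl check crs | crs_problems
```

An empty result means every reported component is online.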


---2 Review crsd.log and ocssd.log

[root@rac2 crsd]# vi  crsd.log 

......

2019-05-19 11:13:09.239: [    CRSD][3258771232] Logging level for Module: OCRASM  1

2019-05-19 11:13:09.239: [ CRSMAIN][3258771232] Checking the OCR device

2019-05-19 11:13:09.240: [ CRSMAIN][3258771232] Sync-up with OCR

2019-05-19 11:13:09.240: [ CRSMAIN][3258771232] Connecting to the CSS Daemon

2019-05-19 11:13:09.337: [ CSSCLNT][3252320000]  clssnsquerymode: not connected to CSSD

2019-05-19 11:13:09.342: [  CRSRTI][3258771232] CSS is not ready. Received status 3

2019-05-19 11:13:09.606: [    CRSD][3258771232] Created alert : (:CRSD00109:) :   Could not init the CSS context, error: 3

2019-05-19 11:13:09.606: [    CRSD][3258771232][PANIC] CRSD exiting: Could not init the CSS context, error: 3

2019-05-19 11:13:09.606: [    CRSD][3258771232] Done.


[root@rac2 cssd]# pwd

/u01/app/11.2.0/grid/log/rac2/cssd

[root@rac2 cssd]# vi  ocssd.log   

2019-06-07 16:19:21.572: [    CSSD][1105811200]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 455390045, wrtcnt, 32113, LATS 8854744, lastSeqNo 32095, uniqueness 1559895214, timestamp 1559895560/8839004

2019-06-07 16:19:21.588: [    CSSD][1101055744]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 455390045, wrtcnt, 32114, LATS 8854754, lastSeqNo 32111, uniqueness 1559895214, timestamp 1559895561/8839384

2019-06-07 16:19:22.223: [    CSSD][1096300288]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2019-06-07 16:19:22.637: [    CSSD][1105811200]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 455390045, wrtcnt, 32116, LATS 8855804, lastSeqNo 32113, uniqueness 1559895214, timestamp 1559895561/8840004

2019-06-07 16:19:22.684: [    CSSD][1101055744]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 455390045, wrtcnt, 32117, LATS 8855854, lastSeqNo 32114, uniqueness 1559895214, timestamp 1559895562/8840384

2019-06-07 16:19:22.842: [    CSSD][1119430400]clssgmDeadProc: proc 0x7f163805e180

2019-06-07 16:19:22.842: [    CSSD][1119430400]clssgmDestroyProc: cleaning up proc(0x7f163805e180) con(0x4142) skgpid  ospid 455 with 0 clients, refcount 0

2019-06-07 16:19:22.842: [    CSSD][1119430400]clssgmDiscEndpcl: gipcDestroy 0x4142

2019-06-07 16:19:22.845: [    CSSD][1119430400]clssscSelect: cookie accept request 0x9a6030

2019-06-07 16:19:22.845: [    CSSD][1119430400]clssgmAllocProc: (0x7f1638062700) allocated

2019-06-07 16:19:22.846: [    CSSD][1119430400]clssgmClientConnectMsg: properties of cmProc 0x7f1638062700 - 1,2,3,4,5

2019-06-07 16:19:22.846: [    CSSD][1119430400]clssgmClientConnectMsg: Connect from con(0x422d) proc(0x7f1638062700) pid(455) version 11:2:1:4, properties: 1,2,3,4,5

2019-06-07 16:19:22.846: [    CSSD][1119430400]clssgmClientConnectMsg: msg flags 0x0000

2019-06-07 16:19:22.847: [    CSSD][1119430400]clssscSelect: cookie accept request 0x7f1638062700

2019-06-07 16:19:22.847: [    CSSD][1119430400]clssscevtypSHRCON: getting client with cmproc 0x7f1638062700

2019-06-07 16:19:22.847: [    CSSD][1119430400]clssgmRegisterClient: proc(4/0x7f1638062700), client(1/0x7f163803c2e0)

2019-06-07 16:19:22.847: [    CSSD][1119430400]clssgmJoinGrock: global grock CRF- new client 0x7f163803c2e0 with con 0x7f160000425c, requested num -1, flags 0x4000e00

2019-06-07 16:19:22.847: [    CSSD][1119430400]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group information has been received from the master; group name CRF-, member number -1, flags 0x4000e00

2019-06-07 16:19:22.847: [    CSSD][1119430400]clssgmDiscEndpcl: gipcDestroy 0x425c
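
The key message in ocssd.log is "has a disk HB, but no network HB": the remote node is still writing to the voting disk but cannot be reached over the network. Rather than paging through the log in vi, the message can be counted per node. This is a rough sketch; the function name no_net_hb_by_node is hypothetical, and the pattern assumes the 11.2 log format shown in this post.

```shell
#!/bin/sh
# Sketch: count "disk HB, but no network HB" messages per remote node.
# no_net_hb_by_node is a hypothetical helper; $1 is the path to ocssd.log.
no_net_hb_by_node() {
    awk '/has a disk HB, but no network HB/ {
             # The node name follows "node <n>, " and ends at the next comma.
             if (match($0, /node [0-9]+, [^,]+,/)) {
                 s = substr($0, RSTART, RLENGTH)
                 sub(/^node [0-9]+, /, "", s)
                 sub(/,$/, "", s)
                 count[s]++
             }
         }
         END { for (n in count) print n, count[n] }' "$1"
}

# Typical use on rac2 (log path as shown in this post):
#   no_net_hb_by_node /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log
```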


---3 Check the server network (on both nodes eth1 has no IPv4 address and is not RUNNING)

[root@rac1 bin]# ifconfig 

eth0      Link encap:Ethernet  HWaddr 08:00:27:AA:A0:6C  

          inet addr:192.168.31.101  Bcast:192.168.31.255  Mask:255.255.255.0

          inet6 addr: fe80::a00:27ff:feaa:a06c/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:1863183 errors:0 dropped:0 overruns:0 frame:0

          TX packets:3596007 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000 

          RX bytes:188956406 (180.2 MiB)  TX bytes:5103093381 (4.7 GiB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:1E:BC:F1  

          inet6 addr: fe80::a00:27ff:fe1e:bcf1/64 Scope:Link

          UP BROADCAST MULTICAST  MTU:1500  Metric:1

          RX packets:126062 errors:0 dropped:0 overruns:0 frame:0

          TX packets:121233 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000 

          RX bytes:86334731 (82.3 MiB)  TX bytes:74749229 (71.2 MiB)

  

[root@rac2 bin]# ifconfig 

eth0      Link encap:Ethernet  HWaddr 08:00:27:35:26:A2  

          inet addr:192.168.31.102  Bcast:192.168.31.255  Mask:255.255.255.0

          inet6 addr: fe80::a00:27ff:fe35:26a2/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3382490 errors:0 dropped:0 overruns:0 frame:0

          TX packets:1631043 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000 

          RX bytes:4909210189 (4.5 GiB)  TX bytes:111063079 (105.9 MiB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:8B:AC:D5  

          inet6 addr: fe80::a00:27ff:fe8b:acd5/64 Scope:Link

          UP BROADCAST MULTICAST  MTU:1500  Metric:1

          RX packets:121562 errors:0 dropped:0 overruns:0 frame:0

          TX packets:126595 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000 

          RX bytes:74932353 (71.4 MiB)  TX bytes:86660886 (82.6 MiB)
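
In the output above, eth1 on both nodes shows only an inet6 link-local address and no "inet addr" line. A small filter can flag such interfaces automatically. This is a sketch under the assumption that ifconfig prints the older net-tools format used on Oracle Linux 6 ("inet addr:..."); the function name ifaces_without_ipv4 is hypothetical.

```shell
#!/bin/sh
# Sketch: list interfaces that have no IPv4 address in legacy
# net-tools "ifconfig" output. ifaces_without_ipv4 is a hypothetical name.
ifaces_without_ipv4() {
    # A line starting in column 1 opens a new interface stanza;
    # "inet addr:" inside a stanza marks it as having an IPv4 address.
    awk '/^[^ \t]/    { if (name != "" && !has4) print name
                        name = $1; has4 = 0 }
         /inet addr:/ { has4 = 1 }
         END          { if (name != "" && !has4) print name }'
}

# Typical use:
#   ifconfig | ifaces_without_ipv4
```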

     

---4 Try restarting the network service

[root@rac1 network-scripts]# service network restart

Shutting down interface eth0:  Device state: 3 (disconnected)

                                                           [  OK  ]

Shutting down interface eth1:  Error: Device 'eth1' (/org/freedesktop/NetworkManager/Devices/1) disconnecting failed: This device is not active

                                                           [FAILED]

Shutting down loopback interface:                          [  OK  ]

Bringing up loopback interface:                            [  OK  ]

Bringing up interface eth0:  Active connection state: activated

Active connection path: /org/freedesktop/NetworkManager/ActiveConnection/2

                                                           [  OK  ]

Bringing up interface eth1:  Error: Connection activation failed: Device not managed by NetworkManager or unavailable

                                                           [FAILED]

[root@rac1 network-scripts]# ifup eth1

Error: Connection activation failed: Device not managed by NetworkManager or unavailable

---5 Check the NetworkManager service

[root@rac2 bin]# service NetworkManager status

NetworkManager (pid  1873) is running...

[root@rac2 bin]# 

[root@rac2 bin]# service NetworkManager stop

Stopping NetworkManager daemon:                            [FAILED]

[root@rac2 bin]# chkconfig --list|grep NetworkManager

NetworkManager  0:off  1:off  2:on  3:on  4:on  5:on  6:off
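
The chkconfig line shows that NetworkManager is enabled in runlevels 2 through 5, so it comes back after every reboot. To summarize such a line in a script, something like the following could be used. It is only a sketch; enabled_runlevels is a hypothetical helper name, and it assumes the SysV-style "N:on/N:off" output shown above.

```shell
#!/bin/sh
# Sketch: summarize which runlevels a service is enabled in, given
# "chkconfig --list" style lines on stdin. enabled_runlevels is hypothetical.
enabled_runlevels() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            # Fields look like "3:on" or "3:off"; keep the "on" runlevels.
            if ($i ~ /^[0-6]:on$/) {
                split($i, a, ":")
                out = out (out == "" ? "" : " ") a[1]
            }
        }
        print $1 ": " out
    }'
}

# Typical use:
#   chkconfig --list | grep NetworkManager | enabled_runlevels
```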


---6 Disable the NetworkManager service

[root@rac1 ~]# chkconfig NetworkManager off

[root@rac1 ~]# chkconfig network on

[root@rac1 ~]# service NetworkManager stop

[root@rac1 ~]# service network start

[root@rac2 ~]# chkconfig NetworkManager off

[root@rac2 ~]# chkconfig network on

[root@rac2 ~]# service NetworkManager stop

[root@rac2 ~]# service network start 
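
Because the same four commands are run on each node, they can be wrapped in one function with a dry-run switch so the sequence can be reviewed before it changes anything. This is a sketch only; the names run, disable_network_manager and the DRY_RUN variable are hypothetical, and the commands themselves are the ones used above (run as root).

```shell
#!/bin/sh
# Sketch: switch a node from NetworkManager to the classic network service.
# run, disable_network_manager and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_network_manager() {
    run chkconfig NetworkManager off   # do not start NetworkManager at boot
    run chkconfig network on           # start the network service at boot
    run service NetworkManager stop    # stop the running daemon
    run service network start          # bring the interfaces up via network
}

# Preview first, then apply:
#   DRY_RUN=1 disable_network_manager
#   disable_network_manager
```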


---7 Restart CRS; the services now start normally

[root@rac1 ~]# cd /u01/app/11.2.0/grid/bin/

[root@rac1 bin]# ./crsctl stop crs -f

[root@rac1 bin]# ./crsctl start crs

[root@rac1 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online


[root@rac2 bin]# ./crsctl stop crs -f

[root@rac2 bin]# ./crsctl start crs

[root@rac2 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online
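
After "crsctl start crs" returns, the stack can take a while before every component reports online, so the check usually has to be repeated. A polling loop along these lines can do that. It is a sketch; wait_for_crs and its parameters are hypothetical, and it only relies on the "is online" wording of the messages shown above.

```shell
#!/bin/sh
# Sketch: poll a "crsctl check crs" style command until every CRS-nnnn
# line ends in "is online". wait_for_crs is a hypothetical helper.
#   $1: command to run, $2: max attempts (default 30),
#   $3: seconds to sleep between attempts (default 10)
wait_for_crs() {
    check_cmd=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        out=$($check_cmd 2>&1)
        # Ready when at least one CRS- line exists and none lacks "is online".
        if printf '%s\n' "$out" | grep -q '^CRS-' &&
           ! printf '%s\n' "$out" | grep '^CRS-' | grep -qv 'is online *$'; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Typical use (Grid home path as used in this post):
#   wait_for_crs "/u01/app/11.2.0/grid/bin/crsctl check crs" 30 10 && echo "stack is up"
```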



Source: ITPUB Blog, link: http://blog.itpub.net/69923456/viewspace-2647100/. If you repost this article, please credit the source; otherwise legal action may be taken.
