[Repost] 11.2.0.3 RAC: Recovering the OCR and Voting Disks

Original post, Linux Operating Systems. Author: 听海★蓝心梦. Posted: 2013-10-08 16:03:36


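I previously posted "Recovery Methods for an 11.2 RAC That Has Lost the ASM Diskgroup Holding the OCR and Votedisk (repost)", but that post was copied entirely from maclean. Only recently did I get the chance to test the procedure myself.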

References: http://f.dataguru.cn/thread-155455-1-2.html
            http://www.askmaclean.com/archives/11-2-lost-ocr-votedisk-group-recovery.html


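ASM disk groups come in three redundancy types: external, normal, and high. The dataguru post recovers an external-redundancy group and maclean recovers a high-redundancy one; I happen to be recovering a normal-redundancy group here, so all three are now covered.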


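Simulation: what happens to RAC when the OCR disks or the voting disks become unavailable? The whole fault-location process is shown below.
In 11.2.0.3 the voting files are stored in the same ASM disk group as the OCR (not inside the OCR itself), so the two experiments, OCR unavailable and voting disks unavailable, are done together.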

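In 11.2.0.3 the OCR can also be exported by hand. Note that ocrconfig -export writes a logical export (restored with ocrconfig -import), not a physical backup, so it does not appear in the ocrconfig -showbackup listing:
[root@rac1 ~]# ocrconfig -export /u01/ocr.exp

List the existing OCR backups:
[root@rac1 ~]# ocrconfig -showbackup
rac1     2013/07/22 05:39:51     /u01/grid/crs/cdata/rac/backup00.ocr
rac1     2013/07/22 01:39:51     /u01/grid/crs/cdata/rac/backup01.ocr
rac1     2013/07/21 21:39:50     /u01/grid/crs/cdata/rac/backup02.ocr
rac2     2013/07/21 01:52:54     /u01/grid/crs/cdata/rac/day.ocr
rac2     2013/07/09 01:52:25     /u01/grid/crs/cdata/rac/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
PROT-25 only means that no manual physical backup (one taken with ocrconfig -manualbackup) exists; it does not say that the export above is invalid. The automatic backups listed here are what the restore later in this post uses.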
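
Picking the newest backup from that listing can be scripted. The following is a minimal sketch, not part of the original procedure: the function name newest_ocr_backup is made up for illustration, and it assumes every backup line has exactly four whitespace-separated fields (node, date, time, path) as in the output above.

```shell
# Print the path of the most recent backup found in
# `ocrconfig -showbackup`-style output read from stdin.
# Assumption: data lines have four fields (node, date, time, path);
# anything else, such as the PROT-25 message, is skipped.
newest_ocr_backup() {
    awk 'NF == 4 && $4 ~ /\.ocr$/ { print $2, $3, $4 }' |
        sort -r |          # YYYY/MM/DD HH:MM:SS sorts correctly as text
        head -n 1 |
        cut -d" " -f3
}

# Sample input copied from the listing above. On a live node one would
# pipe `ocrconfig -showbackup` into the function instead.
sample='rac1     2013/07/22 05:39:51     /u01/grid/crs/cdata/rac/backup00.ocr
rac1     2013/07/22 01:39:51     /u01/grid/crs/cdata/rac/backup01.ocr
rac1     2013/07/21 21:39:50     /u01/grid/crs/cdata/rac/backup02.ocr
rac2     2013/07/21 01:52:54     /u01/grid/crs/cdata/rac/day.ocr
rac2     2013/07/09 01:52:25     /u01/grid/crs/cdata/rac/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available'

printf '%s\n' "$sample" | newest_ocr_backup
```

The first column matters as well: each backup file lives on the node named there, so the restore has to be run where the chosen file actually exists.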


Check the voting disk information:
[root@rac1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   745716af7e5b4faebfc8d948d096aa55 (/dev/oracleasm/disks/OCR_VOT1) [OCR_VOT]
 2. ONLINE   7092079f66c04f9dbf65974d0dcc611a (/dev/oracleasm/disks/OCR_VOT2) [OCR_VOT]
 3. ONLINE   6510631353284f5fbf3d4c8839822dbd (/dev/oracleasm/disks/OCR_VOT3) [OCR_VOT]
Located 3 voting disk(s).
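
CSS needs access to a strict majority of the configured voting files, so it can help to count how many are ONLINE. The sketch below parses the output format shown above; the function names count_online_votedisks and has_majority are illustrative and not part of any Oracle tool.

```shell
# Count voting files reported as ONLINE in
# `crsctl query css votedisk`-style output read from stdin.
count_online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Succeed when strictly more than half of the files are online.
# $1 = number online, $2 = number configured.
has_majority() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Sample input copied from the query above.
sample='##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   745716af7e5b4faebfc8d948d096aa55 (/dev/oracleasm/disks/OCR_VOT1) [OCR_VOT]
 2. ONLINE   7092079f66c04f9dbf65974d0dcc611a (/dev/oracleasm/disks/OCR_VOT2) [OCR_VOT]
 3. ONLINE   6510631353284f5fbf3d4c8839822dbd (/dev/oracleasm/disks/OCR_VOT3) [OCR_VOT]
Located 3 voting disk(s).'

online=$(printf '%s\n' "$sample" | count_online_votedisks)
if has_majority "$online" 3; then
    echo "$online of 3 voting files online: majority available"
else
    echo "$online of 3 voting files online: no majority"
fi
```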

Stop the database:
[root@rac1 ~]# srvctl stop database -d orcl -o immediate
Stop the cluster:
[root@rac1 ~]# crsctl stop cluster -all -f

Destroy the OCR and voting disks by zeroing the first 1 MB of each device:
[root@rac1 ~]# dd if=/dev/zero of=/dev/mapper/mpathap1 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0160613 s, 65.3 MB/s
[root@rac1 ~]# dd if=/dev/zero of=/dev/mapper/mpathap2 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00800275 s, 131 MB/s
[root@rac1 ~]# dd if=/dev/zero of=/dev/mapper/mpathap3 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00927389 s, 113 MB/s

After the damage, the services on every node still look completely normal:
[root@rac1 ~]# crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac1    
ora.FRA.dg     ora....up.type ONLINE    ONLINE    rac1    
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1    
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac1    
ora.OCR_VOT.dg ora....up.type ONLINE    ONLINE    rac1    
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1    
ora.orcl.db   ora....se.type  ONLINE    ONLINE    rac1          
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac1    
ora....SM1.asm application    ONLINE    ONLINE    rac1    
ora....C1.lsnr application    ONLINE    ONLINE    rac1    
ora....ac1.gsd application    OFFLINE   OFFLINE              
ora....ac1.ons application    ONLINE    ONLINE    rac1    
ora....ac1.vip ora....t1.type ONLINE    ONLINE    rac1    
ora....SM2.asm application    ONLINE    ONLINE    rac2    
ora....C2.lsnr application    ONLINE    ONLINE    rac2    
ora....ac2.gsd application    OFFLINE   OFFLINE              
ora....ac2.ons application    ONLINE    ONLINE    rac2    
ora....ac2.vip ora....t1.type ONLINE    ONLINE    rac2    
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              
ora....network ora....rk.type ONLINE    ONLINE    rac1    
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac1    
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1    
ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac1    
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac1

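After the operating system is rebooted on all nodes, the cluster services no longer come up:
[root@rac1 ~]# reboot
If the cluster services are merely stopped rather than the OS rebooted, re-creating the ASM disk group later fails; after an OS reboot the re-creation succeeds.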

Check CRS:
[grid@rac1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
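
To make that check scriptable, the message codes can be mapped to the daemons they refer to. This is a rough sketch that only knows the three failure codes appearing in this post (CRS-4535, CRS-4530, CRS-4534); the function name failed_daemons is made up for illustration.

```shell
# Read `crsctl check crs` output on stdin and print the daemons that
# the failure messages seen in this post refer to.
failed_daemons() {
    awk -F: '
        $1 == "CRS-4535" { print "crsd" }   # Cluster Ready Services
        $1 == "CRS-4530" { print "cssd" }   # Cluster Synchronization Services
        $1 == "CRS-4534" { print "evmd" }   # Event Manager
    '
}

# Sample input copied from the check above.
sample='CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager'

printf '%s\n' "$sample" | failed_daemons
```

For the output above this lists crsd, cssd and evmd, meaning only the High Availability Services layer is up.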

Start the cluster services:
[root@rac1 ~]# crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded    -- hangs here indefinitely

Trying a different start command from another terminal:
[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Nothing particularly useful shows up in the operating-system log or the crsd log:
[root@rac1 ~]# vi /var/log/messages
[grid@rac1 ~]$ vi $ORACLE_HOME/log/rac1/crsd/crsd.log

The ocssd log, however, reports that no voting files can be found:
[grid@rac1 ~]$ vi $ORACLE_HOME/log/rac1/cssd/ocssd.log
2013-07-21 21:15:08.550: [    CSSD][1095031104]clssnmvFindInitialConfigs: No voting files found
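
Rather than paging through the log in vi, the message can be searched for directly. A small sketch, assuming the message text is exactly as quoted above; the function name count_no_voting_files is illustrative.

```shell
# Count ocssd.log lines (read from stdin) that report missing voting files.
count_no_voting_files() {
    grep -c 'clssnmvFindInitialConfigs: No voting files found'
}

# Sample line copied from the log excerpt above. On a live node the input
# would be $ORACLE_HOME/log/rac1/cssd/ocssd.log instead.
sample='2013-07-21 21:15:08.550: [    CSSD][1095031104]clssnmvFindInitialConfigs: No voting files found'

hits=$(printf '%s\n' "$sample" | count_no_voting_files)
echo "matching ocssd.log lines: $hits"
```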

Some of the ASM disks turn out to be missing:
[root@rac1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
[root@rac1 ~]# /etc/init.d/oracleasm listdisks
DATA
FRA
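
Comparing the listing against the labels that should exist makes the gap explicit. The sketch below is illustrative only: the function name missing_asm_disks is made up, and the expected label list is taken from the disk names used elsewhere in this post.

```shell
# Print every expected ASMLib label that is absent from the
# `oracleasm listdisks` output read from stdin.
# $1 = space-separated list of expected labels.
missing_asm_disks() {
    listed=$(cat)
    for label in $1; do
        printf '%s\n' "$listed" | grep -qxF -- "$label" ||
            printf '%s\n' "$label"
    done
    return 0
}

# The listing above shows only DATA and FRA.
printf 'DATA\nFRA\n' |
    missing_asm_disks "DATA FRA OCR_VOT1 OCR_VOT2 OCR_VOT3"
```

With the listing above this prints OCR_VOT1, OCR_VOT2 and OCR_VOT3, the three disks whose headers were overwritten.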

Re-create the ASM disks as described in the RAC installation document:
[root@rac1 ~]# /etc/init.d/oracleasm createdisk OCR_VOT1 /dev/mapper/mpathap1
Marking disk "OCR_VOT1" as an ASM disk:                    [  OK  ]
[root@rac1 ~]# /etc/init.d/oracleasm createdisk OCR_VOT2 /dev/mapper/mpathap2
Marking disk "OCR_VOT2" as an ASM disk:                    [  OK  ]
[root@rac1 ~]# /etc/init.d/oracleasm createdisk OCR_VOT3 /dev/mapper/mpathap3
Marking disk "OCR_VOT3" as an ASM disk:                    [  OK  ]

Stop the cluster services.
Add -f, otherwise the stop can be very slow:
[root@rac1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.


Start the cluster stack with -excl -nocrs; this starts the ASM instance but not CRS (crsd):
[root@rac1 ~]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded


At this point CRS still reports errors:
[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@rac1 ~]# crsctl check crs             
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager


Re-create the disk group that originally held the OCR and voting disks:
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ sqlplus  / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jul 22 21:37:13 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> col path for a50
SQL> set lines 300
SQL> select path,header_status from v$asm_disk;
SQL> create diskgroup OCR_VOT normal redundancy disk '/dev/oracleasm/disks/OCR_VOT1','/dev/oracleasm/disks/OCR_VOT2','/dev/oracleasm/disks/OCR_VOT3' attribute 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
Of the three ASM redundancy types (external, normal, high), this disk group previously used normal, so it is re-created with normal redundancy.

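Restore the OCR from an automatic OCR backup.
As the grid user, look in the backup directory on each node:
cd $ORACLE_HOME/cdata/rac
Find the most recent OCR backup, then restore it as root: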
[root@rac1 ~]# ocrconfig -restore /u01/grid/crs/cdata/rac/backup00.ocr


Preparation for restoring the voting disks:
SQL> show parameter asm_diskstring
If asm_diskstring has no value, ASM is using its default disk discovery path.
Set it to the actual ASM disk discovery path:
SQL> alter system set asm_diskstring='/dev/oracleasm/disks/*';

Restore the voting disks:
[root@rac1 ~]# crsctl replace votedisk  +OCR_VOT
Successful addition of voting disk 4ad2b9cc0a754fffbf1515281199a78f.
Successful addition of voting disk 9f8dc1c013df4f39bfd85c64051a0bc1.
Successful addition of voting disk a4aea7a1aa434fb3bff161f6ea8ce102.
Successfully replaced voting disk group with +OCR_VOT.
CRS-4266: Voting file(s) successfully replaced

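Once the OCR and voting files are restored, CSS reports online. CRS and EVM still show as unreachable in this check, which is expected because the stack is running in -excl -nocrs mode; the full stack comes up after the restart further down.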
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

[root@rac1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4ad2b9cc0a754fffbf1515281199a78f (/dev/oracleasm/disks/OCR_VOT1) [OCR_VOT]
 2. ONLINE   9f8dc1c013df4f39bfd85c64051a0bc1 (/dev/oracleasm/disks/OCR_VOT2) [OCR_VOT]
 3. ONLINE   a4aea7a1aa434fb3bff161f6ea8ce102 (/dev/oracleasm/disks/OCR_VOT3) [OCR_VOT]
Located 3 voting disk(s).
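
Comparing this with the query taken before the damage shows the same three disks carrying new File Universal Ids, which suggests that crsctl replace votedisk created fresh voting files rather than reviving the old ones. The sketch below checks that mechanically; the function names votedisk_ids and reused_ids are made up for illustration.

```shell
# Print the File Universal Id column from
# `crsctl query css votedisk`-style output read from stdin.
votedisk_ids() {
    awk '$1 ~ /^[0-9]+\.$/ { print $3 }'
}

# Print every id in list $2 that also occurs in list $1.
reused_ids() {
    for id in $2; do
        for prev in $1; do
            [ "$id" = "$prev" ] && printf '%s\n' "$id"
        done
    done
    return 0
}

# Data lines copied from the two queries in this post.
before_output=' 1. ONLINE   745716af7e5b4faebfc8d948d096aa55 (/dev/oracleasm/disks/OCR_VOT1) [OCR_VOT]
 2. ONLINE   7092079f66c04f9dbf65974d0dcc611a (/dev/oracleasm/disks/OCR_VOT2) [OCR_VOT]
 3. ONLINE   6510631353284f5fbf3d4c8839822dbd (/dev/oracleasm/disks/OCR_VOT3) [OCR_VOT]'
after_output=' 1. ONLINE   4ad2b9cc0a754fffbf1515281199a78f (/dev/oracleasm/disks/OCR_VOT1) [OCR_VOT]
 2. ONLINE   9f8dc1c013df4f39bfd85c64051a0bc1 (/dev/oracleasm/disks/OCR_VOT2) [OCR_VOT]
 3. ONLINE   a4aea7a1aa434fb3bff161f6ea8ce102 (/dev/oracleasm/disks/OCR_VOT3) [OCR_VOT]'

before=$(printf '%s\n' "$before_output" | votedisk_ids)
after=$(printf '%s\n' "$after_output" | votedisk_ids)
reused=$(reused_ids "$before" "$after")

if [ -z "$reused" ]; then
    echo "all voting files carry new File Universal Ids"
else
    echo "ids carried over from before the damage:"
    printf '%s\n' "$reused"
fi
```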

Restart the cluster stack and check that everything is back to normal:
[root@rac1 ~]# crsctl stop crs
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

The other node can now be started as well:
[root@rac2 ~]# crsctl start crs

After a short wait, the services are up again:
[root@rac1 ~]# crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac1    
ora.FRA.dg     ora....up.type ONLINE    ONLINE    rac1    
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1    
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac1    
ora.OCR_VOT.dg ora....up.type ONLINE    ONLINE    rac1    
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1    
ora.orcl.db   ora....se.type  ONLINE    ONLINE    rac1          
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac1    
ora....SM1.asm application    ONLINE    ONLINE    rac1    
ora....C1.lsnr application    ONLINE    ONLINE    rac1    
ora....ac1.gsd application    OFFLINE   OFFLINE              
ora....ac1.ons application    ONLINE    ONLINE    rac1    
ora....ac1.vip ora....t1.type ONLINE    ONLINE    rac1    
ora....SM2.asm application    ONLINE    ONLINE    rac2    
ora....C2.lsnr application    ONLINE    ONLINE    rac2    
ora....ac2.gsd application    OFFLINE   OFFLINE              
ora....ac2.ons application    ONLINE    ONLINE    rac2    
ora....ac2.vip ora....t1.type ONLINE    ONLINE    rac2    
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              
ora....network ora....rk.type ONLINE    ONLINE    rac1    
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac1    
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1    
ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac1    
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac1
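
Scanning a table this long by eye is error-prone, so the following sketch flags any resource whose Target is ONLINE while its State is not. It assumes the five-column crs_stat -t layout shown above; the function name unhealthy_resources is illustrative, and the sample is a shortened copy of the table.

```shell
# Read `crs_stat -t` output on stdin and print the name of every
# resource whose Target is ONLINE but whose State is something else.
# Resources that are intentionally OFFLINE (such as gsd) are ignored.
unhealthy_resources() {
    awk '$3 == "ONLINE" && $4 != "ONLINE" { print $1 }'
}

# A few rows copied from the table above.
sample='Name           Type           Target    State     Host
------------------------------------------------------------
ora.OCR_VOT.dg ora....up.type ONLINE    ONLINE    rac1
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1
ora.orcl.db   ora....se.type  ONLINE    ONLINE    rac1
ora....ac1.gsd application    OFFLINE   OFFLINE
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac1'

problems=$(printf '%s\n' "$sample" | unhealthy_resources)
if [ -z "$problems" ]; then
    echo "all resources with Target ONLINE are ONLINE"
else
    echo "not yet online:"
    printf '%s\n' "$problems"
fi
```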

Source: ITPUB博客, link: http://blog.itpub.net/751371/viewspace-773909/. If you repost this, please credit the source; otherwise legal liability will be pursued.
