Notes on an Incident: Rebuilding CRS

Original post. Category: Oracle. Author: bluesshadow. Date: 2012-03-19 13:22:59

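A customer called to say their Oracle RAC database would not start. The version is 10.2.0.5, and the host reboots itself as soon as CRS is started. I logged in to take a look.

Neither the OCR nor the voting disk could find its storage disk, and CRS turned out to be using a local file. I asked the customer how the failure came about: someone had changed the SAN switch configuration, which scrambled the paths of the disks mapped from the storage array. The SAN switch configuration had since been restored and the storage disks rescanned, so everything should have been fine, yet the CRS configuration was still wrong. I asked whether anyone had edited the CRS configuration directly; the customer could not say. I strongly suspect that someone touched the OCR configuration file. Fortunately nobody had touched the data disks, ASM could still be started manually, and the database could be started as a single instance, so I took a backup right away.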
# su - oracle
$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :        128
         Available space (kbytes) :     261992
         ID                       :  779870474
         Device/File Name         : /oracle/product/10.2.0/db_1/cdata/localhost/local.ocr
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

$ crsctl query css votedisk
located 0 votedisk(s).
# cat /etc/oracle/ocr.loc
/oracle/product/10.2.0/db_1/cdata/localhost/local.ocr
ocrconfig_loc=/oracle/product/10.2.0/db_1/cdata/localhost/local.ocr
local_only=true
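
The symptom above (ocr.loc pointing at a file under the database home with local_only=true) can be checked mechanically. The following is a minimal sketch, assuming the file uses the key=value layout shown above; check_ocr_loc is a hypothetical helper name, not an Oracle tool.

```shell
# Sketch: flag an ocr.loc that points CRS at a local-only registry.
# check_ocr_loc is a hypothetical helper, not part of Oracle Clusterware.
check_ocr_loc() (
    file=${1:-/etc/oracle/ocr.loc}
    [ -r "$file" ] || { echo "cannot read $file"; exit 2; }
    # Take the first value of each key; stray lines without a key are ignored.
    loc=$(sed -n 's/^ocrconfig_loc=//p' "$file" | head -n 1)
    local_only=$(sed -n 's/^local_only=//p' "$file" | head -n 1 | tr '[:upper:]' '[:lower:]')
    if [ "$local_only" = "true" ]; then
        echo "SUSPECT: local_only=true, ocrconfig_loc=$loc"
        exit 1
    fi
    echo "OK: shared OCR at $loc"
)
```

Run as root with no argument it reads /etc/oracle/ocr.loc; a non-zero exit status means the file looks like a single-node (local) configuration rather than a RAC one.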


# lspv|grep power
hdiskpower0     none                                None            
hdiskpower1     none                                None            
hdiskpower2     none                                None            
hdiskpower3     none                                None            
hdiskpower4     00c1bae6f4782704                    arch1       active
hdiskpower5     00c1bae6f47fe91e                    arch1       active
hdiskpower6     00c1bae6f4836a47                    arch1       active
hdiskpower7     00c1bae6f4849731                    arch1       active
hdiskpower8     00c1bb46f475cc92                    None            
hdiskpower9     00c1bb46f4770ffe                    None            
hdiskpower10    00c1bb46f47e0c6e                    None            
hdiskpower11    00c1bb46f479094c                    None            
hdiskpower12    none                                None            
hdiskpower13    none                                None            
hdiskpower14    none                                None            
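
To narrow down which hdiskpower devices might hold the OCR, the voting disk, and the ASM disks, it helps to list only the disks that belong to no volume group. This is a small sketch that filters lspv output of the shape shown above; unassigned_pvs is a hypothetical name, and the filter cannot tell what is actually stored on each disk.

```shell
# Sketch: print hdiskpower devices whose volume group column is "None".
# Reads lspv-style output on stdin; columns are: name, PVID, VG, state.
unassigned_pvs() {
    awk '$1 ~ /^hdiskpower/ && $3 == "None" { print $1, $2 }'
}

# Usage on the AIX host (not executed here):
#   lspv | unassigned_pvs
```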
           
  

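I changed the storage disk path in the configuration file back to the shared device, as shown in the two lines below. After that ocrcheck could read the OCR, but CRS behaved exactly as before: starting it rebooted the host. There was nothing else for it; CRS had to be either restored or rebuilt.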
ocrconfig_loc=/dev/rhdiskpower0
local_only=FALSE

$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     204580
         Used space (kbytes)      :       4652
         Available space (kbytes) :     199928
         ID                       :  507532603
         Device/File Name         : /dev/rhdiskpower0
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

$ crsctl query css votedisk
 0.     0    /dev/rhdiskpower1
located 1 votedisk(s).

# ./crsctl add css votedisk /dev/rhdiskpower1 -force
votedisk named /dev/rhdiskpower1 already configured as /dev/rhdiskpower1.
# ./crsctl query css votedisk
 0.     0    /dev/rhdiskpower1

located 1 votedisk(s).
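
When scripting this kind of check, the number of configured voting disks can be taken from the last line of the crsctl output. A minimal sketch, assuming the "located N votedisk(s)." line format shown above; votedisk_count is a hypothetical helper name.

```shell
# Sketch: extract N from the "located N votedisk(s)." summary line.
# Reads the output of "crsctl query css votedisk" on stdin.
votedisk_count() {
    sed -n 's/^located \([0-9][0-9]*\) votedisk(s)\.$/\1/p'
}

# Usage on a cluster node (not executed here):
#   crsctl query css votedisk | votedisk_count
```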


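If a backup exists, restore from it. If there is no backup, the OCR and voting disk have to be rebuilt. Unfortunately the customer had no backup, so a rebuild was the only option.
The rebuild went roughly as follows:
Back up the CRS_HOME directory on both nodes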
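
The original notes do not show how the backup was taken. One possible way, sketched here under the assumption that a plain tar archive is acceptable and that the local tar supports -C; backup_crs_home and the destination directory are hypothetical.

```shell
# Sketch: archive a CRS home into a timestamped tar file and print its path.
# backup_crs_home is a hypothetical helper; run it as root on each node.
backup_crs_home() (
    src=$1
    dest_dir=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; exit 1; }
    mkdir -p "$dest_dir" || exit 1
    archive="$dest_dir/crs_home_$(uname -n)_$(date +%Y%m%d%H%M%S).tar"
    # Archive relative to the parent so the tar holds one top-level directory.
    tar -cf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || exit 1
    echo "$archive"
)

# Usage (paths are examples only, not executed here):
#   backup_crs_home /oracle/product/10.2.0/crs /backup/crs
```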

Remove the CRS configuration:
# $CRS_HOME/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
Cleaning up Network socket directories

# $CRS_HOME/install/rootdeinstall.sh

Removing contents from OCR device
2560+0 records in.
2560+0 records out.

Check whether any CRS processes are still running, and kill any that are:
# ps -e|grep -i 'ocs[s]d'
# ps -e|grep -i 'cr[s]d.bin'
# ps -e|grep -i 'ev[m]d.bin'
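
The three checks can be folded into one filter that prints the PID and command name of any leftover daemon. A minimal sketch, assuming ps -e prints the PID in the first column and the command name in the last; leftover_crs_daemons is a hypothetical name.

```shell
# Sketch: list PID and command of any ocssd, crsd.bin or evmd.bin process.
# Reads "ps -e" output on stdin and prints nothing when none are found.
leftover_crs_daemons() {
    awk '/ocssd|crsd\.bin|evmd\.bin/ { print $1, $NF }'
}

# Usage (not executed here):
#   ps -e | leftover_crs_daemons
# Review the list by hand before killing anything.
```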

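If you are worried that old information left on the CRS disks might interfere with the rebuild, you can zero out the OCR and voting disk devices with dd:
dd if=/dev/zero of=/dev/rhdiskpower0 bs=1024k count=200
dd if=/dev/zero of=/dev/rhdiskpower1 bs=1024k count=200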
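
Because dd against the wrong device is unrecoverable, it can be worth generating the command first and reading it before running it. The sketch below only prints the dd command and refuses paths that do not look like raw devices; wipe_cmd is a hypothetical helper, and the /dev/r* check is an assumption based on the device names in this post.

```shell
# Sketch: print (never run) the dd command that would zero a raw device.
# wipe_cmd is a hypothetical helper; the default size is 200 blocks of 1 MB.
wipe_cmd() (
    dev=$1
    mb=${2:-200}
    case $dev in
        /dev/r*) ;;
        *) echo "refusing: $dev is not a raw device path" >&2; exit 1 ;;
    esac
    echo "dd if=/dev/zero of=$dev bs=1024k count=$mb"
)

# Usage: inspect the output, then paste it into a root shell if it is right.
#   wipe_cmd /dev/rhdiskpower0
#   wipe_cmd /dev/rhdiskpower1
```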


Rerun root.sh to recreate CRS. It has to be run on both nodes, starting with the first:

# /oracle/product/10.2.0/crs/root.sh
WARNING: directory '/oracle/product/10.2.0' is not owned by root
WARNING: directory '/oracle/product' is not owned by root
WARNING: directory '/oracle' is not owned by root
No value set for the CRS parameter CRS_OCR_LOCATIONS. Using Values in paramfile.crs
Checking to see if Oracle CRS stack is already configured
Copying opriproc to /etc/oracle/bin for AIX

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/oracle/product/10.2.0' is not owned by root
WARNING: directory '/oracle/product' is not owned by root
WARNING: directory '/oracle' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: oradb01 oradb01-priv oradb01
node 2: oradb02 oradb02-priv oradb02
Creating OCR keys for user 'root', privgrp 'system'..
Operation successful.
Now formatting voting device: /dev/rhdiskpower1
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        oradb01
CSS is inactive on these nodes.
        oradb02
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.

Then run it on the second node:

# /oracle/product/10.2.0/crs/root.sh
WARNING: directory '/oracle/product/10.2.0' is not owned by root
WARNING: directory '/oracle/product' is not owned by root
WARNING: directory '/oracle' is not owned by root
No value set for the CRS parameter CRS_OCR_LOCATIONS. Using Values in paramfile.crs
Checking to see if Oracle CRS stack is already configured
Copying opriproc to /etc/oracle/bin for AIX

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/oracle/product/10.2.0' is not owned by root
WARNING: directory '/oracle/product' is not owned by root
WARNING: directory '/oracle' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: oradb01 oradb01-priv oradb01
node 2: oradb02 oradb02-priv oradb02
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        oradb01
        oradb02
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Invalid interface "255.255.255.0/en0" entered in an input argument.

The script prompts you to run vipca as the root user. Once vipca has finished, this step is done.

Check the current CRS status:
$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....b01.gsd application    ONLINE    ONLINE    oradb01    
ora....b01.ons application    ONLINE    ONLINE    oradb01    
ora....b01.vip application    ONLINE    ONLINE    oradb01    
ora....b02.gsd application    ONLINE    ONLINE    oradb02    
ora....b02.ons application    ONLINE    ONLINE    oradb02    
ora....b02.vip application    ONLINE    ONLINE    oradb02 




Recreate the listeners with netca


$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....01.lsnr application    ONLINE    ONLINE    oradb01    
ora....b01.gsd application    ONLINE    ONLINE    oradb01    
ora....b01.ons application    ONLINE    ONLINE    oradb01    
ora....b01.vip application    ONLINE    ONLINE    oradb01    
ora....b02.gsd application    ONLINE    ONLINE    oradb02    
ora....b02.ons application    ONLINE    ONLINE    oradb02    
ora....b02.vip application    ONLINE    ONLINE    oradb02    

Register the ASM instances with CRS
$ srvctl add asm -n oradb01 -i +ASM1 -o /oracle/product/10.2.0/db_1
$ srvctl add asm -n oradb02 -i +ASM2 -o /oracle/product/10.2.0/db_1

$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE              
ora....01.lsnr application    ONLINE    ONLINE    oradb01    
ora....b01.gsd application    ONLINE    ONLINE    oradb01    
ora....b01.ons application    ONLINE    ONLINE    oradb01    
ora....b01.vip application    ONLINE    ONLINE    oradb01    
ora....SM2.asm application    OFFLINE   OFFLINE              
ora....02.lsnr application    ONLINE    ONLINE    oradb02    
ora....b02.gsd application    ONLINE    ONLINE    oradb02    
ora....b02.ons application    ONLINE    ONLINE    oradb02    
ora....b02.vip application    ONLINE    ONLINE    oradb02  

Start ASM
$ srvctl start asm -n oradb01
$ srvctl start asm -n oradb02

$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    oradb01    
ora....01.lsnr application    ONLINE    ONLINE    oradb01    
ora....b01.gsd application    ONLINE    ONLINE    oradb01    
ora....b01.ons application    ONLINE    ONLINE    oradb01    
ora....b01.vip application    ONLINE    ONLINE    oradb01    
ora....SM2.asm application    ONLINE    ONLINE    oradb02    
ora....02.lsnr application    ONLINE    ONLINE    oradb02    
ora....b02.gsd application    ONLINE    ONLINE    oradb02    
ora....b02.ons application    ONLINE    ONLINE    oradb02    
ora....b02.vip application    ONLINE    ONLINE    oradb02

Register the database with CRS
$ srvctl add database -d oradb -o /oracle/product/10.2.0/db_1
$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.oradb.db   application    OFFLINE   OFFLINE              
ora....SM1.asm application    ONLINE    ONLINE    oradb01    
ora....01.lsnr application    ONLINE    ONLINE    oradb01    
ora....b01.gsd application    ONLINE    ONLINE    oradb01    
ora....b01.ons application    ONLINE    ONLINE    oradb01    
ora....b01.vip application    ONLINE    ONLINE    oradb01    
ora....SM2.asm application    ONLINE    ONLINE    oradb02    
ora....02.lsnr application    ONLINE    ONLINE    oradb02    
ora....b02.gsd application    ONLINE    ONLINE    oradb02    
ora....b02.ons application    ONLINE    ONLINE    oradb02    
ora....b02.vip application    ONLINE    ONLINE    oradb02 

Register the instances with CRS
$ srvctl add instance -d oradb -i oradb1 -n oradb01
$ srvctl add instance -d oradb -i oradb2 -n oradb02
$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.oradb.db   application    OFFLINE   OFFLINE              
ora....b1.inst application    OFFLINE   OFFLINE              
ora....b2.inst application    OFFLINE   OFFLINE              
ora....SM1.asm application    ONLINE    ONLINE    oradb01    
ora....01.lsnr application    ONLINE    ONLINE    oradb01    
ora....b01.gsd application    ONLINE    ONLINE    oradb01    
ora....b01.ons application    ONLINE    ONLINE    oradb01    
ora....b01.vip application    ONLINE    ONLINE    oradb01    
ora....SM2.asm application    ONLINE    ONLINE    oradb02    
ora....02.lsnr application    ONLINE    ONLINE    oradb02    
ora....b02.gsd application    ONLINE    ONLINE    oradb02    
ora....b02.ons application    ONLINE    ONLINE    oradb02    
ora....b02.vip application    ONLINE    ONLINE    oradb02

Associate the database instances with the ASM instances:
$ srvctl modify instance -d oradb -i oradb1 -s +ASM1
$ srvctl modify instance -d oradb -i oradb2 -s +ASM2

Start the database
$ srvctl start database -d oradb
Start the instances
$ srvctl start instance -d oradb -i oradb1,oradb2
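
The re-registration steps above follow a fixed pattern per node, so they can be generated from a short description of the cluster. This is a sketch only: print_registration_cmds is a hypothetical helper and the node:instance:asm argument format is an invention for this example. It prints the srvctl commands in the order used above and runs nothing.

```shell
# Sketch: print the srvctl commands that re-register ASM, the database and
# its instances. Arguments: db name, Oracle home, then node:instance:asm.
print_registration_cmds() (
    db=$1
    home=$2
    shift 2
    for t in "$@"; do
        node=${t%%:*}; asm=${t##*:}
        echo "srvctl add asm -n $node -i $asm -o $home"
    done
    for t in "$@"; do
        echo "srvctl start asm -n ${t%%:*}"
    done
    echo "srvctl add database -d $db -o $home"
    for t in "$@"; do
        node=${t%%:*}; rest=${t#*:}; inst=${rest%%:*}
        echo "srvctl add instance -d $db -i $inst -n $node"
    done
    for t in "$@"; do
        rest=${t#*:}; inst=${rest%%:*}; asm=${t##*:}
        echo "srvctl modify instance -d $db -i $inst -s $asm"
    done
    echo "srvctl start database -d $db"
)

# Usage with the names from this post:
#   print_registration_cmds oradb /oracle/product/10.2.0/db_1 \
#       oradb01:oradb1:+ASM1 oradb02:oradb2:+ASM2
```

Printing rather than executing keeps the helper safe to run anywhere and lets you review the commands before running them as the oracle user.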

Check the CRS status:
$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.oradb.db   application    ONLINE    ONLINE    oradb02    
ora....b1.inst application    ONLINE    ONLINE    oradb01    
ora....b2.inst application    ONLINE    ONLINE    oradb02    
ora....SM1.asm application    ONLINE    ONLINE    oradb01    
ora....01.lsnr application    ONLINE    ONLINE    oradb01    
ora....b01.gsd application    ONLINE    ONLINE    oradb01    
ora....b01.ons application    ONLINE    ONLINE    oradb01    
ora....b01.vip application    ONLINE    ONLINE    oradb01    
ora....SM2.asm application    ONLINE    ONLINE    oradb02    
ora....02.lsnr application    ONLINE    ONLINE    oradb02    
ora....b02.gsd application    ONLINE    ONLINE    oradb02    
ora....b02.ons application    ONLINE    ONLINE    oradb02    
ora....b02.vip application    ONLINE    ONLINE    oradb02  
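
Rather than reading the table by eye after each step, the listing can be filtered for resources whose Target or State is not ONLINE. A minimal sketch, assuming the whitespace-separated five-column layout of crs_stat -t shown above; not_online is a hypothetical helper name.

```shell
# Sketch: print the names of resources whose Target or State is not ONLINE.
# Reads "crs_stat -t" output on stdin; skips the header and separator lines.
not_online() {
    awk '$1 == "Name" || /^-+$/ { next }
         NF >= 4 && ($3 != "ONLINE" || $4 != "ONLINE") { print $1 }'
}

# Usage (not executed here); empty output means everything is ONLINE:
#   crs_stat -t | not_online
```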


SQL> select status from gv$instance;

STATUS
------------
OPEN
OPEN

With that, the CRS rebuild was completed successfully.



 

Source: ITPUB Blog, link: http://blog.itpub.net/24668589/viewspace-719001/. If you repost this article, please credit the source; otherwise legal action may be taken.
