ITPub博客

首页 > Linux操作系统 > Linux操作系统 > 连接ASM实例后出现ORA-1012错误

连接ASM实例后出现ORA-1012错误

原创 Linux操作系统 作者:yangtingkun 时间:2007-04-16 00:00:00 0 删除 编辑

ASM还真是让人不是很踏实,今天又碰到一个和ASM有关的问题。


今天RAC的测试数据库在导入数据时长时间没有响应。通过V$SESSION_WAIT检查发现,系统在等待归档:

SQL> SELECT SID, EVENT FROM V$SESSION_WAIT WHERE SID = 132;

SID EVENT
---------- ----------------------------------------------------------------
132 log file switch (archiving needed)

数据库采用ASM,由于是测试环境,没有给ASM太大的空间,而且为了方便,将归档日志也放在ASM中了。

产生问题的原因很明显了,由于归档日志不断产生,导致ASM空间用完,因此新的归档无法产生,导致数据库处于等待状态。

本来认为是个小问题,于是通过RMAN将所有的归档日志删除。奇怪的是,成功的删除所有的归档日志后,这个问题仍然没有解决。

SQL> SELECT SID, EVENT FROM V$SESSION_WAIT WHERE SID = 132;

SID EVENT
---------- ----------------------------------------------------------------
132 log file switch (archiving needed)

再次检查等待会话,发现仍然在等待归档,怀疑Oracle处于两次重试之间的间歇期,于是手工切换当前实例的日志去激活归档操作:

SQL> alter system switch logfile;

结果这个操作也一直处于等待中。

难道是ASM中的空间并没有释放?于是在当前节点(节点1racnode1)启动dbca,查看ASM的配置,结果居然出现下面的信息:

DBCA could not startup the ASM instance configured on this node. To proceed with ASM diskgroup management you need the ASM instance to be up and running. Do you want to recreate the ASM instance on the node?

奇怪RAC环境一直是在ASM上运行的,怎么会出现这个问题呢?难道ASM实例没有启动:

SQL> host
$ ps -ef|grep ASM
oracle 8205 1 0 12:31:18 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 9734 1 0 12:32:17 ? 0:01 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11007 1 0 12:33:11 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4157 1 0 Apr 13 ? 0:02 asm_lck0_+ASM1
oracle 4077 1 0 Apr 13 ? 0:02 asm_lmon_+ASM1
oracle 4087 1 0 Apr 13 ? 0:03 asm_dbw0_+ASM1
oracle 4079 1 0 Apr 13 ? 0:08 asm_lmd0_+ASM1
oracle 4124 1 0 Apr 13 ? 0:00 asm_rbal_+ASM1
oracle 11098 1 0 12:33:14 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 20253 1 0 11:59:39 ? 2:50 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4578 1 0 Apr 13 ? 0:00 asm_o000_+ASM1
oracle 11131 1 0 12:33:16 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 12954 1 0 12:14:52 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11164 1 0 12:33:17 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4073 1 0 Apr 13 ? 0:01 asm_diag_+ASM1
oracle 7028 1 0 12:10:57 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 9349 1 0 12:32:02 ? 0:09 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11067 1 0 12:33:13 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4085 1 0 Apr 13 ? 0:00 asm_mman_+ASM1
oracle 12754 1 0 Apr 13 ? 0:01 /data/oracle/product/10.2/database/bin/racgimon daemon ora.racnode1.ASM1.asm
oracle 10915 10806 0 15:58:37 pts/4 0:00 grep ASM
oracle 4081 1 0 Apr 13 ? 0:02 asm_lms0_+ASM1
oracle 10773 1 0 12:33:00 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4126 1 0 Apr 13 ? 0:24 asm_gmon_+ASM1
oracle 23205 1 0 15:01:11 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 16362 1 0 12:17:02 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 1075 1 0 12:26:38 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4101 1 0 Apr 13 ? 0:00 asm_lgwr_+ASM1
oracle 4122 1 0 Apr 13 ? 0:00 asm_smon_+ASM1
oracle 11195 1 0 12:33:18 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4120 1 0 Apr 13 ? 0:00 asm_ckpt_+ASM1
oracle 23017 1 0 12:21:22 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 22951 1 0 12:01:32 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4071 1 0 Apr 13 ? 0:00 asm_pmon_+ASM1
oracle 20259 1 2 11:59:40 ? 21:03 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4917 1 1 Apr 13 ? 14:01 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11162 1 0 12:33:17 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 19873 1 0 12:37:51 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11129 1 0 12:33:15 ? 0:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4075 1 0 Apr 13 ? 0:00 asm_psp0_+ASM1

显然ASM实例已经启动,那么怎么会出现这个问题呢,于是尝试直接连接到ASM实例上:

SQL> exit Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
断开

$ bash
bash-2.03$ export ORACLE_SID=+ASM1
bash-2.03$ sqlplus "/ as sysdba"

SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 4 16 16:02:42 2007

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

已连接。
SQL> SELECT * FROM V$ASM_DISKGROUP;
SELECT * FROM V$ASM_DISKGROUP
*
1 行出现错误:
ORA-01012: not logged on


SQL>

看来是ASM出现了问题。

在节点2上尝试用dbca管理ASM,结果发现ASM尚有40G的剩余空间,这是归档删除后的剩余空间。

而且节点2上直接登陆ASM实例:

$ bash
bash-2.03$ export ORACLE_SID=+ASM2
bash-2.03$ sqlplus "/ as sysdba"

SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 4 16 16:08:33 2007

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> SET PAGES 100 LINES 120
SQL> COL NAME FORMAT A30
SQL> SELECT GROUP_NUMBER, NAME, TOTAL_MB, FREE_MB FROM V$ASM_DISKGROUP;

GROUP_NUMBER NAME TOTAL_MB FREE_MB
------------ ------------------------------ ---------- ----------
1 DISK 110934 40797

SQL>

看来是节点1上的ASM出现了故障,导致Oracle的实例1无法正常工作。

尝试在节点2上执行日志切换:

SQL> EXIT Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
断开

bash-2.03$ exit
exit
$ sqlplus "/ as sysdba"

SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 4 16 16:11:45 2007

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> SELECT INSTANCE_NAME FROM V$INSTANCE;

INSTANCE_NAME
----------------
testrac2

SQL> ALTER SYSTEM SWITCH LOGFILE;

系统已更改。

果然可以,看来问题完全定位到节点1ASM实例上了。怀疑是空间用完导致ASM实例出现了问题。

将节点1上的ASM实例重启就应该可以解决问题,但是我不想损失已经进行了一半的工作,于是尝试通过其他的方法解决无法归档的问题,待工作完成后再重启实例。

登陆节点1上的实例:

$ sqlplus "/ as sysdba"

SQL*Plus: Release 10.2.0.3.0 - Production on 星期一 4 16 16:15:11 2007

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> SET PAGES 100 LINES 120
SQL> SELECT INSTANCE_NAME FROM V$INSTANCE;

INSTANCE_NAME
----------------
testrac1

SQL> SHOW PARAMETER LOG_ARCHIVE

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_config string
log_archive_dest string
log_archive_dest_1 string LOCATION=+DISK/testrac/
log_archive_dest_10 string
log_archive_dest_2 string
log_archive_dest_3 string
log_archive_dest_4 string
log_archive_dest_5 string
log_archive_dest_6 string
log_archive_dest_7 string
log_archive_dest_8 string
log_archive_dest_9 string
log_archive_dest_state_1 string enable
log_archive_dest_state_10 string enable
log_archive_dest_state_2 string enable
log_archive_dest_state_3 string enable
log_archive_dest_state_4 string enable
log_archive_dest_state_5 string enable
log_archive_dest_state_6 string enable
log_archive_dest_state_7 string enable
log_archive_dest_state_8 string enable
log_archive_dest_state_9 string enable
log_archive_duplex_dest string
log_archive_format string %t_%s_%r.dbf
log_archive_local_first boolean TRUE
log_archive_max_processes integer 2
log_archive_min_succeed_dest integer 1
log_archive_start boolean FALSE
log_archive_trace integer 0

观察归档的相关信息,虽然无法不重启数据库的情况下改变归档模式,不过由于LOG_ARCHIVE_MIN_SUCCEED_DEST参数设置为1,因此可以通过添加第二归档LOCATION的方式来解决位置1无法使用的问题。

由于实例2没有归档的问题,可以仅修改当前实例的参数:

SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 = 'LOCATION=/data/oracle/oradata/testrac' SCOPE = MEMORY SID = 'testrac1';

系统已更改。

修改后,发现刚才一直处于等待状态的ALTER SYSTEM SWITCH LOGFILE命令已经执行完毕,且一直等待归档的会话也改变了等待事件:

SQL> alter system switch logfile;

系统已更改。

SQL> SELECT SID, EVENT FROM V$SESSION_WAIT WHERE SID = 132;

SID EVENT
---------- ----------------------------------------------------------------
132 SQL*Net more data from client

至此,问题基本上解决了,只需要等到导入工作完成后,手工重启节点1的数据库和ASM实例就可以了。

问题出现的时候只是通过V$SESSION_WAIT和数据库的错误信息进行了分析,事后检查ASM和实例1alert文件,再次确定了ASM的问题:

Mon Apr 16 12:01:33 2007
WARNING: allocation failure on disk DISK_0019 for file 392 xnum 2147483648
Mon Apr 16 12:10:58 2007
WARNING: allocation failure on disk DISK_0006 for file 396 xnum 163
Mon Apr 16 12:14:52 2007
WARNING: allocation failure on disk DISK_0004 for file 396 xnum 2147483648
Mon Apr 16 12:17:02 2007
WARNING: allocation failure on disk DISK_0004 for file 397 xnum 2147483648
Mon Apr 16 12:21:22 2007
WARNING: allocation failure on disk DISK_0002 for file 399 xnum 2147483648
Mon Apr 16 12:26:39 2007
WARNING: allocation failure on disk DISK_0038 for file 400 xnum 445
Mon Apr 16 12:31:18 2007
WARNING: allocation failure on disk DISK_0009 for file 400 xnum 2147483648
Mon Apr 16 12:33:00 2007
WARNING: allocation failure on disk DISK_0015 for file 400 xnum 2147483648
Mon Apr 16 12:33:11 2007
WARNING: allocation failure on disk DISK_0038 for file 400 xnum 445
Mon Apr 16 12:33:14 2007
WARNING: allocation failure on disk DISK_0038 for file 400 xnum 445
Mon Apr 16 12:33:15 2007
WARNING: allocation failure on disk DISK_0003 for file 400 xnum 2147483648
Mon Apr 16 12:33:16 2007
WARNING: allocation failure on disk DISK_0017 for file 400 xnum 2147483648
Mon Apr 16 12:33:17 2007
WARNING: allocation failure on disk DISK_0038 for file 400 xnum 445
Mon Apr 16 12:33:17 2007
WARNING: allocation failure on disk DISK_0004 for file 400 xnum 2147483648
Mon Apr 16 12:33:18 2007
WARNING: allocation failure on disk DISK_0038 for file 400 xnum 445
Mon Apr 16 12:33:19 2007
WARNING: allocation failure on disk DISK_0038 for file 400 xnum 445
Mon Apr 16 12:37:51 2007
WARNING: allocation failure on disk DISK_0019 for file 400 xnum 2147483648
Mon Apr 16 15:01:12 2007
WARNING: allocation failure on disk DISK_0006 for file 400 xnum 2147483648
Mon Apr 16 15:23:23 2007
Starting ORACLE instance (normal)
Mon Apr 16 15:26:05 2007
Starting ORACLE instance (normal)
Mon Apr 16 15:30:53 2007
Process PZ99 died, see its trace file
Process PZ99 died, see its trace file
Mon Apr 16 15:31:11 2007
Process PZ99 died, see its trace file
Process PZ99 died, see its trace file
Mon Apr 16 15:58:34 2007
Starting ORACLE instance (normal)

上面是ASM的错误信息,指出了ASM分配空间失败,而下面的数据库实例错误信息:

ORACLE Instance testrac1 - Archival Error
Mon Apr 16 12:33:24 2007
ORA-16014: log 1 sequence# 85 not archived, no available destinations
ORA-00312: online log 1 thread 1: '+DISK/testrac/onlinelog/group_1.258.618591139'
ORA-00312: online log 1 thread 1: '+DISK/testrac/onlinelog/group_1.259.618591145'
Mon Apr 16 12:33:24 2007
Errors in file /data/oracle/admin/testrac/bdump/testrac1_arc1_5191.trc:
ORA-16014: log 1 sequence# 85 not archived, no available destinations
ORA-00312: online log 1 thread 1: '+DISK/testrac/onlinelog/group_1.258.618591139'
ORA-00312: online log 1 thread 1: '+DISK/testrac/onlinelog/group_1.259.618591145'

则说明ASM的问题导致了归档的错误。

幸好出现问题的是测试库,重启对于测试数据库来说不算问题,但是重启对于产品库来说就不是小问题了,即使这个产品库是RAC环境。

从这个问题上看,感觉ASMbug还是多了一点,这个问题目前还找不到不重启ASM实例就可以解决问题的方法,而ASM给人的感觉太不透明了。

BTW:由于再节点1和节点2上多次尝试启动dbca来管理asm,还碰到了ORA-20错误,对于这个问题metalink有专门的文档描述:Doc ID: Note:308376.1Subject: Dbca Fails To Create Asm Disk Group, then upon restart of ASM instance errors

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/4227/viewspace-69241/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
暂无介绍

注册时间:2007-12-29

  • 博文量
    1955
  • 访问量
    10488265