ITPub博客

首页 > Linux操作系统 > Linux操作系统 > ASM空间爆满导致数据库挂起

ASM空间爆满导致数据库挂起

原创 Linux操作系统 作者:space6212 时间:2019-03-31 22:48:07 0 删除 编辑

今天用客户端链接一个测试库时出现错误ORA-00257: archiver error. Connect internal only, until freed。
数据库环境是SOLARIS 10 + 10203 RAC + ASM。


之前遇到过几次这个问题,是因为数据库出了问题,导致无法归档。登上数据库服务器,用sys连接进去查看日志的状态:
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------------
1 1 54 1073741824 2 NO CURRENT 5678852562 11-10??-07
2 1 51 1073741824 2 NO INACTIVE 5678784185 11-10??-07
3 1 53 1073741824 2 NO INACTIVE 5678835269 11-10??-07
4 1 52 1073741824 2 NO INACTIVE 5678800967 11-10??-07
5 2 22 1073741824 2 YES INACTIVE 5678750740 11-10??-07
6 2 23 1073741824 2 NO CURRENT 5678835302 11-10??-07
7 2 20 1073741824 2 YES INACTIVE 5678603818 11-10??-07
8 2 21 1073741824 2 YES INACTIVE 5678685075 11-10??-07

果然是有很多日志无法归档。再查看alert文件:
ASM的ALERT报警:
Sun Sep 30 02:16:22 2007
WARNING: allocation failure on disk DATA_0000 for file 364 xnum 2147483648
Sun Sep 30 03:26:59 2007
WARNING: allocation failure on disk DATA_0000 for file 370 xnum 2147483648
........

数据库其中一个节点的ALERT
Sun Sep 30 02:16:22 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc1_13668.trc:
ORA-19504: failed to create file "+DATA/archivelog/1_19_634432026.dbf"
ORA-17502: ksfdcre:4 Failed to create file +DATA/archivelog/1_19_634432026.dbf
ORA-15041: diskgroup space exhausted
Sun Sep 30 02:16:22 2007
ARC1: Error 19504 Creating archive log file to '+DATA/archivelog/1_19_634432026.dbf'
ARCH: Archival stopped, error occurred. Will continue retrying
Sun Sep 30 02:16:22 2007
ORACLE Instance prerac1 - Archival Error
........
Thu Oct 11 23:55:03 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc2_15831.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded

从alert文件可以知道,不能归档的原因是因为ASM DISK的空间被用完了,但后面的错误:
ORA-00020: maximum number of processes () exceeded是怎么回事呢?
当时整个数据库都已经顿住了,查询v$process完全没有反映,只能从操作系统上查:
bash-3.00$ ps -ef|grep ora|wc -l
84

这个进程数离process参数设定值还差很远,所以基本可以排除是数据库的进程数满了,再看一下asm的进程:
bash-3.00$ ps -ef|grep asm|wc -l
39
asm的进程设定值是40,所以应该是asm的进程达到的设定值。正常情况下asm只有十几个进程,现在出现39个进程肯定是不正常的。
查了一会资料发现,原因还是因为归档引起的。因为我的归档是放在ASM上的,当oracle不能归档时,oracle会新起进程尝试进行归档,而不释放之前的归档进程,而归档进程是需要链接ASM实例的,所以导致ASM的进程数增大,最终出现ORA-00020错误。

为了解决空间不足问题,我用asmcmd删除了大部分的归档日志,用在RMAN中清除了相关的归档日志信息。
空间虽然有了,但此时问题让人存在,估计是因为归档进程并不清楚已经有空间了。此时解决这一系列问题有两种方法:
1、设置log_archive_dest_state_n,让归档暂时放在其他路径
2、重启数据库

在我解决过程中,ASM的连接数不断增加,最终我用sys连接数据库无法shutdown数据库:
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc0_13665.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded

最后通过重启crs的方式把数据库重启了:
bash-3.00# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

bash-3.00# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

重启后v$log的情况如下:
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- --------------
1 1 54 1073741824 2 YES INACTIVE 5678852562 11-10? -07
2 1 55 1073741824 2 YES INACTIVE 5678932182 11-10? -07
3 1 53 1073741824 2 YES INACTIVE 5678835269 11-10? -07
4 1 56 1073741824 2 NO CURRENT 5678958358 12-10? -07
5 2 22 1073741824 2 YES INACTIVE 5678750740 11-10? -07
6 2 23 1073741824 2 YES INACTIVE 5678835302 11-10? -07
7 2 24 1073741824 2 NO CURRENT 5678958359 12-10? -07
8 2 21 1073741824 2 YES INACTIVE 5678685075 11-10? -07

8 rows selected.

可见,数据库已经恢复正常。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/231499/viewspace-63861/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2005-01-25

  • 博文量
    245
  • 访问量
    168877