

After two days of battling, RMAN finally works again

Original post | Linux OS | Author: BTxigua | Posted: 2011-12-08 22:01:06
The database host went down twice in one day, and both RAC nodes crashed twice as well. After the restart, the RMAN backups stopped working.
 
As soon as the NBU (NetBackup) backup job started, it hung and stopped responding. I checked the RMAN sessions:
SQL> select sid,serial#,event,paddr from v$session where program like 'rman%' ;
       SID    SERIAL# EVENT                                                            PADDR
---------- ---------- ---------------------------------------------------------------- ----------------
      1906      54106 SQL*Net message from client                                      C000000CFA6423D0
      3214      47345 enq: WL - contention                                             C000000CFD640658
 
So the second session was stuck waiting on 'enq: WL - contention'. Metalink note 1209896.1 describes this:
The root cause is unpublished bug 6113783: ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK

The session executing ALTER SYSTEM ARCHIVE LOG CURRENT is waiting for the event:
    'enq: WL - contention'

The session holding this enqueue appears to be hung and is therefore blocking the ARCHIVE LOG CURRENT from completing.
 
Find the blocker with:

SQL> select * from v$lock
     where v$lock.type = 'WL'
       and v$lock.lmode > 0
       and v$lock.block = 1;

Find the related process with:

SQL> select v$session.machine, v$session.process, v$session.program
     from v$session, v$lock
     where v$lock.sid = v$session.sid
       and v$lock.type = 'WL'
       and v$lock.lmode > 0
       and v$lock.block = 1;
 

Solution

If the blocker is an archiver process (ARCx), then the issue is related to unpublished bug 6113783, which is fixed in 11g (11.1.0.7).

The workaround for 10g is to kill the related archiver process on OS-level.

Unix:
 % kill -9 <pid>

The archiver will be restarted automatically.
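The workaround assumes that the killed archiver actually exits and is then respawned under a new PID. A minimal POSIX sh sketch for checking this is shown below. It assumes you already have the OS PID, for example the SPID from v$process. The helper name kill_and_wait and its 10-second default are hypothetical choices, not part of the Metalink note.

```shell
#!/bin/sh
# Hypothetical helper (not from the note): send SIGKILL to a PID, then poll
# until the PID disappears or the timeout expires.
# Usage: kill_and_wait <pid> [seconds]
kill_and_wait() {
    target=$1
    limit=${2:-10}          # seconds to wait; 10 is an arbitrary default
    # kill fails if the PID does not exist or we may not signal it
    if ! kill -9 "$target" 2>/dev/null; then
        echo "cannot signal $target (no such process or not permitted)"
        return 2
    fi
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        # kill -0 delivers no signal; it only tests whether the PID still exists
        if ! kill -0 "$target" 2>/dev/null; then
            echo "process $target is gone"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "process $target survived SIGKILL for ${limit}s"
    return 1
}
```

Checking that the old PID has disappeared is enough, because the restarted archiver gets a new PID. SIGKILL cannot be caught or ignored. A PID that outlives it is therefore usually blocked inside the kernel and cannot act on the signal until that wait completes.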
 
Following the note, I ran the checks:
SQL> select * from v$lock
  2       where v$lock.type = 'WL'
  3         and v$lock.lmode > 0
  4         and v$lock.block = 1;
ADDR             KADDR                   SID TYPE        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- ---- ---------- ---------- ---------- ---------- ---------- ----------
C000000CFCA11148 C000000CFCA11168       3268 WL   1843785734  676770513          5          0     185657          1
SQL>
SQL> select v$session.machine, v$session.process, v$session.program
  2       from v$session, v$lock
  3       where v$lock.sid = v$session.sid
  4         and v$lock.type = 'WL'
  5         and v$lock.lmode > 0
  6         and v$lock.block = 1;
MACHINE                                                          PROCESS      PROGRAM
---------------------------------------------------------------- ------------ ------------------------------------------------
actdb1                                                           1827         oracle@actdb1 (ARC1)
SQL> select spid from v$process where addr in (select paddr from v$session where sid=3268) ;
SPID
------------
1827
 
So the culprit was archiver process 1827 (ARC1). This system runs Oracle 10.2.0.4 on HP-UX 11.31 with no patches applied.
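Before running kill -9 against a PID copied out of a query, it is cheap to confirm that the PID still belongs to an archiver of the expected instance. The sketch below assumes the ora_arcN_<ORACLE_SID> naming visible in the ps output (e.g. ora_arc1_ngact1) with a single-character archiver number. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: refuse to treat a PID as an archiver unless its
# command line looks like ora_arcN_<ORACLE_SID> for the expected instance.

# Pure string check: $1 = command line, $2 = instance SID (e.g. ngact1).
looks_like_archiver() {
    case $1 in
        ora_arc?_"$2") return 0 ;;   # one-character archiver number, exact SID
        *) return 1 ;;
    esac
}

# Look up the PID's command line with ps and apply the check above.
# On HP-UX, ps only honours -o when UNIX95 is defined in its environment;
# defining it is harmless on other systems.
pid_is_archiver() {
    args=$(UNIX95= ps -p "$1" -o args= 2>/dev/null) || return 1
    # strip any leading blanks that a ps implementation may print
    args=$(echo "$args" | sed 's/^[[:space:]]*//')
    looks_like_archiver "$args" "$2"
}
```

For example, `pid_is_archiver 1827 ngact1 && kill -9 1827` would only send the signal if PID 1827 is still ngact1's ARC1. Matching the SID as well as the ora_arc prefix guards against hitting another instance's archiver on the same host.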
There was nothing for it but to kill the process:
actdb1:/oracle/niyl/rman>ps -ef | grep ora_arc
  oracle  1827     1  0  Dec  5  ?         0:00 ora_arc1_ngact1
  oracle  1825     1  0  Dec  5  ?         5:56 ora_arc0_ngact1
  oracle 10233 23821  1 00:58:50 pts/tb    0:00 grep ora_arc
actdb1:/oracle/niyl/rman>kill -9 1827        
actdb1:/oracle/niyl/rman>ps -ef | grep ora_arc
  oracle  1827     1  0  Dec  5  ?         0:00 ora_arc1_ngact1
  oracle  1825     1  0  Dec  5  ?         5:56 ora_arc0_ngact1
  oracle 14709 23821  1 01:01:07 pts/tb    0:00 grep ora_arc
 
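The process could not be killed: PID 1827 was still listed after kill -9, and repeated attempts made no difference.
Only after fully restarting the databases and CRS did everything return to normal, and the NBU backups finally worked again.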

From "ITPUB Blog", link: http://blog.itpub.net/10867315/viewspace-712991/. Please credit the source when reposting; otherwise legal action may be taken.

