ITPub博客

首页 > 数据库 > Oracle > oracle adg Automatic block repair bug

oracle adg Automatic block repair bug

原创 Oracle 作者:fangfeichu 时间:2016-02-26 10:36:24 0 删除 编辑
今天发现生产库的一个DG备库数据同步失败,查看日志发现数据库有坏块

ORA-01578: ORACLE 数据块损坏 (文件号 55, 块号 235997)
ORA-01110: 数据文件 55: '/u02/oradata/weike/datafile/appstore/appstore14.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/weike_standby/weike/incident/incdir_24742/weike_ora_22392_i24742.trc

oracle 11G ADG提供了Automatic block repair按理来说数据库坏块都会知道修复的,下面介绍下Automatic block repair的功能。
-----------------------------------------------------
Oracle 11G 新特性 Automatic block repair
在Oracle 的世界中,corrupted block一般都是块分裂。(不是快分裂,呵呵) 一般是块头和块尾的TRAIL NUMBER不一致才导致的块分裂,这样的块分裂是逻辑损坏。
在强大的11GR2 中,Oracle 可以自动去检测CORRUPTED BLOCK而去修复。
在手工恢复的年代,如果遇到坏块,那需要RMAN 恢复对应的ARCHIVED LOGS去恢复;或者应用flashback logs去恢复;或者如果坏块所在的OBJECT是索引,那可以重新REBUILD ONLINE。那自动修复BLOCK是如何实现的?和常规的手工恢复是一样的?
先看下NEW FEATURE中的描述:
Automatic block repair allows corrupt blocks on the primary database or physical standby database to be automatically repaired, as soon as they are detected, by
transferring good blocks from the other destination. In addition, RECOVER BLOCK is enhanced to restore blocks from a physical standby database. The physical standby database must be in real-time query mode. 
This feature reduces time when production data cannot be accessed, due to block corruption, by automatically repairing the corruptions as soon as they are detected in real-time using good blocks from a physical standby database. This reduces block recovery time by using up-to-date good blocks from a real-time, synchronized physical standby database as opposed to disk or tape backups or flashback logs.

主要有两点: 
1) 这个自动修复是需要物理DG的配合才能完成的,而且要求DG建立在实时查询模式;
2) 该特性由于直接从DG中找到对应的BLOCK来恢复,而直接跳过RESTORE 归档日志或者FLASHBACK 日志,当然更快/更强大;

对物理DG有些限制:
1) 物理DG必须运行在实时查询模式,而该模式需要ACTIVE DATA GUARD Lience;
2) 物理DG 最好运行在实时应用模式;因为任何模式的DG保护模式都是支持自动块修复,但是修复主库中损坏块的效率大大依赖DG中没有损坏块的版本。
    这段话说起来有点绕,还是举个简单的例子。

一个主库中CORRUPT块的SCN 为 1005,
而备库中对应块的SCN为1001,那ORACLE为修复该BLOCK,必须要让备库中的块SCN到达1005才能REPAIR。如果备库同步的时间越晚,那需要同步的时间就越多,修复的效率就越低。
反过来,如备库的块损坏,那ORACLE可以从主库中把对应的块拷贝过来,这需要在备库中的参数要有效的设置:
FAL_SERVER ,主库的NET SERICE NAME;
LOG_ARCHIVE_CONFIG

修复好后,会在库中alert日志写上日志;
这个自动修复,不会100%的成功,如果不能修复,那会报ORA-1578 的错误,
Error: ORA-1578  
Text: ORACLE data block corrupted (file # %s, block # %s)

那这时需要DBA的介入,或者REBUILD或者跳过坏块导入/导出;
------------------------------------------------------------------------------------------------------------------

于是重新开启实时同步看数据库是否能自动修复:

SQL> alter database recover managed standby database using current logfile disconnect;
Database altered.

再查看数据库日志发现如下信息,提示修复成功
Automatic block media recovery successful for (file# 55, block# 235997)
Automatic block media recovery requested for (file# 55, block# 235997)
Errors in file /u01/app/oracle/diag/rdbms/weike_standby/weike/trace/weike_pr02_10739.trc:
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified

但是再次查询刚才提示有坏块的表时还是提示有坏块。而且此时数据库服务器负载很高,操作系统日志报数据库内存不够。
查看Errors in file /u01/app/oracle/diag/rdbms/weike_standby/weike/trace/weike_pr02_10739.trc日志的信息。
Trace file /u01/app/oracle/diag/rdbms/weike_standby/weike/trace/weike_pr02_10739.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      app122
Release:        2.6.32-358.el6.x86_64
Version:        #1 SMP Tue Jan 29 11:47:41 EST 2013
Machine:        x86_64
Instance name: weike
Redo thread mounted by this instance: 1
Oracle process number: 52
Unix process pid: 10739, image: oracle@app122 (PR02)




*** 2016-02-24 18:07:06.332
*** SESSION ID:(964.95) 2016-02-24 18:07:06.332
*** CLIENT ID:() 2016-02-24 18:07:06.332
*** SERVICE NAME:(SYS$USERS) 2016-02-24 18:07:06.332
*** MODULE NAME:() 2016-02-24 18:07:06.332
*** ACTION NAME:() 2016-02-24 18:07:06.332
 
Downgrade ASYNC->SYNC: fno=4, fname=/u02/oradata/weike/datafile/undotbs2.341.887806811


*** 2016-02-24 18:08:29.460
krsd_get_primary_connect_string: found pcs 'weike' by reverse lookup
OCI error val is 12154 and errmsg is 'ORA-12154: TNS:could not resolve the connect identifier specified
'
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server


*** 2016-02-24 18:09:18.554
krsd_get_primary_connect_string: found pcs 'weike' by reverse lookup
OCI error val is 12154 and errmsg is 'ORA-12154: TNS:could not resolve the connect identifier specified
'
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-12154: TNS:could not resolve the connect identifier specified
ORA-17629: Cannot connect to the remote database server


*** 2016-02-24 18:12:44.724
Recovery is unrecoverable because 1 other slaves  require space.


*** 2016-02-24 18:12:49.790
Recovery is unrecoverable because 1 other slaves  require space.


查看metalink发现是 Automatic block repair  的一个bug
Bug 11809377 : AUTO BLOCK MEDIA RECOVERY IS CAUSING PASSWORD ERRORS ON PHYSICAL STANDBY
单击此项可添加到收藏夹 通过电子邮件发送此文档的链接 可打印页 转到底部转到底部
 

Bug 属性

 

类型 B - Defect 已在产品版本中修复
严重性 2 - Severe Loss of Service 产品版本 11.2.0.1
状态 91 - Closed, Could Not Reproduce 平台 226 - Linux x86-64
创建时间 2011-2-24 平台版本 ORACLE LINUX 5
更新时间 2016-1-20 基本 Bug N/A
数据库版本 11.2.0.1 影响平台 Generic
产品源 Oracle 与此 Bug 相关的知识, 补丁程序和 Bug
 

相关产品

 

产品线 Oracle Database Products 系列 Oracle Database Suite
区域 Oracle Database 产品 5 - Oracle Database - Enterprise Edition
Hdr: 11809377 11.2.0.1 RDBMS 11.2.0.1 DATAGUARD_PSBY PRODID-5 PORTID-226
Abstract: AUTO BLOCK MEDIA RECOVERY IS CAUSING PASSWORD ERRORS ON PHYSICAL STANDBY *** 02/24/11 09:43 am *** ----
 
PROBLEM:
--------
Errors in file 
/u01/oracle/db/diag/rdbms/p411/p411i8/trace/p411i8_pr0e_9448.trc:
ORA-17629: Cannot connect to the remote database server
ORA-17629: Cannot connect to the remote database server
ORA-17629: Cannot connect to the remote database server
 
 
DIAGNOSTIC ANALYSIS:
--------------------
alert_p411i8.log
=================
Sun Dec 05 04:43:06 2010 <-- start of log
Adjusting the default value of parameter parallel_max_servers
...
Mon Feb 07 18:58:13 2011 <-- first ORA-1017
Block version read by Primary database is higher than the
block version on Standby database => possible lost write on the Standby
Errors in file 
/u01/oracle/db/diag/rdbms/p411/p411i8/trace/p411i8_pr05_9422.trc:
ORA-17629: Cannot connect to the remote database server
...
Wed Feb 09 15:46:04 2011 <-- last ORA-1017
Errors in file 
/u01/oracle/db/diag/rdbms/p411/p411i8/trace/p411i8_pr0e_9448.trc:
ORA-17629: Cannot connect to the remote database server
...
Wed Feb 09 22:37:51 2011 <-- end of log
Media Recovery Waiting for thread 6 sequence 17867
 
p411i8_pr0e_9448.trc
====================
Warning: recovery process PR0E cannot use async I/O
Block version read by Primary database is higher than the *** 18:59:42.193 block version on Standby database => possible lost write on the Standby *** 18:59:42.365 4539 krsu.c Logged on to standby successfully
krsd_get_primary_connect_string: found pcs 'P411DG_EX4' by FAL_SERVER lookup
'
ORA-17629: Cannot connect to the remote database server
 
p411i8_pr05_9422.trc
====================
Warning: recovery async reads limited to 128 (req 1024)
Block version read by Primary database is higher than the *** 18:58:13.032 block version on Standby database => possible lost write on the Standby *** 18:58:13.143 4539 krsu.c Logged on to standby successfully
krsd_get_primary_connect_string: found pcs 'P411DG_EX4' by FAL_SERVER lookup
'
ORA-17629: Cannot connect to the remote database server
 
WORKAROUND:
-----------
None
 
RELATED BUGS:
-------------
 
REPRODUCIBILITY:
----------------
Error is occurring intermittently.
 
TEST CASE:
----------
 
STACK TRACE:
------------
 
SUPPORTING INFORMATION:
-----------------------
 
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
 
DIAL-IN INFORMATION:
--------------------

该bug暂时还没有修复补丁,于是只能重建下这个DG备库。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/26794255/viewspace-1994947/,如需转载,请注明出处,否则将追究法律责任。

下一篇: Hadoop安装
请登录后发表评论 登录
全部评论

注册时间:2014-06-26

  • 博文量
    26
  • 访问量
    126721