ITPub博客

首页 > Linux操作系统 > Linux操作系统 > 【DataGuard】错误的log_file_name_convert参数导致物理Data Guard配置故障分析与处理

【DataGuard】错误的log_file_name_convert参数导致物理Data Guard配置故障分析与处理

原创 Linux操作系统 作者:secooler 时间:2011-06-17 23:06:00 0 删除 编辑
  在同一台主机上部署物理Data Guard,或部署的Data Guard对应的数据库文件、日志文件目录不一致时,便会使用到db_file_name_convert参数和log_file_name_convert参数。

  本文讨论当log_file_name_convert参数未设置或者设置不当而导致的Data Guard部署故障场景及处理分析过程。

  关于同一台主机部署物理Data Guard的方法请参考文章《【DataGuard】同一台主机实现物理Data Guard配置安装》(http://space.itpub.net/519536/viewspace-578181)。

  这里有关Data Guard配置过程中需要准备的数据文件、日志文件以及Standby控制文件均已准备完毕,准备方法这里不赘述。这里我们将备库pfile文件中的log_file_name_convert参数内容注释掉。

1.调整后的备库pfile内容
secdg@secdb /home/oracle$ cat $ORACLE_HOME/dbs/initsecdg.ora
*.audit_file_dest='/u01/app/oracle/admin/secdg/adump'
*.background_dump_dest='/u01/app/oracle/admin/secdg/bdump'
*.control_files='/u01/app/oracle/oradata/secdg/cfile/control01.ctl','/u01/app/oracle/oradata/secdg/cfile/control02.ctl','/u01/app/oracle/oradata/secdg/cfile/control03.ctl'
*.core_dump_dest='/u01/app/oracle/admin/secdg/cdump'
*.db_file_multiblock_read_count=8# SMALL
*.db_files=80# SMALL
*.db_name='secdb'
*.global_names=TRUE
*.job_queue_processes=5
*.log_buffer=32768# SMALL
*.log_checkpoint_interval=10000
*.max_dump_file_size='10240'# limit trace file size to 5 Meg each
*.parallel_max_servers=5# SMALL
*.processes=200# SMALL
*.sga_max_size=300M
*.sga_target=300M
*.undo_management='AUTO'
*.user_dump_dest='/u01/app/oracle/admin/secdg/udump'

## Parameters for Standby Database.
db_name='secdb'
db_unique_name=secdg
log_archive_format=log%t_%s_%r.arc
log_archive_config='DG_CONFIG=(secdb,secdg)'
fal_server=secdb
fal_client=secdg
standby_file_management=AUTO
db_file_name_convert='/u01/app/oracle/oradata/secdb/dfile/','/u01/app/oracle/oradata/secdg/dfile/'
#log_file_name_convert='/u01/app/oracle/oradata/secdb/lfile/','/u01/app/oracle/oradata/secdg/lfile/'
standby_archive_dest='/home/oracle/arch/secdg'
## Parameters which using for switch over from Standby to Primary.
LOG_ARCHIVE_DEST_1='LOCATION=/home/oracle/arch/secdg VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=secdg'
LOG_ARCHIVE_DEST_2='SERVICE=secdb LGWR SYNC  VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=secdb'
log_archive_dest_state_1=ENABLE
log_archive_dest_state_2=ENABLE
remote_login_passwordfile=EXCLUSIVE

注意,这里我们注释掉了“log_file_name_convert”参数,表示在启动Data Guard库的时候日志文件将不被转换。看一下结果。

2.启动备库物理Data Guard
secdb@secdb /home/oracle$ export ORACLE_SID=secdg
secdg@secdb /home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jun 17 18:11:48 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

NotConnected@> startup nomount
ORACLE instance started.

Total System Global Area  314572800 bytes
Fixed Size                  1219184 bytes
Variable Size              96470416 bytes
Database Buffers          213909504 bytes
Redo Buffers                2973696 bytes
NotConnected@> alter database mount standby database;

Database altered.

3.确认log_file_name_convert参数信息
NotConnected@> show parameter convert

NAME                   TYPE    VALUE
---------------------- ------- -------------------------------------
db_file_name_convert   string  /u01/app/oracle/oradata/secdb/dfile/,
                               /u01/app/oracle/oradata/secdg/dfile/
log_file_name_convert  string

此处显示“log_file_name_convert”内容为空,表示未设置。

4.由log_file_name_convert引发的配置故障结果
NotConnected@> col GROUP# for 999999
NotConnected@> col STATUS for a5
NotConnected@> col TYPE for a7
NotConnected@> col MEMBER for a50
NotConnected@> select * From v$logfile order by 1,4;

 GROUP# STATU TYPE    MEMBER                                             IS_
------- ----- ------- -------------------------------------------------- ---
      1 STALE ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo01.log     NO
      1 STALE ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo01_b.log   NO
      2       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo02.log     NO
      2       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo02_b.log   NO
      3       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo03.log     NO
      3       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo03_b.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo04_a.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo04_b.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo05_a.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo05_b.log   NO
      6       STANDBY /u01/app/oracle/oradata/secdb/lfile/log6a.rdo      NO
      6       STANDBY /u01/app/oracle/oradata/secdb/lfile/log6b.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdb/lfile/log7a.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdb/lfile/log7b.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdb/lfile/log8a.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdb/lfile/log8b.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdb/lfile/log9a.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdb/lfile/log9b.rdo      NO
     10       STANDBY /u01/app/oracle/oradata/secdb/lfile/log10a.rdo     NO
     10       STANDBY /u01/app/oracle/oradata/secdb/lfile/log10b.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdb/lfile/log11a.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdb/lfile/log11b.rdo     NO

22 rows selected.


这里便体现了这种配置的问题,由于没有设置log_file_name_convert参数,导致备库使用到的日志文件不正确,这里显示的是主库日志信息!

为方便比较,正确的内容如下:
NotConnected@> select * From v$logfile order by 1,4;

 GROUP# STATU TYPE    MEMBER                                             IS_
------- ----- ------- -------------------------------------------------- ---
      1       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo01.log     NO
      1       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo01_b.log   NO
      2       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo02.log     NO
      2       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo02_b.log   NO
      3       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo03.log     NO
      3       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo03_b.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo04_a.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo04_b.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo05_a.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo05_b.log   NO
      6       STANDBY /u01/app/oracle/oradata/secdg/lfile/log6a.rdo      NO
      6       STANDBY /u01/app/oracle/oradata/secdg/lfile/log6b.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdg/lfile/log7a.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdg/lfile/log7b.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdg/lfile/log8a.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdg/lfile/log8b.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdg/lfile/log9a.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdg/lfile/log9b.rdo      NO
     10       STANDBY /u01/app/oracle/oradata/secdg/lfile/log10a.rdo     NO
     10       STANDBY /u01/app/oracle/oradata/secdg/lfile/log10b.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdg/lfile/log11a.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdg/lfile/log11b.rdo     NO

22 rows selected.


5.尝试主备库Switch Over后在secdg实例切换日志
secdg实例的日志中将会记在如下报错
Fri Jun 17 22:51:56 2011
Errors in file /u01/app/oracle/admin/secdg/bdump/secdg_lgwr_1165.trc:
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02_b.log'
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02.log'
Fri Jun 17 22:51:56 2011
Errors in file /u01/app/oracle/admin/secdg/bdump/secdg_lgwr_1165.trc:
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02_b.log'
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02.log'
Fri Jun 17 22:51:56 2011
LGWR: terminating instance due to error 314
Instance terminated by LGWR, pid = 1165

这便是由于log_file_name_convert参数没有设置导致备库错误地用到了“原主库日志文件”的严重后果!
secdg数据库实例因此而被杀死!尝试重新启动的话,仍然会报同样的错误。
secdg@secdb /home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jun 17 23:00:47 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

NotConnected@> startup;
ORACLE instance started.

Total System Global Area  314572800 bytes
Fixed Size                  1219184 bytes
Variable Size              96470416 bytes
Database Buffers          213909504 bytes
Redo Buffers                2973696 bytes
Database mounted.
ORA-00314: log 1 of thread 1, expected sequence# 43 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01_b.log'
ORA-00314: log 1 of thread 1, expected sequence# 43 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01.log'

到此切换后的主库(原备库)将不在可用!

6.故障处理
既然问题出在备库的配置上,并且备库已经不在可用。最后的可行的出路便是将现在的备库(secdb实例,及原主库)调整为主库以供生产所用。
我们能采用的方法便是强制将备库做Failover处理。具体操作如下。
NotConnected@> alter database recover managed standby database finish force;

Database altered.

NotConnected@> alter database commit to switchover to primary;

Database altered.

sys@secdb> select DATABASE_ROLE from v$database;

DATABASE_ROLE
----------------
PRIMARY

sys@secdb> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

后续的拯救措施便是,根据这个Failover出来的主库重新按照正确的方法重建Data Guard备库。

7.另外一种故障场景
  另外一种故障场景是当主库切换日志时遇到类似故障,主库将不再可用。此时的处理方式只能是放弃主库,然后将备库切换为主库,然后再重建整个Data Guard环境。
类似的报错信息如下:
Errors in file /u01/app/oracle/admin/secdb/udump/secdb_ora_2201.trc:
ORA-00314: log 1 of thread 1, expected sequence# 60 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01_b.log'
ORA-00314: log 1 of thread 1, expected sequence# 60 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01.log'

8.小结
  通过这个故障场景的再现及处理过程可以得出一个结论,我们在配置Data Guard环境时一定要对每个参数的含义和作用做到了如指掌,防止出现故障素手无策。
  另外一个老生常谈的结论是:“有备无患”!,在准备对生产库做重大调整前一定要对数据库做好有效的备份。这样在出现类似文章提到的Data Guard配置故障后,可以非常高效的进行恢复。

Good luck.

secooler
11.06.17

-- The End --

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/519536/viewspace-700320/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
Oracle ACE 总监,阿里云MVP,北京大学理学硕士,恩墨学院创始人,教育专家,中国区 Cloudera 首位官方授权大数据讲师,金牌培训专家,BDA大数据联盟创始人,OCM联盟创始人,ACCUG创始人、ACOUG核心专家,Blogger。

注册时间:2008-03-16

  • 博文量
    797
  • 访问量
    8091666