A case of a corrupted current redo log (蚂蚁)

Original post | Category: Linux operating system | Author: thompsun | Date: 2011-01-30 09:12:27

[oracle@ts01 oracle]$ sqlplus '/ as sysdba'

SQL*Plus: Release 9.2.0.4.0 - Production on Tue Nov 15 10:35:11 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> conn / as sysdba
Connected.
SQL>

In another session:
[oracle@ts01 oracle]$ cd /oracle/oradata/TSMISC02
[oracle@ts01 TSMISC02]$ ll
total 681820
drwxr-xr-x    2 oracle   oinstall     4096 Nov 15 08:52 archive
-rw-r-----    1 oracle   oinstall  1662976 Nov 15 10:35 control01.ctl
-rw-r-----    1 oracle   oinstall  1662976 Nov 15 10:35 control02.ctl
-rw-r-----    1 oracle   oinstall  1662976 Nov 15 10:35 control03.ctl
-rw-r-----    1 oracle   oinstall 20979712 Nov 15 09:23 drsys01.dbf
-rw-r-----    1 oracle   oinstall 26222592 Nov 15 09:23 indx01.dbf
-rw-r-----    1 oracle   oinstall 20979712 Nov 15 09:23 odm01.dbf
-rw-r-----    1 oracle   oinstall  2097664 Nov 15 10:35 redo01.log
-rw-r-----    1 oracle   oinstall  2097664 Nov 15 05:48 redo02.log
-rw-r-----    1 oracle   oinstall  2097664 Nov 15 08:52 redo03.log
-rw-r-----    1 oracle   oinstall 387981312 Nov 15 10:26 system01.dbf
-rw-r-----    1 oracle   oinstall 52436992 Nov 15 09:23 system02.dbf
-rw-r-----    1 oracle   oinstall 42999808 Nov  1 14:16 temp01.dbf
-rw-r-----    1 oracle   oinstall 10493952 Nov 15 09:23 tools01.dbf
-rw-r-----    1 oracle   oinstall 52436992 Nov 15 10:28 undotbs2.dbf
-rw-r-----    1 oracle   oinstall 26222592 Nov 15 09:23 users01.dbf
-rw-r-----    1 oracle   oinstall 47194112 Nov 15 09:23 xdb01.dbf
[oracle@ts01 TSMISC02]$ cat redo01.log| od -x|head
0000000 0200 0000 0200 0000 1000 0000 5c5d 5a5b
0000020 0000 0000 1606 0000 0000 0000 0000 0000
0000040 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000 003e 0000 0001 0000 f0c1 223c 0000 fec1
0001020 0000 0920 0000 0920 98c2 6798 5354 494d
0001040 4353 3230 01ec 0000 1000 0000 0200 0000
0001060 0001 0002 15c2 6798 0000 0000 0000 0000
0001100 0000 0000 0000 0000 0000 0000 0000 0000
0001120 0000 0000 0000 0000 0000 0000 6854 6572
[oracle@ts01 TSMISC02]$ >redo01.log
[oracle@ts01 TSMISC02]$ ll
total 679764
drwxr-xr-x    2 oracle   oinstall     4096 Nov 15 08:52 archive
-rw-r-----    1 oracle   oinstall  1662976 Nov 15 11:12 control01.ctl
-rw-r-----    1 oracle   oinstall  1662976 Nov 15 11:12 control02.ctl
-rw-r-----    1 oracle   oinstall  1662976 Nov 15 11:12 control03.ctl
-rw-r-----    1 oracle   oinstall 20979712 Nov 15 09:23 drsys01.dbf
-rw-r-----    1 oracle   oinstall 26222592 Nov 15 09:23 indx01.dbf
-rw-r-----    1 oracle   oinstall 20979712 Nov 15 09:23 odm01.dbf
-rw-r-----    1 oracle   oinstall        0 Nov 15 11:12 redo01.log
-rw-r-----    1 oracle   oinstall  2097664 Nov 15 05:48 redo02.log
-rw-r-----    1 oracle   oinstall  2097664 Nov 15 08:52 redo03.log
-rw-r-----    1 oracle   oinstall 387981312 Nov 15 10:57 system01.dbf
-rw-r-----    1 oracle   oinstall 52436992 Nov 15 09:23 system02.dbf
-rw-r-----    1 oracle   oinstall 42999808 Nov  1 14:16 temp01.dbf
-rw-r-----    1 oracle   oinstall 10493952 Nov 15 09:23 tools01.dbf
-rw-r-----    1 oracle   oinstall 52436992 Nov 15 11:11 undotbs2.dbf
-rw-r-----    1 oracle   oinstall 26222592 Nov 15 09:23 users01.dbf
-rw-r-----    1 oracle   oinstall 47194112 Nov 15 09:23 xdb01.dbf
[oracle@ts01 TSMISC02]$
Back in the SQL*Plus session:
SQL> seleCT * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- --- ----------------
FIRST_CHANGE# FIRST_TIM
------------- ---------
         1          1         62    2097152          1 NO  CURRENT
       658443 15-NOV-05

         2          1         60    2097152          1 YES INACTIVE
       641419 15-NOV-05

         3          1         61    2097152          1 YES INACTIVE
       649920 15-NOV-05

SQL> set linesize 132
SQL> l
  1* select * from v$log
SQL> /

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1         62    2097152          1 NO  CURRENT                 658443 15-NOV-05
         2          1         60    2097152          1 YES INACTIVE                641419 15-NOV-05
         3          1         61    2097152          1 YES INACTIVE                649920 15-NOV-05
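
The two directory listings and the v$log output give a quick way to spot the damage: each intact redo member is 2097664 bytes on disk, which is the 2097152 BYTES reported by v$log plus 512, while redo01.log is 0 bytes after the truncation. A minimal Python sketch of that sanity check follows. The 512-byte header allowance is an assumption read off these numbers, not a documented constant, and the function name is hypothetical.

```python
def redo_member_size_ok(file_size, log_bytes, header_bytes=512):
    """Return True if the on-disk size equals v$log BYTES plus one header block.

    header_bytes=512 is an assumption inferred from the listing
    (2097664 - 2097152 = 512); it is not taken from Oracle documentation.
    """
    return file_size == log_bytes + header_bytes


# File sizes from the two `ll` listings, BYTES from v$log.
print(redo_member_size_ok(2097664, 2097152))  # redo02.log, intact -> True
print(redo_member_size_ok(0, 2097152))        # redo01.log, truncated -> False
```

A check like this only catches size changes; a member that is overwritten in place would need its header inspected instead.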

SQL> alter system switch logfile;
alter system switch logfile
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
SQL> /
alter system switch logfile
*
ERROR at line 1:
ORA-03114: not connected to ORACLE
SQL> conn / as sysdba
Connected to an idle instance.
SQL> /
alter system switch logfile
*
ERROR at line 1:
ORA-01034: ORACLE not available
SQL> conn / as sysdba
Connected to an idle instance.
SQL>

Starting the database fails with an error:
SQL> conn / as sysdba
Connected to an idle instance.
SQL>
SQL> startup
ORACLE instance started.

Total System Global Area  236000356 bytes
Fixed Size                   451684 bytes
Variable Size             201326592 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
ORA-00316: log 1 of thread 1, type 0 in header is not log file
ORA-00312: online log 1 thread 1: '/oracle/oradata/TSMISC02/redo01.log'
SQL> select name,OPEN_MODE from  v$database;

NAME      OPEN_MODE
--------- ----------
TSMISC02  MOUNTED

SQL> select * from v$instance;

INSTANCE_NUMBER INSTANCE_NAME
--------------- ----------------
HOST_NAME
----------------------------------------------------------------
VERSION           STARTUP_T STATUS       PAR    THREAD# ARCHIVE LOG_SWITCH_
----------------- --------- ------------ --- ---------- ------- -----------
LOGINS     SHU DATABASE_STATUS   INSTANCE_ROLE      ACTIVE_ST
---------- --- ----------------- ------------------ ---------
              1 TSMISC02
ts01
9.2.0.4.0         15-NOV-05 MOUNTED      NO           1 STARTED
ALLOWED    NO  ACTIVE            PRIMARY_INSTANCE   NORMAL
SQL> set linesize 132

SQL> col member for a100
SQL> select * from v$logfile;

    GROUP# STATUS  TYPE    MEMBER
---------- ------- -------
         1         ONLINE  /oracle/oradata/TSMISC02/redo01.log
         2         ONLINE  /oracle/oradata/TSMISC02/redo02.log
         3         ONLINE  /oracle/oradata/TSMISC02/redo03.log

SQL>
Check the alert log:
Tue Nov 15 11:30:13 2005
Started first pass scan
Tue Nov 15 11:30:13 2005
Errors in file /oracle/admin/TSMISC02/udump/tsmisc02_ora_29582.trc:
ORA-00316: log 1 of thread 1, type 0 in header is not log file
ORA-00312: online log 1 thread 1: '/oracle/oradata/TSMISC02/redo01.log'
ORA-316 signalled during: ALTER DATABASE OPEN...
Tue Nov 15 11:31:09 2005
Restarting dead background process QMN0
QMN0 started with pid=9
Tue Nov 15 11:36:10 2005
Restarting dead background process QMN0
QMN0 started with pid=9
Tue Nov 15 11:41:11 2005
Restarting dead background process QMN0
QMN0 started with pid=9
Tue Nov 15 11:46:12 2005
Restarting dead background process QMN0
QMN0 started with pid=9
Tue Nov 15 11:51:13 2005
Restarting dead background process QMN0
QMN0 started with pid=9
Tue Nov 15 11:56:14 2005
Restarting dead background process QMN0
QMN0 started with pid=9

...
Contents of the related trace file:
[oracle@ts01 udump]$ ll tsmisc02_ora_29582.trc
-rw-r-----    1 oracle   oinstall      912 Nov 15 11:30 tsmisc02_ora_29582.trc
[oracle@ts01 udump]$ cat tsmisc02_ora_29582.trc
/oracle/admin/TSMISC02/udump/tsmisc02_ora_29582.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /oracle/product/920
System name:    Linux
Node name:      ts01
Release:        2.4.21-4.EL
Version:        #1 Fri Oct 3 18:13:58 EDT 2003
Machine:        i686
Instance name: TSMISC02
Redo thread mounted by this instance: 1
Oracle process number: 14
Unix process pid: 29582, image: oracle@ts01 (TNS V1-V3)

*** SESSION ID:(11.3) 2005-11-15 11:30:13.297
Thread checkpoint rba:0x00003e.00000002.0010 scn:0x0000.000a0c0b
On-disk rba:0x00003e.00000c1e.0000 scn:0x0000.000a257e
Use incremental checkpoint cache-low RBA
Thread 1 recovery from rba:0x00003e.0000095f.0000 scn:0x0000.00000000
ORA-00316: log 1 of thread 1, type 0 in header is not log file
ORA-00312: online log 1 thread 1: '/oracle/oradata/TSMISC02/redo01.log'
[oracle@ts01 udump]$
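
The RBA and SCN values in the trace are hexadecimal. An RBA reads as log sequence, block number and byte offset, and an SCN as wrap and base. The following Python sketch decodes them so they can be compared with v$log; the helper names are hypothetical, and combining wrap and base with a 32-bit shift is an assumption about the SCN layout.

```python
def parse_rba(text):
    """Split 'seq.block.offset' (hex fields) into three integers."""
    seq, block, offset = (int(part, 16) for part in text.split("."))
    return seq, block, offset


def parse_scn(text):
    """Combine 'wrap.base' (hex fields) into one integer, assuming a 32-bit base."""
    wrap, base = (int(part, 16) for part in text.split("."))
    return (wrap << 32) | base


print(parse_rba("0x00003e.00000002.0010"))  # thread checkpoint RBA -> (62, 2, 16)
print(parse_scn("0x0000.000a0c0b"))         # thread checkpoint SCN -> 658443
print(parse_scn("0x0000.000a257e"))         # on-disk SCN -> 664958
```

Sequence 62 is the CURRENT log sequence and 658443 is the FIRST_CHANGE# of group 1, so the redo that crash recovery needs lies entirely in the truncated redo01.log.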
SQL> alter database clear logfile group 1;
alter database clear logfile group 1
*
ERROR at line 1:
ORA-01624: log 1 needed for crash recovery of thread 1
ORA-00312: online log 1 thread 1: '/oracle/oradata/TSMISC02/redo01.log'
SQL> alter database clear unarchived logfile group 1;
alter database clear unarchived logfile group 1
*
ERROR at line 1:
ORA-01624: log 1 needed for crash recovery of thread 1
ORA-00312: online log 1 thread 1: '/oracle/oradata/TSMISC02/redo01.log'
SQL>
SQL> archive log list;
Database log mode              Archive Mode

Automatic archival             Enabled
Archive destination            /oracle/oradata/TSMISC02/archive
Oldest online log sequence     61
Next log sequence to archive   62
Current log sequence           62
SQL> seleCT * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1         62    2097152          1 NO  INVALIDATED             658443 15-NOV-05
         2          1          0    2097152          1 YES INACTIVE                     0 15-NOV-05
         3          1         61    2097152          1 YES INACTIVE                649920 15-NOV-05

SQL>
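Conclusion: the damaged log is the CURRENT log.

Next, set two hidden parameters, _ALLOW_RESETLOGS_CORRUPTION = TRUE and _corrupted_rollback_segments, because when the redo is damaged the undo data is usually inconsistent as well.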


Start the database with the hidden parameter:
SQL> create pfile='/home/oracle/pfile.tmp' from spfile;

File created.
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
[oracle@ts01 oracle]$ tail pfile.tmp
*.sort_area_size=524288
*.star_transformation_enabled='FALSE'
*.timed_statistics=TRUE
*.undo_management='AUTO'
*.undo_retention=10800
*.undo_tablespace='UNDOTBS2'
*.user_dump_dest='/oracle/admin/TSMISC02/udump'

_ALLOW_RESETLOGS_CORRUPTION = TRUE

[oracle@ts01 oracle]$

Description of this parameter:
SQL> select KSPPDESC from X$KSPPI where ksppinm='_allow_resetlogs_corruption';

KSPPDESC
----------------------------------------------------------------
allow resetlogs even if it will cause corruption

[oracle@ts01 oracle]$ sqlplus '/ as sysdba'

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 21 15:57:21 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile=pfile.tmp
ORACLE instance started.

Total System Global Area  236000356 bytes
Fixed Size                   451684 bytes

Variable Size             201326592 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01139: RESETLOGS option only valid after an incomplete database recovery
SQL> select status from v$instance;

STATUS
------------
MOUNTED

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00316: log 1 of thread 1, type 0 in header is not log file
ORA-00312: online log 1 thread 1: '/oracle/oradata/TSMISC02/redo01.log'
SQL>
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 658443 generated at 11/15/2005 08:52:17 needed for thread 1
ORA-00289: suggestion : /oracle/oradata/TSMISC02/archive/1_62.dbf
ORA-00280: change 658443 for thread 1 is in sequence #62

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/oradata/TSMISC02/system01.dbf'

ORA-01112: media recovery not started

SQL> alter database open resetlogs;
...
The command ran for a long time, as if it had hung.

After a while longer:
SQL> alter database open resetlogs;

alter database open resetlogs
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
SQL>
SQL> select status from v$instance;
select status from v$instance
*
ERROR at line 1:
ORA-03114: not connected to ORACLE

SQL> conn / as sysdba
Connected to an idle instance.
Checking the alert log shows:
[oracle@ts01 bdump]$ tail alert_TSMISC02.log
Errors in file /oracle/admin/TSMISC02/udump/tsmisc02_ora_10762.trc:
ORA-00600: internal error code, arguments: [2662], [0], [658448], [0], [664313], [12582921], [], []
Mon Nov 21 16:11:44 2005
Errors in file /oracle/admin/TSMISC02/udump/tsmisc02_ora_10762.trc:
ORA-00600: internal error code, arguments: [2662], [0], [658448], [0], [664313], [12582921], [], []
Mon Nov 21 16:11:44 2005
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 10762
ORA-1092 signalled during: alter database open resetlogs...
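
The arguments of ORA-00600 [2662] show how far the SCN has to move. The Python sketch below decodes them using the commonly described layout: current SCN wrap and base, dependent SCN wrap and base, then a data block address (DBA). That layout and the 10-bit file / 22-bit block split of the DBA are assumptions that the post itself does not state, and the function name is hypothetical.

```python
def decode_2662(args):
    """Interpret ORA-00600 [2662] arguments under the assumed layout above."""
    cur_wrap, cur_base, dep_wrap, dep_base, dba = args
    current_scn = (cur_wrap << 32) | cur_base
    dependent_scn = (dep_wrap << 32) | dep_base
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        "scn_gap": dependent_scn - current_scn,
        "file_no": dba >> 22,        # upper 10 bits of the DBA (assumed split)
        "block_no": dba & 0x3FFFFF,  # lower 22 bits of the DBA (assumed split)
    }


info = decode_2662([0, 658448, 0, 664313, 12582921])
print(info["scn_gap"])                     # 5865
print(info["file_no"], info["block_no"])   # 3 9
```

Read this way, a block carries an SCN 5865 higher than the database's current SCN, which is why the current SCN has to be pushed forward before the open can succeed.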

At this point the SCN can be adjusted through an Oracle internal event.

There are two common ways to advance the SCN:

1. With immediate trace name (while the database is open)

alter session set events 'IMMEDIATE trace name ADJUST_SCN level x';

2. With event 10015 (when the database cannot be opened, in mount state)

alter session set events '10015 trace name adjust_scn level x';

Note: level 1 advances the SCN by about 1 billion (1024*1024*1024). Level 1 is usually enough; adjust it if the situation requires.
Set the adjust_scn event:

alter session set events '10015 trace name adjust_scn level 1';
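
As a rough guide to choosing the level, the Python sketch below computes the smallest level that covers a given SCN gap. It assumes that level n advances the SCN by n * 1024*1024*1024, which extrapolates from the level 1 figure in the note and may not hold for every Oracle version; the function name is hypothetical. The sample values are the two SCNs from the ORA-00600 [2662] message.

```python
SCN_PER_LEVEL = 1024 * 1024 * 1024  # level 1 step, per the note above


def min_adjust_scn_level(current_scn, required_scn):
    """Smallest level covering the gap, assuming level n adds n * SCN_PER_LEVEL."""
    gap = required_scn - current_scn
    if gap <= 0:
        return 0  # current SCN is already high enough
    # Integer ceiling division avoids floating-point rounding.
    return (gap + SCN_PER_LEVEL - 1) // SCN_PER_LEVEL


print(min_adjust_scn_level(658448, 664313))  # 1
```

For the gap of 5865 seen here, level 1 is far more than enough.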

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount pfile=pfile.tmp    
ORACLE instance started.

Total System Global Area  236000356 bytes
Fixed Size                   451684 bytes
Variable Size             201326592 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> alter session set events '10015 trace name adjust_scn level 1';

Session altered.
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 658445 generated at 11/21/2005 16:11:43 needed for thread 1
ORA-00289: suggestion : /oracle/oradata/TSMISC02/archive/1_1.dbf
ORA-00280: change 658445 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/oradata/TSMISC02/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
SQL>
Check the alert log:
[oracle@ts01 bdump]$ tail -f alert_TSMISC02.log
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
 Mem# 0 errs 0: /oracle/oradata/TSMISC02/redo01.log
Mon Nov 21 16:21:41 2005
Errors in file /oracle/admin/TSMISC02/udump/tsmisc02_ora_10795.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [91], [69], [], [], [], [], []
Error 607 happened during db open, shutting down database
USER: terminating instance due to error 607
Instance terminated by USER, pid = 10795
ORA-1092 signalled during: alter database open resetlogs...
Set the parameters:
Switch undo to manual management, then restart the database.

SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     10800
undo_suppress_errors                 boolean     FALSE
undo_tablespace                      string      UNDOTBS2

SQL>

[oracle@ts01 oracle]$ tail pfile.tmp
*.undo_management='AUTO'
*.undo_retention=10800
*.undo_tablespace='UNDOTBS2'
*.user_dump_dest='/oracle/admin/TSMISC02/udump'

*._ALLOW_RESETLOGS_CORRUPTION = TRUE
*.undo_management='manual'
[oracle@ts01 oracle]$ exit
SQL> startup pfile=pfile.tmp
ORACLE instance started.

Total System Global Area  236000356 bytes
Fixed Size                   451684 bytes
Variable Size             201326592 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
Database opened.
SQL>
SQL> conn lunar/lunar
Connected.
SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
LUNARTEST                      TABLE

SQL>
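
The edited pfile.tmp lists undo_management twice, first as 'AUTO' and then as 'manual', and the instance opened with manual undo, which suggests the later entry took effect. The Python sketch below reads a pfile fragment on that assumption, so the effective values can be checked before startup. The parser is a hypothetical helper, not an Oracle tool, and it ignores pfile features such as IFILE and multi-value parameters.

```python
def effective_params(pfile_text):
    """Map each parameter name to the value of its last occurrence.

    Assumes 'last entry wins' for repeated parameters and drops the
    optional '*.' prefix; both are simplifications for illustration.
    """
    params = {}
    for raw in pfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip().lower()
        if name.startswith("*."):
            name = name[2:]
        params[name] = value.strip().strip("'")
    return params


sample = """
*.undo_management='AUTO'
*.undo_tablespace='UNDOTBS2'
*._ALLOW_RESETLOGS_CORRUPTION = TRUE
*.undo_management='manual'
"""
result = effective_params(sample)
print(result["undo_management"])              # manual
print(result["_allow_resetlogs_corruption"])  # TRUE
```

Removing the earlier undo_management line from the file avoids relying on entry order at all.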

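Summary:

1. Switch undo to manual management, then set the hidden parameters _ALLOW_RESETLOGS_CORRUPTION = TRUE and _corrupted_rollback_segments, because when the redo is damaged the undo data is usually inconsistent as well.
2. Before open resetlogs, first run recover database using backup controlfile until cancel;
If an ORA-00600 error appears at this point, use the ADJUST_SCN event to adjust the current SCN. If the SCN gap is small, restarting the database several times can resolve it. If the gap is large, use the 10015 event:
alter session set events '10015 trace name adjust_scn level 1';
For an even larger gap, set level 2, ... level 10 and so on (level 1 advances the SCN by about 1 billion, that is 1024*1024*1024).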

Source: ITPUB博客, link: http://blog.itpub.net/11134237/viewspace-686511/. If you repost this article, please credit the source; otherwise legal liability may be pursued.
