ITPub博客

首页 > 数据库 > Oracle > 异常宕机后启动报ora-600[4000]错误恢复

异常宕机后启动报ora-600[4000]错误恢复

原创 Oracle 作者:xiangqinghu 时间:2015-04-17 16:33:43 0 删除 编辑

客户反映说一个库前几天因为磁盘损坏,导致操作系统宕机。几经折腾终于把盘恢复了,却又发现数据库无法启动了。并且已经使用trace里面的backup controlfile重建了控制文件,但还是无法正常启动。

 

检查之后,mount实例,尝试recover database open 数据库:

recover database until cancel using backup controlfile;

 

因为controlfile是重建出来的,所以当前的controlfile并不知道哪个在线日志是current的,需要手动指定。

Thu Apr 16 13:01:14 2015
ALTER DATABASE RECOVER  database until cancel using backup controlfile  
Thu Apr 16 13:01:14 2015
Media Recovery Start
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
...
WARNING! Recovering data file 75 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel usin...
Thu Apr 16 13:02:31 2015
ALTER DATABASE RECOVER    LOGFILE 'E:\datafile\redo01.log'  
Thu Apr 16 13:02:31 2015
Media Recovery Log E:\datafile\redo01.log
Errors with log E:\datafile\redo01.log
ORA-339 signalled during: ALTER DATABASE RECOVER    LOGFILE 'E:\datafile\red...
Thu Apr 16 13:02:31 2015
ALTER DATABASE RECOVER CANCEL 
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
Thu Apr 16 13:03:02 2015
ALTER DATABASE RECOVER  database until cancel using backup controlfile  
Thu Apr 16 13:03:02 2015
Media Recovery Start
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
...
WARNING! Recovering data file 75 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel usin...
Thu Apr 16 13:03:16 2015
ALTER DATABASE RECOVER    LOGFILE 'E:\datafile\redo02.log'  
Thu Apr 16 13:03:16 2015
Media Recovery Log E:\datafile\redo02.log
Errors with log E:\datafile\redo02.log
ORA-339 signalled during: ALTER DATABASE RECOVER    LOGFILE 'E:\datafile\red...
Thu Apr 16 13:03:16 2015
ALTER DATABASE RECOVER CANCEL 
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
Thu Apr 16 13:03:34 2015
ALTER DATABASE RECOVER  database until cancel using backup controlfile  
Thu Apr 16 13:03:34 2015
Media Recovery Start
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
...
WARNING! Recovering data file 75 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel usin...
Thu Apr 16 13:03:49 2015
ALTER DATABASE RECOVER    LOGFILE 'E:\datafile\redo03.log'  
Thu Apr 16 13:03:49 2015
Media Recovery Log E:\datafile\redo03.log
Incomplete recovery applied all redo ever generated.
Recovery completed through change 14081497748113
Media Recovery Complete
Completed: ALTER DATABASE RECOVER    LOGFILE 'E:\datafile\red

 

多次尝试后发现redo03.log是所需要的current group member,尝试alter database open resetlogs,但是出现ORA-00600 [4000]错误:

Thu Apr 16 08:00:40 2015
Errors in file d:\oracle\admin\kf2\udump\kf2_ora_144.trc:
ORA-00600: 内部错误代码,参数: [4000], [3], [], [], [], [], [], []
 
Thu Apr 16 08:00:40 2015
Errors in file d:\oracle\admin\kf2\udump\kf2_ora_144.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码,参数: [4000], [3], [], [], [], [], [], []
 
Thu Apr 16 08:00:40 2015
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Thu Apr 16 08:00:40 2015
Errors in file d:\oracle\admin\kf2\bdump\kf2_pmon_5172.trc:
ORA-00704: bootstrap process failure

以上ORA-00600: [4000], [3],说明在使用usn=3的回滚段rollback数据块时发现rollback segment存在错误,且伴随有ORA-00704: bootstrap process failure错误,说明需要回滚的数据块是bootstrap需要的自举对象。

一般来说bootstrap object需要做rollbackcleanup,而apply undo数据时,我们是无法使用_corrupted_rollback_segments,_offline_rollback_segments10513事件来阻止ORA-00600: [4000]的发生。

 

不过还是抱着试一试的心理尝试一下:

*._allow_resetlogs_corruption=true
event=('10510 trace name context forever,level 1','10511 trace name context forever,level 2','10512 trace name context forever,level 1','10513 trace name context forever,level 2')
*._corrupted_rollback_segments=(_SYSSMU3$)
*._offline_rollback_segments=(_SYSSMU3$)

 

再次尝试启动数据库:

SMON: enabling cache recovery
Thu Apr 16 08:10:20 2015
Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5148.trc:
ORA-00600: 内部错误代码,参数: [4000], [3], [], [], [], [], [], []
 
Thu Apr 16 08:10:20 2015
Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5148.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码,参数: [4000], [3], [], [], [], [], [], []

确实还是无法避免bootstrap对象发生ORA-00600:[4000]错误。没办法,要想恢复这个数据库就必须要解决这个bootstrap对象。

 

先来看一下ORA-00600:[4000]内部错误的trace日志:

Dump file d:\oracle\admin\kf2\udump\kf2_ora_5148.trc
Thu Apr 16 08:10:08 2015
ORACLE V9.2.0.8.0 - Production vsnsta=0
vsnsql=12 vsnxtr=3
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Oracle9i Enterprise Edition Release 9.2.0.8.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Instance name: kf2
 
Redo thread mounted by this instance: 1
 
Oracle process number: 12
 
Windows thread id: 5148, image: ORACLE.EXE
 
 
*** SESSION ID:(9.3) 2015-04-16 08:10:08.515
Start recovery at thread 1 ckpt scn 14081497668136 logseq 1 block 2
*** 2015-04-16 08:10:10.265
Media Recovery Log D:\ORACLE\ORA92\RDBMS\ARC00001.001
*** 2015-04-16 08:10:10.281
Media Recovery Log D:\ORACLE\ORA92\RDBMS\ARC00001.001
*** 2015-04-16 08:10:18.015
*** 2015-04-16 08:10:20.140
ksedmp: internal or fatal error
ORA-00600: 内部错误代码,参数: [4000], [3], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1

 

引发错误的数据块头信息:

Block header dump:  0x0040007a
 Object id on Block? Y
 seg/obj: 0x12  csc: 0xcce.9aabd24f  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0003.000.0001a69c  0x0080002d.33df.01  --U-    1  fsc 0x0000.9aabd250
 
data_block_dump,data header at 0x34270044

 

通过上面的trace文件内容,我们知道:

1.  引发ORA-00600:[4000], [3]错误的语句是”select ctime, mtime, stime from obj$ where obj# = :1″。这是一条常用的递归SQL语句,查询的对象是非常重要的bootstrap自举对象OBJ$基表,这说明需要cleanup的块是OBJ$表上的。

2.  引发ORA-00600:[4000], [3]错误的数据块是1号数据文件的122块,seg/obj0x12,块类型为Data,且存在有一条ITL entry:

SQL> select DBMS_UTILITY.data_block_address_file (TO_NUMBER ('0040007a','xxxxxxxx')) file_no,
  2  DBMS_UTILITY.data_block_address_block (TO_NUMBER ('0040007a','xxxxxxxx')) block_no
  3  from dual;
 
   FILE_NO   BLOCK_NO
---------- ----------
         1        122
 
SQL>
 
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0003.000.0001a69c  0x0080002d.33df.01  --U-    1  fsc 0x0000.9aabd250

因为该致命内部错误涉及到的对象是重要的BootstrapOBJ$,无法使用传统的例如_corrupted_rollback_segments,_offline_rollback_segments10513事件来阻止ORA-00600: [4000]的发生,必须使用块修改工具BBED来修改存在问题的数据块将ITL事务槽的FLAGU修改为C(Commit),手工提交该事务。

 

事务状态标识:

TRANSACTION_COMMITED = 0x08;

TRANSACTION_UPBOUND = 0x02;

TRANSACTION_ACTIVE = 0x01;

 

Flag= -U- TRANSACTION_UPBOUND时,flag值为0x02,需要将该字节修改为TRANSACTION_COMMITED = 0x08;

 

下面用bbed修改system01.dbf文件。注意修改前一定要先备份。

 
E:\datafile>bbed filename=system01.dbf password=blockedit blocksize=8192 mode=edit
 
BBED: Release 2.0.0.0.0 - Limited Production on 星期四 4 16 09:58:02 2015
 
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
 
************* !!! For Oracle Internal Use only !!! ***************
 
BBED> map
 File: system01.dbf (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header
 
 struct kcvfh, 360 bytes                    @0
 
 ub4 tailchk                                @8188
 
 
BBED> set block 122
        BLOCK#          122
 
BBED> map
 File: system01.dbf (0)
 Block: 122                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 
 struct kcbh, 20 bytes                      @0
 
 struct ktbbh, 48 bytes                     @20
 
 struct kdbh, 14 bytes                      @68
 
 struct kdbt[1], 4 bytes                    @82
 
 sb2 kdbr[108]                              @86
 
 ub1 freespace[859]                         @302
 
 ub1 rowdata[7027]                          @1161
 
 ub4 tailchk                                @8188
 
 
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x9aabd24f
      ub2 kscnwrp                           @32       0x0cce
   b2 ktbbhict                              @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0003
         ub2 kxidslt                        @46       0x0000
         ub4 kxidsqn                        @48       0x0001a69c
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x0080002d
         ub2 kubaseq                        @56       0x33df
         ub1 kubarec                        @58       0x01
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x9aabd250
 
BBED> set offset 61
        OFFSET          61
BBED> set count 16
        COUNT           16
BBED> d
 File: system01.dbf (0)
 Block: 122              Offsets:   61 to   76           Dba:0x00000000
------------------------------------------------------------------------
 20000050 d2ab9a00 016c00ff ffea0045
 
 <32 bytes per line>
 
BBED> m /x 0x80
 File: system01.dbf (0)
 Block: 122              Offsets:   61 to   76           Dba:0x00000000
------------------------------------------------------------------------
 80000050 d2ab9a00 016c00ff ffea0045
 
 <32 bytes per line>
 
BBED> sum apply
Check value for File 0, Block 122:
current = 0xb0d6, required = 0xb0d6
 
BBED>

 

再次尝试打开数据库,出现ORA-00600:[2256]错误

Thu Apr 16 14:43:57 2015
Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5744.trc:
ORA-00600: 内部错误代码,参数: [2256], [3278], [3221225472], [3278], [3221225539], [], [], []
 
Thu Apr 16 14:43:57 2015
Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5744.trc:
ORA-00600: 内部错误代码,参数: [2256], [3278], [3221225472], [3278], [3221225539], [], [], []
 
Thu Apr 16 14:43:57 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600

 

一般ORA-00600:[2256]错误可以通过10015 ADJUST_SCN事件来推进SCN解决。

MOS里面对ORA-00600:[2256]的各个参数解释如下:

Arg [a]  Current SCN WRAP
Arg [b]  Current SCN BASE
Arg [c]  dependent SCN WRAP
Arg [d]  dependent SCN BASE 
Arg [e]  Where present this is the DBA where the dependent SCN came from.

 

我们的日志中的各个参数值如下:

ORA-00600: 内部错误代码,参数: [2662], [3278], [2594951966], [3278], [2595857063], [4235250], [], []

 

根据level的计算规则算出需要推进的level大小:

Level = Arg[c] * 4 + Arg[d]/1024/1024/1024 = 13115

 

alter session set events '10015 trace name adjust_scn level 13115';

 

再次open数据库,alert日志如下:

Thu Apr 16 13:47:49 2015
alter database open resetlogs
Thu Apr 16 13:47:49 2015
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 14081497748203
Resetting resetlogs activation ID 1527068138 (0x5b0535ea)
Online log 1 of thread 1 was previously cleared
Online log 2 of thread 1 was previously cleared
Thu Apr 16 13:47:51 2015
Assigning activation ID 1527030262 (0x5b04a1f6)
Thread 1 opened at log sequence 1
  Current log# 3 seq# 1 mem# 0: E:\DATAFILE\REDO03.LOG
  Current log# 3 seq# 1 mem# 1: E:\DATAFILE\REDO06.LOG
Successful open of redo thread 1
Thu Apr 16 13:47:52 2015
SMON: enabling cache recovery
Thu Apr 16 13:47:52 2015
Debugging event used to advance scn to 14082124021760
Dictionary check beginning
Dictionary check complete
Thu Apr 16 13:47:55 2015
SMON: enabling tx recovery
Thu Apr 16 13:47:55 2015
Database Characterset is ZHS16CGB231280
Hex dump of Absolute File 1, Block 44180 in trace file d:\oracle\admin\kf2\udump\kf2_ora_5052.trc
***
Corrupt block relative dba: 0x0040ac94 (file 1, block 44180)
Fractured block found during buffer read
Data in bad block -
 type: 6 format: 2 rdba: 0x0040ac94
 last change scn: 0x0cce.9ab97c09 seq: 0x1 flg: 0x06
 consistency value in tail: 0xae1f0601
 check value in block header: 0xc9d7, computed block checksum: 0xc30f
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0040ac94 (file 1, block 44180) found same corrupted data
Thu Apr 16 13:47:55 2015
Errors in file d:\oracle\admin\kf2\udump\kf2_ora_5052.trc:
ORA-01578: ORACLE 数据块损坏(文件号1,块号44180
ORA-01110: 数据文件 1: 'E:\DATAFILE\SYSTEM01.DBF'
 
Error 1578 happened during db open, shutting down database
USER: terminating instance due to error 1578

可以看到,这时的报错已经不一样了,这是因为system01.dbf文件中还有其他的坏块从而导致down库的。

 

dbv检查system01.dbf文件:

E:\datafile>dbv file='E:\datafile\SYSTEM01.dbf' blocksize=8192
 
DBVERIFY: Release 9.2.0.8.0 - Production on 星期四 4 16 13:59:11 2015
 
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
 
DBVERIFY - 验证正在开始 : FILE = E:\datafile\SYSTEM01.dbf
Block Checking: DBA = 4194426, Block Type = KTB-managed data block
data header at 0x1162244
kdbchk: row locked by non-existent transaction
        table=0   slot=46
        lockid=1   ktbbhitc=1
 122 失败,校验代码为 6101
汇入的页43268 - 可能是介质损坏
***
Corrupt block relative dba: 0x0040a904 (file 1, block 43268)
Fractured block found during dbv:
Data in bad block -
 type: 6 format: 2 rdba: 0x0040a904
 last change scn: 0x0cce.9ab97baa seq: 0x1 flg: 0x06
 consistency value in tail: 0xaf060601
 check value in block header: 0x43af, computed block checksum: 0xd8a5
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
 
汇入的页44052 - 可能是介质损坏
***
Corrupt block relative dba: 0x0040ac14 (file 1, block 44052)
Fractured block found during dbv:
Data in bad block -
 type: 6 format: 2 rdba: 0x0040ac14
 last change scn: 0x0cce.9aaba70d seq: 0x1 flg: 0x06
 consistency value in tail: 0x748e0601
 check value in block header: 0x7766, computed block checksum: 0x91c3
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
 
汇入的页44180 - 可能是介质损坏
***
Corrupt block relative dba: 0x0040ac94 (file 1, block 44180)
Fractured block found during dbv:
Data in bad block -
 type: 6 format: 2 rdba: 0x0040ac94
 last change scn: 0x0cce.9ab97c09 seq: 0x1 flg: 0x06
 consistency value in tail: 0xae1f0601
 check value in block header: 0xc9d7, computed block checksum: 0xc30f
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
 
汇入的页47652 - 可能是介质损坏
***
Corrupt block relative dba: 0x0040ba24 (file 1, block 47652)
Fractured block found during dbv:
Data in bad block -
 type: 6 format: 2 rdba: 0x0040ba24
 last change scn: 0x0cce.9aabaf06 seq: 0x1 flg: 0x06
 consistency value in tail: 0x7baa0601
 check value in block header: 0xf2dc, computed block checksum: 0x9ec4
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
 
 
 
DBVERIFY - 验证完成
 
检查的页总数         128000
处理的页总数(数据):31569
失败的页总数(数据):1
处理的页总数(索引):5268
失败的页总数(索引):0
处理的页总数(其它):1546
处理的总页数 ()  : 0
失败的总页数 ()  : 0
空的页总数            89613
标记为损坏的总页数:4
汇入的页总数           4
Highest block SCN            : 14081498655836 (3278.2595859548)
 
E:\datafile>

检查出有4个坏块(43268441804405247652)。

 

再次用bbed工具修复坏块:

E:\datafile>bbed filename=system01.dbf password=blockedit blocksize=8192 mode=edit
 
BBED: Release 2.0.0.0.0 - Limited Production on 星期四 4 16 15:03:53 2015
 
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
 
************* !!! For Oracle Internal Use only !!! ***************
 
 
BBED> set block 43268
        BLOCK#          43268
 
BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0x02
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x0040a904
   ub4 bas_kcbh                             @8        0x9ab97baa
   ub2 wrp_kcbh                             @12       0x0cce
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x43af
   ub2 spare3_kcbh                          @18       0x0000
 
BBED> p tailchk
ub4 tailchk                                 @8188     0xaf060601

正常情况下tail check=scn base+block type+scn seq,因此taikchk的值应该是7baa + 06 + 01,0x7baa0601。而我们的当前值是0xaf060601

 

根据little-endian的规则,0x7baa0601应该写成:01 06 aa 7b

BBED> m /x 0106aa7b offset 8188
 File: system01.dbf (0)
 Block: 43268            Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 0106aa7b
 
 <32 bytes per line>
 
BBED> sum apply
Check value for File 0, Block 43268:
current = 0x4fa6, required = 0x4fa6
 
BBED> verify
DBVERIFY - 验证正在启动
FILE =system01.dbf
BLOCK = 43268
 
 
DBVERIFY - 验证完成
 
检查的总块数:1
已处理的总块数(数据):1
无法处理的总块数(数据):0
已处理的总块数(索引):0
无法处理的总块数(索引):0
空的总块数:0
标记为损坏的总数块:0
汇入的块总数:0

验证后,43268的块已经修复。用同样的方法将另外3个块修复。

 

再次打开数据库:

SQL> alter database open resetlogs;
数据库已更改。
SQL>

终于启动起来了。

不过还需要重建undo表空间,并立马将数据exp导出备份好。事实上,数据字典还是有所损坏,部分数据查询时还有错误,在此不再赘述。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22213086/viewspace-1583588/,如需转载,请注明出处,否则将追究法律责任。

下一篇: 没有了~
请登录后发表评论 登录
全部评论

注册时间:2009-07-28

  • 博文量
    10
  • 访问量
    62567