ITPub博客

首页 > 数据库 > Oracle > 【故障诊断】cr块slot notfound解决过程

【故障诊断】cr块slot notfound解决过程

原创 Oracle 作者:yellowlee 时间:2014-11-12 10:18:14 0 删除 编辑
在一次严重故障中控制文件重建,强行拉起库,前面的问题解决了(HA引发的血案,涉及他人误操作,具体略)
数据库和应用起来后却
遇到大量cr块slot notfound的问题,时间正是业务高峰的白天,一阵头皮发麻,具体报错内容:
alert:
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
相关trace
tpioadb1<*oa*/home/oracle/app/diag/rdbms/oa/oa/trace>$ls -trl *.trc
-rw-r-----    1 oracle   oinstall  181401269 Oct 14 12:27 oa_ora_17825870.trc
-rw-r-----    1 oracle   oinstall  181462560 Oct 14 12:28 oa_ora_49872920.trc
-rw-r-----    1 oracle   oinstall  181776046 Oct 14 12:29 oa_ora_52166770.trc
-rw-r-----    1 oracle   oinstall  182010875 Oct 14 12:29 oa_ora_47317200.trc
-rw-r-----    1 oracle   oinstall  182080391 Oct 14 12:30 oa_ora_198714.trc
-rw-r-----    1 oracle   oinstall  182205907 Oct 14 12:30 oa_ora_47448310.trc
-rw-r-----    1 oracle   oinstall  182159439 Oct 14 12:31 oa_ora_48234660.trc
-rw-r-----    1 oracle   oinstall  182186772 Oct 14 12:31 oa_ora_41617502.trc
-rw-r-----    1 oracle   oinstall  182186742 Oct 14 12:31 oa_ora_48824328.trc
-rw-r-----    1 oracle   oinstall  182186771 Oct 14 12:31 oa_ora_52166772.trc
-rw-r-----    1 oracle   oinstall  182258339 Oct 14 12:32 oa_ora_39913614.trc
-rw-r-----    1 oracle   oinstall  182258369 Oct 14 12:32 oa_ora_30148840.trc
-rw-r-----    1 oracle   oinstall  182375997 Oct 14 12:32 oa_ora_17825876.trc
-rw-r-----    1 oracle   oinstall  182408523 Oct 14 12:32 oa_ora_66256924.trc
-rw-r-----    1 oracle   oinstall  182408580 Oct 14 12:32 oa_ora_20121758.trc
-rw-r-----    1 oracle   oinstall  182490385 Oct 14 12:32 oa_ora_6424806.trc
-rw-r-----    1 oracle   oinstall  182490387 Oct 14 12:33 oa_ora_15272010.trc
-rw-r-----    1 oracle   oinstall  182490356 Oct 14 12:33 oa_ora_4851758.trc
-rw-r-----    1 oracle   oinstall  182516313 Oct 14 12:33 oa_ora_16713804.trc
-rw-r-----    1 oracle   oinstall  182594432 Oct 14 12:33 oa_ora_28510210.trc
-rw-r-----    1 oracle   oinstall  182616536 Oct 14 12:33 oa_ora_30015720.trc
-rw-r-----    1 oracle   oinstall  182700158 Oct 14 12:33 oa_ora_25102374.trc
-rw-r-----    1 oracle   oinstall  182734746 Oct 14 12:34 oa_ora_28641506.trc
-rw-r-----    1 oracle   oinstall  182734715 Oct 14 12:34 oa_ora_3016860.trc
-rw-r-----    1 oracle   oinstall  182841440 Oct 14 12:34 oa_ora_52429006.trc
-rw-r-----    1 oracle   oinstall  182865907 Oct 14 12:34 oa_ora_47185978.trc
-rw-r-----    1 oracle   oinstall  182865909 Oct 14 12:34 oa_ora_57212934.trc
-rw-r-----    1 oracle   oinstall  182865907 Oct 14 12:34 oa_ora_23267338.trc
-rw-r-----    1 oracle   oinstall  182865938 Oct 14 12:34 oa_ora_14354628.trc
-rw-r-----    1 oracle   oinstall   70326197 Oct 14 12:34 oa_ora_7276572.trc
-rw-r-----    1 oracle   oinstall  152186229 Oct 14 12:34 oa_ora_52953126.trc
-rw-r-----    1 oracle   oinstall   96766948 Oct 14 12:34 oa_ora_47448312.trc
-rw-r-----    1 oracle   oinstall   25950203 Oct 14 12:34 oa_ora_44040304.trc
-rw-r-----    1 oracle   oinstall     124525 Oct 14 12:34 oa_ora_30015724.trc
-rw-r-----    1 oracle   oinstall   11891981 Oct 14 12:34 oa_ora_23332896.trc
-rw-r-----    1 oracle   oinstall   67996842 Oct 14 12:34 oa_ora_20580500.trc
-rw-r-----    1 oracle   oinstall   50813235 Oct 14 12:34 oa_ora_17301742.trc
-rw-r-----    1 oracle   oinstall  171017368 Oct 14 12:34 oa_ora_14811386.trc


tpioadb1<*oa*/home/oracle/app/diag/rdbms/oa/oa/trace>$more oa_ora_52429006.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_52429006.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name:    AIX
Node name:      tpioadb1
Release:        1
Version:        6
Machine:        00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 169
Unix process pid: 52429006, image: oracle@tpioadb1




*** 2014-10-14 12:33:21.941
*** SESSION ID:(1289.36435) 2014-10-14 12:33:21.941
*** CLIENT ID:() 2014-10-14 12:33:21.941
*** SERVICE NAME:(SYS$USERS) 2014-10-14 12:33:21.941
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 12:33:21.941
*** ACTION NAME:() 2014-10-14 12:33:21.941
 
* kdsgrp1-1: *************************************************
            row 0x024e73b1.97 continuation at
            0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump:  0x024e73b1
 Object id on Block? Y
 seg/obj: 0x1ce1e  csc: 0x638.28d23ba0  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x24e7281 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x003b.012.0000246a  0x00c048a5.011d.32  --U-    1  fsc 0x0016.28d23ba1
0x02   0x0002.006.00021b71  0x00c03c7f.1478.18  C---    0  scn 0x0638.28d21c9c
bdba: 0x024e73b1
data_block_dump,data header at 0x70001028890e064
===============

通过语句找到该对象和块:
Select owner, segment_name, segment_type, partition_name,tablespace_name 
From dba_extents 
Where relative_fno = 9
And  947121 between block_id and (block_id+blocks-1);

dump该块,发现该块里面只有1个table的216行,由于系统已经带病运行,且无可用备份,无停机时间,意味着最多可能丢失这个表的216行数据。
dump内容如下
tpioadb1<*oa*/home/oracle/app/diag/rdbms/oa/oa/trace>$more oa_ora_38142130.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_38142130.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name:    AIX
Node name:      tpioadb1
Release:        1
Version:        6
Machine:        00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 177
Unix process pid: 38142130, image: oracle@tpioadb1




*** 2014-10-14 14:39:49.105
*** SESSION ID:(167.64629) 2014-10-14 14:39:49.105
*** CLIENT ID:() 2014-10-14 14:39:49.105
*** SERVICE NAME:(SYS$USERS) 2014-10-14 14:39:49.105
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 14:39:49.105
*** ACTION NAME:() 2014-10-14 14:39:49.105
 
* kdsgrp1-1: *************************************************
            row 0x024e73b1.97 continuation at
            0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump:  0x024e73b1
 Object id on Block? Y
 seg/obj: 0x1ce1e  csc: 0x638.29bb4112  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x24e7281 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x002d.01a.0000a6d7  0x04c08774.04a9.29  --U-    1  fsc 0x0017.29bb4115
0x02   0x002f.007.0000945e  0x00c02f56.0412.0d  C---    0  scn 0x0638.29bb08d6
bdba: 0x024e73b1
data_block_dump,data header at 0x7000105876a6064
===============
tsiz: 0x1f98
hsiz: 0x1c2
pbl: 0x7000105876a6064
     76543210
flag=--------
ntab=1
nrow=216
frre=29
fsbo=0x1c2
fseo=0x7d4
avsp=0xbe9
tosp=0xc02
0xe:pti[0]      nrow=216        offs=0



后通过业务分析,找到白天有dml的行,通过create as select查询者部分行,发现会话hang,报错依旧,采用逐行insert的方式,最终定位出有问题的只有一行,通过dump的行数据,将关键字段转化出来给业务确认,确认为该行为无效行,剔除该行后,安排在下班后维护窗口内重建该表,问题解决,业务未受到影响。


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/16179598/viewspace-1329351/,如需转载,请注明出处,否则将追究法律责任。

上一篇: c#简易 log
下一篇: 博客重新开启
请登录后发表评论 登录
全部评论

注册时间:2008-12-27

  • 博文量
    316
  • 访问量
    657333