ITPub博客

首页 > Linux操作系统 > Linux操作系统 > 遭遇ORA-03113 & ORA-07445

遭遇ORA-03113 & ORA-07445

原创 Linux操作系统 作者:jcq0 时间:2009-01-05 16:32:19 0 删除 编辑

1.问题描述

 客户说最近几天产生大量trace文件导致文件系统撑满

2.数据库版本

该客户数据库版本8.1.7.4 for aix 4.3
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle8i Enterprise Edition Release 8.1.7.4.0 - 64bit Production
PL/SQL Release 8.1.7.4.0 - Production
CORE    8.1.7.0.0       Production
TNS for IBM/AIX RISC System/6000: Version 8.1.7.4.0 - Production
NLSRTL Version 3.4.1.0.0 - Production

3.初步诊断

登上去看到
SNP1 started with pid=207
Fri Jan  2 23:31:30 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snp1_91196_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:32:36 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snp2_251096_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:34:27 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snp4_148778_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:34:41 2009
Restarting dead background process SNP0
SNP0 started with pid=13
Fri Jan  2 23:34:41 2009
Restarting dead background process SNP3
SNP3 started with pid=16
Fri Jan  2 23:34:41 2009
Restarting dead background process SNP4
SNP4 started with pid=35
Fri Jan  2 23:34:45 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snp0_74068_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:35:47 2009
Restarting dead background process SNP2
SNP2 started with pid=113
Fri Jan  2 23:35:49 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snp3_48338_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:36:53 2009
Restarting dead background process SNP1
SNP1 started with pid=104
Fri Jan  2 23:36:56 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snp1_151260_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:37:59 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snpb_195544_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:39:04 2009
Restarting dead background process SNPB
SNPB started with pid=104
Fri Jan  2 23:39:08 2009
Errors in file /oracle/app/oracle/admin/database_sid/bdump/snpb_195570_database_sid.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Fri Jan  2 23:39:37 2009

看来就是job出了问题导致不断的产生trace文件,才是文件系统不断暴涨。
再看,发现这个问题已经在5天之前就出现了。

4.新的疑问

按道理,如果job失败的话,失败16次就会被broken,而这里的job重启了这么多次还在不断的重启,不断地产生trace文件.

SQL> select count(FAILURES) from dba_jobs_running;

COUNT(FAILURES)
---------------
              121

不断检查发现job重启并不增加job失败的次数.

5.进一步检查

打开trc文件发现都是如下内容:
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Current SQL statement for this session:
select .... from a,b,c,d,e where ......
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+00fc          bl       ksedst               100000001 ?
ssexhd+0284          bl       ksedmp               300000067 ?
0000480C             ?        00000000
ddfnet2Normal+022c   bl       ddfnetCFull          1101F4B58 ? 100000001 ?
                                                   1101FDB60 ? FFFFFFFFFFF74B0 ?
                                                   70000023D5B4500 ? 000000000 ?
                                                   000000001 ? 000000000 ?
kkmtab+0840          bl       ddfnet2Normal        110012BE0 ? 000000000 ?
                                                   700000216298118 ?
kkmprfro+0078        bl       kkmtab               100A39620 ? 000000001 ?
                                                   FFFFFFF00000005 ? 00000133A ?
                                                   FFFFFFFFFFF7710 ?
kkmdrv+03a4          bl       kkmprfro             FFFFFFFFFFF78C0 ? 101C29DA8 ?
                                                   FFFFFFFFFFF77E0 ? 110011718 ?
                                                   100A12E0C ? 100000001 ?
kkmevw+064c          bl       kkmdrv               000000010 ? 100000001 ?
                                                   10183CC6C ? 000000000 ?
kkmtab+0e34          bl       kkmevw               FFFFFFFFFFFA1A0 ?
                                                   700000224D0C8D8 ?
                                                   70000022265ADF0 ?
                                                   135000000000003 ? 000000000 ?
kkmprfro+0078        bl       kkmtab               11000D0A0 ? 700000204A7BEF0 ?
                                                   70000020DE4F6A8 ?
                                                   7000001E8B884E0 ? 000000000 ?
kkmdrv+03a4          bl       kkmprfro             000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? FFFFFFFFFFFA1A0 ?
opiprs+0c64          bl       kkmdrv               000000024 ? 100000001 ?
                                                   101C389D4 ?
                                                   44202482FFFF7F20 ?
kksald+03c8          bl       opiprs               FFFFFFFFFFFA1A0 ?
                                                   FFFFFFFFFFFD030 ? 000000347 ?
                                                   70000020A9C5718 ?
                                                   FFFFFFFFFFF8780 ?
                                                   2220242200000060 ?
                                                   101774AC4 ?
rpiswu2+02bc         bl       _ptrgl
kkslod+0fd8          bl       rpiswu2              7000001E897B1B8 ? 000000000 ?
                                                   70000020A9C5D78 ? 200000002 ?
                                                   70000020A9C5E88 ? 000000000 ?
                                                   70000020A9C5DB4 ? 000000000 ?
kglobld+0318         bl       _ptrgl
kglobpn+0458         bl       kglobld              11000F378 ? 700000216298008 ?
                                                   FFFFFFFFFFFAFF8 ?
                                                   700000230E3CD60 ?
                                                   7000001F7EADF38 ?
kglpim+0184          bl       kglobpn              000000000 ? 700000228C3CED8 ?
                                                   FFFFFFFFFFFA9A0 ?
                                                   70000023C8956E0 ?
kglpin+092c          bl       kglpim               11000F378 ? FFFFFFFFFFFAFF8 ?
                                                   700000230E3CD60 ?
kksfbc+1d18          bl       kglpin               11000F378 ? FFFFFFFFFFFAFF8 ?
                                                   70000023C8956E0 ?
                                                   700000228C3CE78 ? 300000000 ?
                                                   1017903D8 ? 1101F4BF8 ?
kkspsc0+0728         bl       kksfbc               1101F4B58 ? 300000003 ?
                                                   800000008 ? FFFFFFFFFFFD030 ?
                                                   000000347 ? 70000023AFBC3E8 ?
                                                   000000000 ? 000000000 ?
opiosq0+087c         bl       kkspsc0              1101F25A8 ? FFFFFFFFFFFD030 ?
                                                   000000347 ? 300000003 ?
                                                   600000006 ? 400000000000004 ?
                                                   000000000 ?
kpooprx+0130         bl       opiosq0              3FFFFC5C0 ?
                                                   42202022101F4B58 ?
                                                   100121C34 ? 4FFFFFFFFFFD550 ?
kpoal8+02d0          bl       kpooprx              FFFFFFFFFFFD554 ?
                                                   FFFFFFFFFFFD030 ?
                                                   34600000346 ? 100000001 ?
                                                   000000000 ? 000007FFF ?
                                                   00000010E ? 000000006 ?
opiodr+06bc          bl       _ptrgl
ttcpip+0a74          bl       _ptrgl
opitsk+06d8          bl       ttcpip               57B0000057B ? 11010FA60 ?
                                                   FFFFFFFFFFFE248 ? 0000007FF ?
                                                   000000800 ? 000000000 ?
                                                   000000001 ? 000012D81 ?
opiino+0670          bl       opitsk               110013C88 ?
opiodr+06bc          bl       _ptrgl
opidrv+056c          bl       opiodr               3C00000002 ? 4200FE590 ?
                                                   FFFFFFFFFFFFE00 ? 000000000 ?
sou2o+0028           bl       opidrv               3C0000003C ? 400000004 ?
                                                   FFFFFFFFFFFFE00 ?
main+0128            bl       sou2o                90000000022C5B0 ?
                                                   9FFFFFFF0009E00 ?
                                                   FFFFFFFFFFFFE10 ?
                                                   9001000A0214428 ?
__start+0090         bl       main                 000000000 ? 000000000 ?
----- Argument/Register Address Dump -----

6.问题定位

尝试单独执行trace中的sql语句报同样错误
前台
ORA-03113: 通信通道的文件结束 
后台
ORA-07445: exception encountered: core dump [] [] [] [] [] []

由此定位问题由该语句引起,而和是否在job中没有关系

由于语句涉及多个表,所以不断对各个表进行各种操作将问题定位到表a

继续查看表a 发现它根本不是一个表 而是一个视图 类似如下
select ... from f.g@h

schema f下的表g的select 权限授予了db link h所在的用户,其中远程数据库版本9.2.0.4

7.该死的bug

再次回到trace文件发现与一个bug基本相符

Bug 2610353 - Dump / memory corruption possible from remote select

Dump / memory corruption can occur when doing a select
of a remote table if the remote table contains "hidden"
columns (such as SYS_NC_OID$ , SYS_NC_ROWINFO$ etc).

During select of remote table the local session performs
a describe of the table then checks whether any indexes
reference any columns of that table.
 
Workaround:
  Remove the index/s on the table (may not be feasible)

查看该补丁仅在solaris的9以上的版本才有对应的补丁
无奈只能建议客户尝试drop掉函数索引了,但即使drop掉函数索引也may not be feasible。

8附:

8.1.将一台版本为9.2.0.4的数据库做为本地库连接远程数据库没有该问题出现。
8.2.将8.1.7.4的库做为本地库,将另外一台9.2.0.4的数据库做为远程数据库,建立测试表和测试的函数索引没能重现该问题。

看来这个问题并不总是出现,而且oracle还告诉我们除了升级还并不总是能够解决,oh,yeah,神奇的oracle !!!

(需要引用, 请注明出处:痴情甲骨文http://space.itpub.net/14130873)

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/14130873/viewspace-526840/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2008-11-25

  • 博文量
    93
  • 访问量
    347732