ITPub博客

首页 > Linux操作系统 > Linux操作系统 > Oracle错误处理思路(一)

Oracle错误处理思路(一)

原创 Linux操作系统 作者:zhanglincon 时间:2011-08-05 17:48:08 0 删除 编辑

思路一:
1,发现错误(从alert文件以及错误对应的trace文件)
2,到metalink上查询该错误号
3,根据metalink的描述开始猜测,把可能性罗列出来
4,通过重现问题来验证3中列出的可能性
5,找到原因后解决问题.

下面针对Ora-600(17069)错误的解决过程详细说明以上思路,该例来自杨廷坤大师.

详细的错误信息为:

Fri Feb 20 08:16:44 2009
Errors in file /u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc:
ORA-00600: internal error code, arguments: [17069], [0x6A5DEE1E0], [], [], [], [], [], []
Fri Feb 20 08:16:47 2009
Errors in file /u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc:
ORA-00600: internal error code, arguments: [17069], [0x6A5DEE1E0], [], [], [], [], [], []

进一步检查对应的trace文件:

bash-2.03$ more /u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc
/u1/oracle/admin/repdb01/bdump/repdb01_j015_5099.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /data/oracle/product/920
System name: SunOS
Node name: newreport
Release: 5.8
Version: Generic_117350-26
Machine: sun4u
Instance name: repdb01
Redo thread mounted by this instance: 1
Oracle process number: 35
Unix process pid: 5099, image: oracle@newreport (J015)

*** SESSION ID:(12.28191) 2009-02-20 08:16:44.060
*** 2009-02-20 08:16:44.060
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17069], [0x6A5DEE1E0], [], [], [], [], [], []
Current SQL statement for this session:
DECLARE job BINARY_INTEGER := :job; next_date DATE := :mydate; broken BOOLEAN := FALSE; BEGIN P_GENERATE_REPDATA('FR20T000002000000
0000032'); :mydate := next_date; IF broken THEN :b := 1; ELSE :b := 0; END IF; END;
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+328 CALL ksedst()+0 FFFFFFFF7FFF6430 ?
000000000 ? 000000000 ?
00000003E ?
FFFFFFFF7FFF6CC8 ?
1031D56C8 ?
kgeriv()+208 PTR_CALL 0000000000000000 000000000 ? 000103400 ?
0001035D9 ? 000102C00 ?
1035D9000 ? 1035D9C28 ?
kgesiv()+108 CALL kgeriv()+0 1035D9E88 ? 1036C7148 ?
000000258 ? 0000013C8 ?
FFFFFFFF7FFF7608 ?
1035DB258 ?
kgesic1()+32 CALL kgesiv()+0 1035D9E88 ? 1036C7148 ?
0000042AD ? 000000001 ?
FFFFFFFF7FFF7608 ?
004000000 ?
kglgob()+1972 CALL kgesic1()+0 1035D9E88 ? 1036C7148 ?
0000042AD ? 000000002 ?
6A5DEE1E0 ? 0000010A0 ?
kgldpo()+524 CALL kglgob()+0 000000000 ? 000000000 ?
6A5DEE1E0 ?
FFFFFFFF7FFF77A8 ?
000080000 ? 000000010 ?
kgldon()+248 CALL kgldpo()+0 000000000 ? 000000000 ?
69A79DA60 ? 000000001 ?
000000002 ?
FFFFFFFF7FFF7BEE ?
pkldon()+108 CALL kgldon()+0 1035D9E88 ?
FFFFFFFF7FFF7DE0 ?
69A79DA60 ? 000000001 ?
000000000 ?
FFFFFFFF7FFF7D8E ?
pkloud()+204 CALL pkldon()+0 FFFFFFFF7FFFA1A0 ?
FFFFFFFF7FFF7DE0 ?
69A79DA60 ? 000000001 ?
000000000 ?
FFFFFFFF7FFF7D8E ?
phnnrl_name_resolve CALL pkloud()+0 1033FCA90 ?
_by_loading()+280 FFFFFFFF7FFF7E3C ?
000000000 ? 000000000 ?
000030000 ? 6A475CE18 ?
phngdl_get_defining CALL phnnrl_name_resolve 000000000 ? 000020015 ?
_libunit()+124 _by_loading()+0 FFFFFFFF7FFF9830 ?
FFFFFFFF7FFF8160 ?
000020015 ? 000000000 ?
phnrpls_resolve_pre CALL phngdl_get_defining FFFFFFFF7FFF9830 ?
fix_libscope()+12 _libunit()+0 FFFFFFFF7FFF85A8 ?
FFFFFFFF7FFF843C ?
000000000 ? 000000000 ?
000000000 ?

无论是从trace文件对应的名称,还是从trace文件中对应的语句都可以确定,引起问题的是一个JOB

检查metalinkOracle在文档Doc ID: 39616.1中对这个错误的已知bug,进行了汇总,不过这些bug的描述似乎没有和当前十分相符的。

查看文档的描述,发现ORA-600错误的第二个参数,这里是0x6A5DEE1E0,代表Library Cache Object Handle

看来问题可能和LATCH有关。

但是根据信息在V$LATCHV$LATCH_CHILDREN视图中,没有找到有价值的信息。

这个JOB由于失败会自动再次执行,检查JOB运行时的V$LOCK信息:

SQL> SELECT ADDR, TYPE, ID1, ID2, LMODE, REQUEST, BLOCK
2 FROM V$LOCK
3 WHERE SID = 75;

ADDR TY ID1 ID2 LMODE REQUEST BLOCK
---------------- -- ---------- ---------- ---------- ---------- ----------
0000000690342780 CU -1.703E+09 6 6 0 0
00000006903426F8 JQ 0 63 6 0 0

V$LOCK中看不到什么特别有价值的信息,接着检查V$SESSION_WAIT,看看这个JOB在等待什么:

SQL> SELECT EVENT, P1TEXT, P1RAW, P2TEXT, P2RAW, STATE
2 FROM V$SESSION_WAIT
3 WHERE SID = 75;

EVENT P1TEXT P1RAW P2TEXT P2RAW STATE
----------------- --------------- ---------------- ------------ ---------------- -------
library cache pin handle address 00000006A5DEE1E0 pin address 00000006B1A971A8 WAITING

这次的信息就明显了,ORA-600错误的第二个参数就是V$SESSION_WAIT视图的P1RAW的值,而且从等待事件上也可以看到,问题就是出现在LIBRARY CACHE PIN的过程中。

重新查看METALINK的信息,这个错误可能发生在一个长时间运行的进程,在其运行过程中,所依赖的对象被编译或者删除了。

检查JOB调用的过程的状态:

SQL> SELECT OWNER, OBJECT_NAME, OBJECT_TYPE, STATUS
2 FROM DBA_OBJECTS
3 WHERE WNER = 'FUJIANREP'
4 AND OBJECT_NAME = 'P_GENERATE_REPDATA';

OWNER OBJECT_NAME OBJECT_TYPE STATUS
------------------------------ ------------------------------ ------------------ -------
FUJIANREP P_GENERATE_REPDATA PROCEDURE INVALID

果然问题过程处于不正常的状态。

JOB至于BROKEN状态,避免JOB再次运行:

SQL> EXEC DBMS_JOB.BROKEN(63, TRUE)

PL/SQL procedure successfully completed.

SQL> COMMIT;

Commit complete.

杀掉JOB对应的PROCESS

SQL> SELECT SPID FROM V$PROCESS WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = 75);

SPID
------------
14927

SQL> HOST kill -9 14927

下面用重新编译该过程:

SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE;
ALTER PROCEDURE P_GENERATE_REPDATA COMPILE
*
ERROR at line 1:
ORA-04021: timeout occurred while waiting to lock object FUJIANREP.P_GENERATE_REPDATA

由于从V$LOCKV$LATCH无法得到信息,只能看看有没有其他人当前在访问P_GENERATE_REPDATA所依赖的对象:

SQL> SELECT * FROM V$ACCESS
2 WHERE (OWNER, OBJECT) IN
3 (SELECT REFERENCED_OWNER, REFERENCED_NAME
4 FROM DBA_DEPENDENCIES
5 WHERE WNER = 'FUJIANREP'
6 AND NAME = 'P_GENERATE_REPDATA');

SID OWNER OBJECT TYPE
---------- ------------------------------ ------------------------------ ------------------------
54 FUJIANREP CAT_BUYER SYNONYM
54 FUJIANREP CAT_CATEGORY SYNONYM
54 FUJIANREP CAT_DOSEAGE_FORM. SYNONYM
54 FUJIANREP CAT_DRUG SYNONYM
54 FUJIANREP CAT_ENTERPRISE SYNONYM
54 FUJIANREP CAT_METRIC SYNONYM
54 FUJIANREP CAT_ORG SYNONYM
54 FUJIANREP CAT_PRODUCT SYNONYM
54 FUJIANREP CAT_QUALITY_DEFINE SYNONYM
54 FUJIANREP GOV_CAT_BUYER TABLE
54 FUJIANREP GOV_CAT_ENTERPRISE TABLE
54 FUJIANREP GOV_S_MO_BU TABLE
54 FUJIANREP GOV_S_MO_BU_EN TABLE
54 FUJIANREP GOV_S_MO_BU_PR TABLE
54 FUJIANREP GOV_S_MO_EN TABLE
54 FUJIANREP GOV_S_MO_ME TABLE
54 FUJIANREP GOV_S_MO_ME_CA TABLE
54 FUJIANREP GOV_S_MO_ME_PR TABLE
54 FUJIANREP GOV_S_MO_ORDER TABLE
54 FUJIANREP GOV_S_YE_ORDER TABLE
54 FUJIANREP GRP_HOSPITAL TABLE
54 FUJIANREP GRP_LEVEL TABLE
54 FUJIANREP ORD_ORDER TABLE
54 FUJIANREP ORD_ORDER_ITEM TABLE
54 FUJIANREP ORD_ORDER_ITEM_REP CURSOR
54 FUJIANREP ORD_ORDER_RECEIVE TABLE
54 FUJIANREP ORD_ORDER_RECEIVE_REP SYNONYM
54 FUJIANREP ORD_ORDER_REP CURSOR
54 FUJIANREP ORD_ORDER_RETURN TABLE
54 FUJIANREP ORD_ORDER_RETURN_REP CURSOR
54 FUJIANREP PLT_PLAT CURSOR
54 FUJIANREP USER_TAB_PARTITIONS CURSOR
54 NDMAIN CAT_BUYER TABLE
54 NDMAIN CAT_CATEGORY TABLE
54 NDMAIN CAT_DOSEAGE_FORM. TABLE
54 NDMAIN CAT_DRUG TABLE
54 NDMAIN CAT_ENTERPRISE TABLE
54 NDMAIN CAT_METRIC TABLE
54 NDMAIN CAT_ORG TABLE
54 NDMAIN CAT_PRODUCT TABLE
54 NDMAIN CAT_QUALITY_DEFINE TABLE
54 NDMAIN ORD_ORDER VIEW
54 NDMAIN ORD_ORDER_ITEM VIEW
54 NDMAIN ORD_ORDER_RECEIVE VIEW
54 NDMAIN ORD_ORDER_RETURN VIEW
54 NDMAIN PLT_PLAT TABLE
54 PUBLIC USER_TAB_PARTITIONS SYNONYM
54 SYS STANDARD PACKAGE
145 SYS STANDARD PACKAGE
54 SYS SYS_STUB_FOR_PURITY_ANALYSIS PACKAGE
54 SYS USER_TAB_PARTITIONS VIEW

51 rows selected.

对象果然被其他人所访问,看看这个会话在做什么:

SQL> SELECT SID, SERIAL#, USERNAME, PROGRAM, TERMINAL
2 FROM V$SESSION
3 WHERE SID = 54;

SID SERIAL# USERNAME PROGRAM TERMINAL
---------- ---------- ------------------------------ ------------ ----------
54 26216 FUJIANREP PlSqlDev.exe LIBY

没想到是同事的连接的会话,看看他在干什么:

SQL> SELECT SQL_TEXT FROM V$SQL
2 WHERE ADDRESS IN
3 (SELECT SQL_ADDRESS FROM V$SESSION
4 WHERE SID = 54);

SQL_TEXT
---------------------------------------------------------------------
ALTER TABLE GOV_S_MO_EN TRUNCATE PARTITION P200901

居然是TRUNCATE分区操作,难怪会导致过程处于INVALID状态,不过这个操作应该不会持续很长时间的,难道这个操作一直没有完成吗:

SQL> SELECT EVENT, P1TEXT, P1, P2TEXT, P2, P3TEXT, P3, SECONDS_IN_WAIT
2 FROM V$SESSION_WAIT WHERE SID = 54;

EVENT P1TEXT P1 P2TEXT P2 P3TEXT P3 SECONDS_IN_WAIT
------------------------- ------- ---- -------- -------- -------- ---- ---------------
db file sequential read file# 1 block# 170158 blocks 1 3995643

这个等待已经发生了几十天了,显然这是一个僵死的会话。

从后台kill掉对应的进程:

SQL> SELECT SPID FROM V$PROCESS
2 WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = 54);

SPID
------------
12974

SQL> HOST kill -9 12974

切换为FUJIANREP用户,再次编译过程:

SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE;

Procedure altered.

至此,问题解决。将JOB重新设置BROKEN即可。

SQL> EXEC DBMS_JOB.BROKEN(63, FALSE)

PL/SQL procedure successfully completed.

SQL> COMMIT;

Commit complete.

列出Metalink:Note:146580.1

ORA-600 [17069] "Failed to pin a library cache object after 50 attempts" [ID 39616.1]

修改时间 08-JUL-2011 类型 REFERENCE 状态 PUBLISHED


Note: For additional ORA-600 related information please read Note:146580.1

PURPOSE:            
  This article discusses the internal error "ORA-600 [17069]", what 
  it means and possible actions. The information here is only applicable 
  to the versions listed and is provided only for guidance.
 
ERROR:              
#helpWin {position:absolute; width:400; visibility:hidden; background-color:#f7f7e7; padding:20px; border-width:2px; border-left-color:#336699; border-top-color:#336699; border-right-color:#336699; border-bottom-color:#336699; border-style.:solid; }
Help text goes here.

 
VERSIONS:           
  versions 7.0 and above
 
DESCRIPTION:        

  Failed to pin a library cache object after 50 attempts.
 
ARGUMENTS:          
  Arg [a] Library Cache Object Handle.
 
FUNCTIONALITY:      
  GENERIC LAYER
 
IMPACT:             
  PROCESS FAILURE
  NO DATA CORRUPTION
 
SUGGESTIONS:        

  It is possible to get this error on a long running process if the 
  dependent object has been dropped or recompiled.

  If the error occurs on a procedure, attempt to recompile the procedure.

  Reference Note:1058215.6 if receiving this error in release 7.3 while 
  using database links.

  For repeatable occurrences, please submit the trace files and alert.log 
  to Oracle Support for further analysis.

  Known issues:


function getdoc(docstr) { pos=docstr.indexOf(':'); docid=docstr.substring(pos+1,docstr.length); tgt="_self"; if ( location.href.slice(7,12) == "webiv" || location.href.slice(7,11) == "rmsu" ) { loc="Get?WwwID=note:" + docid; window.open(loc,tgt) return; } if ( location.href.slice(7,12) == "metal" || location.href.slice(7,12) == "suppo" ) { loc="showdoc?db=NOT&id=" + docid; window.open(loc,tgt) return; } else { loc="http://metalink.oracle.com/metalink/plsql/showdoc"+ "?db=NOT&id="+docid; window.open(loc,tgt) return; } } Known Bugs "+ "Other bugs may affect this version "+ " but have not been confirmed as being relevant yet."; } } } // ===== ACT ON URL ===== window.onload = SetVer; // ======================

You can restrict the list below to issues likely to affect one of the following versions by clicking the relevant button:



NB Bug Fixed Description
10138929 12.1.0.0 ORA-600 [17069] in replication receiver / self deadlock on replication internal package V10020002 V10020003 V10020004 V10020005 V11010006 V11010007 V11020001 V11020002 DEADLOCK HANG/DB OERI SYMREP
4587556 10.2.0.4, 11.1.0.6 OERI:17069 compiling a package with a self referencing synonym V10020002 V10020003 OERI PLSQL
I 10091698 10.2.0.5.3 Upgrade to 10.2.0.5 gets ORA-600 [17069] when running catupgrd.sql with MultiMedia (intermedia) installed V10020002 V10020003 V10020004 V10020005 IMAGE MIG NLS
5016165 OERI[17069] during 9.2 to 10.2 upgrade when running catupgrd.sql V10010005 MIG OERI
2894277 9.2.0.4, 10.1.0.2 OERI[17200] accessing 9i -> 8i remote object with BFILE column from PLSQL DBLINK ERROR INTEROP LOB OERI
2878379 9.2.0.8, 10.1.0.2 ALTER PACKAGE COMPILE can hang then OERI:17069 after many hours HANG OERI
2176969 9.0.1.4, 9.2.0.1 OERI:17069 possible on database OPEN after downgrade from 9.2 to 9.0 MIG OERI
  • '*' against a bug indicates that an alert exists for that issue.
  • '+' indicates a particularly notable bug.
  • 'P' indicates a port specific bug.
  • "OERI:nnnnn" is used as shorthand for ORA-600 [nnnnn].

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/8554499/viewspace-704304/,如需转载,请注明出处,否则将追究法律责任。

上一篇: JAVA RMI
下一篇: v$session_wait中p1 p2 p3应用总结分析
请登录后发表评论 登录
全部评论

注册时间:2009-03-24

最新文章