Core files keep being generated in the $ORACLE_HOME/dbs directory

Original post | Oracle | Author: kdhkdh | Date: 2014-01-21 14:09:39
Oracle version: Oracle 11.2.0.3
Platform: AIX 6.1
Symptom: core dump files keep being generated in the $ORACLE_HOME/dbs directory

Step 1: Analyze the core file
$ cd $ORACLE_HOME/dbs
$ ls
core0                  core14                 core2                  core8                  hc_circbjdg1.dat
core1                  core15                 core3                  core9                  init.ora
core10                 core16                 core4                  core_15270062          initcircbjdg1.ora
core11                 core17                 core5                  core_16384198          initcircbjdg1.ora.bak
core12                 core18                 core6                  core_16384200          orapwcircbjdg1
core13                 core19                 core7                  core_17367274          snapcf_circbjdg1.f
$ cd core_17367274
$ dbx 
Type 'help' for help.
enter object file name (default is `a.out', ^D to exit): $ORACLE_HOME/bin/oracle core
cannot read $ORACLE_HOME/bin/oracle core
enter object file name (default is `a.out', ^D to exit): ^C$ 
$ dbx $ORACLE_HOME/bin/oracle core
Type 'help' for help.
[using memory image in core]
reading symbolic information ...


IOT/Abort trap in pthread_kill at 0x9000000004efa30 ($t1)
0x9000000004efa30 (pthread_kill+0xb0) e8410028          ld   r2,0x28(r1)
(dbx) where
pthread_kill(??, ??) at 0x9000000004efa30
_p_raise(??) at 0x9000000004ef2a8
raise.raise(??) at 0x90000000002c2ac
abort() at 0x90000000007d084
skgdbgcra(??) at 0x1008c45f0
sksdbgcra(??, ??) at 0x1028b1160
ksdbgcra() at 0x1028b0b30
ssexhd(??, ??, ??) at 0x10299d3a8
ksmpclrpga() at 0x101e2d9f0
opidcl(??, ??) at 0x10781dd84
opidrv(??, ??, ??) at 0x10781d660
sou2o(??, ??, ??, ??) at 0x1078131e8
opimai_real(??, ??) at 0x10000089c
ssthrdmain(??, ??) at 0x1000ee84c
main(??, ??) at 0x10000064c
(dbx) quit
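
With this many core directories, repeating the interactive session by hand gets tedious. The following is a minimal sketch, not part of the original session, that feeds the same where and quit subcommands to dbx on standard input for every core_* directory. The function name dump_core_stacks and the DBX override variable are hypothetical helpers. It assumes that each directory holds a file named core, as in the session above, and that dbx reads subcommands from standard input when it is not attached to a terminal.

```shell
#!/bin/sh
# Debugger to invoke; overridable so the loop can be dry-run without dbx.
DBX="${DBX:-dbx}"

# dump_core_stacks DBS_DIR ORACLE_BINARY
# Print the call stack of every DBS_DIR/core_*/core file.
dump_core_stacks() {
    dbs_dir="$1"
    binary="$2"
    for dir in "$dbs_dir"/core_*; do
        # Skip entries that are not directories containing a file named "core".
        [ -f "$dir/core" ] || continue
        echo "== $dir/core"
        # Pass both paths on the command line and the subcommands on stdin.
        printf 'where\nquit\n' | "$DBX" "$binary" "$dir/core" 2>&1
    done
}

# Possible usage (paths taken from the session above):
#   dump_core_stacks "$ORACLE_HOME/dbs" "$ORACLE_HOME/bin/oracle" > /tmp/core_stacks.txt
```

Both paths are passed as command-line arguments because the shell expands $ORACLE_HOME there; the failed attempt in the session above shows that the interactive "enter object file name" prompt does not.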

Step 2: Find the Oracle bug number
Bug 13808372 : CORE GENERATED IN ORACLE_HOME/DBS

Bug attributes

 

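Type: B - Defect
Fixed in product version: (none listed)
Severity: 2 - Severe Loss of Service
Product version: 11.2.0.3
Status: 95 - Closed, Vendor OS Problem
Platform: 212 - IBM AIX on POWER Systems (64-bit)
Created: 2012-3-5
Platform version: 6.1
Updated: 2013-11-4
Base bug: N/A
Database version: 11.2.0.3
Affects platforms: Generic
Product source: Oracle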
 

Related products

 

Product line: Oracle Database Products
Family: Oracle Database
Area: Oracle Database
Product: 5 - Oracle Database - Enterprise Edition
Hdr: 13808372 11.2.0.3 RDBMS 11.2.0.3 UNKNOWN PRODID-5 PORTID-212
Abstract: CORE GENERATED IN ORACLE_HOME/DBS
*** 03/05/12 12:30 am ***
PROBLEM:
--------
CORE FILE ARE GENERATED IN $ORACLE_HOME/dbs 
DATABASE Version is 11.2.0.3 
 
$ ls -l 
drwxrwx---   74 oracle   dba-4096 Feb 10 13:24 ..
drwxr-x---    2 oracle   dba-256 Mar 02 11:51 core_4260040
drwxr-x---    2 oracle   dba-256 Mar 02 11:52 core_7340074
drwxr-x---    2 oracle   dba-256 Mar 02 11:52 core_6225922
drwxr-x---    2 oracle   dba-256 Mar 02 11:52 core_17236164
drwxr-x---    2 oracle   dba-256 Mar 02 12:00 core_9175102
drwxr-x---    2 oracle   dba-256 Mar 02 12:00 core_7798936
drwxr-x---    2 oracle   dba-256 Mar 02 12:02 core_9502832
drwxr-x---    2 oracle   dba-256 Mar 02 12:02 core_13697132
drwxr-x---    2 oracle   dba-256 Mar 02 12:11 core_7667918
drwxr-x---    2 oracle   dba-256 Mar 02 12:12 core_5570702
drwxr-x---    2 oracle   dba-256 Mar 02 12:15 core_12714056
drwxr-x---    2 oracle   dba-256 Mar 02 12:21 core_11272362
drwxr-x---    2 oracle   dba-256 Mar 02 12:22 core_5570670
drwxr-x---    2 oracle   dba-256 Mar 02 12:31 core_12714106
drwxr-x---    2 oracle   dba-256 Mar 02 12:32 core_9175240
drwxr-x---    2 oracle   dba-256 Mar 02 12:41 core_7995586
drwxr-x---    2 oracle   dba-256 Mar 02 12:42 core_8978624
drwxr-x---    2 oracle   dba-256 Mar 02 12:51 core_14942342
drwxr-x---    2 oracle   dba-256 Mar 02 12:52 core_6029324
drwxr-x---    2 oracle   dba-256 Mar 02 12:52 core_7733320
............
 
CT say there is no error such lke ora-7445 in alertlog.
 
DIAGNOSTIC ANALYSIS:
--------------------
drwxr-xr-x  462 oracle   dba-24576 Feb 27 17:44 ..
-rw-r-----    1 oracle   dba-13298965 Feb 27 17:44 core
[DGQIS01] oracle@gqmdbd01:/ora_engine/1120/dbs/core_15270044 $ file core
core: AIX core file fulldump 64-bit, oracle
 
[DGQIS01] oracle@gqmdbd01:/ora_engine/1120/dbs/core_4260040 $ dbx $ORACLE_HOME/bin/oracle core
Type 'help' for help.
[using memory image in core]
reading symbolic information ...
 
IOT/Abort trap in pthread_kill at 0x9000000004efa30 ($t1)
0x9000000004efa30 (pthread_kill+0xb0) e8410028          ld   r2,0x28(r1)
(dbx) where
pthread_kill(??, ??) at 0x9000000004efa30
_p_raise(??) at 0x9000000004ef2a8
raise.raise(??) at 0x90000000002c2ac
abort() at 0x90000000007d084
skgdbgcra(??) at 0x1008c45f0
sksdbgcra(??, ??) at 0x102db2440
ksdbgcra() at 0x102db1e10
ssexhd(??, ??, ??) at 0x102e9b5a8
.() at 0x0
dbgerEvaluateRules(??, ??, ??) at 0x1006d1610
dbgerEvaluateRules(??, ??, ??) at 0x1006d1610
dbgexPhaseII(??, ??, ??) at 0x1002c30b4
dbgexExplicitEndInc(??, ??) at 0x1002c429c
dbgeEndDDEInvocationImpl(??, ??) at 0x10015ec20
dbgeEndDDEInvocation(??) at 0x10015e930
ssexhd(??, ??, ??) at 0x102e9b4bc
.() at 0x0
dbgerEvaluateRules(??, ??, ??) at 0x1006d1610
dbgerEvaluateRules(??, ??, ??) at 0x1006d1610
dbgexPhaseII(??, ??, ??) at 0x1002c30b4
dbgexExplicitEndInc(??, ??) at 0x1002c429c
dbgeEndDDEInvocationImpl(??, ??) at 0x10015ec20
dbgeEndDDEInvocation(??) at 0x10015e930
ssexhd(??, ??, ??) at 0x102e9b4bc
ksmpclrpga() at 0x101e2d620
opidcl(??, ??) at 0x107587224
opidrv(??, ??, ??) at 0x107586b00
sou2o(??, ??, ??, ??) at 0x10757c688
opimai_real(??, ??) at 0x10000089c
ssthrdmain(??, ??) at 0x1000ee84c
main(??, ??) at 0x10000064c
(dbx) quit 
 
WORKAROUND:
-----------
n/a
 
RELATED BUGS:
-------------
i check the known issue like below
but CT does not use EM/GRID CONTROL Agent & RMAN, TSM.
 
++ RMAN Core Dumps With TSM Client 6.x (Doc ID 1248324.1)
++ RMAN Creating Core Dump Files in $ORACLE_HOME/dbs (Doc ID 1275194.1)
++ Core Files Generated Under $ORACLE_HOME/dbs Directory (Doc ID 1327258.1)
 
REPRODUCIBILITY:
----------------
YES, EVERY DAY
 
TEST CASE:
----------
N/A
 
STACK TRACE:
------------
pthread_kill <- p_raise <- raise <- abort <- skgdbgcra
       <- sksdbgcra <- ksdbgcra <- ssexhd <- dbgerEvaluateRules <- 
dbgerEvaluateRules
        <- dbgexPhaseII <- dbgexExplicitEndInc <- dbgeEndDDEInvocationImpl <- 
dbgeEndDDEInvocation <- ssexhd
         <- dbgerEvaluateRules <- dbgerEvaluateRules <- dbgexPhaseII <- 
dbgexExplicitEndInc <- dbgeEndDDEInvocationImpl
          <- dbgeEndDDEInvocation <- ssexhd <- ksmpclrpga <- opidcl <- opidrv
           <- sou2o <- opimai_real <- ssthrdmain <- main
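
Comparing a local stack against the bug's STACK TRACE section by eye is error-prone. The following is a minimal sketch, assuming the dbx where output has been saved to a file, that collapses it into the same "a <- b <- c" form. The function name stack_signature is hypothetical. Frame names are printed exactly as dbx shows them, so some may differ slightly from the spelling in the bug text, and a matching chain is only a hint, not proof that the same bug applies.

```shell
#!/bin/sh
# stack_signature: read dbx "where" output on stdin and print the frame
# names, outermost last, joined by " <- ".
stack_signature() {
    awk -F'(' '
        # Frame lines look like "name(args) at 0xADDR"; skip the
        # unnamed ".() at 0x0" frames.
        / at 0x[0-9a-f]+$/ && $1 != "." {
            chain = (chain == "" ? $1 : chain " <- " $1)
        }
        END { print chain }
    '
}

# Possible usage, with where.txt holding a saved dbx session:
#   stack_signature < where.txt
```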
 
Step 3: Find the solution
Apply OS level patch IFIX IV09580 and relink the oracle software.
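1. Download the patch
Description of the IV09580 emergency fix (ifix): https://www-304.ibm.com/support/docview.wss?uid=isg1IV09580
Download location for the IV09580 emergency fix: ftp://public.dhe.ibm.com/aix/efixes/iv09580
2. Apply the IV09580 fix with the operating system's emgr command
Download the IV09580 emergency fix from the address above, then apply it with the following steps.
1). Preview the installation:
#emgr -p -e IV09580.epkg.Z
Proceed to the installation command only if the output shows INSTALL PREVIEW, SUCCESS.
2). Apply the emergency fix:
#emgr -e IV09580.epkg.Z
3). Check the installed fix: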
mzrac1@root[/]emgr -l
ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
=== ===== ========== ================= ========== ======================================
1    S    IV09580s01 06/27/12 21:55:58            Ifix for IV09580@6.1TL7SP1
STATE codes:
 S = STABLE
 M = MOUNTED
 U = UNMOUNTED
 Q = REBOOT REQUIRED
 B = BROKEN
 I = INSTALLING
 R = REMOVING
 T = TESTED
 P = PATCHED
 N = NOT PATCHED
 SP = STABLE + PATCHED
 SN = STABLE + NOT PATCHED
 QP = BOOT IMAGE MODIFIED + PATCHED
 QN = BOOT IMAGE MODIFIED + NOT PATCHED
 RQ = REMOVING + REBOOT REQUIRED
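
When the fix has to be verified on several nodes, the state column can be read out of the listing automatically. The following is a minimal sketch, assuming the emgr -l layout shown above (ID, STATE, LABEL as the first three columns). The function name ifix_state is hypothetical.

```shell
#!/bin/sh
# ifix_state LABEL_PREFIX
# Read "emgr -l" output on stdin and print the STATE code of the first
# fix whose LABEL starts with LABEL_PREFIX. Prints nothing if none matches.
ifix_state() {
    label_prefix="$1"
    awk -v p="$label_prefix" '
        # Data rows start with a numeric ID; header, separator and
        # legend lines do not.
        $1 ~ /^[0-9]+$/ && index($3, p) == 1 { print $2; exit }
    '
}

# Possible usage on the patched host:
#   state=$(emgr -l | ifix_state IV09580)
#   [ "$state" = "S" ] || echo "IV09580 state: ${state:-not installed}"
```

In the listing above the function would report S, which the legend decodes as STABLE.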
Step 4: Relink Oracle
For Oracle Grid Infrastructure (GI) 11.2 and later, some binaries in the GRID HOME must be relinked after an OS upgrade or an OS patch.
For the database software (RDBMS binaries), a relink is recommended after an OS upgrade or an OS patch. The same applies to the RAC binaries, which also need a relink.
The following is the relink procedure in an 11.2 cluster environment, covering both GI and RAC:

1. First stop all database instances on this node. Stopping CRS later would also stop the instances, but with shutdown abort; they should be stopped with shutdown immediate or normal instead:
$su - oracle
$srvctl stop instance -d <db_unique_name> -i <instance_name> -o immediate

2. If the business requires high availability, make sure the services on this instance have been switched to instances on other nodes.
srvctl status service -d <db_unique_name>

3. As root, run <GRID_HOME>/crs/install/rootcrs.pl -unlock to change the permissions of the relevant directories and stop GI:
[root@rac1 ~]# cd /u01/app/11.2.0/grid/crs/install
[root@rac1 install]# perl rootcrs.pl -unlock
Using configuration parameter file: ./crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.rac2.vip' on 'rac1'
CRS-2673: Attempting to stop 'ora.oc4j' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.cvu' on 'rac1'
CRS-2677: Stop of 'ora.rac2.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'rac1' succeeded
CRS-2677: Stop of 'ora.cvu' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.CRS.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.racdb.db' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.CRS.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.racdb.db' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.RECO.dg' on 'rac1'
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.RECO.dg' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /u01/app/11.2.0/grid
Note: if there are many audit files under $GRID_HOME/rdbms/audit, rootcrs.pl can take a long time to run. In that case, back up the $GRID_HOME/rdbms/audit/*.aud files to a location outside GRID_HOME and then delete them.

4. Disable automatic GI startup after an OS reboot. Upgrading or patching the OS may require a host reboot, so GI must be prevented from starting before the relink is done.
As root:
[root@rac1 install]# crsctl disable crs
CRS-4621: Oracle High Availability Services autostart is disabled.

5. Back up the GI and RDBMS ORACLE_HOMEs.

6. Upgrade or patch the OS, including a host reboot if required.

7. Relink the GI binaries as the GI owner:
[root@rac1 audit]# su - grid
[grid@rac1 ~]$ export ORACLE_HOME=/u01/app/11.2.0/grid
Make sure GI is stopped, then run relink:
[grid@rac1 ~]$ ps -ef|grep d.bin
grid      3408  3360  0 17:09 pts/0    00:00:00 grep d.bin
[grid@rac1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@rac1 ~]$ $ORACLE_HOME/bin/relink
writing relink log to: /u01/app/11.2.0/grid/install/relink.log
[grid@rac1 ~]$ <=== relink prints no message when it finishes; it only returns to the prompt.
Check /u01/app/11.2.0/grid/install/relink.log for errors. The last few lines are excerpted below:
...
 - Linking Oracle
rm -f /u01/app/11.2.0/grid/rdbms/lib/oracle
gcc  -o /u01/app/11.2.0/grid/rdbms/lib/oracle -m64 -L/u01/app/11.2.0/grid/rdbms/lib/ -L/u01/app/11.2.0/grid/lib/ - ... lsnls11 -lnls11 -lcore11 -lnls11 -lasmclnt11 -lcommon11 -lcore11 -laio    `cat /u01/app/11.2.0/grid/lib/sysliblist` -Wl,-rpath,/u01/app/11.2.0/grid/lib -lm    `cat /u01/app/11.2.0/grid/lib/sysliblist` -ldl -lm   -L/u01/app/11.2.0/grid/lib
test ! -f /u01/app/11.2.0/grid/bin/oracle ||\
           mv -f /u01/app/11.2.0/grid/bin/oracle /u01/app/11.2.0/grid/bin/oracleO
mv /u01/app/11.2.0/grid/rdbms/lib/oracle /u01/app/11.2.0/grid/bin/oracle
chmod 6751 /u01/app/11.2.0/grid/bin/oracle

8. Relink the database binaries as the RDBMS owner:
su - oracle
Make sure $ORACLE_HOME is set to the database ORACLE_HOME, then run:
[oracle@rac1 ~]$ $ORACLE_HOME/bin/relink all
writing relink log to: /u01/app/oracle/product/11.2.0/dbhome_1/install/relink.log
<=== relink prints no message when it finishes; it only returns to the prompt.
Check /u01/app/oracle/product/11.2.0/dbhome_1/install/relink.log for errors.
Excerpt from relink.log:
Starting Oracle Universal Installer... <<<<<< beginning
...
le/product/11.2.0/dbhome_1/lib/sysliblist` -ldl -lm   -L/u01/app/oracle/product/11.2.0/dbhome_1/lib
test ! -f /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle ||\
           mv -f /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracleO
mv /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/lib/oracle /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle <<<<<< end

9. As root, run <GRID_HOME>/crs/install/rootcrs.pl -patch to reset the permissions of the relevant directories and start GI:
[root@rac1 ~]# cd /u01/app/11.2.0/grid/crs/install
[root@rac1 install]# perl rootcrs.pl -patch
Using configuration parameter file: ./crsconfig_params
CRS-4123: Oracle High Availability Services has been started.

10. Enable CRS so that GI starts automatically after a host reboot:
[root@rac1 install]# crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.

11. Confirm that all resources that should be running have started:
[root@rac1 install]#  crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.DATA.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.RECO.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.asm
               ONLINE  ONLINE       rac1                     Started
               ONLINE  ONLINE       rac2                     Started
ora.gsd
               OFFLINE OFFLINE      rac1
               OFFLINE OFFLINE      rac2
ora.net1.network
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.ons
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2
ora.cvu
      1        ONLINE  ONLINE       rac2
ora.oc4j
      1        ONLINE  ONLINE       rac2
ora.rac1.vip
      1        ONLINE  ONLINE       rac1
ora.rac2.vip
      1        ONLINE  ONLINE       rac2
ora.racdb.db
      1        ONLINE  ONLINE       rac2                     Open
      2        OFFLINE OFFLINE                               Instance Shutdown
ora.scan1.vip
      1        ONLINE  ONLINE       rac2
If an instance has not started, start it manually:
$srvctl start instance -d <db_unique_name> -i <instance_name>

12. The method in the following MOS note can be used to confirm that the oracle binary is RAC-enabled:
How to Check Whether Oracle Binary/Instance is RAC Enabled and Relink Oracle Binary in RAC [ID 284785.1]
Method 1: if the following command finds kcsm.o, the binary is RAC-enabled:
su - oracle
$ar -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o
kcsm.o
On AIX the command is different:
ar -X32_64 -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o
Method 2: check whether RAC-specific background processes exist, for example:
[grid@rac1 ~]$ ps -ef|grep lmon
grid      7732     1  0 17:59 ?        00:00:17 asm_lmon_+ASM1
oracle   18605     1  0 20:49 ?        00:00:00 ora_lmon_RACDB1 <===========
grid     20992 10160  0 21:10 pts/2    00:00:00 grep lmon

All of the steps above must be carried out on each node of the cluster in turn.
The GI relink procedure above comes from the section "Do I need to relink the Oracle Clusterware / Grid Infrastructure home after an OS upgrade?" of the following MOS note:
RAC: Frequently Asked Questions [ID 220970.1]
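
Because relink returns to the prompt silently, the log check in steps 7 and 8 is easy to skip. The following is a minimal sketch of two helpers for it. The function names relink_log_is_clean and is_rac_binary are hypothetical, and the search patterns (error, fatal, undefined symbol) are assumptions about what a failed link typically prints, not an official list, so a clean result should still be followed by a manual look at the log. The ar commands are the ones quoted from MOS note 284785.1 above.

```shell
#!/bin/sh
# relink_log_is_clean LOGFILE
# Return 0 if no suspicious line is found, 1 if any is found (the
# matching lines are printed with line numbers), 2 if the log is unreadable.
relink_log_is_clean() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # Assumed failure markers; adjust for the local linker's wording.
    if grep -n -i -E 'error|fatal|undefined symbol' "$log"; then
        return 1
    fi
    return 0
}

# is_rac_binary ORACLE_HOME
# Return 0 if libknlopt.a contains kcsm.o, that is, the binary is RAC-enabled.
is_rac_binary() {
    lib="$1/rdbms/lib/libknlopt.a"
    case "$(uname -s)" in
        AIX) ar -X32_64 -t "$lib" ;;   # AIX form quoted in the note
        *)   ar -t "$lib" ;;
    esac | grep -qxF 'kcsm.o'
}

# Possible usage after step 8:
#   relink_log_is_clean "$ORACLE_HOME/install/relink.log" || echo "inspect the log"
#   is_rac_binary "$ORACLE_HOME" && echo "RAC-enabled binary"
```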

Source: "ITPUB Blog", link: http://blog.itpub.net/13454868/viewspace-1073517/. If you repost this article, please credit the source; otherwise legal action may be taken.
