Changing the drop_caches parameter on Linux triggers ORA-600 [KGHLKREM1]

Original  Linux operating system  Author: skuary  Date: 2012-06-26 11:08:44

Yesterday I ran the following command on all 3 nodes of the main site:

echo 3 > /proc/sys/vm/drop_caches

This immediately brought down the instance on one of the nodes, node 2. The detailed alert log entries are as follows:

Mon Jun 25 17:06:51 CST 2012
Errors in file /oracle/admin/yesmynet/bdump/yesmynet2_lmon_10048.trc:
ORA-00600: internal error code, arguments: [KGHLKREM1], [0x4BC000020], [], [], [], [], [], []
Mon Jun 25 17:06:52 CST 2012
Trace dumping is performing id=[cdmp_20120625170652]
Mon Jun 25 17:06:52 CST 2012
Errors in file /oracle/admin/yesmynet/bdump/yesmynet2_lmon_10048.trc:
ORA-00600: internal error code, arguments: [KGHLKREM1], [0x4BC000020], [], [], [], [], [], []
Mon Jun 25 17:06:52 CST 2012
LMON: terminating instance due to error 481
Mon Jun 25 17:06:52 CST 2012
Shutting down instance (abort)
License high water mark = 798
Mon Jun 25 17:06:57 CST 2012
Instance terminated by LMON, pid = 10048
Mon Jun 25 17:06:57 CST 2012
Instance terminated by USER, pid = 29345
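
When the alert log is long, the lines that matter can be filtered out with grep. The following is only a minimal sketch: `crash_events` is a hypothetical helper name, and the patterns are simply the three message types seen in the excerpt above.

```shell
# Sketch: list ORA-00600 and instance-termination lines of an alert log,
# prefixed with their line numbers. crash_events is a hypothetical name.
crash_events() {
    # $1: path to the alert log to scan
    grep -n -E 'ORA-00600|terminating instance|Instance terminated' "$1"
}
```

It would be called as `crash_events <path to the alert log>`; the timestamps sit on the line before each match, so adding `-B1` to grep shows them as well.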

As the log shows, at 17:06 the LMON process terminated instance 2 outright. The relevant MOS (My Oracle Support) note reads as follows:

ORA-600 [KGHLKREM1] On Linux Using Parameter drop_cache On hugepages Configuration [ID 1070812.1]

  Modified 20-DEC-2011     Type PROBLEM     Status PUBLISHED  

asm1_lmd0_8600.trc
~~~~~~~~~~~~~~~~~~
*** 2010-02-08 15:57:38.274
***** Internal heap ERROR KGHLKREM1 addr=0x6c400020 ds=0x60000058 *****
***** Dump of memory around addr 0x6c400020:
06C3FF020 00000000 00000000 00000000 00000000 [................]
Repeat 511 times





 

Changes

1. On your system you are running with vm.drop_caches=1 (or 3), that is, drop_caches has been set to a value greater than zero, or you are executing

echo 3 > /proc/sys/vm/drop_caches


 

/proc/sys/vm/drop_caches (since Linux 2.6.16)
Writing to this file causes the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free.

To free pagecache:

* echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes:

* echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries and inodes:

* echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive operation, and dirty objects are not freeable, the user should run "sync" first in order to make sure all cached objects are freed.


2. You have set up hugepages

Cause

This is a Linux Kernel issue.
When the Linux kernel "drop_caches" parameter is used on a system that has hugepages configured, memory corruption can occur.

Per internal Bug 9461825, executing vm.drop_caches corrupts Oracle Database SGA hugepages;
it is fixed in Linux Kernel version 2.6.18-194.0.0.0.4.EL5


Solution

1.  As a workaround, when hugepages are configured, avoid any vm.drop_caches settings.

OR

2.  Upgrade to Linux Kernel version 2.6.18-194.0.0.0.4.EL5


References

BUG:9358381 - ASM INSTANCE IS CRASHING AS ORA-600[KGHLKREM1] WHEN HUGEPAGES ARE IN USE
https://bugzilla.redhat.com/show_bug.cgi?id=578977

Of the 3 nodes, only node 2 was using hugepages:

[root@rac2 ~]# grep Huge /proc/meminfo
HugePages_Total:  9885
HugePages_Free:   9836
HugePages_Rsvd:   4868
Hugepagesize:     2048 kB
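
To put these numbers in perspective, the size of the hugepage pool is HugePages_Total multiplied by Hugepagesize. A small sketch of that arithmetic follows; `hugepage_pool_mib` is a hypothetical helper name, and the optional file argument is there only so the function can be tried on a saved copy of /proc/meminfo.

```shell
# Sketch: size of the hugepage pool in MiB, computed from meminfo-style text.
# hugepage_pool_mib is a hypothetical helper name, not a standard tool.
hugepage_pool_mib() {
    # $1: path to a meminfo-format file (defaults to /proc/meminfo)
    awk '/^HugePages_Total:/ {total = $2}
         /^Hugepagesize:/    {size_kb = $2}
         END {print total * size_kb / 1024}' "${1:-/proc/meminfo}"
}
```

For the values shown above this gives 9885 * 2048 kB / 1024 = 19770 MiB, roughly 19.3 GiB set aside for hugepages on node 2.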
 
The Linux kernel version is as follows:
 
[root@rac2 ~]# uname -a
Linux rac2 2.6.18-128.el5
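
The MOS note names 2.6.18-194.0.0.0.4.EL5 as the fixed kernel, so the running release can be compared against build 194. The sketch below is a rough check only: `kernel_build` and `check_kernel` are hypothetical helper names, it looks at nothing but the number right after the first dash, and it is meaningful only for 2.6.18-based EL5 release strings like the ones in this post.

```shell
# Hypothetical helper: pull the build number out of an EL5 release string,
# e.g. "2.6.18-128.el5" -> 128, "2.6.18-194.0.0.0.4.EL5" -> 194.
kernel_build() {
    rel=${1#*-}         # drop everything through the first dash
    echo "${rel%%.*}"   # keep what precedes the first dot
}

# Compare a release string against build 194, the build named in the MOS note.
check_kernel() {
    build=$(kernel_build "$1")
    if [ "$build" -lt 194 ]; then
        echo "older than the fixed kernel"
    elif [ "$build" -gt 194 ]; then
        echo "newer than the fixed kernel"
    else
        echo "same build number, compare the full release string by hand"
    fi
}
```

On this node, `check_kernel 2.6.18-128.el5` reports the kernel as older than the fixed one, which matches the crash described above.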

So on Linux, when hugepages are in use, it is best not to change the drop_caches parameter casually; otherwise, upgrade the Linux kernel to 2.6.18-194.0.0.0.4.EL5 first.
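
One way to follow this advice is to wrap the command in a guard that refuses to run when hugepages are configured. This is a sketch under assumptions, not a tested production script: `safe_drop_caches` and the `DRY_RUN` variable are hypothetical names, and the guard deliberately ignores the kernel version and simply refuses whenever HugePages_Total is above zero.

```shell
# Sketch of a guard around drop_caches. safe_drop_caches is a hypothetical
# name; the second argument exists only so the check can be tried on a
# saved copy of /proc/meminfo.
safe_drop_caches() {
    level=${1:-3}
    meminfo=${2:-/proc/meminfo}
    total=$(awk '/^HugePages_Total:/ {print $2}' "$meminfo")
    if [ "${total:-0}" -gt 0 ]; then
        echo "hugepages configured (HugePages_Total=$total), not dropping caches" >&2
        return 1
    fi
    # DRY_RUN=1 prints the commands instead of running them;
    # the real write to drop_caches needs root.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sync"
        echo "echo $level > /proc/sys/vm/drop_caches"
    else
        sync
        echo "$level" > /proc/sys/vm/drop_caches
    fi
}
```

Running `DRY_RUN=1 safe_drop_caches 3` first shows what would be executed. The `sync` before the write follows the man page excerpt quoted in the MOS note, since dirty objects cannot be freed.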

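Finally, I shut down all the cluster-related processes on node 2 and started them again, after which everything was back to normal!

Noting this down for the record.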

From "ITPUB Blog", link: http://blog.itpub.net/25618347/viewspace-733804/. Please credit the source when reposting; otherwise legal action may be taken.
