Oracle process mistakenly killed when the system ran out of memory: "terminating the instance due to error 822"

Original · Oracle · Author: shawnloong · Date: 2015-09-21 22:01:42
Today I received an alert email saying that the Oracle process no longer existed:
Alarm Time:2015-09-21 17:45:38
Trigger: Alive xyxdb_oa
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. Alive (x.x.x.x:alive): 0
2. *UNKNOWN* (x.x.x.x :*UNKNOWN*): *UNKNOWN*
Original event ID: 760121


The alert log showed:
System state dump requested by (instance=1, osid=2044 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xyxdbp/xyxdb/trace/xyxdb_diag_2062_20150921174417.trc
Mon Sep 21 17:44:18 2015
PMON (ospid: 2044): terminating the instance due to error 822
Dumping diagnostic data in directory=[cdmp_20150921174417], requested by (instance=1, osid=2044 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 2044
Mon Sep 21 17:46:39 2015
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 64 KB

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB

RECOMMENDATION:
  Total System Global Area size is 3282 MB. For optimal performance,
  prior to the next instance restart:
  1. Increase the number of unused large pages by
at least 1641 (page size 2048 KB, total size 3282 MB) system wide to
  get 100% of the System Global Area allocated with large pages
  2. Large pages are automatically locked into physical memory.
Increase the per process memlock (soft) limit to at least 3290 MB to lock
100% System Global Area's large pages into physical memory
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 6
Number of processor cores in the system is 6
Number of processor sockets in the system is 1
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
    NUMA status: non-NUMA system
    cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
    Grp 0:



[root@OA01-1-24 scripts]# cat /proc/50966/oom_
oom_adj        oom_score      oom_score_adj 
[root@OA01-1-24 scripts]# cat /proc/50966/oom_adj
0
[root@OA01-1-24 scripts]# vim oomscore.sh
[root@OA01-1-24 scripts]# chmod u+x oomscore.sh
[root@OA01-1-24 scripts]# ./oomscore.sh
63 37608 /usr/bin/java -Djava.util.logging.config.file=/usr
31 51010 ora_mman_xyxdb
20 37579 /usr/bin/java -Djava.util.logging.config.file=/usr
16 51938 /usr/java/jdk1.7.0_79/jre/bin/java -Djava.util.log
14 51496 oraclexyxdb (LOCAL=NO)
13 51026 ora_smon_xyxdb
8 51167 oraclexyxdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROT
7 51034 ora_mmon_xyxdb
7 51014 ora_dbw0_xyxdb
6 51480 oraclexyxdb (LOCAL=NO)

Checking the system log:
Sep 21 17:44:15 OA01-1-24 kernel: [39519]   500 39519   900699     5848   2       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [39521]   500 39521   900699     5877   5       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [42514]   500 42514   900846    10963   1       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [42578]   500 42578   900706     9012   1       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [43519]     0 43519    24998     1489   5       0             0 sshd
Sep 21 17:44:15 OA01-1-24 kernel: [43533]     0 43533    14309      550   5       0             0 sftp-server
Sep 21 17:44:15 OA01-1-24 kernel: [43557]     0 43557    14432      671   5       0             0 sftp-server
Sep 21 17:44:15 OA01-1-24 kernel: [44331]    89 44331    20234      861   2       0             0 pickup
Sep 21 17:44:15 OA01-1-24 kernel: [44491]     0 44491  1107908   148835   4       0             0 java
Sep 21 17:44:15 OA01-1-24 kernel: [44684]   500 44684   900015     4658   0       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45199]   500 45199   900699     5525   3       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45201]   500 45201   900699     5548   4       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45203]   500 45203   900704     8184   5       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45211]   500 45211   900699     5506   0       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45213]   500 45213   900699     5504   4       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45901]     0 45901  1051478   117538   2       0             0 java
Sep 21 17:44:15 OA01-1-24 kernel: [45943]   500 45943   900956     7194   0       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45945]   500 45945   900315     5444   1       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [45947]   500 45947   900315     5423   5       0             0 oracle
Sep 21 17:44:15 OA01-1-24 kernel: [46232]     0 46232    25226      152   4       0             0 sleep
Sep 21 17:44:15 OA01-1-24 kernel: Out of memory: Kill process 2074 (oracle) score 125 or sacrifice child
Sep 21 17:44:15 OA01-1-24 kernel: Killed process 2074, UID 500, (oracle) total-vm:3600064kB, anon-rss:3444kB, file-rss:1510892kB
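
The process table that the kernel dumps just before the kill can be summarised to see which programs held the most resident memory. The following is a minimal sketch, not part of the original post. It assumes that the trailing columns of each dump line are "total_vm rss cpu oom_adj oom_score_adj name" (the header line is not included in the excerpt above) and that pages are 4 KB; the function name oom_rss_by_name is made up for illustration.

```shell
#!/bin/bash
# Sum the rss column of an OOM process dump per command name, largest first.
# Assumptions: columns end with "total_vm rss cpu oom_adj oom_score_adj name",
# and one page is 4 KB.
oom_rss_by_name() {
    awk '/kernel: \[ *[0-9]+\]/ {
             rss[$NF] += $(NF-4)      # rss is the 5th field from the end
         }
         END {
             for (name in rss)        # convert pages to whole MB
                 printf "%d %s\n", rss[name] * 4 / 1024, name
         }' "$1" | sort -nr
}

# Example (the log path is an assumption and varies by distribution):
#   oom_rss_by_name /var/log/messages
```

Under those assumptions, the two java entries visible in the excerpt (148835 and 117538 pages) add up to roughly 1 GB of resident memory.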

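This usually happens when applications suddenly request a large amount of memory and the system runs short. That triggers the Linux kernel's Out of Memory (OOM) killer, which kills a process to free memory for the system so that it does not crash outright.
It later turned out that developers had started two Tomcat applications on this DB server, and a program fault made them consume a large amount of memory.
For how the OOM killer works, see http://www.vpsee.com/2013/10/how-to-configure-the-linux-oom-killer/, which covers it in detail.
Oracle also has a related article:
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html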
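
To check quickly whether the OOM killer has fired on a host, the kernel's decision lines can be filtered out of the system log. This is a small sketch rather than anything from the original post: the function name oom_events is hypothetical, and the pattern is written against the message format shown in this post's log (other kernel versions may word the messages differently).

```shell
#!/bin/bash
# Print the OOM killer's decision lines from a syslog-style file.
# The pattern targets the "Out of memory: ..." and "Killed process <pid>"
# kernel messages as they appear in this post's log excerpt.
oom_events() {
    grep -E 'kernel: .*(Out of memory|Killed process [0-9]+)' "$1"
}

# Example (the log path is an assumption and varies by distribution):
#   oom_events /var/log/messages
```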

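We can lower a process's OOM priority through /proc/<pid>/oom_adj so that the kernel is much less likely to kill it.
First, use a script to find the processes most likely to be killed:
# vi oomscore.sh
#!/bin/bash
# List the 10 processes with the highest oom_score: score, PID, command line.
for proc in $(find /proc -maxdepth 1 -regex '/proc/[0-9]+'); do
    printf "%2d %5d %s\n" \
        "$(cat "$proc/oom_score")" \
        "$(basename "$proc")" \
        "$(tr '\0' ' ' < "$proc/cmdline" | head -c 50)"
done 2>/dev/null | sort -nr | head -n 10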


[root@OA01-1-24 scripts]# ./oomscore.sh
63 37608 /usr/bin/java -Djava.util.logging.config.file=/usr
31 51010 ora_mman_xyxdb
20 37579 /usr/bin/java -Djava.util.logging.config.file=/usr
16 51938 /usr/java/jdk1.7.0_79/jre/bin/java -Djava.util.log
14 51496 oraclexyxdb (LOCAL=NO)
13 51026 ora_smon_xyxdb
8 51167 oraclexyxdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROT
7 51034 ora_mmon_xyxdb
7 51014 ora_dbw0_xyxdb
6 51480 oraclexyxdb (LOCAL=NO)
[root@OA01-1-24 scripts]# cat /proc/51034/oom_score
7
[root@OA01-1-24 scripts]# cat /proc/51034/oom_score_adj
0
[root@OA01-1-24 scripts]# echo -15 >/proc/51034/oom_adj
[root@OA01-1-24 scripts]# cat /proc/51034/oom_score
1
[root@OA01-1-24 scripts]# cat /proc/51026/oom_adj
0
[root@OA01-1-24 scripts]# cat /proc/51026/oom_score
13
[root@OA01-1-24 scripts]# echo -15 >/proc/51026/oom_adj
[root@OA01-1-24 scripts]# cat /proc/51026/oom_adj
-15
[root@OA01-1-24 scripts]# cat /proc/51026/oom_score
1
[root@OA01-1-24 scripts]# echo -15 >/proc/51010/oom_adj
[root@OA01-1-24 scripts]# cat /proc/51010/oom_score
1
[root@OA01-1-24 scripts]# ./oomscore.sh
63 37608 /usr/bin/java -Djava.util.logging.config.file=/usr
20 37579 /usr/bin/java -Djava.util.logging.config.file=/usr
16 51938 /usr/java/jdk1.7.0_79/jre/bin/java -Djava.util.log
14 51496 oraclexyxdb (LOCAL=NO)
8 51167 oraclexyxdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROT
7 51014 ora_dbw0_xyxdb
6 51480 oraclexyxdb (LOCAL=NO)
5 52007 oraclexyxdb (LOCAL=NO)
5 51474 oraclexyxdb (LOCAL=NO)
5 51466 oraclexyxdb (LOCAL=NO)
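
The manual `echo -15 > /proc/<pid>/oom_adj` steps above can be wrapped in a small helper. The sketch below is an illustration, not the author's method: it writes to oom_score_adj (the newer interface, range -1000 to 1000, which appears in the /proc listing earlier) instead of oom_adj, and converts the legacy value with the same scaling the kernel applies when oom_adj is written. The function names and the process pattern are hypothetical.

```shell
#!/bin/bash
# Convert a legacy oom_adj value (-17..15) to the oom_score_adj scale
# (-1000..1000), mirroring the kernel's scaling of oom_adj writes.
oom_adj_to_score_adj() {
    local adj=$1
    if [ "$adj" -eq 15 ]; then
        echo 1000                      # the maximum always maps to the maximum
    else
        echo $(( adj * 1000 / 17 ))    # e.g. -17 -> -1000 (never kill)
    fi
}

# Set oom_score_adj for every process whose command line matches a pattern.
# Must run as root. The setting applies only to the running process and has
# to be reapplied after the instance restarts.
protect_from_oom() {
    local pattern=$1 score_adj=$2 pid
    for pid in $(pgrep -f "$pattern"); do
        echo "$score_adj" > "/proc/$pid/oom_score_adj" \
            && echo "pid $pid -> oom_score_adj $score_adj"
    done
}

# Example (hypothetical pattern for the xyxdb background processes):
#   protect_from_oom '^ora_.*_xyxdb$' "$(oom_adj_to_score_adj -15)"
```

Lowering the score only shifts the kill to some other process. It does not add memory, so the memory-hungry applications still need to be moved off the server or capped.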

Later I also found a problem with the swap configuration:
[root@OA01-1-24 scripts]# cat /proc/sys/vm/swappiness
0
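A value of 0 tells the kernel to avoid swapping as far as possible; it does not remove the swap space itself.
The system engineer overlooked this when changing the setting. On an Oracle server it is best not to turn swapping off.
After changing it back: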
[root@OA01-1-24 scripts]# cat /proc/sys/vm/swappiness
60
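
The post does not show how the value was restored. One common approach is to set it at runtime with `sysctl -w` and record it in a sysctl configuration file so that it survives a reboot. The helper below is a sketch under that assumption; the name persist_sysctl is hypothetical, and it relies on GNU sed's `-i` option.

```shell
#!/bin/bash
# Set "key = value" in a sysctl-style config file: replace an existing entry
# for the key if there is one, otherwise append a new line.
persist_sysctl() {
    local key=$1 value=$2 conf=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
        sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        echo "${key} = ${value}" >> "$conf"
    fi
}

# Apply at runtime, then persist (run as root):
#   sysctl -w vm.swappiness=60
#   persist_sysctl vm.swappiness 60 /etc/sysctl.conf
```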
Summary: keep DB servers dedicated to the database wherever possible; otherwise all sorts of unexpected problems turn up.

From "ITPUB博客", link: http://blog.itpub.net/24486203/viewspace-1805598/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
