ITPub博客

首页 > Linux操作系统 > Linux操作系统 > RAC下trace暴涨诊断

RAC下trace暴涨诊断

原创 Linux操作系统 作者:cc59 时间:2007-01-07 00:00:00 0 删除 编辑


DB版本oracle 9207 rac
OS版本solaris 9
集群件Veritas cluster server 4.1


故障:

平均三秒钟产生一个trace文件。Trace文件不断增加,导致磁盘空间迅速减小

而alter中无任何错误信息,只有一行:

Thu Jan 4 11:34:53 2007
Errors in file /oracle_bin/rac9i/admin/XXrac/udump/XXrac2_ora_1942.trc

trace文件:
/oracle_bin/rac9i/admin/XXrac/udump/XXrac2_ora_1942.trc
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /oracle_bin/rac9i/product
System name: SunOS
Node name: XXXXX
Release: 5.9
Version: Generic_118558-13
Machine: sun4u
Instance name: yjrac2
Redo thread mounted by this instance: 2
Oracle process number: 13
Unix process pid: 1942, image: oracle@XXXXX (TNS V1-V3)

*** SESSION ID:(85.7779) 2006-04-23 01:13:12.231
=================================
Begin 4031 Diagnostic Information
=================================
The following information assists Oracle in diagnosing
causes of ORA-4031 errors. This trace may be disabled
by setting the init.ora parameter _4031_dump_bitvec = 0
======================================
Allocation Request Summary Information
======================================
Current information setting: 00654fff
Dump Interval=300 seconds SGA Heap Dump Interval=3600 seconds
Last Dump Time=04/14/2030 14:07:45
Allocation request for: kglsim object batch
Heap: 380032950, size: 4032
******************************************************
HEAP DUMP heap name="sga heap(2,0)" desc=380032950
extent sz=0xfe0 alt=200 het=32767 rec=9 flg=-126 opc=0
parent=0 owner=0 nex=0 xsz=0x1
====================
Process State Object
====================
----------------------------------------
SO: 41f4ee448, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=13, calls cur/top: 41f6d3320/41f6d3320, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=20
holding 428c53f60 Child library cache level=5 child#=5
Location from where latch is held: kglobpn: child:: latch
Context saved from call: 6
state=busy
Process Group: DEFAULT, pseudo proc: 41f5cbab0
O/S info: user: orarac, term: UNKNOWN, ospid: 1942
OSD pid info: Unix process pid: 1942, image: oracle@XXXXX(TNS V1-V3)
=========================
User Session State Object
=========================
----------------------------------------
SO: 4204f6d48, type: 4, owner: 41f4ee448, flag: INIT/-/-/0x00
(session) trans: 0, creator: 41f4ee448, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 438fa8348, psql: 0, user: 0/SYS
O/S info: user: , term: , ospid: , machine:
program:
temporary object counter: 0
...No current library cache object being loaded
...No instantiation object
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksm_4031_dump()+186 CALL ksedst() 00000000B ? 000000000 ?
8 000000000 ? 103327258 ?
00000003E ?
FFFFFFFF7FFF7938 ?
ksmasg()+352 CALL ksm_4031_dump() 000103756 ? 380000030 ?
380032950 ? 000654FFF ?
103756000 ? 103751768 ?
kghnospc()+364 PTR_CALL 0000000000000000 1037519C8 ? 380000030 ?
000000FC0 ? 000000FC0 ?
380000078 ?
FFFFFFFF7FFFB138 ?
kghalo()+4156 CALL kghnospc() 1037519C8 ? 380032950 ?
000000000 ? 004000000 ?
102D430F8 ? 103519280 ?
kglsim_chk_objlist( CALL kghalo() 000000000 ?
)+340 FFFFFFFF7FFFB2A0 ?
1037519C8 ? 000001000 ?
428D58638 ? 000000000 ?


很明显,由于4031造成的错误,但是为什么这么频繁的4031错误产生呢?
通过v$resource_limit,我们发现以下情况:
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------
processes 479 490 1000 1000
sessions 486 506 1105 1105
enqueue_locks 165 468 13282 13282
enqueue_resources 179 343 5080 UNLIMITED
ges_procs 478 488 1001 1001
ges_ress 32169 59712 20754 UNLIMITED
ges_locks 29613 55281 32150 UNLIMITED
ges_cache_ress 1929 29346 0 UNLIMITED
ges_reg_msgs 1087 2301 2230 UNLIMITED

RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------
ges_big_msgs 106 1183 2230 UNLIMITED
ges_rsv_msgs 0 0 1000 1000
gcs_resources 161607 189852 211052 211052
gcs_shadows 88914 99594 211052 211052
dml_locks 26 370 4860 UNLIMITED
temporary_table 0 2 UNLIMITED UNLIMITED
_locks

transactions 8 21 1215 UNLIMITED
branches 0 1 1215 UNLIMITED
cmtcallbk 0 1 1215 UNLIMITED

RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------
sort_segment_lo 27 41 UNLIMITED UNLIMITED
cks

max_rollback_se 14 15 244 244
gments

max_shared_serv 0 0 20 20
ers

parallel_max_se 0 5 6 6
rvers

RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------


22 rows selected.

我们可以发现ges_ress和ges_locks的当前分配数量已经超出了初始分配数据,最大分配数甚至超出了几倍,
我们知道,当ges_ress和ges_locks超出初始分配的数量时,就会从shared_pool_size里面强行申请内存片。
超出的越多,当然就占用更多的内存区域。而这时候,数据库的压力又非常的大,从而不断的产生4031错误。


这时候我们能做的就是将这两个指标的数量控制在一定范围之内。或者将他的初始分配值扩大,以及限制它的最高值。
可以使用两个隐含参数来控制:_lm_locks和_lm_ress。如_lm_locks=(200000,200000) 。这么设定的意思是:初始值、最大值。
也就是说初始分配200000,最大也只能使用200000, 但是设置这个需要注意一点的是,会增加oracle使用内存的数量。
假如您使用了10G的sga.那么设定1000000的话大概就是多出1G.加上SGA的话。就是11G的内存了。因此在遇到这种情
况时一定要注意主机的内存情况,因为修改完该参数后,重启实例时就会预分配内存。这个可以通过ipcs看出来。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/104152/viewspace-139991/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2007-12-21

  • 博文量
    132
  • 访问量
    288706