AIX: Top Things to DO NOW to Stabilize 11gR2 GI/RAC Cluster

Author: lwitpub  Date: 2014-09-19 16:46:25


APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
IBM AIX on POWER Systems (64-bit)

PURPOSE

This note lists the top things to do to stabilize 11gR2 Grid Infrastructure and Real Application Clusters on IBM AIX. The focus is on issues that cause high memory usage, high CPU usage, and hangs.

SCOPE

This document is intended for Oracle Clusterware/RAC Database Administrators and Oracle support engineers. 

DETAILS

A. Required OS Technology Level and Service Pack, and Recommended VM Setting

  • The AIX kernel level should be equal to or higher than the following (run "/bin/oslevel -s" to confirm):
AIX 7.1 TL 00 SP1 ("7100-00-01"), 64-bit kernel
AIX 6.1 TL 02 SP1 ("6100-02-01"), 64-bit kernel
AIX 5.3 TL 09 SP1 ("5300-09-01"), 64-bit kernel
  • Recommended Virtual Memory setting:
maxperm%=90
minperm%=3
maxclient%=90
strict_maxperm=0
strict_maxclient=1
lru_file_repage=0
page_steal_method=1     ###(a change requires a reboot to take effect)
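
The two checks in this section can be scripted. The following is a minimal sketch, not an official tool: meets_min_level and check_vmo_settings are hypothetical helper names, the minimum levels and tunable values are copied from the lists above, and the "tunable = value" input layout is an assumption about what "vmo -a" prints on your system.

```shell
# Sketch only: both helpers are hypothetical, not part of AIX or Oracle tooling.

# meets_min_level LEVEL
#   LEVEL is an "oslevel -s" string such as 6100-07-02-1150.
#   Exit 0: at or above the Section A minimum; 1: below it; 2: unknown input.
meets_min_level() (
    IFS=-
    set -- $1                     # split into release, TL, SP, build date
    rel=$1 tl=${2#0} sp=${3#0}    # drop one leading zero ("07" -> "7")
    [ -n "$tl" ] && [ -n "$sp" ] || exit 2
    case $rel in
        7100) min_tl=0 min_sp=1 ;;
        6100) min_tl=2 min_sp=1 ;;
        5300) min_tl=9 min_sp=1 ;;
        *)    exit 2 ;;           # release not listed in this note
    esac
    if [ "$tl" -gt "$min_tl" ]; then exit 0; fi
    [ "$tl" -eq "$min_tl" ] && [ "$sp" -ge "$min_sp" ]
)

# check_vmo_settings
#   Reads "tunable = value" lines on stdin (assumed "vmo -a" layout) and
#   prints every Section A tunable that differs or is absent.
check_vmo_settings() (
    input=$(cat)
    status=0
    for want in 'maxperm%=90' 'minperm%=3' 'maxclient%=90' \
                'strict_maxperm=0' 'strict_maxclient=1' \
                'lru_file_repage=0' 'page_steal_method=1'; do
        name=${want%%=*}
        value=${want#*=}
        have=$(printf '%s\n' "$input" |
               awk -v n="$name" '$1 == n && $2 == "=" { print $3; exit }')
        if [ "$have" != "$value" ]; then
            echo "${name}: expected $value, found ${have:-nothing}"
            status=1
        fi
    done
    exit "$status"
)
```

On AIX, a call might look like meets_min_level "$(/bin/oslevel -s)" and vmo -a | check_vmo_settings. Restricted tunables may only be listed when vmo is also given -F, so check the vmo documentation for your level before relying on the result.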

B. USLA heap fix to reduce memory footprint for Oracle Server processes

  • For AIX 6.1 TL07 SP02/AIX 7.1 TL01 SP02 or later, apply patch 13443029
  • For AIX 6.1 TL07 or AIX 7.1 TL01, install AIX 6.1 TL-07 APAR IV09580, AIX 7.1 TL-01 APAR IV09541, and apply patch 13443029
  • For other AIX levels, apply patch 10190759; this disables Oracle's online patching mechanism

 

  • Note: as of 06/21/2012, the fixes for bug 13443029 and bug 10190759 are not included in any PSU, so the interim patch is needed. Interim patch 10190759 exists on top of most PSUs, and patch 13443029 on top of 11.2.0.3 does not conflict with 11.2.0.3.1 PSU and can be applied on top of both 11.2.0.3 base and 11.2.0.3.1 PSU.

 

  • New connections can be slow to establish without the fix for bug 13494030, which is fixed in 11.2.0.4. If an online patch exists, process startup and new connections will be slower (bug 13494030, duplicate bug 14069786, bug 13573651).
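
The choice between the three cases above depends only on the "oslevel -s" string, so it can be written as a small lookup. This is a sketch under assumptions: usla_fix_for is a hypothetical name, "or later" is read as including higher technology levels of the same release, and any release other than 6100 or 7100 is treated as "other AIX level".

```shell
# Sketch only: usla_fix_for is a hypothetical helper.
# usla_fix_for LEVEL
#   LEVEL is an "oslevel -s" string; prints what Section B asks for there.
usla_fix_for() (
    IFS=-
    set -- $1                     # split into release, TL, SP, build date
    rel=$1 tl=${2#0} sp=${3#0}    # drop one leading zero ("07" -> "7")
    [ -n "$rel" ] && [ -n "$tl" ] && [ -n "$sp" ] || exit 2
    case $rel in
        6100) base_tl=7 apar=IV09580 ;;
        7100) base_tl=1 apar=IV09541 ;;
        *)    echo "patch 10190759"; exit 0 ;;
    esac
    if [ "$tl" -gt "$base_tl" ] ||
       { [ "$tl" -eq "$base_tl" ] && [ "$sp" -ge 2 ]; }; then
        echo "patch 13443029"                 # TL07 SP02 / TL01 SP02 or later
    elif [ "$tl" -eq "$base_tl" ]; then
        echo "APAR $apar + patch 13443029"    # same TL, below SP02
    else
        echo "patch 10190759"                 # any other level
    fi
)
```

A call such as usla_fix_for "$(/bin/oslevel -s)" only names the candidate fix; whether an interim patch exists for your exact PSU still has to be confirmed with Oracle Support.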

C. Other recommended OS fixes

  • Note 1528452.1 - AIX 6.1 TL8 or 7.1 TL2: 11gR2 GI Second Node Fails to Join the Cluster as CRSD and EVMD are in INTERMEDIATE State
     
  • Paging space growth leads to node failure/eviction:
64K paging can take place even when system RAM is available. The fix avoids unexpected paging space growth and node failure. The APARs for the various TL/SP levels are:

TL/SP            Level              APAR
6100 TL5         6100-05            IZ71603
6100 TL4 SP4     6100-04-04-1014    IZ71191
6100 TL3 SP4     6100-03-04-1014    IZ72031
6100 TL2 SP7     6100-02-07-1014    IZ71850
6100 TL1 SP8     6100-01-08-1014    IZ71987
5300 TL12        5300-12            IZ71460
5300 TL11 SP4    5300-11-04-1015    IZ73687
5300 TL10 SP4    5300-10-04-1015    IZ73754
5300 TL9 SP7     5300-09-07-1015    IZ73864
5300 TL8 SP10    5300-08-10-1015    IZ67445

For more info, refer to https://www-304.ibm.com/support/docview.wss?uid=isg1fixinfo114834
  • gc block lost, IPC send timeout, or instance eviction:
The VIOS server does not forward traffic from its VIO clients to the external network; interrupts do not reach the trunk adapter. The fix avoids SEA/VIO client hangs. The APARs for the various TL/SP levels are:
 
TL/SP            Level              APAR
7100 TL0 SP3     7100-00-03-1115    IZ97035
6100 TL6 SP5     6100-06-05-1115    IZ96155
6100 TL5 SP6     6100-05-06-1119    IZ97457
6100 TL4 SP10    6100-04-10-1119    IZ97605
5300 TL12 SP4    5300-12-04-1119    IZ98126
5300 TL11 SP7    5300-11-07-1119    IZ98424

For more info, refer to https://www-304.ibm.com/support/docview.wss?uid=isg1fixinfo122900
  • Other kernel hang fixes
* IZ91983: lockl performance issue, hang

For more info, refer to http://www-01.ibm.com/support/docview.wss?uid=isg1IZ91983


* IV04047: shlap64 unable to process Oracle request leading to kernel hang

For more info, refer to http://www-01.ibm.com/support/docview.wss?uid=isg1IV04047

 

  • Excessive CPU usage in LPAR in shared processor mode

If the LPAR is in shared processor mode, it may see excessive CPU usage without the following fixes:

APARs for WAITPROC IDLE LOOPING CONSUMES CPU:

IV01111 AIX 6.1 TL05 if before SP08 (fixed in SP08)
IV06197 AIX 6.1 TL06 if before SP07 (fixed in SP07)
IV10172 AIX 6.1 TL07 if before SP02 (fixed in SP02)
IV09133 AIX 7.1 TL00 if before SP05 (fixed in SP05)
IV10484 AIX 7.1 TL01 if before SP02 (fixed in SP02)

This problem can affect POWER7 systems running any level of Ax720 firmware prior to Ax720_101, but updating to the latest available firmware is recommended. If required, AIX and firmware fixes can be obtained from IBM Support Fix Central:

http://www-933.ibm.com/support/fixcentral/main/System+p/AIX

 

  • Crash in netinfo_unixdomnlist while running netstat

TL/SP            Level              APAR
6100 TL6 SP6     6100-06-06-1140    IZ97166
6100 TL5 SP7     6100-05-07-1140    IZ97353
6100 TL4 SP11    6100-04-11-1140    IV00634

For more info, refer to http://www-01.ibm.com/support/docview.wss?uid=isg1fixinfo126289
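
Whether the APARs named in this section are already on a node can be checked with the AIX instfix command. The loop below is a hedged sketch: apar_status is a hypothetical helper, INSTFIX is a hypothetical override used only so the function can be exercised off AIX, and the "All filesets for ... were found" message is an assumption about instfix output that should be confirmed on your system.

```shell
# Sketch only: apar_status and the INSTFIX override are hypothetical.
# apar_status APAR...
#   Prints "<APAR> installed" or "<APAR> missing" for each argument and
#   exits non-zero if any APAR is missing.
apar_status() (
    cmd=${INSTFIX:-instfix}
    missing=0
    for apar in "$@"; do
        # Assumption: "instfix -ik APAR" prints
        # "All filesets for APAR were found." when the fix is fully installed.
        if $cmd -ik "$apar" 2>/dev/null |
           grep -q "All filesets for $apar"; then
            echo "$apar installed"
        else
            # Also reached when instfix itself is unavailable.
            echo "$apar missing"
            missing=1
        fi
    done
    exit "$missing"
)
```

For example, apar_status IZ91983 IV04047 checks the two kernel hang fixes. Pass only the APARs that match the node's own TL/SP level, since the other rows of each matrix do not apply to it.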

D. Apply the latest GI PSU to avoid known high resource consumption bugs

If you are running 11.2.0.3, apply 11.2.0.3 GI PSU8 (patch 17272731)

For 11.2.0.3, applying the above PSU will fix the following known bugs (Note: it does not fix the bugs in Section D1)

  • Note 1062676.1 - ORAAGENT or ORAROOTAGENT High Resource (CPU, Memory etc) Usage
All bugs except bug 12709476 have been fixed in 11.2.0.2.

Bug 12709476 is fixed in 11.2.0.2 GI PSU6, 11.2.0.3 GI PSU2, 11.2.0.4 and 12.1; interim patch 12709476 exists on top of 11.2.0.3.1 GI PSU.

 

  • Note 1287709.1 - ocssd.bin High CPU Usage, Instance Crashes With ORA-29702 or ORA-29770 or ORA-29701 With "gipcWait failed with 16"
This note covers bug 11069614, which is fixed in 11.2.0.2 GI PSU3 and 11.2.0.3

 

This note talks about the following bugs:
bug 10019726, fixed in 11.2.0.2 GI PSU2, 11.2.0.3 and above
bug 12615394, fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above

 

  • Note 1348202.1 - 11gR2 Grid Infrastructure CRSD High CPU Usage or Slow Command Response
This note talks about the following bugs:
bug 10019726 is fixed in 11.2.0.2 GI PSU3, 11.2.0.3 and above
bug 12615394 is fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above
bug 12767563 is fixed in 11.2.0.2 GI PSU4, 11.2.0.3 and above

 

  • Note 1455973.1 - 11gR2 Grid Infrastructure High CPU Usage by crsd.bin, ocssd.bin, evmd.bin gipcd.bin etc due to GIPC
Bug 13498267 is fixed in 11.2.0.4; please request interim patch 13498267 if it is not available

E. ASM/Database fixes

Refer to Note 1376981.1 for more information
  • bug 12412983 - high "log file sync" or "asynch descriptor resize" wait, fixed in 11.2.0.4; interim patch 12412983 exists on top of most patchsets/PSUs
  • bug 12596494 - higher CPU usage in 11.2 on AIX, fixed in 11.2.0.4. Interim patch 12596494 on top of 11.2.0.3 does not conflict with 11.2.0.3.1 PSU and can be applied on top of both 11.2.0.3 base and 11.2.0.3.1 PSU
  • bug 12767867 - instance hangs; fixed in 11.2.0.2 DB PSU4, 11.2.0.3
Refer to Note 1348264.1 for more information.
Please check interim patch availability for your release.
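
To see which of these interim patches are already in an Oracle home, the output of "opatch lsinventory" can be searched for the patch numbers. The helper below is a rough sketch: missing_patches is a hypothetical name, and it assumes only that an applied patch number appears somewhere in the inventory text as a standalone number.

```shell
# Sketch only: missing_patches is a hypothetical helper.
# missing_patches ID...
#   Reads "opatch lsinventory" output on stdin, prints each ID that does not
#   appear in it as a standalone number, and exits non-zero if any is absent.
missing_patches() (
    inventory=$(cat)
    status=0
    for id in "$@"; do
        # Reject matches embedded in a longer number (e.g. 112412983).
        if ! printf '%s\n' "$inventory" |
             grep -E "(^|[^0-9])${id}([^0-9]|\$)" >/dev/null; then
            echo "$id"
            status=1
        fi
    done
    exit "$status"
)
```

A possible call is $ORACLE_HOME/OPatch/opatch lsinventory | missing_patches 12412983 12596494. A number reported as missing is not necessarily a problem: a fix that is already part of the installed patchset or PSU may not be listed under its own number.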

 
F. CSSD fix to avoid node eviction/reboot related issues

  • bug 13940331 - Threads do not always inherit the parent process's real-time priority

 

  • bug 13869978 - 11.2.0.3 GI node reboot if only one voting file exists
Refer to Note 1466639.1 for more information

G. EM agent high memory consumption on AIX (likely node will be rebooted)

  • Note 1530102.1 - EM 12c: Agent emdprocstats.pl Consuming High Memory

 
Appendix A: Data gathering


If the issue still happens after the above recommendations are in place, collect the output of the following commands from all nodes as the root user:

# svmon -P -O unit=MB -O segment=category
# svmon -U -O unit=MB -O segment=category 
# ps -elf
# vmstat 5 3
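
The four commands above can be wrapped so that each node writes its output to files that are easy to upload. This is a minimal sketch, not an Oracle-supplied script: gather_diag, the DRY_RUN switch and the file naming scheme are all hypothetical.

```shell
# Sketch only: gather_diag, DRY_RUN and the file names are hypothetical.
# gather_diag OUTDIR
#   Runs the Appendix A commands and saves each output under OUTDIR as
#   <hostname>.<n>.out. With DRY_RUN set, it only prints what it would run.
gather_diag() (
    outdir=${1:?usage: gather_diag OUTDIR}
    host=$(hostname 2>/dev/null || echo unknown)
    [ -n "${DRY_RUN:-}" ] || mkdir -p "$outdir" || exit 1
    n=0
    while IFS= read -r cmd; do
        n=$((n + 1))
        file=$outdir/$host.$n.out
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$cmd > $file"
        else
            # Run as root on AIX; stderr is kept together with the output.
            sh -c "$cmd" > "$file" 2>&1 < /dev/null
        fi
    done <<'EOF'
svmon -P -O unit=MB -O segment=category
svmon -U -O unit=MB -O segment=category
ps -elf
vmstat 5 3
EOF
)
```

Running DRY_RUN=1 gather_diag /tmp/gi_diag first shows the commands and target files without executing anything; running it as root on each node without DRY_RUN then produces one file per command.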

Source: ITPUB blog, link: http://blog.itpub.net/628922/viewspace-1274402/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
