ITPub博客

首页 > Linux操作系统 > Linux操作系统 > 系统cpu占用超高故障分析一例

系统cpu占用超高故障分析一例

原创 Linux操作系统 作者:浪漫双鱼 时间:2011-02-22 11:16:18 0 删除 编辑

下班的时候突然接到电话,通知一台主机资源占用超过,同时手机短信也不停的报session数,赶快连上服务器查看具体原因:

1、  通过top可以发现系统CPU资源占用100%

查当前进程数,发现比平时多了100

Select * from v$license;

SESSIONS_MAX SESSIONS_WARNING SESSIONS_CURRENT SESSIONS_HIGHWATER  USERS_MAX

------------ ---------------- ---------------- ------------------ ----------

           0                0              234                246          0

session等待事件,大部分session都在做latch free等待

SQL> select sid,event,P1TEXT,state from v$session_wait;

 

   SID EVENT                          P1TEXT                         STATE

------ ------------------------------ ------------------------------ -------------------

    32 latch free                     address                        WAITED KNOWN TIME

    38 latch free                     address                        WAITED KNOWN TIME

    41 latch free                     address                        WAITING

    57 latch free                     address                        WAITING

    89 latch free                     address                        WAITED KNOWN TIME

    93 latch free                     address                        WAITED KNOWN TIME

   111 latch free                     address                        WAITED KNOWN TIME

   118 latch free                     address                        WAITING

   137 latch free                     address                        WAITED KNOWN TIME

   201 latch free                     address                        WAITING

   200 latch free                     address                        WAITED KNOWN TIME

   194 latch free                     address                        WAITED KNOWN TIME

   186 latch free                     address                        WAITED KNOWN TIME

   182 latch free                     address                        WAITED KNOWN TIME

   177 latch free                     address                        WAITING

   163 latch free                     address                        WAITING

   147 latch free                     address                        WAITED KNOWN TIME

   146 latch free                     address                        WAITED KNOWN TIME

   238 latch free                     address                        WAITED KNOWN TIME

   236 latch free                     address                        WAITED KNOWN TIME

   233 latch free                     address                        WAITED KNOWN TIME

   225 latch free                     address                        WAITED KNOWN TIME

   221 latch free                     address                        WAITED KNOWN TIME

   211 latch free                     address                        WAITING

   209 latch free                     address                        WAITED KNOWN TIME

   207 latch free                     address                        WAITED KNOWN TIME

   204 latch free                     address                        WAITING

   261 latch free                     address                        WAITED KNOWN TIME

   257 latch free                     address                        WAITED KNOWN TIME

   255 latch free                     address                        WAITED KNOWN TIME

   253 latch free                     address                        WAITED KNOWN TIME

   251 latch free                     address                        WAITED KNOWN TIME

   241 latch free                     address                        WAITING

   239 latch free                     address                        WAITED KNOWN TIME

   119 latch free                     address                        WAITED KNOWN TIME

   113 latch free                     address                        WAITED KNOWN TIME

    97 latch free                     address                        WAITING

    92 latch free                     address                        WAITED KNOWN TIME

    88 latch free                     address                        WAITED KNOWN TIME

    87 latch free                     address                        WAITED KNOWN TIME

    68 latch free                     address                        WAITED KNOWN TIME

  

2、  查询占用cpu的进程情况,大量进程占用都很高,如3720.

SQL> SELECT /*+ ordered */ p.spid, s.sid, s.serial#, s.username, s.program,s.status,TO_CHAR(s.logon_time, 'mm-dd-yyyy hh24:mi') logon_time, s.last_call_et, st.value, s.sql_hash_value, s.sql_address, sq.child_number ,sq.sql_text  

  2  FROM v$statname sn, v$sesstat st, v$process p, v$session s, v$sql sq     

  3  WHERE s.paddr=p.addr        

  4  AND s.sql_hash_value = sq.hash_value and s.sql_Address = sq.address

  5  AND s.sid = st.sid

  6  AND st.STATISTIC# = sn.statistic#        

  7  AND sn.NAME = 'CPU used by this session'

  8  AND p.spid = &osPID -- parameter to restrict for a specific PID

  9  -- AND s.status = 'ACTIVE'

 10  ORDER BY st.value desc;

Enter value for ospid: 3720

 

SPID            SID SERIAL# USERNAME   PROGRAM                          STATUS   LOGON_TIME       LAST_CALL_ET      VALUE SQL_HASH_VALUE SQL_ADDRESS      CHILD_NUMBER

------------ ------ ------- ---------- -------------------------------- -------- ---------------- ------------ ---------- -------------- ---------------- ------------

SQL_TEXT

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

3720             97   17856 CCATSUPT   JDBC Thin Client                 ACTIVE   02-21-2011 15:28         3413      16088     1113672989 C000000354738950            0

SELECT /*+ INDEX (C, I_SVR_PUB_DA_MAINQUEUE_HIS_FRT) */ COUNT(1)  FROM Svr_pub_da_MainQueue_his c,pub_Specialty k  WHERE c.Business in ( 'D46C2BC08404D1211DFA6F7BA8DCB9DB', '293C7B04CD7B0FFD56B255CC2585E16F')   AND c.specialty = k.specialtyid AND c.FirstReceptTime BETWEEN TO_DATE('2011-01-21 00:00:00', 'yyyy-MM-dd HH24:MI:SS') AND TO_DATE('2011-02-22 00:00:00', 'yyyy-MM-dd HH24:MI:SS')  AND ( k.TreeCode LIKE '0109%')  AND EXISTS (SELECT 1 FROM org_unit ou                             WHERE c.sourcedept = ou.unitid                               AND ou.TreeCode LIKE '0001000301970003%')

3、查看该sql执行计划,存在开销极大的nested loop,检查各个表的数据量后,怀疑是统计信息出问题,走了不该走的索引。

 

-------------------------------------------------------------------------------------------------------------------------

| Id   | Operation                            | Name                           |  Rows | Bytes |  Cost | Pstart | Pstop |

-------------------------------------------------------------------------------------------------------------------------

|    0 | SELECT STATEMENT                     |                                |       |       | 34498 |        |       |

|    1 |  SORT AGGREGATE                      |                                |     1 |   203 |       |        |       |

| *  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID | SVR_PUB_DA_MAINQUEUE_HIS       |     1 |   101 | 34493 |  ROW L | ROW L |

|    3 |    NESTED LOOPS                      |                                |     1 |   203 | 34498 |        |       |

|    4 |     MERGE JOIN CARTESIAN             |                                |     1 |   102 |     5 |        |       |

|    5 |      TABLE ACCESS BY INDEX ROWID     | PUB_SPECIALTY                  |     1 |    46 |     3 |        |       |

| *  6 |       INDEX RANGE SCAN               | I_PUB_SPECIALTY_TR             |     1 |       |     2 |        |       |

|    7 |      BUFFER SORT                     |                                |     1 |    56 |     2 |        |       |

|    8 |       SORT UNIQUE                    |                                |       |       |       |        |       |

|    9 |        TABLE ACCESS BY INDEX ROWID   | ORG_UNIT                       |     1 |    56 |     2 |        |       |

| * 10 |         INDEX RANGE SCAN             | ORG_UNIT_I01                   |     1 |       |     1 |        |       |

| * 11 |     INDEX RANGE SCAN                 | I_SVR_PUB_DA_MAINQUEUE_HIS_FRT | 40241 |       |   119 |        |       |

-------------------------------------------------------------------------------------------------------------------------

4、更新统计信息

analyze table PUB_SPECIALTY compute statistics for table for all indexed columns for all indexes;

analyze table org_unit compute statistics for table for all indexed columns for all indexes;

5、此时再次运行sql,查看执行计划,发现已经已无nested loop

 

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT ptimizer=CHOOSE (Cost=39980 Card=1 Bytes=2

          00)

 

   1    0   SORT (AGGREGATE)

   2    1     HASH JOIN (Cost=39980 Card=1811 Bytes=362200)

   3    2       TABLE ACCESS (FULL) OF 'PUB_SPECIALTY' (Cost=4 Card=17

          0 Bytes=7480)

 

   4    2       HASH JOIN (Cost=39975 Card=2059 Bytes=321204)

   5    4         SORT (UNIQUE)

   6    5           TABLE ACCESS (FULL) OF 'ORG_UNIT' (Cost=10 Card=21

           Bytes=1155)

 

   7    4         TABLE ACCESS (BY GLOBAL INDEX ROWID) OF 'SVR_PUB_DA_

          MAINQUEUE_HIS' (Cost=39951 Card=16097 Bytes=1625797)

 

   8    7           INDEX (RANGE SCAN) OF 'I_SVR_PUB_DA_MAINQUEUE_HIS_

          FRT' (NON-UNIQUE) (Cost=158 Card=40241)

6、查询等待事件,latch freesession仍然存在,由于走错了执行计划,所以决定杀掉这些session,杀掉session后系统恢复正常

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/11088128/viewspace-687675/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2007-12-10

  • 博文量
    87
  • 访问量
    231799