ITPub博客

首页 > 数据库 > Oracle > [20200808]优化的困惑10.txt

[20200808]优化的困惑10.txt

原创 Oracle 作者:lfree 时间:2020-08-09 15:14:26 0 删除 编辑

[20200808]优化的困惑10.txt

--//前一段时间遇到的问题,链接http://www.itpub.net/thread-2137868-1-1.html,CPU使用率100%.
--//我下载对方提供的awr报表,顺便说一下.对方提供的awr html文件无法正常跳转.必须使用vim执行如下:
--// :%s+http://172.19.11.136/awr/dongguan/DGMESDB-mesdb/202007/mesdb_73613_73614_202007220700_202007220800.html++

--//很容易定位里面的问题,逻辑读太高,消耗大量CPU资源,而里面sql语句使用rule提示.
--//但是我开始看时还是遇到1点点疑惑.
--//注:数据库版本是10.2.0.5.0,操作系统:windows 64位版本的,实际上我个人建议不要在windows下跑数据库,特别在生产环境,无形增
--//加问题定位的难度.

1.DB Time问题:
            Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap: 73613    22-Jul-20 07:00:36  206       24.9
End Snap:   73614    22-Jul-20 08:00:16  212       23.2
Elapsed:             59.66 (mins)                                     
DB Time:             1,070.16 (mins)         

--//取样1个小时,而DB Time达到1070/60 = 17.83 小时.而实际的CPU数量是8.:
Operating System Statistics
Statistic               Total
AVG_BUSY_TIME           354,972
AVG_IDLE_TIME           2,432
AVG_SYS_TIME            4,388
AVG_USER_TIME           350,487
BUSY_TIME               2,840,695
IDLE_TIME               19,653
SYS_TIME                36,011
USER_TIME               2,804,684
RSRC_MGR_CPU_WAIT_TIME  0
VM_IN_BYTES             -1.5E+19
VM_OUT_BYTES            2.1E+16
PHYSICAL_MEMORY_BYTES   85,862,932,480
NUM_CPUS                8
NUM_CPU_CORES           8
--//实际的CPU数量仅仅8个,感觉不大可能DB time大于8小时.
--//再回过头看.

Top 5 Timed Events
Event                           Waits    Time(s)  Avg Wait(ms) % Total Call Time  Wait Class
CPU time                                 29,020                45.2                
latch: cache buffers chains     101,977  4,455    44           6.9                Concurrency
db file sequential read         261,308  3,864    15           6.0                User I/O
log file sync                   124,032  3,540    29           5.5                Commit
direct path read temp           38,162   1,321    35           2.1                User I/O

--//注意观察列 Total Call Time,CPU time仅仅占45.2. 这些前5名相加仅仅45.2+6.9+6.0+5.5+2.1 = 65.7.
--//还有什么其它事件占3X%呢?
--//另外latch: cache buffers chains 的Avg Wait(ms) 达到 44ms,真心觉得有一些夸张...CPU已经消耗殆尽了.
--//顺便说一下,我以前学习oracle一直认为CPU资源很难消耗100%.除非人为"破坏".直到随着学习的深入...

2.继续:
--//再看SQL ordered by Elapsed Time部分.

SQL ordered by Elapsed Time

    Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
    % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
    Total DB Time (s): 64,209
    Captured SQL account for 55.0% of Total
--//注意Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.

Elapsed Time (s)  CPU Time (s)  Executions  Elap per Exec (s)  % Total DB Time  SQL Id         SQL Module  SQL Text
6,346             2,655         164,771     0.04               9.88             gyc05t01rz49h  TGS2.exe    declare p_owner varchar(30) :=...
6,110             2,551         164,791     0.04               9.52             0tpg5r73s1146  TGS2.exe    SELECT /*+ RULE */ COUNT(*) FR...
5,389             2,601         164,835     0.03               8.39             8cd1y3p3rnsys  TGS2.exe    declare p_owner varchar(30) :=...
5,316             2,563         164,829     0.03               8.28             6x6aknt5p0uqd  TGS2.exe    SELECT /*+ RULE */ ARGUMENT_NA...

--//一定是sql_id=gyc05t01rz49h里面调用sql_id=0tpg5r73s1146语句.
--//sql_id=gyc05t01rz49h格式化如下:
DECLARE
   p_owner         VARCHAR (30) := :owner;
   p_object_name   VARCHAR (30) := :object_name;
   p_proc_name     VARCHAR (30) := :procedure_name;
   p_overload      NUMBER := :overload;
   p_object_type   VARCHAR (19);
   p_count         NUMBER;
BEGIN
   IF p_owner IS NULL
   THEN
      BEGIN
         SELECT /*+ RULE */
               object_type, owner
           INTO p_object_type, p_owner
           FROM sys.all_objects
          WHERE     owner = SYS_CONTEXT ('USERENV', 'CURRENT_SCHEMA')
                AND object_type IN
                       ('PROCEDURE'
                       ,'FUNCTION'
                       ,'PACKAGE'
                       ,'PACKAGE BODY'
                       ,'SYNONYM')
                AND object_name = p_object_name
                AND ROWNUM <= 1;
      EXCEPTION
         WHEN NO_DATA_FOUND
         THEN
            BEGIN
               SELECT /*+ RULE */
                     object_type, owner
                 INTO p_object_type, p_owner
                 FROM sys.all_objects
                WHERE     owner = 'PUBLIC'
                      AND object_type IN
                             ('PROCEDURE'
                             ,'FUNCTION'
                             ,'PACKAGE'
                             ,'PACKAGE BODY'
                             ,'SYNONYM')
                      AND object_name = p_object_name
                      AND ROWNUM <= 1;
            EXCEPTION
               WHEN NO_DATA_FOUND
               THEN
                  IF p_proc_name IS NOT NULL
                  THEN
                     p_owner := p_object_name;
                     p_object_name := p_proc_name;
                     p_proc_name := NULL;
                  ELSE
                     RAISE;
                  END IF;
            END;
      END;
   END IF;

   IF p_object_type IS NULL
   THEN
      SELECT /*+ RULE */
            object_type
        INTO p_object_type
        FROM sys.all_objects
       WHERE     object_type IN
                    ('PROCEDURE'
                    ,'FUNCTION'
                    ,'PACKAGE'
                    ,'PACKAGE BODY'
                    ,'SYNONYM')
             AND owner = p_owner
             AND object_name = p_object_name
             AND ROWNUM <= 1;
   END IF;

   WHILE p_object_type = 'SYNONYM'
   LOOP
      SELECT /*+ RULE */
            table_owner, table_name
        INTO p_owner, p_object_name
        FROM sys.all_synonyms
       WHERE owner = p_owner AND synonym_name = p_object_name;

      SELECT /*+ RULE */
            object_type
        INTO p_object_type
        FROM sys.all_objects
       WHERE owner = p_owner AND object_name = p_object_name AND ROWNUM <= 1;
   END LOOP;

   SELECT /*+ RULE */
         COUNT (*)
     INTO p_count
     FROM sys.all_procedures
    WHERE     owner = p_owner
          AND object_name = p_object_name
          AND (   procedure_name = p_proc_name
               OR p_proc_name IS NULL AND procedure_name IS NULL);

   IF p_count < p_overload
   THEN
      RAISE NO_DATA_FOUND;
   END IF;

   :owner := p_owner;
   :object_name := p_object_name;
   :procedure_name := p_proc_name;

   SELECT /*+ RULE */
         COUNT (*)
     INTO :param_count
     FROM sys.all_arguments
    WHERE     owner = p_owner
          AND (       p_proc_name IS NULL
                  AND package_name IS NULL
                  AND object_name = p_object_name
               OR     p_proc_name IS NOT NULL
                  AND package_name = p_object_name
                  AND object_name = p_proc_name
                  AND NVL (overload, 1) = p_overload)
          AND data_type IS NOT NULL;
END;

--//sql_id = 0tpg5r73s1146.

SELECT /*+ RULE */
      COUNT (*)
  FROM SYS.ALL_ARGUMENTS
 WHERE     OWNER = :B4
       AND (   :B2 IS NULL AND PACKAGE_NAME IS NULL AND OBJECT_NAME = :B3
            OR     :B2 IS NOT NULL
               AND PACKAGE_NAME = :B3
               AND OBJECT_NAME = :B2
               AND NVL (OVERLOAD, 1) = :B1)
       AND DATA_TYPE IS NOT NULL;

--//仔细比较可以发现sql_id = 0tpg5r73s1146正是前面存储过程最后要执行的语句.
--//我有点奇怪的是使用rule提示.这些存储过程是开发写的吗?还是程序后台自动生成执行的,这个也是我的疑问.
--//这样就很好理解我为什么看到db time很高了,db time的执行时间被重复计算了.

--//再看SQL ordered by Gets部分.
SQL ordered by Gets

Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
Total Buffer Gets: 8,982,272,506
Captured SQL account for 29.5% of Total

Buffer Gets  Executions  Gets per Exec  %Total  CPU Time (s)  Elapsed Time (s)  SQL Id                 SQL Module  SQL Text
945,581,260  164,771     5,738.76       10.53   2655.28       6345.51           gyc05t01rz49h  TGS2.exe    declare p_owner varchar(30) :=...
940,649,793  164,835     5,706.61       10.47   2600.58       5389.32           8cd1y3p3rnsys  TGS2.exe    declare p_owner varchar(30) :=...
940,453,081  164,791     5,706.94       10.47   2550.73       6109.69           0tpg5r73s1146  TGS2.exe    SELECT /*+ RULE */ COUNT(*) FR...
940,296,517  164,829     5,704.68       10.47   2562.72       5316.13           6x6aknt5p0uqd  TGS2.exe    SELECT /*+ RULE */ ARGUMENT_NA...
--//可以相互验证.其实每次并不是很多,仅仅57XX 逻辑读,但是执行频率非常高.
--//164771/3600 = 45.76次/秒.

3.解决方案:
--//设置 alter system set "_optimizer_ignore_hints"=true看看.

SYS@test> alter system set "_optimizer_ignore_hints"=true scope=memory;
System altered.

--//也可以在session 设置"_optimizer_ignore_hints"=true;参数. 通过登陆触发器设置.

--//分析数据字典以及固定对象.
execute sys.dbms_stats.GATHER_FIXED_OBJECTS_STATS()
execute sys.dbms_stats.GATHER_DICTIONARY_STATS()

--//可惜提出解决方案后,楼主一直没给出反馈建议.也不知道我分析是否正确.
--//还有1个非常无赖的方案就是玩一点点欺骗,改名视图sys.all_arguments,然后建立一张真实的表代替.
--//sys.all_arguments里面实际上记录调用函数,存储过程的参数.

--//还有我记忆遇到程序大量调用SYS.ALL_ARGUMENTS是使用deliph程序开发的.
--//不过我以前遇到的情况不同,执行语句里面没owner,而是直接写ALL_ARGUMENTS,这样更加简单,我直接在用户的schema下建议一张
--//ALL_ARGUMENTS表,来欺骗应用程序,风险就是如果改存储过程带入参数发生变化,ALL_ARGUMENTS也要同步更新.不知道不更新情况会怎
--//样??

--//当然解决完上述问题,还可能需要优化剩下的sql语句,不过lz没反馈,只能分析到这里了.比如:
a4g0jrkkr3ggn

--//格式化:
  SELECT COUNT (*) AS fail_QTY, T1.WORK_ORDER, T2.DEFECT_DESC
    FROM (SELECT a.recid
                ,a.DEFECT_id
                ,A.WORK_ORDER
                ,a.SERIAL_NUMBER
                ,a.rec_time
                ,B.Defect_Code
                ,ROW_NUMBER () OVER ( PARTITION BY A.WORK_ORDER, A.SERIAL_NUMBER ORDER BY A.rec_time ASC, B.Defect_Code DESC)
                    AS rowindex
            FROM sajet.G_SN_DEFECT a, sajet.sys_defect B
           WHERE     A.PROCESS_ID = :ProcessId
                 AND A.WORK_ORDER = :WO
                 AND a.defect_id = B.defect_id
                 AND a.SERIAL_NUMBER IN
                        (SELECT SERIAL_NUMBER
                           FROM (SELECT A.WORK_ORDER
                                       ,a.SERIAL_NUMBER
                                       ,a.CURRENT_STATUS
                                       ,a.OUT_PROCESS_TIME
                                       ,RANK () OVER ( PARTITION BY A.WORK_ORDER ,A.SERIAL_NUMBER ORDER BY a.OUT_PROCESS_TIME)
                                           AS rowindex
                                   FROM sajet.G_SN_TRAVEL a
                                  WHERE     A.WORK_ORDER = :WO
                                        AND A.PROCESS_ID = :ProcessId) b
                          WHERE     b.CURRENT_STATUS = 1
                                AND b.rowindex = 1
                                AND TO_CHAR ( b.OUT_PROCESS_TIME ,'yyyymmddHH24') BETWEEN :STime AND :ETime)) T1
        ,sajet.SYS_DEFECT T2
   WHERE T1.rowindex = 1 AND T1.DEFECT_id = T2.DEFECT_id
GROUP BY T1.WORK_ORDER, T2.DEFECT_DESC
ORDER BY COUNT (*) DESC;

--//使用TO_CHAR函数,以及内层IN使用分析函数是否多余,理论讲只要exists应该就可以了.看看是否有优化余地.

4.继续分析:
--//看看Load Profile部分:
Load Profile

                  Per Second    Per Transaction
Redo size:        338,989.88    2,438.14
Logical reads:    2,509,093.86  18,046.33
Block changes:    3,404.38      24.49
Physical reads:   284.63        2.05
Physical writes:  257.32        1.85
User calls:       975.01        7.01
Parses:           925.49        6.66
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hard parses:      33.63         0.24
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sorts:            940.09        6.76
Logons:           0.64          0.00
Executes:         10,063.58     72.38
Transactions:     139.04                           

--//硬分析不是很大.但是在SQL ordered by CPU Time部分出现:

CPU Time (s)  Elapsed Time (s)  Executions  CPU per Exec (s)  % Total  % Total DB Time  SQL Id           SQL Module     SQL Text
2,655         6,346             164,771     0.02              9.15     9.88             gyc05t01rz49h  TGS2.exe       declare p_owner varchar(30) :=...
2,601         5,389             164,835     0.02              8.96     8.39             8cd1y3p3rnsys  TGS2.exe       declare p_owner varchar(30) :=...
2,563         5,316             164,829     0.02              8.83     8.28             6x6aknt5p0uqd  TGS2.exe       SELECT /*+ RULE */ ARGUMENT_NA...
2,551         6,110             164,791     0.02              8.79     9.52             0tpg5r73s1146  TGS2.exe       SELECT /*+ RULE */ COUNT(*) FR...
848           984               30          28.26             2.92     1.53             67086ft5pmkbn                 DECLARE job BINARY_INTEGER := ...
582           1,174             6,607       0.09              2.01     1.83             04t2q2nf3pd4b  APService.exe  begin SAJET.SJ_CKRT_INPUTMATER...
579           1,165             6,607       0.09              1.99     1.81             cca7qc96c2k2h  APService.exe  SELECT COUNT (DISTINCT STATION...
293           736               128,775     0.00              1.01     1.15             7dupqruy76j0u  TGS2.exe       BEGIN COMMAND_CODE(:TSAJET1, :...
213           380               690         0.31              0.73     0.59             gdb3vjjk0m4xs  APService.exe  select max(END_TIME) from SAJE...
211           392               690         0.31              0.73     0.61             88jc1wztmqbky  APService.exe  select min(END_TIME) from SAJE...

--//sql_id=gdb3vjjk0m4xs
gdb3vjjk0m4xs     select max(END_TIME) from SAJET.G_WH_HR A where A.WORK_ORDER= '20405946' AND PROCESS_ID='100085' AND SHIFT_ID IN ( 10000017) AND SHIFT_DATE ='2020/07/21' AND PDLINE_ID = 10075

SELECT max(END_TIME)
  FROM SAJET.G_WH_HR A
 WHERE A.WORK_ORDER = '20405946'
   AND PROCESS_ID   = '100085'
   AND SHIFT_ID IN ( 10000017)
   AND SHIFT_DATE   = '2020/07/21'
   AND PDLINE_ID    = 10075
--//没有使用绑定变量,应该可以有优化余地.

88jc1wztmqbky     select min(END_TIME) from SAJET.G_WH_HR A where A.WORK_ORDER= '20405946' AND PROCESS_ID='100085' AND SHIFT_ID IN ( 10000017) AND SHIFT_DATE ='2020/07/21' AND PDLINE_ID = 10075

SELECT min(END_TIME)
  FROM SAJET.G_WH_HR A
 WHERE A.WORK_ORDER = '20405946'
   AND PROCESS_ID   = '100085'
   AND SHIFT_ID IN ( 10000017)
   AND SHIFT_DATE   = '2020/07/21'
   AND PDLINE_ID    = 10075
--//没有使用绑定变量,应该可以有优化余地.我估计程序应该有很多类似的语句.awr报表仅仅出现2次.

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/267265/viewspace-2710302/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
熟悉oracle相关技术,擅长sql优化,rman备份与恢复,熟悉linux shell编程。

注册时间:2008-01-03

  • 博文量
    2713
  • 访问量
    6519898