ITPub博客

首页 > 数据库 > Oracle > ORA-32701错误原因分析及处理方法

ORA-32701错误原因分析及处理方法

原创 Oracle 作者:xueshancheng 时间:2021-09-26 16:55:09 0 删除 编辑

1 在处理ORA-32701前,需要先了解以下数据库的hang管理器,参考Oracle的官方文档(Oracle Hang Manager (文档 ID 1534591.1))。

内容如下,可以认为如果数据库发生hang,一般情况数据库会自动处理的,但有时由于情况复杂,不能处理:

DETAILS   Overview

Hang Manager has existed since Oracle Database 10.2.0.1. The main goal of Hang Manager is to reliably detect and,

 if hang resolution is enabled, resolve hangs in a timely manner. Over various releases, Hang Manager has been 

enhanced along with the wait event infrastructure on which it relies. However, it is only in Oracle Database 11.2.0.2 

that Hang Manager actually resolves any hangs by terminating sessions and/or processes. This remains the default 

operation in Oracle Database 12.1.0.1.

Hang Manager is only active when Oracle RAC is enabled. You can determine whether an Oracle RAC database is 

used / enabled by querying the CLUSTER_DATABASE. If this parameter is set to "TRUE", Oracle RAC is enabled on 

this database.


Terminology

Cross Boundary Hang

This is a hang which spans the ASM/DB boundary.  In 12.1.0.1, it is possible for Hang Manager to detect hangs 

which span the boundary between a database and ASM or ASM and a database.

Deadlock or Closed Chain

A deadlock is a number of sessions making no progress which are in a closed chain or cycle. 

 All sessions have a waiter and a blocker.  The only way to break the chain is to cause one of the sessions to 

either progress or terminate.

Hang or Open Chain

From Hang Manager's perspective, a hang is a number of sessions making no progress which are in an open chain 

configuration.  This is not a cycle or deadlock.  An open chain or hang has a root or final blocker which is blocking 

all other sessions in the chain.  It also contains a final waiter which is at the end of the chain and is blocked by all 

other sessions in the chain.

Immediate Waiter 

The session in an open chain that is being blocked by the root of the hang.

Quality of Service (QoS) Management

Oracle Database QoS Management is an automated, policy-based product that monitors the workload requests 

for an entire system. Oracle Database QoS Management manages the resources that are shared across applications, 

and adjusts the system configuration to keep the applications running at the performance levels needed by your 

business.

Root (also known as Victim or Final Blocker)

This is the session in an open chain or hang that is blocking all other sessions in the chain.

Self-Resolved Hang

This is a hang or deadlock that was detected by Hang Manager but no longer exists.  This could either be because 

the root or immediate waiter have progressed or either or both of them have been terminated but not by Hang 

Manager.

Hang Manager in 12.1.0.1

Since Oracle Database 11.2.0.2, Hang Manager may resolve hangs it detects by terminating the root session or 

process.  By default, Hang Manager will not terminate an instance or evict a node. Also, Hang Manager will not 

resolve all hangs that it detects. For example, hangs which involve a possible application issue are left to the user 

to determine the correct course of action.  Also, if Hang Manager determines that the instance on which the root 

of a hang resides is experiencing high CPU or IO load, hang resolution will be delayed.  This gives some time for 

the root to progress and the hang to resolve itself.  Finally, Hang Manager currently does not resolve any hangs 

on ASM.

Starting with Oracle Database 12.1.0.1 and if QoS is active on the cluster, Hang Manager will use additional

 information provided by QoS to determine if a hang should be ignored or resolved.  If QoS would like Hang 

Manager to resolve a hang, hang resolution may occur more quickly than Hang Manager's normal detection 

and resolution times or hang resolution may be delayed.

Previous to 12.1.0.1, hangs were detected only within the database or ASM cluster.  However, if a session 

on ASM or the database was being blocked by a session on the other side of the ASM/Database boundary

 (a Cross Boundary Hang), there was no way to detect it.  In Oracle Database 12.1.0.1, Hang Manager was

 enhanced to detect hangs which cross the ASM/Database boundary and outputs information to its various 

trace files so it is easier to troubleshoot these hangs.


同hang管理器相关的视图:

V$ Views

Hang Manager exposes some information via various V$ views.  These are now documented in the Oracle 

Database Reference.  A summary of the information in these views is below. For a more detailed description of these views, please go to the appropriate section in the Oracle Database Reference.

V$HANG_INFO

This view contains the active hangs and deadlocks detected by Hang Manager. It does not include any hangs 

which have been resolved by Hang Manager or have self-resolved. Only the latest 32 active hangs are in this view.

The root or victim of the hang is included in the information for each active hang in the view.

V$HANG_SESSION_INFO

This view contains the sessions in the main chain for all of the hangs included in the V$HANG_INFO view. 

Only the first 20 sessions of the main chain, including the root of the hang, for each hang are kept in this view.

(G)V$HANG_STATISTICS

This view contains various Hang Manager statistics regarding hangs or deadlocks which it has detected. These statistics include: the number of hangs detected, the number of deadlocks detected, the number of hangs resolved by Hang Manager, the number of hangs ignored broken down by the reason for ignoring the hang, the number of hangs which self-resolved, etc.


2 数据库的告警,查看发生hang的相关信息

节点1:

Sun Sep 27 00:31:23 2020

Errors in file /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc  (incident=920985):

ORA-32701: Possible hangs up to hang ID=1 detected                    --hang ID=1 说明数据库自动诊断出Hang

Incident details in: /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc

DIA0 requesting termination of session sid:1244 with serial # 75 (ospid:88984) on instance 3  

--请求终止实例3的会话,操作系统PID:88984

     due to a LOCAL, HIGH confidence hang with ID=1.

     Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

DIA0: Examine the alert log on instance 3 for session termination status of hang with ID=1.  

--发现实例3的会话阻塞住会话,检查实例3的alert日志



节点3:


Sun Sep 27 00:31:23 2020

Errors in file /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc  (incident=880256):

ORA-32701: Possible hangs up to hang ID=1 detected                        --hang ID=1 说明数据库自动诊断出Hang

Incident details in: /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc

DIA0 terminating blocker (ospid: 88984 sid: 1244 ser#: 75) of hang with ID = 1    

 --本节点Hang住了,故终止了此会话

    requested by master DIA0 process on instance 1                                    --请求的进程来自于实例1

    Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

    by terminating session sid:1244 with serial # 75 (ospid:88984)                      --终止了操作系统进程


首先根据hang ID=1 和 hang ID=2  查看如下视图,此视图会解释每个代码的含义:

SYS@test3>select * from V$HANG_STATISTICS;


STATISTIC# NAME                                               VALUE

---------- --------------------------------------------- ----------

         0 number of deadlocks detected and ignored               0

         1 number of hangs detected                               0

         2 number of local hangs                                  0

         3 number of global hangs                                 0

         4 number of transient hangs                              0

         5 hangs ignored due to high CPU on root's node           0

         6 hangs ignored due to high IO on root's node            0

         7 hangs ignored due to application contention            0

         8 hangs ignored due to long running operations           0

         9 hangs monitored due to archiving issues                0

        10 hangs ignored due to archiving issues                  0

        11 hangs ignored, blocked by remote database              0

        12 hangs ignored due to SQL parsing                       0

        13 hangs ignored due to dumping system state              0

        14 hangs ignored, instance termination required           0

        15 hangs ignored, only one active instance                0

        16 number of explicitly resolved hangs                    0

        17 number of self-resolved hangs                          0

        18 total self-resolved hang time in seconds               0

        19 minimum self-resolved hang time in seconds             0

        20 maximum self-resolved hang time in seconds             0

        21 number of HSC matched hangs                            0

        22 hangs resolved due to instance termination             0


23 rows selected.


4 分析数据库生成的trace文件,找出报错的根本原因

节点1 查看test1_dia0_25054_i920985.trc

Dump file /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_920985/test1_dia0_25054_i920985.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1

System name: Linux

Node name: h0783

Release: 2.6.32-642.el6.x86_64

Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016

Machine: x86_64

Instance name: test1

Redo thread mounted by this instance: 1

Oracle process number: 10

Unix process pid: 25054, image: oracle@h0783 (DIA0)



*** 2020-09-27 00:31:23.807

*** SESSION ID:(201.1) 2020-09-27 00:31:23.807

*** CLIENT ID:() 2020-09-27 00:31:23.807

*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.807

*** MODULE NAME:() 2020-09-27 00:31:23.807

*** ACTION NAME:() 2020-09-27 00:31:23.807

 

Dump continued from file: /u01/app/oracle/diag/rdbms/test/test1/trace/test1_dia0_25054.trc

ORA-32701: Possible hangs up to hang ID=1 detected


========= Dump for incident 920985 (ORA 32701) ========

----- Beginning of Customized Incident Dump(s) -----

 

One or more resolvable hangs have been detected on your system.

An attempt will be made to resolve these hangs.

 

The stack included in this incident file is not the cause of the

problem.  It is merely pointing to the module that triggered the

incident.  The short stacks of the root or final blocked of the

hang and its waiter can be found in the DIA0 trace files on their

respective local instances.

 

'Resolvable Hangs' below indicate one or more hangs that were found

and identify the final blocking session and instance on which

they occurred. Since the current hang resolution state is 'PROCESS',

any hangs requiring session or process termination will be

automatically resolved.

 

The following information will assist Oracle Support Services

in further analysis of the root cause of the hang.

 


*** 2020-09-27 00:31:23.809

Resolvable Hangs in the System

                     Root       Chain Total               Hang               

  Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution         

    ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action             

 ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------

      1 HANG RSLNPEND    3  1244     2     2   HIGH  LOCAL Terminate Process     

 --此处说明Hang的会话在节点3上,会话的ID为1244

  Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

 

  inst# SessId  Ser#     OSPID PrcNm Event

  ----- ------ ----- --------- ----- -----

      3   2562   353     28939  M001 enq: WF - contention

      3   1244    75     88984  M000 not in wait                                 --此处说明会话1244  等待是事件为 not in wait

 

  The incident file on instance 3 may contain information

  about session 1244 with serial number 75 and operating system

  process ID 88984.  The incident file may contain a short stack

  and a process state dump.

 

Victim Information

                                                                      Ignored

  HangID  Inst#  Sessid  Ser Num      OSPID  Fatal BG  Previous Hang    Count

  ------  -----  ------  -------  ---------  --------  -------------  -------

       1      3    1244       75      88984     N      Existing Hang        1     


*** 2020-09-27 00:31:23.809

 

Wait-For-Graphs collected at 09/27/20 00:25:31)                                 --以下信息说明搜集的图形化阻塞的相关信息

===============================================================================

Non-intersecting chains:

 

-------------------------------------------------------------------------------

Chain 1:         --Chain 1 说明会话2562等待的事件为:'enq: WF - contention' ,被1244会话阻塞住,

1244会话的等待事件为 not int wait,但等待:event: 'gc current grant 2-way'

-------------------------------------------------------------------------------

    Oracle session identified by:

    {

                instance: 3 (test.test3)

                   os id: 28939

              process id: 512, oracle@h0785 (M001)

              session id: 2562

        session serial #: 353

    }

    is waiting for 'enq: WF - contention' with wait info:

    {

                      p1: 'name|mode'=0x57460006

                      p2: '0'=0x38

                      p3: '0'=0x0

            time in wait: 1 min 53 sec

           timeout after: never

                 wait id: 73008

                blocking: 0 sessions

            wait history:

              * time between current wait and wait #1: 0.003403 sec

              1.       event: 'db file sequential read'

                 time waited: 0.000226 sec

                     wait id: 73007           p1: 'file#'=0x2

                                              p2: 'block#'=0x1eeff

                                              p3: 'blocks'=0x1

              * time between wait #1 and #2: 0.000390 sec

              2.       event: 'db file sequential read'

                 time waited: 0.000207 sec

                     wait id: 73006           p1: 'file#'=0x2

                                              p2: 'block#'=0xe178

                                              p3: 'blocks'=0x1

              * time between wait #2 and #3: 0.002593 sec

              3.       event: 'db file scattered read'

                 time waited: 0.000276 sec

                     wait id: 73005           p1: 'file#'=0x2

                                              p2: 'block#'=0x1eefb

                                              p3: 'blocks'=0x2

    }

    and is blocked by

 => Oracle session identified by:

    {

                instance: 3 (test.test3)

                   os id: 88984

              process id: 446, oracle@h0785 (M000)

              session id: 1244

        session serial #: 75

    }

    which is not in a wait:

    {

               last wait: 1165 min 7 sec ago

                blocking: 1 session

            wait history:

              1.       event: 'gc current grant 2-way'    --此类等待事件为数据库从内存中读取数据块,发现没有,

然后从磁盘中读到内存中。故认为此等待事件正常。结合实际查询,可以认为2562的会话被1224会话阻塞住,

但由于1224会话正在从磁盘大量的读取数据块到内存中,导致的阻塞。经细查,2562会话当时操作的SQL为:

delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap 

and dbid = :dbid) and not exists  (select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) 

and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id)) ,

 1224会话操作的SQL为: insert into wrh$_sql_bind_metadata  ,两个会话都是操作AWR相关的表,故有可能导致阻塞。

                 time waited: 0.000321 sec

                     wait id: 1878            p1: ''=0x2

                                              p2: ''=0x2a1a6

                                              p3: ''=0x2010001

              * time between wait #1 and #2: 0.000204 sec

              2.       event: 'gc current grant 2-way'

                 time waited: 0.000384 sec

                     wait id: 1877            p1: ''=0x2

                                              p2: ''=0x39862

                                              p3: ''=0x2010001

              * time between wait #2 and #3: 0.000523 sec

              3.       event: 'gc current block 3-way'

                 time waited: 0.000692 sec

                     wait id: 1876            p1: ''=0x2

                                              p2: ''=0x29dea

                                              p3: ''=0x1

    }

 

Chain 1 Signature: <not in a wait><='enq: WF - contention'

Chain 1 Signature Hash: 0xee4c6e02

-------------------------------------------------------------------------------

 

===============================================================================

Sessions in an involuntary wait or not in a wait:

 

查看实例3的test3_dia0_22856_i880256.trc

Dump file /u01/app/oracle/diag/rdbms/test/test3/incident/incdir_880256/test3_dia0_22856_i880256.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/db_1

System name: Linux

Node name: h0785

Release: 2.6.32-642.el6.x86_64

Version: #1 SMP Wed Apr 13 00:51:26 EDT 2016

Machine: x86_64

Instance name: test3

Redo thread mounted by this instance: 3

Oracle process number: 10

Unix process pid: 22856, image: oracle@h0785 (DIA0)



*** 2020-09-27 00:31:23.982

*** SESSION ID:(201.1) 2020-09-27 00:31:23.982

*** CLIENT ID:() 2020-09-27 00:31:23.982

*** SERVICE NAME:(SYS$BACKGROUND) 2020-09-27 00:31:23.982

*** MODULE NAME:() 2020-09-27 00:31:23.982

*** ACTION NAME:() 2020-09-27 00:31:23.982

 

Dump continued from file: /u01/app/oracle/diag/rdbms/test/test3/trace/test3_dia0_22856.trc

ORA-32701: Possible hangs up to hang ID=1 detected


========= Dump for incident 880256 (ORA 32701) ========

----- Beginning of Customized Incident Dump(s) -----

 

There are resolvable hangs on your system.  Hang Manger will

attempt to resolve these hangs.  Some information about these

is output below.  Complete information is available in an

incident trace file on instance 1.

 

If the hang is to be resolved by terminating the session or

process that is the root or victim of the hang, additional

information will be output on the local instance of that

session or process.  Below are the hangs for which resolution

will be attempted.

 


*** 2020-09-27 00:31:23.983

Resolvable Hangs in the System

                      Root       Chain Total               Hang               

   Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution         

     ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action             

  ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------

      1 HANG RSLNPEND    3  1244     2     2   HIGH  LOCAL Terminate Process  

  Hang Resolution Reason: Although the number of affected sessions did not

    justify automatic hang resolution initially, this previously ignored

    hang was automatically resolved.

 

      inst# SessId  Ser#     OSPID PrcNm Event

      ----- ------ ----- --------- ----- -----

          3   2562   353     28939  M001 enq: WF - contention

          3   1244    75     88984  M000 not in wait

 

Dumping process info of pid[446.88984] (sid:1244, ser#:75)

    requested by master DIA0 process on instance 1.

 


*** 2020-09-27 00:31:23.983

Process diagnostic dump for oracle@h0785 (M000), OS id=88984,

pid: 446, proc_ser: 32, sid: 1244, sess_ser: 75 

-------------------------------------------------------------------------------

os thread scheduling delay history: (sampling every 1.000000 secs)

  0.000000 secs at [ 00:31:23 ]

    NOTE: scheduling delay has not been sampled for 0.647808 secs  0.000000 secs from [ 00:31:19 - 00:31:24 ], 5 sec avg

  0.000000 secs from [ 00:30:24 - 00:31:24 ], 1 min avg

  0.000000 secs from [ 00:26:24 - 00:31:24 ], 5 min avg

loadavg : 3.04 3.01 2.86

Memory (Avail / Total) = 363436.51M / 516750.80M

Swap (Avail / Total) = 65536.00M /  65536.00M

F S UID         PID   PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD

0 R oracle    88984      1 99  80   0 - 315843 ?     Sep26 ?        19:28:37 ora_m000_test3

Short stack dump:  --此处为trace生成的二进制错误的堆键信息

ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kghalf()+247<-kqlfbctc()+645<-kgligi()+31<-kglic_cbk()+345<-kglic0()+751<-kglic()+74

<-kqlfbct()+376<-qerfxFetch()+3072<-qerjotRowProc()+359<-qerhjInnerProbeHashTable()+491<-qerfxFetch()+1106<-rwsfcd()+103<-qerhjFetch()+621<-qerjotFetch()+2025

<-rwsfcd()+103<-qerltcFetch()+1223<-insexe()+691<-opiexe()+5632<-kpoal8()+2380<-opiodr()+917<-kpoodrc()+36<-rpiswu2()+1776<-kpoodr()+636<-upirtrc()+2436<-kpurcsc()+98

<-kpuexec()+10790<-OCIStmtExecute()+39<-kewrose_oci_stmt_exec()+55<-kewrgwxf1_gwrsql_exft_1()+317<-kewrgwxf_gwrsql_exft()+496<-kewrews_execute_wr_sql()+52

<-kewrftbs_flush_table_by_sql()+180<-kewrfabt_flush_attribute_table()+997<-kewrfsb_flush_sqlbind()+34<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766

<-kewrftsq_flush_topsql()+768<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+1406<-kewrfsr_flush_snapshot_r()+191

<-kewrrfs_remote_flush_slave()+1169<-kebm_slave_main()+586<-ksvrdp()+1766<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201

<-__libc_start_main()+253

 

-------------------------------------------------------------------------------

Process diagnostic dump actual duration=0.340000 sec

  (max dump time=15.000000 sec)


*** 2020-09-27 00:31:24.319   --以下为报错是执行想相关SQL,相关的表为 insert into wrh$_sql_bind_metadata  

current sql: insert into wrh$_sql_bind_metadata   (snap_id, dbid,    sql_id, name, position, dup_position,    

datatype, datatype_string,    character_sid, precision, scale, max_length)  SELECT /*+ ordered use_nl(bnd) 

index(bnd sql_id) */      :lah_snap_id, :dbid,      bnd.sql_id, name, position, dup_position,      datatype, dataty

 

                    ----------------------------------------

                    SO: 0x1ec14370a8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3

                     proc=0x1ec14370a8, name=process, file=ksu.h LINE:12721, pg=0

                    (process) Oracle pid:446, ser:32, calls cur/top: 0x1d32cfb3c8/0x1d33002b20

                              flags : (0x2) SYSTEM

                              flags2: (0x30),  flags3: (0x10) 

                              intr error: 0, call error: 0, sess error: 0, txn error 0

                              intr queue: empty

                      ksudlp FALSE at location: 0


5  根据第二步的会话ID,查询在故障时间点执行的SQL,根据如下信息,可以确定在故障点2562会话确实被1244会话阻塞住:


SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID from 

dba_hist_active_sess_history where     INSTANCE_NUMBER=3 and SESSION_ID=1244 and

     SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and 

to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS')  order by 3;


SESSION_ID SESSION_SERIAL# SAMPLE_TIME                           WAIT_TIME BLOCKING_SESSION SQL_ID

---------- --------------- ------------------------------------ ---------- ---------------- -------------

      1244              75 27-SEP-20 12.20.08.580 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.20.18.620 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.20.28.640 AM                   321                  82h7w3w07k63z

...........................

      1244              75 27-SEP-20 12.27.40.744 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.27.50.854 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.00.894 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.10.931 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.20.971 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.28.31.021 AM                   321                  82h7w3w07k63z

...........................

      1244              75 27-SEP-20 12.32.42.270 AM                   321                  82h7w3w07k63z

      1244              75 27-SEP-20 12.32.52.370 AM                   321                  82h7w3w07k63z


77 rows selected. 


SYS@test3>select SESSION_ID,SESSION_SERIAL#,SAMPLE_TIME,WAIT_TIME,BLOCKING_SESSION,SQL_ID 

from dba_hist_active_sess_history where 

   INSTANCE_NUMBER=3 and SESSION_ID=2562 and

  SAMPLE_TIME between to_timestamp('2020-09-27 00:20:00','yyyy-mm-dd HH24:MI:SS') and 

to_timestamp('2020-09-27 00:40:00','yyyy-mm-dd HH24:MI:SS')

    order by 3; 


SESSION_ID SESSION_SERIAL# SAMPLE_TIME                        WAIT_TIME BLOCKING_SESSION SQL_ID

---------- --------------- --------------------------------- ---------- ---------------- -------------

      2562             347 27-SEP-20 12.20.28.640 AM                  3                  0v9v22h2nuc0q

      2562             353 27-SEP-20 12.22.29.241 AM                  0                  7gfhbhxrxx8wf

      2562             353 27-SEP-20 12.22.39.291 AM                408                  9v9n97qj8z1dg

      2562             353 27-SEP-20 12.22.49.421 AM                  0                  bzscyq07w79ab

      2562             353 27-SEP-20 12.22.59.451 AM                  0                  bzscyq07w79ab

      2562             353 27-SEP-20 12.23.09.511 AM                  0                  bzscyq07w79ab

      2562             353 27-SEP-20 12.23.19.551 AM                  0                  2whm2vvjb98k7

      2562             353 27-SEP-20 12.23.29.581 AM                186                  acc988uzvjmmt

      2562             353 27-SEP-20 12.23.39.621 AM                  0             1244

      2562             353 27-SEP-20 12.23.49.731 AM                  0             1244

      2562             353 27-SEP-20 12.23.59.751 AM                  0             1244

      2562             353 27-SEP-20 12.24.09.810 AM                  0             1244

.....................

      2562             353 27-SEP-20 12.32.32.240 AM                  0             1244

      2562             353 27-SEP-20 12.32.42.270 AM                  0             1244

      2562             353 27-SEP-20 12.32.52.370 AM                  0             1244

      2562             353 27-SEP-20 12.33.02.420 AM                149                  18yuw9npdrxwc

      2562             353 27-SEP-20 12.33.12.440 AM                  0                  9wygvu6cx2npy

      2562             353 27-SEP-20 12.33.22.480 AM                  0                  9wygvu6cx2npy

      2562             353 27-SEP-20 12.33.32.510 AM                  0                  gkbrnf9xhv80g


68 rows selected. 


acc988uzvjmmt 执行的SQL:

delete from WRH$_SYSMETRIC_HISTORY tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap 

and dbid = :dbid) and not exists 

(select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id) 

and (tab.snap_id <= b.end_snap_id))


6 查询数据库的AWR生成信息,发现在故障点不能生成AWR:如下


Listing the last 3 days of Completed Snapshots


                                                        Snap

Instance     DB Name        Snap Id    Snap Started    Level

------------ ------------ --------- ------------------ -----

test3       test            46313 26 Sep 2020 00:00      1

                              46314 26 Sep 2020 01:00      1


                              46315 26 Sep 2020 02:00      1

                              46316 26 Sep 2020 03:00      1

                              46317 26 Sep 2020 04:00      1    --26日的4点 直接到27日的7:00

                              46325 27 Sep 2020 07:00      1

                              46326 27 Sep 2020 08:00      1

                              46327 27 Sep 2020 09:00      1

                              46328 27 Sep 2020 10:00      1

                              46329 27 Sep 2020 11:00      1

                              46330 27 Sep 2020 12:00      1

                              46331 27 Sep 2020 13:00      1

                              46332 27 Sep 2020 14:00      1






7 查询x$KQLFBC, 发现等待2小时都无法出来结果,通过追踪,找到执行计划,发现此表由1.4亿(138*1024*1024)条数据,

故导致运行2小时无法查询出结果



SYS@test3> select count(*) from x$kqlfbc;


Plan hash value: 4012607404


----------------------------------------------------------------------

| Id  | Operation         | Name     | Rows  | Cost (%CPU)| Time     |

----------------------------------------------------------------------

|   0 | SELECT STATEMENT  |          |       |    18 (100)|          |

|   1 |  SORT AGGREGATE   |          |     1 |            |          |

|   2 |   FIXED TABLE FULL| X$KQLFBC |   138K|    18 (100)| 00:00:01 |

----------------------------------------------------------------------



14 rows selected.



SYS@test3>select count(*) from x$ksmsp;


  COUNT(*)

----------

   3102652


8 由于ORA-32701故障发生在wrh$_sql_bind_metadata及 x$kqlfbc表上,是由于系统有大量的绑定变量引起的,

故查询如下参数,发现如下参数设置太小,需要调整。 

SYS@test1>show parameter cursor


NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cursor_bind_capture_destination      string      memory+disk

cursor_sharing                       string      EXACT

cursor_space_for_time                boolean     FALSE

open_cursors                         integer     300

session_cached_cursors               integer     500

        

PARAMETER              VALUE      USAGE

---------------------- ---------- -----

session_cached_cursors   500       258%

open_cursors             300       432%



HIGHEST_OPEN_CUR MAX_OPEN_CUR

---------------- --------------------

             246 300



PARAMETER                 VALUE                          USAGE

-------------------------------------------------------- ---------------

session_cached_cursors      500                           212%

open_cursors                300                           356%


                                                                                                 


HIGHEST_OPEN_CUR MAX_OPEN_CUR

---------------- --------------------

             236 300


PARAMETER              VALUE      USAGE

---------------------- ---------- -----

session_cached_cursors   500       249%

open_cursors             300       418%


ORA-32701问题的解决方法,以下的解决方法,是根据所有的Oracle官方文档总结出来的,

需要根据实际情况采用不同的方法,一般情况下一种方法不行,就使用第二种,直到解决问题。

一般情况下重启数据库实例就可以临时的解决此问题,但生产环境一般是不允许重启实例的,故需要

使用不重启实例的方法解决。



9.1 搜集如下表的统计信息

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW');

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRTSQLPLAN');

exec dbms_stats.gather_table_stats('SYS', 'X$KEWRSQLIDTAB');

exec dbms_stats.gather_fixed_objects_stats;

exec dbms_stats.gather_schema_stats('SYS');


查询相关表的统计信息是否过期:    

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRATTRNEW';

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRTSQLPLAN';

select OWNER, TABLE_NAME, LAST_ANALYZED from dba_tab_statistics where table_name='X$KEWRSQLIDTAB';



9.2 有时此表x$KQLFBC的数据量会特别大,会导致查询不出来或不能生成AWR,故需要通过设置隐含参数和刷共享池来解决

SYS@test1>select count(1) from x$KQLFBC;


  COUNT(1)

----------

    113714

    

select count(*) from  X$KQLFBC;


刷共享池

alter system flush shared_pool;

设置隐含参数

alter system set "_awr_disabled_flush_tables"='wrh$_sql_bind_metadata';



部署自动刷新shared的脚本


10 8,18 * * *  /home/oracle/database_check_sxwy/scripts/flush_shared.sh    


#!/bin/bash                                                                                      

export ORACLE_SID=test1                                                                         

sqlplus / as sysdba <<EOF

alter system flush shared_pool;                           

exit;                                                                                            

EOF


9.3 第三种禁用方法

先查询AWR Table Info (KEW layer)使用下面的SQL:


select table_id_kewrtb, table_name_kewrtb from x$kewrtb order by 1;


TABLE_ID_KEWRTB TABLE_NAME_KEWRTB

--------------- ----------------------------------------------------------------

              0 WRM$_DATABASE_INSTANCE

              1 WRM$_SNAPSHOT

              2 WRM$_BASELINE

              3 WRM$_WR_CONTROL

              4 WRH$_STAT_NAME

。。。。。。。。。。

             53 WRH$_TABLESPACE_SPACE_USAGE

             54 WRH$_METRIC_NAME

             55 WRH$_SYSMETRIC_SUMMARY

             56 WRH$_SQL_BIND_METADATA

             57 WRH$_JAVA_POOL_ADVICE

             58 WRH$_THREAD

             59 WRH$_SESSMETRIC_HISTORY

             60 WRH$_OSSTAT

             61 WRH$_OSSTAT_NAME

             62 WRH$_SYS_TIME_MODEL

             63 wri$_optstat_opr

             64 WRH$_OPTIMIZER_ENV

             65 WRM$_SNAP_ERROR

             66 WRH$_SERVICE_WAIT_CLASS

             67 WRH$_CR_BLOCK_SERVER

。。。。。。。。。。。

            115 WRM$_WR_USAGE

            116 WRH$_MVPARAMETER

            117 WRH$_PERSISTENT_QMN_CACHE

            118 WRH$_TABLESPACE

            119 WRM$_SNAPSHOT_DETAILS

            120 WRI$_SQLTEXT_REFCOUNT


121 rows selected.



从上面的查询中得到table_id用于下面的level 值,比如我们这里表wrh$_sql_bind_metadata对应的是56


alter session set events ‘immediate trace name awr_flush_table_off level 56’;


后期如果想打开flushing to awr 如果第二种方法禁用,打开使用


alter session set events ‘immediate trace name awr_flush_table_on level 56’ off;


9.4 第四种解决方法为数据库打相关的补丁。


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/69996316/viewspace-2793845/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
本人目前就职于北京海天起点技术服务有限股份公司,从事Oracle数据库有十几年了,对Oracle及goldengate比较精通。

注册时间:2021-03-11

  • 博文量
    38
  • 访问量
    9249