ITPub博客

首页 > 数据库 > Oracle > 物理dg - Monitoring Redo Generation and Managed Recovery Performance

物理dg - Monitoring Redo Generation and Managed Recovery Performance

Oracle 作者:yu5782647 时间:2014-01-20 13:20:39 0 删除 编辑
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Information in this document applies to any platform.

Goal

To collect diagnostics relevant to troubleshoot Physical Standby managed recovery performance issues.

Solution

The following action plan will provide additional diagnostics for troubleshooting performance issues in physical standby databases. Please note the scripts may need to be altered to suit the sites requirements. Ensure they are TESTED in an environment that is NOT production prior to executing them against production environments.


Data Guard Managed Recovery Performance Monitoring


1. Download and install OS Watcher per

     Note: 301137.1 OSWatcher Black Box

     in both primary and standby site (all nodes if using RAC) and supply OSW data taken at regular intervals (for example: 30 seconds) covering the slow performance period.
     Please create or edit private.net (from ExamplePrivate.net), add the host IP used by log shipping to gather network stats.

2. Download and install ProcWatcher per

    Note: 459694.1 Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes

    on the Standby Site, then gather a ProcWatcher Package as outlined in this Note


3. On the Primary site perform the following to gather redo generation metrics.
    As the RDBMS Home Owner: (in the sample command given below, the OS user is oracle), download DGscripts.zip from this note

$ mkdir DGPerf
$ cp DGscripts.zip DGPerf
$ cd DGPerf
$ unzip DGscripts.zip
$ chown oracle:oinstall *ksh
$ chmod u+x *ksh
Set the users shell environment using oraenv to the SID for the Primary site instance. Execute the redo size monitoring script:
$ ./rs.ksh 60 &e


4. On the Standby site,
4. 1 As the ASM/GRID Home Owner: (in the sample commands given below the OS user is grid), download asmiostat.zip from Note 437996.1 ASMIOSTAT Script to collect iostats for ASM disks

$ mkdir ASMPerf
$ cp asmiostat.zip ASMPerf
$ cd ASMPerf
$ unzip asmiostat.zip
$ chown grid:oinstall *ksh
$ chmod u+x *ksh
$ ./asmiostat.sh 30 > asmiostat.out &


4.2  As the RDBMS Home Owner: (in the sample commands given below the OS user is oracle), download DGscripts.zip from this note

$ mkdir DGPerf
$ cp DGscripts.zip DGPerf
$ cd DGPerf
$ unzip DGscripts.zip
$ chown oracle:oinstall *ksh
$ chmod u+x *ksh


4.3  As the RDBMS Home owner, (oracle), set the environment ORACLE_SID and ORACLE_HOME to the Standby instance.
    Cancel managed recovery:

sqlplus / as sysdba
SQL> recover managed standby database cancel
SQL> exit

    Start collecting the managed recovery performance metrics:

$ ./recStats.ksh 30 &
$ ./prWaits.ksh 30 &
$ ./stbyStats.ksh 30 &

    Start the managed recovery again

sqlplus / as sysdba
SQL> recover managed standby database using current logfile disconnect
SQL> exit


  Let this run for a minimum of 1 hour or at least 2 log switches while the performance problems are occurring.


5. Once the performance problems have occurred and diagnostic information is collected, stop the scripts on both sites:
    On the standby node

$ ps -ef | grep recStats
$ kill -9
$ ps -ef | grep prWaits
$ kill -9
$ ps -ef | grep stbyStats
$ kill -9
$ ps -ef | grep asmiostat
$ kill -9

   On the Primary node

$ ps -ef | grep rs.ksh
$ kill -9


6. Upload the following information:

a. the output files generated by the scripts written to the directories DGPerf and ASMPerf at each site.
b. the alert logs from each sites RDBMS and ASM instances (all instances for RAC)
c. the output per Note 241438.1 Script to Collect Data Guard Physical Standby Diagnostic Information from the standby
d. the output per Note 241374.1 Script to Collect Data Guard Primary Site Diagnostic Information from the primary
e. OSwatcher and ProcWatcher data covering the above diagnostic collection time.


Please Note : These scripts have been written by someone, who is not clear,  so I am unable to acknowledge this person or provide support for the scripts themselves. Should any problems be encountered these will need to be corrected through troubleshooting the issues yourself.


Manager Recovery Monitoring - Standby Site

prWaits.ksh

--------------

#!/bin/ksh
#
# prWaits.ksh gathers wait event data for the MRP (managed recovery process) and PR0n (parallel recovery slaves).
# The output is written to one file in the following format: __.
# By default the view data is sampled every 15 seconds.
#
if [ $# -eq 0 ]; then
  $0 5        # Run every 5 secs
elif [ $# -ne 1 ]; then
  echo "Usage: $0 "
  exit 1
fi
#
run_interval=$1
#
while [ 1 ]; do
sqlplus -s '/ as sysdba' << eof >> prWaits_`uname -n`_${ORACLE_SID}
set lines 200 pages 2000
col process format a8
col spid format a8
col event format a50 tru
col SIW format 999999
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time"
       ,s.process
       , p.spid
       , substr(s.program, -6) PROC
       , s.event
       , s.p1
       , s.p2
       , s.p3
       , s.seconds_in_wait SIW
       , s.seq#
from v\$session s, v\$process p
where p.addr = s.paddr and (s.program like '%MRP%' or s.program like '%PR0%' or s.program like '%DBW%' or s.program like '%CKPT%')
order by s.process
/
eof
sleep ${run_interval}
done



recStats.ksh

--------------

#!/bin/ksh
#
# RecStats.ksh gathers data from the following V$ views at defined sample intervals:
# - V$SYSTEM_EVENT
# - V$EVENT_HISTOGRAM
# - V$SYSSTAT
# The output is written to 3 files (one per view) in the following format: __.
# By default the view data is sampled every 15 seconds.
#
if [ $# -eq 0 ]; then
  $0 15 # Run every 15 secs
elif [ $# -ne 1 ]; then
  echo "Usage: $0 "
exit 1
fi
#
run_interval=$1
#
while [ 1 ]; do
sqlplus -s '/ as sysdba' << eof >> system_event_`uname -n`_${ORACLE_SID}
set linesize 250 pagesize 2000 numwidth 25
col event format a60
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time"
     , event
     , total_waits
     , total_timeouts
     , time_waited
     , average_wait
  from v\$system_event
order by time_waited
/
eof


sqlplus -s '/ as sysdba' << eof >> event_histogram_`uname -n`_${ORACLE_SID}
set linesize 250 pagesize 2000 numwidth 25
col event format a60
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time"
     , event
     , wait_time_milli
     , wait_count
  from v\$event_histogram
order by event, wait_time_milli
/
eof
sqlplus -s '/ as sysdba' << eof >> sysstat_`uname -n`_${ORACLE_SID}
set linesize 250 pagesize 2000 numwidth 25
col name format a60
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time"
     , name
     , value
  from v\$sysstat
/
eof
sleep ${run_interval}
done



Redo Generation Monitoring - Primary Site

rs.ksh

-------


#!/bin/ksh
if [ $# -eq 0 ]; then
  $0 30        # Run every 30 secs
elif [ $# -ne 1 ]; then
  echo "Usage: $0 "
  exit 1
fi
#
run_interval=$1
#
while [ 1 ]; do
sqlplus -s '/ as sysdba' << eof >> redoSize_`uname -n`_${ORACLE_SID}.out
set linesize 150 pagesize 2000
col value format 9999999999999999
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time", value
from v\$sysstat
where name = 'redo size'
/
eof
sleep ${run_interval}
done


Standby Recovery Progress Monitoring

stbyStats.ksh

---------------------
#!/bin/ksh
if [ $# -eq 0 ]; then
  $0 15        # Run every 15 secs
elif [ $# -ne 1 ]; then
  echo "Usage: $0 "
  exit 1
fi
#
run_interval=$1
#
while [ 1 ]; do
sqlplus -s '/ as sysdba' << eof >> recoveryData_`uname -n`_${ORACLE_SID}.out
set linesize 200 pagesize 2000
col units format a20
col comments format a15
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time"
     , ITEM
     , SOFAR
     , TOTAL
     , UNITS
     , to_char(TIMESTAMP,'DD-MON-YYYY HH24:MI:SS') "Timestamp"
from v\$recovery_progress
where total = 0
/
col name format a30
col value format a18
col unit format a35
col time_computed format a22
select
       NAME
     , VALUE
     , UNIT
     , DATUM_TIME
     , TIME_COMPUTED
from v\$dataguard_stats
order by time_computed
/
col inst_id format 99
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time"
     , INST_ID
     , PROCESS
     , PID
     , CLIENT_PROCESS
     , STATUS
     , THREAD#
     , SEQUENCE#
     , BLOCK#
     , BLOCKS
from gv\$managed_standby
order by inst_id
/
select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Current time"
     , GROUP#
     , THREAD#
     , SEQUENCE#
     , USED
from v\$standby_log
where STATUS = 'ACTIVE'
/
eof
sleep ${run_interval}
done

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/24577884/viewspace-1073055/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2013-05-15

  • 博文量
    32
  • 访问量
    222020