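The Top 5 events suggest a possible log file problem: Avg Wait(ms) is high for both log file sync and log file parallel write.
The Top 5 events point to the log files as the basic tuning direction, so the analysis continues below.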
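The Avg Wait(ms) of log file parallel write exceeds 20, which as a rule of thumb indicates IO contention.
This shows that the redo log files are subject to IO contention.
Per hour = 9.95, i.e. roughly one log switch every 6 minutes, which is far more frequent than the commonly accepted rate of one switch every 15 to 20 minutes.
This shows that the redo log files are too small.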
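The interval arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation, not part of the original analysis; the function name `minutes_per_switch` and the constant name are made up here, and the 15-minute floor is the lower end of the 15 to 20 minute guideline.

```python
# Lower bound of the commonly cited 15-20 minute guideline (minutes per switch).
RECOMMENDED_MIN_INTERVAL = 15.0

def minutes_per_switch(switches_per_hour: float) -> float:
    """Average minutes between redo log switches, given the AWR
    'log switches (derived)' per-hour figure."""
    if switches_per_hour <= 0:
        raise ValueError("switches_per_hour must be positive")
    return 60.0 / switches_per_hour

interval = minutes_per_switch(9.95)
print(f"{interval:.2f} minutes per switch")                   # 6.03 minutes per switch
print("too frequent:", interval < RECOMMENDED_MIN_INTERVAL)   # too frequent: True
```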
User calls/(user commits+user rollbacks) < 30, which means the database is committing frequently.
1. The redo log files are subject to IO contention
2. The redo log files are too small
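1. log file parallel write IO contention: move the redo logs to disks with better IO performance. Because this is a live production system, no replacement is being made for now; the replacement should be planned.
2. log switches (derived):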
Appendix: reproduced from MOS document 1376916.1 (approach to troubleshooting log file sync)
Troubleshooting: "log file sync" Waits (Doc ID 1376916.1)
When a user session commits, the session's redo information needs to be flushed from memory to the redo logfile to make it permanent.
At the time of commit, the user session will post the LGWR to write the log buffer (containing the current unwritten redo, including this session's redo information) to the redo log file. When the LGWR has finished writing, it will post the user session to notify it that this has completed. The user session waits on 'log file sync' while waiting for LGWR to post it back to confirm all redo changes have made it safely on to disk.
The time between the user session posting the LGWR and the LGWR posting the user after the write has completed is the wait time for 'log file sync' that the user session will show.
Note that if a sync is ongoing, other sessions that want to commit (and thus flush log information) will also wait for the LGWR to sync and will also wait on 'log file sync'.
To initially analyse 'log file sync' waits the following information is helpful:
Waits for the 'log file sync' event can occur at any stage between a user process posting the LGWR to write redo information and the LGWR posting back the user process after the redo has been written from the log buffer to disk and the user process waking up.
For more information see:
Document:34592.1 WAITEVENT: "log file sync"
In terms of the most common causes, these are :
Details of these causes and how to troubleshoot them are outlined below:
Wait event 'log file parallel write' is waited for by LGWR while the actual write operation to the redo is occurring. The duration of the event shows the time waited for the IO portion of the operation to occur. For more information on "log file parallel write" see:
Document:34583.1 WAITEVENT: "log file parallel write" Reference Note
Looking at this event in conjunction with "log file sync" shows how much of the sync operation is spent on IO and also, by inference, how much processing time is spent on the CPU.
The example above shows high wait times for both 'log file sync' and 'log file parallel write'
If the proportion of the 'log file sync' time spent on 'log file parallel write' times is high, then most of the wait time is due to IO (waiting for the redo to be written). The performance of LGWR in terms of IO should be examined. As a rule of thumb, an average time for 'log file parallel write' over 20 milliseconds suggests a problem with IO subsystem.
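The comparison described above can be sketched as follows. This is a minimal illustration, not a method from the note: the function name, the 0.5 cutoff for a "high" proportion, and the input values are all assumptions; only the 20 ms rule of thumb comes from the text.

```python
IO_PROBLEM_MS = 20.0  # rule of thumb from the note for avg 'log file parallel write'

def assess_log_write_io(sync_avg_ms: float, write_avg_ms: float,
                        high_share: float = 0.5) -> dict:
    """Compare average 'log file sync' and 'log file parallel write' waits.

    high_share is an assumed cutoff for calling the IO proportion "high";
    the note does not give a number for it.
    """
    if sync_avg_ms <= 0:
        raise ValueError("sync_avg_ms must be positive")
    share = min(write_avg_ms / sync_avg_ms, 1.0)
    return {
        "io_share": share,                                   # fraction of sync time spent in the write
        "mostly_io": share >= high_share,                    # most of the wait is IO
        "io_subsystem_suspect": write_avg_ms > IO_PROBLEM_MS,
    }

# Hypothetical averages in milliseconds, not taken from any report in this article.
result = assess_log_write_io(sync_avg_ms=25.0, write_avg_ms=22.0)
print(result["mostly_io"], result["io_subsystem_suspect"])   # True True
```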
Even if the average wait for 'log file parallel write' may be in the normal range, there may be peaks where the write time is longer and will therefore influence waits on 'log file sync'. From 10.2.0.4, messages are written in the LGWR trace when a write to the log file takes more than 500 ms. This is quite a high threshold so a lack of messages does not necessarily mean there is no problem. The messages look similar to the following:
*** 2011-10-26 10:14:41.718
Warning: log write elapsed time 21130ms, size 1KB
(set event 10468 level 4 to disable this warning)
*** 2011-10-26 10:14:42.929
Warning: log write elapsed time 4916ms, size 1KB
(set event 10468 level 4 to disable this warning)
Note: Peaks like these may not have a high influence on the average 'log file parallel write' time if they are far apart. However, if hundreds of sessions are waiting for the 'log file parallel write' to complete, the total wait for 'log file sync' can be high, because the wait time is multiplied across all of those sessions. It is therefore worth investigating the reason for the high peaks in IO for the log writer.
Document:601316.1 LGWR Is Generating Trace file with "Warning: Log Write Time 540ms, Size 5444kb" In 10.2.0.4 Database
Recommendations
Note: These warnings can be particularly useful for preempting potential issues. Even if no general problem is visible in the average wait time, extreme peaks in IO performance give a DBA a useful indicator that LGWR is encountering intermittent issues. These can then be resolved before they cause outages or similar.
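To spot such peaks in bulk, the warning lines can be pulled out of an LGWR trace file with a regular expression. The sketch below assumes the exact message format shown in the excerpt above ("log write elapsed time ...ms, size ...KB"); other versions may word the message differently, and the function name is made up for illustration.

```python
import re

# Matches the warning format shown in the trace excerpt above.
WARNING_RE = re.compile(r"Warning: log write elapsed time (\d+)ms, size (\d+)KB")

def slow_log_writes(trace_text: str) -> list:
    """Return (elapsed_ms, size_kb) for each slow-write warning in the trace text."""
    return [(int(ms), int(kb)) for ms, kb in WARNING_RE.findall(trace_text)]

sample = """*** 2011-10-26 10:14:41.718
Warning: log write elapsed time 21130ms, size 1KB
(set event 10468 level 4 to disable this warning)
*** 2011-10-26 10:14:42.929
Warning: log write elapsed time 4916ms, size 1KB
(set event 10468 level 4 to disable this warning)
"""

writes = slow_log_writes(sample)
print(writes)                        # [(21130, 1), (4916, 1)]
print(max(ms for ms, _ in writes))   # 21130
```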
A 'log file sync' operation is performed every time the redo logs switch to the next log to ensure that everything is written before the next log is started. Standard recommendations are that a log switch should occur at most once every 15 to 20 minutes. If switches occur more frequently than this, then more 'log file sync' operations will occur meaning more waiting for individual sessions.
Thu Jun 02 14:57:01 2011
Thread 1 advanced to log sequence 2501 (LGWR switch)
Current log# 5 seq# 2501 mem# 0: /opt/oracle/oradata/orcl/redo05a.log
Current log# 5 seq# 2501 mem# 1: /opt/oracle/logs/orcl/redo05b.log
Thu Nov 03 14:59:12 2011
Thread 1 advanced to log sequence 2502 (LGWR switch)
Current log# 6 seq# 2502 mem# 0: /opt/oracle/oradata/orcl/redo06a.log
Current log# 6 seq# 2502 mem# 1: /opt/oracle/logs/orcl/redo06b.log
Thu Nov 03 15:03:01 2011
Thread 1 advanced to log sequence 2503 (LGWR switch)
Current log# 4 seq# 2503 mem# 0: /opt/oracle/oradata/orcl/redo04a.log
Current log# 4 seq# 2503 mem# 1: /opt/oracle/logs/orcl/redo04b.log
In the above example we see log switches every 2 to 4 minutes which is at best 5 times more frequent than the recommendations.
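The switch intervals can also be measured directly from the alert log. The following sketch assumes each "advanced to log sequence" message is immediately preceded by its timestamp line, as in the excerpt above, and that the timestamps use English day and month names; the function name is hypothetical. The sample uses the two Nov 03 entries from the excerpt.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"   # e.g. "Thu Nov 03 14:59:12 2011" (English locale)

def switch_intervals_minutes(alert_lines):
    """Minutes between consecutive log switch entries in alert log lines."""
    switch_times = []
    last_ts = None
    for raw in alert_lines:
        line = raw.strip()
        try:
            last_ts = datetime.strptime(line, TS_FORMAT)   # timestamp line
            continue
        except ValueError:
            pass                                           # not a timestamp line
        if "advanced to log sequence" in line and last_ts is not None:
            switch_times.append(last_ts)
    return [(later - earlier).total_seconds() / 60.0
            for earlier, later in zip(switch_times, switch_times[1:])]

sample = [
    "Thu Nov 03 14:59:12 2011",
    "Thread 1 advanced to log sequence 2502 (LGWR switch)",
    "Current log# 6 seq# 2502 mem# 0: /opt/oracle/oradata/orcl/redo06a.log",
    "Thu Nov 03 15:03:01 2011",
    "Thread 1 advanced to log sequence 2503 (LGWR switch)",
    "Current log# 4 seq# 2503 mem# 0: /opt/oracle/oradata/orcl/redo04a.log",
]
print([f"{m:.1f}" for m in switch_intervals_minutes(sample)])   # ['3.8']
```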
The example above shows that, based on the information in AWR, there are 29.98 redo log switches per hour: ~1 switch every 2 minutes. This is more frequent than the accepted value of 1 switch every 15-20 minutes and will have an effect on the time foreground processes need to wait for 'log file sync' waits to complete, because of the overhead of initiating the sync operation more often than necessary.
Increase the size of the redo logs
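One rough way to choose a new size is to scale the current size by the ratio of the target switch interval to the observed one. This is only a sketch under the assumption that the redo generation rate stays roughly constant; it is not a procedure taken from the note, and the function name and the 512 MB current size are hypothetical.

```python
import math

def suggested_redo_size_mb(current_size_mb: float, switches_per_hour: float,
                           target_minutes: float = 20.0) -> int:
    """Estimate a redo log size (MB) that would stretch switches to target_minutes.

    Assumes redo is generated at a constant rate, so the switch interval
    scales linearly with the log file size. Never suggests shrinking.
    """
    if current_size_mb <= 0 or switches_per_hour <= 0:
        raise ValueError("sizes and rates must be positive")
    observed_minutes = 60.0 / switches_per_hour
    scale = max(target_minutes / observed_minutes, 1.0)
    return math.ceil(current_size_mb * scale)

# Hypothetical current size of 512 MB with 29.98 switches per hour.
print(suggested_redo_size_mb(512, 29.98))   # 5117
```

Peak periods generate redo faster than the hourly average, so a figure like this is a starting point for testing rather than a final value.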
In this case the question to answer is "Is the Application Committing too Frequently?".
If it is, then the excessive commit activity can cause performance issues, since commits flush redo from the log buffer to the redo logs, which can cause waits for 'log file sync'.
To identify a potential high commit rate, if the average wait time for 'log file sync' is much higher than the average wait time for 'log file parallel write', then this means that most of the time waiting is not due to waiting for the redo to be written and thus slow IO is not the cause of the problem. The surplus time is CPU activity and is most commonly contention caused by over committing.
Additionally, if the average time waited on 'log file sync' is low, but the number of waits is high, then the application might be committing too frequently.
In the AWR or Statspack report, if the average user calls per commit/rollback calculated as "user calls/(user commits+user rollbacks)" is less than 30, then commits are happening too frequently:
In the above example we see an average of 5.76 user calls per commit, which indicates a high commit rate: about 5x higher than recommended.
As a rule of thumb, we should expect at least 25 user calls / commit. This of course depends on the application.
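The ratio can be computed from the AWR or Statspack counters as sketched below. The function name is made up for illustration, and the counter values are hypothetical, chosen only so that the result matches the 5.76 figure quoted above; the threshold of 30 is the one stated in the note.

```python
COMMIT_RATIO_THRESHOLD = 30.0   # from the note: fewer than 30 user calls per commit/rollback

def user_calls_per_commit(user_calls: int, user_commits: int, user_rollbacks: int) -> float:
    """user calls / (user commits + user rollbacks)"""
    transactions = user_commits + user_rollbacks
    if transactions <= 0:
        raise ValueError("no commits or rollbacks recorded")
    return user_calls / transactions

# Hypothetical counter values, not taken from a real report.
ratio = user_calls_per_commit(user_calls=57_600, user_commits=9_900, user_rollbacks=100)
print(f"{ratio:.2f}")                                         # 5.76
print("commits too frequent:", ratio < COMMIT_RATIO_THRESHOLD)   # commits too frequent: True
```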
Source: ITPUB Blog, link: http://blog.itpub.net/26442936/viewspace-767901/. If you repost this article, please credit the source; otherwise legal action may be pursued.