Oracle Database - Enterprise Edition - Version 188.8.131.52 and later
Information in this document applies to any platform.
The database version is 184.108.40.206
The log writer process on the database is blocking most of the other sessions and taking an average wait time of 10ms to commit.
On checking the hardware, the I/O wait time is very small from the hardware side
This means that all the time is on CPU processing the request.
Database is OLTP-based and has high commits/second
The same issue did not occur in 10g.
The problem seems to 'go away' temporarily after the database is restarted but ultimately returns after a few days.
AWR may show 'log file sync' waits as the highest wait:
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
log file sync 845,096 8,747 10 88.05 Commit
Hang analysis may show wait on log file sync as well:
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
Chain 1 Signature Hash: 0xcfe404d3
[b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
Chain 2 Signature Hash: 0xcfe404d3
[c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
Chain 3 Signature Hash: 0xcfe404d3
The stack of the LGWR process may look like following:
The database was upgraded from 10.2.0.4 to 220.127.116.11.1.
The following bug was filed for this issue:
Bug:13930580 Bug 13930580: LGWR IS BLOCKING SESSIONS
One of the findings in the bug indicated frequent switching back and forth between post/wait and polling as shown in the LGWR trace file:
*** 2012-04-24 19:07:09.319
kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0
*** 2012-04-24 19:07:09.319
Log file sync switching to post/wait
Current approximate redo synch write rate is 295 per sec
*** 2012-04-24 22:36:36.917
Warning: log write elapsed time 828ms, size 11KB
kcrfw_update_adaptive_sync_mode: post->poll long#=79 sync#=569 sync=531877
poll=10012 rw=406 rw+=409 ack=0 min_sleep=10012
In this case the bug recommended disabling the _use_adaptive_log_file_sync by changing it from its default setting of true to false as a possible solution.
This parameter was introduced in Oracle 11g. However, it was not turned on to true until 18.104.22.168.
The bug describes the adaptive switching controlled by the parameter in the following way:
With " _use_adaptive_log_file_sync " = true in 22.214.171.124 ,adaptive switching between post/wait and polling for log file sync is implemented. When polling is adaptively selected, the polling interval is also dynamically adjusted. If post/wait is selected and the foreground processes fail to receive a post from LGWR, an incident is recorded, diagnostic traces are performed, and polling is used instead of post/wait.
This implies that as there are no other symptoms of issues with I/O or in other areas, the problem could be with excessive switching between post/wait and polling wait methods.
Set the parameter _use_adaptive_log_file_sync = false and restart the database:
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync" = FALSE;
In cases where a restart is not feasible, then you can set in memory and also in the SP file for when a restart does occur:
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync"=FALSE; -- to set in memory
SQL> ALTER SYSTEM SET "_use_adaptive_log_file_sync"=FALSE scope=sfile sid='*'; -- to set on spfile, so parameter is used at next restart.
This will disable the adaptive switching.
If this solution does not alleviate the problem then see:
来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/13750068/viewspace-1799368/，如需转载，请注明出处，否则将追究法律责任。