100 Things You Probably Didn't Know About Oracle Database

Original post | Category: Linux operating systems | Author: zecaro | Date: 2011-02-11 21:51:20

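Preface:

        Recently I wanted to get to know overseas blogs about Oracle. I happened to see an article Arup wrote for OTN, so I visited his blog and found that he had just started this series: 100 Things You Probably Didn't Know About Oracle Database.

        I read it carefully and liked it a lot. The content is basic, but the reasoning is clear and the explanations are thorough. Later articles also include hands-on ways to verify the claims, somewhat in Tom's style. The comments and discussion under each article are well worth reading too. Sharing it here.

        This installment is mainly about how Oracle "writes": writing redo, writing to disk and related topics.

        Here is my translation of the summary at the end of the article, with a few additions:

(1) Flushing data buffers to disk and the commit statement are independent of each other. A commit does not trigger a flush of data buffers to disk.
(2) Dirty buffer: a buffer in the buffer cache that has been modified but not yet flushed to disk.
(3) If a buffer is clean (that is, not dirty), it does not mean that the statements that changed the data have been committed.
(4) A commit statement triggers the redo log buffer (not the data buffers) to be written to disk.
(5) The redo log buffer may already have been written to disk because of other triggering events (not only commit). So what is recorded in the redo log file is not necessarily committed. (For example, LogMiner has a way to display commit statements.)
(6) A commit statement records a special "commit marker" in the redo log.
(7) How often data buffers are flushed to disk depends on the MTTR (Mean Time To Recover) setting and on whether free buffers are needed to cache blocks newly read from the datafiles.

   So how can you use this information? Consider the following:

(1) The more often you commit, the more often the log buffer is written to disk, which is not good for I/O.
(2) The smaller the MTTR setting, the shorter the recovery time after an instance crash, but the more frequently the datafiles are written, which means more I/O.
(3) MTTR has nothing to do with commit frequency; the two are independent. So committing less often will not reduce how often data is flushed to disk.
(4) If the buffer cache is small, data has to be flushed to disk more often.
(5) How small is "small"? There is no fixed formula. It depends on how much of the data in the buffer cache has been updated.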

 


    100 Things You Probably Didn't Know About Oracle Database
    Author: Arup

Recently, while delivering a presentation on Cache Fusion at New York Oracle Users Group (www.nyoug.org), the regional user group where I have been a long time member, I was surprised to hear from many participants some beliefs they had held for a long time that were completely wrong. I always thought these were as obvious as they come; but of course I was dead wrong. What was even more surprising was that most of these believers were veterans in Oracle Database technologies, not newbies. Part of the problem, I think, lies with a system that focuses on execution rather than learning, and part of it is due to the lack of clear documentation. During that discussion some encouraged me to write about these. I immediately agreed it was a great idea and merited serious attention. Here is the product: my attempt at explaining some of the "mysteries" of how Oracle Database operates. I will cover 100 such nuggets of information, roughly one a week.

Before you start, however, I would like to bring your attention to an important point. You may already be aware of these facts. I was, so it is reasonable to believe that a vast majority would be as well. Under no circumstances am I claiming these to be groundbreaking or awe-inspiring. If you are already familiar with this specific knowledge, I am not at all surprised. Please feel free to skip. For those who read on, I hope you find these helpful and will take a moment to write to me about how you felt.

Part 1: Myth of Commit Causing Buffer to be Flushed to the Disk

Consider a scenario: in the EMP table I updated my salary from 1000 to 2000 and committed immediately. The very instant after I issued the commit, if I check the datafile on the disk of the tablespace where this table is located, which value should I see: 1000 or 2000? (Remember, the value was committed.)

Did you answer 2000 (perhaps because it was committed)? If so, then consider a normal application where commits are issued up to tens of thousands of times every minute. In a traditional database the weakest link in the chain is always I/O. If Oracle issued an update of the datafile every time someone commits, it would grind to a halt.

Did you answer 1000? Well, in that case, consider a case when the instance crashes. The datafile would have had 1000; not 2000 – the value that was committed. In such a case the instance must bring back the value committed (2000, in this case) to the datafile. How?

Let's examine a different scenario. Suppose I did not issue a commit after the update (perhaps I was not sure of the implications of giving myself a pay hike, or perhaps I had a pang of conscience). I left the session as is and left for the day. The next day I was sick and didn't come to work. 24 hours passed since I updated the record. At that point, if someone reads the datafile, what value would they see: 1000 or 2000?

Did you answer 1000 – a logical choice since the read consistency model of Oracle Database guarantees that the other sessions will see the pre-change data for the un-committed transactions?

Question #3 in this scenario: if you check the redo log file (not the datafile), what value will you find there: 1000 or 2000? Remember, it has not been committed. Did you answer 1000? It sort of makes sense; the changes are not committed so there is no reason for them to be in the redo log file, which is a very important part of the recovery process. If you answered 2000, then how would you explain the recovery process? In case of instance failure, the recovery must read the redo log file and, since the transaction was not committed, it must roll the value back to the previous one, 1000. How would it do that if the redo log file contains 2000, not 1000?

The answers, if you can’t wait any longer: 1000 for the first question, 2000 for the second and 2000 for the third. How so? Please read on.

Explanation

To understand the mechanics of the process, let's go over the buffer management process of the Oracle database. It's a rudimentary detail but is quite vital on the path to understanding the myth here. Consider a very small table in an equally small tablespace we created:


SQL> create tablespace testts datafile '/tmp/testts_01.dbf' size 1M;

SQL> create table mythbuster1 (col1 varchar2(200)) tablespace testts;


Insert a row:


SQL> insert into mythbuster1 values ('ORIGINAL_VALUE');
SQL> commit;

Shut down and restart the database so that the buffer cache is completely devoid of this table. You could also issue ALTER SYSTEM FLUSH BUFFER_CACHE; but I want to make sure all traces of this table (and the value of the column inside) vanish from all memory areas: buffer cache, shared pool, PGA, whatever. You can now check the presence of the value in the datafile:

$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
1j)
w>!
ORIGINAL_VALUE

The value is right there. Now suppose a user issues a statement like this from SQL*Plus:
SQL> select * from mythbuster1;

Oracle creates a process, called the "server process", on behalf of this user session to service the requests from the session. On Unix and Unix-like operating systems this process is named oracle followed by the instance name (oracleD112D2 in the listing below). Here is how you can find it:



$ ps -aef|grep sqlplus
oracle 14257 14214 0 13:42 pts/2 00:00:00 sqlplus as sysdba
$ ps -aef | grep 14257
oracle 14257 14214 0 13:42 pts/2 00:00:00 sqlplus as sysdba
oracle 14258 14257 0 13:42 ? 00:00:00 oracleD112D2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

The process 14258 is the server process. The SQL*Plus process is known as the user process, which can be any process a user executes, such as a Java program, a Pro*C program, a TOAD session and so on. It's the server process that handles all the interaction with the Oracle database, not the user process. This is why Oracle database interaction is said to be based on a Two Task Architecture: there are always two tasks, the user task that a regular user has written and the server task that performs the database operations. This is an important concept established during the early foundations of the Oracle database to protect the database from errant code in the user task, introduced either maliciously or inadvertently.

The server process then identifies the block the row exists in. Since the database instance just came up, the buffer cache is empty and the block will not be found. Therefore the server process issues a read call to the datafile for that specific block. The block is read from the disk into the buffer cache. Until the loading of the block from the disk into the buffer cache is complete, the session waits with the event db file scattered read. In this particular case the session performs a full table scan. Had it performed an index scan, the session would have waited with the event db file sequential read. [I know, I know: it seems to defy conventional logic a little bit. I would have assumed index scans to be named scattered reads and full table scans to be sequential.]

Once this process is complete, the buffer cache holds a copy of the block of the table mythbuster1. Subsequent sessions, if they select from the table, will simply get the data from this buffer, not from the disk.

Now, suppose the session issues the statement:

SQL> update mythbuster1 set col1 = 'CHANGED_VALUE';
And commits:
SQL> commit;

Immediately afterwards, check for the presence of the values in the datafile:
$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
1j)
w>!
ORIGINAL_VALUE

The old value, not the new value, is found. The datafile on the disk still has the old value; not the new one, even though the transaction has been committed. The update statement actually updated only the buffer in the cache; not the disk. So, when is the data on the disk updated?

The datafile gets updated by a process known as Database Writer (a.k.a. Database Buffer Writer). It’s named DBW0. Actually, there may be more than one such process and they are named DBW0, DBW1, etc. – more conveniently addressed as DBWn. For the purpose of the discussion here, let’s assume only one process – DBW0. It has only one responsibility – to update the datafile with the most up to date buffers from the buffer caches. [Note: I used buffer caches – plural. This is not a typo. There may be more than one buffer cache in the database – keep, recycle, default and other block sizes – but that’s for another day]. The buffer that has been updated is known as a dirty buffer since its contents are different from the block on the disk. DBW0 process writes the contents of the buffer to the disk – making it clean again.

But the big question is: when does DBW0 write the dirty buffer to the disk? Ah, that's the very question we are pondering over here. There are several "triggering" events that cause DBW0 to copy the buffers to the disk, also called flushing of the buffers. By the way, DBW0 is a lazy process; it does not flush buffers by itself or on a regular basis. It sleeps most of the time and must be woken up by another process to perform its duties. One such watchdog process is called the Checkpoint process (you can check its existence with ps -aef | grep ckpt on Unix systems). Checkpoint does not actually perform the flushing (also called checkpointing activity); it calls the DBW0 process to do it. How often does the Checkpoint process perform a checkpoint? It depends on various conditions, the biggest of which is the MTTR setting, which we will cover in a later installment.
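
To make the distinction concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not how Oracle is implemented: the class name ToyInstance, its methods and the single block name block1 are all made up for illustration. It shows that a commit leaves the datafile untouched and that only a checkpoint (standing in for the Checkpoint process waking DBW0) writes dirty buffers to disk.

```python
class ToyInstance:
    """Toy model: a datafile on 'disk' plus a buffer cache in memory."""

    def __init__(self, disk_value):
        self.datafile = {"block1": disk_value}  # what `strings` would find on disk
        self.buffer_cache = {}                  # block name -> value held in memory
        self.dirty = set()                      # blocks changed since the last write

    def read(self, block):
        # Load the block from disk on first access, as the server process does.
        if block not in self.buffer_cache:
            self.buffer_cache[block] = self.datafile[block]
        return self.buffer_cache[block]

    def update(self, block, value):
        self.read(block)                  # make sure the block is cached first
        self.buffer_cache[block] = value  # the change happens in memory only
        self.dirty.add(block)

    def commit(self):
        # In this model a commit never touches the datafile.
        pass

    def checkpoint(self):
        # Stand-in for DBW0 being woken up: write every dirty buffer to disk.
        for block in sorted(self.dirty):
            self.datafile[block] = self.buffer_cache[block]
        self.dirty.clear()


inst = ToyInstance("ORIGINAL_VALUE")
inst.update("block1", "CHANGED_VALUE")
inst.commit()
after_commit = inst.datafile["block1"]      # still the old value on disk
inst.checkpoint()
after_checkpoint = inst.datafile["block1"]  # now the new value
print(after_commit, after_checkpoint)
```

The script prints ORIGINAL_VALUE CHANGED_VALUE, which mirrors the two strings checks in this article: the old value right after the commit, the new one only after a checkpoint.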

Next, let's examine a different scenario. Drop the table, create the table again and insert the original row, recycle the database to remove all buffers of the table and then perform the update; but do not commit. Then flush the buffers from the cache to the disk. You can trigger a checkpointing activity manually instead of waiting for the checkpoint process. Here is how to do it:

SQL> alter system checkpoint;
After that statement completes, check the presence of the values in the datafile again:
$ strings /tmp/testts_01.dbf
}|{z
-N?pD112D2
TESTTS
1j)
w>!
CHANGED_VALUE,
ORIGINAL_VALUE
The old value is still there; but that is an artifact; it will eventually be gone. The new value is updated in the datafile. But do you remember a very important fact – the transaction is still not committed? In a different session, if you check the data in COL1 column, you will see the value ORIGINAL_VALUE. Where does Oracle get that value from? It gets that value from the Undo Segments in the Undo Tablespace. The undo tablespace contains the pre-change value.

Well, now you may wonder how on earth the server process knows that the data is uncommitted and that the undo segment therefore has to be checked. Good question. Let me add yet another wrinkle to it: the datafile contains the new value, not the old one. How does Oracle even know which rows to return in their pre-change form? It gets that information from the header of the block, where the transactions are recorded in a structure called the Transaction Table or, a little bit differently, the Interested Transaction List (ITL). I will cover that in detail in a future installment of this series. For the time being, please bear in mind that the block header holds that information. When the server process accesses the buffer (or the block on the disk) to get the column value, it accesses the transaction table, sees that there is an uncommitted transaction against it and gets the undo information from there. Finally it creates a different copy of the buffer as it would have looked had the update statement not been issued. This process is called Consistent Read (CR) Processing.
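
The idea behind Consistent Read processing can be sketched in a few lines of Python. This is a heavily simplified illustration, not Oracle's actual structures: the Block class, the itl dictionary (row id to transaction id), the undo dictionary and the transaction name txn42 are all hypothetical, and the real mechanism works with SCNs and block-header slots rather than plain dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    rows: dict                               # row id -> current value in the buffer
    itl: dict = field(default_factory=dict)  # row id -> id of the transaction that changed it


def consistent_read(block, undo, committed_txns):
    """Return the rows as a reading session should see them.

    undo maps (transaction id, row id) -> pre-change value.
    """
    cr_copy = dict(block.rows)  # build a separate copy; the real buffer is untouched
    for row_id, txn_id in block.itl.items():
        if txn_id not in committed_txns:
            # Uncommitted change: put back the pre-change value from undo.
            cr_copy[row_id] = undo[(txn_id, row_id)]
    return cr_copy


block = Block(rows={"r1": "CHANGED_VALUE"}, itl={"r1": "txn42"})
undo = {("txn42", "r1"): "ORIGINAL_VALUE"}

seen_before_commit = consistent_read(block, undo, committed_txns=set())
seen_after_commit = consistent_read(block, undo, committed_txns={"txn42"})
print(seen_before_commit, seen_after_commit)
```

Before the commit the reader gets ORIGINAL_VALUE even though the buffer (and possibly the datafile) already holds CHANGED_VALUE; once txn42 is treated as committed, the same call returns the new value.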

Now back to our original discussion. Since DBW0 does not immediately flush the buffers to the datafile, the datafile is left inconsistent with the committed data. Won't that compromise the recovery process? What would happen if the instance crashed before the flushing occurred? Since the transaction was previously committed, the recovery should update the datafile. Where does that information come from? The undo tablespace? No; the undo tablespace is just another datafile and gets flushed in the same manner, so it may not have those values. Besides, it may not even contain the new value.

Redo Stream

This is where the other leg of the database's guarantee of the committed transaction comes in. When the changes occur in the table, Oracle also records the information in another pool in memory called the Log Buffer. Compared to buffer caches, which could be several terabytes, this buffer is tiny, often just a few MB. The update statement records the pre- and post-change values in the log buffer (not in the log file, mind you). But the log buffer is just an area of memory; it also goes away when the instance crashes. So how does Oracle use the information to protect the committed data?

This is where the redo log files (a.k.a. online redo logs) come into the picture. When the session commits, the contents of the log buffer are flushed to the redo log files. Until the flushing is completed, the session waits with various wait events depending on conditions, the most common of which are "log file sync" and "log file parallel write". But does the log buffer flushing occur only when a commit occurs? No. There are other triggering events as well:
(1) When one third of the log buffer is full
(2) When 1 MB of log buffer is written
(3) Every three seconds

There are other events as well; but these are the major ones. Since the commit statement flushes the log buffer to the redo log file, even if the instance crashes the information is stored in the redo log file and can be easily read by the instance recovery processes. In the case of a RAC database, a single instance may have crashed. The instance recovery is then done by one of the surviving instances, which must read the redo entries of the crashed instance to reconstruct the blocks on the disk. This is why the redo log files, although each belongs to only one instance, must be visible to all nodes.
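
The commit plus the three triggers listed above can be written down as a small decision function. This is a sketch only: the function name, its parameters and the order in which the conditions are checked are assumptions made for illustration, and "1 MB of log buffer is written" is read here as "1 MB of redo has accumulated since the last flush".

```python
ONE_MB = 1024 * 1024


def should_flush_log_buffer(commit_issued, unwritten_bytes,
                            buffer_size_bytes, seconds_since_last_flush):
    """Return the reason the log buffer would be flushed, or None."""
    if commit_issued:
        return "commit"
    if unwritten_bytes * 3 >= buffer_size_bytes:
        return "one third full"
    if unwritten_bytes >= ONE_MB:
        return "1 MB of redo"
    if seconds_since_last_flush >= 3:
        return "three seconds elapsed"
    return None


# A few sample situations (all sizes are arbitrary examples).
reasons = [
    should_flush_log_buffer(True, 0, 8 * ONE_MB, 0),
    should_flush_log_buffer(False, 700 * 1024, 2 * ONE_MB, 0),
    should_flush_log_buffer(False, ONE_MB + ONE_MB // 2, 8 * ONE_MB, 0),
    should_flush_log_buffer(False, 100, 8 * ONE_MB, 3),
    should_flush_log_buffer(False, 100, 8 * ONE_MB, 1),
]
print(reasons)
```

Only the first sample involves a commit; the next three show redo reaching the log files without any commit at all, which is exactly why an entry in the redo log says nothing about whether its transaction was committed.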

Even if the commit is not issued, the other triggering events flush the contents of the log buffer to the redo log files as well. The presence of the redo entries on the redo log files is independent of the commit. From the previous discussion you learned that the checkpoint flushes the buffers from the cache to the disk, regardless of the issuance of the commit statement. Therefore, these are the interesting possibilities after a session updates the data (which is updated in the buffer cache):


Scenario  Session committed?  Log Buffer Flushed  Checkpoint Occurred  Datafile Updated  Redo Log Updated  Comment
1         No                  No                  No                   No                No
2         No                  Yes                 No                   No                Yes
3         No                  No                  Yes                  Yes               No
4         No                  Yes                 Yes                  Yes               Yes
5         Yes                 Yes                 No                   No                Yes               Commit will force a redo log flush
6         Yes                 Yes                 Yes                  Yes               Yes
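
The six scenarios follow from three simple rules as this article states them: the datafile changes only when a checkpoint occurs, the redo log changes only when the log buffer is flushed, and a commit always forces a log buffer flush. A short Python sketch (the function and key names are made up for illustration) enumerates every combination and drops the impossible ones:

```python
from itertools import product


def scenarios():
    """Enumerate the possible states, in the same order as the scenario table."""
    rows = []
    for committed, checkpoint, log_flushed in product([False, True], repeat=3):
        if committed and not log_flushed:
            continue  # impossible: a commit always flushes the log buffer
        rows.append({
            "committed": committed,
            "log_buffer_flushed": log_flushed,
            "checkpoint": checkpoint,
            "datafile_updated": checkpoint,   # datafile changes only via a checkpoint
            "redo_log_updated": log_flushed,  # redo log changes only via a log flush
        })
    return rows


table = scenarios()
for number, row in enumerate(table, start=1):
    print(number, row)
```

Of the eight combinations of the three yes/no inputs, two are ruled out by the commit rule, leaving the six scenarios listed.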



Looking at the table above you may see some interesting conundrums: the redo log has the changed data but the datafile does not, and vice versa. How does Oracle know when and what exactly to recover, since the presence of a record in the redo log file is not a guarantee that the data was committed?

To address that issue, Oracle places a special "marker", called a Commit Marker, in the redo stream, which goes into the redo log buffer. When instance recovery is required, Oracle doesn't just recover anything that is present in the redo log files; it looks for a commit marker. If one is not found, then the changes are deemed to be uncommitted, and therefore Oracle rolls them back. If the changes are not found in the redo log at all, then the changes are uncommitted, guaranteed (remember, a commit will definitely flush the log buffer to redo). In that case Oracle rolls them back from the datafiles, a process known as rolling back. When the changes are found in the redo log (along with the commit marker) but there are no corresponding changes in the datafile (scenario #5), Oracle will apply the changes to the datafile from the redo entries, a process known as rolling forward. Recovery consists of both rolling back and rolling forward.

To put it all together, here is a rough algorithm for the actions of the recovery process:

Read the redo log entries starting with the oldest one
Check the SCN number of the change
Look for the commit marker. If the commit marker is found, then data has been committed.
If found, then look for the changes in the datafile (via the SCN number)
    Change has been reflected in the datafile?
    If yes, then move on
    If no, then apply the changes to the datafile (roll forward)
If not found, then the data is uncommitted. Look for the changes in the datafile.
    Change found in datafile?
    If no, then move on
    If yes, then update the datafile with the pre-change data (rollback)
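
The rough algorithm above can be tried out on a toy data set. The Python sketch below is an illustration under stated assumptions, not Oracle's recovery code: the entry layout (txn, block, old, new, commit) is invented, it compares values where the algorithm above compares SCNs, and it assumes each block is touched by at most one transaction.

```python
def recover(redo_entries, datafile):
    """Toy instance recovery: return the datafile contents after recovery.

    redo_entries is ordered oldest first. A change entry has the keys
    txn, block, old and new; a commit marker has txn and commit=True.
    """
    datafile = dict(datafile)  # work on a copy of the on-disk state
    # A transaction counts as committed only if its commit marker is in the redo.
    committed = {e["txn"] for e in redo_entries if e.get("commit")}
    changes = [e for e in redo_entries if not e.get("commit")]

    # Roll forward: apply committed changes, oldest first, where they are missing.
    for e in changes:
        if e["txn"] in committed and datafile[e["block"]] != e["new"]:
            datafile[e["block"]] = e["new"]

    # Roll back: undo uncommitted changes, newest first, where they are present.
    for e in reversed(changes):
        if e["txn"] not in committed and datafile[e["block"]] == e["new"]:
            datafile[e["block"]] = e["old"]
    return datafile


redo = [
    {"txn": "t1", "block": "b1", "old": "1000", "new": "2000"},
    {"txn": "t1", "commit": True},  # commit marker for t1
    {"txn": "t2", "block": "b2", "old": "ORIGINAL_VALUE", "new": "CHANGED_VALUE"},
]
# b1: committed but not yet flushed (scenario 5); b2: flushed but never committed (scenario 4).
on_disk = {"b1": "1000", "b2": "CHANGED_VALUE"}
recovered = recover(redo, on_disk)
print(recovered)
```

Block b1 is rolled forward to 2000 because its transaction has a commit marker, while b2 is rolled back to ORIGINAL_VALUE because no marker for t2 exists in the redo.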

Takeaways

Let me reiterate some of the lessons from this installment.

(1) Data buffers are flushed to the disk from the buffer cache independently of the commit statement. Commit does not flush the buffers to the disk.
(2) If the buffer is modified in the buffer cache but not yet flushed to the disk, it is known as a dirty buffer.
(3) If a buffer is clean (i.e. not dirty), it does not mean that the data changes have been committed.
(4) When a commit occurs, the log buffer (not the buffer cache) is flushed to the disk
(5) The log buffer may already have been flushed to the disk due to other triggering events. So if a change is found in the redo log file, the change is not necessarily committed.
(6) A commit statement puts a special “commit marker” on the redo log, which is the guarantee of a commit.
(7) The frequency of the flushing of the buffer cache to the datafiles is controlled by the MTTR setting and whether free buffers are needed in the cache due to incoming blocks from the datafiles.

How do you use this information? There are several things for you to consider:

(1) The more often you commit, the more often the log buffer is flushed, which is not very good for I/O.
(2) The more aggressive the MTTR target is, the less time recovery will take if the instance crashes, but the more frequently buffers will be flushed to the datafiles as well, causing more I/O.
(3) The MTTR target has nothing to do with commit frequency; they are two independent activities. So, reducing commit frequency will not cause a reduction in flushing frequency.
(4) If your buffer cache is small, there will be a greater need to flush.
(5) Exactly how small is “small”? There is no fixed formula; it depends on how much of the data in the buffer cache is updated.
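
Point (1) can be put into numbers with a deliberately crude model. The sketch below is an assumption-laden illustration: the 200 bytes of redo per change and the 3 MB log buffer are arbitrary figures, the function name is made up, and the model ignores the three-second timer and anything the real log writer does to batch concurrent commits.

```python
def count_log_flushes(changes, commit_every,
                      redo_bytes_per_change=200,
                      log_buffer_bytes=3 * 1024 * 1024):
    """Count log buffer flushes for a run of single-row changes."""
    flushes = 0
    unwritten = 0
    for i in range(1, changes + 1):
        unwritten += redo_bytes_per_change
        commit_now = i % commit_every == 0
        third_full = unwritten * 3 >= log_buffer_bytes
        if commit_now or third_full:
            flushes += 1
            unwritten = 0
    return flushes


per_row = count_log_flushes(10_000, commit_every=1)
per_batch = count_log_flushes(10_000, commit_every=1_000)
print(per_row, per_batch)
```

Under these assumptions, committing after every row causes 10000 flushes while committing every 1000 rows causes 10. Nothing in this model touches the datafiles, which matches point (3): commit frequency and datafile flushing are independent.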

I hope you enjoyed this installment of "100 Things ...". In future installments I will explain some of the other nuances of the Oracle database that you may not have been aware of. As always, I would greatly appreciate it if you could drop me a line with your feedback: good, bad and anything in between.

 

Source: ITPUB Blog, link: http://blog.itpub.net/23650854/viewspace-686850/. If you repost this, please credit the source; otherwise legal action may be taken.
