ITPub博客

首页 > Linux操作系统 > Linux操作系统 > [20111223]索引键值在B tree索引块中的顺序.txt

[20111223]索引键值在B tree索引块中的顺序.txt

原创 Linux操作系统 作者:lfree 时间:2011-12-23 14:14:34 0 删除 编辑
[20111223]索引键值在B tree索引块中的顺序.txt

参考链接:http://www.adellera.it/blog/2009/05/24/order-keys-inside-index-blocks/

自己为了加强理解重复一下对方的测试!

1.建立测试表以及索引


SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> create table t (x varchar2(10));
SQL> create index i_t_x on t(x);

随机插入如下记录:

insert into t values('000000');
insert into t values('777777');
insert into t values('111111');
insert into t values('666666');
insert into t values('222222');
insert into t values('555555');
insert into t values('333333');
insert into t values('444444');
commit ;


2.转储对应的索引块:
SQL> select header_file,header_block from dba_segments where wner='SCOTT' and segment_name='I_T_X';

HEADER_FILE HEADER_BLOCK
----------- ------------
          4        11298

SQL> alter system dump datafile 4 block 11299;

3.查看:
Block header dump:  0x01002c23
 Object id on Block? Y
 seg/obj: 0x12afc  csc: 0x00.3d4076  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x1002c20 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0035.001.00000007  0x00c00ae0.000e.1a  --U-    8  fsc 0x0000.003d4081
Leaf block dump
===============
header address 47190358755940=0x2aeb5c920a64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: pcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 8
kdxcofbo 52=0x34
kdxcofeo 7904=0x1ee0
kdxcoavs 7852
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8032
row#0[8016] flag: ------, lock: 2, len=16
col 0; len 6; (6):  30 30 30 30 30 30         -- 000000
col 1; len 6; (6):  01 00 05 0e 00 00
row#1[7984] flag: ------, lock: 2, len=16
col 0; len 6; (6):  31 31 31 31 31 31         -- 111111
col 1; len 6; (6):  01 00 05 0e 00 02
row#2[7952] flag: ------, lock: 2, len=16
col 0; len 6; (6):  32 32 32 32 32 32         -- 222222
col 1; len 6; (6):  01 00 05 0e 00 04
row#3[7920] flag: ------, lock: 2, len=16
col 0; len 6; (6):  33 33 33 33 33 33         -- 333333
col 1; len 6; (6):  01 00 05 0e 00 06
row#4[7904] flag: ------, lock: 2, len=16
col 0; len 6; (6):  34 34 34 34 34 34         -- 444444
col 1; len 6; (6):  01 00 05 0e 00 07
row#5[7936] flag: ------, lock: 2, len=16
col 0; len 6; (6):  35 35 35 35 35 35         -- 555555
col 1; len 6; (6):  01 00 05 0e 00 05
row#6[7968] flag: ------, lock: 2, len=16
col 0; len 6; (6):  36 36 36 36 36 36         -- 666666
col 1; len 6; (6):  01 00 05 0e 00 03
row#7[8000] flag: ------, lock: 2, len=16
col 0; len 6; (6):  37 37 37 37 37 37         -- 777777
col 1; len 6; (6):  01 00 05 0e 00 01
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 4 minblk 11299 maxblk 11299

注意看row#0 括号的数字,表示该键值的索引块的位置,col0是索引键值,col1是rowid。

插入顺序:   '000000'   '777777'   '111111'    '666666'     '222222'      '555555'       '333333'    '444444'
在块中位置:   8016       8000        7984          7968           7952            7936             7920         7904
    可以看出索引在块中插入并不是安装键值排序的,而是从尾部按照插入顺序插入的。

4.在"row directory"中记录键值的具体位置:
Start dump data blocks tsn: 4 file#:4 minblk 11299 maxblk 11299
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4, rdba=16788515
BH (0xa8ff5fd8) file#: 4 rdba: 0x01002c23 (4/11299) class: 1 ba: 0xa8f2a000
  set: 6 pool 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 186,19
  dbwrid: 0 obj: 76540 objn: 76540 tsn: 4 afn: 4 hint: f
  hash: [0xbe252cb0,0xbe252cb0] lru: [0xa8ff61f0,0xa8ff5f90]
  ckptq: [NULL] fileq: [NULL] objq: [0xa8ff6218,0xba80aa40]
  st: XCURRENT md: NULL tch: 2
  flags: block_written_once redo_since_read
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [1]
  cr pin refcnt: 0 sh pin refcnt: 0
Block dump from disk:
buffer tsn: 4 rdba: 0x01002c23 (4/11299)
scn: 0x0000.003d4081 seq: 0x01 flg: 0x06 tail: 0x40810601
frmt: 0x02 chkval: 0xbf9f type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00002AEB5C920A00 to 0x00002AEB5C922A00
2AEB5C920A00 0000A206 01002C23 003D4081 06010000  [....#,...@=.....]
2AEB5C920A10 0000BF9F 00000002 00012AFC 003D4076  [.........*..v@=.]
2AEB5C920A20 00000000 00320002 01002C20 00000000  [......2. ,......]
2AEB5C920A30 00000000 00000000 00000000 00000000  [................]
2AEB5C920A40 00000000 00010035 00000007 00C00AE0  [....5...........]
2AEB5C920A50 001A000E 00002008 003D4081 00000000  [..... ...@=.....]
2AEB5C920A60 00000000 02800000 00000000 00340008  [..............4.]
2AEB5C920A70 1EAC1EE0 00000000 00000000 00000000  [................]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2AEB5C920A80 00000000 00001F60 1F301F50 1EF01F10  [....`...P.0.....]
2AEB5C920A90 1F001EE0 1F401F20 00000000 00000000  [.... .@.........]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2AEB5C920AA0 00000000 00000000 00000000 00000000  [................]
        Repeat 489 times
2AEB5C922940 00000000 34060200 34343434 00010634  [.......444444...]
2AEB5C922950 07000E05 33060200 33333333 00010633  [.......333333...]
2AEB5C922960 06000E05 35060200 35353535 00010635  [.......555555...]
2AEB5C922970 05000E05 32060200 32323232 00010632  [.......222222...]
2AEB5C922980 04000E05 36060200 36363636 00010636  [.......666666...]
2AEB5C922990 03000E05 31060200 31313131 00010631  [.......111111...]
2AEB5C9229A0 02000E05 37060200 37373737 00010637  [.......777777...]
2AEB5C9229B0 01000E05 30060200 30303030 00010630  [.......000000...]
2AEB5C9229C0 00000E05 00000000 00000000 00000000  [................]
2AEB5C9229D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
2AEB5C9229F0 00000000 00000000 00000000 40810601  [...............@]

    kdxlebksz 8032,从row directory中确定如下:
2AEB5C920A80 00000000 00001F60 1F301F50 1EF01F10  [....`...P.0.....]
2AEB5C920A90 1F001EE0 1F401F20 00000000 00000000  [.... .@.........]

插入顺序:   '000000' '777777' '111111' '666666' '222222' '555555' '333333' '444444'
在块中位置:   8016     8000     7984      7968     7952     7936     7920     7904

1F60=8032  end of "bottom heap" (kdxlebksz), also 8016 + 16

1F30=7984   '111111'  
1F50=8016   '000000'
1EF0=7920   '333333'
1F10=7952   '222222'
1F00=7936   '555555'
1EE0=7904   '444444'
1F40=8000   '777777'
1F20=7968   '666666'


-- 摘抄链接,不自己写了,以免出现不必要的错误:http://www.adellera.it/blog/2009/05/24/order-keys-inside-index-blocks/

    that is, the addresses in the row directory are ordered by the (binary) value of the key they point to - remember that you must swap the high and low 2-byte quantities in each 32-bit word (the usual little-endian ordering; I haven't checked whether the results would be different on a machine that is not little-endian, as my one - an x86 - is).
如果交换字节顺序,可以确定按照以上地址顺序排列的。说明在索引插入时,键值从底部插入,而排序仅仅需要修改这个row directory就ok了。这样插入效率比较高。

    The reason for this ordering is to improve the efficiency of typical visits to the index block. Consider a range scan, when the kernel has to find all the index keys contained in the search interval [min_extreme, max_extreme] (possibly min_extreme=max_extreme for equality predicates). As it is well known, the kernel first walks the branch blocks to identify  the leaf block that contains min_extreme (or max_extreme, but that would be the same for our discussion); then, it has to identify all the keys that are >= min_extreme and <= max_extreme. Thanks to the ordering, it can simply perform. a binary search to locate the first key >= min_extreme, and then walk the next entries in the row directory until it gets to the last one that is <= max_extreme (possibly jumping on the next leaf block). Without this ordering, the kernel would be forced to visit all the keys contained in the block.

    The advantage is huge, since visiting all the keys is a O( N ) operation, while a binary search has only O ( log(2,N) ) complexity.

    To fully appreciate this on a concrete example, consider that each key of our example needs 16+2 bytes of storage, hence a perfectly packed 8K block might contain approximately 8000 / 18 = 444 keys. Thanks to the ordering, for the very frequent equality search (classic for Primary Key searches for example), the processing consists essentially of the binary search  only - hence the number of keys to be considered are down to about ceil( log(2, 444) ) = 9, thus consuming only 9/444 = 2%
(!!) of the resources.

    It is also worth remembering that this way, only a portion of the block is accessed, thus needing less accesses to central memory (it is unlikely in fact that the whole 8K can be found in the processor's caches), thus reducing the elapsed time considerably since stalling for central memory is a very common bottleneck in database systems - and thus improving scalability for the whole system thanks to the reduction of the traffic on the memory bus.

    Of course there's a price to be paid for modifications, since each modification has to keep the row directory ordered, thus shifting the slots of the row directory, which is a relatively expensive operation. Oracle pays an higher bill at modification time to save (hugely) at read time.

--modify 代价很大!

    As a final observation: note that the branch block visit can be considered a binary search as well, "pre-calculated" by the B+tree structure - the ordering inside a block is just the same divide-and-conquer algorithm applied in a different way.


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/267265/viewspace-713847/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
熟悉oracle相关技术,擅长sql优化,rman备份与恢复,熟悉linux shell编程。

注册时间:2008-01-03

  • 博文量
    2353
  • 访问量
    6098424