ITPub博客

首页 > Linux操作系统 > Linux操作系统 > InnoDB 中变长列的存储

InnoDB 中变长列的存储

原创 Linux操作系统 作者:alsmn 时间:2012-04-07 23:21:27 0 删除 编辑

正如前面所述,InnoDB 中存储于数据库页的所有数据构成一颗 B-tree 树,这就是所谓聚簇索引或者主键索引。二类索引也构成b-tree树,包含由索引key和主键值构成的值对。
对于变长列(比如BLOG或者VARCHAR)太长以致不能存储在一个页上时将被分配在单独的磁盘页上(称为overflow,溢出页)。这些列称为溢出页的列(off-page列)。这些列的值以单链表方式将存放的溢出页组织存储,每个这样的列具有自己的一个或者多个溢出页的列表。在某些情况下,长的列值的前缀(或者全部值)被存储在B-tree树中,避免浪费存储,消除了读取单独页的浪费。新的 Barracuda 文件格式提供了一个新的选项来控制这样列的多少部分将被存储在聚簇索引中,多少部分被存储在溢出页中。

COMPACT 和 REDUNDANT 行的格式
先前版本的InnoDB 使用的是一种未命名(现在称为Antelope )的文件格式。在这种格式中,表被定义为row_format=COMPACT(或者row_format=REDUNDANT),InnoDB 将存储可变长列的前768字节在Btree节点的索引中,其它的存储在溢出页当中。
为了保留对这些先前格式的兼容,使用InnoDB 插件创建的表将使用这种前缀格式,除非在CREATE TABLE 命令中指定或者暗示了使用row_format = DYNAMIC 或者 row_format = COMPRESSED。

使用Antelope 时,如果一个列的值不足 768 字节,将不需要溢出页,这对于相对较短的BLOB 是没有问题的,但是可能引起Btree 节点做不必要的数据填充(这个不明白)。 带有多个BLOB 列的表可能引起Btree 节点太慢,从而使得可以包含的行太少,使得索引相对于稍短的行的情况或者列值以溢出页存储的情况的效率要低。

当innodb_file_format 被设置为Barracuda 时,如果一个表创建时 row_format = DynaMIC 或者 row_format = COMPRESSED,较长的列的值完全以溢出页存储,聚簇索引只存放到溢出页的一个指针。
列是否被存储在溢出页中取决于页的大小和行的总尺寸,如果行太大,InnoDB 将选择最长的列来存直到聚簇索引可以放在Btree 页上为止。

DYNAMIC 行格式维持了在索引节点中存储整行的效率,如果恰好可以放下(正如COMPACT 和 REDUNDANT 格式那样),但是这种新的格式避免了对较长的列使用较多数据来填充Btree 节点的问题。DYNAMIC 格式考虑的是如果一个较长的数据的一部分需要存储在溢出页上,那么通常最有效的方式就是将所有数据都存储在溢出页上。较短的列仍然会存放在Btree 节点上,可以减少对任何给定行所需的最少溢出页的数量。

5.1. Overview
All data in InnoDB is stored in database pages comprising a B-tree index (the so-called clustered index or primary key index).
The essential idea is that the nodes of the B-tree contain, for each primary key value (whether user-specified or generated or chosen by the system), the values of the remaining columns of the row as well as the key. In some other database systems, a clustered index is called an “index-organized table”. Secondary indexes in InnoDB are also B-trees, containing pairs of values of the index key and the value of the primary key, which acts as a pointer to the row in the clustered index.
There is an exception to this rule. Variable-length columns (such as BLOB and VARCHAR) that are too long to fit on a Btree page are stored on separately allocated disk (“overflow”) pages. We call these “off-page columns”. The values of such columns are stored on singly-linked lists of overflow pages, and each such column has its own list of one or more overflow pages. In some cases, all or a prefix of the long column values is stored in the B-tree, to avoid wasting storage and eliminating the need to read a separate page.
The new “Barracuda” file format provides a new option to control how much column data is stored in the clustered index, and how much is placed on overflow pages.

5.2. COMPACT and REDUNDANT Row Format
Previous versions of InnoDB used an unnamed file format (now called “Antelope”) for database files. With that format, tables were defined with ROW_FORMAT=COMPACT (or ROW_FORMAT=REDUNDANT) and InnoDB stored up to the first 768 bytes of variable-length columns (such as BLOB and VARCHAR) in the index record within the B-tree node, with the remainder stored on the overflow page(s).
To preserve compatibility with those prior versions, tables created with the InnoDB Plugin will use the prefix format, unless one of ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED is specified (or implied) on the CREATE TABLE command.
With the “Antelope” file format, if the value of a column is not longer than 768 bytes, no overflow page is needed, and some savings in i/o may result, since the value is in the B-tree node. This works well for relatively short BLOBs, but may cause Btree nodes to fill with data rather than key values, thereby reducing their efficiency. Tables with many BLOB columns could cause B-tree nodes to become too full of data, and contain too few rows, making the entire index less efficient than if the rows were shorter or if the column values were stored off-page.

5.3. DYNAMIC Row Format
When innodb_file_format is set to “Barracuda” and a table is created with ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPRESSED, long column values are stored fully off-page, and the clustered index record contains only a 20-byte pointer to the overflow page.
Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long, InnoDB will choose the longest columns for off-page storage until the clustered index record fits on the B-tree page.
The DYNAMIC row format maintains the efficiency of storing the entire row in the index node if it will fit (as do the COMPACT and REDUNDANT formats), but this new format avoids the problem of filling B-tree nodes with a large number of data bytes of long columns. The DYNAMIC format is predicated on the idea that if a portion of a long data value is stored off-page, it is usually most efficient to store all of the value off-page. With DYNAMIC format, shorter columns are likely to remain in the Btree node, minimizing the number of overflow pages needed for any given row.


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/16989307/viewspace-720638/,如需转载,请注明出处,否则将追究法律责任。

上一篇: ORACLE ASH和AWR .
请登录后发表评论 登录
全部评论

注册时间:2011-07-23

  • 博文量
    4
  • 访问量
    10098