ITPub博客

首页 > Linux操作系统 > Linux操作系统 > ORACLE 聚簇因子优化

ORACLE 聚簇因子优化

原创 Linux操作系统 作者:wshxgxiaoli 时间:2012-07-05 14:07:23 0 删除 编辑
聚簇因子:数据表中的数据都是无序的存在库中,当我们对数据进行检索的时候,查找起来很是耗费资源,于是我们就需要为表创建索引,索引的作用就是把表中的数据按照一定的顺序排列保存起来,于是就出现了一个问题,有的表中的数据和索引排列的顺序很是相近,而另一些表中的数据和索引排列的顺序相距甚远,聚簇因子的作用就是用来标记这个的,聚簇因子越小,相似度越高,聚簇因子越大,相似度越低。

数据和索引相似度:ORACLE 在存储数据时,并不是按照数据块的顺序挨个进行存入数据,因为前面存入的数据经常会有DML或者DDL操作,删除数据后,原先存有数据的数据块就变成了空块,ORACLE为了节省存储空间,当数据库再次有新数据进行插入的话,就会优先使用那些空块,只有当空块不够使用的时候,才会去高水位上开辟新块,这种情况也就会导致,一张表中的数据,并不是存储在相邻的数据块中,于是聚簇因子变的很大,当这种情况进行逻辑读取的时候,就会增加IO的次数, 影响读取的速度。

测试:
建表:
CREATE TABLE t_1 AS SELECT ROWNUM rn,a.* FROM all_objects a ORDER BY object_name DESC;

--创建t_1表关于rownum索引
CREATE INDEX ind_t_1 ON t_1(rn);

--创建表表t_2
CREATE TABLE t_2 AS SELECT * FROM ( SELECT ROWNUM rn,a.* FROM all_objects a ) ORDER BY rn ASC;
 
--创建t_2表关于rownum索引
CREATE INDEX ind_t_2 ON t_2(rn);
 
--分析两张表及其索引
EXEC DBMS_STATS.gather_table_stats(USER, 'T_1');
EXEC DBMS_STATS.gather_table_stats(USER, 'T_2');
EXEC DBMS_STATS.gather_index_stats(USER, 'IND_T_1');
EXEC DBMS_STATS.gather_index_stats(USER, 'IND_T_2');

执行查询操作
SQL> set autotrace traceonly;
SQL> SELECT * FROM t_1 WHERE rn BETWEEN 100 AND 120;

21 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 923245147

--------------------------------------------------------------------------------
-------

| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Tim
e     |

--------------------------------------------------------------------------------
-------

|   0 | SELECT STATEMENT            |         |    22 |  2244 |    14   (0)| 00:
00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T_1     |    22 |  2244 |    14   (0)| 00:
00:01 |

|*  2 |   INDEX RANGE SCAN          | IND_T_1 |    22 |       |     2   (0)| 00:
00:01 |

--------------------------------------------------------------------------------
-------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("RN">=100 AND "RN"<=120)


Statistics
----------------------------------------------------------
        445  recursive calls
          0  db block gets
         85  consistent gets
          0  physical reads
          0  redo size
       3426  bytes sent via SQL*Net to client
        427  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
         21  rows processed

表二查询:
SELECT * FROM t_2 WHERE rn BETWEEN 100 AND 120;

21 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2193767187

--------------------------------------------------------------------------------
-------

| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Tim
e     |

--------------------------------------------------------------------------------
-------

|   0 | SELECT STATEMENT            |         |    22 |  2244 |     3   (0)| 00:
00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T_2     |    22 |  2244 |     3   (0)| 00:
00:01 |

|*  2 |   INDEX RANGE SCAN          | IND_T_2 |    22 |       |     2   (0)| 00:
00:01 |

--------------------------------------------------------------------------------
-------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("RN">=100 AND "RN"<=120)


Statistics
----------------------------------------------------------
        445  recursive calls
          0  db block gets
         69  consistent gets
          0  physical reads
          0  redo size
       3426  bytes sent via SQL*Net to client
        427  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
         21  rows processed
观察结果:
 通过执行统计信息观察,T_1表的查询一致读是85,而T_2表的一致读只有69,周样的表结构,同样的数据,为何有这么大的差距呢。

分析:
SQL> select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1','T_2') and a.table_name = b.table_name;
TABLE_NAME                     INDEX_NAME             NUM_ROWS     BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1                            IND_T_1                   71761       1069             38034
T_2                            IND_T_2                   71763       1069              1069
通过查询聚簇因子发现,两个表的聚簇因子差别很大,基于RN的索引在RN是顺序排列的表中,CLUSTERING_FACTOR的值相差很大。
在表中数据有时候属于无序状态,这个时候的CLUSTERING_FACTOR比较接近NUM_ROWS,说明如果扫描整个表,每次都要根据INDEX来读取相应行的ROWID,这个时候的IO操作很多,自然检索时间会比较长。如果数据有序的话,CLUSTERING_FACTOR比较接近BLOCKS,说明相邻的数据在一个块中,减少了IO操作数量,自然检索时间会大大降低。

解决办法:
第一种办法:对表进行重构:alter table t_1 move;
SQL> alter table t_1 move;

Table altered.
再次查询聚簇因子:

SQL>  select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1','T_2') and a.table_name = b.table_name;

TABLE_NAME                     INDEX_NAME             NUM_ROWS     BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1                            IND_T_1                   71761       1069             38034
T_2                            IND_T_2                   71763       1069              1069
发现还是一样。

第二种方法: 按照索引顺序重建表:
SQL> create table t_1_bk as select * from t_1 order by rn;

Table created.

创建索引:
 create index ind_t1bk on t_1_bk(rn);

Index created.

统计索引
EXEC DBMS_STATS.gather_index_stats(USER, 'IND_T1BK');

再次查看聚簇因子:
select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1_BK','T_2') and a.table_name = b.table_name;

TABLE_NAME                     INDEX_NAME             NUM_ROWS     BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1_BK                         IND_T1BK                                                1069
T_2                            IND_T_2                   71763       1069              1069
可以看到聚簇因子有所降低,但没了NUM_ROWS与BLOCKS。
在这里没有这两个字段的值是因为user_indexes 和user_tables的信息来自于对表的统计信息。
SQL> exec dbms_stats.gather_table_stats(USER,'T_1_BK');

PL/SQL procedure successfully completed.

SQL> select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1_BK','T_2') and a.table_name = b.table_name;

TABLE_NAME                     INDEX_NAME             NUM_ROWS     BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1_BK                         IND_T1BK                  71761       1069              1069
T_2                            IND_T_2                   71763       1069              1069
以上这种办法只针对于小表, 但如果这个表的数据量非常大就不能够使用这种办法了。

第三种方法: 针对大表的:分区
在线将表改为分区表:
EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE(USER, 'T_1',dbms_redefinition.cons_index);
 EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE(USER, 'T_1',dbms_redefinition.cons_index);

PL/SQL procedure successfully completed.

获得创建表的语句:select dbms_metadata.get_ddl('TABLE','T_1','SYS') from dual;
获得创建INDEX的语句:select dbms_metadata.get_ddl('INDEX','IND_T_1','SYS') from dual;

创建表:
create table t_1new(RN NUMBER,
        OWNER VARCHAR2(30) NOT NULL ENABLE,
        OBJECT_NAME VARCHAR2(30) NOT NULL ENABLE,
        SUBOBJECT_NAME VARCHAR2(30),
        OBJECT_ID NUMBER NOT NULL ENABLE,
        DATA_OBJECT_ID NUMBER,
        OBJECT_TYPE VARCHAR2(19),
        CREATED DATE NOT NULL ENABLE,
        LAST_DDL_TIME DATE NOT NULL ENABLE,
        TIMESTAMP VARCHAR2(19),
        STATUS VARCHAR2(7),
        TEMPORARY VARCHAR2(1),
        GENERATED VARCHAR2(1),
        SECONDARY VARCHAR2(1),
        NAMESPACE NUMBER NOT NULL ENABLE,
        EDITION_NAME VARCHAR2(30)) partition by range(rn)
(partition p1 values less than(20000),
partition p2 values less than(40000),
partition p3 values less than(60000),
partition p4 values less than(maxvalue));

创建索引:create index ind_t_1new on t_1new(rn);


EXEC DBMS_REDEFINITION.START_REDEF_TABLE(USER, 'T_1', 'T_1NEW', 'RN RN, OWNER OWNER, OBJECT_NAME OBJECT_NAME, SUBOBJECT_NAME SUBOBJECT_NAME, OBJECT_ID OBJECT_ID, DATA_OBJECT_ID DATA_OBJECT_ID, OBJECT_TYPE OBJECT_TYPE, CREATED CREATED, LAST_DDL_TIME LAST_DDL_TIME, TIMESTAMP TIMESTAMP, STATUS STATUS, TEMPORARY TEMPORARY, GENERATED GENERATED, SECONDARY SECONDARY, NAMESPACE NAMESPACE, EDITION_NAME EDITION_NAME', DBMS_REDEFINITION.CONS_INDEX);


 exec dbms_redefinition.finish_redef_table('HR','T_1','T_1NEW');

PL/SQL procedure successfully completed.

验证查询:
SQL> select count(*) from t_1 partition(p1);

  COUNT(*)
----------
     19999

SQL> select count(*) from t_1 partition(p2);

  COUNT(*)
----------
     20000
分区成功!  我们这时再来看看聚簇因子:
select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1') and a.table_name = b.table_name;
SQL> select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1') and a.table_name = b.table_name;

TABLE_NAME                     INDEX_NAME             NUM_ROWS     BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1                            IND_T_1NEW                                             38104

<- ->! 无解!   咋个不行。  应该是理解问题了, 这个只能解决聚簇因子所带来的性能问题。

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22740983/viewspace-734637/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2012-03-30

  • 博文量
    33
  • 访问量
    51696