ITPub博客

首页 > 数据库 > Oracle > [20190810]如何索引一个超长字段.txt

[20190810]如何索引一个超长字段.txt

原创 Oracle 作者:lfree 时间:2019-08-10 16:20:34 0 删除 编辑

[20190810]如何索引一个超长字段.txt

--//链接问的一个问题:http://www.itpub.net/thread-2119521-1-1.html
因业务需要,一个字段的字符要很长,最长有4000, 并且还要用字段来判断唯一性,如果直接建立索引他会报错的,而且怎么长查询起
来是否很慢?请问各位前辈,有什么好的方法来处理这个问题。

--//他还加了一个需求保证唯一性。不知道对方使用什么版本,自己测试看看:

1.环境:
SCOTT@test01p> @ ver1

PORT_STRING                    VERSION        BANNER                                                                               CON_ID
------------------------------ -------------- -------------------------------------------------------------------------------- ----------
IBMPC/WIN_NT64-9.1.0           12.2.0.1.0     Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0

2.测试例子建立:
SCOTT@test01p> create table t (id number ,v1 varchar2(4000) );
Table created.

SCOTT@test01p> insert into t select level ,lpad('x',100,'x')||level v1 from dual connect by level < 2e5;
199999 rows created.

SCOTT@test01p> create unique index i_t_v1 on t(v1);
Index created.

--//实际上这样的索引还是能建立的,估计重新rebuild online 就会报错。
SCOTT@test01p> alter index i_t_v1 rebuild online ;
alter index i_t_v1 rebuild online
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01450: maximum key length (3215) exceeded

--//因为rebuild online 要建立一张IOT表,导致maximum key length (3215) exceeded.

SCOTT@test01p> alter index i_t_v1 rebuild ;
Index altered.

--//取消online 没有问题。

3.测试:
--//分析略。
SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
I_T_V1                        11         1066   28311552       3456

--//你可以发现在我的例子中,索引占用的空间比表还大,因为我索引的字段长度占10X字节。如果想原链接的情况也许更大。

SCOTT@test01p> alter session set statistics_level = all;
Session altered.

SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42;
SUBSTR
------
x42

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  9bky8g0hf1td9, child number 1
-------------------------------------
select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42
Plan hash value: 4077016850
----------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |       |     2 (100)|          |      1 |00:00:00.01 |       3 |
|*  1 |  INDEX UNIQUE SCAN| I_T_V1 |      1 |      1 |   107 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx42')

4.继续测试:
--//12c 提供standard_hash 函数。

SCOTT@test01p> create unique index if_t_v1_1 on t(standard_hash(v1));
Index created.

SCOTT@test01p> create unique index if_t_v1_2 on t(ora_hash(v1));
create unique index if_t_v1_2 on t(ora_hash(v1))
                                            *
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found

--//可以发现使用ora_hash出现重复值,无法建立唯一约束。而使用standard_hash函数没有问题。

SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
IF_T_V1_1                     11          434    7340032        896
I_T_V1                        11         1066   28311552       3456

--//可以发现建立standard_hash的函数索引相对小一些。

SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43;
SUBSTR
------
x43

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  0jc1sx44ckdj9, child number 0
-------------------------------------
select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43

Plan hash value: 4077016850

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |       |     2 (100)|          |      1 |00:00:00.03 |       3 |      2 |
|*  1 |  INDEX UNIQUE SCAN| I_T_V1 |      1 |      1 |   111 |     2   (0)| 00:00:01 |      1 |00:00:00.03 |       3 |      2 |
-------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
              x43')
       filter(STANDARD_HASH("V1")=HEXTORAW('FBDADC99CB387566504D65A6003CFA57CA05F238'))

--//产生的执行有点怪怪的,仔细看filter条件就明白了不知道是否存在standard_hash函数索引的原因,也许使用I_T_V1不用回表的原因,
--//但是使用的索引是I_T_V1.修改执行语句如下:

SCOTT@test01p> select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44;
        ID SUBSTR
---------- ------
        44 x44

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  dazgkpkj95mfs, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44
Plan hash value: 1893196135
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |
|*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_1 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx44'
   2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0'))

--//注意我执行的查询条件是v1=lpad('x',100,'x')||44,并没有使用standard_hash函数查询,如果写成如下:

SCOTT@test01p> select id,substr( v1,100,3) from t where standard_hash(v1)=standard_hash(lpad('x',100,'x')||44);
        ID SUBSTR
---------- ------
        44 x44

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  9z17dmg6hk84n, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where
standard_hash(v1)=standard_hash(lpad('x',100,'x')||44)
Plan hash value: 1893196135
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |
|*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_1 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0'))

--//上下比较可以发现前者多了一个filter("V1"='xx......xx44'.
--//也就是使用standard_hash建立的索引也可以使用像v1=lpad('x',100,'x')||44这样的谓词查询。
--//类似的情况到目前位置我仅仅知道trunc以及substr(v1,1,N)的情况。

5.继续测试..2:
--//作者并没有提示使用的是12c,如果不是12c应该没有standard_hash函数。自己建立1个md5函数看看。

CREATE OR REPLACE FUNCTION FN_MD5(par1 IN VARCHAR) RETURN VARCHAR2
  DETERMINISTIC IS
  retval varchar2(32);
BEGIN
  retval := utl_raw.cast_to_raw(DBMS_OBFUSCATION_TOOLKIT.MD5(INPUT_STRING => par1));
  RETURN retval;
END FN_MD5;

SCOTT@test01p> create unique index if_t_v1_2 on t(fn_md5(v1));
Index created.

SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
IF_T_V1_1                     11          434    7340032        896
I_T_V1                        11         1066   28311552       3456
IF_T_V1_2                     11        14850   10485760       1280

--//使用fn_md5自定义函数的索引比standard_hash索引稍微大一些。

SCOTT@test01p> select id,substr( v1,100,3) from t where fn_md5(v1)=fn_md5(lpad('x',100,'x')||48);
        ID SUBSTR
---------- ------
        48 x48

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  ggv02trp66xas, child number 0
-------------------------------------
select id,substr( v1,100,3) from t where
fn_md5(v1)=fn_md5(lpad('x',100,'x')||48)
Plan hash value: 675596911
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |
|*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_2 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T"."SYS_NC00004$"="FN_MD5"('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
              xxxxxxxxxxxxxxxxxxxx48'))

--//OK没有问题。

6.继续测试..3:
--//如果不考虑唯一性,可以考虑ora_hash函数。
SCOTT@test01p> create  index if_t_v1_3 on t(ora_hash(v1));
Index created.

SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and
               segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2','IF_T_V1_3');
SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS
-------------------- ----------- ------------ ---------- ----------
T                             11          306   27262976       3328
IF_T_V1_1                     11          434    7340032        896
I_T_V1                        11         1066   28311552       3456
IF_T_V1_2                     11        14850   10485760       1280
IF_T_V1_3                     11        16130    5242880        640
--//使用ora_hash函数索引更小。

SCOTT@test01p>  select count(*),ora_hash(v1) from t group by ora_hash(v1) order by 1 desc FETCH FIRST 2 ROWS ONLY;
  COUNT(*) ORA_HASH(V1)
---------- ------------
         2   3933908345
         2   3975299677

SCOTT@test01p> select id,substr(v1,100,10) from t where ora_hash(v1)=3933908345;
        ID SUBSTR(V1,100,10)
---------- --------------------
     36510 x36510
    144288 x144288        

SCOTT@test01p> select id,substr( v1,100,3) from t where ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510);
        ID SUBSTR
---------- ------
     36510 x36
    144288 x14

SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  4dvtgan3r97pp, child number 1
-------------------------------------
select id,substr( v1,100,3) from t where
ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510)
Plan hash value: 1660760003
-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |           |      1 |        |       |   801 (100)|          |      2 |00:00:00.01 |       5 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T         |      1 |   2000 |   244K|   801   (0)| 00:00:01 |      2 |00:00:00.01 |       5 |
|*  2 |   INDEX RANGE SCAN                  | IF_T_V1_3 |      1 |    800 |       |     1   (0)| 00:00:01 |      2 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / T@SEL$1
   2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T"."SYS_NC00005$"=3933908345)
--//也可以使用索引,缺点返回多余的行。

总结:
--//使用standard_hash,自定义md5函数,或者ora_hash,但是ora_hash不保证唯一。我知道就这些,不知道还有什么其它方法。
--//另外关于ORA-01450: maximum key length (3215) exceeded错误,可以考虑定义更大的数据块db_16k_cache_size,
--//db_32k_cache_size.

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/267265/viewspace-2653279/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
熟悉oracle相关技术,擅长sql优化,rman备份与恢复,熟悉linux shell编程。

注册时间:2008-01-03

  • 博文量
    2470
  • 访问量
    6277822