ITPub博客

首页 > 大数据 > 数据挖掘 > 表数据的存储对索引的影响

表数据的存储对索引的影响

数据挖掘 作者:bigdata01 时间:2014-04-09 09:21:32 0 删除 编辑

表格数据的存储对索引的影响

 

近期碰到一张包含日期字段的表格,用户反应即使在日期字段上建立了索引,查询报表时涉及的SQL仍然选择走全表扫描,而忽略索引。

 

对应表名为:T_GATE_CTN_FT

查询SQL如下:

select sum(T5253.TEU) as c1,

T2881.OWC_WEEK_DESC as c2,

T2881.OWC_WEEK_NUMBER as c3,

T2881.OWC_YEAR_NO as c4

from T_CTN_DM T2867, T_DATE_DM T2881, T_GATE_CTN_FT T5253

where (T2867.CTN_STATUS = 'F' and T2867.CTN_TYPE_UID = T5253.CTN_TYPE_UID and

T2881.DATE_UID = T5253.DATE_UID and T5253.MOVE_KIND = 'RECV' and

(T2881.OWC_YEAR_NO + 1 in (2012) or T2881.OWC_YEAR_NO in (2012)))

group by T2881.OWC_WEEK_NUMBER, T2881.OWC_YEAR_NO, T2881.OWC_WEEK_DESC

order by c2, c3

;

注:2012由前台参数传入,语句不建议修改

 

执行计划如下:

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 4280972504

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 436K| 37M| 29487 (2)| 00:0

| 1 | SORT GROUP BY | | 436K| 37M| 29487 (2)| 00:0

|* 2 | HASH JOIN | | 436K| 37M| 29460 (2)| 00:0

|* 3 | TABLE ACCESS FULL | T_CTN_DM | 5549 | 94333 | 103 (1)| 00:0

|* 4 | HASH JOIN | | 436K| 30M| 29354 (2)| 00:0

|* 5 | TABLE ACCESS FULL| T_DATE_DM | 359 | 16514 | 42 (0)| 00:0

|* 6 | TABLE ACCESS FULL| T_GATE_CTN_FT | 4783K| 127M| 29277 (1)| 00:0

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("T2867"."CTN_TYPE_UID"="T5253"."CTN_TYPE_UID")

3 - filter("T2867"."CTN_STATUS"='F')

4 - access("T2881"."DATE_UID"="T5253"."DATE_UID")

5 - filter("T2881"."OWC_YEAR_NO"=2012 OR "T2881"."OWC_YEAR_NO"+1=2012)

6 - filter("T5253"."MOVE_KIND"='RECV')

 

执行时间:

SQL>

ALTER SYSTEM FLUSH BUFFER_CACHE;

 

System altered

 

Executed in 0.156 seconds

ALTER SYSTEM FLUSH SHARED_POOL;

 

System altered

 

Executed in 0.468 seconds

select sum(T5253.TEU) as c1,

T2881.OWC_WEEK_DESC as c2,

T2881.OWC_WEEK_NUMBER as c3,

T2881.OWC_YEAR_NO as c4

from T_CTN_DM T2867, T_DATE_DM T2881, T_GATE_CTN_FT T5253

where (T2867.CTN_STATUS = 'F' and T2867.CTN_TYPE_UID = T5253.CTN_TYPE_UID and

T2881.DATE_UID = T5253.DATE_UID and T5253.MOVE_KIND = 'RECV' and

(T2881.OWC_YEAR_NO + 1 in (2012) or T2881.OWC_YEAR_NO in (2012)))

group by T2881.OWC_WEEK_NUMBER, T2881.OWC_YEAR_NO, T2881.OWC_WEEK_DESC

order by c2, c3

;

Result:

……

……

105 rows selected

 

Executed in 18.299 seconds

 

检查表格和对应索引的统计信息:

select * from user_tables where table_name='T_GATE_CTN_FT';

select * from user_indexes where table_name='T_GATE_CTN_FT' and index_name='I_GATE_CTN_DATE_UID';

注意上图聚集因子达到了536655.

 

select * from user_ind_columns where index_name='I_GATE_CTN_DATE_UID';

 

如果强制使用索引:

语句改为:

select /*+index(T5253 I_GATE_CTN_DATE_UID)*/sum(T5253.TEU) as c1,

T2881.OWC_WEEK_DESC as c2,

T2881.OWC_WEEK_NUMBER as c3,

T2881.OWC_YEAR_NO as c4

from T_CTN_DM T2867, T_DATE_DM T2881, T_GATE_CTN_FT T5253

where (T2867.CTN_STATUS = 'F' and T2867.CTN_TYPE_UID = T5253.CTN_TYPE_UID and

T2881.DATE_UID = T5253.DATE_UID and T5253.MOVE_KIND = 'RECV' and

(T2881.OWC_YEAR_NO + 1 in (2012) or T2881.OWC_YEAR_NO in (2012)))

group by T2881.OWC_WEEK_NUMBER, T2881.OWC_YEAR_NO, T2881.OWC_WEEK_DESC

order by c2, c3

;

 

执行计划:

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 1969617402

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | C

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 436K| 37M| 5

| 1 | SORT GROUP BY | | 436K| 37M| 5

|* 2 | HASH JOIN | | 436K| 37M| 5

|* 3 | TABLE ACCESS FULL | T_CTN_DM | 5549 | 94333 |

| 4 | NESTED LOOPS | | | |

| 5 | NESTED LOOPS | | 436K| 30M| 5

|* 6 | TABLE ACCESS FULL | T_DATE_DM | 359 | 16514 |

|* 7 | INDEX RANGE SCAN | I_GATE_CTN_DATE_UID | 1894 | |

|* 8 | TABLE ACCESS BY INDEX ROWID| T_GATE_CTN_FT | 1214 | 33992 |

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("T2867"."CTN_TYPE_UID"="T5253"."CTN_TYPE_UID")

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

3 - filter("T2867"."CTN_STATUS"='F')

6 - filter("T2881"."OWC_YEAR_NO"=2012 OR "T2881"."OWC_YEAR_NO"+1=2012)

7 - access("T2881"."DATE_UID"="T5253"."DATE_UID")

8 - filter("T5253"."MOVE_KIND"='RECV')

 

执行时间:

SQL>

ALTER SYSTEM FLUSH BUFFER_CACHE;

 

System altered

 

Executed in 0.125 seconds

ALTER SYSTEM FLUSH SHARED_POOL;

 

System altered

 

Executed in 0.686 seconds

select /*+index(T5253 I_GATE_CTN_DATE_UID)*/sum(T5253.TEU) as c1,

T2881.OWC_WEEK_DESC as c2,

T2881.OWC_WEEK_NUMBER as c3,

T2881.OWC_YEAR_NO as c4

from T_CTN_DM T2867, T_DATE_DM T2881, T_GATE_CTN_FT T5253

where (T2867.CTN_STATUS = 'F' and T2867.CTN_TYPE_UID = T5253.CTN_TYPE_UID and

T2881.DATE_UID = T5253.DATE_UID and T5253.MOVE_KIND = 'RECV' and

(T2881.OWC_YEAR_NO + 1 in (2012) or T2881.OWC_YEAR_NO in (2012)))

group by T2881.OWC_WEEK_NUMBER, T2881.OWC_YEAR_NO, T2881.OWC_WEEK_DESC

order by c2, c3

;

Result:

……

……

105 rows selected

 

Executed in 34.913 seconds

 

可见由于对于date_uid字段来说,数据存储到表格里的时候是无序的,导致索引的聚集因子过大,如果在查询中使用date_uid的索引,反而降低了查询的效率。

 

 

改进测试:

对于date_uid字段,因为是自增长类型,之后新填入的数据几乎都是顺序增长的。

那么建议在仓库比较空闲时重建此表格,按date_uid顺序重新插入数据,测试一下索引的可用情况。

 

建立测试表格:

CREATE TABLE T_GATE_CTN_FT2 AS SELECT * FROM T_GATE_CTN_FT ORDER BY DATE_UID;

 

CREATE INDEX I_GATE_CTN_FT2_DATE_UID ON T_GATE_CTN_FT2(DATE_UID);

检查统计信息:

SELECT * FROM USER_TABLES WHERE TABLE_NAME='T_GATE_CTN_FT2';

 

SELECT * FROM USER_INDEXES WHERE TABLE_NAME='T_GATE_CTN_FT2';

注意聚集因子为106590.

 

替换表格,检查执行计划:

select sum(T5253.TEU) as c1,

T2881.OWC_WEEK_DESC as c2,

T2881.OWC_WEEK_NUMBER as c3,

T2881.OWC_YEAR_NO as c4

from T_CTN_DM T2867, T_DATE_DM T2881, T_GATE_CTN_FT2 T5253

where (T2867.CTN_STATUS = 'F' and T2867.CTN_TYPE_UID = T5253.CTN_TYPE_UID and

T2881.DATE_UID = T5253.DATE_UID and T5253.MOVE_KIND = 'RECV' and

(T2881.OWC_YEAR_NO + 1 in (2012) or T2881.OWC_YEAR_NO in (2012)))

group by T2881.OWC_WEEK_NUMBER, T2881.OWC_YEAR_NO, T2881.OWC_WEEK_DESC

order by c2, c3

;

 

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 746580745

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 325K| 28

| 1 | SORT GROUP BY | | 325K| 28

|* 2 | HASH JOIN | | 325K| 28

|* 3 | TABLE ACCESS FULL | T_CTN_DM | 5549 | 94333

| 4 | NESTED LOOPS | | |

| 5 | NESTED LOOPS | | 325K| 22

|* 6 | TABLE ACCESS FULL | T_DATE_DM | 359 | 16514

|* 7 | INDEX RANGE SCAN | I_GATE_CTN_FT2_DATE_UID | 1827 |

|* 8 | TABLE ACCESS BY INDEX ROWID| T_GATE_CTN_FT2 | 907 | 25396

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("T2867"."CTN_TYPE_UID"="T5253"."CTN_TYPE_UID")

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

3 - filter("T2867"."CTN_STATUS"='F')

6 - filter("T2881"."OWC_YEAR_NO"=2012 OR "T2881"."OWC_YEAR_NO"+1=2012)

7 - access("T2881"."DATE_UID"="T5253"."DATE_UID")

8 - filter("T5253"."MOVE_KIND"='RECV')

 

执行情况:

SQL>

ALTER SYSTEM FLUSH BUFFER_CACHE;

 

System altered

 

Executed in 0.156 seconds

ALTER SYSTEM FLUSH SHARED_POOL;

 

System altered

 

Executed in 0.468 seconds

select sum(T5253.TEU) as c1,

T2881.OWC_WEEK_DESC as c2,

T2881.OWC_WEEK_NUMBER as c3,

T2881.OWC_YEAR_NO as c4

from T_CTN_DM T2867, T_DATE_DM T2881, T_GATE_CTN_FT2 T5253

where (T2867.CTN_STATUS = 'F' and T2867.CTN_TYPE_UID = T5253.CTN_TYPE_UID and

T2881.DATE_UID = T5253.DATE_UID and T5253.MOVE_KIND = 'RECV' and

(T2881.OWC_YEAR_NO + 1 in (2012) or T2881.OWC_YEAR_NO in (2012)))

group by T2881.OWC_WEEK_NUMBER, T2881.OWC_YEAR_NO, T2881.OWC_WEEK_DESC

order by c2, c3

;

Result:

……

……

105 rows selected

 

Executed in 8.391 seconds

 

可见当表格数据按照date_uid顺序存入时,索引的聚集因子比较低,相应的索引的可用性比较好,oracle的CBO会自动选择索引扫描。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29594798/viewspace-1138309/,如需转载,请注明出处,否则将追究法律责任。

下一篇: 没有了~
请登录后发表评论 登录
全部评论

注册时间:2014-04-08

  • 博文量
    2
  • 访问量
    3850