# oracle索引群集因子

cost =
blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effective table selectivity)

Historically it has been common practice to say that a good index has a low clustering_factor, and a bad
index has a high clustering_factor.
There is obviously a degree of truth in this comment, especially in the light of what the clustering_factor
represents. However, I have always had an aversion to words like low, high, small, large, and expressions like close
to zero, when talking about Oracle. After all, is 10,000 a low clustering_factor or a high clustering_
factor? It’s low if you have 10,000 blocks in your table, and high if you have 100 blocks in your table. So you
might want to write a couple of little scripts that join user_tables to user_indexes (and other scripts for
partitioned tables, etc.) so that you can compare the critical figures.
In fact, for reasons I describe in Chapter 5, I often use the column avg_data_blocks_per_key to get
an idea of how good Oracle thinks the index is.

In Figure 4-1, we have a table with four blocks and 20 rows, and an index on the column V1,
whose values are shown. If you start to walk across the bottom of the index, the first rowid
points to the third row in the first block. We haven’t visited any blocks yet, so this is a new
block, so we count 1. Take one step along the index, and the rowid points to the fourth row of
the second block—we’ve changed block, so increment the count. Take one step along the
index, and the rowid points to the second row of the first block—we’ve changed block again, so
increment the count again. Take one step along the index, and the rowid points to the fifth row
of the first block—we haven’t changed blocks, so don’t increment the count.
In the diagram, I have put a number against each row of the table—this is to show the
value of the counter as the walk gets to that row. By the time we get to the end of the index, we
have changed table blocks ten times, so the clustering factor is 10.
Notice how small clumps of data stop the clustering_factor from growing—look at block 2
where the value 8 appears four times because four consecutive entries in the index point to the
same block; the same effect shows up in block 3 to give three rows the value 6.
The table doesn’t have to be completely sorted for this type of thing to happen; it only
needs to have little clumps (or clusters) of rows that are nearly sorted—hence the term
clustering_factor, rather than sort_factor.
Given the way the clustering_factor is calculated, you will appreciate that the smallest
possible value has to be the same as the number of blocks in the table, and the largest possible
value has to be the same as the number of rows in the table—provided you have computed
statistics.
If there are lots of blocks like block 2 in the table, the clustering_factor will turn out to be
quite close to the number of blocks in the table, but if the data in the table is randomly scattered,
the clustering_factor will tend to come out close to the number of rows in the table.  44_740323_1190603589.jpg

• 博文量
19
• 访问量
68590