When Should You Rebuild an Index?

Version 1
    Share:|

    by Jonathan Lewis

    In an earlier article about B-tree indexes, I ended with the comment,  “Resist the argument that you need to rebuild Indexes regularly because ‘they  become unbalanced.’ It isn’t a valid argument.” Some time later, I received an  email message pointing out that there were other, valid, reasons for rebuilding  indexes. This is true, and rebuilding B-tree indexes does sometimes have  benefits, so I thought I’d write a short article investigating the reasons for  rebuilding B-tree indexes.

     

    Why Rebuild an Index?

    There are only two reasons for rebuilding indexes. One is driven by the  question, “Will the performance gain be worth the  cost of rebuilding this index?”  The other is driven by a similar question: “Is the  administrative benefit worth the cost?”

     

    One convenient aspect of phrasing the questions like this is that if the cost  is effectively zero, then even in the case in which rebuilding an index is a  pointless exercise, I’m not going to insist that you to stop  doing it. Of course, the time and effort may not really be zero, and just  occasionally, an index rebuild can cause subsequent short-term performance  problems, so do carry on reading the rest of the article.

     

    The standard argument for rebuilding indexes is based on the fact that B-tree  indexes tend to “degenerate” (i.e., become less space-efficient) as they are  populated over time. And the argument is true — but only because the standard  theory predicts that B-tree indexes will tend to operate at 75 percent packing,  and Oracle creates them, by default, at 90 percent packing. (In fact, Oracle  does not coalesce adjacent, low-usage, leaf blocks  automatically, so its packing will often be a few percent lower than the  theoretical value). So the question of the rebuild generally comes down to the  following implicit question:

     

    If I think the index is running at X percent packing, should I rebuild it  to 90 percent?

     

    Let’s start by considering the significance of the change from X percent, and  before we worry about values for the magic number X.

    Potential Benefits

    When you rebuild an index, the physical effect is to reduce the size of the  index (especially the leaf block count). This may result in three possible  benefits:

        • The optimiser calculations may produce a smaller value for the cost of using  the index, so the optimiser may use the index in more execution plans.

        • Typical queries that use the index may have to visit fewer index blocks, and  therefore run more efficiently.

        • Because the index is smaller, it may become a better survivor in the buffer  cache LRU chain, so that blocks from the index are found in the buffer more  frequently, and fewer blocks from other objects are flushed to allow index  blocks to be reloaded. If this occurs, system-level I/O is reduced, resulting in  a possible performance gain for everyone.

    So let’s check each effect in turn, and see what the different benefits look  like in different circumstances. We’ll start with the optimiser cost  calculations and a simple worked example.

    Optimiser Costing

    If you’ve read the article, “Why Isn't Oracle Using  My Index?!” you will be familiar with the formula originally published by  Wolfgang Breitling at IOUG-2002 which gives the general cost of accessing a  table through an index as:

    blevel +

    selectivity * leaf_blocks +

    selectivitity * clustering_factor.

     

    Consider an index on 10,000,000 rows, with an average index entry size of 40  bytes. This would give us about 200 entries per leaf  block if we use an 8K block size. Assume that the index (according  to some statistics in the viewindex_stats) is running  at 50 percent efficiency and is therefore, according to common dictate, a good  candidate for rebuilding (to the default 90 percent — but we will use 100  percent in our example). In other words, the index is currently running with  about 100 entries per block.

     

    Let’s do the arithmetic, first before the rebuild:

    10,000,000 rows at 100 rows per block

    => 100,000 leaf blocks

    100,000 leaf blocks (at 50 percent packing)

    => 1,000 level 1 branch blocks

    1,000 level 1 branch blocks (at 50 percent packing)

    => 10 level 2 branch blocks

    10 level 2 branch blocks (at 50 percent packing)

    => 1 level 3 branch block

    Now after the rebuild (with pctfree = 0):

    10,000,000 rows at 200 rows per block

    => 50,000 leaf blocks

    50,000 leaf blocks (at 100 percent packing)

    => 250 level 1 branch blocks

    250 level 1 branch blocks (at 100 percent packing)

    => 2 level 2 branch blocks

    2 level 2 branch blocks (at 100 percent packing)

    => 1 level 3 branch block

     

    So the index has blevel = 3 before and after  rebuilding, but 100,000 leaf_blocks before the  rebuild, dropping to 50,000 afterwards. So does this tell us anything about the  new cost estimate? Not really, although it is an important little detail that  the blevel, which is one component of the cost  calculation, rarely changes on rebuilding an index. The exponential relationship  between the blevel and the maximum row count makes  this almost inevitable.

    We need to examine not just the leaf_blocks, but  the entire cost component: selectivity * leaf_blocks.  How much does this change? Well, that depends on how many rows you have for each  index value. Let’s try a couple of numbers, one for a high-precision index (five  rows per value), and one for a lower quality index (50 per value).

     

    Rows per value = 5           => selectivity = 1/2,000,000
    Selectivity * leaf_blocks (old) = 100,000 / 2,000,000 = 0.05
    Selectivity * leaf_blocks (new) = 50,000 / 2,000,000 = 0.025

     

    Because of rounding (I believe Oracle rounds up at this point in the  formula), this component of the cost estimate does not change.

    Rows per key = 50            => selectivity = 1/200,000
    Selectivity * leaf_blocks (old) = 100,000 / 200,000 = 0.5
    Selectivity * leaf_blocks (new) = 50,000 / 200,000 = 0.25

     

    Again, this component of the formula shows no change. In fact, it is not  until each key value represents 100 rows in the table  that the cost estimate is reduced as a result of our rebuilding the index to  pack it from 50 percent to 100 percent efficiency.

     

    So, there are cases where rebuilding an index will change the  optimiser’s opinion of the usefulness of that index — but those cases may  be fairly rare in your system.

    Typical Queries

    So what if the optimiser isn’t convinced that your index has improved; maybe  the end-user queries that use that index already will run better. As ever, it is  important to know the data and know the application. When you start to think  about rebuilding an index “because index_stats says it  is running at 50 percent,” the first thing you should do is to ask yourself if  (for example) that 50 percent means that half the index is nearly packed and the  other half is nearly empty — one magic number for the whole index may not be  giving you any useful information. (In the absence of knowledge, you might try a  rather expensive treedump to check the details).

     

    Then, when you have decided what the number means, you have to decide if  rebuilding would actually have any perceptible effect for your users. Depending  on the way the table and index are used, the answer may be no.

     

    Consider this example. The table is addressed by an “Object Oriented” system,  which always uses meaningless numeric keys. Every query uses a key value to  collect data. One trip down the index fetches one  rowid from a leaf block, and  then one row from the table. Rebuilding the index to pack 200  rowids per leaf_block  doesn’t make any difference if you want only one row.

     

    How about an example closer to the other end of the spectrum: a  “child†table in which one foreign key value  picks up 100 rows? Packing the index means that you only visit one  leaf_block instead of two for those  rowids. Terrific — except maybe those 100 table  rows come from 100 table blocks, so the effort of rebuilding the index makes a  difference of slightly less than one percent to the end-user. You have to check  out the figures when trying to work out whether or not it’s going to be  worth rebuilding an index, especially if there’s some reason that make it  awkward to do the rebuild.

    Buffering Benefits

    Clearly, if you have a well-packed index, it is likely to cause less  “pollution” in the buffer — there are fewer index blocks, so the whole index may  get into the buffer, and because there are fewer blocks to knock other data  blocks out of the buffer, there may be a general reduction in I/O. On the down  side, a well-packed index may be subject to extra contention on inserts,  updates, and deletes. It’s harder to measure a case like this, but it’s probably  a good enough reason to rebuild an index (partition) that’s about to be made  read-only.

     

    Three thoughts to consider, though. If it’s a really popular index, then the  LRU and touch count could be  keeping it buffered all the time anyway, so whilst it may be “wasting space” in  the buffer, it may not be causing any extra I/O. Secondly, for many queries the  significant cost of the query is the cost of visiting table blocks which have a  low revisit rate and cause much more aggressive buffer flushing, so worrying  about indexes may be distracting you from the problem you should be addressing.  Finally, it is possible that by rebuilding indexes too frequently, you may be  exaggerating the problem rather than fixing the problem — you might like to try  the following (fairly long-running, expect it to take a few minutes) test on a  system with an 8K block size and not in an  ASSM tablespace if you are using Oracle9:

    drop table t1;
    create table t1(n1 number(38));
    create index i1 on t1(n1);

     

    execute dbms_random.seed(0)

     

    begin
    for i in 1..400000 loop
    insert into t1 values(
    trunc(power(10,14) * dbms_random.value)
    );
    commit;
    end loop;
    end;
    /

     

    -- default pctfree for rebuild
    alter index i1 rebuild pctfree 10;

     

    begin
    for i in 1..100000 loop
    insert into t1 values(
    trunc(power(10,14) * dbms_random.value)
    );
    commit;
    end loop;
    end;
    /

     

    analyze index i1 validate structure;
    select lf_blks from index_stats;

    Check the number of leaf blocks in the index when  you do the rebuild part of the way through the test,  then the repeat the experiment without the rebuild.  You may be surprised at the difference in results. (In my case, the count with  the “performance-enhancing” rebuild was 2227, but when I didn’t do the rebuild,  the count stayed down at 1778 blocks).

     

    Please note, this test does not prove (and is not intended to prove) that you  should not rebuild indexes. It shows only that it is  possible for a rebuild to have a counter-intuitive  effect. There are scenarios, based on data patterns, where this type of effect  is likely to appear in critical subsections of an  index shortly after a rebuild. The point I am making is that you should always  think carefully before deciding to rebuild an index.

    Reminder

    Throughout this article, I have been discussing B-tree indexes. There is a  completely different argument about bitmap indexes that I have already covered  in various articles about bitmap indexes previously published on www.dbazine.com.

     

    I have also ignored the possibility that you could choose to rebuild indexes  (or at least index partitions) to 100 percent packing  (pctfree = 0) just before you make a  tablespace read-only. Even then, it might not be worth  the effort if it causes problems whilst you are doing the rebuild (remember,  there was a fairly serious bug in online index rebuilds which may still be  present in be recent versions of Oracle).

     

    Finally, there are always some indexes which, for application-dependent  reasons, behave in a completely catastrophic manner — there are always  special cases where a regular rebuild is likely to be  a good idea. Even then, you should check whether a regular  coalesce might be a better idea in the short term,  with a strategic redesign involving function-based  indexes as a longer term solution.

     

    Conclusion

    There is only one sound argument for rebuilding an index:

    Will the total cost of rebuilding the index be a reasonable price to pay for  the resulting benefit to the system?

    The answer to this question is frequently a resounding NO. In fact, sometimes  the overall impact of rebuilding an active index will be detrimental to the  system. However, there are still plenty of misconceptions about indexes that  result in DBAs the world over wasting valuable time and effort rebuilding  indexes unnecessarily.

     

    BUT, if you have a regular window when the system is not in use, have a  simple batch job that runs in that window without putting pressure on other  batch tasks, and that batch job can never fail — then  you might as well rebuild all the “safe” indexes you want to; there is often a  slight performance gain to be had, and if it costs you nothing, then you might  as well take it.

    --

    Jonathan  Lewis is a freelance consultant with more than 18 years  experience of Oracle. He specialises in physical database design and the  strategic use of the Oracle database engine, is author of Practical  Oracle 8i - Building Efficient Databases published by  Addison-Wesley, and is one of the best-known speakers on the UK Oracle circuit.  Further details of his published papers, presentations and seminars and tutorials can be  found at http://www.jlcomp.demon.co.uk,  which also hosts The Co-operative Oracle Users’ FAQ for the  Oracle-related Usenet newsgroups.