Bitmap Indexes: Understanding Bitmap Indexes 

Version 1
    Share This:

    by Jonathan Lewis

     

    Bitmap indexes are a great boon to certain kinds of application, but there is  a lot of mis-information in the field about how they work, when to use them, and  the side-effects. This article examines the structure of bitmap indexes, and  tries to explain how some of the more commonly repeated misconceptions came into  existence.

    Everybody Knows …

    If you did a quick survey to discover the understanding that people had of  bitmap indexes, you would probably find the following comments being quoted  fairly frequently:

    a) When there are bitmap indexes on tables then updates will take out full  table locks.

    b) Bitmap indexes are good for low-cardinality columns.

    c) Bitmap index scans are more efficient than tablescans even when returning  a large fraction of a table.

     

    The third claim is really little more than a (possibly untested) corollary to  the second claim. And all three claims are in that grey area somewhere between  false and extremely misleading.

     

    Of course, there is a faint element of truth to these claims -- just enough  to explain why they should have arisen in the first place.

    This purpose of this article is to examine the structure of bitmap indexes,  review the claims, and try to sort out some of the costs and benefits of using  bitmap indexes.

     

    What Is a Bitmap Index?

    Indexes are created to allow Oracle to identify requested rows as efficiently  as possible. Bitmap indexes are no exception -- however the strategy behind  bitmap indexes is very different from the strategy behind B*tree indexes. To  demonstrate this, we can start by examining a few block dumps.

    Consider the SQL script in figure 1.

     

    jlewis3_img1.gif

    Figure 1: Sample data.

     

    Note how we have defined the btree_col and bitmap_col so that they hold  identical data that cycles through the values zero to nine.

    On a 9.2 database with a block size of 8K, the resulting table was 882 blocks  long. The B*tree index had 57 leaf blocks, and the bitmap index had 10 leaf  blocks.

    jlewis3_img2.gif

    Figure 2: Symbolic block dumps.

     

    Clearly the bitmap index was in some way much more tightly packed than the  B*tree index. To see the packing, we can produce a symbolic dump from the  indexes using commands like:

        alter system
        dump datafile x block y;

     

    See figure 2 for results -- be warned, however, that symbolic block dumps can  be a little misleading. Some of the information they display is derived, some is  re-arranged for the sake of clarity.

     

    Do Bitmaps Lock Tables?

    Looking at figure 2, we see in the B*tree index that every entry  consists of a set of flags, a lock byte, and (in this case) two  columns of data. The two columns are in fact the indexed value,  and a rowid -- and every row in our table has a corresponding entry of  this form in the index. (If the index were a unique index, we would still see  the same content in each entry, but the layout would be a little different).

     

    In the bitmap index, every entry consists of a set of flags, a  lock byte, and (in this case) four columns of data. The four  columns are in fact the indexed value, a pair of rowids and a  stream of bits. The pair of rowids identifies a contiguous section of the  table, and the stream of bits is encoded to tell us which rows in that range  of rowids hold that value.

     

    Look at the size of the bit stream though -- the length of the column in the  example above is 3,521 bytes, or roughly 27,000 bits. Allowing about 12%  overhead for check sums and so on, this single entry could cover about 24,000  rows in the table. But there is only one lock byte for the entire entry,  which means a single lock will have some sort of impact on as many as 24,000  rows in the table.

     

    So this is where that dubious claim originates -- if you think that a bitmap  index causes a full table lock, then you have been experimenting with tables  that are too small.

     

    A single bitmap lock could cover thousands of rows -- which is pretty bad  news -- but it does not lock the table.

     

    Consequences of Bitmap Locks

    We shouldn't stop with that conclusion, though, as it would be easy to  mis-interpret the result. We need to understand what actions will cause that one  critical lock byte to be taken, and exactly what effect that will have on the  thousands of related rows.

    We can investigate this with a much smaller test (see figure 3). We start by  building a small table, and then doing different updates to different rows in  that table.

    Sample data set:

    jlewis3_img3.gif

    Figure 3: Preparing for update tests.

     

    Note that we have updated the indexed column of one row in the table. If we  dump the index and table blocks, we will see that there is a lock byte set on  that one row in the table, but two sections of the bitmap index  are locked. The two sections will be the section for nearby rows where the  current value is 1 (the "from" section) and the section for nearby rows  where the value is 2 (the "to" section). (In fact we should see that  those two sections of the bitmap have been copied and both copies are  locked).

     

    The question we have to pursue now, is how aggressive is Oracle's locking in  this case.

    The answer may come as a bit of a surprise to those who think in terms of  "bitmap indexes cause table locks."

    We can do any of the following (each one is a separate test).

    Update a row in the "from" section, provided we do not try to update  the bitmap column.

       update t1
        set id = 5
        where id = 0;

     

    Update a row in the "to" section, provided we do not try to update the  bitmap column.

       update t1
        set id = 6
        where bit_col = 2;

     

    These tests show us that a row can be covered by a locked bitmap section, and  still be available for update.

    Lock collisions are possible, of course, for example neither of the following  statements is updating a locked table row, but either of them would cause their

    session to wait on a "TX" lock in mode 4 (shared)

        update t1
        set bit_col = 4
        where id = 2; -- bit_col = 2

        update t1
        set bit_col = 2
        where id = 3 -- bit_col = 3

     

    Note, however, that the problem requires two things to be true. First, we  must be updating the indexed column, and secondly the row that we are updating  must be covered by a previously locked bitmap section i.e. it must be "fairly  near" another row that is in mid-update, and there is a strictly limited list of  values (viz: 4 values) that could cause a collision.

     

    Bear in mind that we can, with our sample scenario, update the bitmap indexed  column in a nearby row, provided that neither the initial nor final value is 1  or 2. For example:

        update t1
        set bit_col = 4
        where bit_col = 3;

     

    So, bitmap indexes do NOT cause table locks; and if our updates do not affect  the bitmapped column, the presence of the bitmap indexes causes no problems at  all, and even if our updates do update bitmapped columns we may be able to  engineer a set of non-colliding updates.

     

    Problems with Bitmaps

    Of course, there are some problems with using bitmaps that go beyond the  question of update collisions.

    Remember that inserts and deletes on a table will result in  updates to all the associated indexes. Given the large number of rows covered by  a single bitmap index entry, any degree of concurrency of inserts or deletes has  a fairly high chance of affecting overlapping index sections and causing massive  contention.

     

    Moreover, even serialised DML that affects bitmap indexes may have a more  significant performance impact than you would expect.

    I pointed out that a simple update to a single row typically results in an  entire bitmap section being copied. Look back at (figure 1), and remind yourself  how big a single bitmap section could be. In the example it was 3,500 bytes, (in  Oracle 9 the limit is close to half a block). You can find that a small number  of changes to your data can have a surprisingly large impact on the size of any  bitmap index that gets updated as a consequence.

     

    You can get lucky -- but in general you should start with the assumption that  even a serialised batch update will be most effective if you drop the bitmap  indexes before the batch and rebuild them afterwards.

     

    Low Cardinality Columns

    It is often claimed that "bitmap indexes are good for low cardinality  columns." If we are a little fussy about the language, we might prefer to say  "low distinct cardinality." In either case, the intent is to identify columns  that hold a relatively small number of different values.

     

    This is indeed a reasonably accurate statement - provided it is qualified and  explained properly. Unfortunately, many people seem to think that this means a  bitmap index is magically so efficient that you can use it to access large  fractions of a table in a way that would not be considered sensible with a  B*tree index.

     

    The classic example quoted for bitmap indexing is the extreme one of sex; a  column holding just two values (or three if you include the "n/a" dictated by  the ISO standard). We will be slightly less extreme, and consider an example  based on the countries that make up the United Kingdom -- England Ireland,  Scotland and Wales.

     

    Assume we have a block size of 8K, and a (reasonably ordinary) row size of  200 bytes, for a total of 40 rows per block. Insert a few million rows into that  table, ensuring that the distribution of the four countries is uniformly random.  There will be roughly 10 rows per block for each country.

    If I use the bitmap index to access all the rows for England, I will visit  every block in the table (ten times) in order. Surely it would be more efficient  to do a tablescan than to use that index.

     

    In fact even if I expand my data set to 40 countries, I am still likely to  find one row in each block in the table. Perhaps by the time my data has  expanded to global proportions (say 640 countries so that any given country  appears once every 16 blocks), it might be cheaper to access the data by index  rather than by tablescan. But a column with 640 different values hardly seems to  qualify, at first sight, for the description of "low distinct cardinality."

    Of course, descriptive expressions like "low", "small", "close to zero" need  some qualification. Is 10,000 close to zero, for example ? If the alternative is  ten billion then the answer is yes!

     

    Forget the vague expressions like "low cardinality." In most cases there are  only two points to bear in mind when considering bitmap indexes. First, it is  the number of different blocks in the table that you have to visit for a typical  index value that is the major cost of using an individual index; changing an  index from B*tree to bitmap won't magically make it a better index. Secondly, it  is Oracle's optimiser mechanism for combining multiple bitmap indexes  that makes them useful.

     

    Consider this example based on the UK population of roughly 64M people.

        • 50M have brown eyes

        • 35M are female

        • 17M have black hair

        • 1.8M live in the Birmingham area

        • 1.2M are aged 25

        • 750,000 work in London

    Any one fact gives us a huge number of people -- but how many brown-eyed,  black-haired, women aged 25 live in Birmingham and work in London? Perhaps a  couple of dozen.

     

                jlewis3_img4.gif    

    Figure 4: Modelling the UK population.

     

    Individually an index (B*tree or bitmap) on any one of these facts would be  completely useless if we translated this data set and query into an Oracle  database.

     

    A multi-column B*tree index on all six facts might be quite helpful -- until  we decided to ask for men who were 6-foot tall with beards, instead of women  with brown eyes and black hair.

     

    You might like to try the experiment (see figure 4) which needs about 2.0 GB  of free space and may take a couple of hours to complete at around the 500MHz  CPU mark.

     

    Due to restrictions on space, I built a smaller model -- emulating a  population of only 36 million. The time to build, and size of objects came out  as follows on a 600MHz, Win2000 box running 9.2.0.1.

     

    Object  

    Size (MB)

    Build Time (mi:ss)

    T1

    845

    16:12

    I1 (sex)

    11

    1:39

    I2 (eyes)

    16

    1:43

    I2 (hair)

    37

    2:17

    I4 (town)

    40

    2:25

    I5 (age)

    42

    2:28

    I6 (work)

    45

    2:42

     

    Note particularly the total space taken by the indexes -- 191 MB. Just one  multi-column index on the same six columns (even with maximum compression) would  take at least 430MB, using I don't know how much CPU time to build; and not many  systems would have catered for a full in-memory build as the required  sort_area_size would be about 900MB.

     

    So what can all these bitmap indexes do for us? Consider the query:

      
    select  count(facts)
      
    from    t1
      
    where   eyes = 1
      
    and     sex = 1
      
    and     hair = 1
      
    and     town = 15
      
    and     age = 25
      
    and     work = 40
      

    On the reduced data set that I had created, a hint to use a full tablescan,  resulted in a run time of one minute and 20 seconds (returning the answer  eight). Of course, with a real set of fact data, the table would have been much  bigger, and the time much greater.

     

    With a full, six-column, 430MB index, this query would probably have returned  in the time it takes to do about 10 physical reads (one table block for each  row, and a couple extra for reading index blocks) -- the proverbial sub-second  response.

     

    With the bitmap indexes as defined, the response time was five  seconds. Most of that time was spent in bitmap index range-scans that did  physical reads to get index blocks into memory. The actual execution path is  shown in figure 5.

     

    jlewis3_img5.gif

    Figure 5: Execution path.

     

    There are two interesting points to consider with this result. First is that  Oracle ignored the three "worst" (i.e. least selective) indexes. Second is that  although the response time seems slow, the index sizes are so small that it is  feasible to think about keeping them in a large buffer_pool_keep (or, for  Oracle 9, db_keep_cache_size) to eliminate the cost of the physical reads  -- an option that would probably not be feasible if you needed several  multi-column B*tree indexes to do the same job.

     

    Let's think about the ignored indexes -- it is possible for a bitmap plan  like this to use an apparently arbitrary number of indexes, and I have seen  cases where Oracle has used more indexes than the limit of five that applies to  the and_equal access path for single-column B*tree indexes.

     

    The three missing indexes have not been ignored because of some artificial  limit. The cost based optimizer weighs the cost of reading each extra index  against the additional precision gained, so bitmap indexes on the classic (male,  female) column tend to be ignored despite claims to the contrary. (Delete the  clause work = 40 from the sample query, though, and you will see the index on  column sex is actually used).

     

    Of course, such bitmaps can be built very quickly and tend to be very small,  so you might want to build them anyway, just in case.

    Sizing

    The sizing of indexes, and the option for maximum buffering have to be  considered in the cost, of course, and the question often arises -- how big will  a bitmap index be ?

    In the example above, I have tried to build a worst-case scenario, making it  as hard as possible for Oracle to gain any advantages in compression.

    In the worst case, the size of a bitmap in bits would be:

     

        Number of different possible values for the column *
        Number of rows that Oracle thinks could fit in a block *
        Number of blocks below high water mark.

     

    Add about 10 percent for checksum information and overhead, and divide by  eight to get the number of bytes.

    Fortunately, Oracle has some steps for reducing the size of the wasted space  -- the most important of which is the command to tell Oracle exactly how many  rows per block you have in the worst case in a specific table:

    Alter table XXX
          minimize records_per_block;

    However, apart from keeping Oracle informed with this command, you also find  that the size of the index is very strongly affected by the clustering of the  data.

     

    In my example, I have constructed the data to be as thinly scattered as  possible -- for example the town column rotates through the values 0 to 30. If I  restructure (effectively sort) the data so that all the towns with code 0  are together, followed by all the towns with code 1, the size of the index drops  from 40MB to a mere 7MB.

     

    This dramatic variation in size is yet another reason for reviewing the claim  about "low cardinality." The potential benefit of a bitmap index varies with the  clustering of data (as does the potential benefit of a B*tree index, of course).  When you are considering bitmap indexes, do not be put off by a column which has  a "large" number of different values. If every value appears a "large" number of  times, and if the rows for each value are reasonably clustered, then a bitmap  index may be highly appropriate. In a table with 100M rows, a column with 10,000  different values could easily turn out to be a perfect candidate for a bitmap  index.

     

    Conclusion

    There are several seriously misleading statements commonly made about bitmap  indexes. Some may lead you into avoiding bitmap indexes when they could be very  useful, others may lead you into creating bitmap indexes which are totally  inappropriate.

    Fortunately it is quite hard to make big mistakes with bitmap indexes, but it  is a good idea to have some idea of what Oracle does with them so that you can  make best use of them.

     

    The key facts to remember are:

        • If a B*tree index is not an efficient mechanism for accessing data, it is  unlikely to become more efficient simply because you convert it to a bitmap  index.

        • Bitmap indexes can usually be built quickly, and tend to be surprisingly  small.

        • The size of the bitmap index varies dramatically with the distribution of  the data.

        • Bitmap indexes are typically useful only for queries that can use several  such indexes at once.

        • Updates to bitmapped columns, and general insertion/deletion of data can  cause serious lock contention.

        • Updates to bitmapped columns, and general insertion/deletion of data can  degrade the quality of the indexes quite dramatically.

     

    Remember too, that the optimiser improves with every release of Oracle. The  boundary between the utilisation mechanisms for B*tree and bitmap indexes  becomes increasingly blurred with the evolution of compressed indexes, index  skip scans, and btree to bitmap conversions.

     

    References

    Oracle 9i Release 2 Datawarehousing Guide, Chapter 6.

    --

    Jonathan Lewis is a  freelance consultant with more than 17 years' experience in Oracle. He  specialises in physical database design and the strategic use of the Oracle  database engine, is author of Practical  Oracle 8i - Building Efficient Databases published by Addison-Wesley,  and is one of the best-known speakers on the UK Oracle circuit. Further details  of his published papers, tutorials,  and seminars can be found at www.jlcomp.demon.co.uk, which  also hosts The Co-operative Oracle Users' FAQ for the Oracle-related  Usenet newsgroups.