Why Isn't Oracle Using My Index?!

by Jonathan Lewis

 

The question in the title of this piece is probably the single most  frequently occurring question that appears in the Metalink forums and Usenet  newsgroups. This article uses a test case that you can rebuild on your own  systems to demonstrate the most fundamental issues with how cost-based  optimisation works. And at the end of the article, you should be much better  equipped to give an answer the next time you hear that dreaded question.

 

Because of the wide variety of options that are available when installing  Oracle, it isn't usually safe to predict exactly what will happen when someone  runs a script that you have dictated to them. But I'm going to risk it, in the  hope that your database is a fairly vanilla installation, with the default  values for the mostly commonly tweaked parameters. The example has been built  and tested on an 8.1.7 database with the db_block_size  set to the commonly used value of 8K and the  db_file_multiblock_read_count set to the equally  commonly used value 8. The results may be a little different under Oracle  9.2

Run the script from Figure 1, which creates a couple of tables, then indexes  and analyses them.

 

create table t1 as
select  
trunc((rownum-1)/15)  n1,
trunc((rownum-1)/15)       n2,            
rpad('x', 215)        v1
from all_objects<
where rownum <= 3000;

create table t2 as
select
mod(rownum,200) n1,
mod(rownum,200) n2,
rpad('x',215) v1
from all_objects
where rownum <= 3000;

create index t1_i1 on t1(N1);
create index t2_i1 on t2(n1);

analyze table t1 compute
statistics;
analyze table t2 compute
statistics;

Figure 1: The test data sets.

 

Once you have got this data in place, you might want to convince yourself  that the two sets of data are identical — in particular, that the  N1 columns in both data sets have values ranging from  0 to 199, with 15 occurrences of each value. You might try the following  check:

select n1, count(*)
from t1
group by n1;

 

and the matching query against T2 to prove the  point.

If you then execute the queries:

select * from t1 where n1 = 45;
select * from t2 where n1 = 45;

 

You will find that each query returns 15 rows. However if you

set autotrace trace only explain

 

you will discover that the two queries have different execution paths.

 

The query against table T1 uses the index, but the  query against table T2 does a full tablescan.

 

So you have two sets of identical data, with dramatically different access  paths for the same query.

What Happened to the Index?

Note: if you've ever come across any of those "magic number" guidelines  regarding the use of indexes, e.g., "Oracle will use an index for less than 23  percent, 10 percent, 2 percent (pick number at random) of the data," then you  may at this stage begin to doubt their validity. In this example, Oracle has  used a tablescan for 15 rows out of 3,000, i.e., for just one half of one  percent of the data!

 

To investigate problems like this, there is one very simple ploy that I  always try as the first step: Put in some hints to make Oracle do what I think  it ought to be doing, and see if that gives me any clues.

In this case, a simple hint:

/*+ index(t2, t2_i1) */

 

is sufficient to switch Oracle from the full tablescan to the indexed access  path. The three paths with costs (abbreviated to C=nnn) are shown in Figure  2:


select * from t1 where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS BY INDEX ROWID OF T1 (C=2)
INDEX(RANGE SCAN) OF T1_I1 (C=1)


select * from t2 where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS FULL OF T2 (C=15)


select /*+ index(t2 t2_i1) */
*
from t1
where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS BY INDEX ROWID OF T2 (C=16) 
INDEX(RANGE SCAN) OF T2_I1 (C=1)

Figure 2: The different queries and their  costs.

 

So why hasn't Oracle used the index by default in for the T2 query? Easy — as  the execution plan shows, the cost of doing the tablescan is cheaper than the  cost of using the index.

Why is the Tablescan Cheaper?

This, of course, is simply begging the question. Why is the cost of the tablescan cheaper than the cost  of using the index?

 

By looking into this question, you uncover the key mechanisms (and critically  erroneous assumptions) of the Cost Based Optimiser.

Let's start by examining the indexes by running the query:

select
table_name,
blevel,
avg_data_blocks_per_key,
avg_leaf_blocks_per_key,
clustering_factor
from     user_indexes;

The results are given in the table below:

T1

T2

Blevel

1

1

Data block / key

1

15

Leaf block / key

1

1

Clustering factor

96

3000

 

Note particularly the value for "data blocks per key." This is the number of  different blocks in the table that Oracle thinks it will have to visit if you  execute a query that contains an equality test on a complete key value for this  index.

 

So where do the costs for our queries come from? As far as Oracle is  concerned, if we fire in the key value 45, we get the data from table  T1 by hitting one index leaf block and one table  block — two blocks, so a cost of two.

 

If we try the same with table T2, we have to hit  one index leaf block and 15 table blocks — a total of 16 blocks, so a cost of  16.

Clearly, according to this viewpoint, the index on table T1 is much more desirable than the index on table T2. This leaves two questions outstanding, though:

 

Where does the tablescan cost come from, and why are the figures for the avg_data_blocks_per_key so different between the two  tables?

The answer to the second question is simple. Look back at the definition of  table T1 — it uses the  trunc() function to generate the N1 values, dividing the "rownum - 1 "by 15 and  truncating.

Trunc(675/15) = 45
Trunc(676/15) = 45

Trunc(689/15) = 45

 

All the rows with the value 45 do actually appear one after the other in a  tight little clump (probably all fitting one data block) in the table.

Table T2 uses the mod() function to generate the N1 values, using modulus 200  on the rownum:

mod(45,200) = 45
mod(245,200) = 45

mod(2845,200) = 45

 

The rows with the value 45 appear every two hundredth position in the table  (probably resulting in no more than one row in every relevant block).

 

By doing the analyze, Oracle was able to get a  perfect description of the data scatter in our table. So the optimiser was able  to work out exactly how many blocks Oracle would have to visit to answer our  query — and, in simple cases, the number of block visits is the cost of the  query.

But Why the Tablescan?

So we see that an indexed access into T2 is more expensive than the same path  into T1, but why has Oracle switched to the tablescan?

 

This brings us to the two simple-minded, and rather inappropriate,  assumptions that Oracle makes.

 

The first is that every block acquisition equates to a physical disk read,  and the second is that a multiblock read is just as quick as a single block  read.

 

So what impact do these assumptions have on our experiment?

 

If you query the user_tables view with the  following SQL:

select   
table_name,
blocks
from user_tables;

 

you will find that our two tables each cover 96 blocks.

 

At the start of the article, I pointed out that the test case was running a  version 8 system with the value 8 for the  db_file_multiblock_read_count.

 

Roughly speaking, Oracle has decided that it can read the entire 96 block  table in 96/8 = 12 disk read requests.

 

Since it takes 16 block (= disk read) requests to access the table by index,  it is clearer quicker (from Oracle's sadly deluded perspective) to scan the  table — after all 12 is less than 16.

 

Voila! If the data you are targetting is suitably scattered across  the table, you get tablescans even for a very small percentage of the data — a  problem that can be exaggerated in the case of very big blocks and very small  rows.

Correction

In fact, you will have noticed that my calculated number of scan reads was  12, whilst the cost reported in the execution plan was 15. It is a slight  simplfication to say that the cost of a tablescan (or an index fast full scan  for that matter) is

'number of blocks' /
db_file_multiblock_read_count.

 

Oracle uses an "adjusted" multi-block read value for the calculation  (although it then tries to use the actual requested size when the scan starts to  run).

 

For reference, the following table compares a few of the actual and adjusted  values:

Actual

Adjusted

4

4.175

8

6.589

16

10.398

32

16.409

64

25.895

128

40.865

 

As you can see, Oracle makes some attempt to protect you from the error of  supplying an unfeasibly large value for this parameter.

 

There is a minor change in version 9, by the way, where the tablescan cost is  further adjusted by adding one to result of the division — which means  tablescans in V9 are generally just a little more expensive than in V8, so  indexes are just a little more likely to be used.

Adjustments

We have seen that there are two assumptions built into the optimizer that are  not very sensible.

  • A single block read costs just as much as a multi-block read — (not really  likely, particularly when running on file systems without direction)

  • A block access will be a physical disk read — (so what is the buffer cache  for?)

Since the early days of Oracle 8.1, there have been a couple of parameters  that allow us to correct these assumption in a reasonably truthful way.

 

See Tim Gorman's article for a proper description of these parameters, but  briefly:

 

Optimizer_index_cost_adj takes a value between 1  and 10000 with a default of 100. Effectively, this parameter describes how cheap  a single block read is compared to a multiblock read. For example the value 30  (which is often a suitable first guess for an OLTP system) would tell Oracle  that a single block read costs 30% of a multiblock read. Oracle would therefore  incline towards using indexed access paths for low values of this parameter.

 

Optimizer_index_caching takes a value between 0 and  100 with a default of 0. This tells Oracle to assume that that percentage of  index blocks will be found in the buffer cache. In this case, setting values  close to 100 encourages the use of indexes over tablescans.

 

The really nice thing about both these parameters is that they can be set to  "truthful" values.

 

Set the optimizer_index_caching to something in the  region of the "buffer cache hit ratio." (You have to make your own choice about  whether this should be the figure derived from the default  pool, keep pool, or both).

 

The optimizer_index_cost_adj is a little more  complicated. Check the typical wait times in  v$system_event for the events "db file scattered read"  (multi block reads) and "db file sequential reads" (single block reads). Divide  the latter by the former and multiply by one hundred.

Improvements

Don't forget that the two parameters may need to be adjusted at different  times of the day and week to reflect the end-user workload. You can't just  derive one pair of figures, and use them for ever.

 

Happily, in Oracle 9, things have improved. You can now collect system  statistics, which are originally included just the four:

      • Average single block read time

      • Average multi block read time

      • Average actual multiblock read

      • Notional usable CPU speed.

 

Suffice it to say that this feature is worth an article in its own right —  but do note that the first three allow Oracle to discover the truth about the  cost of multi block reads. And in fact, the CPU speed allows Oracle to work out  the CPU cost of unsuitable access mechanisms like reading every single row in a  block to find a specific data value and behave accordingly.

 

When you migrate to version 9, one of the first things you should investigate  is the correct use of system statistics. This one feature alone may reduce the  amount of time you spend trying to "tune" awkward SQL.

 

In passing, despite the wonderful effect of system statistics both of the  optimizer adjusting parameters still apply — although the exact formula for  their use seems to have changed between version 8 and version 9.

Variations on a Theme

Of course, I have picked one very special case — equality on a single column  non-unique index, where thare are no nulls in the table — and treated it very  simply. (I haven't even mentioned the relevance of the index  blevel and clustering_factor  yet.) There are numerous different strategies that Oracle uses to work out more  general cases.

 

Consider some of the cases I have conveniently overlooked:

      • Multi-column indexes

      • Part-used multi-column indexes

      • Range scans

      • Unique indexes

      • Non-unique indexes representing unique constraints

      • Index skip scans

      • Index only queries

      • Bitmap indexes

      • Effects of nulls

The list goes on and on. There is no one simple formula that tells you how  Oracle works out a cost — there is only a general guideline that gives you the  flavour of the approach and a list of different formulae that apply in different  cases.

 

However, the purpose of this article was to make you aware of the general  approach and the two assumptions built into the optimiser's strategy. And I hope  that this may be enough to take you a long way down the path of understanding  the (apparently) strange things that the optimiser has been known to do.

Further Reading

Tim Gorman: www.evdbt.com. "The Search for  Intelligent Life in the Cost Based Optimiser."

Wolfgang Breitling: www.centrexcc.com. "Looking under  the hood of the CBO."

--

Jonathan  Lewis is a freelance consultant with more than 17 years  experience in Oracle. He specialises in physical database design and the  strategic use of the Oracle database engine, is author of Practical  Oracle 8i - Building Efficient Databases published by  Addison-Wesley, and is one of the best-known speakers on the UK Oracle circuit.  Further details of his published papers, tutorials, and  seminars can be  found at www.jlcomp.demon.co.uk,  which also hosts The Co-operative Oracle Users' FAQ for the  Oracle-related Usenet newsgroups.