by Jonathan Lewis
The question in the title of this piece is probably the single most frequently occurring question that appears in the Metalink forums and Usenet newsgroups. This article uses a test case that you can rebuild on your own systems to demonstrate the most fundamental issues with how cost-based optimisation works. And at the end of the article, you should be much better equipped to give an answer the next time you hear that dreaded question.
Because of the wide variety of options that are available when installing Oracle, it isn't usually safe to predict exactly what will happen when someone runs a script that you have dictated to them. But I'm going to risk it, in the hope that your database is a fairly vanilla installation, with the default values for the mostly commonly tweaked parameters. The example has been built and tested on an 8.1.7 database with the db_block_size set to the commonly used value of 8K and the db_file_multiblock_read_count set to the equally commonly used value 8. The results may be a little different under Oracle 9.2
Run the script from Figure 1, which creates a couple of tables, then indexes and analyses them.
create table t1 as
Figure 1: The test data sets.
Once you have got this data in place, you might want to convince yourself that the two sets of data are identical — in particular, that the N1 columns in both data sets have values ranging from 0 to 199, with 15 occurrences of each value. You might try the following check:
select n1, count(*)
group by n1;
and the matching query against T2 to prove the point.
If you then execute the queries:
select * from t1 where n1 = 45;
select * from t2 where n1 = 45;
You will find that each query returns 15 rows. However if you
set autotrace trace only explain
you will discover that the two queries have different execution paths.
The query against table T1 uses the index, but the query against table T2 does a full tablescan.
So you have two sets of identical data, with dramatically different access paths for the same query.
What Happened to the Index?
Note: if you've ever come across any of those "magic number" guidelines regarding the use of indexes, e.g., "Oracle will use an index for less than 23 percent, 10 percent, 2 percent (pick number at random) of the data," then you may at this stage begin to doubt their validity. In this example, Oracle has used a tablescan for 15 rows out of 3,000, i.e., for just one half of one percent of the data!
To investigate problems like this, there is one very simple ploy that I always try as the first step: Put in some hints to make Oracle do what I think it ought to be doing, and see if that gives me any clues.
In this case, a simple hint:
/*+ index(t2, t2_i1) */
is sufficient to switch Oracle from the full tablescan to the indexed access path. The three paths with costs (abbreviated to C=nnn) are shown in Figure 2:
select * from t1 where n1 = 45;
Figure 2: The different queries and their costs.
So why hasn't Oracle used the index by default in for the T2 query? Easy — as the execution plan shows, the cost of doing the tablescan is cheaper than the cost of using the index.
Why is the Tablescan Cheaper?
This, of course, is simply begging the question. Why is the cost of the tablescan cheaper than the cost of using the index?
By looking into this question, you uncover the key mechanisms (and critically erroneous assumptions) of the Cost Based Optimiser.
Let's start by examining the indexes by running the query:
The results are given in the table below:
Data block / key
Leaf block / key
Note particularly the value for "data blocks per key." This is the number of different blocks in the table that Oracle thinks it will have to visit if you execute a query that contains an equality test on a complete key value for this index.
So where do the costs for our queries come from? As far as Oracle is concerned, if we fire in the key value 45, we get the data from table T1 by hitting one index leaf block and one table block — two blocks, so a cost of two.
If we try the same with table T2, we have to hit one index leaf block and 15 table blocks — a total of 16 blocks, so a cost of 16.
Clearly, according to this viewpoint, the index on table T1 is much more desirable than the index on table T2. This leaves two questions outstanding, though:
Where does the tablescan cost come from, and why are the figures for the avg_data_blocks_per_key so different between the two tables?
The answer to the second question is simple. Look back at the definition of table T1 — it uses the trunc() function to generate the N1 values, dividing the "rownum - 1 "by 15 and truncating.
Trunc(675/15) = 45
Trunc(676/15) = 45
Trunc(689/15) = 45
All the rows with the value 45 do actually appear one after the other in a tight little clump (probably all fitting one data block) in the table.
Table T2 uses the mod() function to generate the N1 values, using modulus 200 on the rownum:
mod(45,200) = 45
mod(245,200) = 45
mod(2845,200) = 45
The rows with the value 45 appear every two hundredth position in the table (probably resulting in no more than one row in every relevant block).
By doing the analyze, Oracle was able to get a perfect description of the data scatter in our table. So the optimiser was able to work out exactly how many blocks Oracle would have to visit to answer our query — and, in simple cases, the number of block visits is the cost of the query.
But Why the Tablescan?
So we see that an indexed access into T2 is more expensive than the same path into T1, but why has Oracle switched to the tablescan?
This brings us to the two simple-minded, and rather inappropriate, assumptions that Oracle makes.
The first is that every block acquisition equates to a physical disk read, and the second is that a multiblock read is just as quick as a single block read.
So what impact do these assumptions have on our experiment?
If you query the user_tables view with the following SQL:
you will find that our two tables each cover 96 blocks.
At the start of the article, I pointed out that the test case was running a version 8 system with the value 8 for the db_file_multiblock_read_count.
Roughly speaking, Oracle has decided that it can read the entire 96 block table in 96/8 = 12 disk read requests.
Since it takes 16 block (= disk read) requests to access the table by index, it is clearer quicker (from Oracle's sadly deluded perspective) to scan the table — after all 12 is less than 16.
Voila! If the data you are targetting is suitably scattered across the table, you get tablescans even for a very small percentage of the data — a problem that can be exaggerated in the case of very big blocks and very small rows.
In fact, you will have noticed that my calculated number of scan reads was 12, whilst the cost reported in the execution plan was 15. It is a slight simplfication to say that the cost of a tablescan (or an index fast full scan for that matter) is
'number of blocks' /
Oracle uses an "adjusted" multi-block read value for the calculation (although it then tries to use the actual requested size when the scan starts to run).
For reference, the following table compares a few of the actual and adjusted values:
As you can see, Oracle makes some attempt to protect you from the error of supplying an unfeasibly large value for this parameter.
There is a minor change in version 9, by the way, where the tablescan cost is further adjusted by adding one to result of the division — which means tablescans in V9 are generally just a little more expensive than in V8, so indexes are just a little more likely to be used.
We have seen that there are two assumptions built into the optimizer that are not very sensible.
A single block read costs just as much as a multi-block read — (not really likely, particularly when running on file systems without direction)
A block access will be a physical disk read — (so what is the buffer cache for?)
Since the early days of Oracle 8.1, there have been a couple of parameters that allow us to correct these assumption in a reasonably truthful way.
See Tim Gorman's article for a proper description of these parameters, but briefly:
Optimizer_index_cost_adj takes a value between 1 and 10000 with a default of 100. Effectively, this parameter describes how cheap a single block read is compared to a multiblock read. For example the value 30 (which is often a suitable first guess for an OLTP system) would tell Oracle that a single block read costs 30% of a multiblock read. Oracle would therefore incline towards using indexed access paths for low values of this parameter.
Optimizer_index_caching takes a value between 0 and 100 with a default of 0. This tells Oracle to assume that that percentage of index blocks will be found in the buffer cache. In this case, setting values close to 100 encourages the use of indexes over tablescans.
The really nice thing about both these parameters is that they can be set to "truthful" values.
Set the optimizer_index_caching to something in the region of the "buffer cache hit ratio." (You have to make your own choice about whether this should be the figure derived from the default pool, keep pool, or both).
The optimizer_index_cost_adj is a little more complicated. Check the typical wait times in v$system_event for the events "db file scattered read" (multi block reads) and "db file sequential reads" (single block reads). Divide the latter by the former and multiply by one hundred.
Don't forget that the two parameters may need to be adjusted at different times of the day and week to reflect the end-user workload. You can't just derive one pair of figures, and use them for ever.
Happily, in Oracle 9, things have improved. You can now collect system statistics, which are originally included just the four:
Average single block read time
Average multi block read time
Average actual multiblock read
Notional usable CPU speed.
Suffice it to say that this feature is worth an article in its own right — but do note that the first three allow Oracle to discover the truth about the cost of multi block reads. And in fact, the CPU speed allows Oracle to work out the CPU cost of unsuitable access mechanisms like reading every single row in a block to find a specific data value and behave accordingly.
When you migrate to version 9, one of the first things you should investigate is the correct use of system statistics. This one feature alone may reduce the amount of time you spend trying to "tune" awkward SQL.
In passing, despite the wonderful effect of system statistics both of the optimizer adjusting parameters still apply — although the exact formula for their use seems to have changed between version 8 and version 9.
Variations on a Theme
Of course, I have picked one very special case — equality on a single column non-unique index, where thare are no nulls in the table — and treated it very simply. (I haven't even mentioned the relevance of the index blevel and clustering_factor yet.) There are numerous different strategies that Oracle uses to work out more general cases.
Consider some of the cases I have conveniently overlooked:
Part-used multi-column indexes
Non-unique indexes representing unique constraints
Index skip scans
Index only queries
Effects of nulls
The list goes on and on. There is no one simple formula that tells you how Oracle works out a cost — there is only a general guideline that gives you the flavour of the approach and a list of different formulae that apply in different cases.
However, the purpose of this article was to make you aware of the general approach and the two assumptions built into the optimiser's strategy. And I hope that this may be enough to take you a long way down the path of understanding the (apparently) strange things that the optimiser has been known to do.
Tim Gorman: www.evdbt.com. "The Search for Intelligent Life in the Cost Based Optimiser."
Wolfgang Breitling: www.centrexcc.com. "Looking under the hood of the CBO."
Jonathan Lewis is a freelance consultant with more than 17 years experience in Oracle. He specialises in physical database design and the strategic use of the Oracle database engine, is author of Practical Oracle 8i - Building Efficient Databases published by Addison-Wesley, and is one of the best-known speakers on the UK Oracle circuit. Further details of his published papers, tutorials, and seminars can be found at www.jlcomp.demon.co.uk, which also hosts The Co-operative Oracle Users' FAQ for the Oracle-related Usenet newsgroups.