10G Optimizer Statistics Gathering

Version 1

    10G Grid Control provides  an optimizer statistics administration panel that allows administrators  to gather, restore, delete, lock and unlock object statistics.  Before  we learn how to use 10G Grid Control R2’s object statistics  administrations panels, let's take a quick look at object statistics and  find out why they are so important to effective query optimization.


    Part 1 of this three part series will provide a high-level overview of  optimizer statistics. Part 2 will focus on maintenance windows that  control when automatic statistics gathering jobs run. In part 3, we’ll  learn how to use 10G Grid Control R2 to ensure that the optimizer has  the correct statistical information it needs to generate optimal access  paths.


    This  blog in not intended to be an all encompassing primer on the Oracle  optimizer. That goal  is far beyond the scope of a single blog. There are a few good books on  the subject.  If you want to learn how the optimizer works, I highly suggest that you  read Jonathan  Lewis's book titled Cost-Based Oracle Fundamentals. It is one of the most educational  and informative  books I have ever read on the cost-based optimizer.


    So little  is known    about the inner-workings of the cost based optimizer that is often  called the    Oracle database's "little black box." The optimizer's job is to  analyze    the statement being executed and determine the most efficient access  path to    the data for that statement. Craig Mullins provides an excellent  description    of a database optimizer in The DB2 Developer's Guide "the optimizer is equivalent to  an expert    system. An expert system is a standard set of rules when combined with  situational    data can return an expert opinion."


    The cost  based    optimizer uses statistics generated by the DBMS_STATS procedure as the  situational    data when creating its expert opinion on which access path to the data  is most    optimal. These statistics are stored in the data dictionary and  describe the data object's space characteristics, data uniqueness and  data distribution.


    The  cost-based    optimizer is only as good as the statistics it uses as input.  Statistics collections    should be run on a regular basis to ensure that the statistics are  current (representative    of the data being accessed). The optimizer is then able to create a  highly accurate    access path that is based on the least cost. If statistics are not  available,    the optimizer uses an algorithm to calculate the statistics, which  often    leads to "less than optimal" access paths. In other words, Oracle    guesses and it is usually not a very good guesser.


    You can  start your    education on the different access paths that are available to the  optimizer    by reading Oracle's Database Performance Tuning Guide that is provided  in each    Oracle     release's documentation. Before you buy third-party books  on any topic,    I highly suggest that you read Oracle's documentation first.


    CPU  and I/O    Performance Statistics

    Before we begin our discussion on object statistics, let's spend a few  minutes    discussing the system performance statistics that are used as input to  the optimizer.    Don't confuse system performance statistics with database performance  statistics    that are accessed using the V$ tables. We are discussing the system's  CPU and    I/O performance and utilization.

    In Oracle  10G, the optimizer takes CPU and I/O costs into consideration during  query optimization.    If the optimizer understands what the performance  characteristics are    for I/O and CPU, it will be more effective at query optimization.    Current system statistics allow the optimizer to make more intelligent  decisions    during the optimization process, which ultimately results in higher  quality    access paths being generated.


    Unlike  object statistics,    administrators must continue to gather system statistics manually in  Oracle    10G. This is done by executing the DBMS_STATS.GATHER_SYSTEM_STATS  procedure.    The key to gathering good system statistics is to run the procedure  when a normal    workload is being executed. Running it when the workload is light, or  too heavy,    can result in poor access paths being generated.


    If you  would like to learn more about how system statistics affect the Oracle  optimizer, please refer to Jonathan Lewis's excellent article titled  Understanding System Statistics on Oracle Technet.  Please  pay special attention to Lewis's information pertaining to each release  of the database.


    Object  Statistics

    As we learned previously, object statistics describe the contents of  the data and how it is stored in    the physcal database objects. Before we continue, it may be beneficial  for me    to provide you with a few examples of how these statistics could  affect query performance.


    If a  table only    contains a few hundred rows, queries may run faster if the optimizer  chooses    to read all of the blocks in the table as opposed to using an index.  The I/O    generated traversing the index blocks to get to the table row entries  would    be higher than if Oracle read just the blocks allocated to the table  being accessed.


    What if  we access    a table using a column with poor cardinality? Cardinality describes  the uniqueness    of a particular value stored in a column. Poor cardinality means that  the column    has many values that are the same. If our statement contains a WHERE  clause    that searches for a column value that is contained in 90% of the  table's rows,    it is best that Oracle, once again, read each and every row in that  table.


    Conversely  if a    WHERE clause searches for a column value that appears in 1% of the  table rows,    it would be beneficial for the optimizer to choose an index. The  optimizer also    uses object statistics to determine which join method should be used  and the    order in which the joined tables are accessed.


    Here are  just a    few of the object statistcs that the optimizer uses as input during  access path generation:

    • Number  of rows in the tables being accessed
    • Number  of blocks allocated to the table
    • Average  table row length
    • Number  of distinct values for a column
    • Number  of nulls in a column
    • Number  of leaf blocks allocated to indexes
    • Number  of leveles in the B-Tree
    • Clustering  factor - how the order of the index matches the order of the table rows

    As I  stated previously, the intent of this blog is to not be an all-inclusive primer on the  optimizer, but we should now understand why current object statistics are so  important    to the generation of efficient query access paths.


    Gathering  Object    Statistics

    Oracle provides the DBMS_STATS package to allow administrators to  gather, modify,    view, export, import, and delete statistics. The DBMS_STATS package  gathers    statistics on columns, tables, indexes and partitions and stores the  results    in Oracle's data dictionary. The optimizer reads the object statistics  stored    in the data dictionary durinq query optimization.


    After  DBMS_STATS is executed, current statements in the shared pool that  access the newly analyzed tables are invalidated. This    forces the statement to be re-parsed which will allow the optimizer to  generate    access paths based on the new statistics.


    There are  many    different options available in DBMS_STATS. Administrators are able to  select    which objects to analyze, how much of each object to analyze (large  tables may    take a loooonng time to analyze), select the number of parallel  processes that    will perform the analyze, etc. To learn more about all of the options  available,    please refer to Oracle's Database Performance Tuning Guide and PL/SQL  Packages    and Types reference manuals that are provided in each Oracle release's  documentation.


    10G  Automatic    Statistics Generation

    In releases prior to Oracle10G, administrators were required to  schedule DBMS_STATS    jobs on a regular basis to ensure that valid statistics were available  to the    optimizer. In releases prior to Oracle9i, it was necessary to "guess"  how much of the    data changed to determine if statistics collection was necessary.


    In  Oracle9i, the    GATHER AUTO option of the DBMS_STATS procedure could be used to help  determine    if statistics generation was required. If more than 10% of the rows  changed    in the table since the last analyze was performed, the DBMS_STATS  procedure    (with the GATHER_AUTO option activated) analyzed the table.


    By  default, Oracle10G    automates these tasks by evaluating the statistics for all of the  tables in    the database and running analyze when required. Oracle10G's default  maintenance    window is nightly from 10 PM to 6 AM and all day on weekends. During  these time    periods, statistics are automatically collected using the  GATHER_STATS_JOB procedure. The maintenance window time-periods can be adjusted to tailor them to  each individual    application's business processing requirements.

    In Oracle  Database    10g, Automatic Optimizer Statistics Collection reduces the likelihood  of poorly    performing SQL statements due to stale or invalid statistics and  enhances SQL    execution performance by providing optimal input to the query  optimizer.


    To ensure  that    10G is automatically gathering statistics for data objects, you'll  need to verify that: The  statistics job is running by executing the following SQL statement:



      • The  modification      monitoring feature that identifies stale statistics is enabled. This  feature      is enabled when the STATISTICS_LEVEL parameter is set to TYPICAL  (default)      or ALL.

      Gathering  Statistics    Manually in Oracle10G

      Now that I have described how wonderful 10G's automatic statistics  gathering    feature is, why would you ever want to schedule a statistics job  manually? Let    me describe one real-life example from a recent project here at Giant  Eagle.

      We  previously learned    that automatic object statistics gathering runs during the database's  maintenance    window. We have a warehouse system that contains an extremely volatile  table.    Hundreds of thousands of rows are added and removed around the clock.  The table    is loaded and literally hundreds of SQL statements are run to allow  our marketing    personnel to make critical business decisions.


      The level  of activity    is based on the particular business process taking place. At one  particular    time, the table can contain hundreds of thousands of rows and at other  times    it can contain dozens. OK, so when do you generate statistics? If you  run DBMS_STATS    at the same time each day, in most cases you would think you would get  a predictable    set of statistics generated. Not so in this table's case, sometimes  you get    hundreds of rows and some times you get hundreds of thousands.

      If we are  unlucky    and generate statistics on the table when it has hundreds of rows,  access paths    suffer when the table grows to hundreds of thousands. To prevent poor  access paths from being generated (a big problem in a data warehouse),  we run the DBMS_STATS utility immediately after the table is loaded. The  optimizer has current statistics    to use as input and generates optimal access paths for our critical  business    queries.


      10G  Grid Control R2 Manage    Optimizer Statistics Panel

      A new link titled 'Manage Optimizer' statistics has been added to the  database    administration home page in Enterprise Manager. Clicking on the link  displays    the new Manage Optimizer Statistics home page. The new home page  provides wizards    and templates that facilitate and automate optimizer statistics  management.    In 10GR2 administrators are able to use Enterprise Manager to gather,  restore,    delete, lock and unlock statistics. We'll learn how to use this panel  in my    next blog.


      Thanks  for reading,
      Chris Foot