Monday, June 16, 2008

DB2 LUW Hints - Optimizer selection

Many Oracle DBA's, myself included, are familiar and may even use optimizer hints quite frequently. In a DB2 LUW environment, it is recommended you try other approaches to influence the optimizer before implementing hints. In DB2 LUW terminology, a hint is an optimizer profile.

DB2 relies on a cost-based optimizer to choose the optimal access path. There are many aspects and factors that influence or affect the optimizer decision. System configuration, such as buffer pool sizes, sortheaps, cpu parallelism, i/o characteristics - as well as, indexes and constraints to name just a few. Catalog statistics play a major role in influencing the optimizer, so it's no surprise that the lack of statistics can be detrimental.

So here are a few things to examine before going down the path of hints or optimizer profiles.

I would examine both the type and level of runstats being executed and review the registry variables that may affect optimization.

There are several flavors of statistics that DB2 uses in optimization selection. There are:
basic statistics : number of rows, pages, and active blocks in tables, number of data values and length of data values for a column, data range information.
non-uniform statistics : most frequently used value, quantiles ; used in equality predicates, used in range predicates.
column group statistics : number of distinct values in a group of columns ; important for correlation.
detailed index statistics: clustering details ; used to estimate data fetch pages and model I/O vs buffer sizes.
user-defined statistics: UDF statistics to specify I/O and CPU costs.

These statistics are stored in the system catalog and can be updated if necessary using the db2look utility. One reason you may want to update statistics is to mirror or replicate production statistics in a non-production environment. The db2look utility can easily do this using the -m (mirror) option to extract statistics. This article covers the use of that utility in more detail :

http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0508kapoor/


So, statistics and the type or level of statistics play a major role in optimization. Thus, the lack of statistics can often be detrimental to an application.

One reason DB2 sometimes does not choose the most optimal path is simply we have not supplied DB2 with enough information, or level of statistics, for DB2 to make the appropriate optimization selection.

There are some newer options available on the runstats utility to improve performance by providing more specific information. Let's take a look at one I've found to be helpful.

Column group statistics. Column group statistics provide information used to detect correlation.
Without this information DB2 may underestimate the join cardinality. Column group statistics can be particularly useful in equi-join predicates. For example:

where tablea.col1 = tableb.col1 and tablea.col2 = tableb.col2

An example runstat might be:

Runstats on usera.tablea
on all columns <--- gathers basic column information

and columns (( col1, col2) <-- gathers column group information.

Column group information is stored in SYSCOLGROUPS and SYSCOLGROUPSCOLS.

Now let's take a look at some of the registry variables that can affect optimization. You can find detailed information on these registry variables at the information center.

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0005664.htm

The query optimization configuration parameter (DFT_QUERYOPT), as well as, several registry variables affect the access plan chosen. DB2_REDUCED_OPTIMIZATION=UNIQUEINDEX is a registry variable that will be default starting in V9.5. It instructs the optimizer to use the unique index that is fully qualified to access a table.


So, you feel you have exhausted all other options and are wondering how to implement a hint.

1) Set the registry variable DB2_OPTPROFILE=YES (this is needed in V8.2 and is default in V9)
2) Create the SYSTOOLS.OPT_PROFILE table to house the profile
3) Import the profile into the table. For example:
IMPORT FROM file1.del OF DEL MODIFIED BY LOBSINFILE INSERT INTO SYSTOOLS.OPT_PROFILE;
4) Test your troubled SQL and ensure it is know taking the access path you would like it to traverse. .PROF1 is the name of the xml you imported from file1.del above.

db2 connect to ;
db2 set current optimization profile=".PROF1"
db2 set current explain mode explain;
db2 -tvf query.txt

There are a couple good places to get detailed information about optimization profiles and statistical views. If you need more detail I'd suggest the following:

http://www.ibm.com/developerworks/db2/library/techarticle/dm-0612chen/index.html

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/t0024532.htm

Optimization profiles can be used in 8.2 and above. If you are still using DB2 LUW 8.1 where the optimization profile is not available, you may want to investigate the use of the "SELECTIVITY" clause. You can find more information about using selectivity clauses on SQL statements at the following developer works article:

http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/index.html