As the query optimizer compares the various possible access choices,
it must assign a numeric cost value to each candidate and use that value to
determine the implementation which consumes the least amount of processing
time. This costing value is a combination of CPU and I/O time
In step 3 and in step 5 in Join order optimization,
the query optimizer has to estimate a cost and choose an access method for
a given dial combination. The choices made are similar to those for row selection
except that a plan using a probe must be chosen.
The costing value is based on the following assumptions:
- Table pages and index pages must be retrieved from auxiliary storage.
For example, the query optimizer is not aware that an entire table may be
loaded into active memory as the result of a Set Object
Access (SETOBJACC) CL command. Usage of this command may significantly
improve the performance of a query, but the query optimizer does not change
the query implementation to take advantage of the memory resident state of
the table.
- The query is the only process running on the server. No allowance is given
for server CPU utilization or I/O waits which occur because of other processes
using the same resources. CPU related costs are scaled to the relative processing
speed of the server running the query.
- The values in a column are uniformly distributed across the table. For
example, if 10% of the rows in a table have the same value, then it is assumed
that every tenth row in the table contains that value.
- The values in a column are independent from the values in any other columns
in a row, unless there is an index available whose key definition is (A,B).
Multi key field indexes allows the optimizer to detect when the values between
columns are correlated. For example, if a column named A has
a value of 1 in 50% of the rows in a table and a column named B has
a value of 2 in 50% of the rows, then it is expected that
a query which selects rows where A = 1, and B = 2 selects
25% of the rows in the table.
The main factors of the join cost calculations for secondary dials
are the number of rows selected in all previous dials and the number of rows
which match, on average, each of the rows selected from previous dials. Both
of these factors can be derived by estimating the number of matching rows
for a given dial.
When the join operator is something other than equal, the expected
number of matching rows is based on the following default filter factors:
- 33% for less-than, greater-than, less-than-equal-to, or greater-than-equal-to
- 90% for not equal
- 25% for BETWEEN range (OPNQRYF %RANGE)
- 10% for each IN list value (OPNQRYF %VALUES)
For example, when the join operator is less-than, the expected
number of matching rows is .33 * (number of rows in the dial). If no join
specifications are active for the current dial, the cartesian product is assumed
to be the operator. For cartesian products, the number of matching rows is
every row in the dial, unless local row selection can be applied to the index.
When the join operator is equal, the expected number
of rows is the average number of duplicate rows for a given value.