DB2 Universal Databaseā¢ for iSeriesā¢ provides
a nested loop join method. For this method, the processing of the tables
in the join are ordered. This order is called the join order. The first
table in the final join order is called the primary table. The other
tables are called secondary tables. Each join table position is called
a dial.
The nested loop will be implemented either using an index on secondary
tables, a hash table, or a table scan (arrival sequence) on the secondary
tables. In general, the join will be implemented using either an index or
a hash table.
Index nested loop join implementation
During the
join, DB2 Universal Database for iSeries:
- Accesses the first primary table row selected by the predicates
local to the primary table.
- Builds a key value from the join columns in the primary table.
- Depending on the access to the first secondary table:
- If using an index to access the secondary table, Radix Index Probe is
used to locate the first row that satisfies the join condition for the first
secondary table by using an index with keys matching the join condition or
local row selection columns of the secondary table.
- Applies bitmap selection, if applicable.
All rows that satisfy the
join condition from each secondary dial are located using an index. Rows are
retrieved from secondary tables in random sequence. This random disk I/O time
often accounts for a large percentage of the processing time of the query.
Since a given secondary dial is searched once for each row selected from the
primary and the preceding secondary dials that satisfy the join condition
for each of the preceding secondary dials, a large number of searches may
be performed against the later dials. Any inefficiencies in the processing
of the later dials can significantly inflate the query processing time. This
is the reason why attention to performance considerations for join queries
can reduce the run-time of a join query from hours to minutes.
If an
efficient index cannot be found, a temporary index may be created. Some join
queries build temporary indexes over secondary dials even when an index exists
for all of the join keys. Because efficiency is very important for secondary
dials of longer running queries, the query optimizer may choose to build a
temporary index which contains only entries which pass the local row selection
for that dial. This preprocessing of row selection allows the database manager
to process row selection in one pass instead of each time rows are matched
for a dial.
- If using a Hash Table Probe to access the secondary table, a hash temporary
result table is created that contains all of the rows selected by local selection
against the table on the first probe. The structure of the hash table is such
that rows with the same join value are loaded into the same hash table partition
(clustered). The location of the rows for any given join value can be found
by applying a hashing function to the join value.
A nested loop join using
a Hash Table Probe has several advantages over a nested loop join using an
Index Probe:
- The structure of a hash temporary result table is simpler than that of
an index, so less CPU processing is required to build and probe a hash table.
- The rows in the hash result table contain all of the data required by
the query so there is no need to access the dataspace of the table with random
I/O when probing the hash table.
- Like join values are clustered, so all matching rows for a given join
value can typically be accessed with a single I/O request.
- The hash temporary result table can be built using SMP parallelism.
- Unlike indexes, entries in hash tables are not updated to reflect changes
of column values in the underlying table. The existence of a hash table does
not affect the processing cost of other updating jobs in the server.
- If using a Sorted List Probe to access the secondary table, a sorted list
result is created that contains all of the rows selected by local selection
against the table on the first probe. The structure of the sorted list table
is such that rows with the same join value are sorted together in the list.
The location of the rows for any given join value can be found by probing
using the join value.
- If using a table scan to access the secondary table, scan the secondary
to locate the first row that satisfies the join condition for the first secondary
table using the table scan to match the join condition or local row selection
columns of the secondary table. The join may be implemented with a table scan
when the secondary table is a user-defined table function.
- Determines if the row is selected by applying any remaining
selection local to the first secondary dial.
If the secondary dial row
is not selected then the next row that satisfies the join condition is located.
Steps 1 through 4 are repeated until a row that satisfies both the join condition
and any remaining selection is selected from all secondary tables
- Returns the result join row.
- Processes the last secondary table again to find the next row that satisfies
the join condition in that dial.
During this processing, when no more
rows that satisfy the join condition can be selected, the processing backs
up to the logical previous dial and attempts to read the next row that satisfies
its join condition.
- Ends processing when all selected rows from the primary table are processed.
Note the following characteristics of a nested loop join:
- If ordering or grouping is specified and all the columns are over a single
table and that table is eligible to be the primary, then the optimizer costs
the join with that table as the primary and performing the grouping and ordering
with an index.
- If ordering and grouping is specified on two or more tables or if temporaries
are allowed, DB2 Universal Database for iSeries breaks
the processing of the query into two parts:
- Perform the join selection omitting the ordering or grouping processing
and write the result rows to a temporary work table. This allows the optimizer
to consider any table of the join query as a candidate for the primary table.
- The ordering or grouping processing is then performed on the data in the
temporary work table.
Queries that cannot use hash join
Hash join cannot
be used for queries that:
- Hash join cannot be used for queries involving physical files or tables
that have read triggers.
- Require that the cursor position be restored as the result of the SQL
ROLLBACK HOLD statement or the ROLLBACK CL command. For SQL applications using
commitment control level other than *NONE, this requires that *ALLREAD be
specified as the value for the ALWBLK precompiler parameter.
- Hash join cannot be used for a table in a join query where the join condition
something other than an equals operator.
- CQE does not support hash join if the query contains any of the following:
- Subqueries unless all subqueries in the query can be transformed to inner
joins.
- UNION or UNION ALL
- Perform left outer or exception join.
- Use a DDS created join logical file.