Join optimization with DB2 Multisystem

The distributed query optimizer generates a plan to join distributed files.

The distributed query optimizer considers file sizes, the expected number of records selected from each file, and the types of distributed joins that are possible, and then breaks the query into multiple steps. Each step creates an intermediate result file that is used as input for the next step.
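The step-by-step structure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not DB2 Multisystem's implementation: files are modeled as in-memory lists of records, each step is an ordinary hash join, and the names `JoinStep`, `hash_join`, and `run_plan` are hypothetical. The point is only that each step consumes the previous step's intermediate result and applies the record selection that belongs to that step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Table = List[Record]


def hash_join(left: Table, right: Table, key: str) -> Table:
    """Inner-join two lists of records on a shared key column."""
    index: Dict[object, List[Record]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


@dataclass
class JoinStep:
    """Hypothetical plan step: join the running result with `file` on `key`."""
    file: Table
    key: str
    # Record selection that applies only to this step's file is done in this step.
    selection: Optional[Callable[[Record], bool]] = None


def run_plan(first: Table, steps: List[JoinStep]) -> Table:
    """Run the steps in order; each step's output is the next step's input."""
    intermediate = first
    for step in steps:
        rows = step.file
        if step.selection is not None:
            rows = [r for r in rows if step.selection(r)]
        intermediate = hash_join(intermediate, rows, step.key)
    return intermediate


customers = [{"cust": 1, "name": "A"}, {"cust": 2, "name": "B"}]
orders = [{"cust": 1, "order": 10}, {"cust": 2, "order": 11}, {"cust": 1, "order": 12}]
items = [{"order": 10, "qty": 5}, {"order": 11, "qty": 0}, {"order": 12, "qty": 3}]

result = run_plan(customers, [
    JoinStep(orders, "cust"),
    JoinStep(items, "order", selection=lambda r: r["qty"] > 0),
])
print(result)
```

Here the first step produces a three-record intermediate result, and the second step filters `items` before joining, so only orders 10 and 12 survive.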

During optimization, a cost is calculated for each join step based on the type of distributed join. The cost reflects, in part, the amount of data movement required for that join step. The cost is used to determine the final distributed plan.
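To make the idea of a per-step cost concrete, the following sketch scores each possible join type by the number of records it would ship between nodes and picks the cheapest. The cost formulas are simplifying assumptions for illustration only; the text above says only that the real cost reflects data movement in part, and DB2's actual costing is not described here. `JoinType`, `movement_cost`, and `cheapest_join` are hypothetical names, and by convention the right-hand file is the one that gets moved.

```python
from enum import Enum
from typing import List


class JoinType(Enum):
    COLLOCATED = "collocated"
    DIRECTED = "directed"
    REPARTITIONED = "repartitioned"


def movement_cost(join_type: JoinType, left_rows: int, right_rows: int) -> int:
    """Records shipped between nodes under a simplified, hypothetical model."""
    if join_type is JoinType.COLLOCATED:
        return 0  # both files already partitioned on the join field
    if join_type is JoinType.DIRECTED:
        return right_rows  # assumption: the right file is directed to the left file's nodes
    if join_type is JoinType.REPARTITIONED:
        return left_rows + right_rows  # both files redistributed on the join field
    raise ValueError(f"unknown join type: {join_type}")


def cheapest_join(possible: List[JoinType], left_rows: int, right_rows: int) -> JoinType:
    """Pick the possible join type with the lowest estimated data movement."""
    return min(possible, key=lambda jt: movement_cost(jt, left_rows, right_rows))


# Assume a collocated join is not possible for this pair of files.
choice = cheapest_join([JoinType.DIRECTED, JoinType.REPARTITIONED], 1_000_000, 5_000)
print(choice)  # JoinType.DIRECTED
```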

As much processing as possible is completed during each step. For example, record selection that is isolated to a given step is performed during that step, and as many files as possible are joined in each step. A single join step might involve more than one type of distributed join: a collocated join and a directed join can be combined into one collocated join by directing the necessary file first, and a directed join and a repartitioned join can be combined by directing all of the files first and then performing the join. In effect, directed and repartitioned joins are collocated joins in which one or more files are directed before the join occurs.
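The observation that directed and repartitioned joins are collocated joins preceded by directing files can be shown with a small simulation. This sketch assumes a toy partitioning function (`hash(value) % nodes`) in place of DB2's actual partitioning, and models each node as a Python list; `direct`, `collocated_join`, `directed_join`, and `repartitioned_join` are hypothetical names. Both the directed and repartitioned variants simply call `collocated_join` after redistributing their input.

```python
from typing import Dict, List

Record = Dict[str, object]
Partitions = List[List[Record]]  # index i holds the records stored on node i


def node_for(value: object, nodes: int) -> int:
    """Toy partitioning function (assumption): not DB2's partitioning map."""
    return hash(value) % nodes


def direct(records: List[Record], key: str, nodes: int) -> Partitions:
    """Send each record to the node that owns its join-field value."""
    parts: Partitions = [[] for _ in range(nodes)]
    for row in records:
        parts[node_for(row[key], nodes)].append(row)
    return parts


def collocated_join(left: Partitions, right: Partitions, key: str) -> List[Record]:
    """Join node i's left records with node i's right records only; no data moves."""
    out: List[Record] = []
    for left_part, right_part in zip(left, right):
        index: Dict[object, List[Record]] = {}
        for row in right_part:
            index.setdefault(row[key], []).append(row)
        out.extend({**lrow, **rrow} for lrow in left_part for rrow in index.get(lrow[key], []))
    return out


def directed_join(left_parts: Partitions, right: List[Record], key: str, nodes: int) -> List[Record]:
    """Left file is already partitioned on `key`; direct the right file, then join in place."""
    return collocated_join(left_parts, direct(right, key, nodes), key)


def repartitioned_join(left: List[Record], right: List[Record], key: str, nodes: int) -> List[Record]:
    """Neither file is partitioned on `key`; direct both, then join in place."""
    return collocated_join(direct(left, key, nodes), direct(right, key, nodes), key)


customers = [{"cust": i, "name": f"C{i}"} for i in range(5)]
orders = [{"cust": i % 5, "order": 100 + i} for i in range(12)]
NODES = 3

orders_by_cust = direct(orders, "cust", NODES)  # stands in for a file already partitioned on cust
a = directed_join(orders_by_cust, customers, "cust", NODES)
b = repartitioned_join(orders, customers, "cust", NODES)
print(sorted(r["order"] for r in a) == sorted(r["order"] for r in b))  # True
```

Because both sides are distributed with the same function, matching records always land on the same node, which is what lets every variant finish with a purely local, per-node join.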

When joining distributed files with local files, the distributed query optimizer calculates a cost, similar to the cost calculated when joining distributed files. Based on this cost, the distributed query optimizer can choose to perform one of the following actions:

Broadcast all of the local files to the data nodes of the distributed file and perform a collocated join.
Broadcast all of the local and distributed files to the data nodes of the largest distributed file and perform a collocated join.
Direct the distributed files back to the coordinator node and perform the join there.
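The choice among these three actions can be sketched as picking the lowest-cost option. The cost model here is a hypothetical simplification that counts only records moved between systems: broadcasting multiplies the moved records by the number of receiving data nodes, and directing to the coordinator moves every selected distributed record once. It also assumes that the first action applies when there is a single distributed file and the second when there are several. `Action`, `DistributedFile`, `action_costs`, and `choose_action` are illustrative names, not DB2 interfaces.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Action(Enum):
    BROADCAST_LOCAL = "broadcast local files to the distributed file's nodes"
    BROADCAST_TO_LARGEST = "broadcast local and distributed files to the largest file's nodes"
    DIRECT_TO_COORDINATOR = "direct distributed files to the coordinator node"


@dataclass
class DistributedFile:
    rows: int   # expected number of selected records
    nodes: int  # number of data nodes the file is spread across


def action_costs(local_rows: List[int], dist: List[DistributedFile]) -> Dict[Action, int]:
    """Estimate records moved for each applicable action (hypothetical model)."""
    local_total = sum(local_rows)
    dist_total = sum(f.rows for f in dist)
    largest = max(dist, key=lambda f: f.rows)
    # Joining at the coordinator moves every selected distributed record once.
    costs = {Action.DIRECT_TO_COORDINATOR: dist_total}
    if len(dist) == 1:
        # Every local record is copied to each data node of the distributed file.
        costs[Action.BROADCAST_LOCAL] = local_total * largest.nodes
    else:
        # Local records and the smaller distributed files are copied to each
        # data node of the largest distributed file, which stays in place.
        costs[Action.BROADCAST_TO_LARGEST] = (local_total + dist_total - largest.rows) * largest.nodes
    return costs


def choose_action(local_rows: List[int], dist: List[DistributedFile]) -> Action:
    """Return the action with the lowest estimated data movement."""
    costs = action_costs(local_rows, dist)
    return min(costs, key=lambda a: costs[a])


print(choose_action([100], [DistributedFile(rows=1_000_000, nodes=4)]))  # Action.BROADCAST_LOCAL
print(choose_action([500_000], [DistributedFile(rows=10_000, nodes=8)]))  # Action.DIRECT_TO_COORDINATOR
```

Under this model a small local file is cheaper to broadcast, while a large local file joined with a highly selective distributed file favors joining at the coordinator.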
