Manually collecting and refreshing statistics

You can manually collect and refresh statistics through iSeries™ Navigator or by using Statistics APIs.

To collect statistics using iSeries Navigator, right-click a table or alias and select Statistic Data. On the Statistic Data dialog, click New. Then select the columns that you want to collect statistics for. Once you have selected the columns, you can collect the statistics immediately or collect them in the background.

To refresh a statistic using iSeries Navigator, right-click a table or alias and select Statistic Data. Click Update. Select the statistic that you want to refresh. You can collect the statistics immediately or collect them in the background.

There are several scenarios in which the manual management (create, remove, refresh, and so on) of column statistics may be beneficial and recommended.

High Availability (HA) solutions

When considering the design of high availability solutions where data is replicated to a secondary system by using journal entries, it is important to know that column statistics information is not journaled. That means that, on your backup system, no column statistics are available when you first start using that system. To prevent the "warm up" effect that this may cause, you may want to propagate the column statistics were gathered on your production system and recreate them on your backup system manually.

ISV (Independent Solution Provider) preparation

An ISV may want to deliver a solution to a customer that already includes column statistics frequently used in the application instead of waiting for the automatic statistics collection to create them. A way to accomplish this is to run the application on the development system for some time and examine which column statistics were created automatically. You can then generate a script file to be shipped as part of the application that should be executed on the customer system after the initial data load took place.

Business Intelligence environments

In a large Business Intelligence environment, it is quite common for large data load and update operations to occur overnight. As column statistics are marked as stale only when they are touched by the Statistics Manager, and then refreshed after first touch, you may want to consider refreshing them manually after loading the data.

You can do this easily by toggling the system value QDBFSTCCOL to *NONE and then back to *ALL. This causes all stale column statistics to be refreshed and starts collection of any column statistics previously requested by the system but not yet available. Since this process relies on the access plans stored in the Plan Cache, avoid performing a system initial program load (IPL) before toggling QDBFSTCCOL since an IPL clears the Plan Cache.

You should be aware that this procedure works only if you do not delete (drop) the tables and recreate them in the process of loading your data. When deleting a table, access plans in the Plan Cache that refer to this table are deleted. Information about column statistics on that table is also lost. The process in this environment is either to add data to your tables or to clear the tables instead of deleting them.

Massive data updates

Updating rows in a column statistics-enabled table that significantly change the cardinality, add new ranges of values, or change the distribution of data values can affect the performance for queries when they are first run against the new data. This may happen because, on the first run of such a query, the optimizer uses stale column statistics to make decisions on the access plan. At that point, it starts a request to refresh the column statistics.

If you know that you are doing this kind of update to your data, you may want to toggle the system value QDBFSTCCOL to *NONE and back to *ALL or *SYSTEM. This causes an analysis of the Plan Cache. The analysis includes searching for column statistics that were used in the generation of an access plan, analyzing them for staleness, and requesting updates for the stale statistics.

If you massively update or load data and run queries against these tables at the same time, then the automatic collection of column statistics tries to refresh every time 15% of the data is changed. This can be redundant processing since you are still in the process of updating or loading the data. In this case, you may want to block automatic statistics collection for the tables in question and deblock it again after the data update or load finishes. An alternative is to turn off automatic statistics collection for the whole system before updating or loading the data and switching it back on after the updating or loading has finished.

Backup and recovery

When thinking about backup and recovery strategies, keep in mind that creation of column statistics is not journaled. Column statistics that exist at the time a save operation occurs are saved as part of the table and restored with the table. Any column statistics created after the save took place are lost and cannot be recreated by using techniques such as applying journal entries. If you have a rather long interval between save operations and rely heavily on journaling for restoring your environment to a current state, consider keeping track of column statistics that are generated after the latest save operation.