Automatic statistics collection

When the Statistics Manager prepares its responses to the Optimizer, it keeps track of the responses that are generated by using default filter factors (because column statistics or indexes were not available). It uses this information during the time that the access plan is being written to the Plan Cache to automatically generate a statistic collection request for the columns. If system resources allow it, the Statistics Manager generates statistics collections in real time for direct use by the current query, avoiding a default answer to the Optimizer.

Otherwise, as system resources become available, the requested column statistics will be collected in the background. That way, the next time that the query is executed, the missing column statistics will be available to the Statistics Manager, thus allowing it to provide more accurate information to the Optimizer at that time. More statistics make it easier for the Optimizer to generate a good performing access plan.

If a query is canceled before or during execution, the requests for column statistics are still processed, as long as the execution reaches the point where the generated access plan is written to the Plan Cache.

To minimize the number of passes through a table during statistics collection, the Statistics Manger groups multiple requests for the same table together. For example, two queries are executed against table T1. The first query has selection criteria on column C1 and the second over column C2. If no statistics are available for the table, the Statistics Manager identifies both of these columns as good candidates for column statistics. When the Statistics Manager reviews requests, it looks for multiple requests for the same table and groups them together into one request. This allows both column statistics to be created with only one pass through table T1.

One thing to note is that column statistics normally are automatically created when the Statistics Manager must answer questions from the optimizer using default filter factors. However, when an index is available that might be used to generate the answer, then column statistics are not automatically generated. There may be cases where optimization time would benefit from column statistics in this scenario because using column statistics to answer questions from the optimizer is generally more efficient than using the index data. So if you have cases where the query performance seems extended, you might want to verify that there is are indexes over the relevant columns in your query. If this is the case, try manually generating columns statistics for these columns.

As stated before, statistics collection occurs as system resources become available. If you have schedule a low priority job that is permanently active on your system and that is supposed to use all spare CPU cycles for processing, your statistics collection will never become active.