ICU sort sequence

When an ICU (International Components for Unicode) sort sequence table is used, the database uses the system's ICU support (Option 39) to determine the weight of the data according to the language-specific rules of the table's locale.

For example, an ICU sort sequence table named en_us (United States locale) can sort data differently than another ICU table named fr_FR (French locale).

The system's ICU support properly handles data that is not normalized, producing the same results as if the data were normalized. The system's ICU sort sequence table can sort all character, graphic, and Unicode (UTF-8, UTF-16, and UCS-2) data.
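To illustrate what "not normalized" means, the following minimal Python sketch compares two encodings of the same name. It is not the system's ICU support. It only uses Python's standard unicodedata module to show that a precomposed and a decomposed string differ as stored but are equal once normalized. The variable and function names are illustrative assumptions.

```python
import unicodedata

# Two encodings of the same visible name (illustrative values):
# precomposed "ó" (U+00F3) versus "o" followed by a combining acute (U+0301).
composed = "G\u00f3mez"
decomposed = "Go\u0301mez"

def normalized(text: str) -> str:
    """Return the NFC (composed) normalization form of text."""
    return unicodedata.normalize("NFC", text)

# The stored code points differ, so a plain comparison fails.
print(composed == decomposed)                          # False

# After normalization the two strings compare equal.
print(normalized(composed) == normalized(decomposed))  # True
```

A collation that handles non-normalized data treats both strings as the same name when it assigns weights.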

Consider a UTF-8 character column named NAME that contains the following three names (the hex values of the column are given as well):

NAME    HEX(NAME)
Gómez   47C3B36D657A
Gomer   476F6D6572
Gumby   47756D6279

A *HEX sort sequence will order the NAME values as follows:

NAME
Gomer
Gumby
Gómez
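The *HEX ordering above can be reproduced outside the database by comparing the UTF-8 bytes of each name. The following Python sketch is only an illustration of byte-wise comparison, and the helper name hex_key is an assumption, not part of any system interface.

```python
# The three NAME values; "\u00f3" is the precomposed character "ó".
names = ["G\u00f3mez", "Gomer", "Gumby"]

def hex_key(name: str) -> bytes:
    """Sort key that compares raw UTF-8 bytes, like a hex ordering."""
    return name.encode("utf-8")

for name in sorted(names, key=hex_key):
    print(name, hex_key(name).hex().upper())

# Output:
# Gomer 476F6D6572
# Gumby 47756D6279
# Gómez 47C3B36D657A
```

Gómez sorts last because the second byte of its encoding, C3, is greater than both 6F ("o") and 75 ("u").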

An ICU sort sequence table named en_us will order the NAME values as expected for the locale, placing Gómez between Gomer and Gumby:

NAME
Gomer
Gómez
Gumby
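The effect of language-aware weighting can be approximated with a short Python sketch. This is a deliberately simplified stand-in and not the ICU collation algorithm: it assumes that comparing base letters first and accents only as a tie-breaker is close enough for these three names. The function name simplified_weight is hypothetical.

```python
import unicodedata

names = ["G\u00f3mez", "Gomer", "Gumby"]

def simplified_weight(name: str) -> tuple:
    """Approximate a locale-aware weight (assumption: not real ICU).

    Primary key: the name with combining accents removed, case-folded.
    Tie-breaker: the original name, so accented and unaccented
    spellings still have a stable relative order.
    """
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), name)

print(sorted(names, key=simplified_weight))
# ['Gomer', 'Gómez', 'Gumby']
```

Because the primary key for Gómez is "gomez", it falls between "gomer" and "gumby", matching the en_us ordering shown above.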

When an ICU sort sequence table is specified, SQL statements that use the table can run much slower than with a non-ICU sort sequence table or the *HEX sort sequence. The slower performance results from calling the system's ICU support to get the weighted value for each piece of data that needs to be sorted. An ICU sort sequence table therefore provides more sorting function at the cost of slower-running SQL statements. However, indexes that use an ICU sort sequence table can be created over columns to reduce the need to call the system's ICU support. In this case, the index key already contains the ICU weighted value, so the system's ICU support does not need to be called again.
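The benefit of storing weighted values in an index can be sketched in Python. This is an analogy only, under the assumption that a sorted list of (weight, value) pairs stands in for an index and that a simplified accent-stripping function stands in for the system's ICU support. All names here are hypothetical.

```python
import unicodedata

weight_calls = 0  # counts how often the (stand-in) weighting routine runs

def weight(name: str) -> str:
    """Stand-in for an expensive weighting call (assumption: not real ICU)."""
    global weight_calls
    weight_calls += 1
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

names = ["G\u00f3mez", "Gomer", "Gumby"]

# Build the "index" once: each key stores the weight next to the value.
index = sorted((weight(n), n) for n in names)
calls_after_build = weight_calls

def ordered_names() -> list:
    """Read values in sorted order from the index without re-weighting."""
    return [value for _, value in index]

first = ordered_names()
second = ordered_names()

print(first)                              # ['Gomer', 'Gómez', 'Gumby']
print(weight_calls == calls_after_build)  # True
```

The weighting routine runs once per value when the index is built. Later ordered reads reuse the stored weights, which is the same trade-off the paragraph above describes for index keys.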

Related information
International Components for Unicode