Specify the length of the field in these positions. The length
of a field containing UTF-16 data can range from 1 through 16,383 code
units. The length of a field containing UTF-8 data can range from 1 through
32,766 code units.
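For illustration only, the following DDS sketch (with hypothetical file and
field names) defines one UTF-16 field and one UTF-8 field in a physical file.
CCSID(1200) identifies UTF-16 data and CCSID(1208) identifies UTF-8 data; the
length entry is given in code units, and the exact column rules of a real DDS
source member apply.

     A          R UNIREC
     A*  NAME16 holds up to 50 UTF-16 code units (100 bytes of data)
     A            NAME16        50G         CCSID(1200)
     A*  NAME8 holds up to 100 UTF-8 code units (100 bytes of data)
     A            NAME8        100A         CCSID(1208)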
When determining the program length of a field
containing Unicode data, consider the following rules:
- Each UTF-16 code unit is 2 bytes long. The length of a UTF-16 field is
specified in UTF-16 code units; for example, a field containing 3 UTF-16
code units holds 6 bytes of data.
- Each UTF-8 code unit is 1 byte long. A UTF-8 character can be 1, 2, 3,
or 4 code units in length.
After conversion between Unicode and EBCDIC, the resulting data can be
equal in length to, longer than, or shorter than the original data. For
example, 1 UTF-16 code unit is composed of 2 bytes of data. That code unit
might convert to 1 single-byte character set (SBCS) character composed of
1 byte of data, 1 graphic double-byte character set (DBCS) character
composed of 2 bytes of data, or 1 bracketed DBCS character composed of 4
bytes of data. Therefore, when a Unicode field in the physical file is
converted to a field of a different type in the logical file, it is
recommended that the logical file field be defined with the VARLEN keyword,
with a length large enough to hold the maximum size that the Unicode field
can convert to. This accounts for the expansion that can occur, as the
sketch below shows.
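As a minimal sketch of this recommendation, assume a physical file UNIPF
whose field EMPNAME is defined as 10 UTF-16 code units (10G with
CCSID(1200)). The logical file below (hypothetical names) redefines the
field as variable-length DBCS-open data, sized at 4 bytes per code unit to
cover the bracketed DBCS worst case described above:

     A          R UNIREC                    PFILE(MYLIB/UNIPF)
     A*  40 bytes = 10 code units x 4 bytes (worst-case expansion)
     A            EMPNAME       40O         VARLEN

Because the field is variable length, shorter conversion results simply
occupy less of the declared maximum.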
On a logical file, if the length is not specified and a UTF-16
to EBCDIC conversion will take place, the length of the corresponding
physical file field is used, except in the following case:
If the physical file field is UTF-16 capable and the logical file field
has a data type of O, the length of the logical file field is 2 times the
length of the physical file field.
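For example (hypothetical names), if the physical file field EMPNAME is
UTF-16 capable and defined as 10 code units, the following logical file
field changes the data type to O and leaves the length entry blank; its
length therefore defaults to 2 × 10 = 20 bytes:

     A          R UNIREC                    PFILE(MYLIB/UNIPF)
     A*  No length given: defaults to 2 x 10 = 20 bytes
     A            EMPNAME         O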