Unicode is a universal encoding scheme for written characters and text that enables the exchange of data internationally. This topic describes how to specify DDS positions 30 through 37 and positions 45 through 80 when describing database files. Positions not mentioned have no special considerations for Unicode.
A Unicode field can contain all types of characters used on an IBM® iSeries™ server, including double-byte character set (DBCS) characters. Unicode data is composed of code units; a code unit is the minimal byte combination that can represent a unit of text.
There are three transformation formats (encoding forms) of Unicode that are supported with physical and logical file DDS:
A UTF-8 code unit is 1 byte in length. A UTF-8 character can be 1, 2, 3, or 4 code units in length. A UTF-8 data string can contain any character, including surrogates and combining characters.
A UTF-16 code unit is 2 bytes in length. A UTF-16 character can be 1 or 2 code units (2 or 4 bytes) in length. A UTF-16 data string can contain any character, including UTF-16 surrogates and combining characters.
UCS-2 is a subset of UTF-16 and cannot represent all of the characters that Unicode now defines. A UCS-2 code unit is 2 bytes in length, and every UCS-2 character is exactly one code unit. UCS-2 is otherwise identical to UTF-16, except that UTF-16 also supports combining characters and surrogates. If you do not need support for combining characters and surrogates, you can use the UCS-2 type, because more database functionality is available for it.
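The code unit counts described above can be checked outside of DDS. The following Python sketch is an illustration only and is not part of DDS or the iSeries database; the helper names `code_units` and `fits_in_ucs2` are hypothetical. It counts the UTF-8 and UTF-16 code units in a string and reports whether the string needs surrogates, which is the case UCS-2 cannot represent.

```python
def code_units(text: str) -> dict:
    """Count code units for a string in UTF-8 and UTF-16 (hypothetical helper)."""
    return {
        # A UTF-8 code unit is 1 byte, so the byte count is the code unit count.
        "utf8": len(text.encode("utf-8")),
        # A UTF-16 code unit is 2 bytes; "utf-16-be" avoids a byte order mark.
        "utf16": len(text.encode("utf-16-be")) // 2,
    }


def fits_in_ucs2(text: str) -> bool:
    """True if no character needs a surrogate pair (hypothetical helper)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # U+0041, U+00E9, U+20AC, and U+1D11E (a character outside the 2-byte range)
    for sample in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
        print(ascii(sample), code_units(sample), fits_in_ucs2(sample))
```

A string for which `fits_in_ucs2` returns False contains at least one character that takes two UTF-16 code units (4 bytes), so under the assumptions of this sketch it would call for a UTF-16 or UTF-8 field rather than a UCS-2 field.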