This topic describes Unicode considerations for positional entries and keyword entries for printer files. It also describes the CCSID keyword for Unicode data in printer files.
Unicode is a universal encoding scheme for written characters and text that enables the exchange of data internationally. A Unicode field can contain all types of characters used on an iSeries™ server, including ideographic (DBCS) characters. The term code unit is used in this topic to mean the minimal bit combination that can represent a unit of encoded text for processing or interchange.
DDS printer files support two transformation formats (encoding forms) of Unicode, UTF-16 and UCS-2:
A UTF-16 code unit is 2 bytes in length. A UTF-16 character can be 1 or 2 code units (2 or 4 bytes) in length. A UTF-16 data string can contain any character, including UTF-16 surrogates and combining characters.
UCS-2 is a subset of UTF-16 in which every character is exactly one 2-byte code unit, so it can no longer represent all of the characters defined by Unicode. UCS-2 is otherwise identical to UTF-16, except that UTF-16 also supports combining characters and surrogates. If you do not need combining characters and surrogates, you might choose to use UCS-2.
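
The difference between characters, code units, and bytes can be illustrated with a short sketch. It is written in Python rather than DDS, and it is not part of the printer file support itself; the helper names utf16_code_units and has_surrogate_pairs are hypothetical and chosen only for this illustration. It counts the 2-byte UTF-16 code units that a string needs and reports whether any character requires a surrogate pair, which is the case UCS-2 cannot represent.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of 2-byte UTF-16 code units needed to encode text."""
    # "utf-16-be" encodes without a byte order mark, so bytes // 2 = code units.
    return len(text.encode("utf-16-be")) // 2


def has_surrogate_pairs(text: str) -> bool:
    """Return True if any character lies above U+FFFF and so needs two code units."""
    return any(ord(ch) > 0xFFFF for ch in text)


# Illustrative sample strings (assumptions for this sketch, not from the DDS reference).
samples = {
    "latin letter": "A",
    "ideographic (DBCS) character": "\u6f22",
    "base letter + combining accent": "e\u0301",
    "supplementary character": "\U00020000",
}

for name, text in samples.items():
    units = utf16_code_units(text)
    print(f"{name}: {units} code unit(s), {units * 2} byte(s), "
          f"surrogate pair needed: {has_surrogate_pairs(text)}")
```

In this sketch the first two samples each take one code unit (2 bytes). The combining sequence takes two code units because it is made of two separate characters, and the supplementary character takes two code units (4 bytes) because it is encoded as a surrogate pair.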