<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="UCS-2 and its relationship to Unicode" />
<meta name="abstract" content="Because the UCS-2 standard is limited to 65 536 code points (U+0000 through U+FFFF), and the data processing industry needs more than 94 000 characters, the UCS-2 standard is being superseded by the Unicode UTF-16 standard." />
<meta name="description" content="Because the UCS-2 standard is limited to 65 536 code points (U+0000 through U+FFFF), and the data processing industry needs more than 94 000 characters, the UCS-2 standard is being superseded by the Unicode UTF-16 standard." />
<meta name="DC.Relation" scheme="URI" content="rbagsunicodeucs2.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsutf16.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rbagsucs2" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>UCS-2 and its relationship to Unicode</title>
</head>
<body id="rbagsucs2"><a name="rbagsucs2"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">UCS-2 and its relationship to Unicode</h1>
<div><p>Because the UCS-2 standard is limited to 65 536 code points
(U+0000 through U+FFFF), and the data processing industry needs more than
94 000 characters, the UCS-2 standard is being superseded by the Unicode
UTF-16 standard.</p>
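<p>Because UTF-16 is a superset of the existing UCS-2 standard, you can develop
your applications using the existing UCS-2 support, as long as your applications
treat UCS-2 data as if it were UTF-16. In UTF-16, a character above U+FFFF is
stored as a surrogate pair of two 16-bit code units, so an application must keep
each pair together rather than process its halves as two separate characters.</p>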
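<p>To illustrate what treating UCS-2 data as UTF-16 involves, the following
Python sketch converts one Unicode code point into UTF-16 code units. It is not
an i5/OS interface; the function name to_utf16_units is hypothetical. A code
point up to U+FFFF yields a single 16-bit unit with the same value that UCS-2
uses, while a code point above U+FFFF yields a surrogate pair.</p>
<pre class="codeblock">
```python
def to_utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode code point (illustrative sketch)."""
    if code_point not in range(0x110000):
        raise ValueError("not a Unicode code point")
    if code_point in range(0xD800, 0xE000):
        # Surrogate code points are reserved for UTF-16 pairs and are not characters.
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point in range(0x10000):
        # Basic Multilingual Plane: one 16-bit unit, identical to the UCS-2 value.
        return [code_point]
    # Above U+FFFF: split the remaining 20 bits into two 10-bit halves.
    high_bits, low_bits = divmod(code_point - 0x10000, 0x400)
    return [0xD800 + high_bits, 0xDC00 + low_bits]


# Cross-check against Python's built-in big-endian UTF-16 codec.
for cp in (0x0041, 0x00E9, 0x65E5, 0x1F600):
    units = to_utf16_units(cp)
    encoded = b"".join(unit.to_bytes(2, "big") for unit in units)
    assert encoded == chr(cp).encode("utf-16-be")
    print(f"U+{cp:04X} -> " + " ".join(f"{u:04X}" for u in units))
```
</pre>
<p>The first three code points print a single unit equal to the code point
itself, and the last line prints U+1F600 -> D83D DE00. Code that counts each
16-bit unit as one character would count U+1F600 as two characters, which is
why UCS-2 support must be used in a UTF-16-aware way.</p>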
<p>i5/OS™ supports UCS-2 encoding with CCSID 13488.</p>
<div class="section"><h4 class="sectiontitle">UCS, UCS-2 (Universal Multiple-Octet Coded Character Set)</h4><p>The
ISO 10646 standard is a character code designed to encode text for storage
in computer files. Its design is based on ASCII, the widely used character
code, and on ISO 8859-1, an extended version of ASCII: the first 128 code
points of ISO 10646 are the ASCII characters, and the first 256 are the
ISO 8859-1 characters. However, whereas ASCII can encode only the Latin
alphabet, ISO 10646 can encode all of the characters used for written
languages throughout the world.</p>
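<p>The relationship to ASCII and ISO 8859-1 can be checked with a short Python
sketch that uses the standard ascii and latin-1 codecs. The helper names
same_values_as_ucs and encodable are hypothetical and chosen for illustration
only.</p>
<pre class="codeblock">
```python
def same_values_as_ucs(codec: str, byte_count: int) -> bool:
    """True if each byte value 0..byte_count-1 decodes to the UCS code point with the same value."""
    return all(bytes([b]).decode(codec) == chr(b) for b in range(byte_count))


def encodable(text: str, codec: str) -> bool:
    """True if every character of text can be represented in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


print(same_values_as_ucs("ascii", 0x80))     # ASCII occupies U+0000..U+007F
print(same_values_as_ucs("latin-1", 0x100))  # ISO 8859-1 occupies U+0000..U+00FF
print(encodable("\u00e9", "ascii"), encodable("\u00e9", "latin-1"))  # e with acute accent
print(encodable("\u0416", "latin-1"))  # Cyrillic capital ZHE, outside ISO 8859-1
```
</pre>
<p>Both range checks print True. The accented e is encodable in ISO 8859-1 but
not in ASCII, and the Cyrillic letter is encodable in neither, although it has
a UCS code point (U+0416).</p>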
</div>
<div class="section"><h4 class="sectiontitle">Two UCS encoding schemes</h4><p>To accommodate
the many thousands of characters used in international text, ISO/IEC 10646
specifies the Universal Multiple-Octet Coded Character Set (UCS). UCS can
be implemented through two encoding schemes:</p>
<ul><li>UCS-2: Each character is represented by 16 bits, or 2 bytes. (The number
2 in UCS-2 indicates 2 bytes.) For example, uppercase A is represented by the
hexadecimal value 0041.</li>
<li>UCS-4: Each character is represented by 32 bits, or 4 bytes. (The number
4 in UCS-4 indicates 4 bytes.) For example, uppercase A is represented by the
hexadecimal value 0000 0041.</li>
</ul>
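<p>Python has no codec named UCS-2 or UCS-4. This sketch assumes the
equivalence that, for characters from U+0000 through U+FFFF, the big-endian
UTF-16 codec produces the same bytes as UCS-2, and, for code points up to
U+10FFFF, the big-endian UTF-32 codec produces the same bytes as UCS-4. Byte
order is fixed as big-endian for illustration, and the helper names ucs2_hex
and ucs4_hex are hypothetical.</p>
<pre class="codeblock">
```python
def ucs2_hex(ch: str) -> str:
    """Hex form of one character in UCS-2 (big-endian); U+0000..U+FFFF only."""
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("UCS-2 holds one character in the range U+0000..U+FFFF")
    # Assumption: for these characters, UTF-16BE bytes are identical to UCS-2 bytes.
    return ch.encode("utf-16-be").hex().upper()


def ucs4_hex(ch: str) -> str:
    """Hex form of one character in UCS-4 (big-endian), grouped in 2-byte blocks."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    # Assumption: for code points up to U+10FFFF, UTF-32BE bytes are identical to UCS-4 bytes.
    return ch.encode("utf-32-be").hex(" ", 2).upper()


print(ucs2_hex("A"))  # 0041
print(ucs4_hex("A"))  # 0000 0041
```
</pre>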
<p>The major difference between the 2-byte and 4-byte representations
is that the 4-byte representation can also encode characters above U+FFFF,
which are beyond the capability of UCS-2. That is, you can encode more characters
in UCS-4 than you can in UCS-2.</p>
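<p>One way to see this difference is to test whether a string stays within the
UCS-2 range and to count the units each representation needs. The following
Python sketch uses two hypothetical helpers, fits_in_ucs2 and unit_counts, and
again assumes that UTF-32 bytes match UCS-4 for code points up to U+10FFFF.</p>
<pre class="codeblock">
```python
def fits_in_ucs2(text: str) -> bool:
    """True if every character is in U+0000..U+FFFF and so has a UCS-2 form."""
    return all(ord(ch) in range(0x10000) for ch in text)


def unit_counts(text: str) -> tuple[int, int]:
    """Return (16-bit units in UTF-16, 32-bit units in UCS-4/UTF-32) for text."""
    return len(text.encode("utf-16-be")) // 2, len(text.encode("utf-32-be")) // 4


for sample in ("A", "\u65e5\u672c", "\U0001F600"):
    print(repr(sample), fits_in_ucs2(sample), unit_counts(sample))
```
</pre>
<p>For U+1F600, fits_in_ucs2 returns False: the character has no UCS-2 form,
needs two 16-bit units (a surrogate pair) in UTF-16, and needs only one 32-bit
unit in UCS-4.</p>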
<p>i5/OS does not
support UCS-4 encoding with a CCSID value.</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rbagsunicodeucs2.htm" title="Unicode is a standard that precisely defines a character set as well as a small number of encodings for it. It enables you to handle text in any language efficiently. It allows a single application to work for a global audience.">Work with Unicode</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rbagsutf16.htm" title="UTF-16 is an encoding of Unicode in which each character is composed of either one or two 16-bit elements.">UTF-16</a></div>
</div>
</div>
</body>
</html>