<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Work with Unicode" />
<meta name="abstract" content="Unicode is a standard that precisely defines a character set as well as a small number of encodings for it. It enables you to handle text in any language efficiently. It allows a single application to work for a global audience." />
<meta name="description" content="Unicode is a standard that precisely defines a character set as well as a small number of encodings for it. It enables you to handle text in any language efficiently. It allows a single application to work for a global audience." />
<meta name="DC.Relation" scheme="URI" content="rbagshandlingdata.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsunicode.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsutf8.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsutf16.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsutf32.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsucs2.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsunicodeandprior.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsicu.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagswhyuseucs2.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsucs2andas400.htm" />
<meta name="DC.Relation" scheme="URI" content="rbagsinstallscenarios.htm" />
<meta name="DC.Relation" scheme="URI" content="http://www.unicode.org" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rbagsunicodeucs2" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Work with Unicode</title>
</head>
<body id="rbagsunicodeucs2"><a name="rbagsunicodeucs2"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Work with Unicode</h1>
<div><p><dfn class="term">Unicode</dfn> is a standard that precisely defines a character
set as well as a small number of encodings for it. It enables you to handle
text in any language efficiently. It allows a single application to work for
a global audience.</p>
<p>Before Unicode, no single encoding system covered all the numbers,
characters, and symbols in use. Different encoding systems might assign the
same number to different characters, so if data was interpreted with the wrong
encoding system, the output could be garbled or differ from what you expected.</p>
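<p>Here is a minimal illustration. It assumes Python 3 and its built-in <code>cp037</code> (EBCDIC code page 37) and <code>latin-1</code> (ISO 8859-1) codecs. The sketch decodes one byte value under two legacy encodings and gets two different characters:</p>

```python
# One byte value, interpreted under two different legacy encodings.
# cp037 is an EBCDIC code page; latin-1 is ISO 8859-1. Both codecs ship with Python.
raw = bytes([0xC1])

as_ebcdic = raw.decode("cp037")    # capital A in EBCDIC code page 37
as_latin1 = raw.decode("latin-1")  # capital A with acute accent in ISO 8859-1

print(repr(as_ebcdic), repr(as_latin1))
print(as_ebcdic == as_latin1)  # False: same byte, different characters
```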
<p>Unicode provides a unique number for every character, regardless of platform,
language, or program. Using Unicode, you can develop a software product that
works with various platforms, languages, and countries. Unicode also allows
data to be transported through many different systems. Modern systems provide
internationalization solutions based on Unicode.</p>
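<p>The following Python sketch is an illustration only, not an i5/OS interface. It encodes the same text as an EBCDIC system (code page 37) and an ASCII-based system would store it. The byte values differ, but after decoding, each character maps to the same Unicode code point:</p>

```python
# The same text as it might be stored on an EBCDIC system (code page 37)
# and on an ASCII-based system.
ebcdic_bytes = "HELLO".encode("cp037")
ascii_bytes = "HELLO".encode("ascii")

print(ebcdic_bytes.hex())  # c8c5d3d3d6
print(ascii_bytes.hex())   # 48454c4c4f

# Decoding each byte string yields identical Unicode code points.
from_ebcdic = [f"U+{ord(c):04X}" for c in ebcdic_bytes.decode("cp037")]
from_ascii = [f"U+{ord(c):04X}" for c in ascii_bytes.decode("ascii")]
print(from_ebcdic == from_ascii)  # True
```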
<p>Unicode was developed as a single coded character set that contains support
for all languages in the world. The first version of Unicode used 16-bit numbers,
which allowed for encoding 65 536 characters without complicated multibyte
schemes. As more characters were included, and to meet the implementation
needs of many different platforms, Unicode was extended to allow more than
one million characters. Several other encoding schemes were also added. This
introduced more complexity into the Unicode standard, but far less than managing
a large number of different encodings.</p>
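<p>To see how the encoding schemes differ, the following Python sketch counts the storage one character needs in UTF-8, UTF-16, and UTF-32. It uses the big-endian codec variants so that no byte order mark is added. The helper name <code>encoded_sizes</code> and the sample characters are for illustration only. A character outside the original 16-bit range, such as U+1D11E, needs two 16-bit units (a surrogate pair) in UTF-16:</p>

```python
def encoded_sizes(ch):
    """Hypothetical helper: return (UTF-8 bytes, UTF-16 code units, UTF-32 bytes)
    needed to store a single character."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")) // 2,  # -be variant: no byte order mark
        len(ch.encode("utf-32-be")),
    )

# A, e with acute, euro sign, musical symbol G clef (outside the 16-bit range)
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
    utf8, utf16_units, utf32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8} bytes, "
          f"UTF-16 {utf16_units} unit(s), UTF-32 {utf32} bytes")
```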
<p>Starting with Unicode 2.0 (published in 1996), the Unicode standard defines
a code space of numbers from 0 to 1 114 111 (hexadecimal 0 to 10FFFF). This
gives more than enough room for all written languages in the world. The original
repertoire covered all major languages commonly used in computing. Unicode
continues to grow, and new versions add more scripts.</p>
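<p>The upper limit follows from the structure of the code space: 17 planes of 65 536 code points each. The following Python check of that arithmetic assumes Python 3.3 or later, where <code>sys.maxunicode</code> is always 0x10FFFF:</p>

```python
import sys

PLANE_SIZE = 0x10000  # 65 536 code points per plane, the size of the original 16-bit range
PLANES = 17           # plane 0 (the Basic Multilingual Plane) plus 16 supplementary planes

highest = PLANES * PLANE_SIZE - 1
print(highest, hex(highest))      # 1114111 0x10ffff
print(sys.maxunicode == highest)  # True

# chr() accepts every code point in the range and rejects the next number.
try:
    chr(highest + 1)
except ValueError:
    print("1 114 112 is outside the Unicode code space")
```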
<p>The design of Unicode differs in several ways from traditional character
sets and encoding schemes:</p>
<ul><li>Its repertoire enables users to include text efficiently in almost all
languages within a single document.</li>
<li>It can be encoded in a byte-based way with one or more bytes per character,
but the default encoding scheme uses 16-bit units that allow much simpler
processing for all common characters.</li>
<li>Many characters, such as letters with accents and umlauts, can be combined
from the base character and accent or umlaut modifiers. This combining reduces
the number of different characters that need to be encoded separately. <em>Precomposed</em> variants
for characters that existed in common character sets at the time were included
for compatibility. For example, Latin small letter a (U+0061) followed by a combining
tilde (U+0303) results in <img src="la190000.gif" alt="Latin small letter a with tilde" />,
which is also encoded as the single precomposed character U+00E3.</li>
</ul>
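<p>You can check the tilde example from the list above with Python's standard <code>unicodedata</code> module. Unicode normalization, defined in Unicode Standard Annex #15, converts between the combining sequence and the precomposed character. This sketch uses the NFC (composed) and NFD (decomposed) forms:</p>

```python
import unicodedata

decomposed = "a\u0303"  # LATIN SMALL LETTER A + COMBINING TILDE (two code points)
precomposed = "\u00e3"  # LATIN SMALL LETTER A WITH TILDE (one code point)

# The strings display the same but contain different code points.
print(len(decomposed), len(precomposed))  # 2 1
print(decomposed == precomposed)          # False

# Normalization maps both spellings to one canonical form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

<p>To avoid treating the two spellings as different strings when comparing text, normalize both sides to the same form first.</p>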
<p>Characters and their usage are well-defined and described. Traditional
character sets typically provide only the name or a picture of a character
and its number and byte encoding; Unicode has a comprehensive database of
character properties. It also defines a number of processes and algorithms
for dealing with many aspects of text processing, so that different
implementations handle text consistently.</p>
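<p>Python's standard <code>unicodedata</code> module exposes part of this database, the Unicode Character Database. The following sketch uses a hypothetical helper, <code>describe</code>, to look up a few properties. The exact database version depends on the Python release:</p>

```python
import unicodedata

def describe(ch):
    """Hypothetical helper: look up a few properties of one character
    in Python's copy of the Unicode Character Database."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),      # e.g. Lu = uppercase letter, Nd = decimal digit
        "bidi": unicodedata.bidirectional(ch),     # class used by the bidirectional algorithm
        "decimal": unicodedata.decimal(ch, None),  # digit value, or None if not a decimal digit
    }

# A, ARABIC-INDIC DIGIT THREE, EURO SIGN
for ch in ["A", "\u0663", "\u20ac"]:
    print(describe(ch))
```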
<p>The early inclusion of all characters of commonly used character sets makes
Unicode a useful mechanism for converting between traditional character sets,
and makes it feasible to process non-Unicode text by first converting the
text into Unicode, processing the text, and then converting it back to the
original encoding without loss of data.</p>
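<p>The following Python sketch shows the pattern of converting to Unicode, processing, and converting back. The helper <code>process_via_unicode</code> is hypothetical, and code page 37 (<code>cp037</code>) stands in for the original EBCDIC encoding. The round trip is lossless only as long as the processed text stays within the repertoire of the original code page:</p>

```python
def process_via_unicode(legacy_bytes, codec="cp037"):
    """Hypothetical helper: convert legacy-encoded bytes to Unicode,
    process the text, and convert the result back to the original code page."""
    text = legacy_bytes.decode(codec)  # legacy encoding to Unicode
    text = text.upper()                # any Unicode-based processing step
    # Encoding back raises UnicodeEncodeError if processing produced a
    # character that the code page cannot represent.
    return text.encode(codec)          # Unicode back to the legacy encoding

original = "Caf\u00e9 order #42".encode("cp037")
result = process_via_unicode(original)
print(result.decode("cp037"))  # CAFÉ ORDER #42

# Without a processing step, decoding and re-encoding returns the same bytes.
print(original.decode("cp037").encode("cp037") == original)  # True
```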
</div>
<div>
<ul class="ullinks">
<li class="ulchildlink"><strong><a href="rbagsunicode.htm">Why use Unicode?</a></strong><br />
Unicode has many advantageous functions.</li>
<li class="ulchildlink"><strong><a href="rbagsutf8.htm">UTF-8</a></strong><br />
A Unicode Transformation Format (UTF) is the algorithmic mapping from every Unicode value to a unique byte sequence.</li>
<li class="ulchildlink"><strong><a href="rbagsutf16.htm">UTF-16</a></strong><br />
UTF-16 is an encoding of Unicode in which each character is composed of either one or two 16-bit elements.</li>
<li class="ulchildlink"><strong><a href="rbagsutf32.htm">UTF-32</a></strong><br />
UTF-32 is an encoding of Unicode in which each character is composed of 4 bytes.</li>
<li class="ulchildlink"><strong><a href="rbagsucs2.htm">UCS-2 and its relationship to Unicode</a></strong><br />
Because the UCS-2 standard is limited to 65 536 characters, and the data processing industry needs over 94 000 characters, the UCS-2 standard is in the process of being superseded by the Unicode UTF-16 standard.</li>
<li class="ulchildlink"><strong><a href="rbagsunicodeandprior.htm">How Unicode relates to prior standards such as ASCII and EBCDIC</a></strong><br />
This topic provides a historical perspective on the Unicode standard, and explains how it can reduce the complexity of handling character data in globalized applications.</li>
<li class="ulchildlink"><strong><a href="rbagsicu.htm">International Components for Unicode</a></strong><br />
The International Components for Unicode (ICU) is a C library that provides full-featured, industrial-strength Unicode support.</li>
<li class="ulchildlink"><strong><a href="rbagswhyuseucs2.htm">Mapping of data</a></strong><br />
i5/OS™ uses
the EBCDIC encoding scheme. However, not all clients attached to it use an
EBCDIC encoding scheme to store, retrieve, and process data. Therefore, some
clients use Unicode as an <em>exchange mechanism</em> that is safe across all
platforms. </li>
<li class="ulchildlink"><strong><a href="rbagsucs2andas400.htm">Unicode on i5/OS</a></strong><br />
i5/OS™ provides
support for Unicode.</li>
</ul>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rbagshandlingdata.htm" title="There are ways in which the operating system enables you to handle data in a globalized environment. These topics describe Unicode and Unicode data, the Chinese standard GB18030, how to use CCSIDs to integrate multiple language environments consistently, and how to use bidirectional data, DBCS data, and locales.">Handle data in globalized applications</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rbagsinstallscenarios.htm" title="Use these scenarios to better understand multilingual support.">Scenarios: Set up i5/OS with a national language version</a></div>
<div><a href="http://www.unicode.org" target="_blank">Unicode</a></div>
</div>
</div>
</body>
</html>