Unicode Consortium's Format A

Standard character sets in the Unicode Consortium's Format A are available from ftp://ftp.unicode.org/Public/MAPPINGS. To locate the character set you need, use one of the following methods:

  • If the character set is a standard Microsoft Windows code page, change to the VENDORS/MICSFT/WINDOWS directory.
  • If the character set is an ISO 8859 character set, change to the ISO8859 directory.
  • If the character set is neither of the above, explore the other directories in ftp://ftp.unicode.org/Public/MAPPINGS to locate it.
  • If you cannot find the character set you need, you must define a nonstandard character set or modify an existing Relativity character set.
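
The lookup order above can be sketched as a small helper. This is only an illustration: the function name and the "windows" and "iso8859" keys are hypothetical, and only the directory names come from the list above.

```python
# Base location of the Format A mapping files, as given in the text.
BASE = "ftp://ftp.unicode.org/Public/MAPPINGS"

# Hypothetical keys; the directory names are the ones listed above.
SUBDIRECTORIES = {
    "windows": "VENDORS/MICSFT/WINDOWS",  # standard Microsoft Windows code pages
    "iso8859": "ISO8859",                 # ISO 8859 character sets
}


def mapping_directory(kind: str) -> str:
    """Return the directory to search first for the given kind of character set.

    Any other kind falls back to the top-level directory, which then has to
    be explored by hand.
    """
    subdirectory = SUBDIRECTORIES.get(kind)
    return f"{BASE}/{subdirectory}" if subdirectory else BASE
```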

Relativity augments this format with Charset and EndCharset clauses so that more than one character set can appear in a single character set source file.

A character set source file is a plain text file with the .cs extension. Comments may appear anywhere in the file; each comment begins with the hash character (#) and extends to the end of the line. A numeric value that begins with '0x' is a hexadecimal constant, one that begins with '0' alone is an octal constant, and one that begins with '1' through '9' is a decimal constant. Strings are delimited by double quotes ("").
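
The lexical rules above can be sketched in Python as follows. This is a minimal illustration, not the parser Relativity uses: the function names are hypothetical, and two behaviors are assumptions that the text does not state, namely that a '#' inside a quoted string does not start a comment and that an uppercase '0X' prefix is also accepted.

```python
def strip_comment(line: str) -> str:
    """Remove a '#' comment from a line.

    Assumption: a '#' inside a double-quoted string is part of the string.
    """
    in_string = False
    for index, char in enumerate(line):
        if char == '"':
            in_string = not in_string
        elif char == "#" and not in_string:
            return line[:index].rstrip()
    return line.rstrip()


def parse_number(token: str) -> int:
    """Convert a numeric token to an int using the prefix rules above."""
    if token[:2] in ("0x", "0X"):  # accepting '0X' is an assumption
        return int(token[2:], 16)  # hexadecimal constant
    if len(token) > 1 and token.startswith("0"):
        return int(token[1:], 8)   # octal constant
    return int(token, 10)          # decimal constant (or a lone '0')
```

For example, parse_number("0x41"), parse_number("0101"), and parse_number("65") all yield the same value, 65.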

The file may contain one or more character sets. The format of each character set is:

Charset "Character Set Name" character-set-number
code-point-value unicode-value
code-point-value unicode-value
.
.
.
EndCharset

where:

Character Set Name is a string that is the name of the character set to display in the Relativity Database Administrator.

character-set-number is the number to assign to the character set. For character sets installed via the Relativity Database Administrator, this value is ignored and assumed to be zero.

code-point-value is the 8-bit numeric value of the glyph in the character set. A code-point-value that is not listed in the character set definition is assumed to be unused.

unicode-value is the 16-bit numeric value of the glyph in the Unicode standard. This value may also be a Unicode constant, in which case it is delimited by a less-than symbol (<) and a greater-than symbol (>). The most commonly used of these is the constant <NOT USED>, which indicates that the code point does not have a glyph assigned to it.
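
To make the layout concrete, the following Python sketch reads a character set source file laid out as described above. It is an illustration under stated assumptions rather than Relativity's own parser: the function names, the returned dictionary layout, and the sample data are hypothetical; comments are stripped with a plain split on '#', so the sketch assumes a character set name never contains '#'; an uppercase '0X' prefix is accepted as a convenience; and constants other than <NOT USED> are kept as raw text because only that one is named here.

```python
import re

# Matches: Charset "Character Set Name" character-set-number
HEADER = re.compile(r'^Charset\s+"([^"]*)"\s+(\S+)$')


def to_int(token):
    """Apply the 0x (hexadecimal), leading 0 (octal), and decimal rules."""
    token = token.lower()
    if token.startswith("0x"):
        return int(token[2:], 16)
    if len(token) > 1 and token.startswith("0"):
        return int(token[1:], 8)
    return int(token, 10)


def parse_charsets(text):
    """Return one dict per Charset ... EndCharset block found in text."""
    charsets, current = [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumes no '#' inside names
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            if current is not None:
                raise ValueError("Charset inside an unterminated Charset")
            current = {"name": header.group(1),
                       "number": to_int(header.group(2)),
                       "map": {}}
        elif line == "EndCharset":
            if current is None:
                raise ValueError("EndCharset without Charset")
            charsets.append(current)
            current = None
        else:
            if current is None:
                raise ValueError("mapping line outside a Charset block")
            code_token, value_token = line.split(None, 1)
            code = to_int(code_token)
            if not 0 <= code <= 0xFF:
                raise ValueError("code-point-value must fit in 8 bits")
            if value_token.startswith("<") and value_token.endswith(">"):
                # None marks <NOT USED>; other constants stay as raw text.
                value = None if value_token == "<NOT USED>" else value_token
            else:
                value = to_int(value_token)
                if not 0 <= value <= 0xFFFF:
                    raise ValueError("unicode-value must fit in 16 bits")
            current["map"][code] = value
    if current is not None:
        raise ValueError("missing EndCharset")
    return charsets


# Hypothetical sample data, not taken from a real mapping file.
SAMPLE = '''\
# Two-column mappings with trailing comments
Charset "Sample Set" 0
0x41 0x0041   # hexadecimal code point
0102 0x0042   # octal code point (66)
67   0x0043   # decimal code point
0x80 <NOT USED>
EndCharset
'''

sample_sets = parse_charsets(SAMPLE)
```

Code points that are absent from the resulting "map" dictionary correspond to unlisted code-point-values, which the format treats as unused.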