FTPEXT Working Group                                          B. Curtin
INTERNET DRAFT                       Defense Information Systems Agency
Expires 26 May 1997                                    26 November 1996


          Internationalization of the File Transfer Protocol

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   Distribution of this document is unlimited. Please send comments to
   the FTP Extension working group (FTPEXT-WG) of the Internet
   Engineering Task Force (IETF) at . Subscription address is .
   Discussions of the group are archived at .

Abstract

   The File Transfer Protocol, as defined in RFC 959 [RFC959] and RFC
   1123 Section 4 [RFC1123], is one of the oldest and most widely used
   protocols on the Internet. The protocol's primary character set,
   7 bit ASCII, has served the protocol well through the early growth
   years of the Internet. However, as the Internet becomes more
   global, there is a need to support character sets beyond 7 bit
   ASCII.

   This document addresses the internationalization (I18n) of FTP,
   which includes supporting the multiple character sets found
   throughout the Internet community. This is achieved by extending
   the FTP specification and giving recommendations for proper
   internationalization support.

Expires 26 May 1997                                            [Page 1]

INTERNET DRAFT        FTP Internationalization        26 November, 1996

Table of Contents

   1. INTRODUCTION
   1.1 SCOPE
   2.0 INTERNATIONALIZATION
   2.1 INTERNATIONAL CHARACTER SET
   2.2 TRANSFER ENCODING
   2.3 TRANSLATIONS
   2.3.1 ISO/IEC 8859-8 EXAMPLE
   2.3.2 VENDOR CODEPAGE EXAMPLE
   3. CONFORMANCE
   3.1 INTERNATIONAL SERVERS
   3.1.1 SERVER STRATEGIES EXAMPLES
   3.2 INTERNATIONAL CLIENTS
   4.0 SECURITY
   5.0 ACKNOWLEDGEMENTS
   BIBLIOGRAPHY
   AUTHOR'S ADDRESS

1. Introduction

   As the Internet grows throughout the world the requirement to
   support character sets outside of the ASCII / Latin-1 character set
   becomes ever more urgent. For FTP, because of the large installed
   base, it is paramount that this be done without breaking existing
   clients and servers. This document addresses this need. In doing so
   it defines a solution which will still allow the installed base to
   interoperate with new international clients and servers.

1.1 Scope

   This document enhances the capabilities of the File Transfer
   Protocol by defining a Universal Character Set (UCS), a UCS
   transformation format (UTF), and removing the 7-bit restrictions on
   pathnames used in client commands and server responses.

2.0 Internationalization

   The File Transfer Protocol was developed in a period when the
   predominant character sets were 7 bit ASCII and 8 bit EBCDIC. Today
   these character sets cannot support the wide range of characters
   needed by multinational systems. Given that there are a number of
   character sets in current use that provide more characters than
   7-bit ASCII, it makes sense to decide on a convenient way to
   represent the union of those possibilities. Working globally
   requires either supporting a number of character sets and
   translating between them, or using a single preferred character
   set. To assure interoperability this document recommends the latter
   approach and defines a single character set, in addition to NVT
   ASCII and EBCDIC, which is understandable by all systems. For FTP
   this character set will be ISO/IEC 10646:1993 and the UTF-8
   encoding. For support of global compatibility it is strongly
   recommended that clients and servers use UTF-8 encoding when
   performing operations on filenames. Clients and servers are,
   however, under no obligation to perform any translation on the
   contents of a file for operations such as STOR or RETR.

   A more thorough description of UTF-8, ISO/IEC 10646, and UNICODE,
   beyond what is given in this document, can be found in RFC 2044
   [RFC2044].

2.1 International Character Set

   The character set defined for international support of FTP shall be
   the Universal Character Set as defined in ISO 10646:1993
   [ISO-10646] as amended. This standard incorporates the script and
   symbol character sets of many existing international, national, and
   corporate standards. ISO/IEC 10646 defines two alternate forms of
   encoding, UCS-4 and UCS-2. UCS-4 is a four byte (31 bit) encoding
   containing 2**31 code positions divided into 128 groups of 256
   planes. Each plane consists of 256 rows of 256 cells.
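
   The group/plane/row/cell structure of a UCS-4 code can be read
   directly off its four octets. The following is a minimal sketch,
   not part of the specification; the names ucs4_parts, ucs4_split
   and ucs4_is_valid are hypothetical and chosen only for
   illustration.

```c
/* Hypothetical helper types and functions, for illustration only. */
struct ucs4_parts {
    unsigned int group;  /* 0..127: the top bit of UCS-4 is always zero */
    unsigned int plane;  /* 0..255 */
    unsigned int row;    /* 0..255 */
    unsigned int cell;   /* 0..255 */
};

/* A UCS-4 code occupies at most 31 bits. */
int ucs4_is_valid(unsigned long code)
{
    return code <= 0x7FFFFFFFUL;
}

/* Split a UCS-4 code into its group, plane, row and cell octets. */
struct ucs4_parts ucs4_split(unsigned long code)
{
    struct ucs4_parts p;

    p.group = (unsigned int) ((code >> 24) & 0x7FUL);
    p.plane = (unsigned int) ((code >> 16) & 0xFFUL);
    p.row   = (unsigned int) ((code >> 8) & 0xFFUL);
    p.cell  = (unsigned int) (code & 0xFFUL);
    return p;
}
```

   For example, the code 0x000005D5 splits into group 0, plane 0,
   row 0x05 and cell 0xD5.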

   UCS-2 is a 2 byte (16 bit) character set consisting of plane zero
   or the Basic Multilingual Plane (BMP). Currently, no codesets have
   been defined outside of the 2 byte BMP.

   The Unicode standard version 2.0 [UNICODE] is consistent with the
   UCS-2 subset of ISO/IEC 10646. The Unicode standard version 2.0
   includes the repertoire of IS 10646 characters, amendments 1-7 of
   IS 10646, and editorial and technical corrigenda.

   NOTE -- implementers should be aware that ISO 10646 is amended from
   time to time; 4 amendments have been adopted since the initial 1993
   publication, none of which significantly affects this
   specification. A fifth amendment, now under consideration, will
   introduce incompatible changes to the standard: 6556 Korean Hangul
   syllables allocated between code positions 3400 and 4DFF
   (hexadecimal) will be moved to new positions (and 4516 new
   syllables added), thus making references to the old positions
   invalid. Since the Unicode consortium has already adopted the
   corresponding amendment in Unicode 2.0, adoption of DAM 5 is
   considered likely and implementers should probably consider the old
   code positions as already invalid. Despite this one-time change,
   the relevant standards bodies have committed themselves not to
   change any allocated code position in the future. To encode Korean
   Hangul irrespective of these changes, the conjoining Hangul Jamo in
   the range 1110-11F9 can be used.

2.2 Transfer Encoding

   UCS Transformation Format 8 (UTF-8) [UTF-8], also known as UTF-2,
   will be used as a transfer encoding to transmit the international
   character set. UTF-8 is a file safe encoding which avoids the use
   of byte values which have special significance during the parsing
   of file name character strings. UTF-8 is an 8 bit encoding of the
   characters in the UCS.
   Some of UTF-8's benefits are that it is compatible with 7 bit
   ASCII, so it doesn't affect programs that give special meanings to
   various ASCII characters; it is immune to synchronization errors;
   and it has enough space to support large character sets.

   UTF-8 encoding represents each UCS character as a sequence of 1 to
   6 bytes. For all sequences of one byte the most significant bit is
   ZERO. For all sequences of more than one byte the number of ONE
   bits in the first byte, starting from the most significant bit
   position and followed by a ZERO bit, indicates the number of bytes
   in the UTF-8 sequence. For example, the first byte of a 3 byte
   UTF-8 sequence would have 1110 as its most significant bits. Each
   additional byte (continuing byte) in the UTF-8 sequence contains a
   ONE bit followed by a ZERO bit as its most significant bits. The
   remaining free bit positions in the first byte and the continuing
   bytes are used to identify characters in the UCS. The relationship
   between UCS and UTF-8 is demonstrated in the following table:

   UCS-4 range            UTF-8 byte sequence

   0000 0000-0000 007F    0xxxxxxx
   0000 0080-0000 07FF    110xxxxx 10xxxxxx
   0000 0800-0000 FFFF    1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-001F FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   0020 0000-03FF FFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   0400 0000-7FFF FFFF    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
                          10xxxxxx

   A beneficial property of UTF-8 is that its single byte sequence is
   consistent with the ASCII character set. This feature will allow a
   transition where old ASCII-only clients can still interoperate with
   new servers which support the UTF-8 encoding.

   Another feature is that the encoding rules make it very unlikely
   that a character sequence from a different character set will be
   mistaken for a UTF-8 encoded character sequence.
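
   The range column of the table above translates directly into a
   length calculation. The following is a minimal sketch, not part of
   the specification; the function name utf8_seq_len is hypothetical.

```c
/* Hypothetical helper: number of UTF-8 bytes needed to encode one
   UCS-4 code, following the ranges in the table above. Returns 0 for
   values above 0x7FFFFFFF, which are outside the 31 bit UCS-4 space. */
int utf8_seq_len(unsigned long ucs4)
{
    if (ucs4 <= 0x0000007FUL) return 1;
    if (ucs4 <= 0x000007FFUL) return 2;
    if (ucs4 <= 0x0000FFFFUL) return 3;
    if (ucs4 <= 0x001FFFFFUL) return 4;
    if (ucs4 <= 0x03FFFFFFUL) return 5;
    if (ucs4 <= 0x7FFFFFFFUL) return 6;
    return 0;
}
```

   A caller might sum utf8_seq_len over a UCS-4 string to size an
   output buffer before encoding; six bytes per character is always a
   safe upper bound.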

   Clients and servers can use a simple routine to determine whether
   the byte sequence being exchanged is valid UTF-8:

   int utf8_valid(const unsigned char *buf, unsigned int len)
   {
      const unsigned char *endbuf = buf + len;
      int trailing = 0; /* trailing (continuation) bytes to follow */

      while (buf != endbuf)
      {
         unsigned char c = *buf++;

         if (trailing)
         {
            /* Expecting a continuation byte of the form 10xxxxxx */
            if ((c & 0xC0) == 0x80) trailing--;
            else return 0;
         }
         else if ((c & 0x80) == 0x00) continue;     /* 1 byte (ASCII) */
         else if ((c & 0xE0) == 0xC0) trailing = 1; /* 2 byte lead */
         else if ((c & 0xF0) == 0xE0) trailing = 2; /* 3 byte lead */
         else if ((c & 0xF8) == 0xF0) trailing = 3; /* 4 byte lead */
         else if ((c & 0xFC) == 0xF8) trailing = 4; /* 5 byte lead */
         else if ((c & 0xFE) == 0xFC) trailing = 5; /* 6 byte lead */
         else return 0;
      }
      /* A sequence cut off before its last byte is not valid */
      return trailing == 0;
   }

2.3 Translations

   Translation from the local filesystem character set to UTF-8 will
   normally involve a two step process. First translate the local
   character set to the UCS; then translate the UCS to UTF-8.

   The first step in the process can be performed by maintaining a
   translation table which includes the local character set code and
   the corresponding UCS code. For instance the ISO/IEC 8859-8
   [ISO-8859] code for the Hebrew letter "VAV" is 0xE5. The
   corresponding 4 byte ISO/IEC 10646 code is 0x000005D5.

   The next step is to translate the UCS character code to the UTF-8
   encoding.
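
   Before turning to the UTF-8 step, the table lookup described as the
   first step above can be sketched as follows. This is an
   illustration only: the names charmap_entry, sample_8859_8_map and
   local_to_ucs4 are hypothetical, and the table lists just three
   ISO/IEC 8859-8 Hebrew letters rather than the full character set.

```c
/* Hypothetical excerpt of an ISO/IEC 8859-8 to UCS mapping table.
   A real table would cover every code the local character set
   defines. */
struct charmap_entry {
    unsigned char local;   /* code in the local character set */
    unsigned long ucs4;    /* corresponding UCS-4 code */
};

const struct charmap_entry sample_8859_8_map[] = {
    { 0xE0, 0x000005D0UL },  /* HEBREW LETTER ALEF */
    { 0xE1, 0x000005D1UL },  /* HEBREW LETTER BET */
    { 0xFA, 0x000005EAUL }   /* HEBREW LETTER TAV */
};

/* Look up one local code. Returns 1 and stores the UCS-4 code in
   *ucs4 on success, or returns 0 if the code is not in the table.
   Codes below 0x80 are treated as ASCII and map to themselves. */
int local_to_ucs4(unsigned char local, unsigned long *ucs4)
{
    unsigned int i;
    unsigned int n = sizeof sample_8859_8_map / sizeof sample_8859_8_map[0];

    if (local < 0x80)
    {
        *ucs4 = (unsigned long) local;
        return 1;
    }
    for (i = 0; i < n; i++)
    {
        if (sample_8859_8_map[i].local == local)
        {
            *ucs4 = sample_8859_8_map[i].ucs4;
            return 1;
        }
    }
    return 0;
}
```

   A linear search keeps the sketch short. Since the local code is a
   single byte, a production table would more likely be a 256 entry
   array indexed directly by that byte.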

   The following routine can be used to determine and encode the
   correct number of bytes based on the UCS-4 character code:

   /* Returns the number of bytes written to utf8_buf. The caller must
      supply a buffer large enough for the result (at most 6 bytes per
      UCS-4 character). Values above 0x7FFFFFFF are skipped. */
   int ucs4_to_utf8 (const unsigned long *ucs4_buf,
                     unsigned int ucs4_len, unsigned char *utf8_buf)
   {
      const unsigned long *ucs4_endbuf = ucs4_buf + ucs4_len;
      unsigned char *utf8_start = utf8_buf;
      unsigned long ucs4_ch;

      while (ucs4_buf != ucs4_endbuf)
      {
         ucs4_ch = *ucs4_buf++;

         if ( ucs4_ch <= 0x7FUL )
            /* ASCII chars no conversion needed */
            *utf8_buf++ = (unsigned char) ucs4_ch;
         else if ( ucs4_ch <= 0x07FFUL )
         {  /* In the 2 byte utf-8 range */
            *utf8_buf++ = (unsigned char) (0xC0UL + (ucs4_ch/0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
         }
         else if ( ucs4_ch <= 0xFFFFUL )
         {  /* In the 3 byte utf-8 range. The values 0x0000FFFE,
               0x0000FFFF and 0x0000D800 - 0x0000DFFF do not occur
               in UCS-4 */
            *utf8_buf++ = (unsigned char) (0xE0UL + (ucs4_ch/0x1000UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x40UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
         }
         else if ( ucs4_ch <= 0x1FFFFFUL )
         {  /* In the 4 byte utf-8 range */
            *utf8_buf++ = (unsigned char) (0xF0UL +
                          (ucs4_ch/0x040000UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x1000UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x40UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
         }
         else if ( ucs4_ch <= 0x03FFFFFFUL )
         {  /* In the 5 byte utf-8 range */
            *utf8_buf++ = (unsigned char) (0xF8UL +
                          (ucs4_ch/0x01000000UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x040000UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x1000UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x40UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
         }
         else if ( ucs4_ch <= 0x7FFFFFFFUL )
         {  /* In the 6 byte utf-8 range */
            *utf8_buf++ = (unsigned char) (0xFCUL +
                          (ucs4_ch/0x40000000UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x01000000UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x040000UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x1000UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL +
                          ((ucs4_ch/0x40UL)%0x40UL));
            *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
         }
      }
      return (int) (utf8_buf - utf8_start);
   }

   When moving from UTF-8 encoding to the local character set the
   reverse procedure is used. First the UTF-8 encoding is transformed
   into the UCS-4 character set. The UCS-4 is then converted to the
   local character set from a translation table (i.e. the opposite of
   the table used to form the UCS-4 character code).

   To convert from UTF-8 to UCS-4 the free bits (those that do not
   define UTF-8 sequence size or signify continuation bytes) in a
   UTF-8 sequence are concatenated as a bit string. The bits are then
   distributed into a four byte sequence starting from the least
   significant bits. Those bits not assigned a bit in the four byte
   sequence are padded with ZERO bits.

   The following routine converts the UTF-8 encoding to UCS-4
   character codes:

   /* Returns the number of UCS-4 codes written to ucs4_buf, or -1 if
      a byte that cannot start a UTF-8 sequence is found. The input
      should first be checked with utf8_valid, since continuation
      bytes are not verified here. */
   int utf8_to_ucs4 (unsigned long *ucs4_buf, unsigned int utf8_len,
                     const unsigned char *utf8_buf)
   {
      const unsigned char *utf8_endbuf = utf8_buf + utf8_len;
      unsigned long *ucs4_start = ucs4_buf;

      while (utf8_buf != utf8_endbuf)
      {
         if ((*utf8_buf & 0x80) == 0x00)
         {  /* ASCII chars no conversion needed */
            *ucs4_buf++ = (unsigned long) *utf8_buf;
            utf8_buf++;
         }
         else if ((*utf8_buf & 0xE0) == 0xC0)
         {  /* In the 2 byte utf-8 range */
            *ucs4_buf++ = ((*utf8_buf - 0xC0) * 0x40UL)
                        + ( *(utf8_buf+1) - 0x80);
            utf8_buf += 2;
         }
         else if ((*utf8_buf & 0xF0) == 0xE0)
         {  /* In the 3 byte utf-8 range */
            *ucs4_buf++ = ((*utf8_buf - 0xE0) * 0x1000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+2) - 0x80);
            utf8_buf += 3;
         }
         else if ((*utf8_buf & 0xF8) == 0xF0)
         {  /* In the 4 byte utf-8 range */
            *ucs4_buf++ = ((*utf8_buf - 0xF0) * 0x040000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x1000UL)
                        + (( *(utf8_buf+2) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+3) - 0x80);
            utf8_buf += 4;
         }
         else if ((*utf8_buf & 0xFC) == 0xF8)
         {  /* In the 5 byte utf-8 range */
            *ucs4_buf++ = ((*utf8_buf - 0xF8) * 0x01000000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x040000UL)
                        + (( *(utf8_buf+2) - 0x80) * 0x1000UL)
                        + (( *(utf8_buf+3) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+4) - 0x80);
            utf8_buf += 5;
         }
         else if ((*utf8_buf & 0xFE) == 0xFC)
         {  /* In the 6 byte utf-8 range */
            *ucs4_buf++ = ((*utf8_buf - 0xFC) * 0x40000000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x01000000UL)
                        + (( *(utf8_buf+2) - 0x80) * 0x040000UL)
                        + (( *(utf8_buf+3) - 0x80) * 0x1000UL)
                        + (( *(utf8_buf+4) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+5) - 0x80);
            utf8_buf += 6;
         }
         else
            return -1; /* not a valid first byte */
      }
      return (int) (ucs4_buf - ucs4_start);
   }

2.3.1 ISO/IEC 8859-8 Example

   This example demonstrates mapping the ISO/IEC 8859-8 character set
   to UTF-8 and back to ISO/IEC 8859-8.
   As noted earlier, the Hebrew letter "VAV" is translated from the
   ISO/IEC 8859-8 character code 0xE5 to the corresponding 4 byte
   ISO/IEC 10646 code of 0x000005D5 by a simple lookup of a
   translation/mapping file.

   The UCS-4 character code is transformed into UTF-8 using the
   ucs4_to_utf8 routine described earlier by:

   1. Because the UCS-4 character is between 0x80 and 0x07FF it will
      map to a 2 byte UTF-8 sequence.
   2. The first byte is defined by (0xC0 + (0x000005D5 / 0x40)) =
      0xD7.
   3. The second byte is defined by (0x80 + (0x000005D5 % 0x40)) =
      0x95.

   The UTF-8 encoding is transferred back to UCS-4 by using the
   utf8_to_ucs4 routine described earlier by:

   1. Because the first byte of the sequence, when the '&' operator
      with a value of 0xE0 is applied, will produce 0xC0 (0xD7 & 0xE0
      = 0xC0) the UTF-8 is a 2 byte sequence.
   2. The four byte UCS-4 character code is produced by (((0xD7 -
      0xC0) * 0x40) + (0x95 - 0x80)) = 0x000005D5.

   Finally, the UCS-4 character code is translated to the ISO/IEC
   8859-8 character code (using the translation table which matches
   ISO/IEC 8859-8 to UCS-4) to produce the original 0xE5 code for the
   Hebrew letter "VAV".

2.3.2 Vendor Codepage Example

   This example demonstrates the mapping of a codepage to UTF-8 and
   back to a vendor codepage. Mapping between vendor codepages can be
   done in a very similar manner as described above. For instance both
   the PC and Mac codepages reflect the character set from the Thai
   standard TIS 620-2533. The character code on both platforms for the
   Thai letter "SO SO" is 0xAB. This character can then be mapped into
   the UCS-4 by way of a translation/mapping file to produce the UCS-4
   code of 0x0E0B.

   The UCS-4 character code is transformed into UTF-8 using the
   ucs4_to_utf8 routine described earlier by:

   1. Because the UCS-4 character is between 0x0800 and 0xFFFF it will
      map to a 3 byte UTF-8 sequence.
   2. The first byte is defined by (0xE0 + (0x00000E0B / 0x1000)) =
      0xE0.
   3. The second byte is defined by (0x80 + ((0x00000E0B / 0x40) %
      0x40)) = 0xB8.
   4. The third byte is defined by (0x80 + (0x00000E0B % 0x40)) =
      0x8B.

   The UTF-8 encoding is transferred back to UCS-4 by using the
   utf8_to_ucs4 routine described earlier by:

   1. Because the first byte of the sequence, when the '&' operator
      with a value of 0xF0 is applied, will produce 0xE0 (0xE0 & 0xF0
      = 0xE0) the UTF-8 is a 3 byte sequence.
   2. The four byte UCS-4 character code is produced by (((0xE0 -
      0xE0) * 0x1000) + ((0xB8 - 0x80) * 0x40) + (0x8B - 0x80)) =
      0x00000E0B.

   Finally, the UCS-4 character code is translated to either the PC or
   Mac codepage character code (using the translation table which
   matches the codepage to UCS-4) to produce the original 0xAB code
   for the Thai letter "SO SO".

3. Conformance

   File names are sequences of bytes. The character set of names that
   are valid UTF-8 sequences is UTF-8. The character set of other
   names is undefined.

   Conforming internationalized clients and servers must either
   support UTF-8 or support a local character set which is supported
   by both the client and server. Clients and servers, unless
   otherwise configured to support a specific native character set,
   should check for a valid UTF-8 byte sequence to determine if the
   pathname being presented is UTF-8.

3.1 International Servers

   The 7-bit restriction on pathnames used in server responses is
   dropped.

   If servers and clients are not configured to share the same
   character set, servers should use UTF-8 encoding for all pathname
   transfers. There are several plausible UTF-8 server implementation
   strategies:

   - A server that copies filenames transparently from a local
     filesystem may continue to do so. It is then up to the local file
     creators to use UTF-8 filenames.

   - A server may translate filenames from a local character set to
     UTF-8.
     Each filename will be translated to UTF-8 before it is sent to
     the client.

   - UTF-8 filenames received from the client must be translated back
     if possible.

   Many existing servers interpret 8-bit filenames as being in the
   local character set. They may continue to do so for filenames that
   are not valid UTF-8.

   A high-quality translating server will use the following procedure:

      If fn is valid UTF-8 and can be translated to the local
      character set:
         Translate fn to the local character set, obtaining localfn.
         Attempt to operate on localfn.
         Upon success: Stop.
         Upon temporary error: Return an error message to the client.
         Stop.
         Attempt to operate on fn.
         Upon temporary error: Return an error message to the client.
         Stop.
      Otherwise:
         Attempt to operate on fn.
         Upon temporary error: Return an error message to the client.
         Stop.

3.1.1 Server Strategies Examples

   There are a number of server strategies which might be employed:

   - Server's OS uses one fixed character set. In this case, the
     server should easily be able to support built-in translation to
     UTF-8. This is trivial where that fixed character set is ASCII,
     ISO 8859/1, or UTF-8.

   - Server supports charset labeling of files and/or directories,
     such that different file names may have different charsets. The
     server should attempt to translate all file names to UTF-8, but
     if it can't then it should leave that name in its raw form.

   - Server's OS does not mandate the character set, but the
     administrator configures it in the FTP server. The server should
     be configured to use a particular translation table. (Maybe
     external, but the server might have some common choices
     built-in.) This also allows the flexibility of defining different
     charsets for different directories.

   - Server's OS does not mandate the character set and it is not
     configured. The server should simply use the raw bytes in the
     file name. They might be ASCII or UTF-8.

   - Server is a mirror, and wants to look just like the site it is
     mirroring. It should save the exact file name bytes that it
     received from the main server.

3.2 International Clients

   The 7-bit restriction on pathnames used by client commands is
   dropped.

   While clients are not obligated to support all of the characters or
   the associated glyphs defined in the UCS, clients which are
   presented UTF-8 filenames by the server should parse UTF-8
   correctly, and attempt to display the filename within the
   limitation of the resources available. Unknown UTF-8 glyphs might
   be displayed as question marks, or hex, or something else. This is
   a quality-of-implementation issue.

   Client developers should be aware that it will be possible for
   pathnames to contain mixed characters (e.g.
   /Latin1DirectoryName/HebrewFileName). They should be prepared to
   handle the Bi-directional (BIDI) display of these character sets
   (i.e. left to right display for the directory and right to left
   display for the filename).

   Character semantics of other names shall remain undefined. If a
   client detects that a server is non-UTF-8, it should change its
   display appropriately. How a client implementation handles
   non-UTF-8 is a quality-of-implementation issue. It may try to
   assume some other encoding, give the user a chance to try to assume
   something, or save encoding assumptions for a server from one FTP
   session to another.

   Client implementation notes: Many existing clients interpret 8-bit
   filenames as being in the local character set. They may continue to
   do so for filenames that are not valid UTF-8.

4.0 Security

   This document addresses the support of character sets beyond 1
   byte. Conformance to this document should not induce a security
   threat.

5.0 Acknowledgements

   The following people have contributed to this document:

      Alex Belits
      D. J. Bernstein
      Martin J. Duerst
      Mark Harris
      Paul Hethmon
      Alun Jones
      James Matthews
      Keith Moore
      Benjamin Riefenstahl
      (and others from the FTPEXT working group)

Bibliography

   [ISO-8859]  ISO 8859. International standard -- Information
               processing -- 8-bit single-byte coded graphic character
               sets
               -- Part 1: Latin alphabet No. 1 (1987)
               -- Part 2: Latin alphabet No. 2 (1987)
               -- Part 3: Latin alphabet No. 3 (1988)
               -- Part 4: Latin alphabet No. 4 (1988)
               -- Part 5: Latin/Cyrillic alphabet (1988)
               -- Part 6: Latin/Arabic alphabet (1987)
               -- Part 7: Latin/Greek alphabet (1987)
               -- Part 8: Latin/Hebrew alphabet (1988)
               -- Part 9: Latin alphabet No. 5 (1989)
               -- Part 10: Latin alphabet No. 6 (1992)

   [ISO-10646] ISO/IEC 10646-1:1993. International standard --
               Information technology -- Universal multiple-octet
               coded character set (UCS) -- Part 1: Architecture and
               basic multilingual plane.

   [RFC959]    J. Postel, J. Reynolds, "File Transfer Protocol (FTP)",
               RFC 959, October 1985.

   [RFC1123]   R. Braden, "Requirements for Internet Hosts --
               Application and Support", RFC 1123, October 1989.

   [RFC2044]   F. Yergeau, "UTF-8, a transformation format of Unicode
               and ISO 10646", RFC 2044, October 1996.

   [UNICODE]   The Unicode Consortium, "The Unicode Standard - Version
               2.0", Addison-Wesley Developers Press, July 1996.

   [UTF-8]     ISO/IEC 10646-1:1993 AMENDMENT 2 (1996). UCS
               Transformation Format 8 (UTF-8).

Author's Address

   JIEO
   Attn JEBBD (Bill Curtin)
   Ft. Monmouth, N.J. 07703-5613
   curtinw@ftm.disa.mil