Network Working Group                                            HF. Zhu
Internet Draft: Chinese Character Encoding                    Tsinghua U
Document: internet-drafts/draft-zhu-apng-cc-encoding-v2-00.txt    DY. Hu
                                                              Tsinghua U
                                                                ZG. Wang
                                                                    CITS
                                                                 TC. Kao
                                                                     III
                                                               WC. Chang
                                                                     III
                                                              M. Crispin
                                                            U Washington

                                                               July 1995


            Chinese Character Encoding for Internet Messages

                                                                        
Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   This is a draft document of APNG-CC, the Chinese Character
   sub-working group of the I18N/L10N (Internationalization and
   Localization) working group of APNG (Asia-Pacific Networking Group).
   A revised version of this draft document will be submitted to the RFC
   editor as an Informational RFC for the Internet Community.
   Discussion and suggestions for improvement are requested, and should
   be sent to apng-cc@apng.org or zhf@net.edu.cn (the coordinator). This
   document has the same content as draft-zhu-apng-cc-encoding-v1-00.txt, but
   is written in a different way and will expire before February 1996.
   Distribution of this memo is unlimited.


Abstract

   This memo provides methods for transporting Chinese characters
   through, but not limited to, electronic mail [RFC-822] and network
   news [RFC-1036] in the Internet community.


Introduction

   As Internet use spreads among Chinese-speaking people around the
   world, the need to send documents containing Chinese characters over
   the Internet has grown.  The methods described in this document
   provide a means of transporting existing Chinese character sets while
   leaving sufficient room for future extension.

   This document describes three kinds of encodings:
      1. ISO-2022-CN and ISO-2022-CN-EXD
      2. CN-GB and CN-Big5 series
      3. ISO-10646 and Unicode
  
   ISO-2022-CN is based upon ISO-2022 [ISO-2022], similar to earlier
   work on ISO-2022-JP [RFC-1468] and ISO-2022-KR [RFC-1557] for the
   Japanese and Korean languages.  It is 7-bit, and supports both
   simplified Chinese characters using GB-2312-80 [GB-2312] and
   traditional Chinese characters using the first two planes of
   CNS-11643 [CNS-11643], as well as ASCII [ASCII] characters.

   ISO-2022-CN-EXD is an extended form of ISO-2022-CN that additionally
   supports other GB character sets and all planes of CNS-11643.

   CN-GB and CN-Big5 series character sets are 8-bit, and reflect common
   domestic usage of email on both sides of the Taiwan Strait.

   ISO-10646 and Unicode are 8-bit character sets, based on the usage of
   ISO/IEC-10646 [ISO-10646] and Unicode [Unicode 1.1].

Specification

1. 7-bit MIME character sets: ISO-2022-CN and ISO-2022-CN-EXD

 1.1  Description

   Since ISO-2022-CN and ISO-2022-CN-EXD are 7-bit encodings, they do
   not require the 8-bit SMTP extensions.  ISO-2022-CN-EXD can also be
   used to carry Big-5 [BIG-5] text, and ISO-2022-CN can carry almost
   all the characters in Big-5 (all except two seldom-used characters).
   <<?? seldom-used ??>>

 1.2  ISO-2022-CN
   
   The starting code of ISO-2022-CN is ASCII.  ASCII and Chinese characters
   are distinguished by the use of designations (ESC sequences) and shift
   functions.
   
   For example, once the sequence ESC $ ) A (four bytes, hexadecimal
   values: 1B 24 29 41) has appeared, the bytes following SO (one byte,
   hexadecimal value 0E) are Chinese characters as defined in
   GB-2312-80.  To shift back to ASCII, SI (one byte, hexadecimal value
   0F) should be used; the bytes following SI are ASCII characters.
   Here, ESC $ ) A is called a designation, and SI and SO are shift
   functions.
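
   As a minimal sketch of the example above (an illustration, not part
   of the specification), the following Python wraps runs of GB-2312-80
   characters in the designation and shift functions.  It assumes
   Python's built-in "gb2312" codec, which yields the 8-bit (MSB set)
   form of each byte, so clearing the MSB recovers the 7-bit code.  The
   function name is hypothetical, and placing the designation at the
   start of the line is a choice made here for simplicity.

```python
ESC_GB2312 = b"\x1b$)A"   # SOdesignation for GB-2312-80: ESC $ ) A
SO = b"\x0e"              # shift out: following bytes are Chinese
SI = b"\x0f"              # shift in: following bytes are ASCII


def encode_line_iso2022cn(text: str) -> bytes:
    """Encode one line (without CRLF) of ASCII plus GB-2312-80 text."""
    out = bytearray()
    if any(ord(ch) >= 0x80 for ch in text):
        out += ESC_GB2312             # designate before any SO is used
    in_gb = False
    for ch in text:
        if ord(ch) < 0x80:
            if in_gb:
                out += SI             # back to ASCII
                in_gb = False
            out += ch.encode("ascii")
        else:
            if not in_gb:
                out += SO             # switch to the designated set
                in_gb = True
            hi, lo = ch.encode("gb2312")          # 8-bit form, two bytes
            out += bytes([hi & 0x7F, lo & 0x7F])  # clear MSB: 7-bit code
    if in_gb:
        out += SI                     # the line must end in ASCII
    return bytes(out)
```

   Characters outside GB-2312-80 raise UnicodeEncodeError in this
   sketch; a fuller encoder would fall back to another designated set.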

   Designations define the Chinese character sets used in the text.
   There are three kinds of designations: SOdesignation, SS2designation
   and SS3designation.

   The SOdesignation is in the form ESC $ ) <F>, where <F> is the
   "final character" assigned to the character set by ISO (refer to the
   ISO registry [ISOREG] for more details).  The SS2designation is in
   the form ESC $ * <F>, and the SS3designation is in the form ESC $ +
   <F>.  A designation overrides any previous designation of the same
   kind for subsequent octets in the text.
   
   There are four kinds of shifts: SI, SO, SS2 and SS3.
   
   The shift SI (an octet with hexadecimal value 0F) declares that
   subsequent octets are interpreted in ASCII.

   The shift SO (an octet with hexadecimal value 0E) declares that
   subsequent octets are interpreted in the character set defined by
   the SOdesignation.

   The shift SS2 (two octets with hexadecimal values 1B 4E) declares
   that the subsequent two octets are interpreted in the character set
   defined by the SS2designation, after which the previous
   interpretation (from SI or SO) is restored.

   The shift SS3 (two octets with hexadecimal values 1B 4F) declares
   that the subsequent two octets are interpreted in the character set
   defined by the SS3designation, after which the previous
   interpretation (from SI or SO) is restored.

   As another example, the sequence ESC $ ) G indicates that the bytes
   following SO are Chinese characters from CNS-11643-plane1, until
   another SOdesignation appears.  The sequence ESC $ * H indicates that
   the two bytes immediately following each SS2 represent a character in
   CNS-11643-plane2, until another SS2designation appears.
   
   The escape sequences, shift functions and character sets used in
   ISO-2022-CN text are as follows:

   Character sets                                       Shift in with
  --------------------------------------------------------------------
    ASCII                                                    SI
    GB-2312, CNS-11643-plane1                                SO
             CNS-11643-plane2                                SS2


      ESC $ ) A   Indicates that the bytes following SO are Chinese
                  characters as defined in GB-2312-80, until another
                  SOdesignation appears

      ESC $ ) G   Indicates that the bytes following SO are as defined in
                  CNS-11643-plane1, until another SOdesignation appears

      ESC $ * H   Indicates that the two bytes immediately following SS2
                  are a Chinese character as defined in CNS-11643-plane2,
                  until another SS2designation appears
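
   The table above can be read as a small state machine.  The following
   Python is a sketch under stated assumptions, not a reference decoder:
   it tracks the current SO and SS2 designations and the shift state,
   and returns runs of (character set name, bytes).  The function and
   table names are hypothetical, and no validation beyond the final
   character lookup is attempted.

```python
SO_SETS = {ord("A"): "GB-2312-80", ord("G"): "CNS-11643-plane1"}
SS2_SETS = {ord("H"): "CNS-11643-plane2"}


def tokenize_iso2022cn(data: bytes):
    """Split ISO-2022-CN data into (charset, bytes) runs.

    Unknown final characters raise KeyError; this sketch does no
    other validation.
    """
    so_set = ss2_set = None    # current SO and SS2 designations
    shifted = False            # True between SO and SI
    runs = []

    def emit(charset, chunk):
        # Extend the previous run when the charset is unchanged.
        if runs and runs[-1][0] == charset:
            runs[-1] = (charset, runs[-1][1] + chunk)
        else:
            runs.append((charset, chunk))

    i = 0
    while i < len(data):
        if data.startswith(b"\x1b$)", i) and i + 3 < len(data):
            so_set = SO_SETS[data[i + 3]]      # SOdesignation
            i += 4
        elif data.startswith(b"\x1b$*", i) and i + 3 < len(data):
            ss2_set = SS2_SETS[data[i + 3]]    # SS2designation
            i += 4
        elif data.startswith(b"\x1bN", i):     # SS2: next two bytes only
            emit(ss2_set, data[i + 2:i + 4])
            i += 4
        elif data[i] == 0x0E:                  # SO
            shifted = True
            i += 1
        elif data[i] == 0x0F:                  # SI
            shifted = False
            i += 1
        elif shifted:                          # two-byte character
            emit(so_set, data[i:i + 2])
            i += 2
        else:                                  # single ASCII byte
            emit("ASCII", data[i:i + 1])
            i += 1
    return runs
```

   A GB-2312-80 run can then be turned into text by setting the MSB of
   each byte and decoding with Python's "gb2312" codec, for example
   bytes(b | 0x80 for b in run).decode("gb2312").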

   If there are any GB or CNS characters on a line, there must be a
   shift to ASCII before the end of the line (i.e., before the CRLF),
   because the next line starts in the character set that was shifted to
   before the end of the previous line.  In other words, each line
   starts in ASCII, and ends in ASCII.
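
   The rule above is easy to check mechanically.  The following Python
   is a minimal sketch (the function name is hypothetical) that reports
   whether a line, taken without its CRLF, is back in ASCII at its end.
   It relies on the fact that shifted bytes lie in the range 21 to 7E
   hexadecimal and so can never be mistaken for SO or SI.

```python
def line_ends_in_ascii(line: bytes) -> bool:
    """Return True if the shift state is ASCII at the end of the line."""
    shifted = False
    for b in line:
        if b == 0x0E:      # SO: shift to the designated Chinese set
            shifted = True
        elif b == 0x0F:    # SI: shift back to ASCII
            shifted = False
    return not shifted
```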
   
   The name given to this character encoding is "ISO-2022-CN". This name
   is intended to be used as the "charset" parameter in MIME messages.

       Content-Type: text/plain; charset=iso-2022-cn

   The ISO-2022-CN encoding is already in 7-bit form, so it is not
   necessary to use a Content-Transfer-Encoding header.

   ISO-2022-CN may also be used in MIME Part Two [MIME-2] headers.
   
   Other restrictions are given in the Formal Syntax of ISO-2022-CN.

 1.3  ISO-2022-CN-EXD

   ISO-2022-CN is a subset of ISO-2022-CN-EXD.  ISO-2022-CN-EXD supports
   all characters in existing GB and CNS-11643 character sets.

   The escape sequences, shift functions and character sets used in
   ISO-2022-CN-EXD text are as follows:

   Character sets                                       Shift in with
  --------------------------------------------------------------------
    ASCII                                                    SI
    GB-2312, GB-12345, CNS-11643-plane1, GB-2312+GB-8565     SO
    GB-7589, GB-13131, CNS-11643-plane2                      SS2
    GB-7590, GB-13132 or other new GBs,CNS-11643-plane3 or   SS3
     other planes of CNS-11643

      Note: Currently, there are some GB sets that have not been 
      registered in ISO. Here <X> represents the final character 
      that will be assigned by ISO for those sets.

      ESC $ ) A   Indicates that the bytes following SO are Chinese
                  characters as defined in GB-2312-80, until another
                  SOdesignation appears
      ESC $ * <X> Indicates that the two bytes immediately following SS2
                  are a Chinese character as defined in GB-7589-87
                  [GB-7589], until another SS2designation appears
      ESC $ + <X> Indicates that the two bytes immediately following SS3
                  are a Chinese character as defined in GB-7590-87
                  [GB-7590], until another SS3designation appears

      ESC $ ) <X> Indicates that the bytes following SO are as defined in
                  GB-12345-90 [GB-12345], until another SOdesignation
                  appears
      ESC $ * <X> Indicates that the two bytes immediately following SS2
                  are a Chinese character as defined in GB-13131-91
                  [GB-13131], until another SS2designation appears
      ESC $ + <X> Indicates that the two bytes immediately following SS3
                  are a Chinese character as defined in GB-13132-91
                  [GB-13132], until another SS3designation appears

      ESC $ ) E   Indicates that the bytes following SO are as defined in
                  GB-2312+GB-8565 [GB-8565], until another SOdesignation
                  appears

      ESC $ ) G   Indicates that the bytes following SO are as defined in
                  CNS-11643-plane1, until another SOdesignation appears

      ESC $ * H   Indicates that the two bytes immediately following SS2
                  are a Chinese character as defined in CNS-11643-plane2,
                  until another SS2designation appears
      ESC $ + I   Indicates that the two bytes immediately following SS3
                  are a Chinese character as defined in
                  CNS-11643-1992-plane3, until another SS3designation
                  appears
      ESC $ + J   Indicates that the two bytes immediately following SS3
                  are a Chinese character as defined in
                  CNS-11643-1992-plane4, until another SS3designation
                  appears
      ESC $ + K   Indicates that the two bytes immediately following SS3
                  are a Chinese character as defined in
                  CNS-11643-1992-plane5, until another SS3designation
                  appears
      ESC $ + L   Indicates that the two bytes immediately following SS3
                  are a Chinese character as defined in
                  CNS-11643-1992-plane6, until another SS3designation
                  appears
      ESC $ + M   Indicates that the two bytes immediately following SS3
                  are a Chinese character as defined in
                  CNS-11643-1992-plane7, until another SS3designation
                  appears

   As with ISO-2022-CN, each line should start in ASCII and end in ASCII.

   The name given to this character encoding is "ISO-2022-CN-EXD". This name
   is intended to be used as the "charset" parameter in MIME messages.

       Content-Type: text/plain; charset=iso-2022-cn-exd

   The ISO-2022-CN-EXD encoding is also in 7-bit form, so it is not
   necessary to use a Content-Transfer-Encoding header.

   ISO-2022-CN-EXD may also be used in MIME Part Two [MIME-2] headers.
   
   Other restrictions are given in the Formal Syntax of ISO-2022-CN-EXD.


 1.4  Supporting Big-5 with ISO-2022-CN and ISO-2022-CN-EXD

     <<??? Could Mr. TC.Kao complete this paragraph: give the formula of 
     conversion to and from Big-5 ? >>

2. 8-bit MIME character sets: CN-GB, CN-Big5 series

   The CN-GB and CN-Big5 series of charset names are given below.
   Among other things, these support current practice; specifically,
   CN-GB reflects the current usage for simplified Chinese e-mail, 
   and CN-Big5 reflects the current usage for traditional Chinese e-mail.

     Note: the use of 8-bit character sets requires the use of
     either an 8-to-7 Content-Transfer-Encoding mechanism such as
     "BASE64" or "QUOTED-PRINTABLE" if the network is not 8-bit clean,
     or the 8-bit SMTP extensions [SMTPEXT] with the "8BIT"
     Content-Transfer-Encoding on 8-bit clean networks.  Otherwise,
     an 8-bit message which passes through a 7-bit mailer is likely
     to have the 8th bit truncated, resulting in an unreadable
     message.  Although "just send 8-bit data" has been common
     practice in the past, it is incorrect according to the
     Internet standards and causes interoperability problems.
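
   As an illustration of the note above, the following Python sketch
   passes an 8-bit CN-GB body through the two 8-to-7 transfer encodings
   using the standard "base64" and "quopri" modules.  The sample text
   and variable names are assumptions made for the example; the
   "gb2312" codec is used to obtain the 8-bit bytes.

```python
import base64
import quopri

# CN-GB bytes: the two Chinese characters have the MSB set in every byte.
body_8bit = "\u4e2d\u6587 mail".encode("gb2312")

# Both encodings produce output that survives a 7-bit mailer.
b64 = base64.b64encode(body_8bit)
qp = quopri.encodestring(body_8bit)

# The receiver reverses the transfer encoding to recover the 8-bit body.
restored_from_b64 = base64.b64decode(b64)
restored_from_qp = quopri.decodestring(qp)
```

   BASE64 is usually the more compact choice for text that is mostly
   Chinese, while QUOTED-PRINTABLE keeps ASCII portions readable.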

 2.1   CN-GB and CN-GB-xxxxx-xx

   E-mail using GB characters is sent in this way:

   GB-2312-80 characters are used together with ASCII characters,
   not with GB-1988-80 [GB-1988] characters.

   GB-2312-80 is itself a 7-bit, two-byte set.  To avoid conflicting
   with ASCII, the MSB (bit-8) of each byte of a GB-2312-80 character is
   set to 1, so that each byte becomes an 8-bit value.  A byte whose MSB
   is 0 is interpreted as ASCII.  This constructs a character set named
   "GB Internal Code".
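
   The MSB rule can be sketched in a few lines of Python.  This is an
   illustration only; the function names are hypothetical, and the
   splitter assumes well-formed input in which every byte with the MSB
   set is half of a two-byte character.

```python
def gb7_to_cn_gb(pair: bytes) -> bytes:
    """Set the MSB of both bytes of a 7-bit GB-2312-80 code."""
    return bytes(b | 0x80 for b in pair)


def cn_gb_to_gb7(pair: bytes) -> bytes:
    """Clear the MSB of both bytes to recover the 7-bit code."""
    return bytes(b & 0x7F for b in pair)


def split_cn_gb(data: bytes):
    """Split CN-GB data into ("ASCII", byte) and ("GB", pair) items."""
    items = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:                    # MSB set: two-byte GB character
            items.append(("GB", data[i:i + 2]))
            i += 2
        else:                                 # MSB clear: one ASCII byte
            items.append(("ASCII", data[i:i + 1]))
            i += 1
    return items
```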

   This method is also adopted in the .gb files on the Internet.

   To use this character scheme with MIME, CN-GB is used as the value
   for the charset parameter:
      Content-Type: text/plain; charset=cn-gb

   There are GB character sets other than GB-2312-80.  They can be
   used with charset names of the form CN-GB-xxxxx-xx, in which xxxxx is
   the GB standard number and xx is the year of edition.  They should be
   coded in 8-bit form in the same way as CN-GB.

   Currently, these GB standards are GB-7589-87, GB-7590-87, GB-12345-90,
   GB-13131-91 and GB-13132-91.  GB-7589-87 and GB-7590-87 supplement the
   simplified characters in GB-2312-80.  Each character in GB-12345-90,
   GB-13131-91, and GB-13132-91 is the traditional Chinese character that
   corresponds to the character with the same code in GB-2312-80,
   GB-7589-87, and GB-7590-87, respectively.  Therefore, their "charset"
   names should be CN-GB-7589-87, CN-GB-7590-87, CN-GB-12345-90,
   CN-GB-13131-91 and CN-GB-13132-91.

   There is also a kind of dependent character set that can only be used
   together with one of the above sets.  For example, GB-8565 can only
   be used with GB-2312 or GB-12345.  In this case, "+" is permitted to
   appear in the charset name, e.g. CN-GB-2312-80+GB-8565-88.

   CN-GB and CN-GB-xxxxx-xx may also be used in MIME Part Two [MIME-2]
   headers.

   To avoid hindering interoperability, use of CN-GB is encouraged
   whenever possible.

 2.2   CN-Big5 and CN-Big5-<variant>-<edition>

   BIG-5 is a character set of traditional Chinese characters, widely
   used in Taiwan and overseas.  E-mail using BIG-5 characters is
   sent in this way:

   BIG-5 characters are used with ASCII characters.

   BIG-5 is a two-byte coding, in which the first byte is 7-bit,  and
   the second byte is 8-bit.  If the character is from BIG-5, the MSB
   (bit-8) of the first byte is set to 1, and therefore becomes an 8-bit
   character.  Otherwise, the byte is interpreted as ASCII.  (Big-5 uses
   the code space: [0xa1-0xfe,0x40-0x7e] and [0xa1-0xfe,0xa1-0xfe], and
   two other user areas with the first byte in the range of [0x81-0xa0].)
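
   The code space quoted above can be expressed as a simple range
   check.  The following Python is a sketch with hypothetical function
   names; it covers only the two standard ranges and deliberately
   ignores the user areas whose first byte is in [0x81-0xa0].  The
   splitter shows why a reader must consume both bytes together: the
   second byte of a Big-5 character may fall in the ASCII range.

```python
def is_big5_pair(first: int, second: int) -> bool:
    """Check a byte pair against the standard Big-5 code space.

    The user areas with a first byte in 0x81-0xa0 are not covered.
    """
    return 0xA1 <= first <= 0xFE and (
        0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE
    )


def split_cn_big5(data: bytes):
    """Split CN-Big5 data into ("ASCII", byte) and ("BIG5", pair) items."""
    items = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:                    # MSB set: first byte of a pair
            items.append(("BIG5", data[i:i + 2]))
            i += 2                            # second byte may look like ASCII
        else:
            items.append(("ASCII", data[i:i + 1]))
            i += 1
    return items
```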

   To use this character scheme with MIME, CN-Big5 is used as the value
   for the charset parameter:
      Content-Type: text/plain; charset=cn-big5

   The <variant> and <edition> fields indicate the manufacturer's version
   and the year of edition, which some implementations might want to
   check.  These are optional.

   CN-Big5 and CN-Big5-<variant>-<edition> may also be used in MIME
   Part Two [MIME-2] headers.

   To avoid hindering interoperability, use of CN-Big5 (as opposed to
   a variant) is encouraged whenever possible.


3. 8-bit MIME character sets:  Unicode, ISO-10646

   Many Chinese characters are supported by Unicode and ISO-10646, and
   can therefore also be transferred in Unicode or ISO-10646 form.  For
   details of using Unicode with MIME, refer to RFC-1641 [RFC-1641] and
   RFC-1642 [RFC-1642].  For assigned names for ISO-10646 sets, refer to
   RFC-1700 [RFC-1700].


Background Information

1. Use of Chinese in Chinese-speaking nations and regions

   The mainland provinces of China use simplified Chinese characters in
   daily life.  GB is the standard electronic character set.  It is the
   main means of communication between people around the world who use
   simplified Chinese characters.

   Taiwan uses traditional Chinese characters in daily life. Big-5 is 
   a widely-used character set of traditional Chinese characters, and 
   the de-facto industrial standard in Taiwan, while CNS-11643 is a
   formal way for information interchange.

   Hong Kong uses traditional Chinese characters in daily life, but uses
   both GB and Big-5 in electronic form, because Hong Kong people often
   communicate with people in all of China's provinces.

   Singapore seldom uses Chinese characters, and uses the simplified
   form when Chinese characters are used.  In electronic form, Unicode
   is more popular; however, GB is also used.

2. About ISO-2022-CN and ISO-2022-CN-EXD

   ISO-2022-CN and ISO-2022-CN-EXD encodings are based on the usage of 
   ISO 2022.

3. Miscellaneous about Chinese character sets

   The GB-1988-80 character set is identical to ISO 646 [ISO-646] except
   that the currency symbol and the tilde are replaced by the Yuan sign
   and a short line.  This set is GB's variant of ISO 646.  This
   character set and CNS-5205 [CNS-5205] are not encouraged for use on
   the Internet, since ASCII combined with GB-2312 or with
   CNS-11643 plane 1 and plane 2 comprises all characters in them.

   The GB-2312-80 character set consists of simplified Chinese
   characters, digits, Latin, Greek and Russian alphabets, and some
   other symbols; in all, 7445 characters.  Each character is two bytes.

   CNS-11643 is a character set used in Taiwan.  It contains several
   character planes (sub-character sets), each arranged as a 94x94
   character set.  Plane 1 and Plane 2 contain nearly the same
   characters as the Big-5 character set and the same code sequences,
   with some additional symbols.  <<??move to 1.5 of the "Specification"
   part"??>>


4. Miscellaneous implementation information

   For maximum interoperability, implementations SHOULD at least support
   sending and receiving ISO-2022-CN.  Supporting all registered character
   sets in ISO-2022-CN-EXD is strongly encouraged.

   It is also desirable to support CN-GB (the status quo for
   simplified Chinese e-mail) and CN-Big5 (the status quo for traditional
   Chinese e-mail).  However, sending ISO-2022-CN messages is encouraged
   whenever possible.
   
   To the maximum extent possible, implementations should be capable of
   displaying messages in any of the encodings introduced in this document,
   even if they only transmit messages in one form.  Ideally, this would be 
   done by shifting to the appropriate font (e.g. on X-windows displays) but
   suitable translation tables may also be used.

   <<??Misunderstandings that ESC cannot pass different softwares??>>
   
   The human user (not implementor) should try to keep lines within 80
   display columns, or, preferably, within 75 (or so) columns, to allow
   insertion of ">" at the beginning of each line in excerpts. Each
   Chinese character takes up two columns, and the shift sequences do
   not take up any columns. The implementor is reminded that Chinese
   characters take up two bytes and should not be split in the middle to
   break lines for displaying, etc.
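
   The column rule above can be sketched as follows for one
   ISO-2022-CN line.  This Python is an illustration under stated
   assumptions: the function name is hypothetical, the input is one
   well-formed line without its CRLF, and every byte that is not part
   of a designation or shift is counted as one column, so that a
   two-byte Chinese character contributes two columns.

```python
def display_columns(line: bytes) -> int:
    """Count the display columns of one ISO-2022-CN line."""
    cols = 0
    i = 0
    while i < len(line):
        if line.startswith(b"\x1b$", i):
            i += 4                    # designation: four bytes, no columns
        elif line.startswith((b"\x1bN", b"\x1bO"), i):
            i += 2                    # SS2 or SS3: two bytes, no columns
        elif line[i] in (0x0E, 0x0F):
            i += 1                    # SO or SI: no columns
        else:
            cols += 1                 # ASCII byte, or half of a 2-byte char
            i += 1
    return cols
```

   A line-breaking routine built on this count would additionally have
   to break only between characters, never between the two bytes of a
   Chinese character.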


X.400 Considerations

   X.400 is able to carry different character sets in a message by
   using the body part "GeneralText" defined by ISO/IEC-10021-7
   [ISO-10021].

   The X.400 ASN.1 definition of the GeneralText body part is:

      general-text-body-part EXTENDED-BODY-PART-TYPE
        PARAMETERS GeneralTextParameters IDENTIFIED BY id-ep-general-text
        DATA       GeneralTextData
        ::= id-et-general-text

      GeneralTextParameters ::= SET OF CharacterSetRegistration

      CharacterSetRegistration ::= INTEGER (1..32767)

      GeneralTextData ::= GeneralString

   Therefore, to use ISO-2022-CN, simply set the "CharacterSetRegistration"
   part to 6,58,171,172.
   <<??Ask the author of RFC-1502--Harald Tveit Alvestrand-- to see whether 
   the numbers should be quoted in ( and ) ??But mail sent to Harald--
   Harald.Alvestrand@delab.sintef.no-- got bounced, could Mark help ?>>

   Similarly, to use ISO-2022-CN-EXD, set the registered numbers of all
   character sets used in the "CharacterSetRegistration" part.  For the
   registered numbers, please refer to the ISO registry [ISOREG].  Besides
   the character sets supported by ISO-2022-CN, the currently registered
   numbers are:

     GB-2312+GB-8565:   ???
     CNS-11643-plane-3: ???
     CNS-11643-plane-4: ???
     CNS-11643-plane-5: ???
     CNS-11643-plane-6: ???
     CNS-11643-plane-7: ???

   For ISO-10646 and Unicode, <<?????>>

   For the CN-GB and CN-Big5 series of character sets, there is currently
   no formal method that can be used in X.400.

   For details about X.400 use of character sets, please refer to
   RFC-1502 [RFC-1502].

Formal Syntax of ISO-2022-CN and ISO-2022-CN-EXD

1.  Formal Syntax of ISO-2022-CN

   The notational conventions used here are identical to those used in
   RFC 822.

   body            ::= *e_line *( *designator *( e_line / h_line ))

   designator      ::= SOdesignator / SS2designator / SS3designator

   SOdesignator ::= ESC "$" ")" final_char

   SS2designator ::= ESC "$" "*" final_char

   SS3designator ::= ESC "$" "+" final_char

   final_char  ::=  final_char_gb / final_char_cns

   final_char_gb ::="A"

   final_char_cns  ::= "G" / "H"

   e_line          ::= *text CRLF

   h_line          ::= *text 1*( segment *text ) CRLF

   segment         ::= ( SO / SS2 / SS3 ) 1*(one_of_94 one_of_94) SI

                                                     ; ( Octal, Decimal.)

   ESC             ::= <ISO-646 ESC, escape>         ; ( 33, 27.)

   SI              ::= <ASCII SI, shift in>          ; ( 17, 15.)

   SO              ::= <ASCII SO, shift out>         ; ( 16, 14.)

   SS2             ::= <ISO 2022 Single_shift two>   ; ( 33 116, 27 78.)

   SS3             ::= <ISO 2022 Single_shift three> ; ( 33 117, 27 79.)

   SP              ::= <ASCII SP, space>             ; ( 40, 32.)

   one_of_94       ::= <any char in 94_char set>     ; (41-176, 33-126.)

   CHAR            ::= <any ASCII character>         ; ( 0-177, 0.-127.)

   text            ::= <any CHAR, including bare CR & bare LF, but NOT
                       including CRLF, ESC, SI, or SO>  
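
   The line-level rules above can be transcribed into a regular
   expression.  The following Python is a sketch, not a normative
   validator: the names are hypothetical, it checks one line at a time
   without its CRLF, and it accepts designators only at the start of a
   line because the "body" rule places them between lines.  It follows
   the grammar literally, including the SI that the "segment" rule
   requires after SS2 and SS3.

```python
import re

# text: any CHAR except SO (0E), SI (0F) and ESC (1B)
TEXT = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"
# designator: ESC $ followed by ) * or + and a final character
DESIGNATOR = rb"\x1b\$[)*+][AGH]"
# segment: SO, SS2 or SS3, then one or more two-byte characters, then SI
SEGMENT = rb"(?:\x0e|\x1bN|\x1bO)(?:[\x21-\x7e]{2})+\x0f"

LINE = re.compile(
    rb"(?:" + DESIGNATOR + rb")*"
    + TEXT + rb"*(?:" + SEGMENT + TEXT + rb"*)*"
)


def is_valid_line(line: bytes) -> bool:
    """Match one line, without its CRLF, against the grammar above."""
    return LINE.fullmatch(line) is not None
```

   This check is purely syntactic; it does not verify that a shift is
   preceded by the matching designation.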


2.  Formal Syntax of ISO-2022-CN-EXD

   The notational conventions used here are identical to those used in
   RFC 822.

   body            ::= *e_line *( *designator *( e_line / h_line ))

   designator      ::= SOdesignator / SS2designator / SS3designator

   SOdesignator ::= ESC "$" ")" final_char

   SS2designator ::= ESC "$" "*" final_char

   SS3designator ::= ESC "$" "+" final_char

   final_char  ::=  final_char_gb / final_char_cns

   final_char_gb ::="A" / "E" / <X>

   final_char_cns  ::= "G" / "H" / "I" / "J" / "K" / "L" / "M"

   e_line          ::= *text CRLF

   h_line          ::= *text 1*( segment *text ) CRLF

   segment         ::= ( SO / SS2 / SS3 ) 1*(one_of_94 one_of_94) SI

                                                     ; ( Octal, Decimal.)

   ESC             ::= <ISO-646 ESC, escape>         ; ( 33, 27.)

   SI              ::= <ASCII SI, shift in>          ; ( 17, 15.)

   SO              ::= <ASCII SO, shift out>         ; ( 16, 14.)

   SS2             ::= <ISO 2022 Single_shift two>   ; ( 33 116, 27 78.)

   SS3             ::= <ISO 2022 Single_shift three> ; ( 33 117, 27 79.)

   SP              ::= <ASCII SP, space>             ; ( 40, 32.)

   one_of_94       ::= <any char in 94_char set>     ; (41-176, 33-126.)

   CHAR            ::= <any ASCII character>         ; ( 0-177, 0.-127.)

   text            ::= <any CHAR, including bare CR & bare LF, but NOT
                       including CRLF, ESC, SI, or SO>  


References

   [ASCII] American National Standards Institute, "Coded character set
   -- 7-bit American National Standard Code for Information
   Interchange", ANSI X3.4-1986.

   [BIG-5] Institute for Information Industry, "Chinese Coded
   Character Set in Computer", March 1984.

   [CNS-5205] "Information processing -- 7-Bit Coded Character Set For
   Information Interchange", CNS-5205.

   [CNS-11643] "Chinese Standard Interchange Code", CNS-11643 version 
   1992; "Standard Interchange Code for Generally-Used Chinese 
   Characters", CNS-11643 version 1986.

   [GB-1988] "7-bit Coding Character Set for Information Interchange", 
   GB-1988-80.

   [GB-2312] "Coding of Chinese Ideogram Set for Information Interchange
   Basic Set",  GB-2312-80.

   [GB-7589] "Code of Chinese Ideograms Set for Information Interchange, 
   the 2nd Supplementary Set", UDC 681.3.048, GB 7589-87.

   [GB-7590] "Code of Chinese Ideogram Set for Information Interchange, 
   the 4th Supplementary Set",UDC 681.3.048, GB 7590-87.

   [GB-8565] "Information Processing Coded Character Sets for Text 
   Communication", UDC 681.3,  GB-8565-88.

   [GB-12345] "Code of Chinese Ideogram Set for Information Interchange
   Supplementary Set", GB/T 12345-90.

   [GB-13131] "Code of Chinese Ideogram Set for Information Interchange, 
   the 3rd Supplementary Set", GB-13131-91.

   [GB-13132] "Code of Chinese Ideogram Set for Information Interchange,
   the 5th Supplementary Set", GB-13132-91.

   [ISO-646] International Organization for Standardization (ISO),
   "Information technology -- ISO 7-bit coded character set for
   information interchange", International Standard, Ref. No. ISO/IEC
   646:1991.

   [ISO-2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character sets
   -- Code extension techniques", International Standard, Ref. No. ISO
   2022-1986 (E).

   [ISO-10021] Information Technology - Text communication - 
   Message-Oriented Text Interchange Systems (MOTIS), ISO 10021,
   October 1988.

   [ISO-10646] ISO/IEC 10646-1:1993(E) Information Technology--Universal
   Multiple-octet Coded Character Set (UCS).

   [ISOREG] International Organization for Standardization (ISO),
   "International Register of Coded Character Sets To Be Used With
   Escape Sequences".

   [MIME-1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet
   Mail Extensions) Part One: Mechanisms for Specifying and Describing
   the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
   September 1993.

   [MIME-2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
   Part Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
   University of Tennessee, September 1993.

   [RFC-822] Crocker, D., "Standard for the Format of ARPA Internet Text
   Messages", STD 11, RFC 822, University of Delaware, August 1982.

   [RFC-1036] Horton M., and Adams, R., "Standard for Interchange of
   USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
   Seismic Studies, December 1987.

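   [RFC-1468] Murai, J., Crispin, M., and van der Poel, E., "Japanese
   Character Encoding for Internet Messages", RFC 1468, June 1993.

   [RFC-1502] Alvestrand, H., "X.400 Use of Extended Character Sets",
   RFC 1502, August 1993.

   [RFC-1557] Choi, U., Chon, K., and Park, H., "Korean Character
   Encoding for Internet Messages", RFC 1557, December 1993.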

   [RFC-1641] Goldsmith, D., and Davis, M., "Using Unicode with MIME",
   RFC 1641, Taligent Inc., July 1994.

   [RFC-1642] Goldsmith, D., and Davis, M., "UTF-7, A Mail-Safe
   Transformation Format of Unicode", RFC 1642, July 1994.

   [RFC-1700] Reynolds, J., and Postel, J., "Assigned Numbers", STD 2,
   RFC 1700, ISI, October 1994.

   [SMTP] Postel, Jonathan B. "Simple Mail Transfer Protocol", STD 10,
   RFC 821, USC/Information Sciences Institute, August 1982.

   [SMTPEXT] Klensin, J.; Freed, N.; Rose, M.; Stefferud, E.; and
   Crocker, D., "SMTP Service Extensions", RFC 1651, July 1994.

   [Unicode 1.1] "The Unicode Standard, Version 1.1": Version 1.0,
   Volume 1 (ISBN 0-201-56788-1), Version 1.0, Volume 2 (ISBN
   0-201-60845-6), and "Unicode Technical Report #4, The Unicode
   Standard, Version 1.1" (available from The Unicode Consortium, and
   soon to be published by Addison-Wesley).

Acknowledgments

   This document is the result of cooperation in the APNG-CC, the
   Chinese Character sub-working group of the I18N/L10N
   (Internationalization and Localization) working group of APNG
   (Asia-Pacific Networking Group).  The membership of APNG-CC
   consists of individuals from both sides of the Taiwan Strait,
   Hong Kong, and from Singapore and other countries.  The authors
   wish to thank all members of APNG-CC.

   Prof. Yao Shiquan and Ms. Lin Ning of CITS (China Information Technology
   Standardization Technical Committee), and Prof. Zhao Jingrong, Prof. Li
   Xing, and Mr. YouYue of Tsinghua University gave much help in the
   course of the work.

   Many thanks to Mr. C.J.Cherng and Mr. C.K.Fan of III (Institute for 
   Information Industry), and Mr. Chang JingShin from Tsinghua University 
   in Hsinchu, Taiwan.

   In particular, Mr. Masataka Ohta, who is the coordinator of APNG-I18N,
   contributed much effort to the work from the beginning of APNG-CC.

   The authors also wish to thank the following people who contributed
   in many ways towards this draft.

   Martin J Duerst            Yuan Jiang
   Kenichi Handa              Stephen G Simpson
   Zhang Ling
   Zhu Bin
   Nelson Chin
   Lu Chin
   Ding ZyKaan
   Zhang ZhouCai
   Feng Hui
   Chen Shuyi
   Lua Kim Teng
   Victor Cheng
   Ken Lunde
   
   <<< More names ?? >>>

Security Considerations

   Security issues are not discussed in this memo.

Authors' Addresses

   Zhu,Hai-feng  (HF. Zhu)
   Dept. of Computer Science & Technology
   Tsinghua University
   Beijing, 100084
   China

   Tel: +86-1-2561144 ext. 3492
   Fax: +86-1-2564173
   Email: zhf@net.edu.cn


   Hu,Dao-yuan  (DY. Hu)
   Tsinghua Networking Center
   Tsinghua University
   Beijing, 100084
   China

   Tel: +86-1-2594016
   Fax: +86-1-2564173
   Email: hdy@tsinghua.edu.cn

   Wang,Zhi-guan  (ZG. Wang)
   SC2 Division <<???>>
   China Information Technology Standardization Technical Committee
   (CITS)
   Beijing, 100083
   China

   Tel: +86-1-4012392
   Fax: +86-1-4010601


   Kao,Tien-cheu (TC. Kao)
   I.T. Promotion Division
   Institute for Information Industry(III) 
   Taipei
   Taiwan

   Tel: +886-2-5631688
   Fax: +886-2-563-4209
   Email: tckao@iiidns.iii.org.tw


   Chang,Wen-chung  (WC. Chang)
   Institute for Information Industry(III) 
   Taipei
   Taiwan

   Tel: +886-2-7327771
   Fax: +886-2-7370188
   Email: chung@iiidns.iii.org.tw


   Mark R. Crispin
   Networks and Distributed Computing
   University of Washington
   4545 15th Avenue NE
   Seattle, WA  98105-4527
   USA

   Tel: +1 (206) 543-5762
   Fax: +1 (206) 685-4045
   Email: MRC@CAC.Washington.EDU