Network Working Group                                          Jun Murai
Internet Draft                                              Mark Crispin
                                                       Erik van der Poel
                                                     10th September 1992


           Japanese Character Encoding for Internet Messages


Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   This draft document will be submitted to the RFC editor as an
   informational document. This document will expire before 10th
   March 1993. Distribution of this memo is unlimited. Please send
   comments to ietf-822@dimacs.rutgers.edu.

Introduction

   This document describes the encoding used in the bodies of
   electronic mail and network news messages in several Japanese
   networks. It was first specified by and used in JUNET [JUNET]. The
   encoding is now also widely used in Japanese IP communities.

   This document provides a name for the encoding which is intended to
   be used in the "charset" parameter field of MIME [MIME] messages and
   RFC 1342 [RFC1342] headers.

   This document only describes the encoding of plain text. The
   encoding of other subtypes of text, such as rich text, is not
   discussed here.
Murai et al              Expires 10th March 1993                [Page 1]

Internet Draft                               Updated 10th September 1992

Description

   The message body starts in ASCII, and switches to Japanese
   characters through an escape sequence. For example, the escape
   sequence ESC $ B (three bytes, hexadecimal values: 1B 24 42)
   indicates that the bytes following this escape sequence are Japanese
   characters, which are encoded in two bytes each. To switch back to
   ASCII, the escape sequence ESC ( B is used.

   The following table gives the escape sequences and the character
   sets used in JUNET messages.

       ESC ( B    ASCII
       ESC ( J    JIS X 0201-1976 ("Roman" set)
       ESC $ @    JIS X 0208-1978
       ESC $ B    JIS X 0208-1983

   The "Roman" character set of JIS X 0201-1976 is identical to ASCII
   except for backslash (\) and tilde (~). The backslash is replaced by
   the Yen sign, and the tilde is replaced by macron (overline). This
   set is Japan's national variant of ISO 646.

   The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana
   and some other symbols and characters. Each character takes up two
   bytes.

   For further details about the JIS Japanese national character set
   standards, refer to the JIS standards themselves. For further
   information about the escape sequences, see ISO 2022 [ISO2022].

   If there are JIS X 0208 characters on a line, there must be a switch
   to ASCII or to the "Roman" set of JIS X 0201 before the end of the
   line (i.e. before the CRLF). This means that the next line starts in
   the character set that was switched to before the end of the
   previous line.

   Other restrictions are given in the Formal Syntax below.

Formal Syntax

   The notational conventions used here are identical to those used in
   RFC 822 [RFC822].

   The * (asterisk) convention is as follows:

       l*m something

   meaning at least l and at most m somethings, with l and m taking
   default values of 0 and infinity, respectively.
   line                = *text *1( *segment single-byte-seq *text ) CRLF

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*text

   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )

                                        ; (  Octal, Decimal.)

   ESC                 =                ; (     33,      27.)

   SI                  =                ; (     17,      15.)

   SO                  =                ; (     16,      14.)

   one-of-94           =                ; ( 41-176, 33.-126.)

   CHAR                =                ; (  0-177,  0.-127.)

   text                =

MIME and RFC 1342 Considerations

   The name given to the JUNET character encoding is "ISO-2022-JP".
   This name is intended to be used in MIME messages as follows:

       Content-Type: text/plain; charset=iso-2022-jp

   The JUNET encoding is already in 7-bit form, so it is not necessary
   to use a Content-Transfer-Encoding header. It should be noted that
   applying the Base64 or Quoted-Printable encoding will render the
   message unreadable in current JUNET software.

   The name ISO-2022-JP may also be used in RFC 1342 headers, though in
   this case, the text should be encoded using either the "B" or "Q"
   encoding, to avoid getting damaged by header-processing software. As
   ISO-2022-JP text often contains many bytes that have a special
   meaning in headers, it is probably easier to use the "B" encoding,
   rather than trying to determine which particular byte values need
   "Q" encoding.

Background Information

   The JUNET encoding was described in the JUNET User's Guide [JUNET]
   (JUNET Riyou No Tebiki Dai Ippan).

   The encoding is based on the particular usage of ISO 2022 announced
   by 4/1 (see [ISO2022] for details). However, the escape sequence
   normally used for this announcement is not included in JUNET
   messages.

   The so-called half-width (hankaku) Katakana, that is, the Kana set
   of JIS X 0201-1976, are not used in JUNET messages.
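
   The following Python sketch is an illustration only and is not part
   of the JUNET specification. It shows one way a receiver might split
   a line into segments using just the four escape sequences tabulated
   in the Description, rejecting any other escape sequence and any line
   that ends in a two-byte set. The names split_segments, SINGLE_BYTE
   and DOUBLE_BYTE are hypothetical. The checks are deliberately looser
   than the Formal Syntax: empty segments are tolerated and SI/SO are
   not examined.

```python
# Escape sequences from the table in the Description section.
SINGLE_BYTE = {b"\x1b(B": "ASCII", b"\x1b(J": "JIS X 0201-1976 Roman"}
DOUBLE_BYTE = {b"\x1b$@": "JIS X 0208-1978", b"\x1b$B": "JIS X 0208-1983"}


def split_segments(line, start="ASCII"):
    """Split one line (CRLF already removed) into (charset, bytes) pairs.

    `start` names the single-byte set in effect when the line begins.
    Raises ValueError for an escape sequence outside the table, a
    malformed two-byte character, or a line that ends in a two-byte set.
    """
    charset, double = start, False
    segments, payload = [], bytearray()
    i = 0
    while i < len(line):
        if line[i] == 0x1B:
            seq = bytes(line[i:i + 3])
            if seq in DOUBLE_BYTE:
                new_charset, new_double = DOUBLE_BYTE[seq], True
            elif seq in SINGLE_BYTE:
                new_charset, new_double = SINGLE_BYTE[seq], False
            else:
                raise ValueError(f"escape sequence not in the table: {seq!r}")
            if payload:  # close the segment collected so far
                segments.append((charset, bytes(payload)))
                payload.clear()
            charset, double = new_charset, new_double
            i += 3
        elif double:
            pair = line[i:i + 2]
            # Both bytes must be one-of-94 (decimal 33 to 126).
            if len(pair) != 2 or not all(0x21 <= b <= 0x7E for b in pair):
                raise ValueError("malformed two-byte character")
            payload += pair
            i += 2
        else:
            payload.append(line[i])
            i += 1
    if double:
        raise ValueError("line ends without switching to a single-byte set")
    if payload:
        segments.append((charset, bytes(payload)))
    return segments


# Three JIS X 0208-1983 characters (U+65E5 U+672C U+8A9E) followed by
# the ASCII string " text", written out byte by byte.
sample = b"\x1b$BF|K\\8l\x1b(B text"
for name, data in split_segments(sample):
    print(name, data)

# Python's standard "iso2022_jp" codec decodes the same bytes.
print(sample.decode("iso2022_jp") == "\u65e5\u672c\u8a9e text")

# An escape sequence outside the table is rejected.
try:
    split_segments(b"\x1b(Habc")
except ValueError as err:
    print("rejected:", err)
```

   Returning the raw bytes of each segment, rather than decoded text,
   keeps the sketch independent of any particular conversion table; a
   caller that relays the message can then pass the original bytes on
   unchanged.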
   In the past, some systems erroneously used the escape sequence
   ESC ( H in JUNET messages. This escape sequence is officially
   registered for a Swedish character set, and should not be used in
   JUNET messages.

   Some systems do not distinguish between ESC ( B and ESC ( J or
   between ESC $ @ and ESC $ B for display. However, when relaying a
   message to another system, the escape sequences must not be altered
   in any way.

   The human user (not implementor) should try to keep lines within 80
   display columns, or, preferably, within 75 (or so) columns, to allow
   insertion of ">" at the beginning of each line in excerpts. Each
   JIS X 0208 character takes up two columns, and the escape sequences
   do not take up any columns. The implementor is reminded that
   JIS X 0208 characters take up two bytes and should not be split in
   the middle to break lines for displaying, etc.

   The JIS X 0208 standard was revised in 1990, to add two characters
   at the end of the table. Although ISO 2022 specifies special
   additional escape sequences to indicate the use of revised character
   sets, it is suggested here not to make use of this special escape
   sequence in ISO-2022-JP text, even if the two characters added to
   JIS X 0208 in 1990 are used.

References

   [ISO2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character
   sets -- Code extension techniques", International Standard, 1986,
   Ref. No. ISO 2022-1986 (E)

   [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
   Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
   User's Guide (First Edition)"), February 1988

   [MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
   Internet Mail Extensions): Mechanisms for Specifying and Describing
   the Format of Internet Message Bodies", Proposed (Internet)
   standard, June 1992, RFC 1341

   [RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
   Text Messages", Internet standard, August 1982, RFC 822

   [RFC1342] Keith Moore, "Representation of Non-ASCII Text in Internet
   Message Headers", Proposed (Internet) standard, June 1992, RFC 1342

Security Considerations

   Security considerations are not discussed in this memo.

Acknowledgements

   Many people assisted in drafting this document. The authors wish to
   thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
   Handa.

Authors' Addresses

   Jun Murai
   Keio University
   5322 Endo, Fujisawa
   Kanagawa 252 Japan

   Fax: +81 (466) 49-1101
   EMail: jun@wide.ad.jp

   Mark Crispin
   Panda Programming
   6158 Lariat Loop NE
   Bainbridge Island, WA 98110-2098
   USA

   Phone: +1 (206) 842-2385
   EMail: MRC@PANDA.COM

   Erik M. van der Poel
   A-105 Park Avenue
   4-4-10 Ohta, Kisarazu
   Chiba 292 Japan

   Phone: +81 (438) 22-5836
   Fax: +81 (438) 22-5837
   EMail: erik@poel.juice.or.jp