Expires on 5-March-2003                                    H. Nussbacher
Request for Comments: XXXX                      Israeli Inter-University
Category: Informational                                  Computer Center
Obsoletes: RFC 1555                                          Y. Bourvine
                                      The Hebrew University of Jerusalem
                                                          September 2002


            Hebrew Character Encoding for Internet Messages
             draft-nussbacher-bourvine-hebrew-email-02.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   This memo provides information for the Internet community.  This
   memo does not specify an Internet standard of any kind.
   Distribution of this memo is unlimited.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes the encoding used in electronic mail for
   transferring Hebrew over the Internet.

Introduction

   Hebrew needs special treatment when it is sent in electronic mail,
   for the following reasons:

   - The Hebrew character set in common use today occupies the "upper
     half" (values 128-255) of an 8-bit code table, so the correct
     character set must be specified for the text to be rendered
     correctly.

   - Because Hebrew is written right to left, the directionality must
     be specified so that the text is displayed correctly and not
     reversed.

   This memo is based on the Israeli standard [IS-1904] and is
   compatible with it.  This memo makes use of MIME [RFC2045] and
   ISO-8859-8.
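
   The two points in the Introduction can be illustrated with a short
   Python sketch.  It is only an illustration, not part of this memo:
   the sample bytes and variable names are chosen for the example, and
   Python's "iso8859_8" codec is assumed to implement ISO-8859-8.

```python
# Four bytes from the "upper half" of an 8-bit code table.
raw = bytes([0xF9, 0xEC, 0xE5, 0xED])

# Read as ISO-8859-8, the bytes are the Hebrew letters shin, lamed,
# vav, mem sofit, in the order a person would type them.
as_hebrew = raw.decode("iso8859_8")

# Read as ISO-8859-1, the very same bytes are accented Latin letters.
as_latin1 = raw.decode("latin-1")

# A display that ignores directionality shows the letters reversed.
reversed_text = as_hebrew[::-1]

print(as_hebrew, as_latin1, reversed_text)
```

   Without a charset label the receiver cannot tell the first reading
   from the second, and without a stated directionality it cannot tell
   the first from the third.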
I-D              Hebrew Character Encoding   Expires: March 2003  Page 2

   This document replaces the specification of RFC 1555.  The default
   directionality of a composed Hebrew message has changed from visual
   to implicit, and a receiving entity needs to understand both visual
   and implicit messages.  Explicit directionality has been removed,
   as it was never used.

Requirements Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119.

Description

   All Hebrew text transferred via e-mail MUST first be translated
   into ISO-8859-8 and then encoded using either Quoted-Printable
   (preferable) or Base64, as defined in MIME.

   The following table lists the four most common Hebrew encodings.
   ISO-8859-8 is the dominant encoding; the others may still be found
   on a few old legacy systems.

   Name:      Letter        PC       IBM      "old     ISO
                            (DOS)             code"    8859-8
   Encoding:                8-bit             7-bit    8-bit
                            ASCII    EBCDIC   ASCII    ASCII
   CodePage:                862      424
              -----------   -----    ------   -----    ------
              alef          128      41        96      224
              bet           129      42        97      225
              gimel         130      43        98      226
              dalet         131      44        99      227
              he            132      45       100      228
              vav           133      46       101      229
              zayin         134      47       102      230
              het           135      48       103      231
              tet           136      49       104      232
              yod           137      51       105      233
              kaf sofit     138      52       106      234
              kaf           139      53       107      235
              lamed         140      54       108      236
              mem sofit     141      55       109      237
              mem           142      56       110      238
              nun sofit     143      57       111      239
              nun           144      58       112      240
              samekh        145      59       113      241
              ayin          146      62       114      242
              pe sofit      147      63       115      243
              pe            148      64       116      244
              tsadi sofit   149      65       117      245
              tsadi         150      66       118      246
              qof           151      67       119      247
              resh          152      68       120      248
              shin          153      69       121      249
              tav           154      71       122      250

   Note: All values are decimal, except for the EBCDIC column, which
   is hexadecimal.

   The default directionality of the text is logical (implicit).
   This means that the Hebrew text is encoded according to the
   directionality of the characters involved, and is transmitted in
   the same order in which a person would type it.  The methods for
   controlling directionality are supported and are covered in the
   complementary RFC 1556, "Handling of Bi-directional Texts in MIME".

   The algorithm used to convert from logical to visual order is the
   Unicode bidirectional algorithm.  This algorithm is used to
   reformat the text for display on a terminal that is not
   Hebrew-aware.

   Discussion of Hebrew in email, as well as of Hebrew in other TCP/IP
   protocols, takes place on the ilan-h@vm.tau.ac.il list.  To
   subscribe, send mail to listserv@vm.tau.ac.il with one line of text
   as follows:

      subscribe ilan-h firstname lastname

Character set

   Because MIME headers have no directionality field, it was decided
   to superimpose the directionality on the character set.  Thus, the
   following character sets are available:

      ISO-8859-8     for visual directionality
      ISO-8859-8-i   for implicit (logical) directionality

MIME Considerations

   Mail that contains Hebrew SHOULD carry at least the following MIME
   headers:

      MIME-Version: 1.0
      Content-type: text/plain; charset=ISO-8859-8-i
      Content-transfer-encoding: BASE64 | Quoted-Printable

   Users SHOULD keep their text within 72 columns so as to allow email
   quoting via the prefixing of each line with a ">".  Users SHOULD
   also realize that not all MIME implementations handle email quoting
   properly, so quoting email that contains Hebrew text may lead to
   problems.

   In the future, when all email systems implement fully transparent
   8-bit email as defined in [STD0010] and [RFC1652], this standard
   will become partially obsolete.
   The "Content-type:" field will still be necessary, as will
   directionality (which might be implicit for 8BIT, but is something
   for future discussion), but the "Content-transfer-encoding" field
   will be altered to use 8BIT rather than Base64 or Quoted-Printable.

Optional

   It is RECOMMENDED, although not REQUIRED, to support Hebrew
   encoding in mail headers as specified in RFC 2047.  Specifically,
   the Q-encoding format, and not the B-encoding format, is to be the
   default method for encoding Hebrew in Internet mail headers.

Conformance

   A conforming sender MUST use logical order
   (charset=ISO-8859-8-i).

   A conforming receiver MUST be able to properly decode logical order
   encoding (charset=ISO-8859-8-i) and SHOULD be able to properly
   decode visual order encoding (charset=ISO-8859-8).  The latter
   supports older software that implemented the RFC 1555 visual mode.

Caveats

   Within Israel there are in excess of 40 Listserv lists which will
   now start using Hebrew for part of their conversations.  Normally,
   Listserv delivers mail from a distribution list with a "shortened"
   header, one that does not include the extra MIME headers.  The MIME
   encoding is then left intact, and the user agent's decoding
   software is not able to interpret the mail.  Each user is able to
   customize how Listserv delivers mail.  For lists that contain
   Hebrew, users SHOULD send mail to Listserv with the following
   command:

      set listname full

   where listname is the name of the list for which the user wants
   full, unabridged headers.  This updates the user's private entry,
   and all subsequent mail from that list arrives with full RFC822
   headers, including the MIME headers.

   In addition, Listserv usually maintains automatic archives of all
   postings to a list.  These archives, contained in the file
   "listname LOGyymm", do not contain the MIME headers, so all
   encoding information is lost.  This is a limitation of the
   Listserv software.
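
   The MIME Considerations and Conformance rules above can be sketched
   in Python as follows.  This is a minimal illustration under stated
   assumptions, not a reference implementation: the function names
   build_hebrew_message and read_hebrew_message are hypothetical,
   Python's "iso8859_8" codec is assumed to perform the translation
   into ISO-8859-8, and only ASCII subjects are handled (RFC 2047
   header encoding is not shown).

```python
import base64
import quopri
from email.message import Message

# Both charset labels carry the same byte values; only the order of
# the characters in the message differs.
DIRECTIONALITY = {"iso-8859-8": "visual", "iso-8859-8-i": "implicit"}


def build_hebrew_message(text, subject):
    """Compose a logical-order message with the minimum MIME headers."""
    raw = text.encode("iso8859_8")  # translate to ISO-8859-8 first
    msg = Message()
    msg["Subject"] = subject  # assumed to be plain ASCII in this sketch
    msg["MIME-Version"] = "1.0"
    msg["Content-Type"] = "text/plain; charset=ISO-8859-8-i"
    msg["Content-Transfer-Encoding"] = "quoted-printable"
    msg.set_payload(quopri.encodestring(raw).decode("ascii"))
    return msg


def read_hebrew_message(msg):
    """Return (text, directionality) for either Hebrew charset label."""
    charset = msg.get_param("charset", "us-ascii").lower()
    if charset not in DIRECTIONALITY:
        raise ValueError("not a Hebrew charset: " + charset)
    payload = msg.get_payload().encode("ascii")
    cte = msg.get("Content-Transfer-Encoding", "7bit").lower()
    if cte == "quoted-printable":
        raw = quopri.decodestring(payload)
    elif cte == "base64":
        raw = base64.b64decode(payload)
    else:
        raw = payload
    return raw.decode("iso8859_8"), DIRECTIONALITY[charset]


if __name__ == "__main__":
    message = build_hebrew_message("\u05e9\u05dc\u05d5\u05dd", "Sample")
    print(message.as_string())
    print(read_hebrew_message(message))
```

   The reader only reports which directionality the charset label
   declares.  Text labelled as visual is already laid out for display,
   and the sketch does not attempt to reorder it.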
Example

   Below is a short example of Quoted-Printable encoded Hebrew email:

   Date:         Sun, 06 Jun 93 15:25:35 IDT
   From:         Hank Nussbacher
   Subject:      Sample Hebrew mail
   To:           Hank Nussbacher , Yehavi Bourvine
   MIME-Version: 1.0
   Content-Type: Text/plain; charset=ISO-8859-8-i
   Content-Transfer-Encoding: QUOTED-PRINTABLE

   The end of this line contains Hebrew .=EC=E0=F8=F9=E9 =F5=
   =F8=E0=EE =ED=E5=EC=F9

   Hank Nussbacher
   =F8=EB=E1=F1=E5=
   =F0 =F7=F0=E4

Acknowledgements

   Many thanks to Rafi Sadowsky and Nathaniel Borenstein for all their
   help.

References

   [ISO-8859] Information Processing -- 8-bit Single-Byte Coded
              Graphic Character Sets, Part 8: Latin/Hebrew alphabet,
              ISO 8859-8, 1988.

   [RFC822]   Crocker, D., "Standard for the Format of ARPA Internet
              Text Messages", STD 11, RFC 822, August 1982.

   [STD0010]  Klensin, J., Freed, N., Rose, M., Stefferud, E., and
              D. Crocker, "SMTP Service Extensions", STD 10, RFC 1869,
              November 1995.

   [RFC1652]  Klensin, J., Freed, N., Rose, M., Stefferud, E., and
              D. Crocker, "SMTP Service Extension for
              8bit-MIMEtransport", RFC 1652, July 1994.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII
              Text", RFC 2047, November 1996.

   [IS-1489]  (Israeli Standard): Information Technology: Architecture
              for implementation of the Hebrew language in telematic
              systems.

   [IS-1904]  (Israeli Standard): Application of Hebrew in mail
              messages transfer in TCP/IP networks (3/2002 edition).
Security Considerations

   There are no security issues which are directly related to this
   memo.  However, users SHOULD note that when Hebrew and ASCII
   characters are mixed in a message, several different messages can
   be displayed on screen in the same way.  Although this is not
   believed to raise any serious security issue, users SHOULD be aware
   of it when dealing with sensitive or ambiguous matters.

Authors' Addresses

   Hank Nussbacher
   Computer Center
   Tel Aviv University
   Ramat Aviv
   Israel

   Fax:   +972 3 6409118
   Phone: +972 3 6408309
   EMail: hank@interall.co.il

   Yehavi Bourvine
   Computer Center
   Hebrew University
   Jerusalem
   Israel

   Phone: +972 2 6585684
   Fax:   +972 2 6527349
   EMail: yehavi@vms.huji.ac.il