EAI                                                              A. Yang
Internet-Draft                                                     TWNIC
Obsoletes: 5335 (if approved)                                  S. Steele
Updates: 2045,5322 (if approved)                               Microsoft
Intended status: Standards Track                              D. Crocker
Expires: July 29, 2011                       Brandenburg InternetWorking
                                                                N. Freed
                                                                  Oracle
                                                        January 25, 2011


                    Internationalized Email Headers
                      draft-ietf-eai-rfc5335bis-08

Abstract

   Internet mail was originally limited to 7-bit ASCII.  Recent
   enhancements support Unicode's UTF-8 encoding in portions of a
   message.  Full internationalization of electronic mail requires
   additional enhancement, including support for UTF-8 in user-oriented
   header fields, such as in the To, From, and Subject fields.  This
   document specifies an enhancement to Internet mail that permits
   native UTF-8 support in the header and body of a message.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 29, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents

Yang, et al.              Expires July 29, 2011                 [Page 1]

Internet-Draft             I18N Email Headers               January 2011

   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Changes to the 8-bit clean Model
     1.2.  Terminology
   2.  Support for UTF-8 Encoding
     2.1.  Message Object ABNF Changes
     2.2.  Normalization
     2.3.  Content-Transfer-Encoding
   3.  Internet Message Format Enhancement
   4.  Message Labeling
   5.  MIME Enhancement
     5.1.  Content-Transfer-Encoding
     5.2.  MIME Header Field
     5.3.  Content-Type: message/utf8-rfc822
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Changes to support UTF-8

1.  Introduction

   Internet mail distinguishes a message from its transport and further
   divides a message between a header and a body [RFC5598].  Internet
   mail header fields contain a variety of strings that are intended to
   be user-visible.
   The range of supported characters for these strings was originally
   limited to a subset of [ASCII]; globalization of the Internet
   requires support of the much larger set contained in UTF-8 [RFC5198].
   Complex encoding alternatives to UTF-8, as an overlay to the existing
   ASCII base, would introduce inefficiencies as well as opportunities
   for processing errors.  Native support for UTF-8 encoding [RFC3629]
   is widely available among systems now used over the Internet.  Hence
   supporting this encoding directly within email is desired.  This
   document specifies an enhancement to Internet mail that permits the
   use of UTF-8 encoding, rather than only ASCII, as the base form for
   header fields.

   This specification is based on a model of native, end-to-end support
   for UTF-8, which uses an "8-bit clean" environment.  Support for
   carriage across legacy, 7-bit infrastructure and for processing by
   7-bit receivers requires additional mechanisms that are not provided
   by this specification.

1.1.  Changes to the 8-bit clean Model

   This is an extensive revision to the draft.  Changes include:

   o  Greatly simplified ABNF that is much more basic and integrated.

   o  Clean separation of the changes in an email header [RFC5322] from
      those in a MIME header [RFC2045]

   o  Change to the default MIME content-transfer-encoding to be 8bit

   o  Elimination of all discussion of transport

   o  An appendix that lists the derived ABNF rules that inherit support
      for UTF-8, due to the changed basic rules

   Still Pending:

   o  ABNF to support IDN

   o  Fix "Normalization" section; I could not figure out what it needs
      to say.  I wasn't trying to change the existing spec, but simply
      fix the writing.

   o  Review/fix MIME C-T-E details

   The goal of the changes is to dramatically simplify the specification
   and the software needed to support a message with UTF-8 encoding.
   Rather than specify a wide range of UTF8-specific changes to the
   existing ABNF rules, it focuses on the few, underlying ABNF rules
   that are the basis for user-visible ASCII text.

   The premise for this is simple: If the message is to be in UTF-8,
   then it is in UTF-8.  Subtle or complex rules that selectively add
   UTF-8 are not worth the effort, once the message has already entered
   into the realm of UTF-8.

   The question, then, is whether this change has planted some
   landmines, such as in Trace header fields?

1.2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Syntax descriptions use Augmented BNF (ABNF) [RFC5234].

   Basic terms for this specification include:

   ASCII:  An encoding of Control Characters and Basic Latin that
      occupies 7 bits, per [ASCII].  Such a string is fully compatible
      with email as specified in [RFC5322].

   UTF-8:  An encoding of Unicode in 8-bit bytes, per [RFC3629].

2.  Support for UTF-8 Encoding

2.1.  Message Object ABNF Changes

   Internet Mail that conforms to this specification is classed as
   supporting UTF-8.  However, UTF-8 characters within the ASCII range
   retain the restrictions defined for original, legacy, Latin-only
   email.  Therefore, ABNF enhancements to include UTF-8 incrementally
   add the non-ASCII portions of UTF-8 to that established base of
   ASCII.  UTF-8 characters are defined by using the following ABNF
   taken from [RFC3629]:

   UTF8-enhancement  =  UTF8-2 / UTF8-3 / UTF8-4

   UTF8-2            =  <Defined in Section 4 of RFC3629>

   UTF8-3            =  <Defined in Section 4 of RFC3629>

   UTF8-4            =  <Defined in Section 4 of RFC3629>

2.2.  Normalization

   See [RFC5198] for a discussion of normalization.  A normalized form
   [NFC] MAY be used.  However, [NFC] can lose information that is
   needed to correctly spell some names in unusual circumstances.

2.3.  Content-Transfer-Encoding

   This specification is based on a requirement for an "8-bit clean"
   infrastructure.  Support for UTF-8 semantics within a 7-bit
   environment requires translation conventions that are not specified
   here.  Consequently, a Content-Transfer-Encoding value of "7bit" is
   not useful for a message that is labeled as containing UTF-8.

3.  Internet Message Format Enhancement

   This section specifies UTF-8 enhancements for the header of an
   Internet Mail message, as defined in [RFC5322].  ABNF used in this
   section is taken from that specification and the ABNF specification.

   This specification retains the [RFC5322] rules for defining header
   field names.  The bodies of header fields are allowed to contain
   UTF-8 characters, but the header field names themselves must contain
   only ASCII characters.

   The following rules extend the corresponding rules in [RFC5322] and
   [RFC5234] in order to allow additional UTF-8 characters.

   VCHAR   =/  UTF8-non-ascii

   ctext   =/  UTF8-enhancement

   atext   =/  UTF8-enhancement

   qtext   =/  UTF8-enhancement

   TENTATIVE (DCrocker):

   text    =/  UTF8-enhancement
               ; note that this upgrades the body to UTF-8

   {{ how to add IDN to this? }}

   domain  =   dot-atom / domain-literal / obs-domain

   This means that all the [RFC5322] constructs that build upon these
   will permit UTF-8 characters, including comments and quoted strings.

   [RFC5322] has a rule that specifies permissible names for
   user-defined header fields.  The current specification defines no
   changes to that rule.

   This ABNF enables Message-ID strings to be full UTF-8.  However, the
   specification directs that Message-ID strings SHOULD be restricted to
   ASCII.

4.  Message Labeling

   For clarity and convenience, a message SHOULD contain an explicit
   label indicating the character base it uses.  This section defines a
   new header field for this label:

   fields         =/  msg-character

   msg-character  =   "MSG-Char:" ( "ASCII" / "UTF-8" ) CRLF

5.  MIME Enhancement

5.1.  Content-Transfer-Encoding

   The default Content-Transfer-Encoding is "8BIT"; this value is
   assumed if the Content-Transfer-Encoding header field is not present.

5.2.  MIME Header Field

   MIME contains at least one header field that is intended for user
   display, namely Content-Description.  This section specifies UTF-8
   enhancements to MIME header fields, as defined in [RFC2045].  ABNF
   rules used in this section are taken from that specification and the
   ABNF specification.

   The enhanced ABNF rules are:

   text  =/  UTF8-non-ascii

5.3.  Content-Type: message/utf8-rfc822

   The type message/utf8-rfc822 is similar to message/rfc822.  However,
   it specifies that characters are interpreted as UTF-8 rather than
   being limited to ASCII.

   Type name:  message

   Subtype name:  utf8-rfc822

   Required parameters:  none

   Optional parameters:  none

   Encoding considerations:  Any content-transfer-encoding is permitted.
      The 8bit or binary content-transfer-encodings are recommended
      where permitted.

   Security considerations:  See Section 6.

   Interoperability considerations:  The media type provides
      functionality similar to the message/rfc822 content type for email
      messages with international email headers.  When there is a need
      to embed or return such content in another message, there is
      generally an option to use this media type and leave the content
      unchanged or down-convert the content to message/rfc822.  Both of
      these choices will interoperate with the installed base, but with
      different properties.  Systems unaware of internationalized
      headers will typically treat a message/utf8-rfc822 body part as an
      unknown attachment, while they will understand the structure of a
      message/rfc822.  However, systems that understand
      message/utf8-rfc822 will provide functionality superior to the
      result of a down-conversion to message/rfc822.  The most
      interoperable choice depends on the deployed software.
   Published specification:  RFC XXXX

   Applications that use this media type:  SMTP servers and email
      clients that support multipart/report generation or parsing.
      Email clients that forward messages with international headers as
      attachments.

   Additional information:

   Magic number(s):  none

   File extension(s):  The extension ".u8msg" is suggested.

   Macintosh file type code(s):  A uniform type identifier (UTI) of
      "public.utf8-email-message" is suggested.  This conforms to
      "public.message" and "public.composite-content", but does not
      necessarily conform to "public.utf8-plain-text".

   Person & email address to contact for further information:  See the
      Authors' Addresses section of this document.

   Intended usage:  COMMON

   Restrictions on usage:  This is a structured media type that embeds
      other MIME media types.  The 8bit or binary content-transfer-
      encoding SHOULD be used unless this media type is sent over a
      7-bit-only transport.

   Author:  See the Authors' Addresses section of this document.

   Change controller:  IETF Standards Process

6.  Security Considerations

   If a user has a mailbox address in UTF-8 and a mailbox address in
   ASCII, a digital certificate that identifies that user might have
   both addresses in the identity.  Having multiple email addresses as
   identities in a single certificate is already supported in PKIX
   (Public Key Infrastructure for X.509 Certificates) [RFC5280] and
   OpenPGP [RFC3156].

   Because UTF-8 often requires several octets to encode a single
   character, internationalized local parts and header values may cause
   mail addresses to become longer.  As specified in [RFC5322], each
   line of characters MUST be no more than 998 octets, excluding the
   CRLF.  In addition, MDA (Mail Delivery Agent) processes that parse,
   store, or handle email addresses or local parts must take extra care
   not to overflow buffers, truncate addresses, or exceed storage
   allotments.
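   The line limit above is counted in octets, not characters, so a
   header line that looks short on screen can still exceed it once
   encoded.  The following is a minimal, non-normative sketch of such a
   check in Python.  The names MAX_LINE_OCTETS, octet_length, and
   overlong_lines are illustrative assumptions and are not defined by
   this specification.

```python
from typing import List

# Per-line limit cited from [RFC5322]; the trailing CRLF is not counted.
MAX_LINE_OCTETS = 998


def octet_length(text: str) -> int:
    """Return the length of text in octets once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def overlong_lines(raw_header: bytes) -> List[int]:
    """Return 1-based numbers of CRLF-delimited lines over the limit.

    raw_header is the header as it appears on the wire (hypothetical
    input for this sketch); lengths are measured on the raw octets.
    """
    lines = raw_header.split(b"\r\n")
    return [n for n, line in enumerate(lines, start=1)
            if len(line) > MAX_LINE_OCTETS]
```

   For instance, a Subject line holding 500 copies of U+00E9 is 509
   characters long but occupies 1009 octets, so a character-based check
   would accept it while the octet-based check above would flag it.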
   Also, they must take care, when comparing, to use the entire lengths
   of the addresses.

   The security impact of UTF-8 headers on email signature systems such
   as DomainKeys Identified Mail (DKIM), S/MIME, and OpenPGP is
   discussed in [I-D.eai-frmwrk-4952bis], Section 14.

7.  IANA Considerations

   IANA is requested to update the registration of the
   message/utf8-rfc822 MIME type using the registration form contained
   in Section 5.3.

8.  Acknowledgements

   This document incorporates many ideas first described in Internet-
   Draft form by Paul Hoffman, although many details have changed from
   that earlier work.

   The author especially thanks Jeff Yeh for his efforts and
   contributions on editing previous versions.

   Most of the content of this document is provided by John C. Klensin.
   Also, some significant comments and suggestions were received from
   Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris
   Newman, Yangwoo Ko, Yoshiro Yoneya, and other members of the JET team
   (Joint Engineering Team) and were incorporated into the document.
   The editor sincerely thanks them for their contributions.

9.  References

9.1.  Normative References

   [ASCII]    "Coded Character Set -- 7-bit American Standard Code for
              Information Interchange", ANSI X3.4, 1986.

   [I-D.eai-frmwrk-4952bis]
              Klensin, J. and Y. Ko, "Overview and Framework for
              Internationalized Email",
              draft-ietf-eai-frmwrk-4952bis-10 (work in progress),
              September 2010.

   [Latin]    Unicode Consortium, "C0 Controls and Basic Latin",
              http://unicode.org/charts/PDF/U0000.pdf, 2010.

   [NFC]      Davis, M. and K. Whistler, "Unicode Standard Annex #15:
              Unicode Normalization Forms", September 2010.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [RFC5598]  Crocker, D., "Internet Mail Architecture", RFC 5598,
              July 2009.

   [Unicode]  Unicode Consortium, "Unicode 6.0 Character Code Charts",
              http://unicode.org/charts/, 2010.

9.2.  Informative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
              "MIME Security with OpenPGP", RFC 3156, August 2001.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, May 2008.

Appendix A.  Changes to support UTF-8

   This section provides a basic audit of the places in a message that
   now can permit UTF-8 rather than being restricted to ASCII, based on
   the changes to underlying ABNF.  The audit ignores rules for
   "obsolete" constructs in RFC 5322.  (This is a first cut and the list
   is likely incomplete):

   VCHAR:  quoted-pair, unstructured > ccontent, qcontent > comment,
      quoted-string > word, local-part > phrase > display-name,
      keywords

   ctext:  ccontent > comment

   atext:  atom, dot-atom-text

   qtext:  qcontent > quoted-string

Authors' Addresses

   Abel Yang
   TWNIC
   4F-2, No. 9, Sec 2, Roosevelt Rd.
   Taipei, 100
   Taiwan

   Phone: +886 2 23411313 ext 505
   EMail: abelyang@twnic.net.tw


   Shawn Steele
   Microsoft

   EMail: Shawn.Steele@microsoft.com


   D. Crocker
   Brandenburg InternetWorking
   675 Spruce Dr.
   Sunnyvale
   USA

   Phone: +1.408.246.8253
   EMail: dcrocker@bbiw.net
   URI:   http://bbiw.net


   Ned Freed
   Oracle
   800 Royal Oaks
   Monrovia, CA 91016-6347
   USA

   EMail: ned.freed@mrochek.com