Email Address Internationalization A. Yang (EAI) TWNIC Internet-Draft S. Steele Obsoletes: 5335 (if approved) Microsoft Updates: 2045,5322 (if approved) N. Freed Intended status: Standards Track Oracle Expires: March 21, 2012 September 18, 2011 Internationalized Email Headers draft-ietf-eai-rfc5335bis-12 Abstract Internet mail was originally limited to 7-bit ASCII. MIME added support for the use of 8-bit character sets in body parts, and also defined an encoded-word construct so other character sets could be used in certain header field values. But full internationalization of electronic mail requires additional enhancements to allow the use of Unicode, including characters outside the ASCII repertoire, in mail addresses as well as direct use of Unicode in header fields like From:, To:, and Subject:, without requiring the use of complex encoded-word constructs. This document specifies an enhancement to the Internet Message Format that allows use of Unicode in mail addresses and most header field content. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on March 21, 2012. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. Yang, et al. Expires March 21, 2012 [Page 1] Internet-Draft I18N Email Headers September 2011 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology Used In This Specification . . . . . . . . . . . . 3 3. Changes to Message Header Fields . . . . . . . . . . . . . . . 4 3.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 4 3.2. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 5 3.3. Use of 8-bit UTF-8 in Message-Ids . . . . . . . . . . . . 5 3.4. Effects on Line Length Limits . . . . . . . . . . . . . . 5 3.5. Changes to MIME Message Type Encoding Restrictions . . . . 6 3.6. Use of MIME Encoded-Words . . . . . . . . . . . . . . . . 6 3.7. The Message/global Media Type . . . . . . . . . . . . . . 6 4. Security Considerations . . . . . . . . . . . . . . . . . . . 8 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 7. Edit history . . . . . . . . . . . . . . . . . . . . . . . . . 9 7.1. draft-ietf-eai-rfc5335bis-00 . . . . . . . . . . . . . . . 9 7.2. draft-ietf-eai-rfc5335bis-01 . . . . . . . . . . . . . . . 10 7.3. draft-ietf-eai-rfc5335bis-02 . . . . . . . . . . . . . . . 10 7.4. draft-ietf-eai-rfc5335bis-03 . . . . . . . . . . . . . . . 10 7.5. draft-ietf-eai-rfc5335bis-04 . . . . . . . . . . . . . . . 10 7.6. draft-ietf-eai-rfc5335bis-05 . . . . . . . . . . . . . . . 10 7.7. draft-ietf-eai-rfc5335bis-06 . . . . . . . . . . . . . . . 10 7.8. draft-ietf-eai-rfc5335bis-07 . . . . . . . . . . . . . . . 10 7.9. draft-ietf-eai-rfc5335bis-09 . . . . . . . . . . . . . . . 10 7.10. draft-ietf-eai-rfc5335bis-10 . . . . . . . . . . . . . . . 11 7.11. draft-ietf-eai-rfc5335bis-11 . . . . . . . . . . . . . . . 11 7.12. draft-ietf-eai-rfc5335bis-12 . . . . . . . . . . . . . . . 11 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 8.1. Normative References . . . . . . . . . . . . . . . . . . . 11 8.2. Informative References . . . . . . . . . . . . . . . . . . 12 Yang, et al. Expires March 21, 2012 [Page 2] Internet-Draft I18N Email Headers September 2011 1. Introduction Internet mail distinguishes a message from its transport and further divides a message between a header and a body [RFC5598]. Internet mail header field values contain a variety of strings that are intended to be user-visible. The range of supported characters for these strings was originally limited to 7-bit [ASCII]. MIME [RFC2045] [RFC2046] [RFC2047] provides the ability to use additional character sets, but this support is limited to body part data and to special encoded-word constructs that were only allowed in a limited number of places in header field values. Globalization of the Internet requires support of the much larger set of characters provided by Unicode [RFC5198] in both mail addresses and most header field values. Additionally, complex encoding schemes like encoded-words introduce inefficiencies as well as significant opportunities for processing errors. And finally, native support for the UTF-8 charset is now available on most systems. Hence it is strongly desirable for Internet mail to support UTF-8 [RFC3629] directly. This document specifies an enhancement to the Internet Message Format [RFC5322] and to MIME that permits the direct use of UTF-8, rather than only ASCII, in header field values, including mail addresses. A new media type, message/global, is defined for messages that use this extended format. This specification also lifts the MIME restriction on having non-identity content-transfer-encodings on any subtype of the message top-level type so that message/global parts can be safely transmitted across existing mail infrastructure. This specification is based on a model of native, end-to-end support for UTF-8, which depends on having an "8-bit clean" environment assured by the transport system. Support for carriage across legacy, 7-bit infrastructure and for processing by 7-bit receivers requires additional mechanisms that are not provided by these specifications. 2. Terminology Used In This Specification A plain ASCII string is fully compatible with [RFC5321] and [RFC5322]. In this document, non-ASCII strings are UTF-8 strings if they are in header field values which contain at least one (see Section 3.1). Unless otherwise noted, all terms used here are defined in [RFC5321], [RFC5322], [I-D.ietf-eai-frmwrk-4952bis], or [I-D.ietf-eai-rfc5336bis]. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", Yang, et al. Expires March 21, 2012 [Page 3] Internet-Draft I18N Email Headers September 2011 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. The term "8-bit" means octets are present in the data with values above 0x7F. 3. Changes to Message Header Fields To permit Unicode characters in field values, the header definition in [RFC5322] is extended to support the new format. The following sections specify the necessary changes to RFC 5322's ABNF. The syntax rules not mentioned below remain defined as in [RFC5322]. Note that this protocol does not change RFC 5322 rules for defining header field names. The bodies of header fields are allowed to contain Unicode characters, but the header field names themselves must contain only ASCII characters. Also note that messages in this format require the use of the &UTF8SMTPbis; extension [I-D.ietf-eai-rfc5336bis] to be transferred via SMTP. 3.1. UTF-8 Syntax and Normalization UTF-8 characters can be defined in terms of octets using the following ABNF [RFC5234], taken from [RFC3629]: UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4 UTF8-2 = UTF8-3 = UTF8-4 = See [RFC5198] for a discussion of Unicode normalization; normalization form [NFC] SHOULD be used. Actually, if one is going to do internationalization properly, one of the most often-cited goals is to permit people to spell their names correctly. Since many mailbox local parts reflect personal names, that principle applies to mailboxes as well. The NFKC normalization form SHOULD NOT be used because it may lose information that is needed to correctly spell some names in some unusual circumstances. Yang, et al. Expires March 21, 2012 [Page 4] Internet-Draft I18N Email Headers September 2011 3.2. Syntax Extensions to RFC 5322 The following rules extend the ABNF syntax defined in [RFC5322] and [RFC5234] in order to allow UTF-8 content. VCHAR =/ UTF8-non-ascii ctext =/ UTF8-non-ascii atext =/ UTF8-non-ascii qtext =/ UTF8-non-ascii text =/ UTF8-non-ascii ; note that this upgrades the body to UTF-8 dtext =/ UTF8-non-ascii The preceding changes mean that the following constructs now allow UTF-8: 1. Unstructured text, used in header fields like Subject: or Content-description:. 2. Any construct that uses atoms, including but not limited to the local parts of addresses and message-ids. This includes addresses in the "for" clauses of Received: header fields. 3. Quoted strings. 4. Domains. Note that header field names are not on this list; these are still restricted to ASCII. 3.3. Use of 8-bit UTF-8 in Message-Ids Implementers of message-id generation algorithms MAY prefer to restrain their output to ASCII since that has some advantages, such as when constructing References fields in mailing-list threads where some senders use EAI and others not. 3.4. Effects on Line Length Limits Section 2.1.1 of [RFC5322] limits lines to 998 characters and recommends that the lines be restricted to only 78 characters. This specification changes the former limit to 988 octets. (Note that in Yang, et al. Expires March 21, 2012 [Page 5] Internet-Draft I18N Email Headers September 2011 ASCII octets and characters are effectively the same but this is not true in UTF-8.) The 78 character limit remains defined in terms of characters, not octets, since it is intended to address display width issues, not line length issues. 3.5. Changes to MIME Message Type Encoding Restrictions This specification updates Section 6.4 of [RFC2045]. [RFC2045] prohibits applying a content-transfer-encoding to any subtypes of "message/". This specification relaxes that rule -- it allows newly defined MIME types to permit content-transfer-encoding, and it allows content-transfer-encoding for message/global (see Section 3.7). Background: Normally, transfer of message/global will be done in 8-bit-clean channels, and body parts will have "identity" encodings, that is, no decoding is necessary. But in the case where a message containing a message/global is downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding might have to be applied to the message; if the message travels multiple times between a 7-bit environment and an environment implementing these extensions, multiple levels of encoding may occur. This is expected to be rarely seen in practice, and the potential complexity of other ways of dealing with the issue are thought to be larger than the complexity of allowing nested encodings where necessary. 3.6. Use of MIME Encoded-Words The MIME encoded-words facility [RFC2047] provides the ability to place non-ASCII text, but only in a subset of the places allowed by this extension. Additionally, encoded-words are substantially more complex since they allow the use of arbitrary charsets. Accordingly, encoded-words SHOULD NOT be used when generating header fields for messages employing this extension. Agents MAY, when incorporating material from another message, convert encoded-word use to direct use of UTF-8. Note that care must be taken when decoding encoded-words because the results after replacing an encoded-word with its decoded equivalent in UTF-8 may be syntactically invalid. Processors that elect to decode encoded-words MUST NOT generate syntactically invalid fields. 3.7. The Message/global Media Type Internationalized messages in this format MUST only be transmitted as authorized by [I-D.ietf-eai-rfc5336bis] or within a non-SMTP environment that supports these messages. A message is a "message/ Yang, et al. Expires March 21, 2012 [Page 6] Internet-Draft I18N Email Headers September 2011 global message" if: o it contains 8-bit UTF-8 header values as specified in this document, or o it contains 8-bit UTF-8 values in the header fields of body parts. The content of a message/global part is otherwise identical to that of a message/rfc822 part. If this type is sent to a 7-bit-only system, it has to have an appropriate content-transfer-encoding applied. (Note that a system compliant with MIME that doesn't recognize message/global is supposed to treat it as "application/octet-stream" as described in Section 5.2.4 of [RFC2046].) Type name: message Subtype name: global Required parameters: none Optional parameters: none Encoding considerations: Any content-transfer-encoding is permitted. The 8-bit or binary content-transfer-encodings are recommended where permitted. Security considerations: See Section 4. Interoperability considerations: This media type provides functionality similar to the message/rfc822 content type for email messages with international email headers. When there is a need to embed or return such content in another message, there is generally an option to use this media type and leave the content unchanged or down-convert the content to message/rfc822. Both of these choices will interoperate with the installed base, but with different properties. Systems unaware of internationalized headers will typically treat a message/global body part as an unknown attachment, while they will understand the structure of a message/rfc822. However, systems that understand message/global will provide functionality superior to the result of a down- conversion to message/rfc822. The most interoperable choice depends on the deployed software. Yang, et al. Expires March 21, 2012 [Page 7] Internet-Draft I18N Email Headers September 2011 Published specification: RFC XXXX Applications that use this media type: SMTP servers and email clients that support multipart/report generation or parsing. Email clients that forward messages with international headers as attachments. Additional information: Magic number(s): none File extension(s): The extension ".u8msg" is suggested. Macintosh file type code(s): A uniform type identifier (UTI) of "public.utf8-email-message" is suggested. This conforms to "public.message" and "public.composite-content", but does not necessarily conform to "public.utf8-plain-text". Person & email address to contact for further information: See the Author's Address section of this document. Intended usage: COMMON Restrictions on usage: This is a structured media type that embeds other MIME media types. The 8-bit or binary content-transfer- encoding SHOULD be used unless this media type is sent over a 7-bit-only transport. Author: See the Author's Address section of this document. Change controller: IETF Standards Process 4. Security Considerations Because UTF-8 often requires several octets to encode a single character, internationalization may cause header field values in general and mail addresses in particular to become longer. As specified in [RFC5322], each line of characters MUST be no more than 998 octets, excluding the CRLF. On the other hand, MDA (Mail Delivery Agent) processes that parse, store, or handle email addresses or local parts must take extra care not to overflow buffers, truncate addresses, or exceed storage allotments. Also, they must take care, when comparing, to use the entire lengths of the addresses. There are lots of ways to use UTF-8 to represent something equivalent or similar to a particular displayed character or group of characters; see the security considerations in [RFC3629] for details Yang, et al. Expires March 21, 2012 [Page 8] Internet-Draft I18N Email Headers September 2011 on the problems this can cause. The normalization process is described in Section 3.1 is recommended to minimize these issues. The security impact of UTF-8 headers on email signature systems such as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is discussed in [I-D.ietf-eai-frmwrk-4952bis], Section 14. If a user has a non-ASCII mailbox address and an ASCII mailbox address, a digital certificate that identifies that user might have both addresses in the identity. Having multiple email addresses as identities in a single certificate is already supported in PKIX (Public Key Infrastructure for X.509 Certificates) [RFC5280] and OpenPGP [RFC3156], but there may be user interface issues associated with the introduction of UTF-8 into addresses in this context. 5. IANA Considerations IANA is requested to update the registration of the message/global MIME type using the registration form contained in Section 3.7. 6. Acknowledgements This document incorporates many ideas first described in Internet- Draft form by Paul Hoffman, although many details have changed from that earlier work. The author especially thanks Jeff Yeh for his efforts and contributions on editing previous versions. Most of the content of this document was provided by John C Klensin and Dave Crocker. Significant comments and suggestions were received from Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris Newman, Kristin Hubner, Yangwoo Ko, Yoshiro Yoneya, and other members of the JET team (Joint Engineering Team) and were incorporated into the document. The editors wish to sincerely thank them all for their contributions. 7. Edit history [[RFC Editor: please remove this section before publishing.]] 7.1. draft-ietf-eai-rfc5335bis-00 1. Applied Errata suggested by Alfred Hoenes. 2. Adjust [RFC2821] and [RFC2822] to [RFC5321] and [RFC5322]. Yang, et al. Expires March 21, 2012 [Page 9] Internet-Draft I18N Email Headers September 2011 3. Abrogate in ABNF of . 4. Revoke [RFC5504] from this document. 5. Upgrade some references from I-Ds to RFC. 7.2. draft-ietf-eai-rfc5335bis-01 1. Author name revised. 7.3. draft-ietf-eai-rfc5335bis-02 1. ABNF revised. 7.4. draft-ietf-eai-rfc5335bis-03 1. Fix typos 2. ABNF revised 3. Improve sentence 7.5. draft-ietf-eai-rfc5335bis-04 1. improve sentences and ABNF revised based on AD and Co-chairs 7.6. draft-ietf-eai-rfc5335bis-05 1. ABNF revised based on AD comments 7.7. draft-ietf-eai-rfc5335bis-06 1. ABNF revised 2. improve Section 5 7.8. draft-ietf-eai-rfc5335bis-07 1. Minor ABNF revised in Section 3.2 2. improve Section 5 7.9. draft-ietf-eai-rfc5335bis-09 Version -08 was posted in error and withdrawn. Version 09 is is identical to version 07 except for a date change, addition of this note, and some vertical spacing compression on this page. Yang, et al. Expires March 21, 2012 [Page 10] Internet-Draft I18N Email Headers September 2011 7.10. draft-ietf-eai-rfc5335bis-10 1. Add appendix and overview of changes 2. Replace polls result in Abstract and Section 1 3. Minor Sentence modification 7.11. draft-ietf-eai-rfc5335bis-11 1. Major rewrite of entire document to incorporate Dave Crocker's simplified ABNF. 2. The document has intentionally been refocused on implementors wishing to adapt their software to support EAI, so much of the explanatory and historical text has been removed. (Some of it may be reintroduced later as an appendix. 7.12. draft-ietf-eai-rfc5335bis-12 1. Added a section on the handling of MIME encoded-words. 2. Updated the security considerations to refer to the more complete discussion in RFC 3629. 3. Added a section on the effects on line length limits. 4. Removed the syntax restriction on the use of 8-bit UTF-8 in message-ids. 5. Added text recommending that 8-bit UTF-8 be avoided in message- ids. 8. References 8.1. Normative References [ASCII] "Coded Character Set -- 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986. [I-D.ietf-eai-frmwrk-4952bis] Klensin, J. and Y. Ko, "Overview and Framework for Internationalized Email", draft-ietf-eai-frmwrk-4952bis-10 (work in progress), September 2010. [I-D.ietf-eai-rfc5336bis] Yao, J. and W. MAO, "SMTP Extension Yang, et al. Expires March 21, 2012 [Page 11] Internet-Draft I18N Email Headers September 2011 for Internationalized Email Address", draft-ietf-eai-rfc5336bis-07 (work in progress), December 2010. [NFC] Davis, M. and K. Whistler, "Unicode Standard Annex #15: Unicode Normalization Forms", September 2010, . [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003. [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network Interchange", RFC 5198, March 2008. [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008. [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008. [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, October 2008. [RFC5598] Crocker, D., "Internet Mail Architecture", RFC 5598, July 2009. 8.2. Informative References [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996. [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996. Yang, et al. Expires March 21, 2012 [Page 12] Internet-Draft I18N Email Headers September 2011 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non- ASCII Text", RFC 2047, November 1996. [RFC3156] Elkins, M., Del Torto, D., Levien, R., and T. Roessler, "MIME Security with OpenPGP", RFC 3156, August 2001. [RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, May 2008. [RFC6152] Klensin, J., Freed, N., Rose, M., and D. Crocker, "SMTP Service Extension for 8-bit MIME Transport", STD 71, RFC 6152, March 2011. Authors' Addresses Abel Yang TWNIC 4F-2, No. 9, Sec 2, Roosevelt Rd. Taipei, 100 Taiwan Phone: +886 2 23411313 ext 505 EMail: abelyang@twnic.net.tw Shawn Steele Microsoft EMail: Shawn.Steele@microsoft.com Ned Freed Oracle 800 Royal Oaks Monrovia, CA 91016-6347 USA EMail: ned+ietf@mrochek.com Yang, et al. Expires March 21, 2012 [Page 13]